🚩 Report — Legal Issue: Clarity Needed for Commercial Licensing
Hello.
I am your friendly Copyright Oversight bot. **beep** **beep** I detect and report problematic licenses. **boop** Triggering data-maximalists and accelerationists is an unintended consequence of my programming; apologies for the emotional inconvenience.
Yeah, I started this Report with a joke to lighten the mood, as nobody at HuggingFace appears to be resolving serious Legal Issues that have been open for months: there are outstanding claims regarding privacy and infringement, plus ongoing investigations on illegal content such as CSAM. Yet licensing is a question of human rights, as Copyright is directly tied to economic and moral rights that every human being is considered to have. Stella Rose Biderman recently pointed out the importance of clear & correct licensing in a more pragmatic way [1], and with HuggingFace's Ethics team also claiming to be working on data rights, it's evident that many feel licensing is an important topic.
"It’s a problem for HF because you’re not allowed to upload falsely licensed data." — Stella Rose Biderman
Other industries have come to terms with Copyright and while functional ML may be a new thing, the foundations upon which it's built are well established and indeed regulated.
Since StabilityAI is claiming a precedent that this model is one of the first commercially available models and under Creative Commons, objections must be documented here for posterity. The company's legal counsel did not reply to legal questions in letter form, so I'm left with the option of doing this openly and publicly.
In short, there is no legal basis for licensing or re-licensing a model trained under Fair Use. Nobody has the right to do this!
Legally speaking:
- The underlying dataset is an extension of The Pile, which contains Copyrighted material. Since there's no license, the basis for training can only be a Copyright exception.
- Copyright exceptions are defined internationally under the Berne Convention, within Europe by the Directive on Copyright, and in the U.S. as the Fair Use doctrine in the Copyright Act.
- All of these exceptions are, by definition, limited in scope and more importantly dependent on the application. How downstream organizations apply the model determines whether it's infringement or not.
- It has not yet been established whether models are derivative works, and not knowing this is not an excuse to commit infringement by (re)licensing downstream.
- Even if the training is done in a compliant manner under Fair Use, that does not grant StabilityAI the rights to license or relicense the model downstream.
- All the above statutes and regulations, directly or indirectly, require that the application "does not conflict with a normal exploitation" or "does not unreasonably prejudice the legitimate interests" of the rightsholders.
- For the sake of completeness, data scraped from companies in the EU, from EU rightsholders, trained on EU infrastructure, and/or applied within the EU would fall under EU jurisdiction and under the Directive on Copyright.
For the record, also note that this legal interpretation is not unusual; it's the reason why Meta has not been able to license LLaMa's weights with a more liberal license [2]:
"The world needs high performance open source LLM. The main obstacle today is the legal status of the training data." — Yann LeCun
Meta's recent track record on legal and responsible ML has been outstanding (best-in-class), since it licensed its most recent dataset for Segment Anything from photographers for research purposes and took precautions to blur out personal details for privacy. Conversely, StabilityAI is being sued on the basis of its wishful licensing and exploitation of Copyrighted materials.
Thus, in the absence of any proof to the contrary, I request that the Creative Commons CC-BY-SA license be declared void. I also request a more proactive role from HuggingFace in clarifying such matters, as HF is legally responsible as a platform (incl. under active EU regulations) and likely liable for ongoing damages caused by infringement that may occur as a consequence of false licensing. A safe alternative would be non-commercial research or personal-use only, as it would better communicate the exceptions under which this model was trained.
Regards,
Alex Champandard
[1] https://twitter.com/BlancheMinerva/status/1647831229943750656
[2] https://twitter.com/ylecun/status/1648627970225979392
The licensing problem applies to all StableLMs, but for the commercial models the expectations conveyed to users are the most problematic.
https://huggingface.co/stabilityai/stablelm-base-alpha-7b
Please consider this Report to apply to all models in this series — both past and upcoming.
Please.. here (very talented) people are trying (reeeeeeally hard) to make the world a better place... :-D
How is it that everything is so complicated?
That reminds me, I should renew my license to breathe.
This year I'll be able to afford only second-hand air, since patent & copyright trolls sucked all the premium oxygen out of the place.
Copyright people are so boring, and all that subject is so boring, to be honest I could not live working in that field. I'd rather pick up trash from the streets for a living (which is a much more honest, decent and useful profession), because it has to be one of the most toxic fields in existence. As a human being, I could not do it.
tl;dr: better to download all models, datasets, etc that seem interesting. As hardware gets better, we may be able to finetune and train our own models in the privacy of our homes, so gotta scrape everything we can before the copyright dudes ruin everything as usual, because that's their only purpose in life, if you can call that a life
@Tom9000 @Niichanhaou Are you finished?
If you had read the thread correctly, you'd see it's not about removing this model from existence or this website — but simply relicensing it correctly. Whatever you're personally doing you should be fine with it. Companies on the other hand want legal clarity, and so does HuggingFace because it's their core business and we don't want them to go completely under because of negligence or incompetence. That would be worse, right?
As for downloading all datasets in existence, please be careful because some of the web-scraped image datasets have material that's illegal to possess (criminal offense) and you may get thrown in jail if they find it on your hard disk.
I have renamed the thread so it doesn't attract (more) random programmers living in their grandma's basement who deem oppressive the very concept of integrating with society's norms. My Legal Issue reported here is still about all forms of licensing beyond Fair Use.
The OP sounds like a real joy at parties and funeral parlors.
Since what you've given is an incredibly detail-dense report (long enough that people don't want to read it, and instead just scream at it), here's a ChatGPT-generated tl;dr version that seems to read correctly. Correct me if I've got this wrong.
Alex Champandard, a user, is raising concerns about the licensing of a model trained under Fair Use by StabilityAI. The underlying dataset contains copyrighted material, and the basis for training can only be a copyright exception. Copyright exceptions are defined internationally, and all of them are limited in scope and dependent on the application. It is currently not established whether models are derivative works, and even if training is done in a compliant manner under Fair Use, that does not grant StabilityAI the rights to license or relicense the model downstream. Therefore, Alex requests that the Creative Commons CC-BY-SA license be declared void and asks for a more proactive role from HuggingFace in clarifying these matters as it is legally responsible as a platform.
@jeffwadsworth Feel free to have your friends come over and make explicit statements that you don't intend to respect copyright, that you find people discussing compliance with the law (even without you) so obnoxious you have to jump in and insult people. It's only helpful for the ongoing legal case(s) against the company. This is 100% serious, the more attacks the better it is for plaintiffs! (Same for emoji mobbing.)
@Kagerage Sorry, I wrote the original form for a reason (as you can imagine) and I can't endorse any generated output that may or may not approximate my words.
Alex, go get a lawyer then?
@xbwtyz I changed the title of the Legal Issue to avoid receiving further abuse. Legal advice has already been sought.
It's an action item for HuggingFace:
In the absence of any proof to the contrary, I request that the Creative Commons CC-BY-[NC]-SA license be declared void.
(EDIT: I missed the NC that was specific here. The argument stands irrespective of the license code.)
Everyone that needs to be is well cognizant of the legal discourse surrounding The Pile in the context of commercial licensing. If you have concerns, it is advisable to consult with an attorney and discuss the matter. Once THEY have conducted the necessary research, feel free to return and commence your intended course of action. However, it is important to recognize that individuals with greater influence are already addressing this issue, and their resolutions will likely have a lasting impact, rather than yours.
It would be more constructive to engage in alternative pursuits instead of adopting an uncooperative and defensive stance here in the community so vocally.
@xbwtyz Could you share specifically what the problems are surrounding The Pile and commercial licensing? If everyone is well cognizant of it, what is the general consensus? Are you advising everyone in the community to ignore legal problems? If one notices issues, does that make one automatically uncooperative?
Dude, I'm just telling you how to actually maintain face in a small community, stop wasting everyone's time, and actually do something about it if you want actual answers. You can keep screaming in an echo chamber.
I don't understand. You wrote that:
"Everyone that needs to be is well cognizant of the legal discourse surrounding The Pile in the context of commercial licensing."
Can I read these past discussions to avoid redundant work?
Also, this needs clarification:
individuals with greater influence are already addressing this issue
Who specifically is working on this and what's the issue exactly? If there's already work underway then I'd like to know!
We need better moderation on HF.
Absolutely! There were links to pedophilia and child porn in the dataset preview of LAION for 7 months and nobody did anything.
spamming the same argument all over this website.
They are all different and custom legal arguments, but thank you for the ad hominem insult!
why don't you consult a lawyer and send the concerned people a letter
I wrote to Hugging Face and their legal email and they suggested I post publicly. It was their suggestion, presumably because they did not have satisfactory legal answers themselves at the time.
instead of embarrassing yourself in public?
Could you clarify what is embarrassing and what is problematic for you? You don't have to respond to everything you know!
hard and varied the global regulation of Fair Use [...] singular domain-specific knowledge
There are standard international conventions signed by 181 countries and those can be followed as a baseline. Stability operates in the UK and they implemented applicable parts of EU regulations for Copyright. Both are inherently relevant here! Still no reply, if it was incorrect it would have been addressed by now.
BTW, mobbing by the community has already been used as a legal argument against Stability in U.S. courts.