News Nvidia being sued by writers for unauthorized use of their works in generative AI training

Just throwing this out there to stir some thought:
If I buy a book for my kid and they read it and learn from it, is that copyright infringement? What if I don't buy the book, but instead go to the local library and read it for free? Does my kid now have to cite the author any time they speak because their learned experience comes from a book, or a collection of 1000s, they read?

Humans learn through our collective experiences, including the books we read; we don't just gain knowledge through osmosis. The authors of those books do not own our thoughts, just as they shouldn't own the "thoughts" of AI. I guess the question is what constitutes original thought. One could argue that the books the authors wrote aren't original works either, but a collection of personal experiences the authors gathered from reading others' books.
 
Just throwing this out there to stir some thought:
If I buy a book for my kid and they read it and learn from it, is that copyright infringement? What if I don't buy the book, but instead go to the local library and read it for free? Does my kid now have to cite the author any time they speak because their learned experience comes from a book, or a collection of 1000s, they read?
This is a commercial use issue. If you or your child made a commercial product derived from these authors' works, they would likely come for their cut as well.

nVidia looks to have publicly disclosed the use of these authors' works and then tried to scrub that disclosure from existence once the authors were made aware and nVidia had already gotten what they needed from the demo. These works are not out of copyright, so they can still generate revenue for their rights holders.

But this is just what I believe to be the actual issue, as money is always what causes these types of suits.
 
The likely next step is that large book publishers will change their legal terms once they can get some money for providing a licence.

Makes me wonder though... if a language model is going to be pretty much the mean average of everything ever written (with no differentiation between Shakespeare and some dude who claims to have the ultimate guide to becoming a millionaire overnight), won't that make the AI pretty much a mean average? Not that it would necessarily be a bad thing as such. But perhaps training an AI specifically on the Bible in Russian, for the purpose of telling Russians that there is a difference between the Old Testament and the New Testament, would make more sense in such a context than having the AI trained on whatever Russian state TV is saying?
This is a commercial use issue. If you or your child made a commercial product derived from these authors' works, they would likely come for their cut as well.

nVidia looks to have publicly disclosed the use of these authors' works and then tried to scrub that disclosure from existence once the authors were made aware and nVidia had already gotten what they needed from the demo. These works are not out of copyright, so they can still generate revenue for their rights holders.

But this is just what I believe to be the actual issue, as money is always what causes these types of suits.
Yeah. An example would be a commercial movie based on the aforementioned book (assuming it isn't in the public domain), made without permission for such an undertaking.

Similarly, if someone takes an RTX 4090 and its drivers and tasks an AI with providing a blueprint from which to produce a commercial GPU, there might be some objections in regard to patents and such.
 
This is a commercial use issue. If you or your child made a commercial product derived from these authors' works, they would likely come for their cut as well.

nVidia looks to have publicly disclosed the use of these authors' works and then tried to scrub that disclosure from existence once the authors were made aware and nVidia had already gotten what they needed from the demo. These works are not out of copyright, so they can still generate revenue for their rights holders.

But this is just what I believe to be the actual issue, as money is always what causes these types of suits.
If this was a 1:1 situation I would agree, but we're talking about AI using the works of 1000s, "196,640 books" to be exact, to "train" AI into creating something completely new. This lawsuit does not point to any AI output where an author says 'that is my original work, used without citation'; they are suing merely because their work was used for training/learning.

The very words I'm writing now were influenced by all the books I've read and the experiences in my life. Should I be sued for copyright infringement because I learned something in my life? Who knows, I might even take those experiences that influence me and use them to make MILLION$ and grow a successful company... I owe those authors nothing! Whether this is a human or a machine shouldn't matter IMO.
 
If this was a 1:1 situation I would agree, but we're talking about AI using the works of 1000s, "196,640 books" to be exact, to "train" AI into creating something completely new.
How about stealing 196,640 dollars from one person, or one dollar each from 196,640 people. Is one stealing and the other not?

The big difference is that no single victim can claim any significant amount of damage, since they were all stolen from.

Does it create something "completely new"? No, it does not. It is only a novel way to steal.
 
How about stealing 196,640 dollars from one person, or one dollar each from 196,640 people. Is one stealing and the other not?

The big difference is that no single victim can claim any significant amount of damage, since they were all stolen from.

Does it create something "completely new"? No, it does not. It is only a novel way to steal.
I don't think it's stealing, just as I don't believe the MILLIONS of people going to the library and reading a book for free to gain knowledge are stealing. Again I ask, how is this any different when it is a machine instead of a human?
And if an AI creates "something completely new" that goes bad, then who is liable?

Just wondering....
Oppenheimer read 1000s of books, which gave him the knowledge to create the atomic bomb. I guarantee many of those books were obtained "free", so, to Findecanor: was this knowledge stolen?... Who is liable?
 
The very words I'm writing now were influenced by all the books I've read and the experiences in my life. Should I be sued for copyright infringement because I learned something in my life? Who knows, I might even take those experiences that influence me and use them to make MILLION$ and grow a successful company... I owe those authors nothing! Whether this is a human or a machine shouldn't matter IMO.
You also paid for every single book you read, unless it was provided to you for free by someone who did pay or it was out of copyright.

You already paid the authors their due when you bought their book, or the library you read it from did when THEY bought it from the publisher.

NVIDIA didn't.

We can certainly argue that AI learning is no different from human learning, but if that's the case then it should be subject to the exact same laws we are, and that means AI companies need to pay for the content their AI is digesting, just as we have to.
 
Just throwing this out there to stir some thought:
If I buy a book for my kid and they read it and learn from it, is that copyright infringement? What if I don't buy the book, but instead go to the local library and read it for free? Does my kid now have to cite the author any time they speak because their learned experience comes from a book, or a collection of 1000s, they read?

Humans learn through our collective experiences, including the books we read; we don't just gain knowledge through osmosis. The authors of those books do not own our thoughts, just as they shouldn't own the "thoughts" of AI. I guess the question is what constitutes original thought. One could argue that the books the authors wrote aren't original works either, but a collection of personal experiences the authors gathered from reading others' books.
Fair use is a thing.

However, LLM "AIs" are for profit and just copy stuff.

Imagine if you took a book, copied it, and republished it to get $ from it.

That's what LLMs do with their content. They steal others' work. They didn't pay for the thing, nor did they ask permission to use it.
 
And if an AI creates "something completely new" that goes bad, then who is liable?

Just wondering....
Good question.

I believe a certain company that owns a certain AI that is working in a "good" way, will always take credit for it.

But that will change quickly, and they will deny responsibility, should their AI product go bad....
 
Ethically, though, it seems rather clear that "intelligence" that can only function by taking everyone else's work should probably be doing more than just printing money for its operators without accountability.
I'd be careful where you're going with this though. You wrote an article and were presumably paid for this article even though the article summarizes or paraphrases what is in a Reuters article. Did you get their permission to write this article and make money from what is essentially their work?
 
I'd be careful where you're going with this though. You wrote an article and were presumably paid for this article even though the article summarizes or paraphrases what is in a Reuters article. Did you get their permission to write this article and make money from what is essentially their work?
Did you miss where the reports were linked to in this article? This is a common citation situation that you're trying to compare to a language model built off of stolen work.
 
I see a significant tax being levied on AI model users and producers, taxing the users (Nvidia, for example) and paying the content owners (individual writers, or perhaps Disney). The AI models are essentially databases of no value without content. If the politicians favor the content producers, we will see something like a 90% tax on AI software and hardware products. What does this do to the perceived value of companies like NVIDIA? It gets complicated, since many databases will be trained solely on open source data and literature with expired copyrights.
The answer will come from politicians, worldwide.
 
And if an AI creates "something completely new" that goes bad, then who is liable?

Just wondering....
Until AI is granted some kind of legal status as a person, then the liable parties are the people/companies who own the hardware that the AI is running on, the people/companies who created the software that claims to be the AI, and anyone else that has a part in allowing the AI to do whatever 'bad' thing it did.
 
I am reminding myself about Forum rules and policies especially those centered on "No GRAPES".

However, I believe that I am safe in saying that if you entangle such things as AI, corporations, legal status, liability, politicians, technology, and worldwide then even good things can become bad.

Einstein is generally quoted as saying:

"The significant problems we face cannot be solved at the same level of thinking we were at when we created them."

(Full disclosure: Or words and phrasing to that effect per various debates.)

I do not believe that AI is a higher level of thinking.

Think about it.
 
Did you miss where the reports were linked to in this article? This is a common citation situation that you're trying to compare to a language model built off of stolen work.

No I didn't, but AI will cite sources too, and yet some are suing because it's accessing their content without paying. How long until a simple citation is no longer satisfactory and they start demanding a fee, since they're not getting the traffic and advertising money while the other sites are, similar to how social media and search engine sites are having to pay companies whose articles are linked through their sites? Most of the articles on secondary source sites like TomsHardware would not exist, or would have to pay a fee to exist, if pay-to-use laws were in place, and most sites like TomsHardware would cease to exist because far too little original content is posted to finance their existence.

If AI is doing one thing very well, it's exposing hypocrisy, and people who cry foul need to see how their crying foul could quite possibly do more harm than good.
 
No I didn't, but AI will cite sources too, and yet some are suing because it's accessing their content without paying.
If the content is behind a paywall, any user could be sued for circumventing it and accessing the content for free, as that would be piracy.

If I access paywalled content without paying a fee and, in using it to write an article, end up citing it, I can still be sued over it; the citations don't change that.

Humans don't have the right to access copyrighted content without paying for it and AI doesn't either.

A copyright violation is a copyright violation; it doesn't matter who does it or why.
 
AI Bros still pushing the idea that they can steal any and all IP as long as they launder it through an AI first? Yeah, that's not gonna fly, as the mounting legal challenges start to show and limitless legal liability starts to become apparent.

Simply put, you absolutely cannot, in no uncertain terms, use another person's intellectual property without permission. Fair Use is not some unlimited magic word, especially when it involves anything commercial.

https://www.copyright.gov/fair-use/

Transformative uses are those that add something new, with a further purpose or different character, and do not substitute for the original use of the work.

If the use includes a large portion of the copyrighted work, fair use is less likely to be found; if the use employs only a small amount of copyrighted material, fair use is more likely.

Here, courts review whether, and to what extent, the unlicensed use harms the existing or future market for the copyright owner’s original work. In assessing this factor, courts consider whether the use is hurting the current market for the original work (for example, by displacing sales of the original) and/or whether the use could cause substantial harm if it were to become widespread.

Feeding entire copyrighted works to an AI as "training data", then having that AI regurgitate those works word for word while selling access to it as a way to not have to purchase those original works, breaks all sorts of rules. Companies ignored all this while racing to produce the "best trained" AI in order to capitalize on the AI gold rush, trusting in magical words and obfuscation to hide blatant IP theft.

There is nothing wrong with GitHub feeding source code to its AI as training data, provided it has the legal right to that code. Copying GPL code without following the GPL itself is unlawful, regardless of whether it's laundered through an AI or not.
 
Feeding entire copyrighted works to an AI as "training data", then having that AI regurgitate those works word for word while selling access to it as a way to not have to purchase those original works, breaks all sorts of rules. Companies ignored all this while racing to produce the "best trained" AI in order to capitalize on the AI gold rush, trusting in magical words and obfuscation to hide blatant IP theft.
Something that might badly bite them in the ass if the courts decide to establish a legal understanding that AI is allowed to freely access copyrighted works for the purposes of training as it's merely learning as humans do.

Because funnily enough, humans also learn as humans do and I'm sure you can imagine where that might end up.
 
Something that might badly bite them in the ass if the courts decide to establish a legal understanding that AI is allowed to freely access copyrighted works for the purposes of training as it's merely learning as humans do.

Because funnily enough, humans also learn as humans do and I'm sure you can imagine where that might end up.
100%.

I will admit I have way more questions than answers, but I'm of the opinion that "you can't make an omelet without breaking a few eggs." We have to let people/AI fail before we can succeed and move forward.

I want to give AI the same freedom to learn/train in the same way a human does and see what happens. If they then regurgitate the information word for word (Harvard University, anyone?) instead of coming up with their own OG thoughts, yes, 100% sue them (or their creators), but not before.
 
Something that might badly bite them in the ass if the courts decide to establish a legal understanding that AI is allowed to freely access copyrighted works for the purposes of training as it's merely learning as humans do.

Because funnily enough, humans also learn as humans do and I'm sure you can imagine where that might end up.

Yeah, no court is going to say that. These arguments have already been made in various forms throughout the years and are the entire reason Fair Use doctrine was established. The only difference now is the use of magical words like "AI" and "Neural Network" to try to hide IP theft. I guarantee they deliberately didn't consult copyright attorneys on the matter as a way to avoid legal liability, in a "better to ask forgiveness than permission" tactic.

If I take a digital copy of a copyrighted book and feed it through a perl / python program that parses and deconstructs the material into segments, then reconstitutes those segments on demand, then I have still violated the IP ownership rights of that book. Now if I argue it's for non-commercial educational purposes and do not generate any revenue from the act, the courts would look on it favorably, as I'm not harming the original copyright holder. Alternatively, if I instead sell the results of my program in a way that leads a reasonable person to purchase my program instead of the copyright holder's book, then the courts will look on it very unfavorably.
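To make that concrete, here is a rough sketch of the kind of program I mean. It is only an illustration: the function names, the fixed segment size, and the placeholder text are all made up for this example, and it says nothing about how any real AI system actually stores its training data.

```python
def split_into_segments(text: str, size: int = 40) -> list[str]:
    """Deconstruct the text into consecutive fixed-size segments (hypothetical helper)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def reconstitute(segments: list[str]) -> str:
    """Reassemble the stored segments, in order, on demand (hypothetical helper)."""
    return "".join(segments)


# Placeholder text standing in for the contents of a book.
sample = "This sentence stands in for the text of a copyrighted book. " * 3

segments = split_into_segments(sample)
rebuilt = reconstitute(segments)

# The intermediate form looks nothing like a book, but the output is the same text.
print(rebuilt == sample)
```

Chopping the text up in the middle doesn't change what comes out the other end, which is the whole point.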

Intent and economic impact have massive implications for a fair use ruling. And since "AI" is the new gold rush, meaning all these people are trying to commercialize access to derivative works of unlicensed copyrighted material, the courts are going to crush them. You can summarize this entire argument as "Amazon cannot sell digital copies of textbooks they have not purchased the license for".
 
Yeah, no court is going to say that. These arguments have already been made in various forms throughout the years and are the entire reason Fair Use doctrine was established.
And copyright holders have been trying to quash Fair Use at every opportunity since.
Handing (e.g. Disney) a legal instrument that says "you do not distribute a direct copy of a work we hold copyright for, you cannot produce a direct copy of our copyrighted work using your tool, but you looked at our work in the past, so you owe us money in perpetuity" is an absolutely terrible idea.
 
You can summarize this entire argument as "Amazon cannot sell digital copies of textbooks they have not purchased the license for".
I made that very argument up thread.
Yeah no court is going to say that.

I do agree it's unlikely, bordering on impossible, for any court to actually accept that sort of argument. That's the point I was trying to get across: it's a demented argument that's doomed to fail, and in the unlikely event it's actually accepted, it would hurt them far more than it could ever help.
 
And copyright holders have been trying to quash Fair Use at every opportunity since.
Handing (e.g. Disney) a legal instrument that says "you do not distribute a direct copy of a work we hold copyright for, you cannot produce a direct copy of our copyrighted work using your tool, but you looked at our work in the past, so you owe us money in perpetuity" is an absolutely terrible idea.

There is no tool being handed over, period; these rulings already exist as case precedent. People are always trying to make a buck, and it's easier to do that if they can shortcut the process and use other people's works. The commercialization aspect is what crushes any attempt at justifying the use of copyrighted material without a license.

I even linked the legal framework of Fair Use doctrine; the four-point test the courts have created is very easy to understand. Of all the aspects, purpose and economic harm are weighed the most. Was the unlicensed use for commercial or educational use? Did the unlicensed use have a negative economic impact on the copyright holder? If the answers are "commercial" and "yes", then it's a slam dunk case.

Because of how slam dunk that is, AI Bros keep trying to argue that unauthorized copying and distribution is not stealing if "AI" and "Neural Networks" are doing the theft and distribution. Those arguments might work in enthusiast circles, but professional jurists have historically taken a very dim view of attempts to use language to circumvent copyright law. The entire "but humans can blah" is also a dead argument, as there exists precedent on transformative and derivative works. If a human reads a book and memorizes every sentence, then later writes those sentences down and recreates the work to sell, the human is still guilty of copyright violation. Who or what is doing the copying and distribution is irrelevant; what matters is only that copyrighted material was copied and distributed without a valid license.
 