News Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train...


Vixzer

Distinguished
Apr 5, 2014
12
3
18,515
I understand the outrage because it feels like a betrayal of trust between the platform and the users regarding the content they provided voluntarily to make the platform successful. However, to not see this coming based on what is going on all over the web right now with AI is naive to say the least.

Companies exist to make money, and selling their data to AI companies is a new way to monetize the content they have when things like ad revenue are dropping year over year. Stack Overflow is just one of many content-rich sites where taking this approach to revenue generation will become the norm. Most people will never care that this is the case either.

Unless people want to pay out of pocket to use these types of platforms, it is highly unlikely this practice will change. We need to start being realistic about how these platforms make money when they aren't charging their users anything to pay for their operational costs. As many, many people have pointed out about these platforms in the past, you and your content are the products they are selling if they aren't charging you anything to use the platform.
When the "service is free", you (your info) are the product.
 
Reactions: bit_user

Giroro

Splendid
Fun fact: When an AI trainer removes credit from a source to conceal plagiarism, they call it "laundering".

Now I need to figure out what to call it when a company suddenly decides you can't sue them for any reason forever, because you made an online account once.
 
Reactions: slightnitpick
Uh, if there is not a physically written signature, then there is no contract. This is as fundamental as contract law gets.

You cannot enter into a legally binding agreement with the click of a mouse.

Because of this, StackOverflow has zero rights over any user generated content, unless they have a properly signed legal document from both parties stating that they do.

Terms of Service are meaningless without a signature, but everyone believes they are binding and so they go on and will this ridiculous reality into existence by mere voluntary acquiescence.

To the harmed parties, get a contract lawyer who understands the above, take StackOverflow to court, and let's set a precedent across all industry to end the ToS charade once and for all.
Well I guess I can stop paying on my mortgage and credit cards, since there was no physical signature.

/s if it's not clear. One can absolutely enter into a legally binding contract by the click of a mouse. Always read the fine print, always read the EULA in its entirety.
 
May 9, 2024
1
1
10
Vote with your feet. There are other technical communities out there besides them. Let the AI Bots have it. Bots can write it, bots can read it and it will stagnate and die. They will either reverse course, go a subscription route, or go under.
 
Reactions: Bluoper

Giroro

Splendid
So here's my pitch on how we fix the AI copyright issue before it gets too far off the rails:

If an AI is trained on any amount of "publicly available" data, then 100% of the output of that AI is to be considered public domain and non-copyrightable, forever. If "Nobody" made that output, then "Nobody" will be allowed to own it.
For example, if an AI wrote a movie script: a movie based on that script would be a derivative work that could be protected by copyright. But anybody anywhere would be allowed to make another movie with that exact script, for free.
A movie company would never accept that, so therefore we would have no more AI generated movie scripts - and this is accomplished in a way that would not require an attempt to ban technology that is frankly too late to stop, at least on a technical level.
It would severely remove the financial motivations of plagiarists who are using AI simply to profit from theft of other people's works.
 

TJ Hooker

Titan
Ambassador
This may depend on jurisdiction. I know that oral agreements are legally binding. They're just hard to enforce if one of the parties disagrees about what was agreed upon, unless additional evidence exists.

According to the following website, what makes a contract legally binding are the particular elements, not a particular format, though some statutory exceptions do exist: https://www.lawdepot.com/resources/business-articles/are-verbal-contracts-legally-binding/

Edit to add: I just read a bit more from that website. Apparently copyright is covered under the common law "statute of frauds", which may or may not still be in operation depending on common law jurisdiction.

According to Cornell Law: https://www.law.cornell.edu/wex/statute_of_frauds

So it looks like you may be right. A written contract physically signed by both parties is required. I'd bet that the courts are more likely to favorably interpret a terms of service agreement as an 'electronic signature' than they are to nullify Stack Overflow's terms of service in this case, but your argument could very well be worth pursuing in court.
California's modernized "statute of frauds" appears at first glance to my non-lawyer eyes to support Stack Overflow, here:
Your first link appears to be about the requirement (in specific cases) for written contracts, in contrast to oral agreements (see the cases referenced within your link, para 20). I don't think anyone is arguing that agreeing to a digital TOS is an oral agreement, so I don't think it's relevant. Your 2nd link appears to reinforce the idea that, at least in California, electronic agreements are explicitly considered to constitute a "written contract". I don't see how anything you linked could be interpreted as supporting LibertyWell's claim.
 
Reactions: bit_user

TJ Hooker

Titan
Ambassador
" However, Stack Overflow's Terms of Service contains a clause carving out Stack Overflow's irrevocable ownership of all content subscribers provide to the site. "

Which is of no importance in Europe, at least in the countries I know. In Europe, the law cannot be overridden by a private contract unless the law specifically provides for that exception.

Whether the EU will actually enforce the law is another question. For years we have been unable to resell our games on Steam, even though that is a right of the European consumer, and Steam is still legally accessible on the European market.
The Tom's article is misstating the Stack Overflow TOS in the line you quoted. What the TOS actually says about user content ("Subscriber Content") is that it's perpetually licensed to (not owned by) Stack Overflow. Given that EU law (as anywhere else) obviously permits the licensing of protected IP, I don't see any reason to assume that would be unenforceable.
 

JamesJones44

Reputable
Jan 22, 2021
704
649
5,760
LLMs are a wet dream for all these companies sitting on a trove of other people's work.
Finally they have a way to monetize all the goodwill they've amassed. Someone that will finally buy the thoughts and creations they claim all rights to.
Nevermind that you're burning all the trust you've built with users. What are they gonna do about it?

They can always move to the fediverse, where their data can be scraped like it's 2001 again. But hey, at least they "own" it.
I think the real issue is that people in general have become overly complacent in the last 10ish years when it comes to non-political issues. Companies and people can dump all over someone(s) and it will generate some blowback on social media, but no one takes any real action against those doing it. LVMH could come out and say anyone who buys their products is dumb, and after weathering the momentary rants about it on social media I think it would actually increase their sales.
 

slightnitpick

Proper
Nov 2, 2023
133
88
160
Your first link appears to be about the requirement (in specific cases) for written contracts, in contrast to oral agreements (see the cases referenced within your link, para 20). I don't think anyone is arguing that agreeing to a digital TOS is an oral agreement, so I don't think it's relevant. Your 2nd link appears to reinforce the idea that, at least in California, electronic agreements are explicitly considered to constitute a "written contract". I don't see how anything you linked could be interpreted as supporting LibertyWell's claim.
I was changing my opinion as I added links. I thought this was understandable in context given I explicitly stated near the end of the post that I believe that the California statute "supports Stack Overflow". But that's just California. There's no way in hell I'm looking at the other 49 states (one or two of which are not common law jurisdictions - e.g. Louisiana, which is based on the Napoleonic code) plus the other common law countries.
 
Reactions: TJ Hooker

35below0

Commendable
Jan 3, 2024
1,148
515
1,590
Fun fact: When an AI trainer removes credit from a source to conceal plagiarism, they call it "laundering".

Now I need to figure out what to call it when a company suddenly decides you can't sue them for any reason forever, because you made an online account once.
You call a lawyer.
 
Mar 31, 2024
3
12
15
The notion that valid contracts require a physical signature is patently false. You can enter into a legally binding contract with the click of a mouse.

"a genuine clickwrap agreement, in which a service provider places a TOS just adjacent to or below a click-button (or check-box), has been held to be sufficient to indicate the user agreed to the listed terms."
https://www.eff.org/wp/clicks-bind-ways-users-agree-online-terms-service

Of course, sometimes some of the terms themselves are not legally enforceable, but that's a different matter.

Disagree, but even still:

My cat ran over the keyboard and hit the enter button before I could read it.

Since there is no way to verify who clicked "I agree" there is no contract. Period.
 
Reactions: CmdrShepard

Nicholas Steel

Distinguished
Sep 12, 2015
28
7
18,535
They should be editing their posts and adding in mistakes and errors, much, muuuuch harder for moderators to detect and correct.

Deleted comments could be restored with relative ease because the Delete function likely doesn't actually delete anything; it probably just hides the post from regular users.
 

TJ Hooker

Titan
Ambassador
Disagree, but even still:

My cat ran over the keyboard and hit the enter button before I could read it.

Since there is no way to verify who clicked "I agree" there is no contract. Period.
Unfortunately the law still applies whether you agree with it or not. And the current law in the US, upheld in court multiple times (e.g. Feldman v Google), is that clicking I Agree to a digital TOS is a legally binding agreement.

Sure, if you end up in court over a digital TOS dispute you could try claiming you never agreed, that it was your cat or whatnot. Not that different than claiming that a signature on a piece of paper isn't yours in the case of a physical written contract. No idea what your odds of winning would be with that defense. I think contract disputes would go to civil court, where the standard of proof is much lower (they'd just have to prove that it's 'more likely than not' that it was you who clicked Agree). Plus you'd be perjuring yourself, not sure what risks that would entail.
 

Tech0000

Reputable
Jan 30, 2021
23
20
4,515
So regardless of the legality (i.e. vulnerability to lawsuits in the EU, the US, etc.) of what SO is doing, the conclusion is going to be that the best programmers and best architects will not share their insightful answers at SO any longer. Point period.

Come to think of it, they may not share anything any more, anywhere (with a poor TOS), just to have someone else harvest and monetize their contribution. All my instincts tell me it's monetization of plagiarism - just plain ethically wrong regardless of legality.

So with that, while SO will (potentially - pending legal action or not) make some money in the short term from their agreement with OpenAI, their decision will kill their forum long term (maybe even medium term). Because SO (and similar forums) stand and fall with the sharpest people voluntarily contributing their best ideas and solutions to real problems. They will no longer contribute to SO - point period.

SO's way out (to keep SO alive) would be to pay contributors in the future for providing the best answer to a question. But honestly this is a new can of worms that can be gamed by clever people posting and then answering the same question... so nah...

More likely is that there will be a new generation of forums that will have a watertight TOS with a legally watertight change process (so the provider cannot just arbitrarily change the TOS and then own your contributions and resell them) so that people can trust them not to become the next SO farm.

It is sad it has to come to this, but as a client once told me during a turbulent pre-IPO time, "greed has no limits", and when people are offered enough money... values and ethics go out of the window.

I'm so done with SO it's unbelievable - bye, bye.
 
May 9, 2024
2
4
15
Stack Overflow is "Free".
You pay nothing.
Instead, you donate your content.
To be clear, I think what SO is doing may be legal but IMO it's not ethical.
Maybe the solution is to subscribe to a service like SO that charges a fee to cover the cost of operation and make a profit, but offers an air-tight TOS that protects the contributors' content. People like me who are seeking help pay a fee for access, and contributors who are providing a lot of correct answers pay nothing or even make some money. I hope it happens.
 

bit_user

Polypheme
Ambassador
the conclusion is going to be that the best programmers and best architects will not share their insightful answers at SO any longer.
Seems like wishful thinking. I'm sure some will still use the site as before.

I publish a small amount of open source software. I don't really care who uses it, or for what. If some AI model learning from it will help someone in some way, I'm okay with that. Whether or not AI can use my code for training ultimately won't make much difference, though.

Unlike language, AI doesn't really need to read a ton of code to become competent. Since programming is a task that can be quantified, you could probably just feed a foundational model some books or language standard documents and then just have it learn by doing. Give it a series of programming problems, a compiler, and a scoring system which judges how well it does.
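To make the "problems, a compiler, and a scoring system" idea concrete, here is a minimal sketch of just the scoring piece. Everything in it is an assumption for illustration: the function name `score_candidate`, the toy `add` problem, and the use of Python's built-in `compile`/`exec` as a stand-in for "a compiler". It does not train anything; it only shows how a candidate program could be given a numeric grade.

```python
def score_candidate(source, func_name, test_cases):
    """Return the fraction of test cases a candidate solution passes."""
    namespace = {}
    try:
        # The "compiler" step: a candidate that fails to compile or load scores 0.
        # Note: exec on untrusted code is unsafe; a real harness would sandbox this.
        exec(compile(source, "<candidate>", "exec"), namespace)
    except Exception:
        return 0.0
    func = namespace.get(func_name)
    if not callable(func):
        return 0.0
    passed = 0
    for args, expected in test_cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # runtime errors count as failed tests
    return passed / len(test_cases)

# Hypothetical problem: implement add(a, b).
tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
candidates = {
    "good": "def add(a, b):\n    return a + b\n",
    "buggy": "def add(a, b):\n    return a - b\n",
    "broken": "def add(a, b) return a + b\n",
}
scores = {name: score_candidate(src, "add", tests) for name, src in candidates.items()}
print(scores)
```

A learning loop would generate many such candidates and favor whatever produced higher scores; that part is left out here.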

I think that's not too unlike how it designed a CPU:
 
Reactions: TJ Hooker

bit_user

Polypheme
Ambassador
Maybe the solution is to subscribe to a service like SO that charges a fee to cover the cost of operation and make a profit, but offers an air-tight TOS that protects the contributors' content.
Seems like more wishful thinking.

StackOverflow benefits from the network effect. The more people use it, the more worthwhile it becomes. So, its popularity acts like a positive feedback loop.

However, there's also a downside to that phenomenon - it's really difficult to bootstrap a new site like it. Back when it started, there weren't a whole lot of sites like it. There was Yahoo Answers and maybe Quora, but the Internet was pretty ripe for SO to come along and show them all how it's done. If you start a new competitor that costs money to use and has virtually no content... who would pay for that? And with no users, you'll never get enough content to make it worthwhile and that positive feedback loop never gets going.

Worse yet, you can never be guaranteed some site admins aren't cheating and selling access to the data anyhow. Or that crafty AI companies aren't circumventing its protections and scraping the data without permission.

Like it or not, I think we've basically made ourselves obsolete (in the long run). It's a Pandora's box. It can't be closed. At least, not without sending all of humanity back to the dark ages, which would undoubtedly cost billions of lives in the process (wars, famine, all the usual post-apocalyptic Mad Max stuff, etc.).
 

Lug

Commendable
May 19, 2021
6
3
1,515
No, if that were a thing it would be listed on the official site of the commission as a right for the consumer but it's not.
They do state that the consumer has the right to return a product that doesn't work (anymore) and that is what happened here.
https://curia.europa.eu/jcms/upload/docs/application/pdf/2012-07/cp120094en.pdf

"By its judgment delivered today, the Court explains that the principle of exhaustion of the distribution right applies not only where the copyright holder markets copies of his software on a material medium (CD-ROM or DVD) but also where he distributes them by means of downloads from his website."

"Therefore, even if the licence agreement prohibits a further transfer, the rightholder can no longer oppose the resale of that copy."

Footnote from the press release: Directive 2009/24/EC of the European Parliament and of the Council of 23 April 2009 on the legal protection of computer programs (OJ 2009 L 111, p. 16).
 
AI is essentially automated plagiarism.

If you ask AI about a subject you are knowledgeable about, you'll quickly find that all it does is aggregate what it finds on the internet into an answer that reflects the current zeitgeist (most widely held current opinion/belief) about that subject. It often steals verbiage from multiple sources and combines them, such that you can see the same phrases and descriptions.

This is the core of how all "Mr Wizard" ChatGPT style chat bots work. They are not reaching into the aether channeling mystic zero point energy to find answers to the mysteries of the universe. They are just doing google searches and gluing multiple, usually incorrect, responses together.
 
Well I guess I can stop paying on my mortgage and credit cards, since there was no physical signature.

/s if it's not clear. One can absolutely enter into a legally binding contract by the click of a mouse. Always read the fine print, always read the EULA in its entirety.

It .. depends. Virtually all EULAs are nonsense and unenforceable. In order for a contract to be valid there firstly needs to be consideration: if it entails one party "giving up" something, then that party must also obtain something else of value. It also needs to be clear, with what is being gained and given up clearly stated; lately courts have started clamping down on "vague legalese" statements. Finally, it cannot violate existing law, which is where most of the whole "you can't sue us forever" statements fail hard. Forced arbitration only goes so far, and judges have been known to disregard it entirely if they think it's one-sided.
 
Reactions: Li Ken-un

bit_user

Polypheme
Ambassador
This is the core of how all "Mr Wizard" ChatGPT style chat bots work. They are not reaching into the aether channeling mystic zero point energy to find answers to the mysteries of the universe. They are just doing google searches and gluing multiple, usually incorrect, responses together.
Conventional LLMs don't do google searches. I can't say none do or ever will, but that's not fundamentally how the technology works. Their knowledge is represented in the model, itself.

Given that they're modeled on a rough approximation of how the brain works, this really shouldn't even come as a surprise or be so difficult to accept.
 
Reactions: TJ Hooker

slightnitpick

Proper
Nov 2, 2023
133
88
160
It .. depends. Virtually all EULAs are nonsense and unenforceable. In order for a contract to be valid there firstly needs to be consideration: if it entails one party "giving up" something, then that party must also obtain something else of value. It also needs to be clear, with what is being gained and given up clearly stated; lately courts have started clamping down on "vague legalese" statements. Finally, it cannot violate existing law, which is where most of the whole "you can't sue us forever" statements fail hard. Forced arbitration only goes so far, and judges have been known to disregard it entirely if they think it's one-sided.
This is true, but it's quite easy to argue that very little consideration is necessary in the SO case as the users are literally giving it away. Providing a forum for the users to give away their knowledge, get credited for the initial give-away, and also get their knowledge corrected or added to, would probably be considered sufficient consideration by a court.

So what if SO makes further money off of this from a future license to an AI company? A person who purchased a painting for $100 from the artist is entitled to rent it out or sell it off for $1 million later without any more consideration to the artist. To make the parallel more stringent, someone who licensed full rights for a painting for $10 is entitled to exercise those rights by licensing rights to make prints of, or mashups featuring parts of, the painting.
 
Reactions: 35below0

slightnitpick

Proper
Nov 2, 2023
133
88
160
Given that they're modeled on a rough approximation of how the brain works, this really shouldn't even come as a surprise or be so difficult to accept.
They may* be modeled on how the human brain works to learn grammar and syntax. I'm not sure they're modeled on how the human brain works to learn content.

https://cognitiveworld.com/articles/large-language-models-a-cognitive-and-neuroscience-perspective
“The many failures of LLMs on non-linguistic tasks do not undermine them as good models of language processing,” wrote the authors in conclusion. “After all, the set of areas that support language processing in the human brain also cannot do math, solve logical problems, or even track the meaning of a story across multiple paragraphs.”

“Finally, to those who are looking to language models as a route to AGI, we suggest that, instead of or in addition to scaling up the size of the models, more promising solutions will come in the form of modular architectures — pre-specified or emergent — that, like the human brain, integrate language processing with additional systems that carry out perception, reasoning, and planning.”
* And even that is still a questionable hypothesis in need of testing. And given the demands on LLMs it's doubtful that the programmers are even attempting to model human language and idea development:
https://www.sciencedirect.com/science/article/pii/S1364661323002024
A model of human language processing should receive the same types of input, and face the same linguistic challenges, as humans. For instance, LLMs should exhibit a human-like trajectory of language acquisition, based on input of the size and content available to children. They should process not only text but also speech/sign and use multimodal information. Such work is already underway; yet future research should incorporate additional neurocognitive constraints.
  • In current LLMs, all words are available for processing in parallel. However, human language processing is strongly shaped by the sequential nature of the input because working memory capacity is limited: representations of past input decay and interfere with one another.
  • Current LLMs are feed-forward, but the human brain critically relies on recurrent processing.
  • LLMs integrate information over thousands of words, whereas language processing brain regions integrate information over fewer than 15 words (broader context is integrated in downstream regions related to episodic cognition [12]).
  • LLMs are fine-tuned on a variety of tasks, but not every task conveyed via language is a linguistic task. Language processing brain regions selectively engage in linguistic processing but not in, for example, arithmetic, logical entailment, common sense reasoning, social cognition, or processing of event schema [12].
 
Reactions: palladin9479
They may* be modeled on how the human brain works to learn grammar and syntax. I'm not sure they're modeled on how the human brain works to learn content.

Oh that myth. Yeah, it has nothing to do with how the human brain's axons and neurons actually work, because neurologists aren't even sure how they work yet. The whole "AI training is how your brain works" is really just a reference to it using massive parallelism to construct equally massive arrays that are then used as reference points by the algorithms.

First you need to identify unique constructs; for the English language it's words "like", "the", "cat", "house" and so forth, along with grammatical phrases like "going to". Each construct is given a unique number. Now during learning, when the training algorithm hits a sequence of words, it goes to that position in the array and increments it. "The brown fox" would be Data['the','brown','fox'] and that position would be incremented. Of course that is just three dimensions; there are tens of thousands of positions, so imagine a Data[] reference with sixty thousand entries, each of them the ID of a word or phrase and the value being the number of times that combination was found in the training data. This model would essentially represent every possible combination of words and phrases in the English language.

Take the previous three values ['the','brown','fox']: we could then look up all the possible fourth values as ['the','brown','fox',X], then choose the one with the highest value, meaning it's found the most often next to those three words; say it's 'jumped', for Data['the','brown','fox','jumped']. If we want the fifth value we do the same thing: look up all entries of the fifth dimension and pick the one with the highest value to append to the return. And keep doing this until we hit a stop condition.
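The counting-and-lookup scheme described above can be sketched in a few lines. This is only an illustration of that description, under some loud assumptions: it uses a dictionary keyed by word tuples instead of a literal multi-dimensional Data[] array, a fixed three-word context, and a made-up toy corpus. The names `train_counts` and `generate` are hypothetical, and this is not a claim about how any production model is implemented.

```python
from collections import Counter, defaultdict

def train_counts(text, context_size=3):
    """Count how often each word follows each run of `context_size` words."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for i in range(len(words) - context_size):
        context = tuple(words[i:i + context_size])
        counts[context][words[i + context_size]] += 1  # "increment that position"
    return counts

def generate(counts, start, max_words=10):
    """Greedily append the most frequent next word for the current context."""
    context_size = len(start)
    out = list(start)
    for _ in range(max_words):
        followers = counts.get(tuple(out[-context_size:]))
        if not followers:
            break  # stop condition: this context never appeared in training
        out.append(followers.most_common(1)[0][0])
    return out

# Toy training data (assumption, purely for illustration).
corpus = (
    "the brown fox jumped over the fence . "
    "the brown fox jumped over the log . "
    "the brown fox slept ."
)
counts = train_counts(corpus)
result = generate(counts, ("the", "brown", "fox"), max_words=3)
print(" ".join(result))
```

Here 'jumped' follows "the brown fox" twice and 'slept' once, so the lookup picks 'jumped', then repeats the same step with the context shifted along by one word.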

We've been able to do this for decades, just ... the computational requirements were insane, as the method was scalar (linear), computing each dimension one at a time. The miracle was that someone discovered a way to sanely ingest that information using vector processing. Combining the most powerful vector processors of our time (Nvidia GPUs) with higher-bandwidth memory, storage and web crawling (aka Google), the developers were able to gather truly massive amounts of raw information and feed it directly into that vector processing to produce that ginormous mathematical construct.