News Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train...


bit_user

Polypheme
Ambassador
They may* be modeled on how the human brain works to learn grammar and syntax. I'm not sure they're modeled on how the human brain works to learn content.

https://cognitiveworld.com/articles/large-language-models-a-cognitive-and-neuroscience-perspective
I was making a more fundamental point than that. The notion that LLMs are "just statistics" is about as accurate as saying a wet brain is "just statistics", which is to say not at all. Even if an LLM's design takes liberties in how it approaches certain tasks, it's still neuromorphic in its fundamental nature. The basic mechanics of its connectivity, activation functions, and learning are all based on a highly abstracted model of how biological brains work.

* And even that is still a questionable hypothesis in need of testing. And given the demands on LLMs, it's doubtful that the programmers are even attempting to model human language and idea development:
https://www.sciencedirect.com/science/article/pii/S1364661323002024
To me, this is somewhat akin to someone in the early industrial age showing that a wood-fired, steam-powered automobile is infeasible and therefore horse-drawn buggies will never be made obsolete. It's fair to understand the capabilities and limitations of existing tech, but we shouldn't presume it represents the only path of development.

I think it would be nuts to argue that LLMs are the ultimate form of neuromorphic computing. They're just a particular branch of the field that showed unexpectedly promising results. We can anticipate new techniques and architectures arising in the coming years and decades. Maybe some of those will borrow more heavily from what's understood about human cognition, but an even more exciting prospect is that they could take a somewhat or substantially different course!
 

bit_user

Polypheme
Ambassador
Oh, that myth. Yeah, it has nothing to do with how the human brain's axons and neurons actually work, because neurologists aren't even sure how they work yet.
I'm not claiming everything is known about brains, but certainly enough is known about them, in the abstract, to apply similar techniques. Not to mention you'd obviously need a pretty good understanding to interface with them like Neuralink is doing!


The whole "AI Training is how your brain works" is really just a reference to it using massive parallelism to construct equally massive arrays that are then used as reference points by the algorithms. First you need to identify unique constructs, for english language it's words "like", "the", "cat", "house" and so forth along with grammatical phrases like "going to". Each construct is given a unique number. Now during learning when the training algorithm hits a sequence of words, it goes to that position in the array and increments it. "The brown fox" would be Data['the','brown','fox'] and that position would be incremented. Of course that is just a three dimensional, there are tens of thousands of positions so imagine an Data[] reference with sixty thousand entries, each of them the ID of a word of phrase and the value being the number of times that combination was found in the training data. This model would essentially represent every possible combination of words and phrases in the english language. Take that previous three values ['the','brown','fox'], we could then lookup all the possible fourth values as ['the','brown','fox',X] then chose the one with the highest value, meaning it's found the most often next to those three words, say it's 'jumped' for Data['the','brown','fox','jumped']. We want the fifth value we do the same thing, lookup all entries of the fifth dimension and pick the one with the highest value to append to the return. And keep doing this until we hit a stop condition.
No, this is completely wrong. Cite a source... if you can.

If it were just a simple statistical model of text sequences, that wouldn't explain how it could write a sonnet about a topic nobody has ever written one about, such as computer memory. It also doesn't explain how it generates answers to questions or forms logically consistent answers (when it does, which happens far more often than a simple statistical model ever would).

All of these things happen because it has a higher-level model of language than mere word sequences. It has some basic model of concepts and complex internal state.
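For concreteness, the lookup-and-increment scheme described in the quote above is essentially an n-gram (Markov chain) counter. Here's a minimal sketch of that idea (function names are mine, and a toy corpus stands in for training data; as noted, this is not how transformer-based LLMs actually work):

Python:
from collections import defaultdict

# Count how often each word follows a given (n-1)-word context.
def train_ngrams(tokens, n=4):
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts

# Greedy generation: always append the most frequent continuation,
# i.e. the "pick the position with the highest value" lookup described above.
def generate(counts, seed, length=10):
    out = list(seed)
    for _ in range(length):
        followers = counts.get(tuple(out[-len(seed):]))
        if not followers:
            break  # stop condition: context never seen in training
        out.append(max(followers, key=followers.get))
    return " ".join(out)

corpus = "the brown fox jumped over the lazy dog".split()
model = train_ngrams(corpus, n=4)
print(generate(model, ("the", "brown", "fox")))  # the brown fox jumped over the lazy dog

A pure counter like this can only ever replay word sequences it has literally seen, which is exactly why it can't account for the novel-composition behavior described above.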
 

slightnitpick

Proper
Nov 2, 2023
I'm not claiming everything is known about brains, but certainly enough is known about them, in the abstract, to apply similar techniques. Not to mention you'd obviously need a pretty good understanding to interface with them like Neuralink is doing!
That's like saying you need a good understanding of a CPU's ISA in order to implement a USB interface. Understanding what the brain intends to do with certain signals or potentials is a lot different than understanding how a particular brain region, or the entire brain, works. As of now we aren't replacing brain regions (we don't have the ability to), and when we do, it will first be with stem-cell-grown biological cells. It's going to be a long, long while before we can even begin to understand enough to think of replacing small parts of the brain with anything other than self-assembling tissues and cells.
To me, this is somewhat akin to someone in the early industrial age showing that a wood-fired, steam-powered automobile is infeasible and therefore horse-drawn buggies will never be made obsolete. It's fair to understand the capabilities and limitations of existing tech, but we shouldn't presume it represents the only path of development.
Sure, and that's part of the point. The only thing driving a silicon understanding of the human brain is health-care needs and the occasional transhumanist multimillionaire. There's a lot more profit to be made going in a completely different direction. Even the first steam-powered automobile makers weren't trying to replicate a horse (though it's another story for the first people attempting to build a heavier-than-air aircraft :D ).
 

bit_user

Polypheme
Ambassador
That's like saying you need a good understanding of a CPU's ISA in order to implement a USB interface.
The point I was responding to was Palladin's claim that "neurologists aren't even sure about how they (the human brain's axons and neurons) work yet." You're taking my answer out of context.

Understanding what the brain intends to do with certain signals or potentials is a lot different than understanding how a particular brain region, or the entire brain, works.
It's possible to understand the basic mechanisms well enough to abstract the operational principles and use them to design machines or algorithms that also learn, process, and represent information, even if they do so in a different way. The field of artificial neural networks goes back more than 50 years!

Technology is full of examples where people have studied how nature solves a problem or performs some specific trick and then abstracted the basic principles and replicated the behavior in some machine that differs in most other respects. It needn't be a precise replica of the exact solution implemented in biology to still overcome a similar challenge in a similar way.

Sure, and that's part of the point. The only thing driving a silicon understanding of the human brain is health care needs and the occasional transhumanist multimillionaire.
No, I'm sure that the AI community is also interested in unlocking the remaining secrets of human cognition, in order to replicate and possibly improve on them.
 

CmdrShepard

Prominent
Dec 18, 2023
Have any evidence of that actually happening?
You don't really need evidence -- just check what data Windows 10/11 collect and send to Microsoft by default, then spend a couple of hours devising group policies for every element of the operating system, plus browsers and Office, to stop it, and you will hopefully understand what I am talking about.
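As one concrete example of the kind of per-setting lockdown being described, here is a minimal sketch assuming the documented AllowTelemetry group-policy registry value (Windows-only, and it needs admin rights):

Python:
import winreg

# The diagnostic-data group policy lives under this key (created if absent).
key_path = r"SOFTWARE\Policies\Microsoft\Windows\DataCollection"
with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, key_path,
                        0, winreg.KEY_SET_VALUE) as key:
    # 0 = "Security", the lowest diagnostic-data level the policy supports
    winreg.SetValueEx(key, "AllowTelemetry", 0, winreg.REG_DWORD, 0)

And that's one value out of the many a full lockdown would have to touch, which is the point being made.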
I think that gets overstated and romanticized by Hollywood. It's a story they like to tell about themselves, in order to set themselves apart from other criminals. It turns out they're all just parasites, feeding off the rest of society. Any code they have is principally out of self-interest.
That may be so, but at least they weren't killing the proverbial goose that lays the golden eggs, like the current crop of parasites does. They knew when to stop leeching in order not to antagonize the host.
Most answers on Stack Overflow are things people learned from someone or somewhere else, like the user manual, a blog, a book, etc. Good answers will generally link to a source, but people frequently don't. Aren't they just as guilty as an AI model if they regurgitate facts they learned elsewhere, without attribution?
SO rules are that you should link to the source when you have it, and copy the relevant bit of info in case the link goes dead. Guilty of what, exactly? Again, the scale of a single person or even a couple hundred million people doing the same thing doesn't come even close to what AI models are doing.
No, having an output that deals in probability distributions doesn't make it intrinsically statistical in nature. The structure of knowledge modeled by these things is not representable on the basis of mere correlation and joint probabilities.
I disagree, read on.
Do you ever have the experience where you're talking or writing and it occurs to you that there are several ways you could proceed? You have to make a decision which way is best, and that's essentially the process they're modelling.
Yes I do.

However, when I get to the point that I don't know what to say or write, I will either stop, or say I don't know for sure, or even go learn whatever bits I am missing, while the LLM is just going to pick the next set of tokens from a lower-probability bucket and output what they call a "hallucination".
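The next-token step being described can be sketched in a few lines. A toy version with made-up logits (real models score tens of thousands of tokens); note the model never stops because it is unsure, it just keeps sampling, which is where the "hallucination" above comes from:

Python:
import math, random

logits = {"jumped": 2.0, "ran": 1.0, "quantum": -1.0}  # made-up scores for illustration

def sample_next(logits, temperature=1.0):
    # Softmax over temperature-scaled scores, then draw one token:
    # there is no "I don't know" branch, only more or less likely tokens.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    probs = {tok: math.exp(v) / z for tok, v in scaled.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

print(sample_next(logits, temperature=1.5))  # higher temperature, more surprises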
How much do you know about how it works? There are plenty of stochastic processes at work in the brain. Yes, neuromorphic computing distills and abstracts the mechanisms used in biology, but there are often fairly direct analogs between the two.
Does an LLM have an internal monologue? Does it have awareness of itself? Does it have an uninterrupted stream of consciousness? Can it combine knowledge from two totally unrelated areas into something novel, like humans can? Does it have plasticity like the human brain does?

Based on all that, I'd say how the brain works is still plenty different, and people trying to equate it with some stochastic machine-learning process are being quite simplistic and nihilistic about the whole thing.
However, Stack Overflow's Terms of Service contains a clause carving out Stack Overflow's irrevocable ownership of all content subscribers provide to the site.
SO is breaking EU GDPR law if they are not honoring user requests for data removal, because EULAs can't carve out exceptions from established laws.
Unlike language, AI doesn't really need to read a ton of code to become competent. Since programming is a task that can be quantified, you could probably just feed a foundational model some books or language standard documents and then just have it learn by doing. Give it a series of programming problems, a compiler, and a scoring system which judges how well it does.
But then that's not an LLM anymore -- it's a custom machine-learning model for programming.
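The "learn by doing" setup quoted above would hinge on an automatic scoring loop. A minimal sketch of that idea, with hypothetical names and exec standing in for a real compiler and sandbox:

Python:
def score_candidate(src: str, tests: list) -> float:
    """Run a candidate add(a, b) implementation against tests; return pass rate."""
    env = {}
    try:
        exec(src, env)  # "compile" step: code that doesn't build scores zero
    except SyntaxError:
        return 0.0
    fn = env.get("add")
    if not callable(fn):
        return 0.0
    passed = 0
    for args, expected in tests:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a runtime error just counts as a failed test
    return passed / len(tests)

tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
print(score_candidate("def add(a, b):\n    return a + b", tests))  # 1.0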
 

35below0

Commendable
Jan 3, 2024
SO is breaking EU GDPR law if they are not honoring user requests for data removal, because EULAs can't carve out exceptions from established laws.
That is not data removal. Not what the GDPR is for. It's for forcing legal entities and institutions to relinquish control over your personal information they have gathered with or without permission.

Posting on a forum is an act of giving information away, but not personal information. It's a different kind of data and not subject to the scope of the GDPR.

If your dentist has your date of birth somewhere in their files, you can force them to remove the data. If a forum has your musings and ramblings archived, that is not personal information of any description, even if you deliberately made posts describing yourself: your date of birth, gender, possibly made-up gender and opinions on gender issues or culture wars, your social security number, sperm count, or whatever else you may have posted.
The TOS for the forum will have taken care of the legality of your posts. You would have agreed to post nothing of a personal nature that could fall under the GDPR. If you violated the TOS first, you would have very little claim of abuse. Trying to use the GDPR to force deletion of writings you no longer want public, but have previously agreed would be made public, is itself abuse.

tl;dr
no
 

CmdrShepard

Prominent
Dec 18, 2023
That is not data removal. Not what the GDPR is for. It's for forcing legal entities and institutions to relinquish control over your personal information they have gathered with or without permission.

Posting on a forum is an act of giving information away, but not personal information. It's a different kind of data and not subject to the scope of the GDPR.

If your dentist has your date of birth somewhere in their files, you can force them to remove the data. If a forum has your musings and ramblings archived, that is not personal information of any description, even if you deliberately made posts describing yourself: your date of birth, gender, possibly made-up gender and opinions on gender issues or culture wars, your social security number, sperm count, or whatever else you may have posted.
The TOS for the forum will have taken care of the legality of your posts. You would have agreed to post nothing of a personal nature that could fall under the GDPR. If you violated the TOS first, you would have very little claim of abuse. Trying to use the GDPR to force deletion of writings you no longer want public, but have previously agreed would be made public, is itself abuse.

tl;dr
no
If OpenAI removes personal data, they are also removing the attribution for the content they "licensed" from the user. That would make their use of said content illegal, since they can't prove who they licensed it from.
 

35below0

Commendable
Jan 3, 2024
If OpenAI removes personal data, they are also removing the attribution for the content they "licensed" from the user. That would make their use of said content illegal, since they can't prove who they licensed it from.
Point being, the GDPR is privacy legislation, not intellectual property legislation.

It cannot be used to revoke or retain control of IP.
 

bit_user

Polypheme
Ambassador
Does an LLM have an internal monologue? Does it have awareness of itself? Does it have an uninterrupted stream of consciousness? Can it combine knowledge from two totally unrelated areas into something novel, like humans can? Does it have plasticity like the human brain does?

Based on all that, I'd say how the brain works is still plenty different, and people trying to equate it with some stochastic machine-learning process are being quite simplistic and nihilistic about the whole thing.
Nobody is claiming these LLMs are AGI. Far from it. However, what is claimed is that the way they're structured and the way they learn (i.e. back-propagation) is based on a highly abstract model of how brains learn in real life. Not some giant super-dimensional probability model, or whatever the heck Palladin was talking about.
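To make "highly abstract" concrete, here is roughly what that abstraction boils down to at its smallest scale: a single artificial neuron nudging its connection weights by gradient descent. A toy sketch with made-up numbers, not a claim about any particular LLM:

Python:
import math

w, b, lr = 0.5, 0.0, 0.1            # one "synapse" weight, a bias, a learning rate
data = [(0.0, 0.0), (1.0, 1.0)]     # toy input -> target pairs

for _ in range(5000):
    for x, target in data:
        y = 1 / (1 + math.exp(-(w * x + b)))  # activation function ("firing rate")
        grad = (y - target) * y * (1 - y)     # error signal propagated backward
        w -= lr * grad * x                    # strengthen/weaken the connection
        b -= lr * grad

for x, target in data:
    print(x, round(1 / (1 + math.exp(-(w * x + b))), 2))  # outputs drift toward targets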

Did you know there's a flatworm with a central nervous system consisting of about 100 neurons in a fixed configuration for its entire life? Its connectivity is genetically programmed. I'm not saying it has zero learning capacity (maybe it can modify the weights of those synapses over time?), but certainly not much.

It just goes to show that, just as nature has a high degree of variation in its implementation strategies, artificial neural networks can also vary quite a bit while still qualifying as bio-mimicry.
 

MrYossu

Distinguished
Dec 15, 2013
OpenAI never did anything for me for free, not to mention their name is the biggest lie ever, as neither their code nor their models are open-source.
I think the bigger lie is the "I" in the name. There's no intelligence in it at all; if anything, they should be named OpenAS, for Artificial Stupidity!
 

MrYossu

Distinguished
Dec 15, 2013
The most creative users will now leave the platform and create their own places to share their knowledge.
Sadly, that won't happen. This is only the latest in a long series of hugely unpopular SO decisions, and the only attempt to compete that I know of failed miserably. SO has too much traction to be toppled easily; that's one of the main problems.