News Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train...

They may* be modeled on how the human brain works to learn grammar and syntax. I'm not sure they're modeled on how the human brain works to learn content.

https://cognitiveworld.com/articles/large-language-models-a-cognitive-and-neuroscience-perspective
I was making a more fundamental point than that. The notion that LLMs are "just statistics" is about as accurate as saying a wet brain is "just statistics", which is to say not at all. Even if an LLM's design takes liberties in how it approaches certain tasks, it's still neuromorphic in its fundamental nature. The basic mechanics of its connectivity, activation functions, and learning are all based on a highly-abstracted model of how biological brains work.
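To make that abstraction concrete, here's a toy sketch (plain Python, not any real framework's API, with hand-picked numbers) of a single artificial neuron: a weighted sum of inputs pushed through a nonlinear activation function, which is the highly simplified analogue of a biological neuron firing once its inputs cross a threshold.

Code:
import math

def neuron(inputs, weights, bias):
    """Toy artificial neuron: weighted sum of the inputs plus a bias,
    squashed through a sigmoid activation -- a very loose abstraction
    of a biological neuron firing once its inputs cross a threshold."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation

# Hypothetical example: two inputs and hand-picked weights.
print(neuron([0.5, 0.8], [0.9, -0.4], bias=0.1))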

* And even that is still a questionable hypothesis in need of testing. And given the demands on LLMs it's doubtful that the programmers are even attempting to model human language and idea development:
https://www.sciencedirect.com/science/article/pii/S1364661323002024
To me, this is somewhat akin to someone in the early industrial age showing that a wood-fired, steam-powered automobile is infeasible and therefore horse-drawn buggies will never be made obsolete. It's fair to understand the capabilities and limitations of existing tech, but we shouldn't presume it represents the only path of development.

I think it would be nuts to argue that LLMs are the ultimate form of neuromorphic computing. It's just a particular branch of the field that showed unexpectedly promising results. We can anticipate new techniques and architectures arising in the coming years and decades. Maybe some of those will borrow more heavily from what's understood about human cognition, but an even more exciting aspect is that they could take a somewhat or substantially different course!
 
Oh, that myth. Yeah, it has nothing to do with how the human brain's axons and neurons actually work, because neurologists aren't even sure about how they work yet.
I'm not claiming everything is known about brains, but certainly enough is known about them, in abstract, to apply similar techniques. Not to mention you'd obviously need to have a pretty good understanding to interface with them like Neuralink is doing!


The whole "AI Training is how your brain works" is really just a reference to it using massive parallelism to construct equally massive arrays that are then used as reference points by the algorithms. First you need to identify unique constructs, for english language it's words "like", "the", "cat", "house" and so forth along with grammatical phrases like "going to". Each construct is given a unique number. Now during learning when the training algorithm hits a sequence of words, it goes to that position in the array and increments it. "The brown fox" would be Data['the','brown','fox'] and that position would be incremented. Of course that is just a three dimensional, there are tens of thousands of positions so imagine an Data[] reference with sixty thousand entries, each of them the ID of a word of phrase and the value being the number of times that combination was found in the training data. This model would essentially represent every possible combination of words and phrases in the english language. Take that previous three values ['the','brown','fox'], we could then lookup all the possible fourth values as ['the','brown','fox',X] then chose the one with the highest value, meaning it's found the most often next to those three words, say it's 'jumped' for Data['the','brown','fox','jumped']. We want the fifth value we do the same thing, lookup all entries of the fifth dimension and pick the one with the highest value to append to the return. And keep doing this until we hit a stop condition.
No, this is completely wrong. Cite a source ...if you can.

If it were just a simple statistical model of text sequences, that wouldn't explain how it could write a sonnet about some topic nobody has ever written one about, such as computer memory. It also doesn't explain how it generates answers to questions or forms logically consistent answers (when it does, which happens far more often than a simple statistical model could ever manage).

All of these things happen because it has a higher-level model of language than mere word sequences. It has some basic model of concepts and complex internal state.
 
I'm not claiming everything is known about brains, but certainly enough is known about them, in abstract, to apply similar techniques. Not to mention you'd obviously need to have a pretty good understanding to interface with them like Neuralink is doing!
That's like saying you need a good understanding of a CPU's ISA in order to implement a USB interface. Understanding what the brain intends to do with certain signals or potentials is a lot different than understanding how a particular brain region, or the entire brain, works. As of now we aren't replacing brain regions -- we don't have the ability to -- and when we do, it will first be with stem-cell-grown biological cells. It's going to be a long, long while before we can even begin to understand enough to think of replacing small parts of the brain with something other than self-assembling tissues and cells.
To me, this is somewhat akin to someone in the early industrial age showing that a wood-fired, steam-powered automobile is infeasible and therefore horse-drawn buggies will never be made obsolete. It's fair to understand the capabilities and limitations of existing tech, but we shouldn't presume it represents the only path of development.
Sure, and that's part of the point. The only thing driving a silicon understanding of the human brain is health care needs and the occasional transhumanist multimillionaire. There's a lot more profit to be made going in a completely different direction. Even the first steam-powered automobile makers weren't trying to replicate a horse (though it's another story for the first people attempting to build a heavier than air aircraft 😀 ).
 
That's like saying you need a good understanding of a CPU's ISA in order to implement a USB interface.
The point I was responding to was Palladin's claim that "neurologists aren't even sure about how they (the human brain's axons and neurons) work yet." You're taking my answer out of context.

Understanding what the brain intends to do with certain signals or potentials is a lot different than understanding how a particular brain region, or the entire brain, works.
It's possible to understand the basic mechanisms well enough that we can abstract the operational principles and use them to design machines or algorithms that also learn, process, and represent information, even if in a different way. The field of artificial neural networks goes back more than 50 years!

Technology is full of examples where people have studied how nature solves a problem or performs some specific trick and then abstracted the basic principles and replicated the behavior in some machine that differs in most other respects. It needn't be a precise replica of the exact solution implemented in biology to still overcome a similar challenge in a similar way.

Sure, and that's part of the point. The only thing driving a silicon understanding of the human brain is health care needs and the occasional transhumanist multimillionaire.
No, I'm sure that the AI community is also interested in unlocking the remaining secrets of human cognition, in order to replicate and possibly improve on them.
 
Have any evidence of that actually happening?
You don't really need evidence -- just check what data Windows 10/11 collect and send to Microsoft by default, then spend a couple of hours devising group policy for all elements of the operating system + browsers + Office to stop that, and you will hopefully understand what I am talking about.
I think that gets overstated and romanticized by Hollywood. It's a story they like to tell about themselves, in order to set themselves apart from other criminals. It turns out they're all just parasites, feeding off the rest of society. Any code they have is principally out of self-interest.
That may be so, but at least they weren't killing that proverbial goose which lays those golden eggs like the current crop of parasites does. They knew when to stop leeching in order not to antagonize the host.
Most answers on Stack Overflow are things people learned from someone or somewhere else, like the user manual, a blog, a book, etc. Good answers will generally link to some, but people frequently don't. Aren't they just as guilty as an AI model, if they regurgitate facts they learned elsewhere, without attribution?
SO rules are that you should link to the source when you have it and copy the relevant bit of info in case the link goes dead. Guilty of what exactly? Again, the scale of a single person, or even a couple hundred million people, doing the same thing doesn't come even close to what AI models are doing.
No, having an output that deals in probability distributions doesn't make it intrinsically statistical in nature. The structure of knowledge modeled by these things is not representable on the basis of mere correlation and joint probabilities.
I disagree, read on.
Do you ever have the experience where you're talking or writing and it occurs to you that there are several ways you could proceed? You have to make a decision which way is best, and that's essentially the process they're modelling.
Yes I do.

However, when I get to the point where I don't know what to say or write, I will either stop, say I don't know for sure, or even proceed to learn whatever bits I am missing, while the LLM is just going to pick the next set of tokens from a lower-probability bucket and output what they call a "hallucination".
How much do you know about how it works? There are plenty of stochastic processes at work in the brain. Yes, neuromorphic computing distills and abstracts the mechanisms used in biology, but there are often fairly direct analogs between the two.
Does an LLM have an internal monologue? Does it have awareness of itself? Does it have an uninterrupted stream of consciousness? Can it combine knowledge from two totally unrelated areas into something novel like humans can? Does it have plasticity like the human brain does?

Based on all that, I'd say how the brain works is still plenty different and people trying to equate it with some stochastic machine learning process are being quite simplistic and nihilistic about the whole thing.
However, Stack Overflow's Terms of Service contains a clause carving out Stack Overflow's irrevocable ownership of all content subscribers provide to the site.
SO is breaking EU GDPR law if they are not honoring user requests for data removal, because EULAs can't carve out exceptions from established laws.
Unlike language, AI doesn't really need to read a ton of code to become competent. Since programming is a task that can be quantified, you could probably just feed a foundational model some books or language standard documents and then just have it learn by doing. Give it a series of programming problems, a compiler, and a scoring system which judges how well it does.
But then that's not an LLM anymore -- it's a custom machine learning model for programming.
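As a rough illustration of the "learn by doing" loop being described above -- purely hypothetical: the compiler invocation, the problem format, and the scoring are all made up for the sketch, not a real training pipeline:

Code:
import os
import subprocess
import tempfile

def score_solution(source_code, test_cases):
    """Hypothetical scorer for the 'learn by doing' idea: compile a
    candidate C program and award a point for every test case whose
    output matches the expected output."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "candidate.c")
        exe = os.path.join(tmp, "candidate")
        with open(src, "w") as f:
            f.write(source_code)
        if subprocess.run(["gcc", src, "-o", exe]).returncode != 0:
            return 0.0  # doesn't even compile
        passed = 0
        for stdin_text, expected in test_cases:
            result = subprocess.run([exe], input=stdin_text,
                                    capture_output=True, text=True)
            passed += (result.stdout.strip() == expected.strip())
        return passed / len(test_cases)

# A training loop would then repeatedly ask the model for a solution,
# score it, and feed the score back as a reward signal, e.g.:
#   reward = score_solution(model.generate(problem_text), problem_tests)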
 
SO is breaking EU GDPR law if they are not honoring user requests for data removal, because EULAs can't carve out exceptions from established laws.
That is not data removal. Not what the GDPR is for. It's for forcing legal entities and institutions to relinquish control over your personal information they have gathered with or without permission.

Posting on a forum is an act of giving information away, but not personal information. It's a different kind of data and not subject to the scope of the GDPR.

If your dentist has your date of birth somewhere on their files you can force them to remove the data. If a forum has your musings and ramblings archived, that is not personal information of any description, even if you deliberately made posts to describe yourself, your date of birth, gender, possibly made up gender and opinions on gender issues or culture wars, your social security number, sperm count or whatever else you may have posted.
The TOS for the forum will have taken care of the legality of your posts. You would have agreed to post nothing of a personal nature that could fall under the GDPR. If you violated TOS first, you would have very little claim of abuse. Trying to use GDPR to force deletion of writings you no longer want public, but have previously already agreed will be made public, is itself abuse.

tl;dr
no
 
That is not data removal. Not what the GDPR is for. It's for forcing legal entities and institutions to relinquish control over your personal information they have gathered with or without permission.

Posting on a forum is an act of giving information away, but not personal information. It's a different kind of data and not subject to the scope of the GDPR.

If your dentist has your date of birth somewhere on their files you can force them to remove the data. If a forum has your musings and ramblings archived, that is not personal information of any description, even if you deliberately made posts to describe yourself, your date of birth, gender, possibly made up gender and opinions on gender issues or culture wars, your social security number, sperm count or whatever else you may have posted.
The TOS for the forum will have taken care of the legality of your posts. You would have agreed to post nothing of a personal nature that could fall under the GDPR. If you violated TOS first, you would have very little claim of abuse. Trying to use GDPR to force deletion of writings you no longer want public, but have previously already agreed will be made public, is itself abuse.

tl;dr
no
If OpenAI removes personal data they are also removing the attribution to the content they "licensed" from the user. That would make their use of said content illegal since they can't prove who they licensed it from.
 
If OpenAI removes personal data they are also removing the attribution to the content they "licensed" from the user. That would make their use of said content illegal since they can't prove who they licensed it from.
Point being, the GDPR is privacy legislation, not intellectual property legislation.

It cannot be used to revoke or retain control of IP.
 
Does an LLM have an internal monologue? Does it have awareness of itself? Does it have an uninterrupted stream of consciousness? Can it combine knowledge from two totally unrelated areas into something novel like humans can? Does it have plasticity like the human brain does?

Based on all that, I'd say how the brain works is still plenty different and people trying to equate it with some stochastic machine learning process are being quite simplistic and nihilistic about the whole thing.
Nobody is claiming these LLMs are AGI. Far from it. However, what is claimed is that the way they're structured and the way they learn (i.e. back-propagation) is based on a highly abstract model of how brains learn, in real life. Not some giant super-dimensional probability model, or whatever the heck Palladin was talking about.

Did you know there's a flatworm with a central nervous system consisting of about 100 neurons that have a fixed configuration for its entire life? Their connectivity is genetically programmed into it. I'm not saying it has zero learning capacity (maybe it can modify the weights of those synapses, over time?) but certainly not much.

It goes to show that just as nature has a high degree of variation in its implementation strategies, artificial neural networks can also vary quite a bit while still qualifying as bio-mimicry.
 
OpenAI never did anything for me for free, not to mention their name is the biggest lie ever, as neither their code nor their models are open-source.
I think the bigger lie is the "I" in the name. There's no intelligence in it at all; if anything, they should be named OpenAS, for Artificial Stupidity!
 
The most creative users will now leave the platform and create their own places to share their knowledge.
Sadly, that won't happen. This is only the latest in a long series of SO decisions that have been hugely unpopular, and the only attempt I know of to compete failed miserably. SO has too much traction to be toppled easily. That's one of the main problems.
 
Nobody is claiming these LLMs are AGI. Far from it. However, what is claimed is that the way they're structured and the way they learn (i.e. back-propagation) is based on a highly abstract model of how brains learn, in real life.
It may be loosely based on that, but it's not a replica of a human brain. You seem to be anthropomorphizing LLM capabilities. They don't learn -- they are trained. Learning implies having free will (to choose what to learn among other things, and to initiate learning on your own) and there's nothing spontaneous or consensual about the LLM training process.
Not some giant super-dimensional probability model, or whatever the heck Palladin was talking about.
That's exactly what they are; you also seem to be romanticizing them. They don't write poetry because they have a concept of what poetry is (even if they can explain it by citing Wikipedia) -- they can only do so because they were fine-tuned on a database of poems.
 
You seem to be anthropomorphizing LLM capabilities. They don't learn -- they are trained.
In machine learning, the terms "off-line learning" and "training" are used somewhat interchangeably. The distinction drawn is between on-line learning (i.e. learn as you go) and off-line learning (i.e. training). Neural networks can indeed be used in both ways, but LLMs aren't trained online and I didn't mean to imply otherwise.

Learning implies having free will (to choose what to learn among other things, and to initiate learning on your own)
Oh, now who's anthropomorphizing?? Are you saying insects and other small invertebrates can't learn, because they lack "free will" or the ability to decide when to learn? Learning is a very basic biological function. You're overloading it with all sorts of baggage, based on how humans learn abstract knowledge.

That's exactly what they are,
They're not. If you just model statistical correlation between words, what you get out would lack any cohesive structure or narrative arc. There'd be no way they could present the same information in different styles & structures without modelling the actual concepts represented by the words and phrases. It would also be remarkably inefficient and it's not even clear how you'd manage the degree of sparsity needed to fit such a model in memory.

They don't write poetry because they have a concept of what poetry is (even if they can explain it by citing Wikipedia) -- they can only do so because they were fine-tuned on a database of poems.
They don't simply memorize poems and stitch together different parts. They actually learn patterns of poetry, like rhyme schemes and meter. The trick that neural networks are so good at doing is finding patterns in data, and these aren't limited to just the low-level patterns, but also higher-order patterns, including what you could reasonably call concepts and the relationships between them (i.e. ontologies).
 
Are you saying insects and other small invertebrates can't learn, because they lack "free will" or the ability to decide when to learn?
I said no such thing, but you can't seriously compare human learning with that.
Learning is a very basic biological function. You're overloading it with all sorts of baggage, based on how humans learn abstract knowledge.
That's because you (and others doing the anthropomorphizing) are comparing humans and AI, not tapeworms and AI.
They're not. If you just model statistical correlation between words, what you get out would lack any cohesive structure or narrative arc. There'd be no way they could present the same information in different styles & structures without modelling the actual concepts represented by the words and phrases. It would also be remarkably inefficient and it's not even clear how you'd manage the degree of sparsity needed to fit such a model in memory.
They have no notion of concepts. They do have a sparse mapping of relations between tokens. They don't understand what those tokens mean. They simply don't have a frame of reference like we have.
They don't simply memorize poems and stitch together different parts.
I never said they do that -- the models that can write poems are fine-tuned using instructions to write poems. That requires a curated dataset along with instructions on how poetry is written. That will make them memorize those instructions, but they still won't understand what they are writing.

The problem is that human philosophy, after a couple of millennia, still struggles with some elementary concepts such as what it means to know something or what it means to understand something (not to mention what consciousness is), which should have been understood and well defined centuries ago.

I am afraid that if more people are willing to so eagerly abstract what defines us humans and apply the same to a machine just because it learned to sweet-talk them then we have already lost the AI war.

On a side note I just rewatched Matrix yesterday and there is this part where Morpheus says:
We have only bits and pieces of information. But what we know for certain is that in the early 21st century all of mankind was united in celebration. We marveled at our own magnificence as we gave birth to AI. A singular consciousness that spawned an entire race of machines.
Kinda sounds prophetic 25 years later, no?

As a final point I ask you this -- if you know a recipe for a meal, does your knowledge of it count if you can't actually prepare said meal yourself?
 
I said no such thing, but you can't seriously compare human learning with that.
Learning is a biological process harnessed by brains. Where I think you're drawing the distinction isn't actually related to learning, but rather abstractions and higher-order thought processes. In either case, learning basically works the same way at the cellular level.

They have no notion of concepts.
Of course they do, but I know it's just a waste of my time to try to convince you of something you don't want to believe.

I am afraid that if more people are willing to so eagerly abstract what defines us humans and apply the same to a machine just because it learned to sweet-talk them then we have already lost the AI war.
Without ever having seriously studied the theory behind how they're implemented, you've already made up your mind about not only current but all future AI technology?

If you only ever seek out explanations that confirm your beliefs and preconceptions, you risk finding yourself out of step with reality, at some point. Is the possibility that you're underestimating AI so threatening to you that you close your mind to all information that might seed any doubt?

On a side note I just rewatched Matrix yesterday and there is this part where Morpheus says:
We have only bits and pieces of information. But what we know for certain is that in the early 21st century all of mankind was united in celebration. We marveled at our own magnificence as we gave birth to AI. A singular consciousness that spawned an entire race of machines.
Kinda sounds prophetic 25 years later, no?
No, it's not remotely true that "all of mankind is united in celebration". Don't tell me you're enthralled by some romantic self-conception as a lone or rare dissident. Heck, even plenty of AI experts are plenty worried about the potential of AI, to the point of signing open letters and raising alarms by other means.

It's also pretty unoriginal and not so different than what we'd already seen in the Terminator franchise. In a lot of ways, I think movies like 2001: A Space Odyssey, War Games, and Terminator 1 & 2 paved the way for The Matrix, because they acquainted mainstream society with the idea of AI taking over and declaring war on humanity. If Matrix had to shoulder that burden, as well, it might've been too much.

What I appreciated about The Matrix was the notion that you could have hyper-realistic simulations via a direct brain interface and the idea that everyday life could be such a simulation. Not new ideas, either, but their rendition of that scenario was very well-executed and it impressed me to see such ideas burst forth into mainstream culture. There's a certain gratification in seeing somewhat fringe ideas you'd read about and contemplated, suddenly brought to life in such compelling fashion.

As a final point I ask you this -- if you know a recipe for a meal, does your knowledge of it count if you can't actually prepare said meal yourself?
I think so, because you can do other things with it than simply execute the recipe. You can pass it on to someone else or you might relay certain details of it, for entertainment purposes, if it's weird or exotic. It could also inform you in ways that help you with other recipes or food preparation. Heck, depending on what dish it's for, it might even influence what you order at a restaurant or where you choose to dine.
 
Learning is a biological process harnessed by brains. Where I think you're drawing the distinction isn't actually related to learning, but rather abstractions and higher-order thought processes. In either case, learning basically works the same way at the cellular level.
No, I am drawing a distinction about proper comparisons (human vs. "AI", not tapeworm vs. "AI"). What the current crop of "AI" is doing while being trained isn't learning -- it's rote memorization.
Of course they do, but I know it's just a waste of my time to try to convince you of something you don't want to believe.
As you already said you are a programmer, I am sure you can do some digging and point me to some scientific paper published in a well-respected and peer-reviewed scientific journal which proves your point?

If you can't, then what you are saying is based on your anthropomorphizing of LLMs, and the onus of proving they actually understand concepts is on you and those AI tech bros who are parroting that view to hype what they are peddling.
Without ever having seriously studied the theory behind how they're implemented, you've already made up your mind about not only current but all future AI technology?
Creating an AI is an attempt to create a perfect human servant -- one who is all-powerful, all-knowing, and (the creators hope so) benevolent. I read enough SF in my life to know where that leads.
If you only ever seek out explanations that confirm your beliefs and preconceptions, you risk finding yourself out of step with reality, at some point.
Thanks, but I can still tell wishful thinking of the unwashed masses and AI tech bros' shilling from reality.
Is the possibility that you're underestimating AI so threatening to you that you close your mind to all information that might seed any doubt?
I am not underestimating it at all. I've read how it works, and experimented with local models a lot. Just today I asked llama3 to translate some simple short bits of text from English to German and the results were wildly inaccurate. In comparison, Google Translate at least used German words with correct meaning.
No, it's not remotely true that "all of mankind is united in celebration". Don't tell me you're enthralled by some romantic self-conception as a lone or rare dissident.
Not at all, but you are taking "all of mankind" way too literally when it was a poetic quote from the movie.
Heck, even plenty of AI experts are plenty worried about the potential of AI, to the point of signing open letters and raising alarms by other means.
Yes but there's still a large majority of people who have wool pulled over their eyes by the tech companies.
It's also pretty unoriginal and not so different than what we'd already seen in the Terminator franchise. In a lot of ways, I think movies like 2001: A Space Odyssey, War Games, and Terminator 1 & 2 paved the way for The Matrix, because they acquainted mainstream society with the idea of AI taking over and declaring war on humanity. If Matrix had to shoulder that burden, as well, it might've been too much.
Before all those movies there was a 1966 novel Colossus, and even before that, way back in 1909, a book called The Machine Stops -- apparently even people who lived a century ago and who never saw a computer in their life or knew about the concept were able to imagine how human dependence on machines might be their undoing. Also, 1984 predicted a lot of stuff that's going on (Ministry of Peace / Defense waging wars, Ministry of Information censoring the same, history revisionism, TVs that watch you, etc).
I think so, because you can do other things with it than simply execute the recipe. You can pass it on to someone else or you might relay certain details of it, for entertainment purposes, if it's weird or exotic. It could also inform you in ways that help you with other recipes or food preparation. Heck, depending on what dish it's for, it might even influence what you order at a restaurant or where you choose to dine.
What I was trying to say is that there is a huge step between theoretical and practical knowledge. Theoretical knowledge is based on faith -- you literally have to trust others that they aren't lying to you. Practical knowledge on the other hand is sometimes very dangerous to acquire directly -- for example you don't want to learn that a bullet can kill you by actually being shot and killed. The practical knowledge is further divided into observable knowledge (seeing someone get killed by a bullet), and personal experience (burning your fingers on a hot stove). The AI is currently limited to theoretical knowledge and as such it is susceptible to manipulation. Heck, current training datasets already have pro-western cultural and moral bias -- something that's limited to like 1/8th of the Earth population.

I hope I am wrong, but the cynic in me says nothing good will come out of it.
 
No, I am drawing a distinction about proper comparisons (human vs. "AI", not tapeworm vs. "AI"). What the current crop of "AI" is doing while being trained isn't learning -- it's rote memorization.
No, it actually must learn higher-order patterns and representations, because the model isn't nearly big enough to hold all of the training data by simple memorization. The point of their training is that (assuming you can't cheat by memorizing) you can't predict what someone is going to say without a degree of understanding of what they're saying. So, by scoring how good it is at predicting the next word, they're rating how well it's learning things like the subject matter, context, style, and the perspective of the author or speaker, for the texts within the training dataset.
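In toy form, that scoring is just cross-entropy on the next token: the less probability the model gave to the word that actually came next, the bigger the penalty. A sketch with hypothetical numbers (not output from any real model):

Code:
import math

def next_token_loss(predicted_probs, actual_next_word):
    """Cross-entropy for one prediction: the lower the probability the
    model assigned to the word that actually came next, the higher the
    loss. Training adjusts the weights to shrink this across the data."""
    return -math.log(predicted_probs.get(actual_next_word, 1e-9))

# Hypothetical model output for the context "the brown fox ..."
probs = {"jumped": 0.7, "ran": 0.2, "sat": 0.1}
print(next_token_loss(probs, "jumped"))  # ~0.36: good prediction, small loss
print(next_token_loss(probs, "sat"))     # ~2.30: poor prediction, larger loss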

As you already said you are a programmer, I am sure you can do some digging and point me to some scientific paper published in a well-respected and peer-reviewed scientific journal which proves your point?
That's like expecting someone who has taken an undergrad course in particle physics later to provide a paper reference to convince someone else of basic quantum dynamics principles. I think a paper reference isn't what you really want. You'd be better served by looking at a good introductory text on modern AI theory & practice. I'd just search for such a thing on Amazon, same as you can do.

Creating an AI is an attempt to create a perfect human servant
It's a tool that's better at certain aspects of information processing than classical techniques (and worse in others). We write programs to solve problems and AI is interesting because it's capable of solving certain types of problems that have stymied even the best programmers since the advent of computer science.

You might say LLMs are a sledgehammer for some of the things people are trying to use them for and I wouldn't disagree. That's not the same as saying they're one specific thing or that they don't have any good application.

Just today I asked llama3 to translate some simple short bits of text from English to German and the results were wildly inaccurate. In comparison, Google Translate at least used German words with correct meaning.
Well, was that model specifically trained to do translation between those languages?

And you think Google Translate isn't using AI?

That reminds me of a video I saw long ago, spanning most of these topics. It's somewhat dated, but I think that's okay, considering the level of the discussion.

I think you'd probably find it worth watching, but I don't blame you if you don't. I typically don't watch videos people cite in forum posts and generally tend to avoid Youtube, myself.
 