If I am not mistaken, the established law is that you hold and retain copyright over what you create, and you don't automatically give that copyright up by publishing it.
IANAL, but I believe that for web sites with user-generated content, the common default when no EULA is in place is that the content is owned by both the poster and the site it was posted on: each owns their own copy. In other words, the site can do whatever it wants with its copy of what you posted.
If the content is under a Creative Commons license, however, then any copy must retain attribution to the original poster.
Here is where I can see a legal conflict: feeding content into an LLM strips away the attribution.
Content can be fed into an LLM without issue only if it is owned by the party doing the training, is explicitly in the Public Domain, or if they have somehow found a novel way to weave attribution into the LLM (which I find unlikely).
You'll often see the argument: "But it is learning. Humans are allowed to learn things freely; this is just machines doing it." That argument does not hold up, for two reasons.
First, humans and LLMs learn differently. Humans build mental models of what they ingest and feed those interpretations into their neural networks; LLMs are fed the raw text. For humans, memorizing raw text verbatim is actually the harder task.
Second, even if a human does learn text verbatim, it is still plagiarism to publish copyrighted text without attribution, even when reciting it from memory. An LLM will reproduce memorized text given the right parameters; chatbots typically have measures in place to detect such verbatim output and suppress it.