You people still don't get the difference between humans and "AI", do you?
I'll say it once more -- it's the scale.
No human developer could, in a lifetime, be exposed to the amount of copyrighted code that the AI models saw during training.
Therefore, the risk of a human developer introducing copyrighted code is several orders of magnitude lower. That's especially true because human developers working on such projects make a conscious effort to avoid it: unlike an LLM, they understand the concept of copyrighted code, while an LLM just spits out statistical probabilities, often without the ability to reference its sources.
Scale doesn't matter. What matters is that if you judge one code source on copyright and license issues, you should judge all code sources on copyright and license issues. Just as we have plagiarism detectors for university papers, we should have them for all code submitted to all projects.
You argue that the scale of exposure matters, but I say it doesn't. Suppose someone lived on a desert island, never saw a single YouTube video, and then we gave them a camera and had them create a YouTube short. If the work they generated collided with the copyright on somebody else's work, would we give them a pass? No, we wouldn't. They would still be liable for copyright infringement. In copyright law, intent and lack of exposure don't matter; all that matters is whether the creator substantially replicated the original work.
However, I believe copyright violations are inherent in code; there is no way we can avoid them. We have a whole apparatus that defines good code (design patterns, idioms, style guides), and all of these restrictions on what makes good code narrow the range of what can be written, so independent developers converge on the same expressions, as in the sketch below.
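As a toy illustration (a hypothetical example of mine, not taken from any real codebase): ask a room full of Python developers for an idiomatic clamp function, and nearly all of them will independently produce one of these two bodies almost verbatim.

    # Independent authors converge on the same idiomatic code.
    # Both "clamp" implementations below get rediscovered constantly.
    def clamp(value, low, high):
        return max(low, min(value, high))

    def clamp_sorted(value, low, high):
        # the "clever" one-liner variant, equally common
        return sorted((low, value, high))[1]

Neither author copied the other; the idiom space is just that narrow.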
The above brings us back to the need for plagiarism detectors. How can you know whether you're violating copyright if you can't test for it?
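And this is feasible: here is a minimal sketch in Python of the fingerprinting idea behind academic code-plagiarism tools like MOSS (winnowing over k-gram hashes, Schleimer et al.). The function names and parameters are my own illustration, not any real tool's API; a production detector would also normalize identifiers and work on token streams rather than raw characters.

    # Sketch of a winnowing-based code similarity check.
    import hashlib
    import re

    def normalize(source):
        """Strip comments, string contents, and whitespace so superficial
        edits don't hide copying. (A real tool would also rename identifiers.)"""
        source = re.sub(r'#.*', '', source)                   # drop comments
        source = re.sub(r'"[^"]*"|\'[^\']*\'', 'S', source)   # collapse strings
        return re.sub(r'\s+', '', source).lower()

    def kgram_hashes(text, k=5):
        """Hash every overlapping k-character window."""
        return [int(hashlib.md5(text[i:i+k].encode()).hexdigest(), 16)
                for i in range(len(text) - k + 1)]

    def winnow(hashes, window=4):
        """Keep the minimum hash in each sliding window; the selected
        hashes form the document's fingerprint."""
        picks = set()
        for i in range(len(hashes) - window + 1):
            picks.add(min(hashes[i:i + window]))
        return picks

    def similarity(code_a, code_b):
        """Jaccard similarity of the two fingerprints, 0.0 to 1.0."""
        fa = winnow(kgram_hashes(normalize(code_a)))
        fb = winnow(kgram_hashes(normalize(code_b)))
        return len(fa & fb) / max(1, len(fa | fb))

    # Usage: compare a submission against a corpus of known code and
    # flag anything above a threshold for human review.
    print(similarity("def f(x):\n    return x + 1\n",
                     "def g(y):\n    return y + 1\n"))

Run every submission against the corpus, flag matches above a threshold, and let a human judge the flagged pairs; the detector only finds candidates, it doesn't decide infringement.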
Also, I suggest looking at the plagiarism claims in the music, publishing, and academic-paper industries. There are numerous cases: some intentional, some accidental, some editing mistakes like forgetting to put in quotation marks or bibliographic references. Again, I think many of these plagiarism claims were unavoidable because of the limited ways of expressing music or ideas with a highly restricted set of tools.
In over 20 years of programming, I've always assumed my code violates copyright on somebody else's code somewhere, because, as I said above, given the limited expressiveness of coding, it's highly likely there will be a collision.
When I solve a problem using my junior programmers, ChatGPT and Copilot, I give them the specs and descriptions of what needs to be done the same way I would with humans. I shape their code by telling them what they did wrong, and I almost never change their code directly. Just as with other junior programmers, I need to keep a careful eye on the code and make sure they generate the right test cases, but at the end of the day, I get code that looks like something I would have written myself.
I think another reason people are getting cranky is that LLM systems writing code is yet another example of how humans are not special. History is full of some other creature doing something we thought was uniquely human (tool use, language use) and somebody getting their knickers in a twist. I suggest giving up on the idea that humans are the center of any universe. We are just a collection of cells bumbling around, occasionally doing something good.
The need for humans in software development will drop significantly in the next 20 years. Start planning your exit path now. Right now it looks like there will be a core of people generating new ideas, expressions, etc., to be fed into AI systems and made available to everybody. However, since the future always arrives too soon and in the wrong order, we will instead get AI systems that generate new ideas independently and don't need humans to solve problems.