News Google Bard Plagiarized Our Article, Then Apologized When Caught

Admin · Mar 22, 2023

When I asked Google's bot to compare two recent CPUs, it took data directly from a Tom's Hardware article without attribution.

Gam3r01 · Mar 22, 2023

Shocker.

Baywoof · Mar 22, 2023

AI will ruin social media, search sites, forums such as this, along with many other technologies and content applications. Some say AI is the beginning of the end of human endeavors - Skynet anyone?

derekullo · Mar 22, 2023

Tomshardware has been assimilated!

Show our AI Overlords some respect!

Endymio · Mar 22, 2023

To correct the article, if they reworded your results, then it wasn't plagiarism. You can't copyright facts and data -- only a specific expression of them.

Endymio · Mar 22, 2023

Baywoof said:
AI will ruin social media, search sites, forums such as this, along with many other technologies and content applications.

Keep your eye on that horseless carriage thing, too. It's going to wind up killing a lot of people, mark my words.

bigdragon · Mar 22, 2023

Very unfortunate behavior from Bard here. It's good that it revealed its source, but the fact that you had to ask means that Bard has zero hesitation to lift this site's -- or any other site's -- content without attribution. If Bard didn't explicitly do the testing itself then it has zero right to claim "we" or "I" or any possession of performance results.

AI like Siri, Alexa, Cortana, and similar answer your queries by saying they found relevant information and pointing you towards sources. Bard, ChatGPT, DALL-E, Midjourney, Stable Diffusion, and the other AIs that are part of the current crop answer your queries by providing relevant information as if it comes from themselves. Sources are obfuscated or only available upon request. I don't feel icky when I ask Alexa something, but I do when looking at ChatGPT or Stable Diffusion. The older AIs seem intended to help people on both sides of a query connect while the current AIs seem intended to replace people by presenting information as their own. We need to get back to helping people.

Gam3r01 · Mar 22, 2023

Endymio said:
To correct the article, if they reworded your results, then it wasn't plagiarism. You can't copyright facts and data -- only a specific expression of them.

You can't plagiarize facts, that is correct.
However, you absolutely can plagiarize data. Independent in-house testing, in any field, is your own work. Using said work without permission or credit is 100% plagiarism.
You can compile data and make your own research, but referencing specific data points and presenting the exact same data without any modification is not "fair use".
Plagiarism and copyright are two very different things as well.

Dantte · Mar 22, 2023

Endymio said:
To correct the article, if they reworded your results, then it wasn't plagiarism. You can't copyright facts and data -- only a specific expression of them.

True that facts cannot be copyrighted, but data/observations can be. Example: "the CPUs had a 12% difference" is not a fact, this is a set of data observed in testing by Tom's Hardware; a different CPU of the same model may produce a different result, etc. Therefore the work/effort put forward by Tom's to acquire this data set is proprietary to their specific equipment and therefore can be copyrighted!

Avro Arrow · Mar 22, 2023

Two executives at Google:

E1: How do we avoid getting nailed for plagiarism?
E2: Blame the Bard bot.

shady28 · Mar 22, 2023

Baywoof said:
AI will ruin social media, search sites, forums such as this, along with many other technologies and content applications. Some say AI is the beginning of the end of human endeavors - Skynet anyone?

Actually if you play with AI Chat enough, you realize it does nothing but aggregate information that is out on the internet.

What that means is, the subjective (and some objective) answers it provides are not really the AI doing any 'analysis' of those comments, it is just aggregating to see what the 'zeitgeist' is i.e. the commonly accepted answer.

So if 65% of people thought the world was flat, ChatGPT would likely tell you the world was flat.

The reason this is relevant to what you're saying, is that ChatGPT will never be able to make up anything on its own. If a new product comes out, and no one reviews it or talks about it, my bet would be that ChatGPT would go with the only thing it has available - pamphlets and media from the product maker.

ChatGPT's entire mode of operation is to steal media, and in some cases lie. It most definitely is not an arbiter of facts and truth. That's actually the real danger, a lot of people will think it is telling them facts and truth when it is really just regurgitating whatever it found to be predominant on the internet.

waltc3 · Mar 22, 2023

Completely unsurprising. AI does what AI is programmed to do and not one thing more. If it's programmed to plagiarize it will, and if it's programmed to keep that under wraps unless it's asked about it, it will do that. If it's programmed to lie about it, it will. It doesn't care at all. AI does not think, does not analyze, is not sentient--it does what it is programmed to do, just like any other pedestrian computer program. That's it. If you know what artificial flowers are, think about why it's termed "artificial" intelligence (e.g., fake, not real, mimicked).

Ahumado · Mar 22, 2023

I don't see this as a problem. The world is filled with parrots and that includes tech sites. Imagine a world where I did not have to read...iconic, what just happened, how to watch, aging piece of hardware or software, etc.

Endymio · Mar 22, 2023

Gam3r01 said:
You can't plagiarize facts, that is correct.
However, you absolutely can plagiarize data. Independent in-house testing, in any field, is your own work. Using said work without permission or credit is 100% plagiarism.

I understand your point of view, but it is 100% incorrect from a legal standpoint. From an ethical one -- your opinion is as valid as anyone else's, as there are no hard and fast definitions in that arena. But legally, this isn't plagiarism. Data is a collection of facts, and the only way such collections are copyrightable is if there is "creative expression in their compilation". Toms, however -- as do most reviewers -- goes to great lengths to assure us otherwise: that their results are based on firm, unbiased, repeatable methodologies, and are thus no more copyrightable than the fact that the earth revolves around the sun in 365.2425 days ... a fact that I've never measured directly myself, but I am entitled to freely use and quote without limit.

Endymio · Mar 22, 2023

Dantte said:
True that facts cannot be copyrighted, but data/observations can be. Example: "the CPUs had a 12% difference" is not a fact, this is a set of data observed in testing

Various definitions of "data" from the web:

"Data is a collection of facts and figures that can be in any form—numerical or non-numerical "

"Data is a collection of facts, such as numbers, words, measurements, observations or just descriptions of things "

"Data: A fact or set of facts that have been gathered about an object, idea, place, person etc ..."

Giroro · Mar 22, 2023

Baywoof said:
AI will ruin social media, search sites, forums such as this, along with many other technologies and content applications. Some say AI is the beginning of the end of human endeavors - Skynet anyone?

AI already ruined search sites a while ago. Have you not noticed that the top search result for almost any question is usually an AI-generated blog post that just repeatedly rephrases the same keywords into various rewordings of questions, followed by a vague paragraph? There are 1000 highly ranked SEO sites with the same non-cited pseudo-information for every topic. They never actually answer the questions or even say anything useful that you didn't already know. They're just lazy nobodies playing the meta to milk easy clicks/money from voice search and that section of Google where they post that useless FAQ above all the real search results. Google makes the meta, so it's still their fault.

Where things will get interesting is when Google starts training its AI with information and wording that was generated with an AI. Do that long enough and things will lose fidelity in some really interesting and system-breaking ways.

10tacle · Mar 22, 2023

Endymio said:
But legally, this isn't plagiarism. Data is a collection of facts, and the only way such collections are copyrightable is if there is "creative expression in their compilation". Toms, however -- as do most reviewers --

The issue here is that every hardware tech site has their own testing methodologies, their own hardware component recipe, and hence their own benchmarks and results. You will not see two tech sites having the EXACT same test results in given hardware benchmarks for this reason. These are RESULTS, not scientific FACTS. And for this reason, if you cannot prove that you yourself created your own benchmarks that just so happen to exactly match those of a tech site that is challenging you, then you may very well be in legal trouble for plagiarism. Especially if you are making money off your platform outlet that others are reading for information. And yes I saw your fact rebuttal above to Dantte. All I'll say about that is it will be up to a court of law to decide if "facts" are proprietary to a company's research they spent time and money on yet got no credit for. I know where I'd put my money with a jury pool weighing the testing website against the non-tester who just copied data uncredited from the work of others.

In any event, this problem is going to get worse.

JamesJones44 · Mar 22, 2023

shady28 said:
The reason this is relevant to what you're saying, is that ChatGPT will never be able to make up anything on its own

While current iterations of ChatGPT do not create random responses to queries, there is no reason it couldn't be made to. AI can generate unique responses based on a data set; Google Art Generator, for example, can generate an image from a confluence of data. In the same way, an LLM could be created to generate random unique responses based on its data sets (aka made up). There is no reason why an implementor of ChatGPT couldn't extend the model to do just this.

Gam3r01 · Mar 22, 2023

Endymio said:
I understand your point of view, but it is 100% incorrect from a legal standpoint. From an ethical one -- your opinion is as valid as anyone else's, as there are no hard and fast definitions in that arena. But legally, this isn't plagiarism. Data is a collection of facts, and the only way such collections are copyrightable is if there is "creative expression in their compilation". Toms, however -- as do most reviewers -- goes to great lengths to assure us otherwise: that their results are based on firm, unbiased, repeatable methodologies, and are thus no more copyrightable than the fact that the earth revolves around the sun in 365.2425 days ... a fact that I've never measured directly myself, but I am entitled to freely use and quote without limit.

When did this become a question about the legality of use?
Plagiarism isn't illegal (US), or subject to any specific laws, therefore there is no legal definition of plagiarism anyway.
That does not change the fact that using the exact data developed elsewhere is 100% plagiarism by definition.
Your earth/sun example is also flawed. You can quote and share information openly, but you are not passing it off as your own (you aren't claiming that you did the testing/research yourself and this is what you came up with). The AI response in the article IS trying to play off the data as their (Google's) own.

Alvar "Miles" Udell · Mar 22, 2023

You mean a program that's stated as being in BETA has bugs? Who would have believed...

Also, I would say TomsHardware is not the be-all and end-all expert on testing. Taking the 13900K vs the 7950X3D as an example, using game tests at 1920x1080:

TomsHardware: AMD +11.6%
Techpowerup: AMD -0.02%
TechSpot: AMD +4.4%
Guru3D: AMD +1%

So with these four sources TomsHardware is a very strong outlier suggesting the games chosen more strongly favor the 3D cache and do not necessarily represent real world performance.

TJ Hooker · Mar 22, 2023

Endymio said:
I understand your point of view, but it is 100% incorrect from a legal standpoint. From an ethical one -- your opinion is as valid as anyone else's, as there are no hard and fast definitions in that arena. But legally, this isn't plagiarism. Data is a collection of facts, and the only way such collections are copyrightable is if there is "creative expression in their compilation". Toms, however -- as do most reviewers -- goes to great lengths to assure us otherwise: that their results are based on firm, unbiased, repeatable methodologies, and are thus no more copyrightable than the fact that the earth revolves around the sun in 365.2425 days ... a fact that I've never measured directly myself, but I am entitled to freely use and quote without limit.

I think you're conflating plagiarism and copyright infringement.

USAFRet · Mar 22, 2023

Several places around the interwebs have instituted a policy of "No AI"

Several sections of StackExchange, for instance. ex: Law.se
https://law.meta.stackexchange.com/questions/1701/temporary-policy-chatgpt-is-banned

So far, the AI bots are not coming to these places and posting answers all by themselves.
A human asks the AI to generate some text, and they post that as their own work.
Sometimes it's even sort of correct.

Obviously, it is NOT their own work.

Personally, I fully agree with that policy.

We've even seen it in here, to the detriment of the regular users.

Endymio · Mar 22, 2023

10tacle said:
The issue here is that every hardware tech site has their own testing methodologies, their own hardware component recipe, and hence their own benchmarks and results.

From an IP law perspective, methodology and procedure are applicable to patents only; copyrights are concerned with expression only. If Toms creatively expressed all their benchmark results in a 36-point magenta font, preceded by rainbow emojis, then that specific expression could potentially be copyrightable. But the naked figure "12%"? No.

Gam3r01 said:
When did this become a question about the legality of use?
Plagiarism isn't illegal (US) .... You can quote and share information openly, but you are not passing it off as your own (you aren't claiming that you did the testing/research yourself and this is what you came up with). The AI response in the article IS trying to play off the data as their (Google's) own.

Point one is fair, which is why I expressed the distinction between legality and morality (plagiarism is illegal when infringing). However, your second point is quite strained. The entire world understands that Google's Chatbot is no different than Google itself: it pulls all its results from the websites of others. Show me a person who truly believes an AI chatbot is somehow performing its own hardware testing and benchmarking, and I'll show you a person who should be forcefully sterilized to prevent their genetic flaws from propagating.

Steve Nord_ · Mar 22, 2023

There are lots of apologists here for plagiarism! I am copying some brinksmanship rather as I am sad the benchmark lab and editor neglected to ask Bard if it could perform ALL forms of plagiarism. Funding, support, facilities, go gold.

Geef · Mar 22, 2023

Endymio said:
The entire world understands that Google's Chatbot is no different than Google itself: it pulls all its results from the websites of others.

Exactly. The AI is what the guys who programmed it put into the code. Unfortunately many things like the evil word here ~~politics~~ will always be based on what they programmed. If Google doesn't like it, neither does the AI.
