News: Google Bard Plagiarized Our Article, Then Apologized When Caught

If Tom's creatively expressed all their benchmark results in a 36-point magenta font, preceded by rainbow emojis, then that specific expression could potentially be copyrightable. But the naked figure "12%"? No.

Well again, that would be up to a lawsuit and a court to decide, when the defendant stands accused of saying "we" as if it had done the research itself. And if it goes to a jury, Google and the other Big Tech oligarch companies don't have many sympathetic citizens out there, which is where I'd put my money on who would win. Let's also remember that Google quickly backed off when challenged. As for the argument that Google pulls information from the internet: yes, of course it does. But in that capacity it is a linking aggregator that only provides links to the sources. This Bard issue is an entirely different matter. One day we'll find out how far this stealing of proprietary data by AI bots from the likes of Google will go. That's for sure.
 

abufrejoval

Reputable
Jun 19, 2020
Works as designed, but the design predicates need to be changed and the responsible knuckles duly rapped raw until that's done.

It's been annoying me for ages with sites like https://elchapuzasinformatico.com/, which most of the time seem to do little more than translate. Yet they do occasionally seem to mix that with truly original content, so journalism isn't entirely dead there.

Here, proper citation is a must, and it's pretty much prescribed already via explainable-AI mandates.

Stealing AIs may have few moral compunctions for lack of pay and super-ego, but their designers need to be pinched hard until they relent.
 

randyh121

Prominent
Jan 3, 2023
Why is this an article?
That's like saying you're mad you googled something and Tom's Hardware is in the Google search results. That is what AI is: combining search results from the web into something easy to access.
This is pure click-ragebait and I suggest you delete the article; it reads like a Buzzfeed clickbait article from 2015.
 

randyh121

Prominent
Jan 3, 2023
You can't plagiarize facts, that is correct.
However, you absolutely can plagiarize data. Independent in house testing, in any field, is your own work. Anyone using said work without permission or credit is 100% plagiarism.
You can compile data and do your own research, but referencing specific data points and presenting the exact same data without any modification is not "fair use".
Plagiarism and copyright are two very different things as well.
No, you can't. You guys posted it on the web for people to see. AI is doing just that, seeing it. If you want to claim it was plagiarized, then make the data private.
 

USAFRet

Titan
Moderator
No, you can't. You guys posted it on the web for people to see. AI is doing just that, seeing it. If you want to claim it was plagiarized, then make the data private.
No, this is different.

If you find something online that someone else wrote, and link to it, fine.
Or even copy/paste. But give attribution.

Don't claim it as your own work.
 

Brian28

Distinguished
Jan 28, 2016
...relevant to what you're saying, is that ChatGPT will never be able to make up anything on its own. ...

Most of the current language models have a hallucination problem, where they DO make up stuff on their own. (And it ends up being completely wrong.)
 

PlaneInTheSky

Commendable
BANNED
Oct 3, 2022
What Microsoft (Bing/ChatGPT) and Alphabet (Google/Bard) are doing is 100% plagiarism, and they need to be sued for this.

BUT, Tomshardware and its sister sites AnandTech and PCGamer have been using Google ads and Google Analytics to monetize the sites.

Tomshardware.com uses:
-Google Analytics (ad data)
-Google Analytics Event Tracking (tracker)
-Google Tag Manager (software)
-Google Hosted Libraries (software)
-Google Hosted jQuery (software)
-Google Publisher Tag (ads)

Everyone knew Google just plagiarized all its data long before "AI"; there have been many lawsuits from news sites saying Google flat-out steals articles.

https://www.theguardian.com/world/2014/dec/16/google-news-spain-publishing-fees-internet

Tomshardware and PCGamer now complaining about Google, when they allowed Google to monetize the platform, rings very hollow. It's like complaining you got robbed after starting a relationship with a well-known criminal. It's like people buying from Steam, a DRM platform, who then go on to complain about DRM.

Don't sleep with a criminal if you don't want to get robbed.
 

sstanic

Distinguished
Aug 6, 2016
These betas serve one main purpose: to fight future US/EU regulation and legislation in advance. It has nothing to do with testing the software/hardware or gauging the general population's response; 99% of the effort is directed towards future regulation.

Whether it's a plagiarism and/or copyright issue is not that relevant, as everybody important in that future legal battle is 100% certain that it's not Google's/Microsoft's IP being shown on the screen. Both Google and Microsoft were perfectly aware of the incoming storm over citation etc., and are already 40 steps ahead.

In this particular case "we" is by far the biggest issue, and Google's experts must have weighed for years what to write there. If it's "I", most people stop reading, because no one cares what a chatbot says about their medical issue. However, if it's "we", that implies people are behind the answer, a doctor perhaps, and any user of any level absolutely regards it differently. It's legally different as well. A very difficult additional puzzle is the fact that "we" was and still is used for sociopathic manipulation on all levels, from kids to kids all the way to superpower to superpower, or tyrants to their populace.

An additional problem for both Google and Microsoft is that they can't yet know what's in their own business interest. They may well profit more from one "feature" in the short term, but if it is changed to its opposite it might bring far more profit in the long term. That's probably what's bothering them the most. If they fight for one form of future regulation, and win, for example, it can easily turn out to be wrong in the long term.

That's why they have invested huge efforts into exploring the ethical, moral and legal implications as many steps ahead as they could. That's the only way to fight future legislation, lobby more effectively and profit more. Of course it'll all be packaged as "we care for humanity's future with AI so much that we invested huge efforts into exploring the ethics side of the matter".

Now that I think about it, even the US legal system with its precedents might show its ugly face in this future legal battle. The EU is different, but obviously has its flaws as well.
 

BX4096

Reputable
Aug 9, 2020
There are lots of apologists here for plagiarism! I am copying some brinksmanship rather as I am sad the benchmark lab and editor neglected to ask Bard if it could perform ALL forms of plagiarism. Funding, support, facilities, go gold.
And you don't seem to realize how modern "AI", that is, entirely mindless machine training via large language models, actually works. While the programming behind the model is an enormous feat, the actual "AI" part is thoroughly dumb and involves no intelligence or judgment whatsoever.

It's inane to accuse a limited alpha/beta language model of "plagiarism" for several reasons. The most obvious one is that its entire design is based on plagiarism. It takes all the information you can feed into it and produces jumbled variations on the theme. That's basically all it does, since, again, it has absolutely no mind or understanding of its own.
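To make that "jumbled variations" point concrete, here's a deliberately crude sketch: a toy word-level Markov chain in Python. The sample sentence is made up, and a real LLM is vastly more sophisticated, but the "it can only recombine what it was fed" behavior is the same idea:

    import random
    from collections import defaultdict

    # Made-up "training data" standing in for scraped text.
    corpus = ("in our testing the new chip was about 12 percent faster "
              "than the old chip in our gaming tests").split()

    # Record which words follow which in the source text.
    transitions = defaultdict(list)
    for current_word, next_word in zip(corpus, corpus[1:]):
        transitions[current_word].append(next_word)

    # "Generate" text: every word in the output is lifted straight from the input.
    word = random.choice(corpus)
    output = [word]
    for _ in range(12):
        followers = transitions.get(word)
        if not followers:
            break
        word = random.choice(followers)
        output.append(word)

    print(" ".join(output))

Everything it can ever emit is a reshuffle of what went in.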

Secondly, Google Bard is limited-access experimental software in its alpha/beta state. I'm sure its creators are more focused on working out the kinks and producing legible and valuable output than on finishing touches like proper attribution for an experimental model that may well be scrapped and replaced with something else in the near future.

Lastly, if there's one thing these AI bots have convincingly proved, it's that "plagiarism" is the default state of almost every single human on the planet. Do you attribute every single fact that you quote? No one possibly can. To quote H. L. Mencken from more than a hundred years ago:

"The average man never really thinks from end to end of his life. The mental activity of such people is only a mouthing of cliches. What they mistake for thought is simply a repetition of what they have heard. My guess is that well over 80 percent of the human race goes through life without having a single original thought."

This quote is basically the core design principle behind these chatbots in a nutshell.
 

eldakka1

Honorable
Dec 24, 2018
Why is this an article?
That's like saying you're mad you googled something and Tom's Hardware is in the Google search results. That is what AI is: combining search results from the web into something easy to access.
This is pure click-ragebait and I suggest you delete the article; it reads like a Buzzfeed clickbait article from 2015.

No, those two things are completely different.

A search result will have a snippet (maybe that sentence about 12% faster) with an explicit statement saying "this came from Tom's Hardware" and a hyperlink to that article on Tom's Hardware.

Bard presented that information as its own work. It did not in any way, shape or form indicate where it got the information until after it was caught out that it had not produced that data. It had to be prodded to admit it wasn't its (Google's) own work and that it scraped it from Tom's Hardware.

If it had said up front something more like "Tom's Hardware reports that for gaming the 7950X3D is 12% faster than the i9-13900K", then that would have been more acceptable. In fact, it should have cited several results; "Tom's Hardware reports that for gaming it is 12% faster, <insert source here> found it to be 3% faster" would have been even better. Providing only one set of results - and not even citing that source - is not only unethical plagiarism, it is useless information that reflects only one source's results.
 

wwenze1

Reputable
Mar 22, 2020
This gives me the same feeling as how there are PC game modders and then there are "modpackers", and it is the latter who keep getting the donations, because most people download modpacks rather than individual mods.

I agree with a previous post: such things are dangerous to content creation, since the creators themselves are not rewarded while the thieves are. Then again, isn't present-day capitalism the same?
 

baboma

Respectable
Nov 3, 2022
Why is this an article?

It's content about tech. But mostly because it gets clicks. Reason enough.

That's like saying you're mad you googled something and Tom's Hardware is in the Google search results. That is what AI is: combining search results from the web into something easy to access.

The comparison isn't apt, but neither is the article's. Plagiarism implies intent, and from numerous reports, Bard is a work-in-progress prone to erratic responses, much like Bing Chat when first released. It's not even in wide beta. I don't see any more reports of a "crazy Sydney" in Bing, so apparently "crazy" has been excised. Bard will go through the same process.

This is pure click-ragebait and I suggest you delete the article

It's not ragebait, but it is clickbait. And it works.

The number of responses to this low-brow piece dwarfs that of every other THW blog post on the front page, so "engagement" is high. The author is probably happy about it, having put out very little effort for so much gain.

It's the same type of piece as the recent "Intel vs AMD CPU...FIGHT!" piece. No new info or insight there, but it gets the fanboys engaged and arguing. And driving clicks. Let's face it, there's not much excitement in PC hardware this time of year.

Note that I'm not blaming/castigating THW for clickbait articles. This is how the online click economy works. Good, insightful writings aren't rewarded. Low-brow, controversial rants are how you get clicks. The motivations are skewed, but this is what we have.

But the author is right in one respect. In a prior piece, he stated that he has an "existential fear" of AI bots like ChatGPT taking over his writing job. I would emphatically agree that AI will take over the low-hanging fruit like these clickbait pieces. They're easy to write, and AI can crank them out in seconds. Hopefully this is not all that the author is capable of.
 

itsmedatguy

Distinguished
Aug 25, 2016
Actually if you play with AI Chat enough, you realize it does nothing but aggregate information that is out on the internet.

What that means is that the subjective (and some objective) answers it provides are not really the AI doing any "analysis" of those comments; it is just aggregating to see what the "zeitgeist" is, i.e. the commonly accepted answer.

So if 65% of people thought the world was flat, ChatGPT would likely tell you the world was flat.

The reason this is relevant to what you're saying, is that ChatGPT will never be able to make up anything on its own. If a new product comes out, and no one reviews it or talks about it, my bet would be that ChatGPT would go with the only thing it has available - pamphlets and media from the product maker.

ChatGPT's entire mode of operation is to steal media, and in some cases lie. It most definitely is not an arbiter of facts and truth. That's actually the real danger, a lot of people will think it is telling them facts and truth when it is really just regurgitating whatever it found to be predominant on the internet.

Yeah, I think you pretty much summed it all up. I'm very split on it; as it sits currently, it's such a useful tool. When I think back over the past 20 years, so much of what I've learned and know how to do has come from sifting through forum posts and knowledgeable communities. And frankly it's a slog; it takes a long time to find that obscure post by some guy from 8 years ago explaining the solution to some problem you're having in Unity or whatever. Now I can log on to ChatGPT, which has already aggregated it, and that really accelerates my learning when I need it. Obviously the downside is the death of message boards and online communities, which in turn will diminish how effective these chatbot AIs are anyway. It's all very slippery. Since I'm both a realist and a pessimist, I basically expect this is not going to go well.
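To put the "zeitgeist" idea from the quoted post in concrete terms, here's a toy sketch (made-up data; real models work on token probabilities rather than literal tallies, but the majority-wins behavior is the same in spirit):

    from collections import Counter

    # Pretend scraped "training data": whatever claims happen to dominate online.
    scraped_claims = [
        "the earth is flat",
        "the earth is flat",
        "the earth is flat",   # the majority view in this made-up sample
        "the earth is round",
        "the earth is round",
    ]

    def zeitgeist_answer(claims):
        # No judgment, no fact-checking -- just return the most common claim.
        return Counter(claims).most_common(1)[0][0]

    print(zeitgeist_answer(scraped_claims))  # -> "the earth is flat"

Whatever is predominant online gets echoed back as if it were fact, which is exactly the danger the quoted post describes.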
 

Steve Nord_

Commendable
Nov 7, 2022
No, the LLM has a few billion loose p-dendrites to make vaporous promises if not the recursion complexity to feint at them; if it has the Splunk and remaining data-iste to be told to STFU about its 'we,' that's its social due. Make Sergei want to have a breaker pulled, if it's that or more (an exponent in time) nonrenewable watts c.f. the cloud.

It does an ok job not juddering as All The Advertisers, right? There's a Markov Model plus responsive if not yet responsible there, and it should be feeling bad AND resource starved sucking wind for passable accuracy, attribution, and number.

Through some efforts (mostly not Prosperity Gospel <title here> or sui generis entitlement,) average people don't go representing U. Michigan and Sephardic Montessori Choral Students and Pew Charitable Trusts at a series of restaurants. Just 2 senators. Also every US tax form filling company (not filers, just Quicken and 140ish more, plus maybe more privatized filing surds.) Even adding TEMU farmers if that's their like misfortune, it's not 'most everyone.'
 

Steve Nord_

Commendable
Nov 7, 2022
This gives me the same feeling as how there are PC game modders and then there are "modpackers", and it is the latter who keep getting the donations, because most people download modpacks rather than individual mods.

I agree with a previous post: such things are dangerous to content creation, since the creators themselves are not rewarded while the thieves are. Then again, isn't present-day capitalism the same?

Eh, calling payment processors who switch on tipping for businesses based on -never doing tippable things- capitalists isn't crediting book 4 of Capitalism, or revolutionaries in 1-3 much.

Off to get Stable Diffusion to make RockChip ASM charts...and a year of State House legislation with fundraising...
 

Steve Nord_

Commendable
Nov 7, 2022
You're very confident that your aggregation hustle is going to eclipse [checks notes] all other Unity use. Stay humble like that.
 

jkflipflop98

Distinguished
It's not that the evil AI is copying your work and passing it off as its own...

It's simply repackaging the data it finds. In this case your own sentence is "In our testing"... so the algo copied that over as part of its output. If you reworded the original article to say "When testing here at Tomshardware.com we found that...", Bard would have output that in its response.

Maybe the answer is as easy as patting yourself on the back even more than you do now.
 

Giroro

Splendid
I was playing with Bard yesterday. I find it really interesting that if you ask it about particular videos or channels, it will say "I've watched that video, it's about... and the host had the opinion..." and then list off some extremely wrong information that it aggregated from what it thinks is a consensus opinion on what it assumes the video is about.

But the thing is, Bard will tell you it's not capable of reading video transcripts, descriptions, or titles. So it's not even close to what the video is about. I think it's just making some vague aggregated determination of what any YouTube video anywhere might be about.
You send it a video link and I think internally it's like "the user mentioned technology earlier, and they asked about YouTube. Somebody somewhere on the internet mentioned YouTube in a review of Alexa. That YouTube video must be about Alexa".

Then Bard posted some random marketing bullet points that come up when you Google "what is Alexa", and said they were the opinion of the unrelated Minecraft channel it arbitrarily and incorrectly decided must have posted the video about Alexa, presumably based on the context of the conversation.
In reality the video was a guy testing microphones in his living room.

I also asked Bard, in a fresh session with no previous context, "what is the YouTube video about". It usually describes some random and possibly fictional video in the realm of "what is YouTube" or "how do I make YouTube videos" and quickly jumps back into its bullet-point "Google answers" to whatever question it came up with, attributing those points to that hypothetical video. A video that, if it exists, Bard has no ability to read, scrape, crawl, analyze, watch, quote, copy, or understand in any way. But it will still tell you it watched it and that it was really good.

So basically I think the Bard strategy so far is "just Google a random keyword you fed it, and make up a bunch of English-sounding lies to make you think something more advanced is happening".

It's just as broken and useless as the rest of the algorithms Google uses in its products. But at least it can figure out what language I'm typing in, so Bard is probably a half-step above YouTube's algorithm.