Several AI companies said to be ignoring robots dot txt exclusion, scraping content without permission: report

Who didn't see that coming?
Yeah, when abiding by robots.txt is voluntary, when we know the techbro mentality of "it's easier to ask for forgiveness than to ask for permission", and when the fines for doing stuff like this are so toothless, there's nothing surprising about it.

What I wonder is whether, for the purposes of the DMCA, ignoring robots.txt would constitute circumvention of the protection of a computer system. That would be a rather creative reading of the law though; I hope someone tests it in court.
 
Ignoring "opt out" is now explicitly illegal within the European Union, according to the EU AI Act and the Copyright in the Digital Single Market directive.

Although I find it unclear how violations would be classified and what possibilities for enforcement and repercussions there are.

There is also a weakness in that the EU does not mandate or recommend any specific protocols.
However, "robots.txt" is decades old and an official IETF standard, so there is no excuse for not following it.
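For anyone who hasn't looked at the crawler side of this: honoring robots.txt takes only a few lines. Here's a minimal Python sketch using the standard library's urllib.robotparser. The bot name "ExampleAIBot", the sample rules, the example.com URL and the helper name may_fetch are all made up for illustration and are not taken from any real crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one made-up AI crawler, allows everyone else.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    print(may_fetch(SAMPLE_ROBOTS_TXT, "ExampleAIBot", url))
    print(may_fetch(SAMPLE_ROBOTS_TXT, "OtherCrawler", url))
```

A well-behaved crawler would run a check like this before each request. Nothing in the protocol forces it to, which is the whole problem.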
 
Ignoring "opt out" is now explicitly illegal within the European Union, according to the EU AI Act and the Copyright in the Digital Single Market directive.
The EU doesn't have directives against building tech champions, but it is as if they do.
 
honestly, until there are laws made to prevent using data w/o explicit permission, w/ the punishment being a massive 50%+ of the company's earnings, nobody's going to care to stop, as "it's just the cost of business" otherwise.

same reason companies can be caught breaking the law over and over: the fines are capped at such a low point that they make more $ breaking the law and paying the fines than following the law.
 
 
honestly, until there are laws made to prevent using data w/o explicit permission, w/ the punishment being a massive 50%+ of the company's earnings, nobody's going to care to stop, as "it's just the cost of business" otherwise.

same reason companies can be caught breaking the law over and over: the fines are capped at such a low point that they make more $ breaking the law and paying the fines than following the law.
Even huge fines don't really stop the practice. Being fined millions or even an occasional billion is still "cost of doing business".
It's profitable so they keep doing it.
 
honestly, until there are laws made to prevent using data w/o explicit permission, w/ the punishment being a massive 50%+ of the company's earnings, nobody's going to care to stop, as "it's just the cost of business" otherwise.

same reason companies can be caught breaking the law over and over: the fines are capped at such a low point that they make more $ breaking the law and paying the fines than following the law.
I 100% agree. Also, criminally prosecute the owners, CEOs and officers of the offending companies.
BTW shouldn't we all get recompense from companies that use our personal data and info?
Make that a law and see how companies like the turnabout.
 
It's profitable so they keep doing it.
you didn't read my post fully. that was covered & it's also why i said it needs 50% of earnings (not just profit), as at such high costs it's no longer profitable.
if it took half your income & forced you to remove what you did illegally then they won't do it again, as it becomes bad business (because you are losing more than you make from it)
BTW shouldn't we all get recompense from companies that use our personal data and info?
depends.
is the content you're using free? then no, as nothing is "free" and your data is the price you are paying.
if you pay for something? then yes, as you are paying them, thus your data shouldn't be free to harvest on top of that.
 
The EU doesn't have directives against building tech champions, but it is as if they do.
Well, let's create AIs by teaching them that it's ok to be ruthless and trample upon people's privacy as long as it makes moolah - "the end justifies the means" and all that.
Skynet and the Matrix are getting ever more likely.
'Murica, fsck yeah !
 
i said it needs 50% of earnings (not just profit), as at such high costs it's no longer profitable.
That's a pipe dream. The fines issued to Facebook were already some of the highest possible and they were still just a drop in the ocean.

You just cannot go that high.
if it took half your income & forced you to remove what you did illegally then they won't do it again, as it becomes bad business (because you are losing more than you make from it)
If "it" took half of income, "it" would gut the firm, threatening to close it for good.

It would lead to an avalanche of fines and closures as well as lawsuits and accusations of overreach and unfair and targeted fines.
And it wouldn't be completely untrue.

What needed to change was the law. The rules. How the game is played.
And after changes to privacy laws were made, some of these data-harvesting practices ended, but not all of them. Some are legal, some abuse loopholes, and some are very illegal but they still do them. It takes the authorities a while to catch everything, and they need funding. They seem to have less cash on hand than the crooks, so a lot of BS takes time to bring down.
It's still a game, and the goal is still to cheat and bend the rules as much as possible to your advantage, and to do it without pissing off the other players and the host so much that they kick you out.


It is a huge risk to authorize any administration or authority to seize half of company income. The only courts that would accept such a reform are in banana republics and dictatorships, and they serve no justice at all.

Better to change the rules.
 
The venerable Internet Archive has been ignoring robots.txt since 2017. Get scraped, nobody cares.
In the case of the Archive, it's understandable: they don't index, they archive (thus the name), and yes, there's a nuance. As long as they don't allow re-use of their content outside of data safekeeping reasons, I'd say it's fair.
The day someone uses archive.org to train an LLM though, I say that someone should be terminated with extreme prejudice.
 
you didn't read my post fully. that was covered & it's also why i said it needs 50% of earnings (not just profit), as at such high costs it's no longer profitable.
if it took half your income & forced you to remove what you did illegally then they won't do it again, as it becomes bad business (because you are losing more than you make from it)

depends.
is the content you're using free? then no, as nothing is "free" and your data is the price you are paying.
if you pay for something? then yes, as you are paying them, thus your data shouldn't be free to harvest on top of that.
Yeah tell that to Google & Microsoft. I've yet to see recompense!