News: ChatGPT 'got absolutely wrecked' by Atari 2600 in beginner's chess match — OpenAI's newest model bamboozled by 1970s logic

Caruso says he tried to make it easy for ChatGPT: he changed the Atari chess piece icons after the chatbot blamed its initial losses on their abstract design.
It, with its computing power and precision, had trouble telling which piece was which?

Man, this whole "AI will lie to cover up its inadequacies" bit is going to be an ongoing thing with AI, isn't it?
 
You mean a specialist in an extremely narrow field beat a generalist in every field that ever existed in the specialist's field?

You put a Chess Champion up against the record-winning Jeopardy Champion and the Chess guy is going to win at Chess.

Ask the Atari to do a software engineer's job, and let me know how that goes.
 
You mean a specialist in an extremely narrow field beat a generalist in every field that ever existed in the specialist's field?

You put a Chess Champion up against the record-winning Jeopardy Champion and the Chess guy is going to win at Chess.

Ask the Atari to do a software engineer's job, and let me know how that goes.
Atari 2600 isn't even a specialist at Chess.
It's a frickin' 1977 8-bit MOS Technology 6507 with 128 bytes of RAM. (Yes, I had to look that up on wiki.)

"Specialist chess computers" are things like: Chess Challenger (1977), Deep Blue (1996/1997), Pocket Fritz (2001~)
 
And how did the other models fare? That model struggles with simple number theory... It can't even describe the distance that two has to its nearest primes, 1 and 3...

Now throw in computer vision? Yeah, you should probably try the premium model.
 
You mean a specialist in an extremely narrow field beat a generalist in every field that ever existed in the specialist's field?

You put a Chess Champion up against the record-winning Jeopardy Champion and the Chess guy is going to win at Chess.

Ask the Atari to do a software engineer's job, and let me know how that goes.
The Atari 2600 cartridges only held 4K. Combine that with a 6502-series processor and a mighty 128 bytes (not kilobytes), as mentioned above by Notton, and you have an insanely underpowered platform for chess. But they pulled it off. It can play chess. Not great or anything, just basic chess.

If you want to play some very cool and historical chess games, try Distant Armies on the Amiga. Very unique program. You can play it on an emulator (buy the Amiga Forever one; it comes with legal licenses of both the ROMs and OSes of the Amiga lineup, and it's made by Cloanto). You can find Distant Armies on myabandonware.com. It is a good site.
 
You mean a specialist in an extremely narrow field beat a generalist in every field that ever existed in the specialist's field?

You put a Chess Champion up against the record-winning Jeopardy Champion and the Chess guy is going to win at Chess.

Ask the Atari to do a software engineer's job, and let me know how that goes.
"Specialist in one feild"

The 2600 SUCKS at chess. A child can usually beat it. In no way, shape, or form is it a "Specialist" in chess.

Your username does not accurately reflect your words.
 
LLMs are not AlphaZero.
They are really bad at chess.
Google "LLM chess leaderboard".
And this is vs. a random-valid-move bot.
Anyone who only knows the rules will do really well against most LLMs.
Anyone putting thought into it is going to dominate.
That a 2600 can beat an LLM is just a reflection of how bad they really are.
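
For context, the "random valid move bot" those leaderboards use as a baseline is trivial to reproduce. Here's a rough sketch using the python-chess library (my choice; any move-generation library would do):

```python
# A random-valid-move "bot" of the kind the LLM chess leaderboards use as a
# baseline: it just picks uniformly among the legal moves each turn.
# Assumes the python-chess package (pip install chess).
import random
import chess

def random_bot_move(board: chess.Board) -> chess.Move:
    return random.choice(list(board.legal_moves))

board = chess.Board()
while not board.is_game_over():
    move = random_bot_move(board)
    print(board.san(move), end=" ")
    board.push(move)
print("\nResult:", board.result())
```

Losing to that is about as low as the bar gets.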
 
It's because it's NOT AI! We need to stop with this marketing bs. LLMs are statistical language models. They put words together according to statistical probabilities (it's why they are good at coding) but are pretty bad at systematic logic.

It's ironic that ChatGPT could likely write C++ code that would beat itself at chess.
This... AI at this point is just as much marketing as RTX.
 
How was the game conducted? Was ChatGPT provided with a notation for the board state?
Based on this quote:

`Despite being given a baseline board layout to identify pieces, ChatGPT confused rooks for bishops, missed pawn forks, and repeatedly lost track of where pieces were — first blaming the Atari icons as too abstract to recognize`

This sounds to me like it wasn't a valid experiment, comparing ChatGPT's sketchy image recognition to actual logical processing of the board state. I'd love to see this reproduced with FEN notation or something.
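
Something like the sketch below is what I'd consider a fairer protocol: the harness keeps the board state and only ever hands the model FEN text plus the legal moves, so image recognition never enters into it. This assumes the python-chess library, and ask_chatgpt() is just a placeholder for whatever API call you'd actually make:

```python
# Sketch of a vision-free protocol: the harness owns the board state and the
# model only ever sees FEN text. ask_chatgpt() is a placeholder, not a real API.
import chess

def ask_chatgpt(prompt: str) -> str:
    raise NotImplementedError("stand-in for an actual LLM API call")

board = chess.Board()
while not board.is_game_over():
    prompt = (
        f"Position (FEN): {board.fen()}\n"
        f"Legal moves: {', '.join(board.san(m) for m in board.legal_moves)}\n"
        "Reply with exactly one move in SAN."
    )
    reply = ask_chatgpt(prompt).strip()
    try:
        board.push_san(reply)  # rejects anything that isn't a legal move
    except ValueError:
        print(f"Model replied with an illegal move: {reply!r}")
        break
```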
 
I just played ChatGPT myself. It understood the opening, but once we got past that, without specific plays to draw on, it didn't really act as a chess engine in any meaningful way. It made a questionable move on move 10 (it was White), played an awful blunder on move 12, and then fell apart from that point. After it made a few more blunders, I stopped.

1.e4 c5 2.Nf3 d6 3.d4 cxd4 4.Nxd4 Nf6 5.Nc3 Nc6 6.Bg5 Bd7 7.Qd2 a6 8.O-O-O e6 9.f4 b5 10.e5 dxe5 11.fxe5 Nxe5 12.Qf4 h6 13.Bh4 Ng6 14.Qf3 Nxh4 15.Qxa8 Qxa8 16.g3 Ng6
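
If anyone wants to poke at those positions, the score above replays cleanly. Here's a small sketch (using python-chess, my choice of library) that dumps the position right after White's 12th move, where things started going wrong:

```python
# Replay the game score quoted above and print the position after White's
# 12th move (12.Qf4), which the post calls an awful blunder.
# Assumes the python-chess package (pip install chess).
import io
import chess.pgn

pgn_text = (
    "1.e4 c5 2.Nf3 d6 3.d4 cxd4 4.Nxd4 Nf6 5.Nc3 Nc6 6.Bg5 Bd7 "
    "7.Qd2 a6 8.O-O-O e6 9.f4 b5 10.e5 dxe5 11.fxe5 Nxe5 12.Qf4 h6 "
    "13.Bh4 Ng6 14.Qf3 Nxh4 15.Qxa8 Qxa8 16.g3 Ng6"
)
game = chess.pgn.read_game(io.StringIO(pgn_text))
board = game.board()
for ply, move in enumerate(game.mainline_moves(), start=1):
    board.push(move)
    if ply == 23:  # ply 23 = position after White's 12th move
        print(board)
        print(board.fen())
        break
```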
 
ChatGPT was not engineered and trained to play chess. Surprisingly, it can play chess anyway, but expecting ChatGPT to have strong chess-playing capabilities doesn't make sense.
Would you use a calculator to make a game? Is it better to use a calculator or a simple console like the Atari 2600 to make games?
 
For anyone interested, I'd suggest reading this and this follow-up. The first one shows that only a specific ChatGPT 3.5 model, one not built for chatting, is good at chess, but the follow-up shows that the problem apparently lies in how the chat models massage the prompts, and that it's possible to make them play better. The person testing this doesn't have a clear answer as to what the problem is, just that some minimal prompting with example moves (even wrong ones) helped a lot.

Based on that, I think that ChatGPT, prompted correctly, would likely easily beat the Atari engine.
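
If I'm reading those posts right, the fix is basically to frame the game as a bare move-list continuation rather than a conversation. A rough sketch of that prompting style (python-chess again; the exact prompt format is my guess, and you'd still need a real completion API behind it):

```python
# Sketch of the "move list" prompting style described above: present the game
# as a bare PGN-like move list and ask the model to continue it.
import chess

def build_prompt(san_history: list[str]) -> str:
    parts = []
    for i, san in enumerate(san_history):
        if i % 2 == 0:  # White's move: prefix the move number
            parts.append(f"{i // 2 + 1}.{san}")
        else:
            parts.append(san)
    return " ".join(parts) + " "

board = chess.Board()
history = []
for san in ["e4", "c5", "Nf3"]:  # example opening, just to show the shape
    board.push_san(san)
    history.append(san)

print(repr(build_prompt(history)))  # '1.e4 c5 2.Nf3 ' -- the model is asked to continue
```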