News Chinese AI models storm Hugging Face's LLM chatbot benchmark leaderboard — Alibaba runs the board as major US competitors have worsened

Status
Not open for further replies.
LLM performance is only as good as its training data, and true artificial "intelligence" is still many, many years away.
First, this statement discounts the role of network architecture.

Second, intelligence isn't a binary thing - it's more like a spectrum. There are various classes of cognitive tasks and capabilities you might be familiar with if you study child development or animal intelligence.

The definition of "intelligence" cannot be whether something processes information exactly like humans do, or else the search for extraterrestrial intelligence would be completely futile. If there's intelligent life out there, it probably doesn't think quite like we do. Machines that act and behave intelligently needn't think like we do, either.
 
I don't love the clickbait "China vs. the world" title. The fact is Qwen is open source and open weights, and it can be run anywhere. It can be (and already has been) fine-tuned to add or remove bias. I applaud Hugging Face's work to create standardized tests for LLMs, and for putting the focus on open-source, open-weights models first.
 
First, this statement discounts the role of network architecture.

Second, intelligence isn't a binary thing - it's more like a spectrum. There are various classes of cognitive tasks and capabilities you might be familiar with if you study child development or animal intelligence.

The definition of "intelligence" cannot be whether something processes information exactly like humans do, or else the search for extraterrestrial intelligence would be completely futile. If there's intelligent life out there, it probably doesn't think quite like we do. Machines that act and behave intelligently needn't think like we do, either.
We're creating tools to help humans; therefore, I would argue LLMs are more helpful if we grade them by human intelligence standards.
 