Sakana AI review: Hype vs. reality for the AI scientist

Written by Kenneth Pangan

Reviewed by Amogh Sarda

Last edited November 6, 2025

There’s a good chance you’ve heard the buzz around Sakana AI's "AI Scientist." It’s a system that supposedly automates the entire scientific research process, from dreaming up new ideas to getting papers published. The headline-grabbing claim? For about $15, this AI can produce a full research paper, kicking off a new age of automated discovery. It’s the kind of news that makes everyone in tech lean in a little closer.

But is this really the dawn of "Artificial Research Intelligence," or is the story a bit more complicated? In this article, we’re going to give you a straight-up Sakana AI review, comparing the company's big promises with what independent researchers actually found. More importantly, we'll talk about what this all means for businesses that need practical, reliable AI they can use today.

What is Sakana AI and the ‘AI Scientist’?

Sakana AI is a Tokyo-based research lab that gets its inspiration from nature to build new kinds of AI. Their "AI Scientist" project is easily their most talked-about creation. It’s built to be a hands-off system that can manage the whole research cycle on its own.

According to Sakana AI, the system is supposed to:

  • Brainstorm brand-new research ideas.

  • Dig through existing literature using the Semantic Scholar database.

  • Write and run the code needed for experiments.

  • Analyze the results and draft an entire scientific paper.

  • Even perform its own peer review to catch any mistakes.

The project has kicked up a ton of excitement, painting a picture of a future where science moves at lightning speed. But with big claims come big questions, and the AI Scientist has definitely drawn some critical eyes.

The promise: Can an AI really automate scientific discovery?

If you read Sakana AI’s announcements, they’re positioning the AI Scientist as a revolutionary tool, the first of its kind to truly automate discovery from start to finish.

Here are the key promises they’ve put on the table:

  • End-to-end automation: The system is meant to handle everything from the first spark of a hypothesis to the final paper, with no human babysitting required. The idea is to let it run continuously, building on its own discoveries.

  • Peer-review success: The most famous claim is that one of the system's fully AI-generated papers passed peer review for a workshop at ICLR 2025, a major machine learning conference. This was held up as proof that its work could meet human standards.

  • Insane cost savings: The company talks up the tiny price tag of about $6 to $15 per paper. This hints at a future where research isn't held back by big budgets, opening the doors for more people to contribute.

  • Open-ended discovery: The system is designed to create a library of knowledge that grows over time, learning from its past work and feedback, much like the human scientific community does.

Sakana AI frames this as the start of a new era where AI can make discoveries "at and beyond human levels." It’s a thrilling thought, but what happens when you actually look under the hood?

(Video: a brief overview of Sakana AI's "AI Scientist" and its goal of automating scientific discovery.)

The reality: What an independent review found

Once you get past the headlines, a much messier picture starts to form. A thorough independent study by Beel et al., along with some sharp reporting from outlets like TechCrunch, put the AI Scientist’s abilities to the test. And while the tech is impressive on some levels, it’s a long way from the autonomous genius it’s cracked up to be.

Idea generation and novelty are a bust

The AI Scientist is supposed to find "novel" ideas by reviewing existing literature. But the independent analysis found this was basically just a glorified keyword search on Semantic Scholar. It doesn't actually synthesize or understand the knowledge it’s scanning, which is pretty essential if you want to figure out what's genuinely new.

Because of this, the system flagged several well-known concepts as "novel," including "micro-batching for SGD," a technique that's been around for years. This points to a huge weakness: the AI can spot keywords, but it has no real grasp of context or originality. Without that, it can't really push the boundaries of science.
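To make that concrete, here's a minimal sketch of what a keyword-style novelty check against the Semantic Scholar search API looks like. To be clear, this is our illustration of the pattern the critics describe, not Sakana AI's actual code; the function name and the "few matching titles means novel" heuristic are assumptions:

```python
import requests

# Public Semantic Scholar paper-search endpoint (keyword matching).
S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"

def looks_novel(idea_title: str, limit: int = 10) -> bool:
    """Hypothetical keyword-based novelty check, for illustration only.

    Treats an idea as 'novel' if few returned titles contain the same
    wording. The flaw: the same concept under a different name (e.g.
    'micro-batching for SGD' described as 'gradient accumulation')
    won't match, so well-known techniques get flagged as new.
    """
    resp = requests.get(
        S2_SEARCH,
        params={"query": idea_title, "limit": limit, "fields": "title,year"},
        timeout=10,
    )
    resp.raise_for_status()
    papers = resp.json().get("data", [])
    # String overlap, not understanding: this is the core weakness.
    matches = [
        p for p in papers
        if idea_title.lower() in (p.get("title") or "").lower()
    ]
    return len(matches) == 0
```

A real novelty check would have to compare ideas at the level of meaning, not wording, and that's exactly the capability the independent reviewers found missing.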

Experiments are sloppy and fail constantly

Coming up with an idea is one thing, but testing it is where the AI Scientist really stumbled. The independent evaluation dropped a pretty damning statistic: 42% of the AI's experiments failed to even run due to coding errors. The system would often get stuck in a loop, trying the same broken code again and again.

And when the experiments did work? The code changes were tiny: on average, only 8% of the original template code was modified. This tells us the AI isn't very adaptable and isn't really inventing new ways to test its ideas. To make matters worse, the methodology was often just plain wrong. In one case, the AI claimed to have improved energy efficiency, but its own results showed it had actually increased computational resource usage, the exact opposite of its goal.
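That "same broken code again and again" behavior is a classic agent-loop pitfall, and it's easy to see why it can't work. Here's a minimal sketch of the failure mode (again, our illustration, not Sakana AI's actual code; the script name and retry budget are made up):

```python
import subprocess

def run_experiment(script: str = "experiment.py", max_retries: int = 4) -> bool:
    """Illustrative retry loop. Re-running identical code can only
    rescue transient failures (network blips, flaky hardware). A
    deterministic coding error fails the same way every time, so this
    loop just burns its retry budget and gives up."""
    for attempt in range(max_retries):
        result = subprocess.run(
            ["python", script], capture_output=True, text=True
        )
        if result.returncode == 0:
            return True
        # The fix would be to feed result.stderr back to the model so
        # it can revise the code; retrying verbatim changes nothing.
    return False
```

Without that error-feedback step, a 42% failure rate is a lot less surprising.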

Metric | AI Scientist Performance (Beel et al. Study)
Experiment Success Rate | 58% (5 out of 12 failed)
Novelty Assessment | Unreliable; flagged known concepts as "novel"
Average Citations per Paper | 5
Manuscript Quality | Frequent errors (missing figures, placeholder text)
Result Accuracy | 57% of papers contained hallucinated or incorrect results

The final papers and reviews are shallow

The final papers didn't look much better. They were found to be low-quality, with a median of just five citations (most of them outdated). They were also full of amateur mistakes, like missing figures, duplicated sections, and even placeholder text that literally said, "Conclusions Here."

And what about that automated peer reviewer? It could spit out structured feedback, sure, but it consistently missed the biggest flaws in its own work. When tested on papers written by humans, it was overly critical, rejecting papers that human reviewers had approved. It seems the AI can follow a review template, but it lacks the deep, critical thinking needed for a real critique.

The study’s conclusion summed it up perfectly: the AI Scientist's output is comparable to that of an "unmotivated undergraduate student rushing to meet a deadline." It’s a fascinating demo of AI mimicry, but it’s nowhere near producing reliable science.

From moonshots to reality: What businesses need from AI today

While wild projects like the AI Scientist give us an exciting peek into the future, you can't run a business on moonshots. You need reliable, controllable, and transparent AI that solves real-world problems right now. The hype around experimental AI can be a distraction from the practical tools that are already making a difference.

This is where a down-to-earth solution like eesel AI comes into play. It’s built for the real world, not a research lab.

Let's compare Sakana AI's experimental approach with the business-ready strengths of eesel AI:

  • Reliability vs. Unpredictability: While the AI Scientist fails on 42% of its experiments, eesel AI lets you run a simulation first. This means you can safely test its performance on thousands of your actual past tickets before it ever talks to a customer. You get a clear, accurate forecast of how it will perform and can deploy it with total confidence.

  • Control vs. Black Box: The AI Scientist often generates flawed, nonsensical results you have no control over. With eesel AI, you're the one in charge. You can define exactly what knowledge it uses, customize its personality and actions, and choose which types of tickets to automate. The AI never goes rogue because it only does what you tell it to.

  • Grounded Knowledge vs. Hallucination: Sakana AI's bot struggles to understand literature and often just makes things up. eesel AI grounds itself in your company's reality. It instantly connects to your existing knowledge in places like your helpdesk, Confluence, Google Docs, and past support conversations. It learns your brand voice and your actual processes, so its answers are always accurate and helpful.

Pro Tip
An AI tool's real value isn't in flashy claims but in its ability to solve your problems with trust and transparency. Before you roll out any AI, ask yourself: 'Can I test this safely, and can I control exactly what it does?'

The final verdict on Sakana AI

Sakana AI's project is a seriously impressive technical demo. It’s a milestone that shows just how far AI has come in copying complex human skills like research and writing. It’s a cool experiment, no doubt about it.

But it absolutely does not live up to the hype of a fully autonomous scientist ready to change the world. The system is just too unreliable, superficial, and buggy for any serious use. It’s a fascinating proof-of-concept, not a tool you can actually count on.

Ready for AI that actually works?

The future of AI is exciting, but today's problems need practical solutions. While we wait for an AI to write the next Nobel-winning paper, businesses can already solve huge challenges in customer support and internal knowledge sharing.

Instead of wrestling with an experimental AI that produces flawed papers, you can deploy an AI that delivers flawless answers. eesel AI is designed for the real world. It’s self-serve, connects with your tools in minutes, and gives you the confidence to automate support safely and effectively.

Don't just read about what AI might do someday. See what it can do for you right now. Try eesel AI for free and find out how quickly you can automate your support with an AI you can actually trust.

Frequently asked questions

What is the main conclusion of this Sakana AI review?

The primary conclusion is that while the "AI Scientist" is an impressive technical demonstration, it falls far short of being an autonomous, reliable research tool ready for real-world application. Its capabilities are superficial compared to the bold claims.

How accurate were Sakana AI's claims according to independent researchers?

This independent Sakana AI review found Sakana AI's claims to be significantly exaggerated. For instance, the AI Scientist struggled with true novelty, produced error-filled experiments, and generated low-quality papers, contrary to promises of end-to-end automation and peer-review success.

What specific problems did the independent review uncover?

The Sakana AI review revealed issues like the AI flagging known concepts as novel, a 42% failure rate for running experiments due to coding errors, flawed methodologies, and final papers containing significant mistakes like missing figures or placeholder text.

Can the AI Scientist truly automate scientific discovery?

No, the Sakana AI review strongly indicates it cannot. While the system attempts end-to-end automation, its inability to generate truly novel ideas, reliably conduct experiments, or produce high-quality, accurate papers means it cannot yet automate genuine scientific discovery.

Why is the AI Scientist unsuitable for business use?

This Sakana AI review highlights its unreliability, unpredictability, and lack of control, making it unsuitable for businesses. Unlike practical solutions, it generates inconsistent and often flawed results, lacking the transparency and reliability businesses need for real-world problem-solving.

How did the AI Scientist's automated peer reviewer perform?

The Sakana AI review found that while the AI Scientist could structure feedback, its automated peer reviewer consistently missed significant flaws in its own work. When reviewing human-written papers, it was often overly critical, lacking the deep critical thinking of human reviewers.

How does this Sakana AI review compare the AI Scientist with eesel AI?

This Sakana AI review contrasts the experimental AI's unreliability and black-box nature with eesel AI's focus on reliability, control, and grounded knowledge. Proven solutions offer safe simulation, user control over actions, and leverage existing company knowledge for accurate results.

Article by Kenneth Pangan

A writer and marketer for over ten years, Kenneth Pangan splits his time between history, politics, and art, with plenty of interruptions from his dogs demanding attention.