AgentsX
Posts
Meet MLE-STAR: Google's New AI That Automates Machine Learning Tasks

Meet MLE-STAR: Google's New AI That Automates Machine Learning Tasks

Google's New Weapon for Developers.

Fred
August 05, 2025

What’s trending?

Google's AI Just Built a Better AI
$30M Says AI’s Future Is Vertical
How to Keep AI From Being Clueless

From Code to Deployment: How MLE-STAR Automates the AI Pipeline

Google Research has unveiled MLE-STAR, a breakthrough AI agent that revolutionizes machine learning engineering by combining intelligent web search, targeted code refinement, and adaptive ensemble strategies.

Unlike traditional MLE tools that rigidly rely on standard libraries like scikit-learn, this system dynamically evolves ML pipelines with minimal human intervention, achieving 63.6% Kaggle medal rates (including 36% gold) in benchmark tests.

How MLE-STAR Works Differently

Smart Web Research - Scours for cutting-edge model architectures instead of defaulting to outdated options.
Precision Refinement - Identifies weak pipeline components (feature engineering/model selection) for surgical improvements.
Adaptive Ensembling - Generates & optimizes multiple solution variants automatically.
Built-In Safeguards -
- A debugging agent fixes runtime errors.
- Data leak prevention blocks test set contamination.
- The usage checker ensures full dataset utilization.

Presenting MLE-STAR, a novel research focused ML engineering agent that integrates web search and targeted code block refinement that could help foster innovation and streamline ML model development. Learn more at goo.gle/4fmXvmK
— Google Research (@GoogleResearch)
5:01 PM • Aug 1, 2025

Key Advantages Over Competing Systems

2.5x performance boost over previous state-of-the-art (63.6% vs 25.8% success rate).
Adopts modern architectures (EfficientNet, ViT) instead of legacy models like ResNet.
Supports manual overrides (e.g., integrating custom models like RealMLP).
Mitigates LLM hallucinations via automated validation checks.

Currently available as open-source research software via Google's Agent Development Kit, MLE-STAR demonstrates how AI can not just assist but autonomously advance ML engineering, while maintaining rigorous reproducibility standards.

The Next AI Wave? $30M+ for Agents That Speak Industry Lingo

The AI research startup Fundamental Research Labs (formerly Altera) has raised $33 million in Series A funding led by Prosus, with participation from Stripe CEO Patrick Collison. This brings their total funding to over $40 million following a $9 million seed round last year.

Founded by ex-MIT professor Dr. Robert Yang, the company operates unconventionally with four parallel teams:

Games (evolving from Minecraft AI bots).
Prosumer Apps.
Core Research.
Platform Development.

Yang describes the vision as building a "historical" company rather than following traditional startup trajectories.

Shortcut – the first superhuman excel agent – is live.
While not perfect, Shortcut beats first year analysts from McKinsey/Goldman head-to-head 89.1% (220:27) when blindly judged by their managers.
We even gave humans 10x more time.
Try Shortcut now (before your boss does).
— nico (@nicochristie)
4:00 PM • Jul 28, 2025

Products Generating Real Revenue

Two flagship AI agents are already monetized post-free trials:

Fairies
- General-purpose assistant that connects apps, answers across platforms. queries, and automates workflows.
- Serves as a testbed for the company's core AI advancements.
Shortcut
- Spreadsheet-based autonomous analyst for financial modeling.
- Designed with Excel-like familiarity for power users.

Investor Confidence

Prosus' Sandeep Bakshi highlights what sets the company apart,

"Digital humans with actual use cases" beyond demos
Unique ability to attract top AI talent
Rapid translation of research into commercial products

While currently focused on productivity apps ("where the most value is created"), Yang reveals long-term ambitions to solve physical problems through embodied AI and robotics.

As AI assistants evolve from chatbots to autonomous agents that can email, edit documents, and manage databases, the tech industry faces a critical challenge: creating the infrastructure to let these agents operate safely and efficiently across our digital lives.

Two emerging protocols, Anthropic’s Model Context Protocol (MCP) and Google’s Agent2Agent (A2A), aim to standardize how AI interacts with software and other agents, but significant hurdles remain.

What is MCP?
Why is everyone talking about it?
Let’s take a closer look.
Model Context Protocol (MCP) is a new system introduced by Anthropic to make AI models more powerful.
— Alex Xu (@alexxubyte)
4:35 PM • Mar 10, 2025

The Protocol Landscape

MCP (Anthropic): Translates between natural language and APIs, with over 15,000 servers already registered. Focuses on agent-tool communication.
A2A (Google): Governs agent-to-agent coordination, adopted by 150+ companies (Adobe, Salesforce). Designed for multi-agent workflows.
Other Players: Cisco, IBM, and academic projects like Oxford’s Agora, are developing competing standards.

Key Challenges

Security Risks
- Agents are vulnerable to prompt injection attacks (e.g., hijacking via malicious emails).
- Proponents argue that standardization will make vulnerabilities easier to detect and patch.
Open vs. Controlled Development
- A2A is open-source under Linux Foundation governance.
- MCP remains Anthropic-owned, though forkable. Critics want broader oversight to prevent monopolization.
Efficiency Trade-offs
- Natural-language communication (MCP/A2A’s approach) is human-friendly but token-heavy, inflating costs.
- "You waste tokens summarizing documents no human will see," notes researcher Zhaorun Chen.
- Alternatives like Agora use structured data for machine-to-machine efficiency.

The Path Forward

While these protocols are gaining traction, experts agree they’re in early stages:

Security frameworks must evolve to prevent real-world harm.
Governance models need to balance between corporate control and community input.
Optimized communication methods could reduce computational overhead.

"This is the plumbing for the AI age," says AWS’s David Nalley. "Getting it wrong means leaks, clogs, or worse."

Stay with us. We drop insights, hacks, and tips to keep you ahead. No fluff. Just real ways to sharpen your edge.

What’s next? Break limits. Experiment. See how AI changes the game.

Till next time - keep chasing big ideas.

What's your take on our newsletter?

Thank you for reading