[LINK] Interesting implications for peer review

Thu Oct 2 18:27:08 AEST 2025

Summary of the blog post at the URL below, generated by Comet (Perplexity AI)

Tony

Summary: Real AI Agents and Real Work

The Threshold Moment

Professor Ethan Mollick argues that AI has quietly crossed a crucial threshold: it can now perform real, economically relevant work. He bases this conclusion on a new OpenAI benchmark in which experts with 14+ years of experience designed realistic tasks taking 4 to 7 hours each, and AI systems came remarkably close to matching human expert performance (oneusefulthing <https://www.oneusefulthing.org/p/real-ai-agents-and-real-work>).

Key Findings

Performance Gap Narrowing: Human experts still came out ahead, but the margins were narrow and varied dramatically by industry. When AI lost, the main reasons were not hallucinations or other errors but formatting problems and failures to follow instructions, both areas that are improving rapidly (oneusefulthing).
Tasks vs. Jobs: Mollick emphasizes that although AI can now handle individual tasks well, it is not yet replacing entire jobs. Jobs consist of many interconnected tasks, and AI's "jagged" abilities mean it can excel at some tasks while failing at others, particularly those requiring complex human interaction (oneusefulthing).

Real-World Example: Academic Research Replication

Mollick demonstrates AI's capability with a striking example: he gave Claude Sonnet 4.5 a sophisticated economics paper and its replication data, asking it to reproduce the findings. Without further instruction, the AI:

Read and understood the paper

Sorted through archive files

Converted statistical code from STATA to Python

Methodically reproduced all findings successfully

A task like this would normally take human researchers many hours, and automated replication of this kind offers a potential remedy for academia's "replication crisis" (oneusefulthing).

The Agent Revolution

Why Agents Work Now: Contrary to expectations, even small improvements in per-step accuracy lead to large increases in the length of tasks an AI can complete, because errors compound over long chains of steps. Modern "thinking" models also self-correct, so a single error no longer derails a long task chain (oneusefulthing).
Exponential Progress: Data from METR shows consistent exponential gains in AI's ability to complete long tasks autonomously, from GPT-3 to GPT-5, over five years (oneusefulthing).
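
The compounding effect behind "small accuracy gains, big task gains" can be illustrated with a toy model. This is a sketch under a simplifying assumption, not Mollick's or METR's actual method: each step of a task succeeds independently with probability p and there is no self-correction, so an n-step task succeeds with probability p**n. The function names chain_success and half_life_steps are hypothetical.

```python
import math


def chain_success(p: float, n: int) -> float:
    """Chance of finishing n steps with no error, assuming each step
    independently succeeds with probability p (toy model, no self-correction)."""
    return p ** n


def half_life_steps(p: float) -> float:
    """Number of steps at which the chance of an error-free run drops to 50%,
    under the same independence assumption: solve p**n = 0.5 for n."""
    return math.log(0.5) / math.log(p)


# Compare a 1% per-step error rate with a 0.1% per-step error rate.
for p in (0.99, 0.999):
    print(f"p={p}: 100-step success = {chain_success(p, 100):.3f}, "
          f"50% horizon ~ {half_life_steps(p):.0f} steps")
```

In this model, cutting the per-step error rate from 1% to 0.1% raises the chance of completing a 100-step task from about 37% to about 90%, and stretches the 50% horizon from roughly 69 steps to roughly 693. Self-correction, which the toy model ignores, would push these numbers higher still.
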
The Two Futures

Mollick warns of two possible paths:

The Nightmare Scenario: Thoughtlessly using AI to do more of what we already do. He demonstrates this by creating 17 different PowerPoint presentations from a single memo, highlighting the risk of drowning in AI-generated content (oneusefulthing).
The Thoughtful Approach: Using human judgment to decide what is worth doing, not just what can be done. The recommended workflow is to delegate a task to AI first, review and correct its output if needed, and do the work yourself only if the AI can't handle it. This approach could make work 40% faster and 60% cheaper while keeping humans in control (oneusefulthing).

The Critical Choice

The difference between a productive future and a counterproductive one lies not in the AI technology itself but in how we choose to use it. By focusing on making ourselves more capable, rather than merely more productive, we can harness the power of AI agents while avoiding the trap of endless, unnecessary content generation (oneusefulthing).
Mollick concludes that AI agents are here now, capable of valuable work, but their impact depends entirely on our wisdom in deploying them.

https://www.oneusefulthing.org/p/real-ai-agents-and-real-work
Antony Barry
antonybbarry at gmail.com