[LINK] Interesting implications for peer review
Antony Barry
antonybbarry at gmail.com
Thu Oct 2 18:27:08 AEST 2025
Summary of paper at the URL below, done by Comet (Perplexity AI)
Tony
Summary: Real AI Agents and Real Work
The Threshold Moment
Professor Ethan Mollick argues that AI has quietly crossed a crucial threshold - it can now perform real, economically relevant work. This conclusion is based on a new test from OpenAI in which experts with 14+ years of experience designed realistic 4-7 hour tasks, and AI systems came remarkably close to matching human expert performance. (oneusefulthing <https://www.oneusefulthing.org/p/real-ai-agents-and-real-work>)
Key Findings
Performance Gap Narrowing: While human experts still won, the margins were narrow and varied dramatically by industry. The main reasons AI lost weren't hallucinations or errors, but formatting issues and instruction-following - areas showing rapid improvement. (oneusefulthing <https://www.oneusefulthing.org/p/real-ai-agents-and-real-work>)
Tasks vs. Jobs: Mollick emphasizes that while AI can now handle individual tasks well, it's not replacing entire jobs yet. Jobs consist of many interconnected tasks, and AI's "jagged" abilities mean it can excel at some tasks while failing at others requiring complex human interaction. (oneusefulthing <https://www.oneusefulthing.org/p/real-ai-agents-and-real-work>)
Real-World Example: Academic Research Replication
Mollick demonstrates AI's capability with a striking example: he gave Claude Sonnet 4.5 a sophisticated economics paper and its replication data, asking it to reproduce the findings. Without further instruction, the AI:
- Read and understood the paper
- Sorted through archive files
- Converted statistical code from Stata to Python
- Methodically reproduced all findings successfully
This task would normally take human researchers many hours and represents a potential solution to academia's "replication crisis". (oneusefulthing <https://www.oneusefulthing.org/p/real-ai-agents-and-real-work>)
The Agent Revolution
Why Agents Work Now: Contrary to expectations, even small improvements in AI accuracy lead to huge increases in task completion ability. Modern "thinking" models are self-correcting, meaning they don't get derailed by single errors in long task chains. (oneusefulthing <https://www.oneusefulthing.org/p/real-ai-agents-and-real-work>)
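One way to see why small accuracy gains matter so much is a back-of-envelope compounding calculation. The sketch below is an illustration only and is not taken from Mollick's post: it assumes every step in a task succeeds independently with the same probability and that a single failure sinks the whole task (that is, no self-correction). The function names chain_success and horizon are made up for this example.

```python
import math


def chain_success(step_accuracy: float, steps: int) -> float:
    """Probability that all `steps` independent steps succeed."""
    return step_accuracy ** steps


def horizon(step_accuracy: float, target: float = 0.5) -> float:
    """Chain length at which overall success falls to `target`.

    Solves step_accuracy ** n == target for n.
    """
    return math.log(target) / math.log(step_accuracy)


if __name__ == "__main__":
    # Hypothetical per-step accuracies, chosen only for illustration.
    for acc in (0.95, 0.99, 0.999):
        print(
            f"accuracy={acc:.3f}  "
            f"100-step success={chain_success(acc, 100):.3f}  "
            f"50% horizon={horizon(acc):.0f} steps"
        )
```

Under these assumptions, moving from 95% to 99% per-step accuracy lifts the 50% horizon from roughly 14 steps to roughly 69. Self-correction, which this toy model leaves out, would stretch the horizon further, because a single error would no longer end the run.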
Exponential Progress: Data from METR shows consistent exponential gains in AI's ability to complete long tasks autonomously, from GPT-3 to GPT-5 over five years. (oneusefulthing <https://www.oneusefulthing.org/p/real-ai-agents-and-real-work>)
The Two Futures
Mollick warns of two possible paths:
The Nightmare Scenario: Thoughtlessly using AI to do more of what we already do. He demonstrates this by creating 17 different PowerPoint presentations from a single memo - highlighting the risk of drowning in AI-generated content. (oneusefulthing <https://www.oneusefulthing.org/p/real-ai-agents-and-real-work>)
The Thoughtful Approach: Using human judgment to decide what's worth doing, not just what can be done. The recommended workflow: delegate to AI first, review the result and correct it if needed, and do the work yourself only if the AI can't handle it. This approach could make work 40% faster and 60% cheaper while maintaining human control. (oneusefulthing <https://www.oneusefulthing.org/p/real-ai-agents-and-real-work>)
The Critical Choice
The difference between a productive future and a counterproductive one isn't in the AI technology itself, but in how we choose to use it. By focusing on making ourselves more capable rather than just more productive, we can harness AI agents' power while avoiding the trap of infinite, unnecessary content generation. (oneusefulthing <https://www.oneusefulthing.org/p/real-ai-agents-and-real-work>)
Mollick concludes that AI agents are here now, capable of valuable work, but their impact depends entirely on our wisdom in deploying them.
https://www.oneusefulthing.org/p/real-ai-agents-and-real-work