Artificial intelligence (AI) firm OpenAI has reportedly begun asking third-party contractors to submit actual work from their current or past jobs so the company can test and benchmark the performance of its next-generation AI models.
Key Points
- OpenAI is collecting real human work from freelancers to benchmark AI performance and train next-generation models.
- The initiative involves contractors submitting detailed task requests and deliverables, including actual files or realistic mock-ups.
- The move spotlights a contradiction: jobs considered “replaceable” or not “real work” are being used as critical benchmarks for AI development.
Records obtained by Wired from OpenAI and the training data firm Handshake AI suggest the initiative is part of OpenAI’s effort to create a human performance benchmark for various tasks. In September, OpenAI introduced a new evaluation system designed to compare its AI models’ output with that of human professionals across multiple industries.
The AI company describes this evaluation system as a crucial measure of its progress toward developing Artificial General Intelligence (AGI), an AI capable of outperforming humans in most economically valuable tasks.
A confidential OpenAI document states that the company engaged third-party contractors from a range of professions to gather real-world tasks based on work typically performed in full-time roles, converting long-term or complex projects, which often require hours or days to complete, into benchmark tasks.
Additionally, the AI firm instructed contractors to detail tasks they have completed in their current or previous roles. Along with these descriptions, contractors were asked to provide “concrete output (not a summary of the file, but the actual file), e.g., Word doc, PDF, Powerpoint, Excel, image, repo.” In cases where real examples were unavailable, contractors were permitted to submit fabricated samples that realistically demonstrate how they would approach specific tasks.
According to OpenAI records, real-world tasks consist of two parts: the task request, which outlines what a manager or colleague asked the worker to do, and the task deliverable, which is the actual work completed in response. The company repeatedly stresses in its instructions that contractors’ submissions should represent genuine, on-the-job work that they have “actually done.”
OpenAI’s reliance on real human work spotlights a stark contradiction in the AI landscape. In 2025, companies across industries have cited AI as a reason for mass layoffs, cutting roles deemed vulnerable to automation. Yet OpenAI CEO Sam Altman has previously suggested that many of these same positions might not even qualify as “real work.”
The company is now asking contractors to submit precisely these types of tasks to train and benchmark AI agents, turning the very labor labeled expendable into essential benchmarks for next-generation AI. The move raises urgent questions about whose work is valued, and who pays the price for automation.
