The People Who Label Your AI’s Training Data Make Less Than Your Chai Wallah

By Akshay A. Walimbe

You’ve probably heard the phrase “AI is the future.” You’ve seen the headlines about trillion-dollar valuations, about models that can write poetry and pass law exams, about companies racing to build artificial general intelligence.

What you probably haven’t heard about is Daniel.

Daniel (not his real name) worked for Sama, a data labeling company headquartered in San Francisco that employs workers in Kenya, Uganda, and India. His job? Reading and categorising some of the most disturbing content on the internet (descriptions of child sexual abuse, bestiality, murder, suicide, incest) so that OpenAI’s ChatGPT could learn what not to say.

He was paid less than $2 per hour.

In January 2023, Time magazine published an investigation that pulled the curtain back on what it takes to make AI “safe.” The story documented how OpenAI had outsourced toxic content labeling to Kenyan workers through Sama, paying them less than $2 per hour to make ChatGPT less toxic. The story revealed that behind the polished interface of the world’s most famous chatbot was a pipeline of human suffering that most users never think about.

This isn’t a glitch in the system. It is the system.

The Assembly Line You Don’t See

Let me explain how this works, because understanding it changes how you think about every AI tool you use.

Large language models like GPT learn from massive datasets scraped from the internet. But the internet contains everything: the brilliant and the horrific, the factual and the fabricated, the helpful and the hateful. A raw model trained on all of it would happily generate instructions for making weapons, produce racist content, or describe violence in graphic detail.

To prevent this, AI companies use a process called Reinforcement Learning from Human Feedback (RLHF). In plain language: humans read the AI’s outputs, label them as acceptable or unacceptable, and the model learns from those labels.

Someone has to do that reading. Someone has to sit in front of a screen, hour after hour, day after day, reading the worst content the internet has ever produced, and clicking “harmful” or “not harmful.”

Those someones are overwhelmingly in the Global South. Kenya, Uganda, India, the Philippines, Venezuela. They work through outsourcing companies: Sama, Scale AI, Remotasks, Appen, CloudFactory. They are paid by the task, often pennies per label. They have minimal mental health support. They have no equity in the companies whose products they make possible.

The Time investigation found that Sama’s Kenyan workers were exposed to text describing graphic violence and sexual abuse. All four employees interviewed by Time described being mentally scarred by the work. Workers reported developing anxiety, depression, and recurring nightmares. Some described being haunted by the content they had to read for months after leaving. Sama ultimately cancelled all OpenAI work in February 2022, eight months earlier than planned, because of the traumatic nature of the content.

According to Time, OpenAI paid Sama $12.50 per hour per worker. The workers’ take-home? Between $1.32 and $2 per hour, depending on seniority: roughly six to nine times less than what OpenAI paid the outsourcing company. In Nairobi, a chai at a street stall costs about 20-30 shillings, roughly 15 to 25 cents. These workers were earning barely enough for a few cups of chai per hour while reviewing content that would traumatise most people in minutes.
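The arithmetic behind those ratios is easy to check. Here is a minimal sketch in Python that uses only the figures quoted above; the constant and function names are illustrative, and the chai prices are the rough dollar equivalents given in the text, not exact exchange-rate conversions.

```python
# Hourly figures reported by Time, as quoted above (USD).
OPENAI_RATE = 12.50                         # what OpenAI paid Sama per worker-hour
WORKER_PAY_LOW, WORKER_PAY_HIGH = 1.32, 2.00  # worker take-home range

# Approximate price of one street-stall chai in Nairobi (USD), from the text.
CHAI_LOW, CHAI_HIGH = 0.15, 0.25


def ratio(contract_rate: float, take_home: float) -> float:
    """How many times larger the contract rate is than the worker's pay."""
    return contract_rate / take_home


def cups_per_hour(take_home: float, chai_price: float) -> float:
    """How many cups of chai one hour of pay would buy."""
    return take_home / chai_price


# Best-paid workers give the smallest ratio, lowest-paid the largest.
ratio_senior = ratio(OPENAI_RATE, WORKER_PAY_HIGH)   # 6.25
ratio_junior = ratio(OPENAI_RATE, WORKER_PAY_LOW)    # about 9.47

# Worst case: lowest pay, most expensive chai. Best case: the reverse.
fewest_cups = cups_per_hour(WORKER_PAY_LOW, CHAI_HIGH)   # 5.28
most_cups = cups_per_hour(WORKER_PAY_HIGH, CHAI_LOW)     # about 13.33

print(f"Contract rate is {ratio_senior:.2f}x to {ratio_junior:.2f}x the take-home pay")
print(f"One hour of pay buys {fewest_cups:.1f} to {most_cups:.1f} cups of chai")
```

The exact upper ratio comes out slightly above nine, which is why "six to nine times" is best read as a rounded range.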

It’s Not Just Safety Labels

The Sama-OpenAI story is the most famous example, but it’s the tip of an iceberg.

Every AI system you interact with relies on human labeling at some point in its development. Self-driving car systems need humans to draw bounding boxes around pedestrians, cyclists, and road signs in millions of images. Medical AI needs humans to label tumours in X-rays. Voice assistants need humans to transcribe and tag audio recordings. Recommendation algorithms need humans to categorise products and content.

This is what researcher Mary Gray and computational social scientist Siddharth Suri call “ghost work”: the hidden human labour that powers the AI economy. Their research documented a vast, invisible workforce of millions of people worldwide who perform the micro-tasks that make AI systems function. These workers exist in a grey zone: not formal employees, not independent contractors in any meaningful sense. They have no benefits, no job security, no career progression, and no recognition.

Scale AI, one of the world’s largest data labeling companies, was valued at over $13 billion in its 2024 funding round. Its workforce of labelers? Mostly independent contractors in developing countries, paid per task, with no guarantee of minimum hours or minimum pay. MIT Technology Review and Rest of World have documented cases of Scale AI’s Filipino workers whose pay plummeted from $10 per task to less than 1 cent on some projects, and Venezuelan workers earning between $0.90 and $2 per hour.

The math is stark. OpenAI was valued at over $150 billion by late 2025. Estimates of the global data labeling market range widely, from around $3.8 billion to $18.7 billion in 2024, depending on the source and how you define the market. Either way, the people who build the foundation of AI’s intelligence collectively earn a fraction of what a single AI company is worth.

India: The Next Data Labeling Factory

If you’re reading this from India, this story is about to become very personal.

India is already one of the world’s largest sources of data labeling labour. Companies like iMerit (headquartered in Kolkata), Playment (Bengaluru), and dozens of smaller outfits operate data annotation centres across the country. CloudFactory runs operations in multiple Indian cities. Scale AI and Amazon’s Mechanical Turk have massive Indian workforces.

Why India? The reasons are grimly practical. A large English-speaking population. A massive pool of educated but underemployed graduates. Low wage expectations by global standards. A time zone that allows 24-hour coverage when combined with African and South American teams.

The typical data labeling worker in India earns between Rs 10,000 and Rs 25,000 per month. At the lower end, that’s about Rs 350 per day, less than what a chai wallah in many Indian cities makes. A skilled chai wallah at a busy intersection in Mumbai or Delhi can earn Rs 500-800 per day. The person labeling data so that an AI model can diagnose diseases or approve loans is often earning less.
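To see how the monthly salary translates into a daily figure, here is a rough sketch using the numbers above. The 30-day month is an assumption made purely for illustration (dividing by 26 working days instead would push the daily figure somewhat higher), and the names are hypothetical.

```python
# Monthly pay range for data labeling workers in India, from the text (Rs).
MONTHLY_PAY_LOW, MONTHLY_PAY_HIGH = 10_000, 25_000

# Daily earnings of a skilled chai wallah at a busy intersection, from the text (Rs).
CHAI_WALLAH_LOW, CHAI_WALLAH_HIGH = 500, 800

# ASSUMPTION: pay is spread over a 30-day month. This is not from any source.
DAYS_PER_MONTH = 30


def daily_rate(monthly_pay: float, days: int = DAYS_PER_MONTH) -> float:
    """Convert a monthly salary into an average daily figure."""
    return monthly_pay / days


labeler_low = daily_rate(MONTHLY_PAY_LOW)    # about Rs 333 per day
labeler_high = daily_rate(MONTHLY_PAY_HIGH)  # about Rs 833 per day

print(f"Labeler: Rs {labeler_low:.0f} to Rs {labeler_high:.0f} per day")
print(f"Chai wallah: Rs {CHAI_WALLAH_LOW} to Rs {CHAI_WALLAH_HIGH} per day")
print("Lowest-paid labeler earns less than the chai wallah's minimum:",
      labeler_low < CHAI_WALLAH_LOW)
```

Under that assumption, a worker at the bottom of the range earns well below even the low end of the chai wallah's daily takings, while only the best-paid labelers reach the top of it.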

And the work itself? It ranges from the tedious to the traumatic. Labeling images for self-driving cars is mind-numbing but relatively harmless. Content moderation for social media platforms (labeling hate speech, violent imagery, sexual content) is psychologically devastating. Workers have described it as “having the worst of humanity pumped into your brain eight hours a day.”

India’s BPO and IT services industry has long been built on the arbitrage of lower labour costs. The data labeling economy is the latest iteration of this pattern, but with a darker edge. At least the call centre worker helping you reset your password wasn’t being systematically exposed to images of violence and abuse.

The Content Moderators’ Silence

In 2022, Daniel Motaung, a content moderator who worked for Meta through Sama in Nairobi, filed a petition before a Kenyan court alleging poor working conditions, inadequate pay, and insufficient mental health support. He described being paid approximately $2.20 per hour to moderate content that included graphic violence, child exploitation, and terrorism. Over 140 Facebook moderators in Kenya have since been diagnosed with severe PTSD, according to reporting on the case.

The case opened up a broader conversation about what tech companies owe the workers who clean up their platforms. Meta’s response? They pointed to their contractor, Sama, and said working conditions were determined by the outsourcing company, not by Meta directly. It’s the perfect liability shield: the company that profits from the work doesn’t employ the workers who do it.

This pattern repeats itself across the AI industry. The companies building the most valuable technology in human history have structured their supply chains so that the most harmful, lowest-paid work is done by people with the least power, in countries with the weakest labour protections, through intermediaries that provide plausible deniability.

The Irony That Should Make You Uncomfortable

Here’s what strikes me every time I think about this.

We build AI systems that are supposed to make the world more efficient, more productive, more fair. We celebrate when a model can pass a medical exam or write legal contracts. We invest billions in making these systems “ethical” and “responsible.”

And the foundation of all of it, the human judgment that teaches the machine what’s good and bad, what’s true and false, what’s safe and harmful, is built on a workforce that we pay poverty wages, expose to psychological trauma, and render invisible.

The AI companies will tell you this is getting better. Automated labeling is improving. Synthetic data is reducing the need for human annotation. Better tools mean less exposure to harmful content. And there are legitimate counterpoints. A Kenyan court ruling held that Meta could be held liable alongside Sama, establishing a precedent for Big Tech accountability. Some companies have improved pay and mental health support. Uber India, for instance, launched a data labeling programme for its 1.4 million driver partners in 12 Indian cities, positioning it as supplementary income during downtime rather than traumatic full-time work.

But the scale of AI development is growing faster than the automation of labeling. OpenAI, Google, Anthropic, and Meta are training larger models on more data, which requires more labels, not fewer. The World Bank estimates 150 to 430 million data laborers globally. The demand for human annotation is projected to grow, not shrink.

And the fundamental economics haven’t changed. If you’re a publicly listed AI company under pressure to show margins, and your costs include billions in compute infrastructure, the last line item you want to increase is the payment to labelers in Nairobi and Kolkata. The financial incentive is always to push those costs down, not up.

What Your Chai Costs

The next time you ask ChatGPT a question, or use an AI tool at work, or get a recommendation from an algorithm, pause for a moment.

Someone you will never meet, in a city you may never visit, sat in front of a screen and did the work that made that possible. They labelled the data. They flagged the harmful content. They drew the bounding boxes. They read the worst of the internet so that your experience could be clean and pleasant.

They were paid less than the person who made your morning chai.

The AI industry’s market capitalisation is measured in trillions. The workers who build its foundation are measured in pennies per task. And the gap between those two numbers is not a market inefficiency. It’s a design choice.

Every time you use an AI tool, someone you’ll never meet did the hardest work.

I have written a book about exactly this: how AI and automated systems make decisions about your life, where accountability disappears, and what we can do about it. If you want to know more about the book or order a copy, you can do so here: https://akshaywalimbe.com/beyond-bias/

Akshay Walimbe
