Is ‘Sweatshop Data’ Really Over?

Welcome back to In the Loop, TIME’s new twice-weekly newsletter about the AI world.
If you’re reading this in your browser, you can subscribe to have the next one delivered straight to your inbox.
What to Know: The future of “sweatshop data”
You can measure time in the AI world by the rate at which new essays with provocative titles appear. Another arrived earlier this month, from the team at Mechanize Work: a new startup trying to, uh, automate all human labor. Its title? “Sweatshop data is over.”
It caught my eye. As regular readers may know, I have reported extensively over the years on the origins of the data used to train AI systems. My story “Inside Facebook’s African Sweatshop” was the first to reveal how Meta used contractors in Kenya, some earning as little as $1.50 per hour, to remove harmful content from its platforms – content that was later used in attempts to train AI systems to do that work automatically. I also broke the news that OpenAI used workers from the same outsourcing company to detoxify ChatGPT. In both cases, workers said the work left them with diagnoses of post-traumatic stress disorder. So if sweatshop data really is a thing of the past, that would be a very big deal.
What the essay argues – The Mechanize Work essay points to a very real trend in AI research. To summarize: AI systems used to be relatively unintelligent. To teach one the difference between, say, a cat and a dog, you had to feed it many different labeled examples of cats and dogs. The most cost-effective way to obtain those labels was from the developing world, where labor is cheap. But as AI systems have grown smarter, they no longer need to be taught basic information, the authors argue. AI companies are now desperately seeking expert data – which necessarily comes from people with PhDs, who will not put up with poverty wages. “Teaching AIs these new capabilities will require dedicated efforts from high-skill specialists working full-time, not low-skill contractors,” the authors argue.
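To make the old paradigm concrete, here is a minimal sketch of supervised learning from human-applied labels – not code from the essay, just a generic illustration using scikit-learn, with toy data invented for the example:

```python
# A minimal sketch of the "labeled data" paradigm: the classifier can only
# learn from examples that human annotators have already tagged.
# Toy numbers stand in for image features; assumes scikit-learn is installed.
from sklearn.linear_model import LogisticRegression

features = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.9], [0.1, 0.7]]  # hypothetical features
labels = ["cat", "cat", "dog", "dog"]  # each label applied by a human worker

model = LogisticRegression().fit(features, labels)
print(model.predict([[0.85, 0.2]]))  # -> ['cat'] on this toy data
```

The point of the sketch: the model is only as good as the volume and accuracy of those human-applied labels – which is exactly the work that was outsourced cheaply.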
A new AI paradigm – The authors are, in one important sense, correct. The big money has indeed shifted toward expert data. A clutch of companies, Mechanize Work among them, is jostling to dominate a space that could eventually be worth hundreds of billions of dollars, according to insiders. Many of them are not only hiring experts, but also building dedicated software environments to help AIs learn from experience at scale, in a paradigm called reinforcement learning with verifiable rewards. It is inspired by DeepMind’s 2017 AlphaZero model, which did not need to observe humans playing chess or Go, and became superhuman simply by playing millions of games against itself. In the same vein, these companies are trying to build software that lets AIs “self-play,” with the help of experts, on coding, science, and math problems. If they can pull it off, it could unlock major new leaps in capability, top researchers believe.
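These companies have not published their training stacks, but the basic shape of reinforcement learning with verifiable rewards is easy to sketch: the model proposes an answer, a programmatic checker verifies it, and the pass/fail result becomes the reward – no human labeler in the loop. A minimal Python illustration, where model_answer is a hypothetical stand-in for a real LLM:

```python
import random

def model_answer(problem):
    """Hypothetical stand-in for an LLM proposing an answer to a math problem."""
    a, b = problem
    return a + b + random.choice([0, 0, 0, 1, -1])  # occasionally wrong on purpose

def verify(problem, answer):
    """Verifiable reward: 1 if the answer checks out, else 0.
    For math this can be an exact check; for code, a test suite plays this role."""
    a, b = problem
    return 1 if answer == a + b else 0

# Self-play-style loop: generate attempts and keep the verified ones as
# training signal, with no human annotator scoring each attempt.
problems = [(random.randint(0, 99), random.randint(0, 99)) for _ in range(1000)]
attempts = [(p, model_answer(p)) for p in problems]
verified = [(p, a) for p, a in attempts if verify(p, a)]

print(f"{len(verified)}/{len(problems)} attempts earned a reward of 1")
```

In a real setup, the verified attempts would feed a weight update (for example, a policy-gradient step); the sketch only shows where the reward comes from, and why experts are needed to design the problems and checkers rather than to label each answer.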
There’s just one problem – While all of that is true, it doesn’t mean sweatshop data has gone away. “We are not seeing the workforce of data workers, in the classical sense, decreasing,” says Milagros Miceli, a researcher at the Weizenbaum Institute in Berlin who studies so-called data work. “Quite the opposite.”
Meta and TikTok, for example, still rely on thousands of contractors around the world to remove harmful content from their systems – a task that has stubbornly resisted AI automation. Other types of low-paid data work, typically carried out in places like Kenya, the Philippines, and India, are booming.
“Right now, what we are seeing is a lot of what we call algorithmic verification: people checking existing AI models to make sure they are working as planned,” says Miceli. “The funny thing is, it’s the same workers. If you talk to people, they will tell you: I used to do content moderation. I used to do data labeling. Now I do this.”
Who to Know: Shengjia Zhao, chief scientist, Meta Superintelligence Labs
Mark Zuckerberg promoted the AI researcher Shengjia Zhao to chief scientist of Meta’s new internal effort to create “superintelligence.” Zhao joined Meta last month from OpenAI, where he worked on the o1-mini and o3-mini models.
Zuck’s memo – In a note to staff on Saturday, Zuckerberg wrote: “Shengjia has already pioneered several breakthroughs, including a new scaling paradigm, and has distinguished himself as a leader in the field.” Zhao, who did his undergraduate degree in Beijing and graduated from Stanford with a PhD in 2022, “will set the research agenda and scientific direction for our new lab,” Zuckerberg wrote.
Meta’s hiring push – Zuckerberg has reportedly sparked a fierce war for talent in the AI industry by offering top AI researchers pay packages worth up to $300 million. “I have lost track of how many people from here they’ve tried to get,” OpenAI’s Sam Altman said in a Slack message, according to the Wall Street Journal.
Bad news for LeCun – Zhao’s promotion is yet another sign that Yann LeCun – who until this year’s hiring blitz was Meta’s most senior AI scientist – has been put out to pasture. A notable critic of the idea that LLMs will scale to superintelligence, LeCun holds views that seem increasingly at odds with Zuckerberg’s. Meta’s superintelligence team is clearly a higher priority for Zuckerberg than the separate group LeCun runs, known as Facebook AI Research (FAIR). In a note appended to his announcement of Zhao’s promotion on Threads, Zuckerberg denied that LeCun had been sidelined. “To avoid any confusion, there’s no change in Yann’s role,” he wrote. “He will continue to be Chief Scientist for FAIR.”
AI in Action
One of the big ways AI is already affecting our world is through the changes it is bringing to our information ecosystem. News publishers have long complained that Google’s “AI Overviews” in its search results have reduced traffic, and therefore revenue, harming their ability to employ journalists and hold the powerful to account. Now we have new data from the Pew Research Center that puts that complaint into stark relief.
When an AI summary is included in search results, only 8% of users click on a link – down from 15% when there is no AI summary, the study found. Only 1% of users clicked on any link in the AI summary itself, undercutting the argument that AI summaries are an effective way to send users to publishers’ content.
As always, if you have an interesting story of AI in Action, we’d love to hear it. Email us at: intheloop@time.com
What We’re Reading
“How to Save OpenAI’s Nonprofit Soul, According to a Former OpenAI Employee,” by Jacob Hilton in TIME
Jacob Hilton, who worked at OpenAI between 2018 and 2023, writes about the ongoing battle over OpenAI’s legal structure – and what it could mean for the future of our world.
“The nonprofit still has no independent staff, and its board members are too busy running their own companies or academic labs to provide meaningful oversight,” he writes. “To add to this, OpenAI’s proposed restructuring now threatens the board’s authority when it should instead be strengthening it.”



