AI needs better human data, not bigger models
Opinion by: Rowan Stone, CEO of Sapien
AI is a paper tiger without human expertise in data management and training practices. Despite massive growth projections, AI innovation will not matter if models continue to be trained on poor-quality data.
In addition to improving data standards, AI models need human intervention to supply the contextual understanding and critical thinking that ensure ethical AI development and accurate output generation.
AI has a "bad data" problem
Humans have nuanced awareness. They rely on their experiences to make inferences and logical decisions. AI models, however, are only as good as their training data.
An AI model's accuracy does not depend solely on the technical sophistication of its underlying algorithms or the volume of data it processes. Instead, accurate AI performance depends on reliable, high-quality data during training and performance testing.
Bad data has multiple ramifications for AI model training: it produces biased outputs and hallucinations born of flawed logic, which forces teams to waste time retraining models to unlearn bad habits, driving up company costs.
Biased and statistically under-represented data disproportionately amplify flaws and skewed results in AI systems, particularly in healthcare and security surveillance.
For example, an Innocence Project report lists several cases of misidentification, with a former Detroit police chief admitting that relying solely on AI-based facial recognition would produce misidentifications about 96% of the time. Similarly, according to a Harvard Medical School report, an AI model used in US health systems prioritized healthier white patients over sicker Black patients.
AI models follow the "garbage in, garbage out" (GIGO) principle: flawed and biased data, or "garbage," generates poor-quality outputs. Bad input data also creates operational inefficiency, as project teams face delays and higher costs cleaning datasets before model training can resume.
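To make the GIGO principle concrete, here is a minimal, hypothetical sketch of the kind of pre-training check it motivates: dropping empty, unlabeled or duplicated records before they ever reach a model. The record fields and filtering rules are illustrative assumptions, not a description of any production pipeline.

```python
# Hypothetical sketch of pre-training data hygiene motivated by GIGO.
# Field names and rules are illustrative only.
from dataclasses import dataclass

@dataclass
class Record:
    text: str          # raw training example
    label: str         # human-assigned label
    annotator_id: str  # who labeled it

def filter_garbage(records: list[Record]) -> list[Record]:
    """Drop records that would degrade training: empty, unlabeled,
    or exactly duplicated examples ("garbage in")."""
    seen: set[str] = set()
    clean: list[Record] = []
    for r in records:
        if not r.text.strip() or not r.label:
            continue                          # incomplete -> skip
        if r.text in seen:
            continue                          # exact duplicate -> skip
        seen.add(r.text)
        clean.append(r)
    return clean

raw = [
    Record("The sky is blue.", "fact", "a1"),
    Record("", "fact", "a2"),                 # empty text: filtered out
    Record("The sky is blue.", "fact", "a3"), # duplicate: filtered out
]
print(len(filter_garbage(raw)))  # 1 usable record out of 3
```

Even a crude gate like this shows why data preparation consumes so much of a team's time: every rule must be designed, validated and maintained by people who understand the data.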
Beyond the operational effects, AI models trained on low-quality data erode the confidence of the companies deploying them, causing irreparable reputational damage. According to one research paper, the hallucination rate for GPT-3.5 was 39.6%, underscoring the need for additional validation by researchers.
This reputational damage has far-reaching consequences: it becomes harder to secure investment, and the model's market positioning suffers. At a CIO Network Summit, 21% of top US IT leaders cited a lack of reliability as their most pressing reason not to use AI.
Bad AI training data devalues projects and causes enormous economic losses for businesses. On average, incomplete and low-quality training data leads to poorly informed decision-making that costs companies 6% of their annual revenue; for a company with $1 billion in revenue, that is $60 million lost every year.
Poor-quality training data hampers AI innovation and model training, making the search for alternative solutions essential.
The bad-data problem has forced AI companies to redirect scientists toward data preparation. Nearly 67% of data scientists spend their time preparing correct datasets to keep AI models from delivering misinformation.
AI/ML models can struggle to produce relevant output unless specialists, real humans with appropriate credentials, work to refine them. This demonstrates the need for human experts to guide AI development by ensuring high-quality, curated data for AI model training.
Human frontier data is essential
Elon Musk recently said, "The cumulative sum of human knowledge has been exhausted in AI training." Nothing could be further from the truth, because human frontier data is the key to training stronger, more reliable and unbiased AI models.
Musk's dismissal of human knowledge is a call to use artificially produced synthetic data for AI model training and fine-tuning. Unlike humans, however, synthetic data lacks real-world experience and fails at ethical judgments.
Human expertise guarantees meticulous review and validation of data to maintain an AI model's consistency, accuracy and reliability. Humans assess, evaluate and interpret a model's output to identify biases or errors and ensure it aligns with societal values and ethical standards.
In addition, human intelligence offers unique perspectives during data preparation, bringing contextual reference, common sense and logical reasoning to data interpretation. This helps resolve ambiguous results, grasp nuance and solve problems when training highly complex models.
The symbiotic relationship between artificial and human intelligence is crucial to harnessing AI's potential as a transformative technology without causing societal harm. A collaborative human-machine approach unlocks human intuition and creativity to build new AI algorithms and architectures for the public good.
Decentralized networks could be the missing piece that finally solidifies this relationship at a global scale.
Companies waste time and resources when underperforming AI models require constant refinement from staff scientists and engineers. With decentralized human input, companies can cut costs and increase efficiency by distributing the evaluation process across a global network of trainers and data contributors.
Decentralized reinforcement learning from human feedback (RLHF) makes AI model training a collaborative enterprise. Everyday users and domain specialists can contribute to training and receive financial incentives for annotating, labeling, segmenting and categorizing data.
A blockchain-based decentralized mechanism automates remuneration, as shown in the sketch below: contributors receive rewards based on quantifiable AI model improvements rather than rigid quotas or benchmarks. Furthermore, decentralized RLHF democratizes data and model training by involving people from diverse backgrounds, reducing structural biases and improving general intelligence.
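As a rough illustration of that outcome-based remuneration idea, the hypothetical sketch below splits a reward pool among contributors in proportion to the measured accuracy gain attributed to each one's data, paying nothing for batches that hurt the model. The contributor names, metric and pool size are invented for the example; a real system would settle these payouts on-chain.

```python
# Hypothetical outcome-based payout: rewards scale with measured model
# improvement, not fixed quotas. All names and numbers are illustrative.

def payouts(gains_bps: dict[str, int], reward_pool: float) -> dict[str, float]:
    """Split reward_pool in proportion to each contributor's measured
    validation-accuracy gain (in basis points); negative gains earn 0."""
    positive = {c: max(g, 0) for c, g in gains_bps.items()}  # no pay for harm
    total = sum(positive.values())
    if total == 0:
        return {c: 0.0 for c in gains_bps}
    return {c: reward_pool * g / total for c, g in positive.items()}

# Accuracy deltas measured after retraining with each contributor's batch.
gains = {"alice": 20, "bob": 5, "carol": -10}  # basis points of accuracy
print(payouts(gains, reward_pool=100.0))
# {'alice': 80.0, 'bob': 20.0, 'carol': 0.0}
```

Tying pay to measured improvement rather than raw annotation volume is what aligns a distributed crowd of contributors with actual model quality.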
According to a Gartner survey, companies will abandon more than 60% of AI projects by 2026 due to the unavailability of AI-ready data. Human expertise and skills are therefore crucial to preparing AI training data if the industry wants AI to contribute $15.7 trillion to the global economy by 2030.
The data infrastructure for AI model training requires continuous improvement based on new and emerging data and use cases. Humans can ensure organizations maintain AI-ready databases through constant metadata management, observability and governance.
Without human oversight, companies will fumble with the massive volume of siloed data spread across cloud and offline storage. Companies must adopt a "human-in-the-loop" approach, sketched below, to refine datasets and build high-quality, efficient and relevant AI models.
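As a minimal illustration of a human-in-the-loop gate, the hypothetical sketch below auto-accepts only high-confidence model predictions and routes everything else to a human review queue. The threshold, labels and data structures are placeholder assumptions, not a prescribed design.

```python
# Hypothetical human-in-the-loop gate: uncertain model outputs are deferred
# to a human reviewer instead of being accepted automatically.

CONFIDENCE_THRESHOLD = 0.90  # illustrative cutoff, tuned per application

def route(prediction: str, confidence: float,
          review_queue: list[tuple[str, float]]) -> str | None:
    """Accept confident predictions; queue uncertain ones for human review."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return prediction                  # machine decision stands
    review_queue.append((prediction, confidence))
    return None                            # deferred to a human expert

queue: list[tuple[str, float]] = []
print(route("approve", 0.97, queue))       # 'approve' -- auto-accepted
print(route("deny", 0.62, queue))          # None -- sent for human review
print(queue)                               # [('deny', 0.62)]
```

The design choice is the point: the machine handles the easy volume, while humans concentrate their judgment on exactly the cases where the model is least trustworthy.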
Opinion by: Rowan Stone, CEO of Sapien.
This article is for general information purposes and is not intended to be and should not be taken as legal or investment advice. The views, thoughts and opinions expressed here are the author's alone and do not necessarily reflect or represent the views and opinions of Cointelegraph.