Elon Musk agrees that we are running out of AI training data

Elon Musk agrees with other AI experts that there is little real-world data left to train AI models on. “We have now exhausted the cumulative amount of human knowledge … in AI training,” Musk said during a live conversation with Stagwell chairman Mark Penn that streamed on X late Wednesday. “That happened last year.”

Musk, who owns AI company xAI, echoed themes that OpenAI’s former chief scientist, Ilya Sutskever, raised in a December talk at NeurIPS, a machine learning conference. Sutskever, who said the AI industry has reached what he calls “peak data,” predicted that the lack of training data will force a shift away from the way models are developed today.

Indeed, Musk suggested that synthetic data (data generated by AI models themselves) is the way forward. “The only way to supplement [real-world data] is with synthetic data, where AI creates [training data],” he said. “With synthetic data… [AI] will sort itself out and go through this self-learning process.”

Other companies, including tech giants like Microsoft, Meta, OpenAI, and Anthropic, are already using synthetic data to train leading AI models. Gartner estimates that 60% of the data used for AI and analytics projects in 2024 was synthetically generated. Microsoft’s Phi-4, which was open-sourced early Wednesday, was trained on synthetic data alongside real-world data, as were Google’s Gemma models. Anthropic used some synthetic data to develop one of its best systems, Claude 3.5 Sonnet, and Meta fine-tuned its latest Llama models using AI-generated data.

Training on synthetic data has other advantages, such as cost savings. AI startup Writer says its Palmyra X 004 model, developed using almost entirely synthetic sources, cost only $700,000 to develop, compared with an estimated $4.6 million for an OpenAI model of the same size.

But there are also drawbacks. Some research suggests that synthetic data can lead to model collapse, where a model becomes less “creative” and more biased in its outputs, ultimately impairing its functionality. Because models create the synthetic data, any biases and limitations in the data used to train those models will carry over into their outputs.
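
The model collapse effect can be illustrated with a deliberately simplified toy simulation. The sketch below is not how any of the models named above are trained; it assumes a "model" is just a Gaussian fitted to a small sample, and that each new generation is trained only on data produced by the previous one. The function name `simulate_collapse` and its parameters are hypothetical, chosen for this illustration.

```python
import random
import statistics


def simulate_collapse(generations=200, samples_per_generation=10, seed=0):
    """Toy sketch: each generation fits a Gaussian to samples from the previous fit.

    Returns the fitted standard deviation at every generation, starting
    with the original distribution at index 0.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0 stands in for "real-world" data
    stds = [std]
    for _ in range(generations):
        # The current model generates the next generation's training data.
        synthetic = [rng.gauss(mean, std) for _ in range(samples_per_generation)]
        # The next model is "trained" by fitting a mean and spread to that data.
        mean = statistics.fmean(synthetic)
        std = statistics.pstdev(synthetic)
        stds.append(std)
    return stds


if __name__ == "__main__":
    history = simulate_collapse()
    for generation in (0, 50, 100, 200):
        print(f"generation {generation:3d}: std = {history[generation]:.3g}")
```

In this toy setting the fitted spread tends to shrink from one generation to the next, because each small sample slightly underrepresents the tails of the distribution it was drawn from and those losses compound. That is a loose analogue of outputs becoming less varied over time, and real training pipelines that mix in real-world data behave differently.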
