DeepSeek AI under scrutiny as Microsoft investigates alleged OpenAI data theft

January 30, 2025

DeepSeek AI, a Chinese chatbot service that recently gained traction on the Apple App Store, is now in the spotlight over allegations that it accessed data from Microsoft-backed OpenAI without authorization. According to sources familiar with the situation, DeepSeek AI’s founder, Liang Wenfeng, has strongly denied the accusations, dismissing them as baseless and calling them a coordinated attempt by Western media to undermine the company’s advancements.

Despite these denials, industry analysts suspect that DeepSeek AI may have leveraged OpenAI’s proprietary data to enhance its DeepSeek R1 model, which is built on the DeepSeek V3 base model. Reports indicate that the company may have accessed extensive datasets through OpenAI’s Application Programming Interfaces (APIs), the standard route by which software developers integrate AI models into their applications.

Potential Exploitation of OpenAI’s API

Typically, Microsoft allows licensed software developers to utilize OpenAI’s models via API access, enabling them to integrate GPT-based conversational AI into their platforms. However, concerns have emerged that DeepSeek AI might have systematically extracted large volumes of data from OpenAI’s cloud infrastructure, potentially bypassing usage restrictions or rate limits designed to prevent unauthorized large-scale data extraction.

Given Microsoft’s strict data policies and security mechanisms, any potential misuse of the API would likely involve sophisticated techniques such as data scraping, API tunneling, or parallelized request handling to evade detection. Although OpenAI’s API includes monitoring features like token-based authentication and query rate limitations, malicious actors could theoretically work around these controls by distributing requests across multiple accounts or cloud proxies.
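To make the rate-limiting point concrete, here is a minimal sketch of a per-key sliding-window limiter. It is an illustration only and is not based on how OpenAI or Microsoft actually enforce limits. The class `SlidingWindowLimiter`, the helper `count_allowed`, and the limit of 5 requests per 60 seconds are all hypothetical names and values chosen for the example.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key.

    Hypothetical illustration; not any provider's real implementation.
    """

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        # Maps each API key to the timestamps of its recently allowed requests.
        self._hits = defaultdict(deque)

    def allow(self, key, now):
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def count_allowed(limiter, keys, requests, start=0.0, step=0.01):
    """Send `requests` requests round-robin across `keys`; return how many pass."""
    allowed = 0
    for i in range(requests):
        key = keys[i % len(keys)]
        if limiter.allow(key, start + i * step):
            allowed += 1
    return allowed


if __name__ == "__main__":
    single = SlidingWindowLimiter(limit=5, window=60.0)
    print(count_allowed(single, ["key-a"], 20))

    spread = SlidingWindowLimiter(limit=5, window=60.0)
    print(count_allowed(spread, ["key-a", "key-b", "key-c", "key-d"], 20))
```

With these assumed settings, a burst of 20 requests from one key has only 5 allowed, while the same burst spread evenly over four keys passes in full, since each key stays at its own limit. That is why a provider would likely also need aggregate signals (for example, traffic patterns shared across accounts) rather than per-key counters alone.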

Microsoft’s Investigation and European GDPR Concerns

Microsoft, led by CEO Satya Nadella, is currently investigating the matter. However, an anonymous insider has suggested that DeepSeek AI does not necessarily need external data sources, as China’s own AI ecosystem, especially Baidu’s extensive language model infrastructure, could provide ample training data for the chatbot.

Meanwhile, European regulatory bodies have also taken notice. Italy’s Garante per la Protezione dei Dati Personali (Italian Data Protection Authority) has initiated an inquiry into whether DeepSeek AI complies with General Data Protection Regulation (GDPR) requirements. This follows a formal complaint from Belgium’s data protection agency, citing potential GDPR violations in how the chatbot processes user data.

The European Commission is expected to form a committee to scrutinize the issue further. If DeepSeek AI is found to be in breach of European data privacy laws, it may face financial penalties or even a temporary ban from operating within EU jurisdictions. The investigation aligns with broader concerns from privacy-conscious nations such as the Netherlands, the Czech Republic, Finland, and Denmark, where citizens express strong preferences for retaining control over their personal data.

Alibaba’s AI Challenge and Data Transfer Restrictions

Adding to the competitive AI landscape, Chinese tech giant Alibaba has officially announced that its Qwen 2.5-Max model outperforms leading Western AI systems, including OpenAI’s ChatGPT, Meta’s Llama, and Google’s Gemini, as well as DeepSeek’s own chatbot. While Alibaba claims its model offers superior capabilities, both DeepSeek and Alibaba have remained silent on critical questions regarding data privacy, cross-border storage, and compliance with Western data sovereignty laws.

Recent regulatory shifts in Western nations have imposed strict constraints on AI firms transferring user-generated data to servers located in foreign jurisdictions. This move is intended to prevent unauthorized surveillance, mitigate cybersecurity risks, and enhance user data control. Both Chinese firms face growing pressure to clarify their data governance policies, especially as regulatory scrutiny intensifies worldwide.

Conclusion

As AI development accelerates, the clash between global tech giants and regulatory bodies highlights the importance of data ethics, security, and fair competition. Whether DeepSeek AI has indeed exploited OpenAI’s API remains uncertain, but the controversy underscores the broader geopolitical and technological tensions shaping the AI landscape today.