By Michael G. McLaughlin, Associate, Baker, Donelson, Bearman, Caldwell & Berkowitz, PC
Data privacy and protection have become central concerns for businesses in the United States and Europe, and the rules governing them increasingly restrictive. At the same time, innovators at the cutting edge of Artificial Intelligence (AI) research are continuously seeking more and better datasets to develop breakthrough technologies to solve the most challenging problems facing humanity. With the expanding data privacy regulatory landscape, however, Western technology companies increasingly find themselves at a disadvantage to their Chinese counterparts, whose authoritarian goals and ethics stand in stark contrast with those of democratic states.
Nearly every person on Earth has a unique digital fingerprint. This fingerprint identifies us through innumerable datapoints and becomes more detailed through each interaction we have with the inescapable technology that surrounds us. From the mobile devices and smart-watches we carry and wear, to the social media profiles and accounts we log into for work and leisure, to the cars we drive and the things we buy—the amount of digital exhaust we create every day is staggering.
Each minute, Internet users spend $443,000 on Amazon, post 347,000 Tweets, share 1.7 million pieces of content on Facebook, and upload over 500 hours of video to YouTube, according to Domo. Each of those millions of transactions comprises multiple datapoints: IP addresses, credit card information, geographic location, language and grammar, gender, skin tone, biometrics, and more. By 2025, the total amount of data created, copied, or consumed will reach 181 zettabytes. For reference, the typed letter “a” is one byte. One zettabyte is roughly the equivalent of a typed letter “a” for every grain of sand on all the beaches on Earth. The Library of Congress contains about 16 petabytes of information, and there are 1 million petabytes in a zettabyte.
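To put those units in perspective, a quick back-of-the-envelope calculation helps. The short Python sketch below is purely illustrative and uses only the figures cited above (181 zettabytes projected by 2025, roughly 16 petabytes in the Library of Congress):

```python
# Back-of-the-envelope scale check, using only the figures cited in the text.
PETABYTE = 10**15            # 1 PB = 10^15 bytes
ZETTABYTE = 10**21           # 1 ZB = 10^21 bytes

projected_2025 = 181 * ZETTABYTE          # projected data created/copied/consumed
library_of_congress = 16 * PETABYTE       # Library of Congress holdings (per text)

print(ZETTABYTE // PETABYTE)                  # 1,000,000 petabytes per zettabyte
print(projected_2025 // library_of_congress)  # ~11.3 million Libraries of Congress
```

By that arithmetic, the projected 2025 data volume equals roughly eleven million Libraries of Congress.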
Every action we take online expands the sprawling body of data uniquely attributable to each of us. And while data privacy and protection have recently become the subject of intense legal scrutiny, few people understand just how important their data really is. It has become tiresome to hear the trite phrase: “if you are not paying for a product, you are the product.” With the technological advancements in AI looming large on the horizon, perhaps it is more accurate to say, “your data is the product.”
Free social media platforms such as Facebook, Twitter, and TikTok use their mobile applications to harvest geolocation data, photos and videos, contacts, device and browser type, likes, tweets, posts, and even how long users pause on particular posts or advertisements as they scroll through their feeds. TikTok also collects biometrics and anything copied to a device’s clipboard, including usernames and passwords from other applications. And Google scans the content of every email a Gmail user sends and receives.
The portrait tech companies are able to paint of individual users is astounding. In a 2014 study, researchers from Stanford University and the University of Cambridge found that Facebook needs only 10 likes to know a user better than a co-worker does, 70 likes to surpass a friend, and 300 likes to know someone more intimately than their spouse does. Given Moore’s law, the observation that computing power roughly doubles every two years, social media platforms very likely know their users far more intimately today, with far fewer engagements.
Regulators and privacy advocates are staunchly opposed to the unchecked and nonconsensual collection of personal information, whether indirectly through third-party cookies or directly by social media platforms, online retailers, or other service providers. Many regulatory bodies and legislatures have established legal frameworks to try to curb the collection and monetization of personal information due to privacy concerns. Europe’s General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and Virginia’s Consumer Data Protection Act (VCDPA) are all examples of legislation that limits data collection and sales. The regulatory logic is that when individuals unknowingly cede their digital fingerprints to major technology companies that aggregate data from every conceivable online activity, the result is unreasonable and intrusive surveillance that poses a risk to individual privacy.
However, while the collection of significant amounts of personal data does pose privacy concerns, over-regulation of data collection and sharing could have significant adverse effects on America’s future economic and national security. Though huge swaths of data collected and aggregated by tech giants are used for targeted advertising and enhancing user experience, they are also being used to develop the data-driven technologies that will determine the future of America’s standing on the world stage—namely, Artificial Intelligence.
To realize the benefits of AI (that is, simulated human inference and decision-making), developers require large quantities of high-quality data to train their models. Machine learning algorithms use this training data to “teach” an AI to perform specific tasks. For instance, whenever Google’s reCAPTCHA asks users signing into their accounts to select every picture containing a stoplight, or whenever Google Maps directs a driver to take a “shortcut” down a side street, Google is building a training dataset that supports the development of autonomous vehicles.
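As a rough illustration of how such labeled examples “teach” a model, consider the sketch below. It is purely hypothetical; the feature vectors, labels, and classifier are stand-ins rather than Google’s actual pipeline, but the principle is the same: humans supply labels, and an algorithm fits a model to them.

```python
# Illustrative only: a toy classifier learning from human-labeled examples,
# the basic supervised-learning loop that crowd-sourced image labels feed.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical feature vectors for image tiles (e.g., color/edge statistics).
# Label 1 = "contains a stoplight", 0 = "does not" -- labels a human supplied.
X = rng.normal(size=(200, 8))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # stand-in for human labels

model = LogisticRegression().fit(X, y)          # the model "learns" from labeled data

new_tile = rng.normal(size=(1, 8))
print(model.predict(new_tile))                  # the model's guess for an unseen tile
```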
The development of AI differs fundamentally from the development of any other modern technology. Where most modern software relies on explicit if/then logic, an AI builds a neural network across innumerable datapoints to mathematically calculate the most probable solution to a given problem. For this neural network to function, the AI must be “taught” to make connections across a vast universe of data. And unlike many technologies, the success of a particular AI is determined by the quality and quantity of the data it can access.
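A minimal sketch of that distinction, with placeholder weights rather than anything learned from real data, might look like this: one function encodes an explicit if/then rule, while the other produces its answer from weighted connections across many inputs.

```python
# Contrast: a hand-written if/then rule versus a tiny neural network whose
# behavior comes from weighted connections, not explicit logic. The weights
# here are random placeholders; in practice they are learned from training data.
import numpy as np

def rule_based(temp_f: float) -> str:
    # Traditional software: the programmer writes the decision explicitly.
    return "fever" if temp_f >= 100.4 else "normal"

def tiny_network(x: np.ndarray, W1: np.ndarray, W2: np.ndarray) -> np.ndarray:
    # A two-layer network: the output is a weighted combination of many inputs
    # passed through nonlinearities -- nothing resembling if/then rules.
    hidden = np.tanh(x @ W1)
    return 1 / (1 + np.exp(-(hidden @ W2)))     # probability-like output

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(5, 8)), rng.normal(size=(8, 1))
x = rng.normal(size=(1, 5))                     # five arbitrary input datapoints
print(rule_based(101.2), tiny_network(x, W1, W2))
```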
You are already seeing AI at play when you turn on Amazon Prime and find video content queued up based on the genre of your recent book purchases, or when political videos appear in your Facebook feed after you check election results on MSNBC. Soon, this type of data-driven personalized experience could apply to every part of your life.
Take healthcare, for instance—
With access to quality data, in the not-too-distant future, health-monitoring AI would be able to identify and diagnose the onset of illness and disease in ways modern medicine simply cannot. Smartwatches and mobile devices will work in concert to identify imperceptible symptoms such as irregular breathing, disrupted sleep patterns, an elevated heart rate, and changes in gait. These symptoms would be flagged as anomalous against a health baseline built from years of 24/7 monitoring by your devices, and checked against the baselines of millions of other people of similar age and demographics worldwide. The AI behind this platform would incorporate datapoints from your genetic code, your medical history, and the medical histories of your immediate and extended family. Using location data from a mesh network of mobile devices, the AI could also determine who in your recent proximity might have exhibited similar symptoms. Based on near-instantaneous analysis of your personal data and a thorough understanding of the entire compendium of medical studies and research, the AI could diagnose ailments at the earliest possible moment and ask your doctor to approve a recommended prescription for immediate delivery to your location.
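In its simplest imaginable form, the “flagged as anomalous against a baseline” step might resemble the sketch below. The numbers and threshold are invented for illustration; a real health-monitoring system would rely on far richer models and clinically validated signals.

```python
# Illustrative only: flagging a reading as anomalous against a personal
# baseline, the simplest version of the monitoring described above.
import numpy as np

rng = np.random.default_rng(2)
baseline_resting_hr = rng.normal(62, 3, size=365)   # hypothetical year of readings

mean, std = baseline_resting_hr.mean(), baseline_resting_hr.std()

def is_anomalous(reading: float, threshold: float = 3.0) -> bool:
    # Flag readings more than `threshold` standard deviations from the baseline.
    return abs(reading - mean) / std > threshold

print(is_anomalous(63.0))   # typical reading -> False
print(is_anomalous(81.0))   # well outside the personal baseline -> True
```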
You wouldn’t necessarily know why the AI is making the decisions it is making, but it would create a decision web from millions of datapoints to achieve the best outcome for your wellbeing. Expanding this web to the Internet-of-Things, your home and office thermostats could lower the ambient temperature to account for the coming fever; your office calendar could automatically reschedule the next morning’s meetings; and your refrigerator could order Pedialyte, Tylenol, and chicken soup. Before you know you’re sick, you could already be on the path to recovery.
A future like this is predicated on technology developers being able to access immense quantities of high-quality training data for AI development and to share data collected from multiple sources to create the necessary digital neural networks. However, in the United States and other Western democracies, much of the data required to achieve the level of personal automation in the above scenario is currently neither centralized nor shared freely across organizations, which prize this data for its commercial value. Moreover, some of the data is governed by regulations, such as the Health Insurance Portability and Accountability Act (HIPAA), the CCPA, and the GDPR, which prohibit or significantly limit the types of collection and disclosure that would allow for the development of such AI.
Presently, the United States and China are locked in a race as the world’s two competing AI superpowers. The United States is ahead, but its lead is narrowing. Americans value privacy and enjoy the protections afforded by the Fourth Amendment and a growing body of privacy laws, but the importance of privacy should be weighed against future economic and national security interests. Where American AI developers increasingly brush up against various sectoral data privacy regulations, Chinese developers face no such restrictions. Beyond having access to the world’s largest population of 1.4 billion people, China’s leading AI developers also benefit from legal and regulatory frameworks designed to advance Beijing’s technological ambitions. These frameworks collectively require that all data collected about Chinese citizens be stored within China’s geographic borders and compel the sharing of that data with the Chinese government.
The applicable data is not limited to Chinese citizens. Whenever TikTok users worldwide upload their videos, ByteDance, TikTok’s parent company, adds to the dataset used to train its facial recognition, voice recognition, and deep-fake technologies. Whenever cities in Africa or Asia install Hikvision cameras or Huawei servers as part of China’s “Safe City” products, the foreign data collected adds diverse inputs to China’s AI training datasets. Whenever women from around the globe provide blood samples to China’s BGI Group for prenatal testing, the company harvests the genetic sequences of millions of women and children worldwide.
While the U.S. and other techno-democracies seek to use AI to advance societal interests, China is plumbing the depths of the dark side of AI.
According to The Washington Post, Chinese security services are using the fruits of this data collection to develop and deploy AI to identify, detain, and persecute the country’s Uyghur Muslim population. In so-called “predictive policing,” China’s Ministry of Public Security leverages data about individual Uyghurs’ hobbies, occupations, familial ties, travel history, social media activity, and other traits to predict acts of terrorism. Using data collected from worldwide sources, Chinese technology companies have developed facial recognition, gait recognition, and behavioral identifiers that are incorporated into the nationwide surveillance system to identify Uyghurs assessed to pose a threat. As though taken from the script of Minority Report, Chinese law enforcement uses AI to identify and arrest Uyghurs its algorithms predict will commit acts contrary to state interests. These individuals are then rounded up and summarily sent to “re-education” camps.
The two uses of AI detailed in this article illustrate the crossroads the world faces. Authoritarian systems of government readily lend themselves to mass surveillance, collection, and data aggregation. Democratic systems, by contrast, place a premium on privacy and are generally resistant to most forms of government surveillance. In the age of AI, where access to data will determine both economic success and national security, this distinction places democracies at a disadvantage. As regulators, lawmakers, and tech giants in democratic nations seek to lay the foundations for ethical uses of AI, they must also establish a regulatory environment that gives Western AI developers access to sufficient data to compete with their Chinese counterparts.
For techno-democracies to thrive in the age of Artificial Intelligence, lawmakers and regulators should seek to balance individual privacy with society’s need to develop advanced technologies. Further, regulations should enable companies that collect large and diverse datasets, from personal information to genetic data, to share that data with AI developers in ways that protect privacy while encouraging innovation. Otherwise, the United States will cede AI superiority to China.
About the Author
Michael McLaughlin is a cybersecurity and data privacy attorney with Baker Donelson in Washington, D.C., and the former Senior Counterintelligence Advisor for United States Cyber Command. His forthcoming book, Battlefield Cyber: How China and Russia Are Undermining our Democracy and National Security, is available for pre-order on Amazon. He can be reached at mgmclaughlin@bakerdonelson.com and via the Baker Donelson website at https://www.bakerdonelson.com/.