Clearview AI: Why and How to Effectively Regulate Identification Technologies

The ever-inquisitive Kashmir Hill recently broke a bombshell story about the existence of a secretive, Peter Thiel-backed startup called Clearview AI. This company’s business model is simple: crawl major social networks, scrape images of people, feed them into a giant facial recognition neural network, and then sell access to law enforcement. The technology needed to make this work is straightforward, and thanks to social media-encouraged oversharing, 3 billion images of our conveniently labeled faces are publicly available online. It was only a matter of time before an unscrupulous actor put all the pieces together and built a digital facial recognition panopticon.
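To see just how straightforward, here is a minimal sketch of the core matching step. This is not Clearview’s actual code, just an illustration built on the open-source face_recognition library; the file paths, names, and two-photo “index” are hypothetical placeholders.

    # Illustrative face-matching pipeline over a folder of scraped,
    # name-labeled photos. Not Clearview's code; paths are hypothetical.
    import face_recognition
    import numpy as np

    # "Enrollment": compute a 128-dimensional embedding per scraped face.
    labeled_photos = {"alice": "scraped/alice.jpg", "bob": "scraped/bob.jpg"}
    names, encodings = [], []
    for name, path in labeled_photos.items():
        image = face_recognition.load_image_file(path)
        found = face_recognition.face_encodings(image)
        if found:  # skip photos where no face was detected
            names.append(name)
            encodings.append(found[0])

    # "Identification": embed a probe photo and find its nearest neighbor.
    probe = face_recognition.load_image_file("probe.jpg")
    probe_encoding = face_recognition.face_encodings(probe)[0]
    distances = face_recognition.face_distance(encodings, probe_encoding)
    best = int(np.argmin(distances))
    if distances[best] < 0.6:  # the library's conventional match threshold
        print(f"Probable match: {names[best]} ({distances[best]:.2f})")
    else:
        print("No match in the index")

Scale the enrollment loop up to 3 billion scraped images, swap the linear scan for an approximate nearest-neighbor index, and you have the essence of the product.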

The alarmed reactions to Clearview AI were swift and on point. More than 600 law enforcement agencies are using Clearview’s app, yet nobody consented to having their online images weaponized against them in this manner, and the tool has never been independently audited for accuracy or bias. Hypocritically, Clearview profits from a mass breach of privacy, yet the company has gone to great lengths to hide itself from the public. That Clearview’s founder likes to hang out with right-wing conspiracy mongers and has apparently never heard of ethics just makes things worse.

Asked about the implications of bringing such a power into the world, Mr. Ton-That seemed taken aback. “I have to think about that,” he said. “Our belief is that this is the best use of the technology.”

It’s obvious that something needs to be done to ban, or at least heavily regulate, technologies like those leveraged by Clearview AI. Otherwise we are headed straight for an Orwellian surveillance dystopia. As I’ll examine in this post, there are two approaches to reining in this technology, both of which are currently being trialed against Clearview, but only one of which is poised for long-term success.

The Immediate Aftermath

Following Kashmir’s story, the immediate question became: what should be done to stop Clearview? Two representative approaches have emerged:

  1. Twitter, Google, and YouTube served Clearview with cease-and-desist letters, demanding that they stop scraping data and that they delete data gathered in the past. Although the text of these letters isn’t public, the reported legal rationale behind the demands is that Clearview violated the websites' terms of service.
  2. Plaintiffs have filed a class action lawsuit against Clearview claiming that the company violated the Illinois Biometric Information Privacy Act (BIPA), which requires that individuals in Illinois grant explicit consent before their biometric data may be collected and used.

Generally speaking, the former approach goes after the means of data collection, while the latter goes after the use of data. In other words, one targets a tactic (a means to achieve a goal), while the other targets a strategy (the goal itself). Both approaches could, in theory, halt Clearview, but the tactics-based approach to regulation causes considerable collateral damage, while the strategy-based approach does not.

Scrapers Are Dual-Use Technology

Clearview AI exists because it is able to crawl social media sites and scrape people’s public profiles. This highlights a disquieting yet fundamental fact about the web: large-scale, public data is a blessing and a curse. The same techniques that allow immensely beneficial systems like web search engines and news aggregators to exist also open the door to massive privacy violations.
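The dual-use point is easy to see in code. Below is a minimal scraping sketch using the requests and BeautifulSoup libraries; the URL is a hypothetical placeholder, and a real crawler would also need to handle pagination, rate limits, and link discovery.

    # Fetch one hypothetical public profile page and pull out its images.
    import requests
    from bs4 import BeautifulSoup

    url = "https://social.example.com/users/alice"  # hypothetical page
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")

    # Collect every image URL on the page. For a search engine these feed
    # an index; for a Clearview-style system, a face-embedding pipeline.
    image_urls = [img["src"] for img in soup.find_all("img", src=True)]
    for src in image_urls:
        print(src)

Nothing in that fetch-and-parse loop distinguishes the benign use from the abusive one, which is why regulating the tactic is such a blunt instrument.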

There is an open question in the U.S. legal community about the legality of scraping, because the U.S. Computer Fraud and Abuse Act (CFAA) can turn terms of service violations into federal criminal violations. The CFAA is ostensibly an anti-hacking law, but companies have used it to try to legally restrict access to online data that is otherwise publicly available.

Recent rulings indicate that some U.S. courts may no longer be receptive to the idea that terms of service violations related to scraping public data are CFAA violations. In the first case, hiQ v. LinkedIn, hiQ, a startup that helps employers monitor the public social media profiles of their employees, preemptively sued LinkedIn after LinkedIn attempted to block its access to profile data. The Ninth Circuit granted hiQ preliminary relief that allows it to continue scraping publicly available data. In the second case, Sandvig v. Barr, a case in which I am a plaintiff, the court similarly ruled that scraping public data is not an unauthorized-access issue, and therefore does not implicate the CFAA.

As others have noted, these recent rulings suggest that attempts to cease-and-desist Clearview may fail. At first glance, this seems bad: shouldn’t Twitter and Google be able to police access to the data that people have entrusted to their service?

The fundamental problem with policing access to public data is that it cuts both ways: give companies the right to ban bad actors, and they can also ban good actors. For example, algorithm auditors like myself investigate online services to determine whether they are unfair, deceptive, or politically partisan. We rely on scraping as a fundamental data-gathering tool to hold major tech companies accountable for problems with their systems. If you care about finding and fixing pricing problems in ride-sharing services, or identifying systematic privacy and bias problems in online advertising systems, then independent auditors need to be able to scrape data.
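For contrast, here is a sketch of the kind of aggregate-level measurement an audit relies on. The endpoint and response fields are hypothetical stand-ins for whatever service is under study; the point is that only summary statistics, never identities, are retained.

    # Hypothetical audit: compare price quotes across two neighborhoods.
    import statistics
    import requests

    ENDPOINT = "https://rides.example.com/api/quote"  # hypothetical API

    def sample_prices(origin, n=50):
        """Request n price quotes for trips starting at `origin`."""
        prices = []
        for _ in range(n):
            resp = requests.get(ENDPOINT, params={"from": origin}, timeout=10)
            prices.append(resp.json()["price"])  # hypothetical field
        return prices

    # Only aggregate statistics are kept; no individual is identified.
    print("median, downtown:  ", statistics.median(sample_prices("downtown")))
    print("median, south side:", statistics.median(sample_prices("south_side")))

A use-based rule like BIPA leaves this kind of measurement untouched, because no biometric identifier is ever collected.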

Regulating Identification Technologies

If banning scrapers isn’t the solution, then what is to be done about self-interested companies like HiQ, or brazenly unethical ones like Clearview? The class action lawsuit filed under Illinois' BIPA points to a smarter and more sustainable road forward: don’t go after the data, go after the use.

Bruce Schneier observes that “modern mass surveillance has three broad components: identification, correlation and discrimination”. Recent bans on facial recognition of the kind being sold by Clearview fall into the first category, identification. Bruce convincingly argues that all technologies that enable people to be identified without their knowledge or consent should be heavily regulated, if not banned.

While BIPA is too narrowly tailored to regulate all identification technologies (e.g., it does not cover cookie-based tracking), it is prescient in that it has allowed people to mount an effective challenge against the abuses of companies like Clearview. Further, it does so without the collateral damage of the CFAA: use cases that don’t involve re-identifying individuals, or that only rely on aggregate data, are spared. For example, search engines can continue to index public images from the web, as long as they don’t build search-by-face functionality. Similarly, computational social scientists and algorithm auditors can scrape services for data to facilitate science and investigation at an aggregate level. Laws that regulate specific uses of technology, like biometric identification, protect people rather than empowering online platforms with broad enforcement powers that they can easily abuse.

Clearview AI will not be the last company that attempts to leverage people’s data against them. Technology alone is not the solution to this problem: policymakers and regulators must act. While we can and should regulate the collection of sensitive data (e.g., health records), a narrow focus on data collection risks doing more harm than good. Rather, we can and should regulate companies like Clearview out of existence by targeting unethical and dangerous uses of data. As Selinger and Hartzog note in their call to ban facial recognition:

Our society does not have to allow the spread of new technology that endangers our privacy.


Update 02/06/2020: Facebook and Venmo have also cease-and-desisted Clearview.