It has long been known that this is a difficult problem. Here is a considerable piece on the subject:
Is Data Privacy Real? Don’t Bet on It
Knowledge@Wharton (North America), Aug 23, 2019
In 2009, Netflix was sued for releasing movie ratings data from half a million subscribers who were identified only by unique ID numbers. The video streaming service divulged this "anonymized" information to the public as part of its Netflix Prize contest, in which participants were asked to use the data to develop a better content recommendation algorithm. But researchers from the University of Texas showed that as few as six movie ratings could be used to identify users. A closeted lesbian sued Netflix, saying her anonymity was compromised. The lawsuit was settled in 2010.
The Netflix case reveals a problem the public is only beginning to learn about, but that data analysts and computer scientists have known about for years. In "anonymized" datasets, where distinguishing characteristics of a person such as name and address have been deleted, even a handful of seemingly innocuous data points can lead to identification. When this data is used to serve ads or personalize product recommendations, re-identification can be largely harmless. The danger is that the data can be, and sometimes is, used to make assumptions about future behavior or inferences about one's private life, leading to rejection for a loan, a job or worse.
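To make the mechanism concrete, here is a minimal sketch in Python of a so-called linkage attack: an "anonymized" table is joined to an auxiliary public table on shared quasi-identifiers such as ZIP code, birth year and gender. All dataset contents, column names and values below are invented for illustration; they are not drawn from the Netflix data or any real dataset.

import pandas as pd

# Hypothetical "anonymized" records: names removed, but quasi-identifiers
# (ZIP code, birth year, gender) remain next to the sensitive attribute.
anonymized = pd.DataFrame({
    "zip":        ["19104", "19104", "60614"],
    "birth_year": [1985, 1972, 1990],
    "gender":     ["F", "M", "M"],
    "diagnosis":  ["diabetes", "asthma", "hypertension"],
})

# Hypothetical auxiliary data (e.g. a voter roll or public profile)
# that pairs names with the same quasi-identifiers.
public = pd.DataFrame({
    "name":       ["Alice Smith", "Bob Jones"],
    "zip":        ["19104", "60614"],
    "birth_year": [1985, 1990],
    "gender":     ["F", "M"],
})

# Joining on the shared quasi-identifiers re-attaches names to "anonymous" rows.
reidentified = anonymized.merge(public, on=["zip", "birth_year", "gender"])
print(reidentified[["name", "diagnosis"]])

Whenever a combination of quasi-identifiers is unique in both tables, the join pins the "anonymous" record to a single named person; no name ever needs to appear in the released data.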
A research paper published in Nature Communications last month showed how easy re-identification can be: a computer algorithm could identify 99.98% of Americans from as few as 15 attributes per person, without using names or other unique data. Even earlier, a 2012 study showed that just by tracking people's Facebook "Likes," researchers could determine whether someone was Caucasian or African-American (95% accuracy), male or female (93%), or gay (88%), and whether they drank alcohol (70%) or used drugs (65%).
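The statistical model in the Nature Communications paper is more sophisticated, but the basic intuition can be shown with a simple empirical count: what share of rows in a table is already unique on a given set of attributes? The sketch below (Python, with invented toy data and column names) illustrates that idea; it is not the paper's actual method.

import pandas as pd

def fraction_unique(df, attributes):
    """Share of rows singled out uniquely by the given combination of attributes."""
    group_sizes = df.groupby(list(attributes)).size()
    # Each group of size 1 corresponds to exactly one uniquely identifiable row.
    return (group_sizes == 1).sum() / len(df)

# Hypothetical toy data: a few coarse attributes already isolate every row.
people = pd.DataFrame({
    "zip":        ["19104", "19104", "19104", "60614", "60614"],
    "birth_year": [1985, 1985, 1972, 1990, 1990],
    "gender":     ["F", "M", "M", "F", "M"],
})

print(fraction_unique(people, ["zip"]))                          # 0.0: ZIP alone singles out nobody
print(fraction_unique(people, ["zip", "birth_year", "gender"]))  # 1.0: all five rows are unique

The more attributes an observer knows, the smaller these groups become, and once a group shrinks to one, that record is, in effect, a named individual.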
This is not news to people in the industry — but it is to the public. “Most people don’t realize that even if personal information is stripped away or is not collected directly, it’s often possible to link certain information with a person’s identity by correlating the information with other datasets,” says Kevin Werbach, Wharton legal studies and business ethics professor and author of the book, The Blockchain and the New Architecture of Trust. “It’s a challenging issue because there are so many different kinds of uses data could be put to.” Werbach is a faculty affiliate of the Warren Center for Network and Data Sciences, a research center of Penn faculty who study innovation in interconnected social, economic and technological systems.
For example, telecom companies routinely sell phone location information to data aggregators, which in turn sell it to just about anyone, according to a January 2019 article in Vice. These data buyers could include landlords screening potential renters, debt collectors tracking deadbeats or a jealous boyfriend stalking a former flame. One data aggregator was able to find an individual's full name and address as well as continuously track the phone's location. This case, the article says, shows "just how exposed mobile networks and the data they generate are, leaving them open to surveillance by ordinary citizens, stalkers, and criminals." ...