A recent paper titled “Who Knows What About Me?” from Harvard’s Privacy Tools for Sharing Research Data project surveyed how mobile phone apps share personal data with third parties. Its striking finding: “73% of Android apps shared personal information such as email address with third parties, and 47% of iOS apps shared geo-coordinates and other location data with third parties.” Mobile phones are notorious for the data trails they leave behind, but traditionally that data flowed mainly to telecom operators, who need it for the device to function in the network. Now that other actors have such easy access to people’s mobile data, a clear understanding of these data types is essential.
An assessment of privacy in an information system must necessarily start with a survey of the types of data it processes, also known as data fields or attributes. In a guideline document we published about mobile phone data collection, we differentiate between (i) identifiers, (ii) key-attributes, and (iii) secondary attributes, depending on how identifiable each data type makes a person. Taking a risk-based approach to privacy engineering requires a broader-spectrum view of data types than the EU’s dichotomy of normal versus sensitive personal data, as well as a consideration of the context in which data is collected and processed, and of what the combination with other data fields reveals about an individual.
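To make the three-tier distinction concrete, here is a minimal sketch in Python. The field names and their assigned tiers are hypothetical illustrations, not a definitive classification – as noted above, the right tier for a field depends heavily on context:

```python
from enum import Enum

class Identifiability(Enum):
    IDENTIFIER = "identifier"            # directly identifies a person
    KEY_ATTRIBUTE = "key-attribute"      # identifies with little extra data
    SECONDARY_ATTRIBUTE = "secondary"    # identifying mainly in combination

# Hypothetical example mapping; real classifications are context-dependent.
FIELD_CLASSIFICATION = {
    "national_id": Identifiability.IDENTIFIER,
    "email": Identifiability.KEY_ATTRIBUTE,
    "ip_address": Identifiability.KEY_ATTRIBUTE,
    "gps_location": Identifiability.SECONDARY_ATTRIBUTE,
    "age": Identifiability.SECONDARY_ATTRIBUTE,
}

def riskiest_level(fields):
    """Return the highest identifiability tier among the collected fields."""
    order = [Identifiability.SECONDARY_ATTRIBUTE,
             Identifiability.KEY_ATTRIBUTE,
             Identifiability.IDENTIFIER]
    return max((FIELD_CLASSIFICATION[f] for f in fields), key=order.index)

print(riskiest_level(["gps_location", "email"]).value)  # key-attribute
```

A risk assessment would then treat the whole collection at the level of its riskiest field, since attributes are rarely processed in isolation.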
So, for example, email addresses and IP addresses are likely key-attributes, since not much extra data is needed to identify a person from them. The GPS location of a mobile phone user – without any further data – seems like a fairly innocuous secondary attribute, but research has shown that a significant number of individuals can be re-identified from their unique geospatial traces if no obfuscation techniques are applied. A data collection that combines fields such as geo-location and email address would therefore likely reveal much about an individual, especially when measurements are taken over time.
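The re-identification risk of location traces can be illustrated with a small simulation. This is a toy sketch on synthetic data (the user count, cell count, and sampling scheme are all invented for illustration): it measures what fraction of users are unique in the dataset given only a handful of known (hour, cell) points from their trace:

```python
import random

random.seed(0)

# Synthetic example: 1,000 users, each with one location point per hour,
# drawn from 50 antenna cells over one day (toy numbers, illustration only).
N_USERS, N_CELLS, N_HOURS = 1000, 50, 24
traces = {
    u: [(h, random.randrange(N_CELLS)) for h in range(N_HOURS)]
    for u in range(N_USERS)
}

def unicity(traces, k):
    """Fraction of users whose k randomly chosen known (hour, cell) points
    match no other user's trace -- i.e. who are unique in the dataset."""
    unique = 0
    for trace in traces.values():
        known = set(random.sample(trace, k))
        matches = sum(1 for t in traces.values() if known <= set(t))
        if matches == 1:           # only the user themselves matches
            unique += 1
    return unique / len(traces)

for k in (2, 4):
    print(f"{k} known points: {unicity(traces, k):.0%} of users unique")
```

Even in this crude model, a majority of users are unique given two points and nearly all given four – which is why combining a few secondary attributes can quickly become as identifying as a key-attribute.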
The age-old data minimization principle dictates that only the information needed to operate a particular system should be collected and processed. James Rachels goes a step further, though, in his 1975 paper “Why Privacy is Important” when he argues that the information collected should be viewed as the “kind and degree of knowledge concerning one another which it is appropriate for them to have.” This appropriateness test seems like a useful addition to the principle of data minimization, and encourages system designers to consider the social invasiveness of their system, rather than just its functionality.
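In engineering terms, data minimization is simplest when enforced at the point of collection. A minimal sketch, assuming a hypothetical whitelist of fields the service genuinely needs (the field names here are invented for illustration):

```python
# Hypothetical whitelist of fields the service actually needs to function.
REQUIRED_FIELDS = {"email", "app_version"}

def minimize(record):
    """Drop every field not strictly required -- data minimization applied
    at the point of collection, rather than cleaned up after the fact."""
    return {k: v for k, v in record.items() if k in REQUIRED_FIELDS}

raw = {"email": "a@example.org", "app_version": "2.1",
       "gps_location": (52.52, 13.40), "contacts": ["b@example.org"]}
print(minimize(raw))  # only email and app_version survive
```

The appropriateness test would then operate one level up: not just “can we drop this field?”, but “is it appropriate for us to know it at all?”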