Data is the currency of the new age, and while you might not think knowing your preference on washing powder is of any particular importance, it’s exceptionally valuable to the right companies.
Users, however, are increasingly becoming aware of the value of their privacy, and so it’s become important for companies to be able to gather data, while still retaining the anonymity — and trust — of their user base.
Back at the launch of iOS 10, Apple revealed it was trialing a system of gathering user data known as “differential privacy”. This system, when done correctly, allows for a company to gather large amounts of user data, without identifying the specific user that sent the data in the first place.
Simply put, it’s a method that allows Apple to study the forest without being able to see the trees. Another way of explaining it would be like viewing a city from high above: you can study the lay of the land and the way the city blocks are laid out, but you can’t see the individual people or know who owns which property. For extra security, Apple encrypts the data locally, on the device it’s sent from, rather than at Apple’s central servers.
Back in 2016, Apple was only using the method to gather data from the iOS 10 keyboard, Spotlight searches, and Notes. Now, after a successful trial on iOS 10, the company has expanded the remit to gather data from more of iOS, including from Safari. As before, the system is opt-in, so users will have to enable the setting themselves in their system preferences.
Why does Apple need this data, and what does it use it for? The data allows Apple to respond quickly to user trends and deliver a better experience on its platform wherever possible. For instance, data on emoji use in various languages, or information on the latest trends inspiring use of foreign-language words such as “Despacito,” can be used to optimize the iOS keyboard to show more relevant suggestions. The data also identifies which websites demand the most resources in Safari, allowing Apple to optimize around those sites.
Everyone wants your data, and they’re getting it — it’s invaluable information that can grow many aspects of a business, from improving the very service offered to expanding the revenue stream. That’s the price of using free services from the likes of Google, Facebook, and a plethora of other companies, including Apple.
But Apple has become the paragon of privacy ever since it stood up to the FBI in the San Bernardino, California, shootings case. And to match its new privacy-first mindset, the iPhone maker is essentially limiting the amount of data it collects on people, while still keeping things anonymous. It’s all thanks to a method it’s implementing in iOS 10 called differential privacy.
Craig Federighi, Apple’s senior vice president of software, reminded us that Apple doesn’t build user profiles. And services like iMessage, HomeKit, and FaceTime use end-to-end encryption to protect data, which means law enforcement, criminals, or even Apple can’t access it. Apple has now clarified how differential privacy will work. According to Recode, iOS 10 will be the first time Apple begins to collect differential data. But the key point is that this data collection is opt-in: the user will have to consent.
iOS 10 uses on-device intelligence to accomplish tasks like identifying people, objects, and scenes in Photos, and to power suggestions for the keyboard. The image recognition features rely not on users’ cloud-stored photos but on other data sets, though Apple has not clarified which ones.
“When it comes to performing analysis of your data,” Federighi said at the Worldwide Developers Conference keynote, “we’re doing it on your devices, keeping your personal data under your control.”
That analysis happens on the device rather than in the cloud, unlike Google’s data analysis. With differential privacy, Apple is trying to show that gathering user data on mobile devices doesn’t always have to mean sacrificing a user’s privacy.
What is differential privacy and how does it work?
Differential privacy is a mathematical technique that has been studied for several years. It’s a method to gather data on a large group of people while learning as little as possible about individuals in that group.
“Starting with iOS 10, Apple is using technology called differential privacy to help discover the usage patterns of a large number of users without compromising individual privacy,” Apple writes. “In iOS 10, this technology will help improve QuickType and emoji suggestions, Spotlight deep link suggestions, and Lookup Hints in Notes.”
Basically, your data is randomized and then sent to Apple in bulk along with other users’ data, so that no single submission stands out. The technique gathers popular trends about what people like, want, and do, without ever needing to attach that data to a specific individual. Apple, hackers, or law enforcement won’t be able to tell who this data is coming from, or even if a specific user is part of the data set.
Google actually has been using differential privacy since 2014 in its Chrome browser, but the search giant has opted to name the technique RAPPOR, Randomized Aggregatable Privacy-Preserving Ordinal Response. The people who created RAPPOR describe it best as a technique that allows “the forest of client data to be studied, without permitting the possibility of looking at individual trees.”
“Building on the concept of randomized response, RAPPOR enables learning statistics about the behavior of users’ software while guaranteeing client privacy,” Google writes in a blog post. “The guarantees of differential privacy, which are widely accepted as being the strongest form of privacy, have almost never been used in practice despite intense research in academia. RAPPOR introduces a practical method to achieve those guarantees.”
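The randomized response idea that Google mentions can be illustrated with a classic coin-flip scheme. The Python sketch below is only an illustration of that general idea, not Google’s RAPPOR or Apple’s implementation; the function names, the 50/50 coin, and the simulated population of 100,000 users are all assumptions made for the example.

```python
import random

def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Report the true answer half the time, and a random coin flip otherwise."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5

def estimate_true_rate(reports: list) -> float:
    """Undo the known noise: P(report is yes) = 0.25 + 0.5 * true_rate."""
    observed = sum(reports) / len(reports)
    return (observed - 0.25) / 0.5

# Hypothetical population: 30 percent of 100,000 users would truthfully answer "yes".
rng = random.Random(42)
true_answers = [i < 30_000 for i in range(100_000)]
reports = [randomized_response(t, rng) for t in true_answers]
estimate = estimate_true_rate(reports)
print(f"estimated share of yes answers: {estimate:.3f}")
```

Any single report could be the result of a coin flip, so it says little about the person who sent it, yet the estimate across the whole group lands close to the real 30 percent.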
What parts of iOS 10 use differential privacy?
Apple is bringing this client privacy to the iOS 10 keyboard, Spotlight searches, and Notes. It’s likely that Apple is testing the technique on these services and apps first. If successful, the Cupertino company could extend this data-gathering technique to other services like Maps.
“We believe you should have great features and great privacy,” Federighi said at the keynote. “Differential privacy is a research topic in the areas of statistics and data analytics that uses hashing, subsampling and noise injection to enable … crowdsourced learning while keeping the data of individual users completely private. Apple has been doing some super-important work in this area to enable differential privacy to be deployed at scale.”
This is simplifying it, but hashing turns data into a fixed-length string of seemingly random characters; subsampling means Apple takes only a small part of the data; and noise injection mixes in random data to hide your personal information.
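To make those three steps concrete, here is a rough Python sketch of how a device might combine them before sending anything. Apple has not published its code in this article, so everything here is an assumption for illustration: the bucket count, the sampling rate, the flip probability, and the function names are hypothetical and not tuned for any real privacy guarantee.

```python
import hashlib
import random
from typing import List, Optional

NUM_BUCKETS = 16     # hypothetical length of the hashed report
SAMPLE_RATE = 0.25   # hypothetical fraction of events that get reported
FLIP_PROB = 0.1      # hypothetical chance that each bit is flipped

def hash_to_bucket(word: str) -> int:
    """Hashing: map a word to a bucket index so the raw text is never sent."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS

def privatize(word: str, rng: random.Random) -> Optional[List[int]]:
    """Return a noisy bit vector for the word, or None if the event is skipped."""
    if rng.random() >= SAMPLE_RATE:   # subsampling: most events are never reported
        return None
    bits = [0] * NUM_BUCKETS
    bits[hash_to_bucket(word)] = 1
    # Noise injection: flip each bit independently with probability FLIP_PROB.
    return [b ^ 1 if rng.random() < FLIP_PROB else b for b in bits]

report = privatize("despacito", random.Random(1))
print(report)
```

Because some bits are flipped at random, a single report cannot be traced back to a specific word with certainty, while the server can still add up many reports per bucket and correct for the known flip rate.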
“To obscure an individual’s identity, differential privacy adds mathematical noise to a small sample of the individual’s usage pattern,” the company says in its iOS 10 preview guide. “As more people share the same pattern, general patterns begin to emerge, which can inform and enhance the user experience.”
Throwing more noise into a field of data obscures where the data is coming from, but trends will emerge as more people share the same pattern.
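The trade-off between noise and crowd size can be put into numbers. The short Python sketch below assumes a simple model that is not taken from Apple: each user holds one yes-or-no bit, the bit is flipped with a fixed probability before it is sent, and the server corrects for the flips. The function name and the example values are hypothetical.

```python
import math

def estimate_std_error(true_rate: float, flip_prob: float, num_users: int) -> float:
    """Standard error of the corrected estimate when each bit is flipped with flip_prob."""
    # Probability that a noisy report reads "yes".
    q = true_rate * (1 - flip_prob) + (1 - true_rate) * flip_prob
    # The server estimates true_rate as (observed - flip_prob) / (1 - 2 * flip_prob),
    # so the sampling error of the observed share is scaled by 1 / (1 - 2 * flip_prob).
    return math.sqrt(q * (1 - q) / num_users) / (1 - 2 * flip_prob)

# Hypothetical case: 20 percent of users type a new word, and 40 percent of bits are flipped.
for n in (100, 10_000, 1_000_000):
    print(n, estimate_std_error(0.2, 0.4, n))
```

Under this model the error shrinks with the square root of the number of users, so a hundred times more reports gives an estimate roughly ten times sharper, even though each individual report stays just as noisy.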
How does it make iOS 10 better?
Differential privacy isn’t just for gathering data though — it can actually help improve services.
“There’s this idea where the more privacy you have, the less useful the data is,” says Aaron Roth, a computer science assistant professor at the University of Pennsylvania, who also wrote the book on differential privacy. “There’s some truth to that, but it’s not quite so simple. Privacy can also increase the usefulness of data by preventing this kind of overfitting.”
In the iOS 10 keyboard, Apple is collecting user data to improve suggestions for QuickType and emojification. Rather than relying on and updating your own personal dictionary on your device, Apple will use differential privacy to pinpoint emoji and language trends across all its users. That way, you may end up seeing which emoji are more popular, and new slang may pop up before you even type it.
“Of course, one of the important tools in making software more intelligent is to spot patterns in how multiple users are using their devices,” Federighi said. “For instance, you might want to know what new words are trending so you can offer them up more readily in the QuickType keyboard.”
Spotlight search also benefits from differential privacy. Currently, if you search “Finding Dory,” you’ll get links to articles from the News app and web content, as well as methods to purchase tickets if you have apps like Fandango installed. That’s thanks to deep linking, a feature introduced last year in iOS 9.
But how are these search results ranked? Why does the Apple News article show up before Fandango’s results? Plenty of irrelevant entries also plague search results, and differential privacy helps by surfacing the most popular deep links. So if everyone ignored the article from Apple News and went to Fandango’s result, that’s what will show up first.
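As a rough illustration of ranking by popularity without trusting any single report, the Python sketch below has each device report either the link it tapped or a random one, and the server then corrects the totals and sorts them. This is an assumed scheme for the sake of example, not Apple’s documented method; the link names, the keep probability, and the simulated tap counts are hypothetical.

```python
import random
from collections import Counter

LINKS = ["news_article", "fandango_tickets", "web_result"]  # hypothetical deep-link IDs
KEEP_PROB = 0.5  # hypothetical chance that a device reports its real choice

def noisy_choice(true_link: str, rng: random.Random) -> str:
    """Report the tapped link with KEEP_PROB, otherwise a uniformly random link."""
    if rng.random() < KEEP_PROB:
        return true_link
    return rng.choice(LINKS)

def rank_links(reports: list) -> list:
    """Correct the noisy shares for the random reports, then sort by popularity."""
    counts = Counter(reports)
    uniform_share = (1 - KEEP_PROB) / len(LINKS)
    estimated = {
        link: (counts[link] / len(reports) - uniform_share) / KEEP_PROB
        for link in LINKS
    }
    return sorted(LINKS, key=lambda link: estimated[link], reverse=True)

# Hypothetical taps: most people go to Fandango, few open the news article.
rng = random.Random(7)
true_taps = ["fandango_tickets"] * 21_000 + ["news_article"] * 6_000 + ["web_result"] * 3_000
reports = [noisy_choice(link, rng) for link in true_taps]
ranking = rank_links(reports)
print(ranking)
```

Half of all reports here are random, yet with tens of thousands of them the most-tapped link still rises to the top of the ranking.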
Notes is, as of right now, the only other service that will use differential privacy. Notes is getting a little smarter in iOS 10 — apart from being able to use it with multiple people, the app will also let you interact with text to perform other functions. For example, if you write a name and number in Notes, the app would suggest creating a new contact. These suggestions aren’t random, though. Differential privacy helps Apple know what kind of notes people are making, and it will suggest whatever’s relevant to you.
Do you want smarter devices or private devices?
Roth, the most prominent mind on differential privacy, said Apple’s work is “groundbreaking.” Apple is showing us that you can indeed have smart features while still protecting a user’s privacy — and the company said at the keynote that it’s even collecting less data.
Contrast that with Google’s upcoming messaging app, Allo. The search giant hasn’t enabled end-to-end encryption app-wide, as it wouldn’t be able to offer its special Assistant features within messaging threads. Allo has an Incognito Mode, like Chrome, that will use end-to-end encryption — but this is clearly a compromise. It’s a way to offer some privacy, while still keeping all the smart AI features. Facebook is even reportedly considering a “mode” to enable end-to-end encryption in its Messenger app.
Apple wants to offer smart features and needs your data, too, but the company doesn’t want to compromise its existing end-to-end encryption. So on-device intelligence and differential privacy are its way of addressing that. Surely, this will prompt other major tech companies to re-evaluate the amount of data they collect, and the manner in which they collect it.
In the meantime, we’ll have to wait and see just how “smart” these iOS features will be when compared to what the likes of Google and Facebook can do, and we’ll likely see more information related to how differential privacy works in the new version as the fall approaches.
Updated on 06-24-2016 by Julian Chokkattu: Added in clarification from Apple on when differential data collection will begin, and that it is opt-in.