Research

To date, we've been making progress in three directions: (1) developing new data management abstractions that enhance users' and programmers' visibility over personal data in mobile OSes, (2) developing new auditing tools to increase transparency and oversight over Web services' use of personal data, and (3) investigating designs for responsible data exchanges across Web services. We describe each in turn.

1. New Protection Abstractions for Mobile OSes

Data storage abstractions in OSes have evolved enormously. While traditional OSes provided fairly low-level abstractions -- files and directories -- modern OSes, including Android, iOS, OS X, and recent versions of Windows, embed much higher-level abstractions, such as relational databases or object-relational models. Despite this change in abstraction, many crucial protection systems, such as encryption or deniable systems, still operate at the old file level, which often renders them ineffective and hurts privacy.

We are investigating new data protection abstractions that are more suitable for modern operating systems, including a new logical data object abstraction, which corresponds directly to user-level objects, such as emails, documents, or pictures. Thus far, we've investigated two approaches at opposite ends of the spectrum for implementing logical data objects: (1) exposing a new API to app programmers (CleanOS system, described in an OSDI 2012 paper) and (2) recognizing objects automatically by leveraging structural information from modern storage abstractions (Pebbles system, described in an OSDI 2014 paper). More details here (source code partially available).
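To make the second approach more concrete, here is a minimal sketch of how structural information might be used to group storage rows into one logical object. It is an illustration only, not the Pebbles algorithm: the function name `logical_object`, the `(table, id)` row identifiers, and the `references` map (assumed to be derived from foreign-key relationships) are all hypothetical.

```python
from collections import deque


def logical_object(root, references):
    """Collect every row reachable from `root` by following references.

    `root` is a hypothetical row identifier such as ("email", 1).
    `references` maps a row identifier to the rows it points to; we
    assume this map was built from foreign keys in the app's database.
    """
    seen = {root}
    queue = deque([root])
    while queue:
        row = queue.popleft()
        for child in references.get(row, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical example: one email row that references a body and an attachment.
references = {
    ("email", 1): [("body", 3), ("attachment", 7)],
    ("email", 2): [("body", 4)],
}
email_1 = logical_object(("email", 1), references)
```

Under these assumptions, a protection system could then encrypt or hide all rows in `email_1` together, rather than reasoning about the database file as a whole.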

2. Auditing Tools for Web Services

We're developing new auditing tools to increase transparency and oversight over Web services' use of personal data. Today's Web services accumulate enormous amounts of sensitive information -- such as emails, search logs, or locations -- and use it to target advertisements, prices, or products at users. Presently, users and privacy watchdogs alike have little insight into how this data is used for such purposes.

To enhance transparency, we are building XRay, a Chrome plugin that predicts what data -- such as emails or searches -- is used to target which ads in Gmail, which prices in Amazon, etc. The mechanism is Web-service independent, though the plugin is not. The insight is to compare ads/prices witnessed by different accounts with similar, but not identical, subsets of the data. XRay is described in our USENIX Security 2014 paper and was featured in a NY Times Bits article. This work is in collaboration with Prof. Augustin Chaintreau from Columbia. More details here (source code available).
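The comparison idea above can be sketched as follows. This is a simplified illustration, not XRay's actual mechanism: the function `targeting_scores`, the account and email identifiers, and the scoring rule (the difference in how often an ad appears in accounts with and without a given input) are assumptions made for this example.

```python
def targeting_scores(accounts, observations, ad):
    """Score each input by how strongly its presence correlates with `ad`.

    `accounts` maps an account id to the set of inputs (e.g. emails)
    placed in that account. `observations` maps an account id to the
    set of ads seen there. The score is an assumed, simplified rule:
    (fraction of accounts WITH the input that saw the ad) minus
    (fraction of accounts WITHOUT the input that saw the ad).
    """
    def rate(group):
        if not group:
            return 0.0
        hits = sum(ad in observations.get(a, set()) for a in group)
        return hits / len(group)

    all_inputs = set().union(*accounts.values())
    scores = {}
    for item in all_inputs:
        with_item = [a for a, data in accounts.items() if item in data]
        without_item = [a for a, data in accounts.items() if item not in data]
        scores[item] = rate(with_item) - rate(without_item)
    return scores


# Hypothetical accounts holding overlapping, but not identical, subsets of emails.
accounts = {
    "a1": {"e1", "e2"},
    "a2": {"e1", "e3"},
    "a3": {"e2", "e3"},
    "a4": {"e3"},
}
observations = {"a1": {"shoes"}, "a2": {"shoes"}, "a3": set(), "a4": set()}
scores = targeting_scores(accounts, observations, "shoes")
likely_cause = max(scores, key=scores.get)
```

In this toy data, the ad appears exactly in the accounts that contain `e1`, so `e1` receives the highest score. A real system would need many more accounts and a statistically sound model to cope with noise.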

3. Responsible Data Sharing Abstractions

Data has become the principal asset of the Internet era, which everyone strives to acquire and process. A new economy is emerging, in which striking amounts of continuously changing user data are being sold and shared for others to process. That economy needs to be controlled so that information can be shared efficiently, securely, and with strong semantic guarantees across multiple applications. To this end, we're building Synapse, an easy-to-use, strong-semantics, secure Web programming framework for large-scale, data-driven Web service integrations. An early prototype of Synapse that focuses on the programmability aspects of data integration has been deployed in production at Crowdtap, a NYC startup. It helps them share data between the individual services composing their Web application in an isolated, consistent, and real-time way. A paper on Synapse will appear at EuroSys 2015. More details here (source code available).

CAREER: Responsible Data Management
