Big Data Privacy Analysis

Because information storage media are inexpensive and widely available, individuals worldwide produce and retain exponentially growing amounts of data, whether captured as text, images, or sound. Analysis of these Big Data repositories opens fascinating opportunities for discovering new insights across different branches of science. The potential of Big Data comes at a price, however: users’ privacy is often at risk. Current Big Data analytics and mining practices offer only limited guarantees of conformance to privacy terms and regulations.

Unlike relational databases, which exhibit a clear structure, Big Data is characterized by its unstructured nature and by the variety of its data types, including both textual and audio-visual material. Only the Big Data applications themselves encapsulate the logic that makes sense of such unstructured repositories. Hence, our work provides tools and frameworks for building trusted Big Data applications. Using our framework, Big Data developers can verify that their code complies with privacy agreements and that users’ sensitive information is kept private even as the applications and/or privacy regulations change. Our work investigates the following research questions:

- RQ1: How can we formally specify privacy? Can we devise machine-readable privacy rules?

- RQ2: How can we extract privacy rules from natural-language descriptions and formalize regulations such as HIPAA?

- RQ3: How can we leverage the formal definition of privacy to reason about privacy conformance for a piece of code?

- RQ4: How can we automatically generate tests from a formal specification of privacy?
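As a concrete illustration of RQ1, a machine-readable privacy rule can be as simple as a typed record that links a category of sensitive data to an operation and a verdict. The sketch below is hypothetical: the `PrivacyRules` class, the `DataCategory` and `Operation` enums, the health-data categories, and the default-deny choice are assumptions made for illustration, not JPrivacy’s formal model.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical machine-readable privacy rules (illustrative only; this is
 * not JPrivacy's actual rule model).
 */
public class PrivacyRules {

    /** Illustrative categories of sensitive data, loosely inspired by health records. */
    public enum DataCategory { NAME, SSN, DIAGNOSIS, ZIP_CODE }

    /** Illustrative operations an application may perform on that data. */
    public enum Operation { READ, AGGREGATE, PUBLISH, LOG }

    /** One rule: the operation on the category is permitted (true) or forbidden (false). */
    public record Rule(String id, DataCategory category, Operation operation, boolean permitted) {}

    /** Returns the first rule that mentions this (category, operation) pair, if any. */
    public static Optional<Rule> find(List<Rule> rules, DataCategory category, Operation op) {
        return rules.stream()
                .filter(r -> r.category() == category && r.operation() == op)
                .findFirst();
    }

    /** Default-deny: an operation is allowed only if a matching rule explicitly permits it. */
    public static boolean isAllowed(List<Rule> rules, DataCategory category, Operation op) {
        return find(rules, category, op).map(Rule::permitted).orElse(false);
    }

    public static void main(String[] args) {
        List<Rule> policy = List.of(
                new Rule("R1", DataCategory.DIAGNOSIS, Operation.AGGREGATE, true),
                new Rule("R2", DataCategory.DIAGNOSIS, Operation.PUBLISH, false),
                new Rule("R3", DataCategory.ZIP_CODE, Operation.PUBLISH, true));

        System.out.println(isAllowed(policy, DataCategory.DIAGNOSIS, Operation.AGGREGATE)); // true
        System.out.println(isAllowed(policy, DataCategory.DIAGNOSIS, Operation.PUBLISH));   // false
        System.out.println(isAllowed(policy, DataCategory.SSN, Operation.LOG));             // false: no rule, denied by default
    }
}
```

Because such rules are plain data, a checker (RQ3) or a test generator (RQ4) could consume them directly; a realistic formalization would also need conditions such as purpose, consent, or de-identification, which this sketch omits.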

In this paper, we address RQ1 and RQ3. We present JPrivacy, a privacy profiling system for Java code. JPrivacy is based on a formal model of privacy rules and provides the algorithms and related tools to check Java code against these rules. Figure 1 shows the JPrivacy framework and its main components. JPrivacy takes as input a Java application and a natural-language description of privacy terms. It formalizes the privacy terms and checks the application’s code for potential violations of these terms. JPrivacy can also leverage these terms to generate test cases, which help ensure that an application continues to comply with the privacy regulations as the code and the underlying Big Data repositories evolve.
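One simple way to picture the code-checking step is a taint-style data-flow analysis: labels attached to sensitive sources are propagated through assignments, and each sink call is compared against forbidden (label, sink) pairs. The sketch below is hypothetical and deliberately tiny: it runs over a toy list of straight-line statements rather than real Java source or bytecode, and the names `ConformanceCheck`, `Stmt`, `Forbidden`, `log`, and `publish` are invented for illustration. It is not JPrivacy’s algorithm.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical taint-style privacy conformance check over a toy program
 * (illustrative only; not JPrivacy's algorithm). Handles straight-line code:
 * no branches, loops, or method calls.
 */
public class ConformanceCheck {

    /** A toy statement: either "target = sources..." (sink == null) or "sink(arg)". */
    public record Stmt(String target, List<String> sources, String sink) {
        public static Stmt assign(String target, String... sources) {
            return new Stmt(target, List.of(sources), null);
        }

        public static Stmt call(String sink, String arg) {
            return new Stmt(null, List.of(arg), sink);
        }
    }

    /** A policy entry: data carrying this label must never reach this sink. */
    public record Forbidden(String label, String sink) {}

    /** Propagates labels forward and reports each forbidden label that reaches a sink. */
    public static List<String> check(List<Stmt> program,
                                     Map<String, Set<String>> initialLabels,
                                     Set<Forbidden> policy) {
        Map<String, Set<String>> labels = new HashMap<>();
        initialLabels.forEach((name, l) -> labels.put(name, new HashSet<>(l)));
        List<String> violations = new ArrayList<>();
        for (Stmt s : program) {
            // Union of the labels carried by every variable this statement reads
            // (TreeSet keeps the report order deterministic).
            Set<String> flowing = new TreeSet<>();
            for (String src : s.sources()) {
                flowing.addAll(labels.getOrDefault(src, Set.of()));
            }
            if (s.sink() == null) {
                // Assignment: the target now carries exactly the incoming labels.
                labels.put(s.target(), flowing);
            } else {
                // Sink call: every incoming label is checked against the policy.
                for (String label : flowing) {
                    if (policy.contains(new Forbidden(label, s.sink()))) {
                        violations.add(label + " reaches " + s.sink());
                    }
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // Toy program: a diagnosis is copied into a message that is then logged,
        // while only a non-sensitive visit id feeds the published count.
        List<Stmt> program = List.of(
                Stmt.assign("msg", "diagnosis", "visitId"),
                Stmt.call("log", "msg"),
                Stmt.assign("count", "visitId"),
                Stmt.call("publish", "count"));
        Map<String, Set<String>> sources = Map.of("diagnosis", Set.of("DIAGNOSIS"));
        Set<Forbidden> policy = Set.of(
                new Forbidden("DIAGNOSIS", "log"),
                new Forbidden("DIAGNOSIS", "publish"));

        System.out.println(check(program, sources, policy)); // [DIAGNOSIS reaches log]
    }
}
```

A real analysis of Java code would have to follow data through fields, method calls, collections, and control flow; the sketch only shows the overall shape of checking flows against formalized rules.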

This work is done in collaboration with M. Brian Blake’s research group at the University of Miami.

Publications

  • Mohamed Abdellatif, Iman Saleh, and M. Brian Blake, "JPrivacy: A Java Privacy Profiling Framework for Big Data Applications", International Workshop on Collaborative Big Data, October 2014