Privacy Challenges and Solutions for Medical Data Sharing
Various types of data, including demographic, clinical, and genomic information, are increasingly collected and stored in Electronic Medical Record (EMR) systems and biomedical research repositories. Such data have traditionally been used to automate healthcare workflows, but were recently recognized as an invaluable source for performing large-scale, low-cost biological, medical, and healthcare analysis. These tasks are essential for the discovery of new drugs and therapies, and are a key step towards realizing the vision of personalized medicine. As a result, over $50 billion was pledged by the Obama administration in 2009 to promote technologies for managing and sharing medical data. Meanwhile, detailed medical data are increasingly disseminated beyond the institution that collected them, in accordance with data sharing regulations, such as the policy of the National Institutes of Health (NIH) for genomic information. This, however, may pose serious threats to patients’ privacy, which must be eliminated to comply with data sharing policies and legislation, such as the HIPAA Privacy Rule and the EU Directive 95/46/EC.
In this tutorial, we will elaborate on the need for sharing medical data in a privacy-preserving way, review the existing policies and practices for sharing medical data, and present state-of-the-art approaches for ensuring that the disseminated data are privacy-protected and useful. Following that, we will highlight important open problems and future directions. More specifically, the tutorial will consist of three parts. The first part will provide an overview of successful practices and paradigms for sharing and using medical data in applications. We will focus on the analysis and mining tasks supported by different types of medical data, as well as on the privacy threats that data sharing entails. The second part of the tutorial will survey approaches for privacy-preserving medical data sharing. We will address a number of important issues, such as capturing and balancing data utility and privacy in applications, and designing privacy techniques for different types of data and data sharing scenarios. We will also present interesting case studies using data from the US Census and the EMR system of the Vanderbilt University Medical Center, a state-of-the-art system that stores information about 2 million patients over 15 years. In the third part of the tutorial, we will discuss important open problems and provide a roadmap for the future.
By the end of this tutorial, attendees will have a basic understanding of the concepts and underlying principles used to disseminate medical data in a protected and useful form. The tutorial will be accessible to computer science researchers and educators who are interested in data privacy, data mining, and information systems, as well as to industry developers. By focusing on open problems, we also hope to encourage graduate students to conduct research in this emerging and interesting field.
Aris Gkoulalas-Divanis is a Research Staff Member in the Information Analytics Lab at IBM Research-Zurich. Prior to that, he was a postdoctoral research fellow in the Health Information Privacy LABoratory (HIPLAB) at Vanderbilt University (2009-2010), working on privacy for medical data. Aris received the Diploma from the University of Ioannina, the MS from the University of Minnesota, and the PhD from the University of Thessaly, all in Computer Science. His PhD dissertation was awarded the Certificate of Recognition and Honorable Mention in the 2009 ACM SIGKDD. His research interests are in the areas of data mining, privacy-preserving data mining, privacy in medical data, and knowledge hiding.
Grigorios Loukides is a postdoctoral research fellow in the Health Information Privacy LABoratory (HIPLAB), Vanderbilt University. He received a Diploma from the University of Crete (2005) and a PhD from Cardiff University (2009), both in Computer Science. Grigorios’ research interests are in privacy-preserving data mining and biomedical informatics. He has investigated both theoretical and practical research aspects, including algorithmic design, optimization, and formal modeling, and explored interesting applications in healthcare and business.