Generation and evaluation of privacy preserving synthetic health data

dc.contributor.author Yale, Andrew
dc.contributor.author Dash, Saloni
dc.contributor.author Dutta, Ritik
dc.contributor.author Guyon, Isabelle
dc.contributor.author Pavao, Adrien
dc.contributor.author Bennett, Kristin P.
dc.date.accessioned 2020-04-27T05:22:48Z
dc.date.available 2020-04-27T05:22:48Z
dc.date.issued 2020-11
dc.identifier.citation Yale, Andrew; Dash, Saloni; Dutta, Ritik; Guyon, Isabelle; Pavao, Adrien and Bennett, Kristin P., “Generation and evaluation of privacy preserving synthetic health data”, Neurocomputing, DOI: 10.1016/j.neucom.2019.12.136, vol. 416, pp. 244-255, Nov. 2020. en_US
dc.identifier.issn 0925-2312
dc.identifier.uri http://dx.doi.org/10.1016/j.neucom.2019.12.136
dc.identifier.uri https://repository.iitgn.ac.in/handle/123456789/5343
dc.description.abstract We develop metrics for measuring the quality of synthetic health data for both education and research. We use novel and existing metrics to capture a synthetic dataset's resemblance, privacy, utility and footprint. Using these metrics, we develop an end-to-end workflow based on our generative adversarial network (GAN) method, HealthGAN, that creates privacy preserving synthetic health data. Our workflow meets privacy specifications of our data partner: (1) the HealthGAN is trained inside a secure environment; (2) the HealthGAN model is used outside of the secure environment by external users to generate synthetic data. This second step facilitates data handling for external users by avoiding de-identification, which may require special user training, be costly, or cause loss of data fidelity. This workflow is compared against five other baseline methods. While maintaining resemblance and utility comparable to other methods, HealthGAN provides the best privacy and footprint. We present two case studies in which our methodology was put to work in the classroom and research settings. We evaluate utility in the classroom through a data analysis challenge given to students and in research by replicating three different medical papers with synthetic data. Data, code, and the challenge that we organized for educational purposes are available.
dc.description.statementofresponsibility by Andrew Yale, Saloni Dash, Ritik Dutta, Isabelle Guyon, Adrien Pavao and Kristin P. Bennett
dc.language.iso en_US en_US
dc.publisher Elsevier en_US
dc.subject Synthetic data en_US
dc.subject Health data en_US
dc.subject Generative adversarial networks en_US
dc.subject Privacy en_US
dc.title Generation and evaluation of privacy preserving synthetic health data en_US
dc.type Article en_US
dc.relation.journal Neurocomputing


