Synthetic Event Time Series Health Data Generation

dc.contributor.author Dash, Saloni
dc.contributor.author Dutta, Ritik
dc.contributor.author Guyon, Isabelle
dc.contributor.author Pavao, Adrien
dc.contributor.author Yale, Andrew
dc.contributor.author Bennett, Kristin P.
dc.date.accessioned 2019-12-03T14:21:43Z
dc.date.available 2019-12-03T14:21:43Z
dc.date.issued 2019-11
dc.identifier.citation Dash, Saloni; Dutta, Ritik; Guyon, Isabelle; Pavao, Adrien; Yale, Andrew and Bennett, Kristin P., "Synthetic event time series health data generation", arXiv, Cornell University Library, DOI: arXiv:1911.06411, Nov. 2019 en_US
dc.identifier.uri http://arxiv.org/abs/1911.06411
dc.identifier.uri https://repository.iitgn.ac.in/handle/123456789/4988
dc.description.abstract Synthetic medical data that preserves privacy while maintaining utility can be used as an alternative to real medical data, which has privacy costs and resource constraints associated with it. At present, most models focus on generating cross-sectional health data, which is not necessarily representative of real data. In reality, medical data is longitudinal in nature, with a single patient having multiple health events, non-uniformly distributed throughout their lifetime. These events are influenced by patient covariates such as comorbidities, age group, gender, etc., as well as external temporal effects (e.g. flu season). While there exist seminal methods to model time series data, it becomes increasingly challenging to extend these methods to medical event time series data. Due to the complexity of the real data, in which each patient visit is an event, we transform the data by using summary statistics to characterize the events for a fixed set of time intervals, to facilitate analysis and interpretability. We then train a generative adversarial network to generate synthetic data. We demonstrate this approach by generating human sleep patterns from a publicly available dataset. We empirically evaluate the generated data and show close univariate resemblance between synthetic and real data. However, we also demonstrate how stratification by covariates is required to gain a deeper understanding of synthetic data quality.
dc.description.statementofresponsibility by Saloni Dash, Ritik Dutta, Isabelle Guyon, Adrien Pavao, Andrew Yale and Kristin P. Bennett
dc.language.iso en_US en_US
dc.publisher Cornell University Library en_US
dc.subject Machine Learning (cs.LG) en_US
dc.subject Computers and Society (cs.CY) en_US
dc.subject Machine Learning (stat.ML) en_US
dc.title Synthetic Event Time Series Health Data Generation en_US
dc.type Pre-Print en_US
dc.relation.journal arXiv

