Can the use of Bayesian analysis methods correct for incompleteness in electronic health records diagnosis data? Development of a novel method using simulated and real-life clinical data

Ford, Elizabeth, Rooney, Philip, Hurley, Peter, Oliver, Seb, Bremner, Stephen and Cassell, Jackie (2020) Can the use of Bayesian analysis methods correct for incompleteness in electronic health records diagnosis data? Development of a novel method using simulated and real-life clinical data. Frontiers in Public Health, 8. a54. ISSN 2296-2565

PDF (Accepted Author's version) - Accepted Version (527kB)
Restricted to SRO admin only
Available under License Creative Commons Attribution.

PDF - Published Version (469kB)
Available under License Creative Commons Attribution.

Abstract

Background
Patient health information is collected routinely in electronic health records (EHRs) and used for research purposes; however, many health conditions are known to be under-diagnosed or under-recorded in EHRs. In research, missing diagnoses result in under-ascertainment of true cases, which attenuates estimated associations between variables and results in a bias towards the null. Bayesian approaches allow prior information, such as the likely rates of missingness in the data, to be specified in the model. This paper describes a Bayesian analysis approach which aimed to reduce attenuation of associations in EHR studies focussed on conditions characterised by under-diagnosis.
Methods
Study 1: We created synthetic data, produced to mimic structured EHR data where diagnoses were under-recorded. We fitted logistic regression (LR) models with and without Bayesian priors representing rates of misclassification in the data. We examined the LR parameters estimated by models with and without priors.
Study 2: We used EHR data from UK primary care in a case-control design with dementia as the outcome. We fitted LR models examining risk factors for dementia, with and without generic prior information on misclassification rates. We examined LR parameters estimated by models with and without the priors, and estimated classification accuracy using the area under the receiver operating characteristic curve (AUROC).
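As an illustration of the Study 1 idea, the following minimal Python sketch simulates under-recorded diagnoses and compares a naive LR fit with one that accounts for misclassification. It is not the authors' implementation, and it makes several assumptions that are not in the paper: a single binary risk factor, perfect specificity, true coefficients of -1 and 1, and a sensitivity of 0.6 that is treated as known exactly (a fixed value standing in for a prior distribution on the misclassification rate). The function names `simulate` and `fit` are hypothetical. Because the risk factor is binary, the LR model is saturated and its maximum-likelihood estimates have a closed form, so no iterative fitting is needed.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def simulate(n, beta0, beta1, sensitivity, rng):
    """Return counts[x] = [n_patients, n_recorded_cases] for a binary risk factor x."""
    counts = {0: [0, 0], 1: [0, 0]}
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        true_case = rng.random() < sigmoid(beta0 + beta1 * x)
        # Assumption: a true case is recorded with probability `sensitivity`;
        # non-cases are never recorded as cases (specificity of 1).
        recorded = true_case and rng.random() < sensitivity
        counts[x][0] += 1
        counts[x][1] += int(recorded)
    return counts


def fit(counts, sensitivity=1.0):
    """Maximum-likelihood (beta0, beta1) for the model
    P(recorded | x) = sensitivity * sigmoid(beta0 + beta1 * x).

    With sensitivity=1.0 this is ordinary logistic regression on the
    recorded diagnoses. Requires recorded proportion / sensitivity < 1.
    """
    p0 = counts[0][1] / counts[0][0] / sensitivity
    p1 = counts[1][1] / counts[1][0] / sensitivity
    b0 = logit(p0)
    return b0, logit(p1) - b0


# Illustrative values only (assumptions, not taken from the paper).
TRUE_BETA0, TRUE_BETA1, SENSITIVITY = -1.0, 1.0, 0.6

rng = random.Random(0)
counts = simulate(100_000, TRUE_BETA0, TRUE_BETA1, SENSITIVITY, rng)

naive = fit(counts)                               # ignores under-recording
adjusted = fit(counts, sensitivity=SENSITIVITY)   # accounts for it

print("true    :", (TRUE_BETA0, TRUE_BETA1))
print("naive   :", naive)
print("adjusted:", adjusted)
```

Under these assumptions the naive slope is pulled towards zero, while the adjusted slope should land near the true value. In a fuller Bayesian treatment the sensitivity would be given a prior distribution rather than fixed, and the result would depend on how well that prior matches the real recording rate.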
Results
Study 1: In synthetic data, estimates of LR parameters were much closer to the true parameter values when Bayesian priors were added to the model; with no priors, parameters were substantially attenuated by under-diagnosis.
Study 2: The Bayesian approach ran well on real-life clinical data from UK primary care, with the addition of prior information increasing LR parameter values in all cases. In multivariate regression models, Bayesian methods showed no improvement in classification accuracy over traditional LR.
Conclusions
The Bayesian approach showed promise but had implementation challenges in real clinical data: prior information on rates of misclassification was difficult to find. Our simple model made a number of assumptions, such as diagnoses being missing at random. Further development is needed to integrate the method into studies using real-life EHR data. Our findings nevertheless highlight the importance of developing methods to address missing diagnoses in EHR data.

Item Type: Article
Keywords: Electronic Health Records, Patient Data, Data Quality, Missing Data, Bayesian Analysis, Methodology
Schools and Departments: Brighton and Sussex Medical School > Primary Care and Public Health
School of Mathematical and Physical Sciences > Physics and Astronomy
Subjects: Q Science > QA Mathematics > QA0273 Probabilities. Mathematical statistics > QA0274.7 Markov processes. Markov chains
R Medicine > R Medicine (General) > R858 Computer applications to medicine. Medical informatics
R Medicine > R Medicine (General) > R864 Medical records
R Medicine > RA Public aspects of medicine > RA0001 Medicine and the state. Including medical statistics, medical economics, provisions for medical care, medical sociology
Depositing User: Elizabeth Ford
Date Deposited: 25 Feb 2020 09:47
Last Modified: 13 Mar 2020 12:15
URI: http://sro.sussex.ac.uk/id/eprint/90071

Project Name: ASTRODEM: Using astrophysics to close the 'diagnosis gap' for dementia in UK general practice.
Sussex Project Number: G1895
Funder: WELLCOME TRUST
Funder Ref: 202133/Z/16/Z