Can the use of Bayesian analysis methods correct for incompleteness in electronic health records diagnosis data? Development of a novel method using simulated and real-life clinical data
Version 2 2023-06-12, 09:23
Version 1 2023-06-09, 20:42
journal contribution
posted on 2023-06-12, 09:23 authored by Elizabeth Ford, Philip Rooney, Pete Hurley, Seb Oliver, Stephen Bremner, Jackie Cassell

Background
Patient health information is collected routinely in electronic health records (EHRs) and used for research purposes; however, many health conditions are known to be under-diagnosed or under-recorded in EHRs. In research, missing diagnoses result in under-ascertainment of true cases, which attenuates estimated associations between variables and results in a bias towards the null. Bayesian approaches allow prior information, such as the likely rates of missingness in the data, to be specified in the model. This paper describes a Bayesian analysis approach which aimed to reduce attenuation of associations in EHR studies focussed on conditions characterised by under-diagnosis.

Methods
Study 1: We created synthetic data, produced to mimic structured EHR data where diagnoses were under-recorded. We fitted logistic regression (LR) models with and without Bayesian priors representing rates of misclassification in the data. We examined the LR parameters estimated by models with and without priors.
Study 2: We used EHR data from UK primary care in a case-control design with dementia as the outcome. We fitted LR models examining risk factors for dementia, with and without generic prior information on misclassification rates. We examined LR parameters estimated by models with and without the priors, and estimated classification accuracy using the area under the receiver operating characteristic curve.

Results
Study 1: In synthetic data, estimates of LR parameters were much closer to the true parameter values when Bayesian priors were added to the model; with no priors, parameters were substantially attenuated by under-diagnosis.
Study 2: The Bayesian approach ran well on real-life clinical data from UK primary care, with the addition of prior information increasing LR parameter values in all cases. In multivariate regression models, Bayesian methods showed no improvement in classification accuracy over traditional LR.

Conclusions
The Bayesian approach showed promise but had implementation challenges in real clinical data: prior information on rates of misclassification was difficult to find. Our simple model made a number of assumptions, such as diagnoses being missing at random. Further development is needed to integrate the method into studies using real-life EHR data. Our findings nevertheless highlight the importance of developing methods to address missing diagnoses in EHR data.
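
The attenuation described in the abstract can be illustrated with a small simulation. The following is a minimal sketch and not the authors' model: it assumes perfect specificity, treats the recording sensitivity as a known constant instead of placing a Bayesian prior on it, and fits by maximum likelihood. The function names (`simulate`, `fit`) and all parameter values are hypothetical, chosen only for illustration.

```python
import math
import random

def sigmoid(z):
    # Clamp the linear predictor so probabilities stay strictly inside (0, 1).
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def simulate(n, beta0, beta1, sensitivity, seed=0):
    """Synthetic records: each true case is recorded with probability `sensitivity`."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        true_case = rng.random() < sigmoid(beta0 + beta1 * x)
        recorded = true_case and rng.random() < sensitivity
        xs.append(x)
        ys.append(1 if recorded else 0)
    return xs, ys

def log_lik(b0, b1, xs, ys, s):
    total = 0.0
    for x, y in zip(xs, ys):
        q = s * sigmoid(b0 + b1 * x)  # P(recorded = 1 | x)
        total += math.log(q) if y else math.log(1.0 - q)
    return total

def fit(xs, ys, s=1.0, iters=50):
    """Fisher scoring for P(recorded=1 | x) = s * sigmoid(b0 + b1*x).

    s = 1.0 is ordinary logistic regression; s < 1 is the assumed (known)
    sensitivity of diagnosis recording.
    """
    b0 = b1 = 0.0
    ll = log_lik(b0, b1, xs, ys, s)
    for _ in range(iters):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            q = s * p
            score = (y - q) * (1.0 - p) / (1.0 - q)   # d log-lik / d eta
            w = q * (1.0 - p) ** 2 / (1.0 - q)        # expected information
            u0 += score
            u1 += score * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        # Halve the step until the log-likelihood does not decrease.
        step = 1.0
        while step > 1e-6 and log_lik(b0 + step * d0, b1 + step * d1, xs, ys, s) < ll:
            step /= 2.0
        b0 += step * d0
        b1 += step * d1
        new_ll = log_lik(b0, b1, xs, ys, s)
        if abs(new_ll - ll) < 1e-9:
            break
        ll = new_ll
    return b0, b1

# Illustrative values only: true slope 1.5, half of true cases recorded.
xs, ys = simulate(10000, beta0=0.0, beta1=1.5, sensitivity=0.5, seed=1)
naive = fit(xs, ys, s=1.0)      # ignores under-recording
adjusted = fit(xs, ys, s=0.5)   # assumes the sensitivity is known
print("naive slope:", round(naive[1], 2), "adjusted slope:", round(adjusted[1], 2))
```

In this sketch the naive slope is pulled towards zero, while the adjusted fit recovers a value closer to the simulated slope. A Bayesian version along the lines the abstract describes would replace the fixed `s` with a prior distribution over misclassification rates.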
Funding
A citizens’ jury study to understand whether, and under what conditions, the public would accept medical free text being used for research; G2433; EPSRC-ENGINEERING & PHYSICAL SCIENCES RESEARCH COUNCIL
ASTRODEM: Using astrophysics to close the 'diagnosis gap' for dementia in UK general practice.; G1895; WELLCOME TRUST; 202133/Z/16/Z
Testing the feasibility of applied Bayesian probabilistic modeling to maximize the public health value of electronic health record data; G2372; PUBLIC HEALTH ENGLAND
History
Publication status
- Published
File Version
- Published version
Journal
Frontiers in Public Health
ISSN
2296-2565
Publisher
Frontiers Media
External DOI
Volume
8
Article number
a54
Department affiliated with
- Primary Care and Public Health Publications
Full text available
- No
Peer reviewed?
- Yes
Legacy Posted Date
2020-02-25
First Open Access (FOA) Date
2020-03-13
First Compliant Deposit (FCD) Date
2020-02-24