University of Sussex

Can the use of Bayesian analysis methods correct for incompleteness in electronic health records diagnosis data? Development of a novel method using simulated and real-life clinical data

Version 2 2023-06-12, 09:23
Version 1 2023-06-09, 20:42
journal contribution
posted on 2023-06-12, 09:23 authored by Elizabeth Ford, Philip Rooney, Pete Hurley, Seb Oliver, Stephen Bremner, Jackie Cassell
Background Patient health information is collected routinely in electronic health records (EHRs) and used for research purposes; however, many health conditions are known to be under-diagnosed or under-recorded in EHRs. In research, missing diagnoses result in under-ascertainment of true cases, which attenuates estimated associations between variables and results in a bias towards the null. Bayesian approaches allow prior information, such as the likely rates of missingness in the data, to be specified in the model. This paper describes a Bayesian analysis approach which aimed to reduce attenuation of associations in EHR studies focussed on conditions characterised by under-diagnosis.
Methods Study 1: We created synthetic data designed to mimic structured EHR data in which diagnoses were under-recorded. We fitted logistic regression (LR) models with and without Bayesian priors representing rates of misclassification in the data, and compared the LR parameters estimated by the two sets of models. Study 2: We used EHR data from UK primary care in a case-control design with dementia as the outcome. We fitted LR models examining risk factors for dementia, with and without generic prior information on misclassification rates. We compared the LR parameters estimated by models with and without the priors, and estimated classification accuracy using the Area Under the Receiver Operating Characteristic curve.
Results Study 1: In synthetic data, estimates of LR parameters were much closer to the true parameter values when Bayesian priors were added to the model; with no priors, parameters were substantially attenuated by under-diagnosis. Study 2: The Bayesian approach ran well on real-life clinical data from UK primary care, with the addition of prior information increasing LR parameter values in all cases. In multivariate regression models, Bayesian methods showed no improvement in classification accuracy over traditional LR.
Conclusions The Bayesian approach showed promise but had implementation challenges in real clinical data: prior information on rates of misclassification was difficult to find. Our simple model made a number of assumptions, such as diagnoses being missing at random. Further development is needed to integrate the method into studies using real-life EHR data. Our findings nevertheless highlight the importance of developing methods to address missing diagnoses in EHR data.
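
The attenuation described for Study 1 can be illustrated with a small simulation. The sketch below is not the authors' model: it replaces the Bayesian prior on misclassification with a single assumed, fixed recording probability (`sensitivity`), assumes no false-positive diagnoses, and fits by plain gradient ascent rather than posterior sampling. All names and parameter values (`simulate`, `fit`, a true slope of 1.0, a 60% recording rate) are hypothetical choices made for illustration only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, b0, b1, sensitivity, rng):
    """Synthetic records: a true case is recorded only with probability `sensitivity`."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        true_case = rng.random() < sigmoid(b0 + b1 * x)
        recorded = true_case and (rng.random() < sensitivity)
        xs.append(x)
        ys.append(1 if recorded else 0)
    return xs, ys


def fit(xs, ys, sensitivity, steps=300, lr=1.0):
    """Maximise the likelihood of P(recorded | x) = sensitivity * sigmoid(b0 + b1 * x).

    With sensitivity = 1.0 this is ordinary logistic regression.
    Assumption: sensitivity is treated as known, standing in for a prior.
    """
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            s = sigmoid(b0 + b1 * x)
            p = sensitivity * s
            # Derivative of the log-likelihood with respect to the linear predictor.
            r = (1.0 - s) * (y - p) / (1.0 - p)
            g0 += r
            g1 += r * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


rng = random.Random(0)
TRUE_B0, TRUE_B1, SENSITIVITY = 0.0, 1.0, 0.6  # hypothetical values
xs, ys = simulate(4000, TRUE_B0, TRUE_B1, SENSITIVITY, rng)

naive_b0, naive_b1 = fit(xs, ys, sensitivity=1.0)            # ignores under-recording
corrected_b0, corrected_b1 = fit(xs, ys, sensitivity=SENSITIVITY)

print("true slope:     ", TRUE_B1)
print("naive slope:    ", round(naive_b1, 3))
print("corrected slope:", round(corrected_b1, 3))
```

Under these assumptions the naive slope should fall below the true value, while the fit that accounts for under-recording should land closer to it. In practice the recording probability is uncertain, which is why a prior distribution, rather than a fixed value, is the more natural representation.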

Funding

A citizens’ jury study to understand whether, and under what conditions, the public would accept medical free text being used for research; G2433; EPSRC-ENGINEERING & PHYSICAL SCIENCES RESEARCH COUNCIL

ASTRODEM: Using astrophysics to close the 'diagnosis gap' for dementia in UK general practice.; G1895; WELLCOME TRUST; 202133/Z/16/Z

Testing the feasibility of applied Bayesian probabilistic modeling to maximize the public health value of electronic health record data; G2372; PUBLIC HEALTH ENGLAND

History

Publication status

  • Published

File Version

  • Published version

Journal

Frontiers in Public Health

ISSN

2296-2565

Publisher

Frontiers Media

Volume

8

Article number

a54

Department affiliated with

  • Primary Care and Public Health Publications

Full text available

  • No

Peer reviewed?

  • Yes

Legacy Posted Date

2020-02-25

First Open Access (FOA) Date

2020-03-13

First Compliant Deposit (FCD) Date

2020-02-24
