University of Sussex

Identifying undetected dementia in UK primary care patients: a retrospective case-control study comparing machine-learning and standard epidemiological approaches

Version 2 2023-06-12, 09:15
Version 1 2023-06-09, 19:44
journal contribution
posted on 2023-06-12, 09:15 authored by Elizabeth Ford, Philip Rooney, Seb Oliver, Richard Hoile, Pete Hurley, Sube Banerjee, Harm van Marwijk, Jackie Cassell
Background: Identifying dementia early using real-world data is a public health challenge. Only two-thirds of people with dementia in the United Kingdom ultimately receive a formal diagnosis, and many receive it late in the disease process, so there is ample room for improvement. The policy of the UK government and National Health Service (NHS) is to increase rates of timely dementia diagnosis. We used data from general practice (GP) patient records to create a machine-learning model to identify patients who have or are developing dementia but whose condition has not yet been detected by their GP.

Methods: We used electronic patient records from the Clinical Practice Research Datalink (CPRD). Using a case-control design, we selected patients aged over 65 years with a diagnosis of dementia (cases) and matched them 1:1 by sex and age to patients with no evidence of dementia (controls). We developed a list of 70 clinical entities related to the onset of dementia and recorded in the 5 years before diagnosis. After creating binary features, we trialled machine-learning classifiers to discriminate between cases and controls (logistic regression, naïve Bayes, support vector machines, random forest and neural networks). We examined the most important features contributing to discrimination.

Results: The final analysis included data on 93,120 patients, with a median age of 82.6 years; 64.8% were female. The naïve Bayes model performed least well. The logistic regression, support vector machine, neural network and random forest models performed very similarly, with an area under the receiver operating characteristic curve (AUROC) of 0.74. The top features retained in the logistic regression model were disorientation and wandering, behaviour change, schizophrenia, self-neglect, and difficulty managing.

Conclusions: Our model could aid GPs or health service planners with the early detection of dementia. Future work could improve the model by exploring the longitudinal nature of patient data and modelling decline in function over time.
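
As a rough illustration of two steps named in the abstract (1:1 matching of cases to controls by sex and age, and scoring discrimination with AUROC over binary features), the following minimal Python sketch uses only the standard library. It is not the authors' pipeline: the patient records, the feature names and the helper functions `match_controls` and `auroc` are hypothetical, and the score (a simple count of flagged features) is a stand-in for the classifiers trialled in the study.

```python
from collections import defaultdict

# Hypothetical binary features, loosely named after entities mentioned in the abstract.
FEATURES = ["wandering", "behaviour_change", "self_neglect"]


def match_controls(cases, pool):
    """Pair each case with one unused control of the same sex and age (1:1)."""
    available = defaultdict(list)
    for person in pool:
        available[(person["sex"], person["age"])].append(person)
    pairs = []
    for case in cases:
        candidates = available[(case["sex"], case["age"])]
        if candidates:  # a case with no eligible control stays unmatched
            pairs.append((case, candidates.pop()))
    return pairs


def binary_features(person):
    """Encode presence (1) or absence (0) of each feature in the record."""
    return [1 if name in person["codes"] else 0 for name in FEATURES]


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity; ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy records, invented for illustration only.
cases = [
    {"id": "c1", "sex": "F", "age": 82, "codes": {"wandering", "self_neglect"}},
    {"id": "c2", "sex": "M", "age": 79, "codes": {"behaviour_change"}},
]
pool = [
    {"id": "p1", "sex": "F", "age": 82, "codes": set()},
    {"id": "p2", "sex": "M", "age": 79, "codes": {"behaviour_change"}},
    {"id": "p3", "sex": "M", "age": 70, "codes": set()},
]

pairs = match_controls(cases, pool)
labels, scores = [], []
for case, control in pairs:
    for person, label in ((case, 1), (control, 0)):
        labels.append(label)
        scores.append(sum(binary_features(person)))  # stand-in score, not a fitted model

result = auroc(labels, scores)
print(f"matched pairs: {len(pairs)}, AUROC: {result:.3f}")
```

An AUROC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of cases from controls, which gives some context for the reported value of 0.74. In practice a fitted model's predicted probabilities, evaluated on held-out patients, would replace the feature count used here.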

Funding

ASTRODEM: Using astrophysics to close the 'diagnosis gap' for dementia in UK general practice; G1895; Wellcome Trust; 202133/Z/16/Z

History

Publication status

  • Published

File Version

  • Published version

Journal

BMC Medical Informatics and Decision Making

ISSN

1472-6947

Publisher

BMC

Volume

19

Article number

a248

Department affiliated with

  • Primary Care and Public Health Publications

Full text available

  • No

Peer reviewed?

  • Yes

Legacy Posted Date

2019-11-25

First Open Access (FOA) Date

2019-11-25

First Compliant Deposit (FCD) Date

2019-11-22
