University of Sussex
journal.pone.0257005.pdf (2.87 MB)

Comparison of machine learning methods for estimating case fatality ratios: an Ebola outbreak simulation study

journal contribution
posted on 2023-06-10, 02:17 authored by Alpha Forna, Ilaria Dorigatti, Pierre Nouvellet, Christl A Donnelly
Background

Machine learning (ML) algorithms are now increasingly used in infectious disease epidemiology. Epidemiologists should understand how ML algorithms behave within the context of outbreak data where missingness of data is almost ubiquitous.

Methods

Using simulated data, we use a ML algorithmic framework to evaluate data imputation performance and the resulting case fatality ratio (CFR) estimates, focusing on the scale and type of data missingness, i.e., missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR).

Results

Across ML methods, dataset sizes and proportions of training data used, the area under the receiver operating characteristic curve decreased by 7% (median, range: 1%–16%) when missingness was increased from 10% to 40%. Overall reduction in CFR bias for MAR across methods, proportion of missingness, outbreak size and proportion of training data was 0.5% (median, range: 0%–11%).

Conclusion

ML methods could reduce bias and increase the precision in CFR estimates at low levels of missingness. However, no method is robust to high percentages of missingness. Thus, a datacentric approach is recommended in outbreak settings: patient survival outcome data should be prioritised for collection and random-sample follow-ups should be implemented to ascertain missing outcomes.
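The three missingness mechanisms named in the abstract can be illustrated with a small simulation. The following Python sketch is not the authors' framework and involves no ML imputation; it only shows how a complete-case CFR (deaths divided by cases with a known outcome) may behave when outcomes go missing under MCAR, MAR or MNAR. All names (`simulate_cases`, `mask_outcomes`, `complete_case_cfr`), the single binary covariate, the CFR of 0.6 and the missingness weights are hypothetical choices made for illustration.

```python
import random


def simulate_cases(n, cfr, seed=0):
    """Simulate n cases as (older, died) pairs.

    Assumption for illustration: half the cases are 'older', and older
    cases have a death risk of cfr + 0.1 versus cfr - 0.1 for the rest,
    so the population-level risk averages to cfr.
    """
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        older = rng.random() < 0.5
        p_death = cfr + 0.1 if older else cfr - 0.1
        died = rng.random() < p_death
        cases.append((older, died))
    return cases


def mask_outcomes(cases, mechanism, rate, seed=1):
    """Return the outcomes with some replaced by None (missing).

    MCAR: every outcome is missing with the same probability.
    MAR:  missingness depends on the observed covariate (older).
    MNAR: missingness depends on the unobserved outcome itself (died),
          so the overall missing fraction only approximates `rate`.
    """
    rng = random.Random(seed)
    observed = []
    for older, died in cases:
        if mechanism == "MCAR":
            p_missing = rate
        elif mechanism == "MAR":
            p_missing = rate * 1.5 if older else rate * 0.5
        elif mechanism == "MNAR":
            p_missing = rate * 1.5 if died else rate * 0.5
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        observed.append(None if rng.random() < p_missing else died)
    return observed


def complete_case_cfr(observed):
    """CFR computed only from cases whose outcome is known."""
    known = [died for died in observed if died is not None]
    return sum(known) / len(known)


cases = simulate_cases(n=20000, cfr=0.6)
true_cfr = sum(died for _, died in cases) / len(cases)

estimates = {
    mechanism: complete_case_cfr(mask_outcomes(cases, mechanism, rate=0.4))
    for mechanism in ("MCAR", "MAR", "MNAR")
}

print(f"true CFR: {true_cfr:.3f}")
for mechanism, estimate in estimates.items():
    print(f"{mechanism}: complete-case CFR {estimate:.3f}, "
          f"bias {estimate - true_cfr:+.3f}")
```

Under these assumptions, the complete-case estimate should stay close to the true CFR for MCAR and fall below it for MAR and MNAR, because deaths are over-represented among the missing outcomes. Imputation methods such as those compared in the article aim to reduce this kind of bias.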

History

Publication status

  • Published

File Version

  • Published version

Journal

PLoS ONE

ISSN

1932-6203

Publisher

Public Library of Science

Issue

9

Volume

16

Page range

1-15

Article number

a0257005

Event location

United States

Department affiliated with

  • Evolution, Behaviour and Environment Publications

Full text available

  • Yes

Peer reviewed?

  • Yes

Legacy Posted Date

2022-01-14

First Open Access (FOA) Date

2022-01-14

First Compliant Deposit (FCD) Date

2022-01-14
