University of Sussex
File(s) not publicly available

Automatically estimating the incidence of symptoms recorded in GP free text notes

chapter
posted on 2023-06-07, 23:50 authored by Rob Koeling, A Rosemary Tate, John Carroll
The UK General Practice Research Database (GPRD) is a valuable source of information for health services research. It contains coded data supplemented by free text (physicians' notes and letters). However, due to the difficulty of extracting useful information and the cost of anonymisation, this text is seldom utilised in epidemiological research. We annotated the records of 344 women in the year prior to a diagnosis of ovarian cancer and developed a method for automatically detecting mentions of symptoms in text. We estimated the incidence of five commonly presenting symptoms using: (1) coded symptoms, (2) codes augmented by symptoms automatically extracted from text, and (3) a 'gold standard' dataset of codes and text tagged by three clinically trained annotators. The estimates of incidence of each symptom increased by at least 40% when coded information was enhanced using the manually tagged free text. Our automatic method extracted a significant proportion of this extra information. Our straightforward approach should be extremely useful for medical researchers who wish to validate studies based on codes, or to accurately assess symptoms, using information that can be automatically extracted from unanonymised free text.
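As an illustration of the comparison described in the abstract, the following minimal sketch computes the incidence of one symptom from codes alone and from codes augmented by text mentions, then the relative increase between the two. It is not the authors' implementation: the patient IDs, cohort size, and function names (`incidence`, `relative_increase`) are hypothetical, and incidence is assumed here to mean the proportion of the cohort with at least one record of the symptom.

```python
def incidence(patients_with_symptom, cohort_size):
    """Proportion of the cohort with at least one record of the symptom."""
    return len(patients_with_symptom) / cohort_size


def relative_increase(baseline, enhanced):
    """Relative change from the baseline estimate to the enhanced estimate."""
    return (enhanced - baseline) / baseline


# Hypothetical toy data, NOT taken from the study.
cohort_size = 20
coded = {1, 2, 3, 4, 5, 6, 7, 8}    # patients with a coded symptom
text_mentions = {7, 8, 9, 10, 11, 12}  # patients with the symptom found in free text

# A patient counts once, whether the symptom appears in codes, text, or both.
combined = coded | text_mentions

coded_incidence = incidence(coded, cohort_size)
combined_incidence = incidence(combined, cohort_size)
increase = relative_increase(coded_incidence, combined_incidence)

print(f"coded only:      {coded_incidence:.0%}")
print(f"codes plus text: {combined_incidence:.0%}")
print(f"relative increase: {increase:.0%}")
```

Taking the set union means that a symptom recorded both as a code and in the notes is not double counted, so only patients whose symptom appears solely in the text raise the estimate.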

History

Publication status

  • Published

Publisher

ACM

Page range

43-49

Pages

7

Event name

Proceedings of the First International Workshop on Managing Interoperability and Complexity in Health Systems (MIXHS'11)

Event location

Glasgow, UK

Event type

conference

Book title

Proceedings of the first international workshop / Managing interoperability and complexity in health systems (MIXHS 2011)

Place of publication

New York, NY

ISBN

9781450309547

Series

Conference on Information and Knowledge Management

Department affiliated with

  • Primary Care and Public Health Publications

Full text available

  • No

Peer reviewed?

  • Yes

Legacy Posted Date

2012-02-06
