University of Sussex

Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

journal contribution
posted on 2023-06-15, 20:47 authored by Aleksandar Savkov, John Carroll, Rob Koeling, Jackie Cassell

The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.

Funding

Medical Research Council’s licence agreement with MHRA; Medical Research Council

The Farr Institute CIPHER

History

Publication status

  • Published

File Version

  • Published version

Journal

Language Resources and Evaluation

ISSN

1574-020X

Publisher

Springer Verlag

Issue

3

Volume

50

Page range

523-548

Department affiliated with

  • BSMS Publications

Full text available

  • Yes

Peer reviewed?

  • Yes

Legacy Posted Date

2016-01-25

First Open Access (FOA) Date

2016-01-25

First Compliant Deposit (FCD) Date

2016-01-25
