Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

Savkov, Aleksandar, Carroll, John, Koeling, Rob and Cassell, Jackie (2016) Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus. Language Resources and Evaluation, 50 (3). pp. 523-548. ISSN 1574-020X

PDF - Published Version (948kB)
Available under License Creative Commons Attribution.

Abstract

The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult for existing natural language analysis tools to process, since they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.

Item Type: Article
Keywords: Corpus annotation, Annotation guidelines, Clinical text, Chunking, Named entities
Schools and Departments: Brighton and Sussex Medical School > Brighton and Sussex Medical School
Brighton and Sussex Medical School > Primary Care and Public Health
School of Engineering and Informatics > Informatics
Subjects: P Language and Literature > P Philology. Linguistics > P0098 Computational linguistics. Natural language processing
R Medicine > R Medicine (General)
Depositing User: Jane Hale
Date Deposited: 25 Jan 2016 12:31
Last Modified: 25 Mar 2017 04:34
URI: http://sro.sussex.ac.uk/id/eprint/59419

Project Name | Sussex Project Number | Funder | Funder Ref
Medical Research Council’s licence agreement with MHRA | Unset | Medical Research Council | Unset
Unset | Unset | The Farr Institute CIPHER | Unset