s10579-015-9330-7.pdf (910.46 kB)
Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus
journal contribution
posted on 2023-06-15, 20:47 authored by Aleksandar Savkov, John Carroll, Rob Koeling, Jackie Cassell
The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
Funding
Medical Research Council’s licence agreement with MHRA; Medical Research Council
The Farr Institute CIPHER
History
Publication status
- Published
File Version
- Published version
Journal
Language Resources and Evaluation
ISSN
1574-020X
Publisher
Springer Verlag
External DOI
Issue
3
Volume
50
Page range
523-548
Department affiliated with
- BSMS Publications
Full text available
- Yes
Peer reviewed?
- Yes
Legacy Posted Date
2016-01-25
First Open Access (FOA) Date
2016-01-25
First Compliant Deposit (FCD) Date
2016-01-25