Optimising the use of electronic health records to estimate the incidence of rheumatoid arthritis in primary care: what information is hidden in free text?

Ford, Elizabeth, Nicholson, Amanda, Koeling, Rob, Tate, Rosemary, Carroll, John, Axelrod, Lesley, Smith, Helen, Rait, Greta, Davies, Kevin, Petersen, Irene, Williams, Tim and Cassell, Jackie (2013) Optimising the use of electronic health records to estimate the incidence of rheumatoid arthritis in primary care: what information is hidden in free text? BMC Medical Research Methodology, 13 (105). pp. 1-12. ISSN 1471-2288

Full text not available from this repository.

Abstract

Background
Primary care databases are a major source of data for epidemiological and health services research. However, most studies are based on coded information, ignoring information stored in free text. Using the early presentation of rheumatoid arthritis (RA) as an exemplar, our objective was to estimate the extent of data hidden within free text, using a keyword search.
Methods
We examined the electronic health records (EHRs) of 6,387 patients from the UK, aged 30 years and older, with a first coded diagnosis of RA between 2005 and 2008. We listed indicators for RA which were present in coded format and ran keyword searches for similar information held in free text. The frequencies of indicator code groups and keywords from one year before to 14 days after RA diagnosis were compared, and temporal relationships examined.
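As a rough illustration of the kind of windowed keyword search described above, the following Python sketch counts patients with at least one keyword match in free text between one year before and 14 days after the coded diagnosis. It is not the study's actual procedure: the `Entry` record layout, the keyword groups in `PATTERNS`, the sample data, and the treatment of "one year" as 365 days are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Entry:
    """One free-text record entry (hypothetical layout)."""
    patient_id: str
    entry_date: date
    free_text: str


# Illustrative keyword groups only; not the keyword list used in the study.
PATTERNS = {
    "rheumatoid_arthritis": re.compile(r"\brheumatoid arthritis\b", re.IGNORECASE),
    "inflammatory_arthritis": re.compile(r"\binflammatory arthritis\b", re.IGNORECASE),
    "synovitis": re.compile(r"\bsynovitis\b", re.IGNORECASE),
    "rheumatoid_factor": re.compile(r"\brheumatoid factor\b", re.IGNORECASE),
}


def in_window(entry_date, diagnosis_date, days_before=365, days_after=14):
    """True if the entry falls between one year before (assumed to be
    365 days) and 14 days after the coded diagnosis date, inclusive."""
    start = diagnosis_date - timedelta(days=days_before)
    end = diagnosis_date + timedelta(days=days_after)
    return start <= entry_date <= end


def patients_with_keyword(entries, diagnosis_dates):
    """Map each keyword group to the set of patient ids with at least one
    matching free-text entry inside the window around their diagnosis."""
    found = {group: set() for group in PATTERNS}
    for entry in entries:
        diagnosis_date = diagnosis_dates.get(entry.patient_id)
        if diagnosis_date is None or not in_window(entry.entry_date, diagnosis_date):
            continue
        for group, pattern in PATTERNS.items():
            if pattern.search(entry.free_text):
                found[group].add(entry.patient_id)
    return found


# Made-up sample data, for demonstration only.
sample_entries = [
    Entry("p1", date(2006, 3, 1), "Letter from clinic: probable rheumatoid arthritis, synovitis of MCP joints"),
    Entry("p1", date(2004, 1, 1), "Synovitis noted"),  # more than a year before diagnosis
    Entry("p2", date(2007, 5, 20), "Rheumatoid factor positive"),
]
sample_diagnoses = {"p1": date(2006, 4, 1), "p2": date(2007, 5, 10)}

found = patients_with_keyword(sample_entries, sample_diagnoses)
proportions = {group: len(ids) / len(sample_diagnoses) for group, ids in found.items()}
```

A plain pattern match like this ignores negation and context (for example "no synovitis"), so the resulting proportions should be read as upper-bound indications rather than confirmed findings.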
Results
One or more keywords for RA were found in the free text in 29% of patients prior to the RA diagnostic code. Keywords for inflammatory arthritis diagnoses were present for 14% of patients whereas only 11% had a diagnostic code. Codes for synovitis were found in 3% of patients, but keywords were identified in an additional 17%. In 13% of patients there was evidence of a positive rheumatoid factor test in text only, uncoded. No gender differences were found. Keywords generally occurred close in time to the coded diagnosis of rheumatoid arthritis. They were often found under codes indicating letters and communications.
Conclusions
Potential cases may be missed or wrongly dated when coded data alone are used to identify patients with RA, as diagnostic suspicions are frequently confined to text. The use of EHRs to create disease registers or assess quality of care will be misleading if free text information is not taken into account. Methods to facilitate the automated processing of text need to be developed and implemented.

Item Type: Article
Schools and Departments: Brighton and Sussex Medical School > Primary Care and Public Health
School of Engineering and Informatics > Informatics
Brighton and Sussex Medical School > Clinical and Experimental Medicine
Subjects: Q Science > QA Mathematics > QA0075 Electronic computers. Computer science
R Medicine > RA Public aspects of medicine > RA0421 Public health. Hygiene. Preventive Medicine > RA0648.5 Epidemics. Epidemiology. Quarantine. Disinfection
Depositing User: John Carroll
Date Deposited: 28 Feb 2014 09:26
Last Modified: 21 Sep 2017 09:23
URI: http://sro.sussex.ac.uk/id/eprint/47649
Project Name: PREP (Patient Record Enhancement Project)
Sussex Project Number: 086105/Z/08/Z
Funder: Wellcome Trust
Funder Ref: Unset