CLEAR (Computational Language and EducAtion Research)

End-to-End Systems

SHARP: Strategic Health IT Advanced Research Projects

*** The SHARP program is a joint effort between the Mayo Clinic and the University of Colorado Boulder. It is funded by HHS. ***

Much clinical data is captured as free text, for example in radiology and pathology notes. Natural language processing systems can extract structured information from such notes so that it can be searched (e.g. for a diagnosis), compared (e.g. to find common co-morbidities of a given diagnosis), and summarized. This makes it easier to conduct research, improve standards of care, and evaluate outcomes. Natural language processing can thus facilitate the use of clinical narratives for high-throughput phenotyping, decision support at the point of care, and evaluation of health care delivery outcomes.
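To make the search and comparison uses concrete, here is a minimal Python sketch. It assumes an NLP system has already turned each note into a structured record; the record layout, the patient identifiers, the diagnoses, and the function names `search` and `comorbidities` are all invented for illustration and do not reflect the output format of any particular system.

```python
from collections import Counter

# Hypothetical structured output of an NLP system: one record per note,
# listing the diagnoses mentioned for that patient. All values are made up.
notes = [
    {"patient": "p1", "diagnoses": {"diabetes", "hypertension"}},
    {"patient": "p2", "diagnoses": {"diabetes", "asthma"}},
    {"patient": "p3", "diagnoses": {"hypertension"}},
    {"patient": "p4", "diagnoses": {"diabetes", "hypertension", "asthma"}},
]

def search(records, diagnosis):
    """Return the patients whose notes mention the given diagnosis."""
    return [r["patient"] for r in records if diagnosis in r["diagnoses"]]

def comorbidities(records, diagnosis):
    """Count the other diagnoses that co-occur with the given diagnosis."""
    counts = Counter()
    for r in records:
        if diagnosis in r["diagnoses"]:
            counts.update(r["diagnoses"] - {diagnosis})
    return counts

print(search(notes, "diabetes"))
print(dict(comorbidities(notes, "diabetes")))
```

Once the information is in this form, both operations are simple lookups and counts; the difficult part, which the sketch leaves out entirely, is extracting the records from free text in the first place.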

The SHARP natural language processing (NLP) team is currently working on improving the functionality, interoperability, and usability of a clinical NLP system, the Clinical Text Analysis and Knowledge Extraction System (cTAKES). Specifically, the work focuses on adapting linguistic annotation guidelines to clinical texts, translating NLP research outcomes into better cTAKES performance, working with Clinical Element Models and high-throughput phenotyping programs to build interoperable systems, and improving the usability of cTAKES by adopting standards and investigating NLP use cases. The SHARP NLP team aims to (1) adapt syntactic tree, semantic role, and named entity annotation guidelines to the clinical domain, (2) improve the usability of cTAKES for end users, (3) create a Common Type System for clinical NLP systems in the UIMA framework, and (4) create an NLP evaluation workbench that allows NLP investigators and developers to compare and evaluate various NLP algorithms. Additionally, the SHARP NLP team plans to be a delivery platform for open source clinical NLP systems through the Open Health Natural Language Processing (OHNLP) consortium and welcomes contributions from clinical NLP researchers.
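As an illustration of the kind of comparison an evaluation workbench supports, the following Python sketch scores two hypothetical systems against gold-standard annotations using exact-match precision, recall, and F1. This is a common way to evaluate NLP output, not a description of the SHARP workbench itself; the function name `evaluate`, the annotation tuples, and the labels are assumptions made for the example.

```python
def evaluate(gold, predicted):
    """Exact-match precision, recall and F1 over sets of annotations."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Annotations as (start offset, end offset, label); all values are invented.
gold = {(0, 8, "Disorder"), (15, 22, "Drug"), (30, 41, "Procedure")}
system_a = {(0, 8, "Disorder"), (15, 22, "Drug")}
system_b = {(0, 8, "Disorder"), (15, 22, "Disorder"),
            (30, 41, "Procedure"), (50, 55, "Drug")}

for name, output in [("A", system_a), ("B", system_b)]:
    p, r, f = evaluate(gold, output)
    print(f"system {name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Here system A finds fewer annotations but makes no errors, while system B finds more and mislabels some, so A ends up with the higher F1. Exact matching is the strictest choice; partial-overlap scoring would be an alternative.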

The following annotation guidelines are being developed, used, and adapted to clinical text in this project:
THYME Annotation Guidelines
Syntactic tree (TreeBank) annotation guidelines
Semantic role (PropBank) annotation guidelines
Unified Medical Language System (UMLS) entity annotation guidelines
Clinical coreference annotation guidelines