TemporalWiki

Welcome to the THYME project

Welcome to the Temporal Histories of Your Medical Event (THYME) project (THYME is pronounced [taim]).

The overarching long-term vision of our research is to create novel technologies for processing clinical free text. Such technologies will enable sophisticated and efficient indexing, retrieval and data mining over the ever-increasing amounts of electronic clinical data. Processing free text poses a number of challenges, which the fields of artificial intelligence, natural language processing and computer science in general have made advances in addressing. Methods for processing free text are informed by linguistic theory combined with the power of statistical inference. A key component of the next step, natural language understanding, is discovering events and their relations on a timeline. Temporal relations are of prime importance in biomedicine as they are intrinsically linked to diseases, signs and symptoms, and treatments. Understanding the timeline of clinically relevant events is key to the next generation of translational research, where generalizing over large amounts of data holds the promise of deciphering biomedical puzzles.

The goal of our current proposal is to discover temporal relations from clinical free text through achieving four specific aims:

Specific Aim 1: Develop (1) a temporal relation annotation schema and guidelines for clinical free text based on TimeML, which will require extending the Treebank, PropBank and VerbNet annotation guidelines to the clinical domain; (2) an annotated corpus (500K words of clinical narrative) following the temporal relations schema, with additions to Treebank, PropBank and VerbNet; and (3) a descriptive study comparing temporal relations in the clinical and general domains.

Specific Aim 2: Extend and evaluate existing methods and/or develop new algorithms for temporal relation discovery in the clinical domain. Component-level evaluation.

Specific Aim 3: Integrate the best method and/or a variety of methods for temporal relation discovery into Apache cTAKES (ctakes.apache.org) and release them as open source annotators in the pipeline. Functional testing. Dissemination activities.

Specific Aim 4: System-level evaluation. Test the functionality of the enhanced Apache cTAKES (ctakes.apache.org) on translational research use cases, e.g., the progression of colon cancer as documented in clinical notes and pathology reports, and the progression of brain tumors as documented in radiology reports.

The methods we will use for temporal relation discovery are based on machine learning, e.g., Support Vector Machine technology. Such methods require an annotated reference standard from which the models are derived. The best methods will be released as part of cTAKES for the larger community to use and contribute to. We will test the methods against biomedical queries.

Funding

The project described is supported by Grant Number R01LM010090 from the National Library of Medicine. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Library of Medicine or the National Institutes of Health.

The project period is October, 2010 - September, 2014.

The project is a collaboration with the i2b2 project (U54LM008748 from the National Library of Medicine).

Who We Are

University of Colorado
- Martha Palmer (PI)
- Jim Martin
- Wayne Ward
- Steven Bethard (at University of Alabama at Birmingham as of Sept, 2013)
- William Styler
- Arrick Lanfranchi (through August, 2012)
- Tim O'Gorman
- Kevin Crooks (through December 2013)
- Mariah Hamang
- and several Linguistics and Computer Science graduate students

Boston Children's Hospital/Harvard Medical School
- Guergana Savova (PI)
- Dmitriy Dligach
- Timothy Miller
- Sameer Pradhan
- Sean Finan
- Chen Lin
- David Harris
- Jennifer Green (through December, 2013)

Mayo Clinic
- Piet de Groen
- Brad Erickson
- James Masanz
- Donna Ihrke (through December, 2012)
- Pauline Funk (through January, 2013)

Brandeis University
- James Pustejovsky

Publications and presentations crediting THYME

2014

Styler, William; Bethard, Steven; Finan, Sean; Palmer, Martha; Pradhan, Sameer; de Groen, Piet; Erickson, Brad; Miller, Timothy; Lin, Chen; Savova, Guergana K.; Pustejovsky, James. (in press). Temporal annotations in the clinical domain. Transactions of the Association for Computational Linguistics.
Savova, Guergana; Pradhan, Sameer; Palmer, Martha; Styler, Will; Chapman, Wendy; Elhadad, Noemie. Forthcoming. Annotating the clinical text – MiPACQ, ShARe, SHARPn and THYME corpora. In Handbook of Linguistic Annotations. Ed. James Pustejovsky and Nancy Ide. Springer.
Miller, Tim. 2014. Discovering narrative containers in clinical text. i2b2 All Hands meeting, Jan 17, 2014. Boston, MA (presentation)

2013

Albright, Daniel; Lanfranchi, Arrick; Fredriksen, Anwen; Styler, William; Warner, Collin; Hwang, Jena; Choi, Jinho; Dligach, Dmitriy; Nielsen, Rodney; Martin, James; Ward, Wayne; Palmer, Martha; Savova, Guergana. 2013. Towards syntactic and semantic annotations of the clinical narrative. Journal of the American Medical Informatics Association. 2013;0:1–9. doi:10.1136/amiajnl-2012-001317; http://jamia.bmj.com/cgi/rapidpdf/amiajnl-2012-001317?ijkey=z3pXhpyBzC7S1wC&keytype=ref. PMID: 23355458
Chen, Wei-Te and Styler, Will. 2013. Anafora: A Web-based General Purpose Annotation Tool. Proceedings of the North American Association for Computational Linguistics Conference. Atlanta, GA, June 9-13. http://www.aclweb.org/anthology/N13-3004. Anafora is available open source from https://github.com/weitechen/anafora
Miller, Timothy; Bethard, Steven; Dligach, Dmitriy; Pradhan, Sameer; Lin, Chen; and Savova, Guergana. 2013. Discovering narrative containers in clinical text. BioNLP workshop at the Association for Computational Linguistics. http://aclweb.org/anthology/W/W13/W13-1903.pdf
Bethard, Steven. 2013. A Synchronous Context Free Grammar for Time Normalization. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. http://www.aclweb.org/anthology/D13-1078
Bethard, Steven. 2013. ClearTK-TimeML: A minimalist approach to TempEval 2013. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). Atlanta, Georgia, USA: Association for Computational Linguistics, pp. 10-14. http://www.aclweb.org/anthology/S13-2002
Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Hwee Tou Ng, Anders Bjorkelund, Olga Uryupina, Yuchen Zhang and Zhi Zhong. 2013. Towards Robust Linguistic Analysis Using OntoNotes. Proceedings of the Conference on Natural Language Learning. Sofia, Bulgaria. August, 2013.
Dligach, Dmitriy; Bethard, Steven; Becker, Lee; Miller, Timothy; Savova, Guergana. 2013. Discovering body site and severity modifiers in clinical texts. Journal of the American Medical Informatics Association. http://jamia.bmj.com/content/early/2013/10/03/amiajnl-2013-001766.full
Dmitriy Dligach, Timothy A. Miller, Guergana K. Savova. 2013. Active Learning for Phenotyping Tasks. In Proceedings of the 2013 NLP for Medicine and Biology workshop held in conjunction with RANLP-2013. September 2013. Hissar, Bulgaria. http://aclweb.org/anthology//W/W13/W13-5101.pdf
Finan, Sean. 2013. Challenges of visually representing rich temporal information of the clinical narrative. Workshop: Exploring Temporal Patterns in Electronic Health Record Data. 30th Annual Human-Computer Interaction Lab Symposium. May 22-23 2013. University of Maryland. http://www.cs.umd.edu/hcil/eventflow/workshop2013/
American Medical Informatics Association (AMIA) national webinar. “Towards semantic annotations of the clinical narrative”. April 2013 (invited presentation)
Natural Language Processing Working Group Pre-Symposium – doctoral consortium and a data workshop. “Shared Annotated Resources for the Clinical Domain”. American Medical Informatics Association. Washington, DC, USA. November 2013.
Savova, Guergana; Chapman, Wendy; Elhadad, Noemie; Palmer, Martha. 2013. Shared resources, shared code and shared activities in clinical natural language processing. AMIA Annual Symposium, Panel. Washington, DC.
AMIA Fall symposium workshop on Natural Language Processing and data. Dr. Savova presented THYME work as part of the data workshop.

2012

Savova, Guergana. 2012. Shared Annotated Resources for the Clinical Domain. Natural Language Processing (NLP) Annotation workshop collocated with the 2nd annual IEEE International Conference on Healthcare Informatics, Imaging and Systems Biology. San Diego, CA, USA. September 2012.
Drs. Pustejovsky, Palmer and Savova are members of the Program Committee of the 2012 i2b2 shared task whose topic is temporal relations in the clinical domain. The THYME annotation guidelines are the basis of the annotation guidelines for that shared task.
Participation in the State of the Art of Clinical NLP workshop organized by the NLM in April, 2012. Dr. Savova chaired a session, Prof. Pustejovsky was an invited speaker presenting on Temporal relations/TimeML.
Participation and presentation in the AMIA Fall symposium workshop on Natural Language Processing and data. Dr. Savova presented THYME work as part of the data workshop.

2011

Savova, Guergana; Chapman, Wendy; Elhadad, Noemie; Palmer, Martha. 2011. Shared annotated resources for the clinical domain. AMIA Annual Symposium, Panel. Washington, DC.

THYME Shared NLP Tasks

CLEF/ShARe 2014 (in collaboration with the ShARe project): http://clefehealth2014.dcu.ie/task-2
SemEval 2014 Analysis of Clinical Text Task 7 (in collaboration with the ShARe project): http://alt.qcri.org/semeval2014/task7/
SemEval 2015 Analysis of Clinical Text Task 7 (in collaboration with the ShARe project)
SemEval 2015 Clinical TempEval

Getting access to the THYME annotations

The THYME corpus is available to others involved in NLP research under a data use agreement (DUA) with Mayo Clinic. Requests for a copy of the THYME corpus should be made to a Mayo NLP investigator (e.g., Dr. Piet de Groen) in order to complete the DUA. After the DUA has been completed, the THYME corpus is available via a secure download mechanism. Distribution of the corpus is supported by grants GM102282 and LM010090 from the NIH; include the funding acknowledgment in your publications.

The corpus is released only to an established or junior NLP investigator who is formally associated with an institution; it is not released directly to students. However, all students working with the investigator can have full access to the corpus under the investigator's DUA. The investigator is urged to have students work on the corpus on workstations that stay within the investigator's laboratory space.

The steps for obtaining a DUA are:

Step 1: Send an email to ClinicalNLP@mayo.edu indicating your interest in the THYME corpus. You will receive a form to fill out. Once that form is returned to ClinicalNLP@mayo.edu, your request will be passed on to the Mayo Clinic’s legal contracts group.

Step 2: Mayo Legal Contracts Office will send you a partially completed DUA for you to fill in and have signed by your site's official signatory.

Step 3: After you return the DUA to Mayo Clinic, Mayo Legal Contracts Office will complete the DUA and send you a copy of the fully executed DUA, and ClinicalNLP@mayo.edu will be copied on the email.

Step 4: A Mayo Clinic NLP investigator will arrange to talk with you.

Step 5: Once the DUA is complete and you have talked with a Mayo Clinic NLP investigator, ClinicalNLP@mayo.edu will send you instructions for obtaining the corpus via the secure download mechanism.

When using the corpus, please cite:

Styler, William; Bethard, Steven; Finan, Sean; Palmer, Martha; Pradhan, Sameer; de Groen, Piet; Erickson, Brad; Miller, Timothy; Lin, Chen; Savova, Guergana K.; Pustejovsky, James. (in press). Temporal annotations in the clinical domain. Transactions of the Association for Computational Linguistics.

THYME Annotations

The annotation layers are Treebank and PropBank annotations as well as temporal annotations for events, temporal expressions and temporal relations.

Guidelines

The THYME Temporal Relations Guidelines (PDF) - The current version of the THYME Temporal Relations Guidelines and release notes. Updated February 28th, 2014.

i2b2 Simplified THYME Guidelines (PDF) - The guidelines provided to the organizers of the 2012 i2b2 temporal relations challenge for consideration during planning. They reflect an earlier stage of our guidelines.

Tool for viewing annotations - Anafora

We developed a web-based annotation tool, Anafora. It is open source and available at https://github.com/weitechen/anafora. Use it to view the THYME annotations.

Viewing annotations (Anafora)

(available to the team only)

To view the Temporal-Entity data, use the URL:

https://verbs.colorado.edu/anafora/annotate/Temporal/ColonCancer/TASK_NAME/Temporal.Entity/gold/

Here TASK_NAME is the filestem, for example, ID074_path_219b.

To view the Temporal-Relation data, use the URL:

https://verbs.colorado.edu/anafora/annotate/Temporal/ColonCancer/TASK_NAME/Temporal.Relation/gold/

You can find the available Entity/Relation gold data on the verbs server by using:

  find /data/anafora/anaforaProjectFile/Temporal/ -name "*Temporal-Entity.gold.completed.xml"
  find /data/anafora/anaforaProjectFile/Temporal/ -name "*Temporal-Relation.gold.completed.xml"
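The two viewing URLs above differ only in the TASK_NAME and in the annotation layer, so they can be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: the function name anafora_gold_url and the constant ANAFORA_BASE are hypothetical helpers introduced here for illustration and are not part of Anafora; only the URL pattern itself comes from this page.

```python
# Base of the team-only viewing URLs quoted above (ColonCancer project).
ANAFORA_BASE = "https://verbs.colorado.edu/anafora/annotate/Temporal/ColonCancer"

# The two layers named on this page: Temporal.Entity and Temporal.Relation.
VALID_LAYERS = ("Entity", "Relation")


def anafora_gold_url(task_name: str, layer: str) -> str:
    """Build the gold-annotation viewing URL for one filestem (hypothetical helper)."""
    if layer not in VALID_LAYERS:
        raise ValueError(f"layer must be one of {VALID_LAYERS}, got {layer!r}")
    return f"{ANAFORA_BASE}/{task_name}/Temporal.{layer}/gold/"


if __name__ == "__main__":
    # ID074_path_219b is the example filestem given on this page.
    print(anafora_gold_url("ID074_path_219b", "Entity"))
    print(anafora_gold_url("ID074_path_219b", "Relation"))
```

The URLs are still only reachable by team members; the helper just avoids copy-and-paste mistakes when switching between layers.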

Train/Development/Test splits

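Use this split for experiments with the THYME data! Each set is assigned by its set number modulo 8 (% 8): residues 0-3 are train, 4-5 are development, and 6-7 are test.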
(A note about Protege/Knowtator and Anafora annotation tools: annotations)

Colon Cancer Data

All sets summary: 1-4,6-36,38-56,58-76,90,94,114,120
Train sets (residue 0, 1, 2, 3): 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27, 32, 33, 34, 35, 40, 41, 42, 43, 48, 49, 50, 51, 56, 58, 59, 64, 65, 66, 67, 72, 73, 74, 75, 90, 114, 120
Development sets (residue 4, 5): 4, 12, 13, 20, 21, 28, 29, 36, 44, 45, 52, 53, 60, 61, 68, 69, 76
Test sets (residue 6, 7): 6, 7, 14, 15, 22, 23, 30, 31, 38, 39, 46, 47, 54, 55, 62, 63, 70, 71, 94

Brain Cancer Data

All sets summary: 2-9
Train sets: 2, 3, 8, 9
Development sets: 4, 5
Test sets: 6, 7
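The lists above follow from the set number taken modulo 8, so the split can be recomputed instead of copied. The following is a minimal sketch, assuming the set numbers are available as plain integers; the names thyme_split, expand_ranges and partition are hypothetical and are not part of cTAKES or Anafora.

```python
from typing import Dict, List


def thyme_split(set_number: int) -> str:
    """Map a set number to its split using the residue modulo 8."""
    residue = set_number % 8
    if residue <= 3:   # residues 0, 1, 2, 3
        return "train"
    if residue <= 5:   # residues 4, 5
        return "dev"
    return "test"      # residues 6, 7


def expand_ranges(spec: str) -> List[int]:
    """Expand a summary such as '1-4,6-36' into the individual set numbers."""
    numbers: List[int] = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            numbers.extend(range(int(start), int(end) + 1))
        else:
            numbers.append(int(part))
    return numbers


def partition(sets: List[int]) -> Dict[str, List[int]]:
    """Group set numbers into train/dev/test, preserving input order."""
    splits: Dict[str, List[int]] = {"train": [], "dev": [], "test": []}
    for number in sets:
        splits[thyme_split(number)].append(number)
    return splits


# "All sets" summaries copied from the lists above.
COLON_SETS = expand_ranges("1-4,6-36,38-56,58-76,90,94,114,120")
BRAIN_SETS = expand_ranges("2-9")

if __name__ == "__main__":
    print(partition(BRAIN_SETS))
    # {'train': [2, 3, 8, 9], 'dev': [4, 5], 'test': [6, 7]}
```

Applying partition to COLON_SETS reproduces the Colon Cancer train, development and test lists given above, which is a quick consistency check before running experiments.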

THYME Software

The THYME system is available as part of Apache cTAKES at http://ctakes.apache.org/

Relevant Papers

Internal Presentations

Presentations

Venues for manuscript submissions

Venues for manuscript submissions/publications

Project materials

Project Charter

Tasks, leads, teams and deadlines

Progress reports

Annotations - Describes the corpus, the layers of annotations and annotation progress

Annotation Tools - Describes the progress and information pertaining to the Anafora annotation tool

Communication

Bi-weekly meetings, Wed 11-noon ET
- Call in details
Distribution Lists

Meeting Notes

March 12, 2014 Bi-weekly team meeting agenda and notes
March 7, 2014 Methods meeting agenda and notes
February 28, 2014 Methods meeting agenda and notes
February 25, 2014 Bi-weekly team meeting agenda and notes
February 21, 2014 Methods meeting agenda and notes
February 14, 2014 Methods meeting agenda and notes
February 12, 2014 Bi-weekly team meeting agenda and notes
February 7, 2014 Methods meeting agenda and notes
January 31, 2014 Methods meeting agenda and notes
January 29, 2014 Agenda and notes
January 24, 2014 Methods meeting agenda and notes
January 17, 2014 Methods meeting agenda and notes
January 10, 2014 Methods meeting agenda and notes
January 8, 2014 Agenda and notes
December 20, 2013 Methods meeting agenda and notes
December 18, 2013 Agenda and notes
December 13, 2013 Methods meeting agenda and notes
December 6, 2013 Methods meeting agenda and notes
December 4, 2013 Agenda and notes
Nov 22, 2013 Methods meeting agenda and notes
Nov 15, 2013 Methods meeting agenda and notes
Nov 8, 2013 Methods meeting agenda and notes
Nov 6, 2013 Agenda and notes
Nov 1, 2013 Methods meeting agenda and notes
Oct 23, 2013 Agenda and notes
Oct 11, 2013 Methods meeting agenda and notes
Oct 9, 2013 Agenda and notes
Oct 4, 2013 Methods meeting agenda and notes
Sep 27, 2013 Methods meeting agenda and notes
Sep 25, 2013 Agenda and notes
Sep 20, 2013 Methods meeting agenda and notes
Sep 13, 2013 Methods meeting agenda and notes
Sep 11, 2013 Agenda and notes
Sep 06, 2013 Methods meeting agenda and notes
Aug 30, 2013 Methods meeting agenda and notes
August 28, 2013 Agenda and notes
Aug 23, 2013 Methods meeting agenda and notes
Aug 16, 2013 Methods meeting agenda and notes
August 14, 2013 Agenda and notes
Aug 2, 2013 Methods meeting agenda and notes
July 31, 2013 Agenda and notes
July 25, 2013 Methods meeting agenda and notes
July 19, 2013 Methods meeting agenda and notes
July 17, 2013 Agenda and notes
July 12, 2013 Methods meeting agenda and notes
No conference call on July 3, 2013. Happy 4th of July!
June 28, 2013 Methods meeting agenda and notes
June 19, 2013 Agenda and notes
June 5, 2013 Agenda and notes
May 31, 2013 Methods meeting agenda and notes
May 24, 2013 Methods meeting agenda and notes
May 22, 2013 Agenda and notes
May 17, 2013 Methods meeting agenda and notes
May 9, 2013 Methods meeting agenda and notes
May 8, 2013 Agenda and notes
May 3, 2013 Methods meeting agenda and notes
April 24, 2013 Agenda and notes
April 10, 2013 Agenda and notes
March 27, 2013 Agenda and notes
March 13, 2013 Agenda and notes
February 27, 2013 Agenda and notes
February 13, 2013 Agenda and notes
January 30, 2013 Agenda and notes
January 28, 2013 (annotations subgroup) Agenda and notes
January 16, 2013 Agenda and notes
January 2, 2013 Agenda and notes
December 19, 2012 Agenda and notes
December 5, 2012 Agenda and notes
November 21, 2012 Agenda and notes
November 6, 2012 Agenda and notes
October 24, 2012 Agenda and notes
October 10, 2012 Agenda and notes
September 12, 2012 Agenda and notes
August 29, 2012 Agenda and notes
August 15, 2012 Agenda and notes
August 1, 2012 Agenda and notes
July 18, 2012 Agenda and notes
June 22, 2012 Agenda and notes
June 20, 2012 Agenda and notes
June 6, 2012 Agenda and notes
May 23, 2012 Agenda and notes
May 9, 2012 Agenda and notes
April 25, 2012 Agenda and notes
April 11, 2012 Agenda and notes
March 28, 2012 Agenda and notes
March 14, 2012 Agenda and notes
Feb 29, 2012 Agenda and notes
Feb 14, 2012 Agenda and notes
Feb 1, 2012 Agenda and notes

Getting started

Contact

If you need assistance or have questions about the project, feel free to send e-mail to steven.bethard at colorado dot edu or Guergana.Savova at childrens dot harvard dot edu
