Main Page

From TemporalWiki

(Difference between revisions)
(THYME Software)
(Meeting Notes)
(361 intermediate revisions not shown)
Line 4: Line 4:
The overarching long-term vision of our research is to create novel technologies for processing clinical free text. Such technologies will enable sophisticated and efficient indexing, retrieval, and data mining over the ever-increasing amounts of electronic clinical data. Processing free text poses a number of challenges on which the fields of artificial intelligence, natural language processing, and computer science in general have made advances. Methods for processing free text are informed by linguistic theory combined with the power of statistical inference. A key component of the next step, natural language understanding, is discovering events and their relations on a timeline. Temporal relations are of prime importance in biomedicine because they are intrinsically linked to diseases, signs and symptoms, and treatments. Understanding the timeline of clinically relevant events is key to the next generation of translational research, where generalizing over large amounts of data holds the promise of deciphering biomedical puzzles.
-
The goal of our current proposal is to discover temporal relations from clinical free text through achieving four specific aims:
+
The best methods have been or will be released as part of Apache cTAKES (ctakes.apache.org) for the larger community to use and contribute to. We will test the methods against biomedical queries.
-
 
+
-
Specific Aim 1: Develop (1) a temporal relation annotation schema and guidelines for clinical free text based on TimeML, which will require extensions to Treebank, PropBank and VerbNet annotation guidelines to the clinical domain, (2) an annotated corpus (500K words of clinical narrative) following the temporal relations schema with additions to Treebank, PropBank and VerbNet, (3) a descriptive study comparing temporal relations in the clinical and general domains.
+
-
 
+
-
Specific Aim 2: Extend and evaluate existing methods and/or develop new algorithms for temporal relation discovery in the clinical domain. Component-level evaluation.
+
-
 
+
-
Specific Aim 3: Integrate best method and/or a variety of methods for temporal relation discovery into Apache cTAKES (ctakes.apache.org) and release as open source annotators in the pipeline. Functional testing. Dissemination activities.
+
-
 
+
-
Specific Aim 4: System-level evaluation. Test the functionality of the enhanced Apache cTAKES (ctakes.apache.org) on translational research use cases, e.g. the progression of colon cancer as documented in clinical notes and pathology reports, the progression of brain tumor as documented in radiology reports.
+
-
 
+
-
The methods we will use for temporal relation discovery are based on machine learning, e.g., Support Vector Machine technology. Such methods require the annotation of a reference standard from which the computations are derived. The best methods will be released as part of cTAKES for the larger community to use and contribute to. We will test the methods against biomedical queries.
+
-
 
+
== Funding ==
-
The project described is supported by Grant Number R01LM010090 from the National Library of Medicine. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Library of Medicine or the National Institutes of Health.
+
Phase 1 of the project (2010-2014) was supported in part by the i2b2 project (U54LM008748 from the National Library of Medicine) and THYME R01LM010090 from the National Library of Medicine.
-
The project period is October, 2010 - September, 2014.
+
Phase 2 (2015-2018) is supported by THYME R01LM010090 from the National Library of Medicine. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Library of Medicine or the National Institutes of Health.
-
The project is a collaboration with the i2b2 project (U54LM008748 from the National Library of Medicine).
+
== Who We Are ==
 +
* Boston Children's Hospital/Harvard Medical School
 +
** Guergana Savova (MPI)
 +
** Dmitriy Dligach
 +
** Timothy Miller
 +
** Sean Finan
 +
** Chen Lin
 +
** David Harris
-
 
-
== Who We Are ==
 
* University of Colorado
-
** Martha Palmer (PI)
+
** Martha Palmer (MPI)
** Jim Martin
** Wayne Ward
-
** Steven Bethard (at University of Alabama at Birmingham as of Sept, 2013)
+
** Jordan Boyd-Graber
-
** William Styler
+
** Will Styler
** Arrick Lanfranchi (through August, 2012)
** Tim O'Gorman
Line 38: Line 32:
** and several Linguistics and Computer Science graduate students
-
* Boston Children's Hospital/Harvard Medical School
+
* University of Arizona
-
** Guergana Savova (PI)
+
** Steven Bethard
-
** Dmitriy Dligach
+
 
-
** Timothy Miller
+
* University of Alabama, Birmingham
-
** Sameer Pradhan
+
** John Osborne
-
** Sean Finan
+
-
** Chen Lin
+
-
** David Harris
+
-
** Jennifer Green (through December, 2013)
+
* Mayo Clinic
** Piet de Groen
** Brad Erickson
-
** James Masanz
+
** James Masanz (through July, 2015)
** Donna Ihrke (through December, 2012)
** Pauline Funk (through January, 2013)
Line 57: Line 47:
* Brandeis University
** James Pustejovsky
-
 
== Publications and presentations crediting THYME ==
 +
=== 2019 ===
 +
* Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2019. A BERT-based Universal Model for Both Within- and Cross-sentence Clinical Temporal Relation Extraction. Proceedings of the 2nd Clinical Natural Language Processing Workshop.
 +
* Savova, G. K., Danciu, I., Alamudun, F., Miller, T., Lin, C., Bitterman, D. S., ... & Warner, J. L. (2019). Use of Natural Language Processing to Extract Clinical Cancer Phenotypes from Electronic Medical Records. Cancer Research, canres-0579.
 +
 +
 +
=== 2018 ===
 +
* Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Amiri, Hadi; Bethard, Steven; Savova, Guergana. 2018. Self-training improves Recurrent Neural Networks performance for Temporal Relation Extraction. LOUHI 2018: The Ninth International Workshop on Health Text Mining and Information Analysis. October 31, 2018, Brussels, Belgium
 +
 +
=== 2017 ===
 +
* Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2017. Representations of Time Expressions for Temporal Relation Extraction with Convolutional Neural Networks. BioNLP workshop at the Association for Computational Linguistics conference. Vancouver, Canada, Friday August 4, 2017
 +
 +
* Dligach, Dmitriy; Miller, Timothy; Lin, Chen; Bethard, Steven; Savova, Guergana. 2017. Neural temporal relation extraction. European Chapter of the Association for Computational Linguistics (EACL 2017). April 3-7, 2017. Valencia, Spain. http://www.aclweb.org/anthology/E17-2118.
 +
 +
* Clinical TempEval 2017: http://alt.qcri.org/semeval2017/task12/
 +
 +
=== 2016 ===
 +
* Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2016. Improving Temporal Relation Extraction with Training Instance Augmentation. BioNLP workshop at the Association for Computational Linguistics conference. Berlin, Germany, Aug 2016
 +
* Miller, Timothy; Dligach, Dmitriy; Lin, Chen; Bethard, Steven; Savova, Guergana. 2016. Cross-domain Coreference Feature Exploration. AMIA Annual Symposium. Chicago, IL. November, 2016
 +
* Steven Bethard and Jonathan Parker (May 2016). “A Semantically Compositional Annotation Scheme for Time Normalization”. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016).
 +
* Steven Bethard, Guergana Savova, Wei-Te Chen, Leon Derczynski, James Pustejovsky, and Marc Verhagen. 2016. “SemEval-2016 Task 12: Clinical TempEval”. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016). San Diego, CA
 +
* Ethan Hartzell, Chen Lin. 2016. Enhancing Clinical Temporal Relation Discovery with Syntactic Embeddings from GloVe. International Conference on Intelligent Biology and Medicine (ICIBM 2016). December 2016, Houston, Texas, USA
 +
* Clinical TempEval 2016: http://alt.qcri.org/semeval2016/task12/
 +
 +
=== 2015 ===
 +
* Lin, Chen; Dligach, Dmitriy; Miller, Timothy; Bethard, Steven; Savova, Guergana. 2015. Layered temporal modeling for the clinical domain. Journal of the American Medical Informatics Association. http://jamia.oxfordjournals.org/content/early/2015/10/31/jamia.ocv113
 +
* Bethard, Steven; Derczynski, Leon; Savova, Guergana; Pustejovsky, James; Verhagen, Marc. 2015. SemEval-2015 Task 6: Clinical TempEval. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015). http://www.aclweb.org/anthology/S15-2136. http://aclweb.org/anthology/S/S15/S15-2136.pdf
 +
* Miller, Timothy; Bethard, Steven; Dligach, Dmitriy; Lin, Chen; Savova, Guergana. 2015. Extracting Time Expressions from Clinical Text. Proceedings of BioNLP 15. http://www.aclweb.org/anthology/W15-3809
 +
* Clinical TempEval 2015: http://alt.qcri.org/semeval2015/task6/
 +
* Clinical TempEval 2015 papers:
 +
** Sumithra Velupillai; Danielle L Mowery; Samir Abdelrahman; Lee Christensen; Wendy Chapman. BluLab: Temporal Information Extraction for the 2015 Clinical TempEval Challenge. http://aclweb.org/anthology/S/S15/S15-2137.pdf
 +
** Hegler Tissot; Genevieve Gorrell; Angus Roberts; Leon Derczynski; Marcos Didonet Del Fabro. UFPRSheffield: Contrasting Rule-based and Support Vector Machine Approaches to Time Expression Identification in Clinical TempEval. http://aclweb.org/anthology/S/S15/S15-2141.pdf
 +
=== 2014 ===
-
* Styler, William; Bethard, Steven; Finan, Sean; Palmer, Martha; Pradhan, Sameer; de Groen, Piet; Erickson, Brad; Miller, Timothy; Chen, Lin; Savova, Guergana K.; Pustejovsky, James. (in press). Temporal annotations in the clinical domain. Transactions of the Association for Computational Linguistics.
+
* Lin, Chen; Karlson, Elizabeth; Dligach, Dmitriy; Ramirez, Monica; Miller, Timothy; Mo, Huan; Braggs, Natalie; Cagan, Andrew; Denny, Joshua; Savova, Guergana. 2014. Automatic identification of Methotrexate-induced liver toxicity in Rheumatoid Arthritis patients from the electronic medical records. Journal of the American Medical Informatics Association. http://jamia.bmj.com/content/early/2014/10/24/amiajnl-2014-002642.abstract
 +
* Pascal B. Pfiffner, JiWon Oh, Timothy A. Miller, Kenneth D. Mandl. 2014. ClinicalTrials.gov as a Data Source for Semi-Automated Point-Of-Care Trial Eligibility Screening. PlosOne. DOI: 10.1371/journal.pone.0111055. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0111055#abstract0
 +
* Pradhan, Sameer; Elhadad, Noemie; South, Brett; Martinez, David; Christensen, Lee; Vogel, Amy; Suominen, Hanna; Chapman, Wendy; Savova, Guergana. 2014. Evaluating the state of the art in disorder recognition and normalization of the clinical narrative. Journal of the American Medical Informatics Association. http://jamia.bmj.com/content/early/2014/08/21/amiajnl-2013-002544.full.pdf+html
 +
* Sameer Pradhan, Noemie Elhadad, Wendy Chapman, Suresh Manandhar, Guergana Savova. 2014. SemEval 2014: Task 7. In Proceedings of the International Workshop on Semantic Evaluations, Dublin, Ireland. August. http://alt.qcri.org/semeval2014/cdrom/
 +
* Finan, Sean; De Groen, Piet; Savova, Guergana. 2014. Narrative event and temporal relation visualization tool. American Medical Informatics Association annual symposium. November 2014. Washington, DC.
 +
* Bethard, Steven. 2014. The state of the art of temporal relation extraction. Presentation at the NLP workshop at the 4th i2b2 Academic User Group conference. July 9, 2014. Boston, MA.
 +
* Miller, Timothy. 2014. Methods for temporal relation discovery in the clinical narrative. Presentation at the NLP workshop at the 4th i2b2 Academic User Group conference. July 9, 2014. Boston, MA.
 +
* Pradhan, Sameer. 2014. Extrinsic evaluation of temporal relation discovery system. Presentation at the NLP workshop at the 4th i2b2 Academic User Group conference. July 9, 2014. Boston, MA.
 +
* Finan, Sean. 2014. Visualization tool for temporal relations from the clinical narrative. Presentation at the NLP workshop at the 4th i2b2 Academic User Group conference. July 9, 2014. Boston, MA.
 +
* Chen, Pei. 2014. Modules for temporal relation discovery from the clinical narrative in Apache cTAKES. Presentation at the NLP workshop at the 4th i2b2 Academic User Group conference. July 9, 2014. Boston, MA.
 +
* Sameer Pradhan, Xiaoqiang Luo, Marta Recasens, Eduard Hovy, Vincent Ng and Michael Strube. 2014. Scoring Coreference Partitions of Predicted Mentions: A Reference Implementation. Short paper. Association for Computational Linguistics Conference. Baltimore, Maryland. http://anthology.aclweb.org//
 +
* Xiaoqiang Luo, Sameer Pradhan, Marta Recasens and Eduard Hovy. 2014. An Extension of BLANC to System Mentions. Short paper. Association for Computational Linguistics Conference. Baltimore, Maryland. http://anthology.aclweb.org//
 +
* Chen Lin, Timothy Miller, Alvin Kho, Steven Bethard, Dmitriy Dligach, Sameer Pradhan and Guergana Savova. 2014. Descending-Path Convolution Kernel for Syntactic Structures. Short paper. Association for Computational Linguistics Conference. Baltimore, Maryland. http://anthology.aclweb.org//
 +
* Savova, Guergana. 2014. Temporal relation discovery from the clinical narrative. Invited talk at the National Library of Medicine Informatics Series. June 4, 2014. Bethesda, MD.
 +
* Finan, Sean; de Groen, Piet; Savova, Guergana. 2014. Narrative Event and Temporal Relation Visualization Tool. Workshop: Exploring Temporal Patterns in Electronic Health Record Data. 31st Annual Human-Computer Interaction Lab Symposium. May 29 2014. University of Maryland. http://www.cs.umd.edu/hcil/eventflow/workshop2014/
 +
* Bethard, Steven; Ogren, Philip; Becker, Lee. 2014. ClearTK 2.0: Design Patterns for Machine Learning in UIMA. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). http://anthology.aclweb.org//
 +
* Styler, William; Bethard, Steven; Finan, Sean; Palmer, Martha; Pradhan, Sameer; de Groen, Piet; Erickson, Brad; Miller, Timothy; Lin, Chen; Savova, Guergana K.; Pustejovsky, James. 2014. Temporal annotations in the clinical domain. Transactions of the Association for Computational Linguistics. http://www.transacl.org/wp-content/uploads/2014/04/47.pdf
* Savova, Guergana; Pradhan, Sameer; Palmer, Martha; Styler, Will; Chapman, Wendy; Elhadad, Noemie. (in press). Annotating the clinical text – MiPACQ, ShARe, SHARPn and THYME corpora. In Handbook of Linguistic Annotations. Ed. James Pustejovsky and Nancy Ide. Springer.
* Miller, Tim. 2014. Discovering narrative containers in clinical text. i2b2 All Hands meeting, Jan 17, 2014. Boston, MA (presentation)
Line 67: Line 104:
=== 2013 ===
* Albright, Daniel; Lanfranchi, Arrick; Fredriksen, Anwen; Styler, William; Warner, Collin; Hwang, Jena; Choi, Jinho; Dligach, Dmitriy; Nielsen, Rodney; Martin, James; Ward, Wayne; Palmer, Martha; Savova, Guergana. 2013. Towards syntactic and semantic annotations of the clinical narrative. Journal of the American Medical Informatics Association. 2013;0:1–9. doi:10.1136/amiajnl-2012-001317; http://jamia.bmj.com/cgi/rapidpdf/amiajnl-2012-001317?ijkey=z3pXhpyBzC7S1wC&keytype=ref. PMID: 23355458
 +
* Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Pradhan, Sameer; Lin, Chen; Savova, Guergana. 2013. Discovering time expressions in clinical text. Late breaking abstract. American Medical Informatics Association Conference. November, 2014. Washington, DC.
* Chen, Wei-Te and Styler, Will. 2013. Anafora: A Web-based General Purpose Annotation Tool. Proceedings of the North American Association for Computational Linguistics Conference. Atlanta, GA, June 9-13. http://www.aclweb.org/anthology/N13-3004. Anafora is available open source from https://github.com/weitechen/anafora
* Miller, Timothy; Bethard, Steven; Dligach, Dmitriy; Pradhan, Sameer; Lin, Chen; and Savova, Guergana. 2013. Discovering narrative containers in clinical text. BioNLP workshop at the Association for Computational Linguistics. http://aclweb.org/anthology/W/W13/W13-1903.pdf
Line 88: Line 126:
=== 2011 ===
* Savova, Guergana; Chapman, Wendy; Elhadad, Noemie; Palmer, Martha. 2011. Shared annotated resources for the clinical domain. AMIA Annual Symposium, Panel. Washington, DC.
-
 
== Shared NLP Tasks with THYME participation ==
* CLEF/ShARe 2014 (in collaboration with the ShARe project): http://clefehealth2014.dcu.ie/task-2
* SemEval 2014 Analysis of Clinical Text Task 7 (in collaboration with the ShARe project): http://alt.qcri.org/semeval2014/task7/
-
* SemEval 2015 Analysis of Clinical Text Task 7 (in collaboration with the ShARe project)
+
* SemEval 2015 Analysis of Clinical Text Task 14 (in collaboration with the ShARe project): http://alt.qcri.org/semeval2015/task14/
-
* SemEval 2015 Clinical TempEval
+
* SemEval 2015 Clinical TempEval Task 6: http://alt.qcri.org/semeval2015/task6/
-
 
+
* SemEval 2016 Clinical TempEval Task 12: http://alt.qcri.org/semeval2016/task12/
 +
* SemEval 2017 Clinical TempEval Task 12: http://alt.qcri.org/semeval2017/task12/
== Getting access to the THYME corpus and gold standard annotations ==
-
The THYME corpus with the gold standard annotations is available to others involved in NLP research under a data use agreement (DUA) with Mayo Clinic. Requests for a copy of the THYME corpus should be made to a Mayo NLP investigator (e.g., Dr. Piet de Groen) in order to complete the DUA. After the DUA has been completed, the THYME corpus is available via a secure download mechanism. Distribution of the corpus is supported by grants GM102282 and LM010090 from the NIH; include the funding acknowledgment in your publications.
+
The THYME corpus with the gold standard annotations is available to others involved in NLP research under a data use agreement (DUA) with Mayo Clinic. The steps for obtaining a DUA are outlined below. After the DUA has been completed, the THYME corpus is available via a secure download mechanism. Distribution of the corpus is supported by grant LM010090 from the NIH; include the funding acknowledgment in your publications.
The corpus is released to an established or junior NLP investigator, formally associated with an institution; thus it is not released to a student. However, all students working with the investigator can have full access to the corpus under the DUA of the investigator. The investigator is urged to have the students work on the corpus on workstations that stay within the laboratory space of the investigator.  
Line 105: Line 143:
The steps for obtaining a DUA are:
-
'''Step 1''': Send an email to ClinicalNLP@mayo.edu indicating your interest in the THYME corpus.
+
# Submit the [https://docs.google.com/forms/d/1EwixkePCA-pefHSOTfq-M1lMny9DjPtsOf1H37uMU3w/viewform THYME corpus request form], informing us about your institution, your principal investigator, and your intended use of the data.
-
You will receive a form to fill out.  Once that form is returned to ClinicalNLP@mayo.edu, your request will be passed on to the Mayo Clinic’s legal contracts group.  
+
# A THYME investigator will send your principal investigator a DUA for you to add information to, and for you to have signed by your site's official signatory. The THYME investigator will provide instructions for returning the signed and completed DUA.
-
 
+
# When you return the DUA, a THYME investigator will arrange to talk with your principal investigator. Of note, the discussion must be with the lab's principal investigator, not a student/postdoc/administrator. Topics that will be addressed include allowable uses of the data and proper security measures.
-
'''Step 2''': Mayo Legal Contracts Office will send you a partially filled-out DUA for you to add information to, and for you to have signed by your site's official signatory.  
+
# Once the DUA is complete and a THYME investigator has confirmed your understanding of the DUA, you will be sent instructions for obtaining the corpus via a secure downloading mechanism.
-
 
+
-
'''Step 3''': After you return the DUA to Mayo Clinic, Mayo Legal Contracts Office will complete the DUA and send you a copy of the fully executed DUA, and ClinicalNLP@mayo.edu will be copied on the email.
+
-
'''Step 4''': A Mayo Clinic NLP investigator will arrange to talk with you.
+
When using the THYME corpus, please:
-
 
+
-
'''Step 5''': Once the DUA is complete and you have talked with a Mayo Clinic NLP investigator, ClinicalNLP@mayo.edu will send you instructions for obtaining the corpus via the secure downloading mechanism.
+
-
 
+
-
When using the corpus, please cite:
+
-
 
+
-
'''Styler, William; Bethard, Steven; Finan, Sean; Palmer, Martha; Pradhan, Sameer; de Groen, Piet; Erickson, Brad; Miller, Timothy; Chen, Lin; Savova, Guergana K.; Pustejovsky, James. (in press). Temporal annotations in the clinical domain. Transactions of the Association for Computational Linguistics.'''
+
 +
# Include the Mayo Clinic in your acknowledgements
 +
# Cite the article: William F. Styler IV, Steven Bethard, Sean Finan, Martha Palmer, Sameer Pradhan, Piet C. de Groen, Brad Erickson, Timothy Miller, Chen Lin, Guergana Savova, James Pustejovsky. Temporal Annotation in the Clinical Domain. Transactions of the Association for Computational Linguistics. Vol 2 (2014). https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/305
== THYME Gold Standard Annotations ==
Line 127: Line 159:
=== Annotation guidelines ===
-
* [http://clear.colorado.edu/compsem/documents/THYME%20Guidelines.pdf The THYME Temporal Relations Guidelines (PDF)] - The current version of the THYME Temporal Relations Guidelines and release notes.  Updated February 28th, 2014.
+
* [http://clear.colorado.edu/compsem/documents/THYME_guidelines.pdf The THYME Temporal Relations Guidelines (PDF)] - The current version of the THYME Temporal Relations Guidelines and release notes.  Updated February 28th, 2014.
* [[Media:i2b2simplifiedthymeguidelines.pdf|i2b2 Simplified THYME Guidelines (PDF)]] The guidelines provided to the organizers of the 2012 Temporal relations i2b2 challenge for consideration during planning. They reflect an earlier stage of our guidelines.
 +
 +
* Syntactic Tree (Treebank): http://clear.colorado.edu/compsem/documents/treebank_guidelines.pdf
 +
 +
* Semantic Role (Propbank): http://clear.colorado.edu/compsem/documents/propbank_guidelines.pdf
 +
 +
* UMLS entity and relations annotations/templates: http://clear.colorado.edu/compsem/documents/umls_guidelines.pdf
 +
 +
* Clinical coreference guidelines (based on ODIE, OntoNotes, MUC-7): http://clear.colorado.edu/compsem/documents/Coreference%20Guidelines.pdf
=== Tool for viewing the gold standard annotations - Anafora ===
Line 161: Line 201:
=== Colon Cancer Data ===
-
* All sets summary: 1-4,6-36,38-56,58-76,90,94,114,120
+
 
-
* Train sets:(Residue 0,1,2,3) 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27, 32, 33, 34, 35, 40, 41, 42, 43, 48, 49, 50, 51, 56, 58, 59, 64, 65, 66, 67, 72, 73, 74, 75, 90, 114, 120
+
* Train sets (Residue 0,1,2,3): [1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27, 32, 33, 34, 35, 40, 41, 42, 43, 48, 49, 50, 51, 56, 57, 58, 59, 64, 65, 66, 67, 72, 73, 74, 75, 80, 81, 82, 83, 88, 89, 90, 91, 96, 97, 98, 99, 104, 105, 106, 107, 112, 113, 114, 115, 120, 121, 122, 123, 128, 129, 130, 131, 136, 137, 138, 139, 144, 145, 146, 147, 152, 153, 154, 155, 160, 161, 162, 163, 168, 169, 170, 171, 176, 177, 178, 179, 184, 185, 186, 187, 192, 193, 194, 195, 200, 201, 202, 203, 208, 209, 210, 211, 216, 217]
-
* Development sets: (Residue 4,5) 4, 12, 13, 20, 21, 28, 29, 36, 44, 45, 52, 53, 60, 61, 68, 69, 76
+
 
-
* Test sets: (Residue 6,7) 6, 7, 14, 15, 22, 23, 30, 31, 38, 39, 46, 47, 54, 55, 62, 63, 70, 71, 94
+
* Development sets (Residue 4,5): [4, 5, 12, 13, 20, 21, 28, 29, 36, 37, 44, 45, 52, 53, 60, 61, 68, 69, 76, 77, 84, 85, 92, 93, 100, 101, 108, 109, 116, 117, 124, 125, 132, 133, 140, 141, 148, 149, 156, 157, 164, 165, 172, 173, 180, 181, 188, 189, 196, 197, 204, 205, 212, 213]
 +
 
 +
* Test sets (Residue 6,7): [6, 7, 14, 15, 22, 23, 30, 31, 38, 39, 46, 47, 54, 55, 62, 63, 70, 71, 78, 79, 86, 87, 94, 95, 102, 103, 110, 111, 118, 119, 126, 127, 134, 135, 142, 143, 150, 151, 158, 159, 166, 167, 174, 175, 182, 183, 190, 191, 198, 199, 206, 207, 214, 215]
=== Brain Cancer Data ===
-
* All sets summary: 2-9
+
* Train sets: [1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27, 32, 33, 34, 35, 40, 41, 42, 43, 48, 49, 50, 51, 56, 57, 58, 59, 64, 65, 66, 67, 72, 73, 74, 75, 80, 81, 82, 83, 88, 89, 90, 91, 96, 97, 98, 99, 104, 105, 106, 107, 112, 113, 114, 115, 120, 121, 122, 123, 128, 129, 130, 131, 136, 137, 138, 139, 144, 145, 146, 147, 152, 153, 154, 155, 160, 161, 162, 163, 168, 169, 170, 171, 176, 177, 178, 179, 184, 185, 186, 187, 192, 193, 194, 195, 200, 201]
-
* Train sets: 2, 3, 8, 9
+
-
* Development sets: 4, 5
+
-
* Test sets: 6, 7
+
 +
* Development sets: [4, 5, 12, 13, 20, 21, 28, 29, 36, 37, 44, 45, 52, 53, 60, 61, 68, 69, 76, 77, 84, 85, 92, 93, 100, 101, 108, 109, 116, 117, 124, 125, 132, 133, 140, 141, 148, 149, 156, 157, 164, 165, 172, 173, 180, 181, 188, 189, 196, 197]
 +
 +
* Test sets: [6, 7, 14, 15, 22, 23, 30, 31, 38, 39, 46, 47, 54, 55, 62, 63, 70, 71, 78, 79, 86, 87, 94, 95, 102, 103, 110, 111, 118, 119, 126, 127, 134, 135, 142, 143, 150, 151, 158, 159, 166, 167, 174, 175, 182, 183, 190, 191, 198, 199]
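The colon cancer lists are labelled by residue, and both the colon and brain lists appear to follow the same rule: a set number is assigned by its remainder modulo 8 (0-3 train, 4-5 development, 6-7 test). The following is a minimal Python sketch of that rule. The function names are hypothetical, and the set ID ranges (1-217 for colon, 1-201 for brain) are assumptions inferred from the largest numbers in the lists above, not a statement about which sets exist in the released corpus.

```python
def thyme_split(set_id: int) -> str:
    """Map a set number to its split using the residue-mod-8 rule."""
    residue = set_id % 8
    if residue <= 3:        # residues 0, 1, 2, 3
        return "train"
    if residue <= 5:        # residues 4, 5
        return "dev"
    return "test"           # residues 6, 7


def partition(set_ids):
    """Group an iterable of set numbers into train/dev/test lists."""
    splits = {"train": [], "dev": [], "test": []}
    for set_id in sorted(set_ids):
        splits[thyme_split(set_id)].append(set_id)
    return splits


# Assumed ID ranges, inferred from the lists on this page.
colon = partition(range(1, 218))
brain = partition(range(1, 202))

print(colon["train"][:8])
print(brain["test"][-2:])
```

Deriving the split from the set number, rather than hard-coding the lists, keeps a local copy consistent if further sets are added under the same rule.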
== THYME Software ==
The THYME system is available as part of Apache cTAKES at http://ctakes.apache.org/
 +
 +
Demo of the system: ctakes.apache.org -> get started -> demos -> ctakes-temporal (http://alt.qcri.org/semeval2016/task12/)
* [[Software|System diagram]]
* [[development progress]]
-
We are also developing a visualization tool (THYME viz tool) which will be made available in cTAKES.
+
We are also developing a visualization tool (THYME viz tool) which will be made available in cTAKES. A prototype and details of the THYME visualization tool were presented by Sean Finan at several annual workshops.
 +
 
 +
Finan, Sean. 2013. Challenges of visually representing rich temporal information of the clinical narrative. Workshop: Exploring Temporal Patterns in Electronic Health Record Data. 30th Annual Human-Computer Interaction Lab Symposium. May 22-23 2013. University of Maryland. http://www.cs.umd.edu/hcil/eventflow/workshop2013/
 +
 
 +
Finan, Sean. De Groen, Piet. Savova, Guergana. 2014.  Narrative Event and Temporal Relation Visualization Tool. Workshop: Exploring Temporal Patterns in Electronic Health Record Data. 31st Annual Human-Computer Interaction Lab Symposium. May 29 2014. University of Maryland. http://www.cs.umd.edu/hcil/eventflow/workshop2014/
 +
 
 +
Finan, Sean. De Groen, Piet. Savova, Guergana. 2014.  Narrative Event and Temporal Relation Visualization Tool. Natural Language Processing Workshop.  Informatics for Integrating Biology & the Bedside (I2B2).  4th Annual Academic User Group Meeting. July 9 2014.  Harvard Medical School.
 +
https://www.i2b2.org/events/slides/NarrativeVisualizer.pdf
 +
 
 +
* [http://youtu.be/Kp9YE0o3urU Visualization Tool Demonstration Video]
-
== Relevant Papers ==
+
== Relevant Background Papers ==
[[Relevant Papers]]
 +
== Reading Group ==
 +
[[Paper Queue]]
== Internal Presentations ==
== Internal Presentations ==
== Communication ==
* Bi-weekly team meetings, Tue 1:30-2:30 pm ET
** [[Call_in_detals| Call in details]]
* Weekly methods meetings, Fri 3:30-5 pm ET
* [[Distribution_lists|Distribution Lists]]
== IDEAS notebook ==
[[IDEAS_notebook|Ideas notebook]]
== Meeting Notes ==
*[[Methods_11112019|November 11, 2019]] Methods meeting
*[[THYME_All_team_11072019|November 7, 2019]] THYME all team meeting
*[[Methods_11042019|November 4, 2019]] Methods meeting
*[[Methods_10282019|October 28, 2019]] Methods meeting
*[[THYME_All_team_10242019|October 24, 2019]] THYME all team meeting
*[[Methods_10142019|October 14, 2019]] Methods meeting
*[[THYME_All_team_10102019|October 10, 2019]] THYME all team meeting
*[[hNLP_Methods_10072019|October 7, 2019]] hNLP Methods meeting
*[[hNLP_Methods_09302019|September 30, 2019]] hNLP Methods meeting
*[[hNLP_Methods_09232019|September 23, 2019]] hNLP Methods meeting
*[[hNLP_Methods_09092019|September 9, 2019]] hNLP Methods meeting
*[[hNLP_Methods_08262019|August 26, 2019]] hNLP Methods meeting
*[[hNLP_Methods_08192019|August 19, 2019]] hNLP Methods meeting
*[[hNLP_Methods_08122019|August 12, 2019]] hNLP Methods meeting
*[[hNLP_Methods_08052019|August 5, 2019]] hNLP Methods meeting
*[[hNLP_Methods_07292019|July 29, 2019]] hNLP Methods meeting
*[[hNLP_Methods_07222019|July 22, 2019]] hNLP Methods meeting
*[[hNLP_Methods_07152019|July 15, 2019]] hNLP Methods meeting
*[[hNLP_Methods_07012019|July 1, 2019]] hNLP Methods meeting
*[[hNLP_Methods_06242019|June 24, 2019]] hNLP Methods meeting
*[[hNLP_Methods_05202019|May 20, 2019]] hNLP Methods meeting
*[[hNLP_Methods_05132019|May 13, 2019]] hNLP Methods meeting
*[[hNLP_Methods_05062019|May 6, 2019]] hNLP Methods meeting
*[[hNLP_Methods_04292019|April 29, 2019]] hNLP Methods meeting
*[[hNLP_Methods_04222019|April 22, 2019]] hNLP Methods meeting
*[[hNLP_Methods_04082019|April 8, 2019]] hNLP Methods meeting
*[[hNLP_Methods_04012019|April 1, 2019]] hNLP Methods meeting
*[[hNLP_Methods_03252019|March 25, 2019]] hNLP Methods meeting
*[[hNLP_Methods_03182019|March 18, 2019]] hNLP Methods meeting
*[[hNLP_Methods_03112019|March 11, 2019]] hNLP Methods meeting
*[[hNLP_Methods_03042019|March 4, 2019]] hNLP Methods meeting
*[[hNLP_Methods_02252019|Feb 25, 2019]] hNLP Methods meeting
*[[hNLP_Methods_02112019|Feb 11, 2019]] hNLP Methods meeting
*[[hNLP_Methods_02042019|Feb 4, 2019]] hNLP Methods meeting
*[[hNLP_Methods_01282019|Jan 28, 2019]] hNLP Methods meeting
* January 21, 2019: No meeting
*[[hNLP_Methods_01142019|Jan 14, 2019]] hNLP Methods meeting
*[[hNLP_Methods_01072019|Jan 07, 2019]] hNLP Methods meeting
*[[Meeting_notes_2018 | 2018 Meeting notes]]
*[[Meeting_notes_2017 | 2017 Meeting notes]]
*[[Meeting_notes_2016 | 2016 Meeting notes]]
*[[Meeting_notes_2015 | 2015 Meeting notes]]
*[[Meeting_notes_2014 | 2014 Meeting notes]]
*[[Meeting_notes_2013 | 2013 Meeting notes]]
*[[Meeting_notes_2012 | 2012 Meeting notes]]
== Getting started ==
== Contact ==
If you need assistance or have questions about the project, feel free to send e-mail to guergana dot savova at childrens dot harvard dot edu, martha dot palmer at colorado dot edu, or bethard at email dot arizona dot edu.

Revision as of 19:47, 11 November 2019

Welcome to the THYME project

Welcome to the Temporal Histories of Your Medical Events (THYME) project (THYME is pronounced [taim]).

The overarching long-term vision of our research is to create novel technologies for processing clinical free text. Such technologies will enable sophisticated and efficient indexing, retrieval and data mining over the ever-increasing amounts of electronic clinical data. Processing free text poses a number of challenges on which the fields of artificial intelligence, natural language processing and computer science in general have made advances. Methods for processing free text are informed by linguistic theory combined with the power of statistical inference. A key component of the next step, natural language understanding, is discovering events and their relations on a timeline. Temporal relations are of prime importance in biomedicine as they are intrinsically linked to diseases, signs and symptoms, and treatments. Understanding the timeline of clinically relevant events is key to the next generation of translational research, where generalizing over large amounts of data holds the promise of deciphering biomedical puzzles.

The best methods have been or will be released as part of Apache cTAKES (ctakes.apache.org) for the larger community to use and contribute to. We will test the methods against biomedical queries.

Funding

Phase 1 of the project (2010-2014) was supported in part by the i2b2 project (U54LM008748 from the National Library of Medicine) and THYME R01LM010090 from the National Library of Medicine.

Phase 2 (2015-2018) is supported by THYME R01LM010090 from the National Library of Medicine. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Library of Medicine or the National Institutes of Health.

Who We Are

  • Boston Children's Hospital/Harvard Medical School
    • Guergana Savova (MPI)
    • Dmitriy Dligach
    • Timothy Miller
    • Sean Finan
    • Chen Lin
    • David Harris
  • University of Colorado
    • Martha Palmer (MPI)
    • Jim Martin
    • Wayne Ward
    • Jordan Boyd-Graber
    • Will Styler
    • Arrick Lanfranchi (through August, 2012)
    • Tim O'Gorman
    • Kevin Crooks (through December 2013)
    • Mariah Hamang
    • and several Linguistics and Computer Science graduate students
  • University of Arizona
    • Steven Bethard
  • University of Alabama, Birmingham
    • John Osborne
  • Mayo Clinic
    • Piet de Groen
    • Brad Erickson
    • James Masanz (through July, 2015)
    • Donna Ihrke (through December, 2012)
    • Pauline Funk (through January, 2013)
  • Brandeis University
    • James Pustejovsky

Publications and presentations crediting THYME

2019

  • Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2019. A BERT-based Universal Model for Both Within- and Cross-sentence Clinical Temporal Relation Extraction. Proceedings of the 2nd Clinical Natural Language Processing Workshop.
  • Savova, G. K., Danciu, I., Alamudun, F., Miller, T., Lin, C., Bitterman, D. S., ... & Warner, J. L. (2019). Use of Natural Language Processing to Extract Clinical Cancer Phenotypes from Electronic Medical Records. Cancer Research, canres-0579.


2018

  • Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Amiri, Hadi; Bethard, Steven; Savova, Guergana. 2018. Self-training improves Recurrent Neural Networks performance for Temporal Relation Extraction. LOUHI 2018: The Ninth International Workshop on Health Text Mining and Information Analysis. October 31, 2018, Brussels, Belgium

2017

  • Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2017. Representations of Time Expressions for Temporal Relation Extraction with Convolutional Neural Networks. BioNLP workshop at the Association for Computational Linguistics conference. Vancouver, Canada, Friday August 4, 2017
  • Dligach, Dmitriy; Miller, Timothy; Lin, Chen; Bethard, Steven; Savova, Guergana. 2017. Neural temporal relation extraction. European Chapter of the Association for Computational Linguistics (EACL 2017). April 3-7, 2017. Valencia, Spain. http://www.aclweb.org/anthology/E17-2118.

2016

  • Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2016. Improving Temporal Relation Extraction with Training Instance Augmentation. BioNLP workshop at the Association for Computational Linguistics conference. Berlin, Germany, Aug 2016
  • Miller, Timothy; Dligach, Dmitriy; Lin, Chen; Bethard, Steven; Savova, Guergana. 2016. Cross-domain Coreference Feature Exploration. AMIA Annual Symposium. Chicago, IL. November, 2016
  • Steven Bethard and Jonathan Parker (May 2016). “A Semantically Compositional Annotation Scheme for Time Normalization”. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016).
  • Steven Bethard, Guergana Savova, Wei-Te Chen, Leon Derczynski, James Pustejovsky, and Marc Verhagen. 2016. “SemEval-2016 Task 12: Clinical TempEval”. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016). San Diego, CA
  • Ethan Hartzell, Chen Lin. 2016. Enhancing Clinical Temporal Relation Discovery with Syntactic Embeddings from GloVe. International Conference on Intelligent Biology and Medicine (ICIBM 2016). December 2016, Houston, Texas, USA
  • Clinical TempEval 2016: http://alt.qcri.org/semeval2016/task12/

2015

2014

  • Lin, Chen; Karlson, Elizabeth; Dligach, Dmitriy; Ramirez, Monica; Miller, Timothy; Mo, Huan; Braggs, Natalie; Cagan, Andrew; Denny, Joshua; Savova, Guergana. 2014. Automatic identification of Methotrexate-induced liver toxicity in Rheumatoid Arthritis patients from the electronic medical records. Journal of the American Medical Informatics Association. http://jamia.bmj.com/content/early/2014/10/24/amiajnl-2014-002642.abstract
  • Pascal B. Pfiffner, JiWon Oh, Timothy A. Miller, Kenneth D. Mandl. 2014. ClinicalTrials.gov as a Data Source for Semi-Automated Point-Of-Care Trial Eligibility Screening. PlosOne. DOI: 10.1371/journal.pone.0111055. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0111055#abstract0
  • Pradhan, Sameer; Elhadad, Noemie; South, Brett; Martinez, David; Christensen, Lee; Vogel, Amy; Suominen, Hanna; Chapman, Wendy; Savova, Guergana. 2014. Evaluating the state of the art in disorder recognition and normalization of the clinical narrative. Journal of the American Medical Informatics Association. http://jamia.bmj.com/content/early/2014/08/21/amiajnl-2013-002544.full.pdf+html
  • Sameer Pradhan, Noemie Elhadad, Wendy Chapman, Suresh Manandhar, Guergana Savova. 2014. SemEval 2014: Task 7. In Proceedings of the International Workshop on Semantic Evaluations, Dublin, Ireland. August. http://alt.qcri.org/semeval2014/cdrom/
  • Finan, Sean; De Groen, Piet; Savova, Guergana. 2014. Narrative event and temporal relation visualization tool. American Medical Informatics Association annual symposium. November 2014. Washington, DC.
  • Bethard, Steven. 2014. The state of the art of temporal relation extraction. Presentation at the NLP workshop at the 4th i2b2 Academic User Group conference. July 9, 2014. Boston, MA.
  • Miller, Timothy. 2014. Methods for temporal relation discovery in the clinical narrative. Presentation at the NLP workshop at the 4th i2b2 Academic User Group conference. July 9, 2014. Boston, MA.
  • Pradhan, Sameer. 2014. Extrinsic evaluation of temporal relation discovery system. Presentation at the NLP workshop at the 4th i2b2 Academic User Group conference. July 9, 2014. Boston, MA.
  • Finan, Sean. 2014. Visualization tool for temporal relations from the clinical narrative. Presentation at the NLP workshop at the 4th i2b2 Academic User Group conference. July 9, 2014. Boston, MA.
  • Chen, Pei. 2014. Modules for temporal relation discovery from the clinical narrative in Apache cTAKES. Presentation at the NLP workshop at the 4th i2b2 Academic User Group conference. July 9, 2014. Boston, MA.
  • Sameer Pradhan, Xiaoqiang Luo, Marta Recasens, Eduard Hovy, Vincent Ng and Michael Strube. 2014. Scoring Coreference Partitions of Predicted Mentions: A Reference Implementation. Short paper. Association for Computational Linguistics Conference. Baltimore, Maryland. http://anthology.aclweb.org//
  • Xiaoqiang Luo, Sameer Pradhan, Marta Recasens and Eduard Hovy. 2014. An Extension of BLANC to System Mentions. Short paper. Association for Computational Linguistics Conference. Baltimore, Maryland. http://anthology.aclweb.org//
  • Chen Lin, Timothy Miller, Alvin Kho, Steven Bethard, Dmitriy Dligach, Sameer Pradhan and Guergana Savova. 2014. Descending-Path Convolution Kernel for Syntactic Structures. Short paper. Association for Computational Linguistics Conference. Baltimore, Maryland. http://anthology.aclweb.org//
  • Savova, Guergana. 2014. Temporal relation discovery from the clinical narrative. Invited talk at the National Library of Medicine Informatics Series. June 4, 2014. Bethesda, MD.
  • Finan, Sean; de Groen, Piet; Savova, Guergana. 2014. Narrative Event and Temporal Relation Visualization Tool. Workshop: Exploring Temporal Patterns in Electronic Health Record Data. 31st Annual Human-Computer Interaction Lab Symposium. May 29 2014. University of Maryland. http://www.cs.umd.edu/hcil/eventflow/workshop2014/
  • Bethard, Steven; Ogren, Philip; Becker, Lee. 2014. ClearTK 2.0: Design Patterns for Machine Learning in UIMA. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). http://anthology.aclweb.org//
  • Styler, William; Bethard, Steven; Finan, Sean; Palmer, Martha; Pradhan, Sameer; de Groen, Piet; Erickson, Brad; Miller, Timothy; Lin, Chen; Savova, Guergana K.; Pustejovsky, James. 2014. Temporal annotation in the clinical domain. Transactions of the Association for Computational Linguistics. http://www.transacl.org/wp-content/uploads/2014/04/47.pdf
  • Savova, Guergana; Pradhan, Sameer; Palmer, Martha; Styler, Will; Chapman, Wendy; Elhadad, Noemie. (in press). Annotating the clinical text – MiPACQ, ShARe, SHARPn and THYME corpora. In Handbook of Linguistic Annotations. Ed. James Pustejovsky and Nancy Ide. Springer.
  • Miller, Tim. 2014. Discovering narrative containers in clinical text. i2b2 All Hands meeting, Jan 17, 2014. Boston, MA (presentation)

2013

  • Albright, Daniel; Lanfranchi, Arrick; Fredriksen, Anwen; Styler, William; Warner, Collin; Hwang, Jena; Choi, Jinho; Dligach, Dmitriy; Nielsen, Rodney; Martin, James; Ward, Wayne; Palmer, Martha; Savova, Guergana. 2013. Towards syntactic and semantic annotations of the clinical narrative. Journal of the American Medical Informatics Association. 2013;0:1–9. doi:10.1136/amiajnl-2012-001317; http://jamia.bmj.com/cgi/rapidpdf/amiajnl-2012-001317?ijkey=z3pXhpyBzC7S1wC&keytype=ref. PMID: 23355458
  • Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Pradhan, Sameer; Lin, Chen; Savova, Guergana. 2013. Discovering time expressions in clinical text. Late breaking abstract. American Medical Informatics Association Conference. November, 2014. Washington, DC.
  • Chen, Wei-Te and Styler, Will. 2013. Anafora: A Web-based General Purpose Annotation Tool. Proceedings of the North American Chapter of the Association for Computational Linguistics Conference. Atlanta, GA, June 9-13. http://www.aclweb.org/anthology/N13-3004. Anafora is available as open source from https://github.com/weitechen/anafora
  • Miller, Timothy; Bethard, Steven; Dligach, Dmitriy; Pradhan, Sameer; Lin, Chen; and Savova, Guergana. 2013. Discovering narrative containers in clinical text. BioNLP workshop at the Association for Computational Linguistics. http://aclweb.org/anthology/W/W13/W13-1903.pdf
  • Bethard, Steven. 2013. A Synchronous Context Free Grammar for Time Normalization. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. http://www.aclweb.org/anthology/D13-1078
  • Bethard, Steven. 2013. ClearTK-TimeML: A minimalist approach to TempEval 2013. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). Atlanta, Georgia, USA: Association for Computational Linguistics, pp. 10-14. http://www.aclweb.org/anthology/S13-2002
  • Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Hwee Tou Ng, Anders Bjorkelund, Olga Uryupina, Yuchen Zhang and Zhi Zhong. 2013. Towards Robust Linguistic Analysis Using OntoNotes. Proceedings of the Conference on Natural Language Learning. Sofia, Bulgaria. August, 2013.
  • Dligach, Dmitriy; Bethard, Steven; Becker, Lee; Miller, Timothy; Savova, Guergana. 2013. Discovering body site and severity modifiers in clinical texts. Journal of the American Medical Informatics Association. http://jamia.bmj.com/content/early/2013/10/03/amiajnl-2013-001766.full
  • Dmitriy Dligach, Timothy A. Miller, Guergana K. Savova. 2013. Active Learning for Phenotyping Tasks. In Proceedings of the 2013 NLP for Medicine and Biology workshop held in conjunction with RANLP-2013. September 2013. Hissar, Bulgaria. http://aclweb.org/anthology//W/W13/W13-5101.pdf
  • Finan, Sean. 2013. Challenges of visually representing rich temporal information of the clinical narrative. Workshop: Exploring Temporal Patterns in Electronic Health Record Data. 30th Annual Human-Computer Interaction Lab Symposium. May 22-23 2013. University of Maryland. http://www.cs.umd.edu/hcil/eventflow/workshop2013/
  • American Medical Informatics Association (AMIA) national webinar. “Towards semantic annotations of the clinical narrative”. April 2013 (invited presentation)
  • Natural Language Processing Working Group Pre-Symposium – doctoral consortium and a data workshop. “Shared Annotated Resources for the Clinical Domain”. American Medical Informatics Association. Washington, DC, USA. November 2013.
  • Savova, Guergana; Chapman, Wendy; Elhadad, Noemie; Palmer, Martha. 2013. Shared resources, shared code and shared activities in clinical natural language processing. AMIA Annual Symposium, Panel. Washington, DC.
  • AMIA Fall symposium workshop on Natural Language Processing and data. Dr. Savova presented THYME work as part of the data workshop.

2012

  • Savova, Guergana. 2012. Shared Annotated Resources for the Clinical Domain. Natural Language Processing (NLP) Annotation workshop collocated with the 2nd annual IEEE International Conference on Healthcare Informatics, Imaging and Systems Biology. San Diego, CA, USA. September 2012.
  • Drs. Pustejovsky, Palmer and Savova are members of the Program Committee of the 2012 i2b2 shared task whose topic is temporal relations in the clinical domain. The THYME annotation guidelines are the basis of the annotation guidelines for that shared task.
  • Participation in the State of the Art of Clinical NLP workshop organized by the NLM in April, 2012. Dr. Savova chaired a session, Prof. Pustejovsky was an invited speaker presenting on Temporal relations/TimeML.
  • Participation and presentation in the AMIA Fall symposium workshop on Natural Language Processing and data. Dr. Savova presented THYME work as part of the data workshop.

2011

  • Savova, Guergana; Chapman, Wendy; Elhadad, Noemie; Palmer, Martha. 2011. Shared annotated resources for the clinical domain. AMIA Annual Symposium, Panel. Washington, DC.

Shared NLP Tasks with THYME participation

Getting access to the THYME corpus and gold standard annotations

The THYME corpus with the gold standard annotations is available to others involved in NLP research under a data use agreement (DUA) with Mayo Clinic. The steps for obtaining a DUA are outlined below. After the DUA has been completed, the THYME corpus is available via a secure download mechanism. Distribution of the corpus is supported by grant LM010090 from the NIH; include the funding acknowledgment in your publications.

The corpus is released to an established or junior NLP investigator, formally associated with an institution; thus it is not released to a student. However, all students working with the investigator can have full access to the corpus under the DUA of the investigator. The investigator is urged to have the students work on the corpus on workstations that stay within the laboratory space of the investigator.

The steps for obtaining a DUA are:

  1. Submit the THYME corpus request form, informing us about your institution, your principal investigator, and your intended use of the data.
  2. A THYME investigator will send your principal investigator a DUA to complete and to have signed by your site's official signatory. The THYME investigator will provide instructions for returning the signed and completed DUA.
  3. When you return the DUA, a THYME investigator will arrange to talk with your principal investigator. Of note, the discussion must be with the lab's principal investigator, not a student/postdoc/administrator. Topics that will be addressed include allowable uses of the data and proper security measures.
  4. Once the DUA is complete and a THYME investigator has confirmed your understanding of the DUA, you will be sent instructions for obtaining the corpus via a secure downloading mechanism.

When using the THYME corpus, please

  1. Include the Mayo Clinic in your acknowledgements
  2. Cite the article: William F. Styler IV, Steven Bethard, Sean Finan, Martha Palmer, Sameer Pradhan, Piet C. de Groen, Brad Erickson, Timothy Miller, Chen Lin, Guergana Savova, James Pustejovsky. Temporal Annotation in the Clinical Domain. Transactions of the Association for Computational Linguistics. Vol 2 (2014). https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/305

THYME Gold Standard Annotations

The annotation layers are Treebank and PropBank annotations as well as temporal annotations for events, temporal expressions and temporal relations.


Annotation guidelines

  • i2b2 Simplified THYME Guidelines (PDF) The guidelines provided to the organizers of the 2012 Temporal relations i2b2 challenge for consideration during planning. They reflect an earlier stage of our guidelines.

Tool for viewing the gold standard annotations - Anafora

We developed a web-based annotation tool. It is open source and available at https://github.com/weitechen/anafora. Use it to view the THYME annotations. Citation for the tool is:

Chen, Wei-Te and Styler, Will. 2013. Anafora: A Web-based General Purpose Annotation Tool. Proceedings of the North American Chapter of the Association for Computational Linguistics Conference. Atlanta, GA, June 9-13. http://www.aclweb.org/anthology/N13-3004.

Viewing the gold standard annotations (Anafora)

(available to the team only)

To view the Temporal-Entity data, use the URL:

https://verbs.colorado.edu/anafora/annotate/Temporal/ColonCancer/TASK_NAME/Temporal.Entity/gold/

TASK_NAME is the filestem, for example, ID074_path_219b

To view the Temporal-Relation data, use the URL:

https://verbs.colorado.edu/anafora/annotate/Temporal/ColonCancer/TASK_NAME/Temporal.Relation/gold/

You can find the available Entity/Relation gold data on the verbs server by using:

  find /data/anafora/anaforaProjectFile/Temporal/ -name "*Temporal-Entity.gold.completed.xml"
  find /data/anafora/anaforaProjectFile/Temporal/ -name "*Temporal-Relation.gold.completed.xml"
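
The two URL patterns above can also be filled in programmatically. The following is a minimal Python sketch, not part of Anafora or the THYME tooling: the function name gold_url and its layer argument are hypothetical and introduced only for illustration, and the base path assumes the ColonCancer project shown above.

```python
# Hypothetical helper: builds the Anafora gold-annotation URL for a task.
# BASE is copied from the URL patterns above; nothing here contacts the server.
BASE = "https://verbs.colorado.edu/anafora/annotate/Temporal/ColonCancer"


def gold_url(task_name: str, layer: str) -> str:
    """Return the gold-view URL for a filestem such as 'ID074_path_219b'.

    layer must be 'Entity' or 'Relation' (the two views described above).
    """
    if layer not in ("Entity", "Relation"):
        raise ValueError("layer must be 'Entity' or 'Relation'")
    return f"{BASE}/{task_name}/Temporal.{layer}/gold/"


if __name__ == "__main__":
    print(gold_url("ID074_path_219b", "Entity"))
    print(gold_url("ID074_path_219b", "Relation"))
```

Access is still restricted to the team, so the generated links only open for users who can already log in to the Anafora instance.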


Train/Development/Test splits

Colon Cancer Data

  • Train sets (Residue 0,1,2,3): [1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27, 32, 33, 34, 35, 40, 41, 42, 43, 48, 49, 50, 51, 56, 57, 58, 59, 64, 65, 66, 67, 72, 73, 74, 75, 80, 81, 82, 83, 88, 89, 90, 91, 96, 97, 98, 99, 104, 105, 106, 107, 112, 113, 114, 115, 120, 121, 122, 123, 128, 129, 130, 131, 136, 137, 138, 139, 144, 145, 146, 147, 152, 153, 154, 155, 160, 161, 162, 163, 168, 169, 170, 171, 176, 177, 178, 179, 184, 185, 186, 187, 192, 193, 194, 195, 200, 201, 202, 203, 208, 209, 210, 211, 216, 217]
  • Development sets (Residue 4,5): [4, 5, 12, 13, 20, 21, 28, 29, 36, 37, 44, 45, 52, 53, 60, 61, 68, 69, 76, 77, 84, 85, 92, 93, 100, 101, 108, 109, 116, 117, 124, 125, 132, 133, 140, 141, 148, 149, 156, 157, 164, 165, 172, 173, 180, 181, 188, 189, 196, 197, 204, 205, 212, 213]
  • Test sets (Residue 6,7): [6, 7, 14, 15, 22, 23, 30, 31, 38, 39, 46, 47, 54, 55, 62, 63, 70, 71, 78, 79, 86, 87, 94, 95, 102, 103, 110, 111, 118, 119, 126, 127, 134, 135, 142, 143, 150, 151, 158, 159, 166, 167, 174, 175, 182, 183, 190, 191, 198, 199, 206, 207, 214, 215]

Brain Cancer Data

  • Train sets: [1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27, 32, 33, 34, 35, 40, 41, 42, 43, 48, 49, 50, 51, 56, 57, 58, 59, 64, 65, 66, 67, 72, 73, 74, 75, 80, 81, 82, 83, 88, 89, 90, 91, 96, 97, 98, 99, 104, 105, 106, 107, 112, 113, 114, 115, 120, 121, 122, 123, 128, 129, 130, 131, 136, 137, 138, 139, 144, 145, 146, 147, 152, 153, 154, 155, 160, 161, 162, 163, 168, 169, 170, 171, 176, 177, 178, 179, 184, 185, 186, 187, 192, 193, 194, 195, 200, 201]
  • Development sets: [4, 5, 12, 13, 20, 21, 28, 29, 36, 37, 44, 45, 52, 53, 60, 61, 68, 69, 76, 77, 84, 85, 92, 93, 100, 101, 108, 109, 116, 117, 124, 125, 132, 133, 140, 141, 148, 149, 156, 157, 164, 165, 172, 173, 180, 181, 188, 189, 196, 197]
  • Test sets: [6, 7, 14, 15, 22, 23, 30, 31, 38, 39, 46, 47, 54, 55, 62, 63, 70, 71, 78, 79, 86, 87, 94, 95, 102, 103, 110, 111, 118, 119, 126, 127, 134, 135, 142, 143, 150, 151, 158, 159, 166, 167, 174, 175, 182, 183, 190, 191, 198, 199]
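
The Colon Cancer lists above follow a modulo-8 rule on the set number (residues 0-3 for train, 4-5 for development, 6-7 for test), and the Brain Cancer lists are consistent with the same rule. The sketch below regenerates the lists from that rule. It is a minimal Python illustration under stated assumptions: the function names split_for and build_splits are hypothetical, and the upper bounds (217 for Colon Cancer, 201 for Brain Cancer) are simply the largest numbers appearing in the lists above.

```python
# Hypothetical helpers reproducing the residue-based splits listed above.

def split_for(set_id: int) -> str:
    """Map a set number to 'train', 'dev' or 'test' by its residue modulo 8."""
    residue = set_id % 8
    if residue <= 3:      # residues 0, 1, 2, 3
        return "train"
    if residue <= 5:      # residues 4, 5
        return "dev"
    return "test"         # residues 6, 7


def build_splits(max_id: int) -> dict:
    """Group set numbers 1..max_id (inclusive) into the three splits."""
    splits = {"train": [], "dev": [], "test": []}
    for set_id in range(1, max_id + 1):
        splits[split_for(set_id)].append(set_id)
    return splits


if __name__ == "__main__":
    colon = build_splits(217)   # assumed upper bound, read off the list above
    brain = build_splits(201)   # assumed upper bound, read off the list above
    print(colon["test"][:4])
    print(brain["dev"][-2:])
```

Deriving the split from the number rather than hard-coding the lists keeps the assignment reproducible, but the lists above remain the authoritative definition.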

THYME Software

The THYME system is available as part of Apache cTAKES at http://ctakes.apache.org/

Demo of the system: ctakes.apache.org -> get started -> demos -> ctakes-temporal (http://alt.qcri.org/semeval2016/task12/)

We are also developing a visualization tool (THYME viz tool) which will be made available in cTAKES. A prototype and details of the THYME visualization tool were presented by Sean Finan at several annual workshops.

Finan, Sean. 2013. Challenges of visually representing rich temporal information of the clinical narrative. Workshop: Exploring Temporal Patterns in Electronic Health Record Data. 30th Annual Human-Computer Interaction Lab Symposium. May 22-23 2013. University of Maryland. http://www.cs.umd.edu/hcil/eventflow/workshop2013/

Finan, Sean. De Groen, Piet. Savova, Guergana. 2014. Narrative Event and Temporal Relation Visualization Tool. Workshop: Exploring Temporal Patterns in Electronic Health Record Data. 31st Annual Human-Computer Interaction Lab Symposium. May 29 2014. University of Maryland. http://www.cs.umd.edu/hcil/eventflow/workshop2014/

Finan, Sean. De Groen, Piet. Savova, Guergana. 2014. Narrative Event and Temporal Relation Visualization Tool. Natural Language Processing Workshop. Informatics for Integrating Biology & the Bedside (I2B2). 4th Annual Academic User Group Meeting. July 9 2014. Harvard Medical School. https://www.i2b2.org/events/slides/NarrativeVisualizer.pdf

Relevant Background Papers

Relevant Papers

Reading Group

Paper Queue

Internal Presentations

Presentations


Venues for manuscript submissions

Venues for manuscript submissions/publications


Project materials

Project Charter

Tasks, leads, teams and deadlines

Progress reports

Annotations - Describes the corpus, the layers of annotations and annotation progress

Annotation Tools - Describes the progress and information pertaining to the Anafora annotation tool


Communication

IDEAS notebook

Ideas notebook


Meeting Notes

Getting started


Contact

If you need assistance and/or if you have questions about the project, feel free to send e-mail to guergana dot savova at childrens dot harvard dot edu, martha dot palmer at colorado dot edu, or bethard at email dot arizona dot edu.
