TemporalWiki

Welcome to the THYME project

Welcome to the Temporal Histories of Your Medical Event (THYME) project (THYME is pronounced [taim]).

The overarching long-term vision of our research is to create novel technologies for processing clinical free text. Such technologies will enable sophisticated and efficient indexing, retrieval and data mining over the ever-increasing amounts of electronic clinical data. Processing free text poses a number of challenges to which the fields of artificial intelligence, natural language processing and computer science in general have made advances. Methods for processing free text are informed by linguistic theory combined with the power of modern machine learning. A key component of the next step, natural language understanding, is discovering events and their relations on a timeline. Temporal relations are of prime importance in biomedicine as they are intrinsically linked to diseases, signs and symptoms, and treatments. Understanding the timeline of clinically relevant events is key to the next generation of translational research where the importance of generalizing over large amounts of data holds the promise of deciphering biomedical puzzles.

The best methods have been, or will be, released as part of Apache cTAKES (ctakes.apache.org) for the larger community to use and contribute to. We will test the methods against biomedical queries.

Funding

Phase 1 of the project (2010-2014) was supported by THYME R01LM010090 from the National Library of Medicine and in part by the i2b2 project (U54LM008748 from the National Library of Medicine).

Phase 2 (2015-2018) and Phase 3 (2019-2023) are supported by THYME R01LM010090 from the National Library of Medicine.

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Library of Medicine or the National Institutes of Health.

Who We Are

Boston Children's Hospital/Harvard Medical School
- Guergana Savova (MPI)
- Timothy Miller
- Sean Finan
- Chen Lin
- David Harris

University of Colorado -- Boulder
- Martha Palmer (MPI)
- Jim Martin
- Kristin Wright-Bettner
- a small army of Linguistics and Computer Science graduate students
- past -- Wayne Ward, Jordan Boyd-Graber, Will Styler IV, Arrick Lanfranchi (through August, 2012), Tim O'Gorman, Kevin Crooks (through December 2013), Mariah Hamang, Jinho Choi

University of Arizona
- Steven Bethard
- graduate students

Loyola University Chicago
- Dmitriy Dligach
- graduate students

University of Minnesota
- Piet de Groen

University of Alabama, Birmingham
- John Osborne

Mayo Clinic
- past -- Piet de Groen, Brad Erickson, James Masanz (through July, 2015), Donna Ihrke (through December, 2012), Pauline Funk (through January, 2013)

Brandeis University
- James Pustejovsky

Publications and presentations crediting THYME

2021

Lin, Chen, Timothy Miller, Dmitriy Dligach, Steven Bethard, and Guergana Savova. "EntityBERT: Entity-centric Masking Strategy for Model Pretraining for the Clinical Domain." In Proceedings of the 20th Workshop on Biomedical Language Processing, pp. 191-201. 2021.

Kulshrestha S, Dligach D, Joyce C, Baker MS, Gonzalez R, O’Rourke AP, Glazer JM, Stey A, Kruser JM, Churpek MM, Afshar M. Comparison and Interpretability of Machine Learning Models to Predict Severity of Injury. JAMIA Open. 2021.

Savova, Guergana. 2021. “Natural Language Processing for Biomedicine”. Informatics and Implementation Science Learning Series (I2S2). April 2021. Brown University, RI

2020

Ahuja, Y. et al. Leveraging electronic health records data to predict multiple sclerosis disease activity. Ann. Clin. Transl. Neurol. n/a. https://onlinelibrary.wiley.com/doi/10.1002/acn3.51324

Kristin Wright-Bettner, Chen Lin, Timothy Miller, Steven Bethard, Dmitriy Dligach, Martha Palmer, James H Martin, Guergana Savova, Defining and Learning Refined Temporal Relations in the Clinical Narrative, Proceedings of the 11th International Workshop on Health Text Mining and Information Analysis, 104-114, 2020/11. https://www.aclweb.org/anthology/2020.louhi-1.12/

Chen Lin, Timothy Miller, Dmitriy Dligach, Farig Sadeque, Steven Bethard, Guergana Savova, A BERT-based One-Pass Multi-Task Model for Clinical Temporal Relation Extraction, Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing, 70-75, 2020/7. https://www.aclweb.org/anthology/2020.bionlp-1.7/

Alon Geva, Steven H Abman, Shannon F Manzi, Dunbar D Ivy, Mary P Mullen, John Griffin, Chen Lin, Guergana K Savova, Kenneth D Mandl. 2020. Adverse drug event rates in pediatric pulmonary hypertension: a comparison of real-world data sources. Journal of the American Medical Informatics Association, Vol. 27, issue 2, pp. 294-300. PMID: 31769835 PMCID: PMC7025334 DOI: 10.1093/jamia/ocz194. https://academic.oup.com/jamia/article/27/2/294/5643900

Sujay Kulshrestha, Dmitriy Dligach, Cara Joyce, Marshall S Baker, Richard Gonzalez, Ann P O’Rourke, Joshua M Glazer, Anne Stey, Jacqueline M Kruser, Matthew M Churpek, Majid Afshar. Prediction of severe chest injury using natural language processing from the electronic health record. Injury. 2020.

Anoop Mayampurath, Matthew Churpek, Xin Su, Sameep Shah, Elizabeth Munroe, Bhakti Patel, Dmitriy Dligach, and Majid Afshar. External Validation of an Acute Respiratory Distress Syndrome Prediction Model Using Radiology Reports. Critical Care Medicine. 2020.

To D, Sharma B, Karnik N, Joyce C, Dligach D, Afshar M. Validation of an alcohol misuse classifier in hospitalized patients. Alcohol. 2020 May;84:49-55. doi: 10.1016/j.alcohol.2019.09.008. Epub 2019 Sep 28. PubMed PMID: 31574300; PubMed Central PMCID: PMC7101259.

Lin, Chen. 2020. Customize cTAKES for Automated Adverse Drug Event Surveillance in Pediatric Pulmonary Hypertension. ApacheCon 2020, cTAKES track. October 2020. Virtual due to COVID-19. https://www.apachecon.com/acah2020/tracks/ctakes.html

Savova, Guergana. 2020. “Natural Language Processing for Cancer Deep Phenotyping”. Observational Health Data Sciences and Informatics (OHDSI) consortium. October 2020.

Savova, Guergana. 2020. “Clinical Natural Language Processing, Some Tasks and Applications in Medicine”. 11th International Workshop on Health Text Mining and Information Analysis (LOUHI), Conference on Empirical Methods in Natural Language Processing (EMNLP) 2020. Virtual due to COVID-19

2019

Kristin Wright-Bettner, Martha Palmer, Guergana Savova, Piet de Groen and Timothy Miller. 2019. Cross-document coreference: An approach to capturing coreference without context. In LOUHI 2019: The Tenth International Workshop on Health Text Mining and Information Analysis. Hong Kong, Nov 3, 2019. https://www.aclweb.org/anthology/D19-6201/
Dmitriy Dligach, Majid Afshar, Timothy Miller. Towards a Universal Document-Level Clinical Text Encoder: Methods for Neural Network Pre-training with Applications to Substance Misuse. American Medical Informatics Association Symposium. Washington DC, November, 2019.
Lin, Chen, Miller, Timothy, Dligach, Dmitriy, Bethard, Steven & Savova, Guergana. 2019. A BERT-based Universal Model for Both Within- and Cross-sentence Clinical Temporal Relation Extraction. In Clinical NLP Workshop (2019). Conference of the North American Chapter of the Association for Computational Linguistics. Minneapolis, MN. June 3-7 2019. https://www.aclweb.org/anthology/W19-1908.pdf
Guergana Savova, Ioana Danciu, Folami Alamudun, Timothy Miller, Chen Lin, Danielle S Bitterman, Georgia Tourassi and Jeremy L Warner. 2019. Use of Natural Language Processing to Extract Clinical Cancer Phenotypes from Electronic Medical Records. Cancer Research. doi: 10.1158/0008-5472.CAN-19-0579. PMID: 31395609
To D, Sharma B, Karnik N, Joyce C, Dligach D, Afshar M. Validation of an Alcohol Misuse Classifier in Hospitalized Patients. Alcohol. 2019 Sep 28;. doi: 10.1016/j.alcohol.2019.09.008.
Dongfang Xu, Egoitz Laparra, Steven Bethard. Pre-trained Contextualized Character Embeddings Lead to Major Improvements in Time Normalization: a Detailed Analysis. Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019). Minneapolis, MN. June 2019. https://www.aclweb.org/anthology/S19-1008
Piet de Groen, 2019. Challenges and Limitations of Natural Language Processing. Mayo Clinic Conference: Current Applications and Future of Artificial Intelligence in Cardiology. July 20, 2019. San Francisco, California
Piet de Groen, 2019. Artificial Intelligence: Challenges and Limitations of Natural Language Processing in Healthcare. December 12, 2019 – Medical Grand Rounds: University of Minnesota, Minneapolis, MN

2018

Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Amiri, Hadi; Bethard, Steven; Savova, Guergana. 2018. Self-training improves Recurrent Neural Networks performance for Temporal Relation Extraction. LOUHI 2018: The Ninth International Workshop on Health Text Mining and Information Analysis. October 31, 2018, Brussels, Belgium

2017

Dligach, Dmitriy; Miller, Timothy; Lin, Chen; Bethard, Steven; Savova, Guergana. 2017. Neural temporal relation extraction. European Chapter of the Association for Computational Linguistics (EACL 2017). April 3-7, 2017. Valencia, Spain.
Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Lin, Chen; Savova, Guergana. 2017. Towards Portable Entity-Centric Clinical Coreference Resolution. Journal of Biomedical Informatics. Vol. 69, May 2017, pp. 251-258. https://doi.org/10.1016/j.jbi.2017.04.015; http://www.sciencedirect.com/science/article/pii/S1532046417300850
Natalia Viani, Timothy Miller, Dmitriy Dligach, Steven Bethard, Carlo Napolitano, Silvia Priori, Riccardo Bellazzi, Lucia Sacchi and Guergana Savova. 2017. Recurrent Neural Network Architectures for Event Extraction from Italian Medical Reports. AIME 2017 16th Conference on Artificial Intelligence in Medicine. Vienna, Austria June 21-24, 2017.
Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2017. Representations of Time Expressions for Temporal Relation Extraction with Convolutional Neural Networks. BioNLP workshop at the Association for Computational Linguistics conference. Vancouver, Canada, Friday August 4, 2017
Timothy A. Miller, Dmitriy Dligach, Chen Lin, Steven Bethard, Guergana Savova. Cross-domain Coreference Feature Exploration. Annual Symposium of the American Medical Informatics Association, Chicago, IL, 2016.
Steven Bethard, Guergana Savova, Martha Palmer, and James Pustejovsky. SemEval-2017 Task 12: Clinical TempEval. Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Vancouver, Canada, August 3-4, 2017.

Clinical TempEval 2017: http://alt.qcri.org/semeval2017/task12/

2016

Lin, Chen; Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Savova, Guergana. 2016. Improving Temporal Relation Extraction with Training Instance Augmentation. BioNLP workshop at the Association for Computational Linguistics conference. Berlin, Germany, Aug 2016
Miller, Timothy; Dligach, Dmitriy; Lin, Chen; Bethard, Steven; Savova, Guergana. 2016. Cross-domain Coreference Feature Exploration. AMIA Annual Symposium. Chicago, IL. November, 2016
Steven Bethard and Jonathan Parker (May 2016). “A Semantically Compositional Annotation Scheme for Time Normalization”. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016).
Steven Bethard, Guergana Savova, Wei-Te Chen, Leon Derczynski, James Pustejovsky, and Marc Verhagen. 2016. “SemEval-2016 Task 12: Clinical TempEval”. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval 2016). San Diego, CA
Ethan Hartzell, Chen Lin. 2016. Enhancing Clinical Temporal Relation Discovery with Syntactic Embeddings from GloVe. International Conference on Intelligent Biology and Medicine (ICIBM 2016). December 2016, Houston, Texas, USA
Clinical TempEval 2016: http://alt.qcri.org/semeval2016/task12/

2015

Lin, Chen; Dligach, Dmitriy; Miller, Timothy; Bethard, Steven; Savova, Guergana. 2015. Layered temporal modeling for the clinical domain. Journal of the American Medical Informatics Association. http://jamia.oxfordjournals.org/content/early/2015/10/31/jamia.ocv113
Bethard, Steven; Derczynski, Leon; Savova, Guergana; Pustejovsky, James; Verhagen, Marc. 2015. SemEval-2015 Task 6: Clinical TempEval. Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015). http://www.aclweb.org/anthology/S15-2136. http://aclweb.org/anthology/S/S15/S15-2136.pdf
Miller, Timothy; Bethard, Steven; Dligach, Dmitriy; Lin, Chen; Savova, Guergana. 2015. Extracting Time Expressions from Clinical Text. Proceedings of BioNLP 15. http://www.aclweb.org/anthology/W15-3809
Clinical TempEval 2015: http://alt.qcri.org/semeval2015/task6/
Clinical TempEval 2015 papers:
- Sumithra Velupillai; Danielle L Mowery; Samir Abdelrahman; Lee Christensen; Wendy Chapman. BluLab: Temporal Information Extraction for the 2015 Clinical TempEval Challenge. http://aclweb.org/anthology/S/S15/S15-2137.pdf
- Hegler Tissot; Genevieve Gorrell; Angus Roberts; Leon Derczynski; Marcos Didonet Del Fabro. UFPRSheffield: Contrasting Rule-based and Support Vector Machine Approaches to Time Expression Identification in Clinical TempEval. http://aclweb.org/anthology/S/S15/S15-2141.pdf

2014

Lin, Chen; Karlson, Elizabeth; Dligach, Dmitriy; Ramirez, Monica; Miller, Timothy; Mo, Huan; Braggs, Natalie; Cagan, Andrew; Denny, Joshua; Savova, Guergana. 2014. Automatic identification of Methotrexate-induced liver toxicity in Rheumatoid Arthritis patients from the electronic medical records. Journal of the American Medical Informatics Association. http://jamia.bmj.com/content/early/2014/10/24/amiajnl-2014-002642.abstract
Pascal B. Pfiffner, JiWon Oh, Timothy A. Miller, Kenneth D. Mandl. 2014. ClinicalTrials.gov as a Data Source for Semi-Automated Point-Of-Care Trial Eligibility Screening. PlosOne. DOI: 10.1371/journal.pone.0111055. http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0111055#abstract0
Pradhan, Sameer; Elhadad, Noemie; South, Brett; Martinez, David; Christensen, Lee; Vogel, Amy; Suominen, Hanna; Chapman, Wendy; Savova, Guergana. 2014. Evaluating the state of the art in disorder recognition and normalization of the clinical narrative. Journal of the American Medical Informatics Association. http://jamia.bmj.com/content/early/2014/08/21/amiajnl-2013-002544.full.pdf+html
Sameer Pradhan, Noemie Elhadad, Wendy Chapman, Suresh Manandhar, Guergana Savova. 2014. SemEval 2014: Task 7. In Proceedings of the International Workshop on Semantic Evaluations, Dublin, Ireland. August. http://alt.qcri.org/semeval2014/cdrom/
Finan, Sean; De Groen, Piet; Savova, Guergana. 2014. Narrative event and temporal relation visualization tool. American Medical Informatics Association annual symposium. November 2014. Washington, DC.
Bethard, Steven. 2014. The state of the art of temporal relation extraction. Presentation at the NLP workshop at the 4th i2b2 Academic User Group conference. July 9, 2014. Boston, MA.
Miller, Timothy. 2014. Methods for temporal relation discovery in the clinical narrative. Presentation at the NLP workshop at the 4th i2b2 Academic User Group conference. July 9, 2014. Boston, MA.
Pradhan, Sameer. 2014. Extrinsic evaluation of temporal relation discovery system. Presentation at the NLP workshop at the 4th i2b2 Academic User Group conference. July 9, 2014. Boston, MA.
Finan, Sean. 2014. Visualization tool for temporal relations from the clinical narrative. Presentation at the NLP workshop at the 4th i2b2 Academic User Group conference. July 9, 2014. Boston, MA.
Chen, Pei. 2014. Modules for temporal relation discovery from the clinical narrative in Apache cTAKES. Presentation at the NLP workshop at the 4th i2b2 Academic User Group conference. July 9, 2014. Boston, MA.
Sameer Pradhan, Xiaoqiang Luo, Marta Recasens, Eduard Hovy, Vincent Ng and Michael Strube. 2014. Scoring Coreference Partitions of Predicted Mentions: A Reference Implementation. Short paper. Association for Computational Linguistics Conference. Baltimore, Maryland. http://anthology.aclweb.org//
Xiaoqiang Luo, Sameer Pradhan, Marta Recasens and Eduard Hovy. 2014. An Extension of BLANC to System Mentions. Short paper. Association for Computational Linguistics Conference. Baltimore, Maryland. http://anthology.aclweb.org//
Chen Lin, Timothy Miller, Alvin Kho, Steven Bethard, Dmitriy Dligach, Sameer Pradhan and Guergana Savova. 2014. Descending-Path Convolution Kernel for Syntactic Structures. Short paper. Association for Computational Linguistics Conference. Baltimore, Maryland. http://anthology.aclweb.org//
Savova, Guergana. 2014. Temporal relation discovery from the clinical narrative. Invited talk at the National Library of Medicine Informatics Series. June 4, 2014. Bethesda, MD.
Finan, Sean; de Groen, Piet; Savova, Guergana. 2014. Narrative Event and Temporal Relation Visualization Tool. Workshop: Exploring Temporal Patterns in Electronic Health Record Data. 31st Annual Human-Computer Interaction Lab Symposium. May 29 2014. University of Maryland. http://www.cs.umd.edu/hcil/eventflow/workshop2014/
Bethard, Steven; Ogren, Philip; Becker, Lee. 2014. ClearTK 2.0: Design Patterns for Machine Learning in UIMA. In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). http://anthology.aclweb.org//
Styler, William; Bethard, Steven; Finan, Sean; Palmer, Martha; Pradhan, Sameer; de Groen, Piet; Erickson, Brad; Miller, Timothy; Lin, Chen; Savova, Guergana K.; Pustejovsky, James. 2014. Temporal annotations in the clinical domain. Transactions of the Association for Computational Linguistics. http://www.transacl.org/wp-content/uploads/2014/04/47.pdf
Savova, Guergana; Pradhan, Sameer; Palmer, Martha; Styler, Will; Chapman, Wendy; Elhadad, Noemie. (in press). Annotating the clinical text – MiPACQ, ShARe, SHARPn and THYME corpora. In Handbook of Linguistic Annotations. Ed. James Pustejovsky and Nancy Ide. Springer.
Miller, Tim. 2014. Discovering narrative containers in clinical text. i2b2 All Hands meeting, Jan 17, 2014. Boston, MA (presentation)

2013

Albright, Daniel; Lanfranchi, Arrick; Fredriksen, Anwen; Styler, William; Warner, Collin; Hwang, Jena; Choi, Jinho; Dligach, Dmitriy; Nielsen, Rodney; Martin, James; Ward, Wayne; Palmer, Martha; Savova, Guergana. 2013. Towards syntactic and semantic annotations of the clinical narrative. Journal of the American Medical Informatics Association. 2013;0:1–9. doi:10.1136/amiajnl-2012-001317; http://jamia.bmj.com/cgi/rapidpdf/amiajnl-2012-001317?ijkey=z3pXhpyBzC7S1wC&keytype=ref. PMID: 23355458
Miller, Timothy; Dligach, Dmitriy; Bethard, Steven; Pradhan, Sameer; Lin, Chen; Savova, Guergana. 2013. Discovering time expressions in clinical text. Late breaking abstract. American Medical Informatics Association Conference. November, 2014. Washington, DC.
Chen, Wei-Te and Styler, Will. 2013. Anafora: A Web-based General Purpose Annotation Tool. Proceedings of the North American Association for Computational Linguistics Conference. Atlanta, GA, June 9-13. http://www.aclweb.org/anthology/N13-3004. Anafora is available open source from https://github.com/weitechen/anafora
Miller, Timothy; Bethard, Steven; Dligach, Dmitriy; Pradhan, Sameer; Lin, Chen; and Savova, Guergana. 2013. Discovering narrative containers in clinical text. BioNLP workshop at the Association for Computational Linguistics. http://aclweb.org/anthology/W/W13/W13-1903.pdf
Bethard, Steven. 2013. A Synchronous Context Free Grammar for Time Normalization. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. http://www.aclweb.org/anthology/D13-1078
Bethard, Steven. 2013. ClearTK-TimeML: A minimalist approach to TempEval 2013. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013). Atlanta, Georgia, USA: Association for Computational Linguistics, pp. 10-14. http://www.aclweb.org/anthology/S13-2002
Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Hwee Tou Ng, Anders Bjorkelund, Olga Uryupina, Yuchen Zhang and Zhi Zhong. 2013. Towards Robust Linguistic Analysis Using OntoNotes. Proceedings of the Conference on Natural Language Learning. Sofia, Bulgaria. August, 2013.
Dligach, Dmitriy; Bethard, Steven; Becker, Lee; Miller, Timothy; Savova, Guergana. 2013. Discovering body site and severity modifiers in clinical texts. Journal of the American Medical Informatics Association. http://jamia.bmj.com/content/early/2013/10/03/amiajnl-2013-001766.full
Dmitriy Dligach, Timothy A. Miller, Guergana K. Savova. 2013. Active Learning for Phenotyping Tasks. In Proceedings of the 2013 NLP for Medicine and Biology workshop held in conjunction with RANLP-2013. September 2013. Hissar, Bulgaria. http://aclweb.org/anthology//W/W13/W13-5101.pdf
Finan, Sean. 2013. Challenges of visually representing rich temporal information of the clinical narrative. Workshop: Exploring Temporal Patterns in Electronic Health Record Data. 30th Annual Human-Computer Interaction Lab Symposium. May 22-23 2013. University of Maryland. http://www.cs.umd.edu/hcil/eventflow/workshop2013/
American Medical Informatics Association (AMIA) national webinar. “Towards semantic annotations of the clinical narrative”. National webinar. April 2013 (invited presentation)
Natural Language Processing Working Group Pre-Symposium – doctoral consortium and a data workshop. “Shared Annotated Resources for the Clinical Domain”. American Medical Informatics Association. Washington, DC, USA. November 2013.
Savova, Guergana; Chapman, Wendy; Elhadad, Noemie; Palmer, Martha. 2013. Shared resources, shared code and shared activities in clinical natural language processing. AMIA Annual Symposium, Panel. Washington, DC.
AMIA Fall symposium workshop on Natural Language Processing and data. Dr. Savova presented THYME work as part of the data workshop.

2012

Savova, Guergana. 2012. Shared Annotated Resources for the Clinical Domain. Natural Language Processing (NLP) Annotation workshop collocated with the 2nd annual IEEE International Conference on Healthcare Informatics, Imaging and Systems Biology. San Diego, CA, USA. September 2012.
Drs. Pustejovsky, Palmer and Savova are members of the Program Committee of the 2012 i2b2 shared task whose topic is temporal relations in the clinical domain. The THYME annotation guidelines are the basis of the annotation guidelines for that shared task.
Participation in the State of the Art of Clinical NLP workshop organized by the NLM in April, 2012. Dr. Savova chaired a session, Prof. Pustejovsky was an invited speaker presenting on Temporal relations/TimeML.
Participation and presentation in the AMIA Fall symposium workshop on Natural Language Processing and data. Dr. Savova presented THYME work as part of the data workshop.

2011

Savova, Guergana; Chapman, Wendy; Elhadad, Noemie; Palmer, Martha. 2011. Shared annotated resources for the clinical domain. AMIA Annual Symposium, Panel. Washington, DC.

Shared NLP Tasks with THYME participation

CLEF/ShARe 2014 (in collaboration with the ShARe project): http://clefehealth2014.dcu.ie/task-2
SemEval 2014 Analysis of Clinical Text Task 7 (in collaboration with the ShARe project): http://alt.qcri.org/semeval2014/task7/
SemEval 2015 Analysis of Clinical Text Task 14 (in collaboration with the ShARe project): http://alt.qcri.org/semeval2015/task14/
SemEval 2015 Clinical TempEval Task 6: http://alt.qcri.org/semeval2015/task6/
SemEval 2016 Clinical TempEval Task 12: http://alt.qcri.org/semeval2016/task12/
SemEval 2017 Clinical TempEval Task 12: http://alt.qcri.org/semeval2017/task12/

Getting access to the THYME corpus and gold standard annotations

The THYME corpus with the gold standard annotations is available to others involved in NLP research under a data use agreement (DUA) with Mayo Clinic. The corpus is distributed through the hNLP Center (center.healthnlp.org). Please visit the hNLP Center website for more details.

When using the THYME corpus, please:

- Include the Mayo Clinic in your acknowledgements
- Cite the article: William F. Styler IV, Steven Bethard, Sean Finan, Martha Palmer, Sameer Pradhan, Piet C. de Groen, Brad Erickson, Timothy Miller, Chen Lin, Guergana Savova, James Pustejovsky. Temporal Annotation in the Clinical Domain. Transactions of the Association for Computational Linguistics. Vol 2 (2014). https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/305

THYME Gold Standard Annotations

Annotation layers are treebank and propbank annotations as well as temporal annotations for events, temporal expressions and temporal relations.

Annotation guidelines

The THYME Temporal Relations Guidelines (PDF) - The current version of the THYME Temporal Relations Guidelines and release notes. Updated February 28th, 2014.

i2b2 Simplified THYME Guidelines (PDF) - The guidelines provided to the organizers of the 2012 i2b2 temporal relations challenge for consideration during planning. They reflect an earlier stage of our guidelines.

Syntactic Tree (Treebank): http://clear.colorado.edu/compsem/documents/treebank_guidelines.pdf

Semantic Role (Propbank): http://clear.colorado.edu/compsem/documents/propbank_guidelines.pdf

UMLS entity and relations annotations/templates: http://clear.colorado.edu/compsem/documents/umls_guidelines.pdf

Clinical coreference guidelines (based on ODIE, OntoNotes, MUC-7): http://clear.colorado.edu/compsem/documents/Coreference%20Guidelines.pdf

Tool for viewing the gold standard annotations - Anafora

We developed a web-based annotation tool. It is open source and available at https://github.com/weitechen/anafora. Use it to view the THYME annotations. Citation for the tool is:

Chen, Wei-Te and Styler, Will. 2013. Anafora: A Web-based General Purpose Annotation Tool. Proceedings of the North American Association for Computational Linguistics Conference. Atlanta, GA, June 9-13. http://www.aclweb.org/anthology/N13-3004.

Viewing the gold standard annotations (Anafora)

(available to the team only)

To view the Temporal-Entity data, use the URL:

https://verbs.colorado.edu/anafora/annotate/Temporal/ColonCancer/TASK_NAME/Temporal.Entity/gold/

TASK_NAME is the filestem, for example, ID074_path_219b

To view the Temporal-Relation data, use the URL:

https://verbs.colorado.edu/anafora/annotate/Temporal/ColonCancer/TASK_NAME/Temporal.Relation/gold/
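The two URL patterns above differ only in the layer name, so building them can be sketched in a few lines of Python. This is a minimal illustration, not part of the Anafora tool: the function name `anafora_gold_url` and the constant `ANAFORA_BASE` are hypothetical, and the path layout is copied from the example URLs rather than from any documented API.

```python
from urllib.parse import quote

# Base path copied from the example URLs above (ColonCancer corpus only).
ANAFORA_BASE = "https://verbs.colorado.edu/anafora/annotate/Temporal/ColonCancer"


def anafora_gold_url(task_name: str, layer: str) -> str:
    """Return the gold-annotation URL for one task and one layer.

    task_name is the filestem, for example "ID074_path_219b".
    layer is "Entity" or "Relation".
    """
    if layer not in ("Entity", "Relation"):
        raise ValueError(f"unknown layer: {layer!r}")
    # Percent-encode the filestem so unusual characters cannot break the path.
    return f"{ANAFORA_BASE}/{quote(task_name, safe='')}/Temporal.{layer}/gold/"


print(anafora_gold_url("ID074_path_219b", "Entity"))
print(anafora_gold_url("ID074_path_219b", "Relation"))
```

As noted above, these pages are available to the team only, so the URLs resolve only for logged-in team members.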

You can list the available Entity/Relation gold data on the verbs server by running:

  find /data/anafora/anaforaProjectFile/Temporal/ -name "*Temporal-Entity.gold.completed.xml"
  find /data/anafora/anaforaProjectFile/Temporal/ -name "*Temporal-Relation.gold.completed.xml"
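For scripted use, the same search can be sketched in Python. This is an assumed equivalent of the `find` commands above, not project code: the function name `find_gold_files` is hypothetical, and the file-name pattern is taken directly from the commands.

```python
from pathlib import Path


def find_gold_files(root, layer):
    """List completed gold files for one layer ("Entity" or "Relation").

    Mirrors: find ROOT -name "*Temporal-<layer>.gold.completed.xml"
    """
    pattern = f"*Temporal-{layer}.gold.completed.xml"
    # rglob searches root and all of its subdirectories, like find does.
    return sorted(Path(root).rglob(pattern))
```

On the verbs server, the root would be `/data/anafora/anaforaProjectFile/Temporal/`, for example `find_gold_files("/data/anafora/anaforaProjectFile/Temporal/", "Entity")`.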

Train/Development/Test splits

Use this split for experiments with the THYME data. Sets are assigned to train, development or test by their number modulo 8 (% 8).
(A note about Protege/Knowtator and Anafora annotation tools: annotations)

Colon Cancer Data

Train sets (Residue 0,1,2,3): [1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27, 32, 33, 34, 35, 40, 41, 42, 43, 48, 49, 50, 51, 56, 57, 58, 59, 64, 65, 66, 67, 72, 73, 74, 75, 80, 81, 82, 83, 88, 89, 90, 91, 96, 97, 98, 99, 104, 105, 106, 107, 112, 113, 114, 115, 120, 121, 122, 123, 128, 129, 130, 131, 136, 137, 138, 139, 144, 145, 146, 147, 152, 153, 154, 155, 160, 161, 162, 163, 168, 169, 170, 171, 176, 177, 178, 179, 184, 185, 186, 187, 192, 193, 194, 195, 200, 201, 202, 203, 208, 209, 210, 211, 216, 217]

Development sets (Residue 4,5): [4, 5, 12, 13, 20, 21, 28, 29, 36, 37, 44, 45, 52, 53, 60, 61, 68, 69, 76, 77, 84, 85, 92, 93, 100, 101, 108, 109, 116, 117, 124, 125, 132, 133, 140, 141, 148, 149, 156, 157, 164, 165, 172, 173, 180, 181, 188, 189, 196, 197, 204, 205, 212, 213]

Test sets (Residue 6,7): [6, 7, 14, 15, 22, 23, 30, 31, 38, 39, 46, 47, 54, 55, 62, 63, 70, 71, 78, 79, 86, 87, 94, 95, 102, 103, 110, 111, 118, 119, 126, 127, 134, 135, 142, 143, 150, 151, 158, 159, 166, 167, 174, 175, 182, 183, 190, 191, 198, 199, 206, 207, 214, 215]

Brain Cancer Data

Train sets: [1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27, 32, 33, 34, 35, 40, 41, 42, 43, 48, 49, 50, 51, 56, 57, 58, 59, 64, 65, 66, 67, 72, 73, 74, 75, 80, 81, 82, 83, 88, 89, 90, 91, 96, 97, 98, 99, 104, 105, 106, 107, 112, 113, 114, 115, 120, 121, 122, 123, 128, 129, 130, 131, 136, 137, 138, 139, 144, 145, 146, 147, 152, 153, 154, 155, 160, 161, 162, 163, 168, 169, 170, 171, 176, 177, 178, 179, 184, 185, 186, 187, 192, 193, 194, 195, 200, 201]

Development sets: [4, 5, 12, 13, 20, 21, 28, 29, 36, 37, 44, 45, 52, 53, 60, 61, 68, 69, 76, 77, 84, 85, 92, 93, 100, 101, 108, 109, 116, 117, 124, 125, 132, 133, 140, 141, 148, 149, 156, 157, 164, 165, 172, 173, 180, 181, 188, 189, 196, 197]

Test sets: [6, 7, 14, 15, 22, 23, 30, 31, 38, 39, 46, 47, 54, 55, 62, 63, 70, 71, 78, 79, 86, 87, 94, 95, 102, 103, 110, 111, 118, 119, 126, 127, 134, 135, 142, 143, 150, 151, 158, 159, 166, 167, 174, 175, 182, 183, 190, 191, 198, 199]
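The lists above follow one rule: a set number with residue 0-3 modulo 8 goes to train, 4-5 to development, and 6-7 to test. A minimal Python sketch of that rule follows. The function names are hypothetical, and the upper bounds 217 (colon cancer) and 201 (brain cancer) are read off the lists above, assuming the numbering is contiguous from 1.

```python
def thyme_split(set_id: int) -> str:
    """Map a set number to its split using the residue modulo 8."""
    residue = set_id % 8
    if residue <= 3:
        return "train"  # residues 0, 1, 2, 3
    if residue <= 5:
        return "dev"    # residues 4, 5
    return "test"       # residues 6, 7


def split_ids(max_id: int) -> dict:
    """Group set numbers 1..max_id by split."""
    splits = {"train": [], "dev": [], "test": []}
    for set_id in range(1, max_id + 1):
        splits[thyme_split(set_id)].append(set_id)
    return splits


colon = split_ids(217)  # assumed upper bound, taken from the colon cancer lists
brain = split_ids(201)  # assumed upper bound, taken from the brain cancer lists
print(colon["dev"][:4])
print(brain["test"][-2:])
```

Computing the split from the residue avoids copying the long lists by hand and makes it easy to check which split a given file belongs to.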

THYME Software

The THYME system is available as part of Apache cTAKES at http://ctakes.apache.org/

Demo of the system: ctakes.apache.org -> get started -> demos -> ctakes-temporal (http://alt.qcri.org/semeval2016/task12/)

We have developed a visualization tool (THYME viz tool), available in the Apache cTAKES sandbox. A prototype and details of the THYME visualization tool were presented by Sean Finan at several annual workshops:

Finan, Sean. 2013. Challenges of visually representing rich temporal information of the clinical narrative. Workshop: Exploring Temporal Patterns in Electronic Health Record Data. 30th Annual Human-Computer Interaction Lab Symposium. May 22-23 2013. University of Maryland. http://www.cs.umd.edu/hcil/eventflow/workshop2013/

Finan, Sean. De Groen, Piet. Savova, Guergana. 2014. Narrative Event and Temporal Relation Visualization Tool. Workshop: Exploring Temporal Patterns in Electronic Health Record Data. 31st Annual Human-Computer Interaction Lab Symposium. May 29 2014. University of Maryland. http://www.cs.umd.edu/hcil/eventflow/workshop2014/

Finan, Sean. De Groen, Piet. Savova, Guergana. 2014. Narrative Event and Temporal Relation Visualization Tool. Natural Language Processing Workshop. Informatics for Integrating Biology & the Bedside (I2B2). 4th Annual Academic User Group Meeting. July 9 2014. Harvard Medical School.

https://www.i2b2.org/events/slides/NarrativeVisualizer.pdf

Visualization Tool Demonstration Video

Relevant Background Papers

Relevant Papers

Reading Group

Paper Queue

2022 NAACL

Internal Presentations

Presentations

Venues for manuscript submissions

Venues for manuscript submissions/publications

Project materials

Project Charter

Tasks, leads, teams and deadlines

Progress reports

Annotations - Describes the corpus, the layers of annotations and annotation progress

Annotation Tools - Describes the progress and information pertaining to the Anafora annotation tool

Communication

Bi-weekly team meetings, Fri 11am-noon ET
Weekly methods meetings, ~~Tue 4-5:30 pm ET~~ - Temporary COVID-19 hours: Weds 11AM-12:30PM

IDEAS notebook

Ideas notebook

Meeting Notes

Sept 14, 2022 Methods meeting
Sept 9, 2022 THYME all team meeting
Sept 7, 2022 Methods meeting
August 26, 2022 THYME all team meeting
August 24, 2022 Methods meeting
August 17, 2022 Methods meeting
August 12, 2022 THYME all team meeting
August 10, 2022 Methods meeting
August 3, 2022 Methods meeting
July 29, 2022 THYME all team meeting
July 27, 2022 Methods meeting
July 20, 2022 Methods meeting
July 6, 2022 Methods meeting
June 29, 2022 Methods meeting
June 22, 2022 Methods meeting
June 17, 2022 THYME all team meeting
June 15, 2022 Methods meeting
June 8, 2022 Methods meeting
June 1, 2022 Methods meeting
May 25, 2022 Methods meeting
May 18, 2022 Methods meeting
May 11, 2022 Methods meeting
May 6, 2022 THYME all team meeting
May 4, 2022 Methods meeting
April 27, 2022 Methods meeting
April 22, 2022 THYME all team meeting
April 20, 2022 Methods meeting
April 13, 2022 Methods meeting
April 8, 2022 THYME all team meeting
April 6, 2022 Methods meeting
March 30, 2022 Methods meeting
March 25, 2022 THYME all team meeting
March 23, 2022 Methods meeting
March 16, 2022 Methods meeting
March 11, 2022 THYME all team meeting
March 9, 2022 Methods meeting
March 2, 2022 Methods meeting
February 25, 2022 THYME all team meeting
February 23, 2022 Methods meeting
February 16, 2022 Methods meeting
February 11, 2022 THYME all team meeting
February 9, 2022 Methods meeting
February 2, 2022 Methods meeting
January 28, 2022 THYME all team meeting
January 26, 2022 Methods meeting
January 19, 2022 Methods meeting
January 14, 2022 THYME all team meeting
January 12, 2022 Methods meeting
January 5, 2022 Methods meeting

Getting started

Contact

If you need assistance and/or if you have questions about the project, feel free to send e-mail to guergana dot savova at childrens dot harvard dot edu, martha dot palmer at colorado dot edu, or bethard at email dot arizona dot edu.
