Meeting Schedule

Revision as of 13:40, 17 October 2023

Location: Hybrid - Buchanan 430, and the Zoom link below

Time: Wednesdays at 10:30am, Mountain Time

Zoom link: https://cuboulder.zoom.us/j/97014876908

Date        Title
08/30/2023  Planning, introductions, welcome!
09/06/2023  ACL talk videos (Geoffrey Hinton)
09/13/2023  Ongoing projects talks (Susan: AIDA, KAIROS, DWD)
09/20/2023  Brunch and garden party outside in the Shakespeare Garden! (no Zoom)
09/27/2023  Felix Zheng practice talk; ongoing projects talks (Martha: UMR; Jim: ISAT; Rehan: Event Coref projects)
10/04/2023  Ongoing projects talks, focus on low-resource and endangered languages (UMR2, LECS lab, NALA)
10/11/2023  Ongoing projects talks, LECS lab and BLAST lab
10/18/2023  Téa Wright thesis proposal, BLAST lab

Téa Wright

Research Proposal: Pretrained Multilingual Model Adaptation for Low-Resource Languages with OCR

Pretrained multilingual models (PMMs) have advanced natural language processing (NLP) in recent years, but they often struggle when confronted with low-resource languages. This proposal explores the challenges of adapting PMMs to such languages, with a current focus on Lakota and Dakota. Much of the data available for endangered languages exists in formats that are not machine-readable, so these languages are left out of NLP technologies. Using optical character recognition (OCR) to digitize these resources helps address this dilemma, but it also introduces noise.
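
As a rough illustration of the digitization step, the sketch below runs OCR on a scanned page with Tesseract via the pytesseract wrapper. The tool choice and file path are assumptions for illustration, not details taken from the proposal.

```python
# Minimal sketch of the digitization step, assuming Tesseract via the
# pytesseract wrapper (an illustrative assumption, not the proposal's tooling).
from PIL import Image
import pytesseract

# OCR a scanned page; for Lakota/Dakota, Tesseract would also need trained
# data that covers the orthography, or the output will be noisier still.
page = Image.open("scans/dictionary_page_001.png")  # hypothetical path
text = pytesseract.image_to_string(page)

# The raw output typically contains OCR noise: dropped diacritics,
# character confusions (l/1, o/0), and merged or split tokens.
print(text)
```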

The goal of this research is to determine how this noise affects model adaptation and performance in zero-shot and few-shot settings for low-resource languages. The project will involve data collection and scanning, annotation of a gold evaluation dataset, and evaluation of multiple language models across different adaptation methods and levels of noise. Additionally, we hope to extend this pipeline to more scripts and languages.
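
One way to compare adaptation across controlled noise levels is to corrupt clean text at a chosen rate before adapting a model on it. The sketch below is a minimal illustration under that assumption; the proposal does not specify this method, and the confusion pairs are invented for the example.

```python
import random

# Sketch (not the proposal's code): inject character-level, OCR-style noise
# at a controlled rate so adaptation can be compared across noise levels.
def add_ocr_noise(text: str, noise_rate: float, seed: int = 0) -> str:
    rng = random.Random(seed)
    confusions = {"l": "1", "o": "0", "e": "c", "i": "í"}  # illustrative pairs
    out = []
    for ch in text:
        if rng.random() < noise_rate:
            op = rng.choice(["substitute", "delete", "duplicate"])
            if op == "substitute":
                out.append(confusions.get(ch.lower(), ch))
            elif op == "duplicate":
                out.append(ch + ch)
            # "delete" appends nothing, dropping the character
        else:
            out.append(ch)
    return "".join(out)

# Corrupt the same adaptation text at several noise levels, e.g. to compare
# zero-shot and few-shot performance after adapting on each version.
for rate in (0.0, 0.05, 0.15):
    print(f"{rate:.2f}: {add_ocr_noise('example adaptation sentence', rate)}")
```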

The potential implications of this study are broad: the findings may generalize to languages not included in the study, and they offer insight into how noise affects model adaptation and which types of noise are most harmful. This project aims to address the unique challenges of Lakota and Dakota as well as to develop the field’s understanding of how models may be adapted to include low-resource languages, working toward more inclusive NLP technologies.


10/25/2023  Daniel Acuna (starting at 11:20)
11/01/2023  TBD
11/08/2023  Luke Gessler
11/15/2023  TBD
11/22/2023  *** fall break ***
11/29/2023  Jon's Proposal
12/06/2023  Adam's Proposal
12/13/2023  Elizabeth's Proposal
12/20/2023  Rehan's Dissertation


Past Schedules