Meeting Schedule

From CompSemWiki
Revision as of 10:50, 25 October 2023 by CompSemUser (talk | contribs)

Location: Hybrid (Buchanan 430 and the Zoom link below)

Time: Wednesdays at 10:30am, Mountain Time

Zoom link: https://cuboulder.zoom.us/j/97014876908

Date Title
08/30/2023 Planning, introductions, welcome!
09/06/2023 ACL talk videos (Geoffrey Hinton)
09/13/2023 Ongoing projects talks (Susan: AIDA, KAIROS, DWD)
09/20/2023 Brunch and garden party outside in the Shakespeare Garden! (no Zoom)
09/27/2023 Felix Zheng - practice talk, Ongoing projects (Martha: UMR. Jim: ISAT. Rehan: Event Coref Projects)
10/04/2023 Ongoing projects talks, focus on low-resource and endangered languages (UMR2, LECS lab, NALA)
10/11/2023 Ongoing projects talks, LECS lab and BLAST lab
10/18/2023 Téa Wright thesis proposal, BLAST lab

Téa Wright

Research Proposal: Pretrained Multilingual Model Adaptation for Low Resource Languages with OCR

Pretrained multilingual models (PMMs) have advanced the natural language processing (NLP) field over recent years, but they often struggle when confronted with low-resource languages. This proposal will explore the challenges of adapting PMMs to such languages, with a current focus on Lakota and Dakota. Much of the data available for endangered languages is in formats that are not machine-readable. As a result, endangered languages are left out of NLP technologies. Using optical character recognition (OCR) to digitize these resources helps address this problem, but it also introduces noise.

The goal of this research is to determine how this noise affects model adaptation and performance for zero-shot and few-shot learning for low-resource languages. The project will involve data collection and scanning, annotation for a gold evaluation dataset, and evaluation of multiple language models across different adaptation methods and levels of noise. Additionally, we hope to expand this pipeline to more scripts and languages.

The potential implications of this study are broad: the findings may generalize to languages not included in the study and provide insight into how noise affects model adaptation and which types of noise are most harmful. This project aims to address the unique challenges of Lakota and Dakota as well as develop the field’s understanding of how models may be adapted to include low-resource languages, working towards more inclusive NLP technologies.


10/25/2023 BLAST lab, then Daniel Acuña (Daniel's talk will start at 11:20)

Daniel Acuña

The differential and irreplaceable contributions of academia and industry to AI research

Striking recent advances in artificial intelligence (AI) by industry have stunned the academic world, making us rethink whether academia should just follow industry’s lead. Due to its open publication, citation, and code-sharing culture, AI offers a rare opportunity to investigate whether these recent advances are outliers or something more systematic. In the present study, we investigate the impact and novelty of academic and industry AI research across 58 conferences (the primary publication medium of AI), involving 292,185 articles and 524 state-of-the-art models from 1995 to 2020. Our findings reveal an overall seismic shift in impact and novelty metrics, which started around 2015, presumably motivated by deep learning. In the most recent measures, an article published by an exclusively industry team dominates impact, with a 73.78 percent higher chance of being highly cited, a 12.80 percent higher chance of being citation-disruptive, and a several times higher likelihood of producing state-of-the-art models. In contrast, we find that academic teams dominate novelty, being a striking 2.8 times more likely to produce novel, atypical work. Controlling for potential confounding factors such as subfield, team size, seniority, and prestige, we find that academia–industry collaborations are unable to simultaneously replicate the impact and novelty of non-collaborative teams, suggesting each environment offers irreplaceable contributions to advance AI.


11/01/2023 Guest speaker: Tianyu Jiang, University of Cincinnati


Tianyu Jiang, Assistant Professor, Dept. of Computer Science, University of Cincinnati

Commonsense Knowledge of Prototypical Functions for Natural Language Processing

Recent advances in natural language processing (NLP) have enabled computers to understand and generate natural language to a remarkable degree. However, it is still a big challenge for computers to "read between the lines" as we humans do. People often omit a lot of information in daily communication, but we have no difficulty understanding each other because our commonsense knowledge can help us make inferences. In this research, we focus on one specific type of commonsense knowledge that people use in everyday living: "functional knowledge". People go to different places for a common set of goals: people go to schools to study, go to stores to buy clothing, and go to restaurants to eat. Similarly, people create and use physical objects for different purposes: knives are for cutting, cars are for transportation, and phones are for communication. I will first introduce how we can automatically learn this type of knowledge, and then demonstrate how to utilize this prior knowledge of functions in two downstream applications: sentence-level understanding and visual activity recognition.

Bio: Tianyu Jiang is an Assistant Professor in the Computer Science department at the University of Cincinnati. He received his Ph.D. in Computer Science from the University of Utah, advised by Ellen Riloff. His main research interests are in the area of Natural Language Processing (NLP), specifically in semantics, commonsense knowledge, multimodality, and information extraction.


11/08/2023 Luke Gessler
11/15/2023 TBD
11/22/2023 *** fall break ***
11/29/2023 Jon's Proposal
12/06/2023 Adam's Proposal
12/13/2023 Elizabeth's Proposal
12/20/2023 Rehan's Dissertation


Past Schedules