Difference between revisions of "Meeting Schedule"

From CompSemWiki
 
(80 intermediate revisions by the same user not shown)
Line 1: Line 1:
'''Location:''' Hybrid - Buchanan 430, and the zoom link below
+
'''Location:''' Hybrid - Muenzinger D430, and the zoom link below
  
'''Time:''' Wednesdays at 10:30am, Mountain Time
+
'''Time:''' Wednesdays at 11:30am, Mountain Time
  
 
'''Zoom link:''' https://cuboulder.zoom.us/j/97014876908
 
'''Zoom link:''' https://cuboulder.zoom.us/j/97014876908
Line 13: Line 13:
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 08/30/23 || '''Planning, introductions, welcome!'''
+
| 08/28/2024 || '''Planning, introductions, welcome!'''
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 09/06/2023 || ACL talk videos (Geoffrey Hinton)
+
| 09/04/2024 || Brunch Social
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 09/13/2023 || Ongoing projects talks (Susan: AIDA, KAIROS, DWD)
+
| 09/11/2024 || Watch and discuss NLP keynote
 +
 
 +
'''Winner:''' Barbara Plank’s “Are LLMs Narrowing our Horizon? Let’s Embrace Variation in NLP!”
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 09/20/2023 || Brunch and garden party outside in the Shakespeare Garden! (no zoom)
+
| 09/18/2024 || CLASIC presentations
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 09/27/2023 || Felix Zheng - practice talk, Ongoing projects (Martha: UMR. Jim: ISAT. Rehan: Event Coref Projects)
+
| 09/25/2024 || Invited talks/discussions from Leeds and Anschutz folks: Liu Liu, Abe Handler, Yanjun Gao
 +
 
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 10/04/2023 || Ongoing projects talks, focus on low-resource and endangered languages (UMR2, LECS lab, NALA)
+
| 10/02/2024 || Martha Palmer, Annie Zaenen, Susan Brown, Alexis Cooper.
 +
 
 +
'''Title:''' Testing GPT4's interpretation of the Caused-Motion Construction
 +
 
 +
'''Abstract:''' The fields of Artificial Intelligence and Natural Language Processing have been revolutionized by the advent of Large Language Models such as GPT4. They are perceived as being language experts and there is a lot of speculation about how intelligent they are, with claims being made about “Sparks of General Artificial Intelligence.” This talk will describe in detail an English linguistic construction, the Caused Motion Construction, and compare prior interpretation approaches with current LLM interpretations. The prior approaches are based on VerbNet. Its unique contributions to prior approaches will be outlined. Then the results of a recent preliminary study probing GPT4’s analysis of the same constructions will be presented. Not surprisingly, this analysis illustrates both strengths and weaknesses of GPT4’s ability to interpret Caused Motion Constructions and to generalize this interpretation.
 +
 
 +
Recording: https://o365coloradoedu-my.sharepoint.com/:v:/r/personal/mpalmer_colorado_edu/Documents/BoulderNLP-Palmer-Oct2-2024.mp4?csf=1&web=1&nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJPbmVEcml2ZUZvckJ1c2luZXNzIiwicmVmZXJyYWxBcHBQbGF0Zm9ybSI6IldlYiIsInJlZmVycmFsTW9kZSI6InZpZXciLCJyZWZlcnJhbFZpZXciOiJNeUZpbGVzTGlua0NvcHkifX0&e=aCHeN8
 +
 
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 10/11/2023 || Ongoing projects talks, LECS lab and BLAST lab
+
| 10/09/2024 || NAACL Paper Clinic: Come get feedback on your submission drafts!
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 10/18/2023 || Téa Wright thesis proposal, BLAST lab
+
| 10/16/2024 || Senior Thesis Proposals:
 +
 
 +
 
 +
'''Alexandra Barry'''
 +
 
 +
'''Title''': Benchmarking LLM Handling of Cross-Dialectal Spanish
 +
 
 +
'''Abstract''': This proposal introduces current issues and gaps in cross-dialectal NLP in Spanish as well as the lack of resources available for Latin American dialects. The presentation will cover past work in dialect detection, translation, and benchmarking in order to build a foundation for a proposal that aims to create a benchmark that analyses LLM robustness across a series of tasks in different Spanish dialects.
 +
 
  
-----
 
  
'''Téa Wright'''
+
'''Tavin Turner'''
  
'''Research Proposal: Pretrained multilingual model Adaptation for Low Resource Languages with OCR'''
+
'''Title''': Agreeing to Disagree: Statutory Relational Stance Modeling
  
Pretrained multilingual models (PMMs) have advanced the natural language processing (NLP) field over recent years, but they often struggle when confronted with low-resource languages. This proposal will explore the challenges of adapting PMMs to such languages, with a current focus on Lakota and Dakota. Of the data available for endangered languages, much of it is in formats that are not machine readable. As a result, endangered languages are left out of NLP technologies. Using optical character recognition (OCR) to digitize these resources is beneficial for this dilemma, but also introduces noise.
+
'''Abstract''': Policy division deeply affects which bills get passed in legislature, and how. So far, statutory NLP has predicted voting breakdowns, interpreted stakeholder benefit, informed legal decision support systems, and much more. In practice, legislation demands compromise and concession to pass important policy, yet models often struggle to reason over the whole act. Leveraging neuro-symbolic models, we seek to intermediate this challenge with relational structures of statutes’ sectional stances – modeling stance agreement, exception, etc. Beyond supporting downstream statutory analysis tasks, these structures could help stakeholders understand how a bill impacts them, litmus the cooperation within a legislature, and reveal patterns of compromise that aid a bill through ratification.
  
The goal of this research is to determine how this noise affects model adaptation and performance for zero-shot and few-shot learning for low-resource languages. The project will involve data collection and scanning, annotation for a gold evaluation dataset, and evaluation of multiple language models across different adaptation methods and levels of noise. Additionally, we hope to expand this pipeline to more scripts and languages.
+
|- style="border-top: 2px solid DarkGray;"
 +
| 10/23/2024 || '''Ananya Ganesh''''s PhD Dissertation Proposal
  
The potential implications of this study are broad: generalizability to languages not included in the study as well as providing insight into how noise affects model adaptation and the types of noise that are most harmful. This project aims to address the unique challenges of Lakota and Dakota as well as develop the field’s understanding of how models may be adapted to include low-resource languages, working towards more inclusive NLP technologies.
+
'''Title''': Reliable Language Technology for Classroom Dialog Understanding
  
 +
'''Abstract''': In this proposal, I will lay out how NLP models can be developed to address realistic use cases in analyzing classroom dialogue. Towards this goal, I will first introduce a new task and corresponding dataset, focused on detecting off-task utterances in small-group discussions. I will
 +
then propose a method to solve this task that considers how the inherent structure in the dialog can be used to learn richer representations of the dialog context. Next, I will introduce preliminary work on applying LLMs in the in-context learning setting for a broad range of tasks pertaining to qualitative coding of classroom dialog, and discuss potential follow-up work. Finally, keeping in mind our goals of serving many independent stakeholders, I will propose a study to incorporate differing stakeholders’ subjective judgments while curating gold-standard data for classroom discourse analysis.
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 10/25/2023 || BLAST lab; and then Daniel Acuña (Daniel's talk will start at 11:20)
+
| 10/30/2024 || '''Marie McGregor''''s area exam
 +
 
 +
'''Title''': Adapting AMR Metrics to UMR Graphs
 +
 +
'''Abstract''': Uniform Meaning Representation (UMR) expands on the capabilities of Abstract Meaning Representation (AMR) by supporting document-level annotation, suitability for low-resource languages, and support for logical inference. As a framework for any sort of representation is developed, a way to measure the similarities or differences between two representations must be developed in tandem to support the creation of parsers and for computing inter-annotator agreement (IAA). Fortunately, there exists robust research into metrics to assess the similarity of AMR graphs. The usefulness of these metrics to UMRs depends on four key aspects: scalability, correctness, interpretability, and cross-lingual suitability. This paper investigates the applicability of AMR metrics to UMR graphs along these aspects in order to create useful and reliable UMR metrics.
  
'''[https://scienceofscience.org/ Daniel Acuña]'''
+
|- style="border-top: 2px solid DarkGray;"
 +
| 11/06/2024 || Short presentations / discussions: Curry Guinn, Yifu Wu, Kevin Stowe
  
'''The differential and irreplaceable contributions of academia and industry to AI research'''
+
|- style="border-top: 2px solid DarkGray;"
 +
| 11/13/2024 || Invited talk by '''Nick Dronen''' and '''Seminar Lunch'''  
  
Striking recent advances by industry’s artificial intelligence (AI) have stunned the academic world, making us rethink whether academia should just follow industry’s lead. Due to its open publication, citation, and code-sharing culture, AI offers a rare opportunity to investigate whether these recent advances are outliers or something more systematic. In the present study, we investigate the impact and novelty of academic and industry AI research across 58 conferences—the primary publication medium of AI—involving 292,185 articles and 524 state-of-the-art models from 1995 to 2020. Our findings reveal an overall seismic shift in impact and novelty metrics, which started around 2015, presumably motivated by deep learning. In the most recent measures, an article published by an exclusively industry team dominates impact, with a 73.78 percent higher chance of being highly cited, 12.80 percent higher chance of being citation-disruptive, and several times more likely to produce state-of-the-art models. In contrast, we find that academic teams dominate novelty, having a striking 2.8 times more likelihood of producing novel, atypical work. Controlling for potential confounding factors such as subfield, team size, seniority, and prestige, we find that academia–industry collaborations are unable to simultaneously replicate the impact and novelty of non-collaborative teams, suggesting each environment offers irreplaceable contributions to advance AI.
+
'''Title''': SETLEXSEM CHALLENGE: Using Set Operations to Evaluate the Lexical and Semantic Robustness of Language Models
 
   
 
   
 +
'''Abstract''': Set theory is foundational to mathematics and, when sets are finite, to reasoning about the world. An intelligent system should perform set operations consistently, regardless of superficial variations in the operands. Initially designed for semantically-oriented NLP tasks, large language models (LLMs) are now being evaluated on algorithmic tasks. Because sets are comprised of arbitrary symbols (e.g. numbers, words), they provide an opportunity to test, systematically, the invariance of LLMs’ algorithmic abilities under simple lexical or semantic variations. To this end, we present the SETLEXSEM CHALLENGE, a synthetic benchmark that evaluates the performance of LLMs on set operations. SETLEXSEM assesses the robustness of LLMs’ instruction-following abilities under various conditions, focusing on the set operations and the nature and construction of the set members. Evaluating seven LLMs with SETLEXSEM, we find that they exhibit poor robustness to variation in both operation and operands. We show – via the framework’s systematic sampling of set members along lexical and semantic dimensions – that LLMs are not only not robust to variation along these dimensions but demonstrate unique failure modes in particular, easy-to-create semantic groupings of "deceptive" sets. We find that rigorously measuring language model robustness to variation in frequency and length is challenging and present an analysis that measures them independently.
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 11/1/2023 || Guest speaker: [https://jiangtianyu.com/ Tianyu Jiang], University of Cincinnati
+
| 11/20/2024 || '''Abteen’s proposal'''
  
 +
'''When''': Wed. Nov 20, 11:30 am
  
'''Tianyu Jiang, Assistant Professor, Dept. of Computer Science, University of Cincinnati'''
+
'''Where''': MUEN D430 and zoom https://cuboulder.zoom.us/j/97014876908
  
'''Commonsense Knowledge of Prototypical Functions for Natural Language Processing'''
+
'''Title''': Extending Benchmarks and Multilingual Models to Truly Low-Resource Languages
 
   
 
   
Recent advances in natural language processing (NLP) have enabled computers to understand and generate natural language to a remarkable degree. However, it is still a big challenge for computers to "read between the lines" as we humans do. People often omit a lot of information in daily communication, but we have no difficulty understanding each other because our commonsense knowledge can help us make inferences. In this research, we focus on one specific type of commonsense knowledge that people use in everyday living: "functional knowledge". People go to different places for a common set of goals: people go to schools to study, go to stores to buy clothing, and go to restaurants to eat. Comparably, people create and use physical objects for different purposes: knives are for cutting, cars are for transportation, and phones are for communication. I will first introduce how we can automatically learn this type of knowledge, and then demonstrate how to utilize this prior knowledge of functions in two downstream applications including sentence-level understanding and visual activity recognition.
+
'''Abstract''': Driven by successes in large-scale data collection and training efforts, the field of natural language processing (NLP) has seen a dramatic surge in model performance. However, the vast majority of the roughly 7,000 languages spoken across the globe do not have the necessary amounts of easily available text resources and have not been able to share in these advancements. In this proposal, we focus on how best to improve pretrained model performance for these languages, which we refer to as truly low-resource. First, we discuss model adaptation techniques which leverage unlabeled data and discuss experiments which evaluate these approaches in a realistic setting. Next, we address a limitation of prior work, and describe two data collection efforts for low-resource languages. We further present a synthetic evaluation resource which tests a model's understanding of a specific linguistic phenomenon: lexical gaps. Finally, we propose additional analysis experiments that aim to address disagreements across prior work, and extend these experiments to include low-resource languages.
 
'''Bio:''' Tianyu Jiang is an Assistant Professor in the Computer Science department at the University of Cincinnati. He received his Ph.D. in Computer Science from the University of Utah, advised by Ellen Riloff. His main research interests are in the area of Natural Language Processing (NLP), specifically in semantics, commonsense knowledge, multimodality, and information extraction.
 
  
  
|- style="border-top: 2px solid DarkGray;"
 
| 11/8/2023 || Luke Gessler, CU Boulder Computer Science
 
  
'''Low-resource Monolingual Transformer Language Models'''
+
'''Alex’s area exam''':
Since the publication of BERT in 2018, pretrained Transformer language models (TLMs) have been a foundational requirement for almost all natural language processing systems. High-quality TLMs are easily attainable for languages with vast amounts of data, such as English, but for all but the top 100 most data-rich languages, it is very difficult to train TLMs with high quality. Most work aimed at addressing this issue has taken a multilingual approach, but in this talk, we take up the question of whether low-resource TLMs could be trained effectively using only data drawn from one language. First, we describe a novel training algorithm for monolingual low-resource TLMs which characteristically involves reducing model size and using multitask learning with syntactically labeled data. Second, we describe a complementary training algorithm which uses contrastive learning and a syntactically-guided self-attention mechanism to provide syntactic inductive bias to TLMs. Third, we present a new TLM evaluation dataset, extensible to any language with a New Testament translation, aimed at addressing the severe lack of model evaluation resources in low-resource settings. To our knowledge, this is the first major effort to develop low-resource monolingual TLMs, and our results show that our methods are often more effective than any other competing approach to provide TLMs for low-resource languages.
 
  
|- style="border-top: 2px solid DarkGray;"
+
'''When''': Wed. Nov 20, 1:30 pm
| 11/15/2023 || Jie Cao, iSAT
 
  
'''Inductive Biases for Deep Linguistic Structured Prediction with Independent Factorization'''
+
'''Where''': MUEN E214 and zoom https://cuboulder.zoom.us/j/97014876908
  
Discovering the underlying structure of text can enable rigorous analysis, easier knowledge organization, and programmable reasoning. The no-free-lunch theorem underscores that the search for appropriate inductive biases that influence hypothesis selection in machine learning is necessary to obtain generalization. This is also true for deep learning models to predict intricate combinatory structures. We ground our studies on deep structured prediction on both broad-coverage linguistic representations and application-specific representations.
+
'''Title''': Computational Media Framing Analysis through Rhetorical Devices and Linguistic Features
               
 
Due to the compositionality of natural language, many language representations are also defined to be compositional structures. However, we need to make the right design choices to factorize the input and output, and then model the correlations between their decomposed parts. We study structural inductive biases by designing factorization-oriented learning and reasoning mechanisms at the lexical, phrasal, and sentential levels.
 
  
Furthermore, human-encoded knowledge with language can also be used as valuable inductive biases. We study how to use natural language descriptions to represent the meaning of output symbols (intents and slots) in task-oriented dialogue state tracking, which helps to generalize to unseen domains and services. We offer detailed comparative studies on how to use natural language as inductive biases by investigating encoding strategies, supplementary pretraining, and homogeneous/heterogeneous evaluations.
+
'''Abstract''': Over the past decade, there has been an increased focus on media framing in the Natural Language Processing (NLP) community. Framing has been defined as “select[ing] some aspects of a perceived reality and mak[ing] them more salient in a communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation, and/or treatment recommendation for the item described” (Entman, 1993). This computational work generally seeks to quantify framing on a large scale to raise awareness about media bias. A prevalent paradigm for computational framing analysis focuses on studying high-level topical information. Though highly generalizable, this approach addresses only emphasis framing: when a writer or speaker highlights a particular aspect of a topic more frequently than others. However, prior framing work is broad, encompassing many other facets and types of framing present in the media. In recognition of this, there has been a recent line of work seeking to subvert the earlier focus on topical information. In this survey, we present an analysis of work which is both in line with goals of expanding the breadth of computational framing analysis and is generalizable. We focus on work which analyzes the role of rhetorical devices and linguistic features to reveal insights about media framing.
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 11/22/2023 || *** fall break ***
+
| 11/27/2024 || '''No meeting:''' Fall break
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 11/29/2023 || EMNLP practice talks
+
| 12/04/2024 || Enora's prelim
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 12/06/2023 || Adam's Proposal
+
| 12/11/2024 ||  
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 12/13/2023 || Elizabeth's Proposal
+
| 1/23/25|| Chenhao Tan CS Colloquium, 3:30pm
 +
 
  
|- style="border-top: 2px solid DarkGray;"
 
 
|}
 
|}
 
  
 
=Past Schedules=
 
=Past Schedules=
 +
* [[Spring 2024 Schedule]]
 +
* [[Fall 2023 Schedule]]
 
* [[Spring 2023 Schedule]]
 
* [[Spring 2023 Schedule]]
 
* [[Fall 2022 Schedule]]
 
* [[Fall 2022 Schedule]]

Latest revision as of 15:45, 18 November 2024

Location: Hybrid - Muenzinger D430, and the zoom link below

Time: Wednesdays at 11:30am, Mountain Time

Zoom link: https://cuboulder.zoom.us/j/97014876908

Date Title
08/28/2024 Planning, introductions, welcome!
09/04/2024 Brunch Social
09/11/2024 Watch and discuss NLP keynote

Winner: Barbara Plank’s “Are LLMs Narrowing our Horizon? Let’s Embrace Variation in NLP!”

09/18/2024 CLASIC presentations
09/25/2024 Invited talks/discussions from Leeds and Anschutz folks: Liu Liu, Abe Handler, Yanjun Gao


10/02/2024 Martha Palmer, Annie Zaenen, Susan Brown, Alexis Cooper.

Title: Testing GPT4's interpretation of the Caused-Motion Construction

Abstract: The fields of Artificial Intelligence and Natural Language Processing have been revolutionized by the advent of Large Language Models such as GPT4. They are perceived as being language experts and there is a lot of speculation about how intelligent they are, with claims being made about “Sparks of General Artificial Intelligence.” This talk will describe in detail an English linguistic construction, the Caused Motion Construction, and compare prior interpretation approaches with current LLM interpretations. The prior approaches are based on VerbNet. Its unique contributions to prior approaches will be outlined. Then the results of a recent preliminary study probing GPT4’s analysis of the same constructions will be presented. Not surprisingly, this analysis illustrates both strengths and weaknesses of GPT4’s ability to interpret Caused Motion Constructions and to generalize this interpretation.

Recording: https://o365coloradoedu-my.sharepoint.com/:v:/r/personal/mpalmer_colorado_edu/Documents/BoulderNLP-Palmer-Oct2-2024.mp4?csf=1&web=1&nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJPbmVEcml2ZUZvckJ1c2luZXNzIiwicmVmZXJyYWxBcHBQbGF0Zm9ybSI6IldlYiIsInJlZmVycmFsTW9kZSI6InZpZXciLCJyZWZlcnJhbFZpZXciOiJNeUZpbGVzTGlua0NvcHkifX0&e=aCHeN8


10/09/2024 NAACL Paper Clinic: Come get feedback on your submission drafts!
10/16/2024 Senior Thesis Proposals:


Alexandra Barry

Title: Benchmarking LLM Handling of Cross-Dialectal Spanish

Abstract: This proposal introduces current issues and gaps in cross-dialectal NLP in Spanish as well as the lack of resources available for Latin American dialects. The presentation will cover past work in dialect detection, translation, and benchmarking in order to build a foundation for a proposal that aims to create a benchmark that analyses LLM robustness across a series of tasks in different Spanish dialects.


Tavin Turner

Title: Agreeing to Disagree: Statutory Relational Stance Modeling

Abstract: Policy division deeply affects which bills get passed in legislature, and how. So far, statutory NLP has predicted voting breakdowns, interpreted stakeholder benefit, informed legal decision support systems, and much more. In practice, legislation demands compromise and concession to pass important policy, yet models often struggle to reason over the whole act. Leveraging neuro-symbolic models, we seek to intermediate this challenge with relational structures of statutes’ sectional stances – modeling stance agreement, exception, etc. Beyond supporting downstream statutory analysis tasks, these structures could help stakeholders understand how a bill impacts them, litmus the cooperation within a legislature, and reveal patterns of compromise that aid a bill through ratification.

10/23/2024 Ananya Ganesh's PhD Dissertation Proposal

Title: Reliable Language Technology for Classroom Dialog Understanding

Abstract: In this proposal, I will lay out how NLP models can be developed to address realistic use cases in analyzing classroom dialogue. Towards this goal, I will first introduce a new task and corresponding dataset, focused on detecting off-task utterances in small-group discussions. I will then propose a method to solve this task that considers how the inherent structure in the dialog can be used to learn richer representations of the dialog context. Next, I will introduce preliminary work on applying LLMs in the in-context learning setting for a broad range of tasks pertaining to qualitative coding of classroom dialog, and discuss potential follow-up work. Finally, keeping in mind our goals of serving many independent stakeholders, I will propose a study to incorporate differing stakeholders’ subjective judgments while curating gold-standard data for classroom discourse analysis.

10/30/2024 Marie McGregor's area exam

Title: Adapting AMR Metrics to UMR Graphs

Abstract: Uniform Meaning Representation (UMR) expands on the capabilities of Abstract Meaning Representation (AMR) by supporting document-level annotation, suitability for low-resource languages, and support for logical inference. As a framework for any sort of representation is developed, a way to measure the similarities or differences between two representations must be developed in tandem to support the creation of parsers and for computing inter-annotator agreement (IAA). Fortunately, there exists robust research into metrics to assess the similarity of AMR graphs. The usefulness of these metrics to UMRs depends on four key aspects: scalability, correctness, interpretability, and cross-lingual suitability. This paper investigates the applicability of AMR metrics to UMR graphs along these aspects in order to create useful and reliable UMR metrics.

11/06/2024 Short presentations / discussions: Curry Guinn, Yifu Wu, Kevin Stowe
11/13/2024 Invited talk by Nick Dronen and Seminar Lunch

Title: SETLEXSEM CHALLENGE: Using Set Operations to Evaluate the Lexical and Semantic Robustness of Language Models

Abstract: Set theory is foundational to mathematics and, when sets are finite, to reasoning about the world. An intelligent system should perform set operations consistently, regardless of superficial variations in the operands. Initially designed for semantically-oriented NLP tasks, large language models (LLMs) are now being evaluated on algorithmic tasks. Because sets are comprised of arbitrary symbols (e.g. numbers, words), they provide an opportunity to test, systematically, the invariance of LLMs’ algorithmic abilities under simple lexical or semantic variations. To this end, we present the SETLEXSEM CHALLENGE, a synthetic benchmark that evaluates the performance of LLMs on set operations. SETLEXSEM assesses the robustness of LLMs’ instruction-following abilities under various conditions, focusing on the set operations and the nature and construction of the set members. Evaluating seven LLMs with SETLEXSEM, we find that they exhibit poor robustness to variation in both operation and operands. We show – via the framework’s systematic sampling of set members along lexical and semantic dimensions – that LLMs are not only not robust to variation along these dimensions but demonstrate unique failure modes in particular, easy-to-create semantic groupings of "deceptive" sets. We find that rigorously measuring language model robustness to variation in frequency and length is challenging and present an analysis that measures them independently.

11/20/2024 Abteen’s proposal

When: Wed. Nov 20, 11:30 am

Where: MUEN D430 and zoom https://cuboulder.zoom.us/j/97014876908

Title: Extending Benchmarks and Multilingual Models to Truly Low-Resource Languages

Abstract: Driven by successes in large-scale data collection and training efforts, the field of natural language processing (NLP) has seen a dramatic surge in model performance. However, the vast majority of the roughly 7,000 languages spoken across the globe do not have the necessary amounts of easily available text resources and have not been able to share in these advancements. In this proposal, we focus on how best to improve pretrained model performance for these languages, which we refer to as truly low-resource. First, we discuss model adaptation techniques which leverage unlabeled data and discuss experiments which evaluate these approaches in a realistic setting. Next, we address a limitation of prior work, and describe two data collection efforts for low-resource languages. We further present a synthetic evaluation resource which tests a model's understanding of a specific linguistic phenomenon: lexical gaps. Finally, we propose additional analysis experiments that aim to address disagreements across prior work, and extend these experiments to include low-resource languages.


Alex’s area exam:

When: Wed. Nov 20, 1:30 pm

Where: MUEN E214 and zoom https://cuboulder.zoom.us/j/97014876908

Title: Computational Media Framing Analysis through Rhetorical Devices and Linguistic Features

Abstract: Over the past decade, there has been an increased focus on media framing in the Natural Language Processing (NLP) community. Framing has been defined as “select[ing] some aspects of a perceived reality and mak[ing] them more salient in a communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation, and/or treatment recommendation for the item described” (Entman, 1993). This computational work generally seeks to quantify framing on a large scale to raise awareness about media bias. A prevalent paradigm for computational framing analysis focuses on studying high-level topical information. Though highly generalizable, this approach addresses only emphasis framing: when a writer or speaker highlights a particular aspect of a topic more frequently than others. However, prior framing work is broad, encompassing many other facets and types of framing present in the media. In recognition of this, there has been a recent line of work seeking to subvert the earlier focus on topical information. In this survey, we present an analysis of work which is both in line with goals of expanding the breadth of computational framing analysis and is generalizable. We focus on work which analyzes the role of rhetorical devices and linguistic features to reveal insights about media framing.

11/27/2024 No meeting: Fall break
12/04/2024 Enora's prelim
12/11/2024
1/23/25 Chenhao Tan CS Colloquium, 3:30pm


Past Schedules