Fall 2024 Schedule
Location: Hybrid - Muenzinger D430, and the zoom link below
Time: Wednesdays at 11:30am, Mountain Time
Zoom link: https://cuboulder.zoom.us/j/97014876908
Date | Title |
---|---|
08/28/2024 | Planning, introductions, welcome! |
09/04/2024 | Brunch Social |
09/11/2024 | Watch and discuss NLP keynote
Winner: Barbara Plank’s “Are LLMs Narrowing our Horizon? Let’s Embrace Variation in NLP!” |
09/18/2024 | CLASIC presentations |
09/25/2024 | Invited talks/discussions from Leeds and Anschutz folks: Liu Liu, Abe Handler, Yanjun Gao |
10/02/2024 | Martha Palmer, Annie Zaenen, Susan Brown, Alexis Cooper.
Title: Testing GPT4's interpretation of the Caused-Motion Construction Abstract: The fields of Artificial Intelligence and Natural Language Processing have been revolutionized by the advent of Large Language Models such as GPT4. They are perceived as being language experts and there is a lot of speculation about how intelligent they are, with claims being made about “Sparks of General Artificial Intelligence.” This talk will describe in detail an English linguistic construction, the Caused Motion Construction, and compare prior interpretation approaches with current LLM interpretations. The prior approaches are based on VerbNet. Its unique contributions to prior approaches will be outlined. Then the results of a recent preliminary study probing GPT4’s analysis of the same constructions will be presented. Not surprisingly, this analysis illustrates both strengths and weaknesses of GPT4’s ability to interpret Caused Motion Constructions and to generalize this interpretation. |
10/09/2024 | NAACL Paper Clinic: Come get feedback on your submission drafts! |
10/16/2024 | Senior Thesis Proposals:
Title: Benchmarking LLM Handling of Cross-Dialectal Spanish Abstract: This proposal introduces current issues and gaps in cross-dialectal NLP in Spanish as well as the lack of resources available for Latin American dialects. The presentation will cover past work in dialect detection, translation, and benchmarking in order to build a foundation for a proposal that aims to create a benchmark that analyses LLM robustness across a series of tasks in different Spanish dialects.
Tavin Turner Title: Agreeing to Disagree: Statutory Relational Stance Modeling Abstract: Policy division deeply affects which bills get passed in legislature, and how. So far, statutory NLP has predicted voting breakdowns, interpreted stakeholder benefit, informed legal decision support systems, and much more. In practice, legislation demands compromise and concession to pass important policy, yet models often struggle to reason over the whole act. Leveraging neuro-symbolic models, we seek to intermediate this challenge with relational structures of statutes’ sectional stances – modeling stance agreement, exception, etc. Beyond supporting downstream statutory analysis tasks, these structures could help stakeholders understand how a bill impacts them, litmus the cooperation within a legislature, and reveal patterns of compromise that aid a bill through ratification. |
10/23/2024 | Ananya Ganesh's PhD Dissertation Proposal
Title: Reliable Language Technology for Classroom Dialog Understanding Abstract: In this proposal, I will lay out how NLP models can be developed to address realistic use cases in analyzing classroom dialogue. Towards this goal, I will first introduce a new task and corresponding dataset, focused on detecting off-task utterances in small-group discussions. I will then propose a method to solve this task that considers how the inherent structure in the dialog can be used to learn richer representations of the dialog context. Next, I will introduce preliminary work on applying LLMs in the in-context learning setting for a broad range of tasks pertaining to qualitative coding of classroom dialog, and discuss potential follow-up work. Finally, keeping in mind our goals of serving many independent stakeholders, I will propose a study to incorporate differing stakeholders’ subjective judgments while curating gold-standard data for classroom discourse analysis. |
10/30/2024 | Marie McGregor's area exam
Title: Adapting AMR Metrics to UMR Graphs Abstract: Uniform Meaning Representation (UMR) expands on the capabilities of Abstract Meaning Representation (AMR) by supporting document-level annotation, suitability for low-resource languages, and support for logical inference. As a framework for any sort of representation is developed, a way to measure the similarities or differences between two representations must be developed in tandem to support the creation of parsers and for computing inter-annotator agreement (IAA). Fortunately, there exists robust research into metrics to assess the similarity of AMR graphs. The usefulness of these metrics to UMRs depends on four key aspects: scalability, correctness, interpretability, and cross-lingual suitability. This paper investigates the applicability of AMR metrics to UMR graphs along these aspects in order to create useful and reliable UMR metrics. |
11/06/2024 | Short presentations / discussions: Curry Guinn, Yifu Wu, Kevin Stowe |
11/13/2024 | Invited talk by Nick Dronen and Seminar Lunch
Title: SETLEXSEM CHALLENGE: Using Set Operations to Evaluate the Lexical and Semantic Robustness of Language Models Abstract: Set theory is foundational to mathematics and, when sets are finite, to reasoning about the world. An intelligent system should perform set operations consistently, regardless of superficial variations in the operands. Initially designed for semantically-oriented NLP tasks, large language models (LLMs) are now being evaluated on algorithmic tasks. Because sets are comprised of arbitrary symbols (e.g. numbers, words), they provide an opportunity to test, systematically, the invariance of LLMs’ algorithmic abilities under simple lexical or semantic variations. To this end, we present the SETLEXSEM CHALLENGE, a synthetic benchmark that evaluates the performance of LLMs on set operations. SETLEXSEM assesses the robustness of LLMs’ instruction-following abilities under various conditions, focusing on the set operations and the nature and construction of the set members. Evaluating seven LLMs with SETLEXSEM, we find that they exhibit poor robustness to variation in both operation and operands. We show – via the framework’s systematic sampling of set members along lexical and semantic dimensions – that LLMs are not only not robust to variation along these dimensions but demonstrate unique failure modes in particular, easy-to-create semantic groupings of "deceptive" sets. We find that rigorously measuring language model robustness to variation in frequency and length is challenging and present an analysis that measures them independently. |
11/20/2024 | Abteen’s proposal
When: Wed. Nov 20, 11:30 am Where: MUEN D430 and zoom https://cuboulder.zoom.us/j/97014876908 Title: Extending Benchmarks and Multilingual Models to Truly Low-Resource Languages Abstract: Driven by successes in large-scale data collection and training efforts, the field of natural language processing (NLP) has seen a dramatic surge in model performance. However, the vast majority of the roughly 7,000 languages spoken across the globe do not have the necessary amounts of easily available text resources and have not been able to share in these advancements. In this proposal, we focus on how best to improve pretrained model performance for these languages, which we refer to as truly low-resource. First, we discuss model adaptation techniques which leverage unlabeled data and discuss experiments which evaluate these approaches in a realistic setting. Next, we address a limitation of prior work, and describe two data collection efforts for low-resource languages. We further present a synthetic evaluation resource which tests a model's understanding of a specific linguistic phenomenon: lexical gaps. Finally, we propose additional analysis experiments which aim to address disagreements across prior work, and extend these experiments to include low-resource languages.
Alex’s area exam: When: Wed. Nov 20, 1:30 pm Where: MUEN E214 and zoom https://cuboulder.zoom.us/j/97014876908 Title: Computational Media Framing Analysis through Rhetorical Devices and Linguistic Features Abstract: Over the past decade, there has been an increased focus on media framing in the Natural Language Processing (NLP) community. Framing has been defined as “select[ing] some aspects of a perceived reality and mak[ing] them more salient in a communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation, and/or treatment recommendation for the item described” (Entman, 1993). This computational work generally seeks to quantify framing on a large scale to raise awareness about media bias. A prevalent paradigm for computational framing analysis focuses on studying high-level topical information. Though highly generalizable, this approach addresses only emphasis framing: when a writer or speaker highlights a particular aspect of a topic more frequently than others. However, prior framing work is broad, encompassing many other facets and types of framing present in the media. In recognition of this, there has been a recent line of work seeking to subvert the earlier focus on topical information. In this survey, we present an analysis of work which is both in line with goals of expanding the breadth of computational framing analysis and is generalizable. We focus on work which analyzes the role of rhetorical devices and linguistic features to reveal insights about media framing. |
11/27/2024 | No meeting: Fall break |
12/04/2024 | Enora's prelim |
12/11/2024 |