Difference between revisions of "Meeting Schedule"

'''Previous revision (Fall 2024):'''

'''Location:''' Hybrid - Muenzinger D430, and the zoom link below

'''Time:''' Wednesdays at 11:30am, Mountain Time

'''Zoom link:''' https://cuboulder.zoom.us/j/97014876908

|- style="border-top: 2px solid DarkGray;"
| 08/28/2024 || '''Planning, introductions, welcome!'''

|- style="border-top: 2px solid DarkGray;"
| 09/04/2024 || Brunch Social

|- style="border-top: 2px solid DarkGray;"
| 09/11/2024 || Watch and discuss NLP keynote

'''Winner:''' Barbara Plank’s “Are LLMs Narrowing our Horizon? Let’s Embrace Variation in NLP!”

|- style="border-top: 2px solid DarkGray;"
| 09/18/2024 || CLASIC presentations

|- style="border-top: 2px solid DarkGray;"
| 09/25/2024 || Invited talks/discussions from Leeds and Anschutz folks: Liu Liu, Abe Handler, Yanjun Gao

|- style="border-top: 2px solid DarkGray;"
| 10/02/2024 || Martha Palmer, Annie Zaenen, Susan Brown, Alexis Cooper.
 
  
'''Title:''' Testing GPT4's interpretation of the Caused-Motion Construction

'''Abstract:''' The fields of Artificial Intelligence and Natural Language Processing have been revolutionized by the advent of Large Language Models such as GPT4. They are perceived as being language experts, and there is a lot of speculation about how intelligent they are, with claims being made about “Sparks of General Artificial Intelligence.” This talk will describe in detail an English linguistic construction, the Caused Motion Construction, and compare prior interpretation approaches with current LLM interpretations. The prior approaches are based on VerbNet. Its unique contributions to prior approaches will be outlined. Then the results of a recent preliminary study probing GPT4's analysis of the same constructions will be presented. Not surprisingly, this analysis illustrates both strengths and weaknesses of GPT4's ability to interpret Caused Motion Constructions and to generalize this interpretation.

Recording: https://o365coloradoedu-my.sharepoint.com/:v:/r/personal/mpalmer_colorado_edu/Documents/BoulderNLP-Palmer-Oct2-2024.mp4?csf=1&web=1&nav=eyJyZWZlcnJhbEluZm8iOnsicmVmZXJyYWxBcHAiOiJPbmVEcml2ZUZvckJ1c2luZXNzIiwicmVmZXJyYWxBcHBQbGF0Zm9ybSI6IldlYiIsInJlZmVycmFsTW9kZSI6InZpZXciLCJyZWZlcnJhbFZpZXciOiJNeUZpbGVzTGlua0NvcHkifX0&e=aCHeN8

|- style="border-top: 2px solid DarkGray;"
| 10/09/2024 || NAACL Paper Clinic: Come get feedback on your submission drafts!
 
  
 
|- style="border-top: 2px solid DarkGray;"
| 10/16/2024 || Senior Thesis Proposals:

'''Alexandra Barry'''

'''Title''': Benchmarking LLM Handling of Cross-Dialectal Spanish

'''Abstract''': This proposal introduces current issues and gaps in cross-dialectal NLP in Spanish, as well as the lack of resources available for Latin American dialects. The presentation will cover past work in dialect detection, translation, and benchmarking in order to build a foundation for a proposal that aims to create a benchmark that analyzes LLM robustness across a series of tasks in different Spanish dialects.
  
'''Tavin Turner'''

'''Title''': Agreeing to Disagree: Statutory Relational Stance Modeling

'''Abstract''': Policy division deeply affects which bills get passed in the legislature, and how. So far, statutory NLP has predicted voting breakdowns, interpreted stakeholder benefit, informed legal decision support systems, and much more. In practice, legislation demands compromise and concession to pass important policy, yet models often struggle to reason over the whole act. Leveraging neuro-symbolic models, we seek to address this challenge with relational structures of statutes’ sectional stances – modeling stance agreement, exception, etc. Beyond supporting downstream statutory analysis tasks, these structures could help stakeholders understand how a bill impacts them, gauge the cooperation within a legislature, and reveal patterns of compromise that aid a bill through ratification.

|- style="border-top: 2px solid DarkGray;"
| 10/23/2024 || '''Ananya Ganesh''''s PhD Dissertation Proposal
 
  
'''Title''': Reliable Language Technology for Classroom Dialog Understanding

'''Abstract''': In this proposal, I will lay out how NLP models can be developed to address realistic use cases in analyzing classroom dialogue. Towards this goal, I will first introduce a new task and corresponding dataset, focused on detecting off-task utterances in small-group discussions. I will then propose a method to solve this task that considers how the inherent structure in the dialog can be used to learn richer representations of the dialog context. Next, I will introduce preliminary work on applying LLMs in the in-context learning setting for a broad range of tasks pertaining to qualitative coding of classroom dialog, and discuss potential follow-up work. Finally, keeping in mind our goals of serving many independent stakeholders, I will propose a study to incorporate differing stakeholders’ subjective judgments while curating gold-standard data for classroom discourse analysis.
 
  
|- style="border-top: 2px solid DarkGray;"
| 10/30/2024 || '''Marie McGregor''''s area exam

'''Title''': Adapting AMR Metrics to UMR Graphs
 
   
 
   
'''Abstract''': Uniform Meaning Representation (UMR) expands on the capabilities of Abstract Meaning Representation (AMR) by supporting document-level annotation, suitability for low-resource languages, and support for logical inference. As a framework for any sort of representation is developed, a way to measure the similarities or differences between two representations must be developed in tandem to support the creation of parsers and for computing inter-annotator agreement (IAA). Fortunately, there exists robust research into metrics to assess the similarity of AMR graphs. The usefulness of these metrics for UMRs depends on four key aspects: scalability, correctness, interpretability, and cross-lingual suitability. This paper investigates the applicability of AMR metrics to UMR graphs along these aspects in order to create useful and reliable UMR metrics.
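
For context on what such metrics compute: the standard AMR metric, Smatch, scores the overlap of relation triples under the best variable alignment between two graphs. The sketch below is illustrative only (not from the talk); real Smatch also searches for the alignment, e.g., by hill climbing, whereas here an alignment is assumed as given so the core F1 computation is visible.

<syntaxhighlight lang="python">
# Illustrative only: the core of a Smatch-style score over relation triples.
# Real Smatch also searches for the best variable alignment (hill climbing);
# here a pred->gold alignment is assumed so the F1 computation is visible.

def smatch_like_f1(gold, pred, alignment):
    """gold, pred: sets of (source, relation, target) triples."""
    mapped = {(alignment.get(s, s), r, alignment.get(t, t)) for s, r, t in pred}
    matched = len(mapped & gold)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if matched else 0.0

gold = {("b", "instance", "boy"), ("g", "instance", "go-02"), ("g", "ARG0", "b")}
pred = {("x", "instance", "boy"), ("y", "instance", "go-02"), ("y", "ARG1", "x")}
print(smatch_like_f1(gold, pred, {"x": "b", "y": "g"}))  # 2/3 triples match: ~0.67
</syntaxhighlight>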
  
 
|- style="border-top: 2px solid DarkGray;"
| 11/06/2024 || Short presentations / discussions: Curry Guinn, Yifu Wu, Kevin Stowe

|- style="border-top: 2px solid DarkGray;"
| 11/13/2024 || Invited talk by '''Nick Dronen''' and '''Seminar Lunch'''  

'''Title''': SETLEXSEM CHALLENGE: Using Set Operations to Evaluate the Lexical and Semantic Robustness of Language Models

'''Abstract''': Set theory is foundational to mathematics and, when sets are finite, to reasoning about the world. An intelligent system should perform set operations consistently, regardless of superficial variations in the operands. Initially designed for semantically oriented NLP tasks, large language models (LLMs) are now being evaluated on algorithmic tasks. Because sets are composed of arbitrary symbols (e.g., numbers, words), they provide an opportunity to test, systematically, the invariance of LLMs’ algorithmic abilities under simple lexical or semantic variations. To this end, we present the SETLEXSEM CHALLENGE, a synthetic benchmark that evaluates the performance of LLMs on set operations. SETLEXSEM assesses the robustness of LLMs’ instruction-following abilities under various conditions, focusing on the set operations and the nature and construction of the set members. Evaluating seven LLMs with SETLEXSEM, we find that they exhibit poor robustness to variation in both operation and operands. We show – via the framework’s systematic sampling of set members along lexical and semantic dimensions – that LLMs are not only not robust to variation along these dimensions but demonstrate unique failure modes on particular, easy-to-create semantic groupings of "deceptive" sets. We find that rigorously measuring language model robustness to variation in frequency and length is challenging and present an analysis that measures them independently.
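
To make the setup concrete, here is a hypothetical sketch of the kind of probe such a benchmark runs; it is not the paper's actual harness, and <code>ask_llm</code>, the vocabularies, and the answer parsing are stand-ins.

<syntaxhighlight lang="python">
# Hypothetical probe in the spirit of SETLEXSEM (not the paper's harness):
# the same set operation is posed over different kinds of members, and
# accuracy is compared across those lexical/semantic conditions.
import random

def make_prompt(a, b, op):
    return (f"A = {sorted(a)}\nB = {sorted(b)}\n"
            f"Answer with a Python list: the {op} of A and B.")

def ground_truth(a, b, op):
    return {"union": a | b, "intersection": a & b, "difference": a - b}[op]

def accuracy(vocab, ask_llm, n_trials=100):
    """ask_llm is any callable str -> str (a stand-in for a model API)."""
    correct = 0
    for _ in range(n_trials):
        a, b = set(random.sample(vocab, 4)), set(random.sample(vocab, 4))
        op = random.choice(["union", "intersection", "difference"])
        try:
            answer = set(eval(ask_llm(make_prompt(a, b, op))))  # toy parsing
        except Exception:
            answer = set()
        correct += answer == ground_truth(a, b, op)
    return correct / n_trials

# e.g., compare accuracy(list(range(100)), llm) with accuracy(rare_words, llm)
# to see whether the "same" algorithmic skill survives a change of operands.
</syntaxhighlight>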

|- style="border-top: 2px solid DarkGray;"
| 11/20/2024 || '''Abteen’s proposal'''

'''When''': Wed. Nov 20, 11:30 am

'''Where''': MUEN D430 and zoom https://cuboulder.zoom.us/j/97014876908

'''Title''': Extending Benchmarks and Multilingual Models to Truly Low-Resource Languages

'''Abstract''': Driven by successes in large-scale data collection and training efforts, the field of natural language processing (NLP) has seen a dramatic surge in model performance. However, the vast majority of the roughly 7,000 languages spoken across the globe do not have the necessary amounts of easily available text resources and have not been able to share in these advancements. In this proposal, we focus on how best to improve pretrained model performance for these languages, which we refer to as truly low-resource. First, we discuss model adaptation techniques which leverage unlabeled data and discuss experiments which evaluate these approaches in a realistic setting. Next, we address a limitation of prior work, and describe two data collection efforts for low-resource languages. We further present a synthetic evaluation resource which tests a model's understanding of a specific linguistic phenomenon: lexical gaps. Finally, we propose additional analysis experiments that aim to address disagreements across prior work, and extend these experiments to include low-resource languages.
 
  
'''Alex’s area exam''':

'''When''': Wed. Nov 20, 1:30 pm

'''Where''': MUEN E214 and zoom https://cuboulder.zoom.us/j/97014876908

'''Title''': Computational Media Framing Analysis through Rhetorical Devices and Linguistic Features

'''Abstract''': Over the past decade, there has been an increased focus on media framing in the Natural Language Processing (NLP) community. Framing has been defined as “select[ing] some aspects of a perceived reality and mak[ing] them more salient in a communicating text, in such a way as to promote a particular problem definition, causal interpretation, moral evaluation, and/or treatment recommendation for the item described” (Entman, 1993). This computational work generally seeks to quantify framing on a large scale to raise awareness about media bias. A prevalent paradigm for computational framing analysis focuses on studying high-level topical information. Though highly generalizable, this approach addresses only emphasis framing: when a writer or speaker highlights particular aspects of a topic more frequently than others. However, prior framing work is broad, encompassing many other facets and types of framing present in the media. In recognition of this, there has been a recent line of work seeking to move beyond the earlier focus on topical information. In this survey, we present an analysis of work which is both in line with the goals of expanding the breadth of computational framing analysis and is generalizable. We focus on work which analyzes the role of rhetorical devices and linguistic features to reveal insights about media framing.
+
|- style="border-top: 2px solid DarkGray;"
| 11/27/2024 || '''No meeting:''' Fall break

|- style="border-top: 2px solid DarkGray;"
| 12/04/2024 || Enora's prelim

|- style="border-top: 2px solid DarkGray;"
| 12/11/2024 ||  

|- style="border-top: 2px solid DarkGray;"
| 1/23/25 || Chenhao Tan CS Colloquium, 3:30pm
|}

Latest revision as of 08:32, 25 April 2025

Location:

  • Jan 8 - Feb 5: Lucile Berkeley Buchanan Building (LBB) 430
  • Feb 12 onwards: Muenzinger D430, except:
      • CLASIC Open House (3/12) will be in LBB 124
      • Adam's Defense (4/2) will be in LBB 430
      • Ali Marashian's Area Exam (4/9) will be in LBB 430
      • Elizabeth Spaulding's Defense (4/16) will be Zoom-only


Time: Wednesdays at 11:30am, Mountain Time

Zoom link: https://cuboulder.zoom.us/j/97014876908

Date Title
01/08/2025 Invited talk: Denis Peskoff https://denis.ai/

Title: Perspectives on Prompting

Abstract: Natural language processing is in a state of flux. I will talk about three recent papers appearing in ACL and EMNLP conferences that are a zeitgeist of the current uncertainty of direction. First, I will talk about a paper that evaluated the responses of large language models to domain questions. Then, I will talk about a paper that used prompting to study the language of the Federal Reserve Board. Last, I will discuss a new paper on identifying generated content in Wikipedia. In addition, I will highlight a mega-paper I was involved in about prompting.

Bio: Denis Peskoff just finished a postdoc at Princeton University working with Professor Brandon Stewart. He completed his PhD in computer science at the University of Maryland with Professor Jordan Boyd-Graber and a bachelor’s degree at the Georgetown School of Foreign Service. His research has incorporated domain experts—leading board game players, Federal Reserve Board members, doctors, scientists—to solve natural language processing challenges.

01/15/2025 Planning, introductions, welcome!
01/22/2025 LSA Keynote -- Chris Potts
01/23/25 (Thu CS seminar) Chenhao Tan, CS Colloquium, 3:30pm, ECCR 265

Title: Alignment Beyond Human Preferences: Use Human Goals to Guide AI towards Complementary AI

Abstract: A lot of recent work has been dedicated to guiding pretrained AI with human preferences. In this talk, I argue that human preferences are often insufficient for complementing human intelligence and demonstrate the key role of human goals with two examples. First, hypothesis generation is critical for scientific discoveries. Instead of removing hallucinations, I will leverage data and labels as a guide to lead hallucination towards effective hypotheses. Second, I will use human perception as a guide for developing case-based explanations to support AI-assisted decision making. In both cases, faithfulness is "compromised" for achieving human goals. I will conclude with future directions towards complementary AI.


Bio: Chenhao Tan is an associate professor of computer science and data science at the University of Chicago, and is also a visiting scientist at Abridge. He obtained his PhD degree in the Department of Computer Science at Cornell University and bachelor's degrees in computer science and in economics from Tsinghua University. Prior to joining the University of Chicago, he was an assistant professor at the University of Colorado Boulder and a postdoc at the University of Washington. His research interests include human-centered AI, natural language processing, and computational social science. His work has been covered by many news media outlets, such as the New York Times and the Washington Post. He also won a Sloan research fellowship, an NSF CAREER award, an NSF CRII award, a Google research scholar award, research awards from Amazon, IBM, JP Morgan, and Salesforce, a Facebook fellowship, and a Yahoo! Key Scientific Challenges award.

01/29/25 Laurie Jones from Information Science

Abstract: Laurie is coming to seek feedback from the Boulder NLP community about two projects she's been working on.

Similarity through creation and consumption: Initial work looking at similarity between Wikipedia articles surrounding the Arab Spring presents diverging perspectives in English and Arabic. However, this was not identified through content analysis but rather through leveraging other digital trace data sources, such as the blue links (outlinks) and inter-language links (ILLs). I am hoping to identify the Arab Spring article’s ecosystem to inform relationships between articles through the lens of creation and consumption. I am planning to leverage network analysis and graph theory to identify articles that are related along shared editors, outlinks, and clickstreams. Then, with the Pareto principle, I will identify densely correlated articles and present an ecosystem that isn't exclusively correlated through content. This, I hope, can then inform language models, providing additional language-agnostic contextualization. I would love feedback on the application and theoretical contextualization of this method.
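
As a concrete starting point for that feedback, here is one hypothetical way the proposed graph construction could look; the data, weights, and the top-20% cut are stand-ins (toy trace data, networkx for the graph), not Laurie's actual pipeline.

<syntaxhighlight lang="python">
# Hypothetical sketch of the proposed ecosystem analysis (not Laurie's code):
# connect articles that share editors or outlinks, then keep the densest core.
import networkx as nx

# toy digital-trace data: article -> set of editors, article -> set of outlinks
editors = {"Arab Spring": {"u1", "u2"}, "Tahrir Square": {"u2", "u3"},
           "Mohamed Bouazizi": {"u1", "u3"}, "Cairo": {"u4"}}
outlinks = {"Arab Spring": {"Tahrir Square", "Mohamed Bouazizi"},
            "Tahrir Square": {"Cairo"}, "Mohamed Bouazizi": set(), "Cairo": set()}

G = nx.Graph()
arts = list(editors)
for i, a in enumerate(arts):
    for b in arts[i + 1:]:
        # weight = shared editors plus links between the two articles
        w = len(editors[a] & editors[b]) + (b in outlinks[a]) + (a in outlinks[b])
        if w:
            G.add_edge(a, b, weight=w)

# Pareto-style cut: keep the top 20% of nodes by weighted degree as the "core"
deg = dict(G.degree(weight="weight"))
k = max(1, round(0.2 * len(deg)))
core = sorted(deg, key=deg.get, reverse=True)[:k]
print("ecosystem core:", core)
</syntaxhighlight>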

Collective memory expression in LLMs: As LLMs get integrated into search engines and other accessible methods of querying, they will increasingly be used as historical documentation and referenced as fact. Because they are built upon sources that carry not only political bias but also linguistic and geographical bias, the narratives these LLMs present about the past are collectively informed: a collective memory of their own. However, what does that mean when you transcend some of these perspectives? Utilizing prompt engineering, I am investigating two widely used large language models, ChatGPT and Gemini. I hope to cross-reference prompts, feigning user identification and cross-utilizing perspectives based on country of origin, language, and temporal framing. I will then utilize a similarity metric to contrast LLM responses, identifying discrepancies and similarities across these perspectives. This is much more in its infancy, and I'd love possible perspectives on theoretical lineage and cross-language LLM assessment.

Bio: Laurie Jones is a PhD student in Information Science. She has a BS in Computer Science and a minor in Arabic from Washington and Lee University in Virginia. Now under Brian Keegan in information science and Alexandra Siegel in political science, Laurie does cross-language cross-platform analysis of English and Arabic content asymmetry. She uses computational social science methods like natural language processing and network analysis as well as her knowledge of the Arabic language to understand collective memory and conflict power processes across languages and platforms.

02/05/25 Bhargav Shandilya's Area Exam

Title: From Relevance to Reasoning - Evaluation Paradigms for Retrieval Augmented Generation

Abstract: Retrieval Augmented Generation (RAG) has emerged as a cost-effective alternative to fine-tuning Large Language Models (LLMs), enabling models to access external knowledge for improved performance on domain-specific tasks. While RAG architectures are well-studied, developing robust evaluation frameworks remains challenging due to the complexity of assessing both retrieval and generation components. This survey examines the evolution of RAG evaluation methods, from early metrics like KILT scores to sophisticated frameworks such as RAGAS and ARES, which assess multiple dimensions including context relevance, answer faithfulness, and information integration. Through the lens of documentary linguistics, this survey analyzes how these evaluation paradigms can be adapted for low-resource language applications, where challenges like noisy data and inconsistent document structures necessitate specialized evaluation approaches. By synthesizing insights from foundational studies, this study provides a systematic analysis of evaluation strategies and their implications for developing more robust, adaptable RAG systems across diverse linguistic contexts.
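
For a concrete sense of the dimensions named above, here is a toy sketch of context-relevance and answer-faithfulness scoring. Frameworks like RAGAS and ARES use LLM judges or trained classifiers rather than the plain token overlap used here, so this is only a simplified stand-in to make the interfaces concrete.

<syntaxhighlight lang="python">
# Toy stand-ins for two RAG evaluation dimensions: context relevance and
# answer faithfulness. Real frameworks (RAGAS/ARES) use LLM judges or
# trained classifiers; token overlap here just illustrates the interfaces.
import re

def _tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def context_relevance(question, contexts):
    """Fraction of the question's tokens covered by the retrieved contexts."""
    q = _tokens(question)
    pooled = set().union(*(_tokens(c) for c in contexts))
    return len(q & pooled) / len(q)

def answer_faithfulness(answer, contexts):
    """Fraction of the answer's tokens grounded in the retrieved contexts."""
    a = _tokens(answer)
    pooled = set().union(*(_tokens(c) for c in contexts))
    return len(a & pooled) / len(a)

ctxs = ["The Nile is the longest river in Africa."]
print(context_relevance("Which river is the longest in Africa?", ctxs))
print(answer_faithfulness("The Nile is the longest river in Africa.", ctxs))
</syntaxhighlight>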

02/12/25 Michael Ginn's Area Exam

Title: Extracting Automata from Modern Neural Networks

Abstract: It may be desirable to extract an approximation of a trained neural network as a finite-state automaton, for reasons including interpretability, efficiency, and predictability. Early research on recurrent neural networks (RNNs) proposed methods to convert trained RNNs into finite-state automata by quantizing the continuous hidden state space of the RNN into a discrete state space. However, these methods depend on the assumption of a rough equivalence between these state spaces, which is less straightforward for modern recurrent networks and transformers. In this survey, we review methods for automaton extraction, specifically highlighting the challenges and proposed methods for extraction with modern neural networks.
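
To illustrate the classic recipe the survey starts from, here is a minimal sketch of automaton extraction by state quantization. It assumes per-sequence hidden states have already been collected from a trained RNN; k-means clustering and the last-write-wins transition rule are simplifications of the published methods.

<syntaxhighlight lang="python">
# Minimal sketch of the classic extraction recipe: quantize RNN hidden
# states with k-means, then read off a transition table between the
# discrete states. Modern variants must relax the rough state-space
# equivalence this assumes. (Requires numpy and scikit-learn.)
import numpy as np
from sklearn.cluster import KMeans

def extract_automaton(hidden_states, symbols, n_states=5):
    """hidden_states: one (T+1, d) array per sequence (initial state included);
    symbols: the corresponding input sequences of length T."""
    km = KMeans(n_clusters=n_states, n_init=10).fit(np.vstack(hidden_states))
    transitions = {}
    for h, seq in zip(hidden_states, symbols):
        states = km.predict(h)  # discrete state at each timestep
        for t, sym in enumerate(seq):
            # conflicting observations could be majority-voted; last wins here
            transitions[(states[t], sym)] = states[t + 1]
    return transitions  # dict: (state, input symbol) -> next state
</syntaxhighlight>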

02/19/25 Amy Burkhardt, Cambium Assessment

Title: AI and NLP in Education: Research, Implementation, and Lessons from Industry

Abstract: This talk will provide a behind-the-scenes look at conducting research on AI in education within an industry setting. First, I’ll offer a broader context of working on a machine learning team, highlighting the diverse skill sets and projects involved. Then, through a case study of an NLP-based writing feedback tool, I’ll walk through how we built and evaluated the tool, sharing key lessons learned from its implementation.

Bio: Amy Burkhardt is a Senior Scientist at Cambium Assessment, specializing in AI applications for education. She holds a PhD in Research and Evaluation Methodology from the University of Colorado, as well as a certificate in Human Language Technology. Prior to joining Cambium Assessment, she served as the Director of Research and Partnerships for the Rapid Online Assessment of Reading (ROAR) at Stanford University.

02/26/25 No Meeting
03/05/25 Benet Post's Talk

Title: Multi-Dialectical NLP Tools for Quechua

Abstract: This preliminary study introduces a multi-dialectical NLP approach for Quechua dialects that combines neural architectures with symbolic linguistic knowledge, specifically leveraging lexical markers and polypersonal verbal agreement to tackle low-resource and morphologically complex data. By embedding rule-based morphological cues into a transformer-based classifier, this work significantly outperforms purely data-driven or statistical baselines. In addition to boosting classification accuracy across more than twenty Quechuan varieties, the method exposes previously undocumented linguistic phenomena with respect to polypersonal verbal agreement. The findings highlight how neurosymbolic models can advance both language technology and linguistic research by respecting the dialectal diversity within an under-resourced language family, ultimately raising the bar for dialect-sensitive NLP tools designed to empower speakers of these languages digitally.

---

Anschutz Talk

Title: Evaluating LLMs for Long Context Clinical Summarization with Temporal Reasoning

Abstract: Recent advances in LLMs have shown potential in clinical text summarization, but their ability to handle long patient trajectories with multi-modal data spread across time remains underexplored. This study systematically evaluates several state-of-the-art open-source LLMs and their Retrieval Augmented Generation (RAG) variants on long-context clinical summarization. We examine their ability to synthesize structured and unstructured Electronic Health Record (EHR) data while reasoning over temporal coherence, by re-engineering existing tasks, including discharge summarization and diagnosis prediction, from two publicly available EHR datasets. Our results indicate that long context windows improve input integration but do not consistently enhance clinical reasoning, and LLMs still struggle with temporal progression and rare disease prediction. While RAG reduces hallucinations in some cases, it does not fully address these limitations.

03/12/25 CLASIC Industry Day
03/19/25 Dananjay Srinivas' Area Exam (Late start, 12-1)

Title: Assessing progress in Natural Language Inference in the age of Neural Networks

Abstract: Over the last decade, the space of natural language inference (NLI) has seen a lot of progress, primarily through novel constructions of inference tasks that benefit from neural approaches. This has led to claims of neural models’ abilities to understand and reason over natural language. Simultaneously, subsequent works also empirically find limitations with NLI methods and tasks, challenging previous claims of neural networks’ ability to operate on logical semantics. In this talk, we synthesize NLI task formulations and relevant empirical findings from prior scholarship to qualitatively assess the soundness and limitations of neural approaches to NLI. We find from our synthesis that, though neural approaches to NLI are a well-explored space, certain foundational questions still remain unanswered, affecting the fidelity of neural inference. We share key findings for future research on NLI, and discuss ideas on how we believe the space of NLI should be transformed in order to build language technology that can robustly operate on logical semantics.


03/26/25 No meeting - Spring Break
04/02/25 Adam Wiemerslage's Defense

Title: Generalizing Low-Resource Morphology: Cognitive and Neural Perspectives on Inflection

Abstract: State-of-the-art NLP methods that leverage enormous amounts of digital text are transforming the experience of working with computers and accessing the internet for many people. However, for most of the world’s languages, there is insufficient digital data to make recently popular technology like large language models (LLMs) possible. New technology like LLMs is typically not well-suited for underrepresented languages, often referred to as low-resource languages in NLP, without sufficient digital data. In this case, simpler language technologies like dictionaries, morphological analyzers, and text normalizers are useful. This is especially apparent for language documentation life-cycles, building educational tools, and the development of language typology databases. With this in mind, we propose techniques for automatically expanding the coverage of morphological databases and develop methods for building morphological tools for the large set of languages with few available resources. We then study the generation capabilities of neural network models that learn from these resources. Finally, we propose methods for training neural networks when only small amounts of data are available, taking inspiration from the recent successes of self-supervised pretraining in high-resource NLP.


04/09/25 Ali Marashian's Area Exam

Title: Meditations on the Available Resources for Low-Resource NMT

Abstract: In spite of the progress in NMT in the last decade, most languages in the world do not have sufficient digitized data to train neural models on. Different approaches to remedy the problems of low-resource languages utilize different resources. In this presentation, we will look into the available categories of resources through the lens of practicality: parallel data, monolingual data, pretrained multilingual models, grammar books and morphological information, and automatic evaluation metrics. We conclude by highlighting the importance of more focus on data collection as well as on the interpretability of some of the available tools.


04/16/25 Elizabeth Spaulding's Defense

Title: The Meaning of Agency and Patiency to Machines and People

Abstract: This thesis establishes the capabilities and limitations of various language modeling technologies on the task of semantic proto-role labeling (SPRL), which assigns relational properties such as volition, awareness, and change of state to event participants in sentences. First, we demonstrate the feasibility and best practices of SPRL learned and inferred jointly with other information extraction tasks. We also show that language model output categorizes entities in sentences consistently across verb-invariant and verb-specific linguistic theories of agency, adding to the growing body of evidence of language model event reasoning capabilities. Further, we introduce a method for adopting semantic proto-role labeling systems and proto-role theory as a tool for analyzing events and participants by using it to quantify implicit human perceptions of agency and experience in text. We discuss the implications of our findings as a whole and identify multiple paths for future work, including deeper annotator involvement in future annotation of SPRL, SPRL analysis on machine-generated text, and cross-lingual studies of SPRL. Pursuing these future directions could improve both the theoretical frameworks and the computational methods, and help uncover how both people and machines structure and process events.
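
To make the task concrete: SPRL assigns graded judgments for proto-role properties to each argument of a predicate. The shape below is illustrative, not from the thesis; the 1-5 Likert scores follow the Reisinger et al. (2015) proto-role annotation style, and the property list is a representative subset.

<syntaxhighlight lang="python">
# Illustrative data shape for SPRL (not the thesis's code): each argument of
# a predicate gets graded proto-role property judgments (1-5 Likert scores,
# in the style of Reisinger et al. 2015; properties shown are a subset).
from dataclasses import dataclass, field

@dataclass
class ProtoRoleLabel:
    sentence: str
    predicate: str
    argument: str
    properties: dict = field(default_factory=dict)  # property -> 1..5 score

ex = ProtoRoleLabel(
    sentence="The chef melted the butter.",
    predicate="melted",
    argument="the butter",
    properties={"volition": 1, "awareness": 1, "change_of_state": 5},
)
# High change-of-state with low volition/awareness is proto-patient-like;
# aggregating such scores is one way to quantify perceived agency in text.
</syntaxhighlight>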

04/23/25 Maggie Perkoff's Defense

Title: Bringing Everyone In: The Future of Collaboration with Conversational AI

Abstract: Collaborative learning enables students to build rapport with their peers while building upon their own knowledge. Teachers can weave collaborative learning opportunities into the classroom by having students work together in small groups. However, these collaborations can break down when students are confused by the material, one person dominates the conversation, or some of the participants struggle to connect with their peers. Unfortunately, a single teacher cannot attend to the needs of all groups at the same time. In these cases, pedagogical conversational agents (PCAs) have the potential to support teachers and students alike by taking on a collaboration facilitator role. These agents engage students in productive dialog by providing appropriate interventions in a learning setting. With the rapid improvement of large language models (LLMs), these agents can easily be backed by a generative model that can adapt to new domains and variations in communication styles. Integrating LLMs into PCAs requires understanding the desired teacher behavior in different scenarios and constraining the outputs of the model to match them. This dissertation explores how to design, develop, and evaluate PCAs that incorporate LLMs to support students collaborating in small groups. One of the products of this research is the Jigsaw Interactive Agent (JIA), a multi-modal PCA that provides real-time support to students via a chat interface. In this work, we describe the multi-modal system that JIA relies on to analyze students' discourse, test different methods for constraining the JIA outputs in a lab setting, and evaluate the use of a retrieval-augmented generation approach to enhance the outputs with curriculum materials. Furthermore, we propose a framework for expanding JIA's capabilities to support neurodivergent students. Ultimately, this dissertation aims to align advancements in LLM-based conversational agents with the perspectives and expertise of the teachers and students who can greatly benefit from their usage.

04/30/25 NAACL -- no meeting
05/07/25 Finals Week -- no meeting

Past Schedules