'''Location:'''
* Jan 8 - Feb 5: Lucile Berkeley Buchanan Building (LBB) 430
* Feb 12 onwards: Muenzinger D430, '''except''':
* '''CLASIC Open House (3/12)''' will be in LBB 124
* '''Adam's Defense (4/2)''' will be in LBB 430.
  
'''Time:''' Wednesdays at 11:30am, Mountain Time

'''Zoom link:''' https://cuboulder.zoom.us/j/97014876908

{| class="wikitable"
! Date !! Title
|- style="border-top: 2px solid DarkGray;"
| 01/08/2025 || Invited talk: Denis Peskoff https://denis.ai/

'''Title''': Perspectives on Prompting

'''Abstract''': Natural language processing is in a state of flux. I will talk about three recent papers, appearing at ACL and EMNLP conferences, that capture the zeitgeist of the field's current uncertainty of direction. First, I will talk about a paper that evaluated the responses of large language models to domain questions. Then, I will talk about a paper that used prompting to study the language of the Federal Reserve Board. Last, I will discuss a new paper on identifying generated content in Wikipedia. In addition, I will highlight a mega-paper I was involved in about prompting.

'''Bio''': Denis Peskoff just finished a postdoc at Princeton University working with Professor Brandon Stewart. He completed his PhD in computer science at the University of Maryland with Professor Jordan Boyd-Graber and a bachelor’s degree at the Georgetown School of Foreign Service. His research has incorporated domain experts—leading board game players, Federal Reserve Board members, doctors, scientists—to solve natural language processing challenges.
  
 
|- style="border-top: 2px solid DarkGray;"
| 01/15/2025 || '''Planning, introductions, welcome!'''
  
 
|- style="border-top: 2px solid DarkGray;"
| 01/22/2025 || LSA Keynote -- Chris Potts
  
 
|- style="border-top: 2px solid DarkGray;"
| 01/23/25 (Thu CS seminar) || Chenhao Tan, CS Colloquium, 3:30pm, ECCR 265

'''Title''': Alignment Beyond Human Preferences: Use Human Goals to Guide AI towards Complementary AI

'''Abstract''': A lot of recent work has been dedicated to guiding pretrained AI with human preferences. In this talk, I argue that human preferences are often insufficient for complementing human intelligence and demonstrate the key role of human goals with two examples. First, hypothesis generation is critical for scientific discoveries. Instead of removing hallucinations, I will leverage data and labels as a guide to lead hallucination towards effective hypotheses. Second, I will use human perception as a guide for developing case-based explanations to support AI-assisted decision making. In both cases, faithfulness is "compromised" for achieving human goals. I will conclude with future directions towards complementary AI.

'''Bio''': Chenhao Tan is an associate professor of computer science and data science at the University of Chicago, and is also a visiting scientist at Abridge. He obtained his PhD degree in the Department of Computer Science at Cornell University and bachelor's degrees in computer science and in economics from Tsinghua University. Prior to joining the University of Chicago, he was an assistant professor at the University of Colorado Boulder and a postdoc at the University of Washington. His research interests include human-centered AI, natural language processing, and computational social science. His work has been covered by many news media outlets, such as the New York Times and the Washington Post. He also won a Sloan research fellowship, an NSF CAREER award, an NSF CRII award, a Google research scholar award, research awards from Amazon, IBM, JP Morgan, and Salesforce, a Facebook fellowship, and a Yahoo! Key Scientific Challenges award.
  
 
|- style="border-top: 2px solid DarkGray;"
| 01/29/25 || Laurie Jones from Information Science

'''Abstract:''' Laurie is coming to seek feedback from the Boulder NLP community about two projects she's been working on.

'''Similarity through creation and consumption:''' Initial work looking at similarity between Wikipedia articles surrounding the Arab Spring presents diverging perspectives in English and Arabic. However, this was identified not through content analysis but through leveraging other digital trace data sources, such as the blue links (outlinks) and inter-language links (ILLs). I am hoping to identify the Arab Spring article's ecosystem to inform relationships between articles through the lens of creation and consumption. I am planning to leverage network analysis and graph theory to identify articles that are related along shared editors, outlinks, and clickstreams. Then, using the Pareto principle, I will identify densely correlated articles and present an ecosystem that isn't exclusively correlated through content. I hope this can then inform language models, providing additional language-agnostic contextualization. I would love feedback on the application and theoretical contextualization of this method.
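
To make the graph idea concrete, here is a minimal sketch of one way such a creation/consumption network could be built, assuming Jaccard overlap of shared editors as the edge weight and a crude top-20% cut standing in for the Pareto step; the toy data, weighting, and cut-off are all illustrative assumptions, not Laurie's actual pipeline.

<syntaxhighlight lang="python">
# Hypothetical sketch: relate Wikipedia articles through shared editors
# (could equally be outlinks or clickstream neighbors) instead of content.
import itertools
import networkx as nx

# Toy digital-trace data: article -> set of editors.
traces = {
    "Arab Spring": {"editorA", "editorB", "editorC"},
    "Tahrir Square": {"editorB", "editorC"},
    "Jasmine Revolution": {"editorC", "editorD"},
}

G = nx.Graph()
for (a, ta), (b, tb) in itertools.combinations(traces.items(), 2):
    jaccard = len(ta & tb) / len(ta | tb)  # overlap in shared editors
    if jaccard > 0:
        G.add_edge(a, b, weight=jaccard)

# Keep only the most densely connected pairs (a crude "Pareto" cut: top 20%).
edges = sorted(G.edges(data=True), key=lambda e: e[2]["weight"], reverse=True)
core = edges[: max(1, len(edges) // 5)]
print(core)
</syntaxhighlight>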

'''Collective Memory expression in LLMs:''' As LLMs get integrated into search engines and other accessible methods of querying, they will increasingly be used as historical documentation and referenced as fact. Because they are built upon sources that carry not only political bias but also linguistic and geographical perspectives, the narratives these LLMs present about the past are collectively informed, a collective memory of their own. However, what does that mean when you transcend some of these perspectives? Using prompt engineering, I am investigating two widely used large language models, ChatGPT and Gemini. I hope to cross-reference prompts, feigning user identification and cross-utilizing perspectives based on country of origin, language, and temporal framing. I will then use a similarity metric to contrast LLM responses, identifying discrepancies and similarities across these perspectives. This project is much more in its infancy, and I'd love perspectives on its theoretical lineage and on cross-language LLM assessment.
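
As a toy illustration of the comparison step only (the embedding model and cosine metric here are assumptions, not the study's actual setup), one could embed two models' responses to the same prompt and score their similarity:

<syntaxhighlight lang="python">
# Hypothetical sketch: compare two LLMs' answers to one prompt by
# embedding them and measuring cosine similarity (1.0 = identical framing).
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

answer_gpt = "The Arab Spring was a wave of pro-democracy uprisings..."
answer_gemini = "Beginning in 2010, protests spread across the Arab world..."

vecs = embedder.encode([answer_gpt, answer_gemini])
print(float(util.cos_sim(vecs[0], vecs[1])))
</syntaxhighlight>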
  
'''Bio''': Laurie Jones is a PhD student in Information Science. She has a BS in Computer Science and a minor in Arabic from Washington and Lee University in Virginia. Now advised by Brian Keegan in information science and Alexandra Siegel in political science, Laurie does cross-language, cross-platform analysis of English and Arabic content asymmetry. She uses computational social science methods such as natural language processing and network analysis, as well as her knowledge of the Arabic language, to understand collective memory and conflict power processes across languages and platforms.
  
 
|- style="border-top: 2px solid DarkGray;"
| 02/05/25 || '''Bhargav Shandilya's''' Area Exam
 
 
'''Title''': From Relevance to Reasoning - Evaluation Paradigms for Retrieval Augmented Generation

'''Abstract''': Retrieval Augmented Generation (RAG) has emerged as a cost-effective alternative to fine-tuning Large Language Models (LLMs), enabling models to access external knowledge for improved performance on domain-specific tasks. While RAG architectures are well-studied, developing robust evaluation frameworks remains challenging due to the complexity of assessing both retrieval and generation components. This survey examines the evolution of RAG evaluation methods, from early metrics like KILT scores to sophisticated frameworks such as RAGAS and ARES, which assess multiple dimensions including context relevance, answer faithfulness, and information integration. Through the lens of documentary linguistics, this survey analyzes how these evaluation paradigms can be adapted for low-resource language applications, where challenges like noisy data and inconsistent document structures necessitate specialized evaluation approaches. By synthesizing insights from foundational studies, this study provides a systematic analysis of evaluation strategies and their implications for developing more robust, adaptable RAG systems across diverse linguistic contexts.
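
For intuition, here is a toy sketch of two of the dimensions named above, context relevance and answer faithfulness, approximated with bare token overlap; production frameworks such as RAGAS and ARES use LLM-based judgments, so this is only a conceptual stand-in, not their implementation.

<syntaxhighlight lang="python">
# Toy proxies for two RAG evaluation dimensions via token overlap.
def _tokens(text: str) -> set[str]:
    return {w.strip(".,?").lower() for w in text.split()}

def context_relevance(question: str, context: str) -> float:
    q, c = _tokens(question), _tokens(context)
    return len(q & c) / len(q)  # share of question terms found in context

def answer_faithfulness(answer: str, context: str) -> float:
    a, c = _tokens(answer), _tokens(context)
    return len(a & c) / len(a)  # share of answer terms grounded in context

ctx = "KILT benchmarks knowledge-intensive tasks with Wikipedia provenance."
print(context_relevance("What does KILT benchmark?", ctx))
print(answer_faithfulness("KILT benchmarks knowledge-intensive tasks.", ctx))
</syntaxhighlight>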

|- style="border-top: 2px solid DarkGray;"
| 02/12/25 || '''Michael Ginn's''' Area Exam
 
 
 
  
'''Title''': Extracting Automata from Modern Neural Networks
  

'''Abstract:''' It may be desirable to extract an approximation of a trained neural network as a finite-state automaton, for reasons including interpretability, efficiency, and predictability. Early research on recurrent neural networks (RNNs) proposed methods to convert trained RNNs into finite-state automata by quantizing the continuous hidden state space of the RNN into a discrete state space. However, these methods depend on the assumption of a rough equivalence between these state spaces, which is less straightforward for modern recurrent networks and transformers. In this survey, we review methods for automaton extraction, specifically highlighting the challenges and proposed methods for extraction with modern neural networks.
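
A minimal sketch of the classic quantization recipe the abstract describes, with random vectors standing in for recorded RNN hidden states, k-means for the discretization, and a majority vote for the transition function; all concrete choices here are assumptions for illustration, not any specific paper's method.

<syntaxhighlight lang="python">
# Toy version of automaton extraction: cluster hidden states into
# discrete automaton states, then tally observed transitions.
from collections import Counter
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for hidden states h_t collected while running an RNN over
# strings; shape (num_steps, hidden_dim). Real use would record these.
hidden_states = rng.normal(size=(200, 16))
symbols = rng.integers(0, 2, size=200)  # input symbol read at each step

k = 5  # number of discrete automaton states
state_ids = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(hidden_states)

# Majority-vote a transition function delta(state, symbol) -> next state.
votes = Counter(zip(state_ids[:-1], symbols[1:], state_ids[1:]))
best = {}
for (s, a, s_next), n in votes.items():
    if n > best.get((s, a), (0, None))[0]:
        best[(s, a)] = (n, s_next)
delta = {sa: s_next for sa, (_, s_next) in best.items()}
print(delta)
</syntaxhighlight>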

|- style="border-top: 2px solid DarkGray;"
| 02/19/25 || '''Amy Burkhardt''', Cambium Assessment

'''Title''': AI and NLP in Education: Research, Implementation, and Lessons from Industry

'''Abstract''': This talk will provide a behind-the-scenes look at conducting research on AI in education within an industry setting. First, I’ll offer a broader context of working on a machine learning team, highlighting the diverse skill sets and projects involved. Then, through a case study of an NLP-based writing feedback tool, I’ll walk through how we built and evaluated the tool, sharing key lessons learned from its implementation.

'''Bio:''' Amy Burkhardt is a Senior Scientist at Cambium Assessment, specializing in AI applications for education. She holds a PhD in Research and Evaluation Methodology from the University of Colorado, as well as a certificate in Human Language Technology. Prior to joining Cambium Assessment, she served as the Director of Research and Partnerships for the Rapid Online Assessment of Reading (ROAR) at Stanford University.

|- style="border-top: 2px solid DarkGray;"
| 02/26/25 || No Meeting
  
 
|- style="border-top: 2px solid DarkGray;"
| 03/05/25 || Benet Post's Talk
  
'''Title''': Multi-Dialectical NLP Tools for Quechua

'''Abstract''': This preliminary study introduces a multi-dialectical NLP approach for Quechua dialects that combines neural architectures with symbolic linguistic knowledge, specifically leveraging lexical markers and polypersonal verbal agreement to tackle low-resource and morphologically complex data. By embedding rule-based morphological cues into a transformer-based classifier, this work significantly outperforms purely data-driven or statistical baselines. In addition to boosting classification accuracy across more than twenty Quechuan varieties, the method exposes previously undocumented linguistic phenomena with respect to polypersonal verbal agreement. The findings highlight how neurosymbolic models can advance both language technology and linguistic research by respecting the dialectal diversity within an under-resourced language family, ultimately raising the bar for dialect-sensitive NLP tools designed to empower speakers of these languages digitally.
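
A hedged toy of the neurosymbolic idea, concatenating rule-based morphological cues with learned character n-gram features; the morpheme markers and dialect labels below are invented, and a linear classifier stands in for the transformer for brevity.

<syntaxhighlight lang="python">
# Toy neurosymbolic classifier: symbolic suffix cues + char n-grams.
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["rikuchkanki", "rikunki", "rikuchkankichik", "rikunkichik"]
labels = ["dialect_a", "dialect_b", "dialect_a", "dialect_b"]  # toy labels

MARKERS = ["chka", "chik"]  # hypothetical dialect-indicative morphemes
def rule_features(batch):
    # One binary column per symbolic morphological marker.
    return csr_matrix([[float(m in t) for m in MARKERS] for t in batch])

tfidf = TfidfVectorizer(analyzer="char", ngram_range=(2, 4))
X = hstack([tfidf.fit_transform(texts), rule_features(texts)])
clf = LogisticRegression().fit(X, labels)

test = ["rikuchkan"]
X_test = hstack([tfidf.transform(test), rule_features(test)])
print(clf.predict(X_test))
</syntaxhighlight>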

---

Anschutz Talk

'''Title''': Evaluating LLMs for Long Context Clinical Summarization with Temporal Reasoning
 
   
 
   
'''Abstract''': Recent advances in LLMs have shown potential in clinical text summarization, but their ability to handle long patient trajectories with multi-modal data spread across time remains underexplored. This study systematically evaluates several state-of-the-art open-source LLMs and their Retrieval Augmented Generation (RAG) variants on long-context clinical summarization. We examine their ability to synthesize structured and unstructured Electronic Health Records (EHR) data while reasoning over temporal coherence, by re-engineering existing tasks, including discharge summarization and diagnosis prediction, from two publicly available EHR datasets. Our results indicate that long context windows improve input integration but do not consistently enhance clinical reasoning, and LLMs still struggle with temporal progression and rare disease prediction. While RAG reduces hallucination in some cases, it does not fully address these limitations.
  
|- style="border-top: 2px solid DarkGray;"
| 03/12/25 || CLASIC Industry Day
  
 
|- style="border-top: 2px solid DarkGray;"
| 03/19/25 || Dananjay Srinivas' Area Exam (Late start, 12-1)
  
 
|- style="border-top: 2px solid DarkGray;"
| 03/26/25 || '''No meeting - Spring Break'''
  
 
|- style="border-top: 2px solid DarkGray;"
| 04/02/25 || Adam Wiemerslage's Defense
  
 
|- style="border-top: 2px solid DarkGray;"
| 04/09/25 || Ali Marashian's Area Exam
 
 
|- style="border-top: 2px solid DarkGray;"
| 04/16/25 || Elizabeth Spaulding's Defense
  
 
|- style="border-top: 2px solid DarkGray;"
| 04/23/25 || Maggie Perkoff's Defense
 
 
 
  
 
|- style="border-top: 2px solid DarkGray;"
| 04/30/25 || NAACL, maybe no meeting?
  
 
|- style="border-top: 2px solid DarkGray;"
| 05/07/25 || Jon Cai's Defense
 
 
  
  
|}

=Past Schedules=

* [[Fall 2024 Schedule]]
* [[Spring 2024 Schedule]]
* [[Fall 2023 Schedule]]
* [[Spring 2023 Schedule]]
