Meeting Schedule

From CompSemWiki
'''Location:'''
* Jan 8 - Feb 5: Lucile Berkeley Buchanan Building (LBB) 430
* Feb 12 onwards: Muenzinger D430, '''except''':
** '''CLASIC Open House (3/12)''' will be in LBB 124
** '''Adam's Defense (4/2)''' will be in LBB 430.
  
'''Time:''' Wednesdays at 11:30am, Mountain Time

'''Zoom link:''' https://cuboulder.zoom.us/j/97014876908
  
 
{|
! Date !! Title
|- style="border-top: 2px solid DarkGray;"
| 01/08/2025 || Invited talk: [https://denis.ai/ Denis Peskoff]

'''Title''': Perspectives on Prompting

'''Abstract''': Natural language processing is in a state of flux. I will talk about three recent papers, appearing at ACL and EMNLP conferences, that capture the zeitgeist of the field's current uncertainty of direction. First, I will talk about a paper that evaluated the responses of large language models to domain questions. Then, I will talk about a paper that used prompting to study the language of the Federal Reserve Board. Last, I will discuss a new paper on identifying generated content in Wikipedia. In addition, I will highlight a mega-paper on prompting that I was involved in.

'''Bio''': Denis Peskoff just finished a postdoc at Princeton University, working with Professor Brandon Stewart. He completed his PhD in computer science at the University of Maryland with Professor Jordan Boyd-Graber and a bachelor’s degree at the Georgetown School of Foreign Service. His research has incorporated domain experts—leading board game players, Federal Reserve Board members, doctors, scientists—to solve natural language processing challenges.
  
 
|- style="border-top: 2px solid DarkGray;"
| 01/15/2025 || '''Planning, introductions, welcome!'''
  
 
|- style="border-top: 2px solid DarkGray;"
| 01/22/2025 || LSA Keynote -- Chris Potts
  
 
|- style="border-top: 2px solid DarkGray;"
| 01/23/25 (Thu CS seminar) || Chenhao Tan, CS Colloquium, 3:30pm, ECCR 265

'''Title''': Alignment Beyond Human Preferences: Use Human Goals to Guide AI towards Complementary AI

'''Abstract''': A lot of recent work has been dedicated to guiding pretrained AI with human preferences. In this talk, I argue that human preferences are often insufficient for complementing human intelligence, and I demonstrate the key role of human goals with two examples. First, hypothesis generation is critical for scientific discoveries. Instead of removing hallucinations, I will leverage data and labels as a guide to lead hallucination towards effective hypotheses. Second, I will use human perception as a guide for developing case-based explanations to support AI-assisted decision making. In both cases, faithfulness is "compromised" for achieving human goals. I will conclude with future directions towards complementary AI.
  
'''Bio''': Chenhao Tan is an associate professor of computer science and data science at the University of Chicago, and is also a visiting scientist at Abridge. He obtained his PhD degree in the Department of Computer Science at Cornell University and bachelor's degrees in computer science and in economics from Tsinghua University. Prior to joining the University of Chicago, he was an assistant professor at the University of Colorado Boulder and a postdoc at the University of Washington. His research interests include human-centered AI, natural language processing, and computational social science. His work has been covered by many news media outlets, such as the New York Times and the Washington Post. He has also won a Sloan research fellowship, an NSF CAREER award, an NSF CRII award, a Google research scholar award, research awards from Amazon, IBM, JP Morgan, and Salesforce, a Facebook fellowship, and a Yahoo! Key Scientific Challenges award.

|- style="border-top: 2px solid DarkGray;"
| 01/29/25 || Laurie Jones from Information Science

'''Abstract:''' Laurie is coming to seek feedback from the Boulder NLP community about two projects she's been working on.

'''Similarity through creation and consumption:''' Initial work on similarity between Wikipedia articles surrounding the Arab Spring found diverging perspectives in English and Arabic. However, this was identified not through content analysis but by leveraging other digital trace data sources, such as the blue links (outlinks) and inter-language links (ILLs). I am hoping to map the Arab Spring article’s ecosystem and characterize relationships between articles through the lens of creation and consumption. I plan to leverage network analysis and graph theory to identify articles that are related along shared editors, outlinks, and clickstreams, and then, using the Pareto principle, identify densely correlated articles and present an ecosystem that isn't exclusively correlated through content. I hope this can then inform language models by providing additional language-agnostic contextualization. I would love feedback on the application and theoretical contextualization of this method.

'''Collective Memory expression in LLMs:''' As LLMs are integrated into search engines and other accessible methods of querying, they will increasingly be used as historical documentation and referenced as fact. Because they are built upon sources that carry not only political bias but also linguistic and geographical biases, the narratives these LLMs present about the past are collectively informed: a collective memory of their own. However, what does that mean when you transcend some of these perspectives? Using prompt engineering, I am investigating two widely used large language models, ChatGPT and Gemini. I hope to cross-reference prompts, feigning user identification and cross-utilizing perspectives based on country of origin, language, and temporal framing. I will then use a similarity metric to contrast LLM responses, identifying discrepancies and similarities across these perspectives. This project is much more in its infancy, and I'd love perspectives on theoretical lineage and cross-language LLM assessment.

'''Bio''': Laurie Jones is a PhD student in Information Science. She has a BS in Computer Science and a minor in Arabic from Washington and Lee University in Virginia. Advised by Brian Keegan in information science and Alexandra Siegel in political science, Laurie does cross-language, cross-platform analysis of English and Arabic content asymmetry. She uses computational social science methods such as natural language processing and network analysis, as well as her knowledge of Arabic, to understand collective memory and conflict power processes across languages and platforms.
  
|- style="border-top: 2px solid DarkGray;"
| 02/05/25 || '''Bhargav Shandilya'''’s Area Exam

'''Title''': From Relevance to Reasoning - Evaluation Paradigms for Retrieval Augmented Generation

'''Abstract''': Retrieval Augmented Generation (RAG) has emerged as a cost-effective alternative to fine-tuning Large Language Models (LLMs), enabling models to access external knowledge for improved performance on domain-specific tasks. While RAG architectures are well studied, developing robust evaluation frameworks remains challenging due to the complexity of assessing both retrieval and generation components. This survey examines the evolution of RAG evaluation methods, from early metrics like KILT scores to sophisticated frameworks such as RAGAS and ARES, which assess multiple dimensions including context relevance, answer faithfulness, and information integration. Through the lens of documentary linguistics, the survey analyzes how these evaluation paradigms can be adapted for low-resource language applications, where challenges like noisy data and inconsistent document structures necessitate specialized evaluation approaches. By synthesizing insights from foundational studies, it provides a systematic analysis of evaluation strategies and their implications for developing more robust, adaptable RAG systems across diverse linguistic contexts.
|- style="border-top: 2px solid DarkGray;"
| 02/12/25 || '''Michael Ginn'''’s Area Exam

'''Title''': Extracting Automata from Modern Neural Networks

'''Abstract:''' It may be desirable to extract an approximation of a trained neural network as a finite-state automaton, for reasons including interpretability, efficiency, and predictability. Early research on recurrent neural networks (RNNs) proposed methods to convert trained RNNs into finite-state automata by quantizing the continuous hidden state space of the RNN into a discrete state space. However, these methods depend on the assumption of a rough equivalence between these state spaces, which is less straightforward for modern recurrent networks and transformers. In this survey, we review methods for automaton extraction, specifically highlighting the challenges and proposed methods for extraction with modern neural networks.
  
 
|- style="border-top: 2px solid DarkGray;"
| 02/19/25 || '''Amy Burkhardt''', Cambium Assessment

'''Title''': AI and NLP in Education: Research, Implementation, and Lessons from Industry

'''Abstract''': This talk will provide a behind-the-scenes look at conducting research on AI in education within an industry setting. First, I’ll offer broader context on working on a machine learning team, highlighting the diverse skill sets and projects involved. Then, through a case study of an NLP-based writing feedback tool, I’ll walk through how we built and evaluated the tool, sharing key lessons learned from its implementation.

'''Bio:''' Amy Burkhardt is a Senior Scientist at Cambium Assessment, specializing in AI applications for education. She holds a PhD in Research and Evaluation Methodology from the University of Colorado, as well as a certificate in Human Language Technology. Prior to joining Cambium Assessment, she served as the Director of Research and Partnerships for the Rapid Online Assessment of Reading (ROAR) at Stanford University.
  
 
|- style="border-top: 2px solid DarkGray;"
| 02/26/25 || No Meeting

|- style="border-top: 2px solid DarkGray;"
| 03/05/25 || Benet Post's Talk
  
'''Title''': Multi-Dialectal NLP Tools for Quechua

'''Abstract''': This preliminary study introduces a multi-dialectal NLP approach for Quechua that combines neural architectures with symbolic linguistic knowledge, specifically leveraging lexical markers and polypersonal verbal agreement to tackle low-resource and morphologically complex data. By embedding rule-based morphological cues into a transformer-based classifier, this work significantly outperforms purely data-driven or statistical baselines. In addition to boosting classification accuracy across more than twenty Quechuan varieties, the method exposes previously undocumented phenomena of polypersonal verbal agreement. The findings highlight how neurosymbolic models can advance both language technology and linguistic research by respecting the dialectal diversity within an under-resourced language family, ultimately raising the bar for dialect-sensitive NLP tools designed to empower speakers of these languages digitally.
  
----
'''Anschutz Talk'''

'''Title''': Evaluating LLMs for Long Context Clinical Summarization with Temporal Reasoning

'''Abstract''': Recent advances in LLMs have shown potential in clinical text summarization, but their ability to handle long patient trajectories with multi-modal data spread across time remains underexplored. This study systematically evaluates several state-of-the-art open-source LLMs, and their Retrieval Augmented Generation (RAG) variants, on long-context clinical summarization. We examine their ability to synthesize structured and unstructured Electronic Health Record (EHR) data while reasoning over temporal coherence, by re-engineering existing tasks, including discharge summarization and diagnosis prediction, from two publicly available EHR datasets. Our results indicate that a long context window improves input integration but does not consistently enhance clinical reasoning, and LLMs still struggle with temporal progression and rare disease prediction. While RAG shows improvements on hallucination in some cases, it does not fully address these limitations.
  
 
|- style="border-top: 2px solid DarkGray;"
| 03/12/25 || CLASIC Industry Day

|- style="border-top: 2px solid DarkGray;"
| 03/19/25 || Dananjay Srinivas' Area Exam (Late start, 12-1)

|- style="border-top: 2px solid DarkGray;"
| 03/26/25 || '''No meeting - Spring Break'''

|- style="border-top: 2px solid DarkGray;"
| 04/02/25 || Adam Wiemerslage's Defense
  
 
|- style="border-top: 2px solid DarkGray;"
| 04/09/25 || Ali Marashian's Area Exam

|- style="border-top: 2px solid DarkGray;"
| 04/16/25 || Elizabeth Spaulding's Defense
  
 
|- style="border-top: 2px solid DarkGray;"
| 04/23/25 || Maggie Perkoff's Defense

|- style="border-top: 2px solid DarkGray;"
| 04/30/25 || NAACL, maybe no meeting?

|- style="border-top: 2px solid DarkGray;"
| 05/07/25 || Jon Cai's Defense
|}
 
  
 
=Past Schedules=

* [[Fall 2024 Schedule]]
* [[Spring 2024 Schedule]]
* [[Fall 2023 Schedule]]
* [[Spring 2023 Schedule]]
* [[Fall 2022 Schedule]]

Latest revision as of 13:23, 11 March 2025