Meeting Schedule
Location:
- Jan 8 - Feb 5: Lucile Berkeley Buchanan Building (LBB) 430
- Feb 12 onwards: Muenzinger D430, except:
  - CLASIC Open House (3/12) will be in LBB 124
  - Adam's Defense (4/2) will be in LBB 430
Time: Wednesdays at 11:30am, Mountain Time
Zoom link: https://cuboulder.zoom.us/j/97014876908
Date | Title |
---|---|
01/08/2025 | Invited talk: Denis Peskoff https://denis.ai/
Title: Perspectives on Prompting Abstract: Natural language processing is in a state of flux. I will talk about three recent papers appearing at the ACL and EMNLP conferences that capture the zeitgeist of the field's current uncertainty about its direction. First, I will talk about a paper that evaluated the responses of large language models to domain-specific questions. Then, I will talk about a paper that used prompting to study the language of the Federal Reserve Board. Last, I will discuss a new paper on identifying generated content in Wikipedia. In addition, I will highlight a mega-paper on prompting that I was involved in. Bio: Denis Peskoff recently finished a postdoc at Princeton University working with Professor Brandon Stewart. He completed his PhD in computer science at the University of Maryland with Professor Jordan Boyd-Graber and a bachelor’s degree at the Georgetown School of Foreign Service. His research has incorporated domain experts (leading board game players, Federal Reserve Board members, doctors, scientists) to solve natural language processing challenges. |
01/15/2025 | Planning, introductions, welcome! |
01/22/2025 | LSA Keynote -- Chris Potts |
01/23/25 (Thu CS seminar) | Chenhao Tan, CS Colloquium, 3:30pm, ECCR 265
Title: Alignment Beyond Human Preferences: Use Human Goals to Guide AI towards Complementary AI Abstract: A lot of recent work has been dedicated to guiding pretrained AI with human preferences. In this talk, I argue that human preferences are often insufficient for complementing human intelligence and demonstrate the key role of human goals with two examples. First, hypothesis generation is critical for scientific discoveries. Instead of removing hallucinations, I will leverage data and labels as a guide to steer hallucinations toward effective hypotheses. Second, I will use human perception as a guide for developing case-based explanations to support AI-assisted decision making. In both cases, faithfulness is "compromised" to achieve human goals. I will conclude with future directions towards complementary AI.
|
01/29/25 | Laurie Jones from Information Science
Abstract: Laurie is seeking feedback from the Boulder NLP community on two projects she has been working on. Similarity through creation and consumption: Initial work on the similarity between Wikipedia articles about the Arab Spring found that they present diverging perspectives in English and Arabic. However, this divergence was identified not through content analysis but by leveraging other digital trace data sources, such as blue links (outlinks) and inter-language links (ILLs). I hope to map the Arab Spring article’s ecosystem to inform relationships between articles through the lens of creation and consumption. I plan to use network analysis and graph theory to identify articles that are related through shared editors, outlinks, and clickstreams. Then, applying the Pareto principle, I will identify densely correlated articles and present an ecosystem that is not correlated exclusively through content. I hope this can then inform language models by providing additional, language-agnostic contextualization. I would love feedback on the application and theoretical contextualization of this method. Collective Memory expression in LLMs: As LLMs are integrated into search engines and other accessible ways of querying information, they will increasingly be used as historical documentation and cited as fact. Because they are built on sources that carry not only political bias but also linguistic and geographical biases, the narratives these LLMs present about the past are collectively informed, forming a collective memory of their own. However, what does that mean when you transcend some of these perspectives? Using prompt engineering, I am investigating two widely used large language models, ChatGPT and Gemini. I plan to cross-reference prompts, feigning user identities and varying perspectives based on country of origin, language, and temporal framing.
I will then use a similarity metric to contrast LLM responses, identifying discrepancies and similarities across these perspectives. This project is much more in its infancy, and I'd love input on its theoretical lineage and on cross-language LLM assessment. Bio: Laurie Jones is a PhD student in Information Science. She has a BS in Computer Science and a minor in Arabic from Washington and Lee University in Virginia. Now advised by Brian Keegan in Information Science and Alexandra Siegel in Political Science, Laurie does cross-language, cross-platform analysis of English and Arabic content asymmetry. She uses computational social science methods like natural language processing and network analysis, as well as her knowledge of the Arabic language, to understand collective memory and conflict power processes across languages and platforms. |
02/05/25 | Bhargav Shandilya's Area Exam
Title: From Relevance to Reasoning - Evaluation Paradigms for Retrieval Augmented Generation Abstract: Retrieval Augmented Generation (RAG) has emerged as a cost-effective alternative to fine-tuning Large Language Models (LLMs), enabling models to access external knowledge for improved performance on domain-specific tasks. While RAG architectures are well-studied, developing robust evaluation frameworks remains challenging due to the complexity of assessing both retrieval and generation components. This survey examines the evolution of RAG evaluation methods, from early metrics like KILT scores to sophisticated frameworks such as RAGAS and ARES, which assess multiple dimensions including context relevance, answer faithfulness, and information integration. Through the lens of documentary linguistics, this survey analyzes how these evaluation paradigms can be adapted for low-resource language applications, where challenges like noisy data and inconsistent document structures necessitate specialized evaluation approaches. By synthesizing insights from foundational studies, this study provides a systematic analysis of evaluation strategies and their implications for developing more robust, adaptable RAG systems across diverse linguistic contexts. |
02/12/25 | Michael Ginn's Area Exam
Title: Extracting Automata from Modern Neural Networks Abstract: It may be desirable to extract an approximation of a trained neural network as a finite-state automaton, for reasons including interpretability, efficiency, and predictability. Early research on recurrent neural networks (RNNs) proposed methods to convert trained RNNs into finite-state automata by quantizing the continuous hidden state space of the RNN into a discrete state space. However, these methods depend on the assumption of a rough equivalence between these state spaces, which is less straightforward for modern recurrent networks and transformers. In this survey, we review methods for automaton extraction, specifically highlighting the challenges and proposed methods for extraction with modern neural networks. |
02/19/25 | Amy Burkhardt, Cambium Assessment
Title: AI and NLP in Education: Research, Implementation, and Lessons from Industry Abstract: This talk will provide a behind-the-scenes look at conducting research on AI in education within an industry setting. First, I’ll offer a broader context of working on a machine learning team, highlighting the diverse skill sets and projects involved. Then, through a case study of an NLP-based writing feedback tool, I’ll walk through how we built and evaluated the tool, sharing key lessons learned from its implementation. Bio: Amy Burkhardt is a Senior Scientist at Cambium Assessment, specializing in AI applications for education. She holds a PhD in Research and Evaluation Methodology from the University of Colorado, as well as a certificate in Human Language Technology. Prior to joining Cambium Assessment, she served as the Director of Research and Partnerships for the Rapid Online Assessment of Reading (ROAR) at Stanford University. |
02/26/25 | No Meeting |
03/05/25 | Benet Post's Talk
Title: Multi-Dialectical NLP Tools for Quechua Abstract: This preliminary study introduces a multi-dialectical NLP approach for Quechua dialects that combines neural architectures with symbolic linguistic knowledge, specifically leveraging lexical markers and polypersonal verbal agreement to tackle low-resource and morphologically complex data. By embedding rule-based morphological cues into a transformer-based classifier, the approach significantly outperforms purely data-driven or statistical baselines. In addition to boosting classification accuracy across more than twenty Quechuan varieties, the method exposes previously undocumented linguistic phenomena related to polypersonal verbal agreement. The findings highlight how neurosymbolic models can advance both language technology and linguistic research by respecting the dialectal diversity within an under-resourced language family, ultimately raising the bar for dialect-sensitive NLP tools designed to empower speakers of these languages digitally. --- Anschutz Talk Title: Evaluating LLMs for Long Context Clinical Summarization with Temporal Reasoning Abstract: Recent advances in LLMs have shown potential in clinical text summarization, but their ability to handle long patient trajectories with multi-modal data spread across time remains underexplored. This study systematically evaluates several state-of-the-art open-source LLMs and their Retrieval Augmented Generation (RAG) variants on long-context clinical summarization. We examine their ability to synthesize structured and unstructured Electronic Health Records (EHR) data while reasoning over temporal coherence, by re-engineering existing tasks, including discharge summarization and diagnosis prediction, from two publicly available EHR datasets.
Our results indicate that a long context window improves input integration but does not consistently enhance clinical reasoning, and that LLMs still struggle with temporal progression and rare disease prediction. While RAG reduces hallucination in some cases, it does not fully address these limitations. |
03/12/25 | CLASIC Industry Day |
03/19/25 | Dananjay Srinivas' Area Exam (Late start, 12-1) |
03/26/25 | No meeting - Spring Break |
04/02/25 | Adam Wiemerslage's Defense |
04/09/25 | Ali Marashian's Area Exam |
04/16/25 | Elizabeth Spaulding's Defense |
04/23/25 | Maggie Perkoff's Defense |
04/30/25 | NAACL, maybe no meeting? |
05/07/25 | Jon Cai's Defense
|
Past Schedules
- Fall 2024 Schedule
- Spring 2024 Schedule
- Fall 2023 Schedule
- Spring 2023 Schedule
- Fall 2022 Schedule
- Spring 2022 Schedule
- Fall 2021 Schedule
- Spring 2021 Schedule
- Fall 2020 Schedule
- Spring 2020 Schedule
- Fall 2019 Schedule
- Spring 2019 Schedule
- Fall 2018 Schedule
- Summer 2018 Schedule
- Spring 2018 Schedule
- Fall 2017 Schedule
- Summer 2017 Schedule
- Spring 2017 Schedule
- Fall 2016 Schedule
- Spring 2016 Schedule
- Fall 2015 Schedule
- Spring 2015 Schedule
- Fall 2014 Schedule
- Spring 2014 Schedule
- Fall 2013 Schedule
- Summer 2013 Schedule
- Spring 2013 Schedule
- Fall 2012 Schedule
- Spring 2012 Schedule
- Fall 2011 Schedule
- Summer 2011 Schedule
- Spring 2011 Schedule
- Fall 2010 Schedule
- Summer 2010 Schedule
- Spring 2010 Schedule
- Fall 2009 Schedule
- Summer 2009 Schedule
- Spring 2009 Schedule
- Fall 2008 Schedule
- Summer 2008 Schedule