Fall 2022 Schedule

Location: Hybrid - Buchanan 126, and the zoom link below

Time: Wednesdays at 10:30am, Mountain Time

Zoom link: https://cuboulder.zoom.us/j/97014876908

Date Title
24.08.22 Planning, introductions, welcome!
31.08.22 PhD students present! Ongoing projects and opportunities
  • iSAT: Jim, John, Maggie, Ananya, Zoe
  • AIDA: Elizabeth, Rehan, Sijia


07.09.22 PhD students continue to present!

20 minutes per project

  • KAIROS: Susan, Reece
  • AmericasNLP: Katharina, Alexis, Abteen
  • FOLTA: Alexis, Bhargav, Enora, Michael
  • StoryGenerations: Katharina, Maria, Trevor


14.09.22 And more presentations from our fabulous students and colleagues!
  • Assisted interviewing: DJ, Abe
  • THYME:
  • UMR: Martha and Jim
21.09.22 Lunch at the Taj
28.09.22 James Pustejovsky

Dense Paraphrasing for Textual Enrichment: Question Answering and Inference

Abstract: Much of the current computational work on inference in NLP can be associated with one of two techniques: the first focuses on a specific notion of text-based question answering (QA), using large pre-trained language models (LLMs). To examine specific linguistic properties present in the model, "probing tasks" (diagnostic classifiers) have been developed to test capabilities that the LLM demonstrates on interpretable semantic inferencing tasks, such as age and object comparisons, hypernym conjunction, antonym negation, and others. The second is Knowledge Graph-based inference and QA, where triples are mined from Wikipedia, ConceptNet, WikiData, and other non-corpus resources, and then used for answering questions involving multiple components of the KG (multi-hop QA). While quite impressive on benchmarked QA metrics, both techniques are completely confused by (a) syntactically missing semantic content, and (b) the semantics accompanying the consequences of events and actions in narratives.

In this talk, I discuss a model we have developed to enrich the surface form of texts, using type-based semantic operations to "textually expose" the deeper meaning of the corpus that was used to make the original embeddings in the language model. This model, Dense Paraphrasing, is a linguistically motivated textual enrichment strategy that textualizes the compositional operations inherent in a semantic model, such as Generative Lexicon Theory or CCG. This involves broadly three kinds of interpretive processes: (i) recognizing the diverse variability in linguistic forms that can be associated with the same underlying semantic representation (paraphrases); (ii) identifying semantic factors or variables that accompany or are presupposed by the lexical semantics of the words present in the text, through dropped, hidden, or shadow arguments; and (iii) interpreting or computing the dynamic consequences of actions and events in the text. After performing these textual enrichment algorithms, we fine-tune the LLM, which allows more robust inference and QA task performance.

James Pustejovsky, Professor, TJX Feldberg Chair in Computer Science, Department of Computer Science, Chair of CL MS Program, Chair of Linguistics Program
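
As a deliberately tiny illustration of the kind of textual enrichment the abstract describes, the Python sketch below recovers one dropped argument of an aspectual verb and writes it back into the surface text. The lexicon and rules here are invented for this example; they are not the Dense Paraphrasing implementation.

```python
# Toy sketch of type-based textual enrichment ("dense paraphrasing").
# The lexicon below is invented for illustration; the real system derives
# these operations from a semantic model such as Generative Lexicon Theory.

# Aspectual verbs like "finished" take an event argument; when the surface
# form supplies an entity instead ("finished the book"), the elided event
# can be recovered from the entity's typical-use (telic) role.
TELIC_ROLE = {
    "book": "reading",
    "coffee": "drinking",
    "meal": "eating",
}

ASPECTUAL_VERBS = {"finished", "started", "began"}

def densely_paraphrase(tokens: list[str]) -> list[str]:
    """Insert the recovered event argument after an aspectual verb."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if tok in ASPECTUAL_VERBS:
            # Look ahead past a determiner for the object noun.
            j = i + 1
            if j < len(tokens) and tokens[j] in {"the", "a", "an"}:
                j += 1
            if j < len(tokens) and tokens[j] in TELIC_ROLE:
                out.append(TELIC_ROLE[tokens[j]])  # textually expose the event
    return out

print(" ".join(densely_paraphrase("Mary finished the book".split())))
# -> "Mary finished reading the book"
```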

05.10.22 Martha, COLING keynote // Daniel poster presentation dry run
12.10.22 COLING / paper review
19.10.22 CLASIC Open House, 11am-1pm

This is largely an informational event for students interested in the CLASIC (Computational Linguistics, Analytics, Search, and InformatiCs) Master's program and/or the new LING to CLASIC BAM program. The event will include short talks from graduates of the CLASIC program, and then lunch. Please register (https://forms.gle/PSNihAGyX7y8aVGV8) if you're interested, by 5pm on Monday, October 17th.

21.10.22 FRIDAY Carolyn Rose, Carnegie Mellon (ICS/iSAT event)

Special time and place: 11am-12:15pm MT, Muenzinger D430 / Zoom: https://cuboulder.zoom.us/j/97658438049

Title: A Layered Model of Learning during Collaborative Software Development: Programs, Programming, and Programmers

Collaborative software development, whether synchronous or asynchronous, is a creative, integrative process in which something new comes into being through the joint engagement, something new that did not fully exist in the mind of any one person prior to the engagement. One can view this engagement from a macro-level perspective, focusing on large-scale development efforts of 100 or more developers, organized into sub-teams, producing collections of complex software products like Mozilla. Past work in the area of software engineering has explored the symbiosis between the management structure of a software team and the module structure of the resulting software. In this talk, we focus instead on small-scale software teams of between 2 and 5 developers, working on smaller-scale efforts of between one hour and 9 months, through more fine-grained analysis of collaborative processes and collaborative products. In this more tightly coupled engagement within small groups, we see again a symbiosis between people, processes, and products. This talk bridges between the field of Computer-Supported Collaborative Learning and the study of software teams in the field of Software Engineering by investigating the inner workings of small-scale collaborative software development. Building on over a decade of AI-enabled collaborative learning experiences in the classroom and online, in this talk we report our work in progress, beginning with classroom studies in large online software courses with substantial teamwork components. In our classroom work, we have adapted an industry-standard team practice referred to as Mob Programming into a paradigm called Online Mob Programming (OMP) for the purpose of encouraging teams to reflect on concepts and share work in the midst of their project experience. At the core of this work are process mining technologies that enable real-time monitoring and just-in-time support for learning during productive work. Recent work on deep-learning approaches to program understanding bridges between investigations of processes and products.

26.10.22 No meeting -- go to Barbara's talk on Friday, and Nathan's on Monday!
28.10.22 FRIDAY Barbara Di Eugenio (https://cs.uic.edu/profiles/barbara-di-eugenio/) (ICS talk, noon)

Special time and place: 12-1:30pm MT, Muenzinger D430 / Zoom: https://cuboulder.zoom.us/j/97658438049

Title: Knowledge Co-Construction and Initiative in Peer Learning for introductory Computer Science

Peer learning has often been shown to be an effective mode of learning for all participants; and knowledge co-construction (KCC), when participants work together to build knowledge, has been shown to correlate with learning in peer interactions. However, KCC is hard to identify and/or support computationally. We conducted an extensive analysis of a corpus of peer-learning interactions in introductory Computer Science: we found a strong relationship between KCC and the linguistic notion of initiative shift, and moderate correlations between initiative shifts and learning. The results of this analysis were incorporated into KSC-PaL, an artificial agent that can collaborate with a human student via natural-language dialog and actions within a graphical workspace. Evaluations of KSC-PaL showed that the agent was able to encourage shifts in initiative in order to promote learning, and that students learned using the agent. This work (joint with Cindy Howard, now at Lewis University) was part of two larger projects that studied tutoring dialogues and peer learning interactions for introductory Computer Science, and that resulted in two Intelligent Tutoring Systems, iList and Chiqat-Tutor.

Barbara Di Eugenio, PhD, Professor and Director of Graduate Studies, Department of Computer Science, University of Illinois, Chicago
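
Since the abstract leans on the notion of initiative shift, a minimal sketch may help make the quantity concrete: given turns annotated for who holds the initiative, it simply counts changes of holder. The labels and example dialogue are invented; in the actual study such annotations were produced by hand.

```python
# Minimal sketch: count initiative shifts in an annotated dialogue.
# Each turn is (speaker, holds_initiative); in the study these labels
# come from human annotation, and shift counts are then correlated
# with KCC episodes and with learning gains.

def count_initiative_shifts(turns: list[tuple[str, bool]]) -> int:
    """A shift occurs whenever the initiative holder changes."""
    holders = [speaker for speaker, holds in turns if holds]
    return sum(1 for prev, cur in zip(holders, holders[1:]) if prev != cur)

dialogue = [
    ("A", True),   # A proposes how to insert into the linked list
    ("B", False),  # B acknowledges
    ("B", True),   # B takes over with a counter-proposal  -> shift
    ("A", True),   # A redirects the discussion            -> shift
]
print(count_initiative_shifts(dialogue))  # -> 2
```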

31.10.22 MONDAY Nathan Schneider (https://people.cs.georgetown.edu/nschneid/) (Ling Circle Talk, 4pm)

Special time and place: 4pm, UMC 247 / Zoom: https://cuboulder.zoom.us/j/94447045287 (passcode: 795679)

Title: The Ins and Outs of Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Automatic Disambiguation

In most linguistic meaning representations that are used in NLP, prepositions fly under the radar. I will argue that they should instead be put front and center given their crucial status as linkers of meaning—whether for spatial and temporal relations, for predicate-driven roles, or in special constructions. To that end, we have sought to characterize and disambiguate semantic functions expressed by prepositions and possessives in English (Schneider et al., ACL 2018), and similar markers in other languages (Mandarin Chinese, Korean, Hindi, and German). This approach can be broadened to other constructions and integrated in full-sentence lexical semantic tagging as well as graph-structured meaning representation parsing. Other investigations include crowdsourced annotation, contextualized preposition embeddings, and preposition use in fluent nonnative English.

Nathan Schneider, Associate Professor, Depts. of Computer Science and Linguistics, Georgetown University
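
For readers new to this annotation scheme, here is a toy version of the disambiguation task. The supersense labels (Time, Locus, Goal, Recipient) are of the kind used in the SNACS scheme of Schneider et al. (2018), but the heuristic cues and word lists below are invented for illustration; real disambiguators use contextual models rather than gazetteers.

```python
# Toy preposition-supersense disambiguation. Real systems use contextual
# embeddings; this illustrative version keys off the object of the
# preposition with a hand-made gazetteer (invented for this sketch).

TIME_WORDS = {"noon", "monday", "october", "2018"}
PLACE_WORDS = {"boulder", "muenzinger", "georgetown"}

def label_preposition(prep: str, obj: str) -> str:
    """Assign a coarse SNACS-style supersense to one preposition token."""
    obj = obj.lower()
    if prep == "to":
        return "Goal" if obj in PLACE_WORDS else "Recipient"
    if obj in TIME_WORDS:
        return "Time"
    if obj in PLACE_WORDS:
        return "Locus"
    return "Other"

for prep, obj in [("at", "noon"), ("at", "Muenzinger"), ("to", "students")]:
    print(prep, obj, "->", label_preposition(prep, obj))
# at noon -> Time; at Muenzinger -> Locus; to students -> Recipient
```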


02.11.22 *** No meeting - UMRs team at Brandeis ***
09.11.22 Practice talks
  • Ananya Ganesh: "CHIA: CHoosing Instances to Annotate for Machine Translation" (accepted to Findings of EMNLP; practicing for the video recording)
  • Abteen Ebrahimi: "Second AmericasNLP Competition on Speech-to-Text Translation for Indigenous Languages of the Americas" (NeurIPS competition; practicing for the in-person presentation of the competition)
16.11.22 Maggie Perkoff, prelim

Title: Who said it best? A Thematic Analysis of Open Domain Response Generation Systems

Open domain response generation is a rapidly growing field of natural language processing research. This type of system can be embedded in social chatbots, teaching assistants, and even therapy sessions. The open domain space is defined by the absence of a specific task that the user is trying to achieve by engaging with the conversational agent. A variety of methods have been proposed to improve the capability of these models, including knowledge-grounded systems, persona embeddings, and transformer models trained on vast datasets. Some of these systems use automated metrics for evaluation alongside human annotators for response quality - but there is no standard assessment for what makes an open domain dialogue system 'better' than any other. This paper seeks to identify broad categories of response generation systems and analyze them based on different themes in open domain conversation: engagement, consistency, correctness, personality, and toxicity.
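
One way to picture the thematic analysis is as a per-system scorecard over the five themes named above. The sketch below is purely illustrative: the field names come from the abstract, but the 0-5 scale and the aggregation rule are invented assumptions, since the paper itself is a qualitative analysis.

```python
# Illustrative scorecard for comparing response generation systems along
# the five themes named in the abstract. The 0-5 scale and the aggregation
# are invented for this sketch.
from dataclasses import dataclass

@dataclass
class ThemeScores:
    engagement: float
    consistency: float
    correctness: float
    personality: float
    toxicity: float  # higher = more toxic, so it counts against the system

    def overall(self) -> float:
        """Average of the desirable themes, penalized by toxicity."""
        positive = [self.engagement, self.consistency,
                    self.correctness, self.personality]
        return sum(positive) / len(positive) - self.toxicity

knowledge_grounded = ThemeScores(4.0, 3.5, 4.5, 2.0, 0.5)
persona_based = ThemeScores(4.5, 4.0, 3.0, 4.5, 1.0)
print(round(knowledge_grounded.overall(), 2))  # -> 3.0
print(round(persona_based.overall(), 2))       # -> 3.0
```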

23.11.22 *** No meeting - fall break ***
30.11.22 Postponed to Spring 2023: HuggingFace demo - Trevor Ward
07.12.22 Coffee and pastries


Past Schedules

  • Fall 2022 Schedule
  • Spring 2022 Schedule
  • Fall 2021 Schedule
  • Spring 2021 Schedule