Difference between revisions of "Fall 2022 Schedule"

From CompSemWiki
 
(42 intermediate revisions by the same user not shown)
Line 1: Line 1:
'''Location:''' Hybrid (starting Feb 23) - Fleming 279, and the zoom link below
+
'''Location:''' Hybrid - Buchanan 126, and the zoom link below
  
 
'''Time:''' Wednesdays at 10:30am, Mountain Time
 
'''Time:''' Wednesdays at 10:30am, Mountain Time
Line 11: Line 11:
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 1.12.22 || '''Planning, introductions, welcome!'''
+
| 24.08.22 || '''Planning, introductions, welcome!'''
 +
|- style="border-top: 2px solid DarkGray;"
 +
| 31.08.22 || PhD students present! Ongoing projects and opportunities
 +
 
 +
* iSAT: Jim, John, Maggie, Ananya, Zoe
 +
* AIDA: Elizabeth, Rehan, Sijia
  
CompSem meetings will be virtual until further notice (https://cuboulder.zoom.us/j/97014876908)
 
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 01.19.22 || '''[https://www.colorado.edu/business/leeds-directory/faculty/kai-r-larsen Kai Larsen], CU Boulder Leeds School of Business'''
+
| 07.09.22 || PhD students continue to present!
  
''Validity in Design Research''
+
20 minutes per project
  
Research in design science has always recognized the importance
+
* KAIROS: Susan, Reece
of evaluating its knowledge outcomes, particularly of assessing the efficacy, utility, and attributes of the artifacts produced (e.g., A.I. systems, machine learning models, theories, frameworks). However, demonstrating the validity of design science research
+
* AmericasNLP: Katharina, Alexis, Abteen
(DSR) is challenging and not well understood. This paper defines DSR validity and proposes a DSR Validity Framework. We evaluate the framework by assembling and analyzing an extensive data set of research validities papers from various disciplines, including
+
* FOLTA: Alexis, Bhargav, Enora, Michael
design science. We then analyze the use of validity concepts in DSR and validate the framework. The results demonstrate that the DSR Validity Framework may be used to guide how validity can, and should, be used as an integral aspect of design science
+
* StoryGenerations: Katharina, Maria, Trevor
research. We further describe the steps for selecting appropriate validities for projects and formulate efficacy validity and characteristic validity claims suitable for inclusion in manuscripts.
 
  
Keywords: Design science research (DSR), research validity, validity framework,
 
artifact, evaluation, efficacy validity, characteristic validity.
 
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 01.26.22 || '''Elizabeth Spaulding, prelim'''
+
| 14.09.22 || And more presentations from our fabulous students and colleagues!
 
 
'''Prelim topic:''' ''Evaluation for Abstract Meaning Representations''
 
  
Abstract Meaning Representation (AMR) is a semantic representation language that provides a way to represent the meaning of a sentence in the form of a graph. The task of AMR parsing—automatically extracting AMR graphs from natural language text—necessitates evaluation metrics to develop neural parsers. My prelim is a review of AMR evaluation metrics and the strengths and weaknesses of each approach, as well as a discussion of gaps and unexplored questions in the current literature.
+
* Assisted interviewing: DJ, Abe
 +
* THYME:
 +
* UMR: Martha and Jim
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 02.02.22 || NO MEETING
+
| 21.09.22 || lunch at the Taj
|- style="border-top: 2px solid DarkGray;"
 
| 02.09.22 || [https://blogs.umass.edu/scil/schedule-for-scil-2022/ SCiL live session!]
 
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 02.16.22 || NO MEETING
+
| 28.09.22 || James Pustejovsky
|- style="border-top: 2px solid DarkGray;"
 
| 02.23.22 || CompSem meetings go back to being hybrid! (Fleming 279 or https://cuboulder.zoom.us/j/97014876908)
 
  
 +
Dense Paraphrasing for Textual Enrichment: Question Answering and Inference
  
'''Invited talk: [https://aniellodesanto.github.io/about/ Aniello de Santo], University of Utah'''
+
Abstract: Much of the current computational work on inference in NLP can be associated with one of two techniques: the first focuses on a specific notion of text-based question answering (QA), using large pre-trained language models (LLMs). To examine specific linguistic properties present in the model, « probing tasks » (diagnostic classifiers) have been developed to test capabilities that the LLM demonstrates on interpretable semantic inferencing tasks, such as age and object comparisons, hypernym conjunction, antonym negation, and others. The second is Knowledge Graph-based inference and QA, where triples are mined from Wikipedia, ConceptNet, WikiData, and other non-corpus resources, and then used for answering questions involving multiple components of the KG (multi-hop QA). While quite impressive with benchmarked metrics in QA, both techniques are completely confused by (a) syntactically missing semantic content, and (b) the semantics accompanying the consequences of events and actions in narratives.
 +
In this talk, I discuss a model we have developed to enrich the surface form of texts, using type-based semantic operations to « textually expose » the deeper meaning of the corpus that was used to make the original embeddings in the language model. This model, Dense Paraphrasing, is a linguistically-motivated, textual enrichment strategy, that textualizes the compositional operations inherent in a semantic model, such as Generative Lexicon Theory or CCG. This involves broadly three kinds of interpretive processes: (i) recognizing the diverse variability in linguistic forms that can be associated with the same underlying semantic representation (paraphrases); (ii) identifying semantic factors or variables that accompany or are presupposed by the lexical semantics of the words present in the text, through dropped, hidden or shadow arguments; and (iii) interpreting or computing the dynamic consequences of actions and events in the text. After performing these textual enrichment algorithms, we fine-tune the LLM which allows more robust inference and QA task performance.
  
''Bridging Typology and Learnability via Formal Language Theory''
+
James Pustejovsky, Professor
 +
TJX Feldberg Chair in Computer Science
 +
Department of Computer Science
 +
Chair of CL MS Program
 +
Chair of Linguistics Program
  
The complexity of linguistic patterns is the object of extensive debate in research programs focused on probing the inherent structure of human language abilities. But in what sense is a linguistic phenomenon more complex than another, and what can complexity tell us about the connection between linguistic typology and human cognition? In this talk, I overview a line of research approaching these questions from the perspective of recent advances in formal language theory.
+
|- style="border-top: 2px solid DarkGray;"
 +
| 05.10.22 || Martha, COLING keynote // Daniel poster presentation dry run
 +
|- style="border-top: 2px solid DarkGray;"
 +
| 12.10.22 || COLING / paper review
 +
|- style="border-top: 2px solid DarkGray;"
 +
| 19.10.22 || CLASIC Open House, 11am-1pm
 +
 
 +
This is largely an informational event for students interested in the CLASIC (Computational Linguistics, Analytics, Search, and InformatiCs) Master's program and/or the new LING to CLASIC BAM program. The event will include short talks from graduates of the CLASIC program, and then lunch. Please [https://forms.gle/PSNihAGyX7y8aVGV8 register] if you're interested - until 5pm Monday October 17th.
 +
 +
|- style="border-top: 2px solid DarkGray;"
 +
| 21.10.22 || '''FRIDAY''' Carolyn Rose, Carnegie Mellon (ICS/iSAT event)
  
I will first broadly discuss how language theoretical characterizations allow us to focus on essential properties of linguistic patterns under study. I will emphasize how typological insights can help us refine existing mathematical characterizations, arguing for a two-way bridge between disciplines, and show how the theoretical predictions made by logic/algebraic formalization of typological generalizations can be used to test learning biases in humans (and machines).
+
'''Special time and place:''' 11am-12:15pm MT, Muenzinger D430 / [https://cuboulder.zoom.us/j/97658438049 zoom]
  
In doing so, I aim to illustrate the relevance of mathematically grounded approaches to cognitive investigations into linguistic generalizations, and thus further fruitful cross-disciplinary collaborations.
+
'''Title:''' A Layered Model of Learning during Collaborative Software Development: Programs, Programming, and Programmers
  
   
+
Collaborative software development, whether synchronous or asynchronous, is a creative, integrative process in which something new comes into being through the joint engagement, something new that did not fully exist in the mind of any one person prior to the engagement. One can view this engagement from a macro-level perspective, focusing on large-scale development efforts of 100 or more developers, organized into sub-teams, producing collections of complex software products like Mozilla. Past work in the area of software engineering has explored the symbiosis between the management structure of a software team and the module structure of the resulting software. In this talk, we focus instead on small-scale software teams of between 2 and 5 developers, working on smaller-scale efforts of between one hour and 9 months, through more fine-grained analysis of collaborative processes and collaborative products. In this more tightly coupled engagement within small groups, we see again a symbiosis between people, processes, and products. This talk bridges between the field of Computer-Supported Collaborative Learning and the study of software teams in the field of Software Engineering by investigating the inner workings of small-scale collaborative software development. Building on over a decade of AI-enabled collaborative learning experiences in the classroom and online, in this talk we report our work in progress beginning with classroom studies in large online software courses with substantial teamwork components. In our classroom work, we have adapted an industry standard team practice referred to as Mob Programming into a paradigm called Online Mob Programming (OMP) for the purpose of encouraging teams to reflect on concepts and share work in the midst of their project experience. At the core of this work are process mining technologies that enable real-time monitoring and just-in-time support for learning during productive work.
Recent work on deep-learning approaches to program understanding bridges between investigations of processes and products.
'''Bio Sketch:'''
 
  
Aniello De Santo is an Assistant Professor in the Linguistics Department at the University of Utah.
+
|- style="border-top: 2px solid DarkGray;"
 +
| 26.10.22 || '''No meeting -- go to Barbara's talk on Friday, and Nathan's on Monday!'''
 +
|- style="border-top: 2px solid DarkGray;"
 +
| 28.10.22 || '''FRIDAY''' [https://cs.uic.edu/profiles/barbara-di-eugenio/ Barbara Di Eugenio] (ICS talk, noon)
  
Before joining Utah, he received a PhD in Linguistics from Stony Brook University. His research broadly lies at the intersection between computational, theoretical, and experimental linguistics. He is particularly interested in investigating how linguistic representations interact with general cognitive processes, with particular focus on sentence processing and learnability. In his past work, he has mostly made use of symbolic approaches grounded in formal language theory and rich grammar formalisms (Minimalist Grammars, Tree Adjoining Grammars).
+
'''Special time and place:''' 12-1:30pm MT, Muenzinger D430 / [https://cuboulder.zoom.us/j/97658438049 zoom]
  
|- style="border-top: 2px solid DarkGray;"
+
'''Title:''' Knowledge Co-Construction and Initiative in Peer Learning for introductory Computer Science
| 03.02.22 || '''[http://compbio.ucdenver.edu/Hunter_lab/Cohen/index.shtml Kevin Cohen], Computational Bioscience Program, U. Colorado School of Medicine'''
 
  
''Chalk talk: Studying the science in biomedical natural language processing''
+
Peer learning has often been shown to be an effective mode of learning for all participants; and knowledge co-construction (KCC), when participants work together to build knowledge, has been shown to correlate with learning in peer interactions. However, KCC is hard to identify and/or support  computationally. We conducted an extensive analysis of  a corpus of peer-learning interactions in introductory Computer Science: we found a strong relationship between KCC and the linguistic notion of initiative shift,  and moderate correlations between initiative shifts and learning. The results of this analysis were incorporated into KSC-PaL, an artificial agent that can collaborate with a human student via natural-language dialog and actions within a graphical workspace. Evaluations of KSC-PaL showed that the agent was able to encourage shifts in initiative in order to promote learning and that students learned using the agent. This work (joint with Cindy Howard, now at Lewis University), was part of  two larger projects that studied tutoring dialogues and peer learning interactions for introductory Computer Science, and that resulted in two Intelligent Tutoring Systems, iList and Chiqat-Tutor.
  
At this CompSem meeting, I will give a chalk talk on a grant proposal that I am preparing. "Chalk talks" are a kind of presentation that you will have to do when you hit the job market, and once you've found that wonderful job, they may be a regular part of your faculty responsibilities. I will begin with an introduction to the form and functions of this kind of talk, go over the review criteria for the kind of grant for which I am applying, and then give a chalk talk on my proposal. Please come ready to critique it harshly--my grandmother will tell me how great it is.
+
Barbara Di Eugenio, PhD,
 +
Professor and Director of Graduate Studies,  
 +
Department of Computer Science,  
 +
University of Illinois, Chicago
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 03.09.22 || '''[http://verbs.colorado.edu/~ghka9436/ Ghazaleh Kazeminejad], proposal defense'''
+
| 31.10.22 || '''MONDAY''' [https://people.cs.georgetown.edu/nschneid/ Nathan Schneider] (Ling Circle Talk, 4pm)
  
'''NOTE: Special start time 10am'''
+
'''Special time and place:''' 4pm, UMC 247 / [https://cuboulder.zoom.us/j/94447045287 zoom] (passcode: 795679)
  
'''Topic:''' ''Neural-Symbolic NLP: exploiting computational lexical resources''
+
'''Title:''' The Ins and Outs of Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Automatic Disambiguation
  
Recent major advances in Natural Language Processing (NLP) have relied on a distributional approach, representing language numerically to enable complex mathematical operations and algorithms. These numeric representations have been based on the probabilistic distributions of linguistic units. The main recent breakthrough in NLP has been the result of feeding massive data to the machine and using neural network architectures, allowing the machine to learn a model that approximates a given language (grammar and lexicon). Following this paradigm shift, NLP researchers introduced transfer learning, enabling researchers with less powerful computational resources to use their pre-trained language models and transfer what the machine has learned to a new downstream NLP task. However, there are some NLP tasks, particularly in the realm of Natural Language Understanding (NLU), where surface-level representations and purely statistical models may benefit from symbolic knowledge and deeper-level representations. In this work, we explore contributions that symbolic computational lexical resources can still make to system performance on two different tasks. In particular, we propose to expose the model to symbolic knowledge, including external world knowledge (e.g., typical features of entities such as their typical functions or whereabouts) as well as linguistic knowledge (e.g., syntactic dependencies and semantic relationships among the constituents). One of our goals for this work is finding an appropriate numeric representation for this type of symbolic knowledge.
+
In most linguistic meaning representations that are used in NLP, prepositions fly under the radar. I will argue that they should instead be put front and center given their crucial status as linkers of meaning—whether for spatial and temporal relations, for predicate-driven roles, or in special constructions. To that end, we have sought to characterize and disambiguate semantic functions expressed by prepositions and possessives in English (Schneider et al., ACL 2018), and similar markers in other languages (Mandarin Chinese, Korean, Hindi, and German). This approach can be broadened to other constructions and integrated in full-sentence lexical semantic tagging as well as graph-structured meaning representation parsing. Other investigations include crowdsourced annotation, contextualized preposition embeddings, and preposition use in fluent nonnative English.
  
We propose to utilize the semantic predicates from VerbNet, semantic roles from VerbNet and PropBank, syntactic dependency labels, and world knowledge from ConceptNet as symbolic knowledge, going beyond the types of symbolic knowledge used so far in neural-symbolic approaches. We will expose a pre-trained language model to symbolic knowledge in two ways. First, we will embed these relations into a neural network architecture by modifying the input representations. Second, we will treat the knowledge as constraints on the output, penalizing the model at the end of each training step if the constraints are not met in the model predictions at that step.
+
Nathan Schneider,
 +
Associate Professor,
 +
Depts. of Computer Science and Linguistics,
 +
Georgetown University
  
To evaluate this approach, we propose to test it on two downstream NLP tasks: Event Extraction and Entity State Tracking. We propose a thorough investigation of the two tasks, particularly focusing on where they have benefitted from a neural-symbolic approach, and whether and how we could further improve the performance on these tasks by introducing both linguistic and world knowledge to the model.
 
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 03.16.22 || '''[https://ckchandler.github.io/ Chelsea Chandler], thesis defense'''
+
| 02.11.22 || *** No meeting - UMRs team at Brandeis ***
 +
|- style="border-top: 2px solid DarkGray;"
 +
| 09.11.22 || Practice talks
  
'''NOTE: Special start time 10am'''
+
* Ananya Ganesh: "CHIA: CHoosing Instances to Annotate for Machine Translation" (accepted to Findings of EMNLP; practicing for the video recording)
 +
 
 +
* Abteen Ebrahimi: "Second AmericasNLP Competition on Speech-to-Text Translation for Indigenous Languages of the Americas" (NeurIPS competition; practicing for the in-person presentation of the competition)
  
'''Title:''' ''Methods for Multimodal Assessment of Cognitive and Mental State''
+
|- style="border-top: 2px solid DarkGray;"
 +
| 16.11.22 || Maggie Perkoff, prelim
  
Barriers to healthcare access such as time, affordability, and stigma are common in patients suffering from psychiatric and neurodegenerative disorders. Psychiatric patients often need to be monitored with frequent clinical interviews to avoid costly emergency care and preventable events. However, there simply are not enough clinicians to monitor these patients on a regular basis and infrequent clinical evaluations may result in missing the subtle changes in patient state that occur over time. For those suffering from neurodegenerative disorders, the traditional approaches to detecting early onset lack the sensitivity needed to catch the subtle signs of cognitive decline. In order to move toward a more standardized, consistent, and reliable assessment and diagnosis process, machine learning and natural language processing methods can be harnessed to create accurate and accessible prognostic systems that could help to alleviate the burden of mental disorders in society. In this dissertation, a multidisciplinary set of methodologies for the automated assessment of psychiatric mental state were developed that were sufficiently accurate and explainable to nurture trust from patients and clinicians, and also longitudinal and multimodal to model the dynamic and multifaceted nature of mental disorders. The viability of an automated assessment pipeline was examined: from the administration of neuropsychological tests and transcription of spoken responses, to the extraction of construct-relevant data features and prediction of psychiatric mental states. A similar approach was taken for the screening of the neurodegenerative disorders Alzheimer’s disease and Mild Cognitive Impairment. Implications for the real world use of multimodal machine learning for mental disorders are discussed, providing a crucial step towards clinical translation and implementation.
+
'''Title:''' Who said it best? A Thematic Analysis of Open Domain Response Generation Systems
 
'''Biographical Note'''
 
 
Chelsea Chandler is a joint PhD candidate in the Department of Computer Science and Institute of Cognitive Science and is advised by Peter Foltz and Jim Martin. Prior to her PhD, she received a BA in Mathematics and Computer Science from the University of Virginia, worked as a software engineer for Lockheed Martin, and received an MS in Computer Science from the University of Colorado Boulder.
 
  
 +
Open domain response generation is a rapidly growing field of natural language processing research. This type of system can be embedded in social chatbots, teaching assistants, and even therapy sessions. The open domain space is defined by the absence of a specific task that the user is trying to achieve by engaging with the conversational agent. A variety of methods have been proposed to improve the capability of these models, including knowledge-grounded systems, persona embeddings, and transformer models trained on vast datasets. Some of these systems use automated metrics for evaluation alongside human annotators for response quality - but there is no standard assessment for what makes an open domain dialogue system 'better' than any other. This paper seeks to identify broad categories of response generation systems and analyze them based on different themes in open domain conversation: engagement, consistency, correctness, personality, and toxicity.
  
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 03.23.22 || ***Spring Break***
+
| 23.11.22 || *** No meeting - fall break ***
|- style="border-top: 2px solid DarkGray;"
 
| 03.30.22 ||  CLASIC Open House
 
|- style="border-top: 2px solid DarkGray;"
 
| 04.06.22 || Grad student appreciation lunch!!!
 
|- style="border-top: 2px solid DarkGray;"
 
| 04.13.22 || *** No meeting ***
 
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 04.20.22|| Sagi Shaier, prelim
+
| 30.11.22|| '''Postponed to Spring 2023:''' HuggingFace demo - Trevor Ward
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
| 04.27.22 || Adam Wiemerslage, prelim
+
| 07.12.22 || Coffee and pastries
 
|- style="border-top: 2px solid DarkGray;"
 
|- style="border-top: 2px solid DarkGray;"
 
|}
 
|}
Line 114: Line 129:
  
 
=Past Schedules=
 
=Past Schedules=
 +
* [[Fall 2022 Schedule]]
 +
* [[Spring 2022 Schedule]]
 
* [[Fall 2021 Schedule]]
 
* [[Fall 2021 Schedule]]
 
* [[Spring 2021 Schedule]]
 
* [[Spring 2021 Schedule]]

Latest revision as of 20:11, 17 January 2023

Location: Hybrid - Buchanan 126, and the zoom link below

Time: Wednesdays at 10:30am, Mountain Time

Zoom link: https://cuboulder.zoom.us/j/97014876908

Date Title
24.08.22 Planning, introductions, welcome!
31.08.22 PhD students present! Ongoing projects and opportunities
  • iSAT: Jim, John, Maggie, Ananya, Zoe
  • AIDA: Elizabeth, Rehan, Sijia


07.09.22 PhD students continue to present!

20 minutes per project

  • KAIROS: Susan, Reece
  • AmericasNLP: Katharina, Alexis, Abteen
  • FOLTA: Alexis, Bhargav, Enora, Michael
  • StoryGenerations: Katharina, Maria, Trevor


14.09.22 And more presentations from our fabulous students and colleagues!
  • Assisted interviewing: DJ, Abe
  • THYME:
  • UMR: Martha and Jim
21.09.22 lunch at the Taj
28.09.22 James Pustejovsky

Dense Paraphrasing for Textual Enrichment: Question Answering and Inference

Abstract: Much of the current computational work on inference in NLP can be associated with one of two techniques: the first focuses on a specific notion of text-based question answering (QA), using large pre-trained language models (LLMs). To examine specific linguistic properties present in the model, « probing tasks » (diagnostic classifiers) have been developed to test capabilities that the LLM demonstrates on interpretable semantic inferencing tasks, such as age and object comparisons, hypernym conjunction, antonym negation, and others. The second is Knowledge Graph-based inference and QA, where triples are mined from Wikipedia, ConceptNet, WikiData, and other non-corpus resources, and then used for answering questions involving multiple components of the KG (multi-hop QA). While quite impressive with benchmarked metrics in QA, both techniques are completely confused by (a) syntactically missing semantic content, and (b) the semantics accompanying the consequences of events and actions in narratives.

In this talk, I discuss a model we have developed to enrich the surface form of texts, using type-based semantic operations to « textually expose » the deeper meaning of the corpus that was used to make the original embeddings in the language model. This model, Dense Paraphrasing, is a linguistically-motivated, textual enrichment strategy, that textualizes the compositional operations inherent in a semantic model, such as Generative Lexicon Theory or CCG. This involves broadly three kinds of interpretive processes: (i) recognizing the diverse variability in linguistic forms that can be associated with the same underlying semantic representation (paraphrases); (ii) identifying semantic factors or variables that accompany or are presupposed by the lexical semantics of the words present in the text, through dropped, hidden or shadow arguments; and (iii) interpreting or computing the dynamic consequences of actions and events in the text. After performing these textual enrichment algorithms, we fine-tune the LLM which allows more robust inference and QA task performance.

James Pustejovsky, Professor, TJX Feldberg Chair in Computer Science, Department of Computer Science, Chair of CL MS Program, Chair of Linguistics Program

05.10.22 Martha, COLING keynote // Daniel poster presentation dry run
12.10.22 COLING / paper review
19.10.22 CLASIC Open House, 11am-1pm

This is largely an informational event for students interested in the CLASIC (Computational Linguistics, Analytics, Search, and InformatiCs) Master's program and/or the new LING to CLASIC BAM program. The event will include short talks from graduates of the CLASIC program, and then lunch. Please register if you're interested - until 5pm Monday October 17th.

21.10.22 FRIDAY Carolyn Rose, Carnegie Mellon (ICS/iSAT event)

Special time and place: 11am-12:15pm MT, Muenzinger D430 / zoom

Title: A Layered Model of Learning during Collaborative Software Development: Programs, Programming, and Programmers

Collaborative software development, whether synchronous or asynchronous, is a creative, integrative process in which something new comes into being through the joint engagement, something new that did not fully exist in the mind of any one person prior to the engagement. One can view this engagement from a macro-level perspective, focusing on large-scale development efforts of 100 or more developers, organized into sub-teams, producing collections of complex software products like Mozilla. Past work in the area of software engineering has explored the symbiosis between the management structure of a software team and the module structure of the resulting software. In this talk, we focus instead on small-scale software teams of between 2 and 5 developers, working on smaller-scale efforts of between one hour and 9 months, through more fine-grained analysis of collaborative processes and collaborative products. In this more tightly coupled engagement within small groups, we see again a symbiosis between people, processes, and products. This talk bridges between the field of Computer-Supported Collaborative Learning and the study of software teams in the field of Software Engineering by investigating the inner workings of small-scale collaborative software development. Building on over a decade of AI-enabled collaborative learning experiences in the classroom and online, in this talk we report our work in progress beginning with classroom studies in large online software courses with substantial teamwork components. In our classroom work, we have adapted an industry standard team practice referred to as Mob Programming into a paradigm called Online Mob Programming (OMP) for the purpose of encouraging teams to reflect on concepts and share work in the midst of their project experience. At the core of this work are process mining technologies that enable real-time monitoring and just-in-time support for learning during productive work. Recent work on deep-learning approaches to program understanding bridges between investigations of processes and products.

26.10.22 No meeting -- go to Barbara's talk on Friday, and Nathan's on Monday!
28.10.22 FRIDAY Barbara Di Eugenio (ICS talk, noon)

Special time and place: 12-1:30pm MT, Muenzinger D430 / zoom

Title: Knowledge Co-Construction and Initiative in Peer Learning for introductory Computer Science

Peer learning has often been shown to be an effective mode of learning for all participants; and knowledge co-construction (KCC), when participants work together to build knowledge, has been shown to correlate with learning in peer interactions. However, KCC is hard to identify and/or support computationally. We conducted an extensive analysis of a corpus of peer-learning interactions in introductory Computer Science: we found a strong relationship between KCC and the linguistic notion of initiative shift, and moderate correlations between initiative shifts and learning. The results of this analysis were incorporated into KSC-PaL, an artificial agent that can collaborate with a human student via natural-language dialog and actions within a graphical workspace. Evaluations of KSC-PaL showed that the agent was able to encourage shifts in initiative in order to promote learning and that students learned using the agent. This work (joint with Cindy Howard, now at Lewis University), was part of two larger projects that studied tutoring dialogues and peer learning interactions for introductory Computer Science, and that resulted in two Intelligent Tutoring Systems, iList and Chiqat-Tutor.

Barbara Di Eugenio, PhD, Professor and Director of Graduate Studies, Department of Computer Science, University of Illinois, Chicago

31.10.22 MONDAY Nathan Schneider (Ling Circle Talk, 4pm)

Special time and place: 4pm, UMC 247 / zoom (passcode: 795679)

Title: The Ins and Outs of Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Automatic Disambiguation

In most linguistic meaning representations that are used in NLP, prepositions fly under the radar. I will argue that they should instead be put front and center given their crucial status as linkers of meaning—whether for spatial and temporal relations, for predicate-driven roles, or in special constructions. To that end, we have sought to characterize and disambiguate semantic functions expressed by prepositions and possessives in English (Schneider et al., ACL 2018), and similar markers in other languages (Mandarin Chinese, Korean, Hindi, and German). This approach can be broadened to other constructions and integrated in full-sentence lexical semantic tagging as well as graph-structured meaning representation parsing. Other investigations include crowdsourced annotation, contextualized preposition embeddings, and preposition use in fluent nonnative English.

Nathan Schneider, Associate Professor, Depts. of Computer Science and Linguistics, Georgetown University


02.11.22 *** No meeting - UMRs team at Brandeis ***
09.11.22 Practice talks
  • Ananya Ganesh: "CHIA: CHoosing Instances to Annotate for Machine Translation" (accepted to Findings of EMNLP; practicing for the video recording)
  • Abteen Ebrahimi: "Second AmericasNLP Competition on Speech-to-Text Translation for Indigenous Languages of the Americas" (NeurIPS competition; practicing for the in-person presentation of the competition)
16.11.22 Maggie Perkoff, prelim

Title: Who said it best? A Thematic Analysis of Open Domain Response Generation Systems

Open domain response generation is a rapidly growing field of natural language processing research. This type of system can be embedded in social chatbots, teaching assistants, and even therapy sessions. The open domain space is defined by the absence of a specific task that the user is trying to achieve by engaging with the conversational agent. A variety of methods have been proposed to improve the capability of these models, including knowledge-grounded systems, persona embeddings, and transformer models trained on vast datasets. Some of these systems use automated metrics for evaluation alongside human annotators for response quality - but there is no standard assessment for what makes an open domain dialogue system 'better' than any other. This paper seeks to identify broad categories of response generation systems and analyze them based on different themes in open domain conversation: engagement, consistency, correctness, personality, and toxicity.

23.11.22 *** No meeting - fall break ***
30.11.22 Postponed to Spring 2023: HuggingFace demo - Trevor Ward
07.12.22 Coffee and pastries


Past Schedules