Fall 2021 Schedule

From CompSemWiki
9.1.21 Planning, introductions, welcome!

CompSem meetings will be hybrid this semester - in person at Fleming 279, and online here: https://cuboulder.zoom.us/j/97014876908

9.8.21 10am (NOTE: special start time)

Yoshinari Fujinuma thesis defense

Analysis and Applications of Cross-Lingual Models in Natural Language Processing

Human languages vary in both typology and data availability. A typical machine learning-based approach for natural language processing (NLP) requires training data from the language of interest. However, because machine learning-based approaches rely heavily on the amount of data available in each language, the quality of trained models for languages without large amounts of data is poor. One way to overcome the lack of data in each language is to conduct cross-lingual transfer learning from resource-rich languages to resource-scarce languages. Cross-lingual word embeddings and multilingual contextualized embeddings are commonly used to conduct cross-lingual transfer learning. However, the lack of resources still makes it challenging to either evaluate or improve such models. This dissertation first proposes a graph-based method to overcome the lack of evaluation data in low-resource languages by focusing on the structure of cross-lingual word embeddings, then discusses approaches to improve cross-lingual transfer learning by using retrofitting methods and by focusing on a specific task. Finally, it provides an analysis of the effect of adding different languages when pretraining multilingual models.
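The cross-lingual word embedding transfer mentioned above is often implemented by learning an orthogonal map between two monolingual embedding spaces. A minimal sketch of this idea (orthogonal Procrustes alignment; the toy data and setup here are illustrative, not from the dissertation):

```python
import numpy as np

def procrustes_align(X, Y):
    """Learn an orthogonal map W minimizing ||XW - Y||_F, given
    row-aligned source embeddings X and target embeddings Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy "embeddings" related by a known rotation R.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 4))
theta = 0.3
R = np.eye(4)
R[:2, :2] = [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta),  np.cos(theta)]]
Y = X @ R

W = procrustes_align(X, Y)
print(np.allclose(W, R))  # the learned map recovers the true rotation
```

With a small bilingual seed dictionary supplying the row alignment, the same closed-form solution maps one language's vectors into the other's space.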

9.15.21 ACL best paper recaps
9.22.21 Introduction to AI Institute (short talks)
9.29.21 *** CANCELLED ***
10.6.21 Invited talk: Artemis Panagopoulou, University of Pennsylvania

Link to thesis page

Metaphor and Entailment: Looking at metaphors through the lens of textual entailment

Metaphors are intriguing elements of human language that are surprisingly prevalent in our everyday communication. Humans are quite good at understanding metaphors, even when encountering them for the first time. Empirical studies indicate that 20% of our daily language use is metaphorical. Naturally, the ubiquity of metaphors has drawn the attention of psychologists, who have shown that the human brain processes conventional metaphors at the same speed as literal language. Nevertheless, the computational linguistics literature consistently treats metaphor as a domain separate from literal language. Earlier work has shown that traditional pipelines do not perform well on metaphoric datasets. In parallel, the literature on computational understanding of metaphors has largely focused on developing metaphor detection systems, coupled with interpretation systems targeted solely at metaphors. This tendency has appeared across various aspects of the field, such as the purposeful exclusion of figurative language from large-scale datasets. This study investigates the potential of constructing systems that can jointly handle metaphoric and literal sentences by leveraging the newfound capabilities of deep learning systems.

We narrow the scope of the report, following earlier work, to evaluate deep learning systems fine-tuned on the task of textual entailment (TE). We argue that TE is a task naturally suited to the interpretation of metaphoric language. We show that TE systems can significantly improve their performance on metaphors by being fine-tuned on a small dataset with metaphoric premises. Even though the improvement in performance on metaphors is typically accompanied by a drop in performance on the original dataset, we note that auto-regressive models seem to show a smaller drop in performance on literal examples compared to other types of models.

10.13.21 Invited guest - in person!: Arya McCarthy, Johns Hopkins University

Kilolanguage Processing by Projection

The breadth of information digitized in the world’s languages gives opportunities for linguistic insights and computational tools with a pan-lingual perspective. We can achieve this by projecting lexical information across languages, either at the type or token level. First, we project information between thousands of languages at the type level to investigate the classic color word hypotheses of Berlin and Kay. Applying fourteen computational linguistic measures of color word basicness/secondariness, we find cross-linguistic support for the hypotheses and add further nuance. Second, we project information between thousands of languages at the token level to create fine-grained morphological analyzers and generators. We begin by creating a corpus of the Bible in over 1600 languages. Independent web-scraping and aggregation, alignment, and normalization create a rich multilingual dataset. We then show applications to pronoun clusivity and multilingual MT. Finally, we produce morphological tools grounded in UniMorph that improve on strong initial models and generalize across languages.

10.20.21 *** CANCELLED ***
10.27.21 Invited talk: Lisa Miracchi, University of Pennsylvania

The Practical Emergence Approach to Meaning: Avoiding Echo Chambers

I argue for what I call a stance of practical emergence towards intelligence and related kinds such as knowledge and linguistic competence. Practical emergence is a commitment in explanatory practice to treating higher-level kinds as distinct from lower-level kinds, such that they cannot be reductively identified in lower-level terms, and to assuming that explanations of them in terms of lower-level kinds may be substantive, in that behavior of higher-level kinds cannot be logically or mathematically deduced from lower-level behavior. I’ll flesh out this stance using the Generative Framework for explaining how higher-level kinds obtain in virtue of lower-level kinds. Then I’ll show how this stance of practical emergence, bolstered by the Generative Framework, helps us avoid the pitfall of creating echo chambers, where reductive hypotheses about intelligence kinds are amplified, not because they are empirically supported, but because they allow for simpler interdisciplinary communication. I'll use as examples recent work on vector representations of word meanings (such as Word2Vec) and alleged implications for heuristic reasoning. Lastly, I’ll discuss some important ethical implications of these echo chambers. I'll argue that the more ethically responsible approach is to adopt practical emergence, because that will help us proactively identify and address the social and ethical implications of differences between vector representations and manipulations of them, on the one hand, and genuinely intelligent semantic knowledge and reasoning, on the other.

11.3.21 EMNLP preview/practice talks

1. Daniel Chen

AutoAspect: Automatic Annotation of Tense and Aspect for Uniform Meaning Representations

We present AutoAspect, a novel, rule-based annotation tool for labeling tense and aspect. The pilot version annotates English data. The aspect labels are designed specifically for Uniform Meaning Representations (UMR), an annotation schema that aims to encode cross-lingual semantic information. The annotation tool combines syntactic and semantic cues to assign aspects on a sentence-by-sentence basis, following a sequence of rules that each output a UMR aspect. Identified events proceed through the sequence until they are assigned an aspect. We achieve a recall of 76.17% for identifying UMR events and an accuracy of 62.57% on all identified events, with high precision values for two of the aspect labels.
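The rule-cascade design described above can be sketched as follows. The specific cues, labels, and ordering here are illustrative inventions, not AutoAspect's actual rules: each rule inspects shallow cues and either assigns an aspect label or passes the event to the next rule.

```python
# Toy rule cascade: each rule returns a label or None (pass to next rule).
def rule_state(verb, cues):
    return "state" if verb in {"be", "know", "love"} else None

def rule_performance(verb, cues):
    return "performance" if "completed" in cues else None

def rule_activity(verb, cues):
    return "activity" if "progressive" in cues else None

def rule_default(verb, cues):
    # Fallback: every event leaving the cascade gets some label.
    return "process"

RULES = [rule_state, rule_performance, rule_activity, rule_default]

def assign_aspect(verb, cues):
    """Run an identified event through the cascade until a rule fires."""
    for rule in RULES:
        label = rule(verb, cues)
        if label is not None:
            return label

print(assign_aspect("know", set()))           # state
print(assign_aspect("run", {"progressive"}))  # activity
```

Ordering matters in such a cascade: earlier rules take precedence, so the most reliable cues are checked first and a default guarantees every event is labeled.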

2. Shiran Dudy

Refocusing on Relevance: Personalization in NLG

Many NLG tasks such as summarization, dialogue response, or open domain question answering focus primarily on a source text in order to generate a target response. This standard approach falls short, however, when a user’s intent or context of work is not easily recoverable based solely on that source text - a scenario that we argue is more of the rule than the exception. In this work, we argue that NLG systems in general should place a much higher level of emphasis on making use of additional context, and suggest that relevance (as used in Information Retrieval) be thought of as a crucial tool for designing user-oriented text-generating tasks. We further discuss possible harms and hazards around such personalization, and argue that value-sensitive design represents a crucial path forward through these challenges.


11.10.21 EMNLP - no meeting
11.17.21 No meeting
11.24.21 Fall break - no meeting
12.1.21 NOTE: Today's meeting will be zoom only! (https://cuboulder.zoom.us/j/97014876908)

Invited talk: Abe Handler, CU Information Science

Natural Language Processing for Lexical Corpus Analysis

People have been analyzing documents by reading keywords in context for centuries. Traditional approaches like paper concordances or digital keyword-in-context viewers display all occurrences of a single word from a corpus vocabulary amid immediately surrounding tokens or characters, to show readers how individual lexical items are used in bodies of text. We propose that these common tools are one particular application of a more general approach to analyzing documents, which we define as lexical corpus analysis.
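The keyword-in-context display described above is simple to sketch: collect every occurrence of a word and print it centered amid its surrounding characters. A minimal version (the corpus text here is made up for illustration):

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines: each occurrence of `keyword`
    with up to `width` characters of context on either side."""
    lines = []
    for m in re.finditer(rf"\b{re.escape(keyword)}\b", text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}}[{m.group()}]{right}")
    return lines

corpus = ("The concordance shows each word in context. "
          "A concordance was once compiled by hand.")
for line in kwic(corpus, "concordance"):
    print(line)
```

Right-aligning the left context keeps every keyword in the same column, which is what makes a concordance scannable.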

In this talk, we will introduce lexical corpus analysis and then present a collection of NLP methods and NLP systems for lexically-oriented corpus investigation. For instance, we will present new methods for single-sentence summarization (sentence compression), and test the application of these methods in our ClioQuery system, which is designed to help historians answer qualitative research questions from news archives. Similarly, we will present the NPFST method for efficiently extracting phrases from a corpus, and demonstrate the application of NPFST in the Rookie system, designed to help journalists learn about new topics.

12.8.21 Abhidip Bhattacharyya proposal defense
10am MDT (NOTE: special start time)

Multimodal Semantic Role Labeling and its Application

Abstract


Semantic Role Labeling (SRL) has long been a focus of interest in natural language processing research. Semantic Role Labeling captures the underlying predicate-argument structure of a sentence. Successful SRL has improved many natural language processing applications, such as question answering, information extraction, summarization, and dialogue understanding. Advances in neural networks have improved the accuracy of automatic SRL.
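The predicate-argument structure that SRL recovers can be illustrated with PropBank-style labels (ARG0 roughly the agent, ARG1 the theme, ARGM-LOC a location modifier). The analysis below is written out by hand for illustration, not produced by any model from the proposal:

```python
# Hand-constructed SRL analysis of one sentence, PropBank-style.
sentence = "The chef sliced the onions in the kitchen"

srl = {
    "predicate": "sliced",
    "arguments": {
        "ARG0": "The chef",            # who did the slicing
        "ARG1": "the onions",          # what was sliced
        "ARGM-LOC": "in the kitchen",  # where it happened
    },
}

for role, span in srl["arguments"].items():
    print(f"{role:>8}: {span}")
```

In the multimodal setting the proposal describes, the goal is to ground such role-labeled spans in the image: the ARG0 and ARG1 spans correspond to detected objects, and the predicate to the depicted action.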

Neural networks have similarly improved performance in Computer Vision tasks, such as object detection and localization, automatic caption generation, and visual question answering. Identifying objects accurately is key to the comprehension of an image. Modern caption generation and question answering models rely heavily on object features. Multi-modal representation through neural networks has made it possible to combine information from more than one modality. Modern techniques represent both words and image objects in vector spaces, simplifying the task of communicating between the two modalities.

To truly understand an image, it is also important to understand the roles played by the objects in the scene. In other words, the semantic role of an object can offer richer information about the image. Not much work has been done exploring the potential advantages of using visual semantic role labeling. Hence we propose an investigation of SRL in the domain of computer vision. Our interest is to detect semantic roles of the objects present in the images along with their actions. We are also interested in probing what benefits SRL can offer to multi-modal tasks like image captioning and retrieval.