Fall 2022 Schedule

From CompSemWiki
Revision as of 14:33, 5 November 2021 by CompSemUser

Location: 279, Fleming Building.

Time: 10:30am, Mountain Time

Zoom link: https://cuboulder.zoom.us/j/97014876908

Date Title
9.1.21 Planning, introductions, welcome!

CompSem meetings will be hybrid this semester - in person at Fleming 279, and online here: https://cuboulder.zoom.us/j/97014876908

9.8.21 10am (NOTE: special start time)

Yoshinari Fujinuma thesis defense

Analysis and Applications of Cross-Lingual Models in Natural Language Processing

Human languages vary both typologically and in data availability. A typical machine learning-based approach for natural language processing (NLP) requires training data from the language of interest. However, because machine learning-based approaches heavily rely on the amount of data available in each language, the quality of models trained for languages without a large amount of data is poor. One way to overcome the lack of data in each language is to conduct cross-lingual transfer learning from resource-rich languages to resource-scarce languages. Cross-lingual word embeddings and multilingual contextualized embeddings are commonly used to conduct cross-lingual transfer learning. However, the lack of resources still makes it challenging to either evaluate or improve such models. This dissertation first proposes a graph-based method to overcome the lack of evaluation data in low-resource languages by focusing on the structure of cross-lingual word embeddings, and then discusses approaches to improve cross-lingual transfer learning by using retrofitting methods and by focusing on a specific task. Finally, it provides an analysis of the effect of adding different languages when pretraining multilingual models.

9.15.21 ACL best paper recaps
9.22.21 Introduction to AI Institute (short talks)
9.29.21 *** CANCELLED ***
10.6.21 Invited talk: Artemis Panagopoulou, University of Pennsylvania

Link to thesis page

Metaphor and Entailment: Looking at metaphors through the lens of textual entailment

Metaphors are very intriguing elements of human language that are surprisingly prevalent in our everyday communications. Humans are pretty good at understanding metaphors, even if it is the first time they encounter them. Empirical studies indicate that 20% of our daily language use is metaphorical. Naturally, the ubiquity of metaphors has drawn the attention of psychologists, who showed that the human brain processes conventional metaphors at the same speed as literal language. Nevertheless, the computational linguistics literature consistently treats metaphors as a separate domain from literal language. Earlier work has shown that traditional pipelines do not perform well on metaphoric datasets. At the same time, the literature on computational understanding of metaphors has largely focused on developing concrete metaphor detection systems, coupled with interpretation systems targeted solely at metaphors. This tendency has appeared across various aspects of the field, such as the purposeful exclusion of figurative language from large-scale datasets. This study investigates the potential of constructing systems that can jointly handle metaphoric and literal sentences by leveraging the newfound capabilities of deep learning systems.

We narrow the scope of the report, following earlier work, to evaluate deep learning systems fine-tuned on the task of textual entailment (TE). We argue that TE is a task naturally suited to the interpretation of metaphoric language. We show that TE systems can improve significantly in metaphoric performance by being fine-tuned on a small dataset with metaphoric premises. Even though the improvement in performance on metaphors is typically accompanied by a drop in performance on the original dataset, we note that auto-regressive models seem to show a smaller drop in performance on literal examples compared to other types of models.

10.13.21 Invited guest - in person!: Arya McCarthy, Johns Hopkins University

Kilolanguage Processing by Projection

The breadth of information digitized in the world’s languages gives opportunities for linguistic insights and computational tools with a pan-lingual perspective. We can achieve this by projecting lexical information across languages, either at the type or token level. First, we project information between thousands of languages at the type level to investigate the classic color word hypotheses of Berlin and Kay. Applying fourteen computational linguistic measures of color word basicness/secondariness, we find cross-linguistic credence and add nuance. Second, we project information between thousands of languages at the token level to create fine-grained morphological analyzers and generators. We begin by creating a corpus of the Bible in over 1600 languages. Independent web-scraping and aggregation, alignment, and normalization create a ripe multilingual dataset. We then show applications to pronoun clusivity and multilingual MT. Finally, we produce morphological tools grounded in UniMorph that improve on strong initial models and generalize across languages.

10.20.21 *** CANCELLED ***
10.27.21 Invited talk: Lisa Miracchi, University of Pennsylvania

The Practical Emergence Approach to Meaning: Avoiding Echo Chambers

I argue for what I call a stance of practical emergence towards intelligence and related kinds such as knowledge and linguistic competence. Practical emergence is a commitment in explanatory practice to treating higher-level kinds as distinct from lower-level kinds, such that they cannot be reductively identified in lower-level terms, and to assuming that explanations of them in terms of lower-level kinds may be substantive, in that behavior of higher-level kinds cannot be logically or mathematically deduced from lower-level behavior. I’ll flesh out this stance using the Generative Framework for explaining how higher-level kinds obtain in virtue of lower-level kinds. Then I’ll show how this stance of practical emergence, bolstered by the Generative Framework, helps us avoid the pitfall of creating echo chambers, where the reductive hypotheses about intelligence kinds are amplified, not because they are empirically supported, but because they allow for simpler interdisciplinary communication. I’ll use as examples recent work on vector representations of word meanings (such as Word2Vec) and alleged implications for heuristic reasoning. Lastly, I’ll discuss some important ethical implications of these echo chambers. I’ll argue that the more ethically responsible approach is to adopt practical emergence, because that will help us proactively identify and address the social and ethical implications of differences between vector representations and manipulations of them, on the one hand, and genuinely intelligent semantic knowledge and reasoning, on the other.

11.3.21 EMNLP preview/practice talks

1. Daniel Chen

AutoAspect: Automatic Annotation of Tense and Aspect for Uniform Meaning Representations

We present AutoAspect, a novel, rule-based annotation tool for labeling tense and aspect. The pilot version annotates English data. The aspect labels are designed specifically for Uniform Meaning Representations (UMR), an annotation schema that aims to encode crosslingual semantic information. The annotation tool combines syntactic and semantic cues to assign aspects on a sentence-by-sentence basis, following a sequence of rules that each output a UMR aspect. Identified events proceed through the sequence until they are assigned an aspect. We achieve a recall of 76.17% for identifying UMR events and an accuracy of 62.57% on all identified events, with high precision values for 2 of the aspect labels.

2. Shiran Dudy

Refocusing on Relevance: Personalization in NLG

Many NLG tasks such as summarization, dialogue response, or open domain question answering focus primarily on a source text in order to generate a target response. This standard approach falls short, however, when a user’s intent or context of work is not easily recoverable based solely on that source text - a scenario that we argue is more the rule than the exception. In this work, we argue that NLG systems in general should place a much higher level of emphasis on making use of additional context, and suggest that relevance (as used in Information Retrieval) be thought of as a crucial tool for designing user-oriented text-generating tasks. We further discuss possible harms and hazards around such personalization, and argue that value-sensitive design represents a crucial path forward through these challenges.


11.10.21 EMNLP - no meeting
11.17.21
11.24.21 Fall break - no meeting
12.1.21 Invited talk: Abe Handler
12.8.21 Abhidip Bhattacharyya proposal defense


Past Schedules