Difference between revisions of "Fall 2022 Schedule"

From CompSemWiki
Line 43: Line 43:
  |- style="border-top: 2px solid DarkGray;"
- | 10.13.21 || Invited guest: Arya McCarthy
+ | 10.13.21 || Invited guest - in person!: [https://cs.jhu.edu/~arya Arya McCarthy], Johns Hopkins University
+
+ ''Kilolanguage Processing by Projection''
+
+ The breadth of information digitized in the world’s languages gives opportunities for linguistic insights and computational tools with pan-lingual perspective. We can achieve this by projecting lexical information across language, either at the type or token level.
+ First, we project information between thousands of languages at the type level to investigate the classic color word hypotheses of Berlin and Kay. Applying fourteen computational linguistic measures of color word basicness/secondariness, we find cross-linguistic credence and shed additional nuance.
+ Second, we project information between thousands of languages at the token level to create fine-grained morphological analyzers and generators. We begin by creating a corpus of the Bible in over 1600 languages. Independent web-scraping and aggregation, alignment, and normalization create a ripe multilingual dataset. We then show applications to pronoun clusivity and multilingual MT. Finally, we produce morphological tools grounded in UniMorph that improve on strong initial models and generalize across languages.
+
+
+
+
  |- style="border-top: 2px solid DarkGray;"
  | 10.20.21 ||
Line 49: Line 59:
  | 10.27.21 || Invited talk: Lisa Miracchi
  |- style="border-top: 2px solid DarkGray;"
- | 11.3.21 || EMNLP practice talks
+ | 11.3.21 || EMNLP practice talks/preview
  |- style="border-top: 2px solid DarkGray;"
  | 11.10.21 || EMNLP - no meeting

Revision as of 19:27, 11 October 2021

Location: 279, Fleming Building.

Time: 10:30am, Mountain Time

Zoom link: https://cuboulder.zoom.us/j/97014876908

Date Title
9.1.21 Planning, introductions, welcome!

CompSem meetings will be hybrid this semester - in person at Fleming 279, and online here: https://cuboulder.zoom.us/j/97014876908

9.8.21 10am (NOTE: special start time)

Yoshinari Fujinuma thesis defense

Analysis and Applications of Cross-Lingual Models in Natural Language Processing

Human languages vary both typologically and in data availability. A typical machine learning-based approach to natural language processing (NLP) requires training data from the language of interest. However, because machine learning-based approaches rely heavily on the amount of data available in each language, the quality of models trained for languages without a large amount of data is poor. One way to overcome the lack of data in each language is to conduct cross-lingual transfer learning from resource-rich languages to resource-scarce languages. Cross-lingual word embeddings and multilingual contextualized embeddings are commonly used to conduct cross-lingual transfer learning. However, the lack of resources still makes it challenging to either evaluate or improve such models. This dissertation first proposes a graph-based method to overcome the lack of evaluation data in low-resource languages by focusing on the structure of cross-lingual word embeddings. It then discusses approaches to improve cross-lingual transfer learning by using retrofitting methods and by focusing on a specific task. Finally, it provides an analysis of the effect of adding different languages when pretraining multilingual models.

9.15.21 ACL best paper recaps
9.22.21 Introduction to AI Institute (short talks)
9.29.21 *** CANCELLED ***
10.6.21 Invited talk: Artemis Panagopoulou, University of Pennsylvania

Link to thesis page

Metaphor and Entailment: Looking at metaphors through the lens of textual entailment

Metaphors are intriguing elements of human language that are surprisingly prevalent in our everyday communication. Humans are good at understanding metaphors, even the first time they encounter them. Empirical studies indicate that 20% of our daily language use is metaphorical. Naturally, the ubiquity of metaphors drew the attention of psychologists, who showed that the human brain processes conventional metaphors at the same speed as literal language. Nevertheless, the computational linguistics literature consistently treats metaphors as a separate domain from literal language. Earlier work has shown that traditional pipelines do not perform well on metaphoric datasets. At the same time, the literature on computational understanding of metaphors has largely focused on developing concrete metaphor detection systems, coupled with interpretation systems targeted solely at metaphors. This tendency appears across various aspects of the field, such as the purposeful exclusion of figurative language from large-scale datasets. This study investigates the potential of constructing systems that can jointly handle metaphoric and literal sentences by leveraging the newfound capabilities of deep learning systems.

We narrow the scope of the report, following earlier work, to evaluate deep learning systems fine-tuned on the task of textual entailment (TE). We argue that TE is a task naturally suited to the interpretation of metaphoric language. We show that TE systems can improve significantly in metaphoric performance by being fine-tuned on a small dataset with metaphoric premises. Even though the improvement in performance on metaphors is typically accompanied by a drop in performance on the original dataset, we note that auto-regressive models seem to show a smaller drop in performance on literal examples than other types of models.

10.13.21 Invited guest - in person!: Arya McCarthy, Johns Hopkins University

Kilolanguage Processing by Projection

The breadth of information digitized in the world’s languages gives opportunities for linguistic insights and computational tools with a pan-lingual perspective. We can achieve this by projecting lexical information across languages, at either the type or the token level.

First, we project information between thousands of languages at the type level to investigate the classic color word hypotheses of Berlin and Kay. Applying fourteen computational linguistic measures of color word basicness/secondariness, we find cross-linguistic support for the hypotheses and add nuance to them.

Second, we project information between thousands of languages at the token level to create fine-grained morphological analyzers and generators. We begin by creating a corpus of the Bible in over 1600 languages. Independent web-scraping and aggregation, alignment, and normalization create a ripe multilingual dataset. We then show applications to pronoun clusivity and multilingual MT. Finally, we produce morphological tools grounded in UniMorph that improve on strong initial models and generalize across languages.



10.20.21
10.27.21 Invited talk: Lisa Miracchi
11.3.21 EMNLP practice talks/preview
11.10.21 EMNLP - no meeting
11.17.21 Elizabeth Spaulding prelim
11.24.21 Fall break - no meeting
12.1.21 Invited talk: Abe Handler
12.8.21 Abhidip Bhattacharyya proposal defense


Past Schedules