Showing 1–13 of 13 results for author: Yarowsky, D

Searching in archive cs.
  1. arXiv:2403.10963

    cs.CL

    Pointer-Generator Networks for Low-Resource Machine Translation: Don't Copy That!

    Authors: Niyati Bafna, Philipp Koehn, David Yarowsky

    Abstract: While Transformer-based neural machine translation (NMT) is very effective in high-resource settings, many languages lack the necessary large parallel corpora to benefit from it. In the context of low-resource (LR) MT between two closely-related languages, a natural intuition is to seek benefits from structural "shortcuts", such as copying subwords from the source to the target, given that such la…

    Submitted 25 March, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: 4 pages

  2. arXiv:2205.03608

    cs.CL

    UniMorph 4.0: Universal Morphology

    Authors: Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, et al. (71 additional authors not shown)

    Abstract: The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This pa…

    Submitted 19 June, 2022; v1 submitted 7 May, 2022; originally announced May 2022.

    Comments: LREC 2022; The first two authors made equal contributions

  3. arXiv:1910.12299

    cs.CL cs.SD eess.AS

    Induced Inflection-Set Keyword Search in Speech

    Authors: Oliver Adams, Matthew Wiesner, Jan Trmal, Garrett Nicolai, David Yarowsky

    Abstract: We investigate the problem of searching for a lexeme-set in speech by searching for its inflectional variants. Experimental results indicate how lexeme-set search performance changes with the number of hypothesized inflections, while ablation experiments highlight the relative importance of different components in the lexeme-set search pipeline and the value of using curated inflectional paradigms…

    Submitted 21 May, 2020; v1 submitted 27 October, 2019; originally announced October 2019.

    Comments: To appear in SIGMORPHON 2020

  4. arXiv:1910.01531

    cs.CL

    Modeling Color Terminology Across Thousands of Languages

    Authors: Arya D. McCarthy, Winston Wu, Aaron Mueller, Bill Watson, David Yarowsky

    Abstract: There is an extensive history of scholarship into what constitutes a "basic" color term, as well as a broadly attested acquisition sequence of basic color terms across many languages, as articulated in the seminal work of Berlin and Kay (1969). This paper employs a set of diverse measures on massively cross-linguistic data to operationalize and critique the Berlin and Kay color term hypotheses. Co…

    Submitted 3 October, 2019; originally announced October 2019.

    Comments: Accepted for presentation at EMNLP-IJCNLP 2019

  5. arXiv:1904.02210

    cs.CL cs.LG

    Massively Multilingual Adversarial Speech Recognition

    Authors: Oliver Adams, Matthew Wiesner, Shinji Watanabe, David Yarowsky

    Abstract: We report on adaptation of multilingual end-to-end speech recognition models trained on as many as 100 languages. Our findings shed light on the relative importance of similarity between the target and pretraining languages along the dimensions of phonetics, phonology, language family, geographical location, and orthography. In this context, experiments demonstrate the effectiveness of two additio…

    Submitted 3 April, 2019; originally announced April 2019.

    Comments: Accepted at NAACL-HLT 2019

  6. arXiv:1810.11101

    cs.CL

    UniMorph 2.0: Universal Morphology

    Authors: Christo Kirov, Ryan Cotterell, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sabrina J. Mielke, Arya D. McCarthy, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden

    Abstract: The Universal Morphology UniMorph project is a collaborative effort to improve how NLP handles complex morphology across the world's languages. The project releases annotated morphological data using a universal tagset, the UniMorph schema. Each inflected form is associated with a lemma, which typically carries its underlying lexical meaning, and a bundle of morphological features from our schema.…

    Submitted 25 February, 2020; v1 submitted 25 October, 2018; originally announced October 2018.

    Comments: LREC 2018

  7. arXiv:1810.07125

    cs.CL

    The CoNLL-SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

    Authors: Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Arya D. McCarthy, Katharina Kann, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, David Yarowsky, Jason Eisner, Mans Hulden

    Abstract: The CoNLL-SIGMORPHON 2018 shared task on supervised learning of morphological generation featured data sets from 103 typologically diverse languages. Apart from extending the number of languages involved in earlier supervised tasks of generating inflected forms, this year the shared task also featured a new second task which asked participants to inflect words in sentential context, similar to a…

    Submitted 25 February, 2020; v1 submitted 16 October, 2018; originally announced October 2018.

    Comments: CoNLL 2018. arXiv admin note: text overlap with arXiv:1706.09031

  8. Marrying Universal Dependencies and Universal Morphology

    Authors: Arya D. McCarthy, Miikka Silfverberg, Ryan Cotterell, Mans Hulden, David Yarowsky

    Abstract: The Universal Dependencies (UD) and Universal Morphology (UniMorph) projects each present schemata for annotating the morphosyntactic details of language. Each project also provides corpora of annotated text in many languages - UD at the token level and UniMorph at the type level. As each corpus is built by different annotators, language-specific decisions hinder the goal of universal schemata. Wi…

    Submitted 15 October, 2018; originally announced October 2018.

    Comments: UDW18

    Journal ref: Proceedings of the Second Workshop on Universal Dependencies (2018) 91-101

  9. arXiv:1708.09151

    cs.CL

    Paradigm Completion for Derivational Morphology

    Authors: Ryan Cotterell, Ekaterina Vylomova, Huda Khayrallah, Christo Kirov, David Yarowsky

    Abstract: The generation of complex derived word forms has been an overlooked problem in NLP; we fill this gap by applying neural sequence-to-sequence models to the task. We overview the theoretical motivation for a paradigmatic treatment of derivational morphology, and introduce the task of derivational paradigm completion as a parallel to inflectional paradigm completion. State-of-the-art neural models, a…

    Submitted 30 August, 2017; originally announced August 2017.

    Comments: EMNLP 2017

  10. arXiv:1706.09031

    cs.CL

    CoNLL-SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection in 52 Languages

    Authors: Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sandra Kübler, David Yarowsky, Jason Eisner, Mans Hulden

    Abstract: The CoNLL-SIGMORPHON 2017 shared task on supervised morphological generation required systems to be trained and tested in each of 52 typologically diverse languages. In sub-task 1, submitted systems were asked to predict a specific inflected form of a given lemma. In sub-task 2, systems were given a lemma and some of its specific inflected forms, and asked to complete the inflectional paradigm by…

    Submitted 4 July, 2017; v1 submitted 27 June, 2017; originally announced June 2017.

    Comments: CoNLL 2017

  11. arXiv:cs/0105003

    cs.CL cs.AI

    Rule Writing or Annotation: Cost-efficient Resource Usage for Base Noun Phrase Chunking

    Authors: Grace Ngai, David Yarowsky

    Abstract: This paper presents a comprehensive empirical comparison between two approaches for developing a base noun phrase chunker: human rule writing and active learning using interactive real-time human annotation. Several novel variations on active learning are investigated, and underlying cost models for cross-modal machine learning comparison are presented and explored. Results show that it is more…

    Submitted 2 May, 2001; originally announced May 2001.

    Comments: 9 pages, 4 figures, appeared in ACL2000

    ACM Class: I.2.7

    Journal ref: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, pages 117-125, Hong Kong (2000)

  12. arXiv:cs/0104019

    cs.CL

    Dynamic Nonlocal Language Modeling via Hierarchical Topic-Based Adaptation

    Authors: Radu Florian, David Yarowsky

    Abstract: This paper presents a novel method of generating and applying hierarchical, dynamic topic-based language models. It proposes and evaluates new cluster generation, hierarchical smoothing and adaptive topic-probability estimation techniques. These combined models help capture long-distance lexical dependencies. Experiments on the Broadcast News corpus show significant improvement in perplexity (10…

    Submitted 27 April, 2001; originally announced April 2001.

    Comments: 8 pages, 29 figures, presented at ACL99, College Park, Maryland

    ACM Class: I.2.7

    Journal ref: Proceedings of the 37th Annual Meeting of the ACL, pages 167-174, College Park, Maryland

  13. Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French

    Authors: David Yarowsky

    Abstract: This paper presents a statistical decision procedure for lexical ambiguity resolution. The algorithm exploits both local syntactic patterns and more distant collocational evidence, generating an efficient, effective, and highly perspicuous recipe for resolving a given ambiguity. By identifying and utilizing only the single best disambiguating evidence in a target context, the algorithm avoids th…

    Submitted 22 June, 1994; originally announced June 1994.

    Comments: 8 pages, latex-acl, to appear in ACL-94