Showing 1–30 of 30 results for author: Nematzadeh, A

  1. arXiv:2404.16820  [pdf, other]


    Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings

    Authors: Olivia Wiles, Chuhan Zhang, Isabela Albuquerque, Ivana Kajić, Su Wang, Emanuele Bugliarello, Yasumasa Onoe, Chris Knutsen, Cyrus Rashtchian, Jordi Pont-Tuset, Aida Nematzadeh

    Abstract: While text-to-image (T2I) generative models have become ubiquitous, they do not necessarily generate images that align with a given prompt. While previous work has evaluated T2I alignment by proposing metrics, benchmarks, and templates for collecting human judgements, the quality of these components is not systematically measured. Human-rated prompt sets are generally small and the reliability of…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Data and code will be released at:

  2. arXiv:2310.03051  [pdf, other]

    cs.CL cs.AI

    How FaR Are Large Language Models From Agents with Theory-of-Mind?

    Authors: Pei Zhou, Aman Madaan, Srividya Pranavi Potharaju, Aditya Gupta, Kevin R. McKee, Ari Holtzman, Jay Pujara, Xiang Ren, Swaroop Mishra, Aida Nematzadeh, Shyam Upadhyay, Manaal Faruqui

    Abstract: "Thinking is for Doing." Humans can infer other people's mental states from observations--an ability called Theory-of-Mind (ToM)--and subsequently act pragmatically on those inferences. Existing question answering benchmarks such as ToMi ask models questions to make inferences about beliefs of characters in a story, but do not test whether models can then use these inferences to guide their action…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: Preprint, 18 pages, 6 figures, 6 tables

  3. arXiv:2305.14281  [pdf, other]

    cs.CL cs.CV

    Weakly-Supervised Learning of Visual Relations in Multimodal Pretraining

    Authors: Emanuele Bugliarello, Aida Nematzadeh, Lisa Anne Hendricks

    Abstract: Recent work in vision-and-language pretraining has investigated supervised signals from object detection data to learn better, fine-grained multimodal representations. In this work, we take a step further and explore how we can tap into supervision from small-scale visual relation data. In particular, we propose two pretraining approaches to contextualise visual entities in a multimodal setup. Wit…

    Submitted 19 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023

  4. arXiv:2305.07558  [pdf, other]

    cs.CL cs.CV

    Measuring Progress in Fine-grained Vision-and-Language Understanding

    Authors: Emanuele Bugliarello, Laurent Sartran, Aishwarya Agrawal, Lisa Anne Hendricks, Aida Nematzadeh

    Abstract: While pretraining on large-scale image-text data from the Web has facilitated rapid progress on many vision-and-language (V&L) tasks, recent work has demonstrated that pretrained models lack "fine-grained" understanding, such as the ability to recognise relationships, verbs, and numbers in images. This has resulted in an increased interest in the community to either develop new benchmarks or model…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  5. arXiv:2303.07172  [pdf, other]

    cs.AI cs.CV cs.LG

    Evaluating Visual Number Discrimination in Deep Neural Networks

    Authors: Ivana Kajić, Aida Nematzadeh

    Abstract: The ability to discriminate between large and small quantities is a core aspect of basic numerical competence in both humans and animals. In this work, we examine the extent to which the state-of-the-art neural networks designed for vision exhibit this basic ability. Motivated by studies in animal and infant numerical cognition, we use the numerical bisection procedure to test number discriminatio…

    Submitted 13 March, 2023; originally announced March 2023.

  6. arXiv:2211.08371  [pdf, other]


    Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling Approaches

    Authors: Daniel Fried, Nicholas Tomlin, Jennifer Hu, Roma Patel, Aida Nematzadeh

    Abstract: People rely heavily on context to enrich meaning beyond what is literally said, enabling concise but effective communication. To interact successfully and naturally with people, user-facing artificial intelligence systems will require similar skills in pragmatics: relying on various types of context -- from shared linguistic goals and conventions, to the visual and embodied world -- to use languag…

    Submitted 21 November, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: Findings of EMNLP 2023

  7. arXiv:2210.07179  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting

    Authors: Oscar Mañas, Pau Rodriguez, Saba Ahmadi, Aida Nematzadeh, Yash Goyal, Aishwarya Agrawal

    Abstract: Large pre-trained models have proved to be remarkable zero- and (prompt-based) few-shot learners in unimodal vision and language tasks. We propose MAPL, a simple and parameter-efficient method that reuses frozen pre-trained unimodal models and leverages their strong generalization capabilities in multimodal vision-language (VL) settings. MAPL learns a lightweight mapping between the representation…

    Submitted 14 March, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted at EACL 2023 (main track); 26 pages, 21 figures, 6 tables; Pau Rodriguez and Saba Ahmadi had equal contributions

  8. arXiv:2205.12191  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization

    Authors: Aishwarya Agrawal, Ivana Kajić, Emanuele Bugliarello, Elnaz Davoodi, Anita Gergely, Phil Blunsom, Aida Nematzadeh

    Abstract: Vision-and-language (V&L) models pretrained on large-scale multimodal data have demonstrated strong performance on various tasks such as image captioning and visual question answering (VQA). The quality of such models is commonly assessed by measuring their performance on unseen data that typically comes from the same distribution as the training data. However, when evaluated under out-of-distribu…

    Submitted 1 April, 2023; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Findings of EACL 2023. Aishwarya, Ivana, Emanuele and Aida had equal first author contributions. Elnaz and Anita had equal contributions. Aida and Aishwarya had equal senior contributions

  9. arXiv:2204.14198  [pdf, other]

    cs.CV cs.AI cs.LG

    Flamingo: a Visual Language Model for Few-Shot Learning

    Authors: Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, et al. (2 additional authors not shown)

    Abstract: Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. We propose key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily i…

    Submitted 15 November, 2022; v1 submitted 29 April, 2022; originally announced April 2022.

    Comments: 54 pages. In Proceedings of Neural Information Processing Systems (NeurIPS) 2022

  10. arXiv:2112.11446  [pdf, other]

    cs.CL cs.AI

    Scaling Language Models: Methods, Analysis & Insights from Training Gopher

    Authors: Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, et al. (55 additional authors not shown)

    Abstract: Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gop…

    Submitted 21 January, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: 120 pages

  11. arXiv:2111.00607  [pdf, other]


    A Systematic Investigation of Commonsense Knowledge in Large Language Models

    Authors: Xiang Lorraine Li, Adhiguna Kuncoro, Jordan Hoffmann, Cyprien de Masson d'Autume, Phil Blunsom, Aida Nematzadeh

    Abstract: Language models (LMs) trained on large amounts of data have shown impressive performance on many NLP tasks under the zero-shot and few-shot setup. Here we aim to better understand the extent to which such models learn commonsense knowledge -- a critical component of many NLP applications. We conduct a systematic and rigorous zero-shot and few-shot commonsense evaluation of large pre-trained LMs, w…

    Submitted 31 October, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

    Comments: Accepted to EMNLP 2022

  12. arXiv:2106.09141  [pdf, other]

    cs.CL cs.CV

    Probing Image-Language Transformers for Verb Understanding

    Authors: Lisa Anne Hendricks, Aida Nematzadeh

    Abstract: Multimodal image-language transformers have achieved impressive results on a variety of tasks that rely on fine-tuning (e.g., visual question answering and image retrieval). We are interested in shedding light on the quality of their pretrained representations -- in particular, if these models can distinguish different types of verbs or if they rely solely on nouns in a given sentence. To do so, w…

    Submitted 16 June, 2021; originally announced June 2021.

  13. arXiv:2102.00529  [pdf, other]

    cs.CL cs.CV

    Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers

    Authors: Lisa Anne Hendricks, John Mellor, Rosalia Schneider, Jean-Baptiste Alayrac, Aida Nematzadeh

    Abstract: Recently multimodal transformer models have gained popularity because their performance on language and vision tasks suggests they learn rich visual-linguistic representations. Focusing on zero-shot image retrieval tasks, we study three important factors which can impact the quality of learned representations: pretraining data, the attention mechanism, and loss functions. By pretraining models on s…

    Submitted 31 January, 2021; originally announced February 2021.

    Comments: pre-print of MIT Press Publication version

  14. arXiv:2012.03370  [pdf, other]

    cs.CL cs.LG

    Competition in Cross-situational Word Learning: A Computational Study

    Authors: Aida Nematzadeh, Zahra Shekarchi, Thomas L. Griffiths, Suzanne Stevenson

    Abstract: Children learn word meanings by tapping into the commonalities across different situations in which words are used and overcome the high level of uncertainty involved in early word learning experiences. We propose a modeling framework to investigate the role of mutual exclusivity bias - asserting one-to-one mappings between words and their meanings - in reducing uncertainty in word learning. In a…

    Submitted 27 July, 2021; v1 submitted 6 December, 2020; originally announced December 2020.

    Comments: 38 pages, 4 figures, 2 tables

    MSC Class: 68T50; 91F20; 68T05 ACM Class: I.2.7; I.2.6; G.3; J.4

  15. arXiv:2005.03684  [pdf, other]

    cs.CL cs.CV

    Learning to Segment Actions from Observation and Narration

    Authors: Daniel Fried, Jean-Baptiste Alayrac, Phil Blunsom, Chris Dyer, Stephen Clark, Aida Nematzadeh

    Abstract: We apply a generative segmental model of task structure, guided by narration, to action segmentation in video. We focus on unsupervised and weakly-supervised settings where no action labels are known during training. Despite its simplicity, our model performs competitively with previous work on a dataset of naturalistic instructional videos. Our model allows us to vary the sources of supervision u…

    Submitted 11 August, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

    Comments: ACL 2020

  16. arXiv:2003.05078  [pdf, other]

    cs.CV cs.CL cs.LG

    Visual Grounding in Video for Unsupervised Word Translation

    Authors: Gunnar A. Sigurdsson, Jean-Baptiste Alayrac, Aida Nematzadeh, Lucas Smaira, Mateusz Malinowski, João Carreira, Phil Blunsom, Andrew Zisserman

    Abstract: There are thousands of actively spoken languages on Earth, but a single visual world. Grounding in this visual world has the potential to bridge the gap between all these languages. Our goal is to use visual grounding to improve unsupervised word mapping between languages. The key idea is to establish a common visual representation between two languages by learning embeddings from unpaired instruc…

    Submitted 26 March, 2020; v1 submitted 10 March, 2020; originally announced March 2020.

    Comments: CVPR 2020

    Journal ref: CVPR 2020

  17. arXiv:1910.05870  [pdf, other]

    physics.soc-ph cs.SI

    Network Modularity Controls the Speed of Information Diffusion

    Authors: Hao Peng, Azadeh Nematzadeh, Daniel M. Romero, Emilio Ferrara

    Abstract: The rapid diffusion of information and the adoption of social behaviors are of critical importance in situations as diverse as collective actions, pandemic prevention, or advertising and marketing. Although the dynamics of large cascades have been extensively studied in various contexts, few have systematically examined the impact of network topology on the efficiency of information diffusion. Her…

    Submitted 30 July, 2020; v1 submitted 13 October, 2019; originally announced October 2019.

  18. arXiv:1909.01093  [pdf, ps, other]

    cs.SI cs.CL cs.IR cs.LG stat.ML

    Empirical Study on Detecting Controversy in Social Media

    Authors: Azadeh Nematzadeh, Grace Bang, Xiaomo Liu, Zhiqiang Ma

    Abstract: Companies and financial investors are paying increasing attention to social consciousness in developing their corporate strategies and making investment decisions to support a sustainable economy for the future. Public discussion on incidents and events -- controversies -- of companies can provide valuable insights on how well the company operates with regards to social consciousness and indicate…

    Submitted 25 August, 2019; originally announced September 2019.

    Comments: The work is accepted by the 2nd KDD Workshop on Anomaly Detection in Finance, 2019. The authors contributed equally to this work, listed in the alphabetical order

  19. arXiv:1902.04613  [pdf, other]

    cs.SI physics.soc-ph q-fin.GN

    Global labor flow network reveals the hierarchical organization and dynamics of geo-industrial clusters in the world economy

    Authors: Jaehyuk Park, Ian Wood, Elise Jing, Azadeh Nematzadeh, Souvik Ghosh, Michael Conover, Yong-Yeol Ahn

    Abstract: Groups of firms often achieve a competitive advantage through the formation of geo-industrial clusters. Although many exemplary clusters, such as Hollywood or Silicon Valley, have been frequently studied, systematic approaches to identify and analyze the hierarchical structure of the geo-industrial clusters at the global scale are rare. In this work, we use LinkedIn's employment histories of more…

    Submitted 19 March, 2019; v1 submitted 12 February, 2019; originally announced February 2019.

    Journal ref: Nature Communications volume 10, Article number: 3449 (2019)

  20. arXiv:1808.09352  [pdf, other]


    Evaluating Theory of Mind in Question Answering

    Authors: Aida Nematzadeh, Kaylee Burns, Erin Grant, Alison Gopnik, Thomas L. Griffiths

    Abstract: We propose a new dataset for evaluating question answering models with respect to their capacity to reason about beliefs. Our tasks are inspired by theory-of-mind experiments that examine whether children are able to reason about the beliefs of others, in particular when those beliefs differ from reality. We evaluate a number of recent neural models with memory augmentation. We find that all fail…

    Submitted 28 August, 2018; originally announced August 2018.

  21. arXiv:1806.00074  [pdf, other]

    physics.soc-ph cs.SI

    Optimal modularity in complex contagion

    Authors: Azadeh Nematzadeh, Nathaniel Rodriguez, Alessandro Flammini, Yong-Yeol Ahn

    Abstract: In this chapter, we apply the theoretical framework introduced in the previous chapter to study how the modular structure of the social network affects the spreading of complex contagion. In particular, we focus on the notion of optimal modularity, that predicts the occurrence of global cascades when the network exhibits just the right amount of modularity. Here we generalize the findings by assum…

    Submitted 31 May, 2018; originally announced June 2018.

    Journal ref: Nematzadeh, A., Rodriguez, N., Flammini, A., & Ahn, Y. (2018). Optimal modularity in complex contagion. In Complex Spreading Phenomena in Social Systems (1st ed., Computational Social Sciences). Springer International Publishing

  22. arXiv:1805.07647  [pdf, other]


    Learning Hierarchical Visual Representations in Deep Neural Networks Using Hierarchical Linguistic Labels

    Authors: Joshua C. Peterson, Paul Soulos, Aida Nematzadeh, Thomas L. Griffiths

    Abstract: Modern convolutional neural networks (CNNs) are able to achieve human-level object classification accuracy on specific tasks, and currently outperform competing models in explaining complex human visual representations. However, the categorization problem is posed differently for these networks than for humans: the accuracy of these networks is evaluated by their ability to identify single labels…

    Submitted 19 May, 2018; originally announced May 2018.

    Comments: 6 pages, 4 figures, 1 table. Accepted as a paper to the 40th Annual Meeting of the Cognitive Science Society (CogSci 2018)

  23. arXiv:1711.11125  [pdf, other]


    Predicting and Explaining Human Semantic Search in a Cognitive Model

    Authors: Filip Miscevic, Aida Nematzadeh, Suzanne Stevenson

    Abstract: Recent work has attempted to characterize the structure of semantic memory and the search algorithms which, together, best approximate human patterns of search revealed in a semantic fluency task. There are a number of models that seek to capture semantic search processes over networks, but they vary in the cognitive plausibility of their implementation. Existing work has also neglected to conside…

    Submitted 29 November, 2017; originally announced November 2017.

    Comments: To appear in proceedings for CMCL 2018

  24. How algorithmic popularity bias hinders or promotes quality

    Authors: Azadeh Nematzadeh, Giovanni Luca Ciampaglia, Filippo Menczer, Alessandro Flammini

    Abstract: Algorithms that favor popular items are used to help us select among many choices, from engaging articles on a social media news feed to songs and books that others have purchased, and from top-ranked search engine results to highly-cited scientific papers. The goal of these algorithms is to identify high-quality items such as reliable news, beautiful movies, prestigious information sources, and im…

    Submitted 14 July, 2017; v1 submitted 3 July, 2017; originally announced July 2017.

    Journal ref: Scientific Reports Volume 8, Article number: 15951 (2018)

  25. arXiv:1702.06672  [pdf, other]


    Calculating Probabilities Simplifies Word Learning

    Authors: Aida Nematzadeh, Barend Beekhuizen, Shanshan Huang, Suzanne Stevenson

    Abstract: Children can use the statistical regularities of their environment to learn word meanings, a mechanism known as cross-situational learning. We take a computational approach to investigate how the information present during each observation in a cross-situational framework can affect the overall acquisition of word meanings. We do so by formulating various in-the-moment learning mechanisms that are…

    Submitted 21 February, 2017; originally announced February 2017.

  26. arXiv:1610.06497  [pdf, other]

    cs.SI cs.HC physics.soc-ph

    Information Overload in Group Communication: From Conversation to Cacophony in the Twitch Chat

    Authors: Azadeh Nematzadeh, Giovanni Luca Ciampaglia, Yong-Yeol Ahn, Alessandro Flammini

    Abstract: Online communication channels, especially social web platforms, are rapidly replacing traditional ones. Online platforms allow users to overcome physical barriers, enabling worldwide participation. However, the power of online communication bears an important negative consequence --- we are exposed to too much information to process. Too many participants, for example, can turn online public space…

    Submitted 20 October, 2016; originally announced October 2016.

    Comments: 25 pages, 8 figures

    Journal ref: Nematzadeh et al. 2019. R. Soc. open sci. 6: 191412

  27. arXiv:1602.05944  [pdf, ps, other]


    The Interaction of Memory and Attention in Novel Word Generalization: A Computational Investigation

    Authors: Erin Grant, Aida Nematzadeh, Suzanne Stevenson

    Abstract: People exhibit a tendency to generalize a novel noun to the basic-level in a hierarchical taxonomy -- a cognitively salient category such as "dog" -- with the degree of generalization depending on the number and type of exemplars. Recently, a change in the presentation timing of exemplars has also been shown to have an effect, surprisingly reversing the prior observed pattern of basic-level genera…

    Submitted 18 February, 2016; originally announced February 2016.

  28. arXiv:1602.03265  [pdf, other]


    Simple Search Algorithms on Semantic Networks Learned from Language Use

    Authors: Aida Nematzadeh, Filip Miscevic, Suzanne Stevenson

    Abstract: Recent empirical and modeling research has focused on the semantic fluency task because it is informative about semantic memory. An interesting interplay arises between the richness of representations in semantic memory and the complexity of algorithms required to process it. It has remained an open question whether representations of words and their relations learned from language use can enable…

    Submitted 10 February, 2016; v1 submitted 9 February, 2016; originally announced February 2016.

  29. arXiv:1401.1257  [pdf, other]

    physics.soc-ph cs.SI

    Optimal network modularity for information diffusion

    Authors: Azadeh Nematzadeh, Emilio Ferrara, Alessandro Flammini, Yong-Yeol Ahn

    Abstract: We investigate the impact of community structure on information diffusion with the linear threshold model. Our results demonstrate that modular structure may have counter-intuitive effects on information diffusion when social reinforcement is present. We show that strong communities can facilitate global diffusion by enhancing local, intra-community spreading. Using both analytic approaches and nu…

    Submitted 18 September, 2014; v1 submitted 6 January, 2014; originally announced January 2014.

    Comments: 8 pages, 10 figures

    Journal ref: Phys. Rev. Lett. 113, 088701 (2014)

  30. arXiv:1103.4090  [pdf]

    q-bio.QM cs.CL cs.IR cs.LG

    A Linear Classifier Based on Entity Recognition Tools and a Statistical Approach to Method Extraction in the Protein-Protein Interaction Literature

    Authors: Anália Lourenço, Michael Conover, Andrew Wong, Azadeh Nematzadeh, Fengxia Pan, Hagit Shatkay, Luis M. Rocha

    Abstract: We participated in the Article Classification and the Interaction Method subtasks (ACT and IMT, respectively) of the Protein-Protein Interaction task of the BioCreative III Challenge. For the ACT, we pursued an extensive testing of available Named Entity Recognition and dictionary tools, and used the most promising ones to extend our Variable Trigonometric Threshold linear classifier. For the IMT…

    Submitted 22 April, 2011; v1 submitted 21 March, 2011; originally announced March 2011.

    Comments: BMC Bioinformatics. In Press