Showing 1–28 of 28 results for author: Onoe, Y

  1. arXiv:2405.02793 [pdf, other]

    cs.CV cs.CL

    ImageInWords: Unlocking Hyper-Detailed Image Descriptions

    Authors: Roopal Garg, Andrea Burns, Burcu Karagol Ayan, Yonatan Bitton, Ceslee Montgomery, Yasumasa Onoe, Andrew Bunner, Ranjay Krishna, Jason Baldridge, Radu Soricut

    Abstract: Despite the longstanding adage "an image is worth a thousand words," creating accurate and hyper-detailed image descriptions for training Vision-Language models remains challenging. Current datasets typically have web-scraped descriptions that are short, low-granularity, and often contain details unrelated to the visual content. As a result, models trained on such data generate descriptions replet…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: Webpage (https://google.github.io/imageinwords), GitHub (https://github.com/google/imageinwords), HuggingFace (https://huggingface.co/datasets/google/imageinwords)

  2. arXiv:2404.19753 [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    DOCCI: Descriptions of Connected and Contrasting Images

    Authors: Yasumasa Onoe, Sunayana Rane, Zachary Berger, Yonatan Bitton, Jaemin Cho, Roopal Garg, Alexander Ku, Zarana Parekh, Jordi Pont-Tuset, Garrett Tanzer, Su Wang, Jason Baldridge

    Abstract: Vision-language datasets are vital for both text-to-image (T2I) and image-to-text (I2T) research. However, current datasets lack descriptions with fine-grained detail that would allow for richer associations to be learned by models. To fill the gap, we introduce Descriptions of Connected and Contrasting Images (DOCCI), a dataset with long, human-annotated English descriptions for 15k images that w…

    Submitted 30 April, 2024; originally announced April 2024.

  3. arXiv:2404.16820 [pdf, other]

    cs.CV

    Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings

    Authors: Olivia Wiles, Chuhan Zhang, Isabela Albuquerque, Ivana Kajić, Su Wang, Emanuele Bugliarello, Yasumasa Onoe, Chris Knutsen, Cyrus Rashtchian, Jordi Pont-Tuset, Aida Nematzadeh

    Abstract: While text-to-image (T2I) generative models have become ubiquitous, they do not necessarily generate images that align with a given prompt. While previous work has evaluated T2I alignment by proposing metrics, benchmarks, and templates for collecting human judgements, the quality of these components is not systematically measured. Human-rated prompt sets are generally small and the reliability of…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Data and code will be released at: https://github.com/google-deepmind/gecko_benchmark_t2i

  4. arXiv:2403.03741 [pdf, other]

    cs.LG cs.AI cs.CV

    SUPClust: Active Learning at the Boundaries

    Authors: Yuta Ono, Till Aczel, Benjamin Estermann, Roger Wattenhofer

    Abstract: Active learning is a machine learning paradigm designed to optimize model performance in a setting where labeled data is expensive to acquire. In this work, we propose a novel active learning method called SUPClust that seeks to identify points at the decision boundary between classes. By targeting these points, SUPClust aims to gather information that is most informative for refining the model's…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted at ICLR 2024 Workshop on Practical Machine Learning for Low Resource Settings (PML4LRS)

  5. arXiv:2306.10727 [pdf, other]

    cs.CL

    Jamp: Controlled Japanese Temporal Inference Dataset for Evaluating Generalization Capacity of Language Models

    Authors: Tomoki Sugimoto, Yasumasa Onoe, Hitomi Yanaka

    Abstract: Natural Language Inference (NLI) tasks involving temporal inference remain challenging for pre-trained language models (LMs). Although various datasets have been created for this task, they primarily focus on English and do not address the need for resources in other languages. It is unclear whether current LMs realize the generalization capacity for temporal inference across languages. In this pa…

    Submitted 19 June, 2023; originally announced June 2023.

    Comments: To appear in the Proceedings of the Association for Computational Linguistics: Student Research Workshop (ACL-SRW 2023)

  6. arXiv:2306.09306 [pdf, other]

    cs.CL

    Propagating Knowledge Updates to LMs Through Distillation

    Authors: Shankar Padmanabhan, Yasumasa Onoe, Michael J. Q. Zhang, Greg Durrett, Eunsol Choi

    Abstract: Modern language models have the capacity to store and use immense amounts of knowledge about real-world entities, but it remains unclear how to update such knowledge stored in model parameters. While prior methods for updating knowledge in LMs successfully inject atomic facts, updated LMs fail to make inferences based on injected facts. In this work, we demonstrate that a context distillation-base…

    Submitted 30 October, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 Camera Ready

  7. arXiv:2305.08073 [pdf, other]

    cs.LG

    HiPerformer: Hierarchically Permutation-Equivariant Transformer for Time Series Forecasting

    Authors: Ryo Umagami, Yu Ono, Yusuke Mukuta, Tatsuya Harada

    Abstract: It is imperative to discern the relationships between multiple time series for accurate forecasting. In particular, for stock prices, components are often divided into groups with the same characteristics, and a model that extracts relationships consistent with this group structure should be effective. Thus, we propose the concept of hierarchical permutation-equivariance, focusing on index swappin…

    Submitted 14 May, 2023; originally announced May 2023.

    Comments: 10 pages, 3 figures

  8. arXiv:2305.01651 [pdf, other]

    cs.CL

    Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge

    Authors: Yasumasa Onoe, Michael J. Q. Zhang, Shankar Padmanabhan, Greg Durrett, Eunsol Choi

    Abstract: Pre-trained language models (LMs) are used for knowledge intensive tasks like question answering, but their knowledge gets continuously outdated as the world changes. Prior work has studied targeted updates to LMs, injecting individual facts and evaluating whether the model learns these facts while not changing predictions on other contexts. We take a step forward and study LMs' abilities to make…

    Submitted 2 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  9. arXiv:2212.06909 [pdf, other]

    cs.CV cs.AI

    Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting

    Authors: Su Wang, Chitwan Saharia, Ceslee Montgomery, Jordi Pont-Tuset, Shai Noy, Stefano Pellegrini, Yasumasa Onoe, Sarah Laszlo, David J. Fleet, Radu Soricut, Jason Baldridge, Mohammad Norouzi, Peter Anderson, William Chan

    Abstract: Text-guided image editing can have a transformative impact in supporting creative applications. A key challenge is to generate edits that are faithful to input text prompts, while consistent with input images. We present Imagen Editor, a cascaded diffusion model built by fine-tuning Imagen on text-guided image inpainting. Imagen Editor's edits are faithful to the text prompts, which is accomplish…

    Submitted 12 April, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: CVPR 2023 Camera Ready

  10. arXiv:2212.01641 [pdf, other]

    cs.CL cs.LG

    Intermediate Entity-based Sparse Interpretable Representation Learning

    Authors: Diego Garcia-Olano, Yasumasa Onoe, Joydeep Ghosh, Byron C. Wallace

    Abstract: Interpretable entity representations (IERs) are sparse embeddings that are "human-readable" in that dimensions correspond to fine-grained entity types and values are predicted probabilities that a given entity is of the corresponding type. These methods perform well in zero-shot and low supervision settings. Compared to standard dense neural embeddings, such interpretable representations may permi…

    Submitted 3 December, 2022; originally announced December 2022.

    Comments: Accepted into BlackBox NLP Workshop at EMNLP 2022

  11. arXiv:2209.04077 [pdf, other]

    cs.SD cs.MM eess.AS

    Prediction method of Soundscape Impressions using Environmental Sounds and Aerial Photographs

    Authors: Yusuke Ono, Sunao Hara, Masanobu Abe

    Abstract: We investigate a method for quantifying city characteristics based on impressions of a sound environment. The quantification of the city characteristics will be beneficial to government policy planning, tourism projects, etc. In this study, we try to predict two soundscape impressions, meaning pleasantness and eventfulness, using sound data collected by the cloud-sensing method. The collected sou…

    Submitted 8 September, 2022; originally announced September 2022.

    Comments: Submitted APSIPA ASC 2022

  12. arXiv:2205.02832 [pdf, other]

    cs.CL

    Entity Cloze By Date: What LMs Know About Unseen Entities

    Authors: Yasumasa Onoe, Michael J. Q. Zhang, Eunsol Choi, Greg Durrett

    Abstract: Language models (LMs) are typically trained once on a large-scale corpus and used for years without being updated. However, in a dynamic world, new entities constantly arise. We propose a framework to analyze what LMs can infer about new entities that did not exist when the LMs were pretrained. We derive a dataset of entities indexed by their origination date and paired with their English Wikipedi…

    Submitted 5 May, 2022; originally announced May 2022.

    Comments: NAACL 2022 Findings

  13. arXiv:2204.11278 [pdf, ps, other]

    eess.SP cs.IT stat.ML

    Unsupervised Learning Discriminative MIG Detectors in Nonhomogeneous Clutter

    Authors: Xiaoqiang Hua, Yusuke Ono, Linyu Peng, Yuting Xu

    Abstract: Principal component analysis (PCA) is a commonly used pattern analysis method that maps high-dimensional data into a lower-dimensional space maximizing the data variance, that results in the promotion of separability of data. Inspired by the principle of PCA, a novel type of learning discriminative matrix information geometry (MIG) detectors in the unsupervised scenario are developed, and applied…

    Submitted 8 May, 2022; v1 submitted 24 April, 2022; originally announced April 2022.

    Comments: 14 pages, 6 figures

    Journal ref: IEEE Transactions on Communications 70, 4107-4120, 2022

  14. arXiv:2112.06888 [pdf, other]

    cs.CL cs.CV cs.LG

    Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection

    Authors: Diego Garcia-Olano, Yasumasa Onoe, Joydeep Ghosh

    Abstract: Knowledge-Based Visual Question Answering (KBVQA) is a bi-modal task requiring external world knowledge in order to correctly answer a text question and associated image. Recent single modality text work has shown knowledge injection into pre-trained language models, specifically entity enhanced knowledge graph embeddings, can improve performance on downstream entity-centric tasks. In this work, w…

    Submitted 13 December, 2021; originally announced December 2021.

    Journal ref: Proceedings of the 1st International Workshop on Multimodal Understanding for the Web and Social Media, co-located with the Web Conference 2022 (WWW '22 Companion), April 25--29, 2022, Virtual Event, Lyon, France

  15. arXiv:2110.07837 [pdf, other]

    cs.CL cs.LG

    Cross-Lingual Fine-Grained Entity Typing

    Authors: Nila Selvaraj, Yasumasa Onoe, Greg Durrett

    Abstract: The growth of cross-lingual pre-trained models has enabled NLP tools to rapidly generalize to new languages. While these models have been applied to tasks involving entities, their ability to explicitly predict typological features of these entities across languages has not been established. In this paper, we present a unified cross-lingual fine-grained entity typing model capable of handling over…

    Submitted 14 October, 2021; originally announced October 2021.

  16. arXiv:2109.01653 [pdf, other]

    cs.CL cs.AI

    CREAK: A Dataset for Commonsense Reasoning over Entity Knowledge

    Authors: Yasumasa Onoe, Michael J. Q. Zhang, Eunsol Choi, Greg Durrett

    Abstract: Most benchmark datasets targeting commonsense reasoning focus on everyday scenarios: physical knowledge like knowing that you could fill a cup under a waterfall [Talmor et al., 2019], social knowledge like bumping into someone is awkward [Sap et al., 2019], and other generic situations. However, there is a rich space of commonsense inferences anchored to knowledge about specific entities: for exam…

    Submitted 3 September, 2021; originally announced September 2021.

  17. arXiv:2106.12037 [pdf]

    cs.CV

    Listen to Your Favorite Melodies with img2Mxml, Producing MusicXML from Sheet Music Image by Measure-based Multimodal Deep Learning-driven Assembly

    Authors: Tomoyuki Shishido, Fehmiju Fati, Daisuke Tokushige, Yasuhiro Ono

    Abstract: Deep learning has recently been applied to optical music recognition (OMR). However, currently OMR processing from various sheet music images still lacks precision to be widely applicable. Here, we present an MMdA (Measure-based Multimodal deep learning (DL)-driven Assembly) method allowing for end-to-end OMR processing from various images including inclined photo images. Using this method, measur…

    Submitted 15 June, 2021; originally announced June 2021.

    Comments: 19 pages, 7 figures

  18. arXiv:2106.09502 [pdf, other]

    cs.CL cs.LG

    Biomedical Interpretable Entity Representations

    Authors: Diego Garcia-Olano, Yasumasa Onoe, Ioana Baldini, Joydeep Ghosh, Byron C. Wallace, Kush R. Varshney

    Abstract: Pre-trained language models induce dense entity representations that offer strong performance on entity-centric NLP tasks, but such representations are not immediately interpretable. This can be a barrier to model uptake in important domains such as biomedicine. There has been recent work on general interpretable representation learning (Onoe and Durrett, 2020), but these domain-agnostic represent…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted into Findings of ACL-IJCNLP 2021

  19. arXiv:2101.00345 [pdf, other]

    cs.CL cs.AI cs.LG

    Modeling Fine-Grained Entity Types with Box Embeddings

    Authors: Yasumasa Onoe, Michael Boratko, Andrew McCallum, Greg Durrett

    Abstract: Neural entity typing models typically represent fine-grained entity types as vectors in a high-dimensional space, but such spaces are not well-suited to modeling these types' complex interdependencies. We study the ability of box embeddings, which embed concepts as d-dimensional hyperrectangles, to capture hierarchies of types even when these relationships are not defined explicitly in the ontolog…

    Submitted 3 June, 2021; v1 submitted 1 January, 2021; originally announced January 2021.

    Comments: ACL 2021

  20. Target Detection within Nonhomogeneous Clutter via Total Bregman Divergence-Based Matrix Information Geometry Detectors

    Authors: Xiaoqiang Hua, Yusuke Ono, Linyu Peng, Yongqiang Cheng, Hongqiang Wang

    Abstract: Information divergences are commonly used to measure the dissimilarity of two elements on a statistical manifold. Differentiable manifolds endowed with different divergences may possess different geometric properties, which can result in totally different performances in many practical applications. In this paper, we propose a total Bregman divergence-based matrix information geometry (TBD-MIG) de…

    Submitted 7 August, 2021; v1 submitted 26 December, 2020; originally announced December 2020.

    Comments: 15 pages, 8 figures

    Journal ref: IEEE Transactions on Signal Processing, 69, 4326-4340, 2021

  21. arXiv:2005.00147 [pdf, other]

    cs.CL cs.LG

    Interpretable Entity Representations through Large-Scale Typing

    Authors: Yasumasa Onoe, Greg Durrett

    Abstract: In standard methodology for natural language processing, entities in text are typically embedded in dense vector spaces with pre-trained models. The embeddings produced this way are effective when fed into downstream models, but they require end-task fine-tuning and are fundamentally difficult to interpret. In this paper, we present an approach to creating entity representations that are human rea…

    Submitted 12 October, 2020; v1 submitted 30 April, 2020; originally announced May 2020.

    Comments: Findings of EMNLP 2020

  22. arXiv:2003.12443 [pdf]

    physics.med-ph cs.CV eess.IV q-bio.TO

    A Computer-Aided Diagnosis System Using Artificial Intelligence for Hip Fractures -Multi-Institutional Joint Development Research-

    Authors: Yoichi Sato, Yasuhiko Takegami, Takamune Asamoto, Yutaro Ono, Tsugeno Hidetoshi, Ryosuke Goto, Akira Kitamura, Seiwa Honda

    Abstract: [Objective] To develop a Computer-aided diagnosis (CAD) system for plain frontal hip X-rays with a deep learning model trained on a large dataset collected at multiple centers. [Materials and Methods] We included 5295 cases with neck fracture or trochanteric fracture who were diagnosed and treated by orthopedic surgeons using plain X-rays or computed tomography (CT) or magnetic resonance imaging…

    Submitted 20 May, 2020; v1 submitted 11 March, 2020; originally announced March 2020.

    Comments: 9 pages, 4 tables, 7 figures. / author's homepage : https://www.fracture-ai.org

    MSC Class: 68-T01

  23. Deep learning generates custom-made logistic regression models for explaining how breast cancer subtypes are classified

    Authors: Takuma Shibahara, Chisa Wada, Yasuho Yamashita, Kazuhiro Fujita, Masamichi Sato, Junichi Kuwata, Atsushi Okamoto, Yoshimasa Ono

    Abstract: Differentiating the intrinsic subtypes of breast cancer is crucial for deciding the best treatment strategy. Deep learning can predict the subtypes from genetic information more accurately than conventional statistical methods, but to date, deep learning has not been directly utilized to examine which genes are associated with which subtypes. To clarify the mechanisms embedded in the intrinsic sub…

    Submitted 18 July, 2022; v1 submitted 20 January, 2020; originally announced January 2020.

    Comments: 25 pages, 5 figures

  24. arXiv:1909.05780 [pdf, other]

    cs.CL cs.AI cs.LG

    Fine-Grained Entity Typing for Domain Independent Entity Linking

    Authors: Yasumasa Onoe, Greg Durrett

    Abstract: Neural entity linking models are very powerful, but run the risk of overfitting to the domain they are trained in. For this problem, a domain is characterized not just by genre of text but even by factors as specific as the particular distribution of entities, as neural models tend to overfit by memorizing properties of frequent entities in a dataset. We tackle the problem of building robust entit…

    Submitted 8 January, 2020; v1 submitted 12 September, 2019; originally announced September 2019.

    Comments: AAAI 2020

  25. arXiv:1905.01566 [pdf, other]

    cs.CL cs.AI

    Learning to Denoise Distantly-Labeled Data for Entity Typing

    Authors: Yasumasa Onoe, Greg Durrett

    Abstract: Distantly-labeled data can be used to scale up training of statistical models, but it is typically noisy and that noise can vary with the distant labeling technique. In this work, we propose a two-stage procedure for handling this type of data: denoise it with a learned model, then train our final model on clean and denoised distant data with standard supervised training. Our denoising approach co…

    Submitted 4 May, 2019; originally announced May 2019.

    Comments: NAACL 2019

  26. arXiv:1805.09662 [pdf, other]

    cs.CV

    LF-Net: Learning Local Features from Images

    Authors: Yuki Ono, Eduard Trulls, Pascal Fua, Kwang Moo Yi

    Abstract: We present a novel deep architecture and a training strategy to learn a local feature pipeline from scratch, using collections of images without the need for human supervision. To do so we exploit depth and relative camera pose cues to create a virtual target that the network should achieve on one image, provided the outputs of the network for the other image. While this process is inherently non-…

    Submitted 22 November, 2018; v1 submitted 24 May, 2018; originally announced May 2018.

    Comments: NIPS 2018

  27. arXiv:1711.05971 [pdf, other]

    cs.CV

    Learning to Find Good Correspondences

    Authors: Kwang Moo Yi, Eduard Trulls, Yuki Ono, Vincent Lepetit, Mathieu Salzmann, Pascal Fua

    Abstract: We develop a deep architecture to learn to find good correspondences for wide-baseline stereo. Given a set of putative sparse matches and the camera intrinsics, we train our network in an end-to-end fashion to label the correspondences as inliers or outliers, while simultaneously using them to recover the relative pose, as encoded by the essential matrix. Our architecture is based on a multi-layer…

    Submitted 21 May, 2018; v1 submitted 16 November, 2017; originally announced November 2017.

    Comments: CVPR 2018 (Oral)

  28. arXiv:1101.3393 [pdf, ps, other]

    physics.soc-ph cond-mat.stat-mech cs.SI

    Traffic properties for stochastic routings on scale-free networks

    Authors: Yukio Hayashi, Yasumasa Ono

    Abstract: For realistic scale-free networks, we investigate the traffic properties of stochastic routing inspired by a zero-range process known in statistical physics. By parameters $α$ and $δ$, this model controls degree-dependent hopping of packets and forwarding of packets with higher performance at more busy nodes. Through a theoretical analysis and numerical simulations, we derive the condition for the…

    Submitted 18 January, 2011; originally announced January 2011.

    Comments: 12 pages, 10 figures, 6 tables

    Journal ref: IEICE Trans. on Communication, Vol.E94-B, No.5, pp.1311-1322, 2011