Skip to main content

Showing 1–18 of 18 results for author: Usuyama, N

Searching in archive cs. Search in all archives.
.
  1. arXiv:2403.08002  [pdf, other

    cs.CL cs.CV

    Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

    Authors: Juan Manuel Zambrano Chaves, Shih-Cheng Huang, Yanbo Xu, Hanwen Xu, Naoto Usuyama, Sheng Zhang, Fei Wang, Yujia Xie, Mahmoud Khademi, Ziyi Yang, Hany Awadalla, Julia Gong, Houdong Hu, Jianwei Yang, Chunyuan Li, Jianfeng Gao, Yu Gu, Cliff Wong, Mu Wei, Tristan Naumann, Muhao Chen, Matthew P. Lungren, Serena Yeung-Levy, Curtis P. Langlotz, Sheng Wang , et al. (1 additional authors not shown)

    Abstract: The scaling laws and extraordinary performance of large foundation models motivate the development and utilization of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, there are still major challenges that need to be addressed before these models can be used in real-world clinics. Frontier general-domain models such as GPT-4V still have significant… ▽ More

    Submitted 10 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  2. arXiv:2401.07654  [pdf, other

    cs.CV

    Foundation Models for Biomedical Image Segmentation: A Survey

    Authors: Ho Hin Lee, Yu Gu, Theodore Zhao, Yanbo Xu, Jianwei Yang, Naoto Usuyama, Cliff Wong, Mu Wei, Bennett A. Landman, Yuankai Huo, Alberto Santamaria-Pang, Hoifung Poon

    Abstract: Recent advancements in biomedical image analysis have been significantly driven by the Segment Anything Model (SAM). This transformative technology, originally developed for general-purpose computer vision, has found rapid application in medical image processing. Within the last year, marked by over 100 publications, SAM has demonstrated its prowess in zero-shot learning adaptations for medical im… ▽ More

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: 22 pages, 4 figures, 7 tables

  3. arXiv:2312.03558  [pdf, other

    cs.CV

    When an Image is Worth 1,024 x 1,024 Words: A Case Study in Computational Pathology

    Authors: Wenhui Wang, Shuming Ma, Hanwen Xu, Naoto Usuyama, Jiayu Ding, Hoifung Poon, Furu Wei

    Abstract: This technical report presents LongViT, a vision Transformer that can process gigapixel images in an end-to-end manner. Specifically, we split the gigapixel image into a sequence of millions of patches and project them linearly into embeddings. LongNet is then employed to model the extremely long sequence, generating representations that capture both short-range and long-range dependencies. The li… ▽ More

    Submitted 6 December, 2023; originally announced December 2023.

  4. arXiv:2311.16452  [pdf, other

    cs.CL

    Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine

    Authors: Harsha Nori, Yin Tat Lee, Sheng Zhang, Dean Carignan, Richard Edgar, Nicolo Fusi, Nicholas King, Jonathan Larson, Yuanzhi Li, Weishung Liu, Renqian Luo, Scott Mayer McKinney, Robert Osazuwa Ness, Hoifung Poon, Tao Qin, Naoto Usuyama, Chris White, Eric Horvitz

    Abstract: Generalist foundation models such as GPT-4 have displayed surprising capabilities in a wide variety of domains and tasks. Yet, there is a prevalent assumption that they cannot match specialist capabilities of fine-tuned models. For example, most explorations to date on medical competency benchmarks have leveraged domain-specific training, as exemplified by efforts on BioGPT and Med-PaLM. We build… ▽ More

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 21 pages, 7 figures

    ACM Class: I.2.7

  5. arXiv:2310.14573  [pdf, other

    cs.CL

    Exploring the Boundaries of GPT-4 in Radiology

    Authors: Qianchu Liu, Stephanie Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Maria Teodora Wetscherek, Robert Tinn, Harshita Sharma, Fernando Pérez-García, Anton Schwaighofer, Pranav Rajpurkar, Sameer Tajdin Khanna, Hoifung Poon, Naoto Usuyama, Anja Thieme, Aditya V. Nori, Matthew P. Lungren, Ozan Oktay, Javier Alvarez-Valle

    Abstract: The recent success of general-domain large language models (LLMs) has significantly changed the natural language processing paradigm towards a unified foundation model across domains and applications. In this paper, we focus on assessing the performance of GPT-4, the most capable LLM so far, on the text-based applications for radiology reports, comparing against state-of-the-art (SOTA) radiology-s… ▽ More

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 main

  6. arXiv:2310.10765  [pdf, other

    cs.CV cs.AI cs.CL

    BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys

    Authors: Yu Gu, Jianwei Yang, Naoto Usuyama, Chunyuan Li, Sheng Zhang, Matthew P. Lungren, Jianfeng Gao, Hoifung Poon

    Abstract: Rapid progress has been made in instruction-learning for image editing with natural-language instruction, as exemplified by InstructPix2Pix. In biomedicine, such methods can be applied to counterfactual image generation, which helps differentiate causal structure from spurious correlation and facilitate robust image interpretation for disease progression modeling. However, generic image-editing mo… ▽ More

    Submitted 20 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Project page & demo: https://aka.ms/biomedjourney

  7. arXiv:2308.02180  [pdf, other

    cs.CL cs.LG

    Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology

    Authors: Cliff Wong, Sheng Zhang, Yu Gu, Christine Moung, Jacob Abel, Naoto Usuyama, Roshanthi Weerasinghe, Brian Piening, Tristan Naumann, Carlo Bifulco, Hoifung Poon

    Abstract: Clinical trial matching is a key process in health delivery and discovery. In practice, it is plagued by overwhelming unstructured data and unscalable manual processing. In this paper, we conduct a systematic study on scaling clinical trial matching using large language models (LLMs), with oncology as the focus area. Our study is grounded in a clinical trial matching system currently in test deplo… ▽ More

    Submitted 18 August, 2023; v1 submitted 4 August, 2023; originally announced August 2023.

    Comments: 24 pages, 5 figures, accepted at Machine Learning for Healthcare (MLHC) 2023

  8. arXiv:2307.06439  [pdf, other

    cs.CL cs.AI

    Distilling Large Language Models for Biomedical Knowledge Extraction: A Case Study on Adverse Drug Events

    Authors: Yu Gu, Sheng Zhang, Naoto Usuyama, Yonas Woldesenbet, Cliff Wong, Praneeth Sanapathi, Mu Wei, Naveen Valluri, Erika Strandberg, Tristan Naumann, Hoifung Poon

    Abstract: Large language models (LLMs), such as GPT-4, have demonstrated remarkable capabilities across a wide range of tasks, including health applications. In this paper, we study how LLMs can be used to scale biomedical knowledge curation. We find that while LLMs already possess decent competency in structuring biomedical text, by distillation into a task-specific student model through self-supervised le… ▽ More

    Submitted 12 July, 2023; originally announced July 2023.

  9. arXiv:2306.00890  [pdf, other

    cs.CV cs.CL

    LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day

    Authors: Chunyuan Li, Cliff Wong, Sheng Zhang, Naoto Usuyama, Haotian Liu, Jianwei Yang, Tristan Naumann, Hoifung Poon, Jianfeng Gao

    Abstract: Conversational generative AI has demonstrated remarkable promise for empowering biomedical practitioners, but current investigations focus on unimodal text. Multimodal conversational AI has seen rapid progress by leveraging billions of image-text pairs from the public web, but such general-domain vision-language models still lack sophistication in understanding and conversing about biomedical imag… ▽ More

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 17 pages; Website: https://aka.ms/llava-med

  10. arXiv:2303.13386  [pdf, other

    cs.CL cs.LG

    Compositional Zero-Shot Domain Transfer with Text-to-Text Models

    Authors: Fangyu Liu, Qianchu Liu, Shruthi Bannur, Fernando Pérez-García, Naoto Usuyama, Sheng Zhang, Tristan Naumann, Aditya Nori, Hoifung Poon, Javier Alvarez-Valle, Ozan Oktay, Stephanie L. Hyland

    Abstract: Label scarcity is a bottleneck for improving task performance in specialised domains. We propose a novel compositional transfer learning framework (DoT5 - domain compositional zero-shot T5) for zero-shot domain transfer. Without access to in-domain labels, DoT5 jointly learns domain knowledge (from MLM of unlabelled in-domain free text) and task knowledge (from task training on more readily availa… ▽ More

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: Accepted at TACL, pre-MIT Press publication version. 16 pages, 4 figures

  11. arXiv:2303.00915  [pdf, other

    cs.CV cs.CL

    BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs

    Authors: Sheng Zhang, Yanbo Xu, Naoto Usuyama, Hanwen Xu, Jaspreet Bagga, Robert Tinn, Sam Preston, Rajesh Rao, Mu Wei, Naveen Valluri, Cliff Wong, Andrea Tupini, Yu Wang, Matt Mazzola, Swadheen Shukla, Lars Liden, Jianfeng Gao, Matthew P. Lungren, Tristan Naumann, Sheng Wang, Hoifung Poon

    Abstract: Biomedical data is inherently multimodal, comprising physical measurements and natural language narratives. A generalist biomedical AI model needs to simultaneously process different modalities of data, including text and images. Therefore, training an effective generalist biomedical model requires high-quality multimodal data, such as parallel image-text pairs. Here, we present PMC-15M, a novel d… ▽ More

    Submitted 16 January, 2024; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: The models are released at https://aka.ms/biomedclip

  12. Making the Most of Text Semantics to Improve Biomedical Vision--Language Processing

    Authors: Benedikt Boecking, Naoto Usuyama, Shruthi Bannur, Daniel C. Castro, Anton Schwaighofer, Stephanie Hyland, Maria Wetscherek, Tristan Naumann, Aditya Nori, Javier Alvarez-Valle, Hoifung Poon, Ozan Oktay

    Abstract: Multi-modal data abounds in biomedicine, such as radiology images and reports. Interpreting this data at scale is essential for improving clinical care and accelerating clinical research. Biomedical text with its complex semantics poses additional challenges in vision--language modelling compared to the general domain, and previous work has used insufficiently adapted models that lack domain-speci… ▽ More

    Submitted 21 July, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

    Comments: To appear in ECCV 2022. Code: https://aka.ms/biovil-code Dataset: https://aka.ms/ms-cxr Demo Notebook: https://aka.ms/biovil-demo-notebook

    Journal ref: Computer Vision - ECCV 2022, LNCS vol 13696, pp 1-21

  13. arXiv:2203.10442  [pdf, other

    cs.CL cs.LG

    Towards Structuring Real-World Data at Scale: Deep Learning for Extracting Key Oncology Information from Clinical Text with Patient-Level Supervision

    Authors: Sam Preston, Mu Wei, Rajesh Rao, Robert Tinn, Naoto Usuyama, Michael Lucas, Roshanthi Weerasinghe, Soohee Lee, Brian Piening, Paul Tittel, Naveen Valluri, Tristan Naumann, Carlo Bifulco, Hoifung Poon

    Abstract: Objective: The majority of detailed patient information in real-world data (RWD) is only consistently available in free-text clinical documents. Manual curation is expensive and time-consuming. Developing natural language processing (NLP) methods for structuring RWD is thus essential for scaling real-world evidence generation. Materials and Methods: Traditional rule-based systems are vulnerable… ▽ More

    Submitted 19 March, 2022; originally announced March 2022.

  14. arXiv:2112.07869  [pdf, other

    cs.CL cs.LG

    Fine-Tuning Large Neural Language Models for Biomedical Natural Language Processing

    Authors: Robert Tinn, Hao Cheng, Yu Gu, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon

    Abstract: Motivation: A perennial challenge for biomedical researchers and clinical practitioners is to stay abreast with the rapid growth of publications and medical notes. Natural language processing (NLP) has emerged as a promising direction for taming information overload. In particular, large neural language models facilitate transfer learning by pretraining on unlabeled text, as exemplified by the suc… ▽ More

    Submitted 14 December, 2021; originally announced December 2021.

  15. arXiv:2109.05362  [pdf, other

    cs.CL

    Modular Self-Supervision for Document-Level Relation Extraction

    Authors: Sheng Zhang, Cliff Wong, Naoto Usuyama, Sarthak Jain, Tristan Naumann, Hoifung Poon

    Abstract: Extracting relations across large text spans has been relatively underexplored in NLP, but it is particularly important for high-value domains such as biomedicine, where obtaining high recall of the latest findings is crucial for practical applications. Compared to conventional information extraction confined to short text spans, document-level relation extraction faces additional challenges in bo… ▽ More

    Submitted 11 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021

  16. arXiv:2106.13375  [pdf, other

    cs.IR cs.CL cs.DL

    Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature

    Authors: Yu Wang, Jinchao Li, Tristan Naumann, Chenyan Xiong, Hao Cheng, Robert Tinn, Cliff Wong, Naoto Usuyama, Richard Rogahn, Zhihong Shen, Yang Qin, Eric Horvitz, Paul N. Bennett, Jianfeng Gao, Hoifung Poon

    Abstract: Information overload is a prevalent challenge in many high-value domains. A prominent case in point is the explosion of the biomedical literature on COVID-19, which swelled to hundreds of thousands of papers in a matter of months. In general, biomedical literature expands by two papers every minute, totalling over a million new papers every year. Search in the biomedical realm, and many other vert… ▽ More

    Submitted 16 September, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

    Comments: ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2021 Applied Data Science Track

  17. arXiv:2007.15779  [pdf, other

    cs.CL cs.LG

    Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

    Authors: Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon

    Abstract: Pretraining large neural language models, such as BERT, has led to impressive gains on many natural language processing (NLP) tasks. However, most pretraining efforts focus on general domain corpora, such as newswire and Web. A prevailing assumption is that even domain-specific pretraining can benefit by starting from general-domain language models. In this paper, we challenge this assumption by s… ▽ More

    Submitted 16 September, 2021; v1 submitted 30 July, 2020; originally announced July 2020.

    Comments: ACM Transactions on Computing for Healthcare (HEALTH)

  18. arXiv:2005.14288  [pdf, other

    cs.CV cs.IR

    ePillID Dataset: A Low-Shot Fine-Grained Benchmark for Pill Identification

    Authors: Naoto Usuyama, Natalia Larios Delgado, Amanda K. Hall, Jessica Lundin

    Abstract: Identifying prescription medications is a frequent task for patients and medical professionals; however, this is an error-prone task as many pills have similar appearances (e.g. white round pills), which increases the risk of medication errors. In this paper, we introduce ePillID, the largest public benchmark on pill image recognition, composed of 13k images representing 9804 appearance classes (t… ▽ More

    Submitted 7 September, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

    Comments: CVPR 2020 VL3. Project Page: https://github.com/usuyama/ePillID-benchmark