Showing 1–50 of 196 results for author: Bao, Y

Searching in archive cs.
  1. arXiv:2404.17837  [pdf, other]

    cs.CV cs.HC

    Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs

    Authors: Yiming Bao, Xu Zhao, Dahong Qian

    Abstract: Temporal 3D human pose estimation from monocular videos is a challenging task in human-centered computer vision due to the depth ambiguity of 2D-to-3D lifting. To improve accuracy and address occlusion issues, inertial sensors have been introduced to provide a complementary source of information. However, it remains challenging to integrate heterogeneous sensor data for producing physically rational 3…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 10 pages, 5 figures, Under Review

  2. arXiv:2404.17582  [pdf, other]

    cs.HC cs.LG stat.AP

    Data Quality in Crowdsourcing and Spamming Behavior Detection

    Authors: Yang Ba, Michelle V. Mancenido, Erin K. Chiou, Rong Pan

    Abstract: As crowdsourcing emerges as an efficient and cost-effective method for obtaining labels for machine learning datasets, it is important to assess the quality of crowd-provided data, so as to improve analysis performance and reduce biases in subsequent machine learning tasks. Given the lack of ground truth in most cases of crowdsourcing, we refer to data quality as annotators' consistency and credib…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Preprint paper, under review on Behavior Research Methods. 45 pages, 10 figures

  3. arXiv:2404.16831  [pdf, other]

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  4. arXiv:2404.11929  [pdf, other]

    eess.IV cs.AI cs.CV

    A Symmetric Regressor for MRI-Based Assessment of Striatal Dopamine Transporter Uptake in Parkinson's Disease

    Authors: Walid Abdullah Al, Il Dong Yun, Yun Jung Bae

    Abstract: Dopamine transporter (DAT) imaging is commonly used for monitoring Parkinson's disease (PD), where striatal DAT uptake amount is computed to assess PD severity. However, DAT imaging has a high cost and the risk of radiance exposure and is not available in general clinics. Recently, MRI patch of the nigral region has been proposed as a safer and easier alternative. This paper proposes a symmetric r…

    Submitted 18 April, 2024; originally announced April 2024.

  5. arXiv:2404.08217  [pdf, other]

    cs.PL

    Escape with Your Self: Expressive Reachability Types with Sound and Decidable Bidirectional Type Checking

    Authors: Songlin Jia, Guannan Wei, Siyuan He, Yueyang Tang, Yuyan Bao, Tiark Rompf

    Abstract: Despite Rust's success in systems programming, its "shared XOR mutable" principle significantly restricts how mutable values can be used, precluding many useful functional programming idioms. Reachability types are a recent proposal to address the key limitations of Rust-style "shared XOR mutable" approaches by tracking lifetimes and reachability of shared, escaping, and mutable data, even in the…

    Submitted 17 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  6. arXiv:2403.20134  [pdf, other]

    cs.CL

    User Modeling Challenges in Interactive AI Assistant Systems

    Authors: Megan Su, Yuwei Bao

    Abstract: Interactive Artificial Intelligence (AI) assistant systems are designed to offer timely guidance to help human users complete a variety of tasks. One of the remaining challenges is to understand users' mental states during the task for more personalized guidance. In this work, we analyze users' mental states during task executions and investigate the capabilities and challenges for large language mo…

    Submitted 29 March, 2024; originally announced March 2024.

  7. arXiv:2403.14874  [pdf, other]

    cs.CV cs.LG

    WeatherProof: Leveraging Language Guidance for Semantic Segmentation in Adverse Weather

    Authors: Blake Gella, Howard Zhang, Rishi Upadhyay, Tiffany Chang, Nathan Wei, Matthew Waliman, Yunhao Ba, Celso de Melo, Alex Wong, Achuta Kadambi

    Abstract: We propose a method to infer semantic segmentation maps from images captured under adverse weather conditions. We begin by examining existing models on images degraded by weather conditions such as rain, fog, or snow, and found that they exhibit a large performance drop as compared to those captured under clear weather. To control for changes in scene structures, we propose WeatherProof, the first…

    Submitted 7 May, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2312.09534

  8. arXiv:2403.14541  [pdf, other]

    cs.CL

    EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling

    Authors: Shimao Zhang, Yu Bao, Shujian Huang

    Abstract: Recently, Large Language Models (LLMs) have demonstrated outstanding performance across a wide range of downstream language tasks. Temperature sampling is a commonly used decoding strategy for LLMs' generation process. However, a fixed temperature parameter is used in most cases, which may not always be an optimal choice for balancing generation quality and diversity. In this paper, we propose an…

    Submitted 3 April, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

  9. arXiv:2403.13829  [pdf, other]

    q-bio.BM cs.LG

    DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization

    Authors: Xiangxin Zhou, Xiwei Cheng, Yuwei Yang, Yu Bao, Liang Wang, Quanquan Gu

    Abstract: Recently, 3D generative models have shown promising performances in structure-based drug design by learning to generate ligands given target binding sites. However, only modeling the target-ligand distribution can hardly fulfill one of the main goals in drug discovery -- designing novel ligands with desired properties, e.g., high binding affinity, easily synthesizable, etc. This challenge becomes…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted to ICLR 2024

  10. arXiv:2403.12327  [pdf, other]

    cs.CV cs.LG

    GT-Rain Single Image Deraining Challenge Report

    Authors: Howard Zhang, Yunhao Ba, Ethan Yang, Rishi Upadhyay, Alex Wong, Achuta Kadambi, Yun Guo, Xueyao Xiao, Xiaoxiong Wang, Yi Li, Yi Chang, Luxin Yan, Chaochao Zheng, Luping Wang, Bin Liu, Sunder Ali Khowaja, Jiseok Yoon, Ik-Hyun Lee, Zhao Zhang, Yanyan Wei, Jiahuan Ren, Suiyi Zhao, Huan Zheng

    Abstract: This report reviews the results of the GT-Rain challenge on single image deraining at the UG2+ workshop at CVPR 2023. The aim of this competition is to study the rainy weather phenomenon in real world scenarios, provide a novel real world rainy image dataset, and to spark innovative ideas that will further the development of single image deraining methods on real images. Submissions were trained o…

    Submitted 18 March, 2024; originally announced March 2024.

  11. arXiv:2403.09199  [pdf, other]

    cs.CV cs.AI

    Customizing Segmentation Foundation Model via Prompt Learning for Instance Segmentation

    Authors: Hyung-Il Kim, Kimin Yun, Jun-Seok Yun, Yuseok Bae

    Abstract: Recently, foundation models trained on massive datasets to adapt to a wide range of domains have attracted considerable attention and are actively being explored within the computer vision community. Among these, the Segment Anything Model (SAM) stands out for its remarkable progress in generalizability and flexibility for image segmentation tasks, achieved through prompt-based object mask generat…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 11 pages, 10 figures

  12. arXiv:2403.09192  [pdf, other]

    cs.CV

    PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation

    Authors: Yizhe Xiong, Hui Chen, Tianxiang Hao, Zijia Lin, Jungong Han, Yuesong Zhang, Guoxin Wang, Yongjun Bao, Guiguang Ding

    Abstract: Recently, the scale of transformers has grown rapidly, which introduces considerable challenges in terms of training overhead and inference efficiency in the scope of task adaptation. Existing works, namely Parameter-Efficient Fine-Tuning (PEFT) and model compression, have separately investigated the challenges. However, PEFT cannot guarantee the inference efficiency of the original backbone, espe…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 15 pages, 5 figures, Under review

  13. arXiv:2403.07902  [pdf, other]

    q-bio.BM cs.LG

    DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design

    Authors: Jiaqi Guan, Xiangxin Zhou, Yuwei Yang, Yu Bao, Jian Peng, Jianzhu Ma, Qiang Liu, Liang Wang, Quanquan Gu

    Abstract: Designing 3D ligands within a target binding site is a fundamental task in drug discovery. Existing structured-based drug design methods treat all ligand atoms equally, which ignores different roles of atoms in the ligand for drug design and can be less efficient for exploring the large drug-like molecule space. In this paper, inspired by the convention in pharmaceutical practice, we decompose the…

    Submitted 26 February, 2024; originally announced March 2024.

    Comments: Accepted to ICML 2023

  14. arXiv:2403.07728  [pdf, other]

    stat.ML cs.LG stat.ME

    CAP: A General Algorithm for Online Selective Conformal Prediction with FCR Control

    Authors: Yajie Bao, Yuyang Huo, Haojie Ren, Changliang Zou

    Abstract: We study the problem of post-selection predictive inference in an online fashion. To avoid devoting resources to unimportant units, a preliminary selection of the current individual before reporting its prediction interval is common and meaningful in online predictive tasks. Since the online selection causes a temporal multiplicity in the selected prediction intervals, it is important to control t…

    Submitted 28 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  15. arXiv:2403.06443  [pdf, other]

    cs.CV

    Temporal-Mapping Photography for Event Cameras

    Authors: Yuhan Bao, Lei Sun, Yuqin Ma, Kaiwei Wang

    Abstract: Event cameras, or Dynamic Vision Sensors (DVS) are novel neuromorphic sensors that capture brightness changes as a continuous stream of "events" rather than traditional intensity frames. Converting sparse events to dense intensity frames faithfully has long been an ill-posed problem. Previous methods have primarily focused on converting events to video in dynamic scenes or with a moving camera.…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 17 pages, 10 figures

  16. arXiv:2403.03698  [pdf, other]

    cs.LG cs.AI cs.DB

    Towards Controllable Time Series Generation

    Authors: Yifan Bao, Yihao Ang, Qiang Huang, Anthony K. H. Tung, Zhiyong Huang

    Abstract: Time Series Generation (TSG) has emerged as a pivotal technique in synthesizing data that accurately mirrors real-world time series, becoming indispensable in numerous applications. Despite significant advancements in TSG, its efficacy frequently hinges on having large training datasets. This dependency presents a substantial challenge in data-scarce scenarios, especially when dealing with rare or…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 14 pages, 13 figures, and 5 tables

  17. arXiv:2403.01549  [pdf, other]

    cs.CV

    Self-Supervised Representation Learning with Meta Comprehensive Regularization

    Authors: Huijie Guo, Ying Ba, Jie Hu, Lingyu Si, Wenwen Qiang, Lei Shi

    Abstract: Self-Supervised Learning (SSL) methods harness the concept of semantic invariance by utilizing data augmentation strategies to produce similar representations for different deformations of the same input. Essentially, the model captures the shared information among multiple augmented views of samples, while disregarding the non-shared information that may be beneficial for downstream tasks. To add…

    Submitted 3 March, 2024; originally announced March 2024.

  18. arXiv:2402.18583  [pdf, other]

    q-bio.BM cs.LG

    Binding-Adaptive Diffusion Models for Structure-Based Drug Design

    Authors: Zhilin Huang, Ling Yang, Zaixi Zhang, Xiangxin Zhou, Yu Bao, Xiawu Zheng, Yuwei Yang, Yu Wang, Wenming Yang

    Abstract: Structure-based drug design (SBDD) aims to generate 3D ligand molecules that bind to specific protein targets. Existing 3D deep generative models including diffusion models have shown great promise for SBDD. However, it is complex to capture the essential protein-ligand interactions exactly in 3D space for molecular generation. To address this problem, we propose a novel framework, namely Binding-…

    Submitted 14 January, 2024; originally announced February 2024.

    Comments: Accepted by AAAI 2024. Project: https://github.com/YangLing0818/BindDM

  19. arXiv:2402.15678  [pdf, other]

    cs.DC

    Minions: Accelerating Large Language Model Inference with Adaptive and Collective Speculative Decoding

    Authors: Siqi Wang, Hailong Yang, Xuezhu Wang, Tongxuan Liu, Pengbo Wang, Xuning Liang, Kejie Ma, Tianyu Feng, Xin You, Yongjun Bao, Yi Liu, Zhongzhi Luan, Depei Qian

    Abstract: Large language models (LLM) have recently attracted surging interest due to their outstanding capabilities across various domains. However, enabling efficient LLM inference is challenging due to its autoregressive decoding that generates tokens only one at a time. Although research works apply pruning or quantization to speed up LLM inference, they typically require fine-tuning the LLM, incurring…

    Submitted 23 February, 2024; originally announced February 2024.

  20. arXiv:2402.03843  [pdf, other]

    cs.CV cs.AI

    A new method for optical steel rope non-destructive damage detection

    Authors: Yunqing Bao, Bin Hu

    Abstract: This paper presents a novel algorithm for non-destructive damage detection for steel ropes in high-altitude environments (aerial ropeway). The algorithm comprises two key components: First, a segmentation model named RGBD-UNet is designed to accurately extract steel ropes from complex backgrounds. This model is equipped with the capability to process and combine color and depth information through…

    Submitted 20 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  21. arXiv:2402.01929  [pdf, other]

    cs.LG stat.ML

    Sample, estimate, aggregate: A recipe for causal discovery foundation models

    Authors: Menghua Wu, Yujia Bao, Regina Barzilay, Tommi Jaakkola

    Abstract: Causal discovery, the task of inferring causal structure from data, promises to accelerate scientific research, inform policy making, and more. However, the per-dataset nature of existing causal discovery algorithms renders them slow, data hungry, and brittle. Inspired by foundation models, we propose a causal discovery framework where a deep learning model is pretrained to resolve predictions fro…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: Preprint. Under review

  22. arXiv:2402.01338  [pdf, other]

    cond-mat.stat-mech cond-mat.soft cs.LG physics.bio-ph

    Inferring the Langevin Equation with Uncertainty via Bayesian Neural Networks

    Authors: Youngkyoung Bae, Seungwoong Ha, Hawoong Jeong

    Abstract: Pervasive across diverse domains, stochastic systems exhibit fluctuations in processes ranging from molecular dynamics to climate phenomena. The Langevin equation has served as a common mathematical model for studying such systems, enabling predictions of their temporal evolution and analyses of thermodynamic quantities, including absorbed heat, work done on the system, and entropy production. How…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 30 pages, 17 figures

  23. arXiv:2401.13850  [pdf, other]

    cs.CY

    PADTHAI-MM: A Principled Approach for Designing Trustable, Human-centered AI systems using the MAST Methodology

    Authors: Nayoung Kim, Myke C. Cohen, Yang Ba, Anna Pan, Shawaiz Bhatti, Pouria Salehi, James Sung, Erik Blasch, Michelle V. Mancenido, Erin K. Chiou

    Abstract: Designing for AI trustworthiness is challenging, with a lack of practical guidance despite extensive literature on trust. The Multisource AI Scorecard Table (MAST), a checklist rating system, addresses this gap in designing and evaluating AI-enabled decision support systems. We propose the Principled Approach for Designing Trustable Human-centered AI systems using MAST Methodology (PADTHAI-MM), a…

    Submitted 24 January, 2024; originally announced January 2024.

  24. Exploring consumers response to text-based chatbots in e-commerce: The moderating role of task complexity and chatbot disclosure

    Authors: Xusen Cheng, Ying Bao, Alex Zarifis, Wankun Gong, Jian Mou

    Abstract: Artificial intelligence based chatbots have brought unprecedented business potential. This study aims to explore consumers trust and response to a text-based chatbot in ecommerce, involving the moderating effects of task complexity and chatbot identity disclosure. A survey method with 299 useable responses was conducted in this research. This study adopted the ordinary least squares regression to…

    Submitted 20 January, 2024; originally announced January 2024.

    Comments: Internet Research (2021)

  25. arXiv:2401.11181  [pdf, other]

    cs.DC

    Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads

    Authors: Cunchen Hu, Heyang Huang, Liangliang Xu, Xusheng Chen, Jiang Xu, Shuang Chen, Hao Feng, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan

    Abstract: Transformer-based large language model (LLM) inference serving is now the backbone of many cloud services. LLM inference consists of a prefill phase and a decode phase. However, existing LLM deployment practices often overlook the distinct characteristics of these phases, leading to significant interference. To mitigate interference, our insight is to carefully schedule and group inference request…

    Submitted 20 January, 2024; originally announced January 2024.

  26. arXiv:2312.14478  [pdf, other]

    cs.LG

    Federated Learning via Input-Output Collaborative Distillation

    Authors: Xuan Gong, Shanglin Li, Yuxiang Bao, Barry Yao, Yawen Huang, Ziyan Wu, Baochang Zhang, Yefeng Zheng, David Doermann

    Abstract: Federated learning (FL) is a machine learning paradigm in which distributed local nodes collaboratively train a central model without sharing individually held private data. Existing FL methods either iteratively share local model parameters or deploy co-distillation. However, the former is highly susceptible to private data leakage, and the latter design relies on the prerequisites of task-releva…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI 2024

  27. arXiv:2312.12095  [pdf, other]

    cs.MA

    Cautiously-Optimistic Knowledge Sharing for Cooperative Multi-Agent Reinforcement Learning

    Authors: Yanwen Ba, Xuan Liu, Xinning Chen, Hao Wang, Yang Xu, Kenli Li, Shigeng Zhang

    Abstract: While decentralized training is attractive in multi-agent reinforcement learning (MARL) for its excellent scalability and robustness, its inherent coordination challenges in collaborative tasks result in numerous interactions for agents to learn good policies. To alleviate this problem, action advising methods make experienced agents share their knowledge about what to do, while less experienced a…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 13 pages, 19 figures, 6 tables, to be published in AAAI 2024

  28. arXiv:2312.09534  [pdf, other]

    cs.CV

    WeatherProof: A Paired-Dataset Approach to Semantic Segmentation in Adverse Weather

    Authors: Blake Gella, Howard Zhang, Rishi Upadhyay, Tiffany Chang, Matthew Waliman, Yunhao Ba, Alex Wong, Achuta Kadambi

    Abstract: The introduction of large, foundational models to computer vision has led to drastically improved performance on the task of semantic segmentation. However, these existing methods exhibit a large performance drop when testing on images degraded by weather conditions such as rain, fog, or snow. We introduce a general paired-training method that can be applied to all current foundational model archi…

    Submitted 14 December, 2023; originally announced December 2023.

  29. arXiv:2312.00944  [pdf, other]

    cs.CV cs.GR

    Enhancing Diffusion Models with 3D Perspective Geometry Constraints

    Authors: Rishi Upadhyay, Howard Zhang, Yunhao Ba, Ethan Yang, Blake Gella, Sicheng Jiang, Alex Wong, Achuta Kadambi

    Abstract: While perspective is a well-studied topic in art, it is generally taken for granted in images. However, for the recent wave of high-quality image synthesis methods such as latent diffusion models, perspective accuracy is not an explicit requirement. Since these methods are capable of outputting a wide gamut of possible images, it is difficult for these synthesized images to adhere to the principle…

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: Project Webpage: http://visual.ee.ucla.edu/diffusionperspective.htm/

  30. arXiv:2311.18040  [pdf, other]

    cs.CY

    Evaluating Trustworthiness of AI-Enabled Decision Support Systems: Validation of the Multisource AI Scorecard Table (MAST)

    Authors: Pouria Salehi, Yang Ba, Nayoung Kim, Ahmadreza Mosallanezhad, Anna Pan, Myke C. Cohen, Yixuan Wang, Jieqiong Zhao, Shawaiz Bhatti, James Sung, Erik Blasch, Michelle V. Mancenido, Erin K. Chiou

    Abstract: The Multisource AI Scorecard Table (MAST) is a checklist tool based on analytic tradecraft standards to inform the design and evaluation of trustworthy AI systems. In this study, we evaluate whether MAST is associated with people's trust perceptions in AI-enabled decision support systems (AI-DSSs). Evaluating trust in AI-DSSs poses challenges to researchers and practitioners. These challenges incl…

    Submitted 29 November, 2023; originally announced November 2023.

  31. arXiv:2311.14242  [pdf, other]

    cs.CV

    RSB-Pose: Robust Short-Baseline Binocular 3D Human Pose Estimation with Occlusion Handling

    Authors: Xiaoyue Wan, Zhuo Chen, Yiming Bao, Xu Zhao

    Abstract: In the domain of 3D Human Pose Estimation, which finds widespread daily applications, the requirement for convenient acquisition equipment continues to grow. To satisfy this demand, we set our sights on a short-baseline binocular setting that offers both portability and a geometric measurement property that radically mitigates depth ambiguity. However, as the binocular baseline shortens, two serio…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: 13 pages, 8 figures, currently under review at IEEE Transactions on Image Processing journal

  32. arXiv:2311.00738  [pdf, other]

    cs.AI cs.HC

    Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake?

    Authors: Yuwei Bao, Keunwoo Peter Yu, Yichi Zhang, Shane Storks, Itamar Bar-Yossef, Alexander De La Iglesia, Megan Su, Xiao Lin Zheng, Joyce Chai

    Abstract: Despite tremendous advances in AI, it remains a significant challenge to develop interactive task guidance systems that can offer situated, personalized guidance and assist humans in various tasks. These systems need to have a sophisticated understanding of the user as well as the environment, and make timely accurate decisions on when and what to say. To address this issue, we created a new multi…

    Submitted 1 November, 2023; originally announced November 2023.

    Comments: Accepted to EMNLP 2023 Findings

  33. arXiv:2311.00353  [pdf, other]

    cs.CV

    LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation

    Authors: Yuxiang Bao, Di Qiu, Guoliang Kang, Baochang Zhang, Bo Jin, Kaiye Wang, Pengfei Yan

    Abstract: Leveraging the generative ability of image diffusion models offers great potential for zero-shot video-to-video translation. The key lies in how to maintain temporal consistency across generated video frames by image diffusion models. Previous methods typically adopt cross-frame attention, i.e., sharing the key and value tokens across attentions of different frames, to enc…

    Submitted 1 November, 2023; originally announced November 2023.

  34. arXiv:2310.15517  [pdf, other]

    cs.CL

    MarkQA: A large scale KBQA dataset with numerical reasoning

    Authors: Xiang Huang, Sitao Cheng, Yuheng Bao, Shanshan Huang, Yuzhong Qu

    Abstract: While question answering over knowledge bases (KBQA) has shown progress in addressing factoid questions, KBQA with numerical reasoning remains relatively unexplored. In this paper, we focus on the complex numerical reasoning in KBQA and propose a new task, NR-KBQA, which necessitates the ability to perform both multi-hop reasoning and numerical reasoning. We design a logic form in Python format ca…

    Submitted 13 December, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 main conference. Code: https://github.com/cdhx/MarkQA Homepage: http://ws.nju.edu.cn/MarkQA

  35. arXiv:2310.14540  [pdf, other]

    cs.CL cs.AI

    Evaluating Spatial Understanding of Large Language Models

    Authors: Yutaro Yamada, Yihan Bao, Andrew K. Lampinen, Jungo Kasai, Ilker Yildirim

    Abstract: Large language models (LLMs) show remarkable capabilities across a variety of tasks. Despite the models only seeing text in training, several recent studies suggest that LLM representations implicitly capture aspects of the underlying grounded concepts. Here, we explore LLM representations of a particularly salient kind of grounded knowledge -- spatial relationships. We design natural-language nav…

    Submitted 12 April, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted to TMLR 2024. Our code and data are available at https://github.com/runopti/SpatialEvalLLM, https://huggingface.co/datasets/yyamada/SpatialEvalLLM

  36. arXiv:2309.16108  [pdf, other]

    cs.CV cs.AI cs.LG

    Channel Vision Transformers: An Image Is Worth 1 x 16 x 16 Words

    Authors: Yujia Bao, Srinivasan Sivanandan, Theofanis Karaletsos

    Abstract: Vision Transformer (ViT) has emerged as a powerful architecture in the realm of modern computer vision. However, its application in certain imaging fields, such as microscopy and satellite imaging, presents unique challenges. In these domains, images often contain multiple channels, each carrying semantically distinct and independent information. Furthermore, the model must demonstrate robustness…

    Submitted 18 April, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

  37. GME: GPU-based Microarchitectural Extensions to Accelerate Homomorphic Encryption

    Authors: Kaustubh Shivdikar, Yuhui Bao, Rashmi Agrawal, Michael Shen, Gilbert Jonatan, Evelio Mora, Alexander Ingare, Neal Livesay, José L. Abellán, John Kim, Ajay Joshi, David Kaeli

    Abstract: Fully Homomorphic Encryption (FHE) enables the processing of encrypted data without decrypting it. FHE has garnered significant attention over the past decade as it supports secure outsourcing of data processing to remote cloud services. Despite its promise of strong data privacy and security guarantees, FHE introduces a slowdown of up to five orders of magnitude as compared to the same computatio…

    Submitted 19 September, 2023; originally announced September 2023.

  38. arXiv:2309.08118  [pdf, ps, other]

    cs.PL

    Graph IRs for Impure Higher-Order Languages (Technical Report)

    Authors: Oliver Bračevac, Guannan Wei, Songlin Jia, Supun Abeysinghe, Yuxuan Jiang, Yuyan Bao, Tiark Rompf

    Abstract: This is a companion report for the OOPSLA 2023 paper of the same title, presenting a detailed end-to-end account of the $λ^*_{\mathsf{G}}$ graph IR, at a level of detail beyond a regular conference paper. Our first concern is adequacy and soundness of $λ^*_{\mathsf{G}}$, which we derive from a direct-style imperative functional language (a variant of Bao et al.'s $λ^*$-calculus with reachability t…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: arXiv admin note: text overlap with arXiv:2309.05885

  39. arXiv:2309.05885  [pdf, ps, other]

    cs.PL

    Modeling Reachability Types with Logical Relations

    Authors: Yuyan Bao, Guannan Wei, Oliver Bračevac, Tiark Rompf

    Abstract: Reachability types are a recent proposal to bring Rust-style reasoning about memory properties to higher-level languages. While key type soundness results for reachability types have been established using syntactic techniques in prior work, stronger metatheoretic properties have so far been unexplored. This paper presents an alternative semantic model of reachability types using logical relations…

    Submitted 11 September, 2023; originally announced September 2023.

  40. arXiv:2309.03755  [pdf, other]

    cs.LG cs.AI cs.DB

    TSGBench: Time Series Generation Benchmark

    Authors: Yihao Ang, Qiang Huang, Yifan Bao, Anthony K. H. Tung, Zhiyong Huang

    Abstract: Synthetic Time Series Generation (TSG) is crucial in a range of applications, including data augmentation, anomaly detection, and privacy preservation. Although significant strides have been made in this field, existing methods exhibit three key limitations: (1) They often benchmark against similar model types, constraining a holistic view of performance capabilities. (2) The use of specialized sy…

    Submitted 7 December, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: Accepted and to appear in VLDB 2024

  41. arXiv:2309.02165  [pdf, other]

    cs.CV

    PCFGaze: Physics-Consistent Feature for Appearance-based Gaze Estimation

    Authors: Yiwei Bao, Feng Lu

    Abstract: Although recent deep learning based gaze estimation approaches have achieved much improvement, we still know little about how gaze features are connected to the physics of gaze. In this paper, we try to answer this question by analyzing the gaze feature manifold. Our analysis revealed the insight that the geodesic distance between gaze features is consistent with the gaze differences between sampl…

    Submitted 5 September, 2023; originally announced September 2023.

  42. arXiv:2308.13897  [pdf, other]

    cs.CV

    InsertNeRF: Instilling Generalizability into NeRF with HyperNet Modules

    Authors: Yanqi Bao, Tianyu Ding, Jing Huo, Wenbin Li, Yuxin Li, Yang Gao

    Abstract: Generalizing Neural Radiance Fields (NeRF) to new scenes is a significant challenge that existing approaches struggle to address without extensive modifications to vanilla NeRF framework. We introduce InsertNeRF, a method for INStilling gEneRalizabiliTy into NeRF. By utilizing multiple plug-and-play HyperNet modules, InsertNeRF dynamically tailors NeRF's weights to specific reference scenes, trans…

    Submitted 24 March, 2024; v1 submitted 26 August, 2023; originally announced August 2023.

    Comments: This work was accepted at ICLR 2024

  43. arXiv:2308.12219  [pdf, other]

    cs.CL cs.AI cs.LG

    Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning

    Authors: Jiasheng Ye, Zaixiang Zheng, Yu Bao, Lihua Qian, Quanquan Gu

    Abstract: The recent surge of generative AI has been fueled by the generative power of diffusion probabilistic models and the scalable capabilities of large language models. Despite their potential, it remains elusive whether diffusion language models can solve general language tasks comparable to their autoregressive counterparts. This paper demonstrates that scaling diffusion models w.r.t. data, sizes, an…

    Submitted 25 August, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: added references

  44. arXiv:2308.02908  [pdf, other]

    cs.CV

    Where and How: Mitigating Confusion in Neural Radiance Fields from Sparse Inputs

    Authors: Yanqi Bao, Yuxin Li, Jing Huo, Tianyu Ding, Xinyue Liang, Wenbin Li, Yang Gao

    Abstract: Neural Radiance Fields from Sparse input (NeRF-S) have shown great potential in synthesizing novel views with a limited number of observed viewpoints. However, due to the inherent limitations of sparse inputs and the gap between non-adjacent views, rendering results often suffer from over-fitting and foggy surfaces, a phenomenon we refer to as "CONFUSION" during volume rendering. In this paper, w…

    Submitted 5 August, 2023; originally announced August 2023.

    Comments: Accepted in Proceedings of the 31st ACM International Conference on Multimedia (MM '23)

  45. arXiv:2308.01857  [pdf, other]

    cs.AR

    iEDA: An Open-Source Intelligent Physical Implementation Toolkit and Library

    Authors: Xingquan Li, Simin Tao, Zengrong Huang, Shijian Chen, Zhisheng Zeng, Liwei Ni, Zhipeng Huang, Chunan Zhuang, Hongxi Wu, Weiguo Li, Xueyan Zhao, He Liu, Shuaiying Long, Wei He, Bojun Liu, Sifeng Gan, Zihao Yu, Tong Liu, Yuchi Miao, Zhiyuan Yan, Hao Wang, Jie Zhao, Yifan Li, Ruizhi Liu, Xiaoze Lin, et al. (31 additional authors not shown)

    Abstract: Open-source EDA shows promising potential in unleashing EDA innovation and lowering the cost of chip design. This paper presents an open-source EDA project, iEDA, aiming to build a basic infrastructure for EDA technology evolution and to close the industrial-academic gap in the EDA area. iEDA now covers the whole flow of physical design (including Floorplan, Placement, CTS, Routing, Timing Opti…

    Submitted 3 August, 2023; originally announced August 2023.

  46. Spatio-Temporal Branching for Motion Prediction using Motion Increments

    Authors: Jiexin Wang, Yujie Zhou, Wenwen Qiang, Ying Ba, Bing Su, Ji-Rong Wen

    Abstract: Human motion prediction (HMP) has emerged as a popular research topic due to its diverse applications, but it remains a challenging task due to the stochastic and aperiodic nature of future poses. Traditional methods rely on hand-crafted features and machine learning techniques, which often struggle to model the complex dynamics of human motion. Recent deep learning-based methods have achieved suc…

    Submitted 11 August, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

    Journal ref: ACM MM 2023

  47. arXiv:2307.13844  [pdf, other]

    cs.PL

    Polymorphic Reachability Types: Tracking Freshness, Aliasing, and Separation in Higher-Order Generic Programs

    Authors: Guannan Wei, Oliver Bračevac, Songlin Jia, Yuyan Bao, Tiark Rompf

    Abstract: Reachability types are a recent proposal that has shown promise in scaling to higher-order but monomorphic settings, tracking aliasing and separation on top of a substrate inspired by separation logic. The prior $λ^*$ reachability type system qualifies types with sets of reachable variables and guarantees separation if two terms have disjoint qualifiers. However, naive extensions with type polymor…

    Submitted 25 July, 2023; originally announced July 2023.

  48. arXiv:2307.02615  [pdf, other]

    cs.CL cs.AI cs.LG

    Human Inspired Progressive Alignment and Comparative Learning for Grounded Word Acquisition

    Authors: Yuwei Bao, Barrett Martin Lattimer, Joyce Chai

    Abstract: Human language acquisition is an efficient, supervised, and continual process. In this work, we took inspiration from how human babies acquire their first language, and developed a computational process for word acquisition through comparative learning. Motivated by cognitive findings, we generated a small dataset that enables the computation models to compare the similarities and differences of v…

    Submitted 5 July, 2023; originally announced July 2023.

    Journal ref: ACL 2023

  49. arXiv:2306.12341  [pdf, other]

    cs.CV

    Geometric Pooling: maintaining more useful information

    Authors: Hao Xu, Jia Liu, Yang Shen, Kenan Lou, Yanxia Bao, Ruihua Zhang, Shuyue Zhou, Hongsen Zhao, Shuai Wang

    Abstract: Graph Pooling technology plays an important role in graph node classification tasks. Sorting pooling technologies maintain large-value units for pooling graphs of varying sizes. However, by analyzing the statistical characteristic of activated units after pooling, we found that a large number of units dropped by sorting pooling are negative-value units that contain useful information and can contr…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: 6 pages, 4 figures

  50. arXiv:2306.07597  [pdf, other]

    cs.CL

    Question Decomposition Tree for Answering Complex Questions over Knowledge Bases

    Authors: Xiang Huang, Sitao Cheng, Yiheng Shu, Yuheng Bao, Yuzhong Qu

    Abstract: Knowledge base question answering (KBQA) has attracted a lot of interest in recent years, especially for complex questions which require multiple facts to answer. Question decomposition is a promising way to answer complex questions. Existing decomposition methods split the question into sub-questions according to a single compositionality type, which is not sufficient for questions involving mult…

    Submitted 13 June, 2023; originally announced June 2023.

    Comments: Accepted by AAAI 2023