Showing 1–50 of 113 results for author: Jiao, B

  1. arXiv:2405.11442  [pdf, other]

    cs.CV

    Unifying 3D Vision-Language Understanding via Promptable Queries

    Authors: Ziyu Zhu, Zhuofan Zhang, Xiaojian Ma, Xuesong Niu, Yixin Chen, Baoxiong Jia, Zhidong Deng, Siyuan Huang, Qing Li

    Abstract: A unified model for 3D vision-language (3D-VL) understanding is expected to take various scene representations and perform a wide range of tasks in a 3D scene. However, a considerable gap exists between existing methods and such a unified model, due to the independent application of representation and insufficient exploration of 3D multi-task training. In this paper, we introduce PQ3D, a unified m…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: Project page: https://pq3d.github.io

  2. arXiv:2404.19738  [pdf, other]

    cs.HC

    DiaryHelper: Exploring the Use of an Automatic Contextual Information Recording Agent for Elicitation Diary Study

    Authors: Junze Li, Changyang He, Jiaxiong Hu, Boyang Jia, Alon Halevy, Xiaojuan Ma

    Abstract: Elicitation diary studies, a type of qualitative, longitudinal research method, involve participants to self-report aspects of events of interest at their occurrences as memory cues for providing details and insights during post-study interviews. However, due to time constraints and lack of motivation, participants' diary entries may be vague or incomplete, impairing their later recall. To address…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CHI 2024

  3. arXiv:2404.19021  [pdf]

    cs.HC

    Enhancing Autonomous Vehicle Design and Testing: A Comprehensive Review of AR and VR Integration

    Authors: Emanuella Ejichukwu, Lauren Tong, Gadir Hazime, Bochen Jia

    Abstract: This comprehensive literature review explores the potential of Augmented Reality and Virtual Reality technologies to enhance the design and testing of autonomous vehicles. By analyzing existing research, the review aims to identify how AR and VR can be leveraged to improve various aspects of autonomous vehicle development, including: creating more realistic and comprehensive testing environments,…

    Submitted 29 April, 2024; originally announced April 2024.

  4. arXiv:2404.18989  [pdf]

    cs.CY cs.HC

    Cyberbully and Online Harassment: Issues Associated with Digital Wellbeing

    Authors: Manasi Kulkarni, Siddhi Durve, Bochen Jia

    Abstract: As digital technology becomes increasingly embedded in daily life, its impact on social interactions has become a critical area of study, particularly concerning cyberbullying. This meta-analysis investigates the dual role of technology in cyberbullying both as a catalyst that can exacerbate the issue and as a potential solution. Cyberbullying, characterized by the use of digital platforms to hara…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 35 pages, 7 figures

    ACM Class: J.4

  5. arXiv:2404.18695  [pdf, other]

    cs.CV

    Dual-Modal Prompting for Sketch-Based Image Retrieval

    Authors: Liying Gao, Bingliang Jiao, Peng Wang, Shizhou Zhang, Hanwang Zhang, Yanning Zhang

    Abstract: Sketch-based image retrieval (SBIR) associates hand-drawn sketches with their corresponding realistic images. In this study, we aim to tackle two major challenges of this task simultaneously: i) zero-shot, dealing with unseen categories, and ii) fine-grained, referring to intra-category instance-level retrieval. Our key innovation lies in the realization that solely addressing this cross-category…

    Submitted 29 April, 2024; originally announced April 2024.

  6. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  7. arXiv:2404.10220  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V

    Authors: Peiyuan Zhi, Zhiyuan Zhang, Muzhi Han, Zeyu Zhang, Zhitian Li, Ziyuan Jiao, Baoxiong Jia, Siyuan Huang

    Abstract: Autonomous robot navigation and manipulation in open environments require reasoning and replanning with closed-loop feedback. We present COME-robot, the first closed-loop framework utilizing the GPT-4V vision-language foundation model for open-ended reasoning and adaptive planning in real-world scenarios. We meticulously construct a library of action primitives for robot exploration, navigation, a…

    Submitted 15 April, 2024; originally announced April 2024.

  8. arXiv:2404.09465  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI

    Authors: Yandan Yang, Baoxiong Jia, Peiyuan Zhi, Siyuan Huang

    Abstract: With recent developments in Embodied Artificial Intelligence (EAI) research, there has been a growing demand for high-quality, large-scale interactive scene generation. While prior methods in scene synthesis have prioritized the naturalness and realism of the generated scenes, the physical plausibility and interactivity of scenes have been largely left unexplored. To address this disparity, we int…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024, 18 pages

  9. arXiv:2404.01576  [pdf, other]

    cs.CV cs.HC

    Leveraging Digital Perceptual Technologies for Remote Perception and Analysis of Human Biomechanical Processes: A Contactless Approach for Workload and Joint Force Assessment

    Authors: Jesudara Omidokun, Darlington Egeonu, Bochen Jia, Liang Yang

    Abstract: This study presents an innovative computer vision framework designed to analyze human movements in industrial settings, aiming to enhance biomechanical analysis by integrating seamlessly with existing software. Through a combination of advanced imaging and modeling techniques, the framework allows for comprehensive scrutiny of human motion, providing valuable insights into kinematic patterns and k…

    Submitted 1 April, 2024; originally announced April 2024.

  10. arXiv:2403.20025  [pdf, ps, other]

    cs.IT eess.SP

    Secure Full-Duplex Communication via Movable Antennas

    Authors: Jingze Ding, Zijian Zhou, Chenbo Wang, Wenyao Li, Lifeng Lin, Bingli Jiao

    Abstract: This paper investigates physical layer security (PLS) for a movable antenna (MA)-assisted full-duplex (FD) system. In this system, an FD base station (BS) with multiple MAs for transmission and reception provides services for an uplink (UL) user and a downlink (DL) user. Each user operates in half-duplex (HD) mode and is equipped with a single fixed-position antenna (FPA), in the presence of a sin…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: This paper has been submitted for possible publication

  11. arXiv:2403.18036  [pdf, other]

    cs.CV

    Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

    Authors: Zan Wang, Yixin Chen, Baoxiong Jia, Puhao Li, Jinlu Zhang, Jingze Zhang, Tengyu Liu, Yixin Zhu, Wei Liang, Siyuan Huang

    Abstract: Despite significant advancements in text-to-motion synthesis, generating language-guided human motion within 3D environments poses substantial challenges. These challenges stem primarily from (i) the absence of powerful generative models capable of jointly modeling natural language, 3D scenes, and human motion, and (ii) the generative models' intensive data requirements contrasted with the scarcit…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: CVPR 2024; 16 pages

  12. arXiv:2403.01164  [pdf, other]

    cs.PF cs.DC

    HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices

    Authors: Xuanlei Zhao, Bin Jia, Haotian Zhou, Ziming Liu, Shenggan Cheng, Yang You

    Abstract: In recent times, the emergence of Large Language Models (LLMs) has resulted in increasingly larger model size, posing challenges for inference on low-resource devices. Prior approaches have explored offloading to facilitate low-memory inference but often suffer from efficiency due to I/O bottlenecks. To achieve low-latency LLMs inference on resource-constrained devices, we introduce HeteGen, a nov…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: MLSys 2024

  13. arXiv:2401.17049  [pdf, ps, other]

    cs.IT eess.SP

    Movable Antenna-Enabled Co-Frequency Co-Time Full-Duplex Wireless Communication

    Authors: Jingze Ding, Zijian Zhou, Wenyao Li, Chenbo Wang, Lifeng Lin, Bingli Jiao

    Abstract: Movable antenna (MA) provides an innovative way to arrange antennas that can contribute to improved signal quality and more effective interference management. This method is especially beneficial for co-frequency co-time full-duplex (CCFD) wireless communication, which struggles with self-interference (SI) that usually overpowers the desired incoming signals. By dynamically repositioning transmit/…

    Submitted 7 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: This paper has been submitted to IEEE Wireless Communications Letters

  14. arXiv:2401.10652  [pdf, other]

    cs.PF cs.DC cs.LG

    AutoChunk: Automated Activation Chunk for Memory-Efficient Long Sequence Inference

    Authors: Xuanlei Zhao, Shenggan Cheng, Guangyang Lu, Jiarui Fang, Haotian Zhou, Bin Jia, Ziming Liu, Yang You

    Abstract: Large deep learning models have achieved impressive performance across a range of applications. However, their large memory requirements, including parameter memory and activation memory, have become a significant challenge for their practical serving. While existing methods mainly address parameter memory, the importance of activation memory has been overlooked. Especially for long input sequence…

    Submitted 2 March, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: ICLR 2024

  15. arXiv:2401.09340  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.RO

    SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

    Authors: Baoxiong Jia, Yixin Chen, Huangyue Yu, Yan Wang, Xuesong Niu, Tengyu Liu, Qing Li, Siyuan Huang

    Abstract: 3D vision-language grounding, which focuses on aligning language with the 3D physical environment, stands as a cornerstone in the development of embodied agents. In comparison to recent advancements in the 2D domain, grounding language in 3D scenes faces several significant challenges: (i) the inherent complexity of 3D scenes due to the diverse object configurations, their rich attributes, and int…

    Submitted 6 March, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

    Comments: 21 pages

  16. arXiv:2312.15993  [pdf]

    cs.AI cs.RO eess.SY

    Adaptive Kalman-based hybrid car following strategy using TD3 and CACC

    Authors: Yuqi Zheng, Ruidong Yan, Bin Jia, Rui Jiang, Adriana TAPUS, Xiaojing Chen, Shiteng Zheng, Ying Shang

    Abstract: In autonomous driving, the hybrid strategy of deep reinforcement learning and cooperative adaptive cruise control (CACC) can fully utilize the advantages of the two algorithms and significantly improve the performance of car following. However, it is challenging for the traditional hybrid strategy based on fixed coefficients to adapt to mixed traffic flow scenarios, which may decrease the performa…

    Submitted 26 December, 2023; originally announced December 2023.

    Comments: 32 pages, 13 figures

  17. arXiv:2311.12871  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    An Embodied Generalist Agent in 3D World

    Authors: Jiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang

    Abstract: Leveraging massive knowledge from large language models (LLMs), recent machine learning models show notable successes in general-purpose task solving in diverse domains such as computer vision and robotics. However, several significant challenges remain: (i) most of these models rely on 2D images yet exhibit a limited capacity for 3D input; (ii) these models rarely explore the tasks inherently def…

    Submitted 9 May, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

    Comments: ICML 2024. The first four authors contribute equally. Project page: https://embodied-generalist.github.io

  18. arXiv:2311.00556  [pdf, other]

    cs.CV

    ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab

    Authors: Jieming Cui, Ziren Gong, Baoxiong Jia, Siyuan Huang, Zilong Zheng, Jianzhu Ma, Yixin Zhu

    Abstract: The challenge of replicating research results has posed a significant impediment to the field of molecular biology. The advent of modern intelligent systems has led to notable progress in various domains. Consequently, we embarked on an investigation of intelligent monitoring systems as a means of tackling the issue of the reproducibility crisis. Specifically, we first curate a comprehensive multi…

    Submitted 1 November, 2023; originally announced November 2023.

  19. arXiv:2309.08919  [pdf, other]

    cs.CV

    Pixel Adapter: A Graph-Based Post-Processing Approach for Scene Text Image Super-Resolution

    Authors: Wenyu Zhang, Xin Deng, Baojun Jia, Xingtong Yu, Yifan Chen, jin Ma, Qing Ding, Xinming Zhang

    Abstract: Current Scene text image super-resolution approaches primarily focus on extracting robust features, acquiring text information, and complex training strategies to generate super-resolution images. However, the upsampling module, which is crucial in the process of converting low-resolution images to high-resolution ones, has received little attention in existing works. To address this issue, we pro…

    Submitted 16 September, 2023; originally announced September 2023.

    Journal ref: ACM Multimedia 2023

  20. arXiv:2309.08457  [pdf, other]

    cs.RO

    Sim-to-Real Brush Manipulation using Behavior Cloning and Reinforcement Learning

    Authors: Biao Jia, Dinesh Manocha

    Abstract: Developing proficient brush manipulation capabilities in real-world scenarios is a complex and challenging endeavor, with wide-ranging applications in fields such as art, robotics, and digital design. In this study, we introduce an approach designed to bridge the gap between simulated environments and real-world brush manipulation. Our framework leverages behavior cloning and reinforcement learnin…

    Submitted 15 September, 2023; originally announced September 2023.

  21. arXiv:2308.16149  [pdf, other]

    cs.CL cs.AI cs.LG

    Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models

    Authors: Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, William Marshall, Gurpreet Gosal, Cynthia Liu, Zhiming Chen, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Xudong Han, Sondos Mahmoud Bsharat, Alham Fikri Aji, Zhiqiang Shen, Zhengzhong Liu, Natalia Vassilieva, Joel Hestness, Andy Hock, et al. (7 additional authors not shown)

    Abstract: We introduce Jais and Jais-chat, new state-of-the-art Arabic-centric foundation and instruction-tuned open generative large language models (LLMs). The models are based on the GPT-3 decoder-only architecture and are pretrained on a mixture of Arabic and English texts, including source code in various programming languages. With 13 billion parameters, they demonstrate better knowledge and reasoning…

    Submitted 29 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: Arabic-centric, foundation model, large-language model, LLM, generative model, instruction-tuned, Jais, Jais-chat

    MSC Class: 68T50 ACM Class: F.2.2; I.2.7

  22. arXiv:2308.14279  [pdf]

    cs.DS

    Sampling unknown large networks restricted by low sampling rates

    Authors: Bo Jiao

    Abstract: Graph sampling plays an important role in data mining for large networks. Specifically, larger networks often correspond to lower sampling rates. Under the situation, traditional traversal-based samplings for large networks usually have an excessive preference for densely-connected network core nodes. Aim at this issue, this paper proposes a sampling method for unknown networks at low sampling rat…

    Submitted 4 January, 2024; v1 submitted 27 August, 2023; originally announced August 2023.

    Comments: 25 pages, 11 figures

  23. arXiv:2308.12729  [pdf, other]

    cs.IR cs.LG

    Out of the Box Thinking: Improving Customer Lifetime Value Modelling via Expert Routing and Game Whale Detection

    Authors: Shijie Zhang, Xin Yan, Xuejiao Yang, Binfeng Jia, Shuangyang Wang

    Abstract: Customer lifetime value (LTV) prediction is essential for mobile game publishers trying to optimize the advertising investment for each user acquisition based on the estimated worth. In mobile games, deploying microtransactions is a simple yet effective monetization strategy, which attracts a tiny group of game whales who splurge on in-game purchases. The presence of such game whales may impede th…

    Submitted 24 August, 2023; originally announced August 2023.

  24. arXiv:2308.11181   

    cs.HC

    Temporal Interaction -- Bridging Time and Experience in Human-Computer Interaction

    Authors: Li He, Baixi Jiao, Yuxi Liu

    Abstract: Traditional static user interfaces (UI) have given way to dynamic systems that can intelligently adapt to and respond to users' changing needs. Temporal interaction is an emerging field in human-computer interaction (HCI), which refers to the study and design of UI that are capable of adapting and responding to the user's changing behavioral and emotional states. By comprehending and incorporating…

    Submitted 15 November, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: The content is not particularly relevant to the research

  25. arXiv:2308.10441  [pdf, other]

    cs.AI cs.CV

    X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events

    Authors: Bo Dai, Linge Wang, Baoxiong Jia, Zeyu Zhang, Song-Chun Zhu, Chi Zhang, Yixin Zhu

    Abstract: Intuitive physics is pivotal for human understanding of the physical world, enabling prediction and interpretation of events even in infancy. Nonetheless, replicating this level of intuitive physics in artificial intelligence (AI) remains a formidable challenge. This study introduces X-VoE, a comprehensive benchmark dataset, to assess AI agents' grasp of intuitive physics. Built on the development…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: 19 pages, 16 figures, selected for an Oral presentation at ICCV 2023. Project link: https://pku.ai/publication/intuitive2023iccv/

  26. arXiv:2305.10036  [pdf, other]

    cs.CL cs.CY

    Are You Copying My Model? Protecting the Copyright of Large Language Models for EaaS via Backdoor Watermark

    Authors: Wenjun Peng, Jingwei Yi, Fangzhao Wu, Shangxi Wu, Bin Zhu, Lingjuan Lyu, Binxing Jiao, Tong Xu, Guangzhong Sun, Xing Xie

    Abstract: Large language models (LLMs) have demonstrated powerful capabilities in both text understanding and generation. Companies have begun to offer Embedding as a Service (EaaS) based on these LLMs, which can benefit various natural language processing (NLP) tasks for customers. However, previous studies have shown that EaaS is vulnerable to model extraction attacks, which can cause significant losses f…

    Submitted 2 June, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted by ACL 2023

  27. arXiv:2305.01341  [pdf, other]

    cs.IT eess.SP

    Next-Generation Full Duplex Networking System Empowered by Reconfigurable Intelligent Surfaces

    Authors: Yingyang Chen, Yuncong Li, Miaowen Wen, Duoying Zhang, Bingli Jiao, Zhiguo Ding, Theodoros A. Tsiftsis, H. Vincent Poor

    Abstract: Full duplex (FD) radio has attracted extensive attention due to its co-time and co-frequency transceiving capability. However, the potential gain brought by FD radios is closely related to the management of self-interference (SI), which imposes high or even stringent requirements on SI cancellation (SIC) techniques. When the FD deployment evolves into next-generation mobile networking, the SI pro…

    Submitted 2 May, 2023; originally announced May 2023.

    Comments: 15 pages, 14 figures

  28. arXiv:2304.04487  [pdf, other]

    cs.CL cs.AI

    Inference with Reference: Lossless Acceleration of Large Language Models

    Authors: Nan Yang, Tao Ge, Liang Wang, Binxing Jiao, Daxin Jiang, Linjun Yang, Rangan Majumder, Furu Wei

    Abstract: We propose LLMA, an LLM accelerator to losslessly speed up Large Language Model (LLM) inference with references. LLMA is motivated by the observation that there are abundant identical text spans between the decoding result by an LLM and the reference that is available in many real world scenarios (e.g., retrieved documents). LLMA first selects a text span from the reference and copies its tokens t…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: 9 pages

  29. arXiv:2304.04321  [pdf, other]

    cs.AI cs.CL cs.CV cs.RO

    ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes

    Authors: Ran Gong, Jiangyong Huang, Yizhou Zhao, Haoran Geng, Xiaofeng Gao, Qingyang Wu, Wensi Ai, Ziheng Zhou, Demetri Terzopoulos, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang

    Abstract: Understanding the continuous states of objects is essential for task learning and planning in the real world. However, most existing task learning benchmarks assume discrete (e.g., binary) object goal states, which poses challenges for the learning of complex tasks and transferring learned policy from simulated environments to the real world. Furthermore, state discretization limits a robot's abil…

    Submitted 11 September, 2023; v1 submitted 9 April, 2023; originally announced April 2023.

    Comments: The first two authors contributed equally; 20 pages; 17 figures; project available: https://arnold-benchmark.github.io/; ICCV 2023

  30. arXiv:2302.12095  [pdf, other]

    cs.AI cs.CL cs.LG

    On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective

    Authors: Jindong Wang, Xixu Hu, Wenxin Hou, Hao Chen, Runkai Zheng, Yidong Wang, Linyi Yang, Haojun Huang, Wei Ye, Xiubo Geng, Binxin Jiao, Yue Zhang, Xing Xie

    Abstract: ChatGPT is a recent chatbot service released by OpenAI and is receiving increasing attention over the past few months. While evaluations of various aspects of ChatGPT have been done, its robustness, i.e., the performance to unexpected inputs, is still unclear to the public. Robustness is of particular concern in responsible AI, especially for safety-critical applications. In this paper, we conduct…

    Submitted 29 August, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: Highlighted paper at ICLR 2023 workshop on Trustworthy and Reliable Large-Scale Machine Learning Models; code is at: https://github.com/microsoft/robustlearn; more works: https://llm-eval.github.io/

  31. arXiv:2301.06015  [pdf, other]

    cs.CV

    Diffusion-based Generation, Optimization, and Planning in 3D Scenes

    Authors: Siyuan Huang, Zan Wang, Puhao Li, Baoxiong Jia, Tengyu Liu, Yixin Zhu, Wei Liang, Song-Chun Zhu

    Abstract: We introduce SceneDiffuser, a conditional generative model for 3D scene understanding. SceneDiffuser provides a unified model for solving scene-conditioned generation, optimization, and planning. In contrast to prior works, SceneDiffuser is intrinsically scene-aware, physics-based, and goal-oriented. With an iterative sampling strategy, SceneDiffuser jointly formulates the scene-aware generation,…

    Submitted 14 January, 2023; originally announced January 2023.

    Comments: 20 pages

  32. arXiv:2212.10192  [pdf, other]

    cs.CL

    Adam: Dense Retrieval Distillation with Adaptive Dark Examples

    Authors: Chang Liu, Chongyang Tao, Xiubo Geng, Tao Shen, Dongyan Zhao, Can Xu, Binxing Jiao, Daxin Jiang

    Abstract: To improve the performance of the dual-encoder retriever, one effective approach is knowledge distillation from the cross-encoder ranker. Existing works construct the candidate passages following the supervised learning setting where a query is paired with a positive passage and a batch of negatives. However, through empirical observation, we find that even the hard negatives from advanced methods…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: 9 pages, 2 figures

  33. arXiv:2212.03533  [pdf, other]

    cs.CL cs.IR

    Text Embeddings by Weakly-Supervised Contrastive Pre-training

    Authors: Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei

    Abstract: This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a wide range of tasks. The model is trained in a contrastive manner with weak supervision signals from our curated large-scale text pair dataset (called CCPairs). E5 can be readily used as a general-purpose embedding model for any tasks requiring a single-vector representation of texts such as retrieval, clu…

    Submitted 22 February, 2024; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: 17 pages, v2 fixes the SummEval numbers

  34. arXiv:2212.02398  [pdf, other]

    cs.CV

    Generalizable Person Re-Identification via Viewpoint Alignment and Fusion

    Authors: Bingliang Jiao, Lingqiao Liu, Liying Gao, Guosheng Lin, Ruiqi Wu, Shizhou Zhang, Peng Wang, Yanning Zhang

    Abstract: In the current person Re-identification (ReID) methods, most domain generalization works focus on dealing with style differences between domains while largely ignoring unpredictable camera view change, which we identify as another major factor leading to a poor generalization of ReID methods. To tackle the viewpoint change, this work proposes to use a 3D dense pose estimation model and a texture m…

    Submitted 5 December, 2022; originally announced December 2022.

  35. arXiv:2211.15402  [pdf, other]

    cs.CV cs.AI

    Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation

    Authors: Jiangyong Huang, William Yicheng Zhu, Baoxiong Jia, Zan Wang, Xiaojian Ma, Qing Li, Siyuan Huang

    Abstract: Current computer vision models, unlike the human visual system, cannot yet achieve general-purpose visual understanding. Existing efforts to create a general vision model are limited in the scope of assessed tasks and offer no overarching framework to perform them holistically. We present a new comprehensive benchmark, General-purpose Visual Understanding Evaluation (G-VUE), covering the full spec…

    Submitted 28 November, 2022; originally announced November 2022.

  36. arXiv:2211.11275  [pdf, other]

    eess.AS cs.AI cs.CL cs.CV cs.SD

    VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning

    Authors: Qiushi Zhu, Long Zhou, Ziqiang Zhang, Shujie Liu, Binxing Jiao, Jie Zhang, Lirong Dai, Daxin Jiang, Jinyu Li, Furu Wei

    Abstract: Although speech is a simple and effective way for humans to communicate with the outside world, a more realistic speech interaction contains multimodal information, e.g., vision, text. How to design a unified framework to integrate different modal information and leverage different resources (e.g., visual-audio pairs, audio-text pairs, unlabeled speech, and unlabeled text) to facilitate speech rep…

    Submitted 19 May, 2023; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: 11 pages, Accepted by IEEE Transactions on Multimedia

  37. arXiv:2210.08990  [pdf, other]

    cs.CV cs.AI cs.LG

    Improving Object-centric Learning with Query Optimization

    Authors: Baoxiong Jia, Yu Liu, Siyuan Huang

    Abstract: The ability to decompose complex natural scenes into meaningful object-centric abstractions lies at the core of human perception and reasoning. In the recent culmination of unsupervised object-centric learning, the Slot-Attention module has played an important role with its simple yet effective design and fostered many powerful variants. These methods, however, have been exceedingly difficult to t…

    Submitted 10 February, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: Published as a conference paper at ICLR 2023

  38. arXiv:2210.08809  [pdf, other]

    cs.AI

    Effective and Efficient Query-aware Snippet Extraction for Web Search

    Authors: Jingwei Yi, Fangzhao Wu, Chuhan Wu, Xiaolong Huang, Binxing Jiao, Guangzhong Sun, Xing Xie

    Abstract: Query-aware webpage snippet extraction is widely used in search engines to help users better understand the content of the returned webpages before clicking. Although important, it is very rarely studied. In this paper, we propose an effective query-aware webpage snippet extraction method named DeepQSE, aiming to select a few sentences which can best summarize the webpage content in the context of…

    Submitted 27 October, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: Accepted by EMNLP2022

  39. arXiv:2210.03929  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    EgoTaskQA: Understanding Human Tasks in Egocentric Videos

    Authors: Baoxiong Jia, Ting Lei, Song-Chun Zhu, Siyuan Huang

    Abstract: Understanding human tasks through video observations is an essential capability of intelligent agents. The challenges of such capability lie in the difficulty of generating a detailed understanding of situated actions, their effects on object states (i.e., state changes), and their causal dependencies. These challenges are further aggravated by the natural parallelism from multi-tasking and partia…

    Submitted 8 October, 2022; originally announced October 2022.

    Comments: Published at NeurIPS Track on Datasets and Benchmarks 2022

  40. arXiv:2209.04128  [pdf, other]

    cs.RO

    Modelling Power Consumptions for Multi-rotor UAVs

    Authors: Hao Gong, Baoqi Huang, Bing Jia, Hansu Dai

    Abstract: Unmanned aerial vehicles (UAVs) have various advantages, but their practical applications are influenced by their limited energy. Therefore, it is important to manage their power consumption and also important to establish corresponding power consumption models. However, most of existing works either establish theoretical power consumption models for fixed-wing UAVs and single-rotor UAVs, or provi…

    Submitted 9 September, 2022; originally announced September 2022.

  41. arXiv:2208.14754  [pdf, other]

    cs.IR

    LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval

    Authors: Tao Shen, Xiubo Geng, Chongyang Tao, Can Xu, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang

    Abstract: In large-scale retrieval, the lexicon-weighting paradigm, learning weighted sparse representations in vocabulary space, has shown promising results with high quality and low latency. Despite it deeply exploiting the lexicon-representing capability of pre-trained language models, a crucial gap remains between language modeling and lexicon-weighting retrieval -- the former preferring certain or low-…

    Submitted 4 June, 2023; v1 submitted 31 August, 2022; originally announced August 2022.

    Comments: Appeared at ICLR 2023

  42. arXiv:2208.13661  [pdf, other]

    cs.CL

    LED: Lexicon-Enlightened Dense Retriever for Large-Scale Retrieval

    Authors: Kai Zhang, Chongyang Tao, Tao Shen, Can Xu, Xiubo Geng, Binxing Jiao, Daxin Jiang

    Abstract: Retrieval models based on dense representations in semantic space have become an indispensable branch for first-stage retrieval. These retrievers benefit from surging advances in representation learning towards compressive global sequence-level embeddings. However, they are prone to overlook local salient phrases and entity mentions in texts, which usually play pivot roles in first-stage retrieval…

    Submitted 2 March, 2023; v1 submitted 29 August, 2022; originally announced August 2022.

    Comments: 14 pages, 6 tables, 4 figures. WWW 2023

  43. arXiv:2207.07869  [pdf, other]

    cs.CV

    CA-SpaceNet: Counterfactual Analysis for 6D Pose Estimation in Space

    Authors: Shunli Wang, Shuaibing Wang, Bo Jiao, Dingkang Yang, Liuzhen Su, Peng Zhai, Chixiao Chen, Lihua Zhang

    Abstract: Reliable and stable 6D pose estimation of uncooperative space objects plays an essential role in on-orbit servicing and debris removal missions. Considering that the pose estimator is sensitive to background interference, this paper proposes a counterfactual analysis framework named CASpaceNet to complete robust 6D pose estimation of the spaceborne targets under complicated background. Specificall…

    Submitted 16 July, 2022; originally announced July 2022.

    Comments: 8 pages, 6 figures, IROS-2022 conference paper

    ACM Class: I.4.9; I.2.9

  44. arXiv:2207.02578  [pdf, other]

    cs.IR

    SimLM: Pre-training with Representation Bottleneck for Dense Passage Retrieval

    Authors: Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei

    Abstract: In this paper, we propose SimLM (Similarity matching with Language Model pre-training), a simple yet effective pre-training method for dense passage retrieval. It employs a simple bottleneck architecture that learns to compress the passage information into a dense vector through self-supervised pre-training. We use a replaced language modeling objective, which is inspired by ELECTRA, to improve th…

    Submitted 12 May, 2023; v1 submitted 6 July, 2022; originally announced July 2022.

    Comments: Accepted to ACL 2023

  45. arXiv:2206.08063  [pdf, other]

    cs.IR cs.CL

    Towards Robust Ranker for Text Retrieval

    Authors: Yucheng Zhou, Tao Shen, Xiubo Geng, Chongyang Tao, Can Xu, Guodong Long, Binxing Jiao, Daxin Jiang

    Abstract: A ranker plays an indispensable role in the de facto 'retrieval & rerank' pipeline, but its training still lags behind -- learning from moderate negatives or/and serving as an auxiliary module for a retriever. In this work, we first identify two major barriers to a robust ranker, i.e., inherent label noises caused by a well-trained retriever and non-ideal negatives sampled for a high-capable ranke…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: 11 pages of main content, 4 tables, 3 figures

  46. arXiv:2206.05895  [pdf, other]

    cs.LG cs.CL

    Latent Diffusion Energy-Based Model for Interpretable Text Modeling

    Authors: Peiyu Yu, Sirui Xie, Xiaojian Ma, Baoxiong Jia, Bo Pang, Ruiqi Gao, Yixin Zhu, Song-Chun Zhu, Ying Nian Wu

    Abstract: Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interests in generative modeling. Fueled by its flexibility in the formulation and strong modeling power of the latent space, recent works built upon it have made interesting attempts aiming at the interpretability of text modeling. However, latent space EBMs also inherit some flaws from EBMs in data spa…

    Submitted 4 October, 2023; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: ICML 2022

  47. arXiv:2206.00277  [pdf, other]

    cs.LG cs.AI

    Task-Specific Expert Pruning for Sparse Mixture-of-Experts

    Authors: Tianyu Chen, Shaohan Huang, Yuan Xie, Binxing Jiao, Daxin Jiang, Haoyi Zhou, Jianxin Li, Furu Wei

    Abstract: The sparse Mixture-of-Experts (MoE) model is powerful for large-scale pre-training and has achieved promising results due to its model capacity. However, with trillions of parameters, MoE is hard to be deployed on cloud or mobile environment. The inference of MoE requires expert parallelism, which is not hardware-friendly and communication expensive. Especially for resource-limited downstream task…

    Submitted 1 June, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: under review

  48. arXiv:2206.00216  [pdf, other]

    cs.CR cs.CL

    THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption

    Authors: Tianyu Chen, Hangbo Bao, Shaohan Huang, Li Dong, Binxing Jiao, Daxin Jiang, Haoyi Zhou, Jianxin Li, Furu Wei

    Abstract: As more and more pre-trained language models adopt on-cloud deployment, the privacy issues grow quickly, mainly for the exposure of plain-text user data (e.g., search history, medical record, bank account). Privacy-preserving inference of transformer models is on the demand of cloud service users. To protect privacy, it is an attractive choice to compute only with ciphertext in homomorphic encrypt…

    Submitted 1 June, 2022; v1 submitted 31 May, 2022; originally announced June 2022.

    Comments: Findings of ACL 2022

  49. arXiv:2111.12990  [pdf, other]

    cs.AI cs.CV cs.LG

    Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning

    Authors: Chi Zhang, Sirui Xie, Baoxiong Jia, Ying Nian Wu, Song-Chun Zhu, Yixin Zhu

    Abstract: Is intelligence realized by connectionist or classicist? While connectionist approaches have achieved superhuman performance, there has been growing evidence that such task-specific superiority is particularly fragile in systematic generalization. This observation lies in the central debate between connectionist and classicist, wherein the latter continually advocates an algebraic treatment in cog…

    Submitted 20 July, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: ECCV 2022 paper. Supplementary: http://wellyzhang.github.io/attach/eccv22zhang_alans_supp.pdf Project: http://wellyzhang.github.io/project/alans.html

  50. arXiv:2108.09193  [pdf, other]

    cs.CL

    Smart Bird: Learnable Sparse Attention for Efficient and Effective Transformer

    Authors: Chuhan Wu, Fangzhao Wu, Tao Qi, Binxing Jiao, Daxin Jiang, Yongfeng Huang, Xing Xie

    Abstract: Transformer has achieved great success in NLP. However, the quadratic complexity of the self-attention mechanism in Transformer makes it inefficient in handling long sequences. Many existing works explore to accelerate Transformers by computing sparse self-attention instead of a dense one, which usually attends to tokens at certain positions or randomly selected tokens. However, manually selected…

    Submitted 2 September, 2021; v1 submitted 20 August, 2021; originally announced August 2021.