Showing 1–50 of 1,075 results for author: Hu, X

  1. arXiv:2405.05498  [pdf, other]

    cs.SD eess.AS

    The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge

    Authors: Jingguang Tian, Shuaishuai Ye, Shunfei Chen, Yang Xiang, Zhaohui Yin, Xinhui Hu, Xinkang Xu

    Abstract: This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we develop end-to-end speaker diarization models that notably decrease the diarization error rate (DER) by 49.58% compared to the official baseline on t…

    Submitted 8 May, 2024; originally announced May 2024.

  2. arXiv:2405.05155  [pdf, other]

    cs.CE

    An efficient truncation scheme for Eulerian and total Lagrangian SPH methods

    Authors: Zhentong Wang, Chi Zhang, Oskar J. Haidn, Xiangyu Hu

    Abstract: In the smoothed particle hydrodynamics (SPH) method, the particle-based approximations are implemented via kernel functions, and the evaluation of performance involves two key criteria: numerical accuracy and computational efficiency. In the SPH community, the Wendland kernel reigns as the prevailing choice due to its commendable accuracy and reasonable computational efficiency. Nevertheless, there ex…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 38 pages and 14 figures

  3. arXiv:2405.04861  [pdf, other]

    cs.SE

    Insights into Deep Learning Refactoring: Bridging the Gap Between Practices and Expectations

    Authors: SiQi Wang, Xing Hu, Bei Wang, WenXin Yao, Xin Xia, XingYu Wang

    Abstract: With the rapid development of deep learning, the implementation of intricate algorithms and substantial data processing have become standard elements of deep learning projects. As a result, the code has become progressively complex as the software evolves, which is difficult to maintain and understand. Existing studies have investigated the impact of refactoring on software quality within traditio…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 24 pages, 18 figures

  4. arXiv:2405.03711  [pdf, other]

    cs.LG cs.AI cs.NE eess.SY

    Guidance Design for Escape Flight Vehicle Using Evolution Strategy Enhanced Deep Reinforcement Learning

    Authors: Xiao Hu, Tianshu Wang, Min Gong, Shaoshi Yang

    Abstract: Guidance commands of flight vehicles are a series of data sets with fixed time intervals, thus guidance design constitutes a sequential decision problem and satisfies the basic conditions for using deep reinforcement learning (DRL). In this paper, we consider the scenario where the escape flight vehicle (EFV) generates guidance commands based on DRL and the pursuit flight vehicle (PFV) generates g…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 13 pages, 13 figures, accepted to appear on IEEE Access, Mar. 2024

    Journal ref: IEEE Access, vol. 12, pp. 48210-48222, Mar. 2024

  5. Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning

    Authors: Yubo Mai, Zhipeng Gao, Xing Hu, Lingfeng Bao, Yu Liu, Jianling Sun

    Abstract: Inspired by the great potential of Large Language Models (LLMs) for solving complex coding tasks, in this paper, we propose a novel approach, named Code2API, to automatically perform APIzation for Stack Overflow code snippets. Code2API does not require additional model training or any manual crafting rules and can be easily deployed on personal computers without relying on other external tools. Sp…

    Submitted 6 May, 2024; originally announced May 2024.

  6. arXiv:2405.03387  [pdf, ps, other]

    cs.CL

    The high dimensional psychological profile and cultural bias of ChatGPT

    Authors: Hang Yuan, Zhongyue Che, Shao Li, Yue Zhang, Xiaomeng Hu, Siyang Luo

    Abstract: Given the rapid advancement of large-scale language models, artificial intelligence (AI) models, like ChatGPT, are playing an increasingly prominent role in human society. However, to ensure that artificial intelligence models benefit human society, we must first fully understand the similarities and differences between the human-like characteristics exhibited by artificial intelligence models and…

    Submitted 6 May, 2024; originally announced May 2024.

  7. arXiv:2405.03273  [pdf, other]

    cs.RO

    Evaluation of Drivers' Interaction Ability at Social Scenarios: A Process-Based Framework

    Authors: Jiaqi Liu, Peng Hang, Xiangwang Hu, Jian Sun

    Abstract: Assessing drivers' interaction capabilities is crucial for understanding human driving behavior and enhancing the interactive abilities of autonomous vehicles. In scenarios involving strong interaction, existing metrics focused on interaction outcomes struggle to capture the evolutionary process of drivers' interactive behaviors, making it challenging for autonomous vehicles to dynamically assess…

    Submitted 6 May, 2024; originally announced May 2024.

  8. arXiv:2405.03178  [pdf, other]

    cs.SD eess.AS

    POPDG: Popular 3D Dance Generation with PopDanceSet

    Authors: Zhenye Luo, Min Ren, Xuecai Hu, Yongzhen Huang, Li Yao

    Abstract: Generating dances that are both lifelike and well-aligned with music continues to be a challenging task in the cross-modal domain. This paper introduces PopDanceSet, the first dataset tailored to the preferences of young audiences, enabling the generation of aesthetically oriented dances. It also surpasses the AIST++ dataset in music genre diversity and the intricacy and depth of dance movements. M…

    Submitted 6 May, 2024; originally announced May 2024.

  9. Easy over Hard: A Simple Baseline for Test Failures Causes Prediction

    Authors: Zhipeng Gao, Zhipeng Xue, Xing Hu, Weiyi Shang, Xin Xia

    Abstract: The test failure causes analysis is critical since it determines the subsequent way of handling different types of bugs, which is the prerequisite to get the bugs properly analyzed and fixed. After a test case fails, software testers have to inspect the test execution logs line by line to identify its root cause. However, manual root cause determination is often tedious and time-consuming, which c…

    Submitted 5 May, 2024; originally announced May 2024.

  10. arXiv:2405.02843  [pdf, other]

    cs.CV

    Residual-Conditioned Optimal Transport: Towards Structure-preserving Unpaired and Paired Image Restoration

    Authors: Xiaole Tang, Xin Hu, Xiang Gu, Jian Sun

    Abstract: Deep learning-based image restoration methods have achieved promising performance. However, how to faithfully preserve the structure of the original image remains challenging. To address this challenge, we propose a novel Residual-Conditioned Optimal Transport (RCOT) approach, which models the image restoration as an optimal transport (OT) problem for both unpaired and paired settings, integrating…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  11. arXiv:2405.01775  [pdf, other]

    cs.AR cs.LG

    Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design

    Authors: Jian Meng, Yuan Liao, Anupreetham Anupreetham, Ahmed Hasssan, Shixing Yu, Han-sok Suh, Xiaofeng Hu, Jae-sun Seo

    Abstract: The development of model compression is continuously motivated by the evolution of various neural network accelerators with ASIC or FPGA. On the algorithm side, the ultimate goal of quantization or pruning is accelerating the expensive DNN computations on low-power hardware. However, such a "design-and-deploy" workflow faces under-explored challenges in the current hardware-algorithm co-design com…

    Submitted 6 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted for publication at MLSys 2024

  12. arXiv:2405.00244  [pdf, other]

    cs.CV

    Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network

    Authors: Yong Shu, Liquan Shen, Xiangyu Hu, Mengyao Li, Zihao Zhou

    Abstract: As an important and practical way to obtain high dynamic range (HDR) video, HDR video reconstruction from sequences with alternating exposures is still less explored, mainly due to the lack of large-scale real-world datasets. Existing methods are mostly trained on synthetic datasets, which perform poorly in real scenes. In this work, to facilitate the development of real-world HDR video reconstruc…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: This paper has been accepted by CVPR 2024

  13. arXiv:2404.17964  [pdf, other]

    cs.SE

    Automating Zero-Shot Patch Porting for Hard Forks

    Authors: Shengyi Pan, You Wang, Zhongxin Liu, Xing Hu, Xin Xia, Shanping Li

    Abstract: Forking is a typical way of code reuse, which provides a simple way for developers to create a variant software (denoted as hard fork) by copying and modifying an existing codebase. Despite the benefits, forking also leads to duplicate efforts in software maintenance. Developers need to port patches across the hard forks to address similar bugs or implement similar features. Due to the divergen…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Accepted by ISSTA 2024

  14. arXiv:2404.17667  [pdf, other]

    eess.SP cs.LG

    SiamQuality: A ConvNet-Based Foundation Model for Imperfect Physiological Signals

    Authors: Cheng Ding, Zhicheng Guo, Zhaoliang Chen, Randall J Lee, Cynthia Rudin, Xiao Hu

    Abstract: Foundation models, especially those using transformers as backbones, have gained significant popularity, particularly in language and language-vision tasks. However, large foundation models are typically trained on high-quality data, which poses a significant challenge, given the prevalence of poor-quality real-world data. This challenge is more pronounced for developing foundation models for phys…

    Submitted 26 April, 2024; originally announced April 2024.

  15. arXiv:2404.16781  [pdf, other]

    cs.CV

    Registration by Regression (RbR): a framework for interpretable and flexible atlas registration

    Authors: Karthik Gopinath, Xiaoling Hu, Malte Hoffmann, Oula Puonti, Juan Eugenio Iglesias

    Abstract: In human neuroimaging studies, atlas registration enables mapping MRI scans to a common coordinate frame, which is necessary to aggregate data from multiple subjects. Machine learning registration methods have achieved excellent speed and accuracy but lack interpretability. More recently, keypoint-based methods have been proposed to tackle this issue, but their accuracy is still subpar, particular…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 11 pages, 3 figures

  16. arXiv:2404.15353  [pdf, other]

    eess.SP cs.AI cs.LG

    SQUWA: Signal Quality Aware DNN Architecture for Enhanced Accuracy in Atrial Fibrillation Detection from Noisy PPG Signals

    Authors: Runze Yan, Cheng Ding, Ran Xiao, Aleksandr Fedorov, Randall J Lee, Fadi Nahab, Xiao Hu

    Abstract: Atrial fibrillation (AF), a common cardiac arrhythmia, significantly increases the risk of stroke, heart disease, and mortality. Photoplethysmography (PPG) offers a promising solution for continuous AF monitoring, due to its cost efficiency and integration into wearable devices. Nonetheless, PPG signals are susceptible to corruption from motion artifacts and other factors often encountered in ambu…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 15 pages; 9 figures; 2024 Conference on Health, Inference, and Learning (CHIL)

  17. arXiv:2404.11225  [pdf, other]

    cs.CL cs.AI

    In-Context Learning State Vector with Inner and Momentum Optimization

    Authors: Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, Min Zhang

    Abstract: Large Language Models (LLMs) have exhibited an impressive ability to perform In-Context Learning (ICL) from only a few examples. Recent works have indicated that the functions learned by ICL can be represented through compressed vectors derived from the transformer. However, the working mechanisms and optimization of these vectors are yet to be thoroughly explored. In this paper, we address this g…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 17 pages, 7 figures, 5 tables

  18. arXiv:2404.09438  [pdf, other]

    math.OC cs.LG stat.ML

    Developing Lagrangian-based Methods for Nonsmooth Nonconvex Optimization

    Authors: Nachuan Xiao, Kuangyu Ding, Xiaoyin Hu, Kim-Chuan Toh

    Abstract: In this paper, we consider the minimization of a nonsmooth nonconvex objective function $f(x)$ over a closed convex subset $\mathcal{X}$ of $\mathbb{R}^n$, with additional nonsmooth nonconvex constraints $c(x) = 0$. We develop a unified framework for developing Lagrangian-based methods, which takes a single-step update to the primal variables by some subgradient methods in each iteration. These su…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 30 pages, 4 figures

  19. arXiv:2404.08563  [pdf, other]

    cs.RO

    FusionPortableV2: A Unified Multi-Sensor Dataset for Generalized SLAM Across Diverse Platforms and Scalable Environments

    Authors: Hexiang Wei, Jianhao Jiao, Xiangcheng Hu, Jingwen Yu, Xupeng Xie, Jin Wu, Yilong Zhu, Yuxuan Liu, Lujia Wang, Ming Liu

    Abstract: Simultaneous Localization and Mapping (SLAM) technology has been widely applied in various robotic scenarios, from rescue operations to autonomous driving. However, the generalization of SLAM algorithms remains a significant challenge, as current datasets often lack scalability in terms of platforms and environments. To address this limitation, we present FusionPortableV2, a multi-sensor SLAM data…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 20 pages, 17 figures, 7 tables. Submitted for IJRR dataset paper

  20. arXiv:2404.06483  [pdf, other]

    cs.CV

    RhythmMamba: Fast Remote Physiological Measurement with Arbitrary Length Videos

    Authors: Bochao Zou, Zizheng Guo, Xiaocheng Hu, Huimin Ma

    Abstract: Remote photoplethysmography (rPPG) is a non-contact method for detecting physiological signals from facial videos, holding great potential in various applications such as healthcare, affective computing, and anti-spoofing. Existing deep learning methods struggle to address two core issues of rPPG simultaneously: extracting weak rPPG signals from video segments with large spatiotemporal redundancy…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2402.12788

  21. arXiv:2404.06114  [pdf, other]

    cs.DC cs.AI

    Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

    Authors: Feng Liang, Zhen Zhang, Haifeng Lu, Victor C. M. Leung, Yanyi Guo, Xiping Hu

    Abstract: With the rapid growth in the volume of data sets, models, and devices in the domain of deep learning, there is increasing attention on large-scale distributed deep learning. In contrast to traditional distributed deep learning, the large-scale scenario poses new challenges that include fault tolerance, scalability of algorithms and infrastructures, and heterogeneity in data sets, models, and resou…

    Submitted 9 April, 2024; originally announced April 2024.

  22. arXiv:2404.03194  [pdf, other]

    cs.DB

    Reservoir Sampling over Joins

    Authors: Binyang Dai, Xiao Hu, Ke Yi

    Abstract: Sampling over joins is a fundamental task in large-scale data analytics. Instead of computing the full join results, which could be massive, a uniform sample of the join results would suffice for many purposes, such as answering analytical queries or training machine learning models. In this paper, we study the problem of how to maintain a random sample over joins while the tuples are streaming in…

    Submitted 9 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  23. arXiv:2404.02439  [pdf, other]

    cs.HC

    A neuroergonomics model to evaluating nuclear power plants operators' performance under heat stress driven by ECG time-frequency spectrums and fNIRS prefrontal cortex network: a CNN-GAT fusion model

    Authors: Yan Zhang, Ming Jia, Meng Li, JianYu Wang, XiangMin Hu, ZhiHui Xu, Tao Chen

    Abstract: Operators experience complicated physiological and psychological states when exposed to extreme heat stress, which can impair cognitive function and decrease performance significantly, ultimately leading to severe secondary disasters. Therefore, there is an urgent need for a feasible technique to identify their abnormal states to enhance the reliability of human-cybernetics systems. With the advan…

    Submitted 2 April, 2024; originally announced April 2024.

  24. arXiv:2403.18381  [pdf, other]

    cs.CL cs.AI

    Improving Attributed Text Generation of Large Language Models via Preference Learning

    Authors: Dongfang Li, Zetian Sun, Baotian Hu, Zhenyu Liu, Xinshuo Hu, Xuebo Liu, Min Zhang

    Abstract: Large language models have been widely adopted in natural language processing, yet they face the challenge of generating unreliable content. Recent works aim to reduce misinformation and hallucinations by resorting to attribution as a means to provide evidence (i.e., citations). However, current attribution methods usually focus on the retrieval stage and automatic evaluation that neglect mirrorin…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 23 pages, 15 tables, 2 figures

  25. arXiv:2403.18209  [pdf, other]

    cs.LG cs.AI cs.RO

    Long and Short-Term Constraints Driven Safe Reinforcement Learning for Autonomous Driving

    Authors: Xuemin Hu, Pan Chen, Yijun Wen, Bo Tang, Long Chen

    Abstract: Reinforcement learning (RL) has been widely used in decision-making tasks, but it cannot guarantee the agent's safety in the training process due to the requirements of interaction with the environment, which seriously limits its industrial applications such as autonomous driving. Safe RL methods are developed to handle this issue by constraining the expected safety violation costs as a training o…

    Submitted 26 March, 2024; originally announced March 2024.

  26. arXiv:2403.16649  [pdf, other]

    cs.AI

    CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment

    Authors: Feiteng Fang, Liang Zhu, Min Yang, Xi Feng, Jinchang Hou, Qixuan Zhao, Chengming Li, Xiping Hu, Ruifeng Xu

    Abstract: Reinforcement learning from human feedback (RLHF) is a crucial technique in aligning large language models (LLMs) with human preferences, ensuring these LLMs behave in beneficial and comprehensible ways to users. However, a longstanding challenge in human alignment techniques based on reinforcement learning lies in their inherent complexity and difficulty in training. To address this challenge, we…

    Submitted 26 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  27. arXiv:2403.16437  [pdf, other]

    cs.SE cs.CL

    Evaluating Large Language Models with Runtime Behavior of Program Execution

    Authors: Junkai Chen, Zhiyuan Pan, Xing Hu, Zhenhao Li, Ge Li, Xin Xia

    Abstract: Large language models for code (i.e., code LLMs) have shown strong code understanding and generation capabilities. To evaluate the capabilities of code LLMs in various aspects, many benchmarks have been proposed (e.g., HumanEval and ClassEval). Code reasoning is one of the most essential abilities of code LLMs, but existing benchmarks for code reasoning are not sufficient. Typically, they focus on…

    Submitted 25 March, 2024; originally announced March 2024.

  28. arXiv:2403.16312  [pdf, other]

    cs.DB

    On Reporting Durable Patterns in Temporal Proximity Graphs

    Authors: Pankaj K. Agarwal, Xiao Hu, Stavros Sintos, Jun Yang

    Abstract: Finding patterns in graphs is a fundamental problem in databases and data mining. In many applications, graphs are temporal and evolve over time, so we are interested in finding durable patterns, such as triangles and paths, which persist over a long time. While there has been work on finding durable simple patterns, existing algorithms do not have provable guarantees and run in strictly super-lin…

    Submitted 24 March, 2024; originally announced March 2024.

    Journal ref: PODS 2024

  29. arXiv:2403.15361  [pdf, other]

    cs.CV cs.LG

    Learning Topological Representations for Deep Image Understanding

    Authors: Xiaoling Hu

    Abstract: In many scenarios, especially biomedical applications, the correct delineation of complex fine-scaled structures such as neurons, tissues, and vessels is critical for downstream analysis. Despite the strong predictive power of deep learning methods, they do not provide a satisfactory representation of these structures, thus creating significant barriers in scalable annotation and downstream analys…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Ph.D. thesis from Stony Brook University. This thesis includes works arXiv:1906.05404, arXiv:2110.08335, arXiv:2112.07812, arXiv:2103.09992, arXiv:2206.01742

  30. arXiv:2403.14668  [pdf, other]

    cs.CY cs.AI cs.CL cs.LG

    Predicting Learning Performance with Large Language Models: A Study in Adult Literacy

    Authors: Liang Zhang, Jionghao Lin, Conrad Borchers, John Sabatini, John Hollander, Meng Cao, Xiangen Hu

    Abstract: Intelligent Tutoring Systems (ITSs) have significantly enhanced adult literacy training, a key factor for societal participation, employment opportunities, and lifelong learning. Our study investigates the application of advanced AI models, including Large Language Models (LLMs) like GPT-4, for predicting learning performance in adult literacy programs in ITSs. This research is motivated by the po…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 26th International Conference on Human-Computer Interaction

  31. arXiv:2403.13996  [pdf, other]

    eess.IV cs.CV

    P-Count: Persistence-based Counting of White Matter Hyperintensities in Brain MRI

    Authors: Xiaoling Hu, Annabel Sorby-Adams, Frederik Barkhof, W Taylor Kimberly, Oula Puonti, Juan Eugenio Iglesias

    Abstract: White matter hyperintensities (WMH) are a hallmark of cerebrovascular disease and multiple sclerosis. Automated WMH segmentation methods enable quantitative analysis via estimation of total lesion load, spatial distribution of lesions, and number of lesions (i.e., number of connected components after thresholding), all of which are correlated with patient outcomes. While the two former measures ca…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 11 pages, 4 figures

  32. arXiv:2403.12686  [pdf, other]

    cs.CV cs.MM cs.RO

    WaterVG: Waterway Visual Grounding based on Text-Guided Vision and mmWave Radar

    Authors: Runwei Guan, Liye Jia, Fengyufan Yang, Shanliang Yao, Erick Purwanto, Xiaohui Zhu, Eng Gee Lim, Jeremy Smith, Ka Lok Man, Xuming Hu, Yutao Yue

    Abstract: The perception of waterways based on human intent is significant for autonomous navigation and operations of Unmanned Surface Vehicles (USVs) in water environments. Inspired by visual grounding, we introduce WaterVG, the first visual grounding dataset designed for USV-based waterway perception based on human prompts. WaterVG encompasses prompts describing multiple targets, with annotations at the…

    Submitted 4 April, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

    Comments: 10 pages, 10 figures

  33. arXiv:2403.12077  [pdf, other]

    cs.CL cs.AI cs.IR

    Evaluating Robustness of Generative Search Engine on Adversarial Factual Questions

    Authors: Xuming Hu, Xiaochuan Li, Junzhe Chen, Yinghui Li, Yangning Li, Xiaoguang Li, Yasheng Wang, Qun Liu, Lijie Wen, Philip S. Yu, Zhijiang Guo

    Abstract: Generative search engines have the potential to transform how people seek information online, but generated responses from existing large language models (LLMs)-backed generative search engines may not always be accurate. Nonetheless, retrieval-augmented generation exacerbates safety concerns, since adversaries may successfully evade the entire system by subtly manipulating the most vulnerable par…

    Submitted 25 February, 2024; originally announced March 2024.

    Comments: 21 pages, 7 figures, 4 tables

  34. arXiv:2403.11561  [pdf, other]

    cs.CV

    Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection

    Authors: Liren He, Zhengkai Jiang, Jinlong Peng, Liang Liu, Qiangang Du, Xiaobin Hu, Wenbing Zhu, Mingmin Chi, Yabiao Wang, Chengjie Wang

    Abstract: In the field of multi-class anomaly detection, reconstruction-based methods derived from single-class anomaly detection face the well-known challenge of "learning shortcuts", wherein the model fails to learn the patterns of normal samples as it should, opting instead for shortcuts such as identity mapping or artificial noise elimination. Consequently, the model becomes unable to reconstruct genu…

    Submitted 18 March, 2024; originally announced March 2024.

  35. arXiv:2403.11289  [pdf, other]

    cs.RO

    ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models

    Authors: Siyuan Huang, Iaroslav Ponomarenko, Zhengkai Jiang, Xiaoqi Li, Xiaobin Hu, Peng Gao, Hongsheng Li, Hao Dong

    Abstract: The integration of Multimodal Large Language Models (MLLMs) with robotic systems has significantly enhanced the ability of robots to interpret and act upon natural language instructions. Despite these advancements, conventional MLLMs are typically trained on generic image-text pairs, lacking essential robotics knowledge such as affordances and physical knowledge, which hampers their efficacy in ma…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Code and dataset will be made publicly available at https://github.com/SiyuanHuang95/ManipVQA

  36. arXiv:2403.09732  [pdf, other]

    cs.CL cs.AI

    PET-SQL: A Prompt-enhanced Two-stage Text-to-SQL Framework with Cross-consistency

    Authors: Zhishuai Li, Xiang Wang, Jingjing Zhao, Sun Yang, Guoqing Du, Xiaoru Hu, Bin Zhang, Yuxiao Ye, Ziyue Li, Rui Zhao, Hangyu Mao

    Abstract: Recent advancements in Text-to-SQL (Text2SQL) emphasize stimulating the large language models (LLM) on in-context learning, achieving significant results. Nevertheless, they face challenges when dealing with verbose database information and complex user intentions. This paper presents a two-stage framework to enhance the performance of current LLM-based natural language to SQL systems. We first in…

    Submitted 28 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  37. arXiv:2403.08293  [pdf, other]

    cs.CL cs.AI

    Generative Pretrained Structured Transformers: Unsupervised Syntactic Language Models at Scale

    Authors: Xiang Hu, Pengyu Ji, Qingyang Zhu, Wei Wu, Kewei Tu

    Abstract: A syntactic language model (SLM) incrementally generates a sentence with its syntactic tree in a left-to-right manner. We present Generative Pretrained Structured Transformers (GPST), an unsupervised SLM at scale capable of being pre-trained from scratch on raw texts with high parallelism. GPST circumvents the limitations of previous SLMs such as relying on gold trees and sequential training. It c…

    Submitted 18 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: preprint

  38. arXiv:2403.07153  [pdf, other]

    cs.CV

    2023 Low-Power Computer Vision Challenge (LPCVC) Summary

    Authors: Leo Chen, Benjamin Boardley, Ping Hu, Yiru Wang, Yifan Pu, Xin Jin, Yongqiang Yao, Ruihao Gong, Bo Li, Gao Huang, Xianglong Liu, Zifu Wan, Xinwang Chen, Ning Liu, Ziyi Zhang, Dongping Liu, Ruijie Shan, Zhengping Che, Fachao Zhang, Xiaofeng Mou, Jian Tang, Maxim Chuprov, Ivan Malofeev, Alexander Goncharenko, Andrey Shcherbin , et al. (5 additional authors not shown)

    Abstract: This article describes the 2023 IEEE Low-Power Computer Vision Challenge (LPCVC). Since 2015, LPCVC has been an international competition devoted to tackling the challenge of computer vision (CV) on edge devices. Most CV researchers focus on improving accuracy, at the expense of ever-growing sizes of machine models. LPCVC balances accuracy with resource requirements. Winners must achieve high accu…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: LPCVC 2023, website: https://lpcv.ai/

  39. arXiv:2403.06403  [pdf, other]

    cs.CV

    PointSeg: A Training-Free Paradigm for 3D Scene Segmentation via Foundation Models

    Authors: Qingdong He, Jinlong Peng, Zhengkai Jiang, Xiaobin Hu, Jiangning Zhang, Qiang Nie, Yabiao Wang, Chengjie Wang

    Abstract: The recent success of vision foundation models has shown promising performance for 2D perception tasks. However, it is difficult to train a 3D foundation network directly due to the limited dataset and it remains underexplored whether existing foundation models can be lifted to 3D space seamlessly. In this paper, we present PointSeg, a novel training-free paradigm that leverages off-the-shelf vi…

    Submitted 10 March, 2024; originally announced March 2024.

  40. arXiv:2403.06168  [pdf, other]

    cs.CV cs.AI

    DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation

    Authors: Xiaobin Hu, Xu Peng, Donghao Luo, Xiaozhong Ji, Jinlong Peng, Zhengkai Jiang, Jiangning Zhang, Taisong Jin, Chengjie Wang, Rongrong Ji

    Abstract: Due to the difficulty and labor-consuming nature of getting highly accurate or matting annotations, there only exists a limited amount of highly accurate labels available to the public. To tackle this challenge, we propose a DiffuMatting which inherits the strong Everything generation ability of diffusion and endows the power of "matting anything". Our DiffuMatting can 1). act as an anything matti…

    Submitted 10 March, 2024; originally announced March 2024.

  41. arXiv:2403.05817  [pdf, other]

    cs.CV

    SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection

    Authors: Gang Zhang, Junnan Chen, Guohuan Gao, Jianmin Li, Si Liu, Xiaolin Hu

    Abstract: LiDAR-based 3D object detection plays an essential role in autonomous driving. Existing high-performing 3D object detectors usually build dense feature maps in the backbone network and prediction head. However, the computational costs introduced by the dense feature maps grow quadratically as the perception range increases, making these models hard to scale up to long-range detection. Some recent…

    Submitted 22 April, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024 (Oral)

  42. To Reach the Unreachable: Exploring the Potential of VR Hand Redirection for Upper Limb Rehabilitation

    Authors: Peixuan Xiong, Yukai Zhang, Nandi Zhang, Shihan Fu, Xin Li, Yadan Zheng, Jinni Zhou, Xiquan Hu, Mingming Fan

    Abstract: Rehabilitation therapies are widely employed to assist people with motor impairments in regaining control over their affected body parts. Nevertheless, factors such as fatigue and low self-efficacy can hinder patient compliance during extensive rehabilitation processes. Utilizing hand redirection in virtual reality (VR) enables patients to accomplish seemingly more challenging tasks, thereby bolst…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USA

  43. arXiv:2403.05135

    cs.CV

    ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

    Authors: Xiwei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, Gang Yu

    Abstract: Diffusion models have demonstrated remarkable performance in the domain of text-to-image generation. However, most widely used models still employ CLIP as their text encoder, which constrains their ability to comprehend dense prompts, encompassing multiple objects, detailed attributes, complex relationships, long-text alignment, etc. In this paper, we introduce an Efficient Large Language Model Ad…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Project Page: https://ella-diffusion.github.io/

  44. arXiv:2403.04652

    cs.CL cs.AI

    Yi: Open Foundation Models by 01.AI

    Authors: 01.AI: Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Tao Yu, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, et al. (7 additional authors not shown)

    Abstract: We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU,…

    Submitted 7 March, 2024; originally announced March 2024.

  45. arXiv:2403.04444

    cs.CV

    Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser

    Authors: Qingyuan Cai, Xuecai Hu, Saihui Hou, Li Yao, Yongzhen Huang

    Abstract: Recently, diffusion-based methods for monocular 3D human pose estimation have achieved state-of-the-art (SOTA) performance by directly regressing the 3D joint coordinates from the 2D pose sequence. Although some methods decompose the task into bone length and bone direction prediction based on the human anatomical skeleton to explicitly incorporate more human body prior constraints, the performanc…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted by AAAI24

  46. arXiv:2403.04247

    cs.CL

    UltraWiki: Ultra-fine-grained Entity Set Expansion with Negative Seed Entities

    Authors: Yangning Li, Qingsong Lv, Tianyu Yu, Yinghui Li, Shulin Huang, Tingwei Lu, Xuming Hu, Wenhao Jiang, Hai-Tao Zheng, Hui Wang

    Abstract: Entity Set Expansion (ESE) aims to identify new entities belonging to the same semantic class as a given set of seed entities. Traditional methods primarily relied on positive seed entities to represent a target semantic class, which poses challenge for the representation of ultra-fine-grained semantic classes. Ultra-fine-grained semantic classes are defined based on fine-grained semantic classes…

    Submitted 23 April, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Initial Version

  47. arXiv:2403.03442

    cs.AR

    CAMASim: A Comprehensive Simulation Framework for Content-Addressable Memory based Accelerators

    Authors: Mengyuan Li, Shiyi Liu, Mohammad Mehdi Sharifi, X. Sharon Hu

    Abstract: Content addressable memory (CAM) stands out as an efficient hardware solution for memory-intensive search operations by supporting parallel computation in memory. However, developing a CAM-based accelerator architecture that achieves acceptable accuracy, while minimizing hardware cost and catering to both exact and approximate search, still presents a significant challenge especially when consider…

    Submitted 7 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  48. arXiv:2403.02951

    cs.CL cs.AI

    Benchmarking the Text-to-SQL Capability of Large Language Models: A Comprehensive Evaluation

    Authors: Bin Zhang, Yuxiao Ye, Guoqing Du, Xiaoru Hu, Zhishuai Li, Sun Yang, Chi Harold Liu, Rui Zhao, Ziyue Li, Hangyu Mao

    Abstract: Large Language Models (LLMs) have emerged as a powerful tool in advancing the Text-to-SQL task, significantly outperforming traditional methods. Nevertheless, as a nascent research field, there is still no consensus on the optimal prompt templates and design frameworks. Additionally, existing benchmarks inadequately explore the performance of LLMs across the various sub-tasks of the Text-to-SQL pr…

    Submitted 6 March, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: 26 pages, 6 figures, 14 tables

  49. arXiv:2403.02905

    cs.MM

    MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion Model

    Authors: Sen Wang, Jiangning Zhang, Weijian Cao, Xiaobin Hu, Moran Li, Xiaozhong Ji, Xin Tan, Mengtian Li, Zhifeng Xie, Chengjie Wang, Lizhuang Ma

    Abstract: The body movements accompanying speech aid speakers in expressing their ideas. Co-speech motion generation is one of the important approaches for synthesizing realistic avatars. Due to the intricate correspondence between speech and motion, generating realistic and diverse motion is a challenging task. In this paper, we propose MMoFusion, a Multi-modal co-speech Motion generation framework based o…

    Submitted 5 March, 2024; originally announced March 2024.

  50. arXiv:2403.00867

    cs.CR cs.AI cs.CL cs.LG

    Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes

    Authors: Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho

    Abstract: Large Language Models (LLMs) are becoming a prominent generative AI tool, where the user enters a query and the LLM generates an answer. To reduce harm and misuse, efforts have been made to align these LLMs to human values using advanced training techniques such as Reinforcement Learning from Human Feedback (RLHF). However, recent studies have highlighted the vulnerability of LLMs to adversarial j…

    Submitted 5 March, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

    Comments: Project page: https://huggingface.co/spaces/TrustSafeAI/GradientCuff-Jailbreak-Defense