Skip to main content

Showing 1–50 of 525 results for author: Su, H

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.05941  [pdf, other

    cs.RO cs.CV cs.LG

    Evaluating Real-World Robot Manipulation Policies in Simulation

    Authors: Xuanlin Li, Kyle Hsu, Jiayuan Gu, Karl Pertsch, Oier Mees, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, Sergey Levine, Jiajun Wu, Chelsea Finn, Hao Su, Quan Vuong, Ted Xiao

    Abstract: The field of robotics has made significant advances towards generalist robot manipulation policies. However, real-world evaluation of such policies is not scalable and faces reproducibility challenges, which are likely to worsen as policies broaden the spectrum of tasks they can perform. We identify control and visual disparities between real and simulated environments as key challenges for reliab… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.03908  [pdf, other

    cs.DC cs.DS

    Deterministic Expander Routing: Faster and More Versatile

    Authors: Yi-Jun Chang, Shang-En Huang, Hsin-Hao Su

    Abstract: We consider the expander routing problem formulated by Ghaffari, Kuhn, and Su (PODC 2017), where the goal is to route all the tokens to their destinations given that each vertex is the source and the destination of at most $°(v)$ tokens. They developed $\textit{randomized algorithms}$ that solve this problem in $\text{poly}(φ^{-1}) \cdot 2^{O(\sqrt{\log n \log \log n})}$ rounds in the… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted to PODC 2024

  3. A Construct-Optimize Approach to Sparse View Synthesis without Camera Pose

    Authors: Kaiwen Jiang, Yang Fu, Mukund Varma T, Yash Belhe, Xiaolong Wang, Hao Su, Ravi Ramamoorthi

    Abstract: Novel view synthesis from a sparse set of input images is a challenging problem of great practical interest, especially when camera poses are absent or inaccurate. Direct optimization of camera poses and usage of estimated depths in neural radiance field algorithms usually do not produce good results because of the coupling between poses and depths, and inaccuracies in monocular depth estimation.… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  4. arXiv:2405.03379  [pdf, other

    cs.LG cs.AI cs.RO

    Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning

    Authors: Stone Tao, Arth Shukla, Tse-kai Chan, Hao Su

    Abstract: Reinforcement learning (RL) presents a promising framework to learn policies through environment interaction, but often requires an infeasible amount of interaction data to solve complex tasks from sparse rewards. One direction includes augmenting RL with offline data demonstrating desired tasks, but past work often require a lot of high-quality demonstration data that is difficult to obtain, espe… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted at The Twelfth International Conference on Learning Representations (ICLR 2024). Website: https://reverseforward-cl.github.io/

  5. arXiv:2405.00885  [pdf, other

    cs.LG cs.NI eess.IV

    WHALE-FL: Wireless and Heterogeneity Aware Latency Efficient Federated Learning over Mobile Devices via Adaptive Subnetwork Scheduling

    Authors: Huai-an Su, Jiaxiang Geng, Liang Li, Xiaoqi Qin, Yanzhao Hou, Xin Fu, Miao Pan

    Abstract: As a popular distributed learning paradigm, federated learning (FL) over mobile devices fosters numerous applications, while their practical deployment is hindered by participating devices' computing and communication heterogeneity. Some pioneering research efforts proposed to extract subnetworks from the global model, and assign as large a subnetwork as possible to the device for local training b… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  6. arXiv:2405.00566  [pdf, other

    cs.CE cs.CL q-fin.GN

    NumLLM: Numeric-Sensitive Large Language Model for Chinese Finance

    Authors: Huan-Yi Su, Ke Wu, Yu-Hao Huang, Wu-Jun Li

    Abstract: Recently, many works have proposed various financial large language models (FinLLMs) by pre-training from scratch or fine-tuning open-sourced LLMs on financial corpora. However, existing FinLLMs exhibit unsatisfactory performance in understanding financial text when numeric variables are involved in questions. In this paper, we propose a novel LLM, called numeric-sensitive large language model (Nu… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  7. arXiv:2404.19525  [pdf, other

    cs.CV

    MicroDreamer: Zero-shot 3D Generation in $\sim$20 Seconds by Score-based Iterative Reconstruction

    Authors: Luxi Chen, Zhengyi Wang, Chongxuan Li, Tingting Gao, Hang Su, Jun Zhu

    Abstract: Optimization-based approaches, such as score distillation sampling (SDS), show promise in zero-shot 3D generation but suffer from low efficiency, primarily due to the high number of function evaluations (NFEs) required for each sample. In this paper, we introduce score-based iterative reconstruction (SIR), an efficient and general algorithm for 3D generation with a multi-view score-based diffusion… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

  8. arXiv:2404.17302  [pdf, other

    cs.RO cs.AI cs.CV

    Part-Guided 3D RL for Sim2Real Articulated Object Manipulation

    Authors: Pengwei Xie, Rui Chen, Siang Chen, Yuzhe Qin, Fanbo Xiang, Tianyu Sun, Jing Xu, Guijin Wang, Hao Su

    Abstract: Manipulating unseen articulated objects through visual feedback is a critical but challenging task for real robots. Existing learning-based solutions mainly focus on visual affordance learning or other pre-trained visual models to guide manipulation policies, which face challenges for novel instances in real-world scenarios. In this paper, we propose a novel part-guided 3D RL framework, which can… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 9 pages

  9. arXiv:2404.16779  [pdf, other

    cs.LG cs.AI cs.RO

    DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks

    Authors: Tongzhou Mu, Minghua Liu, Hao Su

    Abstract: The success of many RL techniques heavily relies on human-engineered dense rewards, which typically demand substantial domain expertise and extensive trial and error. In our work, we propose DrS (Dense reward learning from Stages), a novel approach for learning reusable dense rewards for multi-stage tasks in a data-driven manner. By leveraging the stage structures of the task, DrS learns a high-qu… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: ICLR 2024. Explore videos, data, code, and more at https://sites.google.com/view/iclr24drs

  10. arXiv:2404.12713  [pdf, other

    cs.NI

    Energy Conserved Failure Detection for NS-IoT Systems

    Authors: Guojin Liu, Jianhong Zhou, Hang Su, Biaohong Xiong, Xianhua Niu

    Abstract: Nowadays, network slicing (NS) technology has gained widespread adoption within Internet of Things (IoT) systems to meet diverse customized requirements. In the NS based IoT systems, the detection of equipment failures necessitates comprehensive equipment monitoring, which leads to significant resource utilization, particularly within large-scale IoT ecosystems. Thus, the imperative task of reduci… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

  11. arXiv:2404.12385  [pdf, other

    cs.CV cs.GR

    MeshLRM: Large Reconstruction Model for High-Quality Mesh

    Authors: Xinyue Wei, Kai Zhang, Sai Bi, Hao Tan, Fujun Luan, Valentin Deschaintre, Kalyan Sunkavalli, Hao Su, Zexiang Xu

    Abstract: We propose MeshLRM, a novel LRM-based approach that can reconstruct a high-quality mesh from merely four input images in less than one second. Different from previous large reconstruction models (LRMs) that focus on NeRF-based reconstruction, MeshLRM incorporates differentiable mesh extraction and rendering within the LRM framework. This allows for end-to-end mesh reconstruction by fine-tuning a p… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

  12. arXiv:2404.12379  [pdf, other

    cs.CV

    Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Monocular Videos

    Authors: Isabella Liu, Hao Su, Xiaolong Wang

    Abstract: Modern 3D engines and graphics pipelines require mesh as a memory-efficient representation, which allows efficient rendering, geometry processing, texture editing, and many other downstream operations. However, it is still highly difficult to obtain high-quality mesh in terms of structure and detail from monocular visual observations. The problem becomes even more challenging for dynamic scenes an… ▽ More

    Submitted 22 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Project page: https://www.liuisabella.com/DG-Mesh/

  13. arXiv:2404.12139  [pdf, other

    cs.CV

    Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models

    Authors: Shouwei Ruan, Yinpeng Dong, Hanqing Liu, Yao Huang, Hang Su, Xingxing Wei

    Abstract: Vision-Language Pre-training (VLP) models like CLIP have achieved remarkable success in computer vision and particularly demonstrated superior robustness to distribution shifts of 2D images. However, their robustness under 3D viewpoint variations is still limited, which can hinder the development for real-world applications. This paper successfully addresses this concern while keeping VLPs' origin… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 20 pages

  14. arXiv:2404.11207  [pdf, other

    cs.CV cs.AI cs.LG

    Exploring the Transferability of Visual Prompting for Multimodal Large Language Models

    Authors: Yichi Zhang, Yinpeng Dong, Siyuan Zhang, Tianzan Min, Hang Su, Jun Zhu

    Abstract: Although Multimodal Large Language Models (MLLMs) have demonstrated promising versatile capabilities, their performance is still inferior to specialized models on downstream tasks, which makes adaptation necessary to enhance their utility. However, fine-tuning methods require independent training for every model, leading to huge computation and memory overheads. In this paper, we propose a novel s… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted in CVPR 2024 as Poster (Highlight)

  15. arXiv:2404.09524  [pdf, other

    cs.LG

    Dynamic fault detection and diagnosis of industrial alkaline water electrolyzer process with variational Bayesian dictionary learning

    Authors: Qi Zhang, Lei Xie, Weihua Xu, Hongye Su

    Abstract: Alkaline Water Electrolysis (AWE) is one of the simplest green hydrogen production method using renewable energy. AWE system typically yields process variables that are serially correlated and contaminated by measurement uncertainty. A novel robust dynamic variational Bayesian dictionary learning (RDVDL) monitoring approach is proposed to improve the reliability and safety of AWE operation.… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  16. arXiv:2404.09519  [pdf, other

    cs.LG eess.SY

    Nonlinear sparse variational Bayesian learning based model predictive control with application to PEMFC temperature control

    Authors: Qi Zhang, Lei Wang, Weihua Xu, Hongye Su, Lei Xie

    Abstract: The accuracy of the underlying model predictions is crucial for the success of model predictive control (MPC) applications. If the model is unable to accurately analyze the dynamics of the controlled system, the performance and stability guarantees provided by MPC may not be achieved. Learning-based MPC can learn models from data, improving the applicability and reliability of MPC. This study deve… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  17. arXiv:2404.09193  [pdf, other

    cs.CV

    FaceCat: Enhancing Face Recognition Security with a Unified Generative Model Framework

    Authors: Jiawei Chen, Xiao Yang, Yinpeng Dong, Hang Su, Jianteng Peng, Zhaoxia Yin

    Abstract: Face anti-spoofing (FAS) and adversarial detection (FAD) have been regarded as critical technologies to ensure the safety of face recognition systems. As a consequence of their limited practicality and generalization, some existing methods aim to devise a framework capable of concurrently detecting both threats to address the challenge. Nevertheless, these methods still encounter challenges of ins… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: Under review

  18. arXiv:2404.08285  [pdf

    cs.CV cs.AI eess.SY

    A Survey of Neural Network Robustness Assessment in Image Recognition

    Authors: Jie Wang, Jun Ai, Minyan Lu, Haoran Su, Dan Yu, Yutao Zhang, Junda Zhu, Jingyu Liu

    Abstract: In recent years, there has been significant attention given to the robustness assessment of neural networks. Robustness plays a critical role in ensuring reliable operation of artificial intelligence (AI) systems in complex and uncertain environments. Deep learning's robustness problem is particularly significant, highlighted by the discovery of adversarial attacks on image classification models.… ▽ More

    Submitted 15 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: Corrected typos and grammatical errors in Section 5

  19. arXiv:2404.07428  [pdf, other

    cs.RO cs.LG

    AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent

    Authors: Tongzhou Mu, Yijie Guo, Jie Xu, Ankit Goyal, Hao Su, Dieter Fox, Animesh Garg

    Abstract: Encouraged by the remarkable achievements of language and vision foundation models, developing generalist robotic agents through imitation learning, using large demonstration datasets, has become a prominent area of interest in robot learning. The efficacy of imitation learning is heavily reliant on the quantity and quality of the demonstration datasets. In this study, we aim to scale up demonstra… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

  20. arXiv:2404.01699  [pdf, other

    cs.CV

    Task Integration Distillation for Object Detectors

    Authors: Hai Su, ZhenWen Jian, Songsen Yu

    Abstract: Knowledge distillation is a widely adopted technique for model lightening. However, the performance of most knowledge distillation methods in the domain of object detection is not satisfactory. Typically, knowledge distillation approaches consider only the classification task among the two sub-tasks of an object detector, largely overlooking the regression task. This oversight leads to a partial u… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

  21. arXiv:2404.00842  [pdf, other

    cs.CV

    An N-Point Linear Solver for Line and Motion Estimation with Event Cameras

    Authors: Ling Gao, Daniel Gehrig, Hang Su, Davide Scaramuzza, Laurent Kneip

    Abstract: Event cameras respond primarily to edges--formed by strong gradients--and are thus particularly well-suited for line-based motion estimation. Recent work has shown that events generated by a single line each satisfy a polynomial constraint which describes a manifold in the space-time volume. Multiple such constraints can be solved simultaneously to recover the partial linear velocity and line para… ▽ More

    Submitted 31 March, 2024; originally announced April 2024.

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  22. arXiv:2404.00540  [pdf, other

    cs.CV cs.AI

    Embodied Active Defense: Leveraging Recurrent Feedback to Counter Adversarial Patches

    Authors: Lingxuan Wu, Xiao Yang, Yinpeng Dong, Liuwei Xie, Hang Su, Jun Zhu

    Abstract: The vulnerability of deep neural networks to adversarial patches has motivated numerous defense strategies for boosting model robustness. However, the prevailing defenses depend on single observation or pre-established adversary information to counter adversarial patches, often failing to be confronted with unseen or adaptive adversarial attacks and easily exhibiting unsatisfying performance in dy… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 27pages

  23. arXiv:2404.00247  [pdf, ps, other

    eess.SY cs.AI cs.LG

    Facilitating Reinforcement Learning for Process Control Using Transfer Learning: Perspectives

    Authors: Runze Lin, Junghui Chen, Lei Xie, Hongye Su, Biao Huang

    Abstract: This paper provides insights into deep reinforcement learning (DRL) for process control from the perspective of transfer learning. We analyze the challenges of applying DRL in the field of process industries and the necessity of introducing transfer learning. Furthermore, recommendations and prospects are provided for future research directions on how transfer learning can be integrated with DRL t… ▽ More

    Submitted 1 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: Final Version of Asian Control Conference (ASCC 2024)

  24. arXiv:2403.18922  [pdf, other

    cs.CV

    Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D

    Authors: Mukund Varma T, Peihao Wang, Zhiwen Fan, Zhangyang Wang, Hao Su, Ravi Ramamoorthi

    Abstract: In recent years, there has been an explosion of 2D vision models for numerous tasks such as semantic segmentation, style transfer or scene editing, enabled by large-scale 2D image datasets. At the same time, there has been renewed interest in 3D scene representations such as neural radiance fields from multi-view images. However, the availability of 3D or multiview data is still substantially limi… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Computer Vision and Pattern Recognition Conference (CVPR), 2024

  25. arXiv:2403.18330  [pdf, other

    cs.CV cs.LG

    Tracking-Assisted Object Detection with Event Cameras

    Authors: Ting-Kang Yen, Igor Morawski, Shusil Dangi, Kai He, Chung-Yi Lin, Jia-Fong Yeh, Hung-Ting Su, Winston Hsu

    Abstract: Event-based object detection has recently garnered attention in the computer vision community due to the exceptional properties of event cameras, such as high dynamic range and no motion blur. However, feature asynchronism and sparsity cause invisible objects due to no relative motion to the camera, posing a significant challenge in the task. Prior works have studied various memory mechanisms to p… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

  26. arXiv:2403.15004  [pdf

    cs.CV cs.LG

    ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding

    Authors: Novendra Setyawan, Ghufron Wahyu Kurniawan, Chi-Chia Sun, Jun-Wei Hsieh, Hui-Kai Su, Wen-Kai Kuo

    Abstract: This work presents ParFormer as an enhanced transformer architecture that allows the incorporation of different token mixers into a single stage, hence improving feature extraction capabilities. Integrating both local and global data allows for precise representation of short- and long-range spatial relationships without the need for computationally intensive methods such as shifting windows. Alon… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

  27. arXiv:2403.13324  [pdf, other

    cs.CV

    Out-of-Distribution Detection Using Peer-Class Generated by Large Language Model

    Authors: K Huang, G Song, Hanwen Su, Jiyan Wang

    Abstract: Out-of-distribution (OOD) detection is a critical task to ensure the reliability and security of machine learning models deployed in real-world applications. Conventional methods for OOD detection that rely on single-modal information, often struggle to capture the rich variety of OOD instances. The primary difficulty in OOD detection arises when an input image has numerous similarities to a parti… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

  28. arXiv:2403.12991  [pdf, other

    cs.CV cs.LG

    Tel2Veh: Fusion of Telecom Data and Vehicle Flow to Predict Camera-Free Traffic via a Spatio-Temporal Framework

    Authors: ChungYi Lin, Shen-Lung Tung, Hung-Ting Su, Winston H. Hsu

    Abstract: Vehicle flow, a crucial indicator for transportation, is often limited by detector coverage. With the advent of extensive mobile network coverage, we can leverage mobile user activities, or cellular traffic, on roadways as a proxy for vehicle flow. However, as counts of cellular traffic may not directly align with vehicle flow due to data from various user types, we present a new task: predicting… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 4 pages, 5 figures, 4 tables. Accepted by WWW'24, to appear

  29. arXiv:2403.12032  [pdf, other

    cs.CV cs.GR

    Generic 3D Diffusion Adapter Using Controlled Multi-View Editing

    Authors: Hansheng Chen, Ruoxi Shi, Yulin Liu, Bokui Shen, Jiayuan Gu, Gordon Wetzstein, Hao Su, Leonidas Guibas

    Abstract: Open-domain 3D object synthesis has been lagging behind image synthesis due to limited data and higher computational complexity. To bridge this gap, recent works have investigated multi-view diffusion but often fall short in either 3D consistency, visual quality, or efficiency. This paper proposes MVEdit, which functions as a 3D counterpart of SDEdit, employing ancestral sampling to jointly denois… ▽ More

    Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: V2 note: Fix missing acknowledgements. Project page: https://lakonik.github.io/mvedit

  30. arXiv:2403.10043  [pdf, other

    cs.RO

    GeoPro-VO: Dynamic Obstacle Avoidance with Geometric Projector Based on Velocity Obstacle

    Authors: Jihao Huang, Xuemin Chi, Jun Zeng, Zhitao Liu, Hongye Su

    Abstract: Optimization-based approaches are widely employed to generate optimal robot motions while considering various constraints, such as robot dynamics, collision avoidance, and physical limitations. It is crucial to efficiently solve the optimization problems in practice, yet achieving rapid computations remains a great challenge for optimization-based approaches with nonlinear constraints. In this pap… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

  31. Rumor Mitigation in Social Media Platforms with Deep Reinforcement Learning

    Authors: Hongyuan Su, Yu Zheng, Jingtao Ding, Depeng Jin, Yong Li

    Abstract: Social media platforms have become one of the main channels where people disseminate and acquire information, of which the reliability is severely threatened by rumors widespread in the network. Existing approaches such as suspending users or broadcasting real information to combat rumors are either with high cost or disturbing users. In this paper, we introduce a novel rumor mitigation paradigm,… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: WWW24 short

    MSC Class: 68T09

  32. Efficient size-prescribed $k$-core search

    Authors: Yiping Liu, Bo Yan, Bo Zhao, Hongyi Su, Yang Chen, Michael Witbrock

    Abstract: $k$-core is a subgraph where every node has at least $k$ neighbors within the subgraph. The $k$-core subgraphs has been employed in large platforms like Network Repository to comprehend the underlying structures and dynamics of the network. Existing studies have primarily focused on finding $k… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

  33. MetroGNN: Metro Network Expansion with Reinforcement Learning

    Authors: Hongyuan Su, Yu Zheng, Jingtao Ding, Depeng Jin, Yong Li

    Abstract: Selecting urban regions for metro network expansion to meet maximal transportation demands is crucial for urban development, while computationally challenging to solve. The expansion process relies not only on complicated features like urban demographics and origin-destination (OD) flow but is also constrained by the existing metro network and urban geography. In this paper, we introduce a reinfor… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: WWW24 short

    MSC Class: 68T09

  34. arXiv:2403.07232  [pdf, other

    cs.RO cs.LG

    Tractable Joint Prediction and Planning over Discrete Behavior Modes for Urban Driving

    Authors: Adam Villaflor, Brian Yang, Huangyuan Su, Katerina Fragkiadaki, John Dolan, Jeff Schneider

    Abstract: Significant progress has been made in training multimodal trajectory forecasting models for autonomous driving. However, effectively integrating these models with downstream planners and model-based control approaches is still an open problem. Although these models have conventionally been evaluated for open-loop prediction, we show that they can be used to parameterize autoregressive closed-loop… ▽ More

    Submitted 11 March, 2024; originally announced March 2024.

  35. arXiv:2403.06563  [pdf, other

    cs.LG cs.CL

    Unraveling the Mystery of Scaling Laws: Part I

    Authors: Hui Su, Zhi Tian, Xiaoyu Shen, Xunliang Cai

    Abstract: Scaling law principles indicate a power-law correlation between loss and variables such as model size, dataset size, and computational resources utilized during training. These principles play a vital role in optimizing various aspects of model pre-training, ultimately contributing to the success of large language models such as GPT-4, Llama and Gemini. However, the original scaling law paper by O… ▽ More

    Submitted 5 April, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  36. arXiv:2403.05046  [pdf, other

    cs.RO

    EgoPAT3Dv2: Predicting 3D Action Target from 2D Egocentric Vision for Human-Robot Interaction

    Authors: Irving Fang, Yuzhong Chen, Yifan Wang, Jianghan Zhang, Qiushi Zhang, Jiali Xu, Xibo He, Weibo Gao, Hao Su, Yiming Li, Chen Feng

    Abstract: A robot's ability to anticipate the 3D action target location of a hand's movement from egocentric videos can greatly improve safety and efficiency in human-robot interaction (HRI). While previous research predominantly focused on semantic action classification or 2D target region prediction, we argue that predicting the action target's 3D coordinate could pave the way for more versatile downstrea… ▽ More

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 6 pages. Accepted at ICRA 2024

  37. arXiv:2403.05034  [pdf, other

    cs.CV cs.LG

    CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model

    Authors: Zhengyi Wang, Yikai Wang, Yifei Chen, Chendong Xiang, Shuo Chen, Dajiang Yu, Chongxuan Li, Hang Su, Jun Zhu

    Abstract: Feed-forward 3D generative models like the Large Reconstruction Model (LRM) have demonstrated exceptional generation speed. However, the transformer-based methods do not leverage the geometric priors of the triplane component in their architecture, often leading to sub-optimal quality given the limited size of 3D data and slow training. In this work, we present the Convolutional Reconstruction Mod… ▽ More

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Project page: https://ml.cs.tsinghua.edu.cn/~zhengyi/CRM/

  38. arXiv:2403.04314  [pdf, other

    cs.CL

    Can Your Model Tell a Negation from an Implicature? Unravelling Challenges With Intent Encoders

    Authors: Yuwei Zhang, Siffi Singh, Sailik Sengupta, Igor Shalyminov, Hang Su, Hwanjun Song, Saab Mansour

    Abstract: Conversational systems often rely on embedding models for intent classification and intent clustering tasks. The advent of Large Language Models (LLMs), which enable instructional embeddings allowing one to adjust semantics over the embedding space using prompts, are being viewed as a panacea for these downstream conversational tasks. However, traditional evaluation benchmarks rely solely on task… ▽ More

    Submitted 7 March, 2024; originally announced March 2024.

  39. arXiv:2403.04073  [pdf, other

    cs.CL cs.AI

    Semi-Supervised Dialogue Abstractive Summarization via High-Quality Pseudolabel Selection

    Authors: Jianfeng He, Hang Su, Jason Cai, Igor Shalyminov, Hwanjun Song, Saab Mansour

    Abstract: Semi-supervised dialogue summarization (SSDS) leverages model-generated summaries to reduce reliance on human-labeled data and improve the performance of summarization models. While addressing label noise, previous works on semi-supervised learning primarily focus on natural language understanding tasks, assuming each sample has a unique label. However, these methods are not directly applicable to… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 21 pages, 10 figures

  40. arXiv:2403.03542  [pdf, other

    cs.LG math.NA

    DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training

    Authors: Zhongkai Hao, Chang Su, Songming Liu, Julius Berner, Chengyang Ying, Hang Su, Anima Anandkumar, Jian Song, Jun Zhu

    Abstract: Pre-training has been investigated to improve the efficiency and performance of training neural operators in data-scarce settings. However, it is largely in its infancy due to the inherent complexity and diversity, such as long trajectories, multiple scales and varying dimensions of partial differential equations (PDEs) data. In this paper, we present a new auto-regressive denoising pre-training s… ▽ More

    Submitted 6 May, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  41. arXiv:2403.03194  [pdf, other

    cs.CL

    MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets

    Authors: Hossein Aboutalebi, Hwanjun Song, Yusheng Xie, Arshit Gupta, Justin Sun, Hang Su, Igor Shalyminov, Nikolaos Pappas, Siffi Singh, Saab Mansour

    Abstract: Development of multimodal interactive systems is hindered by the lack of rich, multimodal (text, images) conversational data, which is needed in large quantities for LLMs. Previous approaches augment textual dialogues with retrieved images, posing privacy, diversity, and quality constraints. In this work, we introduce \textbf{M}ultimodal \textbf{A}ugmented \textbf{G}enerative \textbf{I}mages \text… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

  42. arXiv:2403.03145  [pdf, other

    cs.CV cs.LG cs.MM cs.SD eess.AS

    Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization

    Authors: Yuxin Guo, Shijie Ma, Hu Su, Zhiqing Wang, Yuhao Zhao, Wei Zou, Siyang Sun, Yun Zheng

    Abstract: Audio-Visual Source Localization (AVSL) aims to locate sounding objects within video frames given the paired audio clips. Existing methods predominantly rely on self-supervised contrastive learning of audio-visual correspondence. Without any bounding-box annotations, they struggle to achieve precise localization, especially for small objects, and suffer from blurry boundaries and false positives.… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted to NeurIPS2023

  43. arXiv:2403.03095  [pdf, other

    cs.CV cs.MM cs.SD eess.AS

    Cross Pseudo-Labeling for Semi-Supervised Audio-Visual Source Localization

    Authors: Yuxin Guo, Shijie Ma, Yuhao Zhao, Hu Su, Wei Zou

    Abstract: Audio-Visual Source Localization (AVSL) is the task of identifying specific sounding objects in the scene given audio cues. In our work, we focus on semi-supervised AVSL with pseudo-labeling. To address the issues with vanilla hard pseudo-labels including bias accumulation, noise sensitivity, and instability, we propose a novel method named Cross Pseudo-Labeling (XPL), wherein two models learn fro… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted To ICASSP2024

  44. arXiv:2402.15218  [pdf, other

    cs.CR cs.CL cs.CV

    BSPA: Exploring Black-box Stealthy Prompt Attacks against Image Generators

    Authors: Yu Tian, Xiao Yang, Yinpeng Dong, Heming Yang, Hang Su, Jun Zhu

    Abstract: Extremely large image generators offer significant transformative potential across diverse sectors. It allows users to design specific prompts to generate realistic images through some black-box APIs. However, some studies reveal that image generators are notably susceptible to attacks and generate Not Suitable For Work (NSFW) contents by manually designed toxin texts, especially imperceptible to… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

  45. arXiv:2402.14795  [pdf, other

    cs.RO cs.CV

    CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation

    Authors: Jun Wang, Yuzhe Qin, Kaiming Kuang, Yigit Korkmaz, Akhilan Gurumoorthy, Hao Su, Xiaolong Wang

    Abstract: We introduce CyberDemo, a novel approach to robotic imitation learning that leverages simulated human demonstrations for real-world tasks. By incorporating extensive data augmentation in a simulated environment, CyberDemo outperforms traditional in-domain real-world demonstrations when transferred to the real world, handling diverse physical and visual conditions. Regardless of its affordability a… ▽ More

    Submitted 1 March, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  46. arXiv:2402.13249  [pdf, other

    cs.CL cs.AI

    TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

    Authors: Liyan Tang, Igor Shalyminov, Amy Wing-mei Wong, Jon Burnsky, Jake W. Vincent, Yu'an Yang, Siffi Singh, Song Feng, Hwanjun Song, Hang Su, Lijia Sun, Yi Zhang, Saab Mansour, Kathleen McKeown

    Abstract: Single document news summarization has seen substantial progress on faithfulness in recent years, driven by research on the evaluation of factual consistency, or hallucinations. We ask whether these advances carry over to other text summarization domains. We propose a new evaluation benchmark on topic-focused dialogue summarization, generated by LLMs of varying sizes. We provide binary sentence-le… ▽ More

    Submitted 31 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: NAACL 2024; Linguistic annotations available at https://github.com/amazon-science/tofueval

  47. arXiv:2402.12317  [pdf, other

    cs.CL cs.AI

    ARKS: Active Retrieval in Knowledge Soup for Code Generation

    Authors: Hongjin Su, Shuyang Jiang, Yuhang Lai, Haoyuan Wu, Boao Shi, Che Liu, Qian Liu, Tao Yu

    Abstract: Recently the retrieval-augmented generation (RAG) paradigm has raised much attention for its potential in incorporating external knowledge into large language models (LLMs) without further training. While widely explored in natural language applications, its utilization in code generation remains under-explored. In this paper, we introduce Active Retrieval in Knowledge Soup (ARKS), an advanced str… ▽ More

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Retrieval-augmented code generation

  48. arXiv:2402.10894  [pdf, other

    cs.CV cs.LG

    Fusion of Diffusion Weighted MRI and Clinical Data for Predicting Functional Outcome after Acute Ischemic Stroke with Deep Contrastive Learning

    Authors: Chia-Ling Tsai, Hui-Yun Su, Shen-Feng Sung, Wei-Yang Lin, Ying-Ying Su, Tzu-Hsien Yang, Man-Lin Mai

    Abstract: Stroke is a common disabling neurological condition that affects about one-quarter of the adult population over age 25; more than half of patients still have poor outcomes, such as permanent functional dependence or even death, after the onset of acute stroke. The aim of this study is to investigate the efficacy of diffusion-weighted MRI modalities combining with structured health profile on predi… ▽ More

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: 12 pages, 5 figures, 5 tables

  49. arXiv:2402.09906  [pdf, other

    cs.CL cs.AI cs.LG

    Generative Representational Instruction Tuning

    Authors: Niklas Muennighoff, Hongjin Su, Liang Wang, Nan Yang, Furu Wei, Tao Yu, Amanpreet Singh, Douwe Kiela

    Abstract: All text-based language problems can be reduced to either generation or embedding. Current models only perform well at one or the other. We introduce generative representational instruction tuning (GRIT) whereby a large language model is trained to handle both generative and embedding tasks by distinguishing between them through instructions. Compared to other open models, our resulting GritLM 7B… ▽ More

    Submitted 17 April, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: 66 pages (16 main), 25 figures, 34 tables

  50. arXiv:2402.07562  [pdf, other

    cs.CR cs.AI

    Discovering Universal Semantic Triggers for Text-to-Image Synthesis

    Authors: Shengfang Zhai, Weilong Wang, Jiajun Li, Yinpeng Dong, Hang Su, Qingni Shen

    Abstract: Recently text-to-image models have gained widespread attention in the community due to their controllable and high-quality generation ability. However, the robustness of such models and their potential ethical issues have not been fully explored. In this paper, we introduce Universal Semantic Trigger, a meaningless token sequence that can be added at any location within the input text yet can indu… ▽ More

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: 9 pages, 5 figures. Work in progress