
Showing 1–50 of 1,916 results for author: Chen, S

Searching in archive cs.
  1. arXiv:2405.05506

    cs.CL

    Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias

    Authors: Shan Chen, Jack Gallifant, Mingye Gao, Pedro Moreira, Nikolaj Munch, Ajay Muthukkumar, Arvind Rajan, Jaya Kolluri, Amelia Fiske, Janna Hastings, Hugo Aerts, Brian Anthony, Leo Anthony Celi, William G. La Cava, Danielle S. Bitterman

    Abstract: Large language models (LLMs) are increasingly essential in processing natural languages, yet their application is frequently compromised by biases and inaccuracies originating in their training data. In this study, we introduce Cross-Care, the first benchmark framework dedicated to assessing biases and real world knowledge in LLMs, specifically focusing on the representation of disease prevalence…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Submitted for review

  2. arXiv:2405.05498

    cs.SD eess.AS

    The RoyalFlush Automatic Speech Diarization and Recognition System for In-Car Multi-Channel Automatic Speech Recognition Challenge

    Authors: Jingguang Tian, Shuaishuai Ye, Shunfei Chen, Yang Xiang, Zhaohui Yin, Xinhui Hu, Xinkang Xu

    Abstract: This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we develop end-to-end speaker diarization models that notably decrease the diarization error rate (DER) by 49.58% compared to the official baseline on t…

    Submitted 8 May, 2024; originally announced May 2024.

  3. arXiv:2405.04966

    cs.IT cs.CV cs.MA

    Communication-Efficient Collaborative Perception via Information Filling with Codebook

    Authors: Yue Hu, Juntong Peng, Sifei Liu, Junhao Ge, Si Liu, Siheng Chen

    Abstract: Collaborative perception empowers each agent to improve its perceptual ability through the exchange of perceptual messages with other agents. It inherently results in a fundamental trade-off between perception ability and communication cost. To address this bottleneck issue, our core idea is to optimize the collaborative messages from two key aspects: representation and selection. The proposed cod…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 10 pages, Accepted by CVPR 2024

  4. arXiv:2405.04867

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  5. arXiv:2405.03775

    cs.CR

    Secure Inference for Vertically Partitioned Data Using Multiparty Homomorphic Encryption

    Authors: Shuangyi Chen, Yue Ju, Zhongwen Zhu, Ashish Khisti

    Abstract: We propose a secure inference protocol for a distributed setting involving a single server node and multiple client nodes. We assume that the observed data vector is partitioned across multiple client nodes while the deep learning model is located at the server node. Each client node is required to encrypt its portion of the data vector and transmit the resulting ciphertext to the server node. The…

    Submitted 6 May, 2024; originally announced May 2024.

  6. arXiv:2405.02965

    cs.AI cs.RO

    Robust Collaborative Perception without External Localization and Clock Devices

    Authors: Zixing Lei, Zhenyang Ni, Ruize Han, Shuo Tang, Chen Feng, Siheng Chen, Yanfeng Wang

    Abstract: A consistent spatial-temporal coordination across multiple agents is fundamental for collaborative perception, which seeks to improve perception abilities through information exchange among agents. To achieve this spatial-temporal alignment, traditional methods depend on external devices to provide localization and clock signals. However, hardware-generated signals could be vulnerable to noise and…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 6 pages, accepted to ICRA 2024

  7. arXiv:2405.02784

    eess.IV cs.CV

    MR-Transformer: Vision Transformer for Total Knee Replacement Prediction Using Magnetic Resonance Imaging

    Authors: Chaojie Zhang, Shengjia Chen, Ozkan Cigdem, Haresh Rengaraj Rajamohan, Kyunghyun Cho, Richard Kijowski, Cem M. Deniz

    Abstract: A transformer-based deep learning model, MR-Transformer, was developed for total knee replacement (TKR) prediction using magnetic resonance imaging (MRI). The model incorporates the ImageNet pre-training and captures three-dimensional (3D) spatial correlation from the MR images. The performance of the proposed model was compared to existing state-of-the-art deep learning models for knee injury dia…

    Submitted 4 May, 2024; originally announced May 2024.

  8. arXiv:2405.02714

    cs.IR cs.CL

    Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness

    Authors: Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Tongshuang Wu

    Abstract: The task of Information Retrieval (IR) requires a system to identify relevant documents based on users' information needs. In real-world scenarios, retrievers are expected to not only rely on the semantic relevance between the documents and the queries but also recognize the nuanced intents or perspectives behind a user query. For example, when asked to verify a claim, a retrieval system is expect…

    Submitted 4 May, 2024; originally announced May 2024.

  9. arXiv:2405.02430

    cs.SC

    How to generate all possible rational Wilf-Zeilberger forms?

    Authors: Shaoshi Chen, Christoph Koutschan, Yisen Wang

    Abstract: Wilf-Zeilberger pairs are fundamental in the algorithmic theory of Wilf and Zeilberger for computer-generated proofs of combinatorial identities. Wilf-Zeilberger forms are their high-dimensional generalizations, which can be used for proving and discovering convergence acceleration formulas. This paper presents a structural description of all possible rational such forms, which can be viewed as an…

    Submitted 3 May, 2024; originally announced May 2024.

  10. arXiv:2405.01814

    cs.LG cs.DC

    Efficient and Economic Large Language Model Inference with Attention Offloading

    Authors: Shaoyuan Chen, Yutong Lin, Mingxing Zhang, Yongwei Wu

    Abstract: Transformer-based large language models (LLMs) exhibit impressive performance in generative tasks but introduce significant challenges in real-world serving due to inefficient use of the expensive, computation-optimized accelerators. This mismatch arises from the autoregressive nature of LLMs, where the generation phase comprises operators with varying resource demands. Specifically, the attention…

    Submitted 2 May, 2024; originally announced May 2024.

  11. arXiv:2405.01144

    cs.CR

    Boosting Communication Efficiency of Federated Learning's Secure Aggregation

    Authors: Niousha Nazemi, Omid Tavallaie, Shuaijun Chen, Albert Y. Zomaya, Ralph Holz

    Abstract: Federated Learning (FL) is a decentralized machine learning approach where client devices train models locally and send them to a server that performs aggregation to generate a global model. FL is vulnerable to model inversion attacks, where the server can infer sensitive client data from trained models. Google's Secure Aggregation (SecAgg) protocol addresses this data privacy issue by masking eac…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 2 pages, 4 figures, The 54th Annual IEEE/IFIP International Conference on Dependable Systems and Networks

  12. arXiv:2405.01044

    cs.RO

    Differentiable Particles for General-Purpose Deformable Object Manipulation

    Authors: Siwei Chen, Yiqing Xu, Cunjun Yu, Linfeng Li, David Hsu

    Abstract: Deformable object manipulation is a long-standing challenge in robotics. While existing approaches often focus narrowly on a specific type of object, we seek a general-purpose algorithm, capable of manipulating many different types of objects: beans, rope, cloth, liquid, . . . . One key difficulty is a suitable representation, rich enough to capture object shape, dynamics for manipulation and yet…

    Submitted 2 May, 2024; originally announced May 2024.

  13. arXiv:2405.00622

    cs.CL cs.AI cs.LG

    Causal Evaluation of Language Models

    Authors: Sirui Chen, Bo Peng, Meiqi Chen, Ruiqi Wang, Mengying Xu, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Yu Qiao, Chaochao Lu

    Abstract: Causal reasoning is viewed as crucial for achieving human-level machine intelligence. Recent advances in language models have expanded the horizons of artificial intelligence across various domains, sparking inquiries into their potential for causal reasoning. In this work, we introduce Causal evaluation of Language Models (CaLM), which, to the best of our knowledge, is the first comprehensive ben…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 315 pages, 230 figures, 21 tables. Project website: https://opencausalab.github.io/CaLM

  14. arXiv:2404.19534

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  15. arXiv:2404.19279

    cs.CV

    Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training

    Authors: Xingyu Song, Zhan Li, Shi Chen, Kazuyuki Demachi

    Abstract: 3D human pose estimation is a vital task in computer vision, involving the prediction of human joint positions from images or videos to reconstruct a skeleton of a human in three-dimensional space. This technology is pivotal in various fields, including animation, security, human-computer interaction, and automotive safety, where it promotes both technological progress and enhanced human well-bein…

    Submitted 30 April, 2024; originally announced April 2024.

  16. arXiv:2404.19105

    quant-ph cs.IT

    Optimal tradeoffs for estimating Pauli observables

    Authors: Sitan Chen, Weiyuan Gong, Qi Ye

    Abstract: We revisit the problem of Pauli shadow tomography: given copies of an unknown $n$-qubit quantum state $ρ$, estimate $\text{tr}(Pρ)$ for some set of Pauli operators $P$ to within additive error $ε$. This has been a popular testbed for exploring the advantage of protocols with quantum memory over those without: with enough memory to measure two copies at a time, one can use Bell sampling to estimate…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 59 pages, 1 figure

  17. arXiv:2404.18893

    cs.DS cs.LG stat.ML

    Learning general Gaussian mixtures with efficient score matching

    Authors: Sitan Chen, Vasilis Kontonis, Kulin Shah

    Abstract: We study the problem of learning mixtures of $k$ Gaussians in $d$ dimensions. We make no separation assumptions on the underlying mixture components: we only require that the covariance matrices have bounded condition number and that the means and covariances lie in a ball of bounded radius. We give an algorithm that draws $d^{\mathrm{poly}(k/\varepsilon)}$ samples from the target mixture, runs in…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 57 pages

  18. arXiv:2404.18532

    cs.CL cs.AI cs.CV cs.LG

    MileBench: Benchmarking MLLMs in Long Context

    Authors: Dingjie Song, Shunian Chen, Guiming Hardy Chen, Fei Yu, Xiang Wan, Benyou Wang

    Abstract: Despite the advancements and impressive performance of Multimodal Large Language Models (MLLMs) on benchmarks, their effectiveness in real-world, long-context, and multi-image tasks is unclear due to the benchmarks' limited scope. Existing benchmarks often focus on single-image and short-text samples, and when assessing multi-image tasks, they either limit the image count or focus on specific task…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 29 pages, 13 figures, 14 tables

  19. Static Application Security Testing (SAST) Tools for Smart Contracts: How Far Are We?

    Authors: Kaixuan Li, Yue Xue, Sen Chen, Han Liu, Kairan Sun, Ming Hu, Haijun Wang, Yang Liu, Yixiang Chen

    Abstract: In recent years, the importance of smart contract security has been heightened by the increasing number of attacks against them. To address this issue, a multitude of static application security testing (SAST) tools have been proposed for detecting vulnerabilities in smart contracts. However, objectively comparing these tools to determine their effectiveness remains challenging. Existing studies o…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: to appear at FSE 2024

  20. arXiv:2404.18136

    cs.CV cs.MM

    SafePaint: Anti-forensic Image Inpainting with Domain Adaptation

    Authors: Dunyun Chen, Xin Liao, Xiaoshuai Wu, Shiwei Chen

    Abstract: Existing image inpainting methods have achieved remarkable accomplishments in generating visually appealing results, often accompanied by a trend toward creating more intricate structural textures. However, while these models excel at creating more realistic image content, they often leave noticeable traces of tampering, posing a significant threat to security. In this work, we take the anti-foren…

    Submitted 28 April, 2024; originally announced April 2024.

  21. arXiv:2404.17830

    cs.LG cs.CV

    Dynamic Against Dynamic: An Open-set Self-learning Framework

    Authors: Haifeng Yang, Chuanxing Geng, Pong C. Yuen, Songcan Chen

    Abstract: In open-set recognition, existing methods generally learn statically fixed decision boundaries using known classes to reject unknown classes. Though they have achieved promising results, such decision boundaries are evidently insufficient for universal unknown classes in dynamic and open scenarios as they can potentially appear at any position in the feature space. Moreover, these methods just sim…

    Submitted 2 May, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: The first two authors contributed equally to this work. Accepted at IJCAI 2024

  22. arXiv:2404.17302

    cs.RO cs.AI cs.CV

    Part-Guided 3D RL for Sim2Real Articulated Object Manipulation

    Authors: Pengwei Xie, Rui Chen, Siang Chen, Yuzhe Qin, Fanbo Xiang, Tianyu Sun, Jing Xu, Guijin Wang, Hao Su

    Abstract: Manipulating unseen articulated objects through visual feedback is a critical but challenging task for real robots. Existing learning-based solutions mainly focus on visual affordance learning or other pre-trained visual models to guide manipulation policies, which face challenges for novel instances in real-world scenarios. In this paper, we propose a novel part-guided 3D RL framework, which can…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 9 pages

  23. arXiv:2404.17113

    cs.LG cs.HC

    MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition

    Authors: Zheng Lian, Haiyang Sun, Licai Sun, Zhuofan Wen, Siyuan Zhang, Shun Chen, Hao Gu, Jinming Zhao, Ziyang Ma, Xie Chen, Jiangyan Yi, Rui Liu, Kele Xu, Bin Liu, Erik Cambria, Guoying Zhao, Björn W. Schuller, Jianhua Tao

    Abstract: Multimodal emotion recognition is an important research topic in artificial intelligence. Over the past few decades, researchers have made remarkable progress by increasing dataset size and building more effective architectures. However, due to various reasons (such as complex environments and inaccurate labels), current systems still cannot meet the demands of practical applications. Therefore, w…

    Submitted 28 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  24. arXiv:2404.16914

    cs.LG cs.AI cs.CL

    Prediction Is All MoE Needs: Expert Load Distribution Goes from Fluctuating to Stabilizing

    Authors: Peizhuang Cong, Aomufei Yuan, Shimao Chen, Yuxuan Tian, Bowen Ye, Tong Yang

    Abstract: MoE facilitates the development of large models by making the computational complexity of the model no longer scale linearly with increasing parameters. The learning sparse gating network selects a set of experts for each token to be processed; however, this may lead to differences in the number of tokens processed by each expert over several successive iterations, i.e., the expert load fluctuatio…

    Submitted 25 April, 2024; originally announced April 2024.

  25. arXiv:2404.16687

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  26. arXiv:2404.16232

    cs.CR cs.DC

    SECO: Secure Inference With Model Splitting Across Multi-Server Hierarchy

    Authors: Shuangyi Chen, Ashish Khisti

    Abstract: In the context of prediction-as-a-service, concerns about the privacy of the data and the model have been brought up and tackled via secure inference protocols. These protocols are built up by using single or multiple cryptographic tools designed under a variety of different security assumptions. In this paper, we introduce SECO, a secure inference protocol that enables a user holding an input d…

    Submitted 24 April, 2024; originally announced April 2024.

  27. arXiv:2404.16204

    quant-ph cs.NI

    Entanglement-Based Artificial Topology: Neighboring Remote Network Nodes

    Authors: Si-Yi Chen, Jessica Illiano, Angela Sara Cacciapuoti, Marcello Caleffi

    Abstract: Entanglement is unanimously recognized as the key communication resource of the Quantum Internet. Yet, the possibility of implementing novel network functionalities by exploiting the marvels of entanglement has been poorly investigated so far, by mainly restricting the attention to bipartite entanglement. Conversely, in this paper, we aim at exploiting multipartite entanglement as inter-network re…

    Submitted 24 April, 2024; originally announced April 2024.

  28. arXiv:2404.15709

    cs.CV cs.LG cs.RO

    ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos

    Authors: Zerui Chen, Shizhe Chen, Cordelia Schmid, Ivan Laptev

    Abstract: In this work, we aim to learn a unified vision-based policy for a multi-fingered robot hand to manipulate different objects in diverse poses. Though prior work has demonstrated that human videos can benefit policy learning, performance improvement has been limited by physically implausible trajectories extracted from videos. Moreover, reliance on privileged object information such as ground-truth…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Project Page: https://zerchen.github.io/projects/vividex.html

  29. arXiv:2404.15661

    cs.GR cs.CG cs.CV

    CWF: Consolidating Weak Features in High-quality Mesh Simplification

    Authors: Rui Xu, Longdu Liu, Ningna Wang, Shuangmin Chen, Shiqing Xin, Xiaohu Guo, Zichun Zhong, Taku Komura, Wenping Wang, Changhe Tu

    Abstract: In mesh simplification, common requirements like accuracy, triangle quality, and feature alignment are often considered as a trade-off. Existing algorithms concentrate on just one or a few specific aspects of these requirements. For example, the well-known Quadric Error Metrics (QEM) approach prioritizes accuracy and can preserve strong feature lines/points as well but falls short in ensuring high…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 14 pages, 22 figures

  30. arXiv:2404.15588

    cs.CL

    Minimal Evidence Group Identification for Claim Verification

    Authors: Xiangci Li, Sihao Chen, Rajvi Kapadia, Jessica Ouyang, Fan Zhang

    Abstract: Claim verification in real-world settings (e.g. against a large collection of candidate evidences retrieved from the web) typically requires identifying and aggregating a complete set of evidence pieces that collectively provide full support to the claim. The problem becomes particularly challenging when there exists distinct sets of evidence that could be used to verify the claim from different p…

    Submitted 23 April, 2024; originally announced April 2024.

  31. arXiv:2404.15157

    cs.CL cs.AI

    FASTTRACK: Fast and Accurate Fact Tracing for LLMs

    Authors: Si Chen, Feiyang Kang, Ning Yu, Ruoxi Jia

    Abstract: Fact tracing seeks to identify specific training examples that serve as the knowledge source for a given query. Existing approaches to fact tracing rely on assessing the similarity between each training sample and the query along a certain dimension, such as lexical similarity, gradient, or embedding space. However, these methods fall short of effectively distinguishing between samples that are me…

    Submitted 21 April, 2024; originally announced April 2024.

  32. arXiv:2404.14808

    cs.CV

    Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning

    Authors: Wenjin Hou, Shiming Chen, Shuhuang Chen, Ziming Hong, Yan Wang, Xuetao Feng, Salman Khan, Fahad Shahbaz Khan, Xinge You

    Abstract: Generative Zero-shot learning (ZSL) learns a generator to synthesize visual samples for unseen classes, which is an effective way to advance ZSL. However, existing generative methods rely on the conditions of Gaussian noise and the predefined semantic prototype, which limit the generator only optimized on specific seen classes rather than characterizing each visual instance, resulting in poor gene…

    Submitted 23 April, 2024; originally announced April 2024.

  33. arXiv:2404.14705

    cs.CV

    Think-Program-reCtify: 3D Situated Reasoning with Large Language Models

    Authors: Qingrong He, Kejun Lin, Shizhe Chen, Anwen Hu, Qin Jin

    Abstract: This work addresses the 3D situated reasoning task which aims to answer questions given egocentric observations in a 3D environment. The task remains challenging as it requires comprehensive 3D perception and complex reasoning skills. End-to-end models trained on supervised data for 3D situated reasoning suffer from data scarcity and generalization ability. Inspired by the recent success of levera…

    Submitted 22 April, 2024; originally announced April 2024.

  34. arXiv:2404.14397

    cs.CL cs.CY cs.LG

    RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?

    Authors: Adrian de Wynter, Ishaan Watts, Nektar Ege Altıntoprak, Tua Wongsangaroonsri, Minghui Zhang, Noura Farra, Lena Baur, Samantha Claudet, Pavel Gajdusek, Can Gören, Qilong Gu, Anna Kaminska, Tomasz Kaminski, Ruby Kuo, Akiko Kyuba, Jongho Lee, Kartik Mathur, Petter Merok, Ivana Milovanović, Nani Paananen, Vesa-Matti Paananen, Anna Pavlenko, Bruno Pereira Vidal, Luciano Strika, Yueh Tsao, et al. (8 additional authors not shown)

    Abstract: Large language models (LLMs) and small language models (SLMs) are being adopted at remarkable speed, although their safety still remains a serious concern. With the advent of multilingual S/LLMs, the question now becomes a matter of scale: can we expand multilingual safety evaluations of these models with the same velocity at which they are deployed? To this end we introduce RTP-LX, a human-transc…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Work in progress

  35. arXiv:2404.14351

    cs.CV

    Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer

    Authors: Eric Brachmann, Jamie Wynn, Shuai Chen, Tommaso Cavallari, Áron Monszpart, Daniyar Turmukhambetov, Victor Adrian Prisacariu

    Abstract: We address the task of estimating camera parameters from a set of images depicting a scene. Popular feature-based structure-from-motion (SfM) tools solve this task by incremental reconstruction: they repeat triangulation of sparse 3D points and registration of more camera views to the sparse point cloud. We re-interpret incremental structure-from-motion as an iterated application and refinement of…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Project page: https://nianticlabs.github.io/acezero/

  36. arXiv:2404.14248

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  37. arXiv:2404.14238

    cs.NI cs.AI

    Beyond the Edge: An Advanced Exploration of Reinforcement Learning for Mobile Edge Computing, its Applications, and Future Research Trajectories

    Authors: Ning Yang, Shuo Chen, Haijun Zhang, Randall Berry

    Abstract: Mobile Edge Computing (MEC) broadens the scope of computation and storage beyond the central network, incorporating edge nodes close to end devices. This expansion facilitates the implementation of large-scale "connected things" within edge networks. The advent of applications necessitating real-time, high-quality service presents several challenges, such as low latency, high data rate, reliabilit…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: The paper is accepted by IEEE Communications Surveys and Tutorials (COMST)

  38. arXiv:2404.13603

    cs.IT eess.SP

    Beyond MMSE: Rank-1 Subspace Channel Estimator for Massive MIMO Systems

    Authors: Bin Li, Ziping Wei, Shaoshi Yang, Yang Zhang, Jun Zhang, Chenglin Zhao, Sheng Chen

    Abstract: To glean the benefits offered by massive multi-input multi-output (MIMO) systems, channel state information must be accurately acquired. Despite the high accuracy, the computational complexity of classical linear minimum mean squared error (MMSE) estimator becomes prohibitively high in the context of massive MIMO, while the other low-complexity methods degrade the estimation accuracy seriously. In…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 15 pages, 12 figures, accepted to appear on IEEE Transactions on Communications, Apr. 2024

  39. arXiv:2404.13532

    cs.RO

    SpringGrasp: Synthesizing Compliant, Dexterous Grasps under Shape Uncertainty

    Authors: Sirui Chen, Jeannette Bohg, C. Karen Liu

    Abstract: Generating stable and robust grasps on arbitrary objects is critical for dexterous robotic hands, marking a significant step towards advanced dexterous manipulation. Previous studies have mostly focused on improving differentiable grasping metrics with the assumption of precisely known object geometry. However, shape uncertainty is ubiquitous due to noisy and partial shape observations, which intr…

    Submitted 25 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

  40. arXiv:2404.13420

    cs.CV

    NeurCADRecon: Neural Representation for Reconstructing CAD Surfaces by Enforcing Zero Gaussian Curvature

    Authors: Qiujie Dong, Rui Xu, Pengfei Wang, Shuangmin Chen, Shiqing Xin, Xiaohong Jia, Wenping Wang, Changhe Tu

    Abstract: Despite recent advances in reconstructing an organic model with the neural signed distance function (SDF), the high-fidelity reconstruction of a CAD model directly from low-quality unoriented point clouds remains a significant challenge. In this paper, we address this challenge based on the prior observation that the surface of a CAD model is generally composed of piecewise surface patches, each a…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: ACM Transactions on Graphics (SIGGRAPH 2024)

  41. arXiv:2404.12916

    cs.CR

    Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models

    Authors: Zhenyang Ni, Rui Ye, Yuxi Wei, Zhen Xiang, Yanfeng Wang, Siheng Chen

    Abstract: Vision-Large-Language-models(VLMs) have great application prospects in autonomous driving. Despite the ability of VLMs to comprehend and make decisions in complex scenarios, their integration into safety-critical autonomous driving systems poses serious security risks. In this paper, we propose BadVLMDriver, the first backdoor attack against VLMs for autonomous driving that can be launched in prac…

    Submitted 22 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  42. arXiv:2404.12608  [pdf, other]

    cs.DB cs.CL cs.PL

    Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations

    Authors: Sibei Chen, Yeye He, Weiwei Cui, Ju Fan, Song Ge, Haidong Zhang, Dongmei Zhang, Surajit Chaudhuri

    Abstract: Spreadsheets are widely recognized as the most popular end-user programming tools, which blend the power of formula-based computation with an intuitive table-based interface. Today, spreadsheets are used by billions of users to manipulate tables, most of whom are neither database experts nor professional programmers. Despite the success of spreadsheets, authoring complex formulas remains challe…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: full version of a paper to appear in SIGMOD 2024

  43. arXiv:2404.12104  [pdf, other]

    cs.CV cs.CL cs.LG

    Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models

    Authors: Yuzhu Cai, Sheng Yin, Yuxi Wei, Chenxin Xu, Weibo Mao, Felix Juefei-Xu, Siheng Chen, Yanfeng Wang

    Abstract: The burgeoning landscape of text-to-image models, exemplified by innovations such as Midjourney and DALL-E 3, has revolutionized content creation across diverse sectors. However, these advancements bring forth critical ethical concerns, particularly with the misuse of open-source models to generate content that violates societal norms. Addressing this, we introduce Ethical-Lens, a framework designe…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 42 pages, 17 figures, 29 tables

  44. SIGformer: Sign-aware Graph Transformer for Recommendation

    Authors: Sirui Chen, Jiawei Chen, Sheng Zhou, Bohao Wang, Shen Han, Chanfei Su, Yuqing Yuan, Can Wang

    Abstract: In recommender systems, most graph-based methods focus on positive user feedback, while overlooking the valuable negative feedback. Integrating both positive and negative feedback to form a signed graph can lead to a more comprehensive understanding of user preferences. However, the existing efforts to incorporate both types of feedback are sparse and face two main limitations: 1) They process pos…

    Submitted 6 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR 2024

  45. arXiv:2404.11129  [pdf, other]

    cs.CV

    Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales

    Authors: Minghe Gao, Shuang Chen, Liang Pang, Yuan Yao, Jisheng Dang, Wenqiao Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang, Tat-Seng Chua

    Abstract: The remarkable performance of Multimodal Large Language Models (MLLMs) has unequivocally demonstrated their proficient understanding capabilities in handling a wide array of visual tasks. Nevertheless, the opaque nature of their black-box reasoning processes persists as an enigma, rendering them uninterpretable and prone to hallucination. Their ability to execute intricate compositional rea…

    Submitted 17 April, 2024; originally announced April 2024.

  46. arXiv:2404.11120  [pdf, other]

    cs.CV

    TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing

    Authors: Sherry X. Chen, Yaron Vaxman, Elad Ben Baruch, David Asulin, Aviad Moreshet, Kuo-Chin Lien, Misha Sra, Pradeep Sen

    Abstract: Despite many attempts to leverage pre-trained text-to-image models (T2I) like Stable Diffusion (SD) for controllable image editing, producing good predictable results remains a challenge. Previous approaches have focused on either fine-tuning pre-trained T2I models on specific datasets to generate certain kinds of images (e.g., with a specific object or person), or on optimizing the weights, text…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  47. arXiv:2404.10775  [pdf, other]

    cs.CV cs.AI cs.MA

    COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

    Authors: Hongxin Zhang, Zeyuan Wang, Qiushi Lyu, Zheyuan Zhang, Sunli Chen, Tianmin Shu, Yilun Du, Chuang Gan

    Abstract: In this paper, we investigate the problem of embodied multi-agent cooperation, where decentralized agents must cooperate given only partial egocentric views of the world. To effectively plan in this setting, in contrast to learning world dynamics in a single-agent scenario, we must simulate world dynamics conditioned on an arbitrary number of agents' actions given only partial egocentric visual ob…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 23 pages. The first three authors contributed equally

  48. arXiv:2404.10315  [pdf, other]

    cs.CL

    Enhancing Confidence Expression in Large Language Models Through Learning from Past Experience

    Authors: Haixia Han, Tingyun Li, Shisong Chen, Jie Shi, Chengyu Du, Yanghua Xiao, Jiaqing Liang, Xin Lin

    Abstract: Large Language Models (LLMs) have exhibited remarkable performance across various downstream tasks, but they may generate inaccurate or false information with a confident tone. One of the possible solutions is to empower the LLM confidence expression capability, in which the confidence expressed can be well-aligned with the true probability of the generated answer being correct. However, leveragin…

    Submitted 16 April, 2024; originally announced April 2024.

  49. arXiv:2404.09981  [pdf, other]

    cs.DM

    Robot Positioning Using Torus Packing for Multisets

    Authors: Chung Shue Chen, Peter Keevash, Sean Kennedy, Élie de Panafieu, Adrian Vetta

    Abstract: We consider the design of a positioning system where a robot determines its position from local observations. This is a well-studied problem of considerable practical importance and mathematical interest. The dominant paradigm derives from the classical theory of de Bruijn sequences, where the robot has access to a window within a larger code and can determine its position if these windows are dis…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 22 pages, accepted at ICALP 2024

    ACM Class: G.2.1

  50. arXiv:2404.09911  [pdf, other]

    cs.CL

    ChatShop: Interactive Information Seeking with Language Agents

    Authors: Sanxing Chen, Sam Wiseman, Bhuwan Dhingra

    Abstract: The desire and ability to seek new information strategically are fundamental to human learning but often overlooked in current language agent development. Using a web shopping task as an example, we show that it can be reformulated and solved as a retrieval task without a requirement of interactive information seeking. We then redesign the task to introduce a new role of shopper, serving as a real…

    Submitted 15 April, 2024; originally announced April 2024.