
Showing 1–50 of 1,759 results for author: Zhu, Y

Searching in archive cs.
  1. arXiv:2405.05942 [pdf, other]

    cs.DS

    Improved Evolutionary Algorithms for Submodular Maximization with Cost Constraints

    Authors: Yanhui Zhu, Samik Basu, A Pavan

    Abstract: We present an evolutionary algorithm evo-SMC for the problem of Submodular Maximization under Cost constraints (SMC). Our algorithm achieves $1/2$-approximation with a high probability $1-1/n$ within $\mathcal{O}(n^2K_β)$ iterations, where $K_β$ denotes the maximum size of a feasible solution set with cost constraint $β$. To the best of our knowledge, this is the best approximation guarantee offer…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: IJCAI 2024

  2. arXiv:2405.05254 [pdf, other]

    cs.CL

    You Only Cache Once: Decoder-Decoder Architectures for Language Models

    Authors: Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei

    Abstract: We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once. It consists of two components, i.e., a cross-decoder stacked upon a self-decoder. The self-decoder efficiently encodes global key-value (KV) caches that are reused by the cross-decoder via cross-attention. The overall model behaves like a decoder-only Transformer, although YOCO onl…

    Submitted 9 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  3. arXiv:2405.04867 [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  4. arXiv:2405.04536 [pdf, other]

    cs.CV cs.AI cs.LG

    When Training-Free NAS Meets Vision Transformer: A Neural Tangent Kernel Perspective

    Authors: Qiqi Zhou, Yichen Zhu

    Abstract: This paper investigates the Neural Tangent Kernel (NTK) to search vision transformers without training. In contrast with the previous observation that NTK-based metrics can effectively predict CNN performance at initialization, we empirically show their inefficacy in the ViT search space. We hypothesize that the fundamental feature learning preference within ViT contributes to the ineffectiveness…

    Submitted 15 March, 2024; originally announced May 2024.

    Comments: ICASSP2024 oral

  5. arXiv:2405.03712 [pdf, other]

    cs.LG cs.AI cs.CR cs.NE

    Your Network May Need to Be Rewritten: Network Adversarial Based on High-Dimensional Function Graph Decomposition

    Authors: Xiaoyan Su, Yinghao Zhu, Run Li

    Abstract: In the past, research on a single low-dimensional activation function in networks has led to internal covariate shift and gradient deviation problems. A relatively small research area is how to use function combinations to provide property completion for a single activation function application. We propose a network adversarial method to address the aforementioned challenges. This is the first met…

    Submitted 4 May, 2024; originally announced May 2024.

  6. arXiv:2405.03697 [pdf, other]

    cs.HC

    GeoViz: A Multi-View Visualization Platform for Spatio-temporal Knowledge Graph

    Authors: Jianping Zhou, Junhao Li, Guanjie Zheng, Yunqiang Zhu, Xinbing Wang, Chenghu Zhou

    Abstract: In this paper, we propose a multi-view visualization technology for spatio-temporal knowledge graph (STKG), which utilizes three distinct perspectives: knowledge tree, knowledge net, and knowledge map, to facilitate a comprehensive analysis of the STKG. The knowledge tree enables the visualization of hierarchical interrelation within the STKG, while the knowledge net elucidates semantic relationshi…

    Submitted 29 April, 2024; originally announced May 2024.

    Comments: 4 pages, 2 figures

  7. Towards Building Autonomous Data Services on Azure

    Authors: Yiwen Zhu, Yuanyuan Tian, Joyce Cahoon, Subru Krishnan, Ankita Agarwal, Rana Alotaibi, Jesús Camacho-Rodríguez, Bibin Chundatt, Andrew Chung, Niharika Dutta, Andrew Fogarty, Anja Gruenheid, Brandon Haynes, Matteo Interlandi, Minu Iyer, Nick Jurgens, Sumeet Khushalani, Brian Kroth, Manoj Kumar, Jyoti Leeka, Sergiy Matusevych, Minni Mittal, Andreas Mueller, Kartheek Muthyala, Harsha Nagulapalli, et al. (13 additional authors not shown)

    Abstract: Modern cloud has turned data services into easily accessible commodities. With just a few clicks, users are now able to access a catalog of data processing systems for a wide range of tasks. However, the cloud brings in both complexity and opportunity. While cloud users can quickly start an application by using various data services, it can be difficult to configure and optimize these services to…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: SIGMOD Companion of the 2023 International Conference on Management of Data. 2023

  8. arXiv:2405.01701 [pdf]

    cs.CV

    Active Learning Enabled Low-cost Cell Image Segmentation Using Bounding Box Annotation

    Authors: Yu Zhu, Qiang Yang, Li Xu

    Abstract: Cell image segmentation is usually implemented using fully supervised deep learning methods, which heavily rely on extensive annotated training data. Yet, due to the complexity of cell morphology and the requirement for specialized knowledge, pixel-level annotation of cell images has become a highly labor-intensive task. To address the above problems, we propose an active learning framework for ce…

    Submitted 2 May, 2024; originally announced May 2024.

  9. arXiv:2404.19585 [pdf, other]

    cs.RO

    Integrating Visuo-tactile Sensing with Haptic Feedback for Teleoperated Robot Manipulation

    Authors: Noah Becker, Erik Gattung, Kay Hansel, Tim Schneider, Yaonan Zhu, Yasuhisa Hasegawa, Jan Peters

    Abstract: Telerobotics enables humans to overcome spatial constraints and allows them to physically interact with the environment in remote locations. However, the sensory feedback provided by the system to the operator is often purely visual, limiting the operator's dexterity in manipulation tasks. In this work, we address this issue by equipping the robot's end-effector with high-resolution visuotactile G…

    Submitted 30 April, 2024; originally announced April 2024.

  10. Interest Clock: Time Perception in Real-Time Streaming Recommendation System

    Authors: Yongchun Zhu, Jingwu Chen, Ling Chen, Yitan Li, Feng Zhang, Zuotao Liu

    Abstract: User preferences follow a dynamic pattern over a day, e.g., at 8 am, a user might prefer to read news, while at 8 pm, they might prefer to watch movies. Time modeling aims to enable recommendation systems to perceive time changes to capture users' dynamic preferences over time, which is an important and challenging problem in recommendation systems. Especially, streaming recommendation systems in…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR 2024

  11. arXiv:2404.18948 [pdf, other]

    cs.LG

    Sub-Adjacent Transformer: Improving Time Series Anomaly Detection with Reconstruction Error from Sub-Adjacent Neighborhoods

    Authors: Wenzhen Yue, Xianghua Ying, Ruohao Guo, DongDong Chen, Ji Shi, Bowei Xing, Yuqing Zhu, Taiyan Chen

    Abstract: In this paper, we present the Sub-Adjacent Transformer with a novel attention mechanism for unsupervised time series anomaly detection. Unlike previous approaches that rely on all the points within some neighborhood for time point reconstruction, our method restricts the attention to regions not immediately adjacent to the target points, termed sub-adjacent neighborhoods. Our key observation is th…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: IJCAI 2024

  12. arXiv:2404.18580 [pdf, other]

    cs.RO eess.SY

    Data-Driven Dynamics Modeling of Miniature Robotic Blimps Using Neural ODEs With Parameter Auto-Tuning

    Authors: Yongjian Zhu, Hao Cheng, Feitian Zhang

    Abstract: Miniature robotic blimps, as one type of lighter-than-air aerial vehicles, have attracted increasing attention in the science and engineering community for their enhanced safety, extended endurance, and quieter operation compared to quadrotors. Accurately modeling the dynamics of these robotic blimps poses a significant challenge due to the complex aerodynamics stemming from their large lifting bo…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 8 pages, 8 figures

  13. arXiv:2404.18443 [pdf, other]

    cs.CL cs.AI cs.IR q-bio.QM

    BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers

    Authors: Ran Xu, Wenqi Shi, Yue Yu, Yuchen Zhuang, Yanqiao Zhu, May D. Wang, Joyce C. Ho, Chao Zhang, Carl Yang

    Abstract: Developing effective biomedical retrieval models is important for excelling at knowledge-intensive biomedical tasks but still challenging due to the lack of sufficient publicly available annotated biomedical data and computational resources. We present BMRetriever, a series of dense retrievers for enhancing biomedical retrieval via unsupervised pre-training on large biomedical corpora, followed by ins…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Work in progress. The model and data will be uploaded to https://github.com/ritaranx/BMRetriever

  14. arXiv:2404.18319 [pdf, other]

    cs.IR

    User Welfare Optimization in Recommender Systems with Competing Content Creators

    Authors: Fan Yao, Yiming Liao, Mingzhe Wu, Chuanhao Li, Yan Zhu, James Yang, Qifan Wang, Haifeng Xu, Hongning Wang

    Abstract: Driven by the new economic opportunities created by the creator economy, an increasing number of content creators rely on and compete for revenue generated from online content recommendation platforms. This burgeoning competition reshapes the dynamics of content distribution and profoundly impacts long-term user welfare on the platform. However, the absence of a comprehensive picture of global use…

    Submitted 28 April, 2024; originally announced April 2024.

  15. arXiv:2404.17955 [pdf, other]

    cs.SE

    A Survey of Third-Party Library Security Research in Application Software

    Authors: Jia Zeng, Dan Han, Yaling Zhu, Yangzhong Wang, Fangchen Weng

    Abstract: In the current software development environment, third-party libraries play a crucial role. They provide developers with rich functionality and convenient solutions, speeding up the pace and efficiency of software development. However, with the widespread use of third-party libraries, associated security risks and potential vulnerabilities are increasingly apparent. Malicious attackers can exploit…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 21 pages, 3 figures, one table

  16. arXiv:2404.17521 [pdf, other]

    cs.RO cs.CV

    Ag2Manip: Learning Novel Manipulation Skills with Agent-Agnostic Visual and Action Representations

    Authors: Puhao Li, Tengyu Liu, Yuyang Li, Muzhi Han, Haoran Geng, Shu Wang, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

    Abstract: Autonomous robotic systems capable of learning novel manipulation tasks are poised to transform industries from manufacturing to service automation. However, modern methods (e.g., VIP and R3M) still face significant hurdles, notably the domain gap among robotic embodiments and the sparsity of successful task executions within specific action spaces, resulting in misaligned and ambiguous task repre…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Project website and open-source code: https://xiaoyao-li.github.io/research/ag2manip

  17. arXiv:2404.17028 [pdf, ps, other]

    cs.HC cs.AI

    Generative AI in Color-Changing Systems: Re-Programmable 3D Object Textures with Material and Design Constraints

    Authors: Yunyi Zhu, Faraz Faruqi, Stefanie Mueller

    Abstract: Advances in Generative AI tools have allowed designers to manipulate existing 3D models using text or image-based prompts, enabling creators to explore different design goals. Photochromic color-changing systems, on the other hand, allow for the reprogramming of surface texture of 3D models, enabling easy customization of physical objects and opening up the possibility of using object surfaces for…

    Submitted 25 April, 2024; originally announced April 2024.

  18. arXiv:2404.16831 [pdf, other]

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  19. arXiv:2404.16666 [pdf, other]

    cs.CV

    PhyRecon: Physically Plausible Neural Scene Reconstruction

    Authors: Junfeng Ni, Yixin Chen, Bohan Jing, Nan Jiang, Bin Wang, Bo Dai, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

    Abstract: While neural implicit representations have gained popularity in multi-view 3D reconstruction, previous work struggles to yield physically plausible results, thereby limiting their applications in physics-demanding domains like embodied AI and robotics. The lack of plausibility originates from both the absence of physics modeling in the existing pipeline and their inability to recover intricate geo…

    Submitted 28 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: project page: https://phyrecon.github.io/

  20. arXiv:2404.15956 [pdf, other]

    cs.CV

    A Survey on Visual Mamba

    Authors: Hanwei Zhang, Ying Zhu, Dan Wang, Lijun Zhang, Tianxiang Chen, Zi Ye

    Abstract: State space models (SSMs) with selection mechanisms and hardware-aware architectures, namely Mamba, have recently demonstrated significant promise in long-sequence modeling. Since the self-attention mechanism in transformers has quadratic complexity with image size and increasing computational demands, researchers are now exploring how to adapt Mamba for computer vision tasks. This paper is th…

    Submitted 26 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  21. arXiv:2404.15954 [pdf, other]

    cs.IR cs.LG

    Mixed Supervised Graph Contrastive Learning for Recommendation

    Authors: Weizhi Zhang, Liangwei Yang, Zihe Song, Henry Peng Zou, Ke Xu, Yuanjie Zhu, Philip S. Yu

    Abstract: Recommender systems (RecSys) play a vital role in online platforms, offering users personalized suggestions amidst vast information. Graph contrastive learning aims to learn from high-order collaborative filtering signals with unsupervised augmentation on the user-item bipartite graph, which predominantly relies on the multi-task learning framework involving both the pair-wise recommendation loss…

    Submitted 25 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  22. arXiv:2404.15733 [pdf, other]

    cs.AR

    BlissCam: Boosting Eye Tracking Efficiency with Learned In-Sensor Sparse Sampling

    Authors: Yu Feng, Tianrui Ma, Yuhao Zhu, Xuan Zhang

    Abstract: Eye tracking is becoming an increasingly important task domain in emerging computing platforms such as Augmented/Virtual Reality (AR/VR). Today's eye tracking system suffers from long end-to-end tracking latency and can easily eat up half of the power budget of a mobile VR device. Most existing optimization efforts exclusively focus on the computation pipeline by optimizing the algorithm and/or de…

    Submitted 24 April, 2024; originally announced April 2024.

  23. arXiv:2404.15380 [pdf, other]

    cs.LG cs.AI

    ControlTraj: Controllable Trajectory Generation with Topology-Constrained Diffusion Model

    Authors: Yuanshao Zhu, James Jianqiao Yu, Xiangyu Zhao, Qidong Liu, Yongchao Ye, Wei Chen, Zijian Zhang, Xuetao Wei, Yuxuan Liang

    Abstract: Generating trajectory data is among the promising solutions for addressing privacy concerns, collection costs, and proprietary restrictions usually associated with human mobility analyses. However, existing trajectory generation methods are still in their infancy due to the inherent diversity and unpredictability of human activities, grappling with issues such as fidelity, flexibility, and generalizabi…

    Submitted 23 April, 2024; originally announced April 2024.

  24. arXiv:2404.14851 [pdf, other]

    cs.IR cs.AI cs.CL

    From Matching to Generation: A Survey on Generative Information Retrieval

    Authors: Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yuyao Zhang, Peitian Zhang, Yutao Zhu, Zhicheng Dou

    Abstract: Information Retrieval (IR) systems are crucial tools for users to access information, widely applied in scenarios like search engines, question answering, and recommendation systems. Traditional IR methods, based on similarity matching to return ranked lists of documents, have been reliable means of information acquisition, dominating the IR field for years. With the advancement of pre-trained lan…

    Submitted 23 April, 2024; originally announced April 2024.

  25. arXiv:2404.14248 [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  26. arXiv:2404.14092 [pdf, ps, other]

    cs.IT eess.SP

    Multi-agent Reinforcement Learning-based Joint Precoding and Phase Shift Optimization for RIS-aided Cell-Free Massive MIMO Systems

    Authors: Yiyang Zhu, Enyu Shi, Ziheng Liu, Jiayi Zhang, Bo Ai

    Abstract: Cell-free (CF) massive multiple-input multiple-output (mMIMO) is a promising technique for achieving high spectral efficiency (SE) using multiple distributed access points (APs). However, harsh propagation environments often lead to significant communication performance degradation due to high penetration loss. To overcome this issue, we introduce the reconfigurable intelligent surface (RIS) into…

    Submitted 22 April, 2024; originally announced April 2024.

  27. arXiv:2404.14073 [pdf, other]

    cs.LG cs.AI

    Towards Robust Trajectory Representations: Isolating Environmental Confounders with Causal Learning

    Authors: Kang Luo, Yuanshao Zhu, Wei Chen, Kun Wang, Zhengyang Zhou, Sijie Ruan, Yuxuan Liang

    Abstract: Trajectory modeling refers to characterizing human movement behavior, serving as a pivotal step in understanding mobility patterns. Nevertheless, existing studies typically ignore the confounding effects of geospatial context, leading to the acquisition of spurious correlations and limited generalization capabilities. To bridge this gap, we initially formulate a Structural Causal Model (SCM) to de…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: The paper has been accepted by IJCAI 2024

  28. arXiv:2404.14061 [pdf, other]

    cs.LG cs.AI cs.DB cs.SI

    FedTAD: Topology-aware Data-free Knowledge Distillation for Subgraph Federated Learning

    Authors: Yinlin Zhu, Xunkai Li, Zhengyu Wu, Di Wu, Miao Hu, Rong-Hua Li

    Abstract: Subgraph federated learning (subgraph-FL) is a new distributed paradigm that facilitates the collaborative training of graph neural networks (GNNs) over subgraphs held by multiple clients. Unfortunately, a significant challenge of subgraph-FL arises from subgraph heterogeneity, which stems from node and topology variation and impairs the performance of the global GNN. Despite various studies, they have no…

    Submitted 25 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  29. arXiv:2404.13515 [pdf, other]

    cs.LG cs.AI cs.DC

    FedTrans: Efficient Federated Learning via Multi-Model Transformation

    Authors: Yuxuan Zhu, Jiachen Liu, Mosharaf Chowdhury, Fan Lai

    Abstract: Federated learning (FL) aims to train machine learning (ML) models across potentially millions of edge client devices. Yet, training and customizing models for FL clients is notoriously challenging due to the heterogeneity of client data, device capabilities, and the massive scale of clients, making individualized model exploration prohibitively expensive. State-of-the-art FL solutions personalize…

    Submitted 25 April, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

    Journal ref: MLSys (2024)

  30. arXiv:2404.12666 [pdf, other]

    cs.DC cs.CR cs.ET

    A Survey on Federated Analytics: Taxonomy, Enabling Techniques, Applications and Open Issues

    Authors: Zibo Wang, Haichao Ji, Yifei Zhu, Dan Wang, Zhu Han

    Abstract: The escalating influx of data generated by networked edge devices, coupled with the growing awareness of data privacy, has promoted a transformative shift in computing paradigms from centralized data processing to privacy-preserved distributed data processing. Federated analytics (FA) is an emerging technique to support collaborative data analytics among diverse data owners without centralizing th…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: This survey has been submitted to IEEE Communications Surveys & Tutorials

  31. arXiv:2404.12522 [pdf, other]

    cs.LG cs.AI

    Neural Active Learning Beyond Bandits

    Authors: Yikun Ban, Ishika Agarwal, Ziwei Wu, Yada Zhu, Kommy Weldemariam, Hanghang Tong, Jingrui He

    Abstract: We study both stream-based and pool-based active learning with neural network approximations. A recent line of work proposed bandit-based approaches that transformed active learning into a bandit problem, achieving both theoretical and empirical success. However, the performance and computational costs of these methods may be susceptible to the number of classes, denoted as $K$, due to this trans…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Published at ICLR 2024, 40 pages

  32. arXiv:2404.12312 [pdf, ps, other]

    cs.LG math.OC stat.ML

    A Mean-Field Analysis of Neural Gradient Descent-Ascent: Applications to Functional Conditional Moment Equations

    Authors: Yuchen Zhu, Yufeng Zhang, Zhaoran Wang, Zhuoran Yang, Xiaohong Chen

    Abstract: We study minimax optimization problems defined over infinite-dimensional function classes. In particular, we restrict the functions to the class of overparameterized two-layer neural networks and study (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural network. As an initial step, we consider the minimax optimization problem stemming fro…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 72 pages, submitted

  33. arXiv:2404.12000 [pdf, other]

    cs.SE

    How far are AI-powered programming assistants from meeting developers' needs?

    Authors: Xin Tan, Xiao Long, Xianjun Ni, Yinghao Zhu, Jing Jiang, Li Zhang

    Abstract: Recent In-IDE AI coding assistant tools (ACATs) like GitHub Copilot have significantly impacted developers' coding habits. While some studies have examined their effectiveness, in-depth investigation into the actual assistance process is lacking. To bridge this gap, we simulate real development scenarios encompassing three typical types of software development tasks and recruit 27 computer scienc…

    Submitted 24 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  34. arXiv:2404.11852 [pdf, other]

    cs.AR cs.GR

    Cicero: Addressing Algorithmic and Architectural Bottlenecks in Neural Rendering by Radiance Warping and Memory Optimizations

    Authors: Yu Feng, Zihan Liu, Jingwen Leng, Minyi Guo, Yuhao Zhu

    Abstract: Neural Radiance Field (NeRF) is widely seen as an alternative to traditional physically-based rendering. However, NeRF has not yet been adopted in resource-limited mobile systems such as Virtual and Augmented Reality (VR/AR), because it is extremely slow. On a mobile Volta GPU, even the state-of-the-art NeRF models generally execute only at 0.8 FPS. We show that the main performance bo…

    Submitted 17 April, 2024; originally announced April 2024.

  35. arXiv:2404.11699 [pdf, other]

    cs.RO

    Retrieval-Augmented Embodied Agents

    Authors: Yichen Zhu, Zhicai Ou, Xiaofeng Mou, Jian Tang

    Abstract: Embodied agents operating in complex and uncertain environments face considerable challenges. While some advanced agents handle complex manipulation tasks with proficiency, their success often hinges on extensive training data to develop their capabilities. In contrast, humans typically rely on recalling past experiences and analogous situations to solve new problems. Aiming to emulate this human…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: CVPR2024

  36. arXiv:2404.11500 [pdf, other]

    cs.CL cs.AI

    Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models

    Authors: Yue Zhou, Yada Zhu, Diego Antognini, Yoon Kim, Yang Zhang

    Abstract: This paper studies the relationship between the surface form of a mathematical problem and its solvability by large language models. We find that subtle alterations in the surface form can significantly impact the answer distribution and the solve rate, exposing the language model's lack of robustness and sensitivity to the surface form in reasoning through complex problems. To improve mathematica…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted to the main conference of NAACL (2024)

  37. arXiv:2404.11206 [pdf, other]

    cs.CL

    Prompt-tuning for Clickbait Detection via Text Summarization

    Authors: Haoxiang Deng, Yi Zhu, Ye Wang, Jipeng Qiang, Yunhao Yuan, Yun Li, Runmei Zhang

    Abstract: Clickbait consists of surprising social posts or deceptive news headlines that attempt to lure users into clicking, and it has been posted at unprecedented rates for profit or commercial revenue. The spread of clickbait has significant negative impacts on users, exposing them to misleading content or even click-jacking attacks. Different from fake news, the crucial problem in clickbait detection is deter…

    Submitted 17 April, 2024; originally announced April 2024.

  38. arXiv:2404.09317 [pdf, other]

    cs.AR cs.AI

    Characterizing Soft-Error Resiliency in Arm's Ethos-U55 Embedded Machine Learning Accelerator

    Authors: Abhishek Tyagi, Reiley Jeyapaul, Chuteng Zhu, Paul Whatmough, Yuhao Zhu

    Abstract: As Neural Processing Units (NPUs) or accelerators are increasingly deployed in a variety of applications, including safety-critical applications such as autonomous vehicles and medical imaging, it is critical to understand the fault tolerance of NPUs. We present a reliability study of Arm's Ethos-U55, an important industrial-scale NPU being utilised in embedded and IoT applications. We pe…

    Submitted 14 April, 2024; originally announced April 2024.

  39. arXiv:2404.08563 [pdf, other]

    cs.RO

    FusionPortableV2: A Unified Multi-Sensor Dataset for Generalized SLAM Across Diverse Platforms and Scalable Environments

    Authors: Hexiang Wei, Jianhao Jiao, Xiangcheng Hu, Jingwen Yu, Xupeng Xie, Jin Wu, Yilong Zhu, Yuxuan Liu, Lujia Wang, Ming Liu

    Abstract: Simultaneous Localization and Mapping (SLAM) technology has been widely applied in various robotic scenarios, from rescue operations to autonomous driving. However, the generalization of SLAM algorithms remains a significant challenge, as current datasets often lack scalability in terms of platforms and environments. To address this limitation, we present FusionPortableV2, a multi-sensor SLAM data…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 20 pages, 17 figures, 7 tables. Submitted to IJRR as a dataset paper

  40. arXiv:2404.07992 [pdf, other]

    cs.CV

    GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo

    Authors: Jiang Wu, Rui Li, Haofei Xu, Wenxun Zhao, Yu Zhu, Jinqiu Sun, Yanning Zhang

    Abstract: Matching cost aggregation plays a fundamental role in learning-based multi-view stereo networks. However, directly aggregating adjacent costs can lead to suboptimal results due to local geometric inconsistency. Related methods either seek selective aggregation or improve aggregated depth in the 2D space, both of which are unable to handle geometric inconsistency in the cost volume effectively. In this pape…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. Project page: https://wuuu3511.github.io/gomvs/ Code: https://github.com/Wuuu3511/GoMVS

  41. arXiv:2404.06336 [pdf, other]

    quant-ph cs.LG stat.ML

    Quantum State Generation with Structure-Preserving Diffusion Model

    Authors: Yuchen Zhu, Tianrong Chen, Evangelos A. Theodorou, Xie Chen, Molei Tao

    Abstract: This article considers the generative modeling of the states of quantum systems, and an approach based on denoising diffusion model is proposed. The key contribution is an algorithmic innovation that respects the physical nature of quantum states. More precisely, the commonly used density matrix representation of mixed states has to be complex-valued Hermitian, positive semi-definite, and trace one…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 15 pages, 6 figures

  42. arXiv:2404.05674  [pdf, other]

    cs.CV

    MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation

    Authors: Kunpeng Song, Yizhe Zhu, Bingchen Liu, Qing Yan, Ahmed Elgammal, Xiao Yang

    Abstract: In this paper, we present MoMA: an open-vocabulary, training-free personalized image model that boasts flexible zero-shot capabilities. As foundational text-to-image models rapidly evolve, the demand for robust image-to-image translation grows. Addressing this need, MoMA specializes in subject-driven personalized image generation. Utilizing an open-source Multimodal Large Language Model (MLLM), w…

    Submitted 8 April, 2024; originally announced April 2024.

  43. arXiv:2404.04969  [pdf, other]

    cs.LG cs.AI

    Temporal Generalization Estimation in Evolving Graphs

    Authors: Bin Lu, Tingyan Ma, Xiaoying Gan, Xinbing Wang, Yunqiang Zhu, Chenghu Zhou, Shiyu Liang

    Abstract: Graph Neural Networks (GNNs) are deployed in a wide range of fields, but they often struggle to maintain accurate representations as graphs evolve. We theoretically establish a lower bound, proving that under mild conditions, representation distortion inevitably occurs over time. To estimate the temporal distortion without human annotation after deployment, one naive approach is to pre-train a recurre…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Published as a conference paper at ICLR 2024

  44. arXiv:2404.04860  [pdf, other]

    cs.CV

    ByteEdit: Boost, Comply and Accelerate Generative Image Editing

    Authors: Yuxi Ren, Jie Wu, Yanzuo Lu, Huafeng Kuang, Xin Xia, Xionghui Wang, Qianqian Wang, Yixing Zhu, Pan Xie, Shiyin Wang, Xuefeng Xiao, Yitong Wang, Min Zheng, Lean Fu

    Abstract: Recent advancements in diffusion-based generative image editing have sparked a profound revolution, reshaping the landscape of image outpainting and inpainting tasks. Despite these strides, the field grapples with inherent challenges, including i) inferior quality; ii) poor consistency; iii) insufficient instruction adherence; iv) suboptimal generation efficiency. To address these obstacles, we p…

    Submitted 7 April, 2024; originally announced April 2024.

  45. arXiv:2404.04584  [pdf, other]

    cs.CV

    D$^3$: Scaling Up Deepfake Detection by Learning from Discrepancy

    Authors: Yongqi Yang, Zhihao Qian, Ye Zhu, Yu Wu

    Abstract: The boom of Generative AI brings opportunities entangled with risks and concerns. In this work, we seek a step toward a universal deepfake detection system with better generalization and robustness, to accommodate the responsible deployment of diverse image generative models. We do so by first scaling up the existing detection task setup from the one-generator to multiple-generators in training, d…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 14 pages, 3 figures

  46. arXiv:2404.04579  [pdf, other]

    cs.HC

    TeleAware Robot: Designing Awareness-augmented Telepresence Robot for Remote Collaborative Locomotion

    Authors: Ruyi Li, Yaxin Zhu, Min Liu, Yihang Zeng, Shanning Zhuang, Jiayi Fu, Yi Lu, Guyue Zhou, Can Liu, Jiangtao Gong

    Abstract: Telepresence robots can be used to support users in navigating an environment remotely and sharing the visiting experience with their social partners. Although such systems allow users to see and hear the remote environment and communicate with their partners via live video feed, they do not provide enough awareness of the environment and their remote partner's activities. In this paper, we introduc…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 33 pages, 12 figures

    ACM Class: H.5.2

    Journal ref: IMUWT 2024

  47. arXiv:2404.03634  [pdf, other]

    cs.RO cs.CV

    PreAfford: Universal Affordance-Based Pre-Grasping for Diverse Objects and Environments

    Authors: Kairui Ding, Boyuan Chen, Ruihai Wu, Yuyang Li, Zongzheng Zhang, Huan-ang Gao, Siqi Li, Yixin Zhu, Guyue Zhou, Hao Dong, Hao Zhao

    Abstract: Robotic manipulation of ungraspable objects with two-finger grippers presents significant challenges due to the paucity of graspable features, while traditional pre-grasping techniques, which rely on repositioning objects and leveraging external aids like table edges, lack adaptability across object categories and scenes. Addressing this, we introduce PreAfford, a novel pre-grasping planning f…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Project Page: https://air-discover.github.io/PreAfford/

  48. Towards Pareto Optimal Throughput in Small Language Model Serving

    Authors: Pol G. Recasens, Yue Zhu, Chen Wang, Eun Kyung Lee, Olivier Tardieu, Alaa Youssef, Jordi Torres, Josep Ll. Berral

    Abstract: Large language models (LLMs) have revolutionized the state-of-the-art of many different natural language processing tasks. Although serving LLMs is computationally and memory demanding, the rise of Small Language Models (SLMs) offers new opportunities for resource-constrained users, who now are able to serve small models with cutting-edge performance. In this paper, we present a set of experiments…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: To be published at EuroMLSys'24

  49. arXiv:2404.03254  [pdf, ps, other]

    cs.DC

    Mining Area Skyline Objects from Map-based Big Data using Apache Spark Framework

    Authors: Chen Li, Ye Zhu, Yang Cao, Jinli Zhang, Annisa Annisa, Debo Cheng, Yasuhiko Morimoto

    Abstract: The computation of the skyline provides a mechanism for utilizing multiple location-based criteria to identify optimal data points. However, the efficiency of these computations diminishes and becomes more challenging as the input data expands. This study presents a novel algorithm aimed at mitigating this challenge by harnessing the capabilities of Apache Spark, a distributed processing platform,…

    Submitted 4 April, 2024; originally announced April 2024.

  50. arXiv:2404.02159  [pdf, other]

    cs.IT eess.SP

    Fairness-aware Age-of-Information Minimization in WPT-Assisted Short-Packet THz Communications for mURLLC

    Authors: Yao Zhu, Xiaopeng Yuan, Yulin Hu, Bo Ai, Ruikang Wang, Bin Han, Anke Schmeink

    Abstract: The technological landscape is swiftly advancing towards large-scale systems, creating significant opportunities, particularly in the domain of Terahertz (THz) communications. Networks designed for massive connectivity, comprising numerous Internet of Things (IoT) devices, are at the forefront of this advancement. In this paper, we consider Wireless Power Transfer (WPT)-enabled networks that suppo…

    Submitted 15 February, 2024; originally announced April 2024.