Showing 1–50 of 492 results for author: Gao, S

  1. arXiv:2405.02982  [pdf, other]

    cs.CV

    Paintings and Drawings Aesthetics Assessment with Rich Attributes for Various Artistic Categories

    Authors: Xin Jin, Qianqian Qiao, Yi Lu, Shan Gao, Heng Huang, Guangdong Li

    Abstract: Image aesthetic evaluation is a highly prominent research domain in the field of computer vision. In recent years, there has been a proliferation of datasets and corresponding evaluation methodologies for assessing the aesthetic quality of photographic works, leading to the establishment of a relatively mature research environment. However, in contrast to the extensive research in photographic aes… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  2. arXiv:2405.02942  [pdf, other]

    physics.optics cs.CV cs.RO eess.IV

    Design, analysis, and manufacturing of a glass-plastic hybrid minimalist aspheric panoramic annular lens

    Authors: Shaohua Gao, Qi Jiang, Yiqi Liao, Yi Qiu, Wanglei Ying, Kailun Yang, Kaiwei Wang, Benhao Zhang, Jian Bai

    Abstract: We propose a high-performance glass-plastic hybrid minimalist aspheric panoramic annular lens (ASPAL) to solve several major limitations of the traditional panoramic annular lens (PAL), such as large size, high weight, and complex system. The field of view (FoV) of the ASPAL is 360°x(35°~110°) and the imaging quality is close to the diffraction limit. This large FoV ASPAL is composed of only 4 len… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted to Optics & Laser Technology

  3. arXiv:2405.02561  [pdf, other]

    cs.LG math.NA

    Understanding the Difficulty of Solving Cauchy Problems with PINNs

    Authors: Tao Wang, Bo Zhao, Sicun Gao, Rose Yu

    Abstract: Physics-Informed Neural Networks (PINNs) have gained popularity in scientific computing in recent years. However, they often fail to achieve the same level of accuracy as classical methods in solving differential equations. In this paper, we identify two sources of this issue in the case of Cauchy problems: the use of $L^2$ residuals as objective functions and the approximation gap of neural netwo… ▽ More

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 13 pages and 18 figures

  4. arXiv:2404.19326  [pdf, other]

    cs.CV

    LVOS: A Benchmark for Large-scale Long-term Video Object Segmentation

    Authors: Lingyi Hong, Zhongying Liu, Wenchao Chen, Chenzhi Tan, Yuang Feng, Xinyu Zhou, Pinxue Guo, Jinglun Li, Zhaoyu Chen, Shuyong Gao, Wei Zhang, Wenqiang Zhang

    Abstract: Video object segmentation (VOS) aims to distinguish and track target objects in a video. Despite the excellent performance achieved by off-the-shelf VOS models, existing VOS benchmarks mainly focus on short-term videos lasting about 5 seconds, where objects remain visible most of the time. However, these benchmarks poorly represent practical applications, and the absence of long-term datasets rest… ▽ More

    Submitted 30 April, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: LVOS V2

  5. arXiv:2404.19232  [pdf, other]

    cs.CL cs.AI

    GRAMMAR: Grounded and Modular Methodology for Assessment of Domain-Specific Retrieval-Augmented Language Model

    Authors: Xinzhe Li, Ming Liu, Shang Gao

    Abstract: Retrieval-augmented Generation (RAG) systems have been actively studied and deployed across various industries to query domain-specific knowledge bases. However, evaluating these systems presents unique challenges due to the scarcity of domain-specific queries and corresponding ground truths, as well as a lack of systematic approaches to diagnosing the cause of failure cases -- whether they stem… ▽ More

    Submitted 2 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  6. arXiv:2404.19201  [pdf, other]

    eess.IV cs.CV cs.RO physics.optics

    Global Search Optics: Automatically Exploring Optimal Solutions to Compact Computational Imaging Systems

    Authors: Yao Gao, Qi Jiang, Shaohua Gao, Lei Sun, Kailun Yang, Kaiwei Wang

    Abstract: The popularity of mobile vision creates a demand for advanced compact computational imaging systems, which call for the development of both a lightweight optical system and an effective image reconstruction model. Recently, joint design pipelines come to the research forefront, where the two significant components are simultaneously optimized via data-driven learning to realize the optimal system… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: The source code will be made publicly available at https://github.com/wumengshenyou/GSO

  7. arXiv:2404.18114  [pdf, other]

    cs.CV cs.MM

    Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching

    Authors: Haiwen Diao, Ying Zhang, Shang Gao, Xiang Ruan, Huchuan Lu

    Abstract: Image-text matching remains a challenging task due to heterogeneous semantic diversity across modalities and insufficient distance separability within triplets. Different from previous approaches focusing on enhancing multi-modal representations or exploiting cross-modal correspondence for more accurate retrieval, in this paper we aim to leverage the knowledge transfer between peer branches in a b… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 12 pages, 9 figures, Accepted by TIP2024

  8. arXiv:2404.17381  [pdf, other]

    cs.CV

    Frequency-Guided Multi-Level Human Action Anomaly Detection with Normalizing Flows

    Authors: Shun Maeda, Chunzhi Gu, Jun Yu, Shogo Tokai, Shangce Gao, Chao Zhang

    Abstract: We introduce the task of human action anomaly detection (HAAD), which aims to identify anomalous motions in an unsupervised manner given only the pre-determined normal category of training action samples. Compared to prior human-related anomaly detection tasks which primarily focus on unusual events from videos, HAAD involves the learning of specific action labels to recognize semantically anomalo… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

  9. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  10. arXiv:2404.16280  [pdf, ps, other]

    cs.NE cs.AI cs.LG

    An Efficient Reconstructed Differential Evolution Variant by Some of the Current State-of-the-art Strategies for Solving Single Objective Bound Constrained Problems

    Authors: Sichen Tao, Ruihan Zhao, Kaiyu Wang, Shangce Gao

    Abstract: Complex single-objective bounded problems are often difficult to solve. In evolutionary computation methods, since the proposal of differential evolution algorithm in 1997, it has been widely studied and developed due to its simplicity and efficiency. These developments include various adaptive strategies, operator improvements, and the introduction of other search methods. After 2014, research ba… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  11. arXiv:2404.16271  [pdf]

    cs.CR cond-mat.mtrl-sci

    True random number generation using metastable 1T' molybdenum ditelluride

    Authors: Yang Liu, Pengyu Liu, Yingyi Wen, Zihan Liang, Songwei Liu, Lekai Song, Jingfang Pei, Xiaoyue Fan, Teng Ma, Gang Wang, Shuo Gao, Kong-Pang Pun, Xiaolong Chen, Guohua Hu

    Abstract: True random numbers play a critical role in secure cryptography. The generation relies on a stable and readily extractable entropy source. Here, from solution-processed structurally metastable 1T' MoTe2, we prove stable output of featureless, stochastic, and yet stable conductance noise at a broad temperature (down to 15 K) with minimal power consumption (down to 0.05 micro-W). Our characterizatio… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  12. arXiv:2404.16127  [pdf, other]

    cs.LG stat.ML

    Comparison of static and dynamic random forests models for EHR data in the presence of competing risks: predicting central line-associated bloodstream infection

    Authors: Elena Albu, Shan Gao, Pieter Stijnen, Frank Rademakers, Christel Janssens, Veerle Cossey, Yves Debaveye, Laure Wynants, Ben Van Calster

    Abstract: Prognostic outcomes related to hospital admissions typically do not suffer from censoring, and can be modeled either categorically or as time-to-event. Competing events are common but often ignored. We compared the performance of random forest (RF) models to predict the risk of central line-associated bloodstream infections (CLABSI) using different outcome operationalizations. We included data fro… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  13. arXiv:2404.13388  [pdf]

    eess.IV cs.CV cs.LG

    Diagnosis of Multiple Fundus Disorders Amidst a Scarcity of Medical Experts Via Self-supervised Machine Learning

    Authors: Yong Liu, Mengtian Kang, Shuo Gao, Chi Zhang, Ying Liu, Shiming Li, Yue Qi, Arokia Nathan, Wenjun Xu, Chenyu Tang, Edoardo Occhipinti, Mayinuer Yusufu, Ningli Wang, Weiling Bai, Luigi Occhipinti

    Abstract: Fundus diseases are major causes of visual impairment and blindness worldwide, especially in underdeveloped regions, where the shortage of ophthalmologists hinders timely diagnosis. AI-assisted fundus image analysis has several advantages, such as high accuracy, reduced workload, and improved accessibility, but it requires a large amount of expert-annotated data to build reliable models. To addres… ▽ More

    Submitted 23 April, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

  14. arXiv:2404.13386  [pdf]

    eess.IV cs.CV cs.LG

    SSVT: Self-Supervised Vision Transformer For Eye Disease Diagnosis Based On Fundus Images

    Authors: Jiaqi Wang, Mengtian Kang, Yong Liu, Chi Zhang, Ying Liu, Shiming Li, Yue Qi, Wenjun Xu, Chenyu Tang, Edoardo Occhipinti, Mayinuer Yusufu, Ningli Wang, Weiling Bai, Shuo Gao, Luigi G. Occhipinti

    Abstract: Machine learning-based fundus image diagnosis technologies trigger worldwide interest owing to their benefits such as reducing medical resource power and providing objective evaluation results. However, current methods are commonly based on supervised methods, bringing in a heavy workload to biomedical staff and hence suffering in expanding effective databases. To address this issue, in this artic… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: ISBI 2024

  15. arXiv:2404.13262  [pdf, other]

    cs.DC

    An Accurate Beam-Tracking Algorithm with Adaptive Beam Reconstruction via UAV-BSs for Mobile Users

    Authors: Jing Zhang, Sheng Gao, Xin Feng, Hongwei Yang, Geng Sun

    Abstract: Unmanned aerial vehicles (UAVs) with flexible deployment contribute to enlarging the distance of information transmission to mobile users (MUs) in constrained environment. However, due to the high mobility of both UAVs and MUs, it is challenging to establish an accurate beam towards the target MU with high beam gain in real-time. In this study, UAV base stations (UAV-BSs) consisting of position-kn… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

  16. arXiv:2404.08893  [pdf, other]

    cs.LG math.DS q-bio.PE stat.AP

    Early detection of disease outbreaks and non-outbreaks using incidence data

    Authors: Shan Gao, Amit K. Chakraborty, Russell Greiner, Mark A. Lewis, Hao Wang

    Abstract: Forecasting the occurrence and absence of novel disease outbreaks is essential for disease management. Here, we develop a general model, with no real-world training data, that accurately forecasts outbreaks and non-outbreaks. We propose a novel framework, using a feature-based time series classification method to forecast outbreaks and non-outbreaks. We tested our methods on synthetic data from a… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

  17. arXiv:2404.07191  [pdf, other]

    cs.CV

    InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

    Authors: Jiale Xu, Weihao Cheng, Yiming Gao, Xintao Wang, Shenghua Gao, Ying Shan

    Abstract: We present InstantMesh, a feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability. By synergizing the strengths of an off-the-shelf multiview diffusion model and a sparse-view reconstruction model based on the LRM architecture, InstantMesh is able to create diverse 3D assets within 10 seconds. To… ▽ More

    Submitted 14 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: Technical report. Project: https://github.com/TencentARC/InstantMesh

  18. arXiv:2404.06311  [pdf, other]

    cs.IR

    DRE: Generating Recommendation Explanations by Aligning Large Language Models at Data-level

    Authors: Shen Gao, Yifan Wang, Jiabao Fang, Lisi Chen, Peng Han, Shuo Shang

    Abstract: Recommendation systems play a crucial role in various domains, suggesting items based on user behavior. However, the lack of transparency in presenting recommendations can lead to user confusion. In this paper, we introduce Data-level Recommendation Explanation (DRE), a non-intrusive explanation framework for black-box recommendation models. Different from existing methods, DRE does not require any… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 5 pages, 2 figures

  19. arXiv:2404.05569  [pdf, other]

    cs.AI cs.CL cs.MA

    360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System

    Authors: Shen Gao, Hao Li, Zhengliang Shi, Chengrui Huang, Quan Tu, Zhiliang Tian, Minlie Huang, Shuo Shang

    Abstract: Large language model agents have demonstrated remarkable advancements across various complex tasks. Recent works focus on optimizing the agent team or employing self-reflection to iteratively solve complex tasks. Since these agents are all based on the same LLM, only conducting self-evaluation or removing underperforming agents does not substantively enhance the capability of the agents. We argue… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

  20. arXiv:2404.04272  [pdf, other]

    cs.IR cs.CL

    Selecting Query-bag as Pseudo Relevance Feedback for Information-seeking Conversations

    Authors: Xiaoqing Zhang, Xiuying Chen, Shen Gao, Shuqi Li, Xin Gao, Ji-Rong Wen, Rui Yan

    Abstract: Information-seeking dialogue systems are widely used in e-commerce systems, with answers that must be tailored to fit the specific settings of the online system. Given the user query, the information-seeking dialogue systems first retrieve a subset of response candidates, then further select the best response from the candidate set through re-ranking. Current methods mainly retrieve response candi… ▽ More

    Submitted 22 March, 2024; originally announced April 2024.

  21. arXiv:2404.04167  [pdf, other]

    cs.CL cs.AI

    Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

    Authors: Xinrun Du, Zhouliang Yu, Songyang Gao, Ding Pan, Yuyang Cheng, Ziyang Ma, Ruibin Yuan, Xingwei Qu, Jiaheng Liu, Tianyu Zheng, Xinchen Luo, Guorui Zhou, Binhang Yuan, Wenhu Chen, Jie Fu, Ge Zhang

    Abstract: In this study, we introduce CT-LLM, a 2B large language model (LLM) that illustrates a pivotal shift towards prioritizing the Chinese language in developing LLMs. Uniquely initiated from scratch, CT-LLM diverges from the conventional methodology by primarily incorporating Chinese textual data, utilizing an extensive corpus of 1,200 billion tokens, including 800 billion Chinese tokens, 300 billion… ▽ More

    Submitted 9 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

  22. arXiv:2404.02831  [pdf, other]

    cs.AI

    Empowering Biomedical Discovery with AI Agents

    Authors: Shanghua Gao, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, Marinka Zitnik

    Abstract: We envision 'AI scientists' as systems capable of skeptical learning and reasoning that empower biomedical research through collaborative agents that integrate machine learning tools with experimental platforms. Rather than taking humans out of the discovery process, biomedical AI agents combine human creativity and expertise with AI's ability to analyze large datasets, navigate hypothesis spaces,… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

  23. arXiv:2404.01598  [pdf, other]

    cs.LG cs.AI cs.RO

    Extremum-Seeking Action Selection for Accelerating Policy Optimization

    Authors: Ya-Chien Chang, Sicun Gao

    Abstract: Reinforcement learning for control over continuous spaces typically uses high-entropy stochastic policies, such as Gaussian distributions, for local exploration and estimating policy gradient to optimize performance. Many robotic control problems deal with complex unstable dynamics, where applying actions that are off the feasible control manifolds can quickly lead to undesirable divergence. In su… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

  24. arXiv:2404.01204  [pdf, other]

    cs.CL

    The Fine Line: Navigating Large Language Model Pretraining with Down-streaming Capability Analysis

    Authors: Chen Yang, Junzhuo Li, Xinyao Niu, Xinrun Du, Songyang Gao, Haoran Zhang, Zhaoliang Chen, Xingwei Qu, Ruibin Yuan, Yizhi Li, Jiaheng Liu, Stephen W. Huang, Shawn Yue, Wenhu Chen, Jie Fu, Ge Zhang

    Abstract: Uncovering early-stage metrics that reflect final model performance is one core principle for large-scale pretraining. The existing scaling law demonstrates the power-law correlation between pretraining loss and training flops, which serves as an important indicator of the current training state for large language models. However, this principle only focuses on the model's compression properties o… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

  25. arXiv:2404.01184  [pdf, other]

    cs.RO cs.LG

    Efficient Motion Planning for Manipulators with Control Barrier Function-Induced Neural Controller

    Authors: Mingxin Yu, Chenning Yu, M-Mahdi Naddaf-Sh, Devesh Upadhyay, Sicun Gao, Chuchu Fan

    Abstract: Sampling-based motion planning methods for manipulators in crowded environments often suffer from expensive collision checking and high sampling complexity, which make them difficult to use in real time. To address this issue, we propose a new generalizable control barrier function (CBF)-based steering controller to reduce the number of samples needed in a sampling-based motion planner RRT. Our me… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE International Conference on Robotics and Automation (ICRA2024)

  26. arXiv:2404.00443  [pdf, ps, other]

    cs.RO

    UDE-based Dynamic Motion Force Control of Mobile Manipulators

    Authors: Songqun Gao, Wendi Ding, Qinyuan Ren, Ben M. Chen

    Abstract: Mobile manipulators are known for their superior mobility over manipulators on fixed bases, offering promising applications in smart industry and housekeeping scenarios. However, the dynamic coupling nature between the mobile base and the manipulator presents challenges for the physical interactive tasks of the mobile manipulator. Current methods suffer from complex modeling processes and poor tra… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

  27. arXiv:2403.19490  [pdf, other]

    cs.CV

    Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment

    Authors: Alireza Ganjdanesh, Shangqian Gao, Heng Huang

    Abstract: Structural model pruning is a prominent approach used for reducing the computational cost of Convolutional Neural Networks (CNNs) before their deployment on resource-constrained devices. Yet, the majority of proposed ideas require a pretrained model before pruning, which is costly to secure. In this paper, we propose a novel structural pruning approach to jointly learn the weights and structurally… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024

  28. arXiv:2403.19425  [pdf, ps, other]

    eess.IV cs.CV

    A Robust Ensemble Algorithm for Ischemic Stroke Lesion Segmentation: Generalizability and Clinical Utility Beyond the ISLES Challenge

    Authors: Ezequiel de la Rosa, Mauricio Reyes, Sook-Lei Liew, Alexandre Hutton, Roland Wiest, Johannes Kaesmacher, Uta Hanning, Arsany Hakim, Richard Zubal, Waldo Valenzuela, David Robben, Diana M. Sima, Vincenzo Anania, Arne Brys, James A. Meakin, Anne Mickan, Gabriel Broocks, Christian Heitkamp, Shengbo Gao, Kongming Liang, Ziji Zhang, Md Mahfuzur Rahman Siddiquee, Andriy Myronenko, Pooya Ashtari, Sabine Van Huffel, et al. (33 additional authors not shown)

    Abstract: Diffusion-weighted MRI (DWI) is essential for stroke diagnosis, treatment decisions, and prognosis. However, image and disease variability hinder the development of generalizable AI algorithms with clinical value. We address this gap by presenting a novel ensemble algorithm derived from the 2022 Ischemic Stroke Lesion Segmentation (ISLES) challenge. ISLES'22 provided 400 patient scans with ischemi… ▽ More

    Submitted 3 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  29. arXiv:2403.19096  [pdf, other]

    cs.SE cs.CR

    SCALE: Constructing Structured Natural Language Comment Trees for Software Vulnerability Detection

    Authors: Xin-Cheng Wen, Cuiyun Gao, Shuzheng Gao, Yang Xiao, Michael R. Lyu

    Abstract: Recently, there has been a growing interest in automatic software vulnerability detection. Pre-trained model-based approaches have demonstrated superior performance than other Deep Learning (DL)-based approaches in detecting vulnerabilities. However, the existing pre-trained model-based approaches generally employ code sequences as input during prediction, and may ignore vulnerability-related stru… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by ISSTA 2024

  30. arXiv:2403.17888  [pdf, other]

    cs.CV cs.GR

    2D Gaussian Splatting for Geometrically Accurate Radiance Fields

    Authors: Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, Shenghua Gao

    Abstract: 3D Gaussian Splatting (3DGS) has recently revolutionized radiance field reconstruction, achieving high quality novel view synthesis and fast rendering speed without baking. However, 3DGS fails to accurately represent surfaces due to the multi-view inconsistent nature of 3D Gaussians. We present 2D Gaussian Splatting (2DGS), a novel approach to model and reconstruct geometrically accurate radiance… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 12 pages, 12 figures

  31. arXiv:2403.16233  [pdf, other]

    cs.LG q-bio.PE stat.AP

    An early warning indicator trained on stochastic disease-spreading models with different noises

    Authors: Amit K. Chakraborty, Shan Gao, Reza Miry, Pouria Ramazi, Russell Greiner, Mark A. Lewis, Hao Wang

    Abstract: The timely detection of disease outbreaks through reliable early warning signals (EWSs) is indispensable for effective public health mitigation strategies. Nevertheless, the intricate dynamics of real-world disease spread, often influenced by diverse sources of noise and limited data in the early stages of outbreaks, pose a significant challenge in developing reliable EWSs, as the performance of e… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

  32. arXiv:2403.15676  [pdf, other]

    cs.SE cs.CL cs.CR

    AC4: Algebraic Computation Checker for Circuit Constraints in ZKPs

    Authors: Hao Chen, Minyu Chen, Ruibang Liu, Guoqiang Li, Sinka Gao

    Abstract: ZKP systems have drawn surging attention and hold a fundamental role in contemporary cryptography. Zk-SNARK protocols dominate ZKP usage and are often implemented through the arithmetic circuit programming paradigm. However, underconstrained or overconstrained circuits may lead to bugs. Underconstrained circuits refer to circuits that lack the necessary constraints, resulting in unexpected solutions in the circ… ▽ More

    Submitted 7 May, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: 20 pages, 4 figures

  33. arXiv:2403.14729  [pdf, other]

    cs.CV cs.LG

    Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch

    Authors: Xidong Wu, Shangqian Gao, Zeyu Zhang, Zhenzhen Li, Runxue Bao, Yanfu Zhang, Xiaoqian Wang, Heng Huang

    Abstract: Current techniques for deep neural network (DNN) pruning often involve intricate multi-step processes that require domain-specific expertise, making their widespread adoption challenging. To address the limitation, the Only-Train-Once (OTO) and OTOv2 are proposed to eliminate the need for additional fine-tuning steps by directly training and compressing a general DNN from scratch. Nevertheless, th… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

  34. arXiv:2403.12445  [pdf, other]

    cs.CV

    Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory

    Authors: Sensen Gao, Xiaojun Jia, Xuhong Ren, Ivor Tsang, Qing Guo

    Abstract: Vision-language pre-training (VLP) models exhibit remarkable capabilities in comprehending both images and text, yet they remain susceptible to multimodal adversarial examples (AEs). Strengthening adversarial attacks and uncovering vulnerabilities, especially common issues in VLP models (e.g., high transferable AEs), can stimulate further research on constructing reliable and practical VLP models.… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

  35. arXiv:2403.12368  [pdf, other]

    cs.CL cs.AI

    Characteristic AI Agents via Large Language Models

    Authors: Xi Wang, Hongliang Dai, Shen Gao, Piji Li

    Abstract: The advancement of Large Language Models (LLMs) has led to significant enhancements in the performance of chatbot systems. Many researchers have dedicated their efforts to the development of bringing characteristics to chatbots. While there have been commercial products for developing role-driven chatbots using LLMs, it is worth noting that academic research in this area remains relatively scarce.… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: COLING 2024. The benchmark is available at: https://github.com/nuaa-nlp/Character100

  36. arXiv:2403.12171  [pdf, other]

    cs.CL cs.AI

    EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models

    Authors: Weikang Zhou, Xiao Wang, Limao Xiong, Han Xia, Yingshuang Gu, Mingxu Chai, Fukang Zhu, Caishuang Huang, Shihan Dou, Zhiheng Xi, Rui Zheng, Songyang Gao, Yicheng Zou, Hang Yan, Yifan Le, Ruohui Wang, Lijun Li, Jing Shao, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: Jailbreak attacks are crucial for identifying and mitigating the security vulnerabilities of Large Language Models (LLMs). They are designed to bypass safeguards and elicit prohibited outputs. However, due to significant differences among various jailbreak methods, there is no standard implementation framework available for the community, which limits comprehensive security evaluations. This paper… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  37. arXiv:2403.11453  [pdf, other]

    cs.GR cs.CV

    Bridging 3D Gaussian and Mesh for Freeview Video Rendering

    Authors: Yuting Xiao, Xuan Wang, Jiafei Li, Hongrui Cai, Yanbo Fan, Nan Xue, Minghui Yang, Yujun Shen, Shenghua Gao

    Abstract: This is only a preview version of GauMesh. Recently, primitive-based rendering has been proven to achieve convincing results in solving the problem of modeling and rendering the 3D dynamic scene from 2D images. Despite this, in the context of novel view synthesis, each type of primitive has its inherent defects in terms of representation ability. It is difficult to exploit the mesh to depict the f… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 7 pages

  38. arXiv:2403.10883  [pdf, other]

    cs.CV cs.CR cs.MM

    Improving Adversarial Transferability of Visual-Language Pre-training Models through Collaborative Multimodal Interaction

    Authors: Jiyuan Fu, Zhaoyu Chen, Kaixun Jiang, Haijing Guo, Jiafeng Wang, Shuyong Gao, Wenqiang Zhang

    Abstract: Despite the substantial advancements in Vision-Language Pre-training (VLP) models, their susceptibility to adversarial attacks poses a significant challenge. Existing work rarely studies the transferability of attacks on VLP models, resulting in a substantial performance gap from white-box attacks. We observe that prior work overlooks the interaction mechanisms between modalities, which plays a cr… ▽ More

    Submitted 16 March, 2024; originally announced March 2024.

  39. arXiv:2403.10830  [pdf, other]

    cs.CV

    View-Centric Multi-Object Tracking with Homographic Matching in Moving UAV

    Authors: Deyi Ji, Siqi Gao, Lanyun Zhu, Yiru Zhao, Peng Xu, Hongtao Lu, Feng Zhao

    Abstract: In this paper, we address the challenge of multi-object tracking (MOT) in moving Unmanned Aerial Vehicle (UAV) scenarios, where irregular flight trajectories, such as hovering, turning left/right, and moving up/down, lead to significantly greater complexity compared to fixed-camera MOT. Specifically, changes in the scene background not only render traditional frame-to-frame object IOU association… ▽ More

    Submitted 16 March, 2024; originally announced March 2024.

  40. arXiv:2403.10828  [pdf, other]

    cs.CR

    Data Availability and Decentralization: New Techniques for zk-Rollups in Layer 2 Blockchain Networks

    Authors: Chengpeng Huang, Rui Song, Shang Gao, Yu Guo, Bin Xiao

    Abstract: The scalability limitations of public blockchains have hindered their widespread adoption in real-world applications. While the Ethereum community is pushing forward in zk-rollup (zero-knowledge rollup) solutions, such as introducing the ``blob transaction'' in EIP-4844, Layer 2 networks encounter a data availability problem: storing transactions completely off-chain poses a risk of data loss, par… ▽ More

    Submitted 16 March, 2024; originally announced March 2024.

  41. arXiv:2403.10012  [pdf, other]

    cs.CV cs.RO eess.IV physics.optics

    Real-World Computational Aberration Correction via Quantized Domain-Mixing Representation

    Authors: Qi Jiang, Zhonghua Yi, Shaohua Gao, Yao Gao, Xiaolong Qian, Hao Shi, Lei Sun, Zhijie Xu, Kailun Yang, Kaiwei Wang

    Abstract: Relying on paired synthetic data, existing learning-based Computational Aberration Correction (CAC) methods are confronted with the intricate and multifaceted synthetic-to-real domain gap, which leads to suboptimal performance in real-world applications. In this paper, in contrast to improving the simulation pipeline, we deliver a novel insight into real-world CAC from the perspective of Unsupervi…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Code and datasets will be made publicly available at https://github.com/zju-jiangqi/QDMR

  42. arXiv:2403.09630  [pdf, other]

    cs.CV

    Generalized Predictive Model for Autonomous Driving

    Authors: Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, Jun Zhang, Andreas Geiger, Yu Qiao, Hongyang Li

    Abstract: In this paper, we introduce the first large-scale video prediction model in the autonomous driving discipline. To eliminate the restriction of high-cost data collection and empower the generalization ability of our model, we acquire massive data from the web and pair it with diverse and high-quality text descriptions. The resultant dataset accumulates over 2000 hours of driving videos, spanning ar…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  43. arXiv:2403.08505  [pdf, other]

    eess.IV cs.AI cs.CV cs.MM

    Content-aware Masked Image Modeling Transformer for Stereo Image Compression

    Authors: Xinjie Zhang, Shenyuan Gao, Zhening Liu, Jiawei Shao, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Jun Zhang

    Abstract: Existing learning-based stereo image codecs adopt sophisticated transformation with simple entropy models derived from single image codecs to encode latent representations. However, those entropy models struggle to effectively capture the spatial-disparity characteristics inherent in stereo images, which leads to suboptimal rate-distortion results. In this paper, we propose a stereo image compressi…

    Submitted 19 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  44. arXiv:2403.06130  [pdf, other]

    cs.CV

    ClickVOS: Click Video Object Segmentation

    Authors: Pinxue Guo, Lingyi Hong, Xinyu Zhou, Shuyong Gao, Wanyun Li, Jinglun Li, Zhaoyu Chen, Xiaoqiang Li, Wei Zhang, Wenqiang Zhang

    Abstract: The Video Object Segmentation (VOS) task aims to segment objects in videos. However, previous settings either require time-consuming manual masks of target objects at the first frame during inference or lack the flexibility to specify arbitrary objects of interest. To address these limitations, we propose the setting named Click Video Object Segmentation (ClickVOS) which segments objects of interest a…

    Submitted 10 March, 2024; originally announced March 2024.

  45. arXiv:2403.05834  [pdf, other]

    cs.MM cs.SD eess.AS

    Enhancing Expressiveness in Dance Generation via Integrating Frequency and Music Style Information

    Authors: Qiaochu Huang, Xu He, Boshi Tang, Haolin Zhuang, Liyang Chen, Shuochen Gao, Zhiyong Wu, Haozhi Huang, Helen Meng

    Abstract: Dance generation, as a branch of human motion generation, has attracted increasing attention. Recently, a few works attempt to enhance dance expressiveness, which includes genre matching, beat alignment, and dance dynamics, from certain aspects. However, the enhancement is quite limited as they lack comprehensive consideration of the aforementioned three factors. In this paper, we propose Expressi…

    Submitted 9 March, 2024; originally announced March 2024.

  46. arXiv:2403.05477  [pdf, other]

    cs.RO

    Take Your Best Shot: Sampling-Based Next-Best-View Planning for Autonomous Photography & Inspection

    Authors: Shijie Gao, Lauren Bramblett, Nicola Bezzo

    Abstract: Autonomous mobile robots (AMRs) equipped with high-quality cameras have revolutionized the field of inspections by providing efficient and cost-effective means of conducting surveys. The use of autonomous inspection is becoming more widespread in a variety of contexts, yet it is still challenging to acquire the best inspection information autonomously. In situations where objects may block a robot…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: For code and videos, see https://www.bezzorobotics.com/sg-lb-iros24

  47. arXiv:2403.05004  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Can't Remember Details in Long Documents? You Need Some R&R

    Authors: Devanshu Agrawal, Shang Gao, Martin Gajek

    Abstract: Long-context large language models (LLMs) hold promise for tasks such as question-answering (QA) over long documents, but they tend to miss important information in the middle of context documents (arXiv:2307.03172v3). Here, we introduce $\textit{R&R}$ -- a combination of two novel prompt-based methods called $\textit{reprompting}$ and $\textit{in-context retrieval}$ (ICR) -- to alleviate this eff…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 13 pages, 1 figure, 9 tables. For associated code repository see https://github.com/casetext/r-and-r

  48. arXiv:2403.03424  [pdf, other]

    cs.IR

    Generative News Recommendation

    Authors: Shen Gao, Jiabao Fang, Quan Tu, Zhitao Yao, Zhumin Chen, Pengjie Ren, Zhaochun Ren

    Abstract: Most existing news recommendation methods tackle this task by conducting semantic matching between candidate news and user representation produced by historical clicked news. However, they overlook the high-level connections among different news articles and also ignore the profound relationship between these news articles and users. And the definition of these methods dictates that they can only…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted by WWW 2024

  49. arXiv:2403.03031  [pdf, other]

    cs.CL

    Learning to Use Tools via Cooperative and Interactive Agents

    Authors: Zhengliang Shi, Shen Gao, Xiuyi Chen, Lingyong Yan, Haibo Shi, Dawei Yin, Zhumin Chen, Pengjie Ren, Suzan Verberne, Zhaochun Ren

    Abstract: Tool learning empowers large language models (LLMs) as agents to use external tools to extend their capability. Existing methods employ one single LLM-based agent to iteratively select and execute tools, thereafter incorporating the result into the next action prediction. However, they still suffer from potential performance degradation when addressing complex tasks due to: (1) the limitation of t…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 20 pages

  50. arXiv:2403.01548  [pdf, other]

    cs.CL cs.AI cs.LG

    In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation

    Authors: Shiqi Chen, Miao Xiong, Junteng Liu, Zhengxuan Wu, Teng Xiao, Siyang Gao, Junxian He

    Abstract: Large language models (LLMs) frequently hallucinate and produce factual errors, yet our understanding of why they make these errors remains limited. In this study, we delve into the underlying mechanisms of LLM hallucinations from the perspective of inner representations, and discover a salient pattern associated with hallucinations: correct generations tend to have sharper context activations in…

    Submitted 12 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    Comments: Code repository is available at https://github.com/hkust-nlp/Activation_decoding.git