Showing 1–50 of 3,684 results for author: Liu, X

Searching in archive cs.
  1. arXiv:2405.05808  [pdf, other]

    cs.CV

    Fast and Controllable Post-training Sparsity: Learning Optimal Sparsity Allocation with Global Constraint in Minutes

    Authors: Ruihao Gong, Yang Yong, Zining Wang, Jinyang Guo, Xiuying Wei, Yuqing Ma, Xianglong Liu

    Abstract: Neural network sparsity has attracted much research interest due to its similarity to biological schemes and high energy efficiency. However, existing methods depend on long-time training or fine-tuning, which prevents large-scale applications. Recently, some works focusing on post-training sparsity (PTS) have emerged. They get rid of the high training cost but usually suffer from distinct accura…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.05553  [pdf, other]

    cs.CV cs.AI

    Towards Robust Physical-world Backdoor Attacks on Lane Detection

    Authors: Xinwei Zhang, Aishan Liu, Tianyuan Zhang, Siyuan Liang, Xianglong Liu

    Abstract: Deep learning-based lane detection (LD) plays a critical role in autonomous driving systems, such as adaptive cruise control. However, it is vulnerable to backdoor attacks. Existing backdoor attack methods on LD exhibit limited effectiveness in dynamic real-world scenarios, primarily because they fail to consider dynamic scene factors, including changes in driving perspectives (e.g., viewpoint tra…

    Submitted 9 May, 2024; originally announced May 2024.

  3. arXiv:2405.04932  [pdf, other]

    cs.NI

    FIGRET: Fine-Grained Robustness-Enhanced Traffic Engineering

    Authors: Ximeng Liu, Shizhen Zhao, Yong Cui

    Abstract: Traffic Engineering (TE) is critical for improving network performance and reliability. A key challenge in TE is the management of sudden traffic bursts. Existing TE schemes often struggle to accurately determine the extent of focus required for these surges, thereby facing difficulties in achieving a balance between performance under normal and peak traffic conditions. To address this issue, we i…

    Submitted 8 May, 2024; originally announced May 2024.

  4. arXiv:2405.04520  [pdf, other]

    cs.CL cs.LG cs.SE

    NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts

    Authors: Shudan Zhang, Hanlin Zhao, Xiao Liu, Qinkai Zheng, Zehan Qi, Xiaotao Gu, Xiaohan Zhang, Yuxiao Dong, Jie Tang

    Abstract: Large language models (LLMs) have manifested strong ability to generate code for productive activities. However, current benchmarks for code synthesis, such as HumanEval, MBPP, and DS-1000, are predominantly oriented towards introductory tasks on algorithm and data science, insufficiently satisfying challenging requirements prevalent in real-world coding. To fill this gap, we propose NaturalCodeB…

    Submitted 7 May, 2024; originally announced May 2024.

  5. arXiv:2405.04496  [pdf, other]

    cs.CV

    Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing

    Authors: Yi Zuo, Lingling Li, Licheng Jiao, Fang Liu, Xu Liu, Wenping Ma, Shuyuan Yang, Yuwei Guo

    Abstract: Existing diffusion-based video editing methods have achieved impressive results in motion editing. Most of the existing methods focus on the motion alignment between the edited video and the reference video. However, these methods do not constrain the background and object content of the video to remain unchanged, which makes it possible for users to generate unexpected videos. In this paper, we p…

    Submitted 7 May, 2024; originally announced May 2024.

  6. arXiv:2405.04404  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Vision Mamba: A Comprehensive Survey and Taxonomy

    Authors: Xiao Liu, Chenxu Zhang, Lei Zhang

    Abstract: State Space Model (SSM) is a mathematical model used to describe and analyze the behavior of dynamic systems. This model has witnessed numerous applications in several fields, including control theory, signal processing, economics and machine learning. In the field of deep learning, state space models are used to process sequence data, such as time series analysis, natural language processing (NLP…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: https://github.com/lx6c78/Vision-Mamba-A-Comprehensive-Survey-and-Taxonomy

  7. arXiv:2405.04286  [pdf, other]

    cs.CL

    Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore

    Authors: Junchao Wu, Runzhe Zhan, Derek F. Wong, Shu Yang, Xuebo Liu, Lidia S. Chao, Min Zhang

    Abstract: The efficacy of a large language model (LLM) generated text detector depends substantially on the availability of sizable training data. White-box zero-shot detectors, which require no such data, are nonetheless limited by the accessibility of the source model of the LLM-generated text. In this paper, we propose a simple but effective black-box zero-shot detection approach, predicated on the obs…

    Submitted 7 May, 2024; originally announced May 2024.

  8. arXiv:2405.04064  [pdf, other]

    cs.AI

    MFA-Net: Multi-Scale feature fusion attention network for liver tumor segmentation

    Authors: Yanli Yuan, Bingbing Wang, Chuan Zhang, Jingyi Xu, Ximeng Liu, Liehuang Zhu

    Abstract: Segmentation of organs of interest in medical CT images is beneficial for diagnosis of diseases. Though recent methods based on Fully Convolutional Neural Networks (F-CNNs) have shown success in many segmentation tasks, fusing features from images with different scales is still a challenge: (1) Due to the lack of spatial awareness, F-CNNs share the same weights at different spatial locations. (2)…

    Submitted 9 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: Paper accepted in Human-Centric Representation Learning workshop at AAAI 2024

  9. arXiv:2405.03806  [pdf, other]

    cs.HC

    In Situ AI Prototyping: Infusing Multimodal Prompts into Mobile Settings with MobileMaker

    Authors: Savvas Petridis, Michael Xieyang Liu, Alexander J. Fiannaca, Vivian Tsai, Michael Terry, Carrie J. Cai

    Abstract: Recent advances in multimodal large language models (LLMs) have lowered the barriers to rapidly prototyping AI-powered features via prompting, especially for mobile-intended use cases. Despite the value of situated user feedback, the process of soliciting early, mobile-situated user feedback on AI prototypes remains challenging. The broad scope and flexibility of LLMs mean that, for a given use-c…

    Submitted 6 May, 2024; originally announced May 2024.

  10. arXiv:2405.03534  [pdf, other]

    cs.RO cs.AI cs.LG cs.NE

    Meta-Evolve: Continuous Robot Evolution for One-to-many Policy Transfer

    Authors: Xingyu Liu, Deepak Pathak, Ding Zhao

    Abstract: We investigate the problem of transferring an expert policy from a source robot to multiple different robots. To solve this problem, we propose a method named $Meta$-$Evolve$ that uses continuous robot evolution to efficiently transfer the policy to each target robot through a set of tree-structured evolutionary robot sequences. The robot evolution tree allows the robot evolution paths to be share…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: ICLR 2024

  11. arXiv:2405.03333  [pdf, other]

    cs.CV

    Light-VQA+: A Video Quality Assessment Model for Exposure Correction with Vision-Language Guidance

    Authors: Xunchu Zhou, Xiaohong Liu, Yunlong Dong, Tengchuan Kou, Yixuan Gao, Zicheng Zhang, Chunyi Li, Haoning Wu, Guangtao Zhai

    Abstract: Recently, User-Generated Content (UGC) videos have gained popularity in our daily lives. However, UGC videos often suffer from poor exposure due to the limitations of photographic equipment and techniques. Therefore, Video Exposure Correction (VEC) algorithms have been proposed, including Low-Light Video Enhancement (LLVE) and Over-Exposed Video Recovery (OEVR). Equally important to the VEC is the…

    Submitted 6 May, 2024; originally announced May 2024.

  12. arXiv:2405.03144  [pdf, other]

    cs.CV cs.LG

    PTQ4SAM: Post-Training Quantization for Segment Anything

    Authors: Chengtao Lv, Hong Chen, Jinyang Guo, Yifu Ding, Xianglong Liu

    Abstract: Segment Anything Model (SAM) has achieved impressive performance in many computer vision tasks. However, as a large-scale model, the immense memory and computation costs hinder its practical deployment. In this paper, we propose a post-training quantization (PTQ) framework for Segment Anything Model, namely PTQ4SAM. First, we investigate the inherent bottleneck of SAM quantization attributed to th…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  13. Enhancing Micro Gesture Recognition for Emotion Understanding via Context-aware Visual-Text Contrastive Learning

    Authors: Deng Li, Bohao Xing, Xin Liu

    Abstract: Psychological studies have shown that Micro Gestures (MG) are closely linked to human emotions. MG-based emotion understanding has attracted much attention because it allows for emotion understanding through nonverbal body gestures without relying on identity information (e.g., facial and electrocardiogram data). Therefore, it is essential to recognize MG effectively for advanced emotion understan…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: accepted by IEEE Signal Processing Letters

    Journal ref: IEEE Signal Processing Letters (2024)

  14. GroupedMixer: An Entropy Model with Group-wise Token-Mixers for Learned Image Compression

    Authors: Daxin Li, Yuanchao Bai, Kai Wang, Junjun Jiang, Xianming Liu, Wen Gao

    Abstract: Transformer-based entropy models have gained prominence in recent years due to their superior ability to capture long-range dependencies in probability distribution estimation compared to convolution-based methods. However, previous transformer-based entropy models suffer from a sluggish coding process due to pixel-wise autoregression or duplicated computation during inference. In this paper, we p…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE TCSVT

  15. arXiv:2405.00998  [pdf, other]

    cs.CV

    Part-aware Shape Generation with Latent 3D Diffusion of Neural Voxel Fields

    Authors: Yuhang Huang, SHilong Zou, Xinwang Liu, Kai Xu

    Abstract: This paper presents a novel latent 3D diffusion model for the generation of neural voxel fields, aiming to achieve accurate part-aware structures. Compared to existing methods, there are two key designs to ensure high-quality and accurate part-aware generation. On one hand, we introduce a latent 3D diffusion process for neural voxel fields, enabling generation at significantly higher resolutions t…

    Submitted 9 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  16. arXiv:2405.00957  [pdf, other]

    cs.LG cs.AI cs.SI

    IntraMix: Intra-Class Mixup Generation for Accurate Labels and Neighbors

    Authors: Shenghe Zheng, Hongzhi Wang, Xianglong Liu

    Abstract: Graph Neural Networks (GNNs) demonstrate excellent performance on graphs, with their core idea being to aggregate neighborhood information and learn from labels. However, most graph datasets face two prevailing challenges, Insufficient High-Quality Labels and Lack of Neighborhoods, resulting in weak GNNs. Existing data augmentation methods designed to address these two issues often t…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 18 pages

  17. arXiv:2405.00739  [pdf, other]

    cs.LG cs.CV eess.IV

    Why does Knowledge Distillation Work? Rethink its Attention and Fidelity Mechanism

    Authors: Chenqi Guo, Shiwei Zhong, Xiaofeng Liu, Qianli Feng, Yinglong Ma

    Abstract: Does Knowledge Distillation (KD) really work? Conventional wisdom viewed it as a knowledge transfer procedure where a perfect mimicry of the student to its teacher is desired. However, paradoxical studies indicate that closely replicating the teacher's behavior does not consistently improve student generalization, posing questions on its possible causes. Confronted with this gap, we hypothesize th…

    Submitted 29 April, 2024; originally announced May 2024.

  18. arXiv:2405.00574  [pdf, other]

    cs.CV cs.MM

    EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model

    Authors: Deng Li, Xin Liu, Bohao Xing, Baiqiang Xia, Yuan Zong, Bihan Wen, Heikki Kälviäinen

    Abstract: Emotion AI is the ability of computers to understand human emotional states. Existing works have achieved promising progress, but two limitations remain to be solved: 1) Previous studies have been more focused on short sequential video emotion analysis while overlooking long sequential video. However, the emotions in short sequential videos only reflect instantaneous emotions, which may be deliber…

    Submitted 1 May, 2024; originally announced May 2024.

  19. arXiv:2405.00527  [pdf, other]

    cs.DB

    ChatBI: Towards Natural Language to Complex Business Intelligence SQL

    Authors: Jinqing Lian, Xinyi Liu, Yingxia Shao, Yang Dong, Ming Wang, Zhang Wei, Tianqi Wan, Ming Dong, Hailin Yan

    Abstract: The Natural Language to SQL (NL2SQL) technology provides non-expert users who are unfamiliar with databases the opportunity to use SQL for data analysis. Converting Natural Language to Business Intelligence (NL2BI) is a popular practical scenario for NL2SQL in actual production systems. Compared to NL2SQL, NL2BI introduces more challenges. In this paper, we propose ChatBI, a comprehensive and eff…

    Submitted 1 May, 2024; originally announced May 2024.

  20. arXiv:2405.00301  [pdf, other]

    cs.CL

    LITO: Learnable Intervention for Truthfulness Optimization

    Authors: Farima Fatahi Bayat, Xin Liu, H. V. Jagadish, Lu Wang

    Abstract: Large language models (LLMs) can generate long-form and coherent text, but they still frequently hallucinate facts, thus limiting their reliability. To address this issue, inference-time methods that elicit truthful responses have been proposed by shifting LLM representations towards learned "truthful directions". However, applying the truthful directions with the same intensity fails to generaliz…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures

  21. arXiv:2405.00295  [pdf, other]

    cs.GT

    Proof of Sampling: A Nash Equilibrium-Secured Verification Protocol for Decentralized Systems

    Authors: Yue Zhang, Shouqiao Wang, Xiaoyuan Liu, Sijun Tan, Raluca Ada Popa, Ciamac C. Moallemi

    Abstract: This paper presents a secure and versatile sampling-based verification protocol, Proof of Sampling (PoSP) protocol, suitable for a wide range of decentralized applications. Our protocol has a pure strategy Nash Equilibrium, which compels rational participants to act honestly, thus fortifying the network's integrity. This design effectively eliminates the possibility of free-riding, achieving this…

    Submitted 30 April, 2024; originally announced May 2024.

  22. arXiv:2404.19335  [pdf, other]

    cs.CL

    StablePT: Towards Stable Prompting for Few-shot Learning via Input Separation

    Authors: Xiaoming Liu, Chen Liu, Zhaohan Zhang, Chengzhengxu Li, Longtian Wang, Yu Lan, Chao Shen

    Abstract: Large language models have shown their ability to become effective few-shot learners with prompting, revolutionizing the paradigm of learning with data scarcity. However, this approach largely depends on the quality of prompt initialization, and always exhibits large variability among different runs. This property makes prompt tuning highly unreliable and vulnerable to poorly constructed prompts, which…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Submitted to ACL 2024

  23. arXiv:2404.19267  [pdf]

    cs.DL math.NA

    Study on the Temporal Evolution of Literature Bradford Curves in the Context of Library Specialization

    Authors: Haobai Xue, Xian Liu

    Abstract: Bradford's law of bibliographic scattering is a fundamental law in bibliometrics and can provide valuable guidance to academic libraries in literature search and procurement. However, Bradford curves can take various shapes at different time points and there is still a lack of causal explanation for it, so the prediction of their shape is still an open question. This paper attributes the d…

    Submitted 30 April, 2024; originally announced April 2024.

  24. arXiv:2404.18961  [pdf, other]

    cs.LG cs.AI cs.CV

    Unleashing the Power of Multi-Task Learning: A Comprehensive Survey Spanning Traditional, Deep, and Pretrained Foundation Model Eras

    Authors: Jun Yu, Yutong Dai, Xiaokang Liu, Jin Huang, Yishan Shen, Ke Zhang, Rong Zhou, Eashan Adhikarla, Wenxuan Ye, Yixin Liu, Zhaoming Kong, Kai Zhang, Yilong Yin, Vinod Namboodiri, Brian D. Davison, Jason H. Moore, Yong Chen

    Abstract: MTL is a learning paradigm that effectively leverages both task-specific and shared information to address multiple related tasks simultaneously. In contrast to STL, MTL offers a suite of benefits that enhance both the training process and the inference efficiency. MTL's key advantages encompass streamlined model architecture, performance enhancement, and cross-domain generalizability. Over the pa…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 60 figures, 116 pages, 500+ references

  25. arXiv:2404.18955  [pdf, other]

    cs.NE cs.AI

    GARA: A novel approach to Improve Genetic Algorithms' Accuracy and Efficiency by Utilizing Relationships among Genes

    Authors: Zhaoning Shi, Meng Xiang, Zhaoyang Hai, Xiabi Liu, Yan Pei

    Abstract: Genetic algorithms have played an important role in engineering optimization. Traditional GAs treat each gene separately. However, biophysical studies of gene regulatory networks revealed direct associations between different genes. This inspires us to propose an improvement to GA in this paper, Gene Regulatory Genetic Algorithm (GRGA), which, to our best knowledge, is the first time to utilize rela…

    Submitted 28 April, 2024; originally announced April 2024.

  26. arXiv:2404.18413  [pdf, other]

    cs.CV cs.AI

    3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset

    Authors: Xinyu Ma, Xuebo Liu, Derek F. Wong, Jun Rao, Bei Li, Liang Ding, Lidia S. Chao, Dacheng Tao, Min Zhang

    Abstract: Multimodal machine translation (MMT) is a challenging task that seeks to improve translation quality by incorporating visual information. However, recent studies have indicated that the visual information provided by existing MMT datasets is insufficient, causing models to disregard it and overestimate their capabilities. This issue presents a significant obstacle to the development of MMT researc…

    Submitted 29 April, 2024; originally announced April 2024.

  27. arXiv:2404.18392  [pdf, other]

    cs.DC

    Dflow, a Python framework for constructing cloud-native AI-for-Science workflows

    Authors: Xinzijian Liu, Yanbo Han, Zhuoyuan Li, Jiahao Fan, Chengqian Zhang, Jinzhe Zeng, Yifan Shan, Yannan Yuan, Wei-Hong Xu, Yun-Pei Liu, Yuzhi Zhang, Tongqi Wen, Darrin M. York, Zhicheng Zhong, Hang Zheng, Jun Cheng, Linfeng Zhang, Han Wang

    Abstract: In the AI-for-science era, scientific computing scenarios such as concurrent learning and high-throughput computing demand a new generation of infrastructure that supports scalable computing resources and automated workflow management on both cloud and high-performance supercomputers. Here we introduce Dflow, an open-source Python toolkit designed for scientists to construct workflows with simple…

    Submitted 28 April, 2024; originally announced April 2024.

  28. arXiv:2404.18343  [pdf, other]

    cs.MM cs.CV

    G-Refine: A General Quality Refiner for Text-to-Image Generation

    Authors: Chunyi Li, Haoning Wu, Hongkun Hao, Zicheng Zhang, Tengchaun Kou, Chaofeng Chen, Lei Bai, Xiaohong Liu, Weisi Lin, Guangtao Zhai

    Abstract: With the evolution of Text-to-Image (T2I) models, the quality defects of AI-Generated Images (AIGIs) pose a significant barrier to their widespread adoption. In terms of both perception and alignment, existing models cannot always guarantee high-quality results. To mitigate this limitation, we introduce G-Refine, a general image quality refiner designed to enhance low-quality images without compro…

    Submitted 28 April, 2024; originally announced April 2024.

  29. arXiv:2404.18203  [pdf, other]

    cs.CV cs.AI

    LMM-PCQA: Assisting Point Cloud Quality Assessment with LMM

    Authors: Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Wei Sun, Chaofeng Chen, Xiongkuo Min, Xiaohong Liu, Weisi Lin, Guangtao Zhai

    Abstract: Although large multi-modality models (LMMs) have seen extensive exploration and application in various quality assessment studies, their integration into Point Cloud Quality Assessment (PCQA) remains unexplored. Given LMMs' exceptional performance and robustness in low-level vision and quality assessment tasks, this study aims to investigate the feasibility of imparting PCQA knowledge to LMMs thro…

    Submitted 28 April, 2024; originally announced April 2024.

  30. arXiv:2404.18081  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ComposerX: Multi-Agent Symbolic Music Composition with LLMs

    Authors: Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

    Abstract: Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM subjects, current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and C…

    Submitted 30 April, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  31. arXiv:2404.17847  [pdf, other]

    cs.LG

    pFedAFM: Adaptive Feature Mixture for Batch-Level Personalization in Heterogeneous Federated Learning

    Authors: Liping Yi, Han Yu, Chao Ren, Heng Zhang, Gang Wang, Xiaoguang Liu, Xiaoxiao Li

    Abstract: Model-heterogeneous personalized federated learning (MHPFL) enables FL clients to train structurally different personalized models on non-independent and identically distributed (non-IID) local data. Existing MHPFL methods focus on achieving client-level personalization, but cannot address batch-level data heterogeneity. To bridge this important gap, we propose a model-heterogeneous personalized F…

    Submitted 27 April, 2024; originally announced April 2024.

  32. arXiv:2404.17821  [pdf]

    cs.SD cs.MM eess.AS

    An automatic mixing speech enhancement system for multi-track audio

    Authors: Xiaojing Liu, Angeliki Mourgela, Hongwei Ai, Joshua D. Reiss

    Abstract: We propose a speech enhancement system for multitrack audio. The system will minimize auditory masking while allowing one to hear multiple simultaneous speakers. The system can be used in multiple communication scenarios e.g., teleconferencing, invoice gaming, and live streaming. The ITU-R BS.1387 Perceptual Evaluation of Audio Quality (PEAQ) model is used to evaluate the amount of masking in the…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 5 pages

  33. arXiv:2404.17806  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    T-CLAP: Temporal-Enhanced Contrastive Language-Audio Pretraining

    Authors: Yi Yuan, Zhuo Chen, Xubo Liu, Haohe Liu, Xuenan Xu, Dongya Jia, Yuanzhe Chen, Mark D. Plumbley, Wenwu Wang

    Abstract: Contrastive language-audio pretraining (CLAP) has been developed to align the representations of audio and language, achieving remarkable performance in retrieval and classification tasks. However, current CLAP struggles to capture temporal information within audio and text features, presenting substantial limitations for tasks such as audio retrieval and generation. To address this gap, we introd…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Preprint submitted to IEEE MLSP 2024

  34. arXiv:2404.17774  [pdf, other]

    cs.CV cs.GR

    High-quality Surface Reconstruction using Gaussian Surfels

    Authors: Pinxuan Dai, Jiamin Xu, Wenxiang Xie, Xinguo Liu, Huamin Wang, Weiwei Xu

    Abstract: We propose a novel point-based representation, Gaussian surfels, to combine the advantages of the flexible optimization procedure in 3D Gaussian points and the surface alignment property of surfels. This is achieved by directly setting the z-scale of 3D Gaussian points to 0, effectively flattening the original 3D ellipsoid into a 2D ellipse. Such a design provides clear guidance to the optimizer.…

    Submitted 29 April, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: Results added and improved

  35. arXiv:2404.17685  [pdf]

    cs.RO

    Localization Through Particle Filter Powered Neural Network Estimated Monocular Camera Poses

    Authors: Yi Shen, Hao Liu, Xinxin Liu, Wenjing Zhou, Chang Zhou, Yizhou Chen

    Abstract: The reduced cost and computational and calibration requirements of monocular cameras make them ideal positioning sensors for mobile robots, albeit at the expense of any meaningful depth measurement. Solutions proposed by some scholars to this localization problem involve fusing pose estimates from convolutional neural networks (CNNs) with pose estimates from geometric constraints on motion to gene…

    Submitted 26 April, 2024; originally announced April 2024.

  36. arXiv:2404.17173  [pdf, other]

    cs.CV cs.AI

    Exploring Beyond Logits: Hierarchical Dynamic Labeling Based on Embeddings for Semi-Supervised Classification

    Authors: Yanbiao Ma, Licheng Jiao, Fang Liu, Lingling Li, Shuyuan Yang, Xu Liu

    Abstract: In semi-supervised learning, methods that rely on confidence learning to generate pseudo-labels have been widely proposed. However, increasing research finds that when faced with noisy and biased data, the model's representation network is more reliable than the classification network. Additionally, label generation methods based on model predictions often show poor adaptability across different d…

    Submitted 26 April, 2024; originally announced April 2024.

  37. arXiv:2404.16831  [pdf, other]

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  38. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  39. arXiv:2404.16645  [pdf, other]

    cs.CL cs.AI

    Tele-FLM Technical Report

    Authors: Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang

    Abstract: Large language models (LLMs) have showcased profound capabilities in language understanding and generation, facilitating a wide array of applications. However, there is a notable paucity of detailed, open-sourced methodologies on efficiently scaling LLMs beyond 50 billion parameters with minimum trial-and-error cost and computational resources. In this report, we introduce Tele-FLM (aka FLM-2), a…

    Submitted 25 April, 2024; originally announced April 2024.

  40. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  41. arXiv:2404.16205  [pdf, other]

    cs.CV cs.MM

    AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results

    Authors: Marcos V. Conde, Saman Zadtootaghaj, Nabajeet Barman, Radu Timofte, Chenlong He, Qi Zheng, Ruoxi Zhu, Zhengzhong Tu, Haiqiang Wang, Xiangguang Chen, Wenhui Meng, Xiang Pan, Huiying Shi, Han Zhu, Xiaozhong Xu, Lei Sun, Zhenzhong Chen, Shan Liu, Zicheng Zhang, Haoning Wu, Yingjie Zhou, Chunyi Li, Xiaohong Liu, Weisi Lin, Guangtao Zhai, et al. (11 additional authors not shown)

    Abstract: This paper reviews the AIS 2024 Video Quality Assessment (VQA) Challenge, focused on User-Generated Content (UGC). The aim of this challenge is to gather deep learning-based methods capable of estimating the perceptual quality of UGC videos. The user-generated videos from the YouTube UGC Dataset include diverse content (sports, games, lyrics, anime, etc.), quality and resolutions. The proposed met…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Workshop -- AI for Streaming (AIS) Video Quality Assessment Challenge

  42. arXiv:2404.15676  [pdf, other]

    cs.CL cs.AI

    Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs

    Authors: Yu Xia, Rui Wang, Xu Liu, Mingyan Li, Tong Yu, Xiang Chen, Julian McAuley, Shuai Li

    Abstract: Chain-of-Thought (CoT) has been a widely adopted prompting method, eliciting impressive reasoning abilities of Large Language Models (LLMs). Inspired by the sequential thought structure of CoT, a number of Chain-of-X (CoX) methods have been developed to address various challenges across diverse domains and tasks involving LLMs. In this paper, we provide a comprehensive survey of Chain-of-X methods…

    Submitted 24 April, 2024; originally announced April 2024.

  43. arXiv:2404.15409  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    Insufficient Statistics Perturbation: Stable Estimators for Private Least Squares

    Authors: Gavin Brown, Jonathan Hayase, Samuel Hopkins, Weihao Kong, Xiyang Liu, Sewoong Oh, Juan C. Perdomo, Adam Smith

    Abstract: We present a sample- and time-efficient differentially private algorithm for ordinary least squares, with error that depends linearly on the dimension and is independent of the condition number of $X^\top X$, where $X$ is the design matrix. All prior private algorithms for this task require either $d^{3/2}$ examples, error growing polynomially with the condition number, or exponential time. Our ne…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 42 pages, 3 figures

  44. arXiv:2404.15366  [pdf, other]

    eess.SP cs.LG

    A Weight-aware-based Multi-source Unsupervised Domain Adaptation Method for Human Motion Intention Recognition

    Authors: Xiao-Yin Liu, Guotao Li, Xiao-Hu Zhou, Xu Liang, Zeng-Guang Hou

    Abstract: Accurate recognition of human motion intention (HMI) is beneficial for exoskeleton robots to improve the wearing comfort level and achieve natural human-robot interaction. A classifier trained on labeled source subjects (domains) performs poorly on unlabeled target subject since the difference in individual motor characteristics. The unsupervised domain adaptation (UDA) method has become an effect…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 13 pages, 5 figures

  45. arXiv:2404.15009  [pdf, other]

    cs.CV eess.IV

    The Brain Tumor Segmentation in Pediatrics (BraTS-PEDs) Challenge: Focus on Pediatrics (CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs)

    Authors: Anahita Fathi Kazerooni, Nastaran Khalili, Deep Gandhi, Xinyang Liu, Zhifan Jiang, Syed Muhammed Anwar, Jake Albrecht, Maruf Adewole, Udunna Anazodo, Hannah Anderson, Sina Bagheri, Ujjwal Baid, Timothy Bergquist, Austin J. Borja, Evan Calabrese, Verena Chung, Gian-Marco Conte, Farouk Dako, James Eddy, Ivan Ezhov, Ariana Familiar, Keyvan Farahani, Anurag Gottipati, Debanjan Haldar, Shuvanjan Haldar, et al. (51 additional authors not shown)

    Abstract: Pediatric tumors of the central nervous system are the most common cause of cancer-related death in children. The five-year survival rate for high-grade gliomas in children is less than 20%. Due to their rarity, the diagnosis of these entities is often delayed, their treatment is mainly based on historic treatment concepts, and clinical trials require multi-institutional collaborations. Here we pr…

    Submitted 29 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2305.17033

  46. arXiv:2404.15008  [pdf, other]

    cs.CV

    External Prompt Features Enhanced Parameter-efficient Fine-tuning for Salient Object Detection

    Authors: Wen Liang, Peipei Ran, Mengchao Bai, Xiao Liu, P. Bilha Githinji, Wei Zhao, Peiwu Qin

    Abstract: Salient object detection (SOD) aims at finding the most salient objects in images and outputs pixel-level binary masks. Transformer-based methods achieve promising performance due to their global semantic understanding, crucial for identifying salient objects. However, these models tend to be large and require numerous training parameters. To better harness the potential of transformers for SOD, w…

    Submitted 23 April, 2024; originally announced April 2024.

  47. arXiv:2404.14928  [pdf, other]

    cs.LG cs.AI cs.CL cs.SI

    Graph Machine Learning in the Era of Large Language Models (LLMs)

    Authors: Wenqi Fan, Shijie Wang, Jiani Huang, Zhikai Chen, Yu Song, Wenzhuo Tang, Haitao Mao, Hui Liu, Xiaorui Liu, Dawei Yin, Qing Li

    Abstract: Graphs play an important role in representing complex relationships in various domains like social networks, knowledge graphs, and molecular discovery. With the advent of deep learning, Graph Neural Networks (GNNs) have emerged as a cornerstone in Graph Machine Learning (Graph ML), facilitating the representation and processing of graph structures. Recently, LLMs have demonstrated unprecedented ca…

    Submitted 23 April, 2024; originally announced April 2024.

  48. arXiv:2404.14527  [pdf, other]

    cs.DC cs.LG

    Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity

    Authors: Tyler Griggs, Xiaoxuan Liu, Jiaxiang Yu, Doyoung Kim, Wei-Lin Chiang, Alvin Cheung, Ion Stoica

    Abstract: Large language models (LLMs) are increasingly integrated into many online services. However, a major challenge in deploying LLMs is their high cost, due primarily to the use of expensive GPU instances. To address this problem, we find that the significant heterogeneity of GPU types presents an opportunity to increase GPU cost efficiency and reduce deployment costs. The broad and growing market of…

    Submitted 22 April, 2024; originally announced April 2024.

  49. arXiv:2404.14453  [pdf, other]

    cs.CL cs.AI cs.DB

    EPI-SQL: Enhancing Text-to-SQL Translation with Error-Prevention Instructions

    Authors: Xiping Liu, Zhao Tan

    Abstract: The conversion of natural language queries into SQL queries, known as Text-to-SQL, is a critical yet challenging task. This paper introduces EPI-SQL, a novel methodological framework leveraging Large Language Models (LLMs) to enhance the performance of Text-to-SQL tasks. EPI-SQL operates through a four-step process. Initially, the method involves gathering instances from the Spider dataset on whic…

    Submitted 20 April, 2024; originally announced April 2024.

  50. arXiv:2404.14248  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report