Skip to main content

Showing 1–50 of 598 results for author: Song, L

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.01043  [pdf, ps, other

    cs.IT

    Reed-Solomon Codes over Cyclic Polynomial Ring with Lower Encoding/Decoding Complexity

    Authors: Wenhao Liu, Zhengyi Jiang, Zhongyi Huang, Linqi Song, Hanxu Hou

    Abstract: Reed-Solomon (RS) codes are constructed over a finite field that have been widely employed in storage and communication systems. Many fast encoding/decoding algorithms such as fast Fourier transform (FFT) and modular approach are designed for RS codes to reduce the encoding/decoding complexity defined as the number of XORs involved in the encoding/decoding procedure. In this paper, we present the… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

  2. arXiv:2404.16678  [pdf, other

    cs.CV

    Multimodal Semantic-Aware Automatic Colorization with Diffusion Prior

    Authors: Han Wang, Xinning Chai, Yiwen Wang, Yuhong Zhang, Rong Xie, Li Song

    Abstract: Colorizing grayscale images offers an engaging visual experience. Existing automatic colorization methods often fail to generate satisfactory results due to incorrect semantic colors and unsaturated colors. In this work, we propose an automatic colorization pipeline to overcome these challenges. We leverage the extraordinary generative ability of the diffusion prior to synthesize color with plausi… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  3. arXiv:2404.16271  [pdf

    cs.CR cond-mat.mtrl-sci

    True random number generation using metastable 1T' molybdenum ditelluride

    Authors: Yang Liu, Pengyu Liu, Yingyi Wen, Zihan Liang, Songwei Liu, Lekai Song, Jingfang Pei, Xiaoyue Fan, Teng Ma, Gang Wang, Shuo Gao, Kong-Pang Pun, Xiaolong Chen, Guohua Hu

    Abstract: True random numbers play a critical role in secure cryptography. The generation relies on a stable and readily extractable entropy source. Here, from solution-processed structurally metastable 1T' MoTe2, we prove stable output of featureless, stochastic, and yet stable conductance noise at a broad temperature (down to 15 K) with minimal power consumption (down to 0.05 micro-W). Our characterizatio… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  4. arXiv:2404.14396  [pdf, other

    cs.CV

    SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation

    Authors: Yuying Ge, Sijie Zhao, Jinguo Zhu, Yixiao Ge, Kun Yi, Lin Song, Chen Li, Xiaohan Ding, Ying Shan

    Abstract: The rapid evolution of multimodal foundation model has demonstrated significant progresses in vision-language understanding and generation, e.g., our previous work SEED-LLaMA. However, there remains a gap between its capability and the real-world applicability, primarily due to the model's limited capacity to effectively respond to various user instructions and interact with diverse visual data. I… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Project released at: https://github.com/AILab-CVC/SEED-X

  5. arXiv:2404.13968  [pdf, other

    cs.CL cs.AI cs.CR

    Protecting Your LLMs with Information Bottleneck

    Authors: Zichuan Liu, Zefan Wang, Linjie Xu, Jinyu Wang, Lei Song, Tianchun Wang, Chunlin Chen, Wei Cheng, Jiang Bian

    Abstract: The advent of large language models (LLMs) has revolutionized the field of natural language processing, yet they might be attacked to produce harmful content. Despite efforts to ethically align LLMs, these are often fragile and can be circumvented by jailbreaking attacks through optimized or manual adversarial prompts. To address this, we introduce the Information Bottleneck Protector (IBProtector… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

  6. arXiv:2404.12253  [pdf, other

    cs.CL cs.LG

    Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

    Authors: Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi, Dong Yu

    Abstract: Despite the impressive capabilities of Large Language Models (LLMs) on various tasks, they still struggle with scenarios that involves complex reasoning and planning. Recent work proposed advanced prompting techniques and the necessity of fine-tuning with high-quality data to augment LLMs' reasoning abilities. However, these approaches are inherently constrained by data availability and quality. I… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

  7. arXiv:2404.12020  [pdf, other

    cs.CV

    Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering

    Authors: Jie Ma, Min Hu, Pinghui Wang, Wangchun Sun, Lingyun Song, Hongbin Pei, Jun Liu, Youtian Du

    Abstract: Audio-Visual Question Answering (AVQA) is a complex multi-modal reasoning task, demanding intelligent systems to accurately respond to natural language queries based on audio-video input pairs. Nevertheless, prevalent AVQA approaches are prone to overlearning dataset biases, resulting in poor robustness. Furthermore, current datasets may not provide a precise diagnostic for these methods. To tackl… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 16 pages, 9 figures,5 Tables

    ACM Class: I.2.10

  8. arXiv:2404.10343  [pdf, other

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  9. arXiv:2404.09715  [pdf, other

    cs.LG cs.AI cs.MA

    Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

    Authors: Linjie Xu, Zichuan Liu, Alexander Dockhorn, Diego Perez-Liebana, Jinyu Wang, Lei Song, Jiang Bian

    Abstract: One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency. Compared to single agent RL, the sample efficiency for Multi-Agent Reinforcement Learning (MARL) is more challenging because of its inherent partial observability, non-stationary training, and enormous strategy space. Although much effort has been devoted to developing new methods and enhancing sample efficiency… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  10. arXiv:2404.09633  [pdf, other

    cs.CV

    In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation

    Authors: Han Xue, Qianru Sun, Li Song, Wenjun Zhang, Zhiwu Huang

    Abstract: We propose In-Context Translation (ICT), a general learning framework to unify visual recognition (e.g., semantic segmentation), low-level image processing (e.g., denoising), and conditional image generation (e.g., edge-to-image synthesis). Thanks to unification, ICT significantly reduces the inherent inductive bias that comes with designing models for specific tasks, and it maximizes mutual enhan… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  11. arXiv:2404.09338  [pdf, other

    cs.CL

    Entropy Guided Extrapolative Decoding to Improve Factuality in Large Language Models

    Authors: Souvik Das, Lifeng Jin, Linfeng Song, Haitao Mi, Baolin Peng, Dong Yu

    Abstract: Large language models (LLMs) exhibit impressive natural language capabilities but suffer from hallucination -- generating content ungrounded in the realities of training data. Recent work has focused on decoding techniques to improve factuality during inference by leveraging LLMs' hierarchical representation of factual knowledge, manipulating the predicted distributions at inference time. Current… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: Work in Progress

  12. arXiv:2404.04306  [pdf, other

    cs.CR cs.AI cs.CL cs.CY

    AuditGPT: Auditing Smart Contracts with ChatGPT

    Authors: Shihao Xia, Shuai Shao, Mengting He, Tingting Yu, Linhai Song, Yiying Zhang

    Abstract: To govern smart contracts running on Ethereum, multiple Ethereum Request for Comment (ERC) standards have been developed, each containing a set of rules to guide the behaviors of smart contracts. Violating the ERC rules could cause serious security issues and financial loss, signifying the importance of verifying smart contracts follow ERCs. Today's practices of such verification are to either man… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

  13. arXiv:2404.04232  [pdf, other

    cs.CL

    Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation

    Authors: Tianqi Zhong, Zhaoyi Li, Quan Wang, Linqi Song, Ying Wei, Defu Lian, Zhendong Mao

    Abstract: Compositional generalization, representing the model's ability to generate text with new attribute combinations obtained by recombining single attributes from the training data, is a crucial property for multi-aspect controllable text generation (MCTG) methods. Nonetheless, a comprehensive compositional generalization evaluation benchmark of MCTG is still lacking. We propose CompMCTG, a benchmark… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

  14. arXiv:2403.19949  [pdf, other

    cs.CV

    FairCLIP: Harnessing Fairness in Vision-Language Learning

    Authors: Yan Luo, Min Shi, Muhammad Osama Khan, Muhammad Muneeb Afzal, Hao Huang, Shuaihang Yuan, Yu Tian, Luo Song, Ava Kouhana, Tobias Elze, Yi Fang, Mengyu Wang

    Abstract: Fairness is a critical concern in deep learning, especially in healthcare, where these models influence diagnoses and treatment decisions. Although fairness has been investigated in the vision-only domain, the fairness of medical vision-language (VL) models remains unexplored due to the scarcity of medical VL datasets for studying fairness. To bridge this research gap, we introduce the first fair… ▽ More

    Submitted 5 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  15. arXiv:2403.19094  [pdf, other

    cs.CL

    Learning From Correctness Without Prompting Makes LLM Efficient Reasoner

    Authors: Yuxuan Yao, Han Wu, Zhijiang Guo, Biyan Zhou, Jiahui Gao, Sichun Luo, Hanxu Hou, Xiaojin Fu, Linqi Song

    Abstract: Large language models (LLMs) have demonstrated outstanding performance across various tasks, yet they still exhibit limitations such as hallucination, unfaithful reasoning, and toxic content. One potential approach to mitigate these issues is learning from human or external feedback (e.g. tools). In this paper, we introduce an intrinsic self-correct reasoning framework for LLMs that eliminates the… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

  16. arXiv:2403.17676  [pdf

    physics.app-ph cs.ET

    Analysis on reservoir activation with the nonlinearity harnessed from solution-processed MoS2 devices

    Authors: Songwei Liu, Yang Liu, Yingyi Wen, Jingfang Pei, Pengyu Liu, Lekai Song, Xiaoyue Fan, Wenchen Yang, Danmei Pan, Teng Ma, Yue Lin, Gang Wang, Guohua Hu

    Abstract: Reservoir computing is a recurrent neural network that has been applied across various domains in machine learning. The implementation of reservoir computing, however, often demands heavy computations for activating the reservoir. Configuring physical reservoir networks and harnessing the nonlinearity from the underlying devices for activation is an emergent solution to address the computational c… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

  17. arXiv:2403.15944  [pdf, other

    cs.CV cs.AI eess.IV

    Adaptive Super Resolution For One-Shot Talking-Head Generation

    Authors: Luchuan Song, Pinxin Liu, Guojun Yin, Chenliang Xu

    Abstract: The one-shot talking-head generation learns to synthesize a talking-head video with one source portrait image under the driving of same or different identity video. Usually these methods require plane-based pixel transformations via Jacobin matrices or facial image warps for novel poses generation. The constraints of using a single image source and pixel displacements often compromise the clarity… ▽ More

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: 5 pages, 3 figures

  18. arXiv:2403.09849  [pdf, other

    cs.CL cs.AI

    Self-Consistency Boosts Calibration for Math Reasoning

    Authors: Ante Wang, Linfeng Song, Ye Tian, Baolin Peng, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu

    Abstract: Calibration, which establishes the correlation between accuracy and model confidence, is important for LLM development. We design three off-the-shelf calibration methods based on self-consistency (Wang et al., 2022) for math reasoning tasks. Evaluation on two popular benchmarks (GSM8K and MathQA) using strong open-source LLMs (Mistral and LLaMA2), our methods better bridge model confidence and acc… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

  19. Towards Personalized Evaluation of Large Language Models with An Anonymous Crowd-Sourcing Platform

    Authors: Mingyue Cheng, Hao Zhang, Jiqian Yang, Qi Liu, Li Li, Xin Huang, Liwei Song, Zhi Li, Zhenya Huang, Enhong Chen

    Abstract: Large language model evaluation plays a pivotal role in the enhancement of its capacity. Previously, numerous methods for evaluating large language models have been proposed in this area. Despite their effectiveness, these existing works mainly focus on assessing objective questions, overlooking the capability to evaluate subjective questions which is extremely common for large language models. Ad… ▽ More

    Submitted 13 March, 2024; originally announced March 2024.

  20. arXiv:2403.07839  [pdf, other

    cs.CV cs.AI cs.MM

    MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric

    Authors: Haokun Lin, Haoli Bai, Zhili Liu, Lu Hou, Muyi Sun, Linqi Song, Ying Wei, Zhenan Sun

    Abstract: Vision-language pre-trained models have achieved impressive performance on various downstream tasks. However, their large model sizes hinder their utilization on platforms with limited computational resources. We find that directly using smaller pre-trained models and applying magnitude-based pruning on CLIP models leads to inflexibility and inferior performance. Recent efforts for VLP compression… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 18 pages, 8 figures, Published in CVPR2024

    Journal ref: In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  21. arXiv:2403.06097  [pdf, other

    cs.CL cs.AI cs.IR

    Can LLM Substitute Human Labeling? A Case Study of Fine-grained Chinese Address Entity Recognition Dataset for UAV Delivery

    Authors: Yuxuan Yao, Sichun Luo, Haohan Zhao, Guanzhi Deng, Linqi Song

    Abstract: We present CNER-UAV, a fine-grained \textbf{C}hinese \textbf{N}ame \textbf{E}ntity \textbf{R}ecognition dataset specifically designed for the task of address resolution in \textbf{U}nmanned \textbf{A}erial \textbf{V}ehicle delivery systems. The dataset encompasses a diverse range of five categories, enabling comprehensive training and evaluation of NER models. To construct this dataset, we sourced… ▽ More

    Submitted 19 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

    Comments: Accepted by TheWebConf'24 (WWW'24) as a Resource Paper

  22. arXiv:2403.03496  [pdf, ps, other

    cs.CL

    A Knowledge Plug-and-Play Test Bed for Open-domain Dialogue Generation

    Authors: Xiangci Li, Linfeng Song, Lifeng Jin, Haitao Mi, Jessica Ouyang, Dong Yu

    Abstract: Knowledge-based, open-domain dialogue generation aims to build chit-chat systems that talk to humans using mined support knowledge. Many types and sources of knowledge have previously been shown to be useful as support knowledge. Even in the era of large language models, response generation grounded in knowledge retrieved from additional up-to-date sources remains a practically important approach.… ▽ More

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted by LREC-COLING 2024

  23. arXiv:2403.02661  [pdf, other

    cs.SE

    How to Save My Gas Fees: Understanding and Detecting Real-world Gas Issues in Solidity Programs

    Authors: Mengting He, Shihao Xia, Boqin Qin, Nobuko Yoshida, Tingting Yu, Linhai Song, Yiying Zhang

    Abstract: The execution of smart contracts on Ethereum, a public blockchain system, incurs a fee called gas fee for its computation and data-store consumption. When programmers develop smart contracts (e.g., in the Solidity programming language), they could unknowingly write code snippets that unnecessarily cause more gas fees. These issues, or what we call gas wastes, could lead to significant monetary was… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

  24. arXiv:2403.02063  [pdf, other

    cs.CV

    Depth-Guided Robust and Fast Point Cloud Fusion NeRF for Sparse Input Views

    Authors: Shuai Guo, Qiuwen Wang, Yijie Gao, Rong Xie, Li Song

    Abstract: Novel-view synthesis with sparse input views is important for real-world applications like AR/VR and autonomous driving. Recent methods have integrated depth information into NeRFs for sparse input synthesis, leveraging depth prior for geometric and spatial understanding. However, most existing works tend to overlook inaccuracies within depth maps and have low time efficiency. To address these iss… ▽ More

    Submitted 4 March, 2024; originally announced March 2024.

  25. arXiv:2403.01244  [pdf, other

    cs.CL cs.AI

    Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal

    Authors: Jianheng Huang, Leyang Cui, Ante Wang, Chengyi Yang, Xinting Liao, Linfeng Song, Junfeng Yao, Jinsong Su

    Abstract: Large language models (LLMs) suffer from catastrophic forgetting during continual learning. Conventional rehearsal-based methods rely on previous training data to retain the model's ability, which may not be feasible in real-world applications. When conducting continual learning based on a publicly-released LLM checkpoint, the availability of the original training data may be non-existent. To addr… ▽ More

    Submitted 2 March, 2024; originally announced March 2024.

  26. arXiv:2403.00561  [pdf, other

    cs.CV cs.AI

    Multi-Task Learning Using Uncertainty to Weigh Losses for Heterogeneous Face Attribute Estimation

    Authors: Huaqing Yuan, Yi He, Peng Du, Lu Song

    Abstract: Face images contain a wide variety of attribute information. In this paper, we propose a generalized framework for joint estimation of ordinal and nominal attributes based on information sharing. We tackle the correlation problem between heterogeneous attributes using hard parameter sharing of shallow features, and trade-off multiple loss functions by considering homoskedastic uncertainty for each… ▽ More

    Submitted 1 March, 2024; originally announced March 2024.

  27. arXiv:2402.17982  [pdf, other

    cs.CL

    Collaborative decoding of critical tokens for boosting factuality of large language models

    Authors: Lifeng Jin, Baolin Peng, Linfeng Song, Haitao Mi, Ye Tian, Dong Yu

    Abstract: The most common training pipeline for large language models includes pretraining, finetuning and aligning phases, with their respective resulting models, such as the pretrained model and the finetuned model. Finetuned and aligned models show improved abilities of instruction following and safe generation, however their abilities to stay factual about the world are impacted by the finetuning proces… ▽ More

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: work in progress

  28. arXiv:2402.17423  [pdf, other

    cs.LG cs.AI cs.NE

    Reinforced In-Context Black-Box Optimization

    Authors: Lei Song, Chenxiao Gao, Ke Xue, Chenyang Wu, Dong Li, Jianye Hao, Zongzhang Zhang, Chao Qian

    Abstract: Black-Box Optimization (BBO) has found successful applications in many fields of science and engineering. Recently, there has been a growing interest in meta-learning particular components of BBO algorithms to speed up optimization and get rid of tedious hand-crafted heuristics. As an extension, learning the entire algorithm from data requires the least labor from experts and can provide the most… ▽ More

    Submitted 27 February, 2024; originally announced February 2024.

  29. arXiv:2402.16908  [pdf

    cs.ET cond-mat.mtrl-sci cs.LG eess.IV

    Lightweight, error-tolerant edge detection using memristor-enabled stochastic logics

    Authors: Lekai Song, Pengyu Liu, Jingfang Pei, Yang Liu, Songwei Liu, Shengbo Wang, Leonard W. T. Ng, Tawfique Hasan, Kong-Pang Pun, Shuo Gao, Guohua Hu

    Abstract: The demand for efficient edge vision has spurred the interest in developing stochastic computing approaches for performing image processing tasks. Memristors with inherent stochasticity readily introduce probability into the computations and thus enable stochastic image processing computations. Here, we present a stochastic computing approach for edge detection, a fundamental image processing tech… ▽ More

    Submitted 20 March, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

  30. arXiv:2402.15631  [pdf, other

    cs.CL cs.AI

    Fine-Grained Self-Endorsement Improves Factuality and Reasoning

    Authors: Ante Wang, Linfeng Song, Baolin Peng, Ye Tian, Lifeng Jin, Haitao Mi, Jinsong Su, Dong Yu

    Abstract: This work studies improving large language model (LLM) generations at inference time by mitigating fact-conflicting hallucinations. Particularly, we propose a self-endorsement framework that leverages the fine-grained fact-level comparisons across multiple sampled responses. Compared with prior ensemble methods (Wang et al., 2022;Chen et al., 2023)) that perform response-level selection, our appro… ▽ More

    Submitted 23 February, 2024; originally announced February 2024.

  31. arXiv:2402.14328  [pdf, other

    cs.CL

    Understanding and Patching Compositional Reasoning in LLMs

    Authors: Zhaoyi Li, Gangwei Jiang, Hong Xie, Linqi Song, Defu Lian, Ying Wei

    Abstract: LLMs have marked a revolutonary shift, yet they falter when faced with compositional reasoning tasks. Our research embarks on a quest to uncover the root causes of compositional reasoning failures of LLMs, uncovering that most of them stem from the improperly generated or leveraged implicit reasoning results. Inspired by our empirical findings, we resort to Logit Lens and an intervention experimen… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Work In Progress

  32. arXiv:2402.11438  [pdf, other

    cs.CR cs.AR

    NestedSGX: Bootstrapping Trust to Enclaves within Confidential VMs

    Authors: Wenhao Wang, Linke Song, Benshan Mei, Shuang Liu, Shijun Zhao, Shoumeng Yan, XiaoFeng Wang, Dan Meng, Rui Hou

    Abstract: Integrity is critical for maintaining system security, as it ensures that only genuine software is loaded onto a machine. Although confidential virtual machines (CVMs) function within isolated environments separate from the host, it is important to recognize that users still encounter challenges in maintaining control over the integrity of the code running within the trusted execution environments… ▽ More

    Submitted 17 February, 2024; originally announced February 2024.

  33. arXiv:2402.11359  [pdf, other

    cs.AI cs.CL

    Offline Training of Language Model Agents with Functions as Learnable Weights

    Authors: Shaokun Zhang, Jieyu Zhang, Jiale Liu, Linxin Song, Chi Wang, Ranjay Krishna, Qingyun Wu

    Abstract: Researchers and practitioners have recently reframed powerful Large Language Models (LLMs) as agents, enabling them to automate complex tasks largely via the use of specialized functions. To facilitate the development of LLM agents, we present a novel paradigm of training LLM agents without modifying the LLM weights, which is particularly useful when the LLMs are difficult or inaccessible for modi… ▽ More

    Submitted 3 May, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: 22 pages, 10 figures

  34. arXiv:2402.09452  [pdf, other

    eess.SP cs.LG eess.SY

    Data Distribution Dynamics in Real-World WiFi-Based Patient Activity Monitoring for Home Healthcare

    Authors: Mahathir Monjur, Jia Liu, Jingye Xu, Yuntong Zhang, Xiaomeng Wang, Chengdong Li, Hyejin Park, Wei Wang, Karl Shieh, Sirajum Munir, Jing Wang, Lixin Song, Shahriar Nirjon

    Abstract: This paper examines the application of WiFi signals for real-world monitoring of daily activities in home healthcare scenarios. While the state-of-the-art of WiFi-based activity recognition is promising in lab environments, challenges arise in real-world settings due to environmental, subject, and system configuration variables, affecting accuracy and adaptability. The research involved deploying… ▽ More

    Submitted 3 February, 2024; originally announced February 2024.

  35. arXiv:2402.09267  [pdf, other

    cs.CL cs.AI

    Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation

    Authors: Xiaoying Zhang, Baolin Peng, Ye Tian, Jingyan Zhou, Lifeng Jin, Linfeng Song, Haitao Mi, Helen Meng

    Abstract: Despite showing increasingly human-like abilities, large language models (LLMs) often struggle with factual inaccuracies, i.e. "hallucinations", even when they hold relevant knowledge. To address these hallucinations, current approaches typically necessitate high-quality human factuality annotations. In this work, we explore Self-Alignment for Factuality, where we leverage the self-evaluation capa… ▽ More

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 19 pages

  36. arXiv:2402.08957  [pdf, other

    cs.AI cs.CL cs.FL cs.LG cs.PL

    MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data

    Authors: Yinya Huang, Xiaohan Lin, Zhengying Liu, Qingxing Cao, Huajian Xin, Haiming Wang, Zhenguo Li, Linqi Song, Xiaodan Liang

    Abstract: Recent large language models (LLMs) have witnessed significant advancement in various tasks, including mathematical reasoning and theorem proving. As these two tasks require strict and formal multi-step inference, they are appealing domains for exploring the reasoning ability of LLMs but still face important challenges. Previous studies such as Chain-of-Thought (CoT) have revealed the effectivenes… ▽ More

    Submitted 7 March, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Journal ref: ICLR 2024 spotlight

  37. arXiv:2402.04286  [pdf

    q-bio.QM cs.AI cs.LG

    Progress and Opportunities of Foundation Models in Bioinformatics

    Authors: Qing Li, Zhihang Hu, Yixuan Wang, Lei Li, Yimin Fan, Irwin King, Le Song, Yu Li

    Abstract: Bioinformatics has witnessed a paradigm shift with the increasing integration of artificial intelligence (AI), particularly through the adoption of foundation models (FMs). These AI techniques have rapidly advanced, addressing historical challenges in bioinformatics such as the scarcity of annotated data and the presence of data noise. FMs are particularly adept at handling large-scale, unlabeled… ▽ More

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 27 pages, 3 figures, 2 tables

    MSC Class: cs.CL; 92-02 ACM Class: I.2.1

  38. arXiv:2402.01798  [pdf, other

    cs.LG cs.DC

    Improved Quantization Strategies for Managing Heavy-tailed Gradients in Distributed Learning

    Authors: Guangfeng Yan, Tan Li, Yuanzhang Xiao, Hanxu Hou, Linqi Song

    Abstract: Gradient compression has surfaced as a key technique to address the challenge of communication efficiency in distributed learning. In distributed deep learning, however, it is observed that gradient distributions are heavy-tailed, with outliers significantly influencing the design of compression strategies. Existing parameter quantization methods experience performance degradation when this heavy-… ▽ More

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.01160

  39. arXiv:2402.01575  [pdf, other

    cs.RO

    Efficient and Interaction-Aware Trajectory Planning for Autonomous Vehicles with Particle Swarm Optimization

    Authors: Lin Song, David Isele, Naira Hovakimyan, Sangjae Bae

    Abstract: This paper introduces a novel numerical approach to achieving smooth lane-change trajectories in autonomous driving scenarios. Our trajectory generation approach leverages particle swarm optimization (PSO) techniques, incorporating Neural Network (NN) predictions for trajectory refinement. The generation of smooth and dynamically feasible trajectories for the lane change maneuver is facilitated by… ▽ More

    Submitted 2 February, 2024; originally announced February 2024.

  40. arXiv:2402.01380  [pdf, other

    cs.CV eess.IV

    Efficient Dynamic-NeRF Based Volumetric Video Coding with Rate Distortion Optimization

    Authors: Zhiyu Zhang, Guo Lu, Huanxiong Liang, Anni Tang, Qiang Hu, Li Song

    Abstract: Volumetric videos, benefiting from immersive 3D realism and interactivity, hold vast potential for various applications, while the tremendous data volume poses significant challenges for compression. Recently, NeRF has demonstrated remarkable potential in volumetric video compression thanks to its simple representation and powerful 3D modeling capabilities, where a notable work is ReRF. However, R… ▽ More

    Submitted 2 February, 2024; originally announced February 2024.

  41. arXiv:2402.01160  [pdf, other

    cs.LG cs.DC

    Truncated Non-Uniform Quantization for Distributed SGD

    Authors: Guangfeng Yan, Tan Li, Yuanzhang Xiao, Congduan Li, Linqi Song

    Abstract: To address the communication bottleneck challenge in distributed learning, our work introduces a novel two-stage quantization strategy designed to enhance the communication efficiency of distributed Stochastic Gradient Descent (SGD). The proposed method initially employs truncation to mitigate the impact of long-tail noise, followed by a non-uniform quantization of the post-truncation gradients ba… ▽ More

    Submitted 2 February, 2024; originally announced February 2024.

  42. arXiv:2402.01154  [pdf, other

    cs.CR

    Towards Quantum-Safe Federated Learning via Homomorphic Encryption: Learning with Gradients

    Authors: Guangfeng Yan, Shanxiang Lyu, Hanxu Hou, Zhiyong Zheng, Linqi Song

    Abstract: This paper introduces a privacy-preserving distributed learning framework via private-key homomorphic encryption. Thanks to the randomness of the quantization of gradients, our learning with error (LWE) based encryption can eliminate the error terms, thus avoiding the issue of error expansion in conventional LWE-based homomorphic encryption. The proposed system allows a large number of learning pa… ▽ More

    Submitted 2 February, 2024; originally announced February 2024.

  43. arXiv:2402.00827   

    cs.CV

    Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering

    Authors: Pinxin Liu, Luchuan Song, Daoan Zhang, Hang Hua, Yunlong Tang, Huaijin Tu, Jiebo Luo, Chenliang Xu

    Abstract: Artistic video portrait generation is a significant and sought-after task in the fields of computer graphics and vision. While various methods have been developed that integrate NeRFs or StyleGANs with instructional editing models for creating and editing drivable portraits, these approaches face several challenges. They often rely heavily on large datasets, require extensive customization process… ▽ More

    Submitted 14 March, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: The paper paper needs a big modification, including the tile. This work is no longer its original version

  44. arXiv:2401.17270  [pdf, other

    cs.CV

    YOLO-World: Real-Time Open-Vocabulary Object Detection

    Authors: Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying Shan

    Abstract: The You Only Look Once (YOLO) series of detectors have established themselves as efficient and practical tools. However, their reliance on predefined and trained object categories limits their applicability in open scenarios. Addressing this limitation, we introduce YOLO-World, an innovative approach that enhances YOLO with open-vocabulary detection capabilities through vision-language modeling an… ▽ More

    Submitted 22 February, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: Work still in progress. Code & models are available at: https://github.com/AILab-CVC/YOLO-World

  45. arXiv:2401.16045  [pdf, other

    cs.AI

    Type-based Neural Link Prediction Adapter for Complex Query Answering

    Authors: Lingning Song, Yi Zu, Shan Lu, Jieyue He

    Abstract: Answering complex logical queries on incomplete knowledge graphs (KGs) is a fundamental and challenging task in multi-hop reasoning. Recent work defines this task as an end-to-end optimization problem, which significantly reduces the training cost and enhances the generalization of the model by a pretrained link predictors for query answering. However, most existing proposals ignore the critical s… ▽ More

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 11 pages, 3 figures

  46. arXiv:2401.15042  [pdf, other

    cs.CL cs.AI

    PROXYQA: An Alternative Framework for Evaluating Long-Form Text Generation with Large Language Models

    Authors: Haochen Tan, Zhijiang Guo, Zhan Shi, Lu Xu, Zhili Liu, Yunlong Feng, Xiaoguang Li, Yasheng Wang, Lifeng Shang, Qun Liu, Linqi Song

    Abstract: Large Language Models (LLMs) have exhibited remarkable success in long-form context comprehension tasks. However, their capacity to generate long contents, such as reports and articles, remains insufficiently explored. Current benchmarks do not adequately assess LLMs' ability to produce informative and comprehensive content, necessitating a more rigorous evaluation approach. In this study, we intr… ▽ More

    Submitted 13 February, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

  47. arXiv:2401.14112  [pdf, other

    cs.LG cs.AI cs.AR

    FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design

    Authors: Haojun Xia, Zhen Zheng, Xiaoxia Wu, Shiyang Chen, Zhewei Yao, Stephen Youn, Arash Bakhtiari, Michael Wyatt, Donglin Zhuang, Zhongzhu Zhou, Olatunji Ruwase, Yuxiong He, Shuaiwen Leon Song

    Abstract: Six-bit quantization (FP6) can effectively reduce the size of large language models (LLMs) and preserve the model quality consistently across varied applications. However, existing systems do not provide Tensor Core support for FP6 quantization and struggle to achieve practical performance improvements during LLM inference. It is challenging to support FP6 quantization on GPUs due to (1) unfriendl… ▽ More

    Submitted 3 March, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: Adding URL link of the source code

  48. arXiv:2401.13870  [pdf, other

    cs.IR

    Integrating Large Language Models into Recommendation via Mutual Augmentation and Adaptive Aggregation

    Authors: Sichun Luo, Yuxuan Yao, Bowei He, Yinya Huang, Aojun Zhou, Xinyi Zhang, Yuanzhang Xiao, Mingjie Zhan, Linqi Song

    Abstract: Conventional recommendation methods have achieved notable advancements by harnessing collaborative or sequential information from user behavior. Recently, large language models (LLMs) have gained prominence for their capabilities in understanding and reasoning over textual semantics, and have found utility in various domains, including recommendation. Conventional recommendation methods and LLMs e… ▽ More

    Submitted 24 January, 2024; originally announced January 2024.

  49. arXiv:2401.10578  [pdf, other

    cs.CV

    3D Shape Completion on Unseen Categories:A Weakly-supervised Approach

    Authors: Lintai Wu, Junhui Hou, Linqi Song, Yong Xu

    Abstract: 3D shapes captured by scanning devices are often incomplete due to occlusion. 3D shape completion methods have been explored to tackle this limitation. However, most of these methods are only trained and tested on a subset of categories, resulting in poor generalization to unseen categories. In this paper, we introduce a novel weakly-supervised framework to reconstruct the complete shapes from uns… ▽ More

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: 13 pages,8 figures

  50. arXiv:2401.10353  [pdf, other

    cs.CL

    Inconsistent dialogue responses and how to recover from them

    Authors: Mian Zhang, Lifeng Jin, Linfeng Song, Haitao Mi, Dong Yu

    Abstract: One critical issue for chat systems is to stay consistent about preferences, opinions, beliefs and facts of itself, which has been shown a difficult problem. In this work, we study methods to assess and bolster utterance consistency of chat systems. A dataset is first developed for studying the inconsistencies, where inconsistent dialogue responses, explanations of the inconsistencies, and recover… ▽ More

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: Accepted in EACL 2024. Code and dataset available at https://github.com/mianzhang/CIDER