Showing 1–50 of 650 results for author: Cheng, X

  1. arXiv:2405.05751  [pdf, other]

    cs.LG cs.AI cs.PL

    A Multi-Level Superoptimizer for Tensor Programs

    Authors: Mengdi Wu, Xinhao Cheng, Oded Padon, Zhihao Jia

    Abstract: We introduce Mirage, the first multi-level superoptimizer for tensor programs. A key idea in Mirage is $μ$Graphs, a uniform representation of tensor programs at the kernel, thread block, and thread levels of the GPU compute hierarchy. $μ$Graphs enable Mirage to discover novel optimizations that combine algebraic transformations, schedule transformations, and generation of new custom kernels. To na…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.04883  [pdf, other]

    cs.CV cs.AI cs.LG

    Molecule-Space: Free Lunch in Unified Multimodal Space via Knowledge Fusion

    Authors: Zehan Wang, Ziang Zhang, Xize Cheng, Rongjie Huang, Luping Liu, Zhenhui Ye, Haifeng Huang, Yang Zhao, Tao Jin, Peng Gao, Zhou Zhao

    Abstract: Unified multi-model representation spaces are the foundation of multimodal understanding and generation. However, the billions of model parameters and catastrophic forgetting problems make it challenging to further enhance pre-trained unified spaces. In this work, we propose Molecule-Space, an idea that treats multimodal representation spaces as "molecules", and augments pre-trained unified space…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024. The code and checkpoints are released at https://github.com/MoleculeSpace/MoleculeSpace

  3. arXiv:2405.04299  [pdf, other]

    cs.CV

    ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers

    Authors: Jinke Li, Xiao He, Chonghua Zhou, Xiaoqiang Cheng, Yang Wen, Dan Zhang

    Abstract: 3D occupancy, an advanced perception technology for driving scenarios, represents the entire scene without distinguishing between foreground and background by quantifying the physical space into a grid map. The widely adopted projection-first deformable attention, efficient in transforming image features into 3D representations, encounters challenges in aggregating multi-view features due to senso…

    Submitted 7 May, 2024; originally announced May 2024.

  4. arXiv:2405.03217  [pdf, other]

    cs.CR cs.AR

    PCG: Mitigating Conflict-based Cache Side-channel Attacks with Prefetching

    Authors: Fang Jiang, Fei Tong, Hongyu Wang, Xiaoyu Cheng, Zhe Zhou, Ming Ling, Yuxing Mao

    Abstract: To defend against conflict-based cache side-channel attacks, cache partitioning or remapping techniques were proposed to prevent set conflicts between different security domains or obfuscate the locations of such conflicts. But such techniques complicate cache design and may result in significant performance penalties. Therefore, there have been lightweight prefetching-based schemes proposed to in…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 12 pages, 9 figures, submitting to a journal

  5. arXiv:2405.02241  [pdf, other]

    cs.RO

    WeightedPose: Generalizable Cross-Pose Estimation via Weighted SVD

    Authors: Xuxin Cheng, Heng Yu, Harry Zhang, Wenxing Deng

    Abstract: We present a novel method for robotic manipulation tasks in human environments that require reasoning about the 3D geometric relationship between a pair of objects. Traditional end-to-end trained policies, which map from pixel observations to low-level robot actions, struggle to reason about complex pose relationships and have difficulty generalizing to unseen object configurations. To address the…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2211.09325

  6. arXiv:2405.02042  [pdf, other]

    cs.IT

    Sampling to Achieve the Goal: An Age-aware Remote Markov Decision Process

    Authors: Aimin Li, Shaohua Wu, Gary C. F. Lee, Xiaomeng Cheng, Sumei Sun

    Abstract: Age of Information (AoI) has been recognized as an important metric to measure the freshness of information. Central to this consensus is that minimizing AoI can enhance the freshness of information, thereby facilitating the accuracy of subsequent decision-making processes. However, to date the direct causal relationship that links AoI to the utility of the decision-making process is unexplored. T…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: 12 pages, 4 figures

  7. arXiv:2405.01990  [pdf, other]

    cs.LG

    Soft Label PU Learning

    Authors: Puning Zhao, Jintao Deng, Xu Cheng

    Abstract: PU learning refers to the classification problem in which only part of positive samples are labeled. Existing PU learning methods treat unlabeled samples equally. However, in many real tasks, from common sense or domain knowledge, some unlabeled samples are more likely to be positive than others. In this paper, we propose soft label PU learning, in which unlabeled data are assigned soft labels acc…

    Submitted 3 May, 2024; originally announced May 2024.

  8. arXiv:2404.19509  [pdf, other]

    cs.CL

    Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom

    Authors: Shisen Yue, Siyuan Song, Xinyuan Cheng, Hai Hu

    Abstract: Understanding the non-literal meaning of an utterance is critical for large language models (LLMs) to become human-like social communicators. In this work, we introduce SwordsmanImp, the first Chinese multi-turn-dialogue-based dataset aimed at conversational implicature, sourced from dialogues in the Chinese sitcom $\textit{My Own Swordsman}$. It includes 200 carefully handcrafted questions, all a…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 14 pages, 8 tables and 5 figures

    ACM Class: J.5

  9. arXiv:2404.18246  [pdf, other]

    cs.LG cs.CV

    AdaFSNet: Time Series Classification Based on Convolutional Network with a Adaptive and Effective Kernel Size Configuration

    Authors: Haoxiao Wang, Bo Peng, Jianhua Zhang, Xu Cheng

    Abstract: Time series classification is one of the most critical and challenging problems in data mining, existing widely in various fields and holding significant research importance. Despite extensive research and notable achievements with successful real-world applications, addressing the challenge of capturing the appropriate receptive field (RF) size from one-dimensional or multi-dimensional time serie…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCNN 2024

  10. arXiv:2404.16825  [pdf, other]

    cs.CV eess.IV

    ResVR: Joint Rescaling and Viewport Rendering of Omnidirectional Images

    Authors: Weiqi Li, Shijie Zhao, Bin Chen, Xinhua Cheng, Junlin Li, Li Zhang, Jian Zhang

    Abstract: With the advent of virtual reality technology, omnidirectional image (ODI) rescaling techniques are increasingly embraced for reducing transmitted and stored file sizes while preserving high image quality. Despite this progress, current ODI rescaling methods predominantly focus on enhancing the quality of images in equirectangular projection (ERP) format, which overlooks the fact that the content…

    Submitted 25 April, 2024; originally announced April 2024.

  11. arXiv:2404.15158  [pdf, other]

    econ.TH cs.IT

    Blackwell-Monotone Information Costs

    Authors: Xiaoyu Cheng, Yonggyun Kim

    Abstract: A Blackwell-monotone information cost function assigns higher costs to Blackwell more informative experiments. This paper provides simple necessary and sufficient conditions for Blackwell monotonicity over finite experiments. The key condition is a system of linear differential inequalities that are convenient to check given an arbitrary cost function. When the cost function is additively separabl…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 44 pages, 2 figures

  12. arXiv:2404.13985  [pdf, other]

    cs.CL

    Information Re-Organization Improves Reasoning in Large Language Models

    Authors: Xiaoxia Cheng, Zeqi Tan, Weiming Lu

    Abstract: Improving the reasoning capabilities of large language models (LLMs) has attracted considerable interest. Recent approaches primarily focus on improving the reasoning process to yield a more precise final answer. However, in scenarios involving contextually aware reasoning, these methods neglect the importance of first identifying logical relationships from the context before proceeding with the r…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 10 pages, 3 figures

  13. arXiv:2404.12888  [pdf, other]

    cs.CV cs.GR cs.LG

    Learn2Talk: 3D Talking Face Learns from 2D Talking Face

    Authors: Yixiang Zhuang, Baoping Cheng, Yao Cheng, Yuntao Jin, Renshuai Liu, Chengyang Li, Xuan Cheng, Jing Liao, Juncong Lin

    Abstract: Speech-driven facial animation methods usually fall into two main classes, 3D and 2D talking face, both of which have attracted considerable research attention in recent years. However, to the best of our knowledge, the research on 3D talking face does not go as deep as that on 2D talking face in terms of lip-synchronization (lip-sync) and speech perception. To mind the gap between the two sub-fields, we prop…

    Submitted 19 April, 2024; originally announced April 2024.

  14. arXiv:2404.12115  [pdf, other]

    cs.RO

    Caging in Motion: Characterizing Robustness in Manipulation through Energy Margin and Dynamic Caging Analysis

    Authors: Yifei Dong, Xianyi Cheng, Florian T. Pokorny

    Abstract: To develop robust manipulation policies, quantifying robustness is essential. Evaluating robustness in general dexterous manipulation, nonetheless, poses significant challenges due to complex hybrid dynamics, combinatorial explosion of possible contact interactions, global geometry, etc. This paper introduces "caging in motion", an approach for analyzing manipulation robustness through energy ma…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 8 pages

  15. arXiv:2404.09313  [pdf, other]

    eess.AS cs.AI

    Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment

    Authors: Zhiqing Hong, Rongjie Huang, Xize Cheng, Yongqi Wang, Ruiqi Li, Fuming You, Zhou Zhao, Zhimeng Zhang

    Abstract: A song is a combination of singing voice and accompaniment. However, existing works focus on singing voice synthesis and music generation independently. Little attention has been paid to exploring song synthesis. In this work, we propose a novel task called text-to-song synthesis, which incorporates both vocal and accompaniment generation. We develop Melodist, a two-stage text-to-song method that consi…

    Submitted 16 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

  16. arXiv:2404.09043  [pdf, other]

    cs.CL

    Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation

    Authors: Jia Gu, Liang Pang, Huawei Shen, Xueqi Cheng

    Abstract: With the rapid advancement of large language models (LLMs) and their remarkable capabilities in handling complex language tasks, an increasing number of studies are employing LLMs as agents to emulate the sequential decision-making processes of humans often represented as Markov decision-making processes (MDPs). The actions within this decision-making framework adhere to specific probability distr…

    Submitted 13 April, 2024; originally announced April 2024.

  17. arXiv:2404.08980  [pdf, other]

    cs.LG stat.ML

    Stability and Generalization in Free Adversarial Training

    Authors: Xiwei Cheng, Kexin Fu, Farzan Farnia

    Abstract: While adversarial training methods have resulted in significant improvements in the deep neural nets' robustness against norm-bounded adversarial perturbations, their generalization performance from training samples to test data has been shown to be considerably worse than standard empirical risk minimization methods. Several recent studies seek to connect the generalization behavior of adversaria…

    Submitted 13 April, 2024; originally announced April 2024.

  18. arXiv:2404.08963  [pdf, ps, other]

    cs.GT

    Facility Assignment with Fair Cost Sharing: Equilibrium and Mechanism Design

    Authors: Mengfan Ma, Mingyu Xiao, Tian Bai, Xin Cheng

    Abstract: In the one-dimensional facility assignment problem, m facilities and n agents are positioned along the real line. Each agent will be assigned to a single facility to receive service. Each facility incurs a building cost, which is shared equally among the agents utilizing it. Additionally, each agent independently bears a connection cost to access a facility. Thus, an agent's cost is the sum of the…

    Submitted 13 April, 2024; originally announced April 2024.

  19. arXiv:2404.05014  [pdf, other]

    cs.CV

    MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

    Authors: Shenghai Yuan, Jinfa Huang, Yujun Shi, Yongqi Xu, Ruijie Zhu, Bin Lin, Xinhua Cheng, Li Yuan, Jiebo Luo

    Abstract: Recent advances in Text-to-Video generation (T2V) have achieved remarkable success in synthesizing high-quality general videos from textual descriptions. A largely overlooked problem in T2V is that existing models have not adequately encoded physical knowledge of the real world, thus generated videos tend to have limited motion and poor variations. In this paper, we propose MagicTime, a m…

    Submitted 7 April, 2024; originally announced April 2024.

  20. arXiv:2404.04990  [pdf, other]

    cs.CL

    MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models

    Authors: Zihao Wei, Jingcheng Deng, Liang Pang, Hanxing Ding, Huawei Shen, Xueqi Cheng

    Abstract: The extensive utilization of large language models (LLMs) underscores the crucial necessity for precise and contemporary knowledge embedded within their intrinsic parameters. Existing research on knowledge editing primarily concentrates on monolingual scenarios, neglecting the complexities presented by multilingual contexts and multi-hop reasoning. To address these challenges, our study introduces…

    Submitted 7 April, 2024; originally announced April 2024.

  21. arXiv:2404.01767  [pdf, other]

    cs.CL

    Class-Incremental Few-Shot Event Detection

    Authors: Kailin Zhao, Xiaolong Jin, Long Bai, Jiafeng Guo, Xueqi Cheng

    Abstract: Event detection is one of the fundamental tasks in information extraction and knowledge graph. However, a realistic event detection system often needs to deal with new event classes constantly. These new classes usually have only a few labeled instances as it is time-consuming and labor-intensive to annotate a large number of unlabeled instances. Therefore, this paper proposes a new task, called c…

    Submitted 2 April, 2024; originally announced April 2024.

  22. arXiv:2404.01695  [pdf, other]

    cs.LG

    Selective Temporal Knowledge Graph Reasoning

    Authors: Zhongni Hou, Xiaolong Jin, Zixuan Li, Long Bai, Jiafeng Guo, Xueqi Cheng

    Abstract: Temporal Knowledge Graph (TKG), which characterizes temporally evolving facts in the form of (subject, relation, object, timestamp), has attracted much attention recently. TKG reasoning aims to predict future facts based on given historical ones. However, existing TKG reasoning models are unable to abstain from predictions they are uncertain about, which will inevitably bring risks in real-world applica…

    Submitted 2 April, 2024; originally announced April 2024.

  23. arXiv:2404.01574  [pdf, other]

    cs.IR cs.CR cs.LG

    Multi-granular Adversarial Attacks against Black-box Neural Ranking Models

    Authors: Yu-An Liu, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

    Abstract: Adversarial ranking attacks have gained increasing attention due to their success in probing vulnerabilities, and, hence, enhancing the robustness, of neural ranking models. Conventional attack methods employ perturbations at a single granularity, e.g., word or sentence level, to target documents. However, limiting perturbations to a single level of granularity may reduce the flexibility of advers…

    Submitted 10 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR2024

  24. arXiv:2404.00216  [pdf, other]

    cs.CL cs.AI

    Is Factuality Decoding a Free Lunch for LLMs? Evaluation on Knowledge Editing Benchmark

    Authors: Baolong Bi, Shenghua Liu, Yiwei Wang, Lingrui Mei, Xueqi Cheng

    Abstract: The rapid development of large language models (LLMs) enables them to convey factual knowledge in a more human-like fashion. Extensive efforts have been made to reduce factual hallucinations by modifying LLMs with factuality decoding. However, they also pose risks of hindering knowledge updates, as they make models overly confident in known facts. In this work, we first revisit the current factua…

    Submitted 29 March, 2024; originally announced April 2024.

  25. arXiv:2403.19275  [pdf, other]

    cs.CL cs.AI

    Knowledge Boundary and Persona Dynamic Shape A Better Social Media Agent

    Authors: Junkai Zhou, Liang Pang, Ya Jing, Jia Gu, Huawei Shen, Xueqi Cheng

    Abstract: Constructing personalized and anthropomorphic agents holds significant importance in the simulation of social networks. However, there are still two key problems in existing works: the agent possesses world knowledge that does not belong to its personas, and it cannot eliminate the interference of diverse persona information on current actions, which reduces the personalization and anthropomorphis…

    Submitted 2 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  26. arXiv:2403.19216  [pdf, other]

    cs.IR

    Are Large Language Models Good at Utility Judgments?

    Authors: Hengran Zhang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

    Abstract: Retrieval-augmented generation (RAG) is considered to be a promising approach to alleviate the hallucination issue of large language models (LLMs), and it has received widespread attention from researchers recently. Due to the limitation in the semantic understanding of retrieval models, the success of RAG heavily lies on the ability of LLMs to identify passages with utility. Recent efforts have e…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted by SIGIR2024

  27. arXiv:2403.16967  [pdf, other]

    cs.RO cs.CV cs.LG

    Visual Whole-Body Control for Legged Loco-Manipulation

    Authors: Minghuan Liu, Zixuan Chen, Xuxin Cheng, Yandong Ji, Rizhao Qiu, Ruihan Yang, Xiaolong Wang

    Abstract: We study the problem of mobile manipulation using legged robots equipped with an arm, namely legged loco-manipulation. The robot legs, while usually utilized for mobility, offer an opportunity to amplify the manipulation capabilities by conducting whole-body control. That is, the robot can control the legs and the arm at the same time to extend its workspace. We propose a framework that can conduc…

    Submitted 20 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Add more details. The first two authors contribute equally. Project page: https://wholebody-b1.github.io

  28. arXiv:2403.14985  [pdf, other]

    cs.CR

    FileDES: A Secure Scalable and Succinct Decentralized Encrypted Storage Network

    Authors: Minghui Xu, Jiahao Zhang, Hechuan Guo, Xiuzhen Cheng, Dongxiao Yu, Qin Hu, Yijun Li, Yipu Wu

    Abstract: Decentralized Storage Network (DSN) is an emerging technology that challenges traditional cloud-based storage systems by consolidating storage capacities from independent providers and coordinating to provide decentralized storage and retrieval services. However, current DSNs face several challenges associated with data privacy and efficiency of the proof systems. To address these issues, we propo…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 10 pages, 8 figures, 1 table. Accepted by 2024 IEEE INFOCOM

  29. arXiv:2403.14312  [pdf, other]

    cs.CL

    ChainLM: Empowering Large Language Models with Improved Chain-of-Thought Prompting

    Authors: Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen

    Abstract: Chain-of-Thought (CoT) prompting can enhance the reasoning capabilities of large language models (LLMs), establishing itself as a primary approach to solving complex reasoning tasks. Existing CoT synthesis approaches usually focus on simpler reasoning tasks and thus result in low-quality and inconsistent CoT prompts. In response to this challenge, we present an empirical investigation of CoT promp…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  30. arXiv:2403.13829  [pdf, other]

    q-bio.BM cs.LG

    DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization

    Authors: Xiangxin Zhou, Xiwei Cheng, Yuwei Yang, Yu Bao, Liang Wang, Quanquan Gu

    Abstract: Recently, 3D generative models have shown promising performances in structure-based drug design by learning to generate ligands given target binding sites. However, only modeling the target-ligand distribution can hardly fulfill one of the main goals in drug discovery -- designing novel ligands with desired properties, e.g., high binding affinity, easily synthesizable, etc. This challenge becomes…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted to ICLR 2024

  31. arXiv:2403.12499  [pdf, other]

    cs.IR

    Listwise Generative Retrieval Models via a Sequential Learning Process

    Authors: Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Xueqi Cheng

    Abstract: Recently, a novel generative retrieval (GR) paradigm has been proposed, where a single sequence-to-sequence model is learned to directly generate a list of relevant document identifiers (docids) given a query. Existing GR models commonly employ maximum likelihood estimation (MLE) for optimization: this involves maximizing the likelihood of a single relevant docid given an input query, with the ass…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted by ACM Transactions on Information Systems

  32. arXiv:2403.12350  [pdf, other]

    cs.LG

    Friendly Sharpness-Aware Minimization

    Authors: Tao Li, Pan Zhou, Zhengbao He, Xinwen Cheng, Xiaolin Huang

    Abstract: Sharpness-Aware Minimization (SAM) has been instrumental in improving deep neural network training by minimizing both training loss and loss sharpness. Despite the practical success, the mechanisms behind SAM's generalization enhancements remain elusive, limiting its progress in deep learning optimization. In this work, we investigate SAM's core components for generalization improvement and introd…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  33. arXiv:2403.12038  [pdf, other]

    cs.CV

    Zero-Shot Image Feature Consensus with Deep Functional Maps

    Authors: Xinle Cheng, Congyue Deng, Adam Harley, Yixin Zhu, Leonidas Guibas

    Abstract: Correspondences emerge from large-scale vision models trained for generative and discriminative tasks. This has been revealed and benchmarked by computing correspondence maps between pairs of images, using nearest neighbors on the feature grids. Existing work has attempted to improve the quality of these correspondence maps by carefully mixing features from different sources, such as by combining…

    Submitted 18 March, 2024; originally announced March 2024.

  34. arXiv:2403.11439  [pdf, other]

    cs.CL

    StyleChat: Learning Recitation-Augmented Memory in LLMs for Stylized Dialogue Generation

    Authors: Jinpeng Li, Zekai Zhang, Quan Tu, Xin Cheng, Dongyan Zhao, Rui Yan

    Abstract: Large Language Models (LLMs) demonstrate superior performance in generative scenarios and have attracted widespread attention. Among them, stylized dialogue generation is essential in the context of LLMs for building intelligent and engaging dialogue agents. However, the ability of LLMs is data-driven and limited by data bias, leading to poor performance on specific tasks. In particular, stylized di…

    Submitted 17 March, 2024; originally announced March 2024.

  35. arXiv:2403.10629  [pdf, other]

    cs.RO eess.SY

    Virtual Elastic Tether: a New Approach for Multi-agent Navigation in Confined Aquatic Environments

    Authors: Kanzhong Yao, Xueliang Cheng, Keir Groves, Barry Lennox, Ognjen Marjanovic, Simon Watson

    Abstract: Underwater navigation is a challenging area in the field of mobile robotics due to inherent constraints in self-localisation and communication in underwater environments. Some of these challenges can be mitigated by using collaborative multi-agent teams. However, when applied underwater, the robustness of traditional multi-agent collaborative control approaches is highly limited due to the unavail…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  36. arXiv:2403.08902  [pdf, other]

    cs.CV

    Envision3D: One Image to 3D with Anchor Views Interpolation

    Authors: Yatian Pang, Tanghui Jia, Yujun Shi, Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Xing Zhou, Francis E. H. Tay, Li Yuan

    Abstract: We present Envision3D, a novel method for efficiently generating high-quality 3D content from a single image. Recent methods that extract 3D content from multi-view images generated by diffusion models show great potential. However, it is still challenging for diffusion models to generate dense multi-view consistent images, which is crucial for the quality of 3D content extraction. To address this…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: GitHub repository: https://github.com/PKU-YuanGroup/Envision3D

  37. arXiv:2403.07969  [pdf, other]

    cs.LG cs.AI

    KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction

    Authors: Zixuan Li, Yutao Zeng, Yuxin Zuo, Weicheng Ren, Wenxuan Liu, Miao Su, Yucan Guo, Yantao Liu, Xiang Li, Zhilei Hu, Long Bai, Wei Li, Yidan Liu, Pan Yang, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

    Abstract: In this paper, we propose KnowCoder, a Large Language Model (LLM) to conduct Universal Information Extraction (UIE) via code generation. KnowCoder aims to develop a kind of unified schema representation that LLMs can easily understand and an effective learning framework that encourages LLMs to follow schemas and extract structured knowledge accurately. To achieve these, KnowCoder introduces a code…

    Submitted 13 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  38. arXiv:2403.07444  [pdf, other]

    cs.NI eess.SP

    A Survey on Federated Learning in Intelligent Transportation Systems

    Authors: Rongqing Zhang, Hanqiu Wang, Bing Li, Xiang Cheng, Liuqing Yang

    Abstract: The development of Intelligent Transportation System (ITS) has brought about comprehensive urban traffic information that not only provides convenience to urban residents in their daily lives but also enhances the efficiency of urban road usage, leading to a more harmonious and sustainable urban life. Typical scenarios in ITS mainly include traffic flow prediction, traffic target recognition, and…

    Submitted 14 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  39. arXiv:2403.05156  [pdf, other]

    cs.CR

    On Protecting the Data Privacy of Large Language Models (LLMs): A Survey

    Authors: Biwei Yan, Kun Li, Minghui Xu, Yueyan Dong, Yue Zhang, Zhaochun Ren, Xiuzhen Cheng

    Abstract: Large language models (LLMs) are complex artificial intelligence systems capable of understanding, generating and translating human language. They learn language patterns by analyzing large amounts of text data, allowing them to perform writing, conversation, summarizing and other language tasks. When LLMs process and generate large amounts of data, there is a risk of leaking sensitive information…

    Submitted 14 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: 18 pages, 4 figures

  40. arXiv:2403.00107  [pdf]

    cs.DL

    Talent hat, cross-border mobility, and career development in China

    Authors: Yurui Huang, Xuesen Cheng, Chaolin Tian, Xunyi Jiang, Langtian Ma, Yifang Ma

    Abstract: This study aims to investigate the influence of cross-border recruitment program in China, which confers scientists with a 'talent hat' including a startup package comprising significant bonuses, pay, and funding, on their future performance and career development. By curating a unique dataset from China's 10-year talent recruitment program, we employed multiple matching designs to quantify the ef…

    Submitted 29 February, 2024; originally announced March 2024.

  41. arXiv:2402.18789  [pdf, other]

    cs.DC cs.CL cs.LG

    FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning

    Authors: Xupeng Miao, Gabriele Oliaro, Xinhao Cheng, Mengdi Wu, Colin Unger, Zhihao Jia

    Abstract: Parameter-efficient finetuning (PEFT) is a widely used technique to adapt large language models for different tasks. Service providers typically create separate systems for users to perform PEFT model finetuning and inference tasks. This is because existing systems cannot handle workloads that include a mix of inference and PEFT finetuning requests. As a result, shared GPU resources are underutili…

    Submitted 28 February, 2024; originally announced February 2024.

  42. arXiv:2402.18150  [pdf, other]

    cs.CL cs.AI cs.IR

    Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation

    Authors: Shicheng Xu, Liang Pang, Mo Yu, Fandong Meng, Huawei Shen, Xueqi Cheng, Jie Zhou

    Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating additional information from retrieval. However, studies have shown that LLMs still face challenges in effectively using the retrieved information, even ignoring it or being misled by it. The key reason is that the training of LLMs does not clearly make LLMs learn how to utilize input retrieved texts with va…

    Submitted 28 February, 2024; originally announced February 2024.

  43. arXiv:2402.17532  [pdf, other]

    cs.CL

    Retrieval is Accurate Generation

    Authors: Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi

    Abstract: Standard language models generate text by selecting tokens from a fixed, finite, and standalone vocabulary. We introduce a novel method that selects context-aware phrases from a collection of supporting documents. One of the most significant challenges for this paradigm shift is determining the training oracles, because a string of text can be segmented in various ways and each segment can be retr…

    Submitted 16 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: ICLR 2024

  44. arXiv:2402.16796  [pdf, other]

    cs.RO cs.LG

    Expressive Whole-Body Control for Humanoid Robots

    Authors: Xuxin Cheng, Yandong Ji, Junming Chen, Ruihan Yang, Ge Yang, Xiaolong Wang

    Abstract: Can we enable humanoid robots to generate rich, diverse, and expressive motions in the real world? We propose to learn a whole-body control policy on a human-sized robot to mimic human motions as realistic as possible. To train such a policy, we leverage the large-scale human motion capture data from the graphics community in a Reinforcement Learning framework. However, directly performing imitati…

    Submitted 5 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Website: https://expressive-humanoid.github.io

  45. arXiv:2402.16767  [pdf, other]

    cs.IR cs.CL

    CorpusBrain++: A Continual Generative Pre-Training Framework for Knowledge-Intensive Language Tasks

    Authors: Jiafeng Guo, Changjiang Zhou, Ruqing Zhang, Jiangui Chen, Maarten de Rijke, Yixing Fan, Xueqi Cheng

    Abstract: Knowledge-intensive language tasks (KILTs) typically require retrieving relevant documents from trustworthy corpora, e.g., Wikipedia, to produce specific answers. Very recently, a pre-trained generative retrieval model for KILTs, named CorpusBrain, was proposed and reached new state-of-the-art retrieval performance. However, most existing research on KILTs, including CorpusBrain, has predominantly…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Submitted to ACM Transactions on Information Systems

  46. arXiv:2402.16297  [pdf, other]

    cs.LG cs.AI

    Poisson-Gamma Dynamical Systems with Non-Stationary Transition Dynamics

    Authors: Jiahao Wang, Sikun Yang, Heinz Koeppl, Xiuzhen Cheng, Pengfei Hu, Guoming Zhang

    Abstract: Bayesian methodologies for handling count-valued time series have gained prominence due to their ability to infer interpretable latent structures and to estimate uncertainties, and thus are especially suitable for dealing with noisy and incomplete count data. Among these Bayesian models, Poisson-Gamma Dynamical Systems (PGDSs) are proven to be effective in capturing the evolving dynamics underlyin…

    Submitted 25 February, 2024; originally announced February 2024.

  47. arXiv:2402.15109  [pdf, other]

    cs.LG

    Machine Unlearning by Suppressing Sample Contribution

    Authors: Xinwen Cheng, Zhehao Huang, Xiaolin Huang

    Abstract: Machine Unlearning (MU) is to forget data from a well-trained model, which is practically important due to the "right to be forgotten". In this paper, we start from the fundamental distinction between training data and unseen data on their contribution to the model: the training data contributes to the final model while the unseen data does not. We theoretically discover that the input sensitivity…

    Submitted 23 February, 2024; originally announced February 2024.

  48. arXiv:2402.14272  [pdf, other]

    cs.CL

    Qsnail: A Questionnaire Dataset for Sequential Question Generation

    Authors: Yan Lei, Liang Pang, Yuanzhuo Wang, Huawei Shen, Xueqi Cheng

    Abstract: The questionnaire is a professional research methodology used for both qualitative and quantitative analysis of human opinions, preferences, attitudes, and behaviors. However, designing and evaluating questionnaires demands significant effort due to their intricate and complex structure. Questionnaires entail a series of questions that must conform to intricate constraints involving the questions,…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted to the LREC-COLING 2024

  49. arXiv:2402.13625  [pdf, other]

    cs.CL

    MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning

    Authors: Wanqing Cui, Keping Bi, Jiafeng Guo, Xueqi Cheng

    Abstract: Since commonsense information has been recorded significantly less frequently than its existence, language models pre-trained by text generation have difficulty to learn sufficient commonsense knowledge. Several studies have leveraged text retrieval to augment the models' commonsense ability. Unlike text, images capture commonsense information inherently but little effort has been paid to effectiv…

    Submitted 21 February, 2024; originally announced February 2024.

  50. arXiv:2402.13576  [pdf, other]

    cs.CV cs.IR

    Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement

    Authors: Danyang Hou, Liang Pang, Huawei Shen, Xueqi Cheng

    Abstract: Video Corpus Moment Retrieval (VCMR) is a new video retrieval task aimed at retrieving a relevant moment from a large corpus of untrimmed videos using a text query. The relevance between the video and query is partial, mainly evident in two aspects: (1) Scope: The untrimmed video contains many frames, but not all are relevant to the query. Strong relevance is typically observed only within the rel…

    Submitted 23 April, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: camera-ready version of ACM ICMR 2024