Skip to main content

Showing 1–50 of 306 results for author: Wen, Y

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.05584  [pdf, other

    cs.CV cs.AI

    A Survey on Backbones for Deep Video Action Recognition

    Authors: Zixuan Tang, Youjun Zhao, Yuhang Wen, Mengyuan Liu

    Abstract: Action recognition is a key technology in building interactive metaverses. With the rapid development of deep learning, methods in action recognition have also achieved great advancement. Researchers design and implement the backbones referring to multiple standpoints, which leads to the diversity of methods and encountering new challenges. This paper reviews several action recognition methods bas… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by ICME workshop

  2. arXiv:2405.04299  [pdf, other

    cs.CV

    ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers

    Authors: Jinke Li, Xiao He, Chonghua Zhou, Xiaoqiang Cheng, Yang Wen, Dan Zhang

    Abstract: 3D occupancy, an advanced perception technology for driving scenarios, represents the entire scene without distinguishing between foreground and background by quantifying the physical space into a grid map. The widely adopted projection-first deformable attention, efficient in transforming image features into 3D representations, encounters challenges in aggregating multi-view features due to senso… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

  3. arXiv:2404.16271  [pdf

    cs.CR cond-mat.mtrl-sci

    True random number generation using metastable 1T' molybdenum ditelluride

    Authors: Yang Liu, Pengyu Liu, Yingyi Wen, Zihan Liang, Songwei Liu, Lekai Song, Jingfang Pei, Xiaoyue Fan, Teng Ma, Gang Wang, Shuo Gao, Kong-Pang Pun, Xiaolong Chen, Guohua Hu

    Abstract: True random numbers play a critical role in secure cryptography. The generation relies on a stable and readily extractable entropy source. Here, from solution-processed structurally metastable 1T' MoTe2, we prove stable output of featureless, stochastic, and yet stable conductance noise at a broad temperature (down to 15 K) with minimal power consumption (down to 0.05 micro-W). Our characterizatio… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  4. arXiv:2404.15598  [pdf, other

    cs.LG cs.CR

    Federated Learning with Only Positive Labels by Exploring Label Correlations

    Authors: Xuming An, Dui Wang, Li Shen, Yong Luo, Han Hu, Bo Du, Yonggang Wen, Dacheng Tao

    Abstract: Federated learning aims to collaboratively learn a model by using the data from multiple users under privacy constraints. In this paper, we study the multi-label classification problem under the federated learning setting, where trivial solution and extremely poor performance may be obtained, especially when only positive data w.r.t. a single class label are provided for each client. This issue ca… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: To be published in IEEE Transactions on Neural Networks and Learning Systems

  5. arXiv:2404.11869  [pdf, other

    cs.LG cs.SI

    Multi-view Graph Structural Representation Learning via Graph Coarsening

    Authors: Xiaorui Qi, Qijie Bai, Yanlong Wen, Haiwei Zhang, Xiaojie Yuan

    Abstract: Graph Transformers (GTs) have made remarkable achievements in graph-level tasks. However, most existing works regard graph structures as a form of guidance or bias for enhancing node representations, which focuses on node-central perspectives and lacks explicit representations of edges and structures. One natural question is, can we treat graph structures node-like as a whole to learn high-level f… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

  6. arXiv:2404.09445  [pdf, other

    cs.LG cs.AI cs.CV

    Exploring Text-to-Motion Generation with Human Preference

    Authors: Jenny Sheng, Matthieu Lin, Andrew Zhao, Kevin Pruvost, Yu-Hui Wen, Yangguang Li, Gao Huang, Yong-Jin Liu

    Abstract: This paper presents an exploration of preference learning in text-to-motion generation. We find that current improvements in text-to-motion generation still rely on datasets requiring expert labelers with motion capture systems. Instead, learning from human preference data does not require motion capture systems; a labeler with no expertise simply compares two generated motions. This is particular… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024 HuMoGen Workshop

  7. arXiv:2404.01231  [pdf, other

    cs.CR cs.LG

    Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models

    Authors: Yuxin Wen, Leo Marchyok, Sanghyun Hong, Jonas Geiping, Tom Goldstein, Nicholas Carlini

    Abstract: It is commonplace to produce application-specific models by fine-tuning large pre-trained models using a small bespoke dataset. The widespread availability of foundation model checkpoints on the web poses considerable risks, including the vulnerability to backdoor attacks. In this paper, we unveil a new vulnerability: the privacy backdoor attack. This black-box privacy attack aims to amplify the p… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

  8. arXiv:2404.00368  [pdf, other

    cs.CV

    Towards Variable and Coordinated Holistic Co-Speech Motion Generation

    Authors: Yifei Liu, Qiong Cao, Yandong Wen, Huaiguang Jiang, Changxing Ding

    Abstract: This paper addresses the problem of generating lifelike holistic co-speech motions for 3D avatars, focusing on two key aspects: variability and coordination. Variability allows the avatar to exhibit a wide range of motions even with similar speech content, while coordination ensures a harmonious alignment among facial expressions, hand gestures, and body poses. We aim to achieve both with ProbTalk… ▽ More

    Submitted 15 April, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: CVPR 2024

  9. arXiv:2403.19866  [pdf, other

    cs.CV cs.AI

    Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization

    Authors: Yuhang Li, Xin Dong, Chen Chen, Jingtao Li, Yuxin Wen, Michael Spranger, Lingjuan Lyu

    Abstract: Synthetic image data generation represents a promising avenue for training deep learning models, particularly in the realm of transfer learning, where obtaining real images within a specific domain can be prohibitively expensive due to privacy and intellectual property considerations. This work delves into the generation and utilization of synthetic images derived from text-to-image generative mod… ▽ More

    Submitted 2 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: ICLR24 Score 6865 https://openreview.net/forum?id=CjPt1AC6w0

  10. arXiv:2403.19438  [pdf, other

    cs.CV cs.RO

    SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control

    Authors: Binyuan Huang, Yuqing Wen, Yucheng Zhao, Yaosi Hu, Yingfei Liu, Fan Jia, Weixin Mao, Tiancai Wang, Chi Zhang, Chang Wen Chen, Zhenzhong Chen, Xiangyu Zhang

    Abstract: Autonomous driving progress relies on large-scale annotated datasets. In this work, we explore the potential of generative models to produce vast quantities of freely-labeled data for autonomous driving applications and present SubjectDrive, the first model proven to scale generative data production in a way that could continuously improve autonomous driving applications. We investigate the impact… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Project page: https://subjectdrive.github.io/

  11. arXiv:2403.18209  [pdf, other

    cs.LG cs.AI cs.RO

    Long and Short-Term Constraints Driven Safe Reinforcement Learning for Autonomous Driving

    Authors: Xuemin Hu, Pan Chen, Yijun Wen, Bo Tang, Long Chen

    Abstract: Reinforcement learning (RL) has been widely used in decision-making tasks, but it cannot guarantee the agent's safety in the training process due to the requirements of interaction with the environment, which seriously limits its industrial applications such as autonomous driving. Safe RL methods are developed to handle this issue by constraining the expected safety violation costs as a training o… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

  12. arXiv:2403.17676  [pdf

    physics.app-ph cs.ET

    Analysis on reservoir activation with the nonlinearity harnessed from solution-processed MoS2 devices

    Authors: Songwei Liu, Yang Liu, Yingyi Wen, Jingfang Pei, Pengyu Liu, Lekai Song, Xiaoyue Fan, Wenchen Yang, Danmei Pan, Teng Ma, Yue Lin, Gang Wang, Guohua Hu

    Abstract: Reservoir computing is a recurrent neural network that has been applied across various domains in machine learning. The implementation of reservoir computing, however, often demands heavy computations for activating the reservoir. Configuring physical reservoir networks and harnessing the nonlinearity from the underlying devices for activation is an emergent solution to address the computational c… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

  13. arXiv:2403.14409  [pdf, other

    cs.CL cs.AI

    Locating and Mitigating Gender Bias in Large Language Models

    Authors: Yuchen Cai, Ding Cao, Rongxi Guo, Yaqin Wen, Guiquan Liu, Enhong Chen

    Abstract: Large language models(LLM) are pre-trained on extensive corpora to learn facts and human cognition which contain human preferences. However, this process can inadvertently lead to these models acquiring biases and stereotypes prevalent in society. Prior research has typically tackled the issue of bias through a one-dimensional perspective, concentrating either on locating or mitigating it. This li… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 23 pages, 5 figures

  14. arXiv:2403.14381  [pdf, other

    cs.CL cs.AI

    Editing Knowledge Representation of Language Lodel via Rephrased Prefix Prompts

    Authors: Yuchen Cai, Ding Cao, Rongxi Guo, Yaqin Wen, Guiquan Liu, Enhong Chen

    Abstract: Neural language models (LMs) have been extensively trained on vast corpora to store factual knowledge about various aspects of the world described in texts. Current technologies typically employ knowledge editing methods or specific prompts to modify LM outputs. However, existing knowledge editing methods are costly and inefficient, struggling to produce appropriate text. Additionally, prompt engi… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 19pages,3figures

  15. arXiv:2403.13863  [pdf, other

    cs.LG cs.AI cs.DB

    DiffImpute: Tabular Data Imputation With Denoising Diffusion Probabilistic Model

    Authors: Yizhu Wen, Kai Yi, Jing Ke, Yiqing Shen

    Abstract: Tabular data plays a crucial role in various domains but often suffers from missing values, thereby curtailing its potential utility. Traditional imputation techniques frequently yield suboptimal results and impose substantial computational burdens, leading to inaccuracies in subsequent modeling tasks. To address these challenges, we propose DiffImpute, a novel Denoising Diffusion Probabilistic Mo… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: 26 pages, 6 figures

  16. arXiv:2403.10127  [pdf, other

    cs.CV

    TransLandSeg: A Transfer Learning Approach for Landslide Semantic Segmentation Based on Vision Foundation Model

    Authors: Changhong Hou, Junchuan Yu, Daqing Ge, Liu Yang, Laidian Xi, Yunxuan Pang, Yi Wen

    Abstract: Landslides are one of the most destructive natural disasters in the world, posing a serious threat to human life and safety. The development of foundation models has provided a new research paradigm for large-scale landslide detection. The Segment Anything Model (SAM) has garnered widespread attention in the field of image segmentation. However, our experiment found that SAM performed poorly in th… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

  17. arXiv:2403.07648  [pdf, other

    cs.DC cs.LG

    Characterization of Large Language Model Development in the Datacenter

    Authors: Qinghao Hu, Zhisheng Ye, Zerui Wang, Guoteng Wang, Meng Zhang, Qiaoling Chen, Peng Sun, Dahua Lin, Xiaolin Wang, Yingwei Luo, Yonggang Wen, Tianwei Zhang

    Abstract: Large Language Models (LLMs) have presented impressive performance across several transformative tasks. However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs, often riddled with numerous challenges such as frequent hardware failures, intricate parallelization strategies, and imbalanced resource utilization. In this paper, we present an in-depth characteriz… ▽ More

    Submitted 3 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  18. arXiv:2403.06221  [pdf, other

    cs.AI cs.CL cs.IR

    TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision

    Authors: Ruiwen Zhou, Yingxuan Yang, Muning Wen, Ying Wen, Wenhao Wang, Chunling Xi, Guoqiang Xu, Yong Yu, Weinan Zhang

    Abstract: Numerous large language model (LLM) agents have been built for different tasks like web navigation and online shopping due to LLM's wide knowledge and text-understanding ability. Among these works, many of them utilize in-context examples to achieve generalization without the need for fine-tuning, while few of them have considered the problem of how to select and effectively utilize these examples… ▽ More

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: Codes available at: https://github.com/skyriver-2000/TRAD-Official

  19. arXiv:2403.02576  [pdf, other

    cs.DL cs.LG cs.SI

    AceMap: Knowledge Discovery through Academic Graph

    Authors: Xinbing Wang, Luoyi Fu, Xiaoying Gan, Ying Wen, Guanjie Zheng, Jiaxin Ding, Liyao Xiang, Nanyang Ye, Meng Jin, Shiyu Liang, Bin Lu, Haiwen Wang, Yi Xu, Cheng Deng, Shao Zhang, Huquan Kang, Xingli Wang, Qi Li, Zhixin Guo, Jiexing Qi, Pan Liu, Yuyang Ren, Lyuwen Wu, Jungang Yang, Jianping Zhou , et al. (1 additional authors not shown)

    Abstract: The exponential growth of scientific literature requires effective management and extraction of valuable insights. While existing scientific search engines excel at delivering search results based on relational databases, they often neglect the analysis of collaborations between scientific entities and the evolution of ideas, as well as the in-depth analysis of content within scientific publicatio… ▽ More

    Submitted 14 April, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Technical Report for AceMap (https://www.acemap.info)

  20. arXiv:2403.00841  [pdf, other

    cs.MA cs.AI cs.GT cs.LG

    Offline Fictitious Self-Play for Competitive Games

    Authors: Jingxiao Chen, Weiji Xie, Weinan Zhang, Yong yu, Ying Wen

    Abstract: Offline Reinforcement Learning (RL) has received significant interest due to its ability to improve policies in previously collected datasets without online interactions. Despite its success in the single-agent setting, offline multi-agent RL remains a challenge, especially in competitive games. Firstly, unaware of the game structure, it is impossible to interact with the opponents and conduct a m… ▽ More

    Submitted 29 February, 2024; originally announced March 2024.

  21. arXiv:2403.00255  [pdf, other

    cs.GT cs.MA

    Leveraging Team Correlation for Approximating Equilibrium in Two-Team Zero-Sum Games

    Authors: Naming Liu, Mingzhi Wang, Youzhi Zhang, Yaodong Yang, Bo An, Ying Wen

    Abstract: Two-team zero-sum games are one of the most important paradigms in game theory. In this paper, we focus on finding an unexploitable equilibrium in large team games. An unexploitable equilibrium is a worst-case policy, where members in the opponent team cannot increase their team reward by taking any policy, e.g., cooperatively changing to other joint policies. As an optimal unexploitable equilibri… ▽ More

    Submitted 29 February, 2024; originally announced March 2024.

  22. arXiv:2403.00144  [pdf, other

    cs.CL cs.AI cs.LG

    EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation

    Authors: Yuqiao Wen, Behzad Shayegh, Chenyang Huang, Yanshuai Cao, Lili Mou

    Abstract: The ability of zero-shot translation emerges when we train a multilingual model with certain translation directions; the model can then directly translate in unseen directions. Alternatively, zero-shot translation can be accomplished by pivoting through a third language (e.g., English). In our work, we observe that both direct and pivot translations are noisy and achieve less satisfactory performa… ▽ More

    Submitted 29 February, 2024; originally announced March 2024.

    ACM Class: I.2.7; I.2.6; I.2.m; I.5.1; I.7.m

  23. arXiv:2403.00143  [pdf, other

    cs.CL cs.AI cs.LG

    Ensemble-Based Unsupervised Discontinuous Constituency Parsing by Tree Averaging

    Authors: Behzad Shayegh, Yuqiao Wen, Lili Mou

    Abstract: We address unsupervised discontinuous constituency parsing, where we observe a high variance in the performance of the only previous model. We propose to build an ensemble of different runs of the existing discontinuous parser by averaging the predicted trees, to stabilize and boost performance. To begin with, we provide comprehensive computational complexity analysis (in terms of P and NP-complet… ▽ More

    Submitted 29 February, 2024; originally announced March 2024.

  24. arXiv:2402.18591  [pdf, ps, other

    cs.LG cs.GT math.ST

    Stochastic contextual bandits with graph feedback: from independence number to MAS number

    Authors: Yuxiao Wen, Yanjun Han, Zhengyuan Zhou

    Abstract: We consider contextual bandits with graph feedback, a class of interactive learning problems with richer structures than vanilla contextual bandits, where taking an action reveals the rewards for all neighboring actions in the feedback graph under all contexts. Unlike the multi-armed bandits setting where a growing literature has painted a near-complete understanding of graph feedback, much remain… ▽ More

    Submitted 12 February, 2024; originally announced February 2024.

  25. arXiv:2402.17453  [pdf, other

    cs.LG

    DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning

    Authors: Siyuan Guo, Cheng Deng, Ying Wen, Hechang Chen, Yi Chang, Jun Wang

    Abstract: In this work, we investigate the potential of large language models (LLMs) based agents to automate data science tasks, with the goal of comprehending task requirements, then building and training the best-fit machine learning models. Despite their widespread success, existing LLM agents are hindered by generating unreasonable experiment plans within this scenario. To this end, we present DS-Agent… ▽ More

    Submitted 6 April, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  26. arXiv:2402.15972  [pdf, other

    cs.LG cs.NI

    Structural Knowledge-Driven Meta-Learning for Task Offloading in Vehicular Networks with Integrated Communications, Sensing and Computing

    Authors: Ruijin Sun, Yao Wen, Nan Cheng, Wei Wan, Rong Chai, Yilong Hui

    Abstract: Task offloading is a potential solution to satisfy the strict requirements of computation-intensive and latency-sensitive vehicular applications due to the limited onboard computing resources. However, the overwhelming upload traffic may lead to unacceptable uploading time. To tackle this issue, for tasks taking environmental data as input, the data perceived by roadside units (RSU) equipped with… ▽ More

    Submitted 24 February, 2024; originally announced February 2024.

  27. arXiv:2402.14020  [pdf, other

    cs.LG cs.CL cs.CR

    Coercing LLMs to do and reveal (almost) anything

    Authors: Jonas Geiping, Alex Stein, Manli Shu, Khalid Saifullah, Yuxin Wen, Tom Goldstein

    Abstract: It has recently been shown that adversarial attacks on large language models (LLMs) can "jailbreak" the model into making harmful statements. In this work, we argue that the spectrum of adversarial attacks on LLMs is much larger than merely jailbreaking. We provide a broad overview of possible attack surfaces and attack goals. Based on a series of concrete examples, we discuss, categorize and syst… ▽ More

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 32 pages. Implementation available at https://github.com/JonasGeiping/carving

  28. arXiv:2402.12416  [pdf, other

    cs.MA cs.AI

    Aligning Individual and Collective Objectives in Multi-Agent Cooperation

    Authors: Yang Li, Wenhao Zhang, Jianhong Wang, Shao Zhang, Yali Du, Ying Wen, Wei Pan

    Abstract: In the field of multi-agent learning, the challenge of mixed-motive cooperation is pronounced, given the inherent contradictions between individual and collective goals. Current research in this domain primarily focuses on incorporating domain knowledge into rewards or introducing additional mechanisms to foster cooperation. However, many of these methods suffer from the drawbacks of manual design… ▽ More

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 15 pages

  29. arXiv:2402.08552  [pdf, other

    cs.LG cs.CV

    Confronting Reward Overoptimization for Diffusion Models: A Perspective of Inductive and Primacy Biases

    Authors: Ziyi Zhang, Sen Zhang, Yibing Zhan, Yong Luo, Yonggang Wen, Dacheng Tao

    Abstract: Bridging the gap between diffusion models and human preferences is crucial for their integration into practical generative workflows. While optimizing downstream reward models has emerged as a promising alignment strategy, concerns arise regarding the risk of excessive optimization with learned reward models, which potentially compromises ground-truth performance. In this work, we confront the rew… ▽ More

    Submitted 13 February, 2024; originally announced February 2024.

  30. arXiv:2402.08073  [pdf, other

    cs.LG cs.PL cs.SE

    Grounding Data Science Code Generation with Input-Output Specifications

    Authors: Yeming Wen, Pengcheng Yin, Kensen Shi, Henryk Michalewski, Swarat Chaudhuri, Alex Polozov

    Abstract: Large language models (LLMs) have recently demonstrated a remarkable ability to generate code from natural language (NL) prompts. However, in the real world, NL is often too ambiguous to capture the true intent behind programming problems, requiring additional input-output (I/O) specifications. Unfortunately, LLMs can have difficulty aligning their outputs with both the NL prompt and the I/O speci… ▽ More

    Submitted 14 March, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  31. arXiv:2402.08010  [pdf, other

    cs.LG cs.AI stat.ML

    Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature Learning

    Authors: Yuxiao Wen, Arthur Jacot

    Abstract: We describe the emergence of a Convolution Bottleneck (CBN) structure in CNNs, where the network uses its first few layers to transform the input representation into a representation that is supported only along a few frequencies and channels, before using the last few layers to map back to the outputs. We define the CBN rank, which describes the number and type of frequencies that are kept inside… ▽ More

    Submitted 12 February, 2024; originally announced February 2024.

  32. arXiv:2402.07792  [pdf, other

    cs.LG cs.DC

    Empowering Federated Learning for Massive Models with NVIDIA FLARE

    Authors: Holger R. Roth, Ziyue Xu, Yuan-Ting Hsieh, Adithya Renduchintala, Isaac Yang, Zhihong Zhang, Yuhong Wen, Sean Yang, Kevin Lu, Kristopher Kersten, Camir Ricketts, Daguang Xu, Chester Chen, Yan Cheng, Andrew Feng

    Abstract: In the ever-evolving landscape of artificial intelligence (AI) and large language models (LLMs), handling and leveraging data effectively has become a critical challenge. Most state-of-the-art machine learning algorithms are data-centric. However, as the lifeblood of model performance, necessary data cannot always be centralized due to various factors such as privacy, regulation, geopolitics, copy… ▽ More

    Submitted 12 February, 2024; originally announced February 2024.

  33. arXiv:2402.07157  [pdf, other

    cs.CL cs.AI cs.LG

    Natural Language Reinforcement Learning

    Authors: Xidong Feng, Ziyu Wan, Mengyue Yang, Ziyan Wang, Girish A. Koushik, Yali Du, Ying Wen, Jun Wang

    Abstract: Reinforcement Learning (RL) has shown remarkable abilities in learning policies for decision-making tasks. However, RL is often hindered by issues such as low sample efficiency, lack of interpretability, and sparse supervision signals. To tackle these limitations, we take inspiration from the human learning process and introduce Natural Language Reinforcement Learning (NLRL), which innovatively co… ▽ More

    Submitted 14 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

    Comments: Work in Progress

  34. arXiv:2402.06700  [pdf, other

    cs.LG cs.AI

    Entropy-Regularized Token-Level Policy Optimization for Large Language Models

    Authors: Muning Wen, Cheng Deng, Jun Wang, Weinan Zhang, Ying Wen

    Abstract: Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. Traditional approaches often depend on meticulously designed prompts, high-quality examples, or additional reward models for in-context learning, supervised fine-tuning, or RLHF. Reinforcement learning (RL) presents a dynamic alternative for LLMs to overcome these dependencies by engaging di… ▽ More

    Submitted 5 March, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

  35. arXiv:2401.12491  [pdf, other

    cs.CL cs.AI

    Assessing and Understanding Creativity in Large Language Models

    Authors: Yunpu Zhao, Rui Zhang, Wenyi Li, Di Huang, Jiaming Guo, Shaohui Peng, Yifan Hao, Yuanbo Wen, Xing Hu, Zidong Du, Qi Guo, Ling Li, Yunji Chen

    Abstract: In the field of natural language processing, the rapid development of large language model (LLM) has attracted more and more attention. LLMs have shown a high level of creativity in various tasks, but the methods for assessing such creativity are inadequate. The assessment of LLM creativity needs to consider differences from humans, requiring multi-dimensional measurement while balancing accuracy… ▽ More

    Submitted 23 January, 2024; originally announced January 2024.

  36. arXiv:2401.10005  [pdf, other

    cs.CV cs.CL

    Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation

    Authors: Kohei Uehara, Nabarun Goswami, Hanqin Wang, Toshiaki Baba, Kohtaro Tanaka, Tomohiro Hashimoto, Kai Wang, Rei Ito, Takagi Naoya, Ryo Umagami, Yingyi Wen, Tanachai Anakewat, Tatsuya Harada

    Abstract: The increasing demand for intelligent systems capable of interpreting and reasoning about visual content requires the development of Large Multi-Modal Models (LMMs) that are not only accurate but also have explicit reasoning capabilities. This paper presents a novel approach to imbue an LMM with the ability to conduct explicit reasoning based on visual content and textual instructions. We introduc… ▽ More

    Submitted 18 January, 2024; originally announced January 2024.

  37. arXiv:2401.09149  [pdf, other

    cs.DC

    InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding

    Authors: Qiaoling Chen, Diandian Gu, Guoteng Wang, Xun Chen, YingTong Xiong, Ting Huang, Qinghao Hu, Xin Jin, Yonggang Wen, Tianwei Zhang, Peng Sun

    Abstract: Large language models (LLMs) with long sequences begin to power more and more fundamentally new applications we use every day. Existing methods for long-sequence LLM training are neither efficient nor compatible with commonly-used training algorithms such as FlashAttention. We design InternEvo to address these issues. InternEvo decouples all of the sharding dimensions into a new hierarchical space… ▽ More

    Submitted 22 January, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

  38. arXiv:2401.08939  [pdf, other

    cs.RO

    Enhancing Campus Mobility: Achievements and Challenges of Autonomous Shuttle "Snow Lion''

    Authors: Yingbing Chen, Jie Cheng, Sheng Wang, Hongji Liu, Xiaodong Mei, Xiaoyang Yan, Mingkai Tang, Ge Sun, Ya Wen, Junwei Cai, Xupeng Xie, Lu Gan, Mandan Chao, Ren Xin, Ming Liu, Jianhao Jiao, Kangcheng Liu, Lujia Wang

    Abstract: The rapid evolution of autonomous vehicles (AVs) has significantly influenced global transportation systems. In this context, we present ``Snow Lion'', an autonomous shuttle meticulously designed to revolutionize on-campus transportation, offering a safer and more efficient mobility solution for students, faculty, and visitors. The primary objective of this research is to enhance campus mobility b… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 9 pages, 9 figures

  39. arXiv:2401.08573  [pdf, other

    cs.CV cs.CR cs.LG

    Benchmarking the Robustness of Image Watermarks

    Authors: Bang An, Mucong Ding, Tahseen Rabbani, Aakriti Agrawal, Yuancheng Xu, Chenghao Deng, Sicheng Zhu, Abdirisak Mohamed, Yuxin Wen, Tom Goldstein, Furong Huang

    Abstract: This paper investigates the weaknesses of image watermarking techniques. We present WAVES (Watermark Analysis Via Enhanced Stress-testing), a novel benchmark for assessing watermark robustness, overcoming the limitations of current evaluation methods.WAVES integrates detection and identification tasks, and establishes a standardized evaluation protocol comprised of a diverse range of stress tests.… ▽ More

    Submitted 22 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

  40. arXiv:2401.06781  [pdf, other

    cs.AI cs.CL

    PokerGPT: An End-to-End Lightweight Solver for Multi-Player Texas Hold'em via Large Language Model

    Authors: Chenghao Huang, Yanbo Cao, Yinlong Wen, Tao Zhou, Yanru Zhang

    Abstract: Poker, also known as Texas Hold'em, has always been a typical research target within imperfect information games (IIGs). IIGs have long served as a measure of artificial intelligence (AI) development. Representative prior works, such as DeepStack and Libratus heavily rely on counterfactual regret minimization (CFR) to tackle heads-up no-limit Poker. However, it is challenging for subsequent resear… ▽ More

    Submitted 4 January, 2024; originally announced January 2024.

  41. arXiv:2401.02138  [pdf, other

    cs.CV

    Explore Human Parsing Modality for Action Recognition

    Authors: Jinfu Liu, Runwei Ding, Yuhang Wen, Nan Dai, Fanyang Meng, Shen Zhao, Mengyuan Liu

    Abstract: Multimodal-based action recognition methods have achieved high success using pose and RGB modality. However, skeletons sequences lack appearance depiction and RGB images suffer irrelevant noise due to modality limitations. To address this, we introduce human parsing feature map as a novel modality, since it can selectively retain effective semantic features of the body parts, while filtering out m… ▽ More

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: text overlap with arXiv:2307.07977

  42. arXiv:2312.17606  [pdf, other

    cs.RO cs.AI cs.LG

    Adaptive Control Strategy for Quadruped Robots in Actuator Degradation Scenarios

    Authors: Xinyuan Wu, Wentao Dong, Hang Lai, Yong Yu, Ying Wen

    Abstract: Quadruped robots have strong adaptability to extreme environments but may also experience faults. Once these faults occur, robots must be repaired before returning to the task, reducing their practical feasibility. One prevalent concern among these faults is actuator degradation, stemming from factors like device aging or unexpected operational events. Traditionally, addressing this problem has re… ▽ More

    Submitted 29 December, 2023; originally announced December 2023.

    Comments: 13 pages, 14 figures, in proceeding of DAI'23

  43. arXiv:2312.16046  [pdf, other

    cs.LG cs.AI physics.ao-ph

    AdaNAS: Adaptively Post-processing with Self-supervised Neural Architecture Search for Ensemble Rainfall Forecasts

    Authors: Yingpeng Wen, Weijiang Yu, Fudan Zheng, Dan Huang, Nong Xiao

    Abstract: Previous post-processing studies on rainfall forecasts using numerical weather prediction (NWP) mainly focus on statistics-based aspects, while learning-based aspects are rarely investigated. Although some manually-designed models are proposed to raise accuracy, they are customized networks, which need to be repeatedly tried and verified, at a huge cost in time and labor. Therefore, a self-supervi… ▽ More

    Submitted 4 February, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

  44. arXiv:2312.15880  [pdf, other

    cs.CL

    KnowledgeNavigator: Leveraging Large Language Models for Enhanced Reasoning over Knowledge Graph

    Authors: Tiezheng Guo, Qingwen Yang, Chen Wang, Yanyi Liu, Pan Li, Jiawei Tang, Dapeng Li, Yingyou Wen

    Abstract: Large language model (LLM) has achieved outstanding performance on various downstream tasks with its powerful natural language understanding and zero-shot capability, but LLM still suffers from knowledge limitation. Especially in scenarios that require long logical chains or complex reasoning, the hallucination and knowledge limitation of LLM limit its performance in question answering (QA). In th… ▽ More

    Submitted 19 January, 2024; v1 submitted 25 December, 2023; originally announced December 2023.

  45. arXiv:2312.15139  [pdf, other

    cs.CV

    Automatic Tooth Arrangement with Joint Features of Point and Mesh Representations via Diffusion Probabilistic Models

    Authors: Changsong Lei, Mengfei Xia, Shaofeng Wang, Yaqian Liang, Ran Yi, Yuhui Wen, Yongjin Liu

    Abstract: Tooth arrangement is a crucial step in orthodontics treatment, in which aligning teeth could improve overall well-being, enhance facial aesthetics, and boost self-confidence. To improve the efficiency of tooth arrangement and minimize errors associated with unreasonable designs by inexperienced practitioners, some deep learning-based tooth arrangement methods have been proposed. Currently, most ex… ▽ More

    Submitted 22 December, 2023; originally announced December 2023.

  46. arXiv:2312.13716  [pdf, other

    cs.LG cs.AI

    Critic-Guided Decision Transformer for Offline Reinforcement Learning

    Authors: Yuanfu Wang, Chao Yang, Ying Wen, Yu Liu, Yu Qiao

    Abstract: Recent advancements in offline reinforcement learning (RL) have underscored the capabilities of Return-Conditioned Supervised Learning (RCSL), a paradigm that learns the action distribution based on target returns for each state in a supervised manner. However, prevailing RCSL methods largely focus on deterministic trajectory modeling, disregarding stochastic state transitions and the diversity of… ▽ More

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted at AAAI 2024

  47. arXiv:2312.13654  [pdf, other

    cs.IT eess.SP math.OC

    Free Space Optical Integrated Sensing and Communication Based on DCO-OFDM: Performance Metrics and Resource Allocation

    Authors: Yunfeng Wen, Fang Yang, Jian Song, Zhu Han

    Abstract: As one of the six usage scenarios of the sixth generation (6G) mobile communication system, integrated sensing and communication (ISAC) has garnered considerable attention, and numerous studies have been conducted on radio-frequency (RF)-ISAC. Benefitting from the communication and sensing capabilities of an optical system, free space optical (FSO)-ISAC becomes a potential complement to RF-ISAC. I… ▽ More

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 13 pages, 8 figures

  48. arXiv:2312.13640  [pdf, other

    eess.SP cs.IT

    Optical Integrated Sensing and Communication: Architectures, Potentials and Challenges

    Authors: Yunfeng Wen, Fang Yang, Jian Song, Zhu Han

    Abstract: Integrated sensing and communication (ISAC) is viewed as a crucial component of future mobile networks and has gained much interest in both academia and industry. Similar to the emergence of radio-frequency (RF) ISAC, the integration of free space optical communication and optical sensing yields optical ISAC (O-ISAC), which is regarded as a powerful complement to its RF counterpart. In this articl… ▽ More

    Submitted 10 March, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 7 pages, 5 figures

  49. arXiv:2312.11774  [pdf, other

    cs.CV

    Text-Image Conditioned Diffusion for Consistent Text-to-3D Generation

    Authors: Yuze He, Yushi Bai, Matthieu Lin, Jenny Sheng, Yubin Hu, Qi Wang, Yu-Hui Wen, Yong-Jin Liu

    Abstract: By lifting the pre-trained 2D diffusion models into Neural Radiance Fields (NeRFs), text-to-3D generation methods have made great progress. Many state-of-the-art approaches usually apply score distillation sampling (SDS) to optimize the NeRF representations, which supervises the NeRF optimization with pre-trained text-conditioned 2D diffusion models such as Imagen. However, the supervision signal… ▽ More

    Submitted 18 December, 2023; originally announced December 2023.

  50. arXiv:2312.09245  [pdf, other

    cs.CV

    DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

    Authors: Wenhai Wang, Jiangwei Xie, ChuanYang Hu, Haoming Zou, Jianan Fan, Wenwen Tong, Yang Wen, Silei Wu, Hanming Deng, Zhiqi Li, Hao Tian, Lewei Lu, Xizhou Zhu, Xiaogang Wang, Yu Qiao, Jifeng Dai

    Abstract: Large language models (LLMs) have opened up new possibilities for intelligent agents, endowing them with human-like thinking and cognitive abilities. In this work, we delve into the potential of large language models (LLMs) in autonomous driving (AD). We introduce DriveMLM, an LLM-based AD framework that can perform close-loop autonomous driving in realistic simulators. To this end, (1) we bridge… ▽ More

    Submitted 25 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Technical Report