Showing 1–50 of 226 results for author: Sun, P

Searching in archive cs.

  1. arXiv:2405.02811  [pdf, other]

    cs.CV

    PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection

    Authors: Zhaoqi Leng, Pei Sun, Tong He, Dragomir Anguelov, Mingxing Tan

    Abstract: 3D object detectors for point clouds often rely on a pooling-based PointNet to encode sparse points into grid-like voxels or pillars. In this paper, we identify that the common PointNet design introduces an information bottleneck that limits 3D object detection accuracy and scalability. To address this limitation, we propose PVTransformer: a transformer-based point-to-voxel architecture for 3D det… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  2. arXiv:2405.02357  [pdf, other]

    cs.LG

    Large Language Models for Mobility in Transportation Systems: A Survey on Forecasting Tasks

    Authors: Zijian Zhang, Yujie Sun, Zepu Wang, Yuqi Nie, Xiaobo Ma, Peng Sun, Ruolin Li

    Abstract: Mobility analysis is a crucial element in the research area of transportation systems. Forecasting traffic information offers a viable solution to address the conflict between increasing transportation demands and the limitations of transportation infrastructure. Predicting human travel is significant in aiding various transportation and urban management tasks, such as taxi dispatch and urban plan… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 9 pages

  3. arXiv:2405.02320  [pdf, other]

    cs.IT cs.AI

    A SER-based Device Selection Mechanism in Multi-bits Quantization Federated Learning

    Authors: Pengcheng Sun, Erwu Liu, Rui Wang

    Abstract: The quality of wireless communication directly affects the performance of federated learning (FL), so this paper analyzes the influence of wireless communication on FL through the symbol error rate (SER). In an FL system, non-orthogonal multiple access (NOMA) can be used as the basic communication framework to reduce the communication congestion and interference caused by multiple users, which takes a… ▽ More

    Submitted 20 April, 2024; originally announced May 2024.

  4. arXiv:2404.16831  [pdf, other]

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su… ▽ More

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  5. arXiv:2404.14700  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    FlashSpeech: Efficient Zero-Shot Speech Synthesis

    Authors: Zhen Ye, Zeqian Ju, Haohe Liu, Xu Tan, Jianyi Chen, Yiwen Lu, Peiwen Sun, Jiahao Pan, Weizhen Bian, Shulin He, Qifeng Liu, Yike Guo, Wei Xue

    Abstract: Recent progress in large-scale zero-shot speech synthesis has been significantly advanced by language models and diffusion models. However, the generation process of both methods is slow and computationally intensive. Efficient speech synthesis using a lower computing budget to achieve quality on par with previous work remains a significant challenge. In this paper, we present FlashSpeech, a large… ▽ More

    Submitted 24 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Efficient zero-shot speech synthesis

  6. arXiv:2404.13940  [pdf, other]

    cs.CL

    A User-Centric Benchmark for Evaluating Large Language Models

    Authors: Jiayin Wang, Fengran Mo, Weizhi Ma, Peijie Sun, Min Zhang, Jian-Yun Nie

    Abstract: Large Language Models (LLMs) are essential tools to collaborate with users on different tasks. Evaluating their performance to serve users' needs in real-world scenarios is important. While many benchmarks have been created, they mainly focus on specific predefined model abilities. Few have covered the intended utilization of LLMs by real users. To address this oversight, we propose benchmarking L… ▽ More

    Submitted 22 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  7. arXiv:2404.11903  [pdf, other]

    cs.CV

    Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition

    Authors: Xunsong Li, Pengzhan Sun, Yangcen Liu, Lixin Duan, Wen Li

    Abstract: The interactions between humans and objects are important for recognizing object-centric actions. Existing methods usually adopt a two-stage pipeline, where object proposals are first detected using a pretrained detector and are then fed to an action recognition model for extracting video features and learning the object relations for action recognition. However, since the action prior is unknown… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 12 pages, 5 figures, submitted to IEEE Transactions on Multimedia

  8. arXiv:2404.11051  [pdf]

    cs.CV

    WPS-Dataset: A benchmark for wood plate segmentation in bark removal processing

    Authors: Rijun Wang, Guanghao Zhang, Fulong Liang, Bo Wang, Xiangwei Mou, Yesheng Chen, Peng Sun, Canjin Wang

    Abstract: Using deep learning methods is a promising approach to improving bark removal efficiency and enhancing the quality of wood products. However, the lack of publicly available datasets for wood plate segmentation in bark removal processing poses challenges for researchers in this field. To address this issue, a benchmark for wood plate segmentation in bark removal processing named WPS-dataset is prop… ▽ More

    Submitted 25 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Report number: b06d7e0b-306f-476a-a72d-59a8793ac232 | v.1.2

  9. arXiv:2404.09526  [pdf, other]

    cs.DC cs.LG

    LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism

    Authors: Bingyang Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, Xin Jin

    Abstract: The context window of large language models (LLMs) is rapidly increasing, leading to a huge variance in resource usage between different requests as well as between different phases of the same request. Restricted by static parallelism strategies, existing LLM serving systems cannot efficiently utilize the underlying resources to serve variable-length requests in different phases. To address this… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  10. Collaborative-Enhanced Prediction of Spending on Newly Downloaded Mobile Games under Consumption Uncertainty

    Authors: Peijie Sun, Yifan Wang, Min Zhang, Chuhan Wu, Yan Fang, Hong Zhu, Yuan Fang, Meng Wang

    Abstract: With the surge in mobile gaming, accurately predicting user spending on newly downloaded games has become paramount for maximizing revenue. However, the inherently unpredictable nature of user behavior poses significant challenges in this endeavor. To address this, we propose a robust model training and evaluation framework aimed at standardizing spending data to mitigate label variance and extrem… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 10 pages, 6 figures, WWW 2024 Industry Track, with three accept, two weak accept scores

  11. arXiv:2404.05403  [pdf, other]

    cs.CR cs.AI

    SoK: Gradient Leakage in Federated Learning

    Authors: Jiacheng Du, Jiahui Hu, Zhibo Wang, Peng Sun, Neil Zhenqiang Gong, Kui Ren

    Abstract: Federated learning (FL) enables collaborative model training among multiple clients without raw data exposure. However, recent studies have shown that clients' private training data can be reconstructed from the gradients they share in FL, known as gradient inversion attacks (GIAs). While GIAs have demonstrated effectiveness under ideal settings and auxiliary assumptions, their actual effic… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

  12. arXiv:2404.01008  [pdf, other]

    cs.IR

    EEG-SVRec: An EEG Dataset with User Multidimensional Affective Engagement Labels in Short Video Recommendation

    Authors: Shaorun Zhang, Zhiyu He, Ziyi Ye, Peijie Sun, Qingyao Ai, Min Zhang, Yiqun Liu

    Abstract: In recent years, short video platforms have gained widespread popularity, making the quality of video recommendations crucial for retaining users. Existing recommendation systems primarily rely on behavioral data, which faces limitations when inferring user preferences due to issues such as data sparsity and noise from accidental interactions or personal habits. To address these challenges and pro… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

  13. arXiv:2404.00774  [pdf, other]

    cs.LG

    SOAR: Improved Indexing for Approximate Nearest Neighbor Search

    Authors: Philip Sun, David Simcha, Dave Dopson, Ruiqi Guo, Sanjiv Kumar

    Abstract: This paper introduces SOAR: Spilling with Orthogonality-Amplified Residuals, a novel data indexing technique for approximate nearest neighbor (ANN) search. SOAR extends upon previous approaches to ANN search, such as spill trees, that utilize multiple redundant representations while partitioning the data to reduce the probability of missing a nearest neighbor during search. Rather than training an… ▽ More

    Submitted 31 March, 2024; originally announced April 2024.

    Journal ref: Advances in Neural Information Processing Systems 36 (2023) 3189-3204

  14. arXiv:2403.20296  [pdf, other]

    cs.IR

    Aiming at the Target: Filter Collaborative Information for Cross-Domain Recommendation

    Authors: Hanyu Li, Weizhi Ma, Peijie Sun, Jiayu Li, Cunxiang Yin, Yancheng He, Guoqiang Xu, Min Zhang, Shaoping Ma

    Abstract: Cross-domain recommender (CDR) systems aim to enhance the performance of the target domain by utilizing data from other related domains. However, irrelevant information from the source domain may instead degrade target domain performance, which is known as the negative transfer problem. There have been some attempts to address this problem, mostly by designing adaptive representations for overlapp… ▽ More

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by SIGIR 2024

  15. arXiv:2403.18348  [pdf, other]

    cs.IR

    Sequential Recommendation with Latent Relations based on Large Language Model

    Authors: Shenghao Yang, Weizhi Ma, Peijie Sun, Qingyao Ai, Yiqun Liu, Mingchen Cai, Min Zhang

    Abstract: Sequential recommender systems predict items that may interest users by modeling their preferences based on historical interactions. Traditional sequential recommendation methods rely on capturing implicit collaborative filtering signals among items. Recent relation-aware sequential recommendation models have achieved promising performance by explicitly incorporating item relations into the modeli… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by SIGIR 2024

  16. arXiv:2403.18325  [pdf, other]

    cs.IR

    Common Sense Enhanced Knowledge-based Recommendation with Large Language Model

    Authors: Shenghao Yang, Weizhi Ma, Peijie Sun, Min Zhang, Qingyao Ai, Yiqun Liu, Mingchen Cai

    Abstract: Knowledge-based recommendation models effectively alleviate the data sparsity issue by leveraging the side information in the knowledge graph, and have achieved considerable performance. Nevertheless, the knowledge graphs used in previous work, namely metadata-based knowledge graphs, are usually constructed based on the attributes of items and co-occurring relations (e.g., also buy), in which the for… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by DASFAA 2024

  17. arXiv:2403.18317  [pdf, other]

    cs.IR

    A Situation-aware Enhancer for Personalized Recommendation

    Authors: Jiayu Li, Peijie Sun, Chumeng Jiang, Weizhi Ma, Qingyao Ai, Min Zhang

    Abstract: When users interact with Recommender Systems (RecSys), current situations, such as time, location, and environment, significantly influence their preferences. Situations serve as the background for interactions, where relationships between users and items evolve with situation changes. However, existing RecSys treat situations, users, and items on the same level. They can only model the relations… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted at the International Conference on Database Systems for Advanced Applications (DASFAA 2024)

  18. arXiv:2403.17297  [pdf, other]

    cs.CL cs.AI

    InternLM2 Technical Report

    Authors: Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, et al. (75 additional authors not shown)

    Abstract: The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context m… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

  19. arXiv:2403.15769  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    FusionINN: Invertible Image Fusion for Brain Tumor Monitoring

    Authors: Nishant Kumar, Ziyan Tao, Jaikirat Singh, Yang Li, Peiwen Sun, Binghui Zhao, Stefan Gumhold

    Abstract: Image fusion typically employs non-invertible neural networks to merge multiple source images into a single fused image. However, for clinical experts, solely relying on fused images may be insufficient for making diagnostic decisions, as the fusion mechanism blends features from source images, thereby making it difficult to interpret the underlying tumor pathology. We introduce FusionINN, a novel… ▽ More

    Submitted 2 April, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

    Comments: Source code available at https://github.com/nish03/FusionINN

  20. arXiv:2403.15489  [pdf]

    eess.SP cs.AI cs.HC cs.LG

    EEG decoding with conditional identification information

    Authors: Pengfei Sun, Jorg De Winne, Paul Devos, Dick Botteldooren

    Abstract: Decoding EEG signals is crucial for unraveling the human brain and advancing brain-computer interfaces. Traditional machine learning algorithms have been hindered by the high noise levels and inherent inter-person variations in EEG signals. Recent advances in deep neural networks (DNNs) have shown promise, owing to their advanced nonlinear modeling capabilities. However, DNNs still face challenges in d… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted by 6th International Conference on Advances in Signal Processing and Artificial Intelligence (ASPAI' 2024)

  21. arXiv:2403.09720  [pdf, other]

    cs.CL cs.AI

    Fine-tuning vs Prompting, Can Language Models Understand Human Values?

    Authors: Pingwei Sun

    Abstract: Accurately handling the underlying support values in sentences is crucial for understanding the speaker's tendencies, yet it poses a challenging task in natural language understanding (NLU). In this article, we explore the potential of fine-tuning and prompt tuning in this downstream task, using the Human Value Detection 2023. Additionally, we attempt to validate whether models can effectively sol… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

  22. arXiv:2403.07648  [pdf, other]

    cs.DC cs.LG

    Characterization of Large Language Model Development in the Datacenter

    Authors: Qinghao Hu, Zhisheng Ye, Zerui Wang, Guoteng Wang, Meng Zhang, Qiaoling Chen, Peng Sun, Dahua Lin, Xiaolin Wang, Yingwei Luo, Yonggang Wen, Tianwei Zhang

    Abstract: Large Language Models (LLMs) have presented impressive performance across several transformative tasks. However, it is non-trivial to efficiently utilize large-scale cluster resources to develop LLMs, often riddled with numerous challenges such as frequent hardware failures, intricate parallelization strategies, and imbalanced resource utilization. In this paper, we present an in-depth characteriz… ▽ More

    Submitted 3 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  23. arXiv:2403.01587  [pdf, other]

    cs.CE

    Monitoring the Seismic Behavior of a Scaled RC Frame with Intermediate Ductility in a Shaking Table Test

    Authors: Mohammad Vasef, Mohammad Sadegh Marefat, Sina Shid-Moosavi, Peng "Patrick" Sun

    Abstract: One of the commonly used seismic force-resisting systems in structures is Reinforced Concrete (RC) Intermediate Moment Frames (IMF). Although using the IMF is not allowed in high seismic hazard zones according to ASCE 7-10, it is permitted in both Iran's 2800 Seismic Standard and New Zealand's Seismic Code. This study investigates the seismic behavior of a reinforced concrete IMF subjected to eart… ▽ More

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: 8th World Conference on Structural Control and Monitoring (8WCSCM), Orlando, FL. (2022)

  24. arXiv:2402.19054  [pdf, other]

    cs.CR cs.AI

    RobWE: Robust Watermark Embedding for Personalized Federated Learning Model Ownership Protection

    Authors: Yang Xu, Yunlin Tan, Cheng Zhang, Kai Chi, Peng Sun, Wenyuan Yang, Ju Ren, Hongbo Jiang, Yaoxue Zhang

    Abstract: Embedding watermarks into models has been widely used to protect model ownership in federated learning (FL). However, existing methods are inadequate for protecting the ownership of personalized models acquired by clients in personalized FL (PFL). This is due to the aggregation of the global model in PFL, resulting in conflicts over clients' private watermarks. Moreover, malicious clients may tamp… ▽ More

    Submitted 29 February, 2024; originally announced February 2024.

  25. arXiv:2402.16117  [pdf, other]

    cs.RO cs.AI cs.CV

    RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis

    Authors: Yao Mu, Junting Chen, Qinglong Zhang, Shoufa Chen, Qiaojun Yu, Chongjian Ge, Runjian Chen, Zhixuan Liang, Mengkang Hu, Chaofan Tao, Peize Sun, Haibao Yu, Chao Yang, Wenqi Shao, Wenhai Wang, Jifeng Dai, Yu Qiao, Mingyu Ding, Ping Luo

    Abstract: Robotic behavior synthesis, the problem of understanding multimodal inputs and generating precise physical control for robots, is an important part of Embodied AI. Despite successes in applying multimodal large language models for high-level understanding, it remains challenging to translate these conceptual understandings into detailed robotic actions while achieving generalization across various… ▽ More

    Submitted 25 February, 2024; originally announced February 2024.

  26. arXiv:2402.14440  [pdf, other]

    cs.IR

    Recommender for Its Purpose: Repeat and Exploration in Food Delivery Recommendations

    Authors: Jiayu Li, Aixin Sun, Weizhi Ma, Peijie Sun, Min Zhang

    Abstract: Recommender systems have been widely used for various scenarios, such as e-commerce, news, and music, providing online contents to help and enrich users' daily life. Different scenarios hold distinct and unique characteristics, calling for domain-specific investigations and corresponding designed recommender systems. Therefore, in this paper, we focus on food delivery recommendations to unveil uni… ▽ More

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 11 pages, 5 figures

  27. Neighborhood-Enhanced Supervised Contrastive Learning for Collaborative Filtering

    Authors: Peijie Sun, Le Wu, Kun Zhang, Xiangzhi Chen, Meng Wang

    Abstract: While effective in recommendation tasks, collaborative filtering (CF) techniques face the challenge of data sparsity. Researchers have begun leveraging contrastive learning to introduce additional self-supervised signals to address this. However, this approach often unintentionally distances the target user/item from their collaborative neighbors, limiting its efficacy. In response, we propose a s… ▽ More

    Submitted 18 February, 2024; originally announced February 2024.

    Journal ref: IEEE TKDE, 2023

  28. arXiv:2402.08172  [pdf, other]

    cs.CE math.NA

    A Projection-Based Time-Segmented Reduced Order Model for Fluid-Structure Interactions

    Authors: Qijia Zhai, Shiquan Zhang, Pengtao Sun, Xiaoping Xie

    Abstract: In this paper, a type of novel projection-based, time-segmented reduced order model (ROM) is proposed for dynamic fluid-structure interaction (FSI) problems based upon the arbitrary Lagrangian-Eulerian (ALE)-finite element method (FEM) in a monolithic frame, where spatially, each variable is separated from others in terms of their attribution (fluid/structure), category (velocity/pressure) and co… ▽ More

    Submitted 14 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  29. arXiv:2402.08078  [pdf, other]

    cs.CL cs.LG

    Large Language Models as Agents in Two-Player Games

    Authors: Yang Liu, Peng Sun, Hang Li

    Abstract: By formally defining the training processes of large language models (LLMs), which usually encompass pre-training, supervised fine-tuning, and reinforcement learning with human feedback, within a single and unified machine learning paradigm, we can glean pivotal insights for advancing LLM technologies. This position paper delineates the parallels between the training methods of LLMs and the stra… ▽ More

    Submitted 12 February, 2024; originally announced February 2024.

  30. arXiv:2402.05808  [pdf, other]

    cs.AI cs.CL cs.LG

    Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

    Authors: Zhiheng Xi, Wenxiang Chen, Boyang Hong, Senjie Jin, Rui Zheng, Wei He, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: In this paper, we propose R$^3$: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models. The core challenge in applying RL to complex reasoning is to identify a sequence of actions that result in positive rewards and provide appropriate supervision for o… ▽ More

    Submitted 17 March, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Preprint. Codes released: https://github.com/WooooDyy/LLM-Reverse-Curriculum-RL

  31. arXiv:2402.02816  [pdf, other]

    cs.IR cs.CY cs.LG

    Intersectional Two-sided Fairness in Recommendation

    Authors: Yifan Wang, Peijie Sun, Weizhi Ma, Min Zhang, Yuan Zhang, Peng Jiang, Shaoping Ma

    Abstract: Fairness of recommender systems (RS) has attracted increasing attention recently. Based on the involved stakeholders, the fairness of RS can be divided into user fairness, item fairness, and two-sided fairness which considers both user and item fairness simultaneously. However, we argue that the intersectional two-sided unfairness may still exist even if the RS is two-sided fair, which is observed… ▽ More

    Submitted 15 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: accepted by WWW2024

  32. arXiv:2401.14611  [pdf, other]

    cs.IT eess.SP

    Hybrid Message Passing-Based Detectors for Uplink Grant-Free NOMA Systems

    Authors: Yi Song, Yiwen Zhu, Kun Chen-Hu, Xinhua Lu, Peng Sun, Zhongyong Wang

    Abstract: This paper studies improving the detector performance which considers the activity state (AS) temporal correlation of the user equipments (UEs) in the time domain under the uplink grant-free non-orthogonal multiple access (GF-NOMA) system. The Bernoulli Gaussian-Markov chain (BG-MC) probability model is used for exploiting both the sparsity and slow change characteristic of the AS of the UE. The G… ▽ More

    Submitted 25 January, 2024; originally announced January 2024.

  33. arXiv:2401.09149  [pdf, other]

    cs.DC

    InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding

    Authors: Qiaoling Chen, Diandian Gu, Guoteng Wang, Xun Chen, YingTong Xiong, Ting Huang, Qinghao Hu, Xin Jin, Yonggang Wen, Tianwei Zhang, Peng Sun

    Abstract: Large language models (LLMs) with long sequences begin to power more and more fundamentally new applications we use every day. Existing methods for long-sequence LLM training are neither efficient nor compatible with commonly-used training algorithms such as FlashAttention. We design InternEvo to address these issues. InternEvo decouples all of the sharding dimensions into a new hierarchical space… ▽ More

    Submitted 22 January, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

  34. arXiv:2401.08967  [pdf, other]

    cs.CL

    ReFT: Reasoning with Reinforced Fine-Tuning

    Authors: Trung Quoc Luong, Xinbo Zhang, Zhanming Jie, Peng Sun, Xiaoran Jin, Hang Li

    Abstract: One way to enhance the reasoning capability of Large Language Models (LLMs) is to conduct Supervised Fine-Tuning (SFT) using Chain-of-Thought (CoT) annotations. This approach does not show sufficiently strong generalization ability, however, because the training only relies on the given CoT data. In math problem-solving, for example, there is usually only one annotated reasoning path for each ques… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 13 pages

  35. arXiv:2401.08329  [pdf, other]

    cs.HC

    Understanding User Experience in Large Language Model Interactions

    Authors: Jiayin Wang, Weizhi Ma, Peijie Sun, Min Zhang, Jian-Yun Nie

    Abstract: In the rapidly evolving landscape of large language models (LLMs), most research has primarily viewed them as independent individuals, focusing on assessing their capabilities through standardized benchmarks and enhancing their general intelligence. This perspective, however, tends to overlook the vital role of LLMs as user-centric services in human-AI collaboration. This gap in research becomes i… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: 15 pages + 3 page references + 2 page Appendix

  36. arXiv:2401.04429  [pdf, other]

    cs.AI cs.MA

    i-Rebalance: Personalized Vehicle Repositioning for Supply Demand Balance

    Authors: Haoyang Chen, Peiyan Sun, Qiyuan Song, Wanyuan Wang, Weiwei Wu, Wencan Zhang, Guanyu Gao, Yan Lyu

    Abstract: Ride-hailing platforms have been facing the challenge of balancing demand and supply. Existing vehicle reposition techniques often treat drivers as homogeneous agents and relocate them deterministically, assuming compliance with the reposition. In this paper, we consider a more realistic and driver-centric scenario where drivers have unique cruising preferences and can decide whether to take the r… ▽ More

    Submitted 2 April, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

  37. arXiv:2312.15889  [pdf, other]

    cs.LG cs.HC cs.NE q-bio.NC

    ANN vs SNN: A case study for Neural Decoding in Implantable Brain-Machine Interfaces

    Authors: Biyan Zhou, Pao-Sheng Vincent Sun, Arindam Basu

    Abstract: While it is important to make implantable brain-machine interfaces (iBMI) wireless to increase patient comfort and safety, the trend of increased channel count in recent neural probes poses a challenge due to the concomitant increase in the data rate. Extracting information from raw data at the source by using edge computing is a promising solution to this problem, with integrated intention decode… ▽ More

    Submitted 26 December, 2023; originally announced December 2023.

  38. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1320 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More

    Submitted 2 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  39. arXiv:2312.07212  [pdf, other]

    cs.MM cs.AI cs.SD eess.AS

    More than Vanilla Fusion: a Simple, Decoupling-free, Attention Module for Multimodal Fusion Based on Signal Theory

    Authors: Peiwen Sun, Yifan Zhang, Zishan Liu, Donghao Chen, Honggang Zhang

    Abstract: The vanilla fusion methods still dominate a large percentage of mainstream audio-visual tasks. However, the effectiveness of vanilla fusion from a theoretical perspective is still worth discussing. Thus, this paper reconsiders the signal fused in the multimodal case from a bionics perspective and proposes a simple, plug-and-play, attention module for vanilla fusion based on fundamental signal theo… ▽ More

    Submitted 12 December, 2023; originally announced December 2023.

  40. arXiv:2312.03526  [pdf, other]

    cs.CV cs.AI cs.LG

    On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm

    Authors: Peng Sun, Bei Shi, Daiwei Yu, Tao Lin

    Abstract: Contemporary machine learning requires training large neural networks on massive datasets and thus faces the challenges of high computational demands. Dataset distillation, as a recent emerging strategy, aims to compress real-world datasets for efficient training. However, this line of research currently struggles with large-scale and high-resolution datasets, hindering its practicality and feasibi… ▽ More

    Submitted 19 March, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: 17 pages, 20 figures

  41. arXiv:2312.02338  [pdf, other]

    cs.CV cs.AI cs.MM

    A Contrastive Compositional Benchmark for Text-to-Image Synthesis: A Study with Unified Text-to-Image Fidelity Metrics

    Authors: Xiangru Zhu, Penglei Sun, Chengyu Wang, Jingping Liu, Zhixu Li, Yanghua Xiao, Jun Huang

    Abstract: Text-to-image (T2I) synthesis has recently achieved significant advancements. However, challenges remain in the model's compositionality, which is the ability to create new combinations from known components. We introduce Winoground-T2I, a benchmark designed to evaluate the compositionality of T2I models. This benchmark includes 11K complex, high-quality contrastive sentence pairs spanning 20 cate… ▽ More

    Submitted 11 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

    Comments: 17 pages, 14 figures, 11 tables

  42. arXiv:2311.12076  [pdf, other]

    cs.CV

    Towards Few-shot Out-of-Distribution Detection

    Authors: Jiuqing Dong, Yongbin Gao, Heng Zhou, Jun Cen, Yifan Yao, Sook Yoon, Park Dong Sun

    Abstract: Out-of-distribution (OOD) detection is critical for ensuring the reliability of open-world intelligent systems. Despite the notable advancements in existing OOD detection methodologies, our study identifies a significant performance drop under the scarcity of training samples. In this context, we introduce a novel few-shot OOD detection benchmark, carefully constructed to address this gap. Our emp…

    Submitted 30 January, 2024; v1 submitted 19 November, 2023; originally announced November 2023.

  43. arXiv:2311.05646  [pdf, other]

    cs.CE physics.optics

    Automatic differentiation accelerated shape optimization approaches to photonic inverse design on rectilinear simulation grids

    Authors: Sean Hooten, Peng Sun, Liron Gantz, Marco Fiorentino, Raymond G. Beausoleil, Thomas Van Vaerenbergh

    Abstract: Shape optimization approaches to inverse design offer low-dimensional, physically-guided parameterizations of structures by representing them as combinations of shape primitives. However, on discretized rectilinear simulation grids, computing the gradient of a user objective via the adjoint variables method requires a sum reduction of the forward/adjoint field solutions and the Jacobian of the sim…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: 29 pages, 15 figures

  44. arXiv:2311.00257  [pdf, other]

    cs.DC

    AMSP: Reducing Communication Overhead of ZeRO for Efficient LLM Training

    Authors: Qiaoling Chen, Qinghao Hu, Guoteng Wang, Yingtong Xiong, Ting Huang, Xun Chen, Yang Gao, Hang Yan, Yonggang Wen, Tianwei Zhang, Peng Sun

    Abstract: Training large language models (LLMs) encounters challenges in GPU memory consumption due to the high memory requirements of model states. The widely used Zero Redundancy Optimizer (ZeRO) addresses this issue through strategic sharding but introduces communication challenges at scale. To tackle this problem, we propose AMSP, a system designed to optimize ZeRO for scalable LLM training. AMSP incorp…

    Submitted 13 March, 2024; v1 submitted 31 October, 2023; originally announced November 2023.

  45. arXiv:2310.18724  [pdf, other]

    cs.LG cs.AI

    WCLD: Curated Large Dataset of Criminal Cases from Wisconsin Circuit Courts

    Authors: Elliott Ash, Naman Goel, Nianyun Li, Claudia Marangon, Peiyao Sun

    Abstract: Machine learning based decision-support tools in criminal justice systems are subjects of intense discussions and academic research. There are important open questions about the utility and fairness of such tools. Academic researchers often rely on a few small datasets that are not sufficient to empirically study various real-world aspects of these questions. In this paper, we contribute WCLD, a c…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: (Forthcoming) Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks

  46. arXiv:2310.17864  [pdf, other]

    eess.AS cs.SD

    TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

    Authors: Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

    Abstract: TorchAudio is an open-source audio and speech processing library built for PyTorch. It aims to accelerate the research and development of audio and speech technologies by providing well-designed, easy-to-use, and performant PyTorch components. Its contributors routinely engage with users to understand their needs and fulfill them by developing impactful features. Here, we survey TorchAudio's devel…

    Submitted 26 October, 2023; originally announced October 2023.

  47. arXiv:2310.14982  [pdf, other]

    cs.NE cs.LG eess.AS eess.SP

    Delayed Memory Unit: Modelling Temporal Dependency Through Delay Gate

    Authors: Pengfei Sun, Jibin Wu, Malu Zhang, Paul Devos, Dick Botteldooren

    Abstract: Recurrent Neural Networks (RNNs) are renowned for their adeptness in modeling temporal dependencies, a trait that has driven their widespread adoption for sequential data processing. Nevertheless, vanilla RNNs are confronted with the well-known issue of gradient vanishing and exploding, posing a significant challenge for learning and establishing long-range dependencies. Additionally, gated RNNs t…

    Submitted 23 October, 2023; originally announced October 2023.

  48. Analysis and Detection against Network Attacks in the Overlapping Phenomenon of Behavior Attribute

    Authors: Jiang Xie, Shuhao Li, Yongzheng Zhanga, Peishuai Sun, Hongbo Xu

    Abstract: The proliferation of network attacks poses a significant threat. Researchers propose datasets for network attacks to support research in related fields. Then, many attack detection methods based on these datasets are proposed. These detection methods, whether two-classification or multi-classification, belong to single-label learning, i.e., only one label is given to each sample. However, we disco…

    Submitted 12 September, 2023; originally announced October 2023.

    Comments: 6 pages, 26 figures

  49. arXiv:2310.01081  [pdf, other]

    cs.CR

    Unmasking Role-Play Attack Strategies in Exploiting Decentralized Finance (DeFi) Systems

    Authors: Weilin Li, Zhun Wang, Chenyu Li, Heying Chen, Taiyu Wong, Pengyu Sun, Yufei Yu, Chao Zhang

    Abstract: The rapid growth and adoption of decentralized finance (DeFi) systems have been accompanied by various threats, notably those emerging from vulnerabilities in their intricate design. In our work, we introduce and define an attack strategy termed as Role-Play Attack, in which the attacker acts as multiple roles concurrently to exploit the DeFi system and cause substantial financial losses. We provi…

    Submitted 2 October, 2023; originally announced October 2023.

  50. arXiv:2309.16870  [pdf, other]

    cs.CV cs.LG cs.RO

    LEF: Late-to-Early Temporal Fusion for LiDAR 3D Object Detection

    Authors: Tong He, Pei Sun, Zhaoqi Leng, Chenxi Liu, Dragomir Anguelov, Mingxing Tan

    Abstract: We propose a late-to-early recurrent feature fusion scheme for 3D object detection using temporal LiDAR point clouds. Our main motivation is fusing object-aware latent embeddings into the early stages of a 3D object detector. This feature fusion strategy enables the model to better capture the shapes and poses for challenging objects, compared with learning from raw points directly. Our method con…

    Submitted 28 September, 2023; originally announced September 2023.