Showing 1–50 of 754 results for author: Sun, Z

Searching in archive cs.
  1. arXiv:2405.05957  [pdf, other]

    cs.CL

    OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning

    Authors: Dan Qiao, Yi Su, Pinzheng Wang, Jing Ye, Wenjing Xie, Yuechi Zhou, Yuyang Ding, Zecheng Tang, Jikai Wang, Yixin Ji, Yue Wang, Pei Guo, Zechen Sun, Zikang Zhang, Juntao Li, Pingfu Chao, Wenliang Chen, Guohong Fu, Guodong Zhou, Qiaoming Zhu, Min Zhang

    Abstract: Large Language Models (LLMs) have played an important role in many fields due to their powerful capabilities. However, their massive number of parameters leads to high deployment requirements and incurs significant inference costs, which impedes their practical applications. Training smaller models is an effective way to address this problem. Therefore, we introduce OpenBA-V2, a 3.4B model derived…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.05514  [pdf, other]

    cs.RO

    HPPS: A Hierarchical Progressive Perception System for Luggage Trolley Detection and Localization at Airports

    Authors: Zhirui Sun, Zhe Zhang, Jieting Zhao, Hanjing Ye, Jiankun Wang

    Abstract: The robotic autonomous luggage trolley collection system employs robots to gather and transport scattered luggage trolleys at airports. However, existing methods for detecting and locating these luggage trolleys often fail when they are not fully visible. To address this, we introduce the Hierarchical Progressive Perception System (HPPS), which enhances the detection and localization of luggage tr…

    Submitted 8 May, 2024; originally announced May 2024.

  3. arXiv:2405.04902  [pdf, other]

    eess.IV cs.CV

    HAGAN: Hybrid Augmented Generative Adversarial Network for Medical Image Synthesis

    Authors: Zhihan Ju, Wanting Zhou, Longteng Kong, Yu Chen, Yi Li, Zhenan Sun, Caifeng Shan

    Abstract: Medical Image Synthesis (MIS) plays an important role in the intelligent medical field, which greatly saves the economic and time costs of medical diagnosis. However, due to the complexity of medical images and similar characteristics of different tissue cells, existing methods face great challenges in meeting their biological consistency. To this end, we propose the Hybrid Augmented Generative Ad…

    Submitted 8 May, 2024; originally announced May 2024.

  4. arXiv:2405.04867  [pdf, other]

    eess.IV cs.CV

    MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results

    Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, et al. (24 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: MIPI@CVPR2024. Website: https://mipi-challenge.org/MIPI2024/

  5. arXiv:2405.03809  [pdf, other]

    cs.AI

    SocialFormer: Social Interaction Modeling with Edge-enhanced Heterogeneous Graph Transformers for Trajectory Prediction

    Authors: Zixu Wang, Zhigang Sun, Juergen Luettin, Lavdim Halilaj

    Abstract: Accurate trajectory prediction is crucial for ensuring safe and efficient autonomous driving. However, most existing methods overlook complex interactions between traffic participants that often govern their future trajectories. In this paper, we propose SocialFormer, an agent interaction-aware trajectory prediction method that leverages the semantic relationship between the target vehicle and sur…

    Submitted 6 May, 2024; originally announced May 2024.

  6. arXiv:2405.01327  [pdf, other]

    cs.LG

    Constrained Reinforcement Learning Under Model Mismatch

    Authors: Zhongchang Sun, Sihong He, Fei Miao, Shaofeng Zou

    Abstract: Existing studies on constrained reinforcement learning (RL) may obtain a well-performing policy in the training environment. However, when deployed in a real environment, it may easily violate constraints that were originally satisfied during training because there might be model mismatch between the training and real environments. To address the above challenge, we formulate the problem as constr…

    Submitted 3 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  7. arXiv:2405.00675  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Self-Play Preference Optimization for Language Model Alignment

    Authors: Yue Wu, Zhiqing Sun, Huizhuo Yuan, Kaixuan Ji, Yiming Yang, Quanquan Gu

    Abstract: Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences. Recent advancements suggest that directly working with preference probabilities can yield a more accurate reflection of human preferences, enabling more flexible and accurate language mo…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 25 pages, 4 figures, 5 tables

  8. arXiv:2405.00627  [pdf, other]

    eess.SY cs.LG

    Koopman-based Deep Learning for Nonlinear System Estimation

    Authors: Zexin Sun, Mingyu Chen, John Baillieul

    Abstract: Nonlinear differential equations are encountered as models of fluid flow, spiking neurons, and many other systems of interest in the real world. Common features of these systems are that their behaviors are difficult to describe exactly and invariably unmodeled dynamics present challenges in making precise predictions. In many cases the models exhibit extremely complicated behavior due to bifurcat…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 11 pages

  9. arXiv:2404.19379  [pdf, other]

    cs.CV cs.RO

    SemanticFormer: Holistic and Semantic Traffic Scene Representation for Trajectory Prediction using Knowledge Graphs

    Authors: Zhigang Sun, Zixu Wang, Lavdim Halilaj, Juergen Luettin

    Abstract: Trajectory prediction in autonomous driving relies on accurate representation of all relevant contexts of the driving scene including traffic participants, road topology, traffic signs as well as their semantic relations to each other. Despite increased attention to this issue, most approaches in trajectory prediction do not consider all of these factors sufficiently. This paper describes a method…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 8 pages, 6 figures, submitted to RA-L

  10. arXiv:2404.18438  [pdf, ps, other]

    cs.IT

    Two classes of constacyclic codes with a square-root-like lower bound

    Authors: Tingfang Chen, Zhonghua Sun, Conghui Xie, Hao Chen, Cunsheng Ding

    Abstract: Constacyclic codes over finite fields are an important class of linear codes as they contain distance-optimal codes and linear codes with best known parameters. They are interesting in theory and practice, as they have the constacyclic structure. In this paper, an infinite class of $q$-ary negacyclic codes of length $(q^m-1)/2$ and an infinite class of $q$-ary constacyclic codes of length…

    Submitted 29 April, 2024; originally announced April 2024.

  11. arXiv:2404.17839  [pdf, other]

    cs.CR cs.SE

    Improving Smart Contract Security with Contrastive Learning-based Vulnerability Detection

    Authors: Yizhou Chen, Zeyu Sun, Zhihao Gong, Dan Hao

    Abstract: Currently, smart contract vulnerabilities (SCVs) have emerged as a major factor threatening the transaction security of blockchain. Existing state-of-the-art methods rely on deep learning to mitigate this threat. They treat each input contract as an independent entity and feed it into a deep learning model to learn vulnerability patterns by fitting vulnerability labels. It is a pity that they disr…

    Submitted 27 April, 2024; originally announced April 2024.

    Journal ref: 2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE '24)

  12. arXiv:2404.16829  [pdf, other]

    cs.CV cs.AI cs.CL

    Make-it-Real: Unleashing Large Multimodal Model's Ability for Painting 3D Objects with Realistic Materials

    Authors: Ye Fang, Zeyi Sun, Tong Wu, Jiaqi Wang, Ziwei Liu, Gordon Wetzstein, Dahua Lin

    Abstract: Physically realistic materials are pivotal in augmenting the realism of 3D assets across various applications and lighting conditions. However, existing 3D assets and generative models often lack authentic material properties. Manual assignment of materials using graphic software is a tedious and time-consuming task. In this paper, we exploit advancements in Multimodal Large Language Models (MLLMs…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Project Page: https://sunzey.github.io/Make-it-Real/

  13. arXiv:2404.16473  [pdf, other]

    cs.HC

    Impact of spatial auditory navigation on user experience during augmented outdoor navigation tasks

    Authors: Jan-Niklas Voigt-Antons, Zhirou Sun, Maurizio Vergari, Navid Ashrafi, Francesco Vona, Tanja Kojic

    Abstract: The auditory sense of humans is important when it comes to navigation. The importance is especially high in cases when an object of interest is visually partly or fully covered. Interactions with users of technology are mainly focused on the visual domain of navigation tasks. This paper presents the results of a literature review and user study exploring the impact of spatial auditory navigation o…

    Submitted 25 April, 2024; originally announced April 2024.

  14. arXiv:2404.16333  [pdf, other]

    cs.SE cs.AI cs.PL

    AI Coders Are Among Us: Rethinking Programming Language Grammar Towards Efficient Code Generation

    Authors: Zhensu Sun, Xiaoning Du, Zhou Yang, Li Li, David Lo

    Abstract: Besides humans and machines, Artificial Intelligence (AI) models have emerged to be another important audience of programming languages, as we come to the era of large language models (LLMs). LLMs can now excel at coding competitions and even program like developers to address various tasks, such as math calculation. Yet, the grammar and layout of existing programs are designed for humans. Particu…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: under review

  15. arXiv:2404.16223  [pdf, other]

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu, et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  16. arXiv:2404.15292  [pdf, other]

    eess.SP cs.IT

    Multi-objective Optimization for Multi-UAV-assisted Mobile Edge Computing

    Authors: Geng Sun, Yixian Wang, Zemin Sun, Qingqing Wu, Jiawen Kang, Dusit Niyato, Victor C. M. Leung

    Abstract: Recent developments in unmanned aerial vehicles (UAVs) and mobile edge computing (MEC) have provided users with flexible and resilient computing services. However, meeting the computing-intensive and latency-sensitive demands of users poses a significant challenge due to the limited resources of UAVs. To address this challenge, we present a multi-objective optimization approach for multi-UAV-assis…

    Submitted 23 March, 2024; originally announced April 2024.

  17. arXiv:2404.11607  [pdf, other]

    cs.DS

    Private federated discovery of out-of-vocabulary words for Gboard

    Authors: Ziteng Sun, Peter Kairouz, Haicheng Sun, Adria Gascon, Ananda Theertha Suresh

    Abstract: The vocabulary of language models in Gboard, Google's keyboard application, plays a crucial role for improving user experience. One way to improve the vocabulary is to discover frequently typed out-of-vocabulary (OOV) words on user devices. This task requires strong privacy protection due to the sensitive nature of user input data. In this report, we present a private OOV discovery algorithm for G…

    Submitted 18 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  18. arXiv:2404.11225  [pdf, other]

    cs.CL cs.AI

    In-Context Learning State Vector with Inner and Momentum Optimization

    Authors: Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, Min Zhang

    Abstract: Large Language Models (LLMs) have exhibited an impressive ability to perform In-Context Learning (ICL) from only a few examples. Recent works have indicated that the functions learned by ICL can be represented through compressed vectors derived from the transformer. However, the working mechanisms and optimization of these vectors are yet to be thoroughly explored. In this paper, we address this g…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 17 pages, 7 figures, 5 tables

  19. arXiv:2404.11180  [pdf, other]

    cs.IR

    Causal Deconfounding via Confounder Disentanglement for Dual-Target Cross-Domain Recommendation

    Authors: Jiajie Zhu, Yan Wang, Feng Zhu, Zhu Sun

    Abstract: In recent years, dual-target Cross-Domain Recommendation (CDR) has been proposed to capture comprehensive user preferences in order to ultimately enhance the recommendation accuracy in both data-richer and data-sparser domains simultaneously. However, in addition to users' true preferences, the user-item interactions might also be affected by confounders (e.g., free shipping, sales promotion). As…

    Submitted 17 April, 2024; originally announced April 2024.

  20. arXiv:2404.09640  [pdf, other]

    cs.CV

    CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning

    Authors: Haojian Huang, Xiaozhen Qiao, Zhuo Chen, Haodong Chen, Bingyu Li, Zhe Sun, Mulin Chen, Xuelong Li

    Abstract: Zero-shot learning (ZSL) enables the recognition of novel classes by leveraging semantic knowledge transfer from known to unknown categories. This knowledge, typically encapsulated in attribute descriptions, aids in identifying class-specific visual features, thus facilitating visual-semantic alignment and improving ZSL performance. However, real-world challenges such as distribution imbalances an…

    Submitted 20 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Ongoing work; 10 pages, 2 Tables, 9 Figures; Repo is available at: https://github.com/JethroJames/CREST

  21. arXiv:2404.09158  [pdf, other]

    cs.CV cs.AI

    StreakNet-Arch: An Anti-scattering Network-based Architecture for Underwater Carrier LiDAR-Radar Imaging

    Authors: Xuelong Li, Hongjun An, Guangying Li, Xing Wang, Guanghua Cheng, Zhe Sun

    Abstract: In this paper, we introduce StreakNet-Arch, a novel signal processing architecture designed for Underwater Carrier LiDAR-Radar (UCLR) imaging systems, to address the limitations in scatter suppression and real-time imaging. StreakNet-Arch formulates the signal processing as a real-time, end-to-end binary classification task, enabling real-time image acquisition. To achieve this, we leverage Self-A…

    Submitted 23 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

    Comments: Reduce the number of pages to 13

  22. arXiv:2404.08917  [pdf, other]

    cs.CV

    MAProtoNet: A Multi-scale Attentive Interpretable Prototypical Part Network for 3D Magnetic Resonance Imaging Brain Tumor Classification

    Authors: Binghua Li, Jie Mao, Zhe Sun, Chao Li, Qibin Zhao, Toshihisa Tanaka

    Abstract: Automated diagnosis with artificial intelligence has emerged as a promising area in the realm of medical imaging, while the interpretability of the introduced deep neural networks still remains an urgent concern. Although contemporary works, such as XProtoNet and MProtoNet, have sought to design interpretable prediction models for the issue, the localization precision of their resulting attribution…

    Submitted 13 April, 2024; originally announced April 2024.

  23. arXiv:2404.06244  [pdf, other]

    cs.CV

    Anchor-based Robust Finetuning of Vision-Language Models

    Authors: Jinwei Han, Zhiwen Lin, Zhongyisun Sun, Yingguo Gao, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia

    Abstract: We aim at finetuning a vision-language model without hurting its out-of-distribution (OOD) generalization. We address two types of OOD generalization, i.e., i) domain shift such as natural to sketch images, and ii) zero-shot capability to recognize the category that was not contained in the finetune data. Arguably, the diminished OOD generalization after finetuning stems from the excessively simpl…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: CVPR2024

  24. arXiv:2404.04394  [pdf, other]

    cs.CV

    Analyzing Participants' Engagement during Online Meetings Using Unsupervised Remote Photoplethysmography with Behavioral Features

    Authors: Alexander Vedernikov, Zhaodong Sun, Virpi-Liisa Kykyri, Mikko Pohjola, Miriam Nokia, Xiaobai Li

    Abstract: Engagement measurement finds application in healthcare, education, advertisement, and services. The use of physiological and behavioral features is viable, but the impracticality of traditional physiological measurement arises due to the need for contact sensors. We demonstrate the feasibility of unsupervised remote photoplethysmography (rPPG) as an alternative for contact sensors in deriving hear…

    Submitted 5 April, 2024; originally announced April 2024.

  25. arXiv:2404.04292  [pdf, other]

    cs.CL cs.AI

    Conversational Disease Diagnosis via External Planner-Controlled Large Language Models

    Authors: Zhoujian Sun, Cheng Luo, Ziyi Liu, Zhengxing Huang

    Abstract: The development of large language models (LLMs) has brought unprecedented possibilities for artificial intelligence (AI) based medical diagnosis. However, the application perspective of LLMs in real diagnostic scenarios is still unclear because they are not adept at collecting patient data proactively. This study presents an LLM-based diagnostic system that enhances planning capabilities by emulati…

    Submitted 9 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Work in Progress

  26. arXiv:2404.03267  [pdf, other]

    cs.IR

    To Search or to Recommend: Predicting Open-App Motivation with Neural Hawkes Process

    Authors: Zhongxiang Sun, Zihua Si, Xiao Zhang, Xiaoxue Zang, Yang Song, Hongteng Xu, Jun Xu

    Abstract: Incorporating Search and Recommendation (S&R) services within a singular application is prevalent in online platforms, leading to a new task termed open-app motivation prediction, which aims to predict whether users initiate the application with the specific intent of information searching, or to explore recommended content for entertainment. Studies have shown that predicting users' motivation to…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR 2024

  27. arXiv:2404.03164  [pdf, ps, other]

    cs.IR cs.AI cs.LG

    Does Knowledge Graph Really Matter for Recommender Systems?

    Authors: Haonan Zhang, Dongxia Wang, Zhu Sun, Yanhui Li, Youcheng Sun, Huizhi Liang, Wenhai Wang

    Abstract: Recommender systems (RSs) are designed to provide personalized recommendations to users. Recently, knowledge graphs (KGs) have been widely introduced in RSs to improve recommendation accuracy. In this study, however, we demonstrate that RSs do not necessarily perform worse even if the KG is downgraded to the user-item interaction graph only (or removed). We propose an evaluation framework KG4RecEv…

    Submitted 3 April, 2024; originally announced April 2024.

  28. arXiv:2404.02323  [pdf, other]

    cs.CL

    Toward Informal Language Processing: Knowledge of Slang in Large Language Models

    Authors: Zhewei Sun, Qian Hu, Rahul Gupta, Richard Zemel, Yang Xu

    Abstract: Recent advancement in large language models (LLMs) has offered a strong potential for natural language systems to process informal language. A representative form of informal language is slang, used commonly in daily conversations and online social media. To date, slang has not been comprehensively evaluated in LLMs due partly to the absence of a carefully designed and publicly accessible benchmar…

    Submitted 12 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024 main conference

  29. arXiv:2404.02166  [pdf, other]

    cs.IT

    An Online Joint Optimization Approach for QoE Maximization in UAV-Enabled Mobile Edge Computing

    Authors: Long He, Geng Sun, Zemin Sun, Pengfei Wang, Jiahui Li, Shuang Liang, Dusit Niyato

    Abstract: Given flexible mobility, rapid deployment, and low cost, unmanned aerial vehicle (UAV)-enabled mobile edge computing (MEC) shows great potential to compensate for the lack of terrestrial edge computing coverage. However, limited battery capacity, computing and spectrum resources also pose serious challenges for UAV-enabled MEC, which shorten the service time of UAVs and degrade the quality of expe…

    Submitted 23 March, 2024; originally announced April 2024.

  30. arXiv:2404.01730  [pdf, other]

    cs.LG cs.IT stat.ML

    Asymptotics of Language Model Alignment

    Authors: Joy Qiping Yang, Salman Salamatian, Ziteng Sun, Ananda Theertha Suresh, Ahmad Beirami

    Abstract: Let $p$ denote a generative language model. Let $r$ denote a reward model that returns a scalar that captures the degree at which a draw from $p$ is preferred. The goal of language model alignment is to alter $p$ to a new distribution $φ$ that results in a higher expected reward while keeping $φ$ close to $p.$ A popular alignment method is the KL-constrained reinforcement learning (RL), which choo…

    Submitted 2 April, 2024; originally announced April 2024.

  31. arXiv:2404.01677  [pdf, other]

    cs.AI cs.CL

    Towards Generalizable and Faithful Logic Reasoning over Natural Language via Resolution Refutation

    Authors: Zhouhao Sun, Xiao Ding, Li Du, Bibo Cai, Jinglong Gao, Ting Liu, Qin Bing

    Abstract: Large language models (LLMs) have achieved significant performance in various natural language reasoning tasks. However, they still struggle with performing first-order logic reasoning over formal logical theories expressed in natural language. This is because previous LLM-based reasoning systems suffer from theoretical incompleteness. As a result, they can only address a limited set of simp…

    Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: LREC-Coling 2024

  32. arXiv:2404.01258  [pdf, other]

    cs.CV cs.AI

    Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

    Authors: Ruohong Zhang, Liangke Gui, Zhiqing Sun, Yihao Feng, Keyang Xu, Yuanhan Zhang, Di Fu, Chunyuan Li, Alexander Hauptmann, Yonatan Bisk, Yiming Yang

    Abstract: Preference modeling techniques, such as direct preference optimization (DPO), have proven effective in enhancing the generalization abilities of large language models (LLMs). However, in tasks involving video instruction-following, providing informative feedback, especially for detecting hallucinations in generated responses, remains a significant challenge. Previous studies have explored using large…

    Submitted 2 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  33. arXiv:2404.00990  [pdf, ps, other]

    cs.CL

    Exploring the Nexus of Large Language Models and Legal Systems: A Short Survey

    Authors: Weicong Qin, Zhongxiang Sun

    Abstract: With the advancement of Artificial Intelligence (AI) and Large Language Models (LLMs), there is a profound transformation occurring in the realm of natural language processing tasks within the legal domain. The capabilities of LLMs are increasingly demonstrating unique roles in the legal sector, bringing both distinctive benefits and various challenges. This survey delves into the synergy between…

    Submitted 1 April, 2024; originally announced April 2024.

  34. arXiv:2404.00261  [pdf, other]

    cs.IR cs.AI

    A Simple Yet Effective Approach for Diversified Session-Based Recommendation

    Authors: Qing Yin, Hui Fang, Zhu Sun, Yew-Soon Ong

    Abstract: Session-based recommender systems (SBRSs) have become extremely popular in view of the core capability of capturing short-term and dynamic user preferences. However, most SBRSs primarily maximize recommendation accuracy but ignore user minor preferences, thus leading to filter bubbles in the long run. Only a handful of works, being devoted to improving diversity, depend on unique model designs and…

    Submitted 30 March, 2024; originally announced April 2024.

  35. arXiv:2403.19924  [pdf, other]

    cs.CV

    SceneTracker: Long-term Scene Flow Estimation Network

    Authors: Bo Wang, Jian Li, Yang Yu, Li Liu, Zhenping Sun, Dewen Hu

    Abstract: Considering the complementarity of scene flow estimation in the spatial domain's focusing capability and 3D object tracking in the temporal domain's coherence, this study aims to address a comprehensive new task that can simultaneously capture fine-grained and long-term 3D motion in an online manner: long-term scene flow estimation (LSFE). We introduce SceneTracker, a novel learning-based LSFE net…

    Submitted 6 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  36. arXiv:2403.18621  [pdf, other]

    cs.IT eess.SP

    Performance Analysis of Integrated Sensing and Communication Networks with Blockage Effects

    Authors: Zezhong Sun, Shi Yan, Ning Jiang, Jiaen Zhou, Mugen Peng

    Abstract: Communication-sensing integration represents an up-and-coming area of research, enabling wireless networks to simultaneously perform communication and sensing tasks. However, in urban cellular networks, the blockage of buildings results in a complex signal propagation environment, affecting the performance analysis of integrated sensing and communication (ISAC) networks. To overcome this obstacle,…

    Submitted 27 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE Transactions on Vehicular Technology

  37. arXiv:2403.18381  [pdf, other]

    cs.CL cs.AI

    Improving Attributed Text Generation of Large Language Models via Preference Learning

    Authors: Dongfang Li, Zetian Sun, Baotian Hu, Zhenyu Liu, Xinshuo Hu, Xuebo Liu, Min Zhang

    Abstract: Large language models have been widely adopted in natural language processing, yet they face the challenge of generating unreliable content. Recent works aim to reduce misinformation and hallucinations by resorting to attribution as a means to provide evidence (i.e., citations). However, current attribution methods usually focus on the retrieval stage and automatic evaluation that neglect mirrorin…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: 23 pages, 15 tables, 2 figures

  38. arXiv:2403.18331  [pdf, other]

    cs.HC

    Neighbor-Environment Observer: An Intelligent Agent for Immersive Working Companionship

    Authors: Zhe Sun, Qixuan Liang, Meng Wang, Zhenliang Zhang

    Abstract: Human-computer symbiosis is a crucial direction for the development of artificial intelligence. As intelligent systems become increasingly prevalent in our work and personal lives, it is important to develop strategies to support users across physical and virtual environments. While technological advances in personal digital devices, such as personal computers and virtual reality devices, can prov…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: UIST 2023

  39. arXiv:2403.17755  [pdf, other]

    cs.AI cs.CR cs.CV

    DataCook: Crafting Anti-Adversarial Examples for Healthcare Data Copyright Protection

    Authors: Sihan Shang, Jiancheng Yang, Zhenglong Sun, Pascal Fua

    Abstract: In the realm of healthcare, the challenges of copyright protection and unauthorized third-party misuse are increasingly significant. Traditional methods for data copyright protection are applied prior to data distribution, implying that models trained on these data become uncontrollable. This paper introduces a novel approach, named DataCook, designed to safeguard the copyright of healthcare data…

    Submitted 26 March, 2024; originally announced March 2024.

  40. arXiv:2403.17688  [pdf, other]

    cs.IR

    Large Language Models Enhanced Collaborative Filtering

    Authors: Zhongxiang Sun, Zihua Si, Xiaoxue Zang, Kai Zheng, Yang Song, Xiao Zhang, Jun Xu

    Abstract: Recent advancements in Large Language Models (LLMs) have attracted considerable interest among researchers to leverage these models to enhance Recommender Systems (RSs). Existing work predominantly utilizes LLMs to generate knowledge-rich texts or utilizes LLM-derived embeddings as features to improve RSs. Although the extensive world knowledge embedded in LLMs generally benefits RSs, the applicat…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 11 pages

  41. arXiv:2403.17004  [pdf, other]

    cs.CV cs.MM

    SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer

    Authors: Rui Zhu, Yingwei Pan, Yehao Li, Ting Yao, Zhenglong Sun, Tao Mei, Chang Wen Chen

    Abstract: Diffusion Transformer (DiT) has emerged as the new trend of generative diffusion models on image generation. In view of extremely slow convergence in typical DiT, recent breakthroughs have been driven by mask strategy that significantly improves the training efficiency of DiT with additional intra-image contextual learning. Despite this progress, mask strategy still suffers from two inherent limit…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  42. arXiv:2403.16627  [pdf, other]

    cs.CV

    SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

    Authors: Yuda Song, Zehao Sun, Xuanwu Yin

    Abstract: Recent advancements in diffusion models have positioned them at the forefront of image generation. Despite their superior performance, diffusion models are not without drawbacks; they are characterized by complex architectures and substantial computational demands, resulting in significant latency due to their iterative sampling process. To mitigate these limitations, we introduce a dual approach…

    Submitted 16 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  43. arXiv:2403.16427  [pdf, other]

    cs.AI

    Re2LLM: Reflective Reinforcement Large Language Model for Session-based Recommendation

    Authors: Ziyan Wang, Yingpeng Du, Zhu Sun, Haoyan Chua, Kaidong Feng, Wenya Wang, Jie Zhang

    Abstract: Large Language Models (LLMs) are emerging as promising approaches to enhance session-based recommendation (SBR), where both prompt-based and fine-tuning-based methods have been widely investigated to align LLMs with SBR. However, the former methods struggle with optimal prompts to elicit the correct reasoning of LLMs due to the lack of task-specific feedback, leading to unsatisfactory recommendati…

    Submitted 19 April, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: 11 pages, 4 figures

  44. arXiv:2403.15743  [pdf, other]

    eess.SY cs.AI math.DS

    A Comparative Study of Artificial Potential Fields and Safety Filters

    Authors: Ming Li, Zhiyong Sun

    Abstract: In this paper, we have demonstrated that the controllers designed by a classical motion planning tool, namely artificial potential fields (APFs), can be derived from a recently prevalent approach: control barrier function quadratic program (CBF-QP) safety filters. By integrating APF information into the CBF-QP framework, we establish a bridge between these two methodologies. Specifically, this is…

    Submitted 23 March, 2024; originally announced March 2024.

  45. arXiv:2403.15244  [pdf, ps, other]

    cs.LG math.OC

    A Stochastic Quasi-Newton Method for Non-convex Optimization with Non-uniform Smoothness

    Authors: Zhenyu Sun, Ermin Wei

    Abstract: Classical convergence analyses for optimization algorithms rely on the widely-adopted uniform smoothness assumption. However, recent experimental studies have demonstrated that many machine learning problems exhibit non-uniform smoothness, meaning the smoothness factor is a function of the model parameter instead of a universal constant. In particular, it has been observed that the smoothness grow…

    Submitted 22 March, 2024; originally announced March 2024.

  46. arXiv:2403.14950  [pdf, other]

    cs.CL cs.LG

    KnowLA: Enhancing Parameter-efficient Finetuning with Knowledgeable Adaptation

    Authors: Xindi Luo, Zequn Sun, Jing Zhao, Zhe Zhao, Wei Hu

    Abstract: Parameter-efficient finetuning (PEFT) is a key technique for adapting large language models (LLMs) to downstream tasks. In this paper, we study leveraging knowledge graph embeddings to improve the effectiveness of PEFT. We propose a knowledgeable adaptation method called KnowLA. It inserts an adaptation layer into an LLM to integrate the embeddings of entities appearing in the input text. The adap…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: Accepted in the 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024)

  47. arXiv:2403.14731  [pdf, other]

    cs.CR cs.CL cs.LG

    Reversible Jump Attack to Textual Classifiers with Modification Reduction

    Authors: Mingze Ni, Zhensu Sun, Wei Liu

    Abstract: Recent studies on adversarial examples expose vulnerabilities of natural language processing (NLP) models. Existing techniques for generating adversarial examples are typically driven by deterministic hierarchical rules that are agnostic to the optimal adversarial examples, a strategy that often results in adversarial samples with a suboptimal balance between magnitudes of changes and attack succe…

    Submitted 21 March, 2024; originally announced March 2024.

  48. arXiv:2403.13805  [pdf, other]

    cs.CV cs.AI cs.LG

    RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition

    Authors: Ziyu Liu, Zeyi Sun, Yuhang Zang, Wei Li, Pan Zhang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang

    Abstract: CLIP (Contrastive Language-Image Pre-training) uses contrastive learning from noise image-text pairs to excel at recognizing a wide array of candidates, yet its focus on broad associations hinders the precision in distinguishing subtle differences among fine-grained items. Conversely, Multimodal Large Language Models (MLLMs) excel at classifying fine-grained categories, thanks to their substantial…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Project: https://github.com/Liuziyu77/RAR

  49. arXiv:2403.13271  [pdf, other]

    cs.SE

    Enhancing Code Generation Performance of Smaller Models by Distilling the Reasoning Ability of LLMs

    Authors: Zhihong Sun, Chen Lyu, Bolun Li, Yao Wan, Hongyu Zhang, Ge Li, Zhi Jin

    Abstract: Large Language Models (LLMs) have recently made significant advances in code generation through the 'Chain-of-Thought' prompting technique. This technique empowers the model to autonomously devise "solution plans" to tackle intricate programming challenges, thereby improving its performance in code generation. Nevertheless, smaller models have been struggling to keep up with LLMs in deducing these…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted for LREC-COLING 2024

    ACM Class: D.2.3

  50. arXiv:2403.12749  [pdf, other]

    cs.CL

    Sebastian, Basti, Wastl?! Recognizing Named Entities in Bavarian Dialectal Data

    Authors: Siyao Peng, Zihang Sun, Huangyan Shan, Marie Kolm, Verena Blaschke, Ekaterina Artemova, Barbara Plank

    Abstract: Named Entity Recognition (NER) is a fundamental task to extract key information from texts, but annotated resources are scarce for dialects. This paper introduces the first dialectal NER dataset for German, BarNER, with 161K tokens annotated on Bavarian Wikipedia articles (bar-wiki) and tweets (bar-tweet), using a schema adapted from German CoNLL 2006 and GermEval. The Bavarian dialect differs fro…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: LREC-COLING 2024