Showing 1–50 of 579 results for author: Xu, B

Searching in archive cs.
  1. arXiv:2405.05008  [pdf, other]

    cs.CL

    ADELIE: Aligning Large Language Models on Information Extraction

    Authors: Yunjia Qi, Hao Peng, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li

    Abstract: Large language models (LLMs) usually fall short on information extraction (IE) tasks and struggle to follow the complex instructions of IE tasks. This primarily arises from LLMs not being aligned with humans, as mainstream alignment datasets typically do not include IE data. In this paper, we introduce ADELIE (Aligning large language moDELs on Information Extraction), an aligned LLM that effective…

    Submitted 8 May, 2024; originally announced May 2024.

  2. arXiv:2405.03155  [pdf, other]

    cs.RO

    CushSense: Soft, Stretchable, and Comfortable Tactile-Sensing Skin for Physical Human-Robot Interaction

    Authors: Boxin Xu, Luoyan Zhong, Grace Zhang, Xiaoyu Liang, Diego Virtue, Rishabh Madan, Tapomayukh Bhattacharjee

    Abstract: Whole-arm tactile feedback is crucial for robots to ensure safe physical interaction with their surroundings. This paper introduces CushSense, a fabric-based soft and stretchable tactile-sensing skin designed for physical human-robot interaction (pHRI) tasks such as robotic caregiving. Using stretchable fabric and hyper-elastic polymer, CushSense identifies contacts by monitoring capacitive change…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 8 pages, 8 figures, ICRA2024

  3. arXiv:2405.02828  [pdf, other]

    cs.SE cs.LG

    Trojans in Large Language Models of Code: A Critical Review through a Trigger-Based Taxonomy

    Authors: Aftab Hussain, Md Rafiqul Islam Rabin, Toufique Ahmed, Bowen Xu, Premkumar Devanbu, Mohammad Amin Alipour

    Abstract: Large language models (LLMs) have provided a lot of exciting new capabilities in software development. However, the opaque nature of these models makes them difficult to reason about and inspect. Their opacity gives rise to potential security risks, as adversaries can train and deploy compromised models to disrupt the software development process in the victims' organization. This work presents…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2305.03803

  4. arXiv:2405.00435  [pdf, other]

    cs.HC

    CultiVerse: Towards Cross-Cultural Understanding for Paintings with Large Language Model

    Authors: Wei Zhang, Wong Kam-Kwai, Biying Xu, Yiwen Ren, Yuhuai Li, Minfeng Zhu, Yingchaojie Feng, Wei Chen

    Abstract: The integration of new technology with cultural studies enhances our understanding of cultural heritage but often struggles to connect with diverse audiences. It is challenging to align personal interpretations with the intended meanings across different cultures. Our study investigates the important factors in appreciating art from a cross-cultural perspective. We explore the application of Large…

    Submitted 1 May, 2024; originally announced May 2024.

  5. arXiv:2405.00145  [pdf, other]

    cs.SE cs.CV

    GUing: A Mobile GUI Search Engine using a Vision-Language Model

    Authors: Jialiang Wei, Anne-Lise Courbis, Thomas Lambolais, Binbin Xu, Pierre Louis Bernard, Gérard Dray, Walid Maalej

    Abstract: App developers use the Graphical User Interface (GUI) of other apps as an important source of inspiration to design and improve their own apps. In recent years, research suggested various approaches to retrieve GUI designs that fit a certain text query from screenshot datasets acquired through automated GUI exploration. However, such text-to-GUI retrieval approaches only leverage the textual infor…

    Submitted 30 April, 2024; originally announced May 2024.

  6. arXiv:2404.18410  [pdf, other]

    cs.CL

    Mixture-of-Instructions: Comprehensive Alignment of a Large Language Model through the Mixture of Diverse System Prompting Instructions

    Authors: Bowen Xu, Shaoyu Wu, Kai Liu, Lulu Hu

    Abstract: With the proliferation of large language models (LLMs), the comprehensive alignment of such models across multiple tasks has emerged as a critical area of research. Existing alignment methodologies primarily address single task, such as multi-turn dialogue, coding, mathematical problem-solving, and tool usage. However, AI-driven products that leverage language models usually necessitate a fusion o…

    Submitted 28 April, 2024; originally announced April 2024.

  7. arXiv:2404.17683  [pdf, other]

    math.OC cs.GT cs.LG eess.SY

    Energy Storage Arbitrage in Two-settlement Markets: A Transformer-Based Approach

    Authors: Saud Alghumayjan, Jiajun Han, Ningkun Zheng, Ming Yi, Bolun Xu

    Abstract: This paper presents an integrated model for bidding energy storage in day-ahead and real-time markets to maximize profits. We show that in integrated two-stage bidding, the real-time bids are independent of day-ahead settlements, while the day-ahead bids should be based on predicted real-time prices. We utilize a transformer-based model for real-time price prediction, which captures complex dynami…

    Submitted 26 April, 2024; originally announced April 2024.

  8. arXiv:2404.15067  [pdf, other]

    cs.CL

    Enhancing Textual Personality Detection toward Social Media: Integrating Long-term and Short-term Perspectives

    Authors: Haohao Zhu, Xiaokun Zhang, Junyu Lu, Youlin Wu, Zewen Bai, Changrong Min, Liang Yang, Bo Xu, Dongyu Zhang, Hongfei Lin

    Abstract: Textual personality detection aims to identify personality characteristics by analyzing user-generated content toward social media platforms. Numerous psychological literature highlighted that personality encompasses both long-term stable traits and short-term dynamic states. However, existing studies often concentrate only on either long-term or short-term personality representations, without eff…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 11 pages, 9 figures

  9. FineRec:Exploring Fine-grained Sequential Recommendation

    Authors: Xiaokun Zhang, Bo Xu, Youlin Wu, Yuan Zhong, Hongfei Lin, Fenglong Ma

    Abstract: Sequential recommendation is dedicated to offering items of interest for users based on their history behaviors. The attribute-opinion pairs, expressed by users in their reviews for items, provide the potentials to capture user preferences and item characteristics at a fine-grained level. To this end, we propose a novel framework FineRec that explores the attribute-opinion pairs of reviews to fine…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: This work has been accepted by SIGIR '24 as a full paper

  10. Disentangling ID and Modality Effects for Session-based Recommendation

    Authors: Xiaokun Zhang, Bo Xu, Zhaochun Ren, Xiaochen Wang, Hongfei Lin, Fenglong Ma

    Abstract: Session-based recommendation aims to predict intents of anonymous users based on their limited behaviors. Modeling user behaviors involves two distinct rationales: co-occurrence patterns reflected by item IDs, and fine-grained preferences represented by item modalities (e.g., text and images). However, existing methods typically entangle these causes, leading to their failure in achieving accurate…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: This work has been accepted by SIGIR '24 as a full paper

  11. arXiv:2404.11070  [pdf]

    cs.CV eess.SP

    Sky-GVIO: an enhanced GNSS/INS/Vision navigation with FCN-based sky-segmentation in urban canyon

    Authors: Jingrong Wang, Bo Xu, Ronghe Jin, Shoujian Zhang, Kefu Gao, Jingnan Liu

    Abstract: Accurate, continuous, and reliable positioning is a critical component of achieving autonomous driving. However, in complex urban canyon environments, the vulnerability of a stand-alone sensor and non-line-of-sight (NLOS) caused by high buildings, trees, and elevated structures seriously affect positioning results. To address these challenges, a sky-view images segmentation algorithm based on Full…

    Submitted 17 April, 2024; originally announced April 2024.

  12. arXiv:2404.10731  [pdf, ps, other]

    cs.AI

    What is Meant by AGI? On the Definition of Artificial General Intelligence

    Authors: Bowen Xu

    Abstract: This paper aims to establish a consensus on AGI's definition. General intelligence refers to the adaptation to open environments according to certain principles using limited resources. It emphasizes that adaptation or learning is an indispensable property of intelligence, and places the controversial part within the principles of intelligence, which can be described from different perspectives.

    Submitted 16 April, 2024; originally announced April 2024.

  13. arXiv:2404.04140  [pdf, other]

    cs.CV cs.LG

    Improving Detection in Aerial Images by Capturing Inter-Object Relationships

    Authors: Botao Ren, Botian Xu, Yifan Pu, Jingyi Wang, Zhidong Deng

    Abstract: In many image domains, the spatial distribution of objects in a scene exhibits meaningful patterns governed by their semantic relationships. In most modern detection pipelines, however, the detection proposals are processed independently, overlooking the underlying relationships between objects. In this work, we introduce a transformer-based approach to capture these inter-object relationships to…

    Submitted 5 April, 2024; originally announced April 2024.

  14. arXiv:2404.03663  [pdf, other]

    cs.NE cs.CV

    Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips

    Authors: Man Yao, Jiakui Hu, Tianxiang Hu, Yifan Xu, Zhaokun Zhou, Yonghong Tian, Bo Xu, Guoqi Li

    Abstract: Neuromorphic computing, which exploits Spiking Neural Networks (SNNs) on neuromorphic chips, is a promising energy-efficient alternative to traditional AI. CNN-based SNNs are the current mainstream of neuromorphic computing. By contrast, no neuromorphic chips are designed especially for Transformer-based SNNs, which have just emerged, and their performance is only on par with CNN-based SNNs, offer…

    Submitted 15 February, 2024; originally announced April 2024.

    Comments: Accepted by ICLR2024. Code and Model: https://github.com/BICLab/Spike-Driven-Transformer-V2

  15. arXiv:2403.20163  [pdf, other]

    cs.NE q-bio.NC

    Biologically-Plausible Topology Improved Spiking Actor Network for Efficient Deep Reinforcement Learning

    Authors: Duzhen Zhang, Qingyu Wang, Tielin Zhang, Bo Xu

    Abstract: The success of Deep Reinforcement Learning (DRL) is largely attributed to utilizing Artificial Neural Networks (ANNs) as function approximators. Recent advances in neuroscience have unveiled that the human brain achieves efficient reward-based learning, at least by integrating spiking neurons with spatial-temporal dynamics and network topologies with biologically-plausible connectivity patterns. T…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Work in Progress

  16. arXiv:2403.18228  [pdf, other]

    cs.CV cs.LG cs.NE

    Fourier or Wavelet bases as counterpart self-attention in spikformer for efficient visual classification

    Authors: Qingyu Wang, Duzhen Zhang, Tielin Zhang, Bo Xu

    Abstract: Energy-efficient spikformer has been proposed by integrating the biologically plausible spiking neural network (SNN) and artificial Transformer, whereby the Spiking Self-Attention (SSA) is used to achieve both higher accuracy and lower computational cost. However, it seems that self-attention is not always necessary, especially in sparse spike-form calculation manners. In this paper, we innovative…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 18 pages, 2 figures. arXiv admin note: substantial text overlap with arXiv:2308.02557

  17. arXiv:2403.16749  [pdf, other]

    cs.SE

    Enhancing Software Effort Estimation through Reinforcement Learning-based Project Management-Oriented Feature Selection

    Authors: Haoyang Chen, Botong Xu, Kaiyang Zhong

    Abstract: Purpose: The study aims to investigate the application of the data element market in software project management, focusing on improving effort estimation by addressing challenges faced by traditional methods. Design/methodology/approach: This study proposes a solution based on feature selection, utilizing the data element market and reinforcement learning-based algorithms to enhance the accuracy o…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 18 pages, 10 figures, 6 tables

  18. arXiv:2403.15388  [pdf, other]

    cs.CV cs.AI cs.CL

    LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models

    Authors: Yuzhang Shang, Mu Cai, Bingxin Xu, Yong Jae Lee, Yan Yan

    Abstract: Large Multimodal Models (LMMs) have shown significant reasoning capabilities by connecting a visual encoder and a large language model. LMMs typically use a fixed amount of visual tokens, such as the penultimate layer features in the CLIP visual encoder, as the prefix content. Recent LMMs incorporate more complex visual inputs, such as high-resolution images and videos, which increase the number o…

    Submitted 12 April, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: Project page: https://llava-prumerge.github.io/

  19. arXiv:2403.14390  [pdf, other]

    cs.CL

    From Large to Tiny: Distilling and Refining Mathematical Expertise for Math Word Problems with Weakly Supervision

    Authors: Qingwen Lin, Boyan Xu, Zhengting Huang, Ruichu Cai

    Abstract: Addressing the challenge of high annotation costs in solving Math Word Problems (MWPs) through full supervision with intermediate equations, recent works have proposed weakly supervised task settings that rely solely on the final answer as a supervised signal. Existing leading approaches typically employ various search techniques to infer intermediate equations, but cannot ensure their semantic co…

    Submitted 21 March, 2024; originally announced March 2024.

  20. arXiv:2403.14139  [pdf, other]

    cs.NE cs.LG

    Genetic Programming for Explainable Manifold Learning

    Authors: Ben Cravens, Andrew Lensen, Paula Maddigan, Bing Xue

    Abstract: Manifold learning techniques play a pivotal role in machine learning by revealing lower-dimensional embeddings within high-dimensional data, thus enhancing both the efficiency and interpretability of data analysis by transforming the data into a lower-dimensional representation. However, a notable challenge with current manifold learning methods is their lack of explicit functional mappings, cruci…

    Submitted 21 March, 2024; originally announced March 2024.

  21. arXiv:2403.11639  [pdf, other]

    cs.RO cs.CV

    An Accurate and Real-time Relative Pose Estimation from Triple Point-line Images by Decoupling Rotation and Translation

    Authors: Zewen Xu, Yijia He, Hao Wei, Bo Xu, BinJian Xie, Yihong Wu

    Abstract: Line features are valid complements for point features in man-made environments. 3D-2D constraints provided by line features have been widely used in Visual Odometry (VO) and Structure-from-Motion (SfM) systems. However, how to accurately solve three-view relative motion only with 2D observations of points and lines in real time has not been fully explored. In this paper, we propose a novel three-…

    Submitted 18 March, 2024; originally announced March 2024.

  22. arXiv:2403.10784  [pdf, other]

    cs.RO

    Identifying Optimal Launch Sites of High-Altitude Latex-Balloons using Bayesian Optimisation for the Task of Station-Keeping

    Authors: Jack Saunders, Sajad Saeedi, Adam Hartshorne, Binbin Xu, Özgur Şimşek, Alan Hunter, Wenbin Li

    Abstract: Station-keeping tasks for high-altitude balloons show promise in areas such as ecological surveys, atmospheric analysis, and communication relays. However, identifying the optimal time and position to launch a latex high-altitude balloon is still a challenging and multifaceted problem. For example, tasks such as forest fire tracking place geometric constraints on the launch location of the balloon…

    Submitted 15 March, 2024; originally announced March 2024.

  23. arXiv:2403.10119  [pdf, other]

    cs.CV

    URS-NeRF: Unordered Rolling Shutter Bundle Adjustment for Neural Radiance Fields

    Authors: Bo Xu, Ziao Liu, Mengqi Guo, Jiancheng Li, Gim Hee Lee

    Abstract: We propose a novel rolling shutter bundle adjustment method for neural radiance fields (NeRF), which utilizes the unordered rolling shutter (RS) images to obtain the implicit 3D representation. Existing NeRF methods suffer from low-quality images and inaccurate initial camera poses due to the RS effect in the image, whereas, the previous method that incorporates the RS into NeRF requires strict se…

    Submitted 24 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  24. arXiv:2403.07798

    cs.CV

    A Fourier Transform Framework for Domain Adaptation

    Authors: Le Luo, Bingrong Xu, Qingyong Zhang, Cheng Lian, Jie Luo

    Abstract: By using unsupervised domain adaptation (UDA), knowledge can be transferred from a label-rich source domain to a target domain that contains relevant information but lacks labels. Many existing UDA algorithms suffer from directly using raw images as input, resulting in models that overly focus on redundant information and exhibit poor generalization capability. To address this issue, we attempt to…

    Submitted 21 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: The paper contains significant errors and the experimental methodology is not rigorous. The experimental section and methodology need to be rewritten

  25. arXiv:2403.07631  [pdf, other]

    cs.RO

    Efficient Global Navigational Planning in 3D Structures based on Point Cloud Tomography

    Authors: Bowen Yang, Jie Cheng, Bohuan Xue, Jianhao Jiao, Ming Liu

    Abstract: Navigation in complex 3D scenarios requires appropriate environment representation for efficient scene understanding and trajectory generation. We propose a highly efficient and extensible global navigation framework based on a tomographic understanding of the environment to navigate ground robots in multi-layer structures. Our approach generates tomogram slices using the point cloud map to encode…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 11 pages, 9 figures, submitted to IEEE/ASME Transactions on Mechatronics

  26. arXiv:2403.05874  [pdf, other]

    cs.CV cs.RO

    SPAFormer: Sequential 3D Part Assembly with Transformers

    Authors: Boshen Xu, Sipeng Zheng, Qin Jin

    Abstract: We introduce SPAFormer, an innovative model designed to overcome the combinatorial explosion challenge in the 3D Part Assembly (3D-PA) task. This task requires accurate prediction of each part's pose and shape in sequential steps, and as the number of parts increases, the possible assembly combinations increase exponentially, leading to a combinatorial explosion that severely hinders the efficacy…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Code will be released at https://github.com/xuboshen/SPAFormer

  27. POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World

    Authors: Boshen Xu, Sipeng Zheng, Qin Jin

    Abstract: We humans are good at translating third-person observations of hand-object interactions (HOI) into an egocentric view. However, current methods struggle to replicate this ability of view adaptation from third-person to first-person. Although some approaches attempt to learn view-agnostic representation from large-scale video datasets, they ignore the relationships among multiple third-person views…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Accepted by ACM MM 2023. Project page: https://xuboshen.github.io/

    Journal ref: Proceedings of the 31st ACM International Conference on Multimedia (2023). Association for Computing Machinery, New York, NY, USA, 2807-2816

  28. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Machel Reid, Nikolay Savinov, Denis Teplyashin, Dmitry Lepikhin, Timothy Lillicrap, Jean-baptiste Alayrac, Radu Soricut, Angeliki Lazaridou, Orhan Firat, Julian Schrittwieser, Ioannis Antonoglou, Rohan Anil, Sebastian Borgeaud, Andrew Dai, Katie Millican, Ethan Dyer, Mia Glaese, Thibault Sottiaux, Benjamin Lee, Fabio Viola, Malcolm Reynolds, Yuanzhong Xu, James Molloy, et al. (683 additional authors not shown)

    Abstract: In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. Gemini 1.5 Pro achieves near-perfect recall on long-context retrieval tasks across modalit…

    Submitted 25 April, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  29. arXiv:2403.03397  [pdf, other]

    cs.NE

    Explaining Genetic Programming Trees using Large Language Models

    Authors: Paula Maddigan, Andrew Lensen, Bing Xue

    Abstract: Genetic programming (GP) has the potential to generate explainable results, especially when used for dimensionality reduction. In this research, we investigate the potential of leveraging eXplainable AI (XAI) and large language models (LLMs) like ChatGPT to improve the interpretability of GP-based non-linear dimensionality reduction. Our study introduces a novel XAI dashboard named GP4NLDR, the fi…

    Submitted 5 March, 2024; originally announced March 2024.

  30. arXiv:2403.02756  [pdf, other]

    cs.CL

    Role Prompting Guided Domain Adaptation with General Capability Preserve for Large Language Models

    Authors: Rui Wang, Fei Mi, Yi Chen, Boyang Xue, Hongru Wang, Qi Zhu, Kam-Fai Wong, Ruifeng Xu

    Abstract: The growing interest in Large Language Models (LLMs) for specialized applications has revealed a significant challenge: when tailored to specific domains, LLMs tend to experience catastrophic forgetting, compromising their general capabilities and leading to a suboptimal user experience. Additionally, crafting a versatile model for multiple domains simultaneously often results in a decline in over…

    Submitted 5 March, 2024; originally announced March 2024.

  31. arXiv:2403.01369  [pdf, other]

    eess.AS cs.AI cs.LG

    A Closer Look at Wav2Vec2 Embeddings for On-Device Single-Channel Speech Enhancement

    Authors: Ravi Shankar, Ke Tan, Buye Xu, Anurag Kumar

    Abstract: Self-supervised learned models have been found to be very effective for certain speech tasks such as automatic speech recognition, speaker identification, keyword spotting and others. While the features are undeniably useful in speech recognition and associated tasks, their utility in speech enhancement systems is yet to be firmly established, and perhaps not properly understood. In this paper, we…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: 8 pages; Shorter form accepted in ICASSP 2024

  32. arXiv:2403.00865  [pdf, other]

    cs.NE cs.AI cs.CV cs.LG

    Fast and Efficient Local Search for Genetic Programming Based Loss Function Learning

    Authors: Christian Raymond, Qi Chen, Bing Xue, Mengjie Zhang

    Abstract: In this paper, we develop upon the topic of loss function learning, an emergent meta-learning paradigm that aims to learn loss functions that significantly improve the performance of the models trained under them. Specifically, we propose a new meta-learning framework for task and model-agnostic loss function learning via a hybrid search approach. The framework first uses genetic programming to fi…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2209.08907

  33. arXiv:2403.00812  [pdf, other]

    cs.CL cs.AI

    LoRA Meets Dropout under a Unified Framework

    Authors: Sheng Wang, Liheng Chen, Jiyue Jiang, Boyang Xue, Lingpeng Kong, Chuan Wu

    Abstract: With the remarkable capabilities, large language models (LLMs) have emerged as essential elements in numerous NLP applications, while parameter-efficient finetuning, especially LoRA, has gained popularity as a lightweight approach for model customization. Meanwhile, various dropout methods, initially designed for full finetuning with all the parameters updated, alleviates overfitting associated wi…

    Submitted 25 February, 2024; originally announced March 2024.

  34. arXiv:2402.18169  [pdf, ps, other]

    cs.CL

    MIKO: Multimodal Intention Knowledge Distillation from Large Language Models for Social-Media Commonsense Discovery

    Authors: Feihong Lu, Weiqi Wang, Yangyifei Luo, Ziqin Zhu, Qingyun Sun, Baixuan Xu, Haochen Shi, Shiqi Gao, Qian Li, Yangqiu Song, Jianxin Li

    Abstract: Social media has become a ubiquitous tool for connecting with others, staying updated with news, expressing opinions, and finding entertainment. However, understanding the intention behind social media posts remains challenging due to the implicitness of intentions in social media posts, the need for cross-modality understanding of both text and images, and the presence of noisy information such a…

    Submitted 29 February, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 11 pages, 5 figures

  35. arXiv:2402.18140  [pdf, other]

    cs.CV

    OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction

    Authors: Jian Liu, Sipeng Zhang, Chuixin Kong, Wenyuan Zhang, Yuhang Wu, Yikang Ding, Borun Xu, Ruibo Ming, Donglai Wei, Xianming Liu

    Abstract: This technical report presents our solution, "occTransformer" for the 3D occupancy prediction track in the autonomous driving challenge at CVPR 2023. Our method builds upon the strong baseline BEVFormer and improves its performance through several simple yet effective techniques. Firstly, we employed data augmentation to increase the diversity of the training data and improve the model's generaliz…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: Innovation Award in the 3D Occupancy Prediction Challenge (CVPR23)

  36. arXiv:2402.17493  [pdf]

    cs.CL

    Predicting postoperative risks using large language models

    Authors: Bing Xue, Charles Alba, Joanna Abraham, Thomas Kannampallil, Chenyang Lu

    Abstract: Predicting postoperative risk can inform effective care management & planning. We explored large language models (LLMs) in predicting postoperative risk through clinical texts using various tuning strategies. Records spanning 84,875 patients from Barnes Jewish Hospital (BJH) between 2018 & 2021, with a mean duration of follow-up based on the length of postoperative ICU stay less than 7 days, were…

    Submitted 5 May, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Supplemental file available at: https://sites.wustl.edu/alba/files/2024/04/supplemental_materials-283eb0c14629614c.pdf; models publicly available at: https://huggingface.co/cja5553/BJH-perioperative-notes-bioGPT

    ACM Class: J.3; I.2.7

  37. arXiv:2402.17238  [pdf, other]

    cs.LG

    Does Negative Sampling Matter? A Review with Insights into its Theory and Applications

    Authors: Zhen Yang, Ming Ding, Tinglin Huang, Yukuo Cen, Junshuai Song, Bin Xu, Yuxiao Dong, Jie Tang

    Abstract: Negative sampling has swiftly risen to prominence as a focal point of research, with wide-ranging applications spanning machine learning, computer vision, natural language processing, data mining, and recommender systems. This growing interest raises several critical questions: Does negative sampling really matter? Is there a general framework that can incorporate all existing negative sampling me…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 20 pages, 11 figures

  38. arXiv:2402.17129  [pdf, other]

    cs.IR

    Side Information-Driven Session-based Recommendation: A Survey

    Authors: Xiaokun Zhang, Bo Xu, Chenliang Li, Yao Zhou, Liangyue Li, Hongfei Lin

    Abstract: The session-based recommendation (SBR) garners increasing attention due to its ability to predict anonymous user intents within limited interactions. Emerging efforts incorporate various kinds of side information into their methods for enhancing task performance. In this survey, we thoroughly review the side information-driven session-based recommendation from a data-centric perspective. Our surve…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: This is a survey on side information-driven session-based recommendation

  39. arXiv:2402.16902  [pdf, other]

    cs.LG

    PRoLoRA: Partial Rotation Empowers More Parameter-Efficient LoRA

    Authors: Sheng Wang, Boyang Xue, Jiacheng Ye, Jiyue Jiang, Liheng Chen, Lingpeng Kong, Chuan Wu

    Abstract: With the rapid scaling of large language models (LLMs), serving numerous LoRAs concurrently has become increasingly impractical, leading to unaffordable costs and necessitating more parameter-efficient finetuning methods. In this work, we introduce Partially Rotation-enhanced Low-Rank Adaptation (PRoLoRA), an intra-layer sharing mechanism comprising four essential components: broadcast reduction,…

    Submitted 24 February, 2024; originally announced February 2024.

  40. arXiv:2402.16261  [pdf, other]

    cs.CL cs.IR

    UniRetriever: Multi-task Candidates Selection for Various Context-Adaptive Conversational Retrieval

    Authors: Hongru Wang, Boyang Xue, Baohang Zhou, Rui Wang, Fei Mi, Weichao Wang, Yasheng Wang, Kam-Fai Wong

    Abstract: Conversational retrieval refers to an information retrieval system that operates in an iterative and interactive manner, requiring the retrieval of various external resources, such as persona, knowledge, and even response, to effectively engage with the user and successfully complete the dialogue. However, most previous work trained independent retrievers for each specific resource, resulting in s…

    Submitted 28 February, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

  41. arXiv:2402.13606  [pdf, other]

    cs.CL

    A Comprehensive Study of Multilingual Confidence Estimation on Large Language Models

    Authors: Boyang Xue, Hongru Wang, Weichao Wang, Rui Wang, Sheng Wang, Zeming Liu, Kam-Fai Wong

    Abstract: The tendency of Large Language Models to generate hallucinations and exhibit overconfidence in predictions raises concerns regarding their reliability. Confidence or uncertainty estimations indicating the extent of trustworthiness of a model's response are essential to developing reliable AI systems. Current research primarily focuses on LLM confidence estimations in English, remaining a void for…

    Submitted 21 February, 2024; originally announced February 2024.

  42. arXiv:2402.13514  [pdf, other]

    cs.CL cs.AI

    Self-DC: When to retrieve and When to generate? Self Divide-and-Conquer for Compositional Unknown Questions

    Authors: Hongru Wang, Boyang Xue, Baohang Zhou, Tianhua Zhang, Cunxiang Wang, Guanhua Chen, Huimin Wang, Kam-fai Wong

    Abstract: Retrieve-then-read and generate-then-read are two typical solutions to handle unknown and known questions in open-domain question-answering, while the former retrieves necessary external knowledge and the later prompt the large language models to generate internal known knowledge encoded in the parameters. However, few of previous works consider the compositional unknown questions, which consist o…

    Submitted 20 February, 2024; originally announced February 2024.

  43. arXiv:2402.08492  [pdf]

    cs.AI

    The Application of ChatGPT in Responding to Questions Related to the Boston Bowel Preparation Scale

    Authors: Xiaoqiang Liu, Yubin Wang, Zicheng Huang, Boming Xu, Yilin Zeng, Xinqi Chen, Zilong Wang, Enning Yang, Xiaoxuan Lei, Yisen Huang, Xiaobo Liu

    Abstract: Background: Colonoscopy, a crucial diagnostic tool in gastroenterology, depends heavily on superior bowel preparation. ChatGPT, a large language model with emergent intelligence which also exhibits potential in medical applications. This study aims to assess the accuracy and consistency of ChatGPT in using the Boston Bowel Preparation Scale (BBPS) for colonoscopy assessment. Methods: We retrospect…

    Submitted 13 February, 2024; originally announced February 2024.

  44. arXiv:2402.05644  [pdf, other]

    cs.RO cs.CV

    FuncGrasp: Learning Object-Centric Neural Grasp Functions from Single Annotated Example Object

    Authors: Hanzhi Chen, Binbin Xu, Stefan Leutenegger

    Abstract: We present FuncGrasp, a framework that can infer dense yet reliable grasp configurations for unseen objects using one annotated object and single-view RGB-D observation via categorical priors. Unlike previous works that only transfer a set of grasp poses, FuncGrasp aims to transfer infinite configurations parameterized by an object-centric continuous grasp function across varying instances. To eas…

    Submitted 22 February, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted to ICRA 2024

  45. arXiv:2402.04236  [pdf, other]

    cs.CV cs.CL

    CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations

    Authors: Ji Qi, Ming Ding, Weihan Wang, Yushi Bai, Qingsong Lv, Wenyi Hong, Bin Xu, Lei Hou, Juanzi Li, Yuxiao Dong, Jie Tang

    Abstract: Vision-Language Models (VLMs) have demonstrated their widespread viability thanks to extensive training in aligning visual instructions to answers. However, this conclusive alignment leads models to ignore critical visual reasoning, and further result in failures on meticulous visual problems and unfaithful responses. In this paper, we propose Chain of Manipulations, a mechanism that enables VLMs…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 17 pages, 7 figures

  46. arXiv:2402.00904  [pdf, ps, other]

    cs.LG cs.AI

    Graph Domain Adaptation: Challenges, Progress and Prospects

    Authors: Boshen Shi, Yongqing Wang, Fangda Guo, Bingbing Xu, Huawei Shen, Xueqi Cheng

    Abstract: As graph representation learning often suffers from label scarcity problems in real-world applications, researchers have proposed graph domain adaptation (GDA) as an effective knowledge-transfer paradigm across graphs. In particular, to enhance model performance on target graphs with specific tasks, GDA introduces a bunch of task-related graphs as source graphs and adapts the knowledge learnt from…

    Submitted 31 January, 2024; originally announced February 2024.

  47. arXiv:2402.00324  [pdf, other]

    cs.LG

    A Consistent Lebesgue Measure for Multi-label Learning

    Authors: Kaan Demir, Bach Nguyen, Bing Xue, Mengjie Zhang

    Abstract: Multi-label loss functions are usually non-differentiable, requiring surrogate loss functions for gradient-based optimisation. The consistency of surrogate loss functions is not proven and is exacerbated by the conflicting nature of multi-label loss functions. To directly learn from multiple related, yet potentially conflicting multi-label loss functions, we propose a Consistent Lebesgue Measure-b…

    Submitted 31 January, 2024; originally announced February 2024.

  48. arXiv:2401.17603  [pdf, other]

    cs.CV

    Topology-Aware Latent Diffusion for 3D Shape Generation

    Authors: Jiangbei Hu, Ben Fei, Baixin Xu, Fei Hou, Weidong Yang, Shengfa Wang, Na Lei, Chen Qian, Ying He

    Abstract: We introduce a new generative model that combines latent diffusion with persistent homology to create 3D shapes with high diversity, with a special emphasis on their topological characteristics. Our method involves representing 3D shapes as implicit fields, then employing persistent homology to extract topological features, including Betti numbers and persistence diagrams. The shape generation pro…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 16 pages, 9 figures

    ACM Class: I.3.5; I.2.10

  49. arXiv:2401.15459  [pdf, other]

    cs.SE

    Multi-LLM Collaboration + Data-Centric Innovation = 2x Better Vulnerability Repair

    Authors: Xin Zhou, Kisub Kim, Bowen Xu, DongGyun Han, David Lo

    Abstract: The advances of deep learning (DL) have paved the way for automatic software vulnerability repair approaches, which effectively learn the mapping from the vulnerable code to the fixed code. Nevertheless, existing DL-based vulnerability repair methods face notable limitations: 1) they struggle to handle lengthy vulnerable code, 2) they treat code as natural language texts, neglecting its inherent s…

    Submitted 12 March, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

    Comments: Accepted in the ICSE 2024 Research Track with a different title "Out of Sight, Out of Mind: Better Automatic Vulnerability Repair by Broadening Input Ranges and Sources"

  50. MultiTest: Physical-Aware Object Insertion for Testing Multi-sensor Fusion Perception Systems

    Authors: Xinyu Gao, Zhijie Wang, Yang Feng, Lei Ma, Zhenyu Chen, Baowen Xu

    Abstract: Multi-sensor fusion stands as a pivotal technique in addressing numerous safety-critical tasks and applications, e.g., self-driving cars and automated robotic arms. With the continuous advancement in data-driven artificial intelligence (AI), MSF's potential for sensing and understanding intricate external environments has been further amplified, bringing a profound impact on intelligent systems an…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: The first two authors contributed equally. To appear in the proceedings of the 46th International Conference on Software Engineering (ICSE 2024)