Showing 1–50 of 487 results for author: Shen, X

Searching in archive cs.
  1. arXiv:2405.03946  [pdf]

    cs.SI

    Association between centrality and flourishing trait: analyzing student co-occurrence networks drawn from dining activities

    Authors: Yi Cao, Shimin Cai, Xiaorong Shen, Tao Zhou

    Abstract: Comprehending the association between social capabilities and individual psychological traits is paramount for educational administrators. Presently, many studies heavily depend on online questionnaires and self-reported data, while analysis of the connection between offline social networks and mental health status remains scarce. By leveraging a public dataset encompassing on-campus dining activi…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 14 pages, 2 figures, 1 Table

  2. arXiv:2405.03486  [pdf, other]

    cs.CR cs.CV cs.SI

    UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images

    Authors: Yiting Qu, Xinyue Shen, Yixin Wu, Michael Backes, Savvas Zannettou, Yang Zhang

    Abstract: Image safety classifiers play an important role in identifying and mitigating the spread of unsafe images online (e.g., images including violence, hateful rhetoric, etc.). At the same time, with the advent of text-to-image models and increasing concerns about the safety of AI models, developers are increasingly relying on image safety classifiers to safeguard their models. Yet, the performance of…

    Submitted 6 May, 2024; originally announced May 2024.

  3. arXiv:2405.01221  [pdf, other]

    cs.NI

    A Survey on Semantic Communication Networks: Architecture, Security, and Privacy

    Authors: Shaolong Guo, Yuntao Wang, Ning Zhang, Zhou Su, Tom H. Luan, Zhiyi Tian, Xuemin Shen

    Abstract: Semantic communication, emerging as a breakthrough beyond the classical Shannon paradigm, aims to convey the essential meaning of source data rather than merely focusing on precise yet content-agnostic bit transmission. By interconnecting diverse intelligent agents (e.g., autonomous vehicles and VR devices) via semantic communications, the semantic communication networks (SemComNet) supports seman…

    Submitted 2 May, 2024; originally announced May 2024.

  4. arXiv:2404.16812  [pdf, other]

    cs.DC

    ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs

    Authors: Xinning Hui, Yuanchao Xu, Zhishan Guo, Xipeng Shen

    Abstract: Recent years have witnessed increasing interest in machine learning inferences on serverless computing for its auto-scaling and cost effective properties. Existing serverless computing, however, lacks effective job scheduling methods to handle the schedule space dramatically expanded by GPU sharing, task batching, and inter-task relations. Prior solutions have dodged the issue by neglecting some i…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in the 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC'24)

  5. arXiv:2404.15625  [pdf, other]

    cs.LG

    Optimizing OOD Detection in Molecular Graphs: A Novel Approach with Diffusion Models

    Authors: Xu Shen, Yili Wang, Kaixiong Zhou, Shirui Pan, Xin Wang

    Abstract: The open-world test dataset is often mixed with out-of-distribution (OOD) samples, where the deployed models will struggle to make accurate predictions. Traditional detection methods need to trade off OOD detection and in-distribution (ID) classification performance since they share the same representation learning model. In this work, we propose to detect OOD molecules by adopting an auxiliary di…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 11 pages, 10 figures

  6. arXiv:2404.14720  [pdf, other]

    cs.CR

    Incorporating Gradients to Rules: Towards Lightweight, Adaptive Provenance-based Intrusion Detection

    Authors: Lingzhi Wang, Xiangmin Shen, Weijian Li, Zhenyuan Li, R. Sekar, Han Liu, Yan Chen

    Abstract: As cyber-attacks become increasingly sophisticated and stealthy, it becomes more imperative and challenging to detect intrusion from normal behaviors. Through fine-grained causality analysis, provenance-based intrusion detection systems (PIDS) demonstrated a promising capacity to distinguish benign and malicious behaviors, attracting widespread attention from both industry and academia. Among dive…

    Submitted 22 April, 2024; originally announced April 2024.

  7. arXiv:2404.14381  [pdf, other]

    cs.CV cs.MM

    TAVGBench: Benchmarking Text to Audible-Video Generation

    Authors: Yuxin Mao, Xuyang Shen, Jing Zhang, Zhen Qin, Jinxing Zhou, Mochu Xiang, Yiran Zhong, Yuchao Dai

    Abstract: The Text to Audible-Video Generation (TAVG) task involves generating videos with accompanying audio based on text descriptions. Achieving this requires skillful alignment of both audio and video elements. To support research in this field, we have developed a comprehensive Text to Audible-Video Generation Benchmark (TAVGBench), which contains over 1.7 million clips with a total duration of 11.8 th…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Technical Report. Project page: https://github.com/OpenNLPLab/TAVGBench

  8. arXiv:2404.14122  [pdf, other]

    cs.CL

    Fine-Tuning Large Language Models to Translate: Will a Touch of Noisy Data in Misaligned Languages Suffice?

    Authors: Dawei Zhu, Pinzhen Chen, Miaoran Zhang, Barry Haddow, Xiaoyu Shen, Dietrich Klakow

    Abstract: Traditionally, success in multilingual machine translation can be attributed to three key factors in training data: large volume, diverse translation directions, and high quality. In the current practice of fine-tuning large language models (LLMs) for translation, we revisit the importance of all these factors. We find that LLMs display strong translation capability after being fine-tuned on as fe…

    Submitted 22 April, 2024; originally announced April 2024.

  9. arXiv:2404.13898  [pdf, other]

    cs.NI

    Cross-Modal Generative Semantic Communications for Mobile AIGC: Joint Semantic Encoding and Prompt Engineering

    Authors: Yinqiu Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Shiwen Mao, Ping Zhang, Xuemin Shen

    Abstract: Employing massive Mobile AI-Generated Content (AIGC) Service Providers (MASPs) with powerful models, high-quality AIGC services can become accessible for resource-constrained end users. However, this advancement, referred to as mobile AIGC, also introduces a significant challenge: users should download large AIGC outputs from the MASPs, leading to substantial bandwidth consumption and potential tr…

    Submitted 22 April, 2024; originally announced April 2024.

  10. arXiv:2404.13749  [pdf, other]

    cs.NI

    Efficient Digital Twin Data Processing for Low-Latency Multicast Short Video Streaming

    Authors: Xinyu Huang, Shisheng Hu, Mushu Li, Cheng Huang, Xuemin Shen

    Abstract: In this paper, we propose a novel efficient digital twin (DT) data processing scheme to reduce service latency for multicast short video streaming. Particularly, DT is constructed to emulate and analyze user status for multicast group update and swipe feature abstraction. Then, a precise measurement model of DT data processing is developed to characterize the relationship among DT model size, user…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 6 pages, 6 figures, submitted to ICCC 2024

  11. arXiv:2404.13649  [pdf, other]

    stat.ML cs.LG stat.ME

    Distributional Principal Autoencoders

    Authors: Xinwei Shen, Nicolai Meinshausen

    Abstract: Dimension reduction techniques usually lose information in the sense that reconstructed data are not identical to the original data. However, we argue that it is possible to have reconstructed data identically distributed as the original data, irrespective of the retained dimension or the specific mapping. This can be achieved by learning a distributional model that matches the conditional distrib…

    Submitted 21 April, 2024; originally announced April 2024.

  12. arXiv:2404.13528  [pdf, other]

    cs.LG cs.AI cs.DC

    SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile

    Authors: Wei Niu, Md Musfiqur Rahman Sanim, Zhihao Shu, Jiexiong Guan, Xipeng Shen, Miao Yin, Gagan Agrawal, Bin Ren

    Abstract: This work is motivated by recent developments in Deep Neural Networks, particularly the Transformer architectures underlying applications such as ChatGPT, and the need for performing inference on mobile devices. Focusing on emerging transformers (specifically the ones with computationally efficient Swin-like architectures) and large models (e.g., Stable Diffusion and LLMs) based on transformers, w…

    Submitted 21 April, 2024; originally announced April 2024.

  13. arXiv:2404.12567  [pdf]

    cs.HC

    Impact of Vibrotactile Triggers on Mental Well-Being through ASMR Experience in VR

    Authors: Danyang Peng, Tanner Person, Ximing Shen, Yun Suen Pai, Giulia Barbareschi, Shengyin Li, Kouta Minamizawa

    Abstract: Watching Autonomous Sensory Meridian Response (ASMR) videos is a popular approach to support mental well-being, as the triggered ASMR tingling sensation supports de-stressing and regulating emotions. Therefore, there is increasing research on how to efficiently trigger ASMR tingling sensation. Tactile sensation remains unexplored because current popular ASMR approaches focus on the visual and audi…

    Submitted 18 April, 2024; originally announced April 2024.

  14. arXiv:2404.12241  [pdf, other]

    cs.CL cs.AI

    Introducing v0.5 of the AI Safety Benchmark from MLCommons

    Authors: Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota, et al. (72 additional authors not shown)

    Abstract: This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-pu…

    Submitted 18 April, 2024; originally announced April 2024.

  15. arXiv:2404.11288  [pdf, other]

    cs.CL

    A Preference-driven Paradigm for Enhanced Translation with Large Language Models

    Authors: Dawei Zhu, Sony Trenous, Xiaoyu Shen, Dietrich Klakow, Bill Byrne, Eva Hasler

    Abstract: Recent research has shown that large language models (LLMs) can achieve remarkable translation performance through supervised fine-tuning (SFT) using only a small amount of parallel data. However, SFT simply instructs the model to imitate the reference translations at the token level, making it vulnerable to the noise present in the references. Hence, the assistance from SFT often reaches a platea…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024 (long, main)

  16. arXiv:2404.11276  [pdf, other]

    cs.AI q-fin.GN

    RD2Bench: Toward Data-Centric Automatic R&D

    Authors: Haotian Chen, Xinjie Shen, Zeqi Ye, Xiao Yang, Xu Yang, Weiqing Liu, Jiang Bian

    Abstract: The progress of humanity is driven by those successful discoveries accompanied by countless failed experiments. Researchers often seek the potential research directions by reading and then verifying them through experiments. The process imposes a significant burden on researchers. In the past decade, the data-driven black-box deep learning method demonstrates its effectiveness in a wide range of r…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 17 pages, 5 figures

  17. arXiv:2404.08677  [pdf, other]

    cs.IR cs.AI cs.CL

    PMG : Personalized Multimodal Generation with Large Language Models

    Authors: Xiaoteng Shen, Rui Zhang, Xiaoyan Zhao, Jieming Zhu, Xi Xiao

    Abstract: The emergence of large language models (LLMs) has revolutionized the capabilities of text comprehension and generation. Multi-modal generation attracts great attention from both the industry and academia, but there is little work on personalized generation, which has important applications such as recommender systems. This paper proposes the first method for personalized multimodal generation usin…

    Submitted 6 April, 2024; originally announced April 2024.

  18. arXiv:2404.08639  [pdf, other]

    cs.CV

    COCONut: Modernizing COCO Segmentation

    Authors: Xueqing Deng, Qihang Yu, Peng Wang, Xiaohui Shen, Liang-Chieh Chen

    Abstract: In recent decades, the vision community has witnessed remarkable progress in visual recognition, partially owing to advancements in dataset benchmarks. Notably, the established COCO benchmark has propelled the development of modern detection and segmentation systems. However, the COCO segmentation benchmark has seen comparatively slow improvement over the last decade. Originally equipped with coar…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR2024, data available at https://xdeng7.github.io/coconut.github.io/

  19. arXiv:2404.07904  [pdf, other]

    cs.CL

    HGRN2: Gated Linear RNNs with State Expansion

    Authors: Zhen Qin, Songlin Yang, Weixuan Sun, Xuyang Shen, Dong Li, Weigao Sun, Yiran Zhong

    Abstract: Hierarchically gated linear RNN (HGRN, Qin et al. 2023) has demonstrated competitive training speed and performance in language modeling, while offering efficient inference. However, the recurrent state size of HGRN remains relatively small, which limits its expressiveness. To address this issue, inspired by linear attention, we introduce a simple outer-product-based state expansion mechanism so tha…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: Technical Report. Yiran Zhong is the corresponding author. The source code is available at https://github.com/OpenNLPLab/HGRN2

  20. arXiv:2404.06798  [pdf, other]

    cs.CV

    MedRG: Medical Report Grounding with Multi-modal Large Language Model

    Authors: Ke Zou, Yang Bai, Zhihao Chen, Yang Zhou, Yidi Chen, Kai Ren, Meng Wang, Xuedong Yuan, Xiaojing Shen, Huazhu Fu

    Abstract: Medical Report Grounding is pivotal in identifying the most relevant regions in medical images based on a given phrase query, a critical aspect in medical image analysis and radiological diagnosis. However, prevailing visual grounding approaches necessitate the manual extraction of key phrases from medical reports, imposing substantial burdens on both system efficiency and physicians. In this pape…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 12 pages, 4 figures

  21. arXiv:2404.06182  [pdf, other]

    cs.NI

    Streamlined Transmission: A Semantic-Aware XR Deployment Framework Enhanced by Generative AI

    Authors: Wanting Yang, Zehui Xiong, Tony Q. S. Quek, Xuemin Shen

    Abstract: In the era of 6G, featuring compelling visions of digital twins and metaverses, Extended Reality (XR) has emerged as a vital conduit connecting the digital and physical realms, garnering widespread interest. Ensuring a fully immersive wireless XR experience stands as a paramount technical necessity, demanding the liberation of XR from the confines of wired connections. In this paper, we first intr…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Under review with IEEE Network

  22. arXiv:2404.05689  [pdf, other]

    cs.LG cs.AI

    Automated discovery of symbolic laws governing skill acquisition from naturally occurring data

    Authors: Sannyuya Liu, Qing Li, Xiaoxuan Shen, Jianwen Sun, Zongkai Yang

    Abstract: Skill acquisition is a key area of research in cognitive psychology as it encompasses multiple psychological processes. The laws discovered under experimental paradigms are controversial and lack generalizability. This paper aims to unearth the laws of skill learning from large-scale training log data. A two-stage algorithm was developed to tackle the issues of unobservable cognitive states and al…

    Submitted 8 April, 2024; originally announced April 2024.

  23. arXiv:2404.05029  [pdf, other]

    cs.CV

    LOGO: A Long-Form Video Dataset for Group Action Quality Assessment

    Authors: Shiyi Zhang, Wenxun Dai, Sujia Wang, Xiangwei Shen, Jiwen Lu, Jie Zhou, Yansong Tang

    Abstract: Action quality assessment (AQA) has become an emerging topic since it can be extensively applied in numerous scenarios. However, most existing methods and datasets focus on single-person short-sequence scenes, hindering the application of AQA in more complex situations. To address this issue, we construct a new multi-person long-form video dataset for action quality assessment named LOGO. Distingu…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2023

  24. arXiv:2404.03413  [pdf, other]

    cs.CV

    MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens

    Authors: Kirolos Ataallah, Xiaoqian Shen, Eslam Abdelrahman, Essam Sleiman, Deyao Zhu, Jian Ding, Mohamed Elhoseiny

    Abstract: This paper introduces MiniGPT4-Video, a multimodal Large Language Model (LLM) designed specifically for video understanding. The model is capable of processing both temporal visual and textual data, making it adept at understanding the complexities of videos. Building upon the success of MiniGPT-v2, which excelled in translating visual features into the LLM space for single images and achieved imp…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: 6 pages, 8 figures

  25. arXiv:2404.03025  [pdf, other]

    cs.NI

    When Digital Twin Meets Generative AI: Intelligent Closed-Loop Network Management

    Authors: Xinyu Huang, Haojun Yang, Conghao Zhou, Mingcheng He, Xuemin Shen, Weihua Zhuang

    Abstract: Generative artificial intelligence (GAI) and digital twin (DT) are advanced data processing and virtualization technologies to revolutionize communication networks. Thanks to the powerful data processing capabilities of GAI, integrating it into DT is a potential approach to construct an intelligent holistic virtualized network for better network management performance. To this end, we propose a GA…

    Submitted 8 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: 8 pages, 5 figures

  26. arXiv:2404.02882  [pdf, other]

    cs.LG cs.CL

    Linear Attention Sequence Parallelism

    Authors: Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong

    Abstract: Sequence Parallel (SP) serves as a prevalent strategy to handle long sequences that exceed the memory limit of a single GPU. However, existing SP methods do not take advantage of linear attention features, resulting in sub-optimal parallelism efficiency and usability for linear attention-based language models. In this paper, we introduce Linear Attention Sequence Parallel (LASP), an efficient SP m…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Technical Report. Weigao Sun and Zhen Qin contribute equally to this paper. Yiran Zhong is the corresponding author. The code is available at https://github.com/OpenNLPLab/LASP

  27. arXiv:2404.02132  [pdf, other]

    cs.CV

    ViTamin: Designing Scalable Vision Models in the Vision-Language Era

    Authors: Jieneng Chen, Qihang Yu, Xiaohui Shen, Alan Yuille, Liang-Chieh Chen

    Abstract: Recent breakthroughs in vision-language models (VLMs) start a new page in the vision community. The VLMs provide stronger and more generalizable feature embeddings compared to those from ImageNet-pretrained models, thanks to the training on the large-scale Internet image-text pairs. However, despite the amazing achievement from the VLMs, vanilla Vision Transformers (ViTs) remain the default choice…

    Submitted 3 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: CVPR 2024; https://github.com/Beckschen/ViTamin

  28. arXiv:2403.16408  [pdf, other]

    cs.NI eess.SP

    Accuracy-Aware Cooperative Sensing and Computing for Connected Autonomous Vehicles

    Authors: Xuehan Ye, Kaige Qu, Weihua Zhuang, Xuemin Shen

    Abstract: To maintain high perception performance among connected and autonomous vehicles (CAVs), in this paper, we propose an accuracy-aware and resource-efficient raw-level cooperative sensing and computing scheme among CAVs and road-side infrastructure. The scheme enables fine-grained partial raw sensing data selection, transmission, fusion, and processing in per-object granularity, by exploiting the pa…

    Submitted 24 March, 2024; originally announced March 2024.

  29. arXiv:2403.14346  [pdf, other]

    cs.CV

    Towards Efficient Information Fusion: Concentric Dual Fusion Attention Based Multiple Instance Learning for Whole Slide Images

    Authors: Yujian Liu, Ruoxuan Wu, Xinjie Shen, Zihuang Lu, Lingyu Liang, Haiyu Zhou, Shipu Xu, Shaoai Cai, Shidang Xu

    Abstract: In the realm of digital pathology, multi-magnification Multiple Instance Learning (multi-mag MIL) has proven effective in leveraging the hierarchical structure of Whole Slide Images (WSIs) to reduce information loss and redundant data. However, current methods fall short in bridging the domain gap between pretrained models and medical imaging, and often fail to account for spatial relationships ac…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 14 pages, 7 figures

  30. arXiv:2403.12541  [pdf, other]

    cs.CR

    TAGS: Real-time Intrusion Detection with Tag-Propagation-based Provenance Graph Alignment on Streaming Events

    Authors: Zhenyuan Li, Yangyang Wei, Xiangmin Shen, Lingzhi Wang, Yan Chen, Haitao Xu, Shouling Ji, Fan Zhang

    Abstract: The evolution and advancement of cyberattacks pose challenges to existing security products. Recent concentrated research on provenance graph-based detection has proved its effectiveness in attack detection and investigation. However, implementing these approaches in practice encounters challenges such as high overhead, slow responsiveness, and low interpretability and extensibility. Towards pra…

    Submitted 19 March, 2024; originally announced March 2024.

  31. arXiv:2403.08826  [pdf, other]

    cs.HC cs.LG

    A Dataset for the Validation of Truth Inference Algorithms Suitable for Online Deployment

    Authors: Fei Wang, Haoyu Liu, Haoyang Bi, Xiangzhuang Shen, Renyu Zhu, Runze Wu, Minmin Lin, Tangjie Lv, Changjie Fan, Qi Liu, Zhenya Huang, Enhong Chen

    Abstract: For the purpose of efficient and cost-effective large-scale data labeling, crowdsourcing is increasingly being utilized. To guarantee the quality of data labeling, multiple annotations need to be collected for each data sample, and truth inference algorithms have been developed to accurately infer the true labels. Despite previous studies having released public datasets to evaluate the efficacy of…

    Submitted 10 March, 2024; originally announced March 2024.

  32. arXiv:2403.06563  [pdf, other]

    cs.LG cs.CL

    Unraveling the Mystery of Scaling Laws: Part I

    Authors: Hui Su, Zhi Tian, Xiaoyu Shen, Xunliang Cai

    Abstract: Scaling law principles indicate a power-law correlation between loss and variables such as model size, dataset size, and computational resources utilized during training. These principles play a vital role in optimizing various aspects of model pre-training, ultimately contributing to the success of large language models such as GPT-4, Llama and Gemini. However, the original scaling law paper by O…

    Submitted 5 April, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  33. arXiv:2403.05014  [pdf, other]

    cs.LG cs.AI

    Simple Multigraph Convolution Networks

    Authors: Danyang Wu, Xinjie Shen, Jitao Lu, Jin Xu, Feiping Nie

    Abstract: Existing multigraph convolution methods either ignore the cross-view interaction among multiple graphs, or induce extremely high computational cost due to standard cross-view polynomial operators. To alleviate this problem, this paper proposes a Simple MultiGraph Convolution Networks (SMGCN) which first extracts consistent cross-view topology from multigraphs including edge-level and subgraph-leve…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted by WWW 2024 Short

  34. arXiv:2403.04258  [pdf, other]

    cs.CV

    Depth-aware Test-Time Training for Zero-shot Video Object Segmentation

    Authors: Weihuang Liu, Xi Shen, Haolun Li, Xiuli Bi, Bo Liu, Chi-Man Pun, Xiaodong Cun

    Abstract: Zero-shot Video Object Segmentation (ZSVOS) aims at segmenting the primary moving object without any human annotations. Mainstream solutions mainly focus on learning a single model on large-scale video datasets, which struggle to generalize to unseen videos. In this work, we introduce a test-time training (TTT) strategy to address the problem. Our key insight is to enforce the model to predict con…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  35. arXiv:2403.02647  [pdf, other]

    cs.CL cs.AI

    FinReport: Explainable Stock Earnings Forecasting via News Factor Analyzing Model

    Authors: Xiangyu Li, Xinjie Shen, Yawen Zeng, Xiaofen Xing, Jin Xu

    Abstract: The task of stock earnings forecasting has received considerable attention due to demand from investors in real-world scenarios. However, compared with financial institutions, it is not easy for ordinary investors to mine factors and analyze news. On the other hand, although large language models in the financial field can serve users in the form of dialogue robots, it still requires users to have…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted by WWW 2024

  36. arXiv:2403.01758  [pdf, other]

    eess.IV cs.CV cs.LG

    AFBT GAN: enhanced explainability and diagnostic performance for cognitive decline by counterfactual generative adversarial network

    Authors: Xiongri Shen, Zhenxi Song, Zhiguo Zhang

    Abstract: Existing explanation results of functional connectivity (FC) are normally generated by using classification result labels and correlation analysis methods such as Pearson's correlation or gradient backward. However, the diagnostic model is still trained on the black box model and might lack the attention of FCs in important regions during the training. To enhance the explainability and improve dia…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 10 pages, 5 figures

  37. arXiv:2403.00543  [pdf, other]

    cs.CV

    SURE: SUrvey REcipes for building reliable and robust deep networks

    Authors: Yuting Li, Yingyi Chen, Xuanlong Yu, Dexiong Chen, Xi Shen

    Abstract: In this paper, we revisit techniques for uncertainty estimation within deep neural networks and consolidate a suite of techniques to enhance their reliability. Our investigation reveals that an integrated application of diverse techniques--spanning model regularization, classifier and optimization--substantially improves the accuracy of uncertainty predictions in image classification tasks. The sy…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR2024

  38. arXiv:2402.19095  [pdf]

    q-bio.BM cs.LG

    A Protein Structure Prediction Approach Leveraging Transformer and CNN Integration

    Authors: Yanlin Zhou, Kai Tan, Xinyu Shen, Zheng He, Haotian Zheng

    Abstract: Proteins are essential for life, and their structure determines their function. The protein secondary structure is formed by the folding of the protein primary structure, and the protein tertiary structure is formed by the bending and folding of the secondary structure. Therefore, the study of protein secondary structure is very helpful to the overall understanding of protein structure. Although t…

    Submitted 8 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  39. arXiv:2402.17213  [pdf, other]

    cs.CV cs.AI

    VCD: Knowledge Base Guided Visual Commonsense Discovery in Images

    Authors: Xiangqing Shen, Yurun Song, Siwei Wu, Rui Xia

    Abstract: Visual commonsense contains knowledge about object properties, relationships, and behaviors in visual data. Discovering visual commonsense can provide a more comprehensive and richer understanding of images, and enhance the reasoning and decision-making capabilities of computer vision systems. However, the visual commonsense defined in existing visual commonsense discovery studies is coarse-graine…

    Submitted 27 February, 2024; originally announced February 2024.

  40. arXiv:2402.16374  [pdf, other]

    cs.LG cs.SI

    Graph Learning under Distribution Shifts: A Comprehensive Survey on Domain Adaptation, Out-of-distribution, and Continual Learning

    Authors: Man Wu, Xin Zheng, Qin Zhang, Xiao Shen, Xiong Luo, Xingquan Zhu, Shirui Pan

    Abstract: Graph learning plays a pivotal role and has gained significant attention in various application scenarios, from social network analysis to recommendation systems, for its effectiveness in modeling complex data relations represented by graph structural data. In reality, the real-world graph data typically show dynamics over time, with changing node attributes and edge structure, leading to the seve…

    Submitted 7 March, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  41. arXiv:2402.15900  [pdf, other]

    cs.NI

    Dedicated Restricted Target Wake Time for Real-Time Applications in Wi-Fi 7

    Authors: Andrey Belogaev, Xiaoman Shen, Chun Pan, Xingfeng Jiang, Chris Blondia, Jeroen Famaey

    Abstract: Real-time applications (RTA) tend to play a crucial role in people's everyday life. Such applications are among the key use cases for the next generations of wireless technologies. RTA applications are characterized by strict guaranteed delay requirements (in the order of a few milliseconds). One of the pillars of enabling RTA in next-generation Wi-Fi standards is Restricted Target Wake Time (R-TW…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: IEEE WCNC 2024

  42. arXiv:2402.15796  [pdf]

    cs.AI cs.HC

    Construction and application of artificial intelligence crowdsourcing map based on multi-track GPS data

    Authors: Yong Wang, Yanlin Zhou, Huan Ji, Zheng He, Xinyu Shen

    Abstract: In recent years, the rapid development of high-precision map technology combined with artificial intelligence has ushered in a new development opportunity in the field of intelligent vehicles. High-precision map technology is an important guarantee for intelligent vehicles to achieve autonomous driving. However, due to the lack of research on high-precision map technology, it is difficult to ratio…

    Submitted 24 February, 2024; originally announced February 2024.

  43. arXiv:2402.14213  [pdf]

    q-bio.NC cs.LG eess.SP

    Contrastive Learning of Shared Spatiotemporal EEG Representations Across Individuals for Naturalistic Neuroscience

    Authors: Xinke Shen, Lingyi Tao, Xuyang Chen, Sen Song, Quanying Liu, Dan Zhang

    Abstract: Neural representations induced by naturalistic stimuli offer insights into how humans respond to peripheral stimuli in daily life. The key to understanding the general neural mechanisms underlying naturalistic stimuli processing involves aligning neural activities across individuals and extracting inter-subject shared neural representations. Targeting the Electroencephalogram (EEG) technique, know…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: 52 pages, 14 figures

  44. arXiv:2402.12976  [pdf, other]

    cs.CL cs.AI

    The Impact of Demonstrations on Multilingual In-Context Learning: A Multidimensional Analysis

    Authors: Miaoran Zhang, Vagrant Gautam, Mingyang Wang, Jesujoba O. Alabi, Xiaoyu Shen, Dietrich Klakow, Marius Mosbach

    Abstract: In-context learning is a popular inference strategy where large language models solve a task using only a few labelled demonstrations without needing any parameter updates. Compared to work on monolingual (English) in-context learning, multilingual in-context learning is under-explored, and we lack an in-depth understanding of the role of demonstrations in this context. To address this gap, we con…

    Submitted 20 February, 2024; originally announced February 2024.

  45. arXiv:2402.11853  [pdf, other]

    cs.HC cs.CY cs.RO cs.SD

    Beyond Voice Assistants: Exploring Advantages and Risks of an In-Car Social Robot in Real Driving Scenarios

    Authors: Yuanchao Li, Lachlan Urquhart, Nihan Karatas, Shun Shao, Hiroshi Ishiguro, Xun Shen

    Abstract: In-car Voice Assistants (VAs) play an increasingly critical role in automotive user interface design. However, existing VAs primarily perform simple 'query-answer' tasks, limiting their ability to sustain drivers' long-term attention. In this study, we investigate the effectiveness of an in-car Robot Assistant (RA) that offers functionalities beyond voice interaction. We aim to answer the question…

    Submitted 20 February, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: Submitted to ACM Transactions on Computer-Human Interaction

  46. arXiv:2402.11095  [pdf, other]

    cs.CV

    GIM: Learning Generalizable Image Matcher From Internet Videos

    Authors: Xuelun Shen, Zhipeng Cai, Wei Yin, Matthias Müller, Zijun Li, Kaixuan Wang, Xiaozhi Chen, Cheng Wang

    Abstract: Image matching is a fundamental computer vision problem. While learning-based methods achieve state-of-the-art performance on existing benchmarks, they generalize poorly to in-the-wild images. Such methods typically need to train separate models for different scene types and are impractical when the scene type is unknown in advance. One of the underlying problems is the limited scalability of exis…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted to ICLR 2024 for spotlight presentation

  47. arXiv:2402.10787  [pdf, other]

    cs.LG cs.AI cs.CL

    EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge

    Authors: Xuan Shen, Zhenglun Kong, Changdi Yang, Zhaoyang Han, Lei Lu, Peiyan Dong, Cheng Lyu, Chih-hsiang Li, Xuehang Guo, Zhihao Shu, Wei Niu, Miriam Leeser, Pu Zhao, Yanzhi Wang

    Abstract: Despite the remarkable strides of Large Language Models (LLMs) in various fields, the wide applications of LLMs on edge devices are limited due to their massive parameters and computations. To address this, quantization is commonly adopted to generate lightweight LLMs with efficient computations and fast inference. However, Post-Training Quantization (PTQ) methods dramatically degrade in quality w…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Preprint

  48. arXiv:2402.09460  [pdf, other]

    eess.SP cs.LG

    Unsupervised learning based end-to-end delayless generative fixed-filter active noise control

    Authors: Zhengding Luo, Dongyuan Shi, Xiaoyi Shen, Woon-Seng Gan

    Abstract: Delayless noise control is achieved by our earlier generative fixed-filter active noise control (GFANC) framework through efficient coordination between the co-processor and real-time controller. However, the one-dimensional convolutional neural network (1D CNN) in the co-processor requires initial training using labelled noise datasets. Labelling noise data can be resource-intensive and may intro…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

  49. arXiv:2402.05668  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    Comprehensive Assessment of Jailbreak Attacks Against LLMs

    Authors: Junjie Chu, Yugeng Liu, Ziqing Yang, Xinyue Shen, Michael Backes, Yang Zhang

    Abstract: Misuse of the Large Language Models (LLMs) has raised widespread concern. To address this issue, safeguards have been taken to ensure that LLMs align with social ethics. However, recent findings have revealed an unsettling vulnerability bypassing the safeguards of LLMs, known as jailbreak attacks. By applying techniques, such as employing role-playing scenarios, adversarial examples, or subtle sub…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 18 pages, 12 figures

  50. arXiv:2402.04779  [pdf, other]

    cs.CL cs.AI

    StableMask: Refining Causal Masking in Decoder-only Transformer

    Authors: Qingyu Yin, Xuzheng He, Xiang Zhuang, Yu Zhao, Jianhua Yao, Xiaoyu Shen, Qiang Zhang

    Abstract: The decoder-only Transformer architecture with causal masking and relative position encoding (RPE) has become the de facto choice in language modeling. Despite its exceptional performance across various tasks, we have identified two limitations: First, it requires all attention scores to be non-zero and sum up to 1, even if the current embedding has sufficient self-contained information. This comp…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: Preprint