Showing 1–50 of 8,906 results for author: Wang, Y

Searching in archive cs.
  1. arXiv:2405.05957  [pdf, other]

    cs.CL

    OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning

    Authors: Dan Qiao, Yi Su, Pinzheng Wang, Jing Ye, Wenjing Xie, Yuechi Zhou, Yuyang Ding, Zecheng Tang, Jikai Wang, Yixin Ji, Yue Wang, Pei Guo, Zechen Sun, Zikang Zhang, Juntao Li, Pingfu Chao, Wenliang Chen, Guohong Fu, Guodong Zhou, Qiaoming Zhu, Min Zhang

    Abstract: Large Language Models (LLMs) have played an important role in many fields due to their powerful capabilities. However, their massive number of parameters leads to high deployment requirements and incurs significant inference costs, which impedes their practical applications. Training smaller models is an effective way to address this problem. Therefore, we introduce OpenBA-V2, a 3.4B model derived…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.05841  [pdf, other]

    cs.CV

    Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition

    Authors: Zuan Gao, Yuxin Wang, Yadong Qu, Boqiang Zhang, Zixiao Wang, Jianjun Xu, Hongtao Xie

    Abstract: In text recognition, self-supervised pre-training emerges as a good solution to reduce dependence on expansive annotated real data. Previous studies primarily focus on local visual representation by leveraging mask image modeling or sequence contrastive learning. However, they omit modeling the linguistic information in text images, which is crucial for recognizing text. To simultaneously capture…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted to IJCAI2024

  3. arXiv:2405.05786  [pdf, other]

    cs.LG

    FusionTransNet for Smart Urban Mobility: Spatiotemporal Traffic Forecasting Through Multimodal Network Integration

    Authors: Binwu Wang, Yan Leng, Guang Wang, Yang Wang

    Abstract: This study develops FusionTransNet, a framework designed for Origin-Destination (OD) flow predictions within smart and multimodal urban transportation systems. Urban transportation complexity arises from the spatiotemporal interactions among various traffic modes. Motivated by analyzing multimodal data from Shenzhen, a framework that can dissect complicated spatiotemporal interactions between thes…

    Submitted 9 May, 2024; originally announced May 2024.

  4. arXiv:2405.05691  [pdf, other]

    cs.CV cs.MM

    StableMoFusion: Towards Robust and Efficient Diffusion-based Motion Generation Framework

    Authors: Yiheng Huang, Hui Yang, Chuanchen Luo, Yuxi Wang, Shibiao Xu, Zhaoxiang Zhang, Man Zhang, Junran Peng

    Abstract: Thanks to the powerful generative capacity of diffusion models, recent years have witnessed rapid progress in human motion generation. Existing diffusion-based methods employ disparate network architectures and training strategies. The effect of the design of each component is still unclear. In addition, the iterative denoising process consumes considerable computational overhead, which is prohibi…

    Submitted 9 May, 2024; originally announced May 2024.

  5. arXiv:2405.05672  [pdf, other]

    cs.CV

    Multi-Stream Keypoint Attention Network for Sign Language Recognition and Translation

    Authors: Mo Guan, Yan Wang, Guangkun Ma, Jiarui Liu, Mingzu Sun

    Abstract: Sign language serves as a non-vocal means of communication, transmitting information and significance through gestures, facial expressions, and bodily movements. The majority of current approaches for sign language recognition (SLR) and translation rely on RGB video inputs, which are vulnerable to fluctuations in the background. Employing a keypoint-based strategy not only mitigates the effects of…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 15 pages

  6. arXiv:2405.05665  [pdf, other]

    cs.LG q-bio.QM

    SubGDiff: A Subgraph Diffusion Model to Improve Molecular Representation Learning

    Authors: Jiying Zhang, Zijing Liu, Yu Wang, Yu Li

    Abstract: Molecular representation learning has shown great success in advancing AI-based drug discovery. The core of many recent works is based on the fact that the 3D geometric structure of molecules provides essential information about their physical and chemical characteristics. Recently, denoising diffusion probabilistic models have achieved impressive performance in 3D molecular representation learnin…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 31 pages

  7. arXiv:2405.05641  [pdf, other]

    eess.SP cs.IT

    Channel Estimation for Holographic MIMO: Wavenumber-Domain Sparsity Inspired Approaches

    Authors: Yuqing Guo, Yuanbin Chen, Ying Wang

    Abstract: This paper investigates the sparse channel estimation for holographic multiple-input multiple-output (HMIMO) systems. Given that the wavenumber-domain representation is based on a series of Fourier harmonics that are in essence a series of orthogonal basis functions, a novel wavenumber-domain sparsifying basis is designed to expose the sparsity inherent in HMIMO channels. Furthermore, by harnessin…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: This paper has been submitted to IEEE WCL, Major Revision

  8. arXiv:2405.05615  [pdf, other]

    cs.CV cs.CL cs.LG

    Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning

    Authors: Shibo Jie, Yehui Tang, Ning Ding, Zhi-Hong Deng, Kai Han, Yunhe Wang

    Abstract: Current solutions for efficiently constructing large vision-language (VL) models follow a two-step paradigm: projecting the output of pre-trained vision encoders to the input space of pre-trained language models as visual prompts; and then transferring the models to downstream VL tasks via end-to-end parameter-efficient fine-tuning (PEFT). However, this paradigm still exhibits inefficiency since i…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML2024

  9. arXiv:2405.05587  [pdf, other]

    cs.CV cs.LG

    Navigate Beyond Shortcuts: Debiased Learning through the Lens of Neural Collapse

    Authors: Yining Wang, Junjie Sun, Chenyue Wang, Mi Zhang, Min Yang

    Abstract: Recent studies have noted an intriguing phenomenon termed Neural Collapse, that is, when the neural networks establish the right correlation between feature spaces and the training targets, their last-layer features, together with the classifier weights, will collapse into a stable and symmetric structure. In this paper, we extend the investigation of Neural Collapse to the biased datasets with im…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: CVPR 2024 Highlight

  10. arXiv:2405.05583  [pdf, other]

    cs.CL

    OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs

    Authors: Yuxia Wang, Minghan Wang, Hasan Iqbal, Georgi Georgiev, Jiahui Geng, Preslav Nakov

    Abstract: The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. Difficulties lie in assessing the factuality of free-form responses in open domains. Also, different papers use disparate evaluation benchmarks and measurements, which renders them hard to compare and hampers future progress. To mitigat…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 19 pages, 8 tables, 8 figures

  11. arXiv:2405.05525  [pdf, other]

    cs.CR cs.LG

    Ditto: Quantization-aware Secure Inference of Transformers upon MPC

    Authors: Haoqi Wu, Wenjing Fang, Yancheng Zheng, Junming Ma, Jin Tan, Yinggui Wang, Lei Wang

    Abstract: Due to the rising privacy concerns on sensitive client data and trained models like Transformers, secure multi-party computation (MPC) techniques are employed to enable secure inference despite attendant overhead. Existing works attempt to reduce the overhead using more MPC-friendly non-linear function approximations. However, the integration of quantization widely used in plaintext inference into…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: to be published in ICML 2024

  12. arXiv:2405.05523  [pdf, other]

    cs.CV cs.AI

    Prompt When the Animal is: Temporal Animal Behavior Grounding with Positional Recovery Training

    Authors: Sheng Yan, Xin Du, Zongying Li, Yi Wang, Hongcang Jin, Mengyuan Liu

    Abstract: Temporal grounding is crucial in multimodal learning, but it poses challenges when applied to animal behavior data due to the sparsity and uniform distribution of moments. To address these challenges, we propose a novel Positional Recovery Training framework (Port), which prompts the model with the start and end times of specific animal behaviors during training. Specifically, Port enhances the ba…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted by ICMEW 2024. arXiv admin note: text overlap with arXiv:2404.13657

  13. arXiv:2405.05445  [pdf, other]

    cs.LG

    Large Language Model Enhanced Machine Learning Estimators for Classification

    Authors: Yuhang Wu, Yingfei Wang, Chu Wang, Zeyu Zheng

    Abstract: Pre-trained large language models (LLM) have emerged as a powerful tool for simulating various scenarios and generating output given specific instructions and multimodal input. In this work, we analyze the specific use of LLM to enhance a classical supervised machine learning method for classification problems. We propose a few approaches to integrate LLM into a classical machine learning estimato…

    Submitted 8 May, 2024; originally announced May 2024.

  14. arXiv:2405.05231  [pdf, other]

    cs.LG

    DiskGNN: Bridging I/O Efficiency and Model Accuracy for Out-of-Core GNN Training

    Authors: Renjie Liu, Yichuan Wang, Xiao Yan, Zhenkun Cai, Minjie Wang, Haitian Jiang, Bo Tang, Jinyang Li

    Abstract: Graph neural networks (GNNs) are machine learning models specialized for graph data and widely used in many applications. To train GNNs on large graphs that exceed CPU memory, several systems store data on disk and conduct out-of-core processing. However, these systems suffer from either read amplification when reading node features that are usually smaller than a disk page or degraded model accur…

    Submitted 8 May, 2024; originally announced May 2024.

  15. arXiv:2405.05176  [pdf, other]

    cs.CL

    Encoder-Decoder Framework for Interactive Free Verses with Generation with Controllable High-Quality Rhyming

    Authors: Tommaso Pasini, Alejo López-Ávila, Husam Quteineh, Gerasimos Lampouras, Jinhua Du, Yubing Wang, Ze Li, Yusen Sun

    Abstract: Composing poetry or lyrics involves several creative factors, but a challenging aspect of generation is the adherence to a more or less strict metric and rhyming pattern. To address this challenge specifically, previous work on the task has mainly focused on reverse language modeling, which brings the critical selection of each rhyming word to the forefront of each verse. On the other hand, revers…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 18 pages, 1 figure

    MSC Class: I.2.7

  16. arXiv:2405.05173  [pdf, other]

    cs.CV cs.AI cs.RO

    A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective

    Authors: Huaiyuan Xu, Junliang Chen, Shiyu Meng, Yi Wang, Lap-Pui Chau

    Abstract: 3D occupancy perception technology aims to observe and understand dense 3D environments for autonomous vehicles. Owing to its comprehensive perception capability, this technology is emerging as a trend in autonomous driving perception systems, and is attracting significant attention from both industry and academia. Similar to traditional bird's-eye view (BEV) perception, 3D occupancy perception ha…

    Submitted 8 May, 2024; originally announced May 2024.

  17. Audio Matters Too! Enhancing Markerless Motion Capture with Audio Signals for String Performance Capture

    Authors: Yitong Jin, Zhiping Qiu, Yi Shi, Shuangpeng Sun, Chongwu Wang, Donghao Pan, Jiachen Zhao, Zhenghao Liang, Yuan Wang, Xiaobing Li, Feng Yu, Tao Yu, Qionghai Dai

    Abstract: In this paper, we touch on the problem of markerless multi-modal human motion capture especially for string performance capture which involves inherently subtle hand-string contacts and intricate movements. To fulfill this goal, we first collect a dataset, named String Performance Dataset (SPD), featuring cello and violin performances. The dataset includes videos captured from up to 23 different v…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH2024

  18. arXiv:2405.04823  [pdf, other]

    cs.DS

    Counting Cohesive Subgraphs with Hereditary Properties

    Authors: Rong-Hua Li, Xiaowei Ye, Fusheng Jin, Yu-Ping Wang, Ye Yuan, Guoren Wang

    Abstract: Counting small cohesive subgraphs in a graph is a fundamental operation with numerous applications in graph analysis. Previous studies on cohesive subgraph counting are mainly based on the clique model, which aim to count the number of $k$-cliques in a graph with a small $k$. However, the clique model often proves too restrictive for practical use. To address this issue, we investigate a new probl…

    Submitted 8 May, 2024; originally announced May 2024.

  19. arXiv:2405.04773  [pdf, other]

    cs.LG cs.AI cs.IR cs.SI

    Hypergraph-enhanced Dual Semi-supervised Graph Classification

    Authors: Wei Ju, Zhengyang Mao, Siyu Yi, Yifang Qin, Yiyang Gu, Zhiping Xiao, Yifan Wang, Xiao Luo, Ming Zhang

    Abstract: In this paper, we study semi-supervised graph classification, which aims at accurately predicting the categories of graphs in scenarios with limited labeled graphs and abundant unlabeled graphs. Despite the promising capability of graph neural networks (GNNs), they typically require a large number of costly labeled graphs, while a wealth of unlabeled graphs fail to be effectively utilized. Moreove…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted by Proceedings of the 41st International Conference on Machine Learning (ICML 2024)

  20. arXiv:2405.04755  [pdf, other]

    cs.LG cs.SI

    Conditional Local Feature Encoding for Graph Neural Networks

    Authors: Yongze Wang, Haimin Zhang, Qiang Wu, Min Xu

    Abstract: Graph neural networks (GNNs) have shown great success in learning from graph-based data. The key mechanism of current GNNs is message passing, where a node's feature is updated based on the information passing from its local neighbourhood. A limitation of this mechanism is that node features become increasingly dominated by the information aggregated from the neighbourhood as we use more rounds of…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 11 pages

  21. arXiv:2405.04657  [pdf, other]

    cs.LG cs.AI q-bio.BM

    ACEGEN: Reinforcement learning of generative chemical agents for drug discovery

    Authors: Albert Bou, Morgan Thomas, Sebastian Dittert, Carles Navarro Ramírez, Maciej Majewski, Ye Wang, Shivam Patel, Gary Tresadern, Mazen Ahmad, Vincent Moens, Woody Sherman, Simone Sciabola, Gianni De Fabritiis

    Abstract: In recent years, reinforcement learning (RL) has emerged as a valuable tool in drug design, offering the potential to propose and optimize molecules with desired properties. However, striking a balance between capability, flexibility, and reliability remains challenging due to the complexity of advanced RL algorithms and the significant reliance on specialized code. In this work, we introduce ACEG…

    Submitted 7 May, 2024; originally announced May 2024.

  22. arXiv:2405.04490  [pdf, other]

    cs.DC quant-ph

    Resource-Efficient and Self-Adaptive Quantum Search in a Quantum-Classical Hybrid System

    Authors: Zihao Jiang, Zefan Du, Shaolun Ruan, Juntao Chen, Yong Wang, Long Cheng, Rajkumar Buyya, Ying Mao

    Abstract: Over the past decade, the rapid advancement of deep learning and big data applications has been driven by vast datasets and high-performance computing systems. However, as we approach the physical limits of semiconductor fabrication in the post-Moore's Law era, questions arise about the future of these applications. In parallel, quantum computing has made significant progress with the potential to…

    Submitted 7 May, 2024; originally announced May 2024.

  23. arXiv:2405.04377  [pdf, other]

    cs.CV

    Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing

    Authors: Boqiang Zhang, Hongtao Xie, Zuan Gao, Yuxin Wang

    Abstract: Scene text images contain not only style information (font, background) but also content information (character, texture). Different scene text tasks need different information, but previous representation learning methods use tightly coupled features for all tasks, resulting in sub-optimal performance. We propose a Disentangled Representation Learning framework (DARLING) aimed at disentangling th…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024

  24. arXiv:2405.04285  [pdf, other]

    cs.AI eess.SP

    On the Foundations of Earth and Climate Foundation Models

    Authors: Xiao Xiang Zhu, Zhitong Xiong, Yi Wang, Adam J. Stewart, Konrad Heidler, Yuanyuan Wang, Zhenghang Yuan, Thomas Dujardin, Qingsong Xu, Yilei Shi

    Abstract: Foundation models have enormous potential in advancing Earth and climate sciences; however, current approaches may not be optimal as they focus on a few basic features of a desirable Earth and climate foundation model. Crafting the ideal Earth foundation model, we define eleven features which would allow such a foundation model to be beneficial for any geoscientific downstream application in an en…

    Submitted 7 May, 2024; originally announced May 2024.

  25. arXiv:2405.04233  [pdf, other]

    cs.CV cs.LG

    Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models

    Authors: Fan Bao, Chendong Xiang, Gang Yue, Guande He, Hongzhou Zhu, Kaiwen Zheng, Min Zhao, Shilong Liu, Yaole Wang, Jun Zhu

    Abstract: We introduce Vidu, a high-performance text-to-video generator that is capable of producing 1080p videos up to 16 seconds in a single generation. Vidu is a diffusion model with U-ViT as its backbone, which unlocks the scalability and the capability for handling long videos. Vidu exhibits strong coherence and dynamism, and is capable of generating both realistic and imaginative videos, as well as un…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Project page at https://www.shengshu-ai.com/vidu

  26. arXiv:2405.04219  [pdf, other]

    cs.CL cs.AI cs.MA cs.SE

    Iterative Experience Refinement of Software-Developing Agents

    Authors: Chen Qian, Jiahao Li, Yufan Dang, Wei Liu, YiFei Wang, Zihao Xie, Weize Chen, Cheng Yang, Yingli Zhang, Zhiyuan Liu, Maosong Sun

    Abstract: Autonomous agents powered by large language models (LLMs) show significant potential for achieving high autonomy in various scenarios such as software development. Recent research has shown that LLM agents can leverage past experiences to reduce errors and enhance efficiency. However, the static experience paradigm, reliant on a fixed collection of past experiences acquired heuristically, lacks it…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Work in progress

  27. arXiv:2405.04160  [pdf, other]

    cs.CL

    A Causal Explainable Guardrails for Large Language Models

    Authors: Zhixuan Chu, Yan Wang, Longfei Li, Zhibo Wang, Zhan Qin, Kui Ren

    Abstract: Large Language Models (LLMs) have shown impressive performance in natural language tasks, but their outputs can exhibit undesirable attributes or biases. Existing methods for steering LLMs towards desired attributes often assume unbiased representations and rely solely on steering prompts. However, the representations learned from pre-training can introduce semantic biases that influence the steer…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 23 pages

  28. arXiv:2405.04144  [pdf, other]

    cs.IT

    Lossy Compression with Data, Perception, and Classification Constraints

    Authors: Yuhan Wang, Youlong Wu, Shuai Ma, Ying-Jun Angela Zhang

    Abstract: Balancing diverse task objectives under limited rate is crucial for developing robust multi-task deep learning (DL) models and improving performance across various domains. In this paper, we consider the lossy compression problem with human-centric and task-oriented metrics, such as perceptual quality and classification accuracy. We investigate two ternary relationships, namely, the rate-distortio…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 10 pages, in part submitted to ITW 2024

  29. arXiv:2405.04086  [pdf, other]

    cs.CL

    Optimizing Language Model's Reasoning Abilities with Weak Supervision

    Authors: Yongqi Tong, Sizhe Wang, Dawei Li, Yifan Wang, Simeng Han, Zi Lin, Chengsong Huang, Jiaxin Huang, Jingbo Shang

    Abstract: While Large Language Models (LLMs) have demonstrated proficiency in handling complex queries, much of the past work has depended on extensively annotated datasets by human experts. However, this reliance on fully-supervised annotations poses scalability challenges, particularly as models and data requirements grow. To mitigate this, we explore the potential of enhancing LLMs' reasoning abilities w…

    Submitted 7 May, 2024; originally announced May 2024.

  30. arXiv:2405.03988  [pdf, other]

    cs.IR cs.AI

    Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application

    Authors: Jian Jia, Yipei Wang, Yan Li, Honggang Chen, Xuehan Bai, Zhaocheng Liu, Jian Liang, Quan Chen, Han Li, Peng Jiang, Kun Gai

    Abstract: Contemporary recommender systems predominantly rely on collaborative filtering techniques, employing ID-embedding to capture latent associations among users and items. However, this approach overlooks the wealth of semantic information embedded within textual descriptions of items, leading to suboptimal performance in cold-start scenarios and long-tail user recommendations. Leveraging the capabili…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 11 pages, 6 figures

  31. arXiv:2405.03943  [pdf, other]

    cs.LG cs.AI

    Predictive Modeling with Temporal Graphical Representation on Electronic Health Records

    Authors: Jiayuan Chen, Changchang Yin, Yuanlong Wang, Ping Zhang

    Abstract: Deep learning-based predictive models, leveraging Electronic Health Records (EHR), are receiving increasing attention in healthcare. An effective representation of a patient's EHR should hierarchically encompass both the temporal relationships between historical visits and medical events, and the inherent structural information within these elements. Existing patient representation methods can be…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: IJCAI 2024 main track

  32. arXiv:2405.03942  [pdf, other]

    cs.AI cs.HC cs.LG

    Collaborative Intelligence in Sequential Experiments: A Human-in-the-Loop Framework for Drug Discovery

    Authors: Jinghai He, Cheng Hua, Yingfei Wang, Zeyu Zheng

    Abstract: Drug discovery is a complex process that involves sequentially screening and examining a vast array of molecules to identify those with the target properties. This process, also referred to as sequential experimentation, faces challenges due to the vast search space, the rarity of target molecules, and constraints imposed by limited data and experimental budgets. To address these challenges, we in…

    Submitted 6 May, 2024; originally announced May 2024.

  33. arXiv:2405.03939  [pdf, other]

    cs.CL

    Long Context Alignment with Short Instructions and Synthesized Positions

    Authors: Wenhao Wu, Yizhong Wang, Yao Fu, Xiang Yue, Dawei Zhu, Sujian Li

    Abstract: Effectively handling instructions with extremely long context remains a challenge for Large Language Models (LLMs), typically necessitating high-quality long data and substantial computational resources. This paper introduces Step-Skipping Alignment (SkipAlign), a new technique designed to enhance the long-context capabilities of LLMs in the phase of alignment without the need for additional effor…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: preview

  34. arXiv:2405.03685  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Language-Image Models with 3D Understanding

    Authors: Jang Hyun Cho, Boris Ivanovic, Yulong Cao, Edward Schmerling, Yue Wang, Xinshuo Weng, Boyi Li, Yurong You, Philipp Krähenbühl, Yan Wang, Marco Pavone

    Abstract: Multi-modal large language models (MLLMs) have shown incredible capabilities in a variety of 2D vision and language tasks. We extend MLLMs' perceptual capabilities to ground and reason about images in 3-dimensional space. To that end, we first develop a large-scale pre-training dataset for 2D and 3D called LV3D by combining multiple existing 2D and 3D recognition datasets under a common task formu…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Project page: https://janghyuncho.github.io/Cube-LLM

  35. arXiv:2405.03546  [pdf, other]

    cs.CV cs.LG

    CCDM: Continuous Conditional Diffusion Models for Image Generation

    Authors: Xin Ding, Yongwei Wang, Kao Zhang, Z. Jane Wang

    Abstract: Continuous Conditional Generative Modeling (CCGM) aims to estimate the distribution of high-dimensional data, typically images, conditioned on scalar continuous variables known as regression labels. While Continuous conditional Generative Adversarial Networks (CcGANs) were initially designed for this task, their adversarial training mechanism remains vulnerable to extremely sparse or imbalanced da…

    Submitted 6 May, 2024; originally announced May 2024.

  36. arXiv:2405.03520  [pdf, other]

    cs.CV

    Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

    Authors: Zheng Zhu, Xiaofeng Wang, Wangbo Zhao, Chen Min, Nianchen Deng, Min Dou, Yuqi Wang, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang

    Abstract: General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems. Recently, the emergence of the Sora model has attained significant attention due to its remarkable simulation capabilities, which exhibits an incipient comprehension of physical law…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: This survey will be regularly updated at: https://github.com/GigaAI-research/General-World-Models-Survey

  37. arXiv:2405.03501  [pdf, other]

    cs.LG cs.AI cs.CV

    Boosting Single Positive Multi-label Classification with Generalized Robust Loss

    Authors: Yanxi Chen, Chunxiao Li, Xinyang Dai, Jinhuan Li, Weiyu Sun, Yiming Wang, Renyuan Zhang, Tinghe Zhang, Bo Wang

    Abstract: Multi-label learning (MLL) requires comprehensive multi-semantic annotations that are hard to fully obtain, thus often resulting in missing-label scenarios. In this paper, we investigate Single Positive Multi-label Learning (SPML), where each image is associated with merely one positive label. Existing SPML methods only focus on designing losses using mechanisms such as hard pseudo-labeling and ro…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures, 6 tables

  38. arXiv:2405.03446  [pdf, other]

    cs.CR

    SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence

    Authors: Hangyuan Ji, Jian Yang, Linzheng Chai, Chaoren Wei, Liqun Yang, Yunlong Duan, Yunli Wang, Tianzhen Sun, Hongcheng Guo, Tongliang Li, Changyu Ren, Zhoujun Li

    Abstract: To address the increasing complexity and frequency of cybersecurity incidents emphasized by the recent cybersecurity threat reports with over 10 billion instances, cyber threat intelligence (CTI) plays a critical role in the modern cybersecurity landscape by offering the insights required to understand and combat the constantly evolving nature of cyber threats. Inspired by the powerful capability…

    Submitted 6 May, 2024; originally announced May 2024.

  39. arXiv:2405.03299  [pdf, other]

    cs.CR cs.DC

    DarkFed: A Data-Free Backdoor Attack in Federated Learning

    Authors: Minghui Li, Wei Wan, Yuxuan Ning, Shengshan Hu, Lulu Xue, Leo Yu Zhang, Yichen Wang

    Abstract: Federated learning (FL) has been demonstrated to be susceptible to backdoor attacks. However, existing academic studies on FL backdoor attacks rely on a high proportion of real clients with main task-related data, which is impractical. In the context of real-world industrial scenarios, even the simplest defense suffices to defend against the state-of-the-art attack, 3DFed. A practical FL backdoor…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by IJCAI 2024

  40. arXiv:2405.03202  [pdf, other]

    cs.CV

    Hierarchical Space-Time Attention for Micro-Expression Recognition

    Authors: Haihong Hao, Shuo Wang, Huixia Ben, Yanbin Hao, Yansong Wang, Weiwei Wang

    Abstract: Micro-expression recognition (MER) aims to recognize the short and subtle facial movements from the Micro-expression (ME) video clips, which reveal real emotions. Recent MER methods mostly only utilize special frames from ME video clips or extract optical flow from these special frames. However, they neglect the relationship between movements and space-time, while facial cues are hidden within the…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 9 pages, 4 figures

  41. arXiv:2405.03176  [pdf, other]

    cs.NE

    FIMP-HGA: A Novel Approach to Addressing the Partitioning Min-Max Weighted Matching Problem

    Authors: Yuxuan Wang, Jiongzhi Zheng, Jinyao Xie, Kun He

    Abstract: The Partitioning Min-Max Weighted Matching (PMMWM) problem, being a practical NP-hard problem, integrates the task of partitioning the vertices of a bipartite graph into disjoint sets of limited size with the classical Maximum-Weight Perfect Matching (MPWM) problem. Initially introduced in 2015, the state-of-the-art method for addressing PMMWM is the MP$_{\text{LS}}$. In this paper, we present a n…

    Submitted 6 May, 2024; originally announced May 2024.

  42. arXiv:2405.03162  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Advancing Multimodal Medical Capabilities of Gemini

    Authors: Lin Yang, Shawn Xu, Andrew Sellergren, Timo Kohlberger, Yuchen Zhou, Ira Ktena, Atilla Kiraly, Faruk Ahmed, Farhad Hormozdiari, Tiam Jaroensri, Eric Wang, Ellery Wulczyn, Fayaz Jamil, Theo Guidroz, Chuck Lau, Siyuan Qiao, Yun Liu, Akshay Goel, Kendall Park, Arnav Agharwal, Nick George, Yang Wang, Ryutaro Tanno, David G. T. Barrett, Wei-Hung Weng, et al. (22 additional authors not shown)

    Abstract: Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histop…

    Submitted 6 May, 2024; originally announced May 2024.

  43. arXiv:2405.03140  [pdf, other]

    cs.LG

    TimeMIL: Advancing Multivariate Time Series Classification via a Time-aware Multiple Instance Learning

    Authors: Xiwen Chen, Peijie Qiu, Wenhui Zhu, Huayu Li, Hao Wang, Aristeidis Sotiras, Yalin Wang, Abolfazl Razi

    Abstract: Deep neural networks, including transformers and convolutional neural networks, have significantly improved multivariate time series classification (MTSC). However, these methods often rely on supervised learning, which does not fully account for the sparsity and locality of patterns in time series data (e.g., disease-related anomalous points in ECG). To address this challenge, we formally reform…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML2024

  44. arXiv:2405.03076  [pdf, other]

    cs.MA

    Traffic Performance GPT (TP-GPT): Real-Time Data Informed Intelligent ChatBot for Transportation Surveillance and Management

    Authors: Bingzhang Wang, Zhiyu Cai, Muhammad Monjurul Karim, Chenxi Liu, Yinhai Wang

    Abstract: The digitization of traffic sensing infrastructure has led to the accumulation of an extensive traffic data warehouse, which presents unprecedented challenges for transportation analytics. The complexities associated with querying large-scale multi-table databases require specialized programming expertise and labor-intensive development. Additionally, traditional analysis methods have focused mainl…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 8 pages, 5 figures, submitted to 27th IEEE International Conference on Intelligent Transportation Systems (IEEE ITSC 2024)

  45. arXiv:2405.02965  [pdf, other]

    cs.AI cs.RO

    Robust Collaborative Perception without External Localization and Clock Devices

    Authors: Zixing Lei, Zhenyang Ni, Ruize Han, Shuo Tang, Chen Feng, Siheng Chen, Yanfeng Wang

    Abstract: A consistent spatial-temporal coordination across multiple agents is fundamental for collaborative perception, which seeks to improve perception abilities through information exchange among agents. To achieve this spatial-temporal alignment, traditional methods depend on external devices to provide localization and clock signals. However, hardware-generated signals could be vulnerable to noise and…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 6 pages, accepted to ICRA 2024

  46. arXiv:2405.02880  [pdf, other]

    cs.CV cs.RO

    Blending Distributed NeRFs with Tri-stage Robust Pose Optimization

    Authors: Baijun Ye, Caiyun Liu, Xiaoyu Ye, Yuantao Chen, Yuhai Wang, Zike Yan, Yongliang Shi, Hao Zhao, Guyue Zhou

    Abstract: Due to the limited model capacity, leveraging distributed Neural Radiance Fields (NeRFs) for modeling extensive urban environments has become a necessity. However, current distributed NeRF registration approaches encounter aliasing artifacts, arising from discrepancies in rendering resolutions and suboptimal pose precision. These factors collectively deteriorate the fidelity of pose estimation wit…

    Submitted 5 May, 2024; originally announced May 2024.

  47. I$^3$Net: Inter-Intra-slice Interpolation Network for Medical Slice Synthesis

    Authors: Haofei Song, Xintian Mao, Jing Yu, Qingli Li, Yan Wang

    Abstract: Medical imaging is limited by acquisition time and scanning equipment. CT and MR volumes, reconstructed with thicker slices, are anisotropic with high in-plane resolution and low through-plane resolution. We reveal an intriguing phenomenon that due to the mentioned nature of data, performing slice-wise interpolation from the axial view can yield greater benefits than performing super-resolution fr…

    Submitted 5 May, 2024; originally announced May 2024.

  48. arXiv:2405.02834  [pdf, other]

    cs.CV

    Scene-Adaptive Person Search via Bilateral Modulations

    Authors: Yimin Jiang, Huibing Wang, Jinjia Peng, Xianping Fu, Yang Wang

    Abstract: Person search aims to localize a specific target person from a gallery set of images with various scenes. As the scene of a moving pedestrian changes, the captured person image inevitably brings in lots of background noise and foreground noise on the person feature, which are completely unrelated to the person identity, leading to severe performance degeneration. To address this issue, we present a S…

    Submitted 5 May, 2024; originally announced May 2024.

  49. arXiv:2405.02832  [pdf, other]

    cs.CV

    Fast One-Stage Unsupervised Domain Adaptive Person Search

    Authors: Tianxiang Cui, Huibing Wang, Jinjia Peng, Ruoxi Deng, Xianping Fu, Yang Wang

    Abstract: Unsupervised person search aims to localize a particular target person from a gallery set of scene images without annotations, which is extremely challenging due to the unexpected variations of the unlabeled domains. However, most existing methods are dedicated to developing multi-stage models to adapt to domain variations while using clustering for iterative model training, which inevitably increases mod…

    Submitted 5 May, 2024; originally announced May 2024.

  50. arXiv:2405.02797  [pdf, other]

    cs.CV cs.LG

    Adapting to Distribution Shift by Visual Domain Prompt Generation

    Authors: Zhixiang Chi, Li Gu, Tao Zhong, Huan Liu, Yuanhao Yu, Konstantinos N Plataniotis, Yang Wang

    Abstract: In this paper, we aim to adapt a model at test-time using a few unlabeled data to address distribution shifts. To tackle the challenges of extracting domain knowledge from a limited amount of data, it is crucial to utilize correlated information from pre-trained backbones and source domains. Previous studies fail to utilize recent foundation models with strong out-of-distribution generalization. A…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: ICLR2024, code: https://github.com/Guliisgreat/VDPG