Skip to main content

Showing 1–50 of 546 results for author: Guo, H

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.04840  [pdf, other

    cs.IR

    Federated Adaptation for Foundation Model-based Recommendations

    Authors: Chunxu Zhang, Guodong Long, Hongkuan Guo, Xiao Fang, Yang Song, Zhaojie Liu, Guorui Zhou, Zijian Zhang, Yang Liu, Bo Yang

    Abstract: With the recent success of large language models, particularly foundation models with generalization abilities, applying foundation models for recommendations becomes a new paradigm to improve existing recommendation systems. It becomes a new open challenge to enable the foundation model to capture user preference changes in a timely manner with reasonable communication and computation costs while… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted as a regular paper of IJCAI'24

  2. arXiv:2405.03446  [pdf, other

    cs.CR

    SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence

    Authors: Hangyuan Ji, Jian Yang, Linzheng Chai, Chaoren Wei, Liqun Yang, Yunlong Duan, Yunli Wang, Tianzhen Sun, Hongcheng Guo, Tongliang Li, Changyu Ren, Zhoujun Li

    Abstract: To address the increasing complexity and frequency of cybersecurity incidents emphasized by the recent cybersecurity threat reports with over 10 billion instances, cyber threat intelligence (CTI) plays a critical role in the modern cybersecurity landscape by offering the insights required to understand and combat the constantly evolving nature of cyber threats. Inspired by the powerful capability… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  3. arXiv:2404.17069  [pdf, other

    cs.IT cs.LG eess.SP

    Channel Modeling for FR3 Upper Mid-band via Generative Adversarial Networks

    Authors: Yaqi Hu, Mingsheng Yin, Marco Mezzavilla, Hao Guo, Sundeep Rangan

    Abstract: The upper mid-band (FR3) has been recently attracting interest for new generation of mobile networks, as it provides a promising balance between spectrum availability and coverage, which are inherent limitations of the sub 6GHz and millimeter wave bands, respectively. In order to efficiently design and optimize the network, channel modeling plays a key role since FR3 systems are expected to operat… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  4. arXiv:2404.16821  [pdf, other

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai , et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual… ▽ More

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  5. arXiv:2404.13692  [pdf, other

    cs.CV

    A sustainable development perspective on urban-scale roof greening priorities and benefits

    Authors: Jie Shao, Wei Yao, Lei Luo, Linzhou Zeng, Zhiyi He, Puzuo Wang, Huadong Guo

    Abstract: Greenspaces are tightly linked to human well-being. Yet, rapid urbanization has exacerbated greenspace exposure inequality and declining human life quality. Roof greening has been recognized as an effective strategy to mitigate these negative impacts. Understanding priorities and benefits is crucial to promoting green roofs. Here, using geospatial big data, we conduct an urban-scale assessment of… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

  6. arXiv:2404.12135  [pdf, other

    cs.MA cs.CR cs.DC

    mABC: multi-Agent Blockchain-Inspired Collaboration for root cause analysis in micro-services architecture

    Authors: Wei Zhang, Hongcheng Guo, Jian Yang, Yi Zhang, Chaoran Yan, Zhoujin Tian, Hangyuan Ji, Zhoujun Li, Tongliang Li, Tieqiao Zheng, Chao Chen, Yi Liang, Xu Shi, Liangfan Zheng, Bo Zhang

    Abstract: The escalating complexity of micro-services architecture in cloud-native technologies poses significant challenges for maintaining system stability and efficiency. To conduct root cause analysis (RCA) and resolution of alert events, we propose a pioneering framework, multi-Agent Blockchain-inspired Collaboration for root cause analysis in micro-services architecture (mABC), to revolutionize the AI… ▽ More

    Submitted 3 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  7. Change Guiding Network: Incorporating Change Prior to Guide Change Detection in Remote Sensing Imagery

    Authors: Chengxi Han, Chen Wu, Haonan Guo, Meiqi Hu, Jiepan Li, Hongruixuan Chen

    Abstract: The rapid advancement of automated artificial intelligence algorithms and remote sensing instruments has benefited change detection (CD) tasks. However, there is still a lot of space to study for precise detection, especially the edge integrity and internal holes phenomenon of change features. In order to solve these problems, we design the Change Guiding Network (CGNet), to tackle the insufficien… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

  8. HANet: A Hierarchical Attention Network for Change Detection With Bitemporal Very-High-Resolution Remote Sensing Images

    Authors: Chengxi Han, Chen Wu, Haonan Guo, Meiqi Hu, Hongruixuan Chen

    Abstract: Benefiting from the developments in deep learning technology, deep-learning-based algorithms employing automatic feature extraction have achieved remarkable performance on the change detection (CD) task. However, the performance of existing deep-learning-based CD methods is hindered by the imbalance between changed and unchanged pixels. To tackle this problem, a progressive foreground-balanced sam… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

  9. arXiv:2404.08242  [pdf, other

    cs.NE cs.AI

    RLEMMO: Evolutionary Multimodal Optimization Assisted By Deep Reinforcement Learning

    Authors: Hongqiao Lian, Zeyuan Ma, Hongshu Guo, Ting Huang, Yue-Jiao Gong

    Abstract: Solving multimodal optimization problems (MMOP) requires finding all optimal solutions, which is challenging in limited function evaluations. Although existing works strike the balance of exploration and exploitation through hand-crafted adaptive strategies, they require certain expert knowledge, hence inflexible to deal with MMOP with different properties. In this paper, we propose RLEMMO, a Meta… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted as full paper at GECCO 2024

  10. arXiv:2404.08239  [pdf, other

    cs.NE cs.AI

    Auto-configuring Exploration-Exploitation Tradeoff in Evolutionary Computation via Deep Reinforcement Learning

    Authors: Zeyuan Ma, Jiacheng Chen, Hongshu Guo, Yining Ma, Yue-Jiao Gong

    Abstract: Evolutionary computation (EC) algorithms, renowned as powerful black-box optimizers, leverage a group of individuals to cooperatively search for the optimum. The exploration-exploitation tradeoff (EET) plays a crucial role in EC, which, however, has traditionally been governed by manually designed rules. In this paper, we propose a deep reinforcement learning-based framework that autonomously conf… ▽ More

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted as a full paper at GECCO 2024

  11. arXiv:2404.07985  [pdf, other

    cs.CV eess.IV

    WaveMo: Learning Wavefront Modulations to See Through Scattering

    Authors: Mingyang Xie, Haiyun Guo, Brandon Y. Feng, Lingbo Jin, Ashok Veeraraghavan, Christopher A. Metzler

    Abstract: Imaging through scattering media is a fundamental and pervasive challenge in fields ranging from medical diagnostics to astronomy. A promising strategy to overcome this challenge is wavefront modulation, which induces measurement diversity during image acquisition. Despite its importance, designing optimal wavefront modulations to image through scattering remains under-explored. This paper introdu… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

  12. arXiv:2404.06668  [pdf

    cs.LG cs.AI physics.ao-ph

    Forecasting the Future with Future Technologies: Advancements in Large Meteorological Models

    Authors: Hailong Shu, Yue Wang, Weiwei Song, Huichuang Guo, Zhen Song

    Abstract: The field of meteorological forecasting has undergone a significant transformation with the integration of large models, especially those employing deep learning techniques. This paper reviews the advancements and applications of these models in weather prediction, emphasizing their role in transforming traditional forecasting methods. Models like FourCastNet, Pangu-Weather, GraphCast, ClimaX, and… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 5 pages

  13. arXiv:2404.06188  [pdf, other

    cs.LG cs.AI

    Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning

    Authors: Xudong Yu, Chenjia Bai, Hongyi Guo, Changhong Wang, Zhen Wang

    Abstract: Offline Reinforcement Learning (RL) faces distributional shift and unreliable value estimation, especially for out-of-distribution (OOD) actions. To address this, existing uncertainty-based methods penalize the value function with uncertainty quantification and demand numerous ensemble networks, posing computational challenges and suboptimal outcomes. In this paper, we introduce a novel strategy e… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

  14. arXiv:2404.05705  [pdf, other

    cs.CV

    Learning 3D-Aware GANs from Unposed Images with Template Feature Field

    Authors: Xinya Chen, Hanlei Guo, Yanrui Bin, Shangzhan Zhang, Yuanbo Yang, Yue Wang, Yujun Shen, Yiyi Liao

    Abstract: Collecting accurate camera poses of training images has been shown to well serve the learning of 3D-aware generative adversarial networks (GANs) yet can be quite expensive in practice. This work targets learning 3D-aware GANs from unposed images, for which we propose to perform on-the-fly pose estimation of training images with a learned template feature field (TeFF). Concretely, in addition to a… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: https://XDimlab.github.io/TeFF

  15. arXiv:2404.04878  [pdf, other

    eess.IV cs.CV

    CycleINR: Cycle Implicit Neural Representation for Arbitrary-Scale Volumetric Super-Resolution of Medical Data

    Authors: Wei Fang, Yuxing Tang, Heng Guo, Mingze Yuan, Tony C. W. Mok, Ke Yan, Jiawen Yao, Xin Chen, Zaiyi Liu, Le Lu, Ling Zhang, Minfeng Xu

    Abstract: In the realm of medical 3D data, such as CT and MRI images, prevalent anisotropic resolution is characterized by high intra-slice but diminished inter-slice resolution. The lowered resolution between adjacent slices poses challenges, hindering optimal viewing experiences and impeding the development of robust downstream analysis algorithms. Various volumetric super-resolution algorithms aim to sur… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: CVPR accepted paper

  16. arXiv:2404.02840  [pdf, ps, other

    cs.DC

    A Survey on Error-Bounded Lossy Compression for Scientific Datasets

    Authors: Sheng Di, Jinyang Liu, Kai Zhao, Xin Liang, Robert Underwood, Zhaorui Zhang, Milan Shah, Yafan Huang, Jiajun Huang, Xiaodong Yu, Congrong Ren, Hanqi Guo, Grant Wilkins, Dingwen Tao, Jiannan Tian, Sian Jin, Zizhe Jian, Daoce Wang, MD Hasanur Rahman, Boyuan Zhang, Jon C. Calhoun, Guanpeng Li, Kazutomo Yoshii, Khalid Ayed Alharthi, Franck Cappello

    Abstract: Error-bounded lossy compression has been effective in significantly reducing the data storage/transfer burden while preserving the reconstructed data fidelity very well. Many error-bounded lossy compressors have been developed for a wide range of parallel and distributed use cases for years. These lossy compressors are designed with distinct compression models and design principles, such that each… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: submitted to ACM Computing journal, requited to be 35 pages including references

  17. arXiv:2404.02826  [pdf, ps, other

    cs.IT astro-ph.IM cs.GR

    An Error-Bounded Lossy Compression Method with Bit-Adaptive Quantization for Particle Data

    Authors: Congrong Ren, Sheng Di, Longtao Zhang, Kai Zhao, Hanqi Guo

    Abstract: This paper presents error-bounded lossy compression tailored for particle datasets from diverse scientific applications in cosmology, fluid dynamics, and fusion energy sciences. As today's high-performance computing capabilities advance, these datasets often reach trillions of points, posing significant visualization, analysis, and storage challenges. While error-bounded lossy compression makes it… ▽ More

    Submitted 4 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  18. arXiv:2404.00702  [pdf, other

    cs.IR

    Tired of Plugins? Large Language Models Can Be End-To-End Recommenders

    Authors: Wenlin Zhang, Chuhan Wu, Xiangyang Li, Yuhao Wang, Kuicai Dong, Yichao Wang, Xinyi Dai, Xiangyu Zhao, Huifeng Guo, Ruiming Tang

    Abstract: Recommender systems aim to predict user interest based on historical behavioral data. They are mainly designed in sequential pipelines, requiring lots of data to train different sub-systems, and are hard to scale to new domains. Recently, Large Language Models (LLMs) have demonstrated remarkable generalized capabilities, enabling a singular model to tackle diverse recommendation tasks across vario… ▽ More

    Submitted 7 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

  19. arXiv:2403.19785  [pdf, other

    cs.IT eess.SP

    Integrated Communication, Localization, and Sensing in 6G D-MIMO Networks

    Authors: Hao Guo, Henk Wymeersch, Behrooz Makki, Hui Chen, Yibo Wu, Giuseppe Durisi, Musa Furkan Keskin, Mohammad H. Moghaddam, Charitha Madapatha, Han Yu, Peter Hammarberg, Hyowon Kim, Tommy Svensson

    Abstract: Future generations of mobile networks call for concurrent sensing and communication functionalities in the same hardware and/or spectrum. Compared to communication, sensing services often suffer from limited coverage, due to the high path loss of the reflected signal and the increased infrastructure requirements. To provide a more uniform quality of service, distributed multiple input multiple out… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  20. arXiv:2403.19238  [pdf, other

    cs.CV cs.AI eess.IV

    Taming Lookup Tables for Efficient Image Retouching

    Authors: Sidi Yang, Binxiao Huang, Mingdeng Cao, Yatai Ji, Hanzhong Guo, Ngai Wong, Yujiu Yang

    Abstract: The widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring a significant demand for image enhancement. Existing enhancement models often optimize for high performance while falling short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources. To th… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  21. arXiv:2403.17556  [pdf, other

    cs.CL cs.AI

    m3P: Towards Multimodal Multilingual Translation with Multimodal Prompt

    Authors: Jian Yang, Hongcheng Guo, Yuwei Yin, Jiaqi Bai, Bing Wang, Jiaheng Liu, Xinnian Liang, Linzheng Cahi, Liqun Yang, Zhoujun Li

    Abstract: Multilingual translation supports multiple translation directions by projecting all languages in a shared space, but the translation quality is undermined by the difference between languages in the text-only modality, especially when the number of languages is large. To bridge this gap, we introduce visual context as the universal language-independent representation to facilitate multilingual tran… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: COLING 2024

  22. arXiv:2403.16059  [pdf, other

    stat.ML cs.LG math.OC

    Manifold Regularization Classification Model Based On Improved Diffusion Map

    Authors: Hongfu Guo, Wencheng Zou, Zeyu Zhang, Shuishan Zhang, Ruitong Wang, Jintao Zhang

    Abstract: Manifold regularization model is a semi-supervised learning model that leverages the geometric structure of a dataset, comprising a small number of labeled samples and a large number of unlabeled samples, to generate classifiers. However, the original manifold norm limits the performance of models to local regions. To address this limitation, this paper proposes an approach to improve manifold reg… ▽ More

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 20 pages, 24figures

  23. arXiv:2403.15789  [pdf, other

    cs.CV

    In-Context Matting

    Authors: He Guo, Zixuan Ye, Zhiguo Cao, Hao Lu

    Abstract: We introduce in-context matting, a novel task setting of image matting. Given a reference image of a certain foreground and guided priors such as points, scribbles, and masks, in-context matting enables automatic alpha estimation on a batch of target images of the same foreground category, without additional auxiliary input. This setting marries good performance in auxiliary input-based matting an… ▽ More

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024. Code is available at https://github.com/tiny-smart/in-context-matting

  24. arXiv:2403.15063  [pdf, other

    cs.CV

    Towards a Comprehensive, Efficient and Promptable Anatomic Structure Segmentation Model using 3D Whole-body CT Scans

    Authors: Heng Guo, Jianfeng Zhang, Jiaxing Huang, Tony C. W. Mok, Dazhou Guo, Ke Yan, Le Lu, Dakai Jin, Minfeng Xu

    Abstract: Segment anything model (SAM) demonstrates strong generalization ability on natural image segmentation. However, its direct adaption in medical image segmentation tasks shows significant performance drops with inferior accuracy and unstable results. It may also requires an excessive number of prompt points to obtain a reasonable accuracy. For segmenting 3D radiological CT or MRI scans, a 2D SAM mod… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

  25. arXiv:2403.14985  [pdf, other

    cs.CR

    FileDES: A Secure Scalable and Succinct Decentralized Encrypted Storage Network

    Authors: Minghui Xu, Jiahao Zhang, Hechuan Guo, Xiuzhen Cheng, Dongxiao Yu, Qin Hu, Yijun Li, Yipu Wu

    Abstract: Decentralized Storage Network (DSN) is an emerging technology that challenges traditional cloud-based storage systems by consolidating storage capacities from independent providers and coordinating to provide decentralized storage and retrieval services. However, current DSNs face several challenges associated with data privacy and efficiency of the proof systems. To address these issues, we propo… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 10 pages, 8 figures, 1 table. Accepted by 2024 IEEE INFOCOM

  26. arXiv:2403.14268  [pdf

    eess.AS cs.SD

    Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints

    Authors: PeiYing Lee, HauYun Guo, Berlin Chen

    Abstract: End-to-End Neural Diarization with Encoder-Decoder based Attractor (EEND-EDA) is an end-to-end neural model for automatic speaker segmentation and labeling. It achieves the capability to handle flexible number of speakers by estimating the number of attractors. EEND-EDA, however, struggles to accurately capture local speaker dynamics. This work proposes an auxiliary loss that aims to guide the Tra… ▽ More

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted to The 28th International Conference on Technologies and Applications of Artificial Intelligence (TAAI), in Chinese language

    Report number: TAAI2023-Domestic-131

  27. arXiv:2403.13820  [pdf, other

    cs.LG cs.CR eess.SP

    Identity information based on human magnetocardiography signals

    Authors: Pengju Zhang, Chenxi Sun, Jianwei Zhang, Hong Guo

    Abstract: We have developed an individual identification system based on magnetocardiography (MCG) signals captured using optically pumped magnetometers (OPMs). Our system utilizes pattern recognition to analyze the signals obtained at different positions on the body, by scanning the matrices composed of MCG signals with a 2*2 window. In order to make use of the spatial information of MCG signals, we transf… ▽ More

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: 7 pages, 5 figures. Author manuscript accepted for AAAI 2024 Spring Symposium on Clinical Foundation Models

  28. arXiv:2403.13433  [pdf, other

    cs.AI cs.CL cs.CY

    AgentGroupChat: An Interactive Group Chat Simulacra For Better Eliciting Emergent Behavior

    Authors: Zhouhong Gu, Xiaoxuan Zhu, Haoran Guo, Lin Zhang, Yin Cai, Hao Shen, Jiangjie Chen, Zheyu Ye, Yifei Dai, Yan Gao, Yao Hu, Hongwei Feng, Yanghua Xiao

    Abstract: Language significantly influences the formation and evolution of Human emergent behavior, which is crucial in understanding collective intelligence within human societies. Considering that the study of how language affects human behavior needs to put it into the dynamic scenarios in which it is used, we introduce AgentGroupChat in this paper, a simulation that delves into the complex role of langu… ▽ More

    Submitted 4 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  29. arXiv:2403.13430  [pdf, other

    cs.CV

    MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining

    Authors: Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, Haonan Guo, Bo Du, Dacheng Tao, Liangpei Zhang

    Abstract: Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks. Pretraining is an active research topic, encompassing supervised and self-supervised learning methods to initialize model weights effectively. However, transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as i… ▽ More

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: The codes and pretrained models will be released at https://github.com/ViTAE-Transformer/MTP

  30. arXiv:2403.12660  [pdf, other

    cs.IR cs.AI

    ERASE: Benchmarking Feature Selection Methods for Deep Recommender Systems

    Authors: Pengyue Jia, Yejing Wang, Zhaocheng Du, Xiangyu Zhao, Yichao Wang, Bo Chen, Wanyu Wang, Huifeng Guo, Ruiming Tang

    Abstract: Deep Recommender Systems (DRS) are increasingly dependent on a large number of feature fields for more precise recommendations. Effective feature selection methods are consequently becoming critical for further enhancing the accuracy and optimizing storage efficiencies to align with the deployment demands. This research area, particularly in the context of DRS, is nascent and faces three core chal… ▽ More

    Submitted 20 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  31. arXiv:2403.12502  [pdf, other

    cs.RO

    Under-actuated Robotic Gripper with Multiple Grasping Modes Inspired by Human Finger

    Authors: Jihao Li, Tingbo Liao, Hassen Nigatu, Haotian Guo, Guodong Lu, Huixu Dong

    Abstract: Under-actuated robot grippers as a pervasive tool of robots have become a considerable research focus. Despite their simplicity of mechanical design and control strategy, they suffer from poor versatility and weak adaptability, making widespread applications limited. To better relieve relevant research gaps, we present a novel 3-finger linkage-based gripper that realizes retractable and reconfigur… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 8 pages

  32. arXiv:2403.12471  [pdf, other

    cs.RO

    Theoretical Modeling and Bio-inspired Trajectory Optimization of A Multiple-locomotion Origami Robot

    Authors: Keqi Zhu, Haotian Guo, Wei Yu, Hassen Nigatu, Tong Li, Huixu Dong

    Abstract: Recent research on mobile robots has focused on increasing their adaptability to unpredictable and unstructured environments using soft materials and structures. However, the determination of key design parameters and control over these compliant robots are predominantly iterated through experiments, lacking a solid theoretical foundation. To improve their efficiency, this paper aims to provide ma… ▽ More

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 8 pages

  33. arXiv:2403.10883  [pdf, other

    cs.CV cs.CR cs.MM

    Improving Adversarial Transferability of Visual-Language Pre-training Models through Collaborative Multimodal Interaction

    Authors: Jiyuan Fu, Zhaoyu Chen, Kaixun Jiang, Haijing Guo, Jiafeng Wang, Shuyong Gao, Wenqiang Zhang

    Abstract: Despite the substantial advancements in Vision-Language Pre-training (VLP) models, their susceptibility to adversarial attacks poses a significant challenge. Existing work rarely studies the transferability of attacks on VLP models, resulting in a substantial performance gap from white-box attacks. We observe that prior work overlooks the interaction mechanisms between modalities, which plays a cr… ▽ More

    Submitted 16 March, 2024; originally announced March 2024.

  34. arXiv:2403.10783  [pdf, other

    cs.CV

    StableGarment: Garment-Centric Generation via Stable Diffusion

    Authors: Rui Wang, Hailong Guo, Jiaming Liu, Huaxia Li, Haibo Zhao, Xu Tang, Yao Hu, Hao Tang, Peipei Li

    Abstract: In this paper, we introduce StableGarment, a unified framework to tackle garment-centric(GC) generation tasks, including GC text-to-image, controllable GC text-to-image, stylized GC text-to-image, and robust virtual try-on. The main challenge lies in retaining the intricate textures of the garment while maintaining the flexibility of pre-trained Stable Diffusion. Our solution involves the developm… ▽ More

    Submitted 15 March, 2024; originally announced March 2024.

  35. arXiv:2403.09792  [pdf, other

    cs.CV cs.CL

    Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models

    Authors: Yifan Li, Hangyu Guo, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen

    Abstract: In this paper, we study the harmlessness alignment problem of multimodal large language models (MLLMs). We conduct a systematic empirical analysis of the harmlessness performance of representative MLLMs and reveal that the image input poses the alignment vulnerability of MLLMs. Inspired by this, we propose a novel jailbreak method named HADES, which hides and amplifies the harmfulness of the malic… ▽ More

    Submitted 14 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Work in progress

  36. arXiv:2403.07708  [pdf, other

    cs.CL cs.AI

    Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards

    Authors: Wei Shen, Xiaoying Zhang, Yuanshun Yao, Rui Zheng, Hongyi Guo, Yang Liu

    Abstract: Reinforcement learning from human feedback (RLHF) is the mainstream paradigm used to align large language models (LLMs) with human preferences. Yet existing RLHF heavily relies on accurate and informative reward models, which are vulnerable and sensitive to noise from various sources, e.g. human labeling errors, making the pipeline fragile. In this work, we improve the effectiveness of the reward… ▽ More

    Submitted 13 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  37. arXiv:2403.07300  [pdf, other

    cs.LG cs.CL

    Taming Pre-trained LLMs for Generalised Time Series Forecasting via Cross-modal Knowledge Distillation

    Authors: Peiyuan Liu, Hang Guo, Tao Dai, Naiqi Li, Jigang Bao, Xudong Ren, Yong Jiang, Shu-Tao Xia

    Abstract: Multivariate time series forecasting has recently gained great success with the rapid growth of deep learning models. However, existing approaches usually train models from scratch using limited temporal data, preventing their generalization. Recently, with the surge of the Large Language Models (LLMs), several works have attempted to introduce LLMs into time series forecasting. Despite promising… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

  38. arXiv:2403.05632  [pdf, other

    cs.AI

    Can Large Language Models Play Games? A Case Study of A Self-Play Approach

    Authors: Hongyi Guo, Zhihan Liu, Yufeng Zhang, Zhaoran Wang

    Abstract: Large Language Models (LLMs) harness extensive data from the Internet, storing a broad spectrum of prior knowledge. While LLMs have proven beneficial as decision-making aids, their reliability is hampered by limitations in reasoning, hallucination phenomenon, and so on. On the other hand, Monte-Carlo Tree Search (MCTS) is a heuristic search algorithm that provides reliable decision-making solution… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

  39. arXiv:2403.02131  [pdf, other

    cs.NE cs.AI

    Deep Reinforcement Learning for Dynamic Algorithm Selection: A Proof-of-Principle Study on Differential Evolution

    Authors: Hongshu Guo, Yining Ma, Zeyuan Ma, Jiacheng Chen, Xinglin Zhang, Zhiguang Cao, Jun Zhang, Yue-Jiao Gong

    Abstract: Evolutionary algorithms, such as Differential Evolution, excel in solving real-parameter optimization challenges. However, the effectiveness of a single algorithm varies across different problem instances, necessitating considerable efforts in algorithm selection or configuration. This paper aims to address the limitation by leveraging the complementary strengths of a group of algorithms and dynam… ▽ More

    Submitted 7 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted by IEEE Transactions on Systems, Man, and Cybernetics: Systems at Thu, Feb 29, 2024

  40. arXiv:2403.01549  [pdf, other

    cs.CV

    Self-Supervised Representation Learning with Meta Comprehensive Regularization

    Authors: Huijie Guo, Ying Ba, Jie Hu, Lingyu Si, Wenwen Qiang, Lei Shi

    Abstract: Self-Supervised Learning (SSL) methods harness the concept of semantic invariance by utilizing data augmentation strategies to produce similar representations for different deformations of the same input. Essentially, the model captures the shared information among multiple augmented views of samples, while disregarding the non-shared information that may be beneficial for downstream tasks. To add… ▽ More

    Submitted 3 March, 2024; originally announced March 2024.

  41. arXiv:2403.01131  [pdf, other

    math.OC cs.AI cs.CL cs.LG cs.NE cs.SE

    LLaMoCo: Instruction Tuning of Large Language Models for Optimization Code Generation

    Authors: Zeyuan Ma, Hongshu Guo, Jiacheng Chen, Guojun Peng, Zhiguang Cao, Yining Ma, Yue-Jiao Gong

    Abstract: Recent research explores optimization using large language models (LLMs) by either iteratively seeking next-step solutions from LLMs or directly prompting LLMs for an optimizer. However, these approaches exhibit inherent limitations, including low operational efficiency, high sensitivity to prompt design, and a lack of domain-specific knowledge. We introduce LLaMoCo, the first instruction-tuning f… ▽ More

    Submitted 5 March, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

  42. Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization

    Authors: Zirui Zhu, Yong Liu, Zangwei Zheng, Huifeng Guo, Yang You

    Abstract: Click-Through Rate (CTR) prediction holds paramount significance in online advertising and recommendation scenarios. Despite the proliferation of recent CTR prediction models, the improvements in performance have remained limited, as evidenced by open-source benchmark assessments. Current researchers tend to focus on developing new models for various datasets and settings, often neglecting a cruci… ▽ More

    Submitted 23 February, 2024; originally announced March 2024.

    Comments: Proceedings of the ACM Web Conference 2024 (WWW '24)

  43. arXiv:2402.18205  [pdf, other

    cs.SE cs.AI

    Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging

    Authors: Wei Zhang, Hongcheng Guo, Anjie Le, Jian Yang, Jiaheng Liu, Zhoujun Li, Tieqiao Zheng, Shi Xu, Runqiang Zang, Liangfan Zheng, Bo Zhang

    Abstract: Logs produced by extensive software systems are integral to monitoring system behaviors. Advanced log analysis facilitates the detection, alerting, and diagnosis of system faults. Log parsing, which entails transforming raw log messages into structured templates, constitutes a critical phase in the automation of log analytics. Existing log parsers fail to identify the correct templates due to reli… ▽ More

    Submitted 1 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 7 pages

  44. arXiv:2402.18128  [pdf, other

    cs.CV cs.LG

    Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization

    Authors: Han Guo, Ramtin Hosseini, Ruiyi Zhang, Sai Ashish Somayajula, Ranak Roy Chowdhury, Rajesh K. Gupta, Pengtao Xie

    Abstract: Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning. It operates by randomly masking image patches and reconstructing these masked patches using the unmasked ones. A key limitation of MAE lies in its disregard for the varying informativeness of different patches, as it uniformly selects patches to mask. To overcome this, some approaches pr… ▽ More

    Submitted 28 February, 2024; originally announced February 2024.

  45. arXiv:2402.18092  [pdf, other

    cs.CV

    Context-aware Talking Face Video Generation

    Authors: Meidai Xuanyuan, Yuwang Wang, Honglei Guo, Qionghai Dai

    Abstract: In this paper, we consider a novel and practical case for talking face video generation. Specifically, we focus on the scenarios involving multi-people interactions, where the talking context, such as audience or surroundings, is present. In these situations, the video generation should take the context into consideration in order to generate video content naturally aligned with driving audios and… ▽ More

    Submitted 28 February, 2024; originally announced February 2024.

  46. arXiv:2402.17270  [pdf, other

    cs.AI cs.GT cs.HC cs.LG cs.MA

    Multi-Agent, Human-Agent and Beyond: A Survey on Cooperation in Social Dilemmas

    Authors: Hao Guo, Chunjiang Mu, Yang Chen, Chen Shen, Shuyue Hu, Zhen Wang

    Abstract: The study of cooperation within social dilemmas has long been a fundamental topic across various disciplines, including computer science and social science. Recent advancements in Artificial Intelligence (AI) have significantly reshaped this field, offering fresh insights into understanding and enhancing cooperation. This survey examines three key areas at the intersection of AI and cooperation in… ▽ More

    Submitted 27 February, 2024; originally announced February 2024.

  47. arXiv:2402.16463  [pdf, other

    cs.LG cs.DC

    Learning to Schedule Online Tasks with Bandit Feedback

    Authors: Yongxin Xu, Shangshang Wang, Hengquan Guo, Xin Liu, Ziyu Shao

    Abstract: Online task scheduling serves an integral role for task-intensive applications in cloud computing and crowdsourcing. Optimal scheduling can enhance system performance, typically measured by the reward-to-cost ratio, under some task arrival distribution. On one hand, both reward and cost are dependent on task context (e.g., evaluation metric) and remain black-box in practice. These render reward an… ▽ More

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 16 pages

  48. arXiv:2402.15648  [pdf, other

    cs.CV

    MambaIR: A Simple Baseline for Image Restoration with State-Space Model

    Authors: Hang Guo, Jinmin Li, Tao Dai, Zhihao Ouyang, Xudong Ren, Shu-Tao Xia

    Abstract: Recent years have seen significant advancements in image restoration, largely attributed to the development of modern deep neural networks, such as CNNs and Transformers. However, existing restoration backbones often face the dilemma between global receptive fields and efficient computation, hindering their application in practice. Recently, the Selective Structured State Space Model, especially t… ▽ More

    Submitted 25 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: Technical Report

  49. arXiv:2402.13815  [pdf, other

    cs.SE cs.CR

    An Empirical Study on Oculus Virtual Reality Applications: Security and Privacy Perspectives

    Authors: Hanyang Guo, Hong-Ning Dai, Xiapu Luo, Zibin Zheng, Gengyang Xu, Fengliang He

    Abstract: Although Virtual Reality (VR) has accelerated its prevalent adoption in emerging metaverse applications, it is not a fundamentally new technology. On one hand, most VR operating systems (OS) are based on off-the-shelf mobile OS. As a result, VR apps also inherit privacy and security deficiencies from conventional mobile apps. On the other hand, in contrast to conventional mobile apps, VR apps can… ▽ More

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Accepted by ICSE 2024

  50. arXiv:2402.13013  [pdf, other

    cs.CL

    Code Needs Comments: Enhancing Code LLMs with Comment Augmentation

    Authors: Demin Song, Honglin Guo, Yunhua Zhou, Shuhao Xing, Yudong Wang, Zifan Song, Wenwei Zhang, Qipeng Guo, Hang Yan, Xipeng Qiu, Dahua Lin

    Abstract: The programming skill is one crucial ability for Large Language Models (LLMs), necessitating a deep understanding of programming languages (PLs) and their correlation with natural languages (NLs). We examine the impact of pre-training data on code-focused LLMs' performance by assessing the comment density as a measure of PL-NL alignment. Given the scarcity of code-comment aligned data in pre-train… ▽ More

    Submitted 20 February, 2024; originally announced February 2024.