Showing 1–50 of 512 results for author: Fang, Y

  1. arXiv:2405.04336  [pdf, other]

    cs.AI

    Temporal and Heterogeneous Graph Neural Network for Remaining Useful Life Prediction

    Authors: Zhihao Wen, Yuan Fang, Pengcheng Wei, Fayao Liu, Zhenghua Chen, Min Wu

    Abstract: Predicting Remaining Useful Life (RUL) plays a crucial role in the prognostics and health management of industrial systems that involve a variety of interrelated sensors. Given a constant stream of time series sensory data from such systems, deep learning models have risen to prominence at identifying complex, nonlinear temporal dependencies in these data. In addition to the temporal dependencies…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 12 pages

  2. arXiv:2405.03458  [pdf, other]

    cs.CV

    SSyncOA: Self-synchronizing Object-aligned Watermarking to Resist Cropping-paste Attacks

    Authors: Chengxin Zhao, Hefei Ling, Sijing Xie, Han Fang, Yaokun Fang, Nan Sun

    Abstract: Modern image processing tools have made it easy for attackers to crop the region or object of interest in images and paste it into other images. The challenge this cropping-paste attack poses to the watermarking technology is that it breaks the synchronization of the image watermark, introducing multiple superimposed desynchronization distortions, such as rotation, scaling, and translation. Howeve…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 7 pages, 5 figures (accepted by ICME 2024)

  3. arXiv:2404.18255  [pdf, other]

    cs.CL cs.AI

    PatentGPT: A Large Language Model for Intellectual Property

    Authors: Zilong Bai, Ruiji Zhang, Linqing Chen, Qijun Cai, Yuan Zhong, Cong Wang, Yan Fang, Jie Fang, Jing Sun, Weikuan Wang, Lizhi Zhou, Haoran Hua, Tian Qiu, Chaochao Wang, Cheng Sun, Jianping Lu, Yixin Wang, Yubin Xia, Meng Hu, Haowen Liu, Peng Xu, Licong Xu, Fu Bian, Xiaolong Gu, Lisha Zhang, et al. (2 additional authors not shown)

    Abstract: In recent years, large language models (LLMs) have attracted significant attention due to their exceptional performance across a multitude of natural language processing tasks, and have been widely applied in various fields. However, the application of large language models in the Intellectual Property (IP) domain is challenging due to the strong need for specialized knowledge, privacy protection, pro…

    Submitted 7 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: 19 pages, 9 figures

    ACM Class: I.2.7

  4. arXiv:2404.17513  [pdf, other]

    cs.CL cs.AI

    A Comprehensive Evaluation on Event Reasoning of Large Language Models

    Authors: Zhengwei Tao, Zhi Jin, Yifan Zhang, Xiancai Chen, Xiaoying Bai, Yue Fang, Haiyan Zhao, Jia Li, Chongyang Tao

    Abstract: Event reasoning is a fundamental ability that underlies many applications. It requires event schema knowledge to perform global reasoning and needs to deal with the diversity of the inter-event relations and the reasoning paradigms. How well LLMs accomplish event reasoning on various relations and reasoning paradigms remains unknown. To mitigate this disparity, we comprehensively evaluate the abil…

    Submitted 26 April, 2024; originally announced April 2024.

  5. arXiv:2404.17456  [pdf, other]

    cs.NE

    Converting High-Performance and Low-Latency SNNs through Explicit Modelling of Residual Error in ANNs

    Authors: Zhipeng Huang, Jianhao Ding, Zhiyu Pan, Haoran Li, Ying Fang, Zhaofei Yu, Jian K. Liu

    Abstract: Spiking neural networks (SNNs) have garnered interest due to their energy efficiency and superior effectiveness on neuromorphic chips compared with traditional artificial neural networks (ANNs). One of the mainstream approaches to implementing deep SNNs is the ANN-SNN conversion, which integrates the efficient training strategy of ANNs with the energy-saving potential and fast inference capability…

    Submitted 26 April, 2024; originally announced April 2024.

  6. arXiv:2404.16829  [pdf, other]

    cs.CV cs.AI cs.CL

    Make-it-Real: Unleashing Large Multimodal Model's Ability for Painting 3D Objects with Realistic Materials

    Authors: Ye Fang, Zeyi Sun, Tong Wu, Jiaqi Wang, Ziwei Liu, Gordon Wetzstein, Dahua Lin

    Abstract: Physically realistic materials are pivotal in augmenting the realism of 3D assets across various applications and lighting conditions. However, existing 3D assets and generative models often lack authentic material properties. Manual assignment of materials using graphic software is a tedious and time-consuming task. In this paper, we exploit advancements in Multimodal Large Language Models (MLLMs…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Project Page: https://sunzey.github.io/Make-it-Real/

  7. arXiv:2404.16356  [pdf, other]

    cs.NI cs.AI cs.LG

    Integration of Mixture of Experts and Multimodal Generative AI in Internet of Vehicles: A Survey

    Authors: Minrui Xu, Dusit Niyato, Jiawen Kang, Zehui Xiong, Abbas Jamalipour, Yuguang Fang, Dong In Kim, Xuemin Shen

    Abstract: Generative AI (GAI) can enhance the cognitive, reasoning, and planning capabilities of intelligent modules in the Internet of Vehicles (IoV) by synthesizing augmented datasets, completing sensor data, and making sequential decisions. In addition, the mixture of experts (MoE) can enable the distributed and collaborative execution of AI models without performance degradation between connected vehicl…

    Submitted 25 April, 2024; originally announced April 2024.

  8. arXiv:2404.15587  [pdf, other]

    cs.CR

    Security Analysis of WiFi-based Sensing Systems: Threats from Perturbation Attacks

    Authors: Hangcheng Cao, Wenbin Huang, Guowen Xu, Xianhao Chen, Ziyang He, Jingyang Hu, Hongbo Jiang, Yuguang Fang

    Abstract: Deep learning technologies are pivotal in enhancing the performance of WiFi-based wireless sensing systems. However, they are inherently vulnerable to adversarial perturbation attacks, and regrettably, serious attention to this security issue is lacking within the WiFi sensing community. In this paper, we elaborate on such an attack, called WiIntruder, distinguishing itself with universality, r…

    Submitted 23 April, 2024; originally announced April 2024.

  9. arXiv:2404.14962  [pdf, ps, other]

    cs.IT

    Short Regular Girth-8 QC-LDPC Codes From Exponent Matrices with Vertical Symmetry

    Authors: Guohua Zhang, Aijing Sun, Ling Liu, Yi Fang

    Abstract: To address the challenge of constructing short girth-8 quasi-cyclic (QC) low-density parity-check (LDPC) codes, a novel construction framework based on vertical symmetry (VS) is proposed. Basic properties of the VS structure are presented. With the aid of these properties, existing explicit constructions for column weights from three to five which can be transformed into the VS structure are sorte…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 17 pages, 5 figures; this paper has been accepted by IEEE ISIT 2024

  10. arXiv:2404.13536  [pdf, other]

    cs.IT eess.SP

    Joint Transmit and Reflective Beamforming for Multi-Active-IRS-Assisted Cooperative Sensing

    Authors: Yuan Fang, Xianghao Yu, Jie Xu

    Abstract: This paper studies multi-active intelligent-reflecting-surface (IRS) cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to help the base station (BS) provide multi-view sensing. We focus on the scenario where the sensing target is located in the non-line-of-sight (NLoS) area of the BS. Based on the received echo signal, the BS aims to estimate the target's dire…

    Submitted 21 April, 2024; originally announced April 2024.

  11. arXiv:2404.11565  [pdf, other]

    cs.CV cs.AI cs.GR

    MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation

    Authors: Kuan-Chieh Wang, Daniil Ostashev, Yuwei Fang, Sergey Tulyakov, Kfir Aberman

    Abstract: We introduce a new architecture for personalization of text-to-image diffusion models, coined Mixture-of-Attention (MoA). Inspired by the Mixture-of-Experts mechanism utilized in large language models (LLMs), MoA distributes the generation workload between two attention pathways: a personalized branch and a non-personalized prior branch. MoA is designed to retain the original model's prior by fixi…

    Submitted 6 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: Project Website: https://snap-research.github.io/mixture-of-attention, Same as previous version, only updated metadata because bib was missing an author name

  12. arXiv:2404.09681  [pdf, other]

    cs.CR

    An Empirical Study of Open Edge Computing Platforms: Ecosystem, Usage, and Security Risks

    Authors: Yu Bi, Mingshuo Yang, Yong Fang, Xianghang Mi, Shanqing Guo, Shujun Tang, Haixin Duan

    Abstract: Emerging in recent years, open edge computing platforms (OECPs) claim large-scale edge nodes, extensive usage and adoption, and openness to any third party joining as an edge node. For instance, OneThingCloud, a major OECP operated in China, advertises 5 million edge nodes, 70TB bandwidth, and 1,500PB storage. However, little information is publicly available for such OECPs with reg…

    Submitted 15 April, 2024; originally announced April 2024.

  13. arXiv:2404.08566  [pdf, other]

    eess.SP cs.LG

    Mitigating Receiver Impact on Radio Frequency Fingerprint Identification via Domain Adaptation

    Authors: Liu Yang, Qiang Li, Xiaoyang Ren, Yi Fang, Shafei Wang

    Abstract: Radio Frequency Fingerprint Identification (RFFI), which exploits non-ideal hardware-induced unique distortion resident in the transmit signals to identify an emitter, is emerging as a means to enhance the security of communication systems. Recently, machine learning has achieved great success in developing state-of-the-art RFFI models. However, few works consider cross-receiver RFFI problems, whe…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE Internet of Things Journal

  14. Collaborative-Enhanced Prediction of Spending on Newly Downloaded Mobile Games under Consumption Uncertainty

    Authors: Peijie Sun, Yifan Wang, Min Zhang, Chuhan Wu, Yan Fang, Hong Zhu, Yuan Fang, Meng Wang

    Abstract: With the surge in mobile gaming, accurately predicting user spending on newly downloaded games has become paramount for maximizing revenue. However, the inherently unpredictable nature of user behavior poses significant challenges in this endeavor. To address this, we propose a robust model training and evaluation framework aimed at standardizing spending data to mitigate label variance and extrem…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 10 pages, 6 figures, WWW 2024 Industry Track, with three accept and two weak accept scores

  15. arXiv:2404.06891  [pdf, other]

    cs.NI

    PACP: Priority-Aware Collaborative Perception for Connected and Autonomous Vehicles

    Authors: Zhengru Fang, Senkang Hu, Haonan An, Yuang Zhang, Jingjing Wang, Hangcheng Cao, Xianhao Chen, Yuguang Fang

    Abstract: Surrounding perceptions are quintessential for safe driving for connected and autonomous vehicles (CAVs), where the Bird's Eye View (BEV) has been employed to accurately capture spatial relationships among vehicles. However, severe inherent limitations of BEV, like blind spots, have been identified. Collaborative perception has emerged as an effective solution to overcoming these limitations through dat…

    Submitted 10 April, 2024; originally announced April 2024.

  16. arXiv:2404.06448  [pdf, other]

    cs.LG cs.AI

    Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models

    Authors: Zihan Fang, Zheng Lin, Zhe Chen, Xianhao Chen, Yue Gao, Yuguang Fang

    Abstract: Recently, there has been a surge in the development of advanced intelligent generative content (AIGC), especially large language models (LLMs). However, for many downstream tasks, it is necessary to fine-tune LLMs using private data. While federated learning offers a promising privacy-preserving solution to LLM fine-tuning, the substantial size of an LLM, combined with high computational and commu…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 15 pages, 16 figures

  17. arXiv:2404.06395  [pdf, other]

    cs.CL cs.LG

    MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies

    Authors: Shengding Hu, Yuge Tu, Xu Han, Chaoqun He, Ganqu Cui, Xiang Long, Zhi Zheng, Yewei Fang, Yuxiang Huang, Weilin Zhao, Xinrong Zhang, Zheng Leng Thai, Kaihuo Zhang, Chongyi Wang, Yuan Yao, Chenyang Zhao, Jie Zhou, Jie Cai, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

    Abstract: The burgeoning interest in developing Large Language Models (LLMs) with up to trillion parameters has been met with concerns regarding resource efficiency and practical expense, particularly given the immense cost of experimentation. This scenario underscores the importance of exploring the potential of Small Language Models (SLMs) as a resource-efficient alternative. In this context, we introduce…

    Submitted 22 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: Enlarge the font size in several figures

  18. arXiv:2404.06345  [pdf, other]

    cs.AI cs.RO

    AgentsCoDriver: Large Language Model Empowered Collaborative Driving with Lifelong Learning

    Authors: Senkang Hu, Zhengru Fang, Zihan Fang, Yiqin Deng, Xianhao Chen, Yuguang Fang

    Abstract: Connected and autonomous driving has developed rapidly in recent years. However, current autonomous driving systems, which are primarily based on data-driven approaches, exhibit deficiencies in interpretability, generalization, and continuing learning capabilities. In addition, single-vehicle autonomous driving systems lack the ability to collaborate and negotiate with other vehicles, w…

    Submitted 21 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  19. arXiv:2404.04522  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models

    Authors: Zhiyuan Peng, Xuyang Wu, Qifan Wang, Sravanthi Rajanala, Yi Fang

    Abstract: Parameter Efficient Fine-Tuning (PEFT) methods have been extensively utilized in Large Language Models (LLMs) to improve downstream tasks without the cost of fine-tuning the whole LLM. Recent studies have shown how to effectively use PEFT for fine-tuning LLMs in ranking tasks with convincing performance; however, there are some limitations, including the learned prompt being fixed for different doc…

    Submitted 11 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

  20. arXiv:2404.03192  [pdf, other]

    cs.IR cs.CL

    Do Large Language Models Rank Fairly? An Empirical Study on the Fairness of LLMs as Rankers

    Authors: Yuan Wang, Xuyang Wu, Hsin-Tai Wu, Zhiqiang Tao, Yi Fang

    Abstract: The integration of Large Language Models (LLMs) in information retrieval has raised a critical reevaluation of fairness in the text-ranking models. LLMs, such as GPT models and Llama2, have shown effectiveness in natural language understanding tasks, and prior works (e.g., RankGPT) have also demonstrated that the LLMs exhibit better performance than the traditional ranking models in the ranking ta…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted at NAACL 2024 Main Conference

  21. arXiv:2404.01780  [pdf, other]

    astro-ph.IM astro-ph.GA cs.CV

    CSST Strong Lensing Preparation: a Framework for Detecting Strong Lenses in the Multi-color Imaging Survey by the China Survey Space Telescope (CSST)

    Authors: Xu Li, Ruiqi Sun, Jiameng Lv, Peng Jia, Nan Li, Chengliang Wei, Zou Hu, Xinzhong Er, Yun Chen, Zhang Ban, Yuedong Fang, Qi Guo, Dezi Liu, Guoliang Li, Lin Lin, Ming Li, Ran Li, Xiaobo Li, Yu Luo, Xianmin Meng, Jundan Nie, Zhaoxiang Qi, Yisheng Qiu, Li Shao, Hao Tian, et al. (7 additional authors not shown)

    Abstract: Strong gravitational lensing is a powerful tool for investigating dark matter and dark energy properties. With the advent of large-scale sky surveys, we can discover strong lensing systems on an unprecedented scale, which requires efficient tools to extract them from billions of astronomical objects. The existing mainstream lens-finding tools are based on machine learning algorithms and applied to…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: The paper has been accepted by AJ. The complete code can be downloaded via DOI 10.12149/101393. Comments are welcome

  22. arXiv:2404.01705  [pdf]

    cs.CV

    Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model

    Authors: Qinfeng Zhu, Yuanzhi Cai, Yuan Fang, Yihan Yang, Cheng Chen, Lei Fan, Anh Nguyen

    Abstract: High-resolution remotely sensed images pose a challenge for commonly used semantic segmentation methods such as Convolutional Neural Network (CNN) and Vision Transformer (ViT). CNN-based methods struggle with handling such high-resolution images due to their limited receptive field, while ViT faces challenges in handling long sequences. Inspired by Mamba, which adopts a State Space Model (SSM) to…

    Submitted 11 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  23. arXiv:2404.00812  [pdf, ps, other]

    cs.CC

    No Complete Problem for Constant-Cost Randomized Communication

    Authors: Yuting Fang, Lianna Hambardzumyan, Nathaniel Harms, Pooya Hatami

    Abstract: We prove that the class of communication problems with public-coin randomized constant-cost protocols, called $BPP^0$, does not contain a complete problem. In other words, there is no randomized constant-cost problem $Q \in BPP^0$, such that all other problems $P \in BPP^0$ can be computed by a constant-cost deterministic protocol with access to an oracle for $Q$. We also show that the $k$-Hamming…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 24 pages

  24. arXiv:2403.19949  [pdf, other]

    cs.CV

    FairCLIP: Harnessing Fairness in Vision-Language Learning

    Authors: Yan Luo, Min Shi, Muhammad Osama Khan, Muhammad Muneeb Afzal, Hao Huang, Shuaihang Yuan, Yu Tian, Luo Song, Ava Kouhana, Tobias Elze, Yi Fang, Mengyu Wang

    Abstract: Fairness is a critical concern in deep learning, especially in healthcare, where these models influence diagnoses and treatment decisions. Although fairness has been investigated in the vision-only domain, the fairness of medical vision-language (VL) models remains unexplored due to the scarcity of medical VL datasets for studying fairness. To bridge this research gap, we introduce the first fair…

    Submitted 5 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  25. arXiv:2403.18994  [pdf, other]

    stat.ML cs.LG

    Causal-StoNet: Causal Inference for High-Dimensional Complex Data

    Authors: Yaxin Fang, Faming Liang

    Abstract: With the advancement of data science, the collection of increasingly complex datasets has become commonplace. In such datasets, the data dimension can be extremely high, and the underlying data generation process can be unknown and highly nonlinear. As a result, the task of making causal inference with high-dimensional complex data has become a fundamental problem in many disciplines, such as medi…

    Submitted 27 March, 2024; originally announced March 2024.

  26. arXiv:2403.18684  [pdf, other]

    cs.IR cs.CL

    Scaling Laws For Dense Retrieval

    Authors: Yan Fang, Jingtao Zhan, Qingyao Ai, Jiaxin Mao, Weihang Su, Jia Chen, Yiqun Liu

    Abstract: Scaling up neural models has yielded significant advancements in a wide array of tasks, particularly in language generation. Previous studies have found that the performance of neural models frequently adheres to predictable scaling laws, correlated with factors such as training set size and model size. This insight is invaluable, especially as large-scale experiments grow increasingly resource-in…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted at SIGIR 2024

  27. arXiv:2403.18104  [pdf, other]

    cs.CV cs.LG

    Mathematical Foundation and Corrections for Full Range Head Pose Estimation

    Authors: Huei-Chung Hu, Xuyang Wu, Yuan Wang, Yi Fang, Hsin-Tai Wu

    Abstract: Numerous works concerning head pose estimation (HPE) offer algorithms or proposed neural network-based approaches for extracting Euler angles from either facial key points or directly from images of the head region. However, many works failed to provide clear definitions of the coordinate systems and Euler or Tait-Bryan angle orders in use. It is a well-known fact that rotation matrices depend on…

    Submitted 3 May, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  28. arXiv:2403.17259  [pdf, ps, other]

    cs.LG cs.SI

    Diffusion-based Negative Sampling on Graphs for Link Prediction

    Authors: Trung-Kien Nguyen, Yuan Fang

    Abstract: Link prediction is a fundamental task for graph analysis with important applications on the Web, such as social network analysis and recommendation systems. Modern graph link prediction methods often employ a contrastive approach to learn robust node representations, where negative sampling is pivotal. Typical negative sampling methods aim to retrieve hard examples based on either predefined…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted at TheWebConf 2024

  29. arXiv:2403.16353  [pdf, other]

    cs.IT eess.SP

    Energy-Efficient Hybrid Beamforming with Dynamic On-off Control for Integrated Sensing, Communications, and Powering

    Authors: Zeyu Hao, Yuan Fang, Xianghao Yu, Jie Xu, Ling Qiu, Lexi Xu, Shuguang Cui

    Abstract: This paper investigates the energy-efficient hybrid beamforming design for a multi-functional integrated sensing, communications, and powering (ISCAP) system. In this system, a base station (BS) with a hybrid analog-digital (HAD) architecture sends unified wireless signals to communicate with multiple information receivers (IRs), sense multiple point targets, and wirelessly charge multiple energy…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 13 pages, 6 figures, submitted to IEEE Transactions on Communications

  30. arXiv:2403.13647  [pdf, other]

    cs.CV

    Meta-Point Learning and Refining for Category-Agnostic Pose Estimation

    Authors: Junjie Chen, Jiebin Yan, Yuming Fang, Li Niu

    Abstract: Category-agnostic pose estimation (CAPE) aims to predict keypoints for arbitrary classes given a few support images annotated with keypoints. Existing methods only rely on the features extracted at support keypoints to predict or refine the keypoints on the query image, but a few support feature vectors are local and inadequate for CAPE. Considering that humans can quickly perceive potential keypoints…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Published in CVPR 2024

  31. arXiv:2403.11138  [pdf, other]

    cs.NE

    Spiking Wavelet Transformer

    Authors: Yuetong Fang, Ziqing Wang, Lingfeng Zhang, Jiahang Cao, Honglei Chen, Renjing Xu

    Abstract: Spiking neural networks (SNNs) offer an energy-efficient alternative to conventional deep learning by mimicking the event-driven processing of the brain. Incorporating Transformers into SNNs has shown promise for accuracy, yet the resulting models struggle to capture high-frequency patterns such as moving edges and pixel-level brightness changes due to their reliance on global self-attention operations. Porti…

    Submitted 25 March, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

  32. arXiv:2403.09096  [pdf, other]

    eess.IV cs.CV

    Deep unfolding Network for Hyperspectral Image Super-Resolution with Automatic Exposure Correction

    Authors: Yuan Fang, Yipeng Liu, Jie Chen, Zhen Long, Ao Li, Chong-Yung Chi, Ce Zhu

    Abstract: In recent years, the fusion of high spatial resolution multispectral image (HR-MSI) and low spatial resolution hyperspectral image (LR-HSI) has been recognized as an effective method for HSI super-resolution (HSI-SR). However, both HSI and MSI may be acquired under extreme conditions such as night or poorly illuminated scenarios, which may cause different exposure levels, thereby seriously downgr…

    Submitted 14 March, 2024; originally announced March 2024.

  33. arXiv:2403.08554  [pdf]

    cs.LG cs.AI

    Federated Knowledge Graph Unlearning via Diffusion Model

    Authors: Bingchen Liu, Yuanyuan Fang

    Abstract: Federated learning (FL) promotes the development and application of artificial intelligence technologies by enabling model sharing and collaboration while safeguarding data privacy. Knowledge graph (KG) embedding representation provides a foundation for knowledge reasoning and applications by mapping entities and relations into vector space. Federated KG embedding enables the utilization of knowle…

    Submitted 13 March, 2024; originally announced March 2024.

  34. arXiv:2403.06832  [pdf, other]

    cs.CL cs.AI

    The Power of Noise: Toward a Unified Multi-modal Knowledge Graph Representation Framework

    Authors: Zhuo Chen, Yin Fang, Yichi Zhang, Lingbing Guo, Jiaoyan Chen, Huajun Chen, Wen Zhang

    Abstract: The advancement of Multi-modal Pre-training highlights the necessity for a robust Multi-Modal Knowledge Graph (MMKG) representation learning framework. This framework is crucial for integrating structured knowledge into multi-modal Large Language Models (LLMs) at scale, aiming to alleviate issues like knowledge misconceptions and multi-modal hallucinations. In this work, to evaluate models' abilit…

    Submitted 20 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Ongoing work; 10 pages, 6 Tables, 2 Figures; Repo is available at https://github.com/zjukg/SNAG

  35. arXiv:2403.05135  [pdf, other]

    cs.CV

    ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

    Authors: Xiwei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, Gang Yu

    Abstract: Diffusion models have demonstrated remarkable performance in the domain of text-to-image generation. However, most widely used models still employ CLIP as their text encoder, which constrains their ability to comprehend dense prompts, encompassing multiple objects, detailed attributes, complex relationships, long-text alignment, etc. In this paper, we introduce an Efficient Large Language Model Ad…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Project Page: https://ella-diffusion.github.io/

  36. arXiv:2403.04363  [pdf, other]

    cs.CV

    Multi-step Temporal Modeling for UAV Tracking

    Authors: Xiaoying Yuan, Tingfa Xu, Xincong Liu, Ying Wang, Haolin Qin, Yuqiang Fang, Jianan Li

    Abstract: In the realm of unmanned aerial vehicle (UAV) tracking, Siamese-based approaches have gained traction due to their optimal balance between efficiency and precision. However, UAV scenarios often present challenges such as insufficient sampling resolution, fast motion and small objects with limited feature information. As a result, temporal context in UAV tracking tasks plays a pivotal role in targe…

    Submitted 7 March, 2024; originally announced March 2024.

  37. arXiv:2402.19479  [pdf, other]

    cs.CV

    Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

    Authors: Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace, Ekaterina Deyneka, Hsiang-wei Chao, Byung Eun Jeon, Yuwei Fang, Hsin-Ying Lee, Jian Ren, Ming-Hsuan Yang, Sergey Tulyakov

    Abstract: The quality of the data and annotation upper-bounds the quality of a downstream model. While there exist large text corpora and image-text pairs, high-quality video-text data is much harder to collect. First of all, manual labeling is more time-consuming, as it requires an annotator to watch an entire video. Second, videos have a temporal dimension, consisting of several scenes stacked together, a…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: CVPR 2024. Project Page: https://snap-research.github.io/Panda-70M

  38. arXiv:2402.17810  [pdf, other]

    q-bio.QM cs.AI cs.CE cs.LG q-bio.BM

    BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning

    Authors: Qizhi Pei, Lijun Wu, Kaiyuan Gao, Xiaozhuan Liang, Yin Fang, Jinhua Zhu, Shufang Xie, Tao Qin, Rui Yan

    Abstract: Recent research trends in computational biology have increasingly focused on integrating text and bio-entity modeling, especially in the context of molecules and proteins. However, previous efforts like BioT5 faced challenges in generalizing across diverse tasks and lacked a nuanced understanding of molecular structures, particularly in their textual representations (e.g., IUPAC). This paper intro…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 24 pages

  39. arXiv:2402.17753  [pdf, other]

    cs.CL cs.AI cs.LG

    Evaluating Very Long-Term Conversational Memory of LLM Agents

    Authors: Adyasha Maharana, Dong-Ho Lee, Sergey Tulyakov, Mohit Bansal, Francesco Barbieri, Yuwei Fang

    Abstract: Existing works on long-term open-domain dialogues focus on evaluating model responses within contexts spanning no more than five chat sessions. Despite advancements in long-context large language models (LLMs) and retrieval augmented generation (RAG) techniques, their efficacy in very long-term dialogues remains unexplored. To address this research gap, we introduce a machine-human pipeline to gen…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 19 pages; Project page: https://snap-research.github.io/locomo/

  40. arXiv:2402.16479  [pdf, other]

    cs.CV

    Edge Detectors Can Make Deep Convolutional Neural Networks More Robust

    Authors: Jin Ding, Jie-Chao Zhao, Yong-Zhi Sun, Ping Tan, Jia-Wei Wang, Ji-En Ma, You-Tong Fang

    Abstract: Deep convolutional neural networks (DCNN for short) are vulnerable to examples with small perturbations. Improving DCNN's robustness is of great significance to safety-critical applications, such as autonomous driving and industry automation. Inspired by the principal way that human eyes recognize objects, i.e., largely relying on the shape features, this paper first employs the edge detectors…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 26 pages, 18 figures, 7 tables. Submitted to Neural Networks, under review

  41. arXiv:2402.16020  [pdf, ps, other]

    cs.LG

    A Step-by-step Introduction to the Implementation of Automatic Differentiation

    Authors: Yu-Hsueh Fang, He-Zhe Lin, Jie-Jyun Liu, Chih-Jen Lin

    Abstract: Automatic differentiation is a key component in deep learning. This topic is well studied and excellent surveys such as Baydin et al. (2018) have been available to clearly describe the basic concepts. Further, sophisticated implementations of automatic differentiation are now an important part of popular deep learning frameworks. However, it is difficult, if not impossible, to directly teach stude…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: 17 pages, 15 figures

  42. arXiv:2402.15903  [pdf, other]

    cs.LG cs.AI cs.NI

    ESFL: Efficient Split Federated Learning over Resource-Constrained Heterogeneous Wireless Devices

    Authors: Guangyu Zhu, Yiqin Deng, Xianhao Chen, Haixia Zhang, Yuguang Fang, Tan F. Wong

    Abstract: Federated learning (FL) allows multiple parties (distributed devices) to train a machine learning model without sharing raw data. How to effectively and efficiently utilize the resources on devices and the central server is a highly interesting yet challenging problem. In this paper, we propose an efficient split federated learning algorithm (ESFL) to take full advantage of the powerful computing…

    Submitted 16 April, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  43. arXiv:2402.14797  [pdf, other]

    cs.CV cs.AI

    Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis

    Authors: Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Ekaterina Deyneka, Tsai-Shien Chen, Anil Kag, Yuwei Fang, Aleksei Stoliar, Elisa Ricci, Jian Ren, Sergey Tulyakov

    Abstract: Contemporary models for generating images show remarkable quality and versatility. Swayed by these advantages, the research community repurposes them to generate videos. Since video content is highly redundant, we argue that naively bringing advances of image models to the video generation domain reduces motion fidelity, visual quality and impairs scalability. In this work, we build Snap Video, a…

    Submitted 22 February, 2024; originally announced February 2024.

  44. arXiv:2402.13035  [pdf, other]

    cs.CL cs.AI

    Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models

    Authors: Che Zhang, Zhenyang Xiao, Chengcheng Han, Yixin Lian, Yuejian Fang

    Abstract: Large language models (LLMs) have made significant strides in reasoning capabilities, with ongoing efforts to refine their reasoning through self-correction. However, recent studies suggest that self-correction can be limited or even counterproductive without external accurate knowledge, raising questions about the limits and effectiveness of self-correction. In this paper, we aim to enhance LLM's…

    Submitted 22 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  45. arXiv:2402.12411  [pdf, other]

    cs.SI cs.AI cs.LG

    Deep Structural Knowledge Exploitation and Synergy for Estimating Node Importance Value on Heterogeneous Information Networks

    Authors: Yankai Chen, Yixiang Fang, Qiongyan Wang, Xin Cao, Irwin King

    Abstract: Node importance estimation problem has been studied conventionally with homogeneous network topology analysis. To deal with network heterogeneity, a few recent methods employ graph neural models to automatically learn diverse sources of information. However, the major concern revolves around that their full adaptive learning process may lead to insufficient information exploration, thereby formula…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: Accepted by AAAI 2024

  46. arXiv:2402.11896  [pdf, other]

    cs.CL

    SIBO: A Simple Booster for Parameter-Efficient Fine-Tuning

    Authors: Zhihao Wen, Jie Zhang, Yuan Fang

    Abstract: Fine-tuning all parameters of large language models (LLMs) necessitates substantial computational power and extended time. Latest advancements in parameter-efficient fine-tuning (PEFT) techniques, such as Adapter tuning and LoRA, allow for adjustments to only a minor fraction of the parameters of these LLMs. Concurrently, it has been noted that the issue of over-smoothing diminishes the effectiven…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 16 pages

  47. arXiv:2402.10635  [pdf, other]

    cs.LG cs.AI

    ContiFormer: Continuous-Time Transformer for Irregular Time Series Modeling

    Authors: Yuqi Chen, Kan Ren, Yansen Wang, Yuchen Fang, Weiwei Sun, Dongsheng Li

    Abstract: Modeling continuous-time dynamics on irregular time series is critical to account for data evolution and correlations that occur continuously. Traditional methods including recurrent neural networks or Transformer models leverage inductive bias via powerful neural architectures to capture complex patterns. However, due to their discrete characteristic, they have limitations in generalizing to cont…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: NeurIPS 2023 Poster

  48. arXiv:2402.10137  [pdf, other]

    cs.CL

    TOAD: Task-Oriented Automatic Dialogs with Diverse Response Styles

    Authors: Yinhong Liu, Yimai Fang, David Vandyke, Nigel Collier

    Abstract: In light of recent advances in large language models (LLMs), the expectations for the next generation of virtual assistants include enhanced naturalness and adaptability across diverse usage scenarios. However, the creation of high-quality annotated data for Task-Oriented Dialog (TOD) is recognized to be slow and costly. To address these challenges, we introduce Task-Oriented Automatic Dialogs (TO…

    Submitted 16 February, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  49. arXiv:2402.09546  [pdf, other]

    cs.RO cs.AI

    How Secure Are Large Language Models (LLMs) for Navigation in Urban Environments?

    Authors: Congcong Wen, Jiazhao Liang, Shuaihang Yuan, Hao Huang, Yi Fang

    Abstract: In the field of robotics and automation, navigation systems based on Large Language Models (LLMs) have recently shown impressive performance. However, the security aspects of these systems have received relatively less attention. This paper pioneers the exploration of vulnerabilities in LLM-based navigation models in urban outdoor environments, a critical area given the technology's widespread app…

    Submitted 14 February, 2024; originally announced February 2024.

  50. arXiv:2402.09390  [pdf, other]

    cs.AI cs.CL

    HGOT: Hierarchical Graph of Thoughts for Retrieval-Augmented In-Context Learning in Factuality Evaluation

    Authors: Yihao Fang, Stephen W. Thomas, Xiaodan Zhu

    Abstract: With the widespread adoption of large language models (LLMs) in numerous applications, the challenge of factuality and the propensity for hallucinations raises significant concerns. To address this issue, particularly in retrieval-augmented in-context learning, we introduce the hierarchical graph of thoughts (HGOT), a structured, multi-layered graph approach designed to enhance the retrieval of pe…

    Submitted 14 February, 2024; originally announced February 2024.