
Showing 1–50 of 722 results for author: Zhang, F

Searching in archive cs.
  1. arXiv:2405.04860  [pdf, other]

    cs.SE quant-ph

    Concolic Testing of Quantum Programs

    Authors: Shangzhou Xia, Jianjun Zhao, Fuyuan Zhang, Xiaoyu Guo

    Abstract: This paper presents the first concolic testing framework specifically designed for quantum programs. The framework defines quantum conditional statements that quantify quantum states and presents a symbolization method for quantum variables. Utilizing this framework, we generate path constraints for each concrete execution path of a quantum program. These constraints guide the exploration of new p…

    Submitted 8 May, 2024; originally announced May 2024.

  2. arXiv:2405.03969  [pdf, other]

    cs.RO

    Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform

    Authors: Zhijian Qiao, Haoming Huang, Chuhao Liu, Shaojie Shen, Fumin Zhang, Huan Yin

    Abstract: The construction and robotic sensing data originate from disparate sources and are associated with distinct frames of reference. The primary objective of this study is to align LiDAR point clouds with building information modeling (BIM) using a global point cloud registration approach, aimed at establishing a shared understanding between the two modalities, i.e., "speak the same language". To ac…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 12 pages, 10 figures

  3. arXiv:2405.03565  [pdf, other]

    cs.CV

    Liberating Seen Classes: Boosting Few-Shot and Zero-Shot Text Classification via Anchor Generation and Classification Reframing

    Authors: Han Liu, Siyang Zhao, Xiaotong Zhang, Feng Zhang, Wei Wang, Fenglong Ma, Hongyang Chen, Hong Yu, Xianchao Zhang

    Abstract: Few-shot and zero-shot text classification aim to recognize samples from novel classes with limited labeled samples or no labeled samples at all. While prevailing methods have shown promising performance via transferring knowledge from seen classes to unseen classes, they are still limited by (1) Inherent dissimilarities among classes make the transformation of features learned from seen classes t…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted to AAAI 2024

  4. arXiv:2405.01329  [pdf, other]

    cs.CR

    Decentralization of Ethereum's Builder Market

    Authors: Sen Yang, Kartik Nayak, Fan Zhang

    Abstract: Blockchains protect an ecosystem worth more than $500bn with their strong security properties derived from the principle of decentralization. Is today's blockchain really decentralized? In this paper, we empirically studied one of the least decentralized parts of Ethereum -- the most used blockchain system in practice -- and shed light on the decentralization issue from a new perspective. To avo…

    Submitted 2 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  5. arXiv:2405.00269  [pdf, other]

    cs.RO

    Adaptive Integral Sliding Mode Control for Attitude Tracking of Underwater Robots With Large Range Pitch Variations in Confined Space

    Authors: Xiaorui Wang, Zeyu Sha, Feitian Zhang

    Abstract: Underwater robots play a crucial role in exploring aquatic environments. The ability to flexibly adjust their attitudes is essential for underwater robots to effectively accomplish tasks in confined space. However, the highly coupled six degrees of freedom dynamics resulting from attitude changes and the complex turbulence within limited spatial areas present significant challenges. To address the…

    Submitted 30 April, 2024; originally announced May 2024.

  6. Interest Clock: Time Perception in Real-Time Streaming Recommendation System

    Authors: Yongchun Zhu, Jingwu Chen, Ling Chen, Yitan Li, Feng Zhang, Zuotao Liu

    Abstract: User preferences follow a dynamic pattern over a day, e.g., at 8 am, a user might prefer to read news, while at 8 pm, they might prefer to watch movies. Time modeling aims to enable recommendation systems to perceive time changes to capture users' dynamic preferences over time, which is an important and challenging problem in recommendation systems. Especially, streaming recommendation systems in…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR 2024

  7. arXiv:2404.18580  [pdf, other]

    cs.RO eess.SY

    Data-Driven Dynamics Modeling of Miniature Robotic Blimps Using Neural ODEs With Parameter Auto-Tuning

    Authors: Yongjian Zhu, Hao Cheng, Feitian Zhang

    Abstract: Miniature robotic blimps, as one type of lighter-than-air aerial vehicles, have attracted increasing attention in the science and engineering community for their enhanced safety, extended endurance, and quieter operation compared to quadrotors. Accurately modeling the dynamics of these robotic blimps poses a significant challenge due to the complex aerodynamics stemming from their large lifting bo…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 8 pages, 8 figures

  8. arXiv:2404.18416  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Capabilities of Gemini Models in Medicine

    Authors: Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby, et al. (42 additional authors not shown)

    Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-G…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  9. arXiv:2404.17862  [pdf, other]

    cs.CL

    Revisiting Multimodal Emotion Recognition in Conversation from the Perspective of Graph Spectrum

    Authors: Tao Meng, Fuchen Zhang, Yuntao Shou, Wei Ai, Nan Yin, Keqin Li

    Abstract: Efficiently capturing consistent and complementary semantic features in a multimodal conversation context is crucial for Multimodal Emotion Recognition in Conversation (MERC). Existing methods mainly use graph structures to model dialogue context semantic dependencies and employ Graph Neural Networks (GNN) to capture multimodal semantic features for emotion recognition. However, these methods are…

    Submitted 2 May, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: 10 pages, 4 figures

  10. arXiv:2404.17858  [pdf, other]

    cs.CL

    Revisiting Multi-modal Emotion Learning with Broad State Space Models and Probability-guidance Fusion

    Authors: Yuntao Shou, Tao Meng, Fuchen Zhang, Nan Yin, Keqin Li

    Abstract: Multi-modal Emotion Recognition in Conversation (MERC) has received considerable attention in various fields, e.g., human-computer interaction and recommendation systems. Most existing works perform feature disentanglement and fusion to extract emotional contextual information from multi-modal features and emotion classification. After revisiting the characteristic of MERC, we argue that long-rang…

    Submitted 2 May, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: 10 pages, 6 figures

  11. arXiv:2404.16220  [pdf, other]

    cs.IT cs.CR cs.DM math.CO

    When does a bent concatenation not belong to the completed Maiorana-McFarland class?

    Authors: Sadmir Kudin, Enes Pasalic, Alexandr Polujan, Fengrong Zhang

    Abstract: Every Boolean bent function $f$ can be written either as a concatenation $f=f_1||f_2$ of two complementary semi-bent functions $f_1,f_2$; or as a concatenation $f=f_1||f_2||f_3||f_4$ of four Boolean functions $f_1,f_2,f_3,f_4$, all of which are simultaneously bent, semi-bent, or 5-valued spectra-functions. In this context, it is essential to ask: When does a bent concatenation $f$ (not) belong to…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: This is the authors' version of the camera-ready version to be presented at the 2024 IEEE International Symposium on Information Theory (ISIT 2024)

  12. arXiv:2404.15588  [pdf, other]

    cs.CL

    Minimal Evidence Group Identification for Claim Verification

    Authors: Xiangci Li, Sihao Chen, Rajvi Kapadia, Jessica Ouyang, Fan Zhang

    Abstract: Claim verification in real-world settings (e.g. against a large collection of candidate evidences retrieved from the web) typically requires identifying and aggregating a complete set of evidence pieces that collectively provide full support to the claim. The problem becomes particularly challenging when there exists distinct sets of evidence that could be used to verify the claim from different p…

    Submitted 23 April, 2024; originally announced April 2024.

  13. arXiv:2404.15041  [pdf, other]

    cs.CV

    LEAF: Unveiling Two Sides of the Same Coin in Semi-supervised Facial Expression Recognition

    Authors: Fan Zhang, Zhi-Qi Cheng, Jian Zhao, Xiaojiang Peng, Xuelong Li

    Abstract: Semi-supervised learning has emerged as a promising approach to tackle the challenge of label scarcity in facial expression recognition (FER) task. However, current state-of-the-art methods primarily focus on one side of the coin, i.e., generating high-quality pseudo-labels, while overlooking the other side: enhancing expression-relevant representations. In this paper, we unveil both sides of the…

    Submitted 26 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  14. arXiv:2404.11968  [pdf, other]

    cs.CL

    P-NAL: an Effective and Interpretable Entity Alignment Method

    Authors: Chuanhao Xu, Jingwei Cheng, Fu Zhang

    Abstract: Entity alignment (EA) aims to find equivalent entities between two Knowledge Graphs. Existing embedding-based EA methods usually encode entities as embeddings, triples as embeddings' constraint and learn to align the embeddings. The structural and side information are usually utilized via embedding propagation, aggregation or interaction. However, the details of the underlying logical inference st…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 13 pages, 2 figures

    ACM Class: I.2.4

  15. arXiv:2404.11095  [pdf, other]

    cs.CL cs.AI

    Inductive-Deductive Strategy Reuse for Multi-Turn Instructional Dialogues

    Authors: Jiao Ou, Jiayu Wu, Che Liu, Fuzheng Zhang, Di Zhang, Kun Gai

    Abstract: Aligning large language models (LLMs) with human expectations requires high-quality instructional dialogues, which can be achieved by raising diverse, in-depth, and insightful instructions that deepen interactions. Existing methods target instructions from real instruction dialogues as a learning goal and fine-tune a user simulator for posing instructions. However, the user simulator struggles to…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 27 pages, 3 figures, 12 tables

  16. arXiv:2404.09571  [pdf, other]

    eess.IV cs.CV

    MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution

    Authors: Yuxuan Jiang, Chen Feng, Fan Zhang, David Bull

    Abstract: Knowledge distillation (KD) has emerged as a promising technique in deep learning, typically employed to enhance a compact student network through learning from their high-performance but more complex teacher variant. When applied in the context of image super-resolution, most KD approaches are modified versions of methods developed for other computer vision tasks, which are based on training stra…

    Submitted 15 April, 2024; originally announced April 2024.

  17. arXiv:2404.08353  [pdf, other]

    cs.CV cs.RO

    TDANet: Target-Directed Attention Network For Object-Goal Visual Navigation With Zero-Shot Ability

    Authors: Shiwei Lian, Feitian Zhang

    Abstract: The generalization of the end-to-end deep reinforcement learning (DRL) for object-goal visual navigation is a long-standing challenge since object classes and placements vary in new test environments. Learning domain-independent visual representation is critical for enabling the trained DRL agent with the ability to generalize to unseen scenes and objects. In this letter, a target-directed attenti…

    Submitted 12 April, 2024; originally announced April 2024.

  18. BOND: Bootstrapping From-Scratch Name Disambiguation with Multi-task Promoting

    Authors: Yuqing Cheng, Bo Chen, Fanjin Zhang, Jie Tang

    Abstract: From-scratch name disambiguation is an essential task for establishing a reliable foundation for academic platforms. It involves partitioning documents authored by identically named individuals into groups representing distinct real-life experts. Canonically, the process is divided into two decoupled tasks: locally estimating the pairwise similarities between documents followed by globally groupin…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: TheWebConf 2024 (WWW '24)

    ACM Class: H.3.7; H.3.3

    Journal ref: Proceedings of TheWebConf 2024 (WWW '24), May 13--17, 2024, Singapore

  19. arXiv:2404.06495  [pdf, other]

    cs.GT

    Mechanism Design for ZK-Rollup Prover Markets

    Authors: Wenhao Wang, Lulu Zhou, Aviv Yaish, Fan Zhang, Ben Fisch, Benjamin Livshits

    Abstract: In ZK-Rollups, provers spend significant computational resources to generate validity proofs. Their costs should be compensated properly, so a sustainable prover market can form over time. Existing transaction fee mechanisms (TFMs) such as EIP-1559, however, do not work in this setting, as EIP-1559 only generates negligible revenue because of burning, while provers often create or purchase special…

    Submitted 26 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  20. arXiv:2404.05220  [pdf, other]

    cs.CV

    StylizedGS: Controllable Stylization for 3D Gaussian Splatting

    Authors: Dingxi Zhang, Zhuoxun Chen, Yu-Jie Yuan, Fang-Lue Zhang, Zhenliang He, Shiguang Shan, Lin Gao

    Abstract: With the rapid development of XR, 3D generation and editing are becoming more and more important, among which, stylization is an important tool of 3D appearance editing. It can achieve consistent 3D artistic stylization given a single reference style image and thus is a user-friendly editing way. However, recent NeRF-based 3D stylization methods face efficiency issues that affect the actual user e…

    Submitted 8 April, 2024; originally announced April 2024.

  21. arXiv:2404.01579  [pdf]

    cs.CV

    Diffusion Deepfake

    Authors: Chaitali Bhattacharyya, Hanxiao Wang, Feng Zhang, Sungho Kim, Xiatian Zhu

    Abstract: Recent progress in generative AI, primarily through diffusion models, presents significant challenges for real-world deepfake detection. The increased realism in image details, diverse content, and widespread accessibility to the general public complicates the identification of these sophisticated deepfakes. Acknowledging the urgency to address the vulnerability of current deepfake detectors to th…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: 28 pages including Supplementary material

  22. arXiv:2403.19001  [pdf, other]

    cs.CV cs.AI eess.IV q-bio.NC

    Cross-domain Fiber Cluster Shape Analysis for Language Performance Cognitive Score Prediction

    Authors: Yui Lo, Yuqian Chen, Dongnan Liu, Wan Liu, Leo Zekelman, Fan Zhang, Yogesh Rathi, Nikos Makris, Alexandra J. Golby, Weidong Cai, Lauren J. O'Donnell

    Abstract: Shape plays an important role in computer graphics, offering informative features to convey an object's morphology and functionality. Shape analysis in brain imaging can help interpret structural and functionality correlations of the human brain. In this work, we investigate the shape of the brain's 3D white matter connections and its potential predictive relationship to human cognitive function.…

    Submitted 29 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: 2 figures, 11 pages

  23. arXiv:2403.16405  [pdf, other]

    cs.LG cs.CR cs.CV

    Ensemble Adversarial Defense via Integration of Multiple Dispersed Low Curvature Models

    Authors: Kaikang Zhao, Xi Chen, Wei Huang, Liuxin Ding, Xianglong Kong, Fan Zhang

    Abstract: The integration of an ensemble of deep learning models has been extensively explored to enhance defense against adversarial attacks. The diversity among sub-models increases the attack cost required to deceive the majority of the ensemble, thereby improving the adversarial robustness. While existing approaches mainly center on increasing diversity in feature representations or dispersion of first-…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted to The 2024 International Joint Conference on Neural Networks (IJCNN)

  24. arXiv:2403.15369  [pdf, other]

    cs.RO

    OceanPlan: Hierarchical Planning and Replanning for Natural Language AUV Piloting in Large-scale Unexplored Ocean Environments

    Authors: Ruochu Yang, Fumin Zhang, Mengxue Hou

    Abstract: We develop a hierarchical LLM-task-motion planning and replanning framework to efficiently ground an abstracted human command into tangible Autonomous Underwater Vehicle (AUV) control through enhanced representations of the world. We also incorporate a holistic replanner to provide real-world feedback with all planners for robust AUV operation. While there has been extensive research in bridging t…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: submitted to IROS 2024

  25. arXiv:2403.15356  [pdf, other]

    cs.CV

    Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities

    Authors: Zhitong Xiong, Yi Wang, Fahong Zhang, Adam J. Stewart, Joëlle Hanna, Damian Borth, Ioannis Papoutsis, Bertrand Le Saux, Gustau Camps-Valls, Xiao Xiang Zhu

    Abstract: The development of foundation models has revolutionized our ability to interpret the Earth's surface using satellite observational data. Traditional models have been siloed, tailored to specific sensors or data types like optical, radar, and hyperspectral, each with its own unique characteristics. This specialization hinders the potential for a holistic analysis that could benefit from the combine…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 33 pages, 10 figures

  26. arXiv:2403.15257  [pdf, other]

    cs.SI cs.AI

    Hierarchical Information Enhancement Network for Cascade Prediction in Social Networks

    Authors: Fanrui Zhang, Jiawei Liu, Qiang Zhang, Xiaoling Zhu, Zheng-Jun Zha

    Abstract: Understanding information cascades in networks is a fundamental issue in numerous applications. Current researches often sample cascade information into several independent paths or subgraphs to learn a simple cascade representation. However, these approaches fail to exploit the hierarchical semantic associations between different modalities, limiting their predictive performance. In this work, we…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 7 pages, 2 figures

  27. arXiv:2403.15235  [pdf, other]

    cs.SI cs.AI

    Multi-perspective Memory Enhanced Network for Identifying Key Nodes in Social Networks

    Authors: Qiang Zhang, Jiawei Liu, Fanrui Zhang, Xiaoling Zhu, Zheng-Jun Zha

    Abstract: Identifying key nodes in social networks plays a crucial role in timely blocking false information. Existing key node identification methods usually consider node influence only from the propagation structure perspective and have insufficient generalization ability to unknown scenarios. In this paper, we propose a novel Multi-perspective Memory Enhanced Network (MMEN) for identifying key nodes in…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 7 pages, 1 figures

  28. arXiv:2403.15044  [pdf, other]

    cs.CV cs.AI

    Multimodal Fusion with Pre-Trained Model Features in Affective Behaviour Analysis In-the-wild

    Authors: Zhuofan Wen, Fengyu Zhang, Siyuan Zhang, Haiyang Sun, Mingyu Xu, Licai Sun, Zheng Lian, Bin Liu, Jianhua Tao

    Abstract: Multimodal fusion is a significant method for most multimodal tasks. With the recent surge in the number of large pre-trained models, combining both multimodal fusion methods and pre-trained model features can achieve outstanding performance in many multimodal tasks. In this paper, we present our approach, which leverages both advantages for addressing the task of Expression (Expr) Recognition and…

    Submitted 22 March, 2024; originally announced March 2024.

  29. arXiv:2403.14941  [pdf, other]

    cs.LG cs.AI

    Unifying Lane-Level Traffic Prediction from a Graph Structural Perspective: Benchmark and Baseline

    Authors: Shuhao Li, Yue Cui, Jingyi Xu, Libin Li, Lingkai Meng, Weidong Yang, Fan Zhang, Xiaofang Zhou

    Abstract: Traffic prediction has long been a focal and pivotal area in research, witnessing both significant strides from city-level to road-level predictions in recent years. With the advancement of Vehicle-to-Everything (V2X) technologies, autonomous driving, and large-scale models in the traffic domain, lane-level traffic prediction has emerged as an indispensable direction. However, further progress in…

    Submitted 22 March, 2024; originally announced March 2024.

  30. arXiv:2403.13064  [pdf, other]

    cs.CV

    SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model

    Authors: Armen Avetisyan, Christopher Xie, Henry Howard-Jenkins, Tsun-Yi Yang, Samir Aroudj, Suvam Patra, Fuyang Zhang, Duncan Frost, Luke Holland, Campbell Orme, Jakob Engel, Edward Miller, Richard Newcombe, Vasileios Balntas

    Abstract: We introduce SceneScript, a method that directly produces full scene models as a sequence of structured language commands using an autoregressive, token-based approach. Our proposed scene representation is inspired by recent successes in transformers & LLMs, and departs from more traditional methods which commonly describe scenes as meshes, voxel grids, point clouds or radiance fields. Our method…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: see project page, https://projectaria.com/scenescript

  31. arXiv:2403.12541  [pdf, other]

    cs.CR

    TAGS: Real-time Intrusion Detection with Tag-Propagation-based Provenance Graph Alignment on Streaming Events

    Authors: Zhenyuan Li, Yangyang Wei, Xiangmin Shen, Lingzhi Wang, Yan Chen, Haitao Xu, Shouling Ji, Fan Zhang

    Abstract: The evolution and advancement of cyberattacks pose challenges to existing security products. Recent concentrated research on provenance graph-based detection has proved its effectiveness in attack detection and investigation. However, implementing these approaches in practice encounters challenges such as high overhead, slow responsiveness, and low interpretability and extensibility. Towards pra…

    Submitted 19 March, 2024; originally announced March 2024.

  32. arXiv:2403.12450  [pdf, other]

    cs.CV

    Intention Action Anticipation Model with Guide-Feedback Loop Mechanism

    Authors: Zongnan Ma, Fuchun Zhang, Zhixiong Nan, Yao Ge

    Abstract: Anticipating human intention from videos has broad applications, such as automatic driving, robot assistive technology, and virtual reality. This study addresses the problem of intention action anticipation using egocentric video sequences to estimate actions that indicate human intention. We propose a Hierarchical Complete-Recent (HCR) information fusion model that makes full use of the features…

    Submitted 19 March, 2024; originally announced March 2024.

  33. arXiv:2403.11758  [pdf, other]

    cs.SE

    Demystifying the DAO Governance Process

    Authors: Junjie Ma, Muhui Jiang, Jinan Jiang, Xiapu Luo, Yufeng Hu, Yajin Zhou, Qi Wang, Fengwei Zhang

    Abstract: Decentralized Autonomous Organization (DAO) becomes a popular governance solution for decentralized applications (dApps) to achieve decentralized governance. In the DAO, no single entity can arbitrarily control the dApps without approval from the majority of members. However, despite its advantages, DAO has also been targeted by several attacks, leading to the loss of millions of dollars. In this…

    Submitted 18 March, 2024; originally announced March 2024.

  34. arXiv:2403.11074  [pdf, other]

    cs.CV cs.AI cs.MM cs.SD eess.AS

    Audio-Visual Segmentation via Unlabeled Frame Exploitation

    Authors: Jinxiang Liu, Yikun Liu, Fei Zhang, Chen Ju, Ya Zhang, Yanfeng Wang

    Abstract: Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames. Although great progress has been witnessed, we experimentally reveal that current methods reach marginal performance gain within the use of the unlabeled frames, leading to the underutilization issue. To fully explore the potential of the unlabeled frames for AVS, we explicitly divide them into two categories bas…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  35. arXiv:2403.10805  [pdf, other]

    cs.SD cs.AI cs.CV cs.GR cs.HC eess.AS

    Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference

    Authors: Fan Zhang, Zhaohan Wang, Xin Lyu, Siyuan Zhao, Mengjian Li, Weidong Geng, Naye Ji, Hui Du, Fuxing Gao, Hao Wu, Shunman Li

    Abstract: Speech-driven gesture generation is an emerging field within virtual human creation. However, a significant challenge lies in accurately determining and processing the multitude of input features (such as acoustic, semantic, emotional, personality, and even subtle unknown features). Traditional approaches, reliant on various explicit feature inputs and complex multimodal processing, constrain the…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: 12 pages,

  36. arXiv:2403.09439  [pdf, other]

    cs.CV cs.AI

    3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation

    Authors: Frank Zhang, Yibo Zhang, Quan Zheng, Rui Ma, Wei Hua, Hujun Bao, Weiwei Xu, Changqing Zou

    Abstract: Text-driven 3D scene generation techniques have made rapid progress in recent years. Their success is mainly attributed to using existing generative models to iteratively perform image warping and inpainting to generate 3D scenes. However, these methods heavily rely on the outputs of existing models, leading to error accumulation in geometry and appearance that prevent the models from being used i…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 11 pages, 7 figures

  37. arXiv:2403.09036  [pdf, other]

    cs.CV

    Gradient-Aware Logit Adjustment Loss for Long-tailed Classifier

    Authors: Fan Zhang, Wei Qin, Weijieying Ren, Lei Wang, Zetong Chen, Richang Hong

    Abstract: In the real-world setting, data often follows a long-tailed distribution, where head classes contain significantly more training samples than tail classes. Consequently, models trained on such data tend to be biased toward head classes. The medium of this bias is imbalanced gradients, which include not only the ratio of scale between positive and negative gradients but also imbalanced gradients fr…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 5 pages, 2 figures. Accepted by ICASSP 2024, see https://cmsworkshops.com/ICASSP2024/papers/accepted_papers.php by searching this paper title

  38. arXiv:2403.07261  [pdf, other]

    cs.LG cs.AI

    Disentangling Policy from Offline Task Representation Learning via Adversarial Data Augmentation

    Authors: Chengxing Jia, Fuxiang Zhang, Yi-Chen Li, Chen-Xiao Gao, Xu-Hui Liu, Lei Yuan, Zongzhang Zhang, Yang Yu

    Abstract: Offline meta-reinforcement learning (OMRL) proficiently allows an agent to tackle novel tasks while solely relying on a static dataset. For precise and efficient task identification, existing OMRL research suggests learning separate task representations that be incorporated with policy input, thus forming a context-based meta-policy. A major approach to train task representations is to adopt contr…

    Submitted 11 March, 2024; originally announced March 2024.

  39. arXiv:2403.07153  [pdf, other]

    cs.CV

    2023 Low-Power Computer Vision Challenge (LPCVC) Summary

    Authors: Leo Chen, Benjamin Boardley, Ping Hu, Yiru Wang, Yifan Pu, Xin Jin, Yongqiang Yao, Ruihao Gong, Bo Li, Gao Huang, Xianglong Liu, Zifu Wan, Xinwang Chen, Ning Liu, Ziyi Zhang, Dongping Liu, Ruijie Shan, Zhengping Che, Fachao Zhang, Xiaofeng Mou, Jian Tang, Maxim Chuprov, Ivan Malofeev, Alexander Goncharenko, Andrey Shcherbin, et al. (5 additional authors not shown)

    Abstract: This article describes the 2023 IEEE Low-Power Computer Vision Challenge (LPCVC). Since 2015, LPCVC has been an international competition devoted to tackling the challenge of computer vision (CV) on edge devices. Most CV researchers focus on improving accuracy, at the expense of ever-growing sizes of machine models. LPCVC balances accuracy with resource requirements. Winners must achieve high accu…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: LPCVC 2023, website: https://lpcv.ai/

  40. arXiv:2403.06606  [pdf, other]

    cs.CV cs.LG

    Distributionally Generative Augmentation for Fair Facial Attribute Classification

    Authors: Fengda Zhang, Qianpei He, Kun Kuang, Jiashuo Liu, Long Chen, Chao Wu, Jun Xiao, Hanwang Zhang

    Abstract: Facial Attribute Classification (FAC) holds substantial promise in widespread applications. However, FAC models trained by traditional methodologies can be unfair by exhibiting accuracy inconsistencies across varied data subpopulations. This unfairness is largely attributed to bias in data, where some spurious attributes (e.g., Male) statistically correlate with the target attribute (e.g., Smiling…

    Submitted 25 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  41. arXiv:2403.05854  [pdf, other]

    cs.CV

    LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content

    Authors: Qihao Zhao, Yalun Dai, Hao Li, Wei Hu, Fan Zhang, Jun Liu

    Abstract: Long-tail recognition is challenging because it requires the model to learn good representations from tail categories and address imbalances across all categories. In this paper, we propose a novel generative and fine-tuning framework, LTGC, to handle long-tail recognition via leveraging generated content. Firstly, inspired by the rich implicit knowledge in large-scale models (e.g., large language…

    Submitted 13 March, 2024; v1 submitted 9 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  42. arXiv:2403.03497  [pdf, other]

    cs.GT

    Adaptive coordination promotes collective cooperation in repeated social dilemmas

    Authors: Feipeng Zhang, Te Wu, Long Wang

    Abstract: Direct reciprocity based on the repeated prisoner's dilemma has been intensively studied. Most theoretical investigations have concentrated on memory-$1$ strategies, a class of elementary strategies just reacting to the previous-round outcomes. Though the properties of "All-or-None" strategies ($AoN_K$) have been discovered, simulations just confirmed the good performance of $AoN_K$ of very short…

    Submitted 6 March, 2024; originally announced March 2024.

  43. arXiv:2402.18945  [pdf, other]

    cs.CR cs.AI cs.CL

    Syntactic Ghost: An Imperceptible General-purpose Backdoor Attacks on Pre-trained Language Models

    Authors: Pengzhou Cheng, Wei Du, Zongru Wu, Fengwei Zhang, Libo Chen, Gongshen Liu

    Abstract: Pre-trained language models (PLMs) have been found susceptible to backdoor attacks, which can transfer vulnerabilities to various downstream tasks. However, existing PLM backdoors are conducted with explicit triggers under the manually aligned, thus failing to satisfy expectation goals simultaneously in terms of effectiveness, stealthiness, and universality. In this paper, we propose a novel appro…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 16 pages, 16 figures, 13 tables

  44. arXiv:2402.18792  [pdf, other]

    cs.LG cs.CL cs.CR

    MPAT: Building Robust Deep Neural Networks against Textual Adversarial Attacks

    Authors: Fangyuan Zhang, Huichi Zhou, Shuangjiao Li, Hongtao Wang

    Abstract: Deep neural networks have been proven to be vulnerable to adversarial examples and various methods have been proposed to defend against adversarial attacks for natural language processing tasks. However, previous defense methods have limitations in maintaining effective defense while ensuring the performance of the original task. In this paper, we propose a malicious perturbation based adversarial…

    Submitted 28 February, 2024; originally announced February 2024.

  45. arXiv:2402.18563  [pdf, other]

    cs.LG cs.AI cs.CL cs.IR

    Approaching Human-Level Forecasting with Language Models

    Authors: Danny Halawi, Fred Zhang, Chen Yueh-Han, Jacob Steinhardt

    Abstract: Forecasting future events is important for policy and decision making. In this work, we study whether language models (LMs) can forecast at the level of competitive human forecasters. Towards this goal, we develop a retrieval-augmented LM system designed to automatically search for relevant information, generate forecasts, and aggregate predictions. To facilitate our study, we collect a large data…

    Submitted 28 February, 2024; originally announced February 2024.

  46. arXiv:2402.16009  [pdf, other]

    cs.DL cs.CL

    PST-Bench: Tracing and Benchmarking the Source of Publications

    Authors: Fanjin Zhang, Kun Cao, Yukuo Cen, Jifan Yu, Da Yin, Jie Tang

    Abstract: Tracing the source of research papers is a fundamental yet challenging task for researchers. The billion-scale citation relations between papers hinder researchers from understanding the evolution of science efficiently. To date, there is still a lack of an accurate and scalable dataset constructed by professional researchers to identify the direct source of their studied papers, based on which au…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: 8 pages, 3 appendix pages

  47. arXiv:2402.15810  [pdf, other]

    cs.DL cs.CL cs.LG

    OAG-Bench: A Human-Curated Benchmark for Academic Graph Mining

    Authors: Fanjin Zhang, Shijie Shi, Yifan Zhu, Bo Chen, Yukuo Cen, Jifan Yu, Yelin Chen, Lulu Wang, Qingfei Zhao, Yuqing Cheng, Tianyi Han, Yuwei An, Dan Zhang, Weng Lam Tam, Kun Cao, Yunhe Pang, Xinyu Guan, Huihui Yuan, Jian Song, Xiaoyan Li, Yuxiao Dong, Jie Tang

    Abstract: With the rapid proliferation of scientific literature, versatile academic knowledge services increasingly rely on comprehensive academic graph mining. Despite the availability of public academic graphs, benchmarks, and datasets, these resources often fall short in multi-aspect and fine-grained annotations, are constrained to specific task types and domains, or lack underlying real academic graphs.…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 8 pages, 5 appendix pages

  48. arXiv:2402.15526  [pdf, other]

    cs.AI cs.LG

    Chain-of-Specificity: An Iteratively Refining Method for Eliciting Knowledge from Large Language Models

    Authors: Kaiwen Wei, Jingyuan Zhang, Hongzhi Zhang, Fuzheng Zhang, Di Zhang, Li Jin, Yue Yu

    Abstract: Large Language Models (LLMs) exhibit remarkable generative capabilities, enabling the generation of valuable information. Despite these advancements, previous research found that LLMs sometimes struggle with adhering to specific constraints (e.g., in specific place or at specific time), at times even overlooking them, which leads to responses that are either too generic or not fully satisfactory.…

    Submitted 20 February, 2024; originally announced February 2024.

  49. arXiv:2402.15078  [pdf, other]

    cs.SE

    LLM-CompDroid: Repairing Configuration Compatibility Bugs in Android Apps with Pre-trained Large Language Models

    Authors: Zhijie Liu, Yutian Tang, Meiyun Li, Xin Jin, Yunfei Long, Liang Feng Zhang, Xiapu Luo

    Abstract: XML configurations are integral to the Android development framework, particularly in the realm of UI display. However, these configurations can introduce compatibility issues (bugs), resulting in divergent visual outcomes and system crashes across various Android API versions (levels). In this study, we systematically investigate LLM-based approaches for detecting and repairing configuration comp…

    Submitted 22 February, 2024; originally announced February 2024.

  50. arXiv:2402.12712  [pdf, other]

    cs.CV

    MVDiffusion++: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction

    Authors: Shitao Tang, Jiacheng Chen, Dilin Wang, Chengzhou Tang, Fuyang Zhang, Yuchen Fan, Vikas Chandra, Yasutaka Furukawa, Rakesh Ranjan

    Abstract: This paper presents a neural architecture MVDiffusion++ for 3D object reconstruction that synthesizes dense and high-resolution views of an object given one or a few images without camera poses. MVDiffusion++ achieves superior flexibility and scalability with two surprisingly simple ideas: 1) A ``pose-free architecture'' where standard self-attention among 2D latent features learns 3D consistency…

    Submitted 30 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: 3D generation, project page: https://mvdiffusion-plusplus.github.io/