Skip to main content

Showing 1–50 of 1,275 results for author: Huang, H

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.04883  [pdf, other

    cs.CV cs.AI cs.LG

    Molecule-Space: Free Lunch in Unified Multimodal Space via Knowledge Fusion

    Authors: Zehan Wang, Ziang Zhang, Xize Cheng, Rongjie Huang, Luping Liu, Zhenhui Ye, Haifeng Huang, Yang Zhao, Tao Jin, Peng Gao, Zhou Zhao

    Abstract: Unified multi-model representation spaces are the foundation of multimodal understanding and generation. However, the billions of model parameters and catastrophic forgetting problems make it challenging to further enhance pre-trained unified spaces. In this work, we propose Molecule-Space, an idea that treats multimodal representation spaces as "molecules", and augments pre-trained unified space… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024. The code and checkpoints are released at https://github.com/MoleculeSpace/MoleculeSpace

  2. arXiv:2405.04093  [pdf, other

    cs.CV cs.AI

    DCNN: Dual Cross-current Neural Networks Realized Using An Interactive Deep Learning Discriminator for Fine-grained Objects

    Authors: Da Fu, Mingfei Rong, Eun-Hu Kim, Hao Huang, Witold Pedrycz

    Abstract: Accurate classification of fine-grained images remains a challenge in backbones based on convolutional operations or self-attention mechanisms. This study proposes novel dual-current neural networks (DCNN), which combine the advantages of convolutional operations and self-attention mechanisms to improve the accuracy of fine-grained image classification. The main novel design features for construct… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

  3. arXiv:2405.04065  [pdf, other

    cs.CL

    FlashBack:Efficient Retrieval-Augmented Language Modeling for Long Context Inference

    Authors: Runheng Liu, Xingchen Xiao, Heyan Huang, Zewen Chi, Zhijing Wu

    Abstract: Retrieval-Augmented Language Modeling (RALM) by integrating large language models (LLM) with relevant documents from an external corpus is a proven method for enabling the LLM to generate information beyond the scope of its pre-training corpus. Previous work using utilizing retrieved content by simply prepending retrieved contents to the input poses a high runtime issue, which degrades the inferen… ▽ More

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 14 pages

  4. arXiv:2405.03969  [pdf, other

    cs.RO

    Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform

    Authors: Zhijian Qiao, Haoming Huang, Chuhao Liu, Shaojie Shen, Fumin Zhang, Huan Yin

    Abstract: The construction and robotic sensing data originate from disparate sources and are associated with distinct frames of reference. The primary objective of this study is to align LiDAR point clouds with building information modeling (BIM) using a global point cloud registration approach, aimed at establishing a shared understanding between the two modalities, i.e., ``speak the same language''. To ac… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 12 pages, 10 figures

  5. LGTM: Local-to-Global Text-Driven Human Motion Diffusion Model

    Authors: Haowen Sun, Ruikun Zheng, Haibin Huang, Chongyang Ma, Hui Huang, Ruizhen Hu

    Abstract: In this paper, we introduce LGTM, a novel Local-to-Global pipeline for Text-to-Motion generation. LGTM utilizes a diffusion-based architecture and aims to address the challenge of accurately translating textual descriptions into semantically coherent human motion in computer animation. Specifically, traditional methods often struggle with semantic discrepancies, particularly in aligning specific m… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 9 pages,7 figures, SIGGRAPH 2024

  6. arXiv:2405.03221  [pdf, other

    cs.CV cs.GR cs.LG

    Spatial and Surface Correspondence Field for Interaction Transfer

    Authors: Zeyu Huang, Honghao Xu, Haibin Huang, Chongyang Ma, Hui Huang, Ruizhen Hu

    Abstract: In this paper, we introduce a new method for the task of interaction transfer. Given an example interaction between a source object and an agent, our method can automatically infer both surface and spatial relationships for the agent and target objects within the same category, yielding more accurate and valid transfers. Specifically, our method characterizes the example interaction using a combin… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted to SIGGRAPH 2024, project page at https://vcc.tech/research/2024/InterTransfer

  7. arXiv:2405.02982  [pdf, other

    cs.CV

    Paintings and Drawings Aesthetics Assessment with Rich Attributes for Various Artistic Categories

    Authors: Xin Jin, Qianqian Qiao, Yi Lu, Shan Gao, Heng Huang, Guangdong Li

    Abstract: Image aesthetic evaluation is a highly prominent research domain in the field of computer vision. In recent years, there has been a proliferation of datasets and corresponding evaluation methodologies for assessing the aesthetic quality of photographic works, leading to the establishment of a relatively mature research environment. However, in contrast to the extensive research in photographic aes… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

  8. arXiv:2405.01102  [pdf, other

    cs.LG cs.AI

    Less is More: on the Over-Globalizing Problem in Graph Transformers

    Authors: Yujie Xing, Xiao Wang, Yibo Li, Hai Huang, Chuan Shi

    Abstract: Graph Transformer, due to its global attention mechanism, has emerged as a new tool in dealing with graph-structured data. It is well recognized that the global attention mechanism considers a wider receptive field in a fully connected graph, leading many to believe that useful information can be extracted from all the nodes. In this paper, we challenge this belief: does the globalizing property a… ▽ More

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  9. arXiv:2405.00749  [pdf, other

    cs.CV cs.LG

    More is Better: Deep Domain Adaptation with Multiple Sources

    Authors: Sicheng Zhao, Hui Chen, Hu Huang, Pengfei Xu, Guiguang Ding

    Abstract: In many practical applications, it is often difficult and expensive to obtain large-scale labeled data to train state-of-the-art deep neural networks. Therefore, transferring the learned knowledge from a separate, labeled source domain to an unlabeled or sparsely labeled target domain becomes an appealing alternative. However, direct transfer often results in significant performance decay due to d… ▽ More

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024. arXiv admin note: text overlap with arXiv:2002.12169

  10. arXiv:2405.00393  [pdf, other

    cs.CR

    Inferring State Machine from the Protocol Implementation via Large Langeuage Model

    Authors: Haiyang Wei, Zhengjie Du, Haohui Huang, Yue Liu, Guang Cheng, Linzhang Wang, Bing Mao

    Abstract: State machines play a pivotal role in augmenting the efficacy of protocol analyzing to unveil more vulnerabilities. However, the task of inferring state machines from network protocol implementations presents significant challenges. Traditional methods based on dynamic analysis often overlook crucial state transitions due to limited coverage, while static analysis faces difficulties with complex c… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  11. arXiv:2404.19368  [pdf, other

    cs.SE

    Exploring Multi-Lingual Bias of Large Code Models in Code Generation

    Authors: Chaozheng Wang, Zongjie Li, Cuiyun Gao, Wenxuan Wang, Ting Peng, Hailiang Huang, Yuetang Deng, Shuai Wang, Michael R. Lyu

    Abstract: Code generation aims to synthesize code and fulfill functional requirements based on natural language (NL) specifications, which can greatly improve development efficiency. In the era of large language models (LLMs), large code models (LCMs) have been recently proposed to generate source code. LCMs can generate highly feasible solutions for programming problems described in natural language. Despi… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 12 pages

  12. arXiv:2404.18202  [pdf, other

    cs.AI cs.MM

    WorldGPT: Empowering LLM as Multimodal World Model

    Authors: Zhiqi Ge, Hongzhe Huang, Mingze Zhou, Juncheng Li, Guoming Wang, Siliang Tang, Yueting Zhuang

    Abstract: World models are progressively being employed across diverse fields, extending from basic environment simulation to complex scenario construction. However, existing models are mainly trained on domain-specific states and actions, and confined to single-modality state representations. In this paper, We introduce WorldGPT, a generalist world model built upon Multimodal Large Language Model (MLLM). W… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

  13. arXiv:2404.17238  [pdf, other

    cs.IR

    TruthSR: Trustworthy Sequential Recommender Systems via User-generated Multimodal Content

    Authors: Meng Yan, Haibin Huang, Ying Liu, Juan Zhao, Xiyue Gao, Cai Xu, Ziyu Guan, Wei Zhao

    Abstract: Sequential recommender systems explore users' preferences and behavioral patterns from their historically generated data. Recently, researchers aim to improve sequential recommendation by utilizing massive user-generated multi-modal content, such as reviews, images, etc. This content often contains inevitable noise. Some studies attempt to reduce noise interference by suppressing cross-modal incon… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

  14. arXiv:2404.17169  [pdf, other

    cs.LG cs.CY

    FairGT: A Fairness-aware Graph Transformer

    Authors: Renqiang Luo, Huafei Huang, Shuo Yu, Xiuzhen Zhang, Feng Xia

    Abstract: The design of Graph Transformers (GTs) generally neglects considerations for fairness, resulting in biased outcomes against certain sensitive subgroups. Since GTs encode graph information without relying on message-passing mechanisms, conventional fairness-aware graph learning methods cannot be directly applicable to address these issues. To tackle this challenge, we propose FairGT, a Fairness-awa… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

    Journal ref: IJCAI2024

  15. arXiv:2404.16027  [pdf, other

    cs.RO

    ORBIT-Surgical: An Open-Simulation Framework for Learning Surgical Augmented Dexterity

    Authors: Qinxi Yu, Masoud Moghani, Karthik Dharmarajan, Vincent Schorp, William Chung-Ho Panitch, Jingzhou Liu, Kush Hari, Huang Huang, Mayank Mittal, Ken Goldberg, Animesh Garg

    Abstract: Physics-based simulations have accelerated progress in robot learning for driving, manipulation, and locomotion. Yet, a fast, accurate, and robust surgical simulation environment remains a challenge. In this paper, we present ORBIT-Surgical, a physics-based surgical robot simulation framework with photorealistic rendering in NVIDIA Omniverse. We provide 14 benchmark surgical tasks for the da Vinci… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  16. arXiv:2404.15789  [pdf, other

    cs.CV

    MotionMaster: Training-free Camera Motion Transfer For Video Generation

    Authors: Teng Hu, Jiangning Zhang, Ran Yi, Yating Wang, Hongrui Huang, Jieyu Weng, Yabiao Wang, Lizhuang Ma

    Abstract: The emergence of diffusion models has greatly propelled the progress in image and video generation. Recently, some efforts have been made in controllable video generation, including text-to-video generation and video motion control, among which camera motion control is an important topic. However, existing camera motion control methods rely on training a temporal camera module, and necessitate sub… ▽ More

    Submitted 30 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  17. arXiv:2404.14771  [pdf, other

    cs.SD cs.AI

    Music Style Transfer With Diffusion Model

    Authors: Hong Huang, Yuyi Wang, Luyao Li, Jun Lin

    Abstract: Previous studies on music style transfer have mainly focused on one-to-one style conversion, which is relatively limited. When considering the conversion between multiple styles, previous methods required designing multiple modes to disentangle the complex style of the music, resulting in large computational costs and slow audio generation. The existing music style transfer methods generate spectr… ▽ More

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 8 pages, 6 figures, ICMC 2023

    Journal ref: International Computer Music Conference (ICMC 2023) pp. 40-47, October 2023

  18. arXiv:2404.13953  [pdf, other

    cs.CV

    360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos

    Authors: Yinzhe Xu, Huajian Huang, Yingshu Chen, Sai-Kit Yeung

    Abstract: Visual object tracking and segmentation in omnidirectional videos are challenging due to the wide field-of-view and large spherical distortion brought by 360° images. To alleviate these problems, we introduce a novel representation, extended bounding field-of-view (eBFoV), for target localization and use it as the foundation of a general 360 tracking framework which is applicable for both omnidire… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

  19. arXiv:2404.13631  [pdf, other

    cs.LG cond-mat.dis-nn cond-mat.stat-mech cs.NE q-bio.NC

    Fermi-Bose Machine

    Authors: Mingshan Xie, Yuchen Wang, Haiping Huang

    Abstract: Distinct from human cognitive processing, deep neural networks trained by backpropagation can be easily fooled by adversarial examples. To design a semantically meaningful representation learning, we discard backpropagation, and instead, propose a local contrastive learning, where the representation for the inputs bearing the same label shrink (akin to boson) in hidden layers, while those of diffe… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 17 pages, 6 figures, a physics inspired machine without backpropagation and enhanced adversarial robustness

  20. arXiv:2404.13033  [pdf, other

    cs.CL

    Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs

    Authors: Biyang Guo, He Wang, Wenyilin Xiao, Hong Chen, Zhuxin Lee, Songqiao Han, Hailiang Huang

    Abstract: In the burgeoning field of Large Language Models (LLMs) like ChatGPT and LLaMA, Prompt Engineering (PE) is renowned for boosting zero-shot or in-context learning (ICL) through prompt modifications. Yet, the realm of the sample design for downstream fine-tuning, crucial for task-specific LLM adaptation, is largely unexplored. This paper introduces Sample Design Engineering (SDE), a methodical appro… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 23 pages, 12 figures, 14 tables

  21. arXiv:2404.12768  [pdf, other

    cs.CV cs.AI cs.GR

    MixLight: Borrowing the Best of both Spherical Harmonics and Gaussian Models

    Authors: Xinlong Ji, Fangneng Zhan, Shijian Lu, Shi-Sheng Huang, Hua Huang

    Abstract: Accurately estimating scene lighting is critical for applications such as mixed reality. Existing works estimate illumination by generating illumination maps or regressing illumination parameters. However, the method of generating illumination maps has poor generalization performance and parametric models such as Spherical Harmonic (SH) and Spherical Gaussian (SG) fall short in capturing high-freq… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

  22. arXiv:2404.12242  [pdf, other

    cs.CL

    CMNEE: A Large-Scale Document-Level Event Extraction Dataset based on Open-Source Chinese Military News

    Authors: Mengna Zhu, Zijie Xu, Kaisheng Zeng, Kaiming Xiao, Mao Wang, Wenjun Ke, Hongbin Huang

    Abstract: Extracting structured event knowledge, including event triggers and corresponding arguments, from military texts is fundamental to many applications, such as intelligence analysis and decision assistance. However, event extraction in the military field faces the data scarcity problem, which impedes the research of event extraction models in this domain. To alleviate this problem, we propose CMNEE,… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 13 pages, 7 figures, accepted to LREC-COLING 2024

  23. arXiv:2404.11422  [pdf

    cs.LG cs.AI physics.ao-ph

    Short-term wind speed forecasting model based on an attention-gated recurrent neural network and error correction strategy

    Authors: Haojian Huang

    Abstract: The accurate wind speed series forecast is very pivotal to security of grid dispatching and the application of wind power. Nevertheless, on account of their nonlinear and non-stationary nature, their short-term forecast is extremely challenging. Therefore, this dissertation raises one short-term wind speed forecast pattern on the foundation of attention with an improved gated recurrent neural netw… ▽ More

    Submitted 22 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: 23 pages, 11 figures, 6 tables, Technical Report

  24. arXiv:2404.10681  [pdf, other

    cs.CV

    StyleCity: Large-Scale 3D Urban Scenes Stylization with Vision-and-Text Reference via Progressive Optimization

    Authors: Yingshu Chen, Huajian Huang, Tuan-Anh Vu, Ka Chun Shum, Sai-Kit Yeung

    Abstract: Creating large-scale virtual urban scenes with variant styles is inherently challenging. To facilitate prototypes of virtual production and bypass the need for complex materials and lighting setups, we introduce the first vision-and-text-driven texture stylization system for large-scale urban scenes, StyleCity. Taking an image and text as references, StyleCity stylizes a 3D textured mesh of a larg… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: project page: https://chenyingshu.github.io/stylecity3d/

  25. arXiv:2404.10253  [pdf, other

    cs.DC

    Kilometer-Level Coupled Modeling Using 40 Million Cores: An Eight-Year Journey of Model Development

    Authors: Xiaohui Duan, Yuxuan Li, Zhao Liu, Bin Yang, Juepeng Zheng, Haohuan Fu, Shaoqing Zhang, Shiming Xu, Yang Gao, Wei Xue, Di Wei, Xiaojing Lv, Lifeng Yan, Haopeng Huang, Haitian Lu, Lingfeng Wan, Haoran Lin, Qixin Chang, Chenlin Li, Quanjie He, Zeyu Song, Xuantong Wang, Yangyang Yu, Xilong Fan, Zhaopeng Qu , et al. (16 additional authors not shown)

    Abstract: With current and future leading systems adopting heterogeneous architectures, adapting existing models for heterogeneous supercomputers is of urgent need for improving model resolution and reducing modeling uncertainty. This paper presents our three-week effort on porting a complex earth system model, CESM 2.2, to a 40-million-core Sunway supercomputer. Taking a non-intrusive approach that tries t… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 18 pages, 13 figures

  26. arXiv:2404.09640  [pdf, other

    cs.CV

    CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning

    Authors: Haojian Huang, Xiaozhen Qiao, Zhuo Chen, Haodong Chen, Bingyu Li, Zhe Sun, Mulin Chen, Xuelong Li

    Abstract: Zero-shot learning (ZSL) enables the recognition of novel classes by leveraging semantic knowledge transfer from known to unknown categories. This knowledge, typically encapsulated in attribute descriptions, aids in identifying class-specific visual features, thus facilitating visual-semantic alignment and improving ZSL performance. However, real-world challenges such as distribution imbalances an… ▽ More

    Submitted 20 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Ongoing work; 10 pages, 2 Tables, 9 Figures; Repo is available at: https://github.com/JethroJames/CREST

  27. arXiv:2404.09540  [pdf, other

    cs.CV

    Text-Driven Diverse Facial Texture Generation via Progressive Latent-Space Refinement

    Authors: Chi Wang, Junming Huang, Rong Zhang, Qi Wang, Haotian Yang, Haibin Huang, Chongyang Ma, Weiwei Xu

    Abstract: Automatic 3D facial texture generation has gained significant interest recently. Existing approaches may not support the traditional physically based rendering pipeline or rely on 3D data captured by Light Stage. Our key contribution is a progressive latent space refinement approach that can bootstrap from 3D Morphable Models (3DMMs)-based texture maps generated from facial images to generate high… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  28. arXiv:2404.09192  [pdf, other

    cs.SD cs.AI eess.AS

    Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling

    Authors: Quanxiu Wang, Hui Huang, Mingjie Wang, Yong Dai, Jinzuomu Zhong, Benlai Tang

    Abstract: Over the past decade, a series of unflagging efforts have been dedicated to developing highly expressive and controllable text-to-speech (TTS) systems. In general, the holistic TTS comprises two interconnected components: the frontend module and the backend module. The frontend excels in capturing linguistic representations from the raw text input, while the backend module converts linguistic cues… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

  29. arXiv:2404.09115  [pdf, other

    cs.CV

    GCC: Generative Calibration Clustering

    Authors: Haifeng Xia, Hai Huang, Zhengming Ding

    Abstract: Deep clustering as an important branch of unsupervised representation learning focuses on embedding semantically similar samples into the identical feature space. This core demand inspires the exploration of contrastive learning and subspace clustering. However, these solutions always rely on the basic assumption that there are sufficient and category-balanced samples for generating valid high-lev… ▽ More

    Submitted 13 April, 2024; originally announced April 2024.

  30. arXiv:2404.07281  [pdf, other

    quant-ph cs.IT cs.LG

    Certifying almost all quantum states with few single-qubit measurements

    Authors: Hsin-Yuan Huang, John Preskill, Mehdi Soleimanifar

    Abstract: Certifying that an n-qubit state synthesized in the lab is close to the target state is a fundamental task in quantum information science. However, existing rigorous protocols either require deep quantum circuits or exponentially many single-qubit measurements. In this work, we prove that almost all n-qubit target states, including those with exponential circuit complexity, can be certified from o… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 63 pages, 5 figures

  31. arXiv:2404.06681  [pdf, other

    cs.AI cs.LG stat.ME

    Causal Unit Selection using Tractable Arithmetic Circuits

    Authors: Haiying Huang, Adnan Darwiche

    Abstract: The unit selection problem aims to find objects, called units, that optimize a causal objective function which describes the objects' behavior in a causal context (e.g., selecting customers who are about to churn but would most likely change their mind if encouraged). While early studies focused mainly on bounding a specific class of counterfactual objective functions using data, more recent work… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

  32. arXiv:2404.06047  [pdf, other

    cs.DC

    A Systematic Literature Survey of Sparse Matrix-Vector Multiplication

    Authors: Jianhua Gao, Bingjie Liu, Weixing Ji, Hua Huang

    Abstract: Sparse matrix-vector multiplication (SpMV) is a crucial computing kernel with widespread applications in iterative algorithms. Over the past decades, research on SpMV optimization has made remarkable strides, giving rise to various optimization contributions. However, the comprehensive and systematic literature survey that introduces, analyzes, discusses, and summarizes the advancements of SpMV in… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 34 pages, 18 figures, 16 tables

    MSC Class: 68-02; 68W10; 65F50 ACM Class: A.1; D.1.3; G.1.3

  33. arXiv:2404.04057  [pdf, other

    cs.LG cs.AI cs.CV stat.ML

    Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation

    Authors: Mingyuan Zhou, Huangjie Zheng, Zhendong Wang, Mingzhang Yin, Hai Huang

    Abstract: We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fréchet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By refo… ▽ More

    Submitted 6 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: ICML 2024

  34. arXiv:2404.03202  [pdf, other

    cs.CV

    OmniGS: Omnidirectional Gaussian Splatting for Fast Radiance Field Reconstruction using Omnidirectional Images

    Authors: Longwei Li, Huajian Huang, Sai-Kit Yeung, Hui Cheng

    Abstract: Photorealistic reconstruction relying on 3D Gaussian Splatting has shown promising potential in robotics. However, the current 3D Gaussian Splatting system only supports radiance field reconstruction using undistorted perspective images. In this paper, we present OmniGS, a novel omnidirectional Gaussian splatting system, to take advantage of omnidirectional images for fast radiance field reconstru… ▽ More

    Submitted 7 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: 7 pages, 4 figures

  35. arXiv:2404.03171  [pdf, other

    cs.SE cs.LG cs.PL

    Multi-modal Learning for WebAssembly Reverse Engineering

    Authors: Hanxian Huang, Jishen Zhao

    Abstract: The increasing adoption of WebAssembly (Wasm) for performance-critical and security-sensitive tasks drives the demand for WebAssembly program comprehension and reverse engineering. Recent studies have introduced machine learning (ML)-based WebAssembly reverse engineering tools. Yet, the generalization of task-specific ML solutions remains challenging, because their effectiveness hinges on the avai… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted by ISSTA '24

  36. arXiv:2404.03019  [pdf, other

    cs.DC cs.LG

    GeoT: Tensor Centric Library for Graph Neural Network via Efficient Segment Reduction on GPU

    Authors: Zhongming Yu, Genghan Zhang, Hanxian Huang, Xin Chen, Jishen Zhao

    Abstract: In recent years, Graph Neural Networks (GNNs) have ignited a surge of innovation, significantly enhancing the processing of geometric data structures such as graphs, point clouds, and meshes. As the domain continues to evolve, a series of frameworks and libraries are being developed to push GNN efficiency to new heights. While graph-centric libraries have achieved success in the past, the advent o… ▽ More

    Submitted 7 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  37. arXiv:2404.02772  [pdf, other

    cs.CL

    FPT: Feature Prompt Tuning for Few-shot Readability Assessment

    Authors: Ziyang Wang, Sanwoo Lee, Hsiu-Yuan Huang, Yunfang Wu

    Abstract: Prompt-based methods have achieved promising results in most few-shot text classification tasks. However, for readability assessment tasks, traditional prompt methods lackcrucial linguistic knowledge, which has already been proven to be essential. Moreover, previous studies on utilizing linguistic features have shown non-robust performance in few-shot settings and may even impair model performance… ▽ More

    Submitted 10 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: NAACL-2024 main conference

  38. arXiv:2404.02545  [pdf, other

    cs.LG cs.AI

    Grid-Mapping Pseudo-Count Constraint for Offline Reinforcement Learning

    Authors: Yi Shen, Hanyan Huang, Shan Xie

    Abstract: Offline reinforcement learning learns from a static dataset without interacting with the environment, which ensures security and thus owns a good prospect of application. However, directly applying naive reinforcement learning methods usually fails in an offline environment due to function approximation errors caused by out-of-distribution(OOD) actions. To solve this problem, existing algorithms m… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

  39. arXiv:2404.00971  [pdf, other

    cs.SE cs.AI

    Exploring and Evaluating Hallucinations in LLM-Powered Code Generation

    Authors: Fang Liu, Yang Liu, Lin Shi, Houkun Huang, Ruifeng Wang, Zhen Yang, Li Zhang

    Abstract: The rise of Large Language Models (LLMs) has significantly advanced many applications on software engineering tasks, particularly in code generation. Despite the promising performance, LLMs are prone to generate hallucinations, which means LLMs might produce outputs that deviate from users' intent, exhibit internal inconsistencies, or misalign with the factual knowledge, making the deployment of L… ▽ More

    Submitted 1 April, 2024; originally announced April 2024.

  40. arXiv:2403.19949  [pdf, other

    cs.CV

    FairCLIP: Harnessing Fairness in Vision-Language Learning

    Authors: Yan Luo, Min Shi, Muhammad Osama Khan, Muhammad Muneeb Afzal, Hao Huang, Shuaihang Yuan, Yu Tian, Luo Song, Ava Kouhana, Tobias Elze, Yi Fang, Mengyu Wang

    Abstract: Fairness is a critical concern in deep learning, especially in healthcare, where these models influence diagnoses and treatment decisions. Although fairness has been investigated in the vision-only domain, the fairness of medical vision-language (VL) models remains unexplored due to the scarcity of medical VL datasets for studying fairness. To bridge this research gap, we introduce the first fair… ▽ More

    Submitted 5 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  41. arXiv:2403.19783  [pdf, other

    cond-mat.mtrl-sci cs.LG

    AlloyBERT: Alloy Property Prediction with Large Language Models

    Authors: Akshat Chaudhari, Chakradhar Guntuboina, Hongshuo Huang, Amir Barati Farimani

    Abstract: The pursuit of novel alloys tailored to specific requirements poses significant challenges for researchers in the field. This underscores the importance of developing predictive techniques for essential physical properties of alloys based on their chemical composition and processing parameters. This study introduces AlloyBERT, a transformer encoder-based model designed to predict properties such a… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 20 pages, 3 figures

  42. arXiv:2403.19490  [pdf, other

    cs.CV

    Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment

    Authors: Alireza Ganjdanesh, Shangqian Gao, Heng Huang

    Abstract: Structural model pruning is a prominent approach used for reducing the computational cost of Convolutional Neural Networks (CNNs) before their deployment on resource-constrained devices. Yet, the majority of proposed ideas require a pretrained model before pruning, which is costly to secure. In this paper, we propose a novel structural pruning approach to jointly learn the weights and structurally… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024

  43. arXiv:2403.18761  [pdf, other

    cs.GR

    MATTopo: Topology-preserving Medial Axis Transform with Restricted Power Diagram

    Authors: Ningna Wang, Hui Huang, Shibo Song, Bin Wang, Wenping Wang, Xiaohu Guo

    Abstract: We present a novel volumetric RPD (restricted power diagram) based framework for approximating the medial axes of 3D CAD shapes adaptively, while preserving topological equivalence, medial features, and geometric convergence. To solve the topology preservation problem, we propose a volumetric RPD based strategy, which discretizes the input volume into sub-regions given a set of medial spheres. Wit… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

  44. arXiv:2403.18667  [pdf, other

    cs.IR cs.CL

    Improving Content Recommendation: Knowledge Graph-Based Semantic Contrastive Learning for Diversity and Cold-Start Users

    Authors: Yejin Kim, Scott Rome, Kevin Foley, Mayur Nankani, Rimon Melamed, Javier Morales, Abhay Yadav, Maria Peifer, Sardar Hamidian, H. Howie Huang

    Abstract: Addressing the challenges related to data sparsity, cold-start problems, and diversity in recommendation systems is both crucial and demanding. Many current solutions leverage knowledge graphs to tackle these issues by combining both item-based and user-item collaborative signals. A common trend in these approaches focuses on improving ranking performance at the cost of escalating model complexity… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted at LREC-COLING 2024

  45. arXiv:2403.18493  [pdf, other

    cs.CV

    VersaT2I: Improving Text-to-Image Models with Versatile Reward

    Authors: Jianshu Guo, Wenhao Chai, Jie Deng, Hsiang-Wei Huang, Tian Ye, Yichen Xu, Jiawei Zhang, Jenq-Neng Hwang, Gaoang Wang

    Abstract: Recent text-to-image (T2I) models have benefited from large-scale and high-quality data, demonstrating impressive performance. However, these T2I models still struggle to produce images that are aesthetically pleasing, geometrically accurate, faithful to text, and of good low-level quality. We present VersaT2I, a versatile training framework that can boost the performance with multiple rewards of… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

  46. arXiv:2403.18361  [pdf, other

    cs.CV

    ViTAR: Vision Transformer with Any Resolution

    Authors: Qihang Fan, Quanzeng You, Xiaotian Han, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang

    Abstract: This paper tackles a significant challenge faced by Vision Transformers (ViTs): their constrained scalability across different image resolutions. Typically, ViTs experience a performance decline when processing resolutions different from those seen during training. Our work introduces two key innovations to address this issue. Firstly, we propose a novel module for dynamic resolution adjustment, d… ▽ More

    Submitted 28 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  47. arXiv:2403.18281  [pdf, other

    cs.CV

    AIR-HLoc: Adaptive Image Retrieval for Efficient Visual Localisation

    Authors: Changkun Liu, Huajian Huang, Zhengyang Ma, Tristan Braud

    Abstract: State-of-the-art (SOTA) hierarchical localisation pipelines (HLoc) rely on image retrieval (IR) techniques to establish 2D-3D correspondences by selecting the $k$ most similar images from a reference image database for a given query image. Although higher values of $k$ enhance localisation robustness, the computational cost for feature matching increases linearly with $k$. In this paper, we observ… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

  48. arXiv:2403.17706  [pdf, other

    cs.CL cs.AI

    Enhanced Short Text Modeling: Leveraging Large Language Models for Topic Refinement

    Authors: Shuyu Chang, Rui Wang, Peng Ren, Haiping Huang

    Abstract: Crafting effective topic models for brief texts, like tweets and news headlines, is essential for capturing the swift shifts in social dynamics. Traditional topic models, however, often fall short in accurately representing the semantic intricacies of short texts due to their brevity and lack of contextual data. In our study, we harness the advanced capabilities of Large Language Models (LLMs) to… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 6 pages, 4 figures

  49. arXiv:2403.17636  [pdf, other

    cs.CL

    Mix-Initiative Response Generation with Dynamic Prefix Tuning

    Authors: Yuxiang Nie, Heyan Huang, Xian-Ling Mao, Lizi Liao

    Abstract: Mixed initiative serves as one of the key factors in controlling conversation directions. For a speaker, responding passively or leading proactively would result in rather different responses. However, most dialogue systems focus on training a holistic response generation model without any distinction among different initiatives. It leads to the cross-contamination problem, where the model confuse… ▽ More

    Submitted 27 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted to the main conference of NAACL 2024

  50. arXiv:2403.15612  [pdf, other

    cs.CV

    InterFusion: Text-Driven Generation of 3D Human-Object Interaction

    Authors: Sisi Dai, Wenhao Li, Haowen Sun, Haibin Huang, Chongyang Ma, Hui Huang, Kai Xu, Ruizhen Hu

    Abstract: In this study, we tackle the complex task of generating 3D human-object interactions (HOI) from textual descriptions in a zero-shot text-to-3D manner. We identify and address two key challenges: the unsatisfactory outcomes of direct text-to-3D methods in HOI, largely due to the lack of paired text-interaction data, and the inherent difficulties in simultaneously generating multiple concepts with c… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.