
Showing 1–50 of 904 results for author: Hu, J

Searching in archive cs.
  1. arXiv:2405.02962  [pdf, other]

    cs.CV

    VectorPainter: A Novel Approach to Stylized Vector Graphics Synthesis with Vectorized Strokes

    Authors: Juncheng Hu, Ximing Xing, Zhengqi Zhang, Jing Zhang, Qian Yu

    Abstract: We propose a novel method, VectorPainter, for the task of stylized vector graphics synthesis. Given a text prompt and a reference style image, VectorPainter generates a vector graphic that aligns in content with the text prompt and remains faithful in style to the reference image. We recognize that the key to this task lies in fully leveraging the intrinsic properties of vector graphics. Innovativ…

    Submitted 5 May, 2024; originally announced May 2024.

  2. arXiv:2405.02830  [pdf, other]

    cs.CV

    You Only Need Half: Boosting Data Augmentation by Using Partial Content

    Authors: Juntao Hu, Yuan Wu

    Abstract: We propose a novel data augmentation method termed You Only Need hAlf (YONA), which simplifies the augmentation process. YONA bisects an image, substitutes one half with noise, and applies data augmentation techniques to the remaining half. This method reduces the redundant information in the original image, encourages neural networks to recognize objects from incomplete views, and significantly e…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Technical report, 16 pages

  3. arXiv:2405.02730  [pdf, other]

    cs.CV

    U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers

    Authors: Yuchuan Tian, Zhijun Tu, Hanting Chen, Jie Hu, Chao Xu, Yunhe Wang

    Abstract: Diffusion Transformers (DiTs) introduce the transformer architecture to diffusion tasks for latent-space image generation. With an isotropic architecture that chains a series of transformer blocks, DiTs demonstrate competitive performance and good scalability; but meanwhile, the abandonment of U-Net by DiTs and their following improvements is worth rethinking. To this end, we conduct a simple toy…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures

  4. arXiv:2405.02356  [pdf, other]

    cs.LG cs.AI

    Stochastic Multivariate Universal-Radix Finite-State Machine: a Theoretically and Practically Elegant Nonlinear Function Approximator

    Authors: Xincheng Feng, Guodong Shen, Jianhao Hu, Meng Li, Ngai Wong

    Abstract: Nonlinearities are crucial for capturing complex input-output relationships especially in deep neural networks. However, nonlinear functions often incur various hardware and compute overheads. Meanwhile, stochastic computing (SC) has emerged as a promising approach to tackle this challenge by trading output precision for hardware simplicity. To this end, this paper proposes a first-of-its-kind sto…

    Submitted 2 May, 2024; originally announced May 2024.

  5. arXiv:2404.19738  [pdf, other]

    cs.HC

    DiaryHelper: Exploring the Use of an Automatic Contextual Information Recording Agent for Elicitation Diary Study

    Authors: Junze Li, Changyang He, Jiaxiong Hu, Boyang Jia, Alon Halevy, Xiaojuan Ma

    Abstract: Elicitation diary studies, a type of qualitative, longitudinal research method, require participants to self-report aspects of events of interest as they occur, as memory cues for providing details and insights during post-study interviews. However, due to time constraints and lack of motivation, participants' diary entries may be vague or incomplete, impairing their later recall. To address…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CHI 2024

  6. arXiv:2404.19444  [pdf, other]

    cs.CV

    AnomalyXFusion: Multi-modal Anomaly Synthesis with Diffusion

    Authors: Jie Hu, Yawen Huang, Yilin Lu, Guoyang Xie, Guannan Jiang, Yefeng Zheng, Zhichao Lu

    Abstract: Anomaly synthesis is one of the effective methods to augment abnormal samples for training. However, current anomaly synthesis methods predominantly rely on texture information as input, which limits the fidelity of synthesized abnormal samples, because texture information is insufficient to correctly depict the pattern of anomalies, especially logical anomalies. To surmount this obstacle, we…

    Submitted 1 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

  7. arXiv:2404.19154  [pdf, other]

    cs.CL

    RTF: Region-based Table Filling Method for Relational Triple Extraction

    Authors: Ning An, Lei Hei, Yong Jiang, Weiping Meng, Jingjing Hu, Boran Huang, Feiliang Ren

    Abstract: Relational triple extraction is crucial work for the automatic construction of knowledge graphs. Existing methods only construct shallow representations at the token or token-pair level. However, previous works ignore local spatial dependencies of relational triples, resulting in a weakness of entity pair boundary detection. To tackle this problem, we propose a novel Region-based Table Filling met…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Rejected by EMNLP 2023

  8. arXiv:2404.18243  [pdf, other]

    cs.CL

    LEGENT: Open Platform for Embodied Agents

    Authors: Zhili Cheng, Zhitong Wang, Jinyi Hu, Shengding Hu, An Liu, Yuge Tu, Pengkai Li, Lei Shi, Zhiyuan Liu, Maosong Sun

    Abstract: Despite advancements in Large Language Models (LLMs) and Large Multimodal Models (LMMs), their integration into language-grounded, human-like embodied agents remains incomplete, hindering complex real-life task performance in physical environments. Existing integrations often feature limited open sourcing, challenging collective progress in this field. We introduce LEGENT, an open, scalable platfo…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: Demo Paper

  9. arXiv:2404.17611  [pdf]

    physics.ao-ph cs.AI cs.LG

    MetaSD: A Unified Framework for Scalable Downscaling of Meteorological Variables in Diverse Situations

    Authors: Jing Hu, Honghu Zhang, Peng Zheng, Jialin Mu, Xiaomeng Huang, Xi Wu

    Abstract: Addressing complex meteorological processes at a fine spatial resolution requires substantial computational resources. To accelerate meteorological simulations, researchers have utilized neural networks to downscale meteorological variables from low-resolution simulations. Despite notable advancements, contemporary cutting-edge downscaling algorithms are tailored to specific variables. Addressing mete…

    Submitted 26 April, 2024; originally announced April 2024.

  10. arXiv:2404.17065  [pdf, ps, other]

    cs.LO cs.PL

    DeLaM: A Dependent Layered Modal Type Theory for Meta-programming

    Authors: Jason Z. S. Hu, Brigitte Pientka

    Abstract: We scale layered modal type theory to dependent types, introducing DeLaM, dependent layered modal type theory. This type theory is novel in that we have one uniform type theory in which we can not only compose and execute code, but also intensionally analyze the code of types and terms. The latter in particular allows us to write tactics as meta-programs and use regular libraries when writing tact…

    Submitted 25 April, 2024; originally announced April 2024.

  11. arXiv:2404.16147  [pdf, other]

    cs.RO

    Chat2Scenario: Scenario Extraction From Dataset Through Utilization of Large Language Model

    Authors: Yongqi Zhao, Wenbo Xiao, Tomislav Mihalj, Jia Hu, Arno Eichberger

    Abstract: The advent of Large Language Models (LLMs) provides new insights to validate Automated Driving Systems (ADS). In the herein-introduced work, a novel approach to extracting scenarios from naturalistic driving datasets is presented. A framework called Chat2Scenario is proposed leveraging the advanced Natural Language Processing (NLP) capabilities of LLMs to understand and identify different driving sc…

    Submitted 26 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: IEEE Intelligent Vehicles Symposium (IV 2024)

  12. arXiv:2404.15655  [pdf, other]

    cs.CV

    Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering

    Authors: Jiawei Yao, Qi Qian, Juhua Hu

    Abstract: Multiple clustering has gained significant attention in recent years due to its potential to reveal multiple hidden structures of data from different perspectives. The advent of deep multiple clustering techniques has notably advanced the performance by uncovering complex patterns and relationships within large datasets. However, a major challenge arises as users often do not need all the clusteri…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024. Project page: https://github.com/Alexander-Yao/Multi-MaP

  13. arXiv:2404.15587  [pdf, other]

    cs.CR

    Security Analysis of WiFi-based Sensing Systems: Threats from Perturbation Attacks

    Authors: Hangcheng Cao, Wenbin Huang, Guowen Xu, Xianhao Chen, Ziyang He, Jingyang Hu, Hongbo Jiang, Yuguang Fang

    Abstract: Deep learning technologies are pivotal in enhancing the performance of WiFi-based wireless sensing systems. However, they are inherently vulnerable to adversarial perturbation attacks, and regrettably, serious attention to this security issue is lacking within the WiFi sensing community. In this paper, we elaborate on such an attack, called WiIntruder, distinguishing itself with universality, r…

    Submitted 23 April, 2024; originally announced April 2024.

  14. arXiv:2404.14248  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  15. arXiv:2404.11064  [pdf, other]

    cs.CV cs.AI

    Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization

    Authors: Yongdong Luo, Haojia Lin, Xiawu Zheng, Yigeng Jiang, Fei Chao, Jie Hu, Guannan Jiang, Songan Zhang, Rongrong Ji

    Abstract: 3D Visual Grounding (3DVG) and 3D Dense Captioning (3DDC) are two crucial tasks in various 3D applications, which require both shared and complementary information in localization and visual-language relationships. Therefore, existing approaches adopt the two-stage "detect-then-describe/discriminate" pipeline, which relies heavily on the performance of the detector, resulting in suboptimal perform…

    Submitted 17 April, 2024; originally announced April 2024.

  16. arXiv:2404.10260  [pdf, other]

    q-bio.BM cs.AI

    HelixFold-Multimer: Elevating Protein Complex Structure Prediction to New Heights

    Authors: Xiaomin Fang, Jie Gao, Jing Hu, Lihang Liu, Yang Xue, Xiaonan Zhang, Kunrui Zhu

    Abstract: While monomer protein structure prediction tools boast impressive accuracy, the prediction of protein complex structures remains a daunting challenge in the field. This challenge is particularly pronounced in scenarios involving complexes with protein chains from different species, such as antigen-antibody interactions, where accuracy often falls short. Limited by the accuracy of complex predictio…

    Submitted 15 April, 2024; originally announced April 2024.

  17. arXiv:2404.09709  [pdf, other]

    cs.IR cs.LG

    Scenario-Adaptive Fine-Grained Personalization Network: Tailoring User Behavior Representation to the Scenario Context

    Authors: Moyu Zhang, Yongxiang Tang, Jinxin Hu, Yu Zhang

    Abstract: Existing methods often adjust representations adaptively only after aggregating user behavior sequences. This coarse-grained approach to re-weighting the entire user sequence hampers the model's ability to accurately model the user interest migration across different scenarios. To enhance the model's capacity to capture user interests from historical behavior sequences in each scenario, we develop…

    Submitted 29 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR 2024, 10 pages, 5 figures, 5 tables

    Journal ref: SIGIR 2024

  18. arXiv:2404.08611  [pdf, other]

    cs.CV cs.AI physics.med-ph

    Automatic Quantification of Serial PET/CT Images for Pediatric Hodgkin Lymphoma Patients Using a Longitudinally-Aware Segmentation Network

    Authors: Xin Tie, Muheon Shin, Changhee Lee, Scott B. Perlman, Zachary Huemann, Amy J. Weisman, Sharon M. Castellino, Kara M. Kelly, Kathleen M. McCarten, Adina L. Alazraki, Junjie Hu, Steve Y. Cho, Tyler J. Bradshaw

    Abstract: Purpose: Automatic quantification of longitudinal changes in PET scans for lymphoma patients has proven challenging, as residual disease in interim-therapy scans is often subtle and difficult to detect. Our goal was to develop a longitudinally-aware segmentation network (LAS-Net) that can quantify serial PET/CT images for pediatric Hodgkin lymphoma patients. Materials and Metho…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 6 figures, 4 tables in the main text

  19. arXiv:2404.07857  [pdf, other]

    physics.optics cs.ET nlin.CD

    Optical next generation reservoir computing

    Authors: Hao Wang, Jianqi Hu, YoonSeok Baek, Kohei Tsuchiyama, Malo Joly, Qiang Liu, Sylvain Gigan

    Abstract: Artificial neural networks with internal dynamics exhibit remarkable capability in processing information. Reservoir computing (RC) is a canonical example that features rich computing expressivity and compatibility with physical implementations for enhanced efficiency. Recently, a new RC paradigm known as next generation reservoir computing (NGRC) further improves expressivity but compromises its…

    Submitted 28 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  20. arXiv:2404.06075  [pdf, other]

    cs.CV

    LIPT: Latency-aware Image Processing Transformer

    Authors: Junbo Qiao, Wei Li, Haizhen Xie, Hanting Chen, Yunshuai Zhou, Zhijun Tu, Jie Hu, Shaohui Lin

    Abstract: The Transformer is leading a trend in the field of image processing. Despite the great success that existing lightweight image processing transformers have achieved, they are tailored to reducing FLOPs or parameters, rather than practical inference acceleration. In this paper, we present a latency-aware image processing transformer, termed LIPT. We devise the low-latency proportion LIPT block that su…

    Submitted 28 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  21. arXiv:2404.05403  [pdf, other]

    cs.CR cs.AI

    SoK: Gradient Leakage in Federated Learning

    Authors: Jiacheng Du, Jiahui Hu, Zhibo Wang, Peng Sun, Neil Zhenqiang Gong, Kui Ren

    Abstract: Federated learning (FL) enables collaborative model training among multiple clients without raw data exposure. However, recent studies have shown that clients' private training data can be reconstructed from the gradients they share in FL, known as gradient inversion attacks (GIAs). While GIAs have demonstrated effectiveness under ideal settings and auxiliary assumptions, their actual effic…

    Submitted 8 April, 2024; originally announced April 2024.

  22. arXiv:2404.04927  [pdf, ps, other]

    cs.IT

    Holographic Integrated Data and Energy Transfer

    Authors: Qingxiao Huang, Jie Hu, Yizhe Zhao, Kun Yang

    Abstract: Thanks to the application of metamaterials, holographic multiple-input multiple-output (H-MIMO) is expected to achieve a higher spatial diversity gain by enabling the ability to generate any current distribution on the surface. With the aid of the electromagnetic (EM) manipulation capability of H-MIMO, an integrated data and energy transfer (IDET) system can fully exploit the EM channel to realize energ…

    Submitted 7 April, 2024; originally announced April 2024.

  23. arXiv:2404.04810  [pdf, other]

    cond-mat.mtrl-sci cs.LG

    AlphaCrystal-II: Distance matrix based crystal structure prediction using deep learning

    Authors: Yuqi Song, Rongzhi Dong, Lai Wei, Qin Li, Jianjun Hu

    Abstract: Computational prediction of stable crystal structures has a profound impact on the large-scale discovery of novel functional materials. However, predicting the crystal structure solely from a material's composition or formula is a promising yet challenging task, as traditional ab initio crystal structure prediction (CSP) methods rely on time-consuming global searches and first-principles free ener…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 16 pages

  24. arXiv:2404.04403  [pdf, other]

    stat.ME cs.AI

    Low-Rank Robust Subspace Tensor Clustering for Metro Passenger Flow Modeling

    Authors: Jiuyun Hu, Ziyue Li, Chen Zhang, Fugee Tsung, Hao Yan

    Abstract: Tensor clustering has become an important topic, specifically in spatio-temporal modeling, due to its ability to cluster spatial modes (e.g., stations or road segments) and temporal modes (e.g., time of the day or day of the week). Our motivating example is from subway passenger flow modeling, where similarities between stations are commonly found. However, the challenges lie in the innate high-di…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Conditionally Accepted in INFORMS Journal of Data Science

  25. arXiv:2404.03900  [pdf, other]

    stat.ML cs.AI cs.LG cs.NE

    Nonparametric Modern Hopfield Models

    Authors: Jerry Yao-Chieh Hu, Bo-Yu Chen, Dennis Wu, Feng Ruan, Han Liu

    Abstract: We present a nonparametric construction for deep learning compatible modern Hopfield models and utilize this framework to debut an efficient variant. Our key contribution stems from interpreting the memory storage and retrieval processes in modern Hopfield models as a nonparametric regression problem subject to a set of query-memory pairs. Crucially, our framework not only recovers the known resul…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 59 pages; Code available at https://github.com/MAGICS-LAB/NonparametricHopfield

  26. arXiv:2404.03830  [pdf, other]

    cs.LG cs.AI stat.ML

    BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model

    Authors: Chenwei Xu, Yu-Chao Huang, Jerry Yao-Chieh Hu, Weijian Li, Ammar Gilani, Hsi-Sheng Goan, Han Liu

    Abstract: We introduce the Bi-Directional Sparse Hopfield Network (BiSHop), a novel end-to-end framework for deep tabular learning. BiSHop handles the two major challenges of deep tabular learning: non-rotationally invariant data structure and feature sparsity in tabular data. Our key motivation comes from the recently established connection between associative memory and a…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: 40 pages; Code available at https://github.com/MAGICS-LAB/BiSHop

  27. arXiv:2404.03828  [pdf, other]

    cs.LG cs.AI stat.ML

    Outlier-Efficient Hopfield Layers for Large Transformer-Based Models

    Authors: Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Robin Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu

    Abstract: We introduce an Outlier-Efficient Modern Hopfield Model (termed OutEffHop) and use it to address the outlier-induced challenge of quantizing gigantic transformer-based models. Our main contribution is a novel associative memory model facilitating outlier-efficient associative memory retrievals. Interestingly, this memory model manifests a model-based interpretation of an outlie…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: 48 pages; Code available at https://github.com/MAGICS-LAB/OutEffHop

  28. arXiv:2404.03827  [pdf, other]

    cs.LG cs.AI stat.ML

    Uniform Memory Retrieval with Larger Capacity for Modern Hopfield Models

    Authors: Dennis Wu, Jerry Yao-Chieh Hu, Teng-Yun Hsiao, Han Liu

    Abstract: We propose a two-stage memory retrieval dynamics for modern Hopfield models, termed U-Hop, with enhanced memory capacity. Our key contribution is a learnable feature map Φ which transforms the Hopfield energy function into a kernel space. This transformation ensures convergence between the local minima of energy and the fixed points of retrieval dynamics within the kernel space…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: 64 pages; Code available at https://github.com/MAGICS-LAB/UHop

  29. arXiv:2404.03663  [pdf, other]

    cs.NE cs.CV

    Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips

    Authors: Man Yao, Jiakui Hu, Tianxiang Hu, Yifan Xu, Zhaokun Zhou, Yonghong Tian, Bo Xu, Guoqi Li

    Abstract: Neuromorphic computing, which exploits Spiking Neural Networks (SNNs) on neuromorphic chips, is a promising energy-efficient alternative to traditional AI. CNN-based SNNs are the current mainstream of neuromorphic computing. By contrast, no neuromorphic chips are designed especially for Transformer-based SNNs, which have just emerged, and their performance is only on par with CNN-based SNNs, offer…

    Submitted 15 February, 2024; originally announced April 2024.

    Comments: Accepted by ICLR 2024. Code and Model: https://github.com/BICLab/Spike-Driven-Transformer-V2

  30. arXiv:2404.03558  [pdf, other]

    cs.CL cs.LG

    How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes

    Authors: Harmon Bhasin, Timothy Ossowski, Yiqiao Zhong, Junjie Hu

    Abstract: Large language models (LLMs) have recently shown the extraordinary ability to perform unseen tasks based on few-shot examples provided as text, also known as in-context learning (ICL). While recent works have attempted to understand the mechanisms driving ICL, few have explored training strategies that incentivize these models to generalize to multiple tasks. Multi-task learning (MTL) for generalis…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024

  31. Unblind Text Inputs: Predicting Hint-text of Text Input in Mobile Apps via LLM

    Authors: Zhe Liu, Chunyang Chen, Junjie Wang, Mengzhuo Chen, Boyu Wu, Yuekai Huang, Jun Hu, Qing Wang

    Abstract: Mobile apps have become indispensable for accessing and participating in various environments, especially for low-vision users. Users with visual impairments can use screen readers to read the content of each screen and understand the content that needs to be operated. Screen readers need to read the hint-text attribute in the text input component to remind visually impaired users what to fill in.…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Accepted by the 2024 CHI Conference on Human Factors in Computing Systems

  32. arXiv:2404.02573  [pdf, other]

    cs.CV

    Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution

    Authors: Simiao Li, Yun Zhang, Wei Li, Hanting Chen, Wenjia Wang, Bingyi Jing, Shaohui Lin, Jie Hu

    Abstract: Knowledge distillation (KD) is a promising yet challenging model compression technique that transfers rich learning representations from a well-performing but cumbersome teacher model to a compact student model. Previous methods for image super-resolution (SR) mostly compare the feature maps directly or after standardizing the dimensions with basic algebraic operations (e.g. average, dot-product).…

    Submitted 3 April, 2024; originally announced April 2024.

  33. arXiv:2404.02418  [pdf, other]

    cs.CL cs.AI

    Auxiliary task demands mask the capabilities of smaller language models

    Authors: Jennifer Hu, Michael C. Frank

    Abstract: Developmental psychologists have argued about when cognitive capacities such as language understanding or theory of mind emerge. These debates often hinge on the concept of "task demands" -- the auxiliary challenges associated with performing a particular evaluation -- that may mask the child's underlying ability. The same issues arise when measuring the capacities of language models (LMs): perfor…

    Submitted 2 April, 2024; originally announced April 2024.

  34. arXiv:2404.01943  [pdf, other]

    cs.CV cs.RO

    Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation

    Authors: Zihan Wang, Xiangyang Li, Jiahao Yang, Yeqi Liu, Junjie Hu, Ming Jiang, Shuqiang Jiang

    Abstract: Vision-and-language navigation (VLN) enables the agent to navigate to a remote location following the natural language instruction in 3D environments. At each navigation step, the agent selects from possible candidate locations and then makes the move. For better navigation planning, the lookahead exploration strategy aims to effectively evaluate the agent's next action by accurately anticipating…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024. The code is available at https://github.com/MrZihan/HNR-VLN

  35. arXiv:2404.00633  [pdf, other]

    cs.CV

    IPT-V2: Efficient Image Processing Transformer using Hierarchical Attentions

    Authors: Zhijun Tu, Kunpeng Du, Hanting Chen, Hailing Wang, Wei Li, Jie Hu, Yunhe Wang

    Abstract: Recent advances have demonstrated the powerful capability of transformer architecture in image restoration. However, our analysis indicates that existing transformer-based methods cannot establish both exact global and local dependencies simultaneously, which are critical to restoring the details and missing content of degraded images. To this end, we present an efficient image processing trans…

    Submitted 31 March, 2024; originally announced April 2024.

  36. arXiv:2403.20150  [pdf, other]

    cs.LG cs.AI cs.CY

    TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods

    Authors: Xiangfei Qiu, Jilin Hu, Lekui Zhou, Xingjian Wu, Junyang Du, Buang Zhang, Chenjuan Guo, Aoying Zhou, Christian S. Jensen, Zhenli Sheng, Bin Yang

    Abstract: Time series are generated in diverse domains such as economics, traffic, health, and energy, where forecasting of future values has numerous important applications. Not surprisingly, many forecasting methods are being proposed. To ensure progress, it is essential to be able to study and compare such methods empirically in a comprehensive and reliable manner. To achieve this, we propose TFB, an auto…

    Submitted 8 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by PVLDB 2024

  37. arXiv:2403.19724  [pdf]

    cs.ET cs.NE physics.optics

    Towards Reverse-Engineering the Brain: Brain-Derived Neuromorphic Computing Approach with Photonic, Electronic, and Ionic Dynamicity in 3D integrated circuits

    Authors: S. J. Ben Yoo, Luis El-Srouji, Suman Datta, Shimeng Yu, Jean Anne Incorvia, Alberto Salleo, Volker Sorger, Juejun Hu, Lionel C Kimerling, Kristofer Bouchard, Joy Geng, Rishidev Chaudhuri, Charan Ranganath, Randall O'Reilly

    Abstract: The human brain has immense learning capabilities at extreme energy efficiencies and scale that no artificial system has been able to match. For decades, reverse engineering the brain has been one of the top priorities of science and technology research. Despite numerous efforts, conventional electronics-based methods have failed to match the scalability, energy efficiency, and self-supervised lea…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 15 pages, 12 figures

  38. arXiv:2403.18978  [pdf, other]

    cs.CV cs.AI cs.LG

    TextCraftor: Your Text Encoder Can be Image Quality Controller

    Authors: Yanyu Li, Xian Liu, Anil Kag, Ju Hu, Yerlan Idelbayev, Dhritiman Sagar, Yanzhi Wang, Sergey Tulyakov, Jian Ren

    Abstract: Diffusion-based text-to-image generative models, e.g., Stable Diffusion, have revolutionized the field of content generation, enabling significant advancements in areas like image editing and video synthesis. Despite their formidable capabilities, these models are not without their limitations. It is still challenging to synthesize an image that aligns well with the input text, and multiple runs w…

    Submitted 27 March, 2024; originally announced March 2024.

  39. arXiv:2403.18864  [pdf, other]

    physics.ao-ph cs.AI cs.LG

    Interpretable Machine Learning for Weather and Climate Prediction: A Survey

    Authors: Ruyi Yang, Jingyu Hu, Zihao Li, Jianli Mu, Tingzhao Yu, Jiangjiang Xia, Xuhong Li, Aritra Dasgupta, Haoyi Xiong

    Abstract: Advanced machine learning models have recently achieved high predictive accuracy for weather and climate prediction. However, these complex models often lack inherent transparency and interpretability, acting as "black boxes" that impede user trust and hinder further model improvements. As such, interpretable machine learning techniques have become crucial in enhancing the credibility and utility…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: 26 pages, 5 figures

  40. Residual Dense Swin Transformer for Continuous Depth-Independent Ultrasound Imaging

    Authors: Jintong Hu, Hui Che, Zishuo Li, Wenming Yang

    Abstract: Ultrasound imaging is crucial for evaluating organ morphology and function, yet depth adjustment can degrade image quality and field-of-view, presenting a depth-dependent dilemma. Traditional interpolation-based zoom-in techniques often sacrifice detail and introduce artifacts. Motivated by the potential of arbitrary-scale super-resolution to naturally address these inherent challenges, we present…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted by ICASSP 2024, https://ieeexplore.ieee.org/document/10447712

    Journal ref: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  41. arXiv:2403.16368  [pdf, other]

    cs.CV

    Distilling Semantic Priors from SAM to Efficient Image Restoration Models

    Authors: Quan Zhang, Xiaoyu Liu, Wei Li, Hanting Chen, Junchao Liu, Jie Hu, Zhiwei Xiong, Chun Yuan, Yunhe Wang

    Abstract: In image restoration (IR), leveraging semantic priors from segmentation models has been a common approach to improve performance. The recent segment anything model (SAM) has emerged as a powerful tool for extracting advanced semantic priors to enhance IR tasks. However, the computational cost of SAM is prohibitive for IR, compared to existing smaller IR models. The incorporation of SAM for extract…

    Submitted 2 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  42. arXiv:2403.16095  [pdf, other]

    cs.CV cs.RO

    CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field

    Authors: Jiarui Hu, Xianhao Chen, Boyin Feng, Guanglin Li, Liangjing Yang, Hujun Bao, Guofeng Zhang, Zhaopeng Cui

    Abstract: Recently neural radiance fields (NeRF) have been widely exploited as 3D representations for dense simultaneous localization and mapping (SLAM). Despite their notable successes in surface modeling and novel view synthesis, existing NeRF-based methods are hindered by their computationally intensive and time-consuming volume rendering pipeline. This paper presents an efficient dense RGB-D SLAM system…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Project Page: https://zju3dv.github.io/cg-slam

  43. arXiv:2403.14430  [pdf, other]

    cs.CV

    Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels

    Authors: Tianming Liang, Chaolei Tan, Beihao Xia, Wei-Shi Zheng, Jian-Fang Hu

    Abstract: This paper focuses on open-ended video question answering, which aims to find the correct answers from a large answer set in response to a video-related question. This is essentially a multi-label classification task, since a question may have multiple answers. However, due to annotation costs, the labels in existing benchmarks are always extremely insufficient, typically one answer per question.…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  44. arXiv:2403.14349  [pdf, other]

    cs.CV

    On the Concept Trustworthiness in Concept Bottleneck Models

    Authors: Qihan Huang, Jie Song, Jingwen Hu, Haofei Zhang, Yong Wang, Mingli Song

    Abstract: Concept Bottleneck Models (CBMs), which break down the reasoning process into the input-to-concept mapping and the concept-to-label prediction, have garnered significant attention due to their remarkable interpretability achieved by the interpretable concept bottleneck. However, despite the transparency of the concept-to-label prediction, the mapping from the input to the intermediate concept rema…

    Submitted 21 March, 2024; originally announced March 2024.

  45. arXiv:2403.14174  [pdf, other]

    cs.CV

    Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding

    Authors: Jingjing Hu, Dan Guo, Kun Li, Zhan Si, Xun Yang, Xiaojun Chang, Meng Wang

    Abstract: Inspired by the activity-silent and persistent activity mechanisms in human visual perception biology, we design a Unified Static and Dynamic Network (UniSDNet), to learn the semantic association between the video and text/audio queries in a cross-modal environment for efficient video grounding. For static modeling, we devise a novel residual structure (ResMLP) to boost the global comprehensive in…

    Submitted 21 March, 2024; originally announced March 2024.

  46. arXiv:2403.13128  [pdf, ps, other]

    cs.LG

    AdaFish: Fast low-rank parameter-efficient fine-tuning by using second-order information

    Authors: Jiang Hu, Quanzheng Li

    Abstract: Recent advancements in large-scale pretrained models have significantly improved performance across a variety of tasks in natural language processing and computer vision. However, the extensive number of parameters in these models necessitates substantial memory and computational resources for full training. To adapt these models for downstream tasks or specific application-oriented datasets, para…

    Submitted 19 March, 2024; originally announced March 2024.

  47. arXiv:2403.12945  [pdf, other]

    cs.RO

    DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

    Authors: Alexander Khazatsky, Karl Pertsch, Suraj Nair, Ashwin Balakrishna, Sudeep Dasari, Siddharth Karamcheti, Soroush Nasiriany, Mohan Kumar Srirama, Lawrence Yunliang Chen, Kirsty Ellis, Peter David Fagan, Joey Hejna, Masha Itkina, Marion Lepert, Yecheng Jason Ma, Patrick Tree Miller, Jimmy Wu, Suneel Belkhale, Shivin Dass, Huy Ha, Arhan Jain, Abraham Lee, Youngwoon Lee, Marius Memmel, Sungjae Park, et al. (74 additional authors not shown)

    Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a resu…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://droid-dataset.github.io/

  48. arXiv:2403.12382  [pdf, other]

    eess.IV cs.CV cs.LG

    Low-Trace Adaptation of Zero-shot Self-supervised Blind Image Denoising

    Authors: Jintong Hu, Bin Xia, Bingchen Li, Wenming Yang

    Abstract: Deep learning-based denoiser has been the focus of recent development on image denoising. In the past few years, there has been increasing interest in developing self-supervised denoising networks that only require noisy images, without the need for clean ground truth for training. However, a performance gap remains between current self-supervised methods and their supervised counterparts. Additio…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 11 pages, 6 figures

  49. arXiv:2403.12362  [pdf, other]

    cs.CV cs.LG

    DMAD: Dual Memory Bank for Real-World Anomaly Detection

    Authors: Jianlong Hu, Xu Chen, Zhenye Gan, Jinlong Peng, Shengchuan Zhang, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Liujuan Cao, Rongrong Ji

    Abstract: Training a unified model is considered to be more suitable for practical industrial anomaly detection scenarios due to its generalization ability and storage efficiency. However, this multi-class setting, which exclusively uses normal data, overlooks the few but important accessible annotated anomalies in the real world. To address the challenge of real-world anomaly detection, we propose a new fr…

    Submitted 18 March, 2024; originally announced March 2024.

  50. arXiv:2403.12052   

    cs.CV

    A Dataset and Benchmark for Copyright Protection from Text-to-Image Diffusion Models

    Authors: Rui Ma, Qiang Zhou, Bangjun Xiao, Yizhu Jin, Daquan Zhou, Xiuyu Li, Aishani Singh, Yi Qu, Kurt Keutzer, Xiaodong Xie, Jingtong Hu, Zhen Dong, Shanghang Zhang

    Abstract: Copyright is a legal right that grants creators the exclusive authority to reproduce, distribute, and profit from their creative works. However, the recent advancements in text-to-image generation techniques have posed significant challenges to copyright protection, as these methods have facilitated the learning of unauthorized content, artistic creations, and portraits, which are subsequently uti…

    Submitted 19 March, 2024; v1 submitted 4 January, 2024; originally announced March 2024.

    Comments: Improve experimental content