Showing 1–50 of 620 results for author: Wu, T

Searching in archive cs.
  1. arXiv:2405.03125  [pdf, other]

    cs.IT

    MambaJSCC: Deep Joint Source-Channel Coding with Visual State Space Model

    Authors: Tong Wu, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Wenjun Zhang, Ping Zhang

    Abstract: Lightweight and efficient deep joint source-channel coding (JSCC) is a key technology for semantic communications. In this paper, we design a novel JSCC scheme named MambaJSCC, which utilizes a visual state space model with channel adaptation (VSSM-CA) block as its backbone for transmitting images over wireless channels. The VSSM-CA block utilizes VSSM to integrate two-dimensional images with the…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: submitted to IEEE conference

  2. arXiv:2405.02714  [pdf, other]

    cs.IR cs.CL

    Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness

    Authors: Xinran Zhao, Tong Chen, Sihao Chen, Hongming Zhang, Tongshuang Wu

    Abstract: The task of Information Retrieval (IR) requires a system to identify relevant documents based on users' information needs. In real-world scenarios, retrievers are expected to not only rely on the semantic relevance between the documents and the queries but also recognize the nuanced intents or perspectives behind a user query. For example, when asked to verify a claim, a retrieval system is expect…

    Submitted 4 May, 2024; originally announced May 2024.

  3. arXiv:2405.01349  [pdf, other]

    cs.LG cs.CR

    Position Paper: Beyond Robustness Against Single Attack Types

    Authors: Sihui Dai, Chong Xiang, Tong Wu, Prateek Mittal

    Abstract: Current research on defending against adversarial examples focuses primarily on achieving robustness against a single attack type such as $\ell_2$ or $\ell_{\infty}$-bounded attacks. However, the space of possible perturbations is much larger and currently cannot be modeled by a single attack type. The discrepancy between the focus of current defenses and the space of attacks of interest calls to…

    Submitted 2 May, 2024; originally announced May 2024.

  4. arXiv:2404.18933  [pdf, other]

    cs.CV cs.LG

    Learning Low-Rank Feature for Thorax Disease Classification

    Authors: Rajeev Goel, Utkarsh Nath, Yancheng Wang, Alvin C. Silva, Teresa Wu, Yingzhen Yang

    Abstract: Deep neural networks, including Convolutional Neural Networks (CNNs) and Visual Transformers (ViT), have achieved stunning success in the medical image domain. We study thorax disease classification in this paper. Effective extraction of features for the disease areas is crucial for disease classification on radiographic images. While various neural architectures and training techniques, such as self-…

    Submitted 14 February, 2024; originally announced April 2024.

  5. arXiv:2404.18262  [pdf, other]

    cs.AI

    Generating Situated Reflection Triggers about Alternative Solution Paths: A Case Study of Generative AI for Computer-Supported Collaborative Learning

    Authors: Atharva Naik, Jessica Ruhan Yin, Anusha Kamath, Qianou Ma, Sherry Tongshuang Wu, Charles Murray, Christopher Bogart, Majd Sakr, Carolyn P. Rose

    Abstract: An advantage of Large Language Models (LLMs) is their contextualization capability - providing different responses based on student inputs like solution strategy or prior discussion, to potentially better engage students than standard feedback. We present a design and evaluation of a proof-of-concept LLM application to offer students dynamic and contextualized feedback. Specifically, we augment an…

    Submitted 28 April, 2024; originally announced April 2024.

  6. arXiv:2404.18053  [pdf, ps, other]

    cs.IT

    Binary duadic codes and their related codes with a square-root-like lower bound

    Authors: Tingting Wu, Lanqiang Li, Xiuyu Zhang, Shixin Zhu

    Abstract: Binary cyclic codes have been a hot topic for many years, and significant progress has been made in the study of this type of code. As is well known, it is hard to construct infinite families of binary cyclic codes $[n, (n+1)/2]$ with good minimum distance. In this paper, by using the BCH bound on cyclic codes, one of the open problems proposed by Liu et al. about binary cyclic codes (Finite Field A…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 15 pages

  7. arXiv:2404.17489  [pdf, other]

    cs.LG cs.AI stat.ML

    Tabular Data Contrastive Learning via Class-Conditioned and Feature-Correlation Based Augmentation

    Authors: Wei Cui, Rasa Hosseinzadeh, Junwei Ma, Tongzi Wu, Yi Sui, Keyvan Golestan

    Abstract: Contrastive learning is a model pre-training technique by first creating similar views of the original data, and then encouraging the data and its corresponding views to be close in the embedding space. Contrastive learning has witnessed success in image and natural language data, thanks to the domain-specific augmentation techniques that are both intuitive and effective. Nonetheless, in tabular d…

    Submitted 30 April, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: 14 pages, 4 algorithms, 3 figures, 5 tables

  8. arXiv:2404.16829  [pdf, other]

    cs.CV cs.AI cs.CL

    Make-it-Real: Unleashing Large Multimodal Model's Ability for Painting 3D Objects with Realistic Materials

    Authors: Ye Fang, Zeyi Sun, Tong Wu, Jiaqi Wang, Ziwei Liu, Gordon Wetzstein, Dahua Lin

    Abstract: Physically realistic materials are pivotal in augmenting the realism of 3D assets across various applications and lighting conditions. However, existing 3D assets and generative models often lack authentic material properties. Manual assignment of materials using graphic software is a tedious and time-consuming task. In this paper, we exploit advancements in Multimodal Large Language Models (MLLMs…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Project Page: https://sunzey.github.io/Make-it-Real/

  9. arXiv:2404.15683  [pdf, other]

    cs.CV

    AnoFPDM: Anomaly Segmentation with Forward Process of Diffusion Models for Brain MRI

    Authors: Yiming Che, Fazle Rafsani, Jay Shah, Md Mahfuzur Rahman Siddiquee, Teresa Wu

    Abstract: Weakly-supervised diffusion models (DM) in anomaly segmentation, leveraging image-level labels, have attracted significant attention for their superior performance compared to unsupervised methods. It eliminates the need for pixel-level labels in training, offering a more cost-effective alternative to supervised methods. However, existing methods are not fully weakly-supervised because they heavil…

    Submitted 24 April, 2024; originally announced April 2024.

  10. Cross-Task Multi-Branch Vision Transformer for Facial Expression and Mask Wearing Classification

    Authors: Armando Zhu, Keqin Li, Tong Wu, Peng Zhao, Bo Hong

    Abstract: With wearing masks becoming a new cultural norm, facial expression recognition (FER) while taking masks into account has become a significant challenge. In this paper, we propose a unified multi-branch vision transformer for facial expression recognition and mask wearing classification tasks. Our approach extracts shared features for both tasks using a dual-branch architecture that obtains multi-s…

    Submitted 30 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Journal ref: Journal of Computer Technology and Applied Mathematics, vol. 1, no. 1, Apr. 2024, pp. 46-53

  11. arXiv:2404.14361  [pdf, other]

    cs.CL

    Better Synthetic Data by Retrieving and Transforming Existing Datasets

    Authors: Saumya Gandhi, Ritu Gala, Vijay Viswanathan, Tongshuang Wu, Graham Neubig

    Abstract: Despite recent advances in large language models, building dependable and deployable NLP models typically requires abundant, high-quality training data. However, task-specific data is not available for many use cases, and manually curating task-specific data is labor-intensive. Recent work has studied prompt-driven synthetic data generation using large language models, but these generated datasets…

    Submitted 26 April, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: PDF fixed in v3

  12. arXiv:2404.13579  [pdf, other]

    cs.CV cs.AI

    LTOS: Layout-controllable Text-Object Synthesis via Adaptive Cross-attention Fusions

    Authors: Xiaoran Zhao, Tianhao Wu, Yu Lai, Zhiliang Tian, Zhen Huang, Yahui Liu, Zejiang He, Dongsheng Li

    Abstract: Controllable text-to-image generation synthesizes visual text and objects in images with certain conditions, which are frequently applied to emoji and poster generation. Visual text rendering and layout-to-image generation tasks have been popular in controllable text-to-image generation. However, each of these tasks typically focuses on single modality generation or rendering, leaving yet-to-be-br…

    Submitted 21 April, 2024; originally announced April 2024.

  13. arXiv:2404.13289  [pdf, other]

    cs.CL cs.MM cs.SD eess.AS

    Double Mixture: Towards Continual Event Detection from Speech

    Authors: Jingqi Kang, Tongtong Wu, Jinming Zhao, Guitao Wang, Yinwei Wei, Hao Yang, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari

    Abstract: Speech event detection is crucial for multimedia retrieval, involving the tagging of both semantic and acoustic events. Traditional ASR systems often overlook the interplay between these events, focusing solely on content, even though the interpretation of dialogue can vary with environmental context. This paper tackles two primary challenges in speech event detection: the continual integration of…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: The first two authors contributed equally to this work

  14. arXiv:2404.09842  [pdf, other]

    cs.CV

    STMixer: A One-Stage Sparse Action Detector

    Authors: Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, Limin Wang

    Abstract: Traditional video action detectors typically adopt the two-stage pipeline, where a person detector is first employed to generate actor boxes and then 3D RoIAlign is used to extract actor-specific features for classification. This detection paradigm requires multi-stage training and inference, and the feature sampling is constrained inside the box, failing to effectively leverage richer context inf…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Extended version of the paper arXiv:2303.15879 presented at CVPR 2023. Accepted by TPAMI 2024

  15. arXiv:2404.09412  [pdf, other]

    cs.CV

    DeferredGS: Decoupled and Editable Gaussian Splatting with Deferred Shading

    Authors: Tong Wu, Jia-Mu Sun, Yu-Kun Lai, Yuewen Ma, Leif Kobbelt, Lin Gao

    Abstract: Reconstructing and editing 3D objects and scenes both play crucial roles in computer graphics and computer vision. Neural radiance fields (NeRFs) can achieve realistic reconstruction and editing results but suffer from inefficiency in rendering. Gaussian splatting significantly accelerates rendering by rasterizing Gaussian ellipsoids. However, Gaussian splatting utilizes a single Spherical Harmoni…

    Submitted 14 April, 2024; originally announced April 2024.

  16. arXiv:2404.07575  [pdf]

    cs.SD cs.AI eess.AS

    An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution

    Authors: Tien-Hong Lo, Fu-An Chao, Tzu-I Wu, Yao-Ting Sung, Berlin Chen

    Abstract: Automated speaking assessment (ASA) typically involves automatic speech recognition (ASR) and hand-crafted feature extraction from the ASR transcript of a learner's speech. Recently, self-supervised learning (SSL) has shown stellar performance compared to traditional methods. However, SSL-based ASA systems are faced with at least three data-related challenges: limited annotated data, uneven distri…

    Submitted 11 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024 Findings

  17. arXiv:2404.06974  [pdf]

    cs.RO

    Deep Reinforcement Learning for Mobile Robot Path Planning

    Authors: Hao Liu, Yi Shen, Shuangjiang Yu, Zijun Gao, Tong Wu

    Abstract: Path planning is an important problem with applications in many areas, such as video games and robotics. This paper proposes a novel method to address the problem of Deep Reinforcement Learning (DRL) based path planning for a mobile robot. We design DRL-based algorithms, including reward functions, and parameter optimization, to avoid time-consuming work in a 2D environment. We also des…

    Submitted 10 April, 2024; originally announced April 2024.

  18. arXiv:2404.05964  [pdf, other]

    cs.CR

    Deep Learning-Based Out-of-distribution Source Code Data Identification: How Far Have We Gone?

    Authors: Van Nguyen, Xingliang Yuan, Tingmin Wu, Surya Nepal, Marthie Grobler, Carsten Rudolph

    Abstract: Software vulnerabilities (SVs) have become a common, serious, and crucial concern to safety-critical security systems. That leads to significant progress in the use of AI-based methods for software vulnerability detection (SVD). In practice, although AI-based methods have been achieving promising performances in SVD and other domain applications (e.g., computer vision), they are well-known to fail…

    Submitted 14 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  19. arXiv:2404.05692  [pdf, other]

    cs.CL

    Evaluating Mathematical Reasoning Beyond Accuracy

    Authors: Shijie Xia, Xuefeng Li, Yixin Liu, Tongshuang Wu, Pengfei Liu

    Abstract: The leaderboard of Large Language Models (LLMs) in mathematical tasks has been continuously updated. However, the majority of evaluations focus solely on the final results, neglecting the quality of the intermediate steps. This oversight can mask underlying problems, such as logical errors or unnecessary steps in the reasoning process. To measure reasoning beyond final-answer accuracy, we introduc…

    Submitted 8 April, 2024; originally announced April 2024.

  20. arXiv:2404.04799  [pdf, other]

    cs.CV

    Few-Shot Object Detection: Research Advances and Challenges

    Authors: Zhimeng Xin, Shiming Chen, Tianxu Wu, Yuanjie Shao, Weiping Ding, Xinge You

    Abstract: Object detection as a subfield within computer vision has achieved remarkable progress, which aims to accurately identify and locate a specific object from images or videos. Such methods rely on large-scale labeled training samples for each object category to ensure accurate detection, but obtaining extensive annotated data is a labor-intensive and expensive process in many real-world scenarios. T…

    Submitted 6 April, 2024; originally announced April 2024.

  21. arXiv:2404.04565  [pdf, other]

    cs.CV

    SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos

    Authors: Tao Wu, Runyu He, Gangshan Wu, Limin Wang

    Abstract: Video-based visual relation detection tasks, such as video scene graph generation, play important roles in fine-grained video understanding. However, current video visual relation detection datasets have two main limitations that hinder the progress of research in this area. First, they do not explore complex human-human interactions in multi-person scenarios. Second, the relation types of existin…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  22. arXiv:2404.02657  [pdf, other]

    cs.CL cs.AI

    Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models

    Authors: Taiqiang Wu, Chaofan Tao, Jiahao Wang, Zhe Zhao, Ngai Wong

    Abstract: Kullback-Leibler divergence has been widely used in Knowledge Distillation (KD) to compress Large Language Models (LLMs). Contrary to prior assertions that reverse Kullback-Leibler (RKL) divergence is mode-seeking and thus preferable over the mean-seeking forward Kullback-Leibler (FKL) divergence, this study empirically and theoretically demonstrates that neither mode-seeking nor mean-seeking prope…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Under review as a conference paper at COLM 2024

  23. arXiv:2404.00431  [pdf, other]

    cs.HC cs.LG

    Visualizing Routes with AI-Discovered Street-View Patterns

    Authors: Tsung Heng Wu, Md Amiruzzaman, Ye Zhao, Deepshikha Bhati, Jing Yang

    Abstract: Street-level visual appearances play an important role in studying social systems, such as understanding the built environment, driving routes, and associated social and economic factors. It has not been integrated into a typical geographical visualization interface (e.g., map services) for planning driving routes. In this paper, we study this new visualization task with several new contributions.…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 12 pages, 10 figures, and 3 tables

  24. arXiv:2403.16539  [pdf, other]

    cs.CV

    DOrA: 3D Visual Grounding with Order-Aware Referring

    Authors: Tung-Yu Wu, Sheng-Yu Huang, Yu-Chiang Frank Wang

    Abstract: 3D visual grounding aims to identify the target object within a 3D point cloud scene referred to by a natural language description. While previous works attempt to exploit the verbo-visual relation with proposed cross-modal transformers, unstructured natural utterances and scattered objects might lead to undesirable performances. In this paper, we introduce DOrA, a novel 3D visual grounding framew…

    Submitted 25 March, 2024; originally announced March 2024.

  25. arXiv:2403.15654  [pdf, ps, other]

    cs.LG math.OC

    The Effectiveness of Local Updates for Decentralized Learning under Data Heterogeneity

    Authors: Tongle Wu, Ying Sun

    Abstract: We revisit two fundamental decentralized optimization methods, Decentralized Gradient Tracking (DGT) and Decentralized Gradient Descent (DGD), with multiple local updates. We consider two settings and demonstrate that incorporating $K > 1$ local update steps can reduce communication complexity. Specifically, for $μ$-strongly convex and $L$-smooth loss functions, we proved that local DGT achieves c…

    Submitted 22 March, 2024; originally announced March 2024.

  26. arXiv:2403.15285  [pdf, other]

    cs.NI cs.CR cs.HC cs.LG

    Blockchain-based Pseudonym Management for Vehicle Twin Migrations in Vehicular Edge Metaverse

    Authors: Jiawen Kang, Xiaofeng Luo, Jiangtian Nie, Tianhao Wu, Haibo Zhou, Yonghua Wang, Dusit Niyato, Shiwen Mao, Shengli Xie

    Abstract: Driven by the great advances in metaverse and edge computing technologies, vehicular edge metaverses are expected to disrupt the current paradigm of intelligent transportation systems. As highly computerized avatars of Vehicular Metaverse Users (VMUs), the Vehicle Twins (VTs) deployed in edge servers can provide valuable metaverse services to improve driving safety and on-board satisfaction for th…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 14 pages, 9 figures

  27. arXiv:2403.14619  [pdf, other]

    cs.CV

    ClusteringSDF: Self-Organized Neural Implicit Surfaces for 3D Decomposition

    Authors: Tianhao Wu, Chuanxia Zheng, Tat-Jen Cham, Qianyi Wu

    Abstract: 3D decomposition/segmentation still remains a challenge as large-scale 3D annotated data is not readily available. Contemporary approaches typically leverage 2D machine-generated segments, integrating them for 3D consistency. While the majority of these methods are based on NeRFs, they face a potential weakness that the instance/semantic embedding features derive from independent MLPs, thus preven…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Project Page: https://sm0kywu.github.io/ClusteringSDF/

  28. arXiv:2403.12999  [pdf]

    cs.RO cs.AI cs.CL cs.LG

    Prompt Selection and Augmentation for Few Examples Code Generation in Large Language Model and its Application in Robotics Control

    Authors: On Tai Wu, Frodo Kin Sun Chan, Zunhao Zhang, Yan Nei Law, Benny Drescher, Edmond Shiao Bun Lai

    Abstract: Few-shot prompting and step-by-step reasoning have enhanced the capabilities of Large Language Models (LLMs) in tackling complex tasks including code generation. In this paper, we introduce a prompt selection and augmentation algorithm aimed at improving mathematical reasoning and robot arm operations. Our approach incorporates a multi-stage example augmentation scheme combined with an example sel…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 17 pages, 4 figures

  29. arXiv:2403.12766  [pdf, other]

    cs.CL

    NovelQA: A Benchmark for Long-Range Novel Question Answering

    Authors: Cunxiang Wang, Ruoxi Ning, Boqi Pan, Tonghui Wu, Qipeng Guo, Cheng Deng, Guangsheng Bao, Qian Wang, Yue Zhang

    Abstract: The rapid advancement of Large Language Models (LLMs) has introduced a new frontier in natural language processing, particularly in understanding and processing long-context information. However, the evaluation of these models' long-context abilities remains a challenge due to the limitations of current benchmarks. To address this gap, we introduce NovelQA, a benchmark specifically designed to tes…

    Submitted 18 March, 2024; originally announced March 2024.

  30. arXiv:2403.12421  [pdf, other]

    cs.RO

    Dexterous Functional Pre-Grasp Manipulation with Diffusion Policy

    Authors: Tianhao Wu, Yunchong Gan, Mingdong Wu, Jingbo Cheng, Yaodong Yang, Yixin Zhu, Hao Dong

    Abstract: In real-world scenarios, objects often require repositioning and reorientation before they can be grasped, a process known as pre-grasp manipulation. Learning universal dexterous functional pre-grasp manipulation requires precise control over the relative position, orientation, and contact between the hand and object while generalizing to diverse dynamic scenarios with varying objects and goal pos…

    Submitted 5 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  31. arXiv:2403.12409  [pdf, other]

    cs.CV

    ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance

    Authors: Yongwei Chen, Tengfei Wang, Tong Wu, Xingang Pan, Kui Jia, Ziwei Liu

    Abstract: Generating high-quality 3D assets from a given image is highly desirable in various applications such as AR/VR. Recent advances in single-image 3D generation explore feed-forward models that learn to infer the 3D model of an object without optimization. Though promising results have been achieved in single object generation, these methods often struggle to model complex 3D assets that inherently c…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: https://cyw-3d.github.io/ComboVerse/

  32. arXiv:2403.11134  [pdf, other]

    cs.CV cs.GR

    Recent Advances in 3D Gaussian Splatting

    Authors: Tong Wu, Yu-Jie Yuan, Ling-Xiao Zhang, Jie Yang, Yan-Pei Cao, Ling-Qi Yan, Lin Gao

    Abstract: The emergence of 3D Gaussian Splatting (3DGS) has greatly accelerated the rendering speed of novel view synthesis. Unlike neural implicit representations like Neural Radiance Fields (NeRF) that represent a 3D scene with position and viewpoint-conditioned neural networks, 3D Gaussian Splatting utilizes a set of Gaussian ellipsoids to model the scene so that efficient rendering can be accomplished b…

    Submitted 13 April, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

  33. arXiv:2403.10854  [pdf, other]

    cs.CV

    A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment

    Authors: Tianhe Wu, Kede Ma, Jie Liang, Yujiu Yang, Lei Zhang

    Abstract: While Multimodal Large Language Models (MLLMs) have experienced significant advancement on visual understanding and reasoning, their potential to serve as powerful, flexible, interpretable, and text-driven models for Image Quality Assessment (IQA) remains largely unexplored. In this paper, we conduct a comprehensive and systematic study of prompting MLLMs for IQA. Specifically, we first investiga…

    Submitted 16 March, 2024; originally announced March 2024.

  34. Ordinal Classification with Distance Regularization for Robust Brain Age Prediction

    Authors: Jay Shah, Md Mahfuzur Rahman Siddiquee, Yi Su, Teresa Wu, Baoxin Li

    Abstract: Age is one of the major known risk factors for Alzheimer's Disease (AD). Detecting AD early is crucial for effective treatment and preventing irreversible brain damage. Brain age, a measure derived from brain imaging reflecting structural changes due to aging, may have the potential to identify AD onset, assess disease risk, and plan targeted interventions. Deep learning-based regression technique…

    Submitted 6 May, 2024; v1 submitted 25 October, 2023; originally announced March 2024.

    Comments: Accepted in WACV 2024

  35. arXiv:2403.10044  [pdf, other]

    cs.CV

    SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model

    Authors: Tao Wu, Xuewei Li, Zhongang Qi, Di Hu, Xintao Wang, Ying Shan, Xi Li

    Abstract: Controllable spherical panoramic image generation holds substantial applicative potential across a variety of domains. However, it remains a challenging task due to the inherent spherical distortion and geometry characteristics, resulting in low-quality content generation. In this paper, we introduce a novel framework of SphereDiffusion to address these unique challenges, for better generating high-…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted by AAAI2024

  36. arXiv:2403.06401  [pdf, other]

    cs.CV

    Refining Segmentation On-the-Fly: An Interactive Framework for Point Cloud Semantic Segmentation

    Authors: Peng Zhang, Ting Wu, Jinsheng Sun, Weiqing Li, Zhiyong Su

    Abstract: Existing interactive point cloud segmentation approaches primarily focus on object segmentation, which aims to determine which points belong to the object of interest guided by user interactions. This paper concentrates on an unexplored yet meaningful task, i.e., interactive point cloud semantic segmentation, which assigns high-quality semantic labels to all points in a scene with user correcti…

    Submitted 10 March, 2024; originally announced March 2024.

  37. arXiv:2403.05500  [pdf, other]

    cs.RO

    Using Fiber Optic Bundles to Miniaturize Vision-Based Tactile Sensors

    Authors: Julia Di, Zdravko Dugonjic, Will Fu, Tingfan Wu, Romeo Mercado, Kevin Sawyer, Victoria Rose Most, Gregg Kammerer, Stefanie Speidel, Richard E. Fan, Geoffrey Sonn, Mark R. Cutkosky, Mike Lambeta, Roberto Calandra

    Abstract: Vision-based tactile sensors have recently become popular due to their combination of low cost, very high spatial resolution, and ease of integration using widely available miniature cameras. The associated field of view and focal length, however, are difficult to package in a human-sized finger. In this paper we employ optical fiber bundles to achieve a form factor that, at 15 mm diameter, is sma…

    Submitted 11 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: We open source the design of DIGIT Pinki at https://github.com/facebookresearch/digit-design

  38. arXiv:2403.04586  [pdf, other]

    cs.RO cs.LG

    Learning Agility Adaptation for Flight in Clutter

    Authors: Guangyu Zhao, Tianyue Wu, Yeke Chen, Fei Gao

    Abstract: Animals learn to adapt agility of their movements to their capabilities and the environment they operate in. Mobile robots should also demonstrate this ability to combine agility and safety. The aim of this work is to endow flight vehicles with the ability of agility adaptation in prior unknown and partially observable cluttered environments. We propose a hierarchical learning and planning framewo…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Submission to Robotics and Automation Letter. 8 pages, 11 figures. Project page: https://learning-agility-adaptation.github.io/

  39. arXiv:2403.03497  [pdf, other]

    cs.GT

    Adaptive coordination promotes collective cooperation in repeated social dilemmas

    Authors: Feipeng Zhang, Te Wu, Long Wang

    Abstract: Direct reciprocity based on the repeated prisoner's dilemma has been intensively studied. Most theoretical investigations have concentrated on memory-$1$ strategies, a class of elementary strategies just reacting to the previous-round outcomes. Though the properties of "All-or-None" strategies ($AoN_K$) have been discovered, simulations just confirmed the good performance of $AoN_K$ of very short…

    Submitted 6 March, 2024; originally announced March 2024.

  40. arXiv:2403.02528  [pdf, other]

    cs.CL cs.AI

    DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation

    Authors: Xueqing Wu, Rui Zheng, Jingzhen Sha, Te-Lin Wu, Hanyu Zhou, Mohan Tang, Kai-Wei Chang, Nanyun Peng, Haoran Huang

    Abstract: Data analysis is a crucial analytical process to generate in-depth studies and conclusive insights to comprehensively answer a given user query for tabular data. In this work, we aim to propose new resources and benchmarks to inspire future research on this crucial yet challenging and under-explored task. However, collecting data analysis annotations curated by experts can be prohibitively expensi…

    Submitted 4 March, 2024; originally announced March 2024.

  41. arXiv:2403.02234  [pdf, other]

    cs.CV

    3DTopia: Large Text-to-3D Generation Model with Hybrid Diffusion Priors

    Authors: Fangzhou Hong, Jiaxiang Tang, Ziang Cao, Min Shi, Tong Wu, Zhaoxi Chen, Shuai Yang, Tengfei Wang, Liang Pan, Dahua Lin, Ziwei Liu

    Abstract: We present a two-stage text-to-3D generation system, namely 3DTopia, which generates high-quality general 3D assets within 5 minutes using hybrid diffusion priors. The first stage samples from a 3D diffusion prior directly learned from 3D data. Specifically, it is powered by a text-conditioned tri-plane latent diffusion model, which quickly generates coarse 3D samples for fast prototyping. The sec…

    Submitted 6 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Code available at https://github.com/3DTopia/3DTopia

  42. arXiv:2403.01928  [pdf, other]

    cs.RO

    ZSL-RPPO: Zero-Shot Learning for Quadrupedal Locomotion in Challenging Terrains using Recurrent Proximal Policy Optimization

    Authors: Yao Zhao, Tao Wu, Yijie Zhu, Xiang Lu, Jun Wang, Haitham Bou-Ammar, Xinyu Zhang, Peng Du

    Abstract: We present ZSL-RPPO, an improved zero-shot learning architecture that overcomes the limitations of teacher-student neural networks and enables generating robust, reliable, and versatile locomotion for quadrupedal robots in challenging terrains. We propose a new algorithm RPPO (Recurrent Proximal Policy Optimization) that directly trains recurrent neural network in partially observable environments…

    Submitted 4 March, 2024; originally announced March 2024.

  43. arXiv:2403.01370  [pdf]

    cs.CV

    Depth Estimation Algorithm Based on Transformer-Encoder and Feature Fusion

    Authors: Linhan Xia, Junbang Liu, Tong Wu

    Abstract: This research presents a novel depth estimation algorithm based on a Transformer-encoder architecture, tailored for the NYU and KITTI Depth Dataset. This research adopts a transformer model, initially renowned for its success in natural language processing, to capture intricate spatial relationships in visual data for depth estimation tasks. A significant innovation of the research is the integrat…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: ICAACE2024

  44. arXiv:2402.17242  [pdf, other]

    cs.SI cs.DB

    Scalable Community Search with Accuracy Guarantee on Attributed Graphs

    Authors: Yuxiang Wang, Shuzhan Ye, Xiaoliang Xu, Yuxia Geng, Zhenghe Zhao, Xiangyu Ke, Tianxing Wu

    Abstract: Given an attributed graph $G$ and a query node $q$, \underline{C}ommunity \underline{S}earch over \underline{A}ttributed \underline{G}raphs (CS-AG) aims to find a structure- and attribute-cohesive subgraph from $G$ that contains $q$. Although CS-AG has been widely studied, they still face three challenges. (1) Exact methods based on graph traversal are time-consuming, especially for large graphs.…

    Submitted 29 February, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  45. arXiv:2402.17124  [pdf, other]

    cs.CL

    Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models

    Authors: Xinran Zhao, Hongming Zhang, Xiaoman Pan, Wenlin Yao, Dong Yu, Tongshuang Wu, Jianshu Chen

    Abstract: For a LLM to be trustworthy, its confidence level should be well-calibrated with its actual performance. While it is now common sense that LLM performances are greatly impacted by prompts, the confidence calibration in prompting LLMs has yet to be thoroughly explored. In this paper, we explore how different prompting strategies influence LLM confidence calibration and how it could be improved. We…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 17 pages, 10 figures

  46. arXiv:2402.17110  [pdf, other]

    cs.LG cs.CL

    Sinkhorn Distance Minimization for Knowledge Distillation

    Authors: Xiao Cui, Yulei Qin, Yuting Gao, Enwei Zhang, Zihan Xu, Tong Wu, Ke Li, Xing Sun, Wengang Zhou, Houqiang Li

    Abstract: Knowledge distillation (KD) has been widely adopted to compress large language models (LLMs). Existing KD methods investigate various divergence measures including the Kullback-Leibler (KL), reverse Kullback-Leibler (RKL), and Jensen-Shannon (JS) divergences. However, due to limitations inherent in their assumptions and definitions, these measures fail to deliver effective supervision when few dis…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Accepted by COLING 2024

  47. arXiv:2402.17092  [pdf, other]

    cs.CR

    An Innovative Information Theory-based Approach to Tackle and Enhance The Transparency in Phishing Detection

    Authors: Van Nguyen, Tingmin Wu, Xingliang Yuan, Marthie Grobler, Surya Nepal, Carsten Rudolph

    Abstract: Phishing attacks have become a serious and challenging issue for detection, explanation, and defense. Despite more than a decade of research on phishing, encompassing both technical and non-technical remedies, phishing continues to be a serious problem. Nowadays, AI-based phishing detection stands out as one of the most effective solutions for defending against phishing attacks by providing vulner…

    Submitted 16 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  48. arXiv:2402.14415  [pdf, other]

    cs.CV cs.GR

    TaylorGrid: Towards Fast and High-Quality Implicit Field Learning via Direct Taylor-based Grid Optimization

    Authors: Renyi Mao, Qingshan Xu, Peng Zheng, Ye Wang, Tieru Wu, Rui Ma

    Abstract: Coordinate-based neural implicit representation or implicit fields have been widely studied for 3D geometry representation or novel view synthesis. Recently, a series of efforts have been devoted to accelerating the speed and improving the quality of the coordinate-based implicit field learning. Instead of learning heavy MLPs to predict the neural implicit values for the query coordinates, neural…

    Submitted 22 February, 2024; originally announced February 2024.

  49. Wikibench: Community-Driven Data Curation for AI Evaluation on Wikipedia

    Authors: Tzu-Sheng Kuo, Aaron Halfaker, Zirui Cheng, Jiwoo Kim, Meng-Hsin Wu, Tongshuang Wu, Kenneth Holstein, Haiyi Zhu

    Abstract: AI tools are increasingly deployed in community contexts. However, datasets used to evaluate AI are typically created by developers and annotators outside a given community, which can yield misleading conclusions about AI performance. How might we empower communities to drive the intentional design and curation of evaluation datasets for AI that impacts them? We investigate this question on Wikipe…

    Submitted 21 February, 2024; originally announced February 2024.

    Journal ref: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24)

  50. arXiv:2402.13264  [pdf, other]

    cs.AI

    KGroot: Enhancing Root Cause Analysis through Knowledge Graphs and Graph Convolutional Neural Networks

    Authors: Tingting Wang, Guilin Qi, Tianxing Wu

    Abstract: Fault localization is challenging in online micro-service due to the wide variety of monitoring data volume, types, events and complex interdependencies in service and components. Faults events in services are propagative and can trigger a cascade of alerts in a short period of time. In the industry, fault localization is typically conducted manually by experienced personnel. This reliance on expe…

    Submitted 11 February, 2024; originally announced February 2024.