
Showing 1–50 of 1,235 results for author: Liu, M

Searching in archive cs.
  1. arXiv:2405.05584  [pdf, other]

    cs.CV cs.AI

    A Survey on Backbones for Deep Video Action Recognition

    Authors: Zixuan Tang, Youjun Zhao, Yuhang Wen, Mengyuan Liu

    Abstract: Action recognition is a key technology in building interactive metaverses. With the rapid development of deep learning, methods in action recognition have also achieved great advancement. Researchers design and implement the backbones referring to multiple standpoints, which leads to the diversity of methods and encountering new challenges. This paper reviews several action recognition methods bas…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: This paper has been accepted by ICME workshop

  2. arXiv:2405.05523  [pdf, other]

    cs.CV cs.AI

    Prompt When the Animal is: Temporal Animal Behavior Grounding with Positional Recovery Training

    Authors: Sheng Yan, Xin Du, Zongying Li, Yi Wang, Hongcang Jin, Mengyuan Liu

    Abstract: Temporal grounding is crucial in multimodal learning, but it poses challenges when applied to animal behavior data due to the sparsity and uniform distribution of moments. To address these challenges, we propose a novel Positional Recovery Training framework (Port), which prompts the model with the start and end times of specific animal behaviors during training. Specifically, Port enhances the ba…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted by ICMEW 2024. arXiv admin note: text overlap with arXiv:2404.13657

  3. arXiv:2405.03806  [pdf, other]

    cs.HC

    In Situ AI Prototyping: Infusing Multimodal Prompts into Mobile Settings with MobileMaker

    Authors: Savvas Petridis, Michael Xieyang Liu, Alexander J. Fiannaca, Vivian Tsai, Michael Terry, Carrie J. Cai

    Abstract: Recent advances in multimodal large language models (LLMs) have lowered the barriers to rapidly prototyping AI-powered features via prompting, especially for mobile-intended use cases. Despite the value of situated user feedback, the process of soliciting early, mobile-situated user feedback on AI prototypes remains challenging. The broad scope and flexibility of LLMs means that, for a given use-c…

    Submitted 6 May, 2024; originally announced May 2024.

  4. arXiv:2405.02881  [pdf, other]

    cs.LG cs.AI stat.ML

    FedConPE: Efficient Federated Conversational Bandits with Heterogeneous Clients

    Authors: Zhuohua Li, Maoli Liu, John C. S. Lui

    Abstract: Conversational recommender systems have emerged as a potent solution for efficiently eliciting user preferences. These systems interactively present queries associated with "key terms" to users and leverage user feedback to estimate user preferences more efficiently. Nonetheless, most existing algorithms adopt a centralized approach. In this paper, we introduce FedConPE, a phase elimination-based…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted in the 33rd International Joint Conference on Artificial Intelligence (IJCAI), 2024

  5. arXiv:2405.02504  [pdf, other]

    eess.IV cs.CV

    Functional Imaging Constrained Diffusion for Brain PET Synthesis from Structural MRI

    Authors: Minhui Yu, Mengqi Wu, Ling Yue, Andrea Bozoki, Mingxia Liu

    Abstract: Magnetic resonance imaging (MRI) and positron emission tomography (PET) are increasingly used in multimodal analysis of neurodegenerative disorders. While MRI is broadly utilized in clinical settings, PET is less accessible. Many studies have attempted to use deep generative models to synthesize PET from MRI scans. However, they often suffer from unstable training and inadequately preserve brain f…

    Submitted 8 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  6. arXiv:2405.01750  [pdf, other]

    eess.IV cs.CV

    PointCompress3D -- A Point Cloud Compression Framework for Roadside LiDARs in Intelligent Transportation Systems

    Authors: Walter Zimmer, Ramandika Pranamulia, Xingcheng Zhou, Mingyu Liu, Alois C. Knoll

    Abstract: In the context of Intelligent Transportation Systems (ITS), efficient data compression is crucial for managing large-scale point cloud data acquired by roadside LiDAR sensors. The demand for efficient storage, streaming, and real-time object detection capabilities for point cloud data is substantial. This work introduces PointCompress3D, a novel point cloud compression framework tailored specifica…

    Submitted 2 May, 2024; originally announced May 2024.

  7. arXiv:2405.01461  [pdf, other]

    cs.CV

    SATO: Stable Text-to-Motion Framework

    Authors: Wenshuo Chen, Hongru Xiao, Erhang Zhang, Lijie Hu, Lei Wang, Mengyuan Liu, Chen Chen

    Abstract: Is the Text to Motion model robust? Recent advancements in Text to Motion models primarily stem from more accurate predictions of specific actions. However, the text modality typically relies solely on pre-trained Contrastive Language-Image Pretraining (CLIP) models. Our research has uncovered a significant issue with the text-to-motion model: its predictions often exhibit inconsistent outputs, re…

    Submitted 3 May, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

  8. arXiv:2405.00704  [pdf, ps, other]

    cs.CL cs.AI

    A Survey on the Real Power of ChatGPT

    Authors: Ming Liu, Ran Liu, Hua Wang, Wray Buntine

    Abstract: ChatGPT has changed the AI community and an active research line is the performance evaluation of ChatGPT. A key challenge for the evaluation is that ChatGPT is still closed-source and traditional benchmark datasets may have been used by ChatGPT as the training data. In this paper, (i) we survey recent studies which uncover the real performance levels of ChatGPT in seven categories of NLP tasks, (…

    Submitted 22 April, 2024; originally announced May 2024.

    Comments: 9 pages, 2 tables

  9. arXiv:2405.00700  [pdf]

    cs.NE cond-mat.str-el

    Oxygen vacancies modulated VO2 for neurons and Spiking Neural Network construction

    Authors: Liang Li, Ting Zhou, Tong Liu, Zhiwei Liu, Yaping Li, Shuo Wu, Shanguang Zhao, Jinglin Zhu, Meiling Liu, Zhihan Lin, Bowen Sun, Jianjun Li, Fangwen Sun, Chongwen Zou

    Abstract: Artificial neuronal devices are the basic building blocks for neuromorphic computing systems, which have been motivated by realistic brain emulation. Aiming for these applications, various device concepts have been proposed to mimic the neuronal dynamics and functions. While till now, the artificial neuron devices with high efficiency, high stability and low power consumption are still far from pr…

    Submitted 16 April, 2024; originally announced May 2024.

    Comments: 18 pages, 4 figures

  10. arXiv:2405.00254  [pdf, other]

    cs.AI cs.LG

    Principled RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation

    Authors: Chanwoo Park, Mingyang Liu, Kaiqing Zhang, Asuman Ozdaglar

    Abstract: Reinforcement learning from human feedback (RLHF) has been an effective technique for aligning AI systems with human values, with remarkable successes in fine-tuning large-language models recently. Most existing RLHF paradigms make the underlying assumption that human preferences are relatively homogeneous, and can be encoded by a single reward model. In this paper, we focus on addressing the issu…

    Submitted 30 April, 2024; originally announced May 2024.

  11. arXiv:2404.19752  [pdf, other]

    cs.CV

    Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

    Authors: Yunhao Ge, Xiaohui Zeng, Jacob Samuel Huffman, Tsung-Yi Lin, Ming-Yu Liu, Yin Cui

    Abstract: Existing automatic captioning methods for visual content face challenges such as lack of detail, content hallucination, and poor instruction following. In this work, we propose VisualFactChecker (VFC), a flexible training-free pipeline that generates high-fidelity and detailed captions for both 2D images and 3D objects. VFC consists of three steps: 1) proposal, where image-to-text captioning model…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  12. arXiv:2404.19615  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    SemiPL: A Semi-supervised Method for Event Sound Source Localization

    Authors: Yue Li, Baiqiao Yin, Jinfu Liu, Jiajun Wen, Jiaying Lin, Mengyuan Liu

    Abstract: In recent years, Event Sound Source Localization has been widely applied in various fields. Recent works typically relying on the contrastive learning framework show impressive performance. However, all work is based on large relatively simple datasets. It's also crucial to understand and analyze human behaviors (actions and interactions of people), voices, and sounds in chaotic events in many app…

    Submitted 30 April, 2024; originally announced April 2024.

  13. arXiv:2404.19518  [pdf, other]

    cs.MA cs.AI cs.RO

    MGCBS: An Optimal and Efficient Algorithm for Solving Multi-Goal Multi-Agent Path Finding Problem

    Authors: Mingkai Tang, Yuanhang Li, Hongji Liu, Yingbing Chen, Ming Liu, Lujia Wang

    Abstract: With the expansion of the scale of robotics applications, the multi-goal multi-agent pathfinding (MG-MAPF) problem began to gain widespread attention. This problem requires each agent to visit pre-assigned multiple goal points at least once without conflict. Some previous methods have been proposed to solve the MG-MAPF problem based on Decoupling the goal Vertex visiting order search and the Singl…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: to be published in IJCAI2024

  14. arXiv:2404.19232  [pdf, other]

    cs.CL cs.AI

    GRAMMAR: Grounded and Modular Methodology for Assessment of Domain-Specific Retrieval-Augmented Language Model

    Authors: Xinzhe Li, Ming Liu, Shang Gao

    Abstract: Retrieval-augmented Generation (RAG) systems have been actively studied and deployed across various industries to query on domain-specific knowledge base. However, evaluating these systems presents unique challenges due to the scarcity of domain-specific queries and corresponding ground truths, as well as a lack of systematic approaches to diagnosing the cause of failure cases -- whether they stem…

    Submitted 8 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  15. arXiv:2404.16779  [pdf, other]

    cs.LG cs.AI cs.RO

    DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks

    Authors: Tongzhou Mu, Minghua Liu, Hao Su

    Abstract: The success of many RL techniques heavily relies on human-engineered dense rewards, which typically demand substantial domain expertise and extensive trial and error. In our work, we propose DrS (Dense reward learning from Stages), a novel approach for learning reusable dense rewards for multi-stage tasks in a data-driven manner. By leveraging the stage structures of the task, DrS learns a high-qu…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: ICLR 2024. Explore videos, data, code, and more at https://sites.google.com/view/iclr24drs

  16. arXiv:2404.16609  [pdf, other]

    cs.CV cs.AI

    SFMViT: SlowFast Meet ViT in Chaotic World

    Authors: Jiaying Lin, Jiajun Wen, Mengyuan Liu, Jinfu Liu, Baiqiao Yin, Yue Li

    Abstract: The task of spatiotemporal action localization in chaotic scenes is a challenging task toward advanced video understanding. Paving the way with high-quality video feature extraction and enhancing the precision of detector-predicted anchors can effectively improve model performance. To this end, we propose a high-performance dual-stream spatiotemporal feature extraction network SFMViT with an ancho…

    Submitted 25 April, 2024; originally announced April 2024.

  17. arXiv:2404.16571  [pdf, other]

    cs.CV

    MonoPCC: Photometric-invariant Cycle Constraint for Monocular Depth Estimation of Endoscopic Images

    Authors: Zhiwei Wang, Ying Zhou, Shiquan He, Ting Li, Fan Huang, Qiang Ding, Xinxia Feng, Mei Liu, Qiang Li

    Abstract: Photometric constraint is indispensable for self-supervised monocular depth estimation. It involves warping a source image onto a target view using estimated depth&pose, and then minimizing the difference between the warped and target images. However, the endoscopic built-in light causes significant brightness fluctuations, and thus makes the photometric constraint unreliable. Previous efforts onl…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: 11 pages, 10 figures

  18. arXiv:2404.16561  [pdf]

    cs.CV

    Research on geometric figure classification algorithm based on Deep Learning

    Authors: Ruiyang Wang, Haonan Wang, Junfeng Sun, Mingjia Zhao, Meng Liu

    Abstract: In recent years, with the rapid development of computer information technology, the development of artificial intelligence has been accelerating. The traditional geometry recognition technology is relatively backward and the recognition rate is low. In the face of massive information database, the traditional algorithm model inevitably has the problems of low recognition accuracy and poor performa…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 6 pages, 9 figures

    Report number: ISSN: 2664-9640

    Journal ref: Scientific Journal of Intelligent Systems Research, Volume 4 Issue 6, 2022

  19. arXiv:2404.16324  [pdf, other]

    math.NA cs.LG eess.SP

    Improved impedance inversion by deep learning and iterated graph Laplacian

    Authors: Davide Bianchi, Florian Bossmann, Wenlong Wang, Mingming Liu

    Abstract: Deep learning techniques have shown significant potential in many applications through recent years. The achieved results often outperform traditional techniques. However, the quality of a neural network highly depends on the used training data. Noisy, insufficient, or biased training data leads to suboptimal results. We present a hybrid method that combines deep learning with iterated graph Lap…

    Submitted 25 April, 2024; originally announced April 2024.

  20. arXiv:2404.16139  [pdf, other]

    cs.CV cs.RO

    A Survey on Intermediate Fusion Methods for Collaborative Perception Categorized by Real World Challenges

    Authors: Melih Yazgan, Thomas Graf, Min Liu, Tobias Fleck, J. Marius Zoellner

    Abstract: This survey analyzes intermediate fusion methods in collaborative perception for autonomous driving, categorized by real-world challenges. We examine various methods, detailing their features and the evaluation metrics they employ. The focus is on addressing challenges like transmission efficiency, localization errors, communication disruptions, and heterogeneity. Moreover, we explore strategies t…

    Submitted 28 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: 8 pages, 6 tables

  21. arXiv:2404.15719  [pdf, other]

    cs.CV cs.AI

    HDBN: A Novel Hybrid Dual-branch Network for Robust Skeleton-based Action Recognition

    Authors: Jinfu Liu, Baiqiao Yin, Jiaying Lin, Jiajun Wen, Yue Li, Mengyuan Liu

    Abstract: Skeleton-based action recognition has gained considerable traction thanks to its utilization of succinct and robust skeletal representations. Nonetheless, current methodologies often lean towards utilizing a solitary backbone to model skeleton modality, which can be limited by inherent flaws in the network backbone. To address this and fully leverage the complementary characteristics of various ne…

    Submitted 25 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  22. arXiv:2404.14908  [pdf, other]

    cs.CV

    Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation

    Authors: Hoang Chuong Nguyen, Tianyu Wang, Jose M. Alvarez, Miaomiao Liu

    Abstract: This paper focuses on self-supervised monocular depth estimation in dynamic scenes trained on monocular videos. Existing methods jointly estimate pixel-wise depth and motion, relying mainly on an image reconstruction loss. Dynamic regions remain a critical challenge for these methods due to the inherent ambiguity in depth and motion estimation, resulting in inaccurate depth estimation. This paper…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR2024

  23. arXiv:2404.14044  [pdf, other]

    cs.CV

    HashPoint: Accelerated Point Searching and Sampling for Neural Rendering

    Authors: Jiahao Ma, Miaomiao Liu, David Ahmedt-Aristizaba, Chuong Nguyen

    Abstract: In this paper, we address the problem of efficient point searching and sampling for volume neural rendering. Within this realm, two typical approaches are employed: rasterization and ray tracing. The rasterization-based methods enable real-time rendering at the cost of increased memory and lower fidelity. In contrast, the ray-tracing-based methods yield superior quality but demand longer rendering…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: CVPR2024 Highlight

    Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

  24. arXiv:2404.13657  [pdf, other]

    cs.CV cs.AI

    MLP: Motion Label Prior for Temporal Sentence Localization in Untrimmed 3D Human Motions

    Authors: Sheng Yan, Mengyuan Liu, Yong Wang, Yang Liu, Chen Chen, Hong Liu

    Abstract: In this paper, we address the unexplored question of temporal sentence localization in human motions (TSLM), aiming to locate a target moment from a 3D human motion that semantically corresponds to a text query. Considering that 3D human motions are captured using specialized motion capture devices, motions with only a few joints lack complex scene information like objects and lighting. Due to thi…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 13 pages, 9 figures

  25. arXiv:2404.12624  [pdf, other]

    cs.RO cs.CV

    Dragtraffic: A Non-Expert Interactive and Point-Based Controllable Traffic Scene Generation Framework

    Authors: Sheng Wang, Ge Sun, Fulong Ma, Tianshuai Hu, Yongkang Song, Lei Zhu, Ming Liu

    Abstract: The evaluation and training of autonomous driving systems require diverse and scalable corner cases. However, most existing scene generation methods lack controllability, accuracy, and versatility, resulting in unsatisfactory generation results. To address this problem, we propose Dragtraffic, a generalized, point-based, and controllable traffic scene generation framework based on conditional diff…

    Submitted 19 April, 2024; originally announced April 2024.

  26. arXiv:2404.12621  [pdf, other]

    cs.SE

    Research on WebAssembly Runtimes: A Survey

    Authors: Yixuan Zhang, Mugeng Liu, Haoyu Wang, Yun Ma, Gang Huang, Xuanzhe Liu

    Abstract: WebAssembly (abbreviated as Wasm) was initially introduced for the Web but quickly extended its reach into various domains beyond the Web. To create Wasm applications, developers can compile high-level programming languages into Wasm binaries or manually convert equivalent textual formats into Wasm binaries. Regardless of whether it is utilized within or outside the Web, the execution of Wasm bina…

    Submitted 19 April, 2024; originally announced April 2024.

  27. arXiv:2404.12352  [pdf, other]

    cs.CV

    Point-In-Context: Understanding Point Cloud via In-Context Learning

    Authors: Mengyuan Liu, Zhongbin Fang, Xia Li, Joachim M. Buhmann, Xiangtai Li, Chen Change Loy

    Abstract: With the emergence of large-scale models trained on diverse datasets, in-context learning has emerged as a promising paradigm for multitasking, notably in natural language processing and image processing. However, its application in 3D point cloud tasks remains largely unexplored. In this work, we introduce Point-In-Context (PIC), a novel framework for 3D point cloud understanding via in-context l…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Project page: https://fanglaosi.github.io/Point-In-Context_Pages. arXiv admin note: text overlap with arXiv:2306.08659

  28. arXiv:2404.12062  [pdf, other]

    cs.SD cs.CV cs.GR eess.AS

    MIDGET: Music Conditioned 3D Dance Generation

    Authors: Jinwu Wang, Wei Mao, Miaomiao Liu

    Abstract: In this paper, we introduce a MusIc conditioned 3D Dance GEneraTion model, named MIDGET based on Dance motion Vector Quantised Variational AutoEncoder (VQ-VAE) model and Motion Generative Pre-Training (GPT) model to generate vibrant and high-quality dances that match the music rhythm. To tackle challenges in the field, we introduce three new components: 1) a pre-trained memory codebook based on the…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 12 pages, 6 figures. Published in AI 2023: Advances in Artificial Intelligence

    Journal ref: In Australasian Joint Conference on Artificial Intelligence (pp. 277-288). Singapore: Springer Nature Singapore 2023

  29. arXiv:2404.11605  [pdf, other]

    cs.CV cs.AI cs.RO

    VG4D: Vision-Language Model Goes 4D Video Recognition

    Authors: Zhichao Deng, Xiangtai Li, Xia Li, Yunhai Tong, Shen Zhao, Mengyuan Liu

    Abstract: Understanding the real world through point cloud video is a crucial aspect of robotics and autonomous driving systems. However, prevailing methods for 4D point cloud recognition have limitations due to sensor resolution, which leads to a lack of detailed information. Recent advances have shown that Vision-Language Models (VLM) pre-trained on web-scale text-image datasets can learn fine-grained vis…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: ICRA 2024

  30. arXiv:2404.10255  [pdf, other]

    cs.LG cs.CR cs.DC

    Privacy-Enhanced Training-as-a-Service for On-Device Intelligence: Concept, Architectural Scheme, and Open Problems

    Authors: Zhiyuan Wu, Sheng Sun, Yuwei Wang, Min Liu, Bo Gao, Tianliu He, Wen Wang

    Abstract: On-device intelligence (ODI) enables artificial intelligence (AI) applications to run on end devices, providing real-time and customized AI inference without relying on remote servers. However, training models for on-device deployment face significant challenges due to the decentralized and privacy-sensitive nature of users' data, along with end-side constraints related to network connectivity, co…

    Submitted 27 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: 7 pages, 3 figures

  31. arXiv:2404.09677  [pdf, other]

    cs.RO

    A Generic Trajectory Planning Method for Constrained All-Wheel-Steering Robots

    Authors: Ren Xin, Hongji Liu, Yingbing Chen, Sheng Wang, Ming Liu

    Abstract: This paper presents a trajectory planning method for wheeled robots with fixed steering axes while the steering angle of each wheel is constrained. In the past, All-Wheel-Steering (AWS) robots, incorporating modes such as rotation-free translation maneuvers, in-situ rotational maneuvers, and proportional steering, exhibited inefficient performance due to time-consuming mode switches. This inefficie…

    Submitted 15 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  32. arXiv:2404.09613  [pdf, other]

    cs.ET cs.AI cs.AR

    Efficient and accurate neural field reconstruction using resistive memory

    Authors: Yifei Yu, Shaocong Wang, Woyu Zhang, Xinyuan Zhang, Xiuzhe Wu, Yangu He, Jichang Yang, Yue Zhang, Ning Lin, Bo Wang, Xi Chen, Songqi Wang, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: Human beings construct perception of space by integrating sparse observations into massively interconnected synapses and neurons, offering a superior parallelism and efficiency. Replicating this capability in AI finds wide applications in medical imaging, AR/VR, and embodied AI, where input data is often sparse and computing resources are limited. However, traditional signal reconstruction methods…

    Submitted 15 April, 2024; originally announced April 2024.

  33. arXiv:2404.09150  [pdf, other]

    cs.RO cs.GR

    Learning Cross-hand Policies for High-DOF Reaching and Grasping

    Authors: Qijin She, Shishun Zhang, Yunfan Ye, Min Liu, Ruizhen Hu, Kai Xu

    Abstract: Reaching-and-grasping is a fundamental skill for robotic manipulation, but existing methods usually train models on a specific gripper and cannot be reused on another gripper without retraining. In this paper, we propose a novel method that can learn a unified policy model that can be easily transferred to different dexterous grippers. Our method consists of two stages: a gripper-agnostic policy m…

    Submitted 14 April, 2024; originally announced April 2024.

  34. arXiv:2404.08850  [pdf]

    cs.AI cs.CE cs.LG

    Assessing Economic Viability: A Comparative Analysis of Total Cost of Ownership for Domain-Adapted Large Language Models versus State-of-the-art Counterparts in Chip Design Coding Assistance

    Authors: Amit Sharma, Teodor-Dumitru Ene, Kishor Kunal, Mingjie Liu, Zafar Hasan, Haoxing Ren

    Abstract: This paper presents a comparative analysis of total cost of ownership (TCO) and performance between domain-adapted large language models (LLM) and state-of-the-art (SoTA) LLMs, with a particular emphasis on tasks related to coding assistance for chip design. We examine the TCO and performance metrics of a domain-adaptive LLM, ChipNeMo, against two leading LLMs, Claude 3 Opus and ChatGPT-4 Turbo,…

    Submitted 12 April, 2024; originally announced April 2024.

  35. arXiv:2404.08563  [pdf, other]

    cs.RO

    FusionPortableV2: A Unified Multi-Sensor Dataset for Generalized SLAM Across Diverse Platforms and Scalable Environments

    Authors: Hexiang Wei, Jianhao Jiao, Xiangcheng Hu, Jingwen Yu, Xupeng Xie, Jin Wu, Yilong Zhu, Yuxuan Liu, Lujia Wang, Ming Liu

    Abstract: Simultaneous Localization and Mapping (SLAM) technology has been widely applied in various robotic scenarios, from rescue operations to autonomous driving. However, the generalization of SLAM algorithms remains a significant challenge, as current datasets often lack scalability in terms of platforms and environments. To address this limitation, we present FusionPortableV2, a multi-sensor SLAM data…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 20 pages, 17 figures, 7 tables. Submitted for IJRR dataset paper

  36. arXiv:2404.07833  [pdf]

    cs.CV cs.LG

    Streamlined Photoacoustic Image Processing with Foundation Models: A Training-Free Solution

    Authors: Handi Deng, Yucheng Zhou, Jiaxuan Xiang, Liujie Gu, Yan Luo, Hai Feng, Mingyuan Liu, Cheng Ma

    Abstract: Foundation models have rapidly evolved and have achieved significant accomplishments in computer vision tasks. Specifically, the prompt mechanism conveniently allows users to integrate image prior information into the model, making it possible to apply models without any training. Therefore, we propose a method based on foundation models and zero training to solve the tasks of photoacoustic (PA) i…

    Submitted 11 April, 2024; originally announced April 2024.

  37. "We Need Structured Output": Towards User-centered Constraints on Large Language Model Output

    Authors: Michael Xieyang Liu, Frederick Liu, Alexander J. Fiannaca, Terry Koo, Lucas Dixon, Michael Terry, Carrie J. Cai

    Abstract: Large language models can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to follow specific formats or standards. In this work, we surveyed 51 experienced industry professionals to understand the range of scenarios and motivations driving the need for output constraints from a user-centered perspective…

    Submitted 10 April, 2024; originally announced April 2024.

    Journal ref: "We Need Structured Output": Towards User-centered Constraints on LLM Output. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '24), May 11-16, 2024, Honolulu, HI, USA

  38. arXiv:2404.06860  [pdf, other]

    cs.CV

    Monocular 3D lane detection for Autonomous Driving: Recent Achievements, Challenges, and Outlooks

    Authors: Fulong Ma, Weiqing Qi, Guoyang Zhao, Linwei Zheng, Sheng Wang, Yuxuan Liu, Ming Liu

    Abstract: 3D lane detection is essential in autonomous driving as it extracts structural and traffic information from the road in three-dimensional space, aiding self-driving cars in logical, safe, and comfortable path planning and motion control. Given the cost of sensors and the advantages of visual data in color information, 3D lane detection based on monocular vision is an important research direction i…

    Submitted 19 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  39. arXiv:2404.06852  [pdf, other]

    cs.SE

    Research Artifacts in Software Engineering Publications: Status and Trends

    Authors: Mugeng Liu, Xiaolong Huang, Wei He, Yibing Xie, Jie M. Zhang, Xiang Jing, Zhenpeng Chen, Yun Ma

    Abstract: The Software Engineering (SE) community has been embracing the open science policy and encouraging researchers to disclose artifacts in their publications. However, the status and trends of artifact practice and quality remain unclear, lacking insights on further improvement. In this paper, we present an empirical study to characterize the research artifacts in SE publications. Specifically, we ma…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: Accepted by Journal of Systems and Software (JSS 2024). Please include JSS in any citations

  40. arXiv:2404.06451  [pdf, other]

    cs.CV

    SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions

    Authors: Xiaoyu Liu, Yuxiang Wei, Ming Liu, Xianhui Lin, Peiran Ren, Xuansong Xie, Wangmeng Zuo

    Abstract: Human visual imagination usually begins with analogies or rough sketches. For example, given an image with a girl playing guitar before a building, one may analogously imagine how it seems like if Iron Man playing guitar before Pyramid in Egypt. Nonetheless, visual condition may not be precisely aligned with the imaginary result indicated by text prompt, and existing layout-controllable text-to-im…

    Submitted 9 April, 2024; originally announced April 2024.

  41. arXiv:2404.05648  [pdf, other]

    cs.AR cs.AI cs.ET cs.NE

    Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model

    Authors: Jichang Yang, Hegan Chen, Jia Chen, Songqi Wang, Shaocong Wang, Yifei Yu, Xi Chen, Bo Wang, Xinyuan Zhang, Binbin Cui, Yi Li, Ning Lin, Meng Xu, Yi Li, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Xumeng Zhang, Dashan Shang, Han Wang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: Human brains image complicated scenes when reading a novel. Replicating this imagination is one of the ultimate goals of AI-Generated Content (AIGC). However, current AIGC methods, such as score-based diffusion, are still deficient in terms of rapidity and efficiency. This deficiency is rooted in the difference between the brain and digital computers. Digital computers have physically separated st…

    Submitted 8 April, 2024; originally announced April 2024.

  42. arXiv:2404.05064  [pdf, other]

    cs.LG math.NA

    A Structure-Guided Gauss-Newton Method for Shallow ReLU Neural Network

    Authors: Zhiqiang Cai, Tong Ding, Min Liu, Xinyu Liu, Jianlin Xia

    Abstract: In this paper, we propose a structure-guided Gauss-Newton (SgGN) method for solving least squares problems using a shallow ReLU neural network. The method effectively takes advantage of both the least squares structure and the neural network structure of the objective function. By categorizing the weights and biases of the hidden and output layers of the network as nonlinear and linear parameters,…

    Submitted 7 April, 2024; originally announced April 2024.

    MSC Class: 65D15; 65K10

  43. arXiv:2404.04693

    cs.CV cs.RO

    OmniColor: A Global Camera Pose Optimization Approach of LiDAR-360Camera Fusion for Colorizing Point Clouds

    Authors: Bonan Liu, Guoyang Zhao, Jianhao Jiao, Guang Cai, Chengyang Li, Handi Yin, Yuyang Wang, Ming Liu, Pan Hui

    Abstract: A colored point cloud, as a simple and efficient 3D representation, has many advantages in various fields, including robotic navigation and scene reconstruction. This representation is now commonly used in 3D reconstruction tasks relying on cameras and LiDARs. However, many existing frameworks fuse data from these two types of sensors poorly, leading to unsatisfactory mapping res…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 2024 IEEE International Conference on Robotics and Automation

  44. arXiv:2404.04579

    cs.HC

    TeleAware Robot: Designing Awareness-augmented Telepresence Robot for Remote Collaborative Locomotion

    Authors: Ruyi Li, Yaxin Zhu, Min Liu, Yihang Zeng, Shanning Zhuang, Jiayi Fu, Yi Lu, Guyue Zhou, Can Liu, Jiangtao Gong

    Abstract: Telepresence robots can be used to help users navigate an environment remotely and share the visiting experience with their social partners. Although such systems allow users to see and hear the remote environment and communicate with their partners via live video feed, they do not provide enough awareness of the environment and their remote partner's activities. In this paper, we introduc…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 33 pages, 12 figures

    ACM Class: H.5.2

    Journal ref: IMUWT 2024

  45. arXiv:2404.03702

    cs.LG cs.AI

    Personalized Federated Learning for Spatio-Temporal Forecasting: A Dual Semantic Alignment-Based Contrastive Approach

    Authors: Qingxiang Liu, Sheng Sun, Yuxuan Liang, Jingjing Xue, Min Liu

    Abstract: Existing federated learning (FL) methods for spatio-temporal forecasting fail to capture the inherent spatio-temporal heterogeneity, which calls for personalized FL (PFL) methods to model spatio-temporally variant patterns. While contrastive learning is promising for addressing spatio-temporal heterogeneity, existing methods are ineffective at determining negative pairs and ca…

    Submitted 3 April, 2024; originally announced April 2024.

  46. arXiv:2404.03683

    cs.LG cs.AI cs.CL

    Stream of Search (SoS): Learning to Search in Language

    Authors: Kanishk Gandhi, Denise Lee, Gabriel Grand, Muxin Liu, Winson Cheng, Archit Sharma, Noah D. Goodman

    Abstract: Language models are rarely shown fruitful mistakes while training. They then struggle to look beyond the next token, suffering from a snowballing of errors and struggling to predict the consequence of their actions several steps ahead. In this paper, we show how language models can be taught to search by representing the process of search in language, as a flattened string -- a stream of search (S…

    Submitted 1 April, 2024; originally announced April 2024.

  47. arXiv:2404.01587

    cs.CV

    TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation

    Authors: Yehui Shen, Mingmin Liu, Huimin Lu, Xieyuanli Chen

    Abstract: Visual place recognition (VPR) plays a pivotal role in autonomous exploration and navigation of mobile robots within complex outdoor environments. While cost-effective and easily deployed, camera sensors are sensitive to lighting and weather changes, and even slight image alterations can greatly affect VPR efficiency and precision. Existing methods overcome this by exploiting powerful yet large ne…

    Submitted 1 April, 2024; originally announced April 2024.

  48. arXiv:2404.01188

    cs.CV

    MonoBox: Tightness-free Box-supervised Polyp Segmentation using Monotonicity Constraint

    Authors: Qiang Hu, Zhenyu Yi, Ying Zhou, Ting Li, Fan Huang, Mei Liu, Qiang Li, Zhiwei Wang

    Abstract: We propose MonoBox, an innovative box-supervised segmentation method constrained by monotonicity to liberate its training from the user-unfriendly box-tightness assumption. In contrast to conventional box-supervised segmentation, where the box edges must precisely touch the target boundaries, MonoBox leverages imprecisely-annotated boxes to achieve robust pixel-wise segmentation. The 'linchpin' is…

    Submitted 2 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  49. arXiv:2404.01143

    cs.CV cs.AI

    Condition-Aware Neural Network for Controlled Image Generation

    Authors: Han Cai, Muyang Li, Zhuoyang Zhang, Qinsheng Zhang, Ming-Yu Liu, Song Han

    Abstract: We present Condition-Aware Neural Network (CAN), a new method for adding control to image generative models. Complementary to prior conditional control methods, CAN controls the image generation process by dynamically manipulating the weights of the neural network. This is achieved by introducing a condition-aware weight generation module that generates conditional weights for convolution/linear layer…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  50. arXiv:2403.18433

    cs.HC

    iFace: Hand-Over-Face Gesture Recognition Leveraging Impedance Sensing

    Authors: Mengxi Liu, Hymalai Bello, Bo Zhou, Paul Lukowicz, Jakob Karolus

    Abstract: Hand-over-face gestures can convey important implicit cues during conversations, such as frustration or excitement. However, in situations where interlocutors are not visible, such as phone calls or textual communication, the potential meaning contained in hand-over-face gestures is lost. In this work, we present iFace, an unobtrusive, wearable impedance-sensing solution for recognizi…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted by Augmented Humans 2024