Showing 1–50 of 61 results for author: Mu, T

  1. arXiv:2404.16779  [pdf, other]

    cs.LG cs.AI cs.RO

    DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks

    Authors: Tongzhou Mu, Minghua Liu, Hao Su

    Abstract: The success of many RL techniques heavily relies on human-engineered dense rewards, which typically demand substantial domain expertise and extensive trial and error. In our work, we propose DrS (Dense reward learning from Stages), a novel approach for learning reusable dense rewards for multi-stage tasks in a data-driven manner. By leveraging the stage structures of the task, DrS learns a high-qu…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: ICLR 2024. Explore videos, data, code, and more at https://sites.google.com/view/iclr24drs

  2. arXiv:2404.08966  [pdf, other]

    cs.CV

    LoopGaussian: Creating 3D Cinemagraph with Multi-view Images via Eulerian Motion Field

    Authors: Jiyang Li, Lechao Cheng, Zhangye Wang, Tingting Mu, Jingxuan He

    Abstract: Cinemagraph is a unique form of visual media that combines elements of still photography and subtle motion to create a captivating experience. However, the majority of videos generated by recent works lack depth information and are confined to the constraints of 2D image space. In this paper, inspired by significant progress in the field of novel view synthesis (NVS) achieved by 3D Gaussian Splatt…

    Submitted 16 April, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

    Comments: 10 pages

  3. arXiv:2404.07428  [pdf, other]

    cs.RO cs.LG

    AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent

    Authors: Tongzhou Mu, Yijie Guo, Jie Xu, Ankit Goyal, Hao Su, Dieter Fox, Animesh Garg

    Abstract: Encouraged by the remarkable achievements of language and vision foundation models, developing generalist robotic agents through imitation learning, using large demonstration datasets, has become a prominent area of interest in robot learning. The efficacy of imitation learning is heavily reliant on the quantity and quality of the demonstration datasets. In this study, we aim to scale up demonstra…

    Submitted 10 April, 2024; originally announced April 2024.

  4. arXiv:2403.18613  [pdf, ps, other]

    cs.LG

    Scalable Lipschitz Estimation for CNNs

    Authors: Yusuf Sulehman, Tingting Mu

    Abstract: Estimating the Lipschitz constant of deep neural networks is of growing interest as it is useful for informing on generalisability and adversarial robustness. Convolutional neural networks (CNNs) in particular, underpin much of the recent success in computer vision related applications. However, although existing methods for estimating the Lipschitz constant can be tight, they have limited scalabi…

    Submitted 27 March, 2024; originally announced March 2024.

  5. arXiv:2403.09143  [pdf, other]

    cs.GR

    A New Split Algorithm for 3D Gaussian Splatting

    Authors: Qiyuan Feng, Gengchen Cao, Haoxiang Chen, Tai-Jiang Mu, Ralph R. Martin, Shi-Min Hu

    Abstract: 3D Gaussian splatting models, as a novel explicit 3D representation, have been applied in many domains recently, such as explicit geometric editing and geometry generation. Progress has been rapid. However, due to their mixed scales and cluttered shapes, 3D Gaussian splatting models can produce a blurred or needle-like effect near the surface. At the same time, 3D Gaussian splatting models tend to…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 11 pages, 10 figures

  6. arXiv:2402.18975  [pdf, other]

    cs.CV cs.AI

    Theoretically Achieving Continuous Representation of Oriented Bounding Boxes

    Authors: Zi-Kai Xiao, Guo-Ye Yang, Xue Yang, Tai-Jiang Mu, Junchi Yan, Shi-min Hu

    Abstract: Considerable efforts have been devoted to Oriented Object Detection (OOD). However, one lasting issue regarding the discontinuity in Oriented Bounding Box (OBB) representation remains unresolved, which is an inherent bottleneck for extant OOD methods. This paper endeavors to completely solve this issue in a theoretically guaranteed manner and puts an end to the ad-hoc efforts in this direction. Pr…

    Submitted 16 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: 17 pages, 12 tables, 8 figures. Accepted by CVPR'24. Code: https://github.com/514flowey/JDet-COBB

  7. arXiv:2312.09743  [pdf, other]

    cs.CV cs.GR

    SLS4D: Sparse Latent Space for 4D Novel View Synthesis

    Authors: Qi-Yuan Feng, Hao-Xiang Chen, Qun-Ce Xu, Tai-Jiang Mu

    Abstract: Neural radiance field (NeRF) has achieved great success in novel view synthesis and 3D representation for static scenarios. Existing dynamic NeRFs usually exploit a locally dense grid to fit the deformation field; however, they fail to capture the global dynamics and concomitantly yield models of heavy parameters. We observe that the 4D space is inherently sparse. Firstly, the deformation field is…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: 10 pages, 6 figures

  8. arXiv:2312.09609  [pdf, other]

    cs.CV

    Semantic-Aware Transformation-Invariant RoI Align

    Authors: Guo-Ye Yang, George Kiyohiro Nakayama, Zi-Kai Xiao, Tai-Jiang Mu, Xiaolei Huang, Shi-Min Hu

    Abstract: Great progress has been made in learning-based object detection methods in the last decade. Two-stage detectors often have higher detection accuracy than one-stage detectors, due to the use of region of interest (RoI) feature extractors which extract transformation-invariant RoI features for different RoI proposals, making refinement of bounding boxes and prediction of object categories more robus…

    Submitted 15 December, 2023; originally announced December 2023.

  9. arXiv:2312.08916  [pdf, other]

    cs.CV

    Progressive Feature Self-reinforcement for Weakly Supervised Semantic Segmentation

    Authors: Jingxuan He, Lechao Cheng, Chaowei Fang, Zunlei Feng, Tingting Mu, Mingli Song

    Abstract: Compared to conventional semantic segmentation with pixel-level supervision, Weakly Supervised Semantic Segmentation (WSSS) with image-level labels poses the challenge that it always focuses on the most discriminative regions, resulting in a disparity between fully supervised conditions. A typical manifestation is the diminished precision on the object boundaries, leading to a deteriorated accurac…

    Submitted 17 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  10. arXiv:2311.00694  [pdf, other]

    cs.AI cs.CL

    Unleashing the Creative Mind: Language Model As Hierarchical Policy For Improved Exploration on Challenging Problem Solving

    Authors: Zhan Ling, Yunhao Fang, Xuanlin Li, Tongzhou Mu, Mingu Lee, Reza Pourreza, Roland Memisevic, Hao Su

    Abstract: Large Language Models (LLMs) have achieved tremendous progress, yet they still often struggle with challenging reasoning problems. Current approaches address this challenge by sampling or searching detailed and low-level reasoning chains. However, these methods are still limited in their exploration capabilities, making it challenging for correct solutions to stand out in the huge solution space.…

    Submitted 5 December, 2023; v1 submitted 1 November, 2023; originally announced November 2023.

  11. arXiv:2310.18477  [pdf, other]

    cs.LG

    Understanding and Improving Ensemble Adversarial Defense

    Authors: Yian Deng, Tingting Mu

    Abstract: The strategy of ensemble has become popular in adversarial defense, which trains multiple base classifiers to defend against adversarial attacks in a cooperative manner. Despite the empirical success, theoretical explanations on why an ensemble of adversarially trained classifiers is more robust than single ones remain unclear. To fill in this gap, we develop a new error theory dedicated to unders…

    Submitted 2 November, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

  12. arXiv:2309.13985  [pdf, other]

    cs.LG cs.NE

    Physics-Driven ML-Based Modelling for Correcting Inverse Estimation

    Authors: Ruiyuan Kang, Tingting Mu, Panos Liatsis, Dimitrios C. Kyritsis

    Abstract: When deploying machine learning estimators in science and engineering (SAE) domains, it is critical to avoid failed estimations that can have disastrous consequences, e.g., in aero engine design. This work focuses on detecting and correcting failed state estimations before adopting them in SAE inverse problems, by utilizing simulations and performance metrics guided by physical laws. We suggest to…

    Submitted 29 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: 19 pages; accepted by NeurIPS 2023 as a spotlight

    MSC Class: 78M50; 68T05

  13. arXiv:2307.01468  [pdf, other]

    cs.CV

    Generating Animatable 3D Cartoon Faces from Single Portraits

    Authors: Chuanyu Pan, Guowei Yang, Taijiang Mu, Yu-Kun Lai

    Abstract: With the booming of virtual reality (VR) technology, there is a growing need for customized 3D avatars. However, traditional methods for 3D avatar modeling are either time-consuming or fail to retain similarity to the person being modeled. We present a novel framework to generate animatable 3D cartoon faces from a single portrait image. We first transfer an input real-world portrait to a stylized…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: 12 pages, accepted by CGI 2023 and the journal Virtual Reality Intelligent Hardware (VRIH)

  14. arXiv:2306.08400  [pdf, other]

    cs.CL cs.AI cs.LG

    Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning

    Authors: Evan Zheran Liu, Sahaana Suri, Tong Mu, Allan Zhou, Chelsea Finn

    Abstract: Whereas machine learning models typically learn language by directly training on language tasks (e.g., next-word prediction), language emerges in human children as a byproduct of solving non-language tasks (e.g., acquiring food). Motivated by this observation, we ask: can embodied reinforcement learning (RL) agents also indirectly learn language from non-language tasks? Learning to associate langu…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: International Conference on Machine Learning (ICML), 2023

  15. arXiv:2304.11665  [pdf, ps, other]

    cs.LG

    Accelerated Doubly Stochastic Gradient Algorithm for Large-scale Empirical Risk Minimization

    Authors: Zebang Shen, Hui Qian, Tongzhou Mu, Chao Zhang

    Abstract: Nowadays, algorithms with fast convergence, small memory footprints, and low per-iteration complexity are particularly favorable for artificial intelligence applications. In this paper, we propose a doubly stochastic algorithm with a novel accelerating multi-momentum technique to solve large scale empirical risk minimization problem for learning tasks. While enjoying a provably superior convergenc…

    Submitted 23 April, 2023; originally announced April 2023.

    Comments: Accepted to IJCAI 2017. Corresponding author: Hui Qian

  16. arXiv:2304.03917  [pdf]

    cs.CV

    MC-MLP: Multiple Coordinate Frames in all-MLP Architecture for Vision

    Authors: Zhimin Zhu, Jianguo Zhao, Tong Mu, Yuliang Yang, Mengyu Zhu

    Abstract: In deep learning, Multi-Layer Perceptrons (MLPs) have once again garnered attention from researchers. This paper introduces MC-MLP, a general MLP-like backbone for computer vision that is composed of a series of fully-connected (FC) layers. In MC-MLP, we propose that the same semantic information has varying levels of difficulty in learning, depending on the coordinate frame of features. To addres…

    Submitted 8 April, 2023; originally announced April 2023.

  17. arXiv:2303.13489  [pdf, other]

    cs.LG cs.AI cs.RO

    Boosting Reinforcement Learning and Planning with Demonstrations: A Survey

    Authors: Tongzhou Mu, Hao Su

    Abstract: Although reinforcement learning has seen tremendous success recently, this kind of trial-and-error learning can be impractical or inefficient in complex environments. The use of demonstrations, on the other hand, enables agents to benefit from expert knowledge rather than having to discover the best action to take through exploration. In this survey, we discuss the advantages of using demonstratio…

    Submitted 27 March, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

  18. arXiv:2303.08774  [pdf, other]

    cs.CL cs.AI

    GPT-4 Technical Report

    Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, et al. (256 additional authors not shown)

    Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo…

    Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 100 pages; updated authors list; fixed author names and added citation

  19. arXiv:2303.08698  [pdf, other]

    cs.CV cs.AI

    Bi-directional Distribution Alignment for Transductive Zero-Shot Learning

    Authors: Zhicai Wang, Yanbin Hao, Tingting Mu, Ouxiang Li, Shuo Wang, Xiangnan He

    Abstract: It is well-known that zero-shot learning (ZSL) can suffer severely from the problem of domain shift, where the true and learned data distributions for the unseen classes do not match. Although transductive ZSL (TZSL) attempts to improve this by allowing the use of unlabelled examples from the unseen classes, there is still a high level of distribution shift. We propose a novel TZSL model (named as…

    Submitted 19 March, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: CVPR2023

  20. arXiv:2302.13020  [pdf, other]

    cs.LG

    DCLP: Neural Architecture Predictor with Curriculum Contrastive Learning

    Authors: Shenghe Zheng, Hongzhi Wang, Tianyu Mu

    Abstract: Neural predictors have shown great potential in the evaluation process of neural architecture search (NAS). However, current predictor-based approaches overlook the fact that training a predictor necessitates a considerable number of trained neural networks as the labeled training set, which is costly to obtain. Therefore, the critical issue in utilizing predictors for NAS is to train a high-perfo…

    Submitted 14 December, 2023; v1 submitted 25 February, 2023; originally announced February 2023.

    Comments: Accepted by AAAI24

  21. arXiv:2302.11076  [pdf, other]

    cs.LG math.OC

    Faster Riemannian Newton-type Optimization by Subsampling and Cubic Regularization

    Authors: Yian Deng, Tingting Mu

    Abstract: This work is on constrained large-scale non-convex optimization where the constraint set implies a manifold structure. Solving such problems is important in a multitude of fundamental machine learning tasks. Recent advances on Riemannian optimization have enabled the convenient recovery of solutions by adapting unconstrained optimization algorithms over manifolds. However, it remains challenging t…

    Submitted 21 February, 2023; originally announced February 2023.

  22. arXiv:2302.04659  [pdf, other]

    cs.RO cs.AI

    ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills

    Authors: Jiayuan Gu, Fanbo Xiang, Xuanlin Li, Zhan Ling, Xiqiang Liu, Tongzhou Mu, Yihe Tang, Stone Tao, Xinyue Wei, Yunchao Yao, Xiaodi Yuan, Pengwei Xie, Zhiao Huang, Rui Chen, Hao Su

    Abstract: Generalizable manipulation skills, which can be composed to tackle long-horizon and complex daily chores, are one of the cornerstones of Embodied AI. However, existing benchmarks, mostly composed of a suite of simulatable environments, are insufficient to push cutting-edge research works because they lack object-level topological and geometric variations, are not based on fully dynamic simulation,…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: Published as a conference paper at ICLR 2023. Project website: https://maniskill2.github.io/

  23. arXiv:2301.06962  [pdf, other]

    cs.CV

    Long Range Pooling for 3D Large-Scale Scene Understanding

    Authors: Xiang-Li Li, Meng-Hao Guo, Tai-Jiang Mu, Ralph R. Martin, Shi-Min Hu

    Abstract: Inspired by the success of recent vision transformers and large kernel design in convolutional neural networks (CNNs), in this paper, we analyze and explore essential reasons for their success. We claim two factors that are critical for 3D large-scale scene understanding: a larger receptive field and operations with greater non-linearity. The former is responsible for providing long range contexts…

    Submitted 17 January, 2023; originally announced January 2023.

  24. arXiv:2301.03962  [pdf, other]

    cs.LG cs.AI stat.ML

    A Unified Theory of Diversity in Ensemble Learning

    Authors: Danny Wood, Tingting Mu, Andrew Webb, Henry Reeve, Mikel Luján, Gavin Brown

    Abstract: We present a theory of ensemble diversity, explaining the nature of diversity for a wide range of supervised learning scenarios. This challenge has been referred to as the holy grail of ensemble learning, an open research issue for over 30 years. Our framework reveals that diversity is in fact a hidden dimension in the bias-variance decomposition of the ensemble loss. We prove a family of exact bi…

    Submitted 7 February, 2024; v1 submitted 10 January, 2023; originally announced January 2023.

    Journal ref: Journal of Machine Learning Research, 24(359), 2023

  25. arXiv:2212.05749  [pdf, other]

    cs.LG cs.CV cs.RO

    On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline

    Authors: Nicklas Hansen, Zhecheng Yuan, Yanjie Ze, Tongzhou Mu, Aravind Rajeswaran, Hao Su, Huazhe Xu, Xiaolong Wang

    Abstract: In this paper, we examine the effectiveness of pre-training for visuo-motor control tasks. We revisit a simple Learning-from-Scratch (LfS) baseline that incorporates data augmentation and a shallow ConvNet, and find that this baseline is surprisingly competitive with recent approaches (PVR, MVP, R3M) that leverage frozen visual representations trained on large-scale vision datasets -- across a var…

    Submitted 15 June, 2023; v1 submitted 12 December, 2022; originally announced December 2022.

    Comments: Code: https://github.com/gemcollector/learning-from-scratch

  26. arXiv:2210.07658  [pdf, other]

    cs.LG cs.RO

    Abstract-to-Executable Trajectory Translation for One-Shot Task Generalization

    Authors: Stone Tao, Xiaochen Li, Tongzhou Mu, Zhiao Huang, Yuzhe Qin, Hao Su

    Abstract: Training long-horizon robotic policies in complex physical environments is essential for many applications, such as robotic manipulation. However, learning a policy that can generalize to unseen tasks is challenging. In this work, we propose to achieve one-shot task generalization by decoupling plan generation and plan execution. Specifically, our method solves complex long-horizon tasks in three…

    Submitted 30 May, 2023; v1 submitted 14 October, 2022; originally announced October 2022.

    Comments: ICML 2023. Code and visualizations: https://trajectorytranslation.github.io/

  27. arXiv:2210.02631  [pdf, other]

    cs.LG stat.ML

    Data-driven Approaches to Surrogate Machine Learning Model Development

    Authors: H. Rhys Jones, Tingting Mu, Andrei C. Popescu, Yusuf Sulehman

    Abstract: We demonstrate the adaption of three established methods to the field of surrogate machine learning model development. These methods are data augmentation, custom loss functions and transfer learning. Each of these methods have seen widespread use in the field of machine learning, however, here we apply them specifically to surrogate machine learning model development. The machine learning model t…

    Submitted 3 November, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: 16 pages, 13 figures

  28. arXiv:2209.15153  [pdf, other]

    cs.CV

    MonoNeuralFusion: Online Monocular Neural 3D Reconstruction with Geometric Priors

    Authors: Zi-Xin Zou, Shi-Sheng Huang, Yan-Pei Cao, Tai-Jiang Mu, Ying Shan, Hongbo Fu

    Abstract: High-fidelity 3D scene reconstruction from monocular videos continues to be challenging, especially for complete and fine-grained geometry reconstruction. The previous 3D reconstruction approaches with neural implicit representations have shown a promising ability for complete scene reconstruction, while their results are often over-smooth and lack enough geometric details. This paper introduces a…

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: 12 pages, 12 figures

  29. Parameterization of Cross-Token Relations with Relative Positional Encoding for Vision MLP

    Authors: Zhicai Wang, Yanbin Hao, Xingyu Gao, Hao Zhang, Shuo Wang, Tingting Mu, Xiangnan He

    Abstract: Vision multi-layer perceptrons (MLPs) have shown promising performance in computer vision tasks, and become the main competitor of CNNs and vision Transformers. They use token-mixing layers to capture cross-token interactions, as opposed to the multi-head self-attention mechanism used by Transformers. However, the heavily parameterized token-mixing layers naturally lack mechanisms to capture local…

    Submitted 12 September, 2022; v1 submitted 15 July, 2022; originally announced July 2022.

  30. arXiv:2204.12155  [pdf, other]

    stat.ML cs.LG

    Bias-Variance Decompositions for Margin Losses

    Authors: Danny Wood, Tingting Mu, Gavin Brown

    Abstract: We introduce a novel bias-variance decomposition for a range of strictly convex margin losses, including the logistic loss (minimized by the classic LogitBoost algorithm), as well as the squared margin loss and canonical boosting loss. Furthermore, we show that, for all strictly convex margin losses, the expected risk decomposes into the risk of a "central" model and a term quantifying variation i…

    Submitted 26 April, 2022; originally announced April 2022.

    Comments: Supplementary material included

    Journal ref: 25th International Conference on Artificial Intelligence and Statistics, 2022

  31. arXiv:2204.11700  [pdf, other]

    cs.CV

    ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching

    Authors: Yan Shi, Jun-Xiong Cai, Yoli Shavit, Tai-Jiang Mu, Wensen Feng, Kai Zhang

    Abstract: Graph Neural Networks (GNNs) with attention have been successfully applied for learning visual feature matching. However, current methods learn with complete graphs, resulting in a quadratic complexity in the number of features. Motivated by a prior observation that self- and cross- attention matrices converge to a sparse representation, we propose ClusterGNN, an attentional GNN architecture which…

    Submitted 17 March, 2023; v1 submitted 25 April, 2022; originally announced April 2022.

    Comments: Accepted by the IEEE Conference on Computer Vision and Pattern Recognition 2022 (some typos corrected)

  32. arXiv:2202.01691  [pdf, other]

    cs.MA cs.AI

    Solving Dynamic Principal-Agent Problems with a Rationally Inattentive Principal

    Authors: Tong Mu, Stephan Zheng, Alexander Trott

    Abstract: Principal-Agent (PA) problems describe a broad class of economic relationships characterized by misaligned incentives and asymmetric information. The Principal's problem is to find optimal incentives given the available information, e.g., a manager setting optimal wages for its employees. Whereas the Principal is often assumed rational, comparatively little is known about solutions when the Princi…

    Submitted 17 February, 2022; v1 submitted 18 January, 2022; originally announced February 2022.

    Comments: 22 pages, 8 figures, including appendix

  33. arXiv:2201.11924  [pdf, other]

    cs.RO

    Close the Optical Sensing Domain Gap by Physics-Grounded Active Stereo Sensor Simulation

    Authors: Xiaoshuai Zhang, Rui Chen, Ang Li, Fanbo Xiang, Yuzhe Qin, Jiayuan Gu, Zhan Ling, Minghua Liu, Peiyu Zeng, Songfang Han, Zhiao Huang, Tongzhou Mu, Jing Xu, Hao Su

    Abstract: In this paper, we focus on the simulation of active stereovision depth sensors, which are popular in both academic and industry communities. Inspired by the underlying mechanism of the sensors, we designed a fully physics-grounded simulation pipeline that includes material acquisition, ray-tracing-based infrared (IR) image rendering, IR noise simulation, and depth estimation. The pipeline is able…

    Submitted 5 January, 2023; v1 submitted 27 January, 2022; originally announced January 2022.

    Comments: The paper will appear in the IEEE Transactions on Robotics. 20 pages, 14 figures, 10 tables

  34. arXiv:2201.08520  [pdf, other]

    cs.LG cs.AI cs.CL

    Learning Two-Step Hybrid Policy for Graph-Based Interpretable Reinforcement Learning

    Authors: Tongzhou Mu, Kaixiang Lin, Feiyang Niu, Govind Thattai

    Abstract: We present a two-step hybrid reinforcement learning (RL) policy that is designed to generate interpretable and robust hierarchical policies on the RL problem with graph-based input. Unlike prior deep reinforcement learning policies parameterized by an end-to-end black-box graph neural network, our approach disentangles the decision-making process into two steps. The first step is a simplified clas…

    Submitted 19 October, 2022; v1 submitted 20 January, 2022; originally announced January 2022.

    Comments: Transactions on Machine Learning Research (TMLR)

  35. arXiv:2112.15221  [pdf, other]

    cs.AI

    Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning

    Authors: Tong Mu, Georgios Theocharous, David Arbour, Emma Brunskill

    Abstract: Online reinforcement learning (RL) algorithms are often difficult to deploy in complex human-facing applications as they may learn slowly and have poor early performance. To address this, we introduce a practical algorithm for incorporating human insight to speed learning. Our algorithm, Constraint Sampling Reinforcement Learning (CSRL), incorporates prior domain knowledge as constraints/restricti…

    Submitted 30 December, 2021; originally announced December 2021.

    Journal ref: AAAI2022

  36. arXiv:2111.12905  [pdf, other]

    cs.CV cs.AI cs.GR

    CIRCLE: Convolutional Implicit Reconstruction and Completion for Large-scale Indoor Scene

    Authors: Haoxiang Chen, Jiahui Huang, Tai-Jiang Mu, Shi-Min Hu

    Abstract: We present CIRCLE, a framework for large-scale scene completion and geometric refinement based on local implicit signed distance functions. It is based on an end-to-end sparse convolutional network, CircNet, that jointly models local geometric details and global scene structural contexts, allowing it to preserve fine-grained object detail while recovering missing regions commonly arising in tradit…

    Submitted 24 November, 2021; originally announced November 2021.

  37. Attention Mechanisms in Computer Vision: A Survey

    Authors: Meng-Hao Guo, Tian-Xing Xu, Jiang-Jiang Liu, Zheng-Ning Liu, Peng-Tao Jiang, Tai-Jiang Mu, Song-Hai Zhang, Ralph R. Martin, Ming-Ming Cheng, Shi-Min Hu

    Abstract: Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great succes…

    Submitted 15 November, 2021; originally announced November 2021.

    Comments: 27 pages, 9 figures

    Journal ref: Computational Visual Media, 2022, Vol. 8, No. 3, 331-368

  38. arXiv:2109.09063  [pdf, other]

    cs.CV cs.AI cs.LG

    Ontology-based n-ball Concept Embeddings Informing Few-shot Image Classification

    Authors: Mirantha Jayathilaka, Tingting Mu, Uli Sattler

    Abstract: We propose a novel framework named ViOCE that integrates ontology-based background knowledge in the form of $n$-ball concept embeddings into a neural network based vision architecture. The approach consists of two components - converting symbolic knowledge of an ontology into continuous space by learning n-ball embeddings that capture properties of subsumption and disjointness, and guiding the tra…

    Submitted 19 September, 2021; originally announced September 2021.

    Journal ref: The Combination of Symbolic and Sub-symbolic Methods and their Applications (CSSA @ ECML PKDD 2021)

  39. arXiv:2107.14483  [pdf, other]

    cs.LG cs.AI cs.CV cs.RO

    ManiSkill: Generalizable Manipulation Skill Benchmark with Large-Scale Demonstrations

    Authors: Tongzhou Mu, Zhan Ling, Fanbo Xiang, Derek Yang, Xuanlin Li, Stone Tao, Zhiao Huang, Zhiwei Jia, Hao Su

    Abstract: Object manipulation from 3D visual inputs poses many challenges on building generalizable perception and policy models. However, 3D assets in existing benchmarks mostly lack the diversity of 3D shapes that align with real-world intra-class complexity in topology and geometry. Here we propose SAPIEN Manipulation Skill Benchmark (ManiSkill) to benchmark manipulation skills over diverse objects in a…

    Submitted 4 November, 2021; v1 submitted 30 July, 2021; originally announced July 2021.

    Comments: NeurIPS 2021 Track on Datasets and Benchmarks; code: https://github.com/haosulab/ManiSkill

  40. arXiv:2106.02285  [pdf, other]

    cs.CV cs.GR cs.LG

    Subdivision-Based Mesh Convolution Networks

    Authors: Shi-Min Hu, Zheng-Ning Liu, Meng-Hao Guo, Jun-Xiong Cai, Jiahui Huang, Tai-Jiang Mu, Ralph R. Martin

    Abstract: Convolutional neural networks (CNNs) have made great breakthroughs in 2D computer vision. However, their irregular structure makes it hard to harness the potential of CNNs directly on meshes. A subdivision surface provides a hierarchical multi-resolution structure, in which each face in a closed 2-manifold triangle mesh is exactly adjacent to three faces. Motivated by these two observations, this…

    Submitted 29 December, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: Code is available at https://github.com/lzhengning/SubdivNet

    ACM Class: I.3.5

    Journal ref: ACM Transactions on Graphics, Volume 41, Issue 3, 2022, Article No.: 25, pp 1-16

  41. arXiv:2105.15078  [pdf, other]

    cs.CV cs.LG

    Can Attention Enable MLPs To Catch Up With CNNs?

    Authors: Meng-Hao Guo, Zheng-Ning Liu, Tai-Jiang Mu, Dun Liang, Ralph R. Martin, Shi-Min Hu

    Abstract: In the first week of May, 2021, researchers from four different institutions: Google, Tsinghua University, Oxford University and Facebook, shared their latest work [16, 7, 12, 17] on arXiv.org almost at the same time, each proposing new learning architectures, consisting mainly of linear layers, claiming them to be comparable, or even superior to convolutional-based models. This sparked immediate…

    Submitted 31 May, 2021; originally announced May 2021.

    Comments: Computational Visual Media, 2021, accepted. 4 pages, 1 figure

  42. arXiv:2105.09103  [pdf, other]

    cs.CV

    Recursive-NeRF: An Efficient and Dynamically Growing NeRF

    Authors: Guo-Wei Yang, Wen-Yang Zhou, Hao-Yang Peng, Dun Liang, Tai-Jiang Mu, Shi-Min Hu

    Abstract: View synthesis methods using implicit continuous shape representations learned from a set of images, such as the Neural Radiance Field (NeRF) method, have gained increasing attention due to their high quality imagery and scalability to high resolution. However, the heavy computation required by its volumetric approach prevents NeRF from being useful in practice; minutes are taken to render a singl… ▽ More

    Submitted 19 May, 2021; originally announced May 2021.

    Comments: 11 pages, 12 figures

  43. arXiv:2105.02358  [pdf, other]

    cs.CV

    Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks

    Authors: Meng-Hao Guo, Zheng-Ning Liu, Tai-Jiang Mu, Shi-Min Hu

    Abstract: Attention mechanisms, especially self-attention, have played an increasingly important role in deep feature representation for visual tasks. Self-attention updates the feature at each position by computing a weighted sum of features using pair-wise affinities across all positions to capture the long-range dependency within a single sample. However, self-attention has quadratic complexity and ignor… ▽ More

    Submitted 31 May, 2021; v1 submitted 5 May, 2021; originally announced May 2021.

    Comments: 11 pages, 6 figures. external attention and EAMLP

  44. PCT: Point cloud transformer

    Authors: Meng-Hao Guo, Jun-Xiong Cai, Zheng-Ning Liu, Tai-Jiang Mu, Ralph R. Martin, Shi-Min Hu

    Abstract: The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. PCT is based on Transformer, which achieves huge success in natural language processing and displays great potential in image processing. It is inherently permutation invariant for… ▽ More

    Submitted 6 June, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

    Comments: 11 pages, 5 figures

    Journal ref: Computational Visual Media, 2021, Vol. 7, No. 2, Pages: 187 - 199

  45. arXiv:2011.00971  [pdf, other]

    cs.CV cs.AI cs.LG

    Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals

    Authors: Tongzhou Mu, Jiayuan Gu, Zhiwei Jia, Hao Tang, Hao Su

    Abstract: We study how to learn a policy with compositional generalizability. We propose a two-stage framework, which refactorizes a high-reward teacher policy into a generalizable student policy with strong inductive bias. Particularly, we implement an object-centric GNN-based student policy, whose input objects are learned from images through self-supervised learning. Empirically, we evaluate our approach… ▽ More

    Submitted 26 October, 2020; originally announced November 2020.

    Comments: 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada

  46. arXiv:2009.10026  [pdf, other]

    cs.CV cs.CL cs.LG

    Visual-Semantic Embedding Model Informed by Structured Knowledge

    Authors: Mirantha Jayathilaka, Tingting Mu, Uli Sattler

    Abstract: We propose a novel approach to improve a visual-semantic embedding model by incorporating concept representations captured from an external structured knowledge base. We investigate its performance on image classification under both standard and zero-shot settings. We propose two novel evaluation frameworks to analyse classification errors with respect to the class hierarchy indicated by the knowl… ▽ More

    Submitted 21 September, 2020; originally announced September 2020.

    Comments: European Starting AI Researchers' Symposium 2020 (STAIRS 2020)

  47. arXiv:2009.01039  [pdf, other]

    cs.CV

    Zero-Shot Human-Object Interaction Recognition via Affordance Graphs

    Authors: Alessio Sarullo, Tingting Mu

    Abstract: We propose a new approach for Zero-Shot Human-Object Interaction Recognition in the challenging setting that involves interactions with unseen actions (as opposed to just unseen combinations of seen actions and objects). Our approach makes use of knowledge external to the image content in the form of a graph that models affordance relations between actions and objects, i.e., whether an action can… ▽ More

    Submitted 2 September, 2020; originally announced September 2020.

  48. arXiv:2007.03254  [pdf, ps, other]

    cs.LG stat.ML

    Auto-CASH: Autonomous Classification Algorithm Selection with Deep Q-Network

    Authors: Tianyu Mu, Hongzhi Wang, Chunnan Wang, Zheng Liang

    Abstract: The great amount of datasets generated by various data sources have posed the challenge to machine learning algorithm selection and hyperparameter configuration. For a specific machine learning task, it usually takes domain experts plenty of time to select an appropriate algorithm and configure its hyperparameters. If the problem of algorithm selection and hyperparameter optimization can be solved… ▽ More

    Submitted 7 July, 2020; originally announced July 2020.

  49. arXiv:2006.07818  [pdf, other]

    cs.CV cs.GR

    Alternating ConvLSTM: Learning Force Propagation with Alternate State Updates

    Authors: Congyue Deng, Tai-Jiang Mu, Shi-Min Hu

    Abstract: Data-driven simulation is an important step-forward in computational physics when traditional numerical methods meet their limits. Learning-based simulators have been widely studied in past years; however, most previous works view simulation as a general spatial-temporal prediction problem and take little physical guidance in designing their neural network architectures. In this paper, we introduc… ▽ More

    Submitted 14 June, 2020; originally announced June 2020.

  50. VGPN: Voice-Guided Pointing Robot Navigation for Humans

    Authors: Jun Hu, Zhongyu Jiang, Xionghao Ding, Peter Hall, Taijiang Mu

    Abstract: Pointing gestures are widely used in robot navigation approaches nowadays. However, most approaches only use pointing gestures, and these have two major limitations. Firstly, they need to recognize pointing gestures all the time, which leads to long processing time and significant system overheads. Secondly, the user's pointing direction may not be very accurate, so the robot may go to an undesired… ▽ More

    Submitted 3 April, 2020; originally announced April 2020.