Search | arXiv e-print repository

Learning Visuotactile Skills with Two Multifingered Hands

Authors: Toru Lin, Yu Zhang, Qiyang Li, Haozhi Qi, Brent Yi, Sergey Levine, Jitendra Malik

Abstract: Aiming to replicate human-like dexterity, perceptual experiences, and motion patterns, we explore learning from human demonstrations using a bimanual system with multifingered hands and visuotactile data. Two significant challenges exist: the lack of an affordable and accessible teleoperation system suitable for a dual-arm setup with multifingered hands, and the scarcity of multifingered hand hardware equipped with touch sensing. To tackle the first challenge, we develop HATO, a low-cost hands-arms teleoperation system that leverages off-the-shelf electronics, complemented with a software suite that enables efficient data collection; the comprehensive software suite also supports multimodal data processing, scalable policy learning, and smooth policy deployment. To tackle the latter challenge, we introduce a novel hardware adaptation by repurposing two prosthetic hands equipped with touch sensors for research. Using visuotactile data collected from our system, we learn skills to complete long-horizon, high-precision tasks which are difficult to achieve without multifingered dexterity and touch feedback. Furthermore, we empirically investigate the effects of dataset size, sensing modality, and visual input preprocessing on policy learning. Our results mark a promising step forward in bimanual multifingered manipulation from visuotactile data. Videos, code, and datasets can be found at https://toruowo.github.io/hato/.

Submitted 25 April, 2024; originally announced April 2024.

Comments: Code and Project Website: https://toruowo.github.io/hato/

arXiv:2403.11335 [pdf, other]

ConvSDG: Session Data Generation for Conversational Search

Authors: Fengran Mo, Bole Yi, Kelong Mao, Chen Qu, Kaiyu Huang, Jian-Yun Nie

Abstract: Conversational search provides a more convenient interface for users to search by allowing multi-turn interaction with the search engine. However, the effectiveness of the conversational dense retrieval methods is limited by the scarcity of training data required for their fine-tuning. Thus, generating more training conversational sessions with relevant labels could potentially improve search performance. Based on the promising capabilities of large language models (LLMs) on text generation, we propose ConvSDG, a simple yet effective framework to explore the feasibility of boosting conversational search by using LLM for session data generation. Within this framework, we design dialogue/session-level and query-level data generation with unsupervised and semi-supervised learning, according to the availability of relevance judgments. The generated data are used to fine-tune the conversational dense retriever. Extensive experiments on four widely used datasets demonstrate the effectiveness and broad applicability of our ConvSDG framework compared with several strong baselines.

Submitted 17 March, 2024; originally announced March 2024.

Comments: Accepted by WWW 2024 Workshop

arXiv:2402.03046 [pdf, other]

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

Authors: Shengyi Huang, Quentin Gallouédec, Florian Felten, Antonin Raffin, Rousslan Fernand Julien Dossa, Yanxiao Zhao, Ryan Sullivan, Viktor Makoviychuk, Denys Makoviichuk, Mohamad H. Danesh, Cyril Roumégous, Jiayi Weng, Chufan Chen, Md Masudur Rahman, João G. M. Araújo, Guorui Quan, Daniel Tan, Timo Klein, Rujikorn Charakorn, Mark Towers, Yann Berthelot, Kinal Mehta, Dipam Chakraborty, Arjun KG, Valentin Charraut, et al. (8 additional authors not shown)

Abstract: In many Reinforcement Learning (RL) papers, learning curves are useful indicators to measure the effectiveness of RL algorithms. However, the complete raw data of the learning curves are rarely available. As a result, it is usually necessary to reproduce the experiments from scratch, which can be time-consuming and error-prone. We present Open RL Benchmark, a set of fully tracked RL experiments, including not only the usual data such as episodic return, but also all algorithm-specific and system metrics. Open RL Benchmark is community-driven: anyone can download, use, and contribute to the data. At the time of writing, more than 25,000 runs have been tracked, for a cumulative duration of more than 8 years. Open RL Benchmark covers a wide range of RL libraries and reference implementations. Special care is taken to ensure that each experiment is precisely reproducible by providing not only the full parameters, but also the versions of the dependencies used to generate it. In addition, Open RL Benchmark comes with a command-line interface (CLI) for easy fetching and generating figures to present the results. In this document, we include two case studies to demonstrate the usefulness of Open RL Benchmark in practice. To the best of our knowledge, Open RL Benchmark is the first RL benchmark of its kind, and the authors hope that it will improve and facilitate the work of researchers in the field.

Submitted 5 February, 2024; originally announced February 2024.

Comments: Under review

arXiv:2309.09979 [pdf, other]

General In-Hand Object Rotation with Vision and Touch

Authors: Haozhi Qi, Brent Yi, Sudharshan Suresh, Mike Lambeta, Yi Ma, Roberto Calandra, Jitendra Malik

Abstract: We introduce RotateIt, a system that enables fingertip-based object rotation along multiple axes by leveraging multimodal sensory inputs. Our system is trained in simulation, where it has access to ground-truth object shapes and physical properties. Then we distill it to operate on realistic yet noisy simulated visuotactile and proprioceptive sensory inputs. These multimodal inputs are fused via a visuotactile transformer, enabling online inference of object shapes and physical properties during deployment. We show significant performance improvements over prior methods and the importance of visual and tactile sensing.

Submitted 28 September, 2023; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: CoRL 2023; Website: https://haozhi.io/rotateit/

arXiv:2309.04154 [pdf, other]

A novel model for layer jamming-based continuum robots

Authors: Bowen Yi, Yeman Fan, Dikai Liu

Abstract: Continuum robots with variable stiffness have gained wide popularity in the last decade. Layer jamming (LJ) has emerged as a simple and efficient technique to achieve tunable stiffness for continuum robots. Despite its merits, the development of a control-oriented dynamical model tailored for this specific class of robots remains an open problem in the literature. This paper aims to present the first solution, to the best of our knowledge, to close the gap. We propose an energy-based model that is integrated with the LuGre frictional model for LJ-based continuum robots. Then, we conduct a comprehensive theoretical analysis of this model, focusing on two fundamental characteristics of LJ-based continuum robots: shape locking and adjustable stiffness. To validate the modeling approach and theoretical results, a series of experiments using our OctRobot-I continuum robotic platform was conducted. The results show that the proposed model is capable of interpreting and predicting the dynamical behaviors in LJ-based continuum robots.

Submitted 11 September, 2023; v1 submitted 8 September, 2023; originally announced September 2023.

arXiv:2308.15461 [pdf, other]

Canonical Factors for Hybrid Neural Fields

Authors: Brent Yi, Weijia Zeng, Sam Buchanan, Yi Ma

Abstract: Factored feature volumes offer a simple way to build more compact, efficient, and interpretable neural fields, but also introduce biases that are not necessarily beneficial for real-world data. In this work, we (1) characterize the undesirable biases that these architectures have for axis-aligned signals -- they can lead to radiance field reconstruction differences of as high as 2 PSNR -- and (2) explore how learning a set of canonicalizing transformations can improve representations by removing these biases. We prove in a two-dimensional model problem that simultaneously learning these transformations together with scene appearance succeeds with drastically improved efficiency. We validate the resulting architectures, which we call TILTED, using image, signed distance, and radiance field reconstruction tasks, where we observe improvements across quality, robustness, compactness, and runtime. Results demonstrate that TILTED can enable capabilities comparable to baselines that are 2x larger, while highlighting weaknesses of neural field evaluation procedures.

Submitted 29 August, 2023; originally announced August 2023.

Comments: ICCV 2023. Project webpage: https://brentyi.github.io/tilted/

arXiv:2306.03865 [pdf, other]

Simultaneous Position-and-Stiffness Control of Underactuated Antagonistic Tendon-Driven Continuum Robots

Authors: Bowen Yi, Yeman Fan, Dikai Liu, Jose Guadalupe Romero

Abstract: Continuum robots have gained widespread popularity due to their inherent compliance and flexibility, particularly their adjustable levels of stiffness for various application scenarios. Despite efforts in dynamic modeling and control synthesis over the past decade, few studies have incorporated stiffness regulation into their feedback control design; however, this is one of the initial motivations to develop continuum robots. This paper addresses the crucial challenge of controlling both the position and stiffness of underactuated continuum robots actuated by antagonistic tendons. We begin by presenting a rigid-link dynamical model that can analyze the open-loop stiffening of tendon-driven continuum robots. Based on this model, we propose a novel passivity-based position-and-stiffness controller that adheres to the non-negative tension constraint. Comprehensive experiments on our continuum robot validate the theoretical results and demonstrate the efficacy and precision of this approach.

Submitted 13 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

arXiv:2302.04264 [pdf, other]

doi 10.1145/3588432.3591516

Nerfstudio: A Modular Framework for Neural Radiance Field Development

Authors: Matthew Tancik, Ethan Weber, Evonne Ng, Ruilong Li, Brent Yi, Justin Kerr, Terrance Wang, Alexander Kristoffersen, Jake Austin, Kamyar Salahi, Abhik Ahuja, David McAllister, Angjoo Kanazawa

Abstract: Neural Radiance Fields (NeRF) are a rapidly growing area of research with wide-ranging applications in computer vision, graphics, robotics, and more. In order to streamline the development and deployment of NeRF research, we propose a modular PyTorch framework, Nerfstudio. Our framework includes plug-and-play components for implementing NeRF-based methods, which make it easy for researchers and practitioners to incorporate NeRF into their projects. Additionally, the modular design enables support for extensive real-time visualization tools, streamlined pipelines for importing captured in-the-wild data, and tools for exporting to video, point cloud and mesh representations. The modularity of Nerfstudio enables the development of Nerfacto, our method that combines components from recent papers to achieve a balance between speed and quality, while also remaining flexible to future modifications. To promote community-driven development, all associated code and data are made publicly available with open-source licensing at https://nerf.studio.

Submitted 16 October, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

Comments: Project page at https://nerf.studio

arXiv:2212.03078 [pdf, other]

An unified material interpolation for topology optimization of multi-materials

Authors: Bing Yi, Gil Ho Yoon, Ran Zheng, Long Liu, Daping Li, Xiang Peng

Abstract: Topology optimization is one of the engineering tools for finding efficient designs. For the material interpolation scheme, it is usual to employ the SIMP (Solid Isotropic Material with Penalization) or the homogenization based interpolation function for the parameterization of the material properties with respect to the design variables assigned to each finite element. For topology optimization with single material design, i.e., solid or void, the parameterization with 1 for solid and 0 for void becomes relatively straightforward using a polynomial function. For the case of multiple materials, the issues of modeling each material equally and of obtaining a clear 0, 1 result for each element in the topology optimization become serious because of the curse of dimensionality. To relieve these issues, this research proposes a new mapping based interpolation function for multi-material topology optimization. Unlike the polynomial based interpolation, this new interpolation is formulated by the ratio of the $p$-norm of the design variables to the 1-norm of the design variable multiplied by the design variable for a specific material. With this alternative mapping based interpolation function, each material is equally modeled and the clear 0, 1 result of each material for the multi-material topology optimization model can be improved. This paper solves several topology optimization problems to prove the validity of the present interpolation function.

Submitted 21 November, 2022; originally announced December 2022.

Comments: 16 pages, 25 figures

arXiv:2210.16782 [pdf, other]

Unsupervised Learning of Structured Representations via Closed-Loop Transcription

Authors: Shengbang Tong, Xili Dai, Yubei Chen, Mingyang Li, Zengyi Li, Brent Yi, Yann LeCun, Yi Ma

Abstract: This paper proposes an unsupervised method for learning a unified representation that serves both discriminative and generative purposes. While most existing unsupervised learning approaches focus on a representation for only one of these two goals, we show that a unified representation can enjoy the mutual benefits of having both. Such a representation is attainable by generalizing the recently proposed closed-loop transcription framework, known as CTRL, to the unsupervised setting. This entails solving a constrained maximin game over a rate reduction objective that expands features of all samples while compressing features of augmentations of each sample. Through this process, we see discriminative low-dimensional structures emerge in the resulting representations. Under comparable experimental conditions and network complexities, we demonstrate that these structured representations enable classification performance close to state-of-the-art unsupervised discriminative representations, and conditionally generated image quality significantly higher than that of state-of-the-art unsupervised generative models. Source code can be found at https://github.com/Delay-Xili/uCTRL.

Submitted 30 October, 2022; originally announced October 2022.

Comments: 17 pages

arXiv:2206.07277 [pdf, other]

A Gift from Label Smoothing: Robust Training with Adaptive Label Smoothing via Auxiliary Classifier under Label Noise

Authors: Jongwoo Ko, Bongsoo Yi, Se-Young Yun

Abstract: As deep neural networks can easily overfit noisy labels, robust training in the presence of noisy labels is becoming an important challenge in modern deep learning. While existing methods address this problem in various directions, they still produce unpredictable sub-optimal results since they rely on the posterior information estimated by the feature extractor corrupted by noisy labels. Lipschitz regularization successfully alleviates this problem by training a robust feature extractor, but it requires longer training time and expensive computations. Motivated by this, we propose a simple yet effective method, called ALASCA, which efficiently provides a robust feature extractor under label noise. ALASCA integrates two key ingredients: (1) adaptive label smoothing based on our theoretical analysis that label smoothing implicitly induces Lipschitz regularization, and (2) auxiliary classifiers that enable practical application of intermediate Lipschitz regularization with negligible computations. We conduct wide-ranging experiments for ALASCA and combine our proposed method with previous noise-robust methods on several synthetic and real-world datasets. Experimental results show that our framework consistently improves the robustness of feature extractors and the performance of existing baselines with efficiency. Our code is available at https://github.com/jongwooko/ALASCA.

Submitted 28 November, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

Comments: THE 37TH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-23)

arXiv:2205.03721 [pdf, other]

doi 10.1109/IROS47612.2022.9982029

Category-Independent Articulated Object Tracking with Factor Graphs

Authors: Nick Heppert, Toki Migimatsu, Brent Yi, Claire Chen, Jeannette Bohg

Abstract: Robots deployed in human-centric environments may need to manipulate a diverse range of articulated objects, such as doors, dishwashers, and cabinets. Articulated objects often come with unexpected articulation mechanisms that are inconsistent with categorical priors: for example, a drawer might rotate about a hinge joint instead of sliding open. We propose a category-independent framework for predicting the articulation models of unknown objects from sequences of RGB-D images. The prediction is performed by a two-step process: first, a visual perception module tracks object part poses from raw images, and second, a factor graph takes these poses and infers the articulation model including the current configuration between the parts as a 6D twist. We also propose a manipulation-oriented metric to evaluate predicted joint twists in terms of how well a compliant robot controller would be able to manipulate the articulated object given the predicted twist. We demonstrate that our visual perception and factor graph modules outperform baselines on simulated data and show the applicability of our factor graph on real world data.

Submitted 18 January, 2023; v1 submitted 7 May, 2022; originally announced May 2022.

Comments: V2: Camera-ready IROS 2022 version 11 pages, 10 figures, IROS 2022

arXiv:2203.03795 [pdf, other]

Semantic-Preserving Linguistic Steganography by Pivot Translation and Semantic-Aware Bins Coding

Authors: Tianyu Yang, Hanzhou Wu, Biao Yi, Guorui Feng, Xinpeng Zhang

Abstract: Linguistic steganography (LS) aims to embed secret information into a highly encoded text for covert communication. It can be roughly divided into two main categories, i.e., modification based LS (MLS) and generation based LS (GLS). Unlike MLS that hides secret data by slightly modifying a given text without impairing the meaning of the text, GLS uses a trained language model to directly generate a text carrying secret data. A common disadvantage of MLS methods is that the embedding payload is very low, in return for which the semantic quality of the text is well preserved. In contrast, GLS allows the data hider to embed a high payload, at the high price of uncontrollable semantics. In this paper, we propose a novel LS method to modify a given text by pivoting it between two different languages and embed secret data by applying a GLS-like information encoding strategy. Our purpose is to alter the expression of the given text, enabling a high payload to be embedded while keeping the semantic information unchanged. Experimental results have shown that the proposed work not only achieves a high embedding payload, but also shows superior performance in maintaining the semantic consistency and resisting linguistic steganalysis.

Submitted 7 March, 2022; originally announced March 2022.

Journal ref: IEEE Transactions on Dependable and Secure Computing (Final version at IEEE, 2023)

arXiv:2202.05411 [pdf, other]

Incremental Learning of Structured Memory via Closed-Loop Transcription

Authors: Shengbang Tong, Xili Dai, Ziyang Wu, Mingyang Li, Brent Yi, Yi Ma

Abstract: This work proposes a minimal computational model for learning structured memories of multiple object classes in an incremental setting. Our approach is based on establishing a closed-loop transcription between the classes and a corresponding set of subspaces, known as a linear discriminative representation, in a low-dimensional feature space. Our method is simpler than existing approaches for incremental learning, and more efficient in terms of model size, storage, and computation: it requires only a single, fixed-capacity autoencoding network with a feature space that is used for both discriminative and generative purposes. Network parameters are optimized simultaneously without architectural manipulations, by solving a constrained minimax game between the encoding and decoding maps over a single rate reduction-based objective. Experimental results show that our method can effectively alleviate catastrophic forgetting, achieving significantly better performance than prior work of generative replay on MNIST, CIFAR-10, and ImageNet-50, despite requiring fewer resources. Source code can be found at https://github.com/tsb0601/i-CTRL

Submitted 7 June, 2023; v1 submitted 10 February, 2022; originally announced February 2022.

Comments: 20 pages

arXiv:2112.12325 [pdf, other]

Globally convergent visual-feature range estimation with biased inertial measurements

Authors: Bowen Yi, Chi Jin, Ian R. Manchester

Abstract: The design of a globally convergent position observer for feature points from visual information is a challenging problem, especially for the case with only inertial measurements and without assumptions of uniform observability, which remained open for a long time. We give a solution to the problem in this paper assuming that only the bearing of a feature point, and biased linear acceleration and rotational velocity of a robot -- all in the body-fixed frame -- are available. Further, in contrast to existing related results, we do not need the value of the gravitational constant either. The proposed approach builds upon the parameter estimation-based observer recently developed in (Ortega et al., Syst. Control Lett., vol.85, 2015) and its extension to matrix Lie groups in our previous work. Conditions on the robot trajectory under which the observer converges are given, and these are strictly weaker than the standard persistency of excitation and uniform complete observability conditions. Finally, as an illustration, we apply the proposed design to the visual inertial navigation problem.

Submitted 14 April, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

arXiv:2111.09422 [pdf, other]

ORPHEUS: Living Labs for End-to-End Data Infrastructures for Digital Agriculture

Authors: Pengcheng Wang, Edgardo Barsallo Yi, Tomas Ratkus, Somali Chaterji

Abstract: IoT networks are being used to collect, analyze, and utilize sensor data. There are still some key requirements to leverage IoT networks in digital agriculture, e.g., design and deployment of energy saving and ruggedized sensor nodes (SN), reliable and long-range wireless network connectivity, end-to-end data collection pipelines for batch and streaming data. Thus, we introduce our living lab ORPHEUS and its design and implementation trajectory to showcase our orchestrated testbed of IoT sensors, data connectivity, database orchestration, and visualization dashboard. We deploy light-weight energy saving SNs in the field to collect data, using LoRa (Long Range wireless) to transmit data from the SNs to the Gateway node, upload all the data to the database server, and finally visualize the data. For future exploration, we also built a testbed of embedded devices using four different variants of NVIDIA Jetson development modules (Nano, TX2, Xavier NX, AGX Xavier) to benchmark the potential upgrade choices for SNs in ORPHEUS. Based on our deployment in multiple farms in a 3-county region around Purdue University, and on the Purdue University campus, we present analyses from our living lab deployment and additional components of the next-generation IoT farm.

Submitted 4 October, 2021; originally announced November 2021.

arXiv:2110.06509 [pdf, other]

Learning Stable Koopman Embeddings

Authors: Fletcher Fan, Bowen Yi, David Rye, Guodong Shi, Ian R. Manchester

Abstract: In this paper, we present a new data-driven method for learning stable models of nonlinear systems. Our model lifts the original state space to a higher-dimensional linear manifold using Koopman embeddings. Interestingly, we prove that every discrete-time nonlinear contracting model can be learnt in our framework. Another significant merit of the proposed approach is that it allows for unconstrained optimization over the Koopman embedding and operator jointly while enforcing stability of the model, via a direct parameterization of stable linear systems, greatly simplifying the computations involved. We validate our method on a simulated system and analyze the advantages of our parameterization compared to alternatives.

Submitted 13 October, 2021; originally announced October 2021.

arXiv:2109.14172 [pdf, other]

doi 10.1109/TMI.2021.3117564

Semi-Supervised Segmentation of Radiation-Induced Pulmonary Fibrosis from Lung CT Scans with Multi-Scale Guided Dense Attention

Authors: Guotai Wang, Shuwei Zhai, Giovanni Lasio, Baoshe Zhang, Byong Yi, Shifeng Chen, Thomas J. Macvittie, Dimitris Metaxas, Jinghao Zhou, Shaoting Zhang

Abstract: Computed Tomography (CT) plays an important role in monitoring radiation-induced Pulmonary Fibrosis (PF), where accurate segmentation of the PF lesions is highly desired for diagnosis and treatment follow-up. However, the task is challenged by ambiguous boundary, irregular shape, various position and size of the lesions, as well as the difficulty in acquiring a large set of annotated volumetric images for training. To overcome these problems, we propose a novel convolutional neural network called PF-Net and incorporate it into a semi-supervised learning framework based on Iterative Confidence-based Refinement And Weighting of pseudo Labels (I-CRAWL). Our PF-Net combines 2D and 3D convolutions to deal with CT volumes with large inter-slice spacing, and uses multi-scale guided dense attention to segment complex PF lesions. For semi-supervised learning, our I-CRAWL employs pixel-level uncertainty-based confidence-aware refinement to improve the accuracy of pseudo labels of unannotated images, and uses image-level uncertainty for confidence-based image weighting to suppress low-quality pseudo labels in an iterative training process. Extensive experiments with CT scans of Rhesus Macaques with radiation-induced PF showed that: 1) PF-Net achieved higher segmentation accuracy than existing 2D, 3D and 2.5D neural networks, and 2) I-CRAWL outperformed state-of-the-art semi-supervised learning methods for the PF lesion segmentation task. Our method has a potential to improve the diagnosis of PF and clinical assessment of side effects of radiotherapy for lung cancers.

Submitted 28 September, 2021; originally announced September 2021.

Comments: 12 pages, 9 figures. Submitted to IEEE TMI

arXiv:2107.12168 [pdf, other]

Exploiting Language Model for Efficient Linguistic Steganalysis

Authors: Biao Yi, Hanzhou Wu, Guorui Feng, Xinpeng Zhang

Abstract: Recent advances in linguistic steganalysis have successively applied CNN, RNN, GNN and other efficient deep models for detecting secret information in generative texts. These methods tend to seek stronger feature extractors to achieve higher steganalysis effects. However, we have found through experiments that there actually exists a significant difference between automatically generated stego texts and carrier texts in terms of the conditional probability distribution of individual words. This kind of difference can be naturally captured by the language model used for generating stego texts. Through further experiments, we conclude that this ability can be transplanted to a text classifier by pre-training and fine-tuning to improve the detection performance. Motivated by this insight, we propose two methods for efficient linguistic steganalysis. One is to pre-train a language model based on RNN, and the other is to pre-train a sequence autoencoder. The results indicate that the two methods have different degrees of performance gain compared to the randomly initialized RNN, and the convergence speed is significantly accelerated. Moreover, our methods achieved the best performance compared to related works, while providing a solution for real-world scenarios where there are more cover texts than stego texts.

Submitted 2 February, 2022; v1 submitted 26 July, 2021; originally announced July 2021.

Comments: Accepted to IEEE International Conference on Acoustics, Speech, and Signal Processing 2022

arXiv:2106.07106 [pdf, other]

Alignment and Comparison of Directed Networks via Transition Couplings of Random Walks

Authors: Bongsoo Yi, Kevin O'Connor, Kevin McGoff, Andrew B. Nobel

Abstract: We describe and study a transport based procedure called NetOTC (network optimal transition coupling) for the comparison and alignment of two networks. The networks of interest may be directed or undirected, weighted or unweighted, and may have distinct vertex sets of different sizes. Given two networks and a cost function relating their vertices, NetOTC finds a transition coupling of their associated random walks having minimum expected cost. The minimizing cost quantifies the difference between the networks, while the optimal transport plan itself provides alignments of both the vertices and the edges of the two networks. Coupling of the full random walks, rather than their marginal distributions, ensures that NetOTC captures local and global information about the networks, and preserves edges. NetOTC has no free parameters, and does not rely on randomization. We investigate a number of theoretical properties of NetOTC and present experiments establishing its empirical performance.

Submitted 5 February, 2024; v1 submitted 13 June, 2021; originally announced June 2021.

arXiv:2105.08257 [pdf, other]

Differentiable Factor Graph Optimization for Learning Smoothers

Authors: Brent Yi, Michelle A. Lee, Alina Kloss, Roberto Martín-Martín, Jeannette Bohg

Abstract: A recent line of work has shown that end-to-end optimization of Bayesian filters can be used to learn state estimators for systems whose underlying models are difficult to hand-design or tune, while retaining the core advantages of probabilistic state estimation. As an alternative approach for state estimation in these settings, we present an end-to-end approach for learning state estimators modeled as factor graph-based smoothers. By unrolling the optimizer we use for maximum a posteriori inference in these probabilistic graphical models, we can learn probabilistic system models in the full context of an overall state estimator, while also taking advantage of the distinct accuracy and runtime advantages that smoothers offer over recursive filters. We study this approach using two fundamental state estimation problems, object tracking and visual odometry, where we demonstrate a significant improvement over existing baselines. Our work comes with an extensive code release, which includes training and evaluation scripts, as well as Python libraries for Lie theory and factor graph optimization: https://sites.google.com/view/diffsmoothing/

Submitted 23 August, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

Comments: IROS 2021. 9 pages with references and appendix

arXiv:2104.02966 [pdf, other]

An almost globally convergent observer for visual SLAM without persistent excitation

Authors: Bowen Yi, Chi Jin, Lei Wang, Guodong Shi, Ian R. Manchester

Abstract: In this paper we propose a novel observer to solve the problem of visual simultaneous localization and mapping (SLAM), only using the information from a single monocular camera and an inertial measurement unit (IMU). The system state evolves on the manifold $SE(3)\times \mathbb{R}^{3n}$, on which we design dynamic extensions carefully in order to generate an invariant foliation, such that the problem is reformulated into online \emph{constant parameter} identification. Then, following the recently introduced parameter estimation-based observer (PEBO) and the dynamic regressor extension and mixing (DREM) procedure, we provide a new simple solution. A notable merit is that the proposed observer guarantees almost global asymptotic stability requiring neither persistency of excitation nor uniform complete observability, which, however, are widely adopted in most existing works with guaranteed stability.

Submitted 21 December, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

arXiv:2010.13021 [pdf, other]

Multimodal Sensor Fusion with Differentiable Filters

Authors: Michelle A. Lee, Brent Yi, Roberto Martín-Martín, Silvio Savarese, Jeannette Bohg

Abstract: Leveraging multimodal information with recursive Bayesian filters improves performance and robustness of state estimation, as recursive filters can combine different modalities according to their uncertainties. Prior work has studied how to optimally fuse different sensor modalities with analytical state estimation algorithms. However, deriving the dynamics and measurement models along with their noise profile can be difficult or lead to intractable models. Differentiable filters provide a way to learn these models end-to-end while retaining the algorithmic structure of recursive filters. This can be especially helpful when working with sensor modalities that are high dimensional and have very different characteristics. In contact-rich manipulation, we want to combine visual sensing (which gives us global information) with tactile sensing (which gives us local information). In this paper, we study new differentiable filtering architectures to fuse heterogeneous sensor information. As case studies, we evaluate three tasks: two in planar pushing (simulated and real) and one in manipulating a kinematically constrained door (simulated). In extensive evaluations, we find that differentiable filters that leverage crossmodal sensor information reach comparable accuracies to unstructured LSTM models, while presenting interpretability benefits that may be important for safety-critical systems. We also release an open-source library for creating and training differentiable Bayesian filters in PyTorch, which can be found on our project website: https://sites.google.com/view/multimodalfilter

Submitted 23 December, 2020; v1 submitted 24 October, 2020; originally announced October 2020.

Comments: Published in IROS 2020. Updated sponsors, fixed Kalman gain typo

arXiv:2006.12570 [pdf, other]

doi 10.1109/JIOT.2020.3009228

Hybrid Low-Power Wide-Area Mesh Network for IoT Applications

Authors: Xiaofan Jiang, Heng Zhang, Edgardo Alberto Barsallo Yi, Nithin Raghunathan, Charilaos Mousoulis, Somali Chaterji, Dimitrios Peroulis, Ali Shakouri, Saurabh Bagchi

Abstract: The recent advancement of the Internet of Things (IoT) enables the possibility of data collection from diverse environments using IoT devices. However, despite the rapid advancement of low-power communication technologies, the deployment of IoT networks still faces many challenges. In this paper, we propose a hybrid, low-power, wide-area network (LPWAN) structure that can achieve wide-area communication coverage and low power consumption on IoT devices by utilizing both sub-GHz long-range radio and 2.4 GHz short-range radio. Specifically, we constructed a low-power mesh network with LoRa, a physical-layer standard that can provide long-range (kilometers) point-to-point communication using custom time-division multiple access (TDMA). Furthermore, we extended the capabilities of the mesh network by enabling ANT, an ultra-low-power, short-range communication protocol to satisfy data collection in dense device deployments. Third, we demonstrate the performance of the hybrid network with two real-world deployments at the Purdue University campus and at the university-owned farm. The results suggest that both networks have superior advantages in terms of cost, coverage, and power consumption vis-à-vis other IoT solutions, like LoRaWAN.

Submitted 22 June, 2020; originally announced June 2020.

arXiv:2002.07767 [pdf, other]

Learning by Semantic Similarity Makes Abstractive Summarization Better

Authors: Wonjin Yoon, Yoon Sun Yeo, Minbyul Jeong, Bong-Jun Yi, Jaewoo Kang

Abstract: By harnessing pre-trained language models, summarization models have made rapid progress recently. However, the models are mainly assessed by automatic evaluation metrics such as ROUGE. Although ROUGE is known for having a positive correlation with human evaluation scores, it has been criticized for its vulnerability and the gap between actual qualities. In this paper, we compare the generated summaries from a recent LM, BART, and the reference summaries from a benchmark dataset, CNN/DM, using a crowd-sourced human evaluation metric. Interestingly, model-generated summaries receive higher scores relative to reference summaries. Stemming from our experimental results, we first argue the intrinsic characteristics of the CNN/DM dataset, the progress of pre-trained language models, and their ability to generalize on the training data. Finally, we share our insights into the model-generated summaries and present our thoughts on learning methods for abstractive summarization.

Submitted 2 June, 2021; v1 submitted 18 February, 2020; originally announced February 2020.

Comments: The initial version of the manuscript includes a model design (semsim), experimental results, and discussions on the results. We found that our model has flaws in its implementation and design. This final version of the manuscript is from the rest of the initial paper; we included our findings on the benchmark dataset, BART generated results and human evaluations, and we excluded our model semsim

arXiv:1910.02640 [pdf, ps, other]

Four-Dimension Cross Constellations with Gray Mapping

Authors: Liangping Ma, Hae Chung, Byung K Yi

Abstract: Recently a four-dimension (4D) cross constellation has been proposed, where a 4D-vector is drawn from two $(3\times 4^m)$-ary QAM constellations, in an effort to reduce the peak-to-average-power ratio (PAPR). We construct a bits-to-signal mapping and prove that it is a Gray mapping. Simulation results show that the proposed modulation scheme is effective in reducing the PAPR while providing better error performance than existing 4D modulation schemes.

Submitted 7 October, 2019; originally announced October 2019.

arXiv:1904.03815 [pdf, other]

Quasi-Direct Drive for Low-Cost Compliant Robotic Manipulation

Authors: David V. Gealy, Stephen McKinley, Brent Yi, Philipp Wu, Phillip R. Downey, Greg Balke, Allan Zhao, Menglong Guo, Rachel Thomasson, Anthony Sinclair, Peter Cuellar, Zoe McCarthy, Pieter Abbeel

Abstract: Robots must cost less and be force-controlled to enable widespread, safe deployment in unconstrained human environments. We propose Quasi-Direct Drive actuation as a capable paradigm for robotic force-controlled manipulation in human environments at low-cost. Our prototype - Blue - is a human scale 7 Degree of Freedom arm with 2kg payload. Blue can cost less than $5000. We show that Blue has dynamic properties that meet or exceed the needs of human operators: the robot has a nominal position-control bandwidth of 7.5Hz and repeatability within 4mm. We demonstrate a Virtual Reality based interface that can be used as a method for telepresence and collecting robot training demonstrations. Manufacturability, scaling, and potential use-cases for the Blue system are also addressed. Videos and additional information can be found online at berkeleyopenarms.github.io

Submitted 11 April, 2019; v1 submitted 7 April, 2019; originally announced April 2019.

Comments: This is our long version - 8 pages. Our 6 page version without a discussion of thermal limits was accepted to ICRA 2019. 11 Figures

arXiv:1706.09247 [pdf]

On Reliability of Android Wearable Health Devices

Authors: Naixing Wang, Edgardo Barsallo Yi, Saurabh Bagchi

Abstract: Wearable devices are increasingly being used for monitoring health signals and for fitness purposes with typical uses being calorie tracker, workout assistant, and step counter. Even though these wearables can measure many health signals (e.g. heart rate), they are still not perceived as highly accurate, relative to clinical monitoring devices. In this paper, we investigate the accuracy of the heart monitors included in two popular wearables, the Motorola Moto 360 and the Apple Watch. We analyze the accuracy from a hardware and a software perspective and show the effects of body motion on the heart rate monitors based on the use of photoplethysmography (PPG) signals used in Android wearables. We then do a software reliability study of the Android Wear OS, on which many wearables are based, using fuzz testing.

Submitted 20 June, 2017; originally announced June 2017.

ACM Class: B.8.1; C.3; C.4; D.4.7; J.3

arXiv:1102.2969 [pdf, ps, other]

Efficient and scalable geometric hashing method for searching protein 3D structures

Authors: Gook-Pil Roh, Seung-won Hwang, Byoung-Kee Yi

Abstract: As the structural databases continue to expand, efficient methods are required to search the database for structures similar to a query structure. There are many previous works about comparing protein 3D structures and scanning the database with a query structure. However, they generally have limitations on practical use because of large computational and storage requirements. We propose two new types of queries for searching similar sub-structures on the structural database: LSPM (Local Spatial Pattern Matching) and RLSPM (Reverse LSPM). Of the two query types, we focus on the RLSPM problem, because it is more practical and general than LSPM. As a naive algorithm, we adopt geometric hashing techniques to the RLSPM problem and then propose an algorithm which improves the baseline algorithm to deal with large-scale data and provide an efficient matching algorithm. We employ the sub-sampling and Z-ordering to reduce the storage requirement and execution time, respectively. We conduct our experiments to show the correctness and reliability of the proposed method. Our experiment shows that the true positive rate is at least 0.8 using the reliability measure.

Submitted 15 February, 2011; originally announced February 2011.

Comments: 9 pages, 1 figure

Showing 1–29 of 29 results for author: Yi, B