Search | arXiv e-print repository

Let's Put Ourselves in Sally's Shoes: Shoes-of-Others Prefixing Improves Theory of Mind in Large Language Models

Authors: Kazutoshi Shinoda, Nobukatsu Hojo, Kyosuke Nishida, Yoshihiro Yamazaki, Keita Suzuki, Hiroaki Sugiyama, Kuniko Saito

Abstract: Recent studies have shown that Theory of Mind (ToM) in large language models (LLMs) has not reached human-level performance yet. Since fine-tuning LLMs on ToM datasets often degrades their generalization, several inference-time methods have been proposed to enhance ToM in LLMs. However, existing inference-time methods for ToM are specialized for inferring beliefs from contexts involving changes in… ▽ More Recent studies have shown that Theory of Mind (ToM) in large language models (LLMs) has not reached human-level performance yet. Since fine-tuning LLMs on ToM datasets often degrades their generalization, several inference-time methods have been proposed to enhance ToM in LLMs. However, existing inference-time methods for ToM are specialized for inferring beliefs from contexts involving changes in the world state. In this study, we present a new inference-time method for ToM, Shoes-of-Others (SoO) prefixing, which makes fewer assumptions about contexts and is applicable to broader scenarios. SoO prefixing simply specifies the beginning of LLM outputs with ``Let's put ourselves in A's shoes.'', where A denotes the target character's name. We evaluate SoO prefixing on two benchmarks that assess ToM in conversational and narrative contexts without changes in the world state and find that it consistently improves ToM across five categories of mental states. Our analysis suggests that SoO prefixing elicits faithful thoughts, thereby improving the ToM performance. △ Less

Submitted 6 June, 2025; originally announced June 2025.

Comments: 14pages, 12 figures

arXiv:2505.15313 [pdf, ps, other]

FaceCrafter: Identity-Conditional Diffusion with Disentangled Control over Facial Pose, Expression, and Emotion

Authors: Kazuaki Mishima, Antoni Bigata Casademunt, Stavros Petridis, Maja Pantic, Kenji Suzuki

Abstract: Human facial images encode a rich spectrum of information, encompassing both stable identity-related traits and mutable attributes such as pose, expression, and emotion. While recent advances in image generation have enabled high-quality identity-conditional face synthesis, precise control over non-identity attributes remains challenging, and disentangling identity from these mutable factors is pa… ▽ More Human facial images encode a rich spectrum of information, encompassing both stable identity-related traits and mutable attributes such as pose, expression, and emotion. While recent advances in image generation have enabled high-quality identity-conditional face synthesis, precise control over non-identity attributes remains challenging, and disentangling identity from these mutable factors is particularly difficult. To address these limitations, we propose a novel identity-conditional diffusion model that introduces two lightweight control modules designed to independently manipulate facial pose, expression, and emotion without compromising identity preservation. These modules are embedded within the cross-attention layers of the base diffusion model, enabling precise attribute control with minimal parameter overhead. Furthermore, our tailored training strategy, which leverages cross-attention between the identity feature and each non-identity control feature, encourages identity features to remain orthogonal to control signals, enhancing controllability and diversity. Quantitative and qualitative evaluations, along with perceptual user studies, demonstrate that our method surpasses existing approaches in terms of control accuracy over pose, expression, and emotion, while also improving generative diversity under identity-only conditioning. △ Less

Submitted 21 May, 2025; originally announced May 2025.

Comments: 9 pages(excluding references), 3 figures, 5 tables

arXiv:2504.18080 [pdf, ps, other]

Stabilizing Reasoning in Medical LLMs with Continued Pretraining and Reasoning Preference Optimization

Authors: Wataru Kawakami, Keita Suzuki, Junichiro Iwasawa

Abstract: Large Language Models (LLMs) show potential in medicine, yet clinical adoption is hindered by concerns over factual accuracy, language-specific limitations (e.g., Japanese), and critically, their reliability when required to generate reasoning explanations -- a prerequisite for trust. This paper introduces Preferred-MedLLM-Qwen-72B, a 72B-parameter model optimized for the Japanese medical domain t… ▽ More Large Language Models (LLMs) show potential in medicine, yet clinical adoption is hindered by concerns over factual accuracy, language-specific limitations (e.g., Japanese), and critically, their reliability when required to generate reasoning explanations -- a prerequisite for trust. This paper introduces Preferred-MedLLM-Qwen-72B, a 72B-parameter model optimized for the Japanese medical domain to achieve both high accuracy and stable reasoning. We employ a two-stage fine-tuning process on the Qwen2.5-72B base model: first, Continued Pretraining (CPT) on a comprehensive Japanese medical corpus instills deep domain knowledge. Second, Reasoning Preference Optimization (RPO), a preference-based method, enhances the generation of reliable reasoning pathways while preserving high answer accuracy. Evaluations on the Japanese Medical Licensing Exam benchmark (IgakuQA) show Preferred-MedLLM-Qwen-72B achieves state-of-the-art performance (0.868 accuracy), surpassing strong proprietary models like GPT-4o (0.866). Crucially, unlike baseline or CPT-only models which exhibit significant accuracy degradation (up to 11.5\% and 3.8\% respectively on IgakuQA) when prompted for explanations, our model maintains its high accuracy (0.868) under such conditions. This highlights RPO's effectiveness in stabilizing reasoning generation. This work underscores the importance of optimizing for reliable explanations alongside accuracy. We release the Preferred-MedLLM-Qwen-72B model weights to foster research into trustworthy LLMs for specialized, high-stakes applications. △ Less

Submitted 25 April, 2025; originally announced April 2025.

arXiv:2504.13641 [pdf, other]

Propagational Proxy Voting

Authors: Yasushi Sakai, Parfait Atchade-Adelomou, Ryan Jiang, Luis Alonso, Kent Larson, Ken Suzuki

Abstract: This paper proposes a voting process in which voters allocate fractional votes to their expected utility in different domains: over proposals, other participants, and sets containing proposals and participants. This approach allows for a more nuanced expression of preferences by calculating the result and relevance within each node. We modeled this by creating a voting matrix that reflects their p… ▽ More This paper proposes a voting process in which voters allocate fractional votes to their expected utility in different domains: over proposals, other participants, and sets containing proposals and participants. This approach allows for a more nuanced expression of preferences by calculating the result and relevance within each node. We modeled this by creating a voting matrix that reflects their preference. We use absorbing Markov chains to gain the consensus, and also calculate the influence within the participating nodes. We illustrate this method in action through an experiment with 69 students using a budget allocation topic. △ Less

Submitted 18 April, 2025; originally announced April 2025.

arXiv:2503.11979 [pdf, other]

DynaGSLAM: Real-Time Gaussian-Splatting SLAM for Online Rendering, Tracking, Motion Predictions of Moving Objects in Dynamic Scenes

Authors: Runfa Blark Li, Mahdi Shaghaghi, Keito Suzuki, Xinshuang Liu, Varun Moparthi, Bang Du, Walker Curtis, Martin Renschler, Ki Myung Brian Lee, Nikolay Atanasov, Truong Nguyen

Abstract: Simultaneous Localization and Mapping (SLAM) is one of the most important environment-perception and navigation algorithms for computer vision, robotics, and autonomous cars/drones. Hence, high quality and fast mapping becomes a fundamental problem. With the advent of 3D Gaussian Splatting (3DGS) as an explicit representation with excellent rendering quality and speed, state-of-the-art (SOTA) work… ▽ More Simultaneous Localization and Mapping (SLAM) is one of the most important environment-perception and navigation algorithms for computer vision, robotics, and autonomous cars/drones. Hence, high quality and fast mapping becomes a fundamental problem. With the advent of 3D Gaussian Splatting (3DGS) as an explicit representation with excellent rendering quality and speed, state-of-the-art (SOTA) works introduce GS to SLAM. Compared to classical pointcloud-SLAM, GS-SLAM generates photometric information by learning from input camera views and synthesize unseen views with high-quality textures. However, these GS-SLAM fail when moving objects occupy the scene that violate the static assumption of bundle adjustment. The failed updates of moving GS affects the static GS and contaminates the full map over long frames. Although some efforts have been made by concurrent works to consider moving objects for GS-SLAM, they simply detect and remove the moving regions from GS rendering ("anti'' dynamic GS-SLAM), where only the static background could benefit from GS. To this end, we propose the first real-time GS-SLAM, "DynaGSLAM'', that achieves high-quality online GS rendering, tracking, motion predictions of moving objects in dynamic scenes while jointly estimating accurate ego motion. Our DynaGSLAM outperforms SOTA static & "Anti'' dynamic GS-SLAM on three dynamic real datasets, while keeping speed and memory efficiency in practice. △ Less

Submitted 14 March, 2025; originally announced March 2025.

arXiv:2502.19782 [pdf, other]

Open-Vocabulary Semantic Part Segmentation of 3D Human

Authors: Keito Suzuki, Bang Du, Girish Krishnan, Kunyao Chen, Runfa Blark Li, Truong Nguyen

Abstract: 3D part segmentation is still an open problem in the field of 3D vision and AR/VR. Due to limited 3D labeled data, traditional supervised segmentation methods fall short in generalizing to unseen shapes and categories. Recently, the advancement in vision-language models' zero-shot abilities has brought a surge in open-world 3D segmentation methods. While these methods show promising results for 3D… ▽ More 3D part segmentation is still an open problem in the field of 3D vision and AR/VR. Due to limited 3D labeled data, traditional supervised segmentation methods fall short in generalizing to unseen shapes and categories. Recently, the advancement in vision-language models' zero-shot abilities has brought a surge in open-world 3D segmentation methods. While these methods show promising results for 3D scenes or objects, they do not generalize well to 3D humans. In this paper, we present the first open-vocabulary segmentation method capable of handling 3D human. Our framework can segment the human category into desired fine-grained parts based on the textual prompt. We design a simple segmentation pipeline, leveraging SAM to generate multi-view proposals in 2D and proposing a novel HumanCLIP model to create unified embeddings for visual and textual inputs. Compared with existing pre-trained CLIP models, the HumanCLIP model yields more accurate embeddings for human-centric contents. We also design a simple-yet-effective MaskFusion module, which classifies and fuses multi-view features into 3D semantic masks without complex voting and grouping mechanisms. The design of decoupling mask proposals and text input also significantly boosts the efficiency of per-prompt inference. Experimental results on various 3D human datasets show that our method outperforms current state-of-the-art open-vocabulary 3D segmentation methods by a large margin. In addition, we show that our method can be directly applied to various 3D representations including meshes, point clouds, and 3D Gaussian Splatting. △ Less

Submitted 27 February, 2025; originally announced February 2025.

Comments: 3DV 2025

arXiv:2502.01972 [pdf, other]

Layer Separation: Adjustable Joint Space Width Images Synthesis in Conventional Radiography

Authors: Haolin Wang, Yafei Ou, Prasoon Ambalathankandy, Gen Ota, Pengyu Dai, Masayuki Ikebe, Kenji Suzuki, Tamotsu Kamishima

Abstract: Rheumatoid arthritis (RA) is a chronic autoimmune disease characterized by joint inflammation and progressive structural damage. Joint space width (JSW) is a critical indicator in conventional radiography for evaluating disease progression, which has become a prominent research topic in computer-aided diagnostic (CAD) systems. However, deep learning-based radiological CAD systems for JSW analysis… ▽ More Rheumatoid arthritis (RA) is a chronic autoimmune disease characterized by joint inflammation and progressive structural damage. Joint space width (JSW) is a critical indicator in conventional radiography for evaluating disease progression, which has become a prominent research topic in computer-aided diagnostic (CAD) systems. However, deep learning-based radiological CAD systems for JSW analysis face significant challenges in data quality, including data imbalance, limited variety, and annotation difficulties. This work introduced a challenging image synthesis scenario and proposed Layer Separation Networks (LSN) to accurately separate the soft tissue layer, the upper bone layer, and the lower bone layer in conventional radiographs of finger joints. Using these layers, the adjustable JSW images can be synthesized to address data quality challenges and achieve ground truth (GT) generation. Experimental results demonstrated that LSN-based synthetic images closely resemble real radiographs, and significantly enhanced the performance in downstream tasks. The code and dataset will be available. △ Less

Submitted 3 February, 2025; originally announced February 2025.

ACM Class: I.3.3; J.3; I.4.0

arXiv:2501.08838 [pdf, other]

ToMATO: Verbalizing the Mental States of Role-Playing LLMs for Benchmarking Theory of Mind

Authors: Kazutoshi Shinoda, Nobukatsu Hojo, Kyosuke Nishida, Saki Mizuno, Keita Suzuki, Ryo Masumura, Hiroaki Sugiyama, Kuniko Saito

Abstract: Existing Theory of Mind (ToM) benchmarks diverge from real-world scenarios in three aspects: 1) they assess a limited range of mental states such as beliefs, 2) false beliefs are not comprehensively explored, and 3) the diverse personality traits of characters are overlooked. To address these challenges, we introduce ToMATO, a new ToM benchmark formulated as multiple-choice QA over conversations.… ▽ More Existing Theory of Mind (ToM) benchmarks diverge from real-world scenarios in three aspects: 1) they assess a limited range of mental states such as beliefs, 2) false beliefs are not comprehensively explored, and 3) the diverse personality traits of characters are overlooked. To address these challenges, we introduce ToMATO, a new ToM benchmark formulated as multiple-choice QA over conversations. ToMATO is generated via LLM-LLM conversations featuring information asymmetry. By employing a prompting method that requires role-playing LLMs to verbalize their thoughts before each utterance, we capture both first- and second-order mental states across five categories: belief, intention, desire, emotion, and knowledge. These verbalized thoughts serve as answers to questions designed to assess the mental states of characters within conversations. Furthermore, the information asymmetry introduced by hiding thoughts from others induces the generation of false beliefs about various mental states. Assigning distinct personality traits to LLMs further diversifies both utterances and thoughts. ToMATO consists of 5.4k questions, 753 conversations, and 15 personality trait patterns. Our analysis shows that this dataset construction approach frequently generates false beliefs due to the information asymmetry between role-playing LLMs, and effectively reflects diverse personalities. We evaluate nine LLMs on ToMATO and find that even GPT-4o mini lags behind human performance, especially in understanding false beliefs, and lacks robustness to various personality traits. △ Less

Submitted 15 January, 2025; originally announced January 2025.

Comments: Accepted by AAAI 2025

arXiv:2412.06235 [pdf, other]

VariFace: Fair and Diverse Synthetic Dataset Generation for Face Recognition

Authors: Michael Yeung, Toya Teramoto, Songtao Wu, Tatsuo Fujiwara, Kenji Suzuki, Tamaki Kojima

Abstract: The use of large-scale, web-scraped datasets to train face recognition models has raised significant privacy and bias concerns. Synthetic methods mitigate these concerns and provide scalable and controllable face generation to enable fair and accurate face recognition. However, existing synthetic datasets display limited intraclass and interclass diversity and do not match the face recognition per… ▽ More The use of large-scale, web-scraped datasets to train face recognition models has raised significant privacy and bias concerns. Synthetic methods mitigate these concerns and provide scalable and controllable face generation to enable fair and accurate face recognition. However, existing synthetic datasets display limited intraclass and interclass diversity and do not match the face recognition performance obtained using real datasets. Here, we propose VariFace, a two-stage diffusion-based pipeline to create fair and diverse synthetic face datasets to train face recognition models. Specifically, we introduce three methods: Face Recognition Consistency to refine demographic labels, Face Vendi Score Guidance to improve interclass diversity, and Divergence Score Conditioning to balance the identity preservation-intraclass diversity trade-off. When constrained to the same dataset size, VariFace considerably outperforms previous synthetic datasets (0.9200 $\rightarrow$ 0.9405) and achieves comparable performance to face recognition models trained with real data (Real Gap = -0.0065). In an unconstrained setting, VariFace not only consistently achieves better performance compared to previous synthetic methods across dataset sizes but also, for the first time, outperforms the real dataset (CASIA-WebFace) across six evaluation datasets. This sets a new state-of-the-art performance with an average face verification accuracy of 0.9567 (Real Gap = +0.0097) across LFW, CFP-FP, CPLFW, AgeDB, and CALFW datasets and 0.9366 (Real Gap = +0.0380) on the RFW dataset. △ Less

Submitted 17 April, 2025; v1 submitted 9 December, 2024; originally announced December 2024.

arXiv:2411.15468 [pdf, other]

SplatSDF: Boosting Neural Implicit SDF via Gaussian Splatting Fusion

Authors: Runfa Blark Li, Keito Suzuki, Bang Du, Ki Myung Brian Lee, Nikolay Atanasov, Truong Nguyen

Abstract: A signed distance function (SDF) is a useful representation for continuous-space geometry and many related operations, including rendering, collision checking, and mesh generation. Hence, reconstructing SDF from image observations accurately and efficiently is a fundamental problem. Recently, neural implicit SDF (SDF-NeRF) techniques, trained using volumetric rendering, have gained a lot of attent… ▽ More A signed distance function (SDF) is a useful representation for continuous-space geometry and many related operations, including rendering, collision checking, and mesh generation. Hence, reconstructing SDF from image observations accurately and efficiently is a fundamental problem. Recently, neural implicit SDF (SDF-NeRF) techniques, trained using volumetric rendering, have gained a lot of attention. Compared to earlier truncated SDF (TSDF) fusion algorithms that rely on depth maps and voxelize continuous space, SDF-NeRF enables continuous-space SDF reconstruction with better geometric and photometric accuracy. However, the accuracy and convergence speed of scene-level SDF reconstruction require further improvements for many applications. With the advent of 3D Gaussian Splatting (3DGS) as an explicit representation with excellent rendering quality and speed, several works have focused on improving SDF-NeRF by introducing consistency losses on depth and surface normals between 3DGS and SDF-NeRF. However, loss-level connections alone lead to incremental improvements. We propose a novel neural implicit SDF called "SplatSDF" to fuse 3DGSandSDF-NeRF at an architecture level with significant boosts to geometric and photometric accuracy and convergence speed. Our SplatSDF relies on 3DGS as input only during training, and keeps the same complexity and efficiency as the original SDF-NeRF during inference. Our method outperforms state-of-the-art SDF-NeRF models on geometric and photometric evaluation by the time of submission. △ Less

Submitted 23 November, 2024; originally announced November 2024.

arXiv:2410.09060 [pdf, ps, other]

Tractability results for integration in subspaces of the Wiener algebra

Authors: Josef Dick, Takashi Goda, Kosuke Suzuki

Abstract: In this paper, we present some new (in-)tractability results related to the integration problem in subspaces of the Wiener algebra over the $d$-dimensional unit cube. We show that intractability holds for multivariate integration in the standard Wiener algebra in the deterministic setting, in contrast to polynomial tractability in an unweighted subspace of the Wiener algebra recently shown by Goda… ▽ More In this paper, we present some new (in-)tractability results related to the integration problem in subspaces of the Wiener algebra over the $d$-dimensional unit cube. We show that intractability holds for multivariate integration in the standard Wiener algebra in the deterministic setting, in contrast to polynomial tractability in an unweighted subspace of the Wiener algebra recently shown by Goda (2023). Moreover, we prove that multivariate integration in the subspace of the Wiener algebra introduced by Goda is strongly polynomially tractable if we switch to the randomized setting, where we obtain a better $\varepsilon$-exponent than the one implied by the standard Monte Carlo method. We also identify subspaces in which multivariate integration in the deterministic setting are (strongly) polynomially tractable and we compare these results with the bound which can be obtained via Hoeffding's inequality. △ Less

Submitted 5 March, 2025; v1 submitted 27 September, 2024; originally announced October 2024.

Comments: 16 pages. arXiv admin note: text overlap with arXiv:2306.01541

arXiv:2410.04427 [pdf, ps, other]

Consistent and Repeatable Testing of mMIMO O-RU across labs: A Japan-Singapore Experience

Authors: Thanh-Tam Nguyen, Mao V. Ngo, Binbin Chen, Mitsuhiro Kuchitsu, Serena Wai, Seitaro Kawai, Kenya Suzuki, Eng Wei Koo, Tony Quek

Abstract: Open Radio Access Networks (RAN) aim to bring a paradigm shift to telecommunications industry, by enabling an open, intelligent, virtualized, and multi-vendor interoperable RAN ecosystem. At the center of this movement, O-RAN ALLIANCE defines the O-RAN architecture and standards, so that companies around the globe can use these specifications to create innovative and interoperable solutions. To ac… ▽ More Open Radio Access Networks (RAN) aim to bring a paradigm shift to telecommunications industry, by enabling an open, intelligent, virtualized, and multi-vendor interoperable RAN ecosystem. At the center of this movement, O-RAN ALLIANCE defines the O-RAN architecture and standards, so that companies around the globe can use these specifications to create innovative and interoperable solutions. To accelerate the adoption of O-RAN products, rigorous testing of O-RAN Radio Unit (O-RU) and other O-RAN products plays a key role. O-RAN ALLIANCE has approved around 20 Open Testing and Integration Centres (OTICs) globally. OTICs serve as vendor-neutral platforms for providing the testing and integration services, with the vision that an O-RAN product certified in any OTIC is accepted in other parts of the world. To demonstrate the viability of such a certified-once-and-use-everywhere approach, one theme in the O-RAN Global PlugFest Spring 2024 is to demonstrate consistent and repeatable testing for the open fronthaul interface across multiple labs. Towards this, Japan OTIC and Asia Pacific OTIC in Singapore have teamed up together with an O-RU vendor and Keysight Technology. Our international team successfully completed all test cases defined by O-RAN ALLIANCE for O-RU conformance testing. In this paper, we share our journey in achieving this outcome, focusing on the challenges we have overcome and the lessons we have learned through this process. △ Less

Submitted 6 October, 2024; originally announced October 2024.

Comments: Published version at RitiRAN Workshop - co-located with IEEE VTC Fall 2024

arXiv:2409.19778 [pdf, other]

Lessons Learned from Developing a Human-Centered Guide Dog Robot for Mobility Assistance

Authors: Hochul Hwang, Ken Suzuki, Nicholas A Giudice, Joydeep Biswas, Sunghoon Ivan Lee, Donghyun Kim

Abstract: While guide dogs offer essential mobility assistance, their high cost, limited availability, and care requirements make them inaccessible to most blind or low vision (BLV) individuals. Recent advances in quadruped robots provide a scalable solution for mobility assistance, but many current designs fail to meet real-world needs due to a lack of understanding of handler and guide dog interactions. I… ▽ More While guide dogs offer essential mobility assistance, their high cost, limited availability, and care requirements make them inaccessible to most blind or low vision (BLV) individuals. Recent advances in quadruped robots provide a scalable solution for mobility assistance, but many current designs fail to meet real-world needs due to a lack of understanding of handler and guide dog interactions. In this paper, we share lessons learned from developing a human-centered guide dog robot, addressing challenges such as optimal hardware design, robust navigation, and informative scene description for user adoption. By conducting semi-structured interviews and human experiments with BLV individuals, guide-dog handlers, and trainers, we identified key design principles to improve safety, trust, and usability in robotic mobility aids. Our findings lay the building blocks for future development of guide dog robots, ultimately enhancing independence and quality of life for BLV individuals. △ Less

Submitted 29 September, 2024; originally announced September 2024.

arXiv:2409.16422 [pdf, other]

Is All Learning (Natural) Gradient Descent?

Authors: Lucas Shoji, Kenta Suzuki, Leo Kozachkov

Abstract: This paper shows that a wide class of effective learning rules -- those that improve a scalar performance measure over a given time window -- can be rewritten as natural gradient descent with respect to a suitably defined loss function and metric. Specifically, we show that parameter updates within this class of learning rules can be expressed as the product of a symmetric positive definite matrix… ▽ More This paper shows that a wide class of effective learning rules -- those that improve a scalar performance measure over a given time window -- can be rewritten as natural gradient descent with respect to a suitably defined loss function and metric. Specifically, we show that parameter updates within this class of learning rules can be expressed as the product of a symmetric positive definite matrix (i.e., a metric) and the negative gradient of a loss function. We also demonstrate that these metrics have a canonical form and identify several optimal ones, including the metric that achieves the minimum possible condition number. The proofs of the main results are straightforward, relying only on elementary linear algebra and calculus, and are applicable to continuous-time, discrete-time, stochastic, and higher-order learning rules, as well as loss functions that explicitly depend on time. △ Less

Submitted 24 September, 2024; originally announced September 2024.

Comments: 14 pages, 3 figures

arXiv:2409.07304 [pdf, other]

BLS-GAN: A Deep Layer Separation Framework for Eliminating Bone Overlap in Conventional Radiographs

Authors: Haolin Wang, Yafei Ou, Prasoon Ambalathankandy, Gen Ota, Pengyu Dai, Masayuki Ikebe, Kenji Suzuki, Tamotsu Kamishima

Abstract: Conventional radiography is the widely used imaging technology in diagnosing, monitoring, and prognosticating musculoskeletal (MSK) diseases because of its easy availability, versatility, and cost-effectiveness. In conventional radiographs, bone overlaps are prevalent, and can impede the accurate assessment of bone characteristics by radiologists or algorithms, posing significant challenges to con… ▽ More Conventional radiography is the widely used imaging technology in diagnosing, monitoring, and prognosticating musculoskeletal (MSK) diseases because of its easy availability, versatility, and cost-effectiveness. In conventional radiographs, bone overlaps are prevalent, and can impede the accurate assessment of bone characteristics by radiologists or algorithms, posing significant challenges to conventional and computer-aided diagnoses. This work initiated the study of a challenging scenario - bone layer separation in conventional radiographs, in which separate overlapped bone regions enable the independent assessment of the bone characteristics of each bone layer and lay the groundwork for MSK disease diagnosis and its automation. This work proposed a Bone Layer Separation GAN (BLS-GAN) framework that can produce high-quality bone layer images with reasonable bone characteristics and texture. This framework introduced a reconstructor based on conventional radiography imaging principles, which achieved efficient reconstruction and mitigates the recurrent calculations and training instability issues caused by soft tissue in the overlapped regions. Additionally, pre-training with synthetic images was implemented to enhance the stability of both the training process and the results. The generated images passed the visual Turing test, and improved performance in downstream tasks. This work affirms the feasibility of extracting bone layer images from conventional radiographs, which holds promise for leveraging bone layer separation technology to facilitate more comprehensive analytical research in MSK diagnosis, monitoring, and prognosis. Code and dataset: https://github.com/pokeblow/BLS-GAN.git. △ Less

Submitted 25 December, 2024; v1 submitted 11 September, 2024; originally announced September 2024.

Comments: Accepted by AAAI 2025

ACM Class: I.3.3; J.3; I.4.0

arXiv:2408.14843 [pdf, other]

Correntropy-Based Improper Likelihood Model for Robust Electrophysiological Source Imaging

Authors: Yuanhao Li, Badong Chen, Zhongxu Hu, Keita Suzuki, Wenjun Bai, Yasuharu Koike, Okito Yamashita

Abstract: Bayesian learning provides a unified skeleton to solve the electrophysiological source imaging task. From this perspective, existing source imaging algorithms utilize the Gaussian assumption for the observation noise to build the likelihood function for Bayesian inference. However, the electromagnetic measurements of brain activity are usually affected by miscellaneous artifacts, leading to a pote… ▽ More Bayesian learning provides a unified skeleton to solve the electrophysiological source imaging task. From this perspective, existing source imaging algorithms utilize the Gaussian assumption for the observation noise to build the likelihood function for Bayesian inference. However, the electromagnetic measurements of brain activity are usually affected by miscellaneous artifacts, leading to a potentially non-Gaussian distribution for the observation noise. Hence the conventional Gaussian likelihood model is a suboptimal choice for the real-world source imaging task. In this study, we aim to solve this problem by proposing a new likelihood model which is robust with respect to non-Gaussian noises. Motivated by the robust maximum correntropy criterion, we propose a new improper distribution model concerning the noise assumption. This new noise distribution is leveraged to structure a robust likelihood function and integrated with hierarchical prior distributions to estimate source activities by variational inference. In particular, the score matching is adopted to determine the hyperparameters for the improper likelihood model. A comprehensive performance evaluation is performed to compare the proposed noise assumption to the conventional Gaussian model. Simulation results show that, the proposed method can realize more precise source reconstruction by designing known ground-truth. The real-world dataset also demonstrates the superiority of our new method with the visual perception task. This study provides a new backbone for Bayesian source imaging, which would facilitate its application using real-world noisy brain signal. △ Less

Submitted 27 August, 2024; originally announced August 2024.

arXiv:2407.09044 [pdf, ps, other]

Sensorimotor Attention and Language-based Regressions in Shared Latent Variables for Integrating Robot Motion Learning and LLM

Authors: Kanata Suzuki, Tetsuya Ogata

Abstract: In recent years, studies have been actively conducted on combining large language models (LLM) and robotics; however, most have not considered end-to-end feedback in the robot-motion generation phase. The prediction of deep neural networks must contain errors, it is required to update the trained model to correspond to the real environment to generate robot motion adaptively. This study proposes a… ▽ More In recent years, studies have been actively conducted on combining large language models (LLM) and robotics; however, most have not considered end-to-end feedback in the robot-motion generation phase. The prediction of deep neural networks must contain errors, it is required to update the trained model to correspond to the real environment to generate robot motion adaptively. This study proposes an integration method that connects the robot-motion learning model and LLM using shared latent variables. When generating robot motion, the proposed method updates shared parameters based on prediction errors from both sensorimotor attention points and task language instructions given to the robot. This allows the model to search for latent parameters appropriate for the robot task efficiently. Through simulator experiments on multiple robot tasks, we demonstrated the effectiveness of our proposed method from two perspectives: position generalization and language instruction generalization abilities. △ Less

Submitted 12 July, 2024; originally announced July 2024.

Comments: 7 pages, 8 figures, accepted at IROS 2024

arXiv:2406.11310 [pdf]

Federated Active Learning Framework for Efficient Annotation Strategy in Skin-lesion Classification

Authors: Zhipeng Deng, Yuqiao Yang, Kenji Suzuki

Abstract: Federated Learning (FL) enables multiple institutes to train models collaboratively without sharing private data. Current FL research focuses on communication efficiency, privacy protection, and personalization and assumes that the data of FL have already been ideally collected. In medical scenarios, however, data annotation demands both expertise and intensive labor, which is a critical problem i… ▽ More Federated Learning (FL) enables multiple institutes to train models collaboratively without sharing private data. Current FL research focuses on communication efficiency, privacy protection, and personalization and assumes that the data of FL have already been ideally collected. In medical scenarios, however, data annotation demands both expertise and intensive labor, which is a critical problem in FL. Active learning (AL), has shown promising performance in reducing the number of data annotations in medical image analysis. We propose a federated AL (FedAL) framework in which AL is executed periodically and interactively under FL. We exploit a local model in each hospital and a global model acquired from FL to construct an ensemble. We use ensemble-entropy-based AL as an efficient data-annotation strategy in FL. Therefore, our FedAL framework can decrease the amount of annotated data and preserve patient privacy while maintaining the performance of FL. To our knowledge, this is the first FedAL framework applied to medical images. We validated our framework on real-world dermoscopic datasets. Using only 50% of samples, our framework was able to achieve state-of-the-art performance on a skin-lesion classification task. Our framework performed better than several state-of-the-art AL methods under FL and achieved comparable performance to full-data FL. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 14 pages, 3 figures

arXiv:2406.10569 [pdf, other]

MDA: An Interpretable and Scalable Multi-Modal Fusion under Missing Modalities and Intrinsic Noise Conditions

Authors: Lin Fan, Yafei Ou, Cenyang Zheng, Pengyu Dai, Tamotsu Kamishima, Masayuki Ikebe, Kenji Suzuki, Xun Gong

Abstract: Multi-modal learning has shown exceptional performance in various tasks, especially in medical applications, where it integrates diverse medical information for comprehensive diagnostic evidence. However, there still are several challenges in multi-modal learning, 1. Heterogeneity between modalities, 2. uncertainty in missing modalities, 3. influence of intrinsic noise, and 4. interpretability for… ▽ More Multi-modal learning has shown exceptional performance in various tasks, especially in medical applications, where it integrates diverse medical information for comprehensive diagnostic evidence. However, there still are several challenges in multi-modal learning, 1. Heterogeneity between modalities, 2. uncertainty in missing modalities, 3. influence of intrinsic noise, and 4. interpretability for fusion result. This paper introduces the Modal-Domain Attention (MDA) model to address the above challenges. MDA constructs linear relationships between modalities through continuous attention, due to its ability to adaptively allocate dynamic attention to different modalities, MDA can reduce attention to low-correlation data, missing modalities, or modalities with inherent noise, thereby maintaining SOTA performance across various tasks on multiple public datasets. Furthermore, our observations on the contribution of different modalities indicate that MDA aligns with established clinical diagnostic imaging gold standards and holds promise as a reference for pathologies where these standards are not yet clearly defined. The code and dataset will be available. △ Less

Submitted 17 November, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

ACM Class: I.5.2; I.2.7; I.2.10; J.3

arXiv:2406.07074 [pdf, other]

doi 10.1109/LRA.2024.3402180

A Neck Orthosis with Multi-Directional Variable Stiffness for Persons with Dropped Head Syndrome

Authors: Santiago Price Torrendell, Hideki Kadone, Modar Hassan, Yang Chen, Kousei Miura, Kenji Suzuki

Abstract: Dropped Head Syndrome (DHS) causes a passively correctable neck deformation. Currently, there is no wearable orthopedic neck brace to fulfill the needs of persons suffering from DHS. Related works have made progress in this area by creating mobile neck braces that provide head support to mitigate deformation while permitting neck mobility, which enhances user-perceived comfort and quality of life.… ▽ More Dropped Head Syndrome (DHS) causes a passively correctable neck deformation. Currently, there is no wearable orthopedic neck brace to fulfill the needs of persons suffering from DHS. Related works have made progress in this area by creating mobile neck braces that provide head support to mitigate deformation while permitting neck mobility, which enhances user-perceived comfort and quality of life. Specifically, passive designs show great potential for fully functional devices in the short term due to their inherent simplicity and compactness, although achieving suitable support presents some challenges. This work introduces a novel compliant mechanism that provides non-restrictive adjustable support for the neck's anterior and posterior flexion movements while enabling its unconstrained free rotation. The results from the experiments on non-affected persons suggest that the device provides the proposed adjustable support that unloads the muscle groups involved in supporting the head without overloading the antagonist muscle groups. Simultaneously, it was verified that the free rotation is achieved regardless of the stiffness configuration of the device. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted Manuscript

Journal ref: IEEE Robotics and Automation Letters, vol. 9, no. 7, pp. 6224-6231, July 2024

arXiv:2405.05797 [pdf]

doi 10.4018/978-1-4666-1574-8.ch013

Adaptability and Homeostasis in the Game of Life interacting with the evolved Cellular Automata

Authors: Keisuke Suzuki, Takashi Ikegami

Abstract: In this paper we study the emergence of homeostasis in a two-layer system of the Game of Life, in which the Game of Life in the first layer couples with another system of cellular automata in the second layer. Homeostasis is defined here as a space-time dynamic that regulates the number of cells in state-1 in the Game of Life layer. A genetic algorithm is used to evolve the rules of the second lay… ▽ More In this paper we study the emergence of homeostasis in a two-layer system of the Game of Life, in which the Game of Life in the first layer couples with another system of cellular automata in the second layer. Homeostasis is defined here as a space-time dynamic that regulates the number of cells in state-1 in the Game of Life layer. A genetic algorithm is used to evolve the rules of the second layer to control the pattern of the Game of Life. We discovered that there are two antagonistic attractors that control the numbers of cells in state-1 in the first layer. The homeostasis sustained by these attractors are compared with the homeostatic dynamics observed in Daisy World. △ Less

Submitted 9 May, 2024; originally announced May 2024.

Journal ref: Nature-Inspired Computing Design, Development, and Applications, edited by Leandro Nunes de Castro, 232-254. Hershey, PA: IGI Global, 2012

arXiv:2404.05039 [pdf, other]

StaccaToe: A Single-Leg Robot that Mimics the Human Leg and Toe

Authors: Nisal Perera, Shangqun Yu, Daniel Marew, Mack Tang, Ken Suzuki, Aidan McCormack, Shifan Zhu, Yong-Jae Kim, Donghyun Kim

Abstract: We introduce StaccaToe, a human-scale, electric motor-powered single-leg robot designed to rival the agility of human locomotion through two distinctive attributes: an actuated toe and a co-actuation configuration inspired by the human leg. Leveraging the foundational design of HyperLeg's lower leg mechanism, we develop a stand-alone robot by incorporating new link designs, custom-designed power e… ▽ More We introduce StaccaToe, a human-scale, electric motor-powered single-leg robot designed to rival the agility of human locomotion through two distinctive attributes: an actuated toe and a co-actuation configuration inspired by the human leg. Leveraging the foundational design of HyperLeg's lower leg mechanism, we develop a stand-alone robot by incorporating new link designs, custom-designed power electronics, and a refined control system. Unlike previous jumping robots that rely on either special mechanisms (e.g., springs and clutches) or hydraulic/pneumatic actuators, StaccaToe employs electric motors without energy storage mechanisms. This choice underscores our ultimate goal of developing a practical, high-performance humanoid robot capable of human-like, stable walking as well as explosive dynamic movements. In this paper, we aim to empirically evaluate the balance capability and the exertion of explosive ground reaction forces of our toe and co-actuation mechanisms. Throughout extensive hardware and controller development, StaccaToe showcases its control fidelity by demonstrating a balanced tip-toe stance and dynamic jump. This study is significant for three key reasons: 1) StaccaToe represents the first human-scale, electric motor-driven single-leg robot to execute dynamic maneuvers without relying on specialized mechanisms; 2) our research provides empirical evidence of the benefits of replicating critical human leg attributes in robotic design; and 3) we explain the design process for creating agile legged robots, the details that have been scantily covered in academic literature. △ Less

Submitted 7 April, 2024; originally announced April 2024.

Comments: Submitted to 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

arXiv:2312.02182 [pdf, ps, other]

Adam-like Algorithm with Smooth Clipping Attains Global Minima: Analysis Based on Ergodicity of Functional SDEs

Authors: Keisuke Suzuki

Abstract: In this paper, we prove that an Adam-type algorithm with smooth clipping approaches the global minimizer of the regularized non-convex loss function. Adding smooth clipping and taking the state space as the set of all trajectories, we can apply the ergodic theory of Markov semigroups for this algorithm and investigate its asymptotic behavior. The ergodic theory we establish in this paper reduces t… ▽ More In this paper, we prove that an Adam-type algorithm with smooth clipping approaches the global minimizer of the regularized non-convex loss function. Adding smooth clipping and taking the state space as the set of all trajectories, we can apply the ergodic theory of Markov semigroups for this algorithm and investigate its asymptotic behavior. The ergodic theory we establish in this paper reduces the problem of evaluating the convergence, generalization error and discretization error of this algorithm to the problem of evaluating the difference between two functional stochastic differential equations (SDEs) with different drift coefficients. As a result of our analysis, we have shown that this algorithm minimizes the the regularized non-convex loss function with errors of the form $n^{-1/2}$, $η^{1/4}$, $β^{-1} \log (β+ 1)$ and $e^{- c t}$. Here, $c$ is a constant and $n$, $η$, $β$ and $t$ denote the size of the training dataset, learning rate, inverse temperature and time, respectively. △ Less

Submitted 29 November, 2023; originally announced December 2023.

arXiv:2312.01543 [pdf, other]

Torso-Based Control Interface for Standing Mobility-Assistive Devices

Authors: Yang Chen, Diego Paez-Granados, Modar Hassan, Kenji Suzuki

Abstract: Wheelchairs and mobility devices have transformed our bodies into cybernic systems, enhancing our well-being by enabling individuals with reduced mobility to regain freedom. Notwithstanding, current interfaces of control primarily rely on hand operation, therefore constraining the user from performing functional activities of daily living. In this work, we propose a design of a torso-based control… ▽ More Wheelchairs and mobility devices have transformed our bodies into cybernic systems, enhancing our well-being by enabling individuals with reduced mobility to regain freedom. Notwithstanding, current interfaces of control primarily rely on hand operation, therefore constraining the user from performing functional activities of daily living. In this work, we propose a design of a torso-based control interface with compliant coupling support for standing mobility assistive devices. We consider the coupling between the human and robot in the interface design. The design includes a compliant support mechanism and mapping between the body movement space and the velocity space. We present experiments including multiple conditions, with a joystick for comparison with the proposed torso control interface. The results of a path-following experiment demonstrated that users could control the device naturally using the hands-free interface, and the performance was comparable with the joystick, with 10% more consumed time, an average cross error of 0.116 m and 4.9% less average acceleration. In an object-transferring experiment, the proposed interface demonstrated a clear advantage when users needed to manipulate objects during locomotion. Lastly, the torso control scored 15% less than the joystick on the system usability scale for the path-following task but 3.3% more for the object-transferring task. △ Less

Submitted 27 October, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

Comments: Accepted by IEEE/ASME Transactions on Mechatronics

arXiv:2311.05470 [pdf, other]

Designing ship hull forms using generative adversarial networks

Authors: Kazuo Yonekura, Kotaro Omori, Xinran Qi, Katsuyuki Suzuki

Abstract: We proposed a GAN-based method to generate a ship hull form. Unlike mathematical hull forms that require geometrical parameters to generate ship hull forms, the proposed method requires desirable ship performance parameters, i.e., the drag coefficient and tonnage. The requirements of ship owners are generally focused on the ship performance and not the geometry itself. Hence, the proposed model is… ▽ More We proposed a GAN-based method to generate a ship hull form. Unlike mathematical hull forms that require geometrical parameters to generate ship hull forms, the proposed method requires desirable ship performance parameters, i.e., the drag coefficient and tonnage. The requirements of ship owners are generally focused on the ship performance and not the geometry itself. Hence, the proposed model is useful for obtaining the ship hull form based on an owner's requirements. The GAN model was trained using a ship hull form dataset generated using the generalized Wigley hull form. The proposed method was evaluated through numerical experiments and successfully generated ship data with small errors. △ Less

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2311.05445 [pdf, other]

Airfoil generation and feature extraction using the conditional VAE-WGAN-gp

Authors: Kazuo Yonekura, Yuki Tomori, Katsuyuki Suzuki

Abstract: A machine learning method was applied to solve an inverse airfoil design problem. A conditional VAE-WGAN-gp model, which couples the conditional variational autoencoder (VAE) and Wasserstein generative adversarial network with gradient penalty (WGAN-gp), is proposed for an airfoil generation method, and then it is compared with the WGAN-gp and VAE models. The VAEGAN model couples the VAE and GAN m… ▽ More A machine learning method was applied to solve an inverse airfoil design problem. A conditional VAE-WGAN-gp model, which couples the conditional variational autoencoder (VAE) and Wasserstein generative adversarial network with gradient penalty (WGAN-gp), is proposed for an airfoil generation method, and then it is compared with the WGAN-gp and VAE models. The VAEGAN model couples the VAE and GAN models, which enables feature extraction in the GAN models. In airfoil generation tasks, to generate airfoil shapes that satisfy lift coefficient requirements, it is known that VAE outperforms WGAN-gp with respect to the accuracy of the reproduction of the lift coefficient, whereas GAN outperforms VAE with respect to the smoothness and variations of generated shapes. In this study, VAE-WGAN-gp demonstrated a good performance in all three aspects. Latent distribution was also studied to compare the feature extraction ability of the proposed method. △ Less

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2309.14837 [pdf, other]

Realtime Motion Generation with Active Perception Using Attention Mechanism for Cooking Robot

Authors: Namiko Saito, Mayu Hiramoto, Ayuna Kubo, Kanata Suzuki, Hiroshi Ito, Shigeki Sugano, Tetsuya Ogata

Abstract: To support humans in their daily lives, robots are required to autonomously learn, adapt to objects and environments, and perform the appropriate actions. We tackled on the task of cooking scrambled eggs using real ingredients, in which the robot needs to perceive the states of the egg and adjust stirring movement in real time, while the egg is heated and the state changes continuously. In previou… ▽ More To support humans in their daily lives, robots are required to autonomously learn, adapt to objects and environments, and perform the appropriate actions. We tackled on the task of cooking scrambled eggs using real ingredients, in which the robot needs to perceive the states of the egg and adjust stirring movement in real time, while the egg is heated and the state changes continuously. In previous works, handling changing objects was found to be challenging because sensory information includes dynamical, both important or noisy information, and the modality which should be focused on changes every time, making it difficult to realize both perception and motion generation in real time. We propose a predictive recurrent neural network with an attention mechanism that can weigh the sensor input, distinguishing how important and reliable each modality is, that realize quick and efficient perception and motion generation. The model is trained with learning from the demonstration, and allows the robot to acquire human-like skills. We validated the proposed technique using the robot, Dry-AIREC, and with our learning model, it could perform cooking eggs with unknown ingredients. The robot could change the method of stirring and direction depending on the status of the egg, as in the beginning it stirs in the whole pot, then subsequently, after the egg started being heated, it starts flipping and splitting motion targeting specific areas, although we did not explicitly indicate them. △ Less

Submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.14231 [pdf]

Mixed variable structural optimization using mixed variable system Monte Carlo tree search formulation

Authors: Fu-Yao Ko, Katsuyuki Suzuki, Kazuo Yonekura

Abstract: A novel method called mixed variable system Monte Carlo tree search (MVSMCTS) formulation is presented for optimization problems considering various types of variables with single and mixed continuous-discrete system. This method utilizes a reinforcement learning algorithm with improved Monte Carlo tree search (IMCTS) formulation. For sizing and shape optimization of truss structures, the design v… ▽ More A novel method called mixed variable system Monte Carlo tree search (MVSMCTS) formulation is presented for optimization problems considering various types of variables with single and mixed continuous-discrete system. This method utilizes a reinforcement learning algorithm with improved Monte Carlo tree search (IMCTS) formulation. For sizing and shape optimization of truss structures, the design variables are the cross-sectional areas of the members and the nodal coordinates of the joints. MVSMCTS incorporates update process and accelerating technique for continuous variable and combined scheme for single and mixed system. Update process indicates that once a solution is determined by MCTS with automatic mesh generation in continuous space, it is used as the initial solution for next search tree. The search region should be expanded from the mid-point, which is the design variable for initial state. Accelerating technique is developed by decreasing the range of search region and the width of search tree based on the number of meshes during update process. Combined scheme means that various types of variables are coupled in only one search tree. Through several examples, it is demonstrated that this framework is suitable for mixed variable structural optimization. Moreover, the agent can find optimal solution in a reasonable time, stably generates an optimal design, and is applicable for practical engineering problems. △ Less

Submitted 29 October, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

Comments: 37 pages, 19 figures, 7 tables. arXiv admin note: text overlap with arXiv:2309.06045

arXiv:2309.11040 [pdf, other]

Stein Variational Guided Model Predictive Path Integral Control: Proposal and Experiments with Fast Maneuvering Vehicles

Authors: Kohei Honda, Naoki Akai, Kosuke Suzuki, Mizuho Aoki, Hirotaka Hosogaya, Hiroyuki Okuda, Tatsuya Suzuki

Abstract: This paper presents a novel Stochastic Optimal Control (SOC) method based on Model Predictive Path Integral control (MPPI), named Stein Variational Guided MPPI (SVG-MPPI), designed to handle rapidly shifting multimodal optimal action distributions. While MPPI can find a Gaussian-approximated optimal action distribution in closed form, i.e., without iterative solution updates, it struggles with the… ▽ More This paper presents a novel Stochastic Optimal Control (SOC) method based on Model Predictive Path Integral control (MPPI), named Stein Variational Guided MPPI (SVG-MPPI), designed to handle rapidly shifting multimodal optimal action distributions. While MPPI can find a Gaussian-approximated optimal action distribution in closed form, i.e., without iterative solution updates, it struggles with the multimodality of the optimal distributions. This is due to the less representative nature of the Gaussian. To overcome this limitation, our method aims to identify a target mode of the optimal distribution and guide the solution to converge to fit it. In the proposed method, the target mode is roughly estimated using a modified Stein Variational Gradient Descent (SVGD) method and embedded into the MPPI algorithm to find a closed-form "mode-seeking" solution that covers only the target mode, thus preserving the fast convergence property of MPPI. Our simulation and real-world experimental results demonstrate that SVG-MPPI outperforms both the original MPPI and other state-of-the-art sampling-based SOC algorithms in terms of path-tracking and obstacle-avoidance capabilities. Source code: https://github.com/kohonda/proj-svg_mppi △ Less

Submitted 29 February, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

Comments: 7 pages, 5 figures

arXiv:2309.06045 [pdf]

Improved Monte Carlo tree search formulation with multiple root nodes for discrete sizing optimization of truss structures

Authors: Fu-Yao Ko, Katsuyuki Suzuki, Kazuo Yonekura

Abstract: This paper proposes a novel reinforcement learning (RL) algorithm using improved Monte Carlo tree search (IMCTS) formulation for discrete optimum design of truss structures. IMCTS with multiple root nodes includes update process, the best reward, accelerating technique, and terminal condition. Update process means that once a final solution is found, it is used as the initial solution for next sea… ▽ More This paper proposes a novel reinforcement learning (RL) algorithm using improved Monte Carlo tree search (IMCTS) formulation for discrete optimum design of truss structures. IMCTS with multiple root nodes includes update process, the best reward, accelerating technique, and terminal condition. Update process means that once a final solution is found, it is used as the initial solution for next search tree. The best reward is used in the backpropagation step. Accelerating technique is introduced by decreasing the width of search tree and reducing maximum number of iterations. The agent is trained to minimize the total structural weight under various constraints until the terminal condition is satisfied. Then, optimal solution is the minimum value of all solutions found by search trees. These numerical examples show that the agent can find optimal solution with low computational cost, stably produces an optimal design, and is suitable for multi-objective structural optimization and large-scale structures. △ Less

Submitted 7 August, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

Comments: 34 pages, 24 figures, 16 tables

arXiv:2308.15684 [pdf, ps, other]

Interactively Robot Action Planning with Uncertainty Analysis and Active Questioning by Large Language Model

Authors: Kazuki Hori, Kanata Suzuki, Tetsuya Ogata

Abstract: The application of the Large Language Model (LLM) to robot action planning has been actively studied. The instructions given to the LLM by natural language may include ambiguity and lack of information depending on the task context. It is possible to adjust the output of LLM by making the instruction input more detailed; however, the design cost is high. In this paper, we propose the interactive r… ▽ More The application of the Large Language Model (LLM) to robot action planning has been actively studied. The instructions given to the LLM by natural language may include ambiguity and lack of information depending on the task context. It is possible to adjust the output of LLM by making the instruction input more detailed; however, the design cost is high. In this paper, we propose the interactive robot action planning method that allows the LLM to analyze and gather missing information by asking questions to humans. The method can minimize the design cost of generating precise robot instructions. We demonstrated the effectiveness of our method through concrete examples in cooking tasks. However, our experiments also revealed challenges in robot action planning with LLM, such as asking unimportant questions and assuming crucial information without asking. Shedding light on these issues provides valuable insights for future research on utilizing LLM for robotics. △ Less

Submitted 18 October, 2023; v1 submitted 29 August, 2023; originally announced August 2023.

Comments: 7 pages, 6 figures, accepted at SII 2024

arXiv:2308.10038 [pdf, other]

Physics-guided training of GAN to improve accuracy in airfoil design synthesis

Authors: Kazunari Wada, Katsuyuki Suzuki, Kazuo Yonekura

Abstract: Generative adversarial networks (GAN) have recently been used for a design synthesis of mechanical shapes. A GAN sometimes outputs physically unreasonable shapes. For example, when a GAN model is trained to output airfoil shapes that indicate required aerodynamic performance, significant errors occur in the performance values. This is because the GAN model only considers data but does not consider… ▽ More Generative adversarial networks (GAN) have recently been used for a design synthesis of mechanical shapes. A GAN sometimes outputs physically unreasonable shapes. For example, when a GAN model is trained to output airfoil shapes that indicate required aerodynamic performance, significant errors occur in the performance values. This is because the GAN model only considers data but does not consider the aerodynamic equations that lie under the data. This paper proposes the physics-guided training of the GAN model to guide the model to learn physical validity. Physical validity is computed using general-purpose software located outside the neural network model. Such general-purpose software cannot be used in physics-informed neural network frameworks, because physical equations must be implemented inside the neural network models. Additionally, a limitation of generative models is that the output data are similar to the training data and cannot generate completely new shapes. However, because the proposed model is guided by a physical model and does not use a training dataset, it can generate completely new shapes. Numerical experiments show that the proposed model drastically improves the accuracy. Moreover, the output shapes differ from those of the training dataset but still satisfy the physical validity, overcoming the limitations of existing GAN models. △ Less

Submitted 19 August, 2023; originally announced August 2023.

arXiv:2308.01648 [pdf, other]

Improving Wind Resistance Performance of Cascaded PID Controlled Quadcopters using Residual Reinforcement Learning

Authors: Yu Ishihara, Yuichi Hazama, Kousuke Suzuki, Jerry Jun Yokono, Kohtaro Sabe, Kenta Kawamoto

Abstract: Wind resistance control is an essential feature for quadcopters to maintain their position to avoid deviation from target position and prevent collisions with obstacles. Conventionally, cascaded PID controller is used for the control of quadcopters for its simplicity and ease of tuning its parameters. However, it is weak against wind disturbances and the quadcopter can easily deviate from target p… ▽ More Wind resistance control is an essential feature for quadcopters to maintain their position to avoid deviation from target position and prevent collisions with obstacles. Conventionally, cascaded PID controller is used for the control of quadcopters for its simplicity and ease of tuning its parameters. However, it is weak against wind disturbances and the quadcopter can easily deviate from target position. In this work, we propose a residual reinforcement learning based approach to build a wind resistance controller of a quadcopter. By learning only the residual that compensates the disturbance, we can continue using the cascaded PID controller as the base controller of the quadcopter but improve its performance against wind disturbances. To avoid unexpected crashes and destructions of quadcopters, our method does not require real hardware for data collection and training. The controller is trained only on a simulator and directly applied to the target hardware without extra finetuning process. We demonstrate the effectiveness of our approach through various experiments including an experiment in an outdoor scene with wind speed greater than 13 m/s. Despite its simplicity, our controller reduces the position deviation by approximately 50% compared to the quadcopter controlled with the conventional cascaded PID controller. Furthermore, trained controller is robust and preserves its performance even though the quadcopter's mass and propeller's lift coefficient is changed between 50% to 150% from original training time. △ Less

Submitted 3 August, 2023; originally announced August 2023.

arXiv:2307.03338 [pdf]

doi 10.1016/j.techfore.2024.123692

From Conservatism to Innovation: The Sequential and Iterative Process of Smart Livestock Technology Adoption in Japanese Small-Farm Systems

Authors: Takumi Ohashi, Miki Saijo, Kento Suzuki, Shinsuke Arafuka

Abstract: As global demand for animal products is projected to increase significantly by 2050, driven by population growth and increased incomes, smart livestock technologies are essential for improving efficiency, animal welfare, and environmental sustainability. Conducted within the unique agricultural context of Japan, characterized by small-scale, family-run farms and strong government protection polici… ▽ More As global demand for animal products is projected to increase significantly by 2050, driven by population growth and increased incomes, smart livestock technologies are essential for improving efficiency, animal welfare, and environmental sustainability. Conducted within the unique agricultural context of Japan, characterized by small-scale, family-run farms and strong government protection policies, our study builds upon traditional theoretical frameworks that often oversimplify farmers' decision-making processes. By employing a scoping review, expert interviews, and a Modified Grounded Theory Approach, our research uncovers the intricate interplay between individual farmer values, farm management policies, social relations, agricultural policies, and livestock industry trends. We particularly highlight the unique dynamics within family-owned businesses, noting the tension between an "advanced management mindset" and "conservatism." Our study reveals that technology adoption is a sequential and iterative process, influenced by technology availability, farmers' digital literacy, technology implementation support, and observable technology impacts on animal health and productivity. These insights highlight the need for tailored support mechanisms and policies to enhance technology uptake, thereby promoting sustainable and efficient livestock production system. △ Less

Submitted 17 June, 2024; v1 submitted 6 July, 2023; originally announced July 2023.

Comments: 58 pages, 3 figures

MSC Class: 91C99 ACM Class: J.4

arXiv:2306.14714 [pdf, ps, other]

Deep Predictive Learning: Motion Learning Concept inspired by Cognitive Robotics

Authors: Kanata Suzuki, Hiroshi Ito, Tatsuro Yamada, Kei Kase, Tetsuya Ogata

Abstract: Bridging the gap between motion models and reality is crucial by using limited data to deploy robots in the real world. Deep learning is expected to be generalized to diverse situations while reducing feature design costs through end-to-end learning for environmental recognition and motion generation. However, data collection for model training is costly, and time and human resources are essential… ▽ More Bridging the gap between motion models and reality is crucial by using limited data to deploy robots in the real world. Deep learning is expected to be generalized to diverse situations while reducing feature design costs through end-to-end learning for environmental recognition and motion generation. However, data collection for model training is costly, and time and human resources are essential for robot trial-and-error with physical contact. We propose "Deep Predictive Learning," a motion learning concept that predicts the robot's sensorimotor dynamics, assuming imperfections in the prediction model. The predictive coding theory inspires this concept to solve the above problems. It is based on the fundamental strategy of predicting the near-future sensorimotor states of robots and online minimization of the prediction error between the real world and the model. Based on the acquired sensor information, the robot can adjust its behavior in real time, thereby tolerating the difference between the learning experience and reality. Additionally, the robot was expected to perform a wide range of tasks by combining the motion dynamics embedded in the model. This paper describes the proposed concept, its implementation, and examples of its applications in real robots. The code and documents are available at: https://ogata-lab.github.io/eipl-docs △ Less

Submitted 14 March, 2024; v1 submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.02273 [pdf, ps, other]

End-to-End Joint Target and Non-Target Speakers ASR

Authors: Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando

Abstract: This paper proposes a novel automatic speech recognition (ASR) system that can transcribe individual speaker's speech while identifying whether they are target or non-target speakers from multi-talker overlapped speech. Target-speaker ASR systems are a promising way to only transcribe a target speaker's speech by enrolling the target speaker's information. However, in conversational ASR applicatio… ▽ More This paper proposes a novel automatic speech recognition (ASR) system that can transcribe individual speaker's speech while identifying whether they are target or non-target speakers from multi-talker overlapped speech. Target-speaker ASR systems are a promising way to only transcribe a target speaker's speech by enrolling the target speaker's information. However, in conversational ASR applications, transcribing both the target speaker's speech and non-target speakers' ones is often required to understand interactive information. To naturally consider both target and non-target speakers in a single ASR model, our idea is to extend autoregressive modeling-based multi-talker ASR systems to utilize the enrollment speech of the target speaker. Our proposed ASR is performed by recursively generating both textual tokens and tokens that represent target or non-target speakers. Our experiments demonstrate the effectiveness of our proposed method. △ Less

Submitted 4 June, 2023; originally announced June 2023.

Comments: Accepted at Interspeech 2023

arXiv:2303.06413 [pdf, other]

Design of a Multi-Degree-of-Freedom Elastic Neck Exoskeleton for Persons with Dropped Head Syndrome

Authors: Santiago Price Torrendell, Yang Chen, Hideki Kadone, Modar Hassan, Kenji Suzuki

Abstract: Nonsurgical treatment of Dropped Head Syndrome (DHS) incurs the use of collar-type orthoses that immobilize the neck and cause discomfort and sores under the chin. Articulated orthoses have the potential to support the head posture while allowing partial mobility of the neck and reduced discomfort and sores. This work presents the design, modeling, development, and characterization of a novel mult… ▽ More Nonsurgical treatment of Dropped Head Syndrome (DHS) incurs the use of collar-type orthoses that immobilize the neck and cause discomfort and sores under the chin. Articulated orthoses have the potential to support the head posture while allowing partial mobility of the neck and reduced discomfort and sores. This work presents the design, modeling, development, and characterization of a novel multi-degree-of-freedom elastic mechanism designed for neck support. This new type of elastic mechanism allows the bending of the head in the sagittal and coronal planes, and head rotations in the transverse plane. From these articulate movements, the mechanism generates moments that restore the head and neck to the upright posture, thus compensating for the muscle weakness caused by DHS. The experimental results show adherence to the empirical characterization of the elastic mechanism under flexion to the model-based calculations. A neck support orthosis prototype based on the proposed mechanism is presented, which enables the three before-mentioned head motions of a healthy participant, according to the results of preliminary tests. △ Less

Submitted 11 March, 2023; originally announced March 2023.

Comments: 6th IEEE-RAS International Conference on Soft Robotics (RoboSoft), 2023

arXiv:2303.06058 [pdf, other]

A General Recipe for the Analysis of Randomized Multi-Armed Bandit Algorithms

Authors: Dorian Baudry, Kazuya Suzuki, Junya Honda

Abstract: In this paper we propose a general methodology to derive regret bounds for randomized multi-armed bandit algorithms. It consists in checking a set of sufficient conditions on the sampling probability of each arm and on the family of distributions to prove a logarithmic regret. As a direct application we revisit two famous bandit algorithms, Minimum Empirical Divergence (MED) and Thompson Sampling… ▽ More In this paper we propose a general methodology to derive regret bounds for randomized multi-armed bandit algorithms. It consists in checking a set of sufficient conditions on the sampling probability of each arm and on the family of distributions to prove a logarithmic regret. As a direct application we revisit two famous bandit algorithms, Minimum Empirical Divergence (MED) and Thompson Sampling (TS), under various models for the distributions including single parameter exponential families, Gaussian distributions, bounded distributions, or distributions satisfying some conditions on their moments. In particular, we prove that MED is asymptotically optimal for all these models, but also provide a simple regret analysis of some TS algorithms for which the optimality is already known. We then further illustrate the interest of our approach, by analyzing a new Non-Parametric TS algorithm (h-NPTS), adapted to some families of unbounded reward distributions with a bounded h-moment. This model can for instance capture some non-parametric families of distributions whose variance is upper bounded by a known constant. △ Less

Submitted 13 November, 2024; v1 submitted 10 March, 2023; originally announced March 2023.

arXiv:2212.02024 [pdf, other]

Fine-grained Image Editing by Pixel-wise Guidance Using Diffusion Models

Authors: Naoki Matsunaga, Masato Ishii, Akio Hayakawa, Kenji Suzuki, Takuya Narihira

Abstract: Our goal is to develop fine-grained real-image editing methods suitable for real-world applications. In this paper, we first summarize four requirements for these methods and propose a novel diffusion-based image editing framework with pixel-wise guidance that satisfies these requirements. Specifically, we train pixel-classifiers with a few annotated data and then infer the segmentation map of a t… ▽ More Our goal is to develop fine-grained real-image editing methods suitable for real-world applications. In this paper, we first summarize four requirements for these methods and propose a novel diffusion-based image editing framework with pixel-wise guidance that satisfies these requirements. Specifically, we train pixel-classifiers with a few annotated data and then infer the segmentation map of a target image. Users then manipulate the map to instruct how the image will be edited. We utilize a pre-trained diffusion model to generate edited images aligned with the user's intention with pixel-wise guidance. The effective combination of proposed guidance and other techniques enables highly controllable editing with preserving the outside of the edited area, which results in meeting our requirements. The experimental results demonstrate that our proposal outperforms the GAN-based method for editing quality and speed. △ Less

Submitted 31 May, 2023; v1 submitted 4 December, 2022; originally announced December 2022.

Comments: Accepted by AI for Content Creation (AI4CC) workshop at CVPR 2023

arXiv:2212.00285 [pdf]

doi 10.1002/wcs.1662

Hybrid Life: Integrating Biological, Artificial, and Cognitive Systems

Authors: Manuel Baltieri, Hiroyuki Iizuka, Olaf Witkowski, Lana Sinapayen, Keisuke Suzuki

Abstract: Artificial life is a research field studying what processes and properties define life, based on a multidisciplinary approach spanning the physical, natural and computational sciences. Artificial life aims to foster a comprehensive study of life beyond "life as we know it" and towards "life as it could be", with theoretical, synthetic and empirical models of the fundamental properties of living sy… ▽ More Artificial life is a research field studying what processes and properties define life, based on a multidisciplinary approach spanning the physical, natural and computational sciences. Artificial life aims to foster a comprehensive study of life beyond "life as we know it" and towards "life as it could be", with theoretical, synthetic and empirical models of the fundamental properties of living systems. While still a relatively young field, artificial life has flourished as an environment for researchers with different backgrounds, welcoming ideas and contributions from a wide range of subjects. Hybrid Life is an attempt to bring attention to some of the most recent developments within the artificial life community, rooted in more traditional artificial life studies but looking at new challenges emerging from interactions with other fields. In particular, Hybrid Life focuses on three complementary themes: 1) theories of systems and agents, 2) hybrid augmentation, with augmented architectures combining living and artificial systems, and 3) hybrid interactions among artificial and biological systems. After discussing some of the major sources of inspiration for these themes, we will focus on an overview of the works that appeared in Hybrid Life special sessions, hosted by the annual Artificial Life Conference between 2018 and 2022. △ Less

Submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.01749 [pdf, other]

Enhanced Visual Feedback with Decoupled Viewpoint Control in Immersive Humanoid Robot Teleoperation using SLAM

Authors: Yang Chen, Leyuan Sun, Mehdi Benallegue, Rafael Cisneros, Rohan P. Singh, Kenji Kaneko, Arnaud Tanguy, Guillaume Caron, Kenji Suzuki, Abderrahmane Kheddar, Fumio Kanehiro

Abstract: In immersive humanoid robot teleoperation, there are three main shortcomings that can alter the transparency of the visual feedback: the lag between the motion of the operator's and robot's head due to network communication delays or slow robot joint motion. This latency could cause a noticeable delay in the visual feedback, which jeopardizes the embodiment quality, can cause dizziness, and affect… ▽ More In immersive humanoid robot teleoperation, there are three main shortcomings that can alter the transparency of the visual feedback: the lag between the motion of the operator's and robot's head due to network communication delays or slow robot joint motion. This latency could cause a noticeable delay in the visual feedback, which jeopardizes the embodiment quality, can cause dizziness, and affects the interactivity resulting in operator frequent motion pauses for the visual feedback to settle; (ii) the mismatch between the camera's and the headset's field-of-views (FOV), the former having generally a lower FOV; and (iii) a mismatch between human's and robot's range of motions of the neck, the latter being also generally lower. In order to leverage these drawbacks, we developed a decoupled viewpoint control solution for a humanoid platform which allows visual feedback with low-latency and artificially increases the camera's FOV range to match that of the operator's headset. Our novel solution uses SLAM technology to enhance the visual feedback from a reconstructed mesh, complementing the areas that are not covered by the visual feedback from the robot. The visual feedback is presented as a point cloud in real-time to the operator. As a result, the operator is fed with real-time vision from the robot's head orientation by observing the pose of the point cloud. Balancing this kind of awareness and immersion is important in virtual reality based teleoperation, considering the safety and robustness of the control system. An experiment shows the effectiveness of our solution. △ Less

Submitted 3 November, 2022; originally announced November 2022.

Comments: IEEE-RAS International Conference on Humanoid Robots (Humanoids 2022)

arXiv:2210.15937 [pdf, other]

On the Use of Modality-Specific Large-Scale Pre-Trained Encoders for Multimodal Sentiment Analysis

Authors: Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato

Abstract: This paper investigates the effectiveness and implementation of modality-specific large-scale pre-trained encoders for multimodal sentiment analysis~(MSA). Although the effectiveness of pre-trained encoders in various fields has been reported, conventional MSA methods employ them for only linguistic modality, and their application has not been investigated. This paper compares the features yielded… ▽ More This paper investigates the effectiveness and implementation of modality-specific large-scale pre-trained encoders for multimodal sentiment analysis~(MSA). Although the effectiveness of pre-trained encoders in various fields has been reported, conventional MSA methods employ them for only linguistic modality, and their application has not been investigated. This paper compares the features yielded by large-scale pre-trained encoders with conventional heuristic features. One each of the largest pre-trained encoders publicly available for each modality are used; CLIP-ViT, WavLM, and BERT for visual, acoustic, and linguistic modalities, respectively. Experiments on two datasets reveal that methods with domain-specific pre-trained encoders attain better performance than those with conventional features in both unimodal and multimodal scenarios. We also find it better to use the outputs of the intermediate layers of the encoders than those of the output layer. The codes are available at https://github.com/ando-hub/MSA_Pretrain. △ Less

Submitted 28 October, 2022; originally announced October 2022.

Comments: Accepted to SLT 2022

arXiv:2210.13368 [pdf, other]

System Configuration and Navigation of a Guide Dog Robot: Toward Animal Guide Dog-Level Guiding Work

Authors: Hochul Hwang, Tim Xia, Ibrahima Keita, Ken Suzuki, Joydeep Biswas, Sunghoon I. Lee, Donghyun Kim

Abstract: A robot guide dog has compelling advantages over animal guide dogs for its cost-effectiveness, potential for mass production, and low maintenance burden. However, despite the long history of guide dog robot research, previous studies were conducted with little or no consideration of how the guide dog handler and the guide dog work as a team for navigation. To develop a robotic guiding system that… ▽ More A robot guide dog has compelling advantages over animal guide dogs for its cost-effectiveness, potential for mass production, and low maintenance burden. However, despite the long history of guide dog robot research, previous studies were conducted with little or no consideration of how the guide dog handler and the guide dog work as a team for navigation. To develop a robotic guiding system that is genuinely beneficial to blind or visually impaired individuals, we performed qualitative research, including interviews with guide dog handlers and trainers and first-hand blindfold walking experiences with various guide dogs. Grounded on the facts learned from vivid experience and interviews, we build a collaborative indoor navigation scheme for a guide dog robot that includes preferred features such as speed and directional control. For collaborative navigation, we propose a semantic-aware local path planner that enables safe and efficient guiding work by utilizing semantic information about the environment and considering the handler's position and directional cues to determine the collision-free path. We evaluate our integrated robotic system by testing guide blindfold walking in indoor settings and demonstrate guide dog-like navigation behavior by avoiding obstacles at typical gait speed ($0.7 \mathrm{m/s}$). △ Less

Submitted 24 October, 2022; originally announced October 2022.

Comments: First two authors contributed equally

arXiv:2209.10712 [pdf, other]

doi 10.3169/mta.12.110

Compressing Sign Information in DCT-based Image Coding via Deep Sign Retrieval

Authors: Kei Suzuki, Chihiro Tsutake, Keita Takahashi, Toshiaki Fujii

Abstract: Compressing the sign information of discrete cosine transform (DCT) coefficients is an intractable problem in image coding schemes due to the equiprobable characteristics of the signs. To overcome this difficulty, we propose an efficient compression method for the sign information called "sign retrieval." This method is inspired by phase retrieval, which is a classical signal restoration problem o… ▽ More Compressing the sign information of discrete cosine transform (DCT) coefficients is an intractable problem in image coding schemes due to the equiprobable characteristics of the signs. To overcome this difficulty, we propose an efficient compression method for the sign information called "sign retrieval." This method is inspired by phase retrieval, which is a classical signal restoration problem of finding the phase information of discrete Fourier transform coefficients from their magnitudes. The sign information of all DCT coefficients is excluded from a bitstream at the encoder and is complemented at the decoder through our sign retrieval method. We show through experiments that our method outperforms previous ones in terms of the bit amount for the signs and computation cost. Our method, implemented in Python language, is available from https://github.com/ctsutake/dsr. △ Less

Submitted 10 May, 2024; v1 submitted 21 September, 2022; originally announced September 2022.

Journal ref: ITE Transactions on Media Technology and Applications 12 (2024) 110-122

arXiv:2208.02121 [pdf, other]

Pedestrian-Robot Interactions on Autonomous Crowd Navigation: Reactive Control Methods and Evaluation Metrics

Authors: Diego Paez-Granados, Yujie He, David Gonon, Dan Jia, Bastian Leibe, Kenji Suzuki, Aude Billard

Abstract: Autonomous navigation in highly populated areas remains a challenging task for robots because of the difficulty in guaranteeing safe interactions with pedestrians in unstructured situations. In this work, we present a crowd navigation control framework that delivers continuous obstacle avoidance and post-contact control evaluated on an autonomous personal mobility vehicle. We propose evaluation me… ▽ More Autonomous navigation in highly populated areas remains a challenging task for robots because of the difficulty in guaranteeing safe interactions with pedestrians in unstructured situations. In this work, we present a crowd navigation control framework that delivers continuous obstacle avoidance and post-contact control evaluated on an autonomous personal mobility vehicle. We propose evaluation metrics for accounting efficiency, controller response and crowd interactions in natural crowds. We report the results of over 110 trials in different crowd types: sparse, flows, and mixed traffic, with low- (< 0.15 ppsm), mid- (< 0.65 ppsm), and high- (< 1 ppsm) pedestrian densities. We present comparative results between two low-level obstacle avoidance methods and a baseline of shared control. Results show a 10% drop in relative time to goal on the highest density tests, and no other efficiency metric decrease. Moreover, autonomous navigation showed to be comparable to shared-control navigation with a lower relative jerk and significantly higher fluency in commands indicating high compatibility with the crowd. We conclude that the reactive controller fulfils a necessary task of fast and continuous adaptation to crowd navigation, and it should be coupled with high-level planners for environmental and situational awareness. △ Less

Submitted 3 August, 2022; originally announced August 2022.

Journal ref: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS-2022)

arXiv:2206.04780 [pdf, other]

doi 10.23919/APSIPAASC55919.2022.9980306

Speak Like a Dog: Human to Non-human creature Voice Conversion

Authors: Kohei Suzuki, Shoki Sakamoto, Tadahiro Taniguchi, Hirokazu Kameoka

Abstract: This paper proposes a new voice conversion (VC) task from human speech to dog-like speech while preserving linguistic information as an example of human to non-human creature voice conversion (H2NH-VC) tasks. Although most VC studies deal with human to human VC, H2NH-VC aims to convert human speech into non-human creature-like speech. Non-parallel VC allows us to develop H2NH-VC, because we cannot… ▽ More This paper proposes a new voice conversion (VC) task from human speech to dog-like speech while preserving linguistic information as an example of human to non-human creature voice conversion (H2NH-VC) tasks. Although most VC studies deal with human to human VC, H2NH-VC aims to convert human speech into non-human creature-like speech. Non-parallel VC allows us to develop H2NH-VC, because we cannot collect a parallel dataset that non-human creatures speak human language. In this study, we propose to use dogs as an example of a non-human creature target domain and define the "speak like a dog" task. To clarify the possibilities and characteristics of the "speak like a dog" task, we conducted a comparative experiment using existing representative non-parallel VC methods in acoustic features (Mel-cepstral coefficients and Mel-spectrograms), network architectures (five different kernel-size settings), and training criteria (variational autoencoder (VAE)- based and generative adversarial network-based). Finally, the converted voices were evaluated using mean opinion scores: dog-likeness, sound quality and intelligibility, and character error rate (CER). The experiment showed that the employment of the Mel-spectrogram improved the dog-likeness of the converted speech, while it is challenging to preserve linguistic information. Challenges and limitations of the current VC methods for H2NH-VC are highlighted. △ Less

Submitted 9 June, 2022; originally announced June 2022.

Comments: 5 pages, 4 figures

Journal ref: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (pp. 1388-1393)

arXiv:2206.01122 [pdf, other]

Super-resolving 2D stress tensor field conserving equilibrium constraints using physics informed U-Net

Authors: Kazuo Yonekura, Kento Maruoka, Kyoku Tyou, Katsuyuki Suzuki

Abstract: In a finite element analysis, using a large number of grids is important to obtain accurate results, but is a resource-consuming task. Aiming to real-time simulation and optimization, it is desired to obtain fine grid analysis results within a limited resource. This paper proposes a super-resolution method that predicts a stress tensor field in a high-resolution from low-resolution contour plots b… ▽ More In a finite element analysis, using a large number of grids is important to obtain accurate results, but is a resource-consuming task. Aiming to real-time simulation and optimization, it is desired to obtain fine grid analysis results within a limited resource. This paper proposes a super-resolution method that predicts a stress tensor field in a high-resolution from low-resolution contour plots by utilizing a U-Net-based neural network which is called PI-UNet. In addition, the proposed model minimizes the residual of the equilibrium constraints so that it outputs a physically reasonable solution. The proposed network is trained with FEM results of simple shapes, and is validated with a complicated realistic shape to evaluate generalization capability. Although ESRGAN is a standard model for image super-resolution, the proposed U-Net based model outperforms ESRGAN model in the stress tensor prediction task. △ Less

Submitted 2 June, 2022; originally announced June 2022.

arXiv:2205.12959 [pdf, ps, other]

Uniform Generalization Bound on Time and Inverse Temperature for Gradient Descent Algorithm and its Application to Analysis of Simulated Annealing

Authors: Keisuke Suzuki

Abstract: In this paper, we propose a novel uniform generalization bound on the time and inverse temperature for stochastic gradient Langevin dynamics (SGLD) in a non-convex setting. While previous works derive their generalization bounds by uniform stability, we use Rademacher complexity to make our generalization bound independent of the time and inverse temperature. Using Rademacher complexity, we can re… ▽ More In this paper, we propose a novel uniform generalization bound on the time and inverse temperature for stochastic gradient Langevin dynamics (SGLD) in a non-convex setting. While previous works derive their generalization bounds by uniform stability, we use Rademacher complexity to make our generalization bound independent of the time and inverse temperature. Using Rademacher complexity, we can reduce the problem to derive a generalization bound on the whole space to that on a bounded region and therefore can remove the effect of the time and inverse temperature from our generalization bound. As an application of our generalization bound, an evaluation on the effectiveness of the simulated annealing in a non-convex setting is also described. For the sample size $n$ and time $s$, we derive evaluations with orders $\sqrt{n^{-1} \log (n+1)}$ and $|(\log)^4(s)|^{-1}$, respectively. Here, $(\log)^4$ denotes the $4$ times composition of the logarithmic function. △ Less

Submitted 4 June, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: 16 pages, typos in (1.1), (1.2) and (4.1) have been fixed

MSC Class: Primary 60J20; Secondary 60H10

arXiv:2205.08664 [pdf, other]

doi 10.1145/3531348.3532177

Journey of Migrating Millions of Queries on The Cloud

Authors: Taro L. Saito, Naoki Takezoe, Yukihiro Okada, Takako Shimamoto, Dongmin Yu, Suprith Chandrashekharachar, Kai Sasaki, Shohei Okumiya, Yan Wang, Takashi Kurihara, Ryu Kobayashi, Keisuke Suzuki, Zhenghong Yang, Makoto Onizuka

Abstract: Treasure Data is processing millions of distributed SQL queries every day on the cloud. Upgrading the query engine service at this scale is challenging because we need to migrate all of the production queries of the customers to a new version while preserving the correctness and performance of the data processing pipelines. To ensure the quality of the query engines, we utilize our query logs to b… ▽ More Treasure Data is processing millions of distributed SQL queries every day on the cloud. Upgrading the query engine service at this scale is challenging because we need to migrate all of the production queries of the customers to a new version while preserving the correctness and performance of the data processing pipelines. To ensure the quality of the query engines, we utilize our query logs to build customer-specific benchmarks and replay these queries with real customer data in a secure pre-production environment. To simulate millions of queries, we need effective minimization of test query sets and better reporting of the simulation results to proactively find incompatible changes and performance regression of the new version. This paper describes the overall design of our system and shares various challenges in maintaining the quality of the query engine service on the cloud. △ Less

Submitted 17 May, 2022; originally announced May 2022.

Comments: This version is published in DBTest '22: Proceedings of the 2022 workshop on 9th International Workshop of Testing Database Systems

MSC Class: 68P20 ACM Class: H.2.4; D.2.9

arXiv:2203.04218 [pdf, other]

doi 10.1109/LRA.2022.3196159

Learning Bidirectional Translation between Descriptions and Actions with Small Paired Data

Authors: Minori Toyoda, Kanata Suzuki, Yoshihiko Hayashi, Tetsuya Ogata

Abstract: This study achieved bidirectional translation between descriptions and actions using small paired data from different modalities. The ability to mutually generate descriptions and actions is essential for robots to collaborate with humans in their daily lives, which generally requires a large dataset that maintains comprehensive pairs of both modality data. However, a paired dataset is expensive t… ▽ More This study achieved bidirectional translation between descriptions and actions using small paired data from different modalities. The ability to mutually generate descriptions and actions is essential for robots to collaborate with humans in their daily lives, which generally requires a large dataset that maintains comprehensive pairs of both modality data. However, a paired dataset is expensive to construct and difficult to collect. To address this issue, this study proposes a two-stage training method for bidirectional translation. In the proposed method, we train recurrent autoencoders (RAEs) for descriptions and actions with a large amount of non-paired data. Then, we finetune the entire model to bind their intermediate representations using small paired data. Because the data used for pre-training do not require pairing, behavior-only data or a large language corpus can be used. We experimentally evaluated our method using a paired dataset consisting of motion-captured actions and descriptions. The results showed that our method performed well, even when the amount of paired data to train was small. The visualization of the intermediate representations of each RAE showed that similar actions were encoded in a clustered position and the corresponding feature vectors were well aligned. △ Less

Submitted 24 September, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: 8 pages, 7 figures. To appear in IEEE Robotics and Automation Letters (RA-L) and the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022). An accompanying video is available at https://youtu.be/YlxM_kw6YLE

Journal ref: IEEE Robotics and Automation Letters ( Volume: 7, Issue: 4, October 2022)

Showing 1–50 of 83 results for author: Suzuki, K