Search | arXiv e-print repository

CXL Topology-Aware and Expander-Driven Prefetching: Unlocking SSD Performance

Authors: Dongsuk Oh, Miryeong Kwon, Jiseon Kim, Eunjee Na, Junseok Moon, Hyunkyu Choi, Seonghyeon Jang, Hanjin Choi, Hongjoo Jung, Sangwon Lee, Myoungsoo Jung

Abstract: Integrating compute express link (CXL) with SSDs allows scalable access to large memory but has slower speeds than DRAMs. We present ExPAND, an expander-driven CXL prefetcher that offloads last-level cache (LLC) prefetching from host CPU to CXL-SSDs. ExPAND uses a heterogeneous prediction algorithm for prefetching and ensures data consistency with CXL.mem's back-invalidation. We examine prefetch t… ▽ More Integrating compute express link (CXL) with SSDs allows scalable access to large memory but has slower speeds than DRAMs. We present ExPAND, an expander-driven CXL prefetcher that offloads last-level cache (LLC) prefetching from host CPU to CXL-SSDs. ExPAND uses a heterogeneous prediction algorithm for prefetching and ensures data consistency with CXL.mem's back-invalidation. We examine prefetch timeliness for accurate latency estimation. ExPAND, being aware of CXL multi-tiered switching, provides end-to-end latency for each CXL-SSD and precise prefetch timeliness estimations. Our method reduces CXL-SSD reliance and enables direct host cache access for most data. ExPAND enhances graph application performance and SPEC CPU's performance by 9.0$\times$ and 14.7$\times$, respectively, surpassing CXL-SSD pools with diverse prefetching strategies. △ Less

Submitted 24 May, 2025; originally announced May 2025.

arXiv:2505.16705 [pdf, ps, other]

An Analysis of Concept Bottleneck Models: Measuring, Understanding, and Mitigating the Impact of Noisy Annotations

Authors: Seonghwan Park, Jueun Mun, Donghyun Oh, Namhoon Lee

Abstract: Concept bottleneck models (CBMs) ensure interpretability by decomposing predictions into human interpretable concepts. Yet the annotations used for training CBMs that enable this transparency are often noisy, and the impact of such corruption is not well understood. In this study, we present the first systematic study of noise in CBMs and show that even moderate corruption simultaneously impairs p… ▽ More Concept bottleneck models (CBMs) ensure interpretability by decomposing predictions into human interpretable concepts. Yet the annotations used for training CBMs that enable this transparency are often noisy, and the impact of such corruption is not well understood. In this study, we present the first systematic study of noise in CBMs and show that even moderate corruption simultaneously impairs prediction performance, interpretability, and the intervention effectiveness. Our analysis identifies a susceptible subset of concepts whose accuracy declines far more than the average gap between noisy and clean supervision and whose corruption accounts for most performance loss. To mitigate this vulnerability we propose a two-stage framework. During training, sharpness-aware minimization stabilizes the learning of noise-sensitive concepts. During inference, where clean labels are unavailable, we rank concepts by predictive entropy and correct only the most uncertain ones, using uncertainty as a proxy for susceptibility. Theoretical analysis and extensive ablations elucidate why sharpness-aware training confers robustness and why uncertainty reliably identifies susceptible concepts, providing a principled basis that preserves both interpretability and resilience in the presence of noise. △ Less

Submitted 22 May, 2025; originally announced May 2025.

arXiv:2505.00335 [pdf, other]

Efficient Neural Video Representation with Temporally Coherent Modulation

Authors: Seungjun Shin, Suji Kim, Dokwan Oh

Abstract: Implicit neural representations (INR) has found successful applications across diverse domains. To employ INR in real-life, it is important to speed up training. In the field of INR for video applications, the state-of-the-art approach employs grid-type parametric encoding and successfully achieves a faster encoding speed in comparison to its predecessors. However, the grid usage, which does not c… ▽ More Implicit neural representations (INR) has found successful applications across diverse domains. To employ INR in real-life, it is important to speed up training. In the field of INR for video applications, the state-of-the-art approach employs grid-type parametric encoding and successfully achieves a faster encoding speed in comparison to its predecessors. However, the grid usage, which does not consider the video's dynamic nature, leads to redundant use of trainable parameters. As a result, it has significantly lower parameter efficiency and higher bitrate compared to NeRV-style methods that do not use a parametric encoding. To address the problem, we propose Neural Video representation with Temporally coherent Modulation (NVTM), a novel framework that can capture dynamic characteristics of video. By decomposing the spatio-temporal 3D video data into a set of 2D grids with flow information, NVTM enables learning video representation rapidly and uses parameter efficiently. Our framework enables to process temporally corresponding pixels at once, resulting in the fastest encoding speed for a reasonable video quality, especially when compared to the NeRV-style method, with a speed increase of over 3 times. Also, it remarks an average of 1.54dB/0.019 improvements in PSNR/LPIPS on UVG (Dynamic) (even with 10% fewer parameters) and an average of 1.84dB/0.013 improvements in PSNR/LPIPS on MCL-JCV (Dynamic), compared to previous grid-type works. By expanding this to compression tasks, we demonstrate comparable performance to video compression standards (H.264, HEVC) and recent INR approaches for video compression. Additionally, we perform extensive experiments demonstrating the superior performance of our algorithm across diverse tasks, encompassing super resolution, frame interpolation and video inpainting. Project page is https://sujiikim.github.io/NVTM/. △ Less

Submitted 1 May, 2025; originally announced May 2025.

Comments: ECCV 2024

arXiv:2504.11717 [pdf, other]

Safety with Agency: Human-Centered Safety Filter with Application to AI-Assisted Motorsports

Authors: Donggeon David Oh, Justin Lidard, Haimin Hu, Himani Sinhmar, Elle Lazarski, Deepak Gopinath, Emily S. Sumner, Jonathan A. DeCastro, Guy Rosman, Naomi Ehrich Leonard, Jaime Fernández Fisac

Abstract: We propose a human-centered safety filter (HCSF) for shared autonomy that significantly enhances system safety without compromising human agency. Our HCSF is built on a neural safety value function, which we first learn scalably through black-box interactions and then use at deployment to enforce a novel state-action control barrier function (Q-CBF) safety constraint. Since this Q-CBF safety filte… ▽ More We propose a human-centered safety filter (HCSF) for shared autonomy that significantly enhances system safety without compromising human agency. Our HCSF is built on a neural safety value function, which we first learn scalably through black-box interactions and then use at deployment to enforce a novel state-action control barrier function (Q-CBF) safety constraint. Since this Q-CBF safety filter does not require any knowledge of the system dynamics for both synthesis and runtime safety monitoring and intervention, our method applies readily to complex, black-box shared autonomy systems. Notably, our HCSF's CBF-based interventions modify the human's actions minimally and smoothly, avoiding the abrupt, last-moment corrections delivered by many conventional safety filters. We validate our approach in a comprehensive in-person user study using Assetto Corsa-a high-fidelity car racing simulator with black-box dynamics-to assess robustness in "driving on the edge" scenarios. We compare both trajectory data and drivers' perceptions of our HCSF assistance against unassisted driving and a conventional safety filter. Experimental results show that 1) compared to having no assistance, our HCSF improves both safety and user satisfaction without compromising human agency or comfort, and 2) relative to a conventional safety filter, our proposed HCSF boosts human agency, comfort, and satisfaction while maintaining robustness. △ Less

Submitted 29 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

Comments: Accepted to Robotics: Science and Systems (R:SS) 2025, 22 pages, 16 figures, 7 tables

arXiv:2501.09819 [pdf, other]

Torque Responsive Metamaterials Enable High Payload Soft Robot Arms

Authors: Ian Good, Srivatsan Balaji, David Oh, Sawyer Thomas, Jeffrey I. Lipton

Abstract: Soft robots have struggled to support large forces and moments while also supporting their own weight against gravity. This limits their ability to reach certain configurations necessary for tasks such as inspection and pushing objects up. We have overcome this limitation by creating an electrically driven metamaterial soft arm using handed shearing auxetics (HSA) and bendable extendable torque re… ▽ More Soft robots have struggled to support large forces and moments while also supporting their own weight against gravity. This limits their ability to reach certain configurations necessary for tasks such as inspection and pushing objects up. We have overcome this limitation by creating an electrically driven metamaterial soft arm using handed shearing auxetics (HSA) and bendable extendable torque resistant (BETR) shafts. These use the large force and torque capacity of HSAs and the nestable torque transmission of BETRs to create a strong soft arm. We found that the HSA arm was able to push 2.3 kg vertically and lift more than 600 g when positioned horizontally, supporting 0.33 Nm of torque at the base. The arm is able to move between waypoints while carrying the large payload and demonstrates consistent movement with path variance below 5 mm. The HSA arm's ability to perform active grasping with HSA grippers was also demonstrated, requiring 20 N of pull force to dislodge the object. Finally, we test the arm in a pipe inspection task. The arm is able to locate all the defects while sliding against the inner surface of the pipe, demonstrating its compliance. △ Less

Submitted 16 January, 2025; originally announced January 2025.

Comments: 9 pages, 8 figures, currently under review

arXiv:2411.05832 [pdf, other]

Diversify, Contextualize, and Adapt: Efficient Entropy Modeling for Neural Image Codec

Authors: Jun-Hyuk Kim, Seungeon Kim, Won-Hee Lee, Dokwan Oh

Abstract: Designing a fast and effective entropy model is challenging but essential for practical application of neural codecs. Beyond spatial autoregressive entropy models, more efficient backward adaptation-based entropy models have been recently developed. They not only reduce decoding time by using smaller number of modeling steps but also maintain or even improve rate--distortion performance by leverag… ▽ More Designing a fast and effective entropy model is challenging but essential for practical application of neural codecs. Beyond spatial autoregressive entropy models, more efficient backward adaptation-based entropy models have been recently developed. They not only reduce decoding time by using smaller number of modeling steps but also maintain or even improve rate--distortion performance by leveraging more diverse contexts for backward adaptation. Despite their significant progress, we argue that their performance has been limited by the simple adoption of the design convention for forward adaptation: using only a single type of hyper latent representation, which does not provide sufficient contextual information, especially in the first modeling step. In this paper, we propose a simple yet effective entropy modeling framework that leverages sufficient contexts for forward adaptation without compromising on bit-rate. Specifically, we introduce a strategy of diversifying hyper latent representations for forward adaptation, i.e., using two additional types of contexts along with the existing single type of context. In addition, we present a method to effectively use the diverse contexts for contextualizing the current elements to be encoded/decoded. By addressing the limitation of the previous approach, our proposed framework leads to significant performance improvements. Experimental results on popular datasets show that our proposed framework consistently improves rate--distortion performance across various bit-rate regions, e.g., 3.73% BD-rate gain over the state-of-the-art baseline on the Kodak dataset. △ Less

Submitted 5 November, 2024; originally announced November 2024.

Comments: NeurIPS 2024

arXiv:2410.01866 [pdf, other]

House of Cards: Massive Weights in LLMs

Authors: Jaehoon Oh, Seungjun Shin, Dokwan Oh

Abstract: Massive activations, which manifest in specific feature dimensions of hidden states, introduce a significant bias in large language models (LLMs), leading to an overemphasis on the corresponding token. In this paper, we identify that massive activations originate not from the hidden state but from the intermediate state of a feed-forward network module in an early layer. Expanding on the previous… ▽ More Massive activations, which manifest in specific feature dimensions of hidden states, introduce a significant bias in large language models (LLMs), leading to an overemphasis on the corresponding token. In this paper, we identify that massive activations originate not from the hidden state but from the intermediate state of a feed-forward network module in an early layer. Expanding on the previous observation that massive activations occur only in specific feature dimensions, we dive deep into the weights that cause massive activations. Specifically, we define top-$k$ massive weights as the weights that contribute to the dimensions with the top-$k$ magnitudes in the intermediate state. When these massive weights are set to zero, the functionality of LLMs is entirely disrupted. However, when all weights except for massive weights are set to zero, it results in a relatively minor performance drop, even though a much larger number of weights are set to zero. This implies that during the pre-training process, learning is dominantly focused on massive weights. Building on this observation, we propose a simple plug-and-play method called MacDrop (massive weights curriculum dropout), to rely less on massive weights during parameter-efficient fine-tuning. This method applies dropout to the pre-trained massive weights, starting with a high dropout probability and gradually decreasing it as fine-tuning progresses. Through various experiments, including zero-shot downstream tasks, long-context tasks, and ablation studies, we demonstrate that \texttt{MacDrop} generally improves performance and strengthens robustness. △ Less

Submitted 6 February, 2025; v1 submitted 2 October, 2024; originally announced October 2024.

Comments: Under review

arXiv:2403.02944 [pdf, other]

Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity

Authors: Hagyeong Lee, Minkyu Kim, Jun-Hyuk Kim, Seungeon Kim, Dokwan Oh, Jaeho Lee

Abstract: Recent advances in text-guided image compression have shown great potential to enhance the perceptual quality of reconstructed images. These methods, however, tend to have significantly degraded pixel-wise fidelity, limiting their practicality. To fill this gap, we develop a new text-guided image compression algorithm that achieves both high perceptual and pixel-wise fidelity. In particular, we pr… ▽ More Recent advances in text-guided image compression have shown great potential to enhance the perceptual quality of reconstructed images. These methods, however, tend to have significantly degraded pixel-wise fidelity, limiting their practicality. To fill this gap, we develop a new text-guided image compression algorithm that achieves both high perceptual and pixel-wise fidelity. In particular, we propose a compression framework that leverages text information mainly by text-adaptive encoding and training with joint image-text loss. By doing so, we avoid decoding based on text-guided generative models -- known for high generative diversity -- and effectively utilize the semantic information of text at a global level. Experimental results on various datasets show that our method can achieve high pixel-level and perceptual quality, with either human- or machine-generated captions. In particular, our method outperforms all baselines in terms of LPIPS, with some room for even more improvements when we use more carefully generated captions. △ Less

Submitted 21 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: The first two authors contributed equally

arXiv:2402.02350 [pdf, other]

Interference-Aware Emergent Random Access Protocol for Downlink LEO Satellite Networks

Authors: Chang-Yong Lim, Jihong Park, Jinho Choi, Ju-Hyung Lee, Daesub Oh, Heewook Kim

Abstract: In this article, we propose a multi-agent deep reinforcement learning (MADRL) framework to train a multiple access protocol for downlink low earth orbit (LEO) satellite networks. By improving the existing learned protocol, emergent random access channel (eRACH), our proposed method, coined centralized and compressed emergent signaling for eRACH (Ce2RACH), can mitigate inter-satellite interference… ▽ More In this article, we propose a multi-agent deep reinforcement learning (MADRL) framework to train a multiple access protocol for downlink low earth orbit (LEO) satellite networks. By improving the existing learned protocol, emergent random access channel (eRACH), our proposed method, coined centralized and compressed emergent signaling for eRACH (Ce2RACH), can mitigate inter-satellite interference by exchanging additional signaling messages jointly learned through the MADRL training process. Simulations demonstrate that Ce2RACH achieves up to 36.65% higher network throughput compared to eRACH, while the cost of signaling messages increase linearly with the number of users. △ Less

Submitted 4 February, 2024; originally announced February 2024.

Comments: 2 pages, 4 figures, 1 table; submitted to IEEE for possible publication

arXiv:2312.12392 [pdf, other]

Recursive Camera Painting: A Method for Real-Time Painterly Renderings of 3D Scenes

Authors: Ergun Akleman, Cassie Mullins, Christopher Morrison, David Oh

Abstract: In this work, we present the recursive camera-painting approach to obtain painterly smudging in real-time rendering applications. We have implemented recursive camera painting as both a GPU-based ray-tracing and in a Virtual Reality game environment. Using this approach, we can obtain dynamic 3D Paintings in real-time. In a camera painting, each pixel has a separate associated camera whose paramet… ▽ More In this work, we present the recursive camera-painting approach to obtain painterly smudging in real-time rendering applications. We have implemented recursive camera painting as both a GPU-based ray-tracing and in a Virtual Reality game environment. Using this approach, we can obtain dynamic 3D Paintings in real-time. In a camera painting, each pixel has a separate associated camera whose parameters are computed from a corresponding image of the same size. In recursive camera painting, we use the rendered images to compute new camera parameters. When we apply this process a few times, it creates painterly images that can be viewed as real-time 3D dynamic paintings. These visual results are not surprising since multi-view techniques help to obtain painterly effects. △ Less

Submitted 1 December, 2023; originally announced December 2023.

Comments: 10 pages

arXiv:2308.15791 [pdf, other]

Neural Video Compression with Temporal Layer-Adaptive Hierarchical B-frame Coding

Authors: Yeongwoong Kim, Suyong Bahk, Seungeon Kim, Won Hee Lee, Dokwan Oh, Hui Yong Kim

Abstract: Neural video compression (NVC) is a rapidly evolving video coding research area, with some models achieving superior coding efficiency compared to the latest video coding standard Versatile Video Coding (VVC). In conventional video coding standards, the hierarchical B-frame coding, which utilizes a bidirectional prediction structure for higher compression, had been well-studied and exploited. In N… ▽ More Neural video compression (NVC) is a rapidly evolving video coding research area, with some models achieving superior coding efficiency compared to the latest video coding standard Versatile Video Coding (VVC). In conventional video coding standards, the hierarchical B-frame coding, which utilizes a bidirectional prediction structure for higher compression, had been well-studied and exploited. In NVC, however, limited research has investigated the hierarchical B scheme. In this paper, we propose an NVC model exploiting hierarchical B-frame coding with temporal layer-adaptive optimization. We first extend an existing unidirectional NVC model to a bidirectional model, which achieves -21.13% BD-rate gain over the unidirectional baseline model. However, this model faces challenges when applied to sequences with complex or large motions, leading to performance degradation. To address this, we introduce temporal layer-adaptive optimization, incorporating methods such as temporal layer-adaptive quality scaling (TAQS) and temporal layer-adaptive latent scaling (TALS). The final model with the proposed methods achieves an impressive BD-rate gain of -39.86% against the baseline. It also resolves the challenges in sequences with large or complex motions with up to -49.13% more BD-rate gains than the simple bidirectional extension. This improvement is attributed to the allocation of more bits to lower temporal layers, thereby enhancing overall reconstruction quality with smaller bits. Since our method has little dependency on a specific NVC model architecture, it can serve as a general tool for extending unidirectional NVC models to the ones with hierarchical B-frame coding. △ Less

Submitted 5 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

arXiv:2302.14273 [pdf, other]

QP Chaser: Polynomial Trajectory Generation for Autonomous Aerial Tracking

Authors: Yunwoo Lee, Jungwon Park, Seungwoo Jung, Boseong Jeon, Dahyun Oh, H. Jin Kim

Abstract: Maintaining the visibility of the target is one of the major objectives of aerial tracking missions. This paper proposes a target-visible trajectory planning pipeline using quadratic programming (QP). Our approach can handle various tracking settings, including 1) single- and dual-target following and 2) both static and dynamic environments, unlike other works that focus on a single specific setup… ▽ More Maintaining the visibility of the target is one of the major objectives of aerial tracking missions. This paper proposes a target-visible trajectory planning pipeline using quadratic programming (QP). Our approach can handle various tracking settings, including 1) single- and dual-target following and 2) both static and dynamic environments, unlike other works that focus on a single specific setup. In contrast to other studies that fully trust the predicted trajectory of the target and consider only the visibility of the target's center, our pipeline considers error in target path prediction and the entire body of the target to maintain the target visibility robustly. First, a prediction module uses a sample-check strategy to quickly calculate the reachable sets of moving objects, which represent the areas their bodies can reach, considering obstacles. Subsequently, the planning module formulates a single QP problem, considering path topology, to generate a tracking trajectory that maximizes the visibility of the target's reachable set among obstacles. The performance of the planner is validated in multiple scenarios, through high-fidelity simulations and real-world experiments. △ Less

Submitted 26 November, 2024; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: 18 pages, 16 figures

arXiv:2301.08078 [pdf, other]

Stable Contact Guaranteeing Motion/Force Control for an Aerial Manipulator on an Arbitrarily Tilted Surface

Authors: Jeonghyun Byun, Byeongjun Kim, Changhyeon Kim, Donggeon David Oh, H. Jin Kim

Abstract: This study aims to design a motion/force controller for an aerial manipulator which guarantees the tracking of time-varying motion/force trajectories as well as the stability during the transition between free and contact motions. To this end, we model the force exerted on the end-effector as the Kelvin-Voigt linear model and estimate its parameters by recursive least-squares estimator. Then, the… ▽ More This study aims to design a motion/force controller for an aerial manipulator which guarantees the tracking of time-varying motion/force trajectories as well as the stability during the transition between free and contact motions. To this end, we model the force exerted on the end-effector as the Kelvin-Voigt linear model and estimate its parameters by recursive least-squares estimator. Then, the gains of the disturbance-observer (DOB)-based motion/force controller are calculated based on the stability conditions considering both the model uncertainties in the dynamic equation and switching between the free and contact motions. To validate the proposed controller, we conducted the time-varying motion/force tracking experiments with different approach speeds and orientations of the surface. The results show that our controller enables the aerial manipulator to track the time-varying motion/force trajectories. △ Less

Submitted 19 January, 2023; originally announced January 2023.

Comments: to be presented in 2023 IEEE International Conference on Robotics and Automations (ICRA), London, United Kingdom, 2023

arXiv:2212.07026 [pdf, other]

Improving group robustness under noisy labels using predictive uncertainty

Authors: Dongpin Oh, Dae Lee, Jeunghyun Byun, Bonggun Shin

Abstract: The standard empirical risk minimization (ERM) can underperform on certain minority groups (i.e., waterbirds in lands or landbirds in water) due to the spurious correlation between the input and its label. Several studies have improved the worst-group accuracy by focusing on the high-loss samples. The hypothesis behind this is that such high-loss samples are \textit{spurious-cue-free} (SCF) sample… ▽ More The standard empirical risk minimization (ERM) can underperform on certain minority groups (i.e., waterbirds in lands or landbirds in water) due to the spurious correlation between the input and its label. Several studies have improved the worst-group accuracy by focusing on the high-loss samples. The hypothesis behind this is that such high-loss samples are \textit{spurious-cue-free} (SCF) samples. However, these approaches can be problematic since the high-loss samples may also be samples with noisy labels in the real-world scenarios. To resolve this issue, we utilize the predictive uncertainty of a model to improve the worst-group accuracy under noisy labels. To motivate this, we theoretically show that the high-uncertainty samples are the SCF samples in the binary classification problem. This theoretical result implies that the predictive uncertainty is an adequate indicator to identify SCF samples in a noisy label setting. Motivated from this, we propose a novel ENtropy based Debiasing (END) framework that prevents models from learning the spurious cues while being robust to the noisy labels. In the END framework, we first train the \textit{identification model} to obtain the SCF samples from a training set using its predictive uncertainty. Then, another model is trained on the dataset augmented with an oversampled SCF set. The experimental results show that our END framework outperforms other strong baselines on several real-world benchmarks that consider both the noisy labels and the spurious-cues. △ Less

Submitted 13 December, 2022; originally announced December 2022.

arXiv:2209.05972 [pdf, other]

Don't Judge a Language Model by Its Last Layer: Contrastive Learning with Layer-Wise Attention Pooling

Authors: Dongsuk Oh, Yejin Kim, Hodong Lee, H. Howie Huang, Heuiseok Lim

Abstract: Recent pre-trained language models (PLMs) achieved great success on many natural language processing tasks through learning linguistic features and contextualized sentence representation. Since attributes captured in stacked layers of PLMs are not clearly identified, straightforward approaches such as embedding the last layer are commonly preferred to derive sentence representations from PLMs. Thi… ▽ More Recent pre-trained language models (PLMs) achieved great success on many natural language processing tasks through learning linguistic features and contextualized sentence representation. Since attributes captured in stacked layers of PLMs are not clearly identified, straightforward approaches such as embedding the last layer are commonly preferred to derive sentence representations from PLMs. This paper introduces the attention-based pooling strategy, which enables the model to preserve layer-wise signals captured in each layer and learn digested linguistic features for downstream tasks. The contrastive learning objective can adapt the layer-wise attention pooling to both unsupervised and supervised manners. It results in regularizing the anisotropic space of pre-trained embeddings and being more uniform. We evaluate our model on standard semantic textual similarity (STS) and semantic search tasks. As a result, our method improved the performance of the base contrastive learned BERT_base and variants. △ Less

Submitted 13 September, 2022; originally announced September 2022.

Comments: Accepted to COLING 2022

arXiv:2112.09368 [pdf, other]

Improving evidential deep learning via multi-task learning

Authors: Dongpin Oh, Bonggun Shin

Abstract: The Evidential regression network (ENet) estimates a continuous target and its predictive uncertainty without costly Bayesian model averaging. However, it is possible that the target is inaccurately predicted due to the gradient shrinkage problem of the original loss function of the ENet, the negative log marginal likelihood (NLL) loss. In this paper, the objective is to improve the prediction acc… ▽ More The Evidential regression network (ENet) estimates a continuous target and its predictive uncertainty without costly Bayesian model averaging. However, it is possible that the target is inaccurately predicted due to the gradient shrinkage problem of the original loss function of the ENet, the negative log marginal likelihood (NLL) loss. In this paper, the objective is to improve the prediction accuracy of the ENet while maintaining its efficient uncertainty estimation by resolving the gradient shrinkage problem. A multi-task learning (MTL) framework, referred to as MT-ENet, is proposed to accomplish this aim. In the MTL, we define the Lipschitz modified mean squared error (MSE) loss function as another loss and add it to the existing NLL loss. The Lipschitz modified MSE loss is designed to mitigate the gradient conflict with the NLL loss by dynamically adjusting its Lipschitz constant. By doing so, the Lipschitz MSE loss does not disturb the uncertainty estimation of the NLL loss. The MT-ENet enhances the predictive accuracy of the ENet without losing uncertainty estimation capability on the synthetic dataset and real-world benchmarks, including drug-target affinity (DTA) regression. Furthermore, the MT-ENet shows remarkable calibration and out-of-distribution detection capability on the DTA benchmarks. △ Less

Submitted 17 December, 2021; originally announced December 2021.

Comments: Accepted by AAAI-2022

arXiv:2112.08619 [pdf, other]

Call for Customized Conversation: Customized Conversation Grounding Persona and Knowledge

Authors: Yoonna Jang, Jungwoo Lim, Yuna Hur, Dongsuk Oh, Suhyune Son, Yeonsoo Lee, Donghoon Shin, Seungryong Kim, Heuiseok Lim

Abstract: Humans usually have conversations by making use of prior knowledge about a topic and background information of the people whom they are talking to. However, existing conversational agents and datasets do not consider such comprehensive information, and thus they have a limitation in generating the utterances where the knowledge and persona are fused properly. To address this issue, we introduce a… ▽ More Humans usually have conversations by making use of prior knowledge about a topic and background information of the people whom they are talking to. However, existing conversational agents and datasets do not consider such comprehensive information, and thus they have a limitation in generating the utterances where the knowledge and persona are fused properly. To address this issue, we introduce a call For Customized conversation (FoCus) dataset where the customized answers are built with the user's persona and Wikipedia knowledge. To evaluate the abilities to make informative and customized utterances of pre-trained language models, we utilize BART and GPT-2 as well as transformer-based models. We assess their generation abilities with automatic scores and conduct human evaluations for qualitative results. We examine whether the model reflects adequate persona and knowledge with our proposed two sub-tasks, persona grounding (PG) and knowledge grounding (KG). Moreover, we show that the utterances of our data are constructed with the proper knowledge and persona through grounding quality assessment. △ Less

Submitted 16 May, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

Comments: Accepted paper at the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22)

arXiv:2109.09041 [pdf, other]

Online Distributed Trajectory Planning for Quadrotor Swarm with Feasibility Guarantee using Linear Safe Corridor

Authors: Jungwon Park, Dabin Kim, Gyeong Chan Kim, Dahyun Oh, H. Jin Kim

Abstract: This paper presents a new online multi-agent trajectory planning algorithm that guarantees to generate safe, dynamically feasible trajectories in a cluttered environment. The proposed algorithm utilizes a linear safe corridor (LSC) to formulate the distributed trajectory optimization problem with only feasible constraints, so it does not resort to slack variables or soft constraints to avoid optim… ▽ More This paper presents a new online multi-agent trajectory planning algorithm that guarantees to generate safe, dynamically feasible trajectories in a cluttered environment. The proposed algorithm utilizes a linear safe corridor (LSC) to formulate the distributed trajectory optimization problem with only feasible constraints, so it does not resort to slack variables or soft constraints to avoid optimization failure. We adopt a priority-based goal planning method to prevent the deadlock without an additional procedure to decide which robot to yield. The proposed algorithm can compute the trajectories for 60 agents on average 15.5 ms per agent with an Intel i7 laptop and shows a similar flight distance and distance compared to the baselines based on soft constraints. We verified that the proposed method can reach the goal without deadlock in both the random forest and the indoor space, and we validated the safety and operability of the proposed algorithm through a real flight test with ten quadrotors in a maze-like environment. △ Less

Submitted 3 January, 2022; v1 submitted 18 September, 2021; originally announced September 2021.

Comments: 8 pages, RA-L 2022 under review

arXiv:2107.00844 [pdf, ps, other]

doi 10.1063/5.0054920

Deep learning-based statistical noise reduction for multidimensional spectral data

Authors: Younsik Kim, Dongjin Oh, Soonsang Huh, Dongjoon Song, Sunbeom Jeong, Junyoung Kwon, Minsoo Kim, Donghan Kim, Hanyoung Ryu, Jongkeun Jung, Wonshik Kyung, Byungmin Sohn, Suyoung Lee, Jounghoon Hyun, Yeonghoon Lee, Yeongkwan Kimand Changyoung Kim

Abstract: In spectroscopic experiments, data acquisition in multi-dimensional phase space may require long acquisition time, owing to the large phase space volume to be covered. In such case, the limited time available for data acquisition can be a serious constraint for experiments in which multidimensional spectral data are acquired. Here, taking angle-resolved photoemission spectroscopy (ARPES) as an exa… ▽ More In spectroscopic experiments, data acquisition in multi-dimensional phase space may require long acquisition time, owing to the large phase space volume to be covered. In such case, the limited time available for data acquisition can be a serious constraint for experiments in which multidimensional spectral data are acquired. Here, taking angle-resolved photoemission spectroscopy (ARPES) as an example, we demonstrate a denoising method that utilizes deep learning as an intelligent way to overcome the constraint. With readily available ARPES data and random generation of training data set, we successfully trained the denoising neural network without overfitting. The denoising neural network can remove the noise in the data while preserving its intrinsic information. We show that the denoising neural network allows us to perform similar level of second-derivative and line shape analysis on data taken with two orders of magnitude less acquisition time. The importance of our method lies in its applicability to any multidimensional spectral data that are susceptible to statistical noise. △ Less

Submitted 2 July, 2021; originally announced July 2021.

Comments: 8 pages, 8 figures

Journal ref: Review of Scientific Instruments 92, 073901 (2021)

arXiv:2105.01869 [pdf, other]

Encoding Weights of Irregular Sparsity for Fixed-to-Fixed Model Compression

Authors: Baeseong Park, Se Jung Kwon, Daehwan Oh, Byeongwook Kim, Dongsoo Lee

Abstract: Even though fine-grained pruning techniques achieve a high compression ratio, conventional sparsity representations (such as CSR) associated with irregular sparsity degrade parallelism significantly. Practical pruning methods, thus, usually lower pruning rates (by structured pruning) to improve parallelism. In this paper, we study fixed-to-fixed (lossless) encoding architecture/algorithm to suppor… ▽ More Even though fine-grained pruning techniques achieve a high compression ratio, conventional sparsity representations (such as CSR) associated with irregular sparsity degrade parallelism significantly. Practical pruning methods, thus, usually lower pruning rates (by structured pruning) to improve parallelism. In this paper, we study fixed-to-fixed (lossless) encoding architecture/algorithm to support fine-grained pruning methods such that sparse neural networks can be stored in a highly regular structure. We first estimate the maximum compression ratio of encoding-based compression using entropy. Then, as an effort to push the compression ratio to the theoretical maximum (by entropy), we propose a sequential fixed-to-fixed encoding scheme. We demonstrate that our proposed compression scheme achieves almost the maximum compression ratio for the Transformer and ResNet-50 pruned by various fine-grained pruning methods. △ Less

Submitted 30 January, 2022; v1 submitted 5 May, 2021; originally announced May 2021.

Comments: ICLR 2022 Accepted

arXiv:2105.01868 [pdf, ps, other]

Q-Rater: Non-Convex Optimization for Post-Training Uniform Quantization

Authors: Byeongwook Kim, Dongsoo Lee, Yeonju Ro, Yongkweon Jeon, Se Jung Kwon, Baeseong Park, Daehwan Oh

Abstract: Various post-training uniform quantization methods have usually been studied based on convex optimization. As a result, most previous ones rely on the quantization error minimization and/or quadratic approximations. Such approaches are computationally efficient and reasonable when a large number of quantization bits are employed. When the number of quantization bits is relatively low, however, non… ▽ More Various post-training uniform quantization methods have usually been studied based on convex optimization. As a result, most previous ones rely on the quantization error minimization and/or quadratic approximations. Such approaches are computationally efficient and reasonable when a large number of quantization bits are employed. When the number of quantization bits is relatively low, however, non-convex optimization is unavoidable to improve model accuracy. In this paper, we propose a new post-training uniform quantization technique considering non-convexity. We empirically show that hyper-parameters for clipping and rounding of weights and activations can be explored by monitoring task loss. Then, an optimally searched set of hyper-parameters is frozen to proceed to the next layer such that an incremental non-convex optimization is enabled for post-training quantization. Throughout extensive experimental results using various models, our proposed technique presents higher model accuracy, especially for a low-bit quantization. △ Less

Submitted 5 May, 2021; originally announced May 2021.

arXiv:2103.01152 [pdf, other]

doi 10.1116/6.0001203

PHIDL: Python CAD layout and geometry creation for nanolithography

Authors: A. N. McCaughan, A. M. Tait, S. M. Buckley, D. M. Oh, J. T. Chiles, J. M. Shainline, S. W. Nam

Abstract: Computer-aided design (CAD) has become a critical element in the creation of nanopatterned structures and devices. In particular, with the increased adoption of easy-to-learn programming languages like Python there has been a significant rise in the amount of lithographic geometries generated through scripting and programming. However, there are currently unaddressed gaps in usability for open-sou… ▽ More Computer-aided design (CAD) has become a critical element in the creation of nanopatterned structures and devices. In particular, with the increased adoption of easy-to-learn programming languages like Python there has been a significant rise in the amount of lithographic geometries generated through scripting and programming. However, there are currently unaddressed gaps in usability for open-source CAD tools -- especially those in the GDSII design space -- that prevent wider adoption by scientists and students who might otherwise benefit from scripted design. For example, constructing relations between adjacent geometries is often much more difficult than necessary -- spacing a resonator structure a few micrometers from a readout structure often requires manually-coding the placement arithmetic. While inconveniences like this can be overcome by writing custom functions, they are often significant barriers to entry for new users or those less familiar with programming. To help streamline the design process and reduce barrier to entry for scripting designs, we have developed PHIDL, an open-source GDSII-based CAD tool for Python 2 and 3. △ Less

Submitted 1 March, 2021; originally announced March 2021.

Journal ref: J. Vac. Sci. Technol. B 39, 062601 (2021)

arXiv:2011.06769 [pdf]

Toward the Fully Physics-Informed Echo State Network -- an ODE Approximator Based on Recurrent Artificial Neurons

Authors: Dong Keun Oh

Abstract: Inspired by recent theoretical arguments, physics-informed echo state network (ESN) is discussed on the attempt to train a reservoir model absolutely in physics-informed manner. As the plainest work on such a purpose, an ODE (ordinary differential equation) approximator is designed to replicate the solution in sequence with respect to the recurrent evaluations. On the principal invariance of diffe… ▽ More Inspired by recent theoretical arguments, physics-informed echo state network (ESN) is discussed on the attempt to train a reservoir model absolutely in physics-informed manner. As the plainest work on such a purpose, an ODE (ordinary differential equation) approximator is designed to replicate the solution in sequence with respect to the recurrent evaluations. On the principal invariance of differential equations, the constraint in recurrence just takes shape to secure a proper regression method for the ESN-based ODE approximator. After then, the actual training process is established on the idea of two-pass strategy for regression. Aiming at the fully physics-informed reservoir model, a couple of nonlinear dynamical problems are demonstrated as the computations obtained from the proposed method in this study. △ Less

Submitted 13 November, 2020; originally announced November 2020.

Comments: 30 pages, 12 figures, research paper

MSC Class: 68T27; 65L99 ACM Class: I.2.8; J.2

arXiv:2011.00766 [pdf, other]

I Know What You Asked: Graph Path Learning using AMR for Commonsense Reasoning

Authors: Jungwoo Lim, Dongsuk Oh, Yoonna Jang, Kisu Yang, Heuiseok Lim

Abstract: CommonsenseQA is a task in which a correct answer is predicted through commonsense reasoning with pre-defined knowledge. Most previous works have aimed to improve the performance with distributed representation without considering the process of predicting the answer from the semantic representation of the question. To shed light upon the semantic interpretation of the question, we propose an AMR-… ▽ More CommonsenseQA is a task in which a correct answer is predicted through commonsense reasoning with pre-defined knowledge. Most previous works have aimed to improve the performance with distributed representation without considering the process of predicting the answer from the semantic representation of the question. To shed light upon the semantic interpretation of the question, we propose an AMR-ConceptNet-Pruned (ACP) graph. The ACP graph is pruned from a full integrated graph encompassing Abstract Meaning Representation (AMR) graph generated from input questions and an external commonsense knowledge graph, ConceptNet (CN). Then the ACP graph is exploited to interpret the reasoning path as well as to predict the correct answer on the CommonsenseQA task. This paper presents the manner in which the commonsense reasoning process can be interpreted with the relations and concepts provided by the ACP graph. Moreover, ACP-based models are shown to outperform the baselines. △ Less

Submitted 5 November, 2020; v1 submitted 2 November, 2020; originally announced November 2020.

Comments: Accepted to COLING 2020

arXiv:2009.04703 [pdf, other]

Do Response Selection Models Really Know What's Next? Utterance Manipulation Strategies for Multi-turn Response Selection

Authors: Taesun Whang, Dongyub Lee, Dongsuk Oh, Chanhee Lee, Kijong Han, Dong-hun Lee, Saebyeok Lee

Abstract: In this paper, we study the task of selecting the optimal response given a user and system utterance history in retrieval-based multi-turn dialog systems. Recently, pre-trained language models (e.g., BERT, RoBERTa, and ELECTRA) showed significant improvements in various natural language processing tasks. This and similar response selection tasks can also be solved using such language models by for… ▽ More In this paper, we study the task of selecting the optimal response given a user and system utterance history in retrieval-based multi-turn dialog systems. Recently, pre-trained language models (e.g., BERT, RoBERTa, and ELECTRA) showed significant improvements in various natural language processing tasks. This and similar response selection tasks can also be solved using such language models by formulating the tasks as dialog--response binary classification tasks. Although existing works using this approach successfully obtained state-of-the-art results, we observe that language models trained in this manner tend to make predictions based on the relatedness of history and candidates, ignoring the sequential nature of multi-turn dialog systems. This suggests that the response selection task alone is insufficient for learning temporal dependencies between utterances. To this end, we propose utterance manipulation strategies (UMS) to address this problem. Specifically, UMS consist of several strategies (i.e., insertion, deletion, and search), which aid the response selection model towards maintaining dialog coherence. Further, UMS are self-supervised methods that do not require additional annotation and thus can be easily incorporated into existing approaches. Extensive evaluation across multiple languages and models shows that UMS are highly effective in teaching dialog consistency, which leads to models pushing the state-of-the-art with significant margins on multiple public benchmark datasets. △ Less

Submitted 16 December, 2020; v1 submitted 10 September, 2020; originally announced September 2020.

Comments: Accepted to AAAI 2021

arXiv:2006.16166 [pdf, other]

Automatic Operating Room Surgical Activity Recognition for Robot-Assisted Surgery

Authors: Aidean Sharghi, Helene Haugerud, Daniel Oh, Omid Mohareri

Abstract: Automatic recognition of surgical activities in the operating room (OR) is a key technology for creating next generation intelligent surgical devices and workflow monitoring/support systems. Such systems can potentially enhance efficiency in the OR, resulting in lower costs and improved care delivery to the patients. In this paper, we investigate automatic surgical activity recognition in robot-as… ▽ More Automatic recognition of surgical activities in the operating room (OR) is a key technology for creating next generation intelligent surgical devices and workflow monitoring/support systems. Such systems can potentially enhance efficiency in the OR, resulting in lower costs and improved care delivery to the patients. In this paper, we investigate automatic surgical activity recognition in robot-assisted operations. We collect the first large-scale dataset including 400 full-length multi-perspective videos from a variety of robotic surgery cases captured using Time-of-Flight cameras. We densely annotate the videos with 10 most recognized and clinically relevant classes of activities. Furthermore, we investigate state-of-the-art computer vision action recognition techniques and adapt them for the OR environment and the dataset. First, we fine-tune the Inflated 3D ConvNet (I3D) for clip-level activity recognition on our dataset and use it to extract features from the videos. These features are then fed to a stack of 3 Temporal Gaussian Mixture layers which extracts context from neighboring clips, and eventually go through a Long Short Term Memory network to learn the order of activities in full-length videos. We extensively assess the model and reach a peak performance of 88% mean Average Precision. △ Less

Submitted 29 June, 2020; originally announced June 2020.

Comments: International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI'20)

arXiv:1911.04015

Word Sense Disambiguation using Knowledge-based Word Similarity

Authors: Sunjae Kwon, Dongsuk Oh, Youngjoong Ko

Abstract: In natural language processing, word-sense disambiguation (WSD) is an open problem concerned with identifying the correct sense of words in a particular context. To address this problem, we introduce a novel knowledge-based WSD system. We suggest the adoption of two methods in our system. First, we suggest a novel method to encode the word vector representation by considering the graphical semanti… ▽ More In natural language processing, word-sense disambiguation (WSD) is an open problem concerned with identifying the correct sense of words in a particular context. To address this problem, we introduce a novel knowledge-based WSD system. We suggest the adoption of two methods in our system. First, we suggest a novel method to encode the word vector representation by considering the graphical semantic relationships from the lexical knowledge-base. Second, we propose a method for extracting the contextual words from the text for analyzing an ambiguous word based on the similarity of word vector representations. To validate the effectiveness of our WSD system, we conducted experiments on the five benchmark English WSD corpora (Senseval-02, Senseval-03, SemEval-07, SemEval-13, and SemEval-15). The obtained results demonstrated that the suggested methods significantly enhanced the WSD performance. Furthermore, our system outperformed the existing knowledge-based WSD systems and showed a performance comparable to that of the state-of-the-art supervised WSD systems. △ Less

Submitted 21 June, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

Comments: Since we changed some hyper-parameters, experimental results must be changed. We will resubmit with the retest results

arXiv:1908.04812 [pdf, other]

An Effective Domain Adaptive Post-Training Method for BERT in Response Selection

Authors: Taesun Whang, Dongyub Lee, Chanhee Lee, Kisu Yang, Dongsuk Oh, HeuiSeok Lim

Abstract: We focus on multi-turn response selection in a retrieval-based dialog system. In this paper, we utilize the powerful pre-trained language model Bi-directional Encoder Representations from Transformer (BERT) for a multi-turn dialog system and propose a highly effective post-training method on domain-specific corpus. Although BERT is easily adopted to various NLP tasks and outperforms previous basel… ▽ More We focus on multi-turn response selection in a retrieval-based dialog system. In this paper, we utilize the powerful pre-trained language model Bi-directional Encoder Representations from Transformer (BERT) for a multi-turn dialog system and propose a highly effective post-training method on domain-specific corpus. Although BERT is easily adopted to various NLP tasks and outperforms previous baselines of each task, it still has limitations if a task corpus is too focused on a certain domain. Post-training on domain-specific corpus (e.g., Ubuntu Corpus) helps the model to train contextualized representations and words that do not appear in general corpus (e.g., English Wikipedia). Experimental results show that our approach achieves new state-of-the-art on two response selection benchmarks (i.e., Ubuntu Corpus V1, Advising Corpus) performance improvement by 5.9% and 6% on R@1. △ Less

Submitted 26 July, 2020; v1 submitted 13 August, 2019; originally announced August 2019.

Comments: INTERSPEECH 2020

arXiv:1811.02628 [pdf, other]

Learning Bone Suppression from Dual Energy Chest X-rays using Adversarial Networks

Authors: Dong Yul Oh, Il Dong Yun

Abstract: Suppressing bones on chest X-rays such as ribs and clavicle is often expected to improve pathologies classification. These bones can interfere with a broad range of diagnostic tasks on pulmonary disease except for musculoskeletal system. Current conventional method for acquisition of bone suppressed X-rays is dual energy imaging, which captures two radiographs at a very short interval with differe… ▽ More Suppressing bones on chest X-rays such as ribs and clavicle is often expected to improve pathologies classification. These bones can interfere with a broad range of diagnostic tasks on pulmonary disease except for musculoskeletal system. Current conventional method for acquisition of bone suppressed X-rays is dual energy imaging, which captures two radiographs at a very short interval with different energy levels; however, the patient is exposed to radiation twice and the artifacts arise due to heartbeats between two shots. In this paper, we introduce a deep generative model trained to predict bone suppressed images on single energy chest X-rays, analyzing a finite set of previously acquired dual energy chest X-rays. Since the relatively small amount of data is available, such approach relies on the methodology maximizing the data utilization. Here we integrate the following two approaches. First, we use a conditional generative adversarial network that complements the traditional regression method minimizing the pairwise image difference. Second, we use Haar 2D wavelet decomposition to offer a perceptual guideline in frequency details to allow the model to converge quickly and efficiently. As a result, we achieve state-of-the-art performance on bone suppression as compared to the existing approaches with dual energy chest X-rays. △ Less

Submitted 4 November, 2018; originally announced November 2018.

arXiv:1003.4057 [pdf, ps, other]

Construction of optimal codes in deletion and insertion metric

Authors: Hyun Kwang Kim, Joon Yop Lee, Dong Yeol Oh

Abstract: We improve Levenshtein's upper bound for the cardinality of a code of length four that is capable of correcting single deletions over an alphabet of even size. We also illustrate that the new upper bound is sharp. Furthermore we construct an optimal perfect code that is capable of correcting single deletions for the same parameters. We improve Levenshtein's upper bound for the cardinality of a code of length four that is capable of correcting single deletions over an alphabet of even size. We also illustrate that the new upper bound is sharp. Furthermore we construct an optimal perfect code that is capable of correcting single deletions for the same parameters. △ Less

Submitted 22 March, 2010; originally announced March 2010.

Comments: 20 pages, The material of this paper was presented in part at the 10th International Workshop on Algebraic and Combinatorial Coding Theory, Zvenigorod, Russia, September 2006

arXiv:0810.3729

Optimal codes in deletion and insertion metric

Authors: Hyun Kwang Kim, Joon Yop Lee, Dong Yeol Oh

Abstract: We improve the upper bound of Levenshtein for the cardinality of a code of length 4 capable of correcting single deletions over an alphabet of even size. We also illustrate that the new upper bound is sharp. Furthermore we will construct an optimal perfect code capable of correcting single deletions for the same parameters. We improve the upper bound of Levenshtein for the cardinality of a code of length 4 capable of correcting single deletions over an alphabet of even size. We also illustrate that the new upper bound is sharp. Furthermore we will construct an optimal perfect code capable of correcting single deletions for the same parameters. △ Less

Submitted 22 March, 2010; v1 submitted 20 October, 2008; originally announced October 2008.

Comments: 19 pages,The material of this paper was presented in part at the 10th International Workshop on Algebraic and Combinatorial Coding Theory, Zvenigorod, Russia, September 2006

MSC Class: 94B60

Showing 1–31 of 31 results for author: Oh, D