-
Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity
Authors:
Hagyeong Lee,
Minkyu Kim,
Jun-Hyuk Kim,
Seungeon Kim,
Dokwan Oh,
Jaeho Lee
Abstract:
Recent advances in text-guided image compression have shown great potential to enhance the perceptual quality of reconstructed images. These methods, however, tend to have significantly degraded pixel-wise fidelity, limiting their practicality. To fill this gap, we develop a new text-guided image compression algorithm that achieves both high perceptual and pixel-wise fidelity. In particular, we propose a compression framework that leverages text information mainly by text-adaptive encoding and training with joint image-text loss. By doing so, we avoid decoding based on text-guided generative models -- known for high generative diversity -- and effectively utilize the semantic information of text at a global level. Experimental results on various datasets show that our method can achieve high pixel-level and perceptual quality, with either human- or machine-generated captions. In particular, our method outperforms all baselines in terms of LPIPS, with some room for even more improvements when we use more carefully generated captions.
Submitted 21 May, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Interference-Aware Emergent Random Access Protocol for Downlink LEO Satellite Networks
Authors:
Chang-Yong Lim,
Jihong Park,
Jinho Choi,
Ju-Hyung Lee,
Daesub Oh,
Heewook Kim
Abstract:
In this article, we propose a multi-agent deep reinforcement learning (MADRL) framework to train a multiple access protocol for downlink low earth orbit (LEO) satellite networks. By improving the existing learned protocol, emergent random access channel (eRACH), our proposed method, coined centralized and compressed emergent signaling for eRACH (Ce2RACH), can mitigate inter-satellite interference by exchanging additional signaling messages jointly learned through the MADRL training process. Simulations demonstrate that Ce2RACH achieves up to 36.65% higher network throughput compared to eRACH, while the cost of signaling messages increases linearly with the number of users.
Submitted 4 February, 2024;
originally announced February 2024.
-
Recursive Camera Painting: A Method for Real-Time Painterly Renderings of 3D Scenes
Authors:
Ergun Akleman,
Cassie Mullins,
Christopher Morrison,
David Oh
Abstract:
In this work, we present the recursive camera-painting approach to obtain painterly smudging in real-time rendering applications. We have implemented recursive camera painting both as a GPU-based ray tracer and in a Virtual Reality game environment. Using this approach, we can obtain dynamic 3D paintings in real time. In a camera painting, each pixel has a separate associated camera whose parameters are computed from a corresponding image of the same size. In recursive camera painting, we use the rendered images to compute new camera parameters. When we apply this process a few times, it creates painterly images that can be viewed as real-time 3D dynamic paintings. These visual results are not surprising since multi-view techniques help to obtain painterly effects.
Submitted 1 December, 2023;
originally announced December 2023.
-
Neural Video Compression with Temporal Layer-Adaptive Hierarchical B-frame Coding
Authors:
Yeongwoong Kim,
Suyong Bahk,
Seungeon Kim,
Won Hee Lee,
Dokwan Oh,
Hui Yong Kim
Abstract:
Neural video compression (NVC) is a rapidly evolving video coding research area, with some models achieving superior coding efficiency compared to the latest video coding standard, Versatile Video Coding (VVC). In conventional video coding standards, hierarchical B-frame coding, which utilizes a bidirectional prediction structure for higher compression, has been well studied and exploited. In NVC, however, limited research has investigated the hierarchical B scheme. In this paper, we propose an NVC model exploiting hierarchical B-frame coding with temporal layer-adaptive optimization. We first extend an existing unidirectional NVC model to a bidirectional model, which achieves -21.13% BD-rate gain over the unidirectional baseline model. However, this model faces challenges when applied to sequences with complex or large motions, leading to performance degradation. To address this, we introduce temporal layer-adaptive optimization, incorporating methods such as temporal layer-adaptive quality scaling (TAQS) and temporal layer-adaptive latent scaling (TALS). The final model with the proposed methods achieves a BD-rate gain of -39.86% against the baseline. It also resolves the challenges in sequences with large or complex motions, with up to -49.13% more BD-rate gain than the simple bidirectional extension. This improvement is attributed to the allocation of more bits to lower temporal layers, thereby enhancing overall reconstruction quality with fewer bits. Since our method has little dependency on a specific NVC model architecture, it can serve as a general tool for extending unidirectional NVC models to ones with hierarchical B-frame coding.
Submitted 5 September, 2023; v1 submitted 30 August, 2023;
originally announced August 2023.
-
QP Chaser: Polynomial Trajectory Generation for Autonomous Aerial Tracking
Authors:
Yunwoo Lee,
Jungwon Park,
Seungwoo Jung,
Boseong Jeon,
Dahyun Oh,
H. Jin Kim
Abstract:
Maintaining the visibility of the targets is one of the major objectives of aerial tracking applications. This paper proposes QP Chaser, a trajectory planning pipeline that can enhance the visibility of single- and dual-target in both static and dynamic environments. As the name suggests, the proposed planner generates a target-visible trajectory via quadratic programming problems. First, the predictor forecasts the reachable sets of moving objects with a sample-and-check strategy considering obstacles. Subsequently, the trajectory planner reinforces the visibility of targets with consideration of 1) path topology and 2) reachable sets of targets and obstacles. We define a target-visible region (TVR) with topology analysis of not only static obstacles but also dynamic obstacles, and it reflects reachable sets of moving targets and obstacles to maintain the whole body of the target within the camera image robustly and ceaselessly. The online performance of the proposed planner is validated in multiple scenarios, including high-fidelity simulations and real-world experiments.
Submitted 27 February, 2023;
originally announced February 2023.
-
Stable Contact Guaranteeing Motion/Force Control for an Aerial Manipulator on an Arbitrarily Tilted Surface
Authors:
Jeonghyun Byun,
Byeongjun Kim,
Changhyeon Kim,
Donggeon David Oh,
H. Jin Kim
Abstract:
This study aims to design a motion/force controller for an aerial manipulator that guarantees the tracking of time-varying motion/force trajectories as well as stability during the transition between free and contact motions. To this end, we model the force exerted on the end-effector with the Kelvin-Voigt linear model and estimate its parameters with a recursive least-squares estimator. Then, the gains of the disturbance-observer (DOB)-based motion/force controller are calculated based on the stability conditions, considering both the model uncertainties in the dynamic equation and switching between the free and contact motions. To validate the proposed controller, we conducted time-varying motion/force tracking experiments with different approach speeds and orientations of the surface. The results show that our controller enables the aerial manipulator to track the time-varying motion/force trajectories.
Submitted 19 January, 2023;
originally announced January 2023.
-
Improving group robustness under noisy labels using predictive uncertainty
Authors:
Dongpin Oh,
Dae Lee,
Jeunghyun Byun,
Bonggun Shin
Abstract:
The standard empirical risk minimization (ERM) can underperform on certain minority groups (e.g., waterbirds on land or landbirds on water) due to the spurious correlation between the input and its label. Several studies have improved the worst-group accuracy by focusing on the high-loss samples. The hypothesis behind this is that such high-loss samples are spurious-cue-free (SCF) samples. However, these approaches can be problematic since the high-loss samples may also be samples with noisy labels in real-world scenarios. To resolve this issue, we utilize the predictive uncertainty of a model to improve the worst-group accuracy under noisy labels. To motivate this, we theoretically show that the high-uncertainty samples are the SCF samples in the binary classification problem. This theoretical result implies that the predictive uncertainty is an adequate indicator to identify SCF samples in a noisy label setting. Motivated by this, we propose a novel ENtropy based Debiasing (END) framework that prevents models from learning the spurious cues while being robust to the noisy labels. In the END framework, we first train the identification model to obtain the SCF samples from a training set using its predictive uncertainty. Then, another model is trained on the dataset augmented with an oversampled SCF set. The experimental results show that our END framework outperforms other strong baselines on several real-world benchmarks that consider both noisy labels and spurious cues.
Submitted 13 December, 2022;
originally announced December 2022.
-
Don't Judge a Language Model by Its Last Layer: Contrastive Learning with Layer-Wise Attention Pooling
Authors:
Dongsuk Oh,
Yejin Kim,
Hodong Lee,
H. Howie Huang,
Heuiseok Lim
Abstract:
Recent pre-trained language models (PLMs) have achieved great success on many natural language processing tasks by learning linguistic features and contextualized sentence representations. Since the attributes captured in the stacked layers of PLMs are not clearly identified, straightforward approaches such as embedding the last layer are commonly preferred to derive sentence representations from PLMs. This paper introduces an attention-based pooling strategy, which enables the model to preserve the layer-wise signals captured in each layer and learn digested linguistic features for downstream tasks. The contrastive learning objective can adapt the layer-wise attention pooling to both unsupervised and supervised settings. This regularizes the anisotropic space of pre-trained embeddings and makes it more uniform. We evaluate our model on standard semantic textual similarity (STS) and semantic search tasks. As a result, our method improves the performance of the contrastive-learned BERT_base baseline and its variants.
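To make the idea of pooling over layers concrete, here is a minimal sketch of one way layer-wise attention pooling could look. It is not the authors' implementation: the function names, the scaled dot-product scoring, and the `query` vector (a stand-in for a learned parameter) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def layerwise_attention_pool(layer_vectors, query):
    """Combine one sentence vector per layer into a single vector.

    layer_vectors: list of L vectors (lists of d floats), one per layer.
    query: hypothetical learned vector of d floats used to score each layer.
    Returns (pooled_vector, attention_weights).
    """
    d = len(query)
    # Scaled dot-product score for each layer (an assumed scoring rule).
    scores = [
        sum(vec[i] * query[i] for i in range(d)) / math.sqrt(d)
        for vec in layer_vectors
    ]
    weights = softmax(scores)
    # Weighted sum over layers, computed dimension by dimension.
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, layer_vectors))
        for i in range(d)
    ]
    return pooled, weights
```

With a zero `query` every layer receives the same weight, so the result reduces to mean pooling over layers; a trained query would instead favor the layers that help the downstream objective.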
Submitted 13 September, 2022;
originally announced September 2022.
-
Improving evidential deep learning via multi-task learning
Authors:
Dongpin Oh,
Bonggun Shin
Abstract:
The Evidential regression network (ENet) estimates a continuous target and its predictive uncertainty without costly Bayesian model averaging. However, it is possible that the target is inaccurately predicted due to the gradient shrinkage problem of the original loss function of the ENet, the negative log marginal likelihood (NLL) loss. In this paper, the objective is to improve the prediction accuracy of the ENet while maintaining its efficient uncertainty estimation by resolving the gradient shrinkage problem. A multi-task learning (MTL) framework, referred to as MT-ENet, is proposed to accomplish this aim. In the MTL, we define the Lipschitz modified mean squared error (MSE) loss function as another loss and add it to the existing NLL loss. The Lipschitz modified MSE loss is designed to mitigate the gradient conflict with the NLL loss by dynamically adjusting its Lipschitz constant. By doing so, the Lipschitz MSE loss does not disturb the uncertainty estimation of the NLL loss. The MT-ENet enhances the predictive accuracy of the ENet without losing uncertainty estimation capability on the synthetic dataset and real-world benchmarks, including drug-target affinity (DTA) regression. Furthermore, the MT-ENet shows remarkable calibration and out-of-distribution detection capability on the DTA benchmarks.
Submitted 17 December, 2021;
originally announced December 2021.
-
Call for Customized Conversation: Customized Conversation Grounding Persona and Knowledge
Authors:
Yoonna Jang,
Jungwoo Lim,
Yuna Hur,
Dongsuk Oh,
Suhyune Son,
Yeonsoo Lee,
Donghoon Shin,
Seungryong Kim,
Heuiseok Lim
Abstract:
Humans usually have conversations by making use of prior knowledge about a topic and background information of the people whom they are talking to. However, existing conversational agents and datasets do not consider such comprehensive information, and thus they have a limitation in generating the utterances where the knowledge and persona are fused properly. To address this issue, we introduce a call For Customized conversation (FoCus) dataset where the customized answers are built with the user's persona and Wikipedia knowledge. To evaluate the abilities to make informative and customized utterances of pre-trained language models, we utilize BART and GPT-2 as well as transformer-based models. We assess their generation abilities with automatic scores and conduct human evaluations for qualitative results. We examine whether the model reflects adequate persona and knowledge with our proposed two sub-tasks, persona grounding (PG) and knowledge grounding (KG). Moreover, we show that the utterances of our data are constructed with the proper knowledge and persona through grounding quality assessment.
Submitted 16 May, 2022; v1 submitted 15 December, 2021;
originally announced December 2021.
-
Online Distributed Trajectory Planning for Quadrotor Swarm with Feasibility Guarantee using Linear Safe Corridor
Authors:
Jungwon Park,
Dabin Kim,
Gyeong Chan Kim,
Dahyun Oh,
H. Jin Kim
Abstract:
This paper presents a new online multi-agent trajectory planning algorithm that guarantees to generate safe, dynamically feasible trajectories in a cluttered environment. The proposed algorithm utilizes a linear safe corridor (LSC) to formulate the distributed trajectory optimization problem with only feasible constraints, so it does not resort to slack variables or soft constraints to avoid optimization failure. We adopt a priority-based goal planning method to prevent the deadlock without an additional procedure to decide which robot to yield. The proposed algorithm can compute the trajectories for 60 agents on average 15.5 ms per agent with an Intel i7 laptop and shows a similar flight distance and distance compared to the baselines based on soft constraints. We verified that the proposed method can reach the goal without deadlock in both the random forest and the indoor space, and we validated the safety and operability of the proposed algorithm through a real flight test with ten quadrotors in a maze-like environment.
Submitted 3 January, 2022; v1 submitted 18 September, 2021;
originally announced September 2021.
-
Deep learning-based statistical noise reduction for multidimensional spectral data
Authors:
Younsik Kim,
Dongjin Oh,
Soonsang Huh,
Dongjoon Song,
Sunbeom Jeong,
Junyoung Kwon,
Minsoo Kim,
Donghan Kim,
Hanyoung Ryu,
Jongkeun Jung,
Wonshik Kyung,
Byungmin Sohn,
Suyoung Lee,
Jounghoon Hyun,
Yeonghoon Lee,
Yeongkwan Kim,
Changyoung Kim
Abstract:
In spectroscopic experiments, data acquisition in multi-dimensional phase space may require a long acquisition time, owing to the large phase space volume to be covered. In such cases, the limited time available for data acquisition can be a serious constraint for experiments in which multidimensional spectral data are acquired. Here, taking angle-resolved photoemission spectroscopy (ARPES) as an example, we demonstrate a denoising method that utilizes deep learning as an intelligent way to overcome the constraint. With readily available ARPES data and random generation of the training data set, we successfully trained the denoising neural network without overfitting. The denoising neural network can remove the noise in the data while preserving its intrinsic information. We show that the denoising neural network allows us to perform a similar level of second-derivative and line shape analysis on data taken with two orders of magnitude less acquisition time. The importance of our method lies in its applicability to any multidimensional spectral data that are susceptible to statistical noise.
Submitted 2 July, 2021;
originally announced July 2021.
-
Encoding Weights of Irregular Sparsity for Fixed-to-Fixed Model Compression
Authors:
Baeseong Park,
Se Jung Kwon,
Daehwan Oh,
Byeongwook Kim,
Dongsoo Lee
Abstract:
Even though fine-grained pruning techniques achieve a high compression ratio, conventional sparsity representations (such as CSR) associated with irregular sparsity degrade parallelism significantly. Practical pruning methods, thus, usually lower pruning rates (by structured pruning) to improve parallelism. In this paper, we study fixed-to-fixed (lossless) encoding architecture/algorithm to support fine-grained pruning methods such that sparse neural networks can be stored in a highly regular structure. We first estimate the maximum compression ratio of encoding-based compression using entropy. Then, as an effort to push the compression ratio to the theoretical maximum (by entropy), we propose a sequential fixed-to-fixed encoding scheme. We demonstrate that our proposed compression scheme achieves almost the maximum compression ratio for the Transformer and ResNet-50 pruned by various fine-grained pruning methods.
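As a rough illustration of what an entropy-based limit looks like, the sketch below computes the Shannon bound for a pruning mask. It assumes the mask bits are independent with a fixed probability of being pruned, which is a simplification chosen here and not necessarily the estimate used in the paper; the function names are hypothetical.

```python
import math


def binary_entropy(p):
    """Shannon entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def max_mask_compression_ratio(sparsity):
    """Upper bound on lossless compression of a 1-bit-per-weight mask.

    Assumes i.i.d. mask bits with P(pruned) = sparsity (an assumption).
    Each mask bit then needs at least H(sparsity) bits on average, so the
    ratio of raw size to compressed size is at most 1 / H(sparsity).
    """
    h = binary_entropy(sparsity)
    return math.inf if h == 0.0 else 1.0 / h
```

For example, a mask with 90% of the weights pruned has an entropy of roughly 0.47 bits per position, so under these assumptions no lossless encoding of the mask can shrink it by much more than a factor of about 2.1.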
Submitted 30 January, 2022; v1 submitted 5 May, 2021;
originally announced May 2021.
-
Q-Rater: Non-Convex Optimization for Post-Training Uniform Quantization
Authors:
Byeongwook Kim,
Dongsoo Lee,
Yeonju Ro,
Yongkweon Jeon,
Se Jung Kwon,
Baeseong Park,
Daehwan Oh
Abstract:
Various post-training uniform quantization methods have usually been studied based on convex optimization. As a result, most previous ones rely on the quantization error minimization and/or quadratic approximations. Such approaches are computationally efficient and reasonable when a large number of quantization bits are employed. When the number of quantization bits is relatively low, however, non-convex optimization is unavoidable to improve model accuracy. In this paper, we propose a new post-training uniform quantization technique considering non-convexity. We empirically show that hyper-parameters for clipping and rounding of weights and activations can be explored by monitoring task loss. Then, an optimally searched set of hyper-parameters is frozen to proceed to the next layer such that an incremental non-convex optimization is enabled for post-training quantization. Throughout extensive experimental results using various models, our proposed technique presents higher model accuracy, especially for a low-bit quantization.
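For readers unfamiliar with the clipping and rounding hyper-parameters mentioned above, the following is a minimal sketch of a symmetric uniform quantizer with a clipping threshold. It is a generic textbook construction, not Q-Rater itself; the function name and the choice of `2**num_bits - 1` evenly spaced steps are assumptions.

```python
def quantize_uniform(values, num_bits, clip):
    """Quantize floats onto a uniform grid covering [-clip, clip].

    values: iterable of floats (e.g. the weights of one layer).
    num_bits: number of quantization bits, giving 2**num_bits grid points.
    clip: clipping threshold; inputs outside [-clip, clip] are saturated.
    Returns the dequantized (reconstructed) values as a list of floats.
    """
    steps = 2 ** num_bits - 1          # intervals between the grid points
    step_size = 2.0 * clip / steps
    reconstructed = []
    for v in values:
        clipped = max(-clip, min(clip, v))
        index = round((clipped + clip) / step_size)  # nearest grid index
        reconstructed.append(index * step_size - clip)
    return reconstructed
```

A search in the spirit of the abstract would try several candidate `clip` values for a layer, evaluate the task loss of the quantized model for each, and freeze the best one before moving to the next layer. That loop is omitted here because it depends on the model and data.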
Submitted 5 May, 2021;
originally announced May 2021.
-
PHIDL: Python CAD layout and geometry creation for nanolithography
Authors:
A. N. McCaughan,
A. M. Tait,
S. M. Buckley,
D. M. Oh,
J. T. Chiles,
J. M. Shainline,
S. W. Nam
Abstract:
Computer-aided design (CAD) has become a critical element in the creation of nanopatterned structures and devices. In particular, with the increased adoption of easy-to-learn programming languages like Python there has been a significant rise in the amount of lithographic geometries generated through scripting and programming. However, there are currently unaddressed gaps in usability for open-source CAD tools -- especially those in the GDSII design space -- that prevent wider adoption by scientists and students who might otherwise benefit from scripted design. For example, constructing relations between adjacent geometries is often much more difficult than necessary -- spacing a resonator structure a few micrometers from a readout structure often requires manually-coding the placement arithmetic. While inconveniences like this can be overcome by writing custom functions, they are often significant barriers to entry for new users or those less familiar with programming. To help streamline the design process and reduce barrier to entry for scripting designs, we have developed PHIDL, an open-source GDSII-based CAD tool for Python 2 and 3.
Submitted 1 March, 2021;
originally announced March 2021.
-
Toward the Fully Physics-Informed Echo State Network -- an ODE Approximator Based on Recurrent Artificial Neurons
Authors:
Dong Keun Oh
Abstract:
Inspired by recent theoretical arguments, a physics-informed echo state network (ESN) is discussed in an attempt to train a reservoir model in an entirely physics-informed manner. As the plainest work toward this purpose, an ODE (ordinary differential equation) approximator is designed to replicate the solution in sequence with respect to the recurrent evaluations. On the principal invariance of differential equations, the constraint in recurrence just takes shape to secure a proper regression method for the ESN-based ODE approximator. The actual training process is then established on the idea of a two-pass strategy for regression. Aiming at the fully physics-informed reservoir model, a couple of nonlinear dynamical problems are demonstrated as the computations obtained from the proposed method in this study.
Submitted 13 November, 2020;
originally announced November 2020.
-
I Know What You Asked: Graph Path Learning using AMR for Commonsense Reasoning
Authors:
Jungwoo Lim,
Dongsuk Oh,
Yoonna Jang,
Kisu Yang,
Heuiseok Lim
Abstract:
CommonsenseQA is a task in which a correct answer is predicted through commonsense reasoning with pre-defined knowledge. Most previous works have aimed to improve the performance with distributed representation without considering the process of predicting the answer from the semantic representation of the question. To shed light upon the semantic interpretation of the question, we propose an AMR-ConceptNet-Pruned (ACP) graph. The ACP graph is pruned from a full integrated graph encompassing Abstract Meaning Representation (AMR) graph generated from input questions and an external commonsense knowledge graph, ConceptNet (CN). Then the ACP graph is exploited to interpret the reasoning path as well as to predict the correct answer on the CommonsenseQA task. This paper presents the manner in which the commonsense reasoning process can be interpreted with the relations and concepts provided by the ACP graph. Moreover, ACP-based models are shown to outperform the baselines.
Submitted 5 November, 2020; v1 submitted 2 November, 2020;
originally announced November 2020.
-
Do Response Selection Models Really Know What's Next? Utterance Manipulation Strategies for Multi-turn Response Selection
Authors:
Taesun Whang,
Dongyub Lee,
Dongsuk Oh,
Chanhee Lee,
Kijong Han,
Dong-hun Lee,
Saebyeok Lee
Abstract:
In this paper, we study the task of selecting the optimal response given a user and system utterance history in retrieval-based multi-turn dialog systems. Recently, pre-trained language models (e.g., BERT, RoBERTa, and ELECTRA) showed significant improvements in various natural language processing tasks. This and similar response selection tasks can also be solved using such language models by formulating the tasks as dialog--response binary classification tasks. Although existing works using this approach successfully obtained state-of-the-art results, we observe that language models trained in this manner tend to make predictions based on the relatedness of history and candidates, ignoring the sequential nature of multi-turn dialog systems. This suggests that the response selection task alone is insufficient for learning temporal dependencies between utterances. To this end, we propose utterance manipulation strategies (UMS) to address this problem. Specifically, UMS consist of several strategies (i.e., insertion, deletion, and search), which aid the response selection model towards maintaining dialog coherence. Further, UMS are self-supervised methods that do not require additional annotation and thus can be easily incorporated into existing approaches. Extensive evaluation across multiple languages and models shows that UMS are highly effective in teaching dialog consistency, which leads to models pushing the state-of-the-art with significant margins on multiple public benchmark datasets.
Submitted 16 December, 2020; v1 submitted 10 September, 2020;
originally announced September 2020.
-
Automatic Operating Room Surgical Activity Recognition for Robot-Assisted Surgery
Authors:
Aidean Sharghi,
Helene Haugerud,
Daniel Oh,
Omid Mohareri
Abstract:
Automatic recognition of surgical activities in the operating room (OR) is a key technology for creating next-generation intelligent surgical devices and workflow monitoring/support systems. Such systems can potentially enhance efficiency in the OR, resulting in lower costs and improved care delivery to patients. In this paper, we investigate automatic surgical activity recognition in robot-assisted operations. We collect the first large-scale dataset including 400 full-length multi-perspective videos from a variety of robotic surgery cases captured using Time-of-Flight cameras. We densely annotate the videos with the 10 most recognized and clinically relevant classes of activities. Furthermore, we investigate state-of-the-art computer vision action recognition techniques and adapt them for the OR environment and the dataset. First, we fine-tune the Inflated 3D ConvNet (I3D) for clip-level activity recognition on our dataset and use it to extract features from the videos. These features are then fed to a stack of 3 Temporal Gaussian Mixture layers, which extract context from neighboring clips, and finally pass through a Long Short-Term Memory network that learns the order of activities in full-length videos. We extensively assess the model and reach a peak performance of 88% mean Average Precision.
Submitted 29 June, 2020;
originally announced June 2020.
-
Word Sense Disambiguation using Knowledge-based Word Similarity
Authors:
Sunjae Kwon,
Dongsuk Oh,
Youngjoong Ko
Abstract:
In natural language processing, word-sense disambiguation (WSD) is an open problem concerned with identifying the correct sense of words in a particular context. To address this problem, we introduce a novel knowledge-based WSD system built on two methods. First, we propose a novel method to encode the word vector representation by considering the graphical semantic relationships from the lexical knowledge base. Second, we propose a method for extracting the contextual words from the text for analyzing an ambiguous word based on the similarity of word vector representations. To validate the effectiveness of our WSD system, we conducted experiments on five benchmark English WSD corpora (Senseval-02, Senseval-03, SemEval-07, SemEval-13, and SemEval-15). The results demonstrate that the proposed methods significantly enhance WSD performance. Furthermore, our system outperforms the existing knowledge-based WSD systems and shows performance comparable to that of the state-of-the-art supervised WSD systems.
Submitted 21 June, 2020; v1 submitted 10 November, 2019;
originally announced November 2019.
-
An Effective Domain Adaptive Post-Training Method for BERT in Response Selection
Authors:
Taesun Whang,
Dongyub Lee,
Chanhee Lee,
Kisu Yang,
Dongsuk Oh,
HeuiSeok Lim
Abstract:
We focus on multi-turn response selection in a retrieval-based dialog system. In this paper, we utilize the powerful pre-trained language model Bi-directional Encoder Representations from Transformer (BERT) for a multi-turn dialog system and propose a highly effective post-training method on a domain-specific corpus. Although BERT is easily adopted for various NLP tasks and outperforms the previous baselines of each task, it still has limitations if a task corpus is too focused on a certain domain. Post-training on a domain-specific corpus (e.g., Ubuntu Corpus) helps the model learn contextualized representations and words that do not appear in a general corpus (e.g., English Wikipedia). Experimental results show that our approach achieves a new state of the art on two response selection benchmarks (i.e., Ubuntu Corpus V1, Advising Corpus), with performance improvements of 5.9% and 6% on R@1.
Submitted 26 July, 2020; v1 submitted 13 August, 2019;
originally announced August 2019.
-
Learning Bone Suppression from Dual Energy Chest X-rays using Adversarial Networks
Authors:
Dong Yul Oh,
Il Dong Yun
Abstract:
Suppressing bones such as the ribs and clavicles on chest X-rays is often expected to improve pathology classification. These bones can interfere with a broad range of diagnostic tasks on pulmonary disease, other than those involving the musculoskeletal system. The conventional method for acquiring bone-suppressed X-rays is dual energy imaging, which captures two radiographs at a very short interval with different energy levels; however, the patient is exposed to radiation twice, and artifacts arise from heartbeats between the two shots. In this paper, we introduce a deep generative model trained to predict bone-suppressed images from single energy chest X-rays by analyzing a finite set of previously acquired dual energy chest X-rays. Since only a relatively small amount of data is available, such an approach relies on a methodology that maximizes data utilization. Here we integrate the following two approaches. First, we use a conditional generative adversarial network that complements the traditional regression method minimizing the pairwise image difference. Second, we use Haar 2D wavelet decomposition to offer a perceptual guideline in frequency details, allowing the model to converge quickly and efficiently. As a result, we achieve state-of-the-art performance on bone suppression compared to the existing approaches with dual energy chest X-rays.
Submitted 4 November, 2018;
originally announced November 2018.
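The abstract above names Haar 2D wavelet decomposition as a source of frequency detail. As a rough illustration of that transform alone (not of the paper's model or training setup), the following sketch computes one level of an orthonormal 2D Haar decomposition in plain Python. The function name `haar2d_level` and the band names are assumptions made for this example.

```python
def haar2d_level(image):
    """Return one level of an orthonormal 2D Haar decomposition.

    `image` is a list of equal-length rows with even height and width.
    The result is four half-resolution grids: a low-pass approximation
    and three detail bands. (Hypothetical helper, for illustration only.)
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must both be even")

    approx, col_detail, row_detail, diag_detail = [], [], [], []
    for r in range(0, rows, 2):
        a_row, c_row, r_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            # The 2x2 block being transformed:
            #   a b
            #   d e
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((a + b + d + e) / 2)  # low-pass: scaled block sum
            c_row.append((a - b + d - e) / 2)  # change across columns
            r_row.append((a + b - d - e) / 2)  # change across rows
            d_row.append((a - b - d + e) / 2)  # diagonal change
        approx.append(a_row)
        col_detail.append(c_row)
        row_detail.append(r_row)
        diag_detail.append(d_row)
    return approx, col_detail, row_detail, diag_detail
```

On a constant image all three detail bands are zero, so any nonzero detail coefficient marks local structure such as an edge. How the detail bands enter the loss or the network input is not stated in the abstract and is not modeled here.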
-
Construction of optimal codes in deletion and insertion metric
Authors:
Hyun Kwang Kim,
Joon Yop Lee,
Dong Yeol Oh
Abstract:
We improve Levenshtein's upper bound for the cardinality of a code of length four that is capable of correcting single deletions over an alphabet of even size. We also illustrate that the new upper bound is sharp. Furthermore we construct an optimal perfect code that is capable of correcting single deletions for the same parameters.
Submitted 22 March, 2010;
originally announced March 2010.
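For readers unfamiliar with the setting of the abstract above: a code corrects single deletions exactly when no two distinct codewords can yield the same word after one symbol is deleted from each. The sketch below checks that property by brute force. It is an illustration of the definition only, not the construction or the bound from the paper, and the function names are assumptions made for this example.

```python
from itertools import combinations


def deletion_ball(word):
    """All words obtained by deleting exactly one symbol from `word`."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def corrects_single_deletions(code):
    """True if the deletion balls of distinct codewords are pairwise disjoint.

    Disjoint balls mean a received word with one symbol missing can have
    come from only one codeword, so the deletion is correctable.
    """
    return all(
        deletion_ball(u).isdisjoint(deletion_ball(v))
        for u, v in combinations(code, 2)
    )
```

For example, `corrects_single_deletions(["0000", "1111"])` holds, whereas `["0001", "0010"]` fails because both words can be shortened to `"000"`. The check compares every pair of codewords, so it is only practical for small codes such as the length-four case the abstract discusses.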
-
Optimal codes in deletion and insertion metric
Authors:
Hyun Kwang Kim,
Joon Yop Lee,
Dong Yeol Oh
Abstract:
We improve the upper bound of Levenshtein for the cardinality of a code of length 4 capable of correcting single deletions over an alphabet of even size. We also illustrate that the new upper bound is sharp. Furthermore we will construct an optimal perfect code capable of correcting single deletions for the same parameters.
Submitted 22 March, 2010; v1 submitted 20 October, 2008;
originally announced October 2008.