Showing 1–50 of 820 results for author: Kim, M

  1. arXiv:2405.04791  [pdf, other]

    cs.HC

    The Impact of Perceived Tone, Age, and Gender on Voice Assistant Persuasiveness in the Context of Product Recommendations

    Authors: Sabid Bin Habib Pias, Ran Huang, Donald Williamson, Minjeong Kim, Apu Kapadia

    Abstract: Voice Assistants (VAs) can assist users in various everyday tasks, but many users are reluctant to rely on VAs for intricate tasks like online shopping. This study aims to examine whether the vocal characteristics of VAs can serve as an effective tool to persuade users and increase user engagement with VAs in online shopping. Prior studies have demonstrated that the perceived tone, age, and gender…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: ACM Conversational User Interface 2024

  2. arXiv:2405.03652  [pdf]

    cs.CV

    Field-of-View Extension for Diffusion MRI via Deep Generative Models

    Authors: Chenyu Gao, Shunxing Bao, Michael Kim, Nancy Newlin, Praitayini Kanakaraj, Tianyuan Yao, Gaurav Rudravaram, Yuankai Huo, Daniel Moyer, Kurt Schilling, Walter Kukull, Arthur Toga, Derek Archer, Timothy Hohman, Bennett Landman, Zhiyuan Li

    Abstract: Purpose: In diffusion MRI (dMRI), the volumetric and bundle analyses of whole-brain tissue microstructure and connectivity can be severely impeded by an incomplete field-of-view (FOV). This work aims to develop a method for imputing the missing slices directly from existing dMRI scans with an incomplete FOV. We hypothesize that the imputed image with complete FOV can improve the whole-brain tracto…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 20 pages, 11 figures

  3. arXiv:2405.02996  [pdf, other]

    cs.SD cs.AI eess.AS

    RepAugment: Input-Agnostic Representation-Level Augmentation for Respiratory Sound Classification

    Authors: June-Woo Kim, Miika Toikkanen, Sangmin Bae, Minseok Kim, Ho-Young Jung

    Abstract: Recent advancements in AI have democratized its deployment as a healthcare assistant. While pretrained models from large-scale visual and audio datasets have demonstrably generalized to this task, surprisingly, no studies have explored pretrained speech models, which, as human-originated sounds, intuitively would share closer resemblance to lung sounds. This paper explores the efficacy of pretrain…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted EMBC 2024

  4. arXiv:2405.02499  [pdf, other]

    cs.CR cs.AR

    DRAMScope: Uncovering DRAM Microarchitecture and Characteristics by Issuing Memory Commands

    Authors: Hwayong Nam, Seungmin Baek, Minbok Wi, Michael Jaemin Kim, Jaehyun Park, Chihun Song, Nam Sung Kim, Jung Ho Ahn

    Abstract: The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerability. Nonetheless, DRAM manufacturers have disclosed only a limited amount of information, making it difficult to find specific information on their DRAM microarchitectures. This paper addresses t…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: To appear at the 51st IEEE/ACM International Symposium on Computer Architecture (ISCA)

  5. arXiv:2405.01531  [pdf, other]

    cs.LG cs.AI cs.CV

    Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models

    Authors: Nishad Singhi, Jae Myung Kim, Karsten Roth, Zeynep Akata

    Abstract: Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions. Crucially, the CBM design inherently allows for human interventions, in which expert users are given the ability to modify potentially misaligned concept choices to influence the decision behavior of the model in an interpretable fashion. However, existing appro…

    Submitted 2 May, 2024; originally announced May 2024.

  6. arXiv:2405.01016  [pdf, other]

    cs.CV cs.AI

    Addressing Diverging Training Costs using Local Restoration for Precise Bird's Eye View Map Construction

    Authors: Minsu Kim, Giseop Kim, Sunwook Choi

    Abstract: Recent advancements in Bird's Eye View (BEV) fusion for map construction have demonstrated remarkable mapping of urban environments. However, their deep and bulky architecture incurs substantial amounts of backpropagation memory and computing latency. Consequently, the problem poses an unavoidable bottleneck in constructing high-resolution (HR) BEV maps, as their large-sized features cause signifi…

    Submitted 2 May, 2024; originally announced May 2024.

  7. arXiv:2404.18881  [pdf, other]

    cs.HC cs.LG cs.SE

    Human-in-the-Loop Synthetic Text Data Inspection with Provenance Tracking

    Authors: Hong Jin Kang, Fabrice Harel-Canada, Muhammad Ali Gulzar, Violet Peng, Miryung Kim

    Abstract: Data augmentation techniques apply transformations to existing texts to generate additional data. The transformations may produce low-quality texts, where the meaning of the text is changed and the text may even be mangled beyond human comprehension. Analyzing the synthetically generated texts and their corresponding labels is slow and demanding. To winnow out texts with incorrect labels, we devel…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: NAACL 2024 Findings

  8. arXiv:2404.16947  [pdf, other]

    cs.SE

    Fuzzing MLIR by Synthesizing Custom Mutations

    Authors: Ben Limpanukorn, Jiyuan Wang, Hong Jin Kang, Eric Zitong Zhou, Miryung Kim

    Abstract: Multi-Level Intermediate Representation (MLIR) is an effort to enable faster compiler development by providing an extensible framework for downstream developers to define custom IRs with MLIR dialects. MLIR dialects define new IRs that are tailored for specific domains. The diversity and rapid evolution of these IRs make it impractical to pre-define custom generator logic for every available diale…

    Submitted 25 April, 2024; originally announced April 2024.

  9. arXiv:2404.16831  [pdf, other]

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  10. arXiv:2404.16169  [pdf, other]

    cs.CE q-fin.ST

    Interpretable Machine Learning Models for Predicting the Next Targets of Activist Funds

    Authors: Minwu Kim

    Abstract: This work develops a predictive model to identify potential targets of activist investment funds, which strategically acquire significant corporate stakes to drive operational and strategic improvements and enhance shareholder value. Predicting these targets is crucial for companies to mitigate intervention risks, for activists to select optimal targets, and for investors to capitalize on associat…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 15 pages

  11. arXiv:2404.16065  [pdf, other]

    cs.HC eess.SP

    mmWave Wearable Antenna for Interaction with VR Devices

    Authors: Haksun Son, Song Min Kim

    Abstract: The VR industry is one of the most promising industries for the near future, as it can provide a more immersive connection between people and the virtual world. Currently, VR devices interact with people using inconvenient controllers or cameras that perform poorly in dark environments. Interaction through millimeter-wave wearable devices has the potential to conveniently track human behavior rega…

    Submitted 19 April, 2024; originally announced April 2024.

  12. arXiv:2404.11826  [pdf, other]

    cs.CL

    AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence

    Authors: Minbeom Kim, Hwanhee Lee, Joonsuk Park, Hwaran Lee, Kyomin Jung

    Abstract: As the integration of large language models into daily life is on the rise, there is a clear gap in benchmarks for advising on subjective and personal dilemmas. To address this, we introduce AdvisorQA, the first benchmark developed to assess LLMs' capability in offering advice for deeply personalized concerns, utilizing the LifeProTips subreddit forum. This forum features a dynamic interaction whe…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 19 pages, 11 figures

  13. arXiv:2404.10378  [pdf, other]

    cs.CV cs.AI cs.CY cs.LG

    Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data

    Authors: Ivan DeAndres-Tame, Ruben Tolosana, Pietro Melzi, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia, Zhizhou Zhong, Yuge Huang, Yuxi Mi, Shouhong Ding, Shuigeng Zhou, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Zhihong Xiao, Evgeny Smirnov, Anton Pimenov, Aleksei Grigorev, Denis Timoshenko, Kaleb Mesfin Asfaw, et al. (33 additional authors not shown)

    Abstract: Synthetic data is gaining increasing relevance for training machine learning models. This is mainly motivated due to several factors such as the lack of real data and intra-class variability, time and errors produced in manual labeling, and in some cases privacy concerns, among others. This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.10476

    Journal ref: IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRw 2024)

  14. arXiv:2404.10355  [pdf, other]

    cs.AR

    AERO: Adaptive Erase Operation for Improving Lifetime and Performance of Modern NAND Flash-Based SSDs

    Authors: Sungjun Cho, Beomjun Kim, Hyunuk Cho, Gyeongseob Seo, Onur Mutlu, Myungsuk Kim, Jisung Park

    Abstract: This work investigates a new erase scheme in NAND flash memory to improve the lifetime and performance of modern solid-state drives (SSDs). In NAND flash memory, an erase operation applies a high voltage (e.g., > 20 V) to flash cells for a long time (e.g., > 3.5 ms), which degrades cell endurance and potentially delays user I/O requests. While a large body of prior work has proposed various techni…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted for publication at Proceedings of the 29th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024

  15. arXiv:2404.09490  [pdf, other]

    cs.CV

    Leveraging Temporal Contextualization for Video Action Recognition

    Authors: Minji Kim, Dongyoon Han, Taekyung Kim, Bohyung Han

    Abstract: Pretrained vision-language models have shown effectiveness in video understanding. However, recent studies have not sufficiently leveraged essential temporal information from videos, simply averaging frame-wise representations or referencing consecutive frames. We introduce Temporally Contextualized CLIP (TC-CLIP), a pioneering framework for video understanding that effectively and efficiently lev…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 24 pages, 10 figures, 12 tables

  16. arXiv:2404.08110  [pdf, other]

    cs.CY cs.SI

    Toxic Synergy Between Hate Speech and Fake News Exposure

    Authors: Munjung Kim, Tuğrulcan Elmas, Filippo Menczer

    Abstract: Hate speech on social media is a pressing concern. Understanding the factors associated with hate speech may help mitigate it. Here we explore the association between hate speech and exposure to fake news by studying the correlation between exposure to news from low-credibility sources through following connections and the use of hate speech on Twitter. Using news source credibility labels and a d…

    Submitted 11 April, 2024; originally announced April 2024.

  17. arXiv:2404.07610  [pdf, other]

    cs.CV

    Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval

    Authors: Minkuk Kim, Hyeon Bae Kim, Jinyoung Moon, Jinwoo Choi, Seong Tae Kim

    Abstract: There has been significant attention to the research on dense video captioning, which aims to automatically localize and caption all events within untrimmed video. Several studies introduce methods by designing dense video captioning as a multitasking problem of event localization and event captioning to consider inter-task relations. However, addressing both tasks using only visual input is chall…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  18. arXiv:2404.05144  [pdf, other]

    cs.CL cs.CV cs.LG

    Enhancing Clinical Efficiency through LLM: Discharge Note Generation for Cardiac Patients

    Authors: HyoJe Jung, Yunha Kim, Heejung Choi, Hyeram Seo, Minkyoung Kim, JiYe Han, Gaeun Kee, Seohyun Park, Soyoung Ko, Byeolhee Kim, Suyeon Kim, Tae Joon Jun, Young-Hak Kim

    Abstract: Medical documentation, including discharge notes, is crucial for ensuring patient care quality, continuity, and effective medical communication. However, the manual creation of these documents is not only time-consuming but also prone to inconsistencies and potential errors. The automation of this documentation process using artificial intelligence (AI) represents a promising area of innovation in…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: 10 pages, 1 figure, 3 tables, conference

  19. arXiv:2404.03932  [pdf, ps, other]

    quant-ph cs.CR

    On quantum learning algorithms for noisy linear problems

    Authors: Minkyu Kim, Panjin Kim

    Abstract: Quantum algorithms have shown successful results in solving noisy linear problems with quantum samples in which cryptographic hard problems are relevant. In this paper the previous results are investigated in detail, leading to new quantum and classical algorithms under the same assumptions as in the earlier works. To be specific, we present a polynomial-time quantum algorithm for solving the ring…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 14 pages, 1 figure

  20. arXiv:2404.03924  [pdf, other]

    cs.CV

    Learning Correlation Structures for Vision Transformers

    Authors: Manjin Kim, Paul Hongsuck Seo, Cordelia Schmid, Minsu Cho

    Abstract: We introduce a new attention mechanism, dubbed structural self-attention (StructSA), that leverages rich correlation patterns naturally emerging in key-query interactions of attention. StructSA generates attention maps by recognizing space-time structures of key-query correlations via convolution and uses them to dynamically aggregate local contexts of value features. This effectively leverages ri…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

  21. arXiv:2404.02949  [pdf, other]

    cs.LG cs.AI

    The SaTML '24 CNN Interpretability Competition: New Innovations for Concept-Level Interpretability

    Authors: Stephen Casper, Jieun Yun, Joonhyuk Baek, Yeseong Jung, Minhwan Kim, Kiwan Kwon, Saerom Park, Hayden Moore, David Shriver, Marissa Connor, Keltin Grimes, Angus Nicolson, Arush Tagade, Jessica Rumbelow, Hieu Minh Nguyen, Dylan Hadfield-Menell

    Abstract: Interpretability techniques are valuable for helping humans understand and oversee AI systems. The SaTML 2024 CNN Interpretability Competition solicited novel methods for studying convolutional neural networks (CNNs) at the ImageNet scale. The objective of the competition was to help human crowd-workers identify trojans in CNNs. This report showcases the methods and results of four featured compet…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: Competition for SaTML 2024

  22. arXiv:2404.02575  [pdf, other]

    cs.CL

    Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

    Authors: Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Seonghwan Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, Jinyoung Yeo

    Abstract: Algorithmic reasoning refers to the ability to understand the complex patterns behind the problem and decompose them into a sequence of reasoning steps towards the solution. Such nature of algorithmic reasoning makes it a challenge for large language models (LLMs), even though they have demonstrated promising performance in other reasoning tasks. Within this context, some recent studies use progra…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 38 pages, 4 figures

  23. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  24. arXiv:2404.01816  [pdf, other]

    eess.IV cs.CV cs.HC

    Rethinking Annotator Simulation: Realistic Evaluation of Whole-Body PET Lesion Interactive Segmentation Methods

    Authors: Zdravko Marinov, Moon Kim, Jens Kleesiek, Rainer Stiefelhagen

    Abstract: Interactive segmentation plays a crucial role in accelerating the annotation, particularly in domains requiring specialized expertise such as nuclear medicine. For example, annotating lesions in whole-body Positron Emission Tomography (PET) images can require over an hour per volume. While previous works evaluate interactive segmentation models through either real user studies or simulated annotat…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 10 pages, 5 figures, 1 table

  25. arXiv:2404.01645  [pdf, other]

    cs.CV cs.LG

    ContrastCAD: Contrastive Learning-based Representation Learning for Computer-Aided Design Models

    Authors: Minseop Jung, Minseong Kim, Jibum Kim

    Abstract: The success of Transformer-based models has encouraged many researchers to learn CAD models using sequence-based approaches. However, learning CAD models is still a challenge, because they can be represented as complex shapes with long construction sequences. Furthermore, the same CAD model can be expressed using different CAD construction sequences. We propose a novel contrastive learning-based a…

    Submitted 2 April, 2024; originally announced April 2024.

  26. Personalized Neural Speech Codec

    Authors: Inseon Jang, Haici Yang, Wootaek Lim, Seungkwon Beack, Minje Kim

    Abstract: In this paper, we propose a personalized neural speech codec, envisioning that personalization can reduce the model complexity or improve perceptual speech quality. Despite the common usage of speech codecs where only a single talker is involved on each side of the communication, personalizing a codec for the specific user has rarely been explored in the literature. First, we assume speakers can b…

    Submitted 31 March, 2024; originally announced April 2024.

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 991-995

  27. A Comparative Analysis of Poetry Reading Audio: Singing, Narrating, or Somewhere In Between?

    Authors: Kahyun Choi, Minje Kim

    Abstract: This paper provides a computational analysis of poetry reading audio signals at a large scale to unveil the musicality within professionally-read poems. Although the acoustic characteristics of other types of spoken language have been extensively studied, most of the literature is limited to narrative speech or singing voice, discussing how different they are from each other. In this work, we deve…

    Submitted 31 March, 2024; originally announced April 2024.

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024, pp. 1296-1300

  28. arXiv:2404.00678  [pdf, other]

    cs.CV cs.GR

    OmniSDF: Scene Reconstruction using Omnidirectional Signed Distance Functions and Adaptive Binoctrees

    Authors: Hakyeong Kim, Andreas Meuleman, Hyeonjoong Jang, James Tompkin, Min H. Kim

    Abstract: We present a method to reconstruct indoor and outdoor static scene geometry and appearance from an omnidirectional video moving in a small circular sweep. This setting is challenging because of the small baseline and large depth ranges, making it difficult to find ray crossings. To better constrain the optimization, we estimate geometry as a signed distance field within a spherical binoctree data…

    Submitted 31 March, 2024; originally announced April 2024.

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  29. arXiv:2404.00676  [pdf, other]

    cs.CV cs.GR

    OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos

    Authors: Dongyoung Choi, Hyeonjoong Jang, Min H. Kim

    Abstract: Omnidirectional cameras are extensively used in various applications to provide a wide field of vision. However, they face a challenge in synthesizing novel views due to the inevitable presence of dynamic objects, including the photographer, in their wide field of view. In this paper, we introduce a new approach called Omnidirectional Local Radiance Fields (OmniLocalRF) that can render static-only…

    Submitted 31 March, 2024; originally announced April 2024.

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  30. arXiv:2403.20225  [pdf, other]

    cs.CV

    MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark

    Authors: Sanghyun Woo, Kwanyong Park, Inkyu Shin, Myungchul Kim, In So Kweon

    Abstract: Multi-target multi-camera tracking is a crucial task that involves identifying and tracking individuals over time using video streams from multiple cameras. This task has practical applications in various fields, such as visual surveillance, crowd behavior analysis, and anomaly detection. However, due to the difficulty and cost of collecting and labeling data, existing datasets for this task are e…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted on CVPR 2024

  31. arXiv:2403.19904  [pdf, other]

    cs.CV

    Fully Geometric Panoramic Localization

    Authors: Junho Kim, Jiwon Jeong, Young Min Kim

    Abstract: We introduce a lightweight and accurate localization method that only utilizes the geometry of 2D-3D lines. Given a pre-captured 3D map, our approach localizes a panorama image, taking advantage of the holistic 360 view. The system mitigates potential privacy breaches or domain discrepancies by avoiding trained or hand-crafted visual descriptors. However, as lines alone can be ambiguous, we expres…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  32. arXiv:2403.16167  [pdf, other]

    cs.CV cs.CL

    Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models

    Authors: Minchan Kim, Minyeong Kim, Junik Bae, Suhwan Choi, Sungkyung Kim, Buru Chang

    Abstract: Hallucinations in vision-language models pose a significant challenge to their reliability, particularly in the generation of long captions. Current methods fall short of accurately identifying and mitigating these hallucinations. To address this issue, we introduce ESREAL, a novel unsupervised learning framework designed to suppress the generation of hallucinations through accurate localization a…

    Submitted 5 May, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  33. arXiv:2403.16158  [pdf, other]

    cs.CL

    Korean Bio-Medical Corpus (KBMC) for Medical Named Entity Recognition

    Authors: Sungjoo Byun, Jiseung Hong, Sumin Park, Dongjun Jang, Jean Seo, Minseok Kim, Chaeyoung Oh, Hyopil Shin

    Abstract: Named Entity Recognition (NER) plays a pivotal role in medical Natural Language Processing (NLP). Yet, there has not been an open-source medical NER dataset specifically for the Korean language. To address this, we utilized ChatGPT to assist in constructing the KBMC (Korean Bio-Medical Corpus), which we are now presenting to the public. With the KBMC dataset, we noticed an impressive 20% increase…

    Submitted 24 March, 2024; originally announced March 2024.

    Journal ref: LREC-COLING 2024

  34. arXiv:2403.14852  [pdf, other]

    cs.CV

    KeyPoint Relative Position Encoding for Face Recognition

    Authors: Minchul Kim, Yiyang Su, Feng Liu, Anil Jain, Xiaoming Liu

    Abstract: In this paper, we address the challenge of making ViT models more robust to unseen affine transformations. Such robustness becomes useful in various recognition tasks such as face recognition when image alignment failures occur. We propose a novel method called KP-RPE, which leverages key points (e.g., facial landmarks) to make ViT more resilient to scale, translation, and pose variations. We begin…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: To appear in CVPR2024

  35. PECI-Net: Bolus segmentation from video fluoroscopic swallowing study images using preprocessing ensemble and cascaded inference

    Authors: Dougho Park, Younghun Kim, Harim Kang, Junmyeoung Lee, Jinyoung Choi, Taeyeon Kim, Sangeok Lee, Seokil Son, Minsol Kim, Injung Kim

    Abstract: Bolus segmentation is crucial for the automated detection of swallowing disorders in videofluoroscopic swallowing studies (VFSS). However, it is difficult for the model to accurately segment a bolus region in a VFSS image because VFSS images are translucent, have low contrast and unclear region boundaries, and lack color information. To overcome these challenges, we propose PECI-Net, a network arc…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 20 pages, 8 figures

    Journal ref: Computers in Biology and Medicine (2024)

  36. arXiv:2403.12862  [pdf, other]

    cs.CL

    Epistemology of Language Models: Do Language Models Have Holistic Knowledge?

    Authors: Minsu Kim, James Thorne

    Abstract: This paper investigates the inherent knowledge in language models from the perspective of epistemological holism. The purpose of this paper is to explore whether LLMs exhibit characteristics consistent with epistemological holism. These characteristics suggest that core knowledge, such as general scientific knowledge, each plays a specific role, serving as the foundation of our knowledge system an…

    Submitted 19 March, 2024; originally announced March 2024.

  37. arXiv:2403.11472  [pdf, other]

    cs.LG cs.AR cs.DB

    Accelerating String-Key Learned Index Structures via Memoization-based Incremental Training

    Authors: Minsu Kim, Jinwoo Hwang, Guseul Heo, Seiyeon Cho, Divya Mahajan, Jongse Park

    Abstract: Learned indexes use machine learning models to learn the mappings between keys and their corresponding positions in key-value indexes. These indexes use the mapping information as training data. Learned indexes require frequent retrainings of their models to incorporate the changes introduced by update queries. To efficiently retrain the models, existing learned index systems often harness a linea…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted at VLDB '24; 12 pages + 2 pages (ref), 18 figures, 2 tables

  38. arXiv:2403.11399  [pdf, other]

    cs.CL

    X-LLaVA: Optimizing Bilingual Large Vision-Language Alignment

    Authors: Dongjae Shin, Hyeonseok Lim, Inho Won, Changsu Choi, Minjun Kim, Seungwoo Song, Hangyeol Yoo, Sangmin Kim, Kyungtae Lim

    Abstract: The impressive development of large language models (LLMs) is expanding into the realm of large multimodal models (LMMs), which incorporate multiple types of data beyond text. However, the nature of multimodal models leads to significant expenses in the creation of training data. Furthermore, constructing multilingual data for LMMs presents its own set of challenges due to language diversity and c…

    Submitted 1 April, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

  39. arXiv:2403.10494  [pdf, other]

    cs.RO

    Lifelong LERF: Local 3D Semantic Inventory Monitoring Using FogROS2

    Authors: Adam Rashid, Chung Min Kim, Justin Kerr, Letian Fu, Kush Hari, Ayah Ahmad, Kaiyuan Chen, Huang Huang, Marcus Gualtieri, Michael Wang, Christian Juette, Nan Tian, Liu Ren, Ken Goldberg

    Abstract: Inventory monitoring in homes, factories, and retail stores relies on maintaining data despite objects being swapped, added, removed, or moved. We introduce Lifelong LERF, a method that allows a mobile robot with minimal compute to jointly optimize a dense language and geometric representation of its surroundings. Lifelong LERF maintains this representation over time by detecting semantic changes…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: See project webpage at: https://sites.google.com/berkeley.edu/lifelonglerf/home

  40. arXiv:2403.09508  [pdf, other]

    cs.CV

    SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition

    Authors: Jeonghyeok Do, Munchurl Kim

    Abstract: Skeleton-based action recognition, which classifies human actions based on the coordinates of joints and their connectivity within skeleton data, is widely utilized in various scenarios. While Graph Convolutional Networks (GCNs) have been proposed for skeleton data represented as graphs, they suffer from limited receptive fields constrained by joint connectivity. To address this limitation, recent…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Please visit our project page at https://jeonghyeokdo.github.io/SkateFormer_site/

  41. arXiv:2403.08302  [pdf, other]

    cs.RO

    Online Multi-Contact Feedback Model Predictive Control for Interactive Robotic Tasks

    Authors: Seo Wook Han, Maged Iskandar, Jinoh Lee, Min Jun Kim

    Abstract: In this paper, we propose a model predictive control (MPC) that accomplishes interactive robotic tasks, in which multiple contacts may occur at unknown locations. To address such scenarios, we made an explicit contact feedback loop in the MPC framework. An algorithm called Multi-Contact Particle Filter with Exploration Particle (MCP-EP) is employed to establish real-time feedback of multi-contact…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted for publication at the IEEE International Conference on Robotics and Automation (ICRA), Yokohama, 2024

  42. arXiv:2403.08277  [pdf, other]

    cs.CV

    VIGFace: Virtual Identity Generation Model for Face Image Synthesis

    Authors: Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam, Junghyun Cho, Ig-Jae Kim

    Abstract: Deep learning-based face recognition continues to face challenges due to its reliance on huge datasets obtained from web crawling, which can be costly to gather and raise significant real-world privacy concerns. To address this issue, we propose VIGFace, a novel framework capable of generating synthetic facial images. Initially, we train the face recognition model using a real face dataset and cre…

    Submitted 13 March, 2024; originally announced March 2024.

  43. arXiv:2403.08272  [pdf, other]

    cs.CL

    RECIPE4U: Student-ChatGPT Interaction Dataset in EFL Writing Education

    Authors: Jieun Han, Haneul Yoo, Junho Myung, Minsun Kim, Tak Yeon Lee, So-Yeon Ahn, Alice Oh

    Abstract: The integration of generative AI in education is expanding, yet empirical analyses of large-scale and real-world interactions between students and AI systems still remain limited. Addressing this gap, we present RECIPE4U (RECIPE for University), a dataset sourced from a semester-long experiment with 212 college students in English as Foreign Language (EFL) writing courses. During the study, studen…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2309.13243

  44. arXiv:2403.08262  [pdf, other]

    cs.CV

    BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image

    Authors: Minje Kim, Tae-Kyun Kim

    Abstract: Creating personalized hand avatars is important to offer a realistic experience to users on AR / VR platforms. While most prior studies focused on reconstructing 3D hand shapes, some recent work has tackled the reconstruction of hand textures on top of shapes. However, these methods are often limited to capturing pixels on the visible side of a hand, requiring diverse views of the hand in a video…

    Submitted 25 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024, Project Page: https://yunminjin2.github.io/projects/bitt/

  45. arXiv:2403.08256  [pdf, other]

    cs.CV

    IG-FIQA: Improving Face Image Quality Assessment through Intra-class Variance Guidance robust to Inaccurate Pseudo-Labels

    Authors: Minsoo Kim, Gi Pyo Nam, Haksub Kim, Haesol Park, Ig-Jae Kim

    Abstract: In the realm of face image quality assessment (FIQA), methods based on sample relative classification have shown impressive performance. However, the quality scores used as pseudo-labels assigned from images of classes with low intra-class variance could be unrelated to the actual quality in this method. To address this issue, we present IG-FIQA, a novel approach to guide FIQA training, introducing…

    Submitted 13 March, 2024; originally announced March 2024.

  46. arXiv:2403.08187  [pdf, other]

    cs.CL cs.SD eess.AS

    Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children

    Authors: Taekyung Ahn, Yeonjung Hong, Younggon Im, Do Hyung Kim, Dayoung Kang, Joo Won Jeong, Jae Won Kim, Min Jung Kim, Ah-ra Cho, Dae-Hyun Jang, Hosung Nam

    Abstract: This study presents a model of automatic speech recognition (ASR) designed to diagnose pronunciation issues in children with speech sound disorders (SSDs) to replace manual transcriptions in clinical procedures. Since ASR models trained for general purposes primarily predict input speech into real words, employing a well-known high-performance ASR model for evaluating pronunciation in children wit…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: 12 pages, 2 figures

    ACM Class: I.2.7

  47. arXiv:2403.07041  [pdf, other]

    cs.LG cs.NE

    Ant Colony Sampling with GFlowNets for Combinatorial Optimization

    Authors: Minsu Kim, Sanghyeok Choi, Jiwoo Son, Hyeonah Kim, Jinkyoo Park, Yoshua Bengio

    Abstract: This paper introduces the Generative Flow Ant Colony Sampler (GFACS), a novel neural-guided meta-heuristic algorithm for combinatorial optimization. GFACS integrates generative flow networks (GFlowNets) with the ant colony optimization (ACO) methodology. GFlowNets, a generative model that learns a constructive policy in combinatorial spaces, enhance ACO by providing an informed prior distribution…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 19 pages, 6 figures

  48. arXiv:2403.04460  [pdf, other]

    cs.CL

    Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset

    Authors: Minjin Kim, Minju Kim, Hana Kim, Beong-woo Kwak, Soyeon Chun, Hyunseo Kim, SeongKu Kang, Youngjae Yu, Jinyoung Yeo, Dongha Lee

    Abstract: Conversational recommender system is an emerging area that has garnered an increasing interest in the community, especially with the advancements in large language models (LLMs) that enable diverse reasoning over conversational input. Despite the progress, the field has many aspects left to explore. The currently available public datasets for conversational recommendation lack specific user prefer…

    Submitted 5 April, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  49. arXiv:2403.03368  [pdf, other]

    cs.LG cs.CY

    Leveraging Federated Learning for Automatic Detection of Clopidogrel Treatment Failures

    Authors: Samuel Kim, Min Sang Kim

    Abstract: The effectiveness of clopidogrel, a widely used antiplatelet medication, varies significantly among individuals, necessitating the development of precise predictive models to optimize patient care. In this study, we leverage federated learning strategies to address clopidogrel treatment failure detection. Our research harnesses the collaborative power of multiple healthcare institutions, allowing…

    Submitted 5 March, 2024; originally announced March 2024.

  50. arXiv:2403.02944  [pdf, other]

    cs.CV cs.LG

    Neural Image Compression with Text-guided Encoding for both Pixel-level and Perceptual Fidelity

    Authors: Hagyeong Lee, Minkyu Kim, Jun-Hyuk Kim, Seungeon Kim, Dokwan Oh, Jaeho Lee

    Abstract: Recent advances in text-guided image compression have shown great potential to enhance the perceptual quality of reconstructed images. These methods, however, tend to have significantly degraded pixel-wise fidelity, limiting their practicality. To fill this gap, we develop a new text-guided image compression algorithm that achieves both high perceptual and pixel-wise fidelity. In particular, we pr…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: The first two authors contributed equally