Showing 1–50 of 365 results for author: Kim, G

Searching in archive cs.
  1. arXiv:2405.05749  [pdf, other]

    cs.CV

    NeRFFaceSpeech: One-shot Audio-driven 3D Talking Head Synthesis via Generative Prior

    Authors: Gihoon Kim, Kwanggyoon Seo, Sihun Cha, Junyong Noh

    Abstract: Audio-driven talking head generation is advancing from 2D to 3D content. Notably, Neural Radiance Field (NeRF) is in the spotlight as a means to synthesize high-quality 3D talking head outputs. Unfortunately, this NeRF-based approach typically requires a large number of paired audio-visual data for each identity, thereby limiting the scalability of the method. Although there have been attempts to…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures

  2. arXiv:2405.01016  [pdf, other]

    cs.CV cs.AI

    Addressing Diverging Training Costs using Local Restoration for Precise Bird's Eye View Map Construction

    Authors: Minsu Kim, Giseop Kim, Sunwook Choi

    Abstract: Recent advancements in Bird's Eye View (BEV) fusion for map construction have demonstrated remarkable mapping of urban environments. However, their deep and bulky architecture incurs substantial amounts of backpropagation memory and computing latency. Consequently, the problem poses an unavoidable bottleneck in constructing high-resolution (HR) BEV maps, as their large-sized features cause signifi…

    Submitted 2 May, 2024; originally announced May 2024.

  3. arXiv:2405.00260  [pdf, other]

    cs.CV

    CREPE: Coordinate-Aware End-to-End Document Parser

    Authors: Yamato Okamoto, Youngmin Baek, Geewook Kim, Ryota Nakao, DongHyun Kim, Moon Bin Yim, Seunghyun Park, Bado Lee

    Abstract: In this study, we formulate an OCR-free sequence generation model for visual document understanding (VDU). Our model not only parses text from document images but also extracts the spatial coordinates of the text based on the multi-head architecture. Named as Coordinate-aware End-to-end Document Parser (CREPE), our method uniquely integrates these capabilities by introducing a special token for OC…

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Accepted at the International Conference on Document Analysis and Recognition (ICDAR 2024) main conference

  4. arXiv:2404.19381  [pdf, other]

    cs.AR

    Low-overhead General-purpose Near-Data Processing in CXL Memory Expanders

    Authors: Hyungkyu Ham, Jeongmin Hong, Geonwoo Park, Yunseon Shin, Okkyun Woo, Wonhyuk Yang, Jinhoon Bae, Eunhyeok Park, Hyojin Sung, Euicheol Lim, Gwangsun Kim

    Abstract: To overcome the memory capacity wall of large-scale AI and big data applications, Compute Express Link (CXL) enables cost-efficient memory expansion beyond the local DRAM of processors. While its CXL.mem protocol stack minimizes interconnect latency, CXL memory accesses can still result in significant slowdowns for memory-bound applications. While near-data processing (NDP) in CXL memory can overc…

    Submitted 30 April, 2024; originally announced April 2024.

  5. arXiv:2404.17218  [pdf, other]

    cs.CL

    Prompting Techniques for Reducing Social Bias in LLMs through System 1 and System 2 Cognitive Processes

    Authors: Mahammed Kamruzzaman, Gene Louis Kim

    Abstract: Dual process theory posits that human cognition arises via two systems: System 1, a quick, emotional, and intuitive process that is subject to cognitive biases, and System 2, a slow, onerous, and deliberate process. NLP researchers often compare zero-shot prompting in LLMs to System 1 reasoning and chain-of-thought (CoT) prompting to System 2. In line with this interpretation, prior res…

    Submitted 26 April, 2024; originally announced April 2024.

  6. arXiv:2404.16804  [pdf, other]

    cs.CV cs.AI cs.LG

    AAPL: Adding Attributes to Prompt Learning for Vision-Language Models

    Authors: Gahyeon Kim, Sohee Kim, Seokju Lee

    Abstract: Recent advances in large pre-trained vision-language models have demonstrated remarkable performance on zero-shot downstream tasks. Building upon this, recent studies, such as CoOp and CoCoOp, have proposed the use of prompt learning, where context within a prompt is replaced with learnable vectors, leading to significant improvements over manually crafted prompts. However, the performance improve…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024 Workshop on Prompting in Vision, Project Page: https://github.com/Gahyeonkim09/AAPL

  7. arXiv:2404.16292  [pdf, other]

    cs.GR cs.CV cs.LG

    One Noise to Rule Them All: Learning a Unified Model of Spatially-Varying Noise Patterns

    Authors: Arman Maesumi, Dylan Hu, Krishi Saripalli, Vladimir G. Kim, Matthew Fisher, Sören Pirk, Daniel Ritchie

    Abstract: Procedural noise is a fundamental component of computer graphics pipelines, offering a flexible way to generate textures that exhibit "natural" random variation. Many different types of noise exist, each produced by a separate algorithm. In this paper, we present a single generative model which can learn to generate multiple types of noise as well as blend between them. In addition, it is capable…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: In ACM Transactions on Graphics (Proceedings of SIGGRAPH) 2024, 21 pages

  8. arXiv:2404.15707  [pdf, other]

    cs.CV

    ESR-NeRF: Emissive Source Reconstruction Using LDR Multi-view Images

    Authors: Jinseo Jeong, Junseo Koo, Qimeng Zhang, Gunhee Kim

    Abstract: Existing NeRF-based inverse rendering methods suppose that scenes are exclusively illuminated by distant light sources, neglecting the potential influence of emissive sources within a scene. In this work, we confront this limitation using LDR multi-view images captured with emissive sources turned on and off. Two key issues must be addressed: 1) ambiguity arising from the limited dynamic range alo…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  9. arXiv:2404.11925  [pdf, other]

    cs.LG cs.AI cs.CV

    EdgeFusion: On-Device Text-to-Image Generation

    Authors: Thibault Castells, Hyoung-Kyu Song, Tairen Piao, Shinkook Choi, Bo-Kyeong Kim, Hanyoung Yim, Changgwun Lee, Jae Gon Kim, Tae-Ho Kim

    Abstract: The intensive computational burden of Stable Diffusion (SD) for text-to-image generation poses a significant hurdle for its practical application. To tackle this challenge, recent research focuses on methods to reduce sampling steps, such as Latent Consistency Model (LCM), and on employing architectural optimizations, including pruning and knowledge distillation. Diverging from existing approaches…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 4 pages, accepted to CVPR24 First Workshop on Efficient and On-Device Generation (EDGE)

  10. arXiv:2404.04682  [pdf, other]

    cs.LG cs.AI cs.RO

    Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning

    Authors: Yeda Song, Dongwook Lee, Gunhee Kim

    Abstract: Offline reinforcement learning (RL) is a compelling framework for learning optimal policies from past experiences without additional interaction with the environment. Nevertheless, offline RL inevitably faces the problem of distributional shifts, where the states and actions encountered during policy execution may not be in the training dataset distribution. A common solution involves incorporatin…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: ICLR 2024

  11. arXiv:2404.04544  [pdf, other]

    cs.CV cs.AI

    BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion

    Authors: Gwanghyun Kim, Hayeon Kim, Hoigi Seo, Dong Un Kang, Se Young Chun

    Abstract: Generating higher-resolution human-centric scenes with details and controls remains a challenge for existing text-to-image diffusion models. This challenge stems from limited training image size, text encoder capacity (limited tokens), and the inherent difficulty of generating complex scenes involving multiple humans. While current methods attempted to address training size limit only, they often…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Project page: https://janeyeon.github.io/beyond-scene

  12. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  13. arXiv:2403.15209  [pdf, other]

    cs.CV

    MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection

    Authors: Taeheon Kim, Sangyun Chung, Damin Yeom, Youngjoon Yu, Hak Gu Kim, Yong Man Ro

    Abstract: Multispectral pedestrian detection is attractive for around-the-clock applications due to the complementary information between RGB and thermal modalities. However, current models often fail to detect pedestrians in obvious cases, especially due to the modality bias learned from statistically biased datasets. From these problems, we anticipate that maybe understanding the complementary information…

    Submitted 22 March, 2024; originally announced March 2024.

  14. Bandwidth-Effective DRAM Cache for GPUs with Storage-Class Memory

    Authors: Jeongmin Hong, Sungjun Cho, Geonwoo Park, Wonhyuk Yang, Young-Ho Gong, Gwangsun Kim

    Abstract: We propose overcoming the memory capacity limitation of GPUs with high-capacity Storage-Class Memory (SCM) and DRAM cache. By significantly increasing the memory capacity with SCM, the GPU can capture a larger fraction of the memory footprint than HBM for workloads that oversubscribe memory, achieving high speedups. However, the DRAM cache needs to be carefully designed to address the latency and…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Published in 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA'24)

  15. arXiv:2403.04207  [pdf, other]

    cs.LG cs.DC

    HeteroSwitch: Characterizing and Taming System-Induced Data Heterogeneity in Federated Learning

    Authors: Gyudong Kim, Mehdi Ghasemi, Soroush Heidari, Seungryong Kim, Young Geun Kim, Sarma Vrudhula, Carole-Jean Wu

    Abstract: Federated Learning (FL) is a practical approach to train deep learning models collaboratively across user-end devices, protecting user privacy by retaining raw data on-device. In FL, participating user-end devices are highly fragmented in terms of hardware and software configurations. Such fragmentation introduces a new type of data heterogeneity in FL, namely system-induced data heterogen…

    Submitted 6 March, 2024; originally announced March 2024.

  16. arXiv:2403.02460  [pdf, other]

    cs.GR

    MagicClay: Sculpting Meshes With Generative Neural Fields

    Authors: Amir Barda, Vladimir G. Kim, Noam Aigerman, Amit H. Bermano, Thibault Groueix

    Abstract: The recent developments in neural fields have brought phenomenal capabilities to the field of shape generation, but they lack crucial properties, such as incremental control - a fundamental requirement for artistic work. Triangular meshes, on the other hand, are the representation of choice for most geometry related tasks, offering efficiency and intuitive control, but do not lend themselves to ne…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: project page: https://amir90.github.io/MagicClay.github.io/

  17. arXiv:2403.01300  [pdf, other]

    cs.CV

    Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection

    Authors: Taeheon Kim, Sebin Shin, Youngjoon Yu, Hak Gu Kim, Yong Man Ro

    Abstract: RGBT multispectral pedestrian detection has emerged as a promising solution for safety-critical applications that require day/night operations. However, the modality bias problem remains unsolved as multispectral pedestrian detectors learn the statistical bias in datasets. Specifically, datasets in multispectral pedestrian detection mainly distribute between ROTO (day) and RXTO (night) data; the m…

    Submitted 5 April, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

    Comments: CVPR2024

  18. arXiv:2403.00579  [pdf, other]

    cs.AR

    NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing

    Authors: Guseul Heo, Sangyeop Lee, Jaehong Cho, Hyunmin Choi, Sanghyeon Lee, Hyungkyu Ham, Gwangsun Kim, Divya Mahajan, Jongse Park

    Abstract: Modern transformer-based Large Language Models (LLMs) are constructed with a series of decoder blocks. Each block comprises three key components: (1) QKV generation, (2) multi-head attention, and (3) feed-forward networks. In batched processing, QKV generation and feed-forward networks involve compute-intensive matrix-matrix multiplications (GEMM), while multi-head attention requires bandwidth-hea…

    Submitted 29 March, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: 16 pages, 15 figures

  19. arXiv:2402.16994  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    GEM3D: GEnerative Medial Abstractions for 3D Shape Synthesis

    Authors: Dmitry Petrov, Pradyumn Goyal, Vikas Thamizharasan, Vladimir G. Kim, Matheus Gadelha, Melinos Averkiou, Siddhartha Chaudhuri, Evangelos Kalogerakis

    Abstract: We introduce GEM3D -- a new deep, topology-aware generative model of 3D shapes. The key ingredient of our method is a neural skeleton-based representation encoding information on both shape topology and geometry. Through a denoising diffusion probabilistic model, our method first generates skeleton-based representations following the Medial Axis Transform (MAT), then generates surfaces through a s…

    Submitted 10 April, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Webpage: https://lodurality.github.io/GEM3D/ -- Cond. accept. to SIGGRAPH 2024 (conf. track) -- Changes (based on reviews): changed style to sigconf; rearranged figures for readability; added missing citations; fixed misaligned centers in Fig. 3; added failure cases (Fig. 10); rewrote discussion; added categories averages to Tab. 8; added Tab. 10 with model capacities

  20. arXiv:2402.12842  [pdf, other]

    cs.CL cs.AI cs.LG

    PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning

    Authors: Gyeongman Kim, Doohyuk Jang, Eunho Yang

    Abstract: Recent advancements in large language models (LLMs) have raised concerns about inference costs, increasing the need for research into model compression. While knowledge distillation (KD) is a prominent method for this, research on KD for generative language models like LLMs is relatively sparse, and the approach of distilling student-friendly knowledge, which has shown promising performance in KD…

    Submitted 20 February, 2024; originally announced February 2024.

  21. arXiv:2402.11827  [pdf, other]

    cs.IR cs.CL

    Ask Optimal Questions: Aligning Large Language Models with Retriever's Preference in Conversational Search

    Authors: Chanwoong Yoon, Gangwoo Kim, Byeongguk Jeon, Sungdong Kim, Yohan Jo, Jaewoo Kang

    Abstract: Conversational search, unlike single-turn retrieval tasks, requires understanding the current question within a dialogue context. The common approach of rewrite-then-retrieve aims to decontextualize questions to be self-sufficient for off-the-shelf retrievers, but most existing methods produce sub-optimal query rewrites due to the limited ability to incorporate signals from the retrieval results.…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: 8 pages

  22. arXiv:2402.11201  [pdf, other]

    cs.CV

    A Decoding Scheme with Successive Aggregation of Multi-Level Features for Light-Weight Semantic Segmentation

    Authors: Jiwon Yoo, Jangwon Lee, Gyeonghwan Kim

    Abstract: Multi-scale architecture, including hierarchical vision transformer, has been commonly applied to high-resolution semantic segmentation to deal with computational complexity with minimum performance loss. In this paper, we propose a novel decoding scheme for semantic segmentation in this regard, which takes multi-level features from the encoder with multi-scale architecture. The decoding scheme ba…

    Submitted 17 February, 2024; originally announced February 2024.

    Comments: 7 pages, 4 figures

  23. arXiv:2402.06440  [pdf, other]

    cs.CR

    A Method for Decrypting Data Infected with Rhysida Ransomware

    Authors: Giyoon Kim, Soojin Kang, Seungjun Baek, Kimoon Kim, Jongsung Kim

    Abstract: Ransomware is malicious software that is a prominent global cybersecurity threat. Typically, ransomware encrypts data on a system, rendering the victim unable to decrypt it without the attacker's private key. Subsequently, victims often pay a substantial ransom to recover their data, yet some may still incur damage or loss. This study examines Rhysida ransomware, which caused significant damage in…

    Submitted 9 February, 2024; originally announced February 2024.

  24. arXiv:2402.02834  [pdf, other]

    cs.LG cs.CL

    Shortened LLaMA: A Simple Depth Pruning for Large Language Models

    Authors: Bo-Kyeong Kim, Geonmin Kim, Tae-Ho Kim, Thibault Castells, Shinkook Choi, Junho Shin, Hyoung-Kyu Song

    Abstract: Structured pruning of modern large language models (LLMs) has emerged as a way of decreasing their high computational needs. Width pruning reduces the size of projection weight matrices (e.g., by removing attention heads) while maintaining the number of layers. Depth pruning, in contrast, removes entire layers or blocks, while keeping the size of the remaining weights unchanged. Most current resea…

    Submitted 5 February, 2024; originally announced February 2024.

  25. arXiv:2401.17547  [pdf, other]

    cs.CV

    Task-Oriented Diffusion Model Compression

    Authors: Geonung Kim, Beomsu Kim, Eunhyeok Park, Sunghyun Cho

    Abstract: As recent advancements in large-scale Text-to-Image (T2I) diffusion models have yielded remarkable high-quality image generation, diverse downstream Image-to-Image (I2I) applications have emerged. Despite the impressive results achieved by these I2I models, their practical utility is hampered by their large model size and the computational burden of the iterative denoising process. In this paper,…

    Submitted 30 January, 2024; originally announced January 2024.

  26. Quantum-Secure Hybrid Blockchain System for DID-based Verifiable Random Function with NTRU Linkable Ring Signature

    Authors: Bong Gon Kim, Dennis Wong, Yoon Seok Yang

    Abstract: In this study, we present a secure smart contract-based Verifiable Random Function (VRF) model, addressing the shortcomings of existing systems. As quantum computing emerges, conventional public key cryptography faces potential vulnerabilities. To enhance our VRF's robustness, we employ post-quantum Ring-LWE encryption for generating pseudo-random sequences. Given the computational intensity of th…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 25 pages, 5 figures, 2023 International Journal on Cryptography and Information Security (IJCIS). arXiv admin note: text overlap with arXiv:2311.11734

    Journal ref: Volume 13, Number 4, December 2023

  27. arXiv:2401.13191  [pdf, other]

    cs.CV

    Towards Multi-domain Face Landmark Detection with Synthetic Data from Diffusion model

    Authors: Yuanming Li, Gwantae Kim, Jeong-gi Kwak, Bon-hwa Ku, Hanseok Ko

    Abstract: Recently, deep learning-based facial landmark detection for in-the-wild faces has achieved significant improvement. However, there are still challenges in face landmark detection in other domains (e.g., cartoon, caricature, etc.). This is due to the scarcity of extensively annotated training data. To tackle this concern, we design a two-stage training approach that effectively leverages limited data…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 6 pages, ICASSP 2024 accepted

  28. arXiv:2401.09787  [pdf, other]

    cs.LG cs.AI stat.ML

    Querying Easily Flip-flopped Samples for Deep Active Learning

    Authors: Seong Jin Cho, Gwangsu Kim, Junghyun Lee, Jinwoo Shin, Chang D. Yoo

    Abstract: Active learning is a machine learning paradigm that aims to improve the performance of a model by strategically selecting and querying unlabeled data. One effective selection strategy is to base it on the model's predictive uncertainty, which can be interpreted as a measure of how informative a sample is. The sample's distance to the decision boundary is a natural measure of predictive uncertainty…

    Submitted 18 January, 2024; originally announced January 2024.

    Comments: 34 pages, 17 figures, 5 tables. Accepted to the 12th International Conference on Learning Representations (ICLR 2024)

  29. arXiv:2401.08962  [pdf, other]

    cs.HC cs.LG cs.SD eess.AS

    DOO-RE: A dataset of ambient sensors in a meeting room for activity recognition

    Authors: Hyunju Kim, Geon Kim, Taehoon Lee, Kisoo Kim, Dongman Lee

    Abstract: With the advancement of IoT technology, recognizing user activities with machine learning methods is a promising way to provide various smart services to users. High-quality data with privacy protection is essential for deploying such services in the real world. Data streams from surrounding ambient sensors are well suited to the requirement. Existing ambient sensor datasets only support constrain…

    Submitted 16 January, 2024; originally announced January 2024.

  30. arXiv:2401.06591  [pdf, other]

    cs.CL

    Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation

    Authors: Seongyun Lee, Seungone Kim, Sue Hyun Park, Geewook Kim, Minjoon Seo

    Abstract: Assessing long-form responses generated by Vision-Language Models (VLMs) is challenging. It not only requires checking whether the VLM follows the given instruction but also verifying whether the text output is properly grounded on the given image. Inspired by the recent approach of evaluating LMs with LMs, in this work, we propose to evaluate VLMs with VLMs. For this purpose, we present a new fee…

    Submitted 12 January, 2024; originally announced January 2024.

    Comments: Work in progress

  31. arXiv:2401.05516  [pdf, other]

    cs.CV cs.AI cs.GR

    FPRF: Feed-Forward Photorealistic Style Transfer of Large-Scale 3D Neural Radiance Fields

    Authors: GeonU Kim, Kim Youwang, Tae-Hyun Oh

    Abstract: We present FPRF, a feed-forward photorealistic style transfer method for large-scale 3D neural radiance fields. FPRF stylizes large-scale 3D scenes with arbitrary, multiple style reference images without additional optimization while preserving multi-view appearance consistency. Prior arts required tedious per-style/-scene optimization and were limited to small-scale 3D scenes. FPRF efficiently st…

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Project page: https://kim-geonu.github.io/FPRF/

  32. arXiv:2401.04928  [pdf, other]

    cs.LG

    Relaxed Contrastive Learning for Federated Learning

    Authors: Seonguk Seo, Jinkyu Kim, Geeho Kim, Bohyung Han

    Abstract: We propose a novel contrastive learning framework to effectively address the challenges of data heterogeneity in federated learning. We first analyze the inconsistency of gradient updates across clients during local training and establish its dependence on the distribution of feature representations, leading to the derivation of the supervised contrastive learning (SCL) objective to mitigate local…

    Submitted 9 January, 2024; originally announced January 2024.

  33. arXiv:2312.16914  [pdf]

    cs.CV

    ROI-Aware Multiscale Cross-Attention Vision Transformer for Pest Image Identification

    Authors: Ga-Eun Kim, Chang-Hwan Son

    Abstract: The pests captured with imaging devices may be relatively small in size compared to the entire images, and complex backgrounds have colors and textures similar to those of the pests, which hinders accurate feature extraction and makes pest identification challenging. The key to pest identification is to create a model capable of detecting regions of interest (ROIs) and transforming them into bette…

    Submitted 28 December, 2023; originally announced December 2023.

  34. arXiv:2312.14492  [pdf, other]

    cs.CV

    Context Enhanced Transformer for Single Image Object Detection

    Authors: Seungjun An, Seonghoon Park, Gyeongnyeon Kim, Jeongyeol Baek, Byeongwon Lee, Seungryong Kim

    Abstract: With the increasing importance of video data in real-world applications, there is a rising need for efficient object detection methods that utilize temporal information. While existing video object detection (VOD) techniques employ various strategies to address this challenge, they typically depend on locally adjacent frames or randomly sampled images within a clip. Although recent Transformer-bas…

    Submitted 26 December, 2023; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: Project page: https://ku-cvlab.github.io/CETR

  35. arXiv:2312.13313  [pdf, other]

    eess.IV cs.CV

    ParamISP: Learned Forward and Inverse ISPs using Camera Parameters

    Authors: Woohyeok Kim, Geonu Kim, Junyong Lee, Seungyong Lee, Seung-Hwan Baek, Sunghyun Cho

    Abstract: RAW images are rarely shared, mainly due to their excessive data size compared to their sRGB counterparts obtained by camera ISPs. Learning the forward and inverse processes of camera ISPs has been recently demonstrated, enabling physically-meaningful RAW-level image processing on input sRGB images. However, existing learning-based ISP methods fail to handle the large variations in the ISP processes…

    Submitted 14 April, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

  36. arXiv:2312.08961  [pdf, other]

    cs.RO

    Contact-Implicit MPC: Controlling Diverse Quadruped Motions Without Pre-Planned Contact Modes or Trajectories

    Authors: Gijeong Kim, Dongyun Kang, Joon-Ha Kim, Seungwoo Hong, Hae-Won Park

    Abstract: This paper presents a contact-implicit model predictive control (MPC) framework for the real-time discovery of multi-contact motions, without predefined contact mode sequences or foothold positions. This approach utilizes the contact-implicit differential dynamic programming (DDP) framework, merging the hard contact model with a linear complementarity constraint. We propose the analytical gradient…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: 22 pages, 19 figures, submitted to International Journal of Robotics Research (IJRR)

  37. arXiv:2312.05465  [pdf, other]

    cs.LG eess.SY

    On Task-Relevant Loss Functions in Meta-Reinforcement Learning and Online LQR

    Authors: Jaeuk Shin, Giho Kim, Howon Lee, Joonho Han, Insoon Yang

    Abstract: Designing a competent meta-reinforcement learning (meta-RL) algorithm in terms of data usage remains a central challenge to be tackled for its successful real-world applications. In this paper, we propose a sample-efficient meta-RL algorithm that learns a model of the system or environment at hand in a task-directed manner. As opposed to the standard model-based approaches to meta-RL, our method e…

    Submitted 8 December, 2023; originally announced December 2023.

  38. arXiv:2311.18654  [pdf, other]

    cs.CV cs.AI

    Detailed Human-Centric Text Description-Driven Large Scene Synthesis

    Authors: Gwanghyun Kim, Dong Un Kang, Hoigi Seo, Hayeon Kim, Se Young Chun

    Abstract: Text-driven large scene image synthesis has made significant progress with diffusion models, but controlling it is challenging. While using additional spatial controls with corresponding texts has improved the controllability of large scene synthesis, it is still challenging to faithfully reflect detailed text descriptions without user-provided controls. Here, we propose DetText2Scene, a novel tex…

    Submitted 30 November, 2023; originally announced November 2023.

  39. arXiv:2311.16739  [pdf, other]

    cs.CV cs.GR

    As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors

    Authors: Seungwoo Yoo, Kunho Kim, Vladimir G. Kim, Minhyuk Sung

    Abstract: We present As-Plausible-as-Possible (APAP) mesh deformation technique that leverages 2D diffusion priors to preserve the plausibility of a mesh under user-controlled deformation. Our framework uses per-face Jacobians to represent mesh deformations, where mesh vertex coordinates are computed via a differentiable Poisson Solve. The deformed mesh is rendered, and the resulting 2D image is used in the…

    Submitted 30 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: Project page: https://as-plausible-as-possible.github.io/

  40. arXiv:2311.14496  [pdf, other]

    cs.CR

    RTPS Attack Dataset Description

    Authors: Dong Young Kim, Dongsung Kim, Yuchan Song, Gang Min Kim, Min Geun Song, Jeong Do Yoo, Huy Kang Kim

    Abstract: This paper describes our RTPS datasets. We collect malicious/benign packet data by injecting attack data in an Unmanned Ground Vehicle (UGV) in the normal state. We assembled the testbed, consisting of UGV, Controller, PC, and Router. We collect this dataset in the UGV part of our testbed. We conducted two types of attacks, "Command Injection" and "Command Injection with ARP Spoofing" on…

    Submitted 2 April, 2024; v1 submitted 24 November, 2023; originally announced November 2023.

    Comments: This manuscript is written in Korean. You can download our dataset through our lab: https://ocslab.hksecurity.net/Datasets/rtps-attack-dataset We welcome your comments or feedback. Contact INFO: Dong Young Kim (klgh1256@korea.ac.kr), Huy Kang Kim (cenda@korea.ac.kr)

  41. Private and Secure Post-Quantum Verifiable Random Function with NIZK Proof and Ring-LWE Encryption in Blockchain

    Authors: Bong Gon Kim, Dennis Wong, Yoon Seok Yang

    Abstract: We present a secure and private blockchain-based Verifiable Random Function (VRF) scheme addressing some limitations of classical VRF constructions. Given the imminent quantum computing adversarial scenario, conventional cryptographic methods face vulnerabilities. To enhance our VRF's secure randomness, we adopt post-quantum Ring-LWE encryption for synthesizing pseudo-random sequences. Considering…

    Submitted 7 February, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: 21 pages, 5 figures, In the 2023 Proceedings of International Conference on Cryptography and Blockchain

    Journal ref: Proceedings of International Conference on Cryptography and Blockchain, 13(21), 47-67 (2023)

  42. arXiv:2311.05241  [pdf, other]

    cs.LG stat.ML

    When Meta-Learning Meets Online and Continual Learning: A Survey

    Authors: Jaehyeon Son, Soochan Lee, Gunhee Kim

    Abstract: Over the past decade, deep neural networks have demonstrated significant success using the training scheme that involves mini-batch stochastic gradient descent on extensive datasets. Expanding upon this accomplishment, there has been a surge in research exploring the application of neural networks in other learning scenarios. One notable framework that has garnered significant attention is meta-le…

    Submitted 9 November, 2023; originally announced November 2023.

  43. arXiv:2311.02570  [pdf, other]

    cs.CL

    BanMANI: A Dataset to Identify Manipulated Social Media News in Bangla

    Authors: Mahammed Kamruzzaman, Md. Minul Islam Shovon, Gene Louis Kim

    Abstract: Initial work has been done to address fake news detection and misrepresentation of news in the Bengali language. However, no work in Bengali yet addresses the identification of specific claims in social media news that falsely manipulates a related news article. At this point, this problem has been tackled in English and a few other languages, but not in the Bengali language. In this paper, we cur…

    Submitted 5 November, 2023; originally announced November 2023.

  44. arXiv:2310.18932  [pdf, other]

    cs.AI

    Self Attention with Temporal Prior: Can We Learn More from Arrow of Time?

    Authors: Kyung Geun Kim, Byeong Tak Lee

    Abstract: Many diverse phenomena in nature often inherently encode both short- and long-term temporal dependencies, which especially result from the direction of the flow of time. In this respect, we discovered experimental evidence suggesting that interrelations of these events are higher for closer time stamps. However, to be able for attention-based models to learn these regularities in short-term depend…

    Submitted 26 April, 2024; v1 submitted 29 October, 2023; originally announced October 2023.

  45. arXiv:2310.15421  [pdf, other]

    cs.CL cs.AI

    FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions

    Authors: Hyunwoo Kim, Melanie Sclar, Xuhui Zhou, Ronan Le Bras, Gunhee Kim, Yejin Choi, Maarten Sap

    Abstract: Theory of mind (ToM) evaluations currently focus on testing models using passive narratives that inherently lack interactivity. We introduce FANToM, a new benchmark designed to stress-test ToM within information-asymmetric conversational contexts via question answering. Our benchmark draws upon important theoretical requisites from psychology and necessary empirical considerations when evaluating…

    Submitted 31 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023. Code and dataset can be found here: https://hyunw.kim/fantom

  46. arXiv:2310.14696  [pdf, other]

    cs.CL

    Tree of Clarifications: Answering Ambiguous Questions with Retrieval-Augmented Large Language Models

    Authors: Gangwoo Kim, Sungdong Kim, Byeongguk Jeon, Joonsuk Park, Jaewoo Kang

    Abstract: Questions in open-domain question answering are often ambiguous, allowing multiple interpretations. One approach to handling them is to identify all possible interpretations of the ambiguous question (AQ) and to generate a long-form answer addressing them all, as suggested by Stelmakh et al., (2022). While it provides a comprehensive response without bothering the user for clarification, consideri…

    Submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023

  47. arXiv:2310.14159  [pdf, other]

    cs.CL cs.CV

    Can Language Models Laugh at YouTube Short-form Videos?

    Authors: Dayoon Ko, Sangho Lee, Gunhee Kim

    Abstract: As short-form funny videos on social networks are gaining popularity, it becomes demanding for AI models to understand them for better communication with humans. Unfortunately, previous video humor datasets target specific domains, such as speeches or sitcoms, and mostly focus on verbal cues. We curate a user-generated dataset of 10K multimodal funny videos from YouTube, called ExFunTube. Using a…

    Submitted 31 March, 2024; v1 submitted 21 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023; references added

  48. arXiv:2310.11952  [pdf, other]

    cs.LG

    Recasting Continual Learning as Sequence Modeling

    Authors: Soochan Lee, Jaehyeon Son, Gunhee Kim

    Abstract: In this work, we aim to establish a strong connection between two significant bodies of machine learning research: continual learning and sequence modeling. That is, we propose to formulate continual learning as a sequence modeling problem, allowing advanced sequence models to be utilized for continual learning. Under this formulation, the continual learning process becomes the forward pass of a s…

    Submitted 14 January, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  49. arXiv:2310.07814  [pdf, other]

    cs.GR cs.CV cs.LG

    Explorable Mesh Deformation Subspaces from Unstructured Generative Models

    Authors: Arman Maesumi, Paul Guerrero, Vladimir G. Kim, Matthew Fisher, Siddhartha Chaudhuri, Noam Aigerman, Daniel Ritchie

    Abstract: Exploring variations of 3D shapes is a time-consuming process in traditional 3D modeling tools. Deep generative models of 3D shapes often feature continuous latent spaces that can, in principle, be used to explore potential variations starting from a set of input shapes. In practice, doing so can be problematic: latent spaces are high dimensional and hard to visualize, contain shapes that are not…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: SIGGRAPH Asia 2023, 15 pages

  50. arXiv:2310.07448  [pdf, other]

    cs.DM math.CO

    Faster Location in Combinatorial Interaction Testing

    Authors: Ryan E. Dougherty, Dylan N. Green, Grace M. Kim

    Abstract: Factors within a large-scale software system that simultaneously interact and strongly impact the system's response under a configuration are often difficult to identify. Although screening such a system for the existence of such interactions is important, determining their location is more useful for system engineers. Combinatorial interaction testing (CIT) concerns creation of test suites that n…

    Submitted 11 October, 2023; originally announced October 2023.