Showing 1–50 of 728 results for author: Han, S

Searching in archive cs.
  1. arXiv:2405.04532  [pdf, other]

    cs.CL cs.AI cs.LG cs.PF

    QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

    Authors: Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han

    Abstract: Quantization can accelerate large language model (LLM) inference. Going beyond INT8 quantization, the research community is actively exploring even lower precision, such as INT4. Nonetheless, state-of-the-art INT4 quantization techniques only accelerate low-batch, edge LLM inference, failing to deliver performance gains in large-batch, cloud-based LLM serving. We uncover a critical issue: existing…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: The first three authors contribute equally to this project and are listed in the alphabetical order. Yujun Lin leads the quantization algorithm, Haotian Tang and Shang Yang lead the GPU kernels and the serving system. Code is available at https://github.com/mit-han-lab/qserve

  2. arXiv:2405.04086  [pdf, other]

    cs.CL

    Optimizing Language Model's Reasoning Abilities with Weak Supervision

    Authors: Yongqi Tong, Sizhe Wang, Dawei Li, Yifan Wang, Simeng Han, Zi Lin, Chengsong Huang, Jiaxin Huang, Jingbo Shang

    Abstract: While Large Language Models (LLMs) have demonstrated proficiency in handling complex queries, much of the past work has depended on extensively annotated datasets by human experts. However, this reliance on fully-supervised annotations poses scalability challenges, particularly as models and data requirements grow. To mitigate this, we explore the potential of enhancing LLMs' reasoning abilities w…

    Submitted 7 May, 2024; originally announced May 2024.

  3. arXiv:2405.01974  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Multitask Extension of Geometrically Aligned Transfer Encoder

    Authors: Sung Moon Ko, Sumin Lee, Dae-Woong Jeong, Hyunseung Kim, Chanhui Lee, Soorin Yim, Sehui Han

    Abstract: Molecular datasets often suffer from a lack of data. It is well-known that gathering data is difficult due to the complexity of experimentation or simulation involved. Here, we leverage mutual information across different tasks in molecular data to address this issue. We extend an algorithm that utilizes the geometric characteristics of the encoding space, known as the Geometrically Aligned Transf…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: 7 pages, 3 figures, 2 tables

  4. arXiv:2405.01554  [pdf, other]

    cs.LG cs.AI q-bio.NC

    Early-stage detection of cognitive impairment by hybrid quantum-classical algorithm using resting-state functional MRI time-series

    Authors: Junggu Choi, Tak Hur, Daniel K. Park, Na-Young Shin, Seung-Koo Lee, Hakbae Lee, Sanghoon Han

    Abstract: Following the recent development of quantum machine learning techniques, the literature has reported several quantum machine learning algorithms for disease detection. This study explores the application of a hybrid quantum-classical algorithm for classifying region-of-interest time-series data obtained from resting-state functional magnetic resonance imaging in patients with early-stage cognitive…

    Submitted 16 March, 2024; originally announced May 2024.

    Comments: 28 pages, 10 figures

  5. arXiv:2405.00021  [pdf, other]

    cs.CV cs.AI cs.CL

    SIMPLOT: Enhancing Chart Question Answering by Distilling Essentials

    Authors: Wonjoong Kim, Sangwu Park, Yeonjun In, Seokwon Han, Chanyoung Park

    Abstract: Recently, interpreting complex charts with logical reasoning have emerged as challenges due to the development of vision-language models. A prior state-of-the-art (SOTA) model, Deplot, has presented an end-to-end method that leverages the vision-language model to convert charts into table format utilizing Large Language Models (LLMs) for reasoning. However, unlike natural images, charts contain a…

    Submitted 22 February, 2024; originally announced May 2024.

  6. arXiv:2404.17485  [pdf, other]

    cs.NI

    A Survey on Industrial Internet of Things (IIoT) Testbeds for Connectivity Research

    Authors: Tianyu Zhang, Chuanyu Xue, Jiachen Wang, Zelin Yun, Natong Lin, Song Han

    Abstract: Industrial Internet of Things (IIoT) technologies have revolutionized industrial processes, enabling smart automation, real-time data analytics, and improved operational efficiency across diverse industry sectors. IIoT testbeds play a critical role in advancing IIoT research and development (R&D) to provide controlled environments for technology evaluation before their real-world deployment. In th…

    Submitted 26 April, 2024; originally announced April 2024.

  7. arXiv:2404.13033  [pdf, other]

    cs.CL

    Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs

    Authors: Biyang Guo, He Wang, Wenyilin Xiao, Hong Chen, Zhuxin Lee, Songqiao Han, Hailiang Huang

    Abstract: In the burgeoning field of Large Language Models (LLMs) like ChatGPT and LLaMA, Prompt Engineering (PE) is renowned for boosting zero-shot or in-context learning (ICL) through prompt modifications. Yet, the realm of the sample design for downstream fine-tuning, crucial for task-specific LLM adaptation, is largely unexplored. This paper introduces Sample Design Engineering (SDE), a methodical appro…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 23 pages, 12 figures, 14 tables

  8. arXiv:2404.12720  [pdf, other]

    cs.CV cs.CL

    PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering

    Authors: Yihao Ding, Kaixuan Ren, Jiabin Huang, Siwen Luo, Soyeon Caren Han

    Abstract: Document Question Answering (QA) presents a challenge in understanding visually-rich documents (VRD), particularly those dominated by lengthy textual content like research journal articles. Existing studies primarily focus on real-world documents with sparse text, while challenges persist in comprehending the hierarchical semantic relations among multiple pages to locate multimodal components. To…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  9. SIGformer: Sign-aware Graph Transformer for Recommendation

    Authors: Sirui Chen, Jiawei Chen, Sheng Zhou, Bohao Wang, Shen Han, Chanfei Su, Yuqing Yuan, Can Wang

    Abstract: In recommender systems, most graph-based methods focus on positive user feedback, while overlooking the valuable negative feedback. Integrating both positive and negative feedback to form a signed graph can lead to a more comprehensive understanding of user preferences. However, the existing efforts to incorporate both types of feedback are sparse and face two main limitations: 1) They process pos…

    Submitted 6 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR2024

  10. arXiv:2404.11905  [pdf, other]

    cs.LG cs.CR

    FedMID: A Data-Free Method for Using Intermediate Outputs as a Defense Mechanism Against Poisoning Attacks in Federated Learning

    Authors: Sungwon Han, Hyeonho Song, Sungwon Park, Meeyoung Cha

    Abstract: Federated learning combines local updates from clients to produce a global model, which is susceptible to poisoning attacks. Most previous defense strategies relied on vectors derived from projections of local updates on a Euclidean space; however, these methods fail to accurately represent the functionality and structure of local models, resulting in inconsistent performance. Here, we present a n…

    Submitted 18 April, 2024; originally announced April 2024.

  11. arXiv:2404.09491  [pdf, other]

    cs.LG

    Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning

    Authors: Sungwon Han, Jinsung Yoon, Sercan O Arik, Tomas Pfister

    Abstract: Large Language Models (LLMs), with their remarkable ability to tackle challenging and unseen reasoning problems, hold immense potential for tabular learning, that is vital for many real-world applications. In this paper, we propose a novel in-context learning framework, FeatLLM, which employs LLMs as feature engineers to produce an input data set that is optimally suited for tabular predictions. T…

    Submitted 6 May, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted to ICML, 2024

  12. Hyperbolic Heterogeneous Graph Attention Networks

    Authors: Jongmin Park, Seunghoon Han, Soohwan Jeong, Sungsu Lim

    Abstract: Most previous heterogeneous graph embedding models represent elements in a heterogeneous graph as vector representations in a low-dimensional Euclidean space. However, because heterogeneous graphs inherently possess complex structures, such as hierarchical or power-law structures, distortions can occur when representing them in Euclidean space. To overcome this limitation, we propose Hyperbolic He…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted in ACM THE WEB CONFERENCE 2024 short paper track

  13. arXiv:2404.05555  [pdf, other]

    cs.LG cs.AI stat.ML

    On the Convergence of Continual Learning with Adaptive Methods

    Authors: Seungyub Han, Yeongmo Kim, Taehyun Cho, Jungwoo Lee

    Abstract: One of the objectives of continual learning is to prevent catastrophic forgetting in learning multiple tasks sequentially, and the existing solutions have been driven by the conceptualization of the plasticity-stability dilemma. However, the convergence of continual learning for each sequential task is less studied so far. In this paper, we provide a convergence analysis of memory-based continual…

    Submitted 15 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence (UAI 2023), see https://proceedings.mlr.press/v216/han23a.html

    Journal ref: PMLR 216:809-818, 2023

  14. arXiv:2404.05297  [pdf, other]

    cs.CR cs.SE

    Automated Attack Synthesis for Constant Product Market Makers

    Authors: Sujin Han, Jinseo Kim, Sung-Ju Lee, Insu Yun

    Abstract: Decentralized Finance enables many novel applications that were impossible in traditional finances. However, it also introduces new types of vulnerabilities, such as composability bugs. The composability bugs refer to issues that lead to erroneous behaviors when multiple smart contracts operate together. One typical example of composability bugs is those between token contracts and Constant Produc…

    Submitted 24 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: 12 pages, 8 figures

  15. arXiv:2404.01954  [pdf, other]

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han, et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  16. arXiv:2404.01143  [pdf, other]

    cs.CV cs.AI

    Condition-Aware Neural Network for Controlled Image Generation

    Authors: Han Cai, Muyang Li, Zhuoyang Zhang, Qinsheng Zhang, Ming-Yu Liu, Song Han

    Abstract: We present Condition-Aware Neural Network (CAN), a new method for adding control to image generative models. In parallel to prior conditional control methods, CAN controls the image generation process by dynamically manipulating the weight of the neural network. This is achieved by introducing a condition-aware weight generation module that generates conditional weight for convolution/linear layer…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  17. Tiny Machine Learning: Progress and Futures

    Authors: Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Song Han

    Abstract: Tiny Machine Learning (TinyML) is a new frontier of machine learning. By squeezing deep learning models into billions of IoT devices and microcontrollers (MCUs), we expand the scope of AI applications and enable ubiquitous intelligence. However, TinyML is challenging due to hardware constraints: the tiny memory resource makes it difficult to hold deep learning models designed for cloud and mobile…

    Submitted 29 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:2206.15472

    Journal ref: IEEE Circuits and Systems Magazine, 23(3), pp. 8-34, October 2023

  18. arXiv:2403.18695  [pdf, other]

    eess.SY cs.RO

    An Efficient Risk-aware Branch MPC for Automated Driving that is Robust to Uncertain Vehicle Behaviors

    Authors: Luyao Zhang, George Pantazis, Shaohang Han, Sergio Grammatico

    Abstract: One of the critical challenges in automated driving is ensuring safety of automated vehicles despite the unknown behavior of the other vehicles. Although motion prediction modules are able to generate a probability distribution associated with various behavior modes, their probabilistic estimates are often inaccurate, thus leading to a possibly unsafe trajectory. To overcome this challenge, we pro…

    Submitted 27 March, 2024; originally announced March 2024.

  19. arXiv:2403.17400  [pdf, other]

    cs.NI cs.DC

    A Survey on Resource Management in Joint Communication and Computing-Embedded SAGIN

    Authors: Qian Chen, Zheng Guo, Weixiao Meng, Shuai Han, Cheng Li, Tony Q. S. Quek

    Abstract: The advent of the 6G era aims for ubiquitous connectivity, with the integration of non-terrestrial networks (NTN) offering extensive coverage and enhanced capacity. As manufacturing advances and user demands evolve, space-air-ground integrated networks (SAGIN) with computational capabilities emerge as a viable solution for services requiring low latency and high computational power. Resource manag…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 43 pages, 17 figures

  20. arXiv:2403.14944  [pdf, other]

    cs.CV

    CLIP-VQDiffusion : Langauge Free Training of Text To Image generation using CLIP and vector quantized diffusion model

    Authors: Seungdae Han, Joohee Kim

    Abstract: There has been a significant progress in text conditional image generation models. Recent advancements in this field depend not only on improvements in model structures, but also vast quantities of text-image paired datasets. However, creating these kinds of datasets is very costly and requires a substantial amount of labor. Famous face datasets don't have corresponding text captions, making it di…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: 15 pages, 9 figures

  21. arXiv:2403.13583  [pdf, other]

    cs.SE cs.CL cs.LG

    CONLINE: Complex Code Generation and Refinement with Online Searching and Correctness Testing

    Authors: Xinyi He, Jiaru Zou, Yun Lin, Mengyu Zhou, Shi Han, Zejian Yuan, Dongmei Zhang

    Abstract: Large Language Models (LLMs) have revolutionized code generation ability by converting natural language descriptions into executable code. However, generating complex code within real-world scenarios remains challenging due to intricate structures, subtle bugs, understanding of advanced data types, and lack of supplementary contents. To address these challenges, we introduce the CONLINE framework,…

    Submitted 20 March, 2024; originally announced March 2024.

  22. arXiv:2403.12953  [pdf, other]

    cs.CV

    FutureDepth: Learning to Predict the Future Improves Video Depth Estimation

    Authors: Rajeev Yasarla, Manish Kumar Singh, Hong Cai, Yunxiao Shi, Jisoo Jeong, Yinhao Zhu, Shizhong Han, Risheek Garrepalli, Fatih Porikli

    Abstract: In this paper, we propose a novel video depth estimation approach, FutureDepth, which enables the model to implicitly leverage multi-frame and motion cues to improve depth estimation by making it learn to predict the future at training. More specifically, we propose a future prediction network, F-Net, which takes the features of multiple consecutive frames and is trained to predict multi-frame fea…

    Submitted 19 March, 2024; originally announced March 2024.

  23. arXiv:2403.11834  [pdf, other]

    cs.CL cs.LG

    Towards Understanding the Relationship between In-context Learning and Compositional Generalization

    Authors: Sungjun Han, Sebastian Padó

    Abstract: According to the principle of compositional generalization, the meaning of a complex expression can be understood as a function of the meaning of its parts and of how they are combined. This principle is crucial for human language processing and also, arguably, for NLP models in the face of out-of-distribution data. However, many neural network models, including Transformers, have been shown to st…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: To be published in LREC-COLING 2024

  24. arXiv:2403.08302  [pdf, other]

    cs.RO

    Online Multi-Contact Feedback Model Predictive Control for Interactive Robotic Tasks

    Authors: Seo Wook Han, Maged Iskandar, Jinoh Lee, Min Jun Kim

    Abstract: In this paper, we propose a model predictive control (MPC) that accomplishes interactive robotic tasks, in which multiple contacts may occur at unknown locations. To address such scenarios, we made an explicit contact feedback loop in the MPC framework. An algorithm called Multi-Contact Particle Filter with Exploration Particle (MCP-EP) is employed to establish real-time feedback of multi-contact…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted for publication at the IEEE International Conference on Robotics and Automation (ICRA), Yokohama, 2024

  25. arXiv:2403.06747  [pdf, other]

    cs.IR

    MetaSplit: Meta-Split Network for Limited-Stock Product Recommendation

    Authors: Wenhao Wu, Jialiang Zhou, Ailong He, Shuguang Han, Jufeng Chen, Bo Zheng

    Abstract: Compared to business-to-consumer (B2C) e-commerce systems, consumer-to-consumer (C2C) e-commerce platforms usually encounter the limited-stock problem, that is, a product can only be sold one time in a C2C system. This poses several unique challenges for click-through rate (CTR) prediction. Due to limited user interactions for each product (i.e. item), the corresponding item embedding in the CTR m…

    Submitted 27 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted at WWW 2024. This work has already been deployed on the Xianyu platform in Alibaba. The first two authors contributed equally

  26. arXiv:2403.05912  [pdf, other]

    eess.IV cs.CV

    Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation

    Authors: Hairong Shi, Songhao Han, Shaofei Huang, Yue Liao, Guanbin Li, Xiangxing Kong, Hua Zhu, Xiaomu Wang, Si Liu

    Abstract: Tumor lesion segmentation on CT or MRI images plays a critical role in cancer diagnosis and treatment planning. Considering the inherent differences in tumor lesion segmentation data across various medical imaging modalities and equipment, integrating medical knowledge into the Segment Anything Model (SAM) presents promising capability due to its versatility and generalization potential. Recent st…

    Submitted 9 March, 2024; originally announced March 2024.

  27. arXiv:2403.04212  [pdf, other]

    cs.CL

    Persona Extraction Through Semantic Similarity for Emotional Support Conversation Generation

    Authors: Seunghee Han, Se Jin Park, Chae Won Kim, Yong Man Ro

    Abstract: Providing emotional support through dialogue systems is becoming increasingly important in today's world, as it can support both mental health and social interactions in many conversation scenarios. Previous works have shown that using persona is effective for generating empathetic and supportive responses. They have often relied on pre-provided persona rather than inferring them during conversati…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Accepted by ICASSP2024

  28. arXiv:2403.02767  [pdf, other]

    cs.CV

    DeconfuseTrack:Dealing with Confusion for Multi-Object Tracking

    Authors: Cheng Huang, Shoudong Han, Mengyu He, Wenbo Zheng, Yuhao Wei

    Abstract: Accurate data association is crucial in reducing confusion, such as ID switches and assignment errors, in multi-object tracking (MOT). However, existing advanced methods often overlook the diversity among trajectories and the ambiguity and conflicts present in motion and appearance cues, leading to confusion among detections, trajectories, and associations when performing simple global data associ…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR2024

  29. arXiv:2403.01394  [pdf, ps, other]

    cs.IT

    Successful Transmission Probability and SIR Meta Distribution Analysis for Multi-Antenna Cache-Enabled Networks with Interference Nulling

    Authors: Tianming Feng, Chenyu Wu, Xiaodong Zheng, Peilin Chen, Yilong Liu, Shuai Han

    Abstract: This paper investigates a multi-antenna cache-enabled network with interference nulling (IN) employed at base stations. Two IN schemes, namely, the fixed IN scheme and the flexible IN scheme are considered to improve the received signal-to-interference ratio (SIR) at users. To thoroughly explore the effects of the caching parameter and the IN parameters on the network performance, we focus on the…

    Submitted 2 March, 2024; originally announced March 2024.

  30. arXiv:2402.19481  [pdf, other]

    cs.CV

    DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

    Authors: Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, Song Han

    Abstract: Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models is still challenging due to the enormous computational costs, resulting in a prohibitive latency for interactive applications. In this paper, we propose DistriFusion to tackle this problem by leveraging parallelism across multiple GPUs. Our method split…

    Submitted 15 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: CVPR 2024 Highlight. Code: https://github.com/mit-han-lab/distrifuser. Website: https://hanlab.mit.edu/projects/distrifusion. Blog: https://hanlab.mit.edu/blog/distrifusion

  31. arXiv:2402.19101  [pdf, other]

    cs.IR cs.LG

    Effective Two-Stage Knowledge Transfer for Multi-Entity Cross-Domain Recommendation

    Authors: Jianyu Guan, Zongming Yin, Tianyi Zhang, Leihui Chen, Yin Zhang, Fei Huang, Jufeng Chen, Shuguang Han

    Abstract: In recent years, the recommendation content on e-commerce platforms has become increasingly rich -- a single user feed may contain multiple entities, such as selling products, short videos, and content posts. To deal with the multi-entity recommendation problem, an intuitive solution is to adopt the shared-network-based architecture for joint training. The idea is to transfer the extracted knowled…

    Submitted 29 February, 2024; originally announced February 2024.

  32. arXiv:2402.17963  [pdf, other]

    cs.DC

    The Design and Implementation of a High-Performance Log-Structured RAID System for ZNS SSDs

    Authors: Jinhong Li, Qiuping Wang, Shujie Han, Patrick P. C. Lee

    Abstract: Zoned Namespace (ZNS) defines a new abstraction for host software to flexibly manage storage in flash-based SSDs as append-only zones. It also provides a Zone Append primitive to further boost the write performance of ZNS SSDs by exploiting intra-zone parallelism. However, making Zone Append effective for reliable and scalable storage, in the form of a RAID array of multiple ZNS SSDs, is non-trivi…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 29 pages

    ACM Class: C.4; C.5.0

  33. arXiv:2402.17127  [pdf, other]

    cs.SD eess.AS

    Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0

    Authors: Taein Kang, Soyul Han, Sunmook Choi, Jaejin Seo, Sanghyeok Chung, Seungeun Lee, Seungsang Oh, Il-Youp Kwak

    Abstract: Conventional spoofing detection systems have heavily relied on the use of handcrafted features derived from speech data. However, a notable shift has recently emerged towards the direct utilization of raw speech waveforms, as demonstrated by methods like SincNet filters. This shift underscores the demand for more sophisticated audio sample features. Moreover, the success of deep learning models, p…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 5 pages

    MSC Class: 00A71

    ACM Class: I.2.6

  34. arXiv:2402.15151  [pdf, other]

    cs.CV cs.CL eess.AS eess.IV

    Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

    Authors: Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro

    Abstract: In visual speech processing, context modeling capability is one of the most important requirements due to the ambiguous nature of lip movements. For example, homophenes, words that share identical lip movements but produce different sounds, can be distinguished by considering the context. In this paper, we propose a novel framework, namely Visual Speech Processing incorporated with LLMs (VSP-LLM),…

    Submitted 23 February, 2024; originally announced February 2024.

  35. arXiv:2402.13776  [pdf, other]

    eess.IV cs.CV cs.LG

    Cas-DiffCom: Cascaded diffusion model for infant longitudinal super-resolution 3D medical image completion

    Authors: Lianghu Guo, Tianli Tao, Xinyi Cai, Zihao Zhu, Jiawei Huang, Lixuan Zhu, Zhuoyang Gu, Haifeng Tang, Rui Zhou, Siyan Han, Yan Liang, Qing Yang, Dinggang Shen, Han Zhang

    Abstract: Early infancy is a rapid and dynamic neurodevelopmental period for behavior and neurocognition. Longitudinal magnetic resonance imaging (MRI) is an effective tool to investigate such a crucial stage by capturing the developmental trajectories of the brain structures. However, longitudinal MRI acquisition always meets a serious data-missing problem due to participant dropout and failed scans, makin…

    Submitted 21 February, 2024; originally announced February 2024.

  36. arXiv:2402.13011  [pdf, other]

    physics.soc-ph cond-mat.stat-mech cs.GT nlin.AO

    An evolutionary game with reputation-based imitation-mutation dynamics

    Authors: Kehuan Feng, Songlin Han, Minyu Feng, Attila Szolnoki

    Abstract: Reputation plays a crucial role in social interactions by affecting the fitness of individuals during an evolutionary process. Previous works have extensively studied the result of imitation dynamics without focusing on potential irrational choices in strategy updates. We now fill this gap and explore the consequence of such kind of randomness, or one may interpret it as an autonomous thinking. In…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 13 pages, 8 figures, to be published in Applied Mathematics and Computation

    Journal ref: Appl. Math. Comput. 472 (2024) 128618

  37. arXiv:2402.10193  [pdf, other]

    cs.LG cs.CL

    BitDelta: Your Fine-Tune May Only Be Worth One Bit

    Authors: James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai

    Abstract: Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks. Given the higher computational demand of pre-training, it's intuitive to assume that fine-tuning adds less new information to the model, and is thus more compressible. We explore this assumption by decomposing the weights of fine-tuned models into t…

    Submitted 27 February, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  38. arXiv:2402.09754  [pdf, other]

    stat.ML cs.LG math.ST

    Robust SVD Made Easy: A fast and reliable algorithm for large-scale data analysis

    Authors: Sangil Han, Kyoowon Kim, Sungkyu Jung

    Abstract: The singular value decomposition (SVD) is a crucial tool in machine learning and statistical data analysis. However, it is highly susceptible to outliers in the data matrix. Existing robust SVD algorithms often sacrifice speed for robustness or fail in the presence of only a few outliers. This study introduces an efficient algorithm, called Spherically Normalized SVD, for robust SVD approximation…

    Submitted 15 February, 2024; originally announced February 2024.

  39. arXiv:2402.07412  [pdf, other]

    cs.LG cs.AI

    Auxiliary Reward Generation with Transition Distance Representation Learning

    Authors: Siyuan Li, Shijie Han, Yingnan Zhao, By Liang, Peng Liu

    Abstract: Reinforcement learning (RL) has shown its strength in challenging sequential decision-making problems. The reward function in RL is crucial to the learning performance, as it serves as a measure of the task completion degree. In real-world problems, the rewards are predominantly human-designed, which requires laborious tuning, and is easily affected by human cognitive biases. To achieve automatic…

    Submitted 12 February, 2024; originally announced February 2024.

  40. arXiv:2402.06918  [pdf, other]

    cs.LG cs.AI cs.CL

    Generating Chain-of-Thoughts with a Direct Pairwise-Comparison Approach to Searching for the Most Promising Intermediate Thought

    Authors: Zhen-Yu Zhang, Siwei Han, Huaxiu Yao, Gang Niu, Masashi Sugiyama

    Abstract: To improve the ability of the large language model (LLMs) to handle complex reasoning problems, chain-of-thoughts (CoT) methods were proposed to guide LLMs to reason step-by-step, facilitating problem solving from simple to complex tasks. State-of-the-art approaches for generating such a chain involve interactive collaboration, where the learner generates candidate intermediate thoughts, evaluated…

    Submitted 10 February, 2024; originally announced February 2024.

  41. arXiv:2402.05008  [pdf, other]

    cs.CV cs.AI cs.LG

    EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

    Authors: Zhuoyang Zhang, Han Cai, Song Han

    Abstract: We present EfficientViT-SAM, a new family of accelerated segment anything models. We retain SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. For the training, we begin with the knowledge distillation from the SAM-ViT-H image encoder to EfficientViT. Subsequently, we conduct end-to-end training on the SA-1B dataset. Benefiting from Efficie…

    Submitted 7 February, 2024; originally announced February 2024.

    Comments: tech report

  42. arXiv:2402.04617  [pdf, other]

    cs.CL cs.AI cs.LG

    InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory

    Authors: Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Song Han, Maosong Sun

    Abstract: Large language models (LLMs) have emerged as a cornerstone in real-world applications with lengthy streaming inputs, such as LLM-driven agents. However, existing LLMs, pre-trained on sequences with restricted maximum length, cannot generalize to longer sequences due to the out-of-domain and distraction issues. To alleviate these issues, existing efforts employ sliding attention windows and discard…

    Submitted 7 February, 2024; originally announced February 2024.

  43. arXiv:2402.04287  [pdf]

    q-bio.NC cs.ET quant-ph

    Association between Prefrontal fNIRS signals during Cognitive tasks and College scholastic ability test (CSAT) scores: Analysis using a quantum annealing approach

    Authors: Yeaju Kim, Junggu Choi, Bora Kim, Yongwan Park, Jihyun Cha, Jongkwan Choi, Sanghoon Han

    Abstract: Academic achievement is a critical measure of intellectual ability, prompting extensive research into cognitive tasks as potential predictors. Neuroimaging technologies, such as functional near-infrared spectroscopy (fNIRS), offer insights into brain hemodynamics, allowing understanding of the link between cognitive performance and academic achievement. Herein, we explored the association between…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 42 pages, 11 tables

  44. arXiv:2402.03578  [pdf, other]

    cs.MA cs.AI

    LLM Multi-Agent Systems: Challenges and Open Problems

    Authors: Shanshan Han, Qifan Zhang, Yuhang Yao, Weizhao Jin, Zhaozhuo Xu, Chaoyang He

    Abstract: This paper explores existing works of multi-agent systems and identifies challenges that remain inadequately addressed. By leveraging the diverse capabilities and roles of individual agents within a multi-agent system, these systems can tackle complex tasks through collaboration. We discuss optimizing task allocation, fostering robust reasoning through iterative debates, managing complex and layer…

    Submitted 5 February, 2024; originally announced February 2024.

  45. arXiv:2402.03348  [pdf, other]

    cs.CV cs.AI

    Respect the model: Fine-grained and Robust Explanation with Sharing Ratio Decomposition

    Authors: Sangyu Han, Yearim Kim, Nojun Kwak

    Abstract: The truthfulness of existing explanation methods in authentically elucidating the underlying model's decision-making process has been questioned. Existing methods have deviated from faithfully representing the model, thus susceptible to adversarial attacks. To address this, we propose a novel eXplainable AI (XAI) method called SRD (Sharing Ratio Decomposition), which sincerely reflects the model's…

    Submitted 25 January, 2024; originally announced February 2024.

    Comments: To be published in ICLR 2024

  46. arXiv:2402.02315  [pdf, other]

    cs.CL q-fin.GN

    A Survey of Large Language Models in Finance (FinLLMs)

    Authors: Jean Lee, Nicholas Stevens, Soyeon Caren Han, Minseok Song

    Abstract: Large Language Models (LLMs) have shown remarkable capabilities across a wide variety of Natural Language Processing (NLP) tasks and have attracted attention from multiple domains, including financial services. Despite the extensive research into general-domain LLMs, and their immense potential in finance, Financial LLM (FinLLM) research remains limited. This survey provides a comprehensive overvi…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: More information on https://github.com/adlnlp/FinLLMs

  47. arXiv:2402.01797  [pdf, other]

    cs.LG math.OC stat.CO

    Robust support vector machines via conic optimization

    Authors: Valentina Cepeda, Andrés Gómez, Shaoning Han

    Abstract: We consider the problem of learning support vector machines robust to uncertainty. It has been established in the literature that typical loss functions, including the hinge loss, are sensible to data perturbations and outliers, thus performing poorly in the setting considered. In contrast, using the 0-1 loss or a suitable non-convex approximation results in robust estimators, at the expense of la…

    Submitted 2 February, 2024; originally announced February 2024.

  48. arXiv:2402.00319  [pdf, other]

    cs.CV cs.AI

    SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling

    Authors: Eileen Wang, Soyeon Caren Han, Josiah Poon

    Abstract: Visual storytelling aims to automatically generate a coherent story based on a given image sequence. Unlike tasks like image captioning, visual stories should contain factual descriptions, worldviews, and human social commonsense to put disjointed elements together to form a coherent and engaging human-writeable story. However, most models mainly focus on applying factual information and using tax…

    Submitted 31 January, 2024; originally announced February 2024.

  49. arXiv:2401.17467  [pdf]

    physics.soc-ph cs.SI

    An entropy-based measurement for understanding origin-destination trip distributions: a case study of New York City taxis

    Authors: Yuqin Jiang, Yihong Yuan, Su Yeon Han

    Abstract: A comprehensive understanding of human mobility patterns in urban areas is essential for urban development and transportation planning. In this study, we create entropy-based measurements to capture the geographical distribution diversity of trip origins and destinations. Specifically, we develop origin-entropy and destination-entropy based on taxi and ride-sharing trip records. The origin-entropy…

    Submitted 16 April, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

  50. arXiv:2401.17450  [pdf, other]

    quant-ph cs.AR eess.SY

    Qplacer: Frequency-Aware Component Placement for Superconducting Quantum Computers

    Authors: Junyao Zhang, Hanrui Wang, Qi Ding, Jiaqi Gu, Reouven Assouly, William D. Oliver, Song Han, Kenneth R. Brown, Hai "Helen" Li, Yiran Chen

    Abstract: Noisy Intermediate-Scale Quantum (NISQ) computers face a critical limitation in qubit numbers, hindering their progression towards large-scale and fault-tolerant quantum computing. A significant challenge impeding scaling is crosstalk, characterized by unwanted interactions among neighboring components on quantum chips, including qubits, resonators, and substrate. We motivate a general approach to…

    Submitted 8 May, 2024; v1 submitted 30 January, 2024; originally announced January 2024.