Showing 1–50 of 422 results for author: Lin, T

Searching in archive cs.
  1. arXiv:2405.00308  [pdf]

    cs.CR stat.AP

    FPGA Digital Dice using Pseudo Random Number Generator

    Authors: Michael Lim Kee Hian, Ten Wei Lin, Zachary Wu Xuan, Stephanie-Ann Loy, Maoyang Xiang, T. Hui Teo

    Abstract: The goal of this project is to design a digital dice that displays dice numbers in real-time. The number is generated by a pseudo-random number generator (PRNG) using XORshift algorithm that is implemented in Verilog HDL on an FPGA. The digital dice is equipped with tilt sensor, display, power management circuit, and rechargeable battery hosted in a 3D printed dice casing. By shaking the digital d…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 15 pages, 5 figures

  2. arXiv:2404.19752  [pdf, other]

    cs.CV

    Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation

    Authors: Yunhao Ge, Xiaohui Zeng, Jacob Samuel Huffman, Tsung-Yi Lin, Ming-Yu Liu, Yin Cui

    Abstract: Existing automatic captioning methods for visual content face challenges such as lack of detail, content hallucination, and poor instruction following. In this work, we propose VisualFactChecker (VFC), a flexible training-free pipeline that generates high-fidelity and detailed captions for both 2D images and 3D objects. VFC consists of three steps: 1) proposal, where image-to-text captioning model…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  3. arXiv:2404.16823  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Learning Visuotactile Skills with Two Multifingered Hands

    Authors: Toru Lin, Yu Zhang, Qiyang Li, Haozhi Qi, Brent Yi, Sergey Levine, Jitendra Malik

    Abstract: Aiming to replicate human-like dexterity, perceptual experiences, and motion patterns, we explore learning from human demonstrations using a bimanual system with multifingered hands and visuotactile data. Two significant challenges exist: the lack of an affordable and accessible teleoperation system suitable for a dual-arm setup with multifingered hands, and the scarcity of multifingered hand hard…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Code and Project Website: https://toruowo.github.io/hato/

  4. arXiv:2404.15635  [pdf, other]

    cs.CV cs.LG

    A Real-time Evaluation Framework for Pedestrian's Potential Risk at Non-Signalized Intersections Based on Predicted Post-Encroachment Time

    Authors: Tengfeng Lin, Zhixiong Jin, Seongjin Choi, Hwasoo Yeo

    Abstract: Addressing pedestrian safety at intersections is one of the paramount concerns in the field of transportation research, driven by the urgency of reducing traffic-related injuries and fatalities. With advances in computer vision technologies and predictive models, the pursuit of developing real-time proactive protection systems is increasingly recognized as vital to improving pedestrian safety at i…

    Submitted 24 April, 2024; originally announced April 2024.

  5. arXiv:2404.14387  [pdf, other]

    cs.CL cs.AI

    A Survey on Self-Evolution of Large Language Models

    Authors: Zhengwei Tao, Ting-En Lin, Xiancai Chen, Hangyu Li, Yuchuan Wu, Yongbin Li, Zhi Jin, Fei Huang, Dacheng Tao, Jingren Zhou

    Abstract: Large language models (LLMs) have significantly advanced in various fields and intelligent agent applications. However, current LLMs that learn from human or external model supervision are costly and may face performance ceilings as task complexity and diversity increase. To address this issue, self-evolution approaches that enable LLM to autonomously acquire, refine, and learn from experiences ge…

    Submitted 22 April, 2024; originally announced April 2024.

  6. arXiv:2404.12738  [pdf, other]

    cs.NI cs.CR

    DeviceRadar: Online IoT Device Fingerprinting in ISPs using Programmable Switches

    Authors: Ruoyu Li, Qing Li, Tao Lin, Qingsong Zou, Dan Zhao, Yucheng Huang, Gareth Tyson, Guorui Xie, Yong Jiang

    Abstract: Device fingerprinting can be used by Internet Service Providers (ISPs) to identify vulnerable IoT devices for early prevention of threats. However, due to the wide deployment of middleboxes in ISP networks, some important data, e.g., 5-tuples and flow statistics, are often obscured, rendering many existing approaches invalid. It is further challenged by the high-speed traffic of hundreds of teraby…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Submitted to IEEE/ACM Transactions on Networking (ToN)

  7. arXiv:2404.12726  [pdf, other]

    cs.CL

    Evaluating Character Understanding of Large Language Models via Character Profiling from Fictional Works

    Authors: Xinfeng Yuan, Siyu Yuan, Yuhan Cui, Tianhe Lin, Xintao Wang, Rui Xu, Jiangjie Chen, Deqing Yang

    Abstract: Large language models (LLMs) have demonstrated impressive performance and spurred numerous AI applications, in which role-playing agents (RPAs) are particularly popular, especially for fictional characters. The prerequisite for these RPAs lies in the capability of LLMs to understand characters from fictional works. Previous efforts have evaluated this capability via basic classification tasks or c…

    Submitted 19 April, 2024; originally announced April 2024.

  8. arXiv:2404.11947  [pdf, other]

    cs.LG cs.CV

    VCC-INFUSE: Towards Accurate and Efficient Selection of Unlabeled Examples in Semi-supervised Learning

    Authors: Shijie Fang, Qianhan Feng, Tong Lin

    Abstract: Despite the progress of Semi-supervised Learning (SSL), existing methods fail to utilize unlabeled data effectively and efficiently. Many pseudo-label-based methods select unlabeled examples based on inaccurate confidence scores from the classifier. Most prior work also uses all available unlabeled data without pruning, making it difficult to handle large amounts of unlabeled data. To address thes…

    Submitted 21 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted paper of IJCAI 2024. Shijie Fang and Qianhan Feng contributed equally to this paper. New version, some problems and typos are fixed

  9. arXiv:2404.07449  [pdf, other]

    cs.CV

    Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs

    Authors: Kanchana Ranasinghe, Satya Narayan Shukla, Omid Poursaeed, Michael S. Ryoo, Tsung-Yu Lin

    Abstract: Integration of Large Language Models (LLMs) into visual domain tasks, resulting in visual-LLMs (V-LLMs), has enabled exceptional performance in vision-language tasks, particularly for visual question answering (VQA). However, existing V-LLMs (e.g. BLIP-2, LLaVA) demonstrate weak spatial reasoning and localization awareness. Despite generating highly descriptive and elaborate textual answers, these…

    Submitted 10 April, 2024; originally announced April 2024.

  10. arXiv:2404.06201  [pdf, other]

    cs.SE cs.AI

    Open-Source AI-based SE Tools: Opportunities and Challenges of Collaborative Software Learning

    Authors: Zhihao Lin, Wei Ma, Tao Lin, Yaowen Zheng, Jingquan Ge, Jun Wang, Jacques Klein, Tegawende Bissyande, Yang Liu, Li Li

    Abstract: Large Language Models (LLMs) have become instrumental in advancing software engineering (SE) tasks, showcasing their efficacy in code understanding and beyond. Like traditional SE tools, open-source collaboration is key in realising the excellent products. However, with AI models, the essential need is in data. The collaboration of these AI-based SE models hinges on maximising the sources of high-…

    Submitted 9 April, 2024; originally announced April 2024.

  11. arXiv:2404.02519  [pdf, other]

    cs.CR stat.ME

    Differentially Private Verification of Survey-Weighted Estimates

    Authors: Tong Lin, Jerome P. Reiter

    Abstract: Several official statistics agencies release synthetic data as public use microdata files. In practice, synthetic data do not admit accurate results for every analysis. Thus, it is beneficial for agencies to provide users with feedback on the quality of their analyses of the synthetic data. One approach is to couple synthetic data with a verification server that provides users with measures of the…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 21 pages including references, 5 figures

  12. arXiv:2404.00242  [pdf, other]

    cs.CL cs.AI

    DeFT: Flash Tree-attention with IO-Awareness for Efficient Tree-search-based LLM Inference

    Authors: Jinwei Yao, Kaiqi Chen, Kexun Zhang, Jiaxuan You, Binhang Yuan, Zeke Wang, Tao Lin

    Abstract: Decoding using tree search can greatly enhance the inference quality for transformer-based Large Language Models (LLMs). Depending on the guidance signal, it searches for the best path from root to leaf in the tree by forming LLM outputs to improve controllability, reasoning ability, alignment, et cetera. However, current tree decoding strategies and their inference systems do not suit each other…

    Submitted 30 March, 2024; originally announced April 2024.

  13. arXiv:2403.16252  [pdf, other]

    cs.RO eess.SY

    Legged Robot State Estimation within Non-inertial Environments

    Authors: Zijian He, Sangli Teng, Tzu-Yuan Lin, Maani Ghaffari, Yan Gu

    Abstract: This paper investigates the robot state estimation problem within a non-inertial environment. The proposed state estimation approach relaxes the common assumption of static ground in the system modeling. The process and measurement models explicitly treat the movement of the non-inertial environments without requiring knowledge of its motion in the inertial frame or relying on GPS or sensing envir…

    Submitted 24 March, 2024; originally announced March 2024.

  14. arXiv:2403.14287  [pdf, other]

    cs.CV cs.AI eess.IV

    Enhancing Historical Image Retrieval with Compositional Cues

    Authors: Tingyu Lin, Robert Sablatnig

    Abstract: In analyzing vast amounts of digitally stored historical image data, existing content-based retrieval methods often overlook significant non-semantic information, limiting their effectiveness for flexible exploration across varied themes. To broaden the applicability of image retrieval methods for diverse purposes and uncover more general patterns, we innovatively introduce a crucial factor from c…

    Submitted 21 March, 2024; originally announced March 2024.

  15. arXiv:2403.13447  [pdf, other]

    cs.AI cs.CL cs.CV

    HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models

    Authors: Wenqiao Zhang, Tianwei Lin, Jiang Liu, Fangxun Shu, Haoyuan Li, Lei Zhang, He Wanggui, Hao Zhou, Zheqi Lv, Hao Jiang, Juncheng Li, Siliang Tang, Yueting Zhuang

    Abstract: Recent advancements indicate that scaling up Multimodal Large Language Models (MLLMs) effectively enhances performance on downstream multimodal tasks. The prevailing MLLM paradigm, e.g., LLaVA, transforms visual features into text-like tokens using a static vision-language mapper, thereby enabling static LLMs to develop the capability to comprehend visual information through v…

    Submitted 20 March, 2024; originally announced March 2024.

  16. arXiv:2403.13214  [pdf]

    cs.CV cs.AI cs.LG q-bio.QM

    Nellie: Automated organelle segmentation, tracking, and hierarchical feature extraction in 2D/3D live-cell microscopy

    Authors: Austin E. Y. T. Lefebvre, Gabriel Sturm, Ting-Yu Lin, Emily Stoops, Magdalena Preciado Lopez, Benjamin Kaufmann-Malaga, Kayley Hake

    Abstract: The analysis of dynamic organelles remains a formidable challenge, though key to understanding biological processes. We introduce Nellie, an automated and unbiased pipeline for segmentation, tracking, and feature extraction of diverse intracellular structures. Nellie adapts to image metadata, eliminating user input. Nellie's preprocessing pipeline enhances structural contrast on multiple intracell…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: for associated code, see https://github.com/aelefebv/nellie; 82 pages, 5 main figures, 11 extended figures

  17. arXiv:2403.12986  [pdf, other]

    cs.CV cs.LG

    BaCon: Boosting Imbalanced Semi-supervised Learning via Balanced Feature-Level Contrastive Learning

    Authors: Qianhan Feng, Lujing Xie, Shijie Fang, Tong Lin

    Abstract: Semi-supervised Learning (SSL) reduces the need for extensive annotations in deep learning, but the more realistic challenge of imbalanced data distribution in SSL remains largely unexplored. In Class Imbalanced Semi-supervised Learning (CISSL), the bias introduced by unreliable pseudo-labels can be exacerbated by imbalanced data distributions. Most existing methods address this issue at instance-…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted paper of AAAI 2024

  18. arXiv:2403.09157  [pdf, ps, other]

    eess.IV cs.CV

    VM-UNET-V2 Rethinking Vision Mamba UNet for Medical Image Segmentation

    Authors: Mingya Zhang, Yue Yu, Limei Gu, Tingsheng Lin, Xianping Tao

    Abstract: In the field of medical image segmentation, models based on both CNN and Transformer have been thoroughly investigated. However, CNNs have limited modeling capabilities for long-range dependencies, making it challenging to exploit the semantic information within images fully. On the other hand, the quadratic computational complexity poses a challenge for Transformers. Recently, State Space Models…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 12 pages, 4 figures

  19. arXiv:2403.03761  [pdf, other]

    quant-ph cs.IT cs.LG

    Parameterized quantum comb and simpler circuits for reversing unknown qubit-unitary operations

    Authors: Yin Mo, Lei Zhang, Yu-Ao Chen, Yingjian Liu, Tengxiang Lin, Xin Wang

    Abstract: Quantum comb is an essential tool for characterizing complex quantum protocols in quantum information processing. In this work, we introduce PQComb, a framework leveraging parameterized quantum circuits to explore the capabilities of quantum combs for general quantum process transformation tasks and beyond. By optimizing PQComb for time-reversal simulations of unknown unitary evolutions, we develo…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: 12 pages including appendix

  20. arXiv:2403.02338  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Twisting Lids Off with Two Hands

    Authors: Toru Lin, Zhao-Heng Yin, Haozhi Qi, Pieter Abbeel, Jitendra Malik

    Abstract: Manipulating objects with two multi-fingered hands has been a long-standing challenge in robotics, attributed to the contact-rich nature of many manipulation tasks and the complexity inherent in coordinating a high-dimensional bimanual system. In this work, we consider the problem of twisting lids of various bottle-like objects with two hands, and demonstrate that policies trained in simulation us…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: Project page can be found at https://toruowo.github.io/bimanual-twist

  21. arXiv:2403.02178  [pdf, other]

    cs.CL cs.AI cs.LG

    Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models

    Authors: Changyu Chen, Xiting Wang, Ting-En Lin, Ang Lv, Yuchuan Wu, Xin Gao, Ji-Rong Wen, Rui Yan, Yongbin Li

    Abstract: In reasoning tasks, even a minor error can cascade into inaccurate results, leading to suboptimal performance of large language models in such domains. Earlier fine-tuning approaches sought to mitigate this by leveraging more precise supervisory signals from human labeling, larger models, or self-sampling, although at a high cost. Conversely, we develop a method that avoids external resources, rel…

    Submitted 4 March, 2024; originally announced March 2024.

  22. arXiv:2402.19376  [pdf, other]

    cs.AR

    OzMAC: An Energy-Efficient Sparsity-Exploiting Multiply-Accumulate-Unit Design for DL Inference

    Authors: Harideep Nair, Prabhu Vellaisamy, Tsung-Han Lin, Perry Wang, Shawn Blanton, John Paul Shen

    Abstract: General Matrix Multiply (GEMM) hardware, employing large arrays of multiply-accumulate (MAC) units, perform bulk of the computation in deep learning (DL). Recent trends have established 8-bit integer (INT8) as the most widely used precision for DL inference. This paper proposes a novel MAC design capable of dynamically exploiting bit sparsity (i.e., number of '0' bits within a binary value) in inp…

    Submitted 29 February, 2024; originally announced February 2024.

  23. arXiv:2402.09721  [pdf, ps, other]

    cs.GT cs.AI cs.LG econ.TH

    Persuading a Learning Agent

    Authors: Tao Lin, Yiling Chen

    Abstract: We study a repeated Bayesian persuasion problem (and more generally, any generalized principal-agent problem with complete information) where the principal does not have commitment power and the agent uses algorithms to learn to respond to the principal's signals. We reduce this problem to a one-shot generalized principal-agent problem with an approximately-best-responding agent. This reduction al…

    Submitted 22 February, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  24. arXiv:2402.09240  [pdf, other]

    cs.LG cs.CV

    Switch EMA: A Free Lunch for Better Flatness and Sharpness

    Authors: Siyuan Li, Zicheng Liu, Juanxi Tian, Ge Wang, Zedong Wang, Weiyang Jin, Di Wu, Cheng Tan, Tao Lin, Yang Liu, Baigui Sun, Stan Z. Li

    Abstract: Exponential Moving Average (EMA) is a widely used weight averaging (WA) regularization to learn flat optima for better generalizations without extra cost in deep neural network (DNN) optimization. Despite achieving better flatness, existing WA methods might fall into worse final performances or require extra test-time computations. This work unveils the full potential of EMA with a single line of…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Preprint V1. Source code and models at https://github.com/Westlake-AI/SEMA

  25. arXiv:2402.07476  [pdf, other]

    quant-ph cs.CC cs.IT

    Expansion of higher-dimensional cubical complexes with application to quantum locally testable codes

    Authors: Irit Dinur, Ting-Chun Lin, Thomas Vidick

    Abstract: We introduce a high-dimensional cubical complex, for any dimension t>0, and apply it to the design of quantum locally testable codes. Our complex is a natural generalization of the constructions by Panteleev and Kalachev and by Dinur et al. of a square complex (case t=2), which have been applied to the design of classical locally testable codes (LTC) and quantum low-density parity check codes (qLD…

    Submitted 11 April, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Stronger result: constant degree complexes and without product-expansion conjecture

  26. arXiv:2402.04971  [pdf, other]

    cs.AI cs.GT

    Multi-Sender Persuasion -- A Computational Perspective

    Authors: Safwan Hossain, Tonghan Wang, Tao Lin, Yiling Chen, David C. Parkes, Haifeng Xu

    Abstract: We consider multiple senders with informational advantage signaling to convince a single self-interested actor towards certain actions. Generalizing the seminal Bayesian Persuasion framework, such settings are ubiquitous in computational economics, multi-agent learning, and machine learning with multiple objectives. The core solution concept here is the Nash equilibrium of senders' signaling polic…

    Submitted 7 February, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  27. arXiv:2402.04520  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis

    Authors: Jerry Yao-Chieh Hu, Thomas Lin, Zhao Song, Han Liu

    Abstract: We investigate the computational limits of the memory retrieval dynamics of modern Hopfield models from the fine-grained complexity analysis. Our key contribution is the characterization of a phase transition behavior in the efficiency of all possible modern Hopfield models based on the norm of patterns. Specifically, we establish an upper bound criterion for the norm of input query patterns and m…

    Submitted 4 April, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: 31 pages; v2: fix typos; v3: fix typos, add clarifications, add references

  28. arXiv:2402.03700  [pdf, other]

    cs.HC cs.AI

    GenLens: A Systematic Evaluation of Visual GenAI Model Outputs

    Authors: Tica Lin, Hanspeter Pfister, Jui-Hsien Wang

    Abstract: The rapid development of generative AI (GenAI) models in computer vision necessitates effective evaluation methods to ensure their quality and fairness. Existing tools primarily focus on dataset quality assurance and model explainability, leaving a significant gap in GenAI output evaluation during model development. Current practices often depend on developers' subjective visual assessments, which…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: To Appear in IEEE PacificVis 2024

  29. arXiv:2402.01342  [pdf, other]

    cs.LG stat.ML

    Training-time Neuron Alignment through Permutation Subspace for Improving Linear Mode Connectivity and Model Fusion

    Authors: Zexi Li, Zhiqi Li, Jie Lin, Tao Shen, Tao Lin, Chao Wu

    Abstract: In deep learning, stochastic gradient descent often yields functionally similar yet widely scattered solutions in the weight space even under the same initialization, causing barriers in the Linear Mode Connectivity (LMC) landscape. Overcoming these barriers is crucial for understanding deep learning dynamics and enhancing model-fusion algorithms. Previous studies highlight the role of permutation…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: preprint

  30. arXiv:2402.01276  [pdf, other]

    cs.AI

    Federated Unlearning: a Perspective of Stability and Fairness

    Authors: Jiaqi Shao, Tao Lin, Xuanyu Cao, Bing Luo

    Abstract: This paper explores the multifaceted consequences of federated unlearning (FU) with data heterogeneity. We introduce key metrics for FU assessment, concentrating on verification, global stability, and local fairness, and investigate the inherent trade-offs. Furthermore, we formulate the unlearning process with data heterogeneity through an optimization framework. Our key contribution lies in a com…

    Submitted 12 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  31. arXiv:2401.16355  [pdf, other]

    cs.CV

    PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology

    Authors: Yuxuan Sun, Hao Wu, Chenglu Zhu, Sunyi Zheng, Qizi Chen, Kai Zhang, Yunlong Zhang, Dan Wan, Xiaoxiao Lan, Mengyue Zheng, Jingxiong Li, Xinheng Lyu, Tao Lin, Lin Yang

    Abstract: The emergence of large multimodal models has unlocked remarkable potential in AI, particularly in pathology. However, the lack of specialized, high-quality benchmark impeded their development and precise evaluation. To address this, we introduce PathMMU, the largest and highest-quality expert-validated pathology benchmark for Large Multimodal Models (LMMs). It comprises 33,428 multimodal multi-cho…

    Submitted 20 March, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 27 pages, 12 figures

  32. arXiv:2401.10245  [pdf, other]

    cs.CE physics.flu-dyn

    Train Small, Model Big: Scalable Physics Simulators via Reduced Order Modeling and Domain Decomposition

    Authors: Seung Whan Chung, Youngsoo Choi, Pratanu Roy, Thomas Moore, Thomas Roy, Tiras Y. Lin, Du Y. Nguyen, Christopher Hahn, Eric B. Duoss, Sarah E. Baker

    Abstract: Numerous cutting-edge scientific technologies originate at the laboratory scale, but transitioning them to practical industry applications is a formidable challenge. Traditional pilot projects at intermediate scales are costly and time-consuming. An alternative, the E-pilot, relies on high-fidelity numerical simulations, but even these simulations can be computationally prohibitive at larger scale…

    Submitted 5 December, 2023; originally announced January 2024.

    Comments: 40 pages, 12 figures. Submitted to Computer Methods in Applied Mechanics and Engineering

    Report number: LLNL-JRNL-857774 MSC Class: 65F55; 65N55 (primary) 76D07 (secondary)

  33. arXiv:2401.03836  [pdf, other]

    cs.CV

    WidthFormer: Toward Efficient Transformer-based BEV View Transformation

    Authors: Chenhongyi Yang, Tianwei Lin, Lichao Huang, Elliot J. Crowley

    Abstract: In this work, we present WidthFormer, a novel transformer-based Bird's-Eye-View (BEV) 3D detection method tailored for real-time autonomous-driving applications. WidthFormer is computationally efficient, robust and does not require any special engineering effort to deploy. In this work, we propose a novel 3D positional encoding mechanism capable of accurately encapsulating 3D geometric information…

    Submitted 15 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

  34. arXiv:2401.02122  [pdf, other]

    cs.CL cs.SD eess.AS

    PEFT for Speech: Unveiling Optimal Placement, Merging Strategies, and Ensemble Techniques

    Authors: Tzu-Han Lin, How-Shing Wang, Hao-Yung Weng, Kuang-Chen Peng, Zih-Ching Chen, Hung-yi Lee

    Abstract: Parameter-Efficient Fine-Tuning (PEFT) is increasingly recognized as an effective method in speech processing. However, the optimal approach and the placement of PEFT methods remain inconclusive. Our study conducts extensive experiments to compare different PEFT methods and their layer-wise placement adapting Differentiable Architecture Search (DARTS). We also explore the use of ensemble learning…

    Submitted 7 February, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: Accepted to ICASSP 2024 Self-supervision in Audio, Speech and Beyond (SASB) workshop

  35. arXiv:2312.11927  [pdf, other]

    cs.LG cs.SI stat.ME

    Empowering Dual-Level Graph Self-Supervised Pretraining with Motif Discovery

    Authors: Pengwei Yan, Kaisong Song, Zhuoren Jiang, Yangyang Kang, Tianqianjin Lin, Changlong Sun, Xiaozhong Liu

    Abstract: While self-supervised graph pretraining techniques have shown promising results in various domains, their application still experiences challenges of limited topology learning, human knowledge dependency, and incompetent multi-level interactions. To address these issues, we propose a novel solution, Dual-level Graph self-supervised Pretraining with Motif discovery (DGPM), which introduces a unique…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 14 pages, 6 figures, accepted by AAAI'24

  36. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1320 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 2 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  37. arXiv:2312.11671  [pdf, other]

    cs.CL cs.AI cs.LG

    Evaluating Language-Model Agents on Realistic Autonomous Tasks

    Authors: Megan Kinniment, Lucas Jun Koba Sato, Haoxing Du, Brian Goodrich, Max Hasin, Lawrence Chan, Luke Harold Miles, Tao R. Lin, Hjalmar Wijk, Joel Burget, Aaron Ho, Elizabeth Barnes, Paul Christiano

    Abstract: In this report, we explore the ability of language model agents to acquire resources, create copies of themselves, and adapt to novel challenges they encounter in the wild. We refer to this cluster of capabilities as "autonomous replication and adaptation" or ARA. We believe that systems capable of ARA could have wide-reaching and hard-to-anticipate consequences, and that measuring and forecasting…

    Submitted 4 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: 14 pages

  38. arXiv:2312.10113  [pdf, other]

    cs.CV

    Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation

    Authors: Qin Guo, Tianwei Lin

    Abstract: Recently, diffusion-based methods, like InstructPix2Pix (IP2P), have achieved effective instruction-based image editing, requiring only natural language instructions from the user. However, these methods often inadvertently alter unintended areas and struggle with multi-instruction editing, resulting in compromised outcomes. To address these issues, we introduce the Focus on Your Instruction (FoI)…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: 14 pages, 9 figures

  39. arXiv:2312.09501  [pdf, other]

    cs.CV cs.AI

    EDA: Evolving and Distinct Anchors for Multimodal Motion Prediction

    Authors: Longzhong Lin, Xuewu Lin, Tianwei Lin, Lichao Huang, Rong Xiong, Yue Wang

    Abstract: Motion prediction is a crucial task in autonomous driving, and one of its major challenges lands in the multimodality of future behaviors. Many successful works have utilized mixture models which require identification of positive mixture components, and correspondingly fall into two main lines: prediction-based and anchor-based matching. The prediction clustering phenomenon in prediction-based ma…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI2024)

  40. arXiv:2312.09429  [pdf]

    eess.SP cs.LG

    Deep Learning-Enabled Swallowing Monitoring and Postoperative Recovery Biosensing System

    Authors: Chih-Ning Tsai, Pei-Wen Yang, Tzu-Yen Huang, Jung-Chih Chen, Hsin-Yi Tseng, Che-Wei Wu, Amrit Sarmah, Tzu-En Lin

    Abstract: This study introduces an innovative 3D printed dry electrode tailored for biosensing in postoperative recovery scenarios. Fabricated through a drop coating process, the electrode incorporates a novel 2D material.

    Submitted 24 November, 2023; originally announced December 2023.

    Comments: The abstract could not be uploaded in full

    MSC Class: NA ACM Class: A.0

  41. arXiv:2312.05757  [pdf, ps, other]

    cs.LG cs.AI cs.DL cs.SI stat.ME

    Towards Human-like Perception: Learning Structural Causal Model in Heterogeneous Graph

    Authors: Tianqianjin Lin, Kaisong Song, Zhuoren Jiang, Yangyang Kang, Weikang Yuan, Xurui Li, Changlong Sun, Cui Huang, Xiaozhong Liu

    Abstract: Heterogeneous graph neural networks have become popular in various domains. However, their generalizability and interpretability are limited due to the discrepancy between their inherent inference flows and human reasoning logic or underlying causal relationships for the learning problem. This study introduces a novel solution, HG-SCM (Heterogeneous Graph as Structural Causal Model). It can mimic…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: 28 pages, 10 figures, 6 tables, accepted by Information Processing & Management

    Journal ref: Information Processing & Management, 60 (2024) 1-21

  42. arXiv:2312.04653  [pdf, other]

    cs.LG cs.GT

    Learning Thresholds with Latent Values and Censored Feedback

    Authors: Jiahao Zhang, Tao Lin, Weiqiang Zheng, Zhe Feng, Yifeng Teng, Xiaotie Deng

    Abstract: In this paper, we investigate a problem of actively learning a threshold in latent space, where the unknown reward $g(γ, v)$ depends on the proposed threshold $γ$ and latent value $v$ and it can be $only$ achieved if the threshold is lower than or equal to the unknown latent value. This problem has broad applications in practical scenarios, e.g., reserve price optimization in online auctions, online…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: 18 pages

  43. arXiv:2312.04455  [pdf, other]

    cs.CL cs.AI cs.LG

    Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use

    Authors: Yuhan Chen, Ang Lv, Ting-En Lin, Changyu Chen, Yuchuan Wu, Fei Huang, Yongbin Li, Rui Yan

    Abstract: In this paper, we demonstrate that an inherent waveform pattern in the attention allocation of large language models (LLMs) significantly affects their performance in tasks demanding a high degree of context awareness, such as utilizing LLMs for tool-use. Specifically, the crucial information in the context will be potentially overlooked by the model when it is positioned in the trough zone of the att…

    Submitted 1 March, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

  44. arXiv:2312.03526  [pdf, other]

    cs.CV cs.AI cs.LG

    On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm

    Authors: Peng Sun, Bei Shi, Daiwei Yu, Tao Lin

    Abstract: Contemporary machine learning requires training large neural networks on massive datasets and thus faces the challenges of high computational demands. Dataset distillation, as a recent emerging strategy, aims to compress real-world datasets for efficient training. However, this line of research currently struggles with large-scale and high-resolution datasets, hindering its practicality and feasibi…

    Submitted 19 March, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: 17 pages, 20 figures

  45. arXiv:2312.03322  [pdf, other]

    cs.CV

    Background Clustering Pre-training for Few-shot Segmentation

    Authors: Zhimiao Yu, Tiancheng Lin, Yi Xu

    Abstract: Recent few-shot segmentation (FSS) methods introduce an extra pre-training stage before meta-training to obtain a stronger backbone, which has become a standard step in few-shot learning. Despite its effectiveness, the current pre-training scheme suffers from the merged background problem: only base classes are labelled as foregrounds, making it hard to distinguish between novel classes and actual bac…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 6 pages, 2 figures, ICIP 2023

  46. arXiv:2311.17673  [pdf, other]

    stat.ML cond-mat.stat-mech cs.AI cs.LG math-ph

    Using Ornstein-Uhlenbeck Process to understand Denoising Diffusion Probabilistic Model and its Noise Schedules

    Authors: Javier E. Santos, Yen Ting Lin

    Abstract: The aim of this short note is to show that the Denoising Diffusion Probabilistic Model (DDPM), a non-homogeneous discrete-time Markov process, can be represented by a time-homogeneous continuous-time Markov process observed at non-uniformly sampled discrete times. Surprisingly, this continuous-time Markov process is the well-known and well-studied Ornstein-Uhlenbeck (OU) process, which was developed in…

    Submitted 29 November, 2023; originally announced November 2023.

  47. arXiv:2311.13752  [pdf, other]

    cs.CV cs.AI

    3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology

    Authors: Asma Ben Abacha, Alberto Santamaria-Pang, Ho Hin Lee, Jameson Merkow, Qin Cai, Surya Teja Devarakonda, Abdullah Islam, Julia Gong, Matthew P. Lungren, Thomas Lin, Noel C Codella, Ivan Tarapov

    Abstract: The increasing use of medical imaging in healthcare settings presents a significant challenge due to the increasing workload for radiologists, yet it also offers an opportunity for enhancing healthcare outcomes if effectively leveraged. 3D image retrieval holds potential to reduce radiologist workloads by enabling clinicians to efficiently search through diagnostically similar or otherwise relevant c…

    Submitted 22 November, 2023; originally announced November 2023.

  48. arXiv:2311.11722  [pdf, other]

    cs.CV cs.AI cs.RO

    Sparse4D v3: Advancing End-to-End 3D Detection and Tracking

    Authors: Xuewu Lin, Zixiang Pei, Tianwei Lin, Lichao Huang, Zhizhong Su

    Abstract: In autonomous driving perception systems, 3D detection and tracking are the two fundamental tasks. This paper delves deeper into this field, building upon the Sparse4D framework. We introduce two auxiliary training tasks (Temporal Instance Denoising and Quality Estimation) and propose decoupled attention to make structural improvements, leading to significant enhancements in detection performance.…

    Submitted 20 November, 2023; originally announced November 2023.

  49. arXiv:2311.11238  [pdf, other]

    cs.HC cs.AI

    AtomXR: Streamlined XR Prototyping with Natural Language and Immersive Physical Interaction

    Authors: Alice Cai, Caine Ardayfio, AnhPhu Nguyen, Tica Lin, Elena Glassman

    Abstract: As technological advancements in extended reality (XR) amplify the demand for more XR content, traditional development processes face several challenges: 1) a steep learning curve for inexperienced developers, 2) a disconnect between 2D development environments and 3D user experiences inside headsets, and 3) slow iteration cycles due to context switching between development and testing environment…

    Submitted 19 November, 2023; originally announced November 2023.

    Comments: 15 pages, 14 figures, in submission

    ACM Class: H.5.2; I.2

  50. arXiv:2311.08588  [pdf, other]

    cs.CL cs.AI cs.SE

    CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation

    Authors: Weixiang Yan, Haitian Liu, Yunkun Wang, Yunzhe Li, Qian Chen, Wen Wang, Tingyu Lin, Weishan Zhao, Li Zhu, Shuiguang Deng, Hari Sundaram

    Abstract: Large Language Models (LLMs) have demonstrated remarkable performance on coding-related tasks, particularly on assisting humans in programming and facilitating programming automation. However, existing benchmarks for evaluating the code understanding and generation capacities of LLMs suffer from severe limitations. First, most benchmarks are deficient as they focus on a narrow range of popular pro…

    Submitted 5 February, 2024; v1 submitted 14 November, 2023; originally announced November 2023.