Showing 1–50 of 702 results for author: Cheng, Y

Searching in archive cs.
  1. arXiv:2405.05616  [pdf, other]

    cs.CL cs.AI

    G-SAP: Graph-based Structure-Aware Prompt Learning over Heterogeneous Knowledge for Commonsense Reasoning

    Authors: Ruiting Dai, Yuqiao Tan, Lisi Mo, Shuang Liang, Guohao Huo, Jiayi Luo, Yao Cheng

    Abstract: Commonsense question answering has demonstrated considerable potential across various applications like assistants and social robots. Although fully fine-tuned pre-trained Language Models (LM) have achieved remarkable performance in commonsense reasoning, their tendency to excessively prioritize textual information hampers the precise transfer of structural knowledge and undermines interpretability…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.02686  [pdf, other]

    cs.CV cs.AI

    Boosting 3D Neuron Segmentation with 2D Vision Transformer Pre-trained on Natural Images

    Authors: Yik San Cheng, Runkai Zhao, Heng Wang, Hanchuan Peng, Weidong Cai

    Abstract: Neuron reconstruction, one of the fundamental tasks in neuroscience, rebuilds neuronal morphology from 3D light microscope imaging data. It plays a critical role in analyzing the structure-function relationship of neurons in the nervous system. However, due to the scarcity of neuron datasets and high-quality SWC annotations, it is still challenging to develop robust segmentation methods for single…

    Submitted 4 May, 2024; originally announced May 2024.

    Comments: 3 pages

  3. arXiv:2404.19168  [pdf, other]

    cs.CV

    PEVA-Net: Prompt-Enhanced View Aggregation Network for Zero/Few-Shot Multi-View 3D Shape Recognition

    Authors: Dongyun Lin, Yi Cheng, Shangbo Mao, Aiyuan Guo, Yiqun Li

    Abstract: Large vision-language models have impressively promoted the performance of 2D visual recognition under zero/few-shot scenarios. In this paper, we focus on exploiting the large vision-language model, i.e., CLIP, to address zero/few-shot 3D shape recognition based on multi-view representations. The key challenge for both tasks is to generate a discriminative descriptor of the 3D shape represented by…

    Submitted 29 April, 2024; originally announced April 2024.

  4. arXiv:2404.18919  [pdf, other]

    cs.CV

    TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation

    Authors: Junhao Cheng, Baiqiao Yin, Kaixin Cai, Minbin Huang, Hanhui Li, Yuxin He, Xi Lu, Yue Li, Yifei Li, Yuhao Cheng, Yiqiang Yan, Xiaodan Liang

    Abstract: Recent advances in diffusion models can generate high-quality and stunning images from text. However, multi-turn image generation, which is of high demand in real-world scenarios, still faces challenges in maintaining semantic consistency between images and texts, as well as contextual consistency of the same subject across multiple interactive turns. To address this issue, we introduce TheaterGen…

    Submitted 29 April, 2024; originally announced April 2024.

  5. arXiv:2404.18416  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    Capabilities of Gemini Models in Medicine

    Authors: Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-baptiste Alayrac, Neil Houlsby, et al. (42 additional authors not shown)

    Abstract: Excellence in a wide variety of medical applications poses considerable challenges for AI, requiring advanced reasoning, access to up-to-date medical knowledge and understanding of complex multimodal data. Gemini models, with strong general capabilities in multimodal and long-context reasoning, offer exciting possibilities in medicine. Building on these core strengths of Gemini, we introduce Med-G…

    Submitted 1 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  6. arXiv:2404.18225  [pdf, other]

    cs.RO

    Quadruped robot traversing 3D complex environments with limited perception

    Authors: Yi Cheng, Hang Liu, Guoping Pan, Linqi Ye, Houde Liu, Bin Liang

    Abstract: Traversing 3-D complex environments has always been a significant challenge for legged locomotion. Existing methods typically rely on external sensors such as vision and lidar to preemptively react to obstacles by acquiring environmental information. However, in scenarios like nighttime or dense forests, external sensors often fail to function properly, necessitating robots to rely on propriocepti…

    Submitted 29 April, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: 10 pages, 8 figures, submitted to IROS 2024

  7. arXiv:2404.17609  [pdf, other]

    cs.LG cs.AI cs.CL

    CoSD: Collaborative Stance Detection with Contrastive Heterogeneous Topic Graph Learning

    Authors: Yinghan Cheng, Qi Zhang, Chongyang Shi, Liang Xiao, Shufeng Hao, Liang Hu

    Abstract: Stance detection seeks to identify the viewpoints of individuals either in favor or against a given target or a controversial topic. Current advanced neural models for stance detection typically employ fully parametric softmax classifiers. However, these methods suffer from several limitations, including lack of explainability, insensitivity to the latent data structure, and unimodality, which gre…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 13 pages

  8. arXiv:2404.17486  [pdf, other]

    cs.CV

    TextGaze: Gaze-Controllable Face Generation with Natural Language

    Authors: Hengfei Wang, Zhongqun Zhang, Yihua Cheng, Hyung Jin Chang

    Abstract: Generating face images with specific gaze information has attracted considerable attention. Existing approaches typically input gaze values directly for face generation, which is unnatural and requires annotated gaze datasets for training, thereby limiting its application. In this paper, we present a novel gaze-controllable face generation task. Our approach inputs textual descriptions that describ…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Under review

  9. arXiv:2404.16771  [pdf, other]

    cs.CV cs.AI

    ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

    Authors: Jiehui Huang, Xiao Dong, Wenhui Song, Hanhui Li, Jun Zhou, Yuhao Cheng, Shutao Liao, Long Chen, Yiqiang Yan, Shengcai Liao, Xiaodan Liang

    Abstract: Diffusion-based technologies have made significant strides, particularly in personalized and customized facial generation. However, existing methods face challenges in achieving high-fidelity and detailed identity (ID) consistency, primarily due to insufficient fine-grained control over facial areas and the lack of a comprehensive strategy for ID preservation by fully considering intricate facial de…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Project page: https://ssugarwh.github.io/consistentid.github.io/

  10. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  11. arXiv:2404.16362  [pdf, other]

    cs.CR

    Feature graph construction with static features for malware detection

    Authors: Binghui Zou, Chunjie Cao, Longjuan Wang, Yinan Cheng, Jingzhang Sun

    Abstract: Malware can greatly compromise the integrity and trustworthiness of information and is in a constant state of evolution. Existing feature fusion-based detection methods generally overlook the correlation between features, and mere concatenation of features will reduce the model's characterization ability, leading to low detection accuracy. Moreover, these methods are susceptible to concept drift and…

    Submitted 25 April, 2024; originally announced April 2024.

  12. arXiv:2404.16296  [pdf]

    cs.CV cs.AI

    Research on Splicing Image Detection Algorithms Based on Natural Image Statistical Characteristics

    Authors: Ao Xiang, Jingyu Zhang, Qin Yang, Liyang Wang, Yu Cheng

    Abstract: With the development and widespread application of digital image processing technology, image splicing has become a common method of image manipulation, raising numerous security and legal issues. This paper introduces a new splicing image detection algorithm based on the statistical characteristics of natural images, aimed at improving the accuracy and efficiency of splicing image detection. By a…

    Submitted 26 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  13. arXiv:2404.16160  [pdf, other]

    cs.CL cs.AI

    Domain-Specific Improvement on Psychotherapy Chatbot Using Assistant

    Authors: Cheng Kang, Daniel Novak, Katerina Urbanova, Yuqing Cheng, Yong Hu

    Abstract: Large language models (LLMs) have demonstrated impressive generalization capabilities on specific tasks with human-written instruction data. However, the limited quantity, diversity, and professional expertise of such instruction data raise concerns about the performance of LLMs in psychotherapy tasks when provided with domain-specific instructions. To address this, we firstly propose Domain-Speci…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted at ICASSP 2024 EIHRC

  14. arXiv:2404.14368  [pdf, other]

    cs.CV cs.AI cs.CL

    Graphic Design with Large Multimodal Model

    Authors: Yutao Cheng, Zhao Zhang, Maoke Yang, Hui Nie, Chunyuan Li, Xinglong Wu, Jie Shao

    Abstract: In the field of graphic design, automating the integration of design elements into a cohesive multi-layered artwork not only boosts productivity but also paves the way for the democratization of graphic design. One existing practice is Graphic Layout Generation (GLG), which aims to lay out sequential design elements. It has been constrained by the necessity for a predefined correct sequence of laye…

    Submitted 22 April, 2024; originally announced April 2024.

  15. arXiv:2404.13273  [pdf, other]

    cs.CV cs.LG

    Multi-feature Reconstruction Network using Crossed-mask Restoration for Unsupervised Anomaly Detection

    Authors: Junpu Wang, Guili Xu, Chunlei Li, Guangshuai Gao, Yuehua Cheng

    Abstract: Unsupervised anomaly detection using only normal samples is of great significance for quality inspection in industrial manufacturing. Although existing reconstruction-based methods have achieved promising results, they still face two problems: poor distinguishable information in image reconstruction and well abnormal regeneration caused by model over-generalization ability. To overcome the above i…

    Submitted 20 April, 2024; originally announced April 2024.

  16. arXiv:2404.12888  [pdf, other]

    cs.CV cs.GR cs.LG

    Learn2Talk: 3D Talking Face Learns from 2D Talking Face

    Authors: Yixiang Zhuang, Baoping Cheng, Yao Cheng, Yuntao Jin, Renshuai Liu, Chengyang Li, Xuan Cheng, Jing Liao, Juncong Lin

    Abstract: Speech-driven facial animation methods usually contain two main classes, 3D and 2D talking face, both of which have attracted considerable research attention in recent years. However, to the best of our knowledge, research on 3D talking face has not gone as deep as that on 2D talking face in terms of lip-synchronization (lip-sync) and speech perception. To mind the gap between the two sub-fields, we prop…

    Submitted 19 April, 2024; originally announced April 2024.

  17. arXiv:2404.11809  [pdf, other]

    cs.CL cs.LG

    Sharing Parameter by Conjugation for Knowledge Graph Embeddings in Complex Space

    Authors: Xincan Feng, Zhi Qu, Yuchang Cheng, Taro Watanabe, Nobuhiro Yugami

    Abstract: A Knowledge Graph (KG) is the directed graphical representation of entities and relations in the real world. KG can be applied in diverse Natural Language Processing (NLP) tasks where knowledge is required. The need to scale up and complete KG automatically yields Knowledge Graph Embedding (KGE), a shallow machine learning model that is suffering from memory and training time consumption issues. T…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 8 pages, 1 figure, 6 tables, accepted at TextGraphs-16 workshop held in conjunction with COLING 2022

  18. arXiv:2404.09595  [pdf, ps, other]

    eess.SP cs.AI

    Building Semantic Communication System via Molecules: An End-to-End Training Approach

    Authors: Yukun Cheng, Wei Chen, Bo Ai

    Abstract: The concept of semantic communication provides a novel approach for applications in scenarios with limited communication resources. In this paper, we propose an end-to-end (E2E) semantic molecular communication system, aiming to enhance the efficiency of molecular communication systems by reducing the transmitted information. Specifically, following the joint source channel coding paradigm, the ne…

    Submitted 15 April, 2024; originally announced April 2024.

  19. BOND: Bootstrapping From-Scratch Name Disambiguation with Multi-task Promoting

    Authors: Yuqing Cheng, Bo Chen, Fanjin Zhang, Jie Tang

    Abstract: From-scratch name disambiguation is an essential task for establishing a reliable foundation for academic platforms. It involves partitioning documents authored by identically named individuals into groups representing distinct real-life experts. Canonically, the process is divided into two decoupled tasks: locally estimating the pairwise similarities between documents followed by globally groupin…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: TheWebConf 2024 (WWW '24)

    ACM Class: H.3.7; H.3.3

    Journal ref: Proceedings of TheWebConf 2024 (WWW '24), May 13--17, 2024, Singapore

  20. arXiv:2404.08246  [pdf]

    cs.RO cs.LG

    Agile and versatile bipedal robot tracking control through reinforcement learning

    Authors: Jiayi Li, Linqi Ye, Yi Cheng, Houde Liu, Bin Liang

    Abstract: The remarkable athletic intelligence displayed by humans in complex dynamic movements such as dancing and gymnastics suggests that the balance mechanism in biological beings is decoupled from specific movement patterns. This decoupling allows for the execution of both learned and unlearned movements under certain constraints while maintaining balance through minor whole-body coordination. To repli…

    Submitted 12 April, 2024; originally announced April 2024.

  21. arXiv:2404.07580  [pdf, other]

    cs.CV

    Multi-rater Prompting for Ambiguous Medical Image Segmentation

    Authors: Jinhong Wang, Yi Cheng, Jintai Chen, Hongxia Xu, Danny Chen, Jian Wu

    Abstract: Multi-rater annotations commonly occur when medical images are independently annotated by multiple experts (raters). In this paper, we tackle two challenges arising in multi-rater annotations for medical image segmentation (called ambiguous medical image segmentation): (1) How to train a deep learning model when a group of raters produces a set of diverse but plausible annotations, and (2) how to f…

    Submitted 11 April, 2024; originally announced April 2024.

  22. arXiv:2404.06883  [pdf]

    cs.CV cs.AI

    Research on Detection of Floating Objects in River and Lake Based on AI Intelligent Image Recognition

    Authors: Jingyu Zhang, Ao Xiang, Yu Cheng, Qin Yang, Liyang Wang

    Abstract: With the rapid advancement of artificial intelligence technology, AI-enabled image recognition has emerged as a potent tool for addressing challenges in traditional environmental monitoring. This study focuses on the detection of floating objects in river and lake environments, exploring an innovative approach based on deep learning. By intricately analyzing the technical pathways for detecting st…

    Submitted 19 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  23. arXiv:2404.06080  [pdf]

    eess.IV cs.CV

    Using Few-Shot Learning to Classify Primary Lung Cancer and Other Malignancy with Lung Metastasis in Cytological Imaging via Endobronchial Ultrasound Procedures

    Authors: Ching-Kai Lin, Di-Chun Wei, Yun-Chien Cheng

    Abstract: This study aims to establish a computer-aided diagnosis system for endobronchial ultrasound (EBUS) surgery to assist physicians in the preliminary diagnosis of metastatic cancer. This involves arranging immediate examinations for other sites of metastatic cancer after EBUS surgery, eliminating the need to wait for reports, thereby shortening the waiting time by more than half and enabling patients…

    Submitted 9 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  24. arXiv:2404.06012  [pdf, other]

    cs.CV cs.RO

    Diffusion-Based Point Cloud Super-Resolution for mmWave Radar Data

    Authors: Kai Luan, Chenghao Shi, Neng Wang, Yuwei Cheng, Huimin Lu, Xieyuanli Chen

    Abstract: The millimeter-wave radar sensor maintains stable performance under adverse environmental conditions, making it a promising solution for all-weather perception tasks, such as outdoor mobile robotics. However, the radar point clouds are relatively sparse and contain massive ghost points, which greatly limits the development of mmWave radar technology. In this paper, we propose a novel point cloud s…

    Submitted 9 April, 2024; originally announced April 2024.

    Journal ref: Proc. of the IEEE Intl. Conf. on Robotics & Automation (ICRA), 2024

  25. arXiv:2404.04902  [pdf, other]

    cs.AI cs.SE

    AI2Apps: A Visual IDE for Building LLM-based AI Agent Applications

    Authors: Xin Pang, Zhucong Li, Jiaxiang Chen, Yuan Cheng, Yinghui Xu, Yuan Qi

    Abstract: We introduce AI2Apps, a Visual Integrated Development Environment (Visual IDE) with full-cycle capabilities that accelerates developers to build deployable LLM-based AI agent Applications. This Visual IDE prioritizes both the Integrity of its development tools and the Visuality of its components, ensuring a smooth and efficient building experience. On one hand, AI2Apps integrates a comprehensive de…

    Submitted 7 April, 2024; originally announced April 2024.

  26. arXiv:2404.04167  [pdf, other]

    cs.CL cs.AI

    Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

    Authors: Xinrun Du, Zhouliang Yu, Songyang Gao, Ding Pan, Yuyang Cheng, Ziyang Ma, Ruibin Yuan, Xingwei Qu, Jiaheng Liu, Tianyu Zheng, Xinchen Luo, Guorui Zhou, Binhang Yuan, Wenhu Chen, Jie Fu, Ge Zhang

    Abstract: In this study, we introduce CT-LLM, a 2B large language model (LLM) that illustrates a pivotal shift towards prioritizing the Chinese language in developing LLMs. Uniquely initiated from scratch, CT-LLM diverges from the conventional methodology by primarily incorporating Chinese textual data, utilizing an extensive corpus of 1,200 billion tokens, including 800 billion Chinese tokens, 300 billion…

    Submitted 9 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

  27. arXiv:2404.01929  [pdf]

    eess.IV cs.CV

    Towards Enhanced Analysis of Lung Cancer Lesions in EBUS-TBNA -- A Semi-Supervised Video Object Detection Method

    Authors: Jyun-An Lin, Yun-Chien Cheng, Ching-Kai Lin

    Abstract: This study aims to establish a computer-aided diagnostic system for lung lesions using bronchoscope endobronchial ultrasound (EBUS) to assist physicians in identifying lesion areas. During EBUS-transbronchial needle aspiration (EBUS-TBNA) procedures, physicians rely on grayscale ultrasound images to determine the location of lesions. However, these images often contain significant noise and can be…

    Submitted 9 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  28. arXiv:2404.00301  [pdf, other]

    cs.CV

    Monocular Identity-Conditioned Facial Reflectance Reconstruction

    Authors: Xingyu Ren, Jiankang Deng, Yuhao Cheng, Jia Guo, Chao Ma, Yichao Yan, Wenhan Zhu, Xiaokang Yang

    Abstract: Recent 3D face reconstruction methods have made remarkable advancements, yet there remain huge challenges in monocular high-quality facial reflectance reconstruction. Existing methods rely on a large amount of light-stage captured data to learn facial reflectance models. However, the lack of subject diversity poses challenges in achieving good generalization and widespread applicability. In this p…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  29. arXiv:2404.00282  [pdf, other]

    cs.LG cs.AI cs.CL cs.RO

    Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods

    Authors: Yuji Cao, Huan Zhao, Yuheng Cheng, Ting Shu, Guolong Liu, Gaoqi Liang, Junhua Zhao, Yun Li

    Abstract: With extensive pre-trained knowledge and high-level general capabilities, large language models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in aspects such as multi-task learning, sample efficiency, and task planning. In this survey, we provide a comprehensive review of the existing literature in $\textit{LLM-enhanced RL}$ and summarize its characteristics compared t…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 16 pages (including bibliography), 6 figures

  30. arXiv:2403.20009  [pdf, other]

    cs.CL cs.LG

    On Large Language Models' Hallucination with Regard to Known Facts

    Authors: Che Jiang, Biqing Qi, Xiangyu Hong, Dayuan Fu, Yang Cheng, Fandong Meng, Mo Yu, Bowen Zhou, Jie Zhou

    Abstract: Large language models are successful in answering factoid questions but are also prone to hallucination. We investigate the phenomenon of LLMs possessing correct answer knowledge yet still hallucinating from the perspective of inference dynamics, an area not previously covered in studies on hallucinations. We are able to conduct this analysis via two key ideas. First, we identify the factual question…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by NAACL 2024 Main Conference

  31. arXiv:2403.19655  [pdf, other]

    cs.CV

    GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling

    Authors: Bowen Zhang, Yiji Cheng, Jiaolong Yang, Chunyu Wang, Feng Zhao, Yansong Tang, Dong Chen, Baining Guo

    Abstract: 3D Gaussian Splatting (GS) has achieved considerable improvement over Neural Radiance Fields in terms of 3D fitting fidelity and rendering speed. However, this unstructured representation with scattered Gaussians poses a significant challenge for generative modeling. To address the problem, we introduce GaussianCube, a structured GS representation that is both powerful and efficient for generativ…

    Submitted 5 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: Fix typo in Eq.2; Project Page: https://gaussiancube.github.io/

  32. arXiv:2403.17927  [pdf, other]

    cs.SE cs.AI

    MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution

    Authors: Wei Tao, Yucheng Zhou, Wenqiang Zhang, Yu Cheng

    Abstract: In software evolution, resolving the emergent issues within GitHub repositories is a complex challenge that involves not only the incorporation of new code but also the maintenance of existing functionalities. Large Language Models (LLMs) have shown promise in code generation and understanding but face difficulties in code change, particularly at the repository level. To overcome these challenges,…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: work in progress

  33. arXiv:2403.15664  [pdf, other]

    cs.CV

    What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation

    Authors: Yihua Cheng, Yaning Zhu, Zongji Wang, Hongquan Hao, Yongwei Liu, Shiqing Cheng, Xi Wang, Hyung Jin Chang

    Abstract: Driver's eye gaze holds a wealth of cognitive and intentional cues crucial for intelligent vehicles. Despite its significance, research on in-vehicle gaze estimation remains limited due to the scarcity of comprehensive and well-annotated datasets in real driving scenarios. In this paper, we present three novel elements to advance in-vehicle gaze research. Firstly, we introduce IVGaze, a pioneering…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: CVPR24

  34. arXiv:2403.14918  [pdf, other]

    cs.LG

    Deep learning-based method for weather forecasting: A case study in Itoshima

    Authors: Yuzhong Cheng, Linh Thi Hoai Nguyen, Akinori Ozaki, Ton Viet Ta

    Abstract: Accurate weather forecasting is of paramount importance for a wide range of practical applications, drawing substantial scientific and societal interest. However, the intricacies of weather systems pose substantial challenges to accurate predictions. This research introduces a multilayer perceptron model tailored for weather forecasting in Itoshima, Kyushu, Japan. Our meticulously designed archite…

    Submitted 21 March, 2024; originally announced March 2024.

  35. arXiv:2403.12433  [pdf, other]

    cs.DB cs.CR

    Algorithmic Complexity Attacks on Dynamic Learned Indexes

    Authors: Rui Yang, Evgenios M. Kornaropoulos, Yue Cheng

    Abstract: Learned Index Structures (LIS) view a sorted index as a model that learns the data distribution, takes a data element key as input, and outputs the predicted position of the key. The original LIS can only handle lookup operations with no support for updates, rendering it impractical to use for typical workloads. To address this limitation, recent studies have focused on designing efficient dynamic…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: VLDB 2024

  36. arXiv:2403.11558  [pdf, other]

    cs.CL cs.AI

    Reinforcement Learning with Token-level Feedback for Controllable Text Generation

    Authors: Wendi Li, Wei Wei, Kaihe Xu, Wenfeng Xie, Dangyang Chen, Yu Cheng

    Abstract: To meet the requirements of real-world applications, it is essential to control generations of large language models (LLMs). Prior research has tried to introduce reinforcement learning (RL) into controllable text generation while most existing methods suffer from overfitting issues (finetuning-based methods) or semantic collapse (post-processing methods). However, current RL methods are generally…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted to NAACL 2024 Findings

  37. arXiv:2403.11434  [pdf, other]

    cs.NI cs.DC

    Earth+: on-board satellite imagery compression leveraging historical earth observations

    Authors: Kuntai Du, Yihua Cheng, Peder Olsen, Shadi Noghabi, Ranveer Chandra, Junchen Jiang

    Abstract: With the increasing deployment of earth observation satellite constellations, the downlink (satellite-to-ground) capacity often limits the freshness, quality, and coverage of the imagery data available to applications on the ground. To overcome the downlink limitation, we present Earth+, a new satellite imagery compression system that, instead of compressing each image individually, pinpoints and…

    Submitted 17 March, 2024; originally announced March 2024.

  38. arXiv:2403.11202  [pdf, other]

    cs.AR cs.AI cs.PL

    Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework

    Authors: Kaiyan Chang, Kun Wang, Nan Yang, Ying Wang, Dantong Jin, Wenlong Zhu, Zhirong Chen, Cangyuan Li, Hao Yan, Yunhao Zhou, Zhuoliang Zhao, Yuan Cheng, Yudong Pan, Yiqi Liu, Mengdi Wang, Shengwen Liang, Yinhe Han, Huawei Li, Xiaowei Li

    Abstract: Recent advances in large language models have demonstrated their potential for automated generation of hardware description language (HDL) code from high-level prompts. Researchers have utilized fine-tuning to enhance the ability of these large language models (LLMs) in the field of Chip Design. However, the lack of Verilog data hinders further improvement in the quality of Verilog generation by L…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted by DAC 2024; please note that this is not the final camera-ready version

  39. arXiv:2403.10750  [pdf, other]

    cs.CL cs.AI

    Depression Detection on Social Media with Large Language Models

    Authors: Xiaochong Lan, Yiming Cheng, Li Sheng, Chen Gao, Yong Li

    Abstract: Depression harms. However, due to a lack of mental health awareness and fear of stigma, many patients do not actively seek diagnosis and treatment, leading to detrimental outcomes. Depression detection aims to determine whether an individual suffers from depression by analyzing their history of posts on social media, which can significantly aid in early detection and intervention. It mainly faces…

    Submitted 15 March, 2024; originally announced March 2024.

  40. arXiv:2403.10547  [pdf, ps, other]

    math.OC cs.AI cs.DS cs.LG

    Robust Second-Order Nonconvex Optimization and Its Application to Low Rank Matrix Sensing

    Authors: Shuyao Li, Yu Cheng, Ilias Diakonikolas, Jelena Diakonikolas, Rong Ge, Stephen J. Wright

    Abstract: Finding an approximate second-order stationary point (SOSP) is a well-studied and fundamental problem in stochastic nonconvex optimization with many applications in machine learning. However, this problem is poorly understood in the presence of outliers, limiting the use of existing nonconvex algorithms in adversarial settings. In this paper, we study the problem of finding SOSPs in the strong c…

    Submitted 11 March, 2024; originally announced March 2024.

  41. arXiv:2403.10037  [pdf, other]

    cs.CV

    Knowledge Condensation and Reasoning for Knowledge-based VQA

    Authors: Dongze Hao, Jian Jia, Longteng Guo, Qunbo Wang, Te Yang, Yan Li, Yanhua Cheng, Bo Wang, Quan Chen, Han Li, Jing Liu

    Abstract: Knowledge-based visual question answering (KB-VQA) is a challenging task, which requires the model to leverage external knowledge for comprehending and answering questions grounded in visual content. Recent studies retrieve the knowledge passages from external knowledge bases and then use them to answer questions. However, these retrieved knowledge passages often contain irrelevant or noisy inform…

    Submitted 15 March, 2024; originally announced March 2024.

  42. arXiv:2403.09919  [pdf, other]

    cs.CL cs.LG

    Recurrent Drafter for Fast Speculative Decoding in Large Language Models

    Authors: Aonan Zhang, Chong Wang, Yi Wang, Xuanyu Zhang, Yunfei Cheng

    Abstract: In this paper, we introduce an improved approach of speculative decoding aimed at enhancing the efficiency of serving large language models. Our method capitalizes on the strengths of two established techniques: the classic two-model speculative decoding approach, and the more recent single-model approach, Medusa. Drawing inspiration from Medusa, our approach adopts a single-model strategy for spe…

    Submitted 22 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 11 pages, 6 figures

  43. arXiv:2403.09136  [pdf, other]

    eess.IV cs.CV

    Biophysics Informed Pathological Regularisation for Brain Tumour Segmentation

    Authors: Lipei Zhang, Yanqi Cheng, Lihao Liu, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

    Abstract: Recent advancements in deep learning have significantly improved brain tumour segmentation techniques; however, the results still lack confidence and robustness as they solely consider image data without biophysical priors or pathological information. Integrating biophysics-informed regularisation is one effective way to change this situation, as it provides a prior regularisation for automated e…

    Submitted 17 March, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: 11 pages, 4 figures and 1 table

  44. arXiv:2403.08815  [pdf, other]

    cs.NI cs.RO

    TransformLoc: Transforming MAVs into Mobile Localization Infrastructures in Heterogeneous Swarms

    Authors: Haoyang Wang, Jingao Xu, Chenyu Zhao, Zihong Lu, Yuhan Cheng, Xuecheng Chen, Xiao-Ping Zhang, Yunhao Liu, Xinlei Chen

    Abstract: A heterogeneous micro aerial vehicles (MAV) swarm consists of resource-intensive but expensive advanced MAVs (AMAVs) and resource-limited but cost-effective basic MAVs (BMAVs), offering opportunities in diverse fields. Accurate and real-time localization is crucial for MAV swarms, but current practices lack a low-cost, high-precision, and real-time solution, especially for lightweight BMAVs. We fi…

    Submitted 14 February, 2024; originally announced March 2024.

    Comments: 10 pages, accepted by IEEE INFOCOM 2024

  45. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Machel Reid, Nikolay Savinov, Denis Teplyashin, Dmitry Lepikhin, Timothy Lillicrap, Jean-baptiste Alayrac, Radu Soricut, Angeliki Lazaridou, Orhan Firat, Julian Schrittwieser, Ioannis Antonoglou, Rohan Anil, Sebastian Borgeaud, Andrew Dai, Katie Millican, Ethan Dyer, Mia Glaese, Thibault Sottiaux, Benjamin Lee, Fabio Viola, Malcolm Reynolds, Yuanzhong Xu, James Molloy, et al. (683 additional authors not shown)

    Abstract: In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. Gemini 1.5 Pro achieves near-perfect recall on long-context retrieval tasks across modalit…

    Submitted 25 April, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  46. arXiv:2403.04246  [pdf, other]

    stat.ML cs.AI cs.LG

    Efficient CNN-LSTM based Parameter Estimation of Levy Driven Stochastic Differential Equations

    Authors: Shuaiyu Li, Yang Ruan, Changzhou Long, Yuzhong Cheng

    Abstract: This study addresses the challenges in parameter estimation of stochastic differential equations driven by non-Gaussian noises, which are critical in understanding dynamic phenomena such as price fluctuations and the spread of infectious diseases. Previous research highlighted the potential of LSTM networks in estimating parameters of alpha stable Levy driven SDEs but faced limitations including h…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 2023 International Conference on Machine Learning and Applications (ICMLA)

  47. arXiv:2402.18679  [pdf, other]

    cs.AI cs.LG

    Data Interpreter: An LLM Agent For Data Science

    Authors: Sirui Hong, Yizhang Lin, Bang Liu, Bangbang Liu, Binhao Wu, Danyang Li, Jiaqi Chen, Jiayi Zhang, Jinlin Wang, Li Zhang, Lingyao Zhang, Min Yang, Mingchen Zhuge, Taicheng Guo, Tuo Zhou, Wei Tao, Wenyi Wang, Xiangru Tang, Xiangtao Lu, Xiawu Zheng, Xinbing Liang, Yaying Fei, Yuheng Cheng, Zongze Xu, Chenglin Wu

    Abstract: Large Language Model (LLM)-based agents have demonstrated remarkable effectiveness. However, their performance can be compromised in data science scenarios that require real-time data adjustment, expertise in optimization due to complex dependencies among various tasks, and the ability to identify logical errors for precise reasoning. In this study, we introduce the Data Interpreter, a solution de…

    Submitted 12 March, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

  48. arXiv:2402.18512  [pdf, other]

    cs.LG

    Log Neural Controlled Differential Equations: The Lie Brackets Make a Difference

    Authors: Benjamin Walker, Andrew D. McLeod, Tiexin Qin, Yichuan Cheng, Haoliang Li, Terry Lyons

    Abstract: The vector field of a controlled differential equation (CDE) describes the relationship between a control path and the evolution of a solution path. Neural CDEs (NCDEs) treat time series data as observations from a control path, parameterise a CDE's vector field using a neural network, and use the solution path as a continuously evolving hidden state. As their formulation makes them robust to irre…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 8 pages, 4 figures

  49. arXiv:2402.16513  [pdf]

    physics.optics cs.ET physics.app-ph

    Photonic Neural Network Fabricated on Thin Film Lithium Niobate for High-Fidelity and Power-Efficient Matrix Computation

    Authors: Yong Zheng, Rongbo Wu, Yuan Ren, Rui Bao, Jian Liu, Yu Ma, Min Wang, Ya Cheng

    Abstract: Photonic neural networks (PNNs) have emerged as a promising platform to address the energy consumption issue that comes with the advancement of artificial intelligence technology, and thin film lithium niobate (TFLN) offers an attractive solution as a material platform mainly for its combined characteristics of low optical loss and large electro-optic (EO) coefficients. Here, we present the first…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: 27 pages, 10 figures

  50. arXiv:2402.15896  [pdf, other]

    cs.CV

    Multimodal Instruction Tuning with Conditional Mixture of LoRA

    Authors: Ying Shen, Zhiyang Xu, Qifan Wang, Yu Cheng, Wenpeng Yin, Lifu Huang

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in diverse tasks across different domains, with an increasing focus on improving their zero-shot generalization capabilities for unseen multimodal tasks. Multimodal instruction tuning has emerged as a successful strategy for achieving zero-shot generalization by fine-tuning pre-trained models on diverse multimodal ta…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 8 pages, multimodal instruction tuning