Showing 1–50 of 308 results for author: Wei, J

Searching in archive cs.
  1. arXiv:2405.00145  [pdf, other]

    cs.SE cs.CV

    GUing: A Mobile GUI Search Engine using a Vision-Language Model

    Authors: Jialiang Wei, Anne-Lise Courbis, Thomas Lambolais, Binbin Xu, Pierre Louis Bernard, Gérard Dray, Walid Maalej

    Abstract: App developers use the Graphical User Interface (GUI) of other apps as an important source of inspiration to design and improve their own apps. In recent years, research suggested various approaches to retrieve GUI designs that fit a certain text query from screenshot datasets acquired through automated GUI exploration. However, such text-to-GUI retrieval approaches only leverage the textual infor…

    Submitted 30 April, 2024; originally announced May 2024.

  2. arXiv:2404.19108  [pdf, other]

    cs.CV astro-ph.IM eess.IV

    Real-Time Convolutional Neural Network-Based Star Detection and Centroiding Method for CubeSat Star Tracker

    Authors: Hongrui Zhao, Michael F. Lembeck, Adrian Zhuang, Riya Shah, Jesse Wei

    Abstract: Star trackers are one of the most accurate celestial sensors used for absolute attitude determination. The devices detect stars in captured images and accurately compute their projected centroids on an imaging focal plane with subpixel precision. Traditional algorithms for star detection and centroiding often rely on threshold adjustments for star pixel detection and pixel brightness weighting for…

    Submitted 29 April, 2024; originally announced April 2024.

  3. arXiv:2404.18688  [pdf, other]

    cs.IT

    Distributed Source Coding for Parametric and Non-Parametric Regression

    Authors: Jiahui Wei, Elsa Dupraz, Philippe Mary

    Abstract: The design of communication systems dedicated to machine learning tasks is one key aspect of goal-oriented communications. In this framework, this article investigates the interplay between data reconstruction and learning from the same compressed observations, particularly focusing on the regression problem. We establish achievable rate-generalization error regions for both parametric and non-par…

    Submitted 29 April, 2024; originally announced April 2024.

  4. arXiv:2404.16831  [pdf, other]

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  5. arXiv:2404.16385  [pdf, other]

    cs.CV

    Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Pre-trained Models

    Authors: Jiawei Chen, Dingkang Yang, Yue Jiang, Mingcheng Li, Jinjie Wei, Xiaolu Hou, Lihua Zhang

    Abstract: In the realm of Medical Visual Language Models (Med-VLMs), the quest for universal efficient fine-tuning mechanisms remains paramount, especially given researchers in interdisciplinary fields are often extremely short of training resources, yet largely unexplored. Given the unique challenges in the medical domain, such as limited data scope and significant domain-specific requirements, evaluating…

    Submitted 25 April, 2024; originally announced April 2024.

  6. arXiv:2404.14827  [pdf, other]

    cs.CL

    Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation

    Authors: Jingxuan Wei, Linzhuang Sun, Yichong Leng, Xu Tan, Bihui Yu, Ruifeng Guo

    Abstract: Knowledge distillation, transferring knowledge from a teacher model to a student model, has emerged as a powerful technique in neural machine translation for compressing models or simplifying training targets. Knowledge distillation encompasses two primary methods: sentence-level distillation and token-level distillation. In sentence-level distillation, the student model is trained to align with t…

    Submitted 23 April, 2024; originally announced April 2024.

  7. arXiv:2404.14676  [pdf, other]

    cs.CV cs.GR

    DreamPBR: Text-driven Generation of High-resolution SVBRDF with Multi-modal Guidance

    Authors: Linxuan Xin, Zheng Zhang, Jinfu Wei, Ge Li, Duan Gao

    Abstract: Prior material creation methods had limitations in producing diverse results mainly because reconstruction-based methods relied on real-world measurements and generation-based methods were trained on relatively small material datasets. To address these challenges, we propose DreamPBR, a novel diffusion-based generative framework designed to create spatially-varying appearance properties guided by…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 16 pages, 17 figures

    ACM Class: I.3.0, I.4.9

  8. arXiv:2404.11151  [pdf, other]

    cs.CV

    REACTO: Reconstructing Articulated Objects from a Single Video

    Authors: Chaoyue Song, Jiacheng Wei, Chuan-Sheng Foo, Guosheng Lin, Fayao Liu

    Abstract: In this paper, we address the challenge of reconstructing general articulated 3D objects from a single video. Existing works employing dynamic neural radiance fields have advanced the modeling of articulated objects like humans and animals from videos, but face challenges with piece-wise rigid general articulated objects due to limitations in their deformation models. To tackle this, we propose Qu…

    Submitted 17 April, 2024; originally announced April 2024.

  9. arXiv:2404.10352  [pdf, other]

    cs.HC

    CanvasPic: An Interactive Tool for Freely Generating Facial Images Based on Spatial Layout

    Authors: Jiafu Wei, Chia-Ming Chang, Xi Yang, Takeo Igarashi

    Abstract: In real-world usage, existing GAN image generation tools come up short due to their lack of intuitive interfaces and limited flexibility. To overcome these limitations, we developed CanvasPic, an innovative tool for flexible GAN image generation. Our tool introduces a novel 2D layout design that allows users to intuitively control image attributes based on real-world images. By interacting with th…

    Submitted 16 April, 2024; originally announced April 2024.

  10. arXiv:2404.07503  [pdf, other]

    cs.CL

    Best Practices and Lessons Learned on Synthetic Data for Language Models

    Authors: Ruibo Liu, Jerry Wei, Fangyu Liu, Chenglei Si, Yanzhe Zhang, Jinmeng Rao, Steven Zheng, Daiyi Peng, Diyi Yang, Denny Zhou, Andrew M. Dai

    Abstract: The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challeng…

    Submitted 11 April, 2024; originally announced April 2024.

  11. arXiv:2404.06836  [pdf, other]

    cs.CV

    O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation

    Authors: Muer Tie, Julong Wei, Zhengjun Wang, Ke Wu, Shansuai Yuan, Kaizhao Zhang, Jie Jia, Jieru Zhao, Zhongxue Gan, Wenchao Ding

    Abstract: Online construction of open-ended language scenes is crucial for robotic applications, where open-vocabulary interactive scene understanding is required. Recently, neural implicit representation has provided a promising direction for online interactive mapping. However, implementing open-vocabulary scene understanding capability into online neural implicit mapping still faces three challenges: lac…

    Submitted 10 April, 2024; originally announced April 2024.

  12. arXiv:2404.01548  [pdf, other]

    cs.CV cs.AI

    mChartQA: A universal benchmark for multimodal Chart Question Answer based on Vision-Language Alignment and Reasoning

    Authors: Jingxuan Wei, Nan Xu, Guiyong Chang, Yin Luo, BiHui Yu, Ruifeng Guo

    Abstract: In the fields of computer vision and natural language processing, multimodal chart question-answering, especially involving color, structure, and textless charts, poses significant challenges. Traditional methods, which typically involve either direct multimodal processing or a table-to-text conversion followed by language model analysis, have limitations in effectively handling these complex scen…

    Submitted 1 April, 2024; originally announced April 2024.

  13. arXiv:2403.20168  [pdf, other]

    eess.IV cs.CV

    Unsupervised Tumor-Aware Distillation for Multi-Modal Brain Image Translation

    Authors: Chuan Huang, Jia Wei, Rui Li

    Abstract: Multi-modal brain images from MRI scans are widely used in clinical diagnosis to provide complementary information from different modalities. However, obtaining fully paired multi-modal images in practice is challenging due to various factors, such as time, cost, and artifacts, resulting in modality-missing brain images. To address this problem, unsupervised multi-modal brain image translation has…

    Submitted 24 April, 2024; v1 submitted 29 March, 2024; originally announced March 2024.

    Comments: 8 pages, 5 figures. It has been provisionally accepted for IJCNN 2024

  14. arXiv:2403.20159  [pdf, other]

    cs.CV

    HGS-Mapping: Online Dense Mapping Using Hybrid Gaussian Representation in Urban Scenes

    Authors: Ke Wu, Kaizhao Zhang, Zhiwei Zhang, Shanshuai Yuan, Muer Tie, Julong Wei, Zijun Xu, Jieru Zhao, Zhongxue Gan, Wenchao Ding

    Abstract: Online dense mapping of urban scenes forms a fundamental cornerstone for scene understanding and navigation of autonomous vehicles. Recent advancements in mapping methods are mainly based on NeRF, whose rendering speed is too slow to meet online requirements. 3D Gaussian Splatting (3DGS), with its rendering speed hundreds of times faster than NeRF, holds greater potential in online dense mapping.…

    Submitted 29 March, 2024; originally announced March 2024.

  15. arXiv:2403.19060  [pdf, other]

    cs.RO cs.AI cs.HC cs.LG

    Towards Human-Centered Construction Robotics: An RL-Driven Companion Robot For Contextually Assisting Carpentry Workers

    Authors: Yuning Wu, Jiaying Wei, Jean Oh, Daniel Cardoso Llach

    Abstract: In the dynamic construction industry, traditional robotic integration has primarily focused on automating specific tasks, often overlooking the complexity and variability of human aspects in construction workflows. This paper introduces a human-centered approach with a "work companion rover" designed to assist construction workers within their existing practices, aiming to enhance safety and workf…

    Submitted 28 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

    Comments: 8 pages, 9 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  16. arXiv:2403.18802  [pdf, other]

    cs.CL cs.AI cs.LG

    Long-form factuality in large language models

    Authors: Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le

    Abstract: Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factua…

    Submitted 3 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  17. arXiv:2403.16519  [pdf, ps, other]

    cs.SC

    Two Algorithms for Computing Rational Univariate Representations of Zero-Dimensional Ideals with Parameters

    Authors: Dingkang Wang, Jingjing Wei, Fanghui Xiao, Xiaopeng Zheng

    Abstract: Two algorithms for computing the rational univariate representation of zero-dimensional ideals with parameters are presented in the paper. Different from the rational univariate representation of zero-dimensional ideals without parameters, the number of zeros of zero-dimensional ideals with parameters under various specializations is different, which leads to choosing and checking the separating e…

    Submitted 25 March, 2024; originally announced March 2024.

  18. arXiv:2403.16516  [pdf, other]

    cs.CL cs.CV

    Visually Guided Generative Text-Layout Pre-training for Document Intelligence

    Authors: Zhiming Mao, Haoli Bai, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu, Kam-Fai Wong

    Abstract: Prior study shows that pre-training techniques can boost the performance of visual document understanding (VDU), which typically requires models to gain abilities to perceive and reason both document texts and layouts (e.g., locations of texts and table-cells). To this end, we propose visually guided generative text-layout pre-training, named ViTLP. Given a document image, the model optimizes hier…

    Submitted 27 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted to NAACL 2024 main conference. The first version of this paper was submitted to OpenReview (https://openreview.net/forum?id=ARtBIBAmNR) in June 2023

  19. arXiv:2403.15766  [pdf, other]

    cs.LG cs.AI

    BEND: Bagging Deep Learning Training Based on Efficient Neural Network Diffusion

    Authors: Jia Wei, Xingjun Zhang, Witold Pedrycz

    Abstract: Bagging has achieved great success in the field of machine learning by integrating multiple base classifiers to build a single strong classifier to reduce model variance. The performance improvement of bagging mainly relies on the number and diversity of base classifiers. However, traditional deep learning model training methods are expensive to train individually and difficult to train multiple m…

    Submitted 23 March, 2024; originally announced March 2024.

  20. arXiv:2403.10047  [pdf, other]

    cs.CV

    TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model

    Authors: Jiahao Lyu, Jin Wei, Gangyan Zeng, Zeng Li, Enze Xie, Wei Wang, Yu Zhou

    Abstract: Existing scene text spotters are designed to locate and transcribe texts from images. However, it is challenging for a spotter to achieve precise detection and recognition of scene texts simultaneously. Inspired by the glimpse-focus spotting pipeline of human beings and impressive performances of Pre-trained Language Models (PLMs) on visual tasks, we ask: 1) "Can machines spot texts without precis…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 12 pages, 8 figures

  21. arXiv:2403.08630  [pdf, other]

    stat.ME cs.LG

    Leveraging Non-Decimated Wavelet Packet Features and Transformer Models for Time Series Forecasting

    Authors: Guy P Nason, James L. Wei

    Abstract: This article combines wavelet analysis techniques with machine learning methods for univariate time series forecasting, focusing on three main contributions. Firstly, we consider the use of Daubechies wavelets with different numbers of vanishing moments as input features to both non-temporal and temporal forecasting methods, by selecting these numbers during the cross-validation phase. Secondly, w…

    Submitted 13 March, 2024; originally announced March 2024.

    MSC Class: 62M10; 62M45

  22. arXiv:2403.06407  [pdf, other]

    cs.CV

    Can LLMs' Tuning Methods Work in Medical Multimodal Domain?

    Authors: Jiawei Chen, Yue Jiang, Dingkang Yang, Mingcheng Li, Jinjie Wei, Ziyun Qian, Lihua Zhang

    Abstract: While large language models (LLMs) excel in world knowledge understanding, adapting them to specific subfields requires precise adjustments. Due to the model's vast scale, traditional global fine-tuning methods for large models can be computationally expensive and impact generalization. To address this challenge, a range of innovative Parameters-Efficient Fine-Tuning (PEFT) methods have emerged an…

    Submitted 10 March, 2024; originally announced March 2024.

  23. arXiv:2403.05704  [pdf, other]

    econ.EM cs.SI stat.AP stat.ME

    Non-robustness of diffusion estimates on networks with measurement error

    Authors: Arun G. Chandrasekhar, Paul Goldsmith-Pinkham, Tyler H. McCormick, Samuel Thau, Jerry Wei

    Abstract: Network diffusion models are used to study things like disease transmission, information spread, and technology adoption. However, small amounts of mismeasurement are extremely likely in the networks constructed to operationalize these models. We show that estimates of diffusions are highly non-robust to this measurement error. First, we show that even when measurement error is vanishingly small,…

    Submitted 11 April, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  24. arXiv:2403.05025  [pdf, other]

    cs.AI

    Towards Multimodal Human Intention Understanding Debiasing via Subject-Deconfounding

    Authors: Dingkang Yang, Dongling Xiao, Ke Li, Yuzheng Wang, Zhaoyu Chen, Jinjie Wei, Lihua Zhang

    Abstract: Multimodal intention understanding (MIU) is an indispensable component of human expression analysis (e.g., sentiment or humor) from heterogeneous modalities, including visual postures, linguistic contents, and acoustic behaviors. Existing works invariably focus on designing sophisticated structures or fusion strategies to achieve impressive improvements. Unfortunately, they all suffer from the sub…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 14 pages

  25. arXiv:2403.01756  [pdf, other]

    cs.CV

    Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition

    Authors: Yutian Liu, Wenjun Ke, Jianguo Wei

    Abstract: Handwritten mathematical expression recognition (HMER) is challenging in image-to-text tasks due to the complex layouts of mathematical expressions and suffers from problems including over-parsing and under-parsing. To solve these, previous HMER methods improve the attention mechanism by utilizing historical alignment information. However, this approach has limitations in addressing under-parsing…

    Submitted 5 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  26. arXiv:2402.15017  [pdf, other]

    cs.LG cs.AI cs.CL

    Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning

    Authors: Zhuoyan Xu, Zhenmei Shi, Junyi Wei, Fangzhou Mu, Yin Li, Yingyu Liang

    Abstract: Foundation models have emerged as a powerful tool for many AI problems. Despite the tremendous success of foundation models, effective adaptation to new tasks, particularly those with limited labels, remains an open question and lacks theoretical understanding. An emerging solution with recent success in vision and NLP involves finetuning a foundation model on a selection of relevant tasks, before…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Published at ICLR 2024. 54 pages

  27. arXiv:2402.13481  [pdf, other]

    cs.RO cs.AI

    Learning to Model Diverse Driving Behaviors in Highly Interactive Autonomous Driving Scenarios with Multi-Agent Reinforcement Learning

    Authors: Liu Weiwei, Hu Wenxuan, Jing Wei, Lei Lanxin, Gao Lingping, Liu Yong

    Abstract: Autonomous vehicles trained through Multi-Agent Reinforcement Learning (MARL) have shown impressive results in many driving scenarios. However, the performance of these trained policies can be impacted when faced with diverse driving styles and personalities, particularly in highly interactive situations. This is because conventional MARL algorithms usually operate under the assumption of fully co…

    Submitted 20 February, 2024; originally announced February 2024.

  28. arXiv:2402.10892  [pdf, other]

    cs.CR cs.CL cs.LG

    Proving membership in LLM pretraining data via data watermarks

    Authors: Johnny Tian-Zheng Wei, Ryan Yixiang Wang, Robin Jia

    Abstract: Detecting whether copyright holders' works were used in LLM pretraining is poised to be an important problem. This work proposes using data watermarks to enable principled detection with only black-box model access, provided that the rightholder contributed multiple training documents and watermarked them before public release. By applying a randomly sampled data watermark, detection can be framed…

    Submitted 16 February, 2024; originally announced February 2024.

  29. arXiv:2402.10412  [pdf, other]

    cs.CL cs.AI cs.LG

    Measuring and Reducing LLM Hallucination without Gold-Standard Answers via Expertise-Weighting

    Authors: Jiaheng Wei, Yuanshun Yao, Jean-Francois Ton, Hongyi Guo, Andrew Estornell, Yang Liu

    Abstract: LLM hallucination, i.e. generating factually incorrect yet seemingly convincing answers, is currently a major threat to the trustworthiness and reliability of LLMs. The first step towards solving this complicated problem is to measure it. However, existing hallucination metrics require to have a benchmark dataset with gold-standard answers, i.e. "best" or "correct" answers written by humans. Such…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Paper Under Review

  30. arXiv:2402.07720  [pdf]

    cs.SE

    Interaction-Based Driving Scenario Classification and Labeling

    Authors: Cheng Chang, Jiawei Zhang, Jingwei Ge, Zuo Zhang, Junqing Wei, Li Li

    Abstract: Scenario data play a vital role in autonomous driving related researches, and it is essential to obtain refined descriptions and labels to extract and index scenarios with different types of interactions. However, existing methods cannot cope well with the problem of scenario classification and comparison with vehicle interactions as the core. In this paper, we propose a framework for interaction-…

    Submitted 12 February, 2024; originally announced February 2024.

  31. arXiv:2401.16441  [pdf, other]

    cs.LG cs.AI cs.CL

    FaKnow: A Unified Library for Fake News Detection

    Authors: Yiyuan Zhu, Yongjun Li, Jialiang Wang, Ming Gao, Jiali Wei

    Abstract: Over the past years, a large number of fake news detection algorithms based on deep learning have emerged. However, they are often developed under different frameworks, each mandating distinct utilization methodologies, consequently hindering reproducibility. Additionally, a substantial amount of redundancy characterizes the code development of such fake news detection models. To address these con…

    Submitted 27 January, 2024; originally announced January 2024.

  32. arXiv:2401.06785  [pdf, other]

    cs.CL cs.AI

    Human-Instruction-Free LLM Self-Alignment with Limited Samples

    Authors: Hongyi Guo, Yuanshun Yao, Wei Shen, Jiaheng Wei, Xiaoying Zhang, Zhaoran Wang, Yang Liu

    Abstract: Aligning large language models (LLMs) with human values is a vital task for LLM practitioners. Current alignment techniques have several limitations: (1) requiring a large amount of annotated data; (2) demanding heavy human involvement; (3) lacking a systematic mechanism to continuously improve. In this work, we study aligning LLMs to a new domain with limited samples (e.g. < 100). We propose an a…

    Submitted 6 January, 2024; originally announced January 2024.

  33. arXiv:2401.01175  [pdf, other]

    cs.CV

    Learning Surface Scattering Parameters From SAR Images Using Differentiable Ray Tracing

    Authors: Jiangtao Wei, Yixiang Luomei, Xu Zhang, Feng Xu

    Abstract: Simulating high-resolution Synthetic Aperture Radar (SAR) images in complex scenes has consistently presented a significant research challenge. The development of a microwave-domain surface scattering model and its reversibility are poised to play a pivotal role in enhancing the authenticity of SAR image simulations and facilitating the reconstruction of target parameters. Drawing inspiration from…

    Submitted 2 January, 2024; originally announced January 2024.

  34. arXiv:2401.01002  [pdf, other]

    cs.CV

    AI Mobile Application for Archaeological Dating of Bronze Dings

    Authors: Chuntao Li, Ruihua Qi, Chuan Tang, Jiafu Wei, Xi Yang, Qian Zhang, Rixin Zhou

    Abstract: We develop an AI application for archaeological dating of bronze Dings. A classification model is employed to predict the period of the input Ding, and a detection model is used to show the feature parts for making a decision of archaeological dating. To train the two deep learning models, we collected a large number of Ding images from published materials, and annotated the period and the feature…

    Submitted 5 September, 2023; originally announced January 2024.

  35. arXiv:2312.13533  [pdf, other]

    cs.CL

    Automated Clinical Coding for Outpatient Departments

    Authors: Viktor Schlegel, Abhinav Ramesh Kashyap, Thanh-Tung Nguyen, Tsung-Han Yang, Vijay Prakash Dwivedi, Wei-Hsian Yin, Jeng Wei, Stefan Winkler

    Abstract: Computerised clinical coding approaches aim to automate the process of assigning a set of codes to medical records. While there is active research pushing the state of the art on clinical coding for hospitalized patients, the outpatient setting -- where doctors tend to non-hospitalised patients -- is overlooked. Although both settings can be formalised as a multi-label classification task, they pr…

    Submitted 24 December, 2023; v1 submitted 20 December, 2023; originally announced December 2023.

    Comments: 9 pages, preprint under review

  36. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1320 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 2 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  37. arXiv:2312.08702  [pdf, other]

    cs.AI

    Rational Sensibility: LLM Enhanced Empathetic Response Generation Guided by Self-presentation Theory

    Authors: Linzhuang Sun, Nan Xu, Jingxuan Wei, Bihui Yu, Liping Bu, Yin Luo

    Abstract: Having the ability to empathize is crucial for accurately representing human behavior during conversations. Despite numerous research aim to improve the cognitive capability of models by incorporating external knowledge, there has been limited attention on the sensible and rational expression of the conversation itself, which are crucial components of the cognitive empathy. Guided by self-presenta…

    Submitted 1 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

  38. arXiv:2312.08585   

    cs.CL cs.LG

    Unraveling Key Factors of Knowledge Distillation

    Authors: Jingxuan Wei, Linzhuang Sun, Xu Tan, Bihui Yu, Ruifeng Guo

    Abstract: Knowledge distillation, a technique for model compression and performance enhancement, has gained significant traction in Neural Machine Translation (NMT). However, existing research primarily focuses on empirical applications, and there is a lack of comprehensive understanding of how student model capacity, data complexity, and decoding strategies collectively influence distillation effectiveness…

    Submitted 23 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

    Comments: I am requesting the withdrawal of this paper from arXiv due to the realization that the overall composition and structure of the article are not yet sufficiently refined. It is my intention to thoroughly revise and enhance the paper to ensure that it meets the highest standards of academic writing and accurately reflects the research conducted

  39. arXiv:2312.07213  [pdf, other]

    cs.AI

    Human-computer Interaction for Brain-inspired Computing Based on Machine Learning And Deep Learning: A Review

    Authors: Bihui Yu, Sibo Zhang, Lili Zhou, Jingxuan Wei, Linzhuang Sun, Liping Bu

    Abstract: The continuous development of artificial intelligence has a profound impact on biomedicine and other fields, providing new research ideas and technical methods. Brain-inspired computing is an important intersection between multimodal technology and biomedical field. Focusing on the application scenarios of decoding text and speech from brain signals in human-computer interaction, this paper presen…

    Submitted 7 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: 25 pages, 8 figures and 4 tables

  40. arXiv:2312.04114  [pdf, other]

    cs.CR

    TI-DNS: A Trusted and Incentive DNS Resolution Architecture based on Blockchain

    Authors: Yufan Fu, Jiuqi Wei, Ying Li, Botao Peng, Xiaodong Li

    Abstract: Domain Name System (DNS) is a critical component of the Internet infrastructure, responsible for translating domain names into IP addresses. However, DNS is vulnerable to some malicious attacks, including DNS cache poisoning, which redirects users to malicious websites displaying offensive or illegal content. Existing countermeasures often suffer from at least one of the following weakness: weak a…

    Submitted 7 December, 2023; originally announced December 2023.

  41. arXiv:2312.04076  [pdf, other]

    cs.CV

    Large Language Models are Good Prompt Learners for Low-Shot Image Classification

    Authors: Zhaoheng Zheng, Jingmin Wei, Xuefeng Hu, Haidong Zhu, Ram Nevatia

    Abstract: Low-shot image classification, where training images are limited or inaccessible, has benefited from recent progress on pre-trained vision-language (VL) models with strong generalizability, e.g. CLIP. Prompt learning methods built with VL models generate text features from the class names that only have confined class-specific information. Large Language Models (LLMs), with their vast encyclopedic…

    Submitted 2 April, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: CVPR 2024

  42. arXiv:2312.04059  [pdf]

    cs.CL

    Comparing Large Language Model AI and Human-Generated Coaching Messages for Behavioral Weight Loss

    Authors: Zhuoran Huang, Michael P. Berry, Christina Chwyl, Gary Hsieh, Jing Wei, Evan M. Forman

    Abstract: Automated coaching messages for weight control can save time and costs, but their repetitive, generic nature may limit their effectiveness compared to human coaching. Large language model (LLM) based artificial intelligence (AI) chatbots, like ChatGPT, could offer more personalized and novel messages to address repetition with their data-processing abilities. While LLM AI demonstrates promise to e…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: 29 pages, 5 figures

  43. GDN: A Stacking Network Used for Skin Cancer Diagnosis

    Authors: Jingmin Wei, Haoyang Shen, Ziyi Wang, Ziqian Zhang

    Abstract: Skin cancer, the primary type of cancer that can be identified by visual recognition, requires an automatic identification system that can accurately classify different types of lesions. This paper presents GoogLe-Dense Network (GDN), which is an image-classification model to identify two types of skin cancer, Basal Cell Carcinoma, and Melanoma. GDN uses stacking of different networks to enhance t…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Published at ICSPS 2021

  44. arXiv:2311.14109  [pdf, other]

    cs.AI

    Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training

    Authors: Cheng Tan, Jingxuan Wei, Zhangyang Gao, Linzhuang Sun, Siyuan Li, Xihong Yang, Stan Z. Li

    Abstract: Multimodal reasoning is a challenging task that requires models to reason across multiple modalities to answer questions. Existing approaches have made progress by incorporating language and visual modalities into a two-stage reasoning framework, separating rationale generation from answer inference. However, these approaches often fall short due to the inadequate quality of the generated rational…

    Submitted 23 November, 2023; originally announced November 2023.

  45. arXiv:2311.09817  [pdf, other]

    cs.CV

    Neural-Logic Human-Object Interaction Detection

    Authors: Liulei Li, Jianan Wei, Wenguan Wang, Yi Yang

    Abstract: The interaction decoder utilized in prevalent Transformer-based HOI detectors typically accepts pre-composed human-object pairs as inputs. Though achieving remarkable performance, such paradigm lacks feasibility and cannot explore novel combinations over entities during decoding. We present LogicHOI, a new HOI detector that leverages neural-logic reasoning and Transformer to infer feasible inter…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted to NeurIPS 2023; Code: https://github.com/weijianan1/LogicHOI

  46. arXiv:2311.06070  [pdf, other]

    cs.CV

    Learning-Based Biharmonic Augmentation for Point Cloud Classification

    Authors: Jiacheng Wei, Guosheng Lin, Henghui Ding, Jie Hu, Kim-Hui Yap

    Abstract: Point cloud datasets often suffer from inadequate sample sizes in comparison to image datasets, making data augmentation challenging. While traditional methods, like rigid transformations and scaling, have limited potential in increasing dataset diversity due to their constraints on altering individual sample shapes, we introduce the Biharmonic Augmentation (BA) method. BA is a novel and efficient…

    Submitted 10 November, 2023; originally announced November 2023.

  47. arXiv:2311.05122  [pdf, ps, other]

    cs.CV

    ScribblePolyp: Scribble-Supervised Polyp Segmentation through Dual Consistency Alignment

    Authors: Zixun Zhang, Yuncheng Jiang, Jun Wei, Hannah Cui, Zhen Li

    Abstract: Automatic polyp segmentation models play a pivotal role in the clinical diagnosis of gastrointestinal diseases. In previous studies, most methods relied on fully supervised approaches, necessitating pixel-level annotations for model training. However, the creation of pixel-level annotations is both expensive and time-consuming, impeding the development of model generalization. In response to this…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Accepted by BIBM 2023

  48. Zero-shot Bilingual App Reviews Mining with Large Language Models

    Authors: Jialiang Wei, Anne-Lise Courbis, Thomas Lambolais, Binbin Xu, Pierre Louis Bernard, Gérard Dray

    Abstract: App reviews from app stores are crucial for improving software requirements. A large number of valuable reviews are continually being posted, describing software problems and expected features. Effectively utilizing user reviews necessitates the extraction of relevant information, as well as their subsequent summarization. Due to the substantial volume of user reviews, manual analysis is arduous.…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Accepted for The 35th IEEE International Conference on Tools with Artificial Intelligence

  49. arXiv:2311.01792  [pdf, other]

    cs.CL cs.AI

    AFPQ: Asymmetric Floating Point Quantization for LLMs

    Authors: Yijia Zhang, Sicheng Zhang, Shijie Cao, Dayou Du, Jianyu Wei, Ting Cao, Ningyi Xu

    Abstract: Large language models (LLMs) show great performance in various tasks, but face deployment challenges from limited memory capacity and bandwidth. Low-bit weight quantization can save memory and accelerate inference. Although floating-point (FP) formats show good performance in LLM quantization, they tend to perform poorly with small group sizes or sub-4 bits. We find the reason is that the absence…

    Submitted 3 November, 2023; originally announced November 2023.

  50. arXiv:2310.17894  [pdf, other]

    cs.CL cs.AI

    Natural Language Interfaces for Tabular Data Querying and Visualization: A Survey

    Authors: Weixu Zhang, Yifei Wang, Yuanfeng Song, Victor Junqiu Wei, Yuxing Tian, Yiyan Qi, Jonathan H. Chan, Raymond Chi-Wing Wong, Haiqin Yang

    Abstract: The emergence of natural language processing has revolutionized the way users interact with tabular data, enabling a shift from traditional query languages and manual plotting to more intuitive, language-based interfaces. The rise of large language models (LLMs) such as ChatGPT and its successors has further advanced this field, opening new avenues for natural language processing techniques. This…

    Submitted 27 October, 2023; originally announced October 2023.

    Comments: 20 pages, 4 figures, 5 tables. Submitted to IEEE TKDE