Skip to main content

Showing 1–50 of 2,073 results for author: Li, B

Searching in archive cs. Search in all archives.
.
  1. arXiv:2405.05413  [pdf

    cs.DB

    Digital Evolution: Novo Nordisk's Shift to Ontology-Based Data Management

    Authors: Shawn Zheng Kai Tan, Shounak Baksi, Thomas Gad Bjerrgaard, Preethi Elangovan, Thrishna Kuttikattu Gopalakrishnan, Darko Hric, Joffrey Joumaa, Beidi Li, Kashif Rabbani, Santhosh Kannan Venkatesan, Joshua Daniel Valdez, Saritha Vettikunnel Kuriakose

    Abstract: Biomedical data is growing exponentially, and managing it is increasingly challenging. While Findable, Accessible, Interoperable and Reusable (FAIR) data principles provide guidance, their adoption has proven difficult, especially in larger enterprises like pharmaceutical companies. In this manuscript, we describe how we leverage an Ontology-Based Data Management (OBDM) strategy for digital transf… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 14 pages, 2 figures

  2. arXiv:2405.03869  [pdf, other

    cs.LG cs.AI

    Outlier Gradient Analysis: Efficiently Improving Deep Learning Model Performance via Hessian-Free Influence Functions

    Authors: Anshuman Chhabra, Bo Li, Jian Chen, Prasant Mohapatra, Hongfu Liu

    Abstract: Influence functions offer a robust framework for assessing the impact of each training data sample on model predictions, serving as a prominent tool in data-centric learning. Despite their widespread use in various tasks, the strong convexity assumption on the model and the computational cost associated with calculating the inverse of the Hessian matrix pose constraints, particularly when analyzin… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  3. arXiv:2405.03728  [pdf, other

    cs.NE cs.AI

    GLHF: General Learned Evolutionary Algorithm Via Hyper Functions

    Authors: Xiaobin Li, Kai Wu, Yujian Betterest Li, Xiaoyu Zhang, Handing Wang, Jing Liu

    Abstract: Pretrained Optimization Models (POMs) leverage knowledge gained from optimizing various tasks, providing efficient solutions for new optimization challenges through direct usage or fine-tuning. Despite the inefficiencies and limited generalization abilities observed in current POMs, our proposed model, the general pre-trained optimization model (GPOM), addresses these shortcomings. GPOM constructs… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  4. arXiv:2405.03685  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    Language-Image Models with 3D Understanding

    Authors: Jang Hyun Cho, Boris Ivanovic, Yulong Cao, Edward Schmerling, Yue Wang, Xinshuo Weng, Boyi Li, Yurong You, Philipp Krähenbühl, Yan Wang, Marco Pavone

    Abstract: Multi-modal large language models (MLLMs) have shown incredible capabilities in a variety of 2D vision and language tasks. We extend MLLMs' perceptual capabilities to ground and reason about images in 3-dimensional space. To that end, we first develop a large-scale pre-training dataset for 2D and 3D called LV3D by combining multiple existing 2D and 3D recognition datasets under a common task formu… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Project page: https://janghyuncho.github.io/Cube-LLM

  5. arXiv:2405.03316  [pdf, other

    cs.LG cs.CR

    Provably Unlearnable Examples

    Authors: Derui Wang, Minhui Xue, Bo Li, Seyit Camtepe, Liming Zhu

    Abstract: The exploitation of publicly accessible data has led to escalating concerns regarding data privacy and intellectual property (IP) breaches in the age of artificial intelligence. As a strategy to safeguard both data privacy and IP-related domain knowledge, efforts have been undertaken to render shared data unlearnable for unauthorized models in the wild. Existing methods apply empirically optimized… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  6. arXiv:2405.03272  [pdf, other

    cs.CV

    WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning

    Authors: Yuanhan Zhang, Kaichen Zhang, Bo Li, Fanyi Pu, Christopher Arif Setiadharma, Jingkang Yang, Ziwei Liu

    Abstract: Multimodal information, together with our knowledge, help us to understand the complex and dynamic world. Large language models (LLM) and large multimodal models (LMM), however, still struggle to emulate this capability. In this paper, we present WorldQA, a video understanding dataset designed to push the boundaries of multimodal world models with three appealing properties: (1) Multimodal Inputs:… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  7. arXiv:2405.02933  [pdf, other

    cs.CL

    Relay Decoding: Concatenating Large Language Models for Machine Translation

    Authors: Chengpeng Fu, Xiaocheng Feng, Yichong Huang, Wenshuai Huo, Baohang Li, Hui Wang, Bin Qin, Ting Liu

    Abstract: Leveraging large language models for machine translation has demonstrated promising results. However, it does require the large language models to possess the capability of handling both the source and target languages in machine translation. When it is challenging to find large models that support the desired languages, resorting to continuous learning methods becomes a costly endeavor. To mitiga… ▽ More

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: Work in progress

  8. arXiv:2405.02564  [pdf

    cs.CV cs.AI q-bio.NC

    Leveraging the Human Ventral Visual Stream to Improve Neural Network Robustness

    Authors: Zhenan Shao, Linjian Ma, Bo Li, Diane M. Beck

    Abstract: Human object recognition exhibits remarkable resilience in cluttered and dynamic visual environments. In contrast, despite their unparalleled performance across numerous visual tasks, Deep Neural Networks (DNNs) remain far less robust than humans, showing, for example, a surprising susceptibility to adversarial attacks involving image perturbations that are (almost) imperceptible to humans. Human… ▽ More

    Submitted 4 May, 2024; originally announced May 2024.

  9. arXiv:2405.00466  [pdf, other

    cs.CV cs.CR

    Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable

    Authors: Haozhe Liu, Wentian Zhang, Bing Li, Bernard Ghanem, Jürgen Schmidhuber

    Abstract: Foundational generative models should be traceable to protect their owners and facilitate safety regulation. To achieve this, traditional approaches embed identifiers based on supervisory trigger-response signals, which are commonly known as backdoor watermarks. They are prone to failure when the model is fine-tuned with nontrigger data. Our experiments show that this vulnerability is due to energ… ▽ More

    Submitted 1 May, 2024; originally announced May 2024.

  10. arXiv:2405.00266  [pdf, other

    cs.NI

    Robot-As-A-Sensor: Forming a Sensing Network with Robots for Underground Mining Missions

    Authors: Xiaoyu Ai, Chengpei Xu, Binghao Li, Feng Xia

    Abstract: Nowadays, robots are deployed as mobile platforms equipped with sensing, communication and computing capabilities, especially in the mining industry, where they perform tasks in hazardous and repetitive environments. Despite their potential, individual robots face significant limitations when completing complex tasks that require the collaboration of multiple robots. This collaboration requires a… ▽ More

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: Submitted to Special Issue on Neuro-Inspired Learning for Robotics for IEEE Transactions on Cognitive and Developmental Systems

  11. arXiv:2405.00256  [pdf, other

    cs.CV

    ASAM: Boosting Segment Anything Model with Adversarial Tuning

    Authors: Bo Li, Haoke Xiao, Lv Tang

    Abstract: In the evolving landscape of computer vision, foundation models have emerged as pivotal tools, exhibiting exceptional adaptability to a myriad of tasks. Among these, the Segment Anything Model (SAM) by Meta AI has distinguished itself in image segmentation. However, SAM, like its counterparts, encounters limitations in specific niche applications, prompting a quest for enhancement strategies that… ▽ More

    Submitted 30 April, 2024; originally announced May 2024.

    Comments: This paper is accepted by CVPR2024

  12. arXiv:2404.19534  [pdf, other

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang , et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra… ▽ More

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  13. arXiv:2404.18413  [pdf, other

    cs.CV cs.AI

    3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset

    Authors: Xinyu Ma, Xuebo Liu, Derek F. Wong, Jun Rao, Bei Li, Liang Ding, Lidia S. Chao, Dacheng Tao, Min Zhang

    Abstract: Multimodal machine translation (MMT) is a challenging task that seeks to improve translation quality by incorporating visual information. However, recent studies have indicated that the visual information provided by existing MMT datasets is insufficient, causing models to disregard it and overestimate their capabilities. This issue presents a significant obstacle to the development of MMT researc… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

  14. arXiv:2404.18361  [pdf, other

    cs.DC cs.AR

    Improving Multi-Instance GPU Efficiency via Sub-Entry Sharing TLB Design

    Authors: Bingyao Li, Yueqi Wang, Tianyu Wang, Lieven Eeckhout, Jun Yang, Aamer Jaleel, Xulong Tang

    Abstract: NVIDIA's Multi-Instance GPU (MIG) technology enables partitioning GPU computing power and memory into separate hardware instances, providing complete isolation including compute resources, caches, and memory. However, prior work identifies that MIG does not extend to partitioning the last-level TLB (i.e., L3 TLB), which remains shared among all instances. To enhance TLB reach, NVIDIA GPUs reorgani… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

  15. arXiv:2404.18132  [pdf, other

    cs.GT

    Allocating Mixed Goods with Customized Fairness and Indivisibility Ratio

    Authors: Bo Li, Zihao Li, Shengxin Liu, Zekai Wu

    Abstract: We consider the problem of fairly allocating a combination of divisible and indivisible goods. While fairness criteria like envy-freeness (EF) and proportionality (PROP) can always be achieved for divisible goods, only their relaxed versions, such as the ''up to one'' relaxations EF1 and PROP1, can be satisfied when the goods are indivisible. The ''up to one'' relaxations require the fairness cond… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: Appears in the 33rd International Joint Conference on Artificial Intelligence (IJCAI), 2024

  16. arXiv:2404.17739  [pdf, other

    cs.SE

    How LLMs Aid in UML Modeling: An Exploratory Study with Novice Analysts

    Authors: Beian Wang, Chong Wang, Peng Liang, Bing Li, Cheng Zeng

    Abstract: Since the emergence of GPT-3, Large Language Models (LLMs) have caught the eyes of researchers, practitioners, and educators in the field of software engineering. However, there has been relatively little investigation regarding the performance of LLMs in assisting with requirements analysis and UML modeling. This paper explores how LLMs can assist novice analysts in creating three types of typica… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

  17. arXiv:2404.17433  [pdf, other

    cs.CV

    PromptCIR: Blind Compressed Image Restoration with Prompt Learning

    Authors: Bingchen Li, Xin Li, Yiting Lu, Ruoyu Feng, Mengxi Guo, Shijie Zhao, Li Zhang, Zhibo Chen

    Abstract: Blind Compressed Image Restoration (CIR) has garnered significant attention due to its practical applications. It aims to mitigate compression artifacts caused by unknown quality factors, particularly with JPEG codecs. Existing works on blind CIR often seek assistance from a quality factor prediction network to facilitate their network to restore compressed images. However, the predicted numerical… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Winner of NTIRE 2024 Blind Compressed Image Enhancement Challenge

  18. arXiv:2404.17310  [pdf, other

    cs.CV

    Image Copy-Move Forgery Detection via Deep PatchMatch and Pairwise Ranking Learning

    Authors: Yuanman Li, Yingjie He, Changsheng Chen, Li Dong, Bin Li, Jiantao Zhou, Xia Li

    Abstract: Recent advances in deep learning algorithms have shown impressive progress in image copy-move forgery detection (CMFD). However, these algorithms lack generalizability in practical scenarios where the copied regions are not present in the training images, or the cloned regions are part of the background. Additionally, these algorithms utilize convolution operations to distinguish source and target… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 16 pages, 14figures

  19. arXiv:2404.16952  [pdf, other

    cs.RO

    Simultaneous Estimation of Shape and Force along Highly Deformable Surgical Manipulators Using Sparse FBG Measurement

    Authors: Yiang Lu, Bin Li, Wei Chen, Junyan Yan, Shing Shin Cheng, Jiangliu Wang, Jianshu Zhou, Qi Dou, Yun-hui Liu

    Abstract: Recently, fiber optic sensors such as fiber Bragg gratings (FBGs) have been widely investigated for shape reconstruction and force estimation of flexible surgical robots. However, most existing approaches need precise model parameters of FBGs inside the fiber and their alignments with the flexible robots for accurate sensing results. Another challenge lies in online acquiring external forces at ar… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Accepted to ICRA 2024

  20. arXiv:2404.16790  [pdf, other

    cs.CV

    SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension

    Authors: Bohao Li, Yuying Ge, Yi Chen, Yixiao Ge, Ruimao Zhang, Ying Shan

    Abstract: Comprehending text-rich visual content is paramount for the practical application of Multimodal Large Language Models (MLLMs), since text-rich scenarios are ubiquitous in the real world, which are characterized by the presence of extensive texts embedded within images. Recently, the advent of MLLMs with impressive versatility has raised the bar for what we can expect from MLLMs. However, their pro… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  21. arXiv:2404.16687  [pdf, other

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng , et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte… ▽ More

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  22. arXiv:2404.15805  [pdf, other

    q-bio.BM cs.LG

    Beyond ESM2: Graph-Enhanced Protein Sequence Modeling with Efficient Clustering

    Authors: Shujian Jiao, Bingxuan Li, Lei Wang, Xiaojin Zhang, Wei Chen, Jiajie Peng, Zhongyu Wei

    Abstract: Proteins are essential to life's processes, underpinning evolution and diversity. Advances in sequencing technology have revealed millions of proteins, underscoring the need for sophisticated pre-trained protein models for biological analysis and AI development. Facebook's ESM2, the most advanced protein language model to date, leverages a masked prediction task for unsupervised learning, crafting… ▽ More

    Submitted 24 April, 2024; originally announced April 2024.

  23. arXiv:2404.15677  [pdf, other

    cs.CV

    CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models

    Authors: Qinghe Wang, Baolu Li, Xiaomin Li, Bing Cao, Liqian Ma, Huchuan Lu, Xu Jia

    Abstract: Recent advances in text-to-image models have opened new frontiers in human-centric generation. However, these models cannot be directly employed to generate images with consistent newly coined identities. In this work, we propose CharacterFactory, a framework that allows sampling new characters with consistent identities in the latent space of GANs for diffusion models. More specifically, we consi… ▽ More

    Submitted 27 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: Code will be released very soon: https://github.com/qinghew/CharacterFactory

  24. arXiv:2404.15009  [pdf, other

    cs.CV eess.IV

    The Brain Tumor Segmentation in Pediatrics (BraTS-PEDs) Challenge: Focus on Pediatrics (CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs)

    Authors: Anahita Fathi Kazerooni, Nastaran Khalili, Deep Gandhi, Xinyang Liu, Zhifan Jiang, Syed Muhammed Anwar, Jake Albrecht, Maruf Adewole, Udunna Anazodo, Hannah Anderson, Sina Bagheri, Ujjwal Baid, Timothy Bergquist, Austin J. Borja, Evan Calabrese, Verena Chung, Gian-Marco Conte, Farouk Dako, James Eddy, Ivan Ezhov, Ariana Familiar, Keyvan Farahani, Anurag Gottipati, Debanjan Haldar, Shuvanjan Haldar , et al. (51 additional authors not shown)

    Abstract: Pediatric tumors of the central nervous system are the most common cause of cancer-related death in children. The five-year survival rate for high-grade gliomas in children is less than 20%. Due to their rarity, the diagnosis of these entities is often delayed, their treatment is mainly based on historic treatment concepts, and clinical trials require multi-institutional collaborations. Here we pr… ▽ More

    Submitted 29 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2305.17033

  25. arXiv:2404.14248  [pdf, other

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin , et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  26. arXiv:2404.13991  [pdf, other

    cs.NI

    5GC$^2$ache: Improving 5G UPF Performance via Cache Optimization

    Authors: Haonan Jia, Meng Wang, Biyi Li, Yirui Liu, Junchen Guo, Pengyu Zhang

    Abstract: Last Level Cache (LLC) is a precious and critical resource that impacts the performance of applications running on top of CPUs. In this paper, we reveal the significant impact of LLC on the performance of the 5G user plane function (UPF) when running a cloudified 5G core on general-purposed servers. With extensive measurements showing that the throughput can degrade by over 50\% when the precious… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

  27. arXiv:2404.13872  [pdf, other

    cs.CV

    FreqBlender: Enhancing DeepFake Detection by Blending Frequency Knowledge

    Authors: Hanzhe Li, Yuezun Li, Jiaran Zhou, Bin Li, Junyu Dong

    Abstract: Generating synthetic fake faces, known as pseudo-fake faces, is an effective way to improve the generalization of DeepFake detection. Existing methods typically generate these faces by blending real or fake faces in color space. While these methods have shown promise, they overlook the simulation of frequency distribution in pseudo-fake faces, limiting the learning of generic forgery traces in-dep… ▽ More

    Submitted 6 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  28. arXiv:2404.13603  [pdf, other

    cs.IT eess.SP

    Beyond MMSE: Rank-1 Subspace Channel Estimator for Massive MIMO Systems

    Authors: Bin Li, Ziping Wei, Shaoshi Yang, Yang Zhang, Jun Zhang, Chenglin Zhao, Sheng Chen

    Abstract: To glean the benefits offered by massive multi-input multi-output (MIMO) systems, channel state information must be accurately acquired. Despite the high accuracy, the computational complexity of classical linear minimum mean squared error (MMSE) estimator becomes prohibitively high in the context of massive MIMO, while the other low-complexity methods degrade the estimation accuracy seriously. In… ▽ More

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 15 pages, 12 figures, accepted to appear on IEEE Transactions on Communications, Apr. 2024

  29. arXiv:2404.13353  [pdf, other

    cs.CV

    Generating Daylight-driven Architectural Design via Diffusion Models

    Authors: Pengzhi Li, Baijuan Li

    Abstract: In recent years, the rapid development of large-scale models has made new possibilities for interdisciplinary fields such as architecture. In this paper, we present a novel daylight-driven AI-aided architectural design method. Firstly, we formulate a method for generating massing models, producing architectural massing models using random parameters quickly. Subsequently, we integrate a daylight-d… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: Project page: https://zrealli.github.io/DDADesign/

    Journal ref: CVPR 2024 Workshop

  30. arXiv:2404.12887  [pdf, other

    cs.CV eess.IV

    3D Multi-frame Fusion for Video Stabilization

    Authors: Zhan Peng, Xinyi Ye, Weiyue Zhao, Tianqi Liu, Huiqiang Sun, Baopu Li, Zhiguo Cao

    Abstract: In this paper, we present RStab, a novel framework for video stabilization that integrates 3D multi-frame fusion through volume rendering. Departing from conventional methods, we introduce a 3D multi-frame perspective to generate stabilized images, addressing the challenge of full-frame generation while preserving structure. The core of our approach lies in Stabilized Rendering (SR), a volume rend… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  31. arXiv:2404.12852  [pdf, other

    cs.CR cs.CV cs.LG

    LSP Framework: A Compensatory Model for Defeating Trigger Reverse Engineering via Label Smoothing Poisoning

    Authors: Beichen Li, Yuanfang Guo, Heqi Peng, Yangxi Li, Yunhong Wang

    Abstract: Deep neural networks are vulnerable to backdoor attacks. Among the existing backdoor defense methods, trigger reverse engineering based approaches, which reconstruct the backdoor triggers via optimizations, are the most versatile and effective ones compared to other types of methods. In this paper, we summarize and construct a generic paradigm for the typical trigger reverse engineering process. B… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

  32. arXiv:2404.12715  [pdf, other

    cs.CL

    Enabling Ensemble Learning for Heterogeneous Large Language Models with Deep Parallel Collaboration

    Authors: Yichong Huang, Xiaocheng Feng, Baohang Li, Yang Xiang, Hui Wang, Bing Qin, Ting Liu

    Abstract: Large language models (LLMs) have shown complementary strengths in various tasks and instances, motivating the research of ensembling LLMs to push the frontier leveraging the wisdom of the crowd. Existing work achieves this objective via training the extra reward model or fusion model to select or fuse all candidate answers. However, these methods pose a great challenge to the generalizability of… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 12 pages, 5 figures

  33. arXiv:2404.12635  [pdf, other

    cs.CV cs.CR cs.LG

    AED-PADA:Improving Generalizability of Adversarial Example Detection via Principal Adversarial Domain Adaptation

    Authors: Heqi Peng, Yunhong Wang, Ruijie Yang, Beichen Li, Rui Wang, Yuanfang Guo

    Abstract: Adversarial example detection, which can be conveniently applied in many scenarios, is important in the area of adversarial defense. Unfortunately, existing detection methods suffer from poor generalization performance, because their training process usually relies on the examples generated from a single known adversarial attack and there exists a large discrepancy between the training and unseen… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

  34. arXiv:2404.12415  [pdf

    eess.IV cs.CV cs.LG

    Soil Fertility Prediction Using Combined USB-microscope Based Soil Image, Auxiliary Variables, and Portable X-Ray Fluorescence Spectrometry

    Authors: Shubhadip Dasgupta, Satwik Pate, Divya Rathore, L. G. Divyanth, Ayan Das, Anshuman Nayak, Subhadip Dey, Asim Biswas, David C. Weindorf, Bin Li, Sergio Henrique Godinho Silva, Bruno Teixeira Ribeiro, Sanjay Srivastava, Somsubhra Chakraborty

    Abstract: This study explored the application of portable X-ray fluorescence (PXRF) spectrometry and soil image analysis to rapidly assess soil fertility, focusing on critical parameters such as available B, organic carbon (OC), available Mn, available S, and the sulfur availability index (SAI). Analyzing 1,133 soil samples from various agro-climatic zones in Eastern India, the research combined color and t… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 37 pages, 10 figures; manuscript under peer-review for publication in the jounral 'Computers and Electronics in Agriculture'

  35. arXiv:2404.12390  [pdf, other

    cs.CV cs.AI cs.CL

    BLINK: Multimodal Large Language Models Can See but Not Perceive

    Authors: Xingyu Fu, Yushi Hu, Bangzheng Li, Yu Feng, Haoyu Wang, Xudong Lin, Dan Roth, Noah A. Smith, Wei-Chiu Ma, Ranjay Krishna

    Abstract: We introduce Blink, a new benchmark for multimodal language models (LLMs) that focuses on core visual perception abilities not found in other evaluations. Most of the Blink tasks can be solved by humans "within a blink" (e.g., relative depth estimation, visual correspondence, forensics detection, and multi-view reasoning). However, we find these perception-demanding tasks cast significant challeng… ▽ More

    Submitted 4 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Multimodal Benchmark, Project Url: https://zeyofu.github.io/blink/

  36. arXiv:2404.12241  [pdf, other

    cs.CL cs.AI

    Introducing v0.5 of the AI Safety Benchmark from MLCommons

    Authors: Bertie Vidgen, Adarsh Agrawal, Ahmed M. Ahmed, Victor Akinwande, Namir Al-Nuaimi, Najla Alfaraj, Elie Alhajjar, Lora Aroyo, Trupti Bavalatti, Borhane Blili-Hamelin, Kurt Bollacker, Rishi Bomassani, Marisa Ferrara Boston, Siméon Campos, Kal Chakra, Canyu Chen, Cody Coleman, Zacharie Delpierre Coudert, Leon Derczynski, Debojyoti Dutta, Ian Eisenberg, James Ezick, Heather Frase, Brian Fuller, Ram Gandikota , et al. (72 additional authors not shown)

    Abstract: This paper introduces v0.5 of the AI Safety Benchmark, which has been created by the MLCommons AI Safety Working Group. The AI Safety Benchmark has been designed to assess the safety risks of AI systems that use chat-tuned language models. We introduce a principled approach to specifying and constructing the benchmark, which for v0.5 covers only a single use case (an adult chatting to a general-pu… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

  37. arXiv:2404.11879  [pdf, other

    cs.DS

    Public Event Scheduling with Busy Agents

    Authors: Bo Li, Lijun Li, Minming Li, Ruilong Zhang

    Abstract: We study a public event scheduling problem, where multiple public events are scheduled to coordinate the availability of multiple agents. The availability of each agent is determined by solving a separate flexible interval job scheduling problem, where the jobs are required to be preemptively processed. The agents want to attend as many events as possible, and their agreements are considered to be… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: To appear in IJCAI 2024

  38. arXiv:2404.11313  [pdf, other

    eess.IV cs.AI

    NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

    Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo , et al. (43 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 Challenge on Shortform UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from popular short-form video platform, i.e., Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

  39. arXiv:2404.10849  [pdf, other

    cs.RO cs.AI

    End-To-End Training and Testing Gamification Framework to Learn Human Highway Driving

    Authors: Satya R. Jaladi, Zhimin Chen, Narahari R. Malayanur, Raja M. Macherla, Bing Li

    Abstract: The current autonomous stack is well modularized and consists of perception, decision making and control in a handcrafted framework. With the advances in artificial intelligence (AI) and computing resources, researchers have been pushing the development of end-to-end AI for autonomous driving, at least in problems of small searching space such as in highway scenarios, and more and more photorealis… ▽ More

    Submitted 18 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: Accepted by ITSC

  40. arXiv:2404.10343  [pdf, other

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang , et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  41. arXiv:2404.09790  [pdf, other

    cs.CV

    NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results

    Authors: Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, Zhijuan Huang, Yajun Zou, Yuan Huang, Jiamin Lin, Bingnan Han, Xianyu Guan, Yongsheng Yu, Daoan Zhang, Xuanwu Yin, Kunlong Zuo, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou , et al. (63 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge i… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 webpage: https://cvlai.net/ntire/2024. Code: https://github.com/zhengchen1999/NTIRE2024_ImageSR_x4

  42. arXiv:2404.09640  [pdf, other

    cs.CV

    CREST: Cross-modal Resonance through Evidential Deep Learning for Enhanced Zero-Shot Learning

    Authors: Haojian Huang, Xiaozhen Qiao, Zhuo Chen, Haodong Chen, Bingyu Li, Zhe Sun, Mulin Chen, Xuelong Li

    Abstract: Zero-shot learning (ZSL) enables the recognition of novel classes by leveraging semantic knowledge transfer from known to unknown categories. This knowledge, typically encapsulated in attribute descriptions, aids in identifying class-specific visual features, thus facilitating visual-semantic alignment and improving ZSL performance. However, real-world challenges such as distribution imbalances an… ▽ More

    Submitted 20 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Ongoing work; 10 pages, 2 Tables, 9 Figures; Repo is available at: https://github.com/JethroJames/CREST

  43. arXiv:2404.09232  [pdf, other

    cs.LG cs.DC

    MAP: Model Aggregation and Personalization in Federated Learning with Incomplete Classes

    Authors: Xin-Chun Li, Shaoming Song, Yinchuan Li, Bingshuai Li, Yunfeng Shao, Yang Yang, De-Chuan Zhan

    Abstract: In some real-world applications, data samples are usually distributed on local devices, where federated learning (FL) techniques are proposed to coordinate decentralized clients without directly sharing users' private data. FL commonly follows the parameter server architecture and contains multiple personalization and aggregation procedures. The natural data heterogeneity across clients, i.e., Non… ▽ More

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: Accepted by TKDE (11-Apr-2024)

  44. arXiv:2404.08965  [pdf, other

    cs.CV cs.MM

    Seeing Text in the Dark: Algorithm and Benchmark

    Authors: Chengpei Xu, Hao Fu, Long Ma, Wenjing Jia, Chengqi Zhang, Feng Xia, Xiaoyu Ai, Binghao Li, Wenjie Zhang

    Abstract: Localizing text in low-light environments is challenging due to visual degradations. Although a straightforward solution involves a two-stage pipeline with low-light image enhancement (LLE) as the initial step followed by detector, LLE is primarily designed for human vision instead of machine and can accumulate errors. In this work, we propose an efficient and effective single-stage approach for l… ▽ More

    Submitted 23 April, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

  45. arXiv:2404.08917  [pdf, other

    cs.CV

    MAProtoNet: A Multi-scale Attentive Interpretable Prototypical Part Network for 3D Magnetic Resonance Imaging Brain Tumor Classification

    Authors: Binghua Li, Jie Mao, Zhe Sun, Chao Li, Qibin Zhao, Toshihisa Tanaka

    Abstract: Automated diagnosis with artificial intelligence has emerged as a promising area in the realm of medical imaging, while the interpretability of the introduced deep neural networks still remains an urgent concern. Although contemporary works, such as XProtoNet and MProtoNet, has sought to design interpretable prediction models for the issue, the localization precision of their resulting attribution… ▽ More

    Submitted 13 April, 2024; originally announced April 2024.

  46. arXiv:2404.08676  [pdf, other

    cs.CL cs.CY cs.LG

    ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming

    Authors: Simone Tedeschi, Felix Friedrich, Patrick Schramowski, Kristian Kersting, Roberto Navigli, Huu Nguyen, Bo Li

    Abstract: When building Large Language Models (LLMs), it is paramount to bear safety in mind and protect them with guardrails. Indeed, LLMs should never generate content promoting or normalizing harmful, illegal, or unethical behavior that may contribute to harm to individuals or society. This principle applies to both normal and adversarial use. In response, we introduce ALERT, a large-scale benchmark to a… ▽ More

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 19 pages, preprint

    MSC Class: I.2

  47. arXiv:2404.07721  [pdf, other

    eess.SP cs.IT

    Trainable Joint Channel Estimation, Detection and Decoding for MIMO URLLC Systems

    Authors: Yi Sun, Hong Shen, Bingqing Li, Wei Xu, Pengcheng Zhu, Nan Hu, Chunming Zhao

    Abstract: The receiver design for multi-input multi-output (MIMO) ultra-reliable and low-latency communication (URLLC) systems can be a tough task due to the use of short channel codes and few pilot symbols. Consequently, error propagation can occur in traditional turbo receivers, leading to performance degradation. Moreover, the processing delay induced by information exchange between different modules may… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 17 pages, 12 figures, accepted by IEEE Transactions on Wireless Communications

  48. arXiv:2404.07687  [pdf, other

    cs.CV

    Chaos in Motion: Unveiling Robustness in Remote Heart Rate Measurement through Brain-Inspired Skin Tracking

    Authors: Jie Wang, Jing Lian, Minjie Ma, Junqiang Lei, Chunbiao Li, Bin Li, Jizhao Liu

    Abstract: Heart rate is an important physiological indicator of human health status. Existing remote heart rate measurement methods typically involve facial detection followed by signal extraction from the region of interest (ROI). These SOTA methods have three serious problems: (a) inaccuracies even failures in detection caused by environmental influences or subject movement; (b) failures for special patie… ▽ More

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 8 pages, 10 figures

  49. arXiv:2404.06892  [pdf, other

    cs.CV

    SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving

    Authors: Diankun Zhang, Guoan Wang, Runwen Zhu, Jianbo Zhao, Xiwu Chen, Siyu Zhang, Jiahao Gong, Qibin Zhou, Wenyuan Zhang, Ningzi Wang, Feiyang Tan, Hangning Zhou, Ziyao Xu, Haotian Yao, Chi Zhang, Xiaojun Liu, Xiaoguang Di, Bin Li

    Abstract: End-to-End paradigms use a unified framework to implement multi-tasks in an autonomous driving system. Despite simplicity and clarity, the performance of end-to-end autonomous driving methods on sub-tasks is still far behind the single-task methods. Meanwhile, the widely used dense BEV features in previous end-to-end methods make it costly to extend to more modalities or tasks. In this paper, we p… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

  50. arXiv:2404.06735  [pdf, other

    stat.ML cs.LG math.ST stat.AP stat.ME

    A Copula Graphical Model for Multi-Attribute Data using Optimal Transport

    Authors: Qi Zhang, Bing Li, Lingzhou Xue

    Abstract: Motivated by modern data forms such as images and multi-view data, the multi-attribute graphical model aims to explore the conditional independence structure among vectors. Under the Gaussian assumption, the conditional independence between vectors is characterized by blockwise zeros in the precision matrix. To relax the restrictive Gaussian assumption, in this paper, we introduce a novel semipara… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 37 pages