Showing 1–50 of 1,502 results for author: Wang, B

  1. arXiv:2405.05955  [pdf, other]

    cs.CL

    Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning

    Authors: Junzhi Chen, Juhao Liang, Benyou Wang

    Abstract: The emergence of large language models (LLMs) has opened up unprecedented possibilities for automating complex tasks that are often comparable to human performance. Despite their capabilities, LLMs still encounter difficulties in completing tasks that require high levels of accuracy and complexity due to their inherent limitations in handling multifaceted problems single-handedly. This paper intro…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.05786  [pdf, other]

    cs.LG

    FusionTransNet for Smart Urban Mobility: Spatiotemporal Traffic Forecasting Through Multimodal Network Integration

    Authors: Binwu Wang, Yan Leng, Guang Wang, Yang Wang

    Abstract: This study develops FusionTransNet, a framework designed for Origin-Destination (OD) flow predictions within smart and multimodal urban transportation systems. Urban transportation complexity arises from the spatiotemporal interactions among various traffic modes. Motivated by analyzing multimodal data from Shenzhen, a framework that can dissect complicated spatiotemporal interactions between thes…

    Submitted 9 May, 2024; originally announced May 2024.

  3. arXiv:2405.05556  [pdf, other]

    cs.CE

    Extension of graph-accelerated non-intrusive polynomial chaos to high-dimensional uncertainty quantification through the active subspace method

    Authors: Bingran Wang, Nicholas C. Orndorff, Mark Sperry, John T. Hwang

    Abstract: The recently introduced graph-accelerated non-intrusive polynomial chaos (NIPC) method has shown effectiveness in solving a broad range of uncertainty quantification (UQ) problems with multidisciplinary systems. It uses integration-based NIPC to solve the UQ problem and generates the quadrature rule in a desired tensor structure, so that the model evaluations can be efficiently accelerated through…

    Submitted 9 May, 2024; originally announced May 2024.

  4. arXiv:2405.05237  [pdf, other]

    cs.CV

    EVA-X: A Foundation Model for General Chest X-ray Analysis with Self-supervised Learning

    Authors: Jingfeng Yao, Xinggang Wang, Yuehao Song, Huangxuan Zhao, Jun Ma, Yajie Chen, Wenyu Liu, Bo Wang

    Abstract: The diagnosis and treatment of chest diseases play a crucial role in maintaining human health. X-ray examination has become the most common clinical examination means due to its efficiency and cost-effectiveness. Artificial intelligence analysis methods for chest X-ray images are limited by insufficient annotation data and varying levels of annotation, resulting in weak generalization ability and…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: codes available at: https://github.com/hustvl/EVA-X

  5. arXiv:2405.04861  [pdf, other]

    cs.SE

    Insights into Deep Learning Refactoring: Bridging the Gap Between Practices and Expectations

    Authors: SiQi Wang, Xing Hu, Bei Wang, WenXin Yao, Xin Xia, XingYu Wang

    Abstract: With the rapid development of deep learning, the implementation of intricate algorithms and substantial data processing have become standard elements of deep learning projects. As a result, the code has become progressively complex as the software evolves, which is difficult to maintain and understand. Existing studies have investigated the impact of refactoring on software quality within traditio…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 24 pages, 18 figures

  6. arXiv:2405.04135  [pdf, other]

    cs.AI

    In-context Learning for Automated Driving Scenarios

    Authors: Ziqi Zhou, Jingyue Zhang, Jingyuan Zhang, Boyue Wang, Tianyu Shi, Alaa Khamis

    Abstract: One of the key challenges in current Reinforcement Learning (RL)-based Automated Driving (AD) agents is achieving flexible, precise, and human-like behavior cost-effectively. This paper introduces an innovative approach utilizing Large Language Models (LLMs) to intuitively and effectively optimize RL reward functions in a human-centric way. We developed a framework where instructions and dynamic e…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 7 pages, 6 figures, 35 references

  7. arXiv:2405.04064  [pdf, other]

    cs.AI

    MFA-Net: Multi-Scale feature fusion attention network for liver tumor segmentation

    Authors: Yanli Yuan, Bingbing Wang, Chuan Zhang, Jingyi Xu, Ximeng Liu, Liehuang Zhu

    Abstract: Segmentation of organs of interest in medical CT images is beneficial for diagnosis of diseases. Though recent methods based on Fully Convolutional Neural Networks (F-CNNs) have shown success in many segmentation tasks, fusing features from images with different scales is still a challenge: (1) Due to the lack of spatial awareness, F-CNNs share the same weights at different spatial locations. (2)…

    Submitted 9 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: Paper accepted in Human-Centric Representation Learning workshop at AAAI 2024

  8. arXiv:2405.03501  [pdf, other]

    cs.LG cs.AI cs.CV

    Boosting Single Positive Multi-label Classification with Generalized Robust Loss

    Authors: Yanxi Chen, Chunxiao Li, Xinyang Dai, Jinhuan Li, Weiyu Sun, Yiming Wang, Renyuan Zhang, Tinghe Zhang, Bo Wang

    Abstract: Multi-label learning (MLL) requires comprehensive multi-semantic annotations that is hard to fully obtain, thus often resulting in missing labels scenarios. In this paper, we investigate Single Positive Multi-label Learning (SPML), where each image is associated with merely one positive label. Existing SPML methods only focus on designing losses using mechanisms such as hard pseudo-labeling and ro…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures, 6 tables

  9. arXiv:2405.03371  [pdf, other]

    cs.CL

    Explainable Fake News Detection With Large Language Model via Defense Among Competing Wisdom

    Authors: Bo Wang, Jing Ma, Hongzhan Lin, Zhiwei Yang, Ruichao Yang, Yuan Tian, Yi Chang

    Abstract: Most fake news detection methods learn latent feature representations based on neural networks, which makes them black boxes to classify a piece of news without giving any justification. Existing explainable systems generate veracity justifications from investigative journalism, which suffer from debunking delayed and low efficiency. Recent studies simply assume that the justification is equivalen…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 12 pages, WWW'2024

  10. arXiv:2405.03177  [pdf, other]

    cs.CV

    Transformer-based RGB-T Tracking with Channel and Spatial Feature Fusion

    Authors: Yunfeng Li, Bo Wang, Ye Li, Zhiwen Yu, Liang Wang

    Abstract: Complementary RGB and TIR modalities enable RGB-T tracking to achieve competitive performance in challenging scenarios. Therefore, how to better fuse cross-modal features is the core issue of RGB-T tracking. Some previous methods either insufficiently fuse RGB and TIR features, or depend on intermediaries containing information from both modalities to achieve cross-modal information interaction. T…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  11. arXiv:2405.03138  [pdf, other]

    cs.CL

    CRAFT: Extracting and Tuning Cultural Instructions from the Wild

    Authors: Bin Wang, Geyu Lin, Zhengyuan Liu, Chengwei Wei, Nancy F. Chen

    Abstract: Large language models (LLMs) have rapidly evolved as the foundation of various natural language processing (NLP) applications. Despite their wide use cases, their understanding of culturally-related concepts and reasoning remains limited. Meantime, there is a significant need to enhance these models' cultural reasoning capabilities, especially concerning underrepresented regions. This paper introd…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 6 pages

  12. arXiv:2405.03076  [pdf, other]

    cs.MA

    Traffic Performance GPT (TP-GPT): Real-Time Data Informed Intelligent ChatBot for Transportation Surveillance and Management

    Authors: Bingzhang Wang, Zhiyu Cai, Muhammad Monjurul Karim, Chenxi Liu, Yinhai Wang

    Abstract: The digitization of traffic sensing infrastructure has significantly accumulated an extensive traffic data warehouse, which presents unprecedented challenges for transportation analytics. The complexities associated with querying large-scale multi-table databases require specialized programming expertise and labor-intensive development. Additionally, traditional analysis methods have focused mainl…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 8 pages, 5 figures, submitted to 27th IEEE International Conference on Intelligent Transportation Systems (IEEE ITSC 2024)

  13. arXiv:2405.02963  [pdf]

    cs.CR eess.SY

    Preventive Audits for Data Applications Before Data Sharing in the Power IoT

    Authors: Bohong Wang, Qinglai Guo, Yanxi Lin, Yang Yu

    Abstract: With the increase in data volume, more types of data are being used and shared, especially in the power Internet of Things (IoT). However, the processes of data sharing may lead to unexpected information leakage because of the ubiquitous relevance among the different data, thus it is necessary for data owners to conduct preventive audits for data applications before data sharing to avoid the risk…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 19 pages, 18 figures

  14. arXiv:2405.02639  [pdf, other]

    cs.RO

    Wall-Climbing Performance of Gecko-inspired Robot with Soft Feet and Digits enhanced by Gravity Compensation

    Authors: Bingcheng Wang, Zhiyuan Weng, Haoyu Wang, Shuangjie Wang, Zhouyi Wang, Zhendong Dai, Ardian Jusufi

    Abstract: Gravitational forces can induce deviations in body posture from desired configurations in multi-legged arboreal robot locomotion with low leg stiffness, affecting the contact angle between the swing leg's end-effector and the climbing surface during the gait cycle. The relationship between desired and actual foot positions is investigated here in a leg-stiffness-enhanced model under external force…

    Submitted 4 May, 2024; originally announced May 2024.

  15. arXiv:2405.02145  [pdf, other]

    cs.RO

    Characterized Diffusion and Spatial-Temporal Interaction Network for Trajectory Prediction in Autonomous Driving

    Authors: Haicheng Liao, Xuelin Li, Yongkang Li, Hanlin Kong, Chengyue Wang, Bonan Wang, Yanchen Guan, KaHou Tam, Zhenning Li, Chengzhong Xu

    Abstract: Trajectory prediction is a cornerstone in autonomous driving (AD), playing a critical role in enabling vehicles to navigate safely and efficiently in dynamic environments. To address this task, this paper presents a novel trajectory prediction model tailored for accuracy in the face of heterogeneous and uncertain traffic scenarios. At the heart of this model lies the Characterized Diffusion Module…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024

  16. arXiv:2405.01680  [pdf, other]

    cs.LG

    Physics-Informed Neural Networks: Minimizing Residual Loss with Wide Networks and Effective Activations

    Authors: Nima Hosseini Dashtbayaz, Ghazal Farhani, Boyu Wang, Charles X. Ling

    Abstract: The residual loss in Physics-Informed Neural Networks (PINNs) alters the simple recursive relation of layers in a feed-forward neural network by applying a differential operator, resulting in a loss landscape that is inherently different from those of common supervised problems. Therefore, relying on the existing theory leads to unjustified design choices and suboptimal performance. In this work,…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted at IJCAI 2024

  17. arXiv:2405.01266  [pdf, other]

    cs.RO cs.AI

    MFTraj: Map-Free, Behavior-Driven Trajectory Prediction for Autonomous Driving

    Authors: Haicheng Liao, Zhenning Li, Chengyue Wang, Huanming Shen, Bonan Wang, Dongping Liao, Guofa Li, Chengzhong Xu

    Abstract: This paper introduces a trajectory prediction model tailored for autonomous driving, focusing on capturing complex interactions in dynamic traffic scenarios without reliance on high-definition maps. The model, termed MFTraj, harnesses historical trajectory data combined with a novel dynamic geometric graph-based behavior-aware module. At its core, an adaptive structure-aware interactive graph conv…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted by IJCAI 2024

  18. arXiv:2405.00338  [pdf, other]

    cs.IR

    Distillation Matters: Empowering Sequential Recommenders to Match the Performance of Large Language Model

    Authors: Yu Cui, Feng Liu, Pengbo Wang, Bohao Wang, Heng Tang, Yi Wan, Jun Wang, Jiawei Chen

    Abstract: Owing to their powerful semantic reasoning capabilities, Large Language Models (LLMs) have been effectively utilized as recommenders, achieving impressive performance. However, the high inference latency of LLMs significantly restricts their practical deployment. To address this issue, this work investigates knowledge distillation from cumbersome LLM-based recommendation models to lightweight conv…

    Submitted 3 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 10 pages, 2 figures

  19. arXiv:2404.19534  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  20. arXiv:2404.18532  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    MileBench: Benchmarking MLLMs in Long Context

    Authors: Dingjie Song, Shunian Chen, Guiming Hardy Chen, Fei Yu, Xiang Wan, Benyou Wang

    Abstract: Despite the advancements and impressive performance of Multimodal Large Language Models (MLLMs) on benchmarks, their effectiveness in real-world, long-context, and multi-image tasks is unclear due to the benchmarks' limited scope. Existing benchmarks often focus on single-image and short-text samples, and when assessing multi-image tasks, they either limit the image count or focus on specific task…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 29 pages, 13 figures, 14 tables

  21. arXiv:2404.17739  [pdf, other]

    cs.SE

    How LLMs Aid in UML Modeling: An Exploratory Study with Novice Analysts

    Authors: Beian Wang, Chong Wang, Peng Liang, Bing Li, Cheng Zeng

    Abstract: Since the emergence of GPT-3, Large Language Models (LLMs) have caught the eyes of researchers, practitioners, and educators in the field of software engineering. However, there has been relatively little investigation regarding the performance of LLMs in assisting with requirements analysis and UML modeling. This paper explores how LLMs can assist novice analysts in creating three types of typica…

    Submitted 26 April, 2024; originally announced April 2024.

  22. arXiv:2404.17736  [pdf, other]

    eess.SP cs.CV

    Diffusion-Aided Joint Source Channel Coding For High Realism Wireless Image Transmission

    Authors: Mingyu Yang, Bowen Liu, Boyang Wang, Hun-Seok Kim

    Abstract: Deep learning-based joint source-channel coding (deep JSCC) has been demonstrated as an effective approach for wireless image transmission. Nevertheless, current research has concentrated on minimizing a standard distortion metric such as Mean Squared Error (MSE), which does not necessarily improve the perceptual quality. To address this issue, we propose DiffJSCC, a novel framework that leverages…

    Submitted 26 April, 2024; originally announced April 2024.

  23. arXiv:2404.17698  [pdf, other]

    cs.HC

    "Actually I Can Count My Blessings": User-Centered Design of an Application to Promote Gratitude Among Young Adults

    Authors: Ananya Bhattacharjee, Zichen Gong, Bingcheng Wang, Timothy James Luckcock, Emma Watson, Elena Allica Abellan, Leslie Gutman, Anne Hsu, Joseph Jay Williams

    Abstract: Regular practice of gratitude has the potential to enhance psychological wellbeing and foster stronger social connections among young adults. However, there is a lack of research investigating user needs and expectations regarding gratitude-promoting applications. To address this gap, we employed a user-centered design approach to develop a mobile application that facilitates gratitude practice. O…

    Submitted 26 April, 2024; originally announced April 2024.

  24. arXiv:2404.17589  [pdf]

    cs.IR cs.LG

    An Off-Policy Reinforcement Learning Algorithm Customized for Multi-Task Fusion in Large-Scale Recommender Systems

    Authors: Peng Liu, Cong Xu, Ming Zhao, Jiawei Zhu, Bin Wang, Yi Ren

    Abstract: As the last critical stage of RSs, Multi-Task Fusion (MTF) is responsible for combining multiple scores outputted by Multi-Task Learning (MTL) into a final score to maximize user satisfaction, which determines the ultimate recommendation results. Recently, to optimize long-term user satisfaction within a recommendation session, Reinforcement Learning (RL) is used for MTF in the industry. However,…

    Submitted 6 May, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  25. arXiv:2404.17520  [pdf, other]

    cs.RO

    A Cognitive-Driven Trajectory Prediction Model for Autonomous Driving in Mixed Autonomy Environment

    Authors: Haicheng Liao, Zhenning Li, Chengyue Wang, Bonan Wang, Hanlin Kong, Yanchen Guan, Guofa Li, Zhiyong Cui, Chengzhong Xu

    Abstract: As autonomous driving technology progresses, the need for precise trajectory prediction models becomes paramount. This paper introduces an innovative model that infuses cognitive insights into trajectory prediction, focusing on perceived safety and dynamic decision-making. Distinct from traditional approaches, our model excels in analyzing interactions and behavior patterns in mixed autonomy traff…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  26. arXiv:2404.16821  [pdf, other]

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  27. arXiv:2404.16666  [pdf, other]

    cs.CV

    PhyRecon: Physically Plausible Neural Scene Reconstruction

    Authors: Junfeng Ni, Yixin Chen, Bohan Jing, Nan Jiang, Bin Wang, Bo Dai, Yixin Zhu, Song-Chun Zhu, Siyuan Huang

    Abstract: While neural implicit representations have gained popularity in multi-view 3D reconstruction, previous work struggles to yield physically plausible results, thereby limiting their applications in physics-demanding domains like embodied AI and robotics. The lack of plausibility originates from both the absence of physics modeling in the existing pipeline and their inability to recover intricate geo…

    Submitted 28 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: project page: https://phyrecon.github.io/

  28. arXiv:2404.15289  [pdf, other]

    eess.SP cs.LG

    EEGDiR: Electroencephalogram denoising network for temporal information storage and global modeling through Retentive Network

    Authors: Bin Wang, Fei Deng, Peifan Jiang

    Abstract: Electroencephalogram (EEG) signals play a pivotal role in clinical medicine, brain research, and neurological disease studies. However, susceptibility to various physiological and environmental artifacts introduces noise in recorded EEG data, impeding accurate analysis of underlying brain activity. Denoising techniques are crucial to mitigate this challenge. Recent advancements in deep learningbas…

    Submitted 20 March, 2024; originally announced April 2024.

  29. arXiv:2404.15254  [pdf, other]

    cs.CV

    UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition

    Authors: Bin Wang, Zhuangcheng Gu, Chao Xu, Bo Zhang, Botian Shi, Conghui He

    Abstract: This paper presents the UniMER dataset to provide the first study on Mathematical Expression Recognition (MER) towards complex real-world scenarios. The UniMER dataset consists of a large-scale training set UniMER-1M offering an unprecedented scale and diversity with one million training instances and a meticulously designed test set UniMER-Test that reflects a diverse range of formula distributio…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 17 pages, 5 figures

  30. arXiv:2404.14961  [pdf, other]

    cs.LG

    Cache-Aware Reinforcement Learning in Large-Scale Recommender Systems

    Authors: Xiaoshuang Chen, Gengrui Zhang, Yao Wang, Yulin Wu, Shuo Su, Kaiqiao Zhan, Ben Wang

    Abstract: Modern large-scale recommender systems are built upon computation-intensive infrastructure and usually suffer from a huge difference in traffic between peak and off-peak periods. In peak periods, it is challenging to perform real-time computation for each request due to the limited budget of computational resources. The recommendation with a cache is a solution to this problem, where a user-wise r…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 8 pages, 8 figures

  31. arXiv:2404.14607  [pdf, other]

    cs.CL

    Q-Tuning: Queue-based Prompt Tuning for Lifelong Few-shot Language Learning

    Authors: Yanhui Guo, Shaoyuan Xu, Jinmiao Fu, Jia Liu, Chaosheng Dong, Bryan Wang

    Abstract: This paper introduces Q-tuning, a novel approach for continual prompt tuning that enables the lifelong learning of a pre-trained language model. When learning a new task, Q-tuning trains a task-specific prompt by adding it to a prompt queue consisting of the prompts from older tasks. To better transfer the knowledge of old tasks, we design an adaptive knowledge aggregation technique that…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024 findings

  32. arXiv:2404.14567  [pdf, other]

    cs.CL

    WangLab at MEDIQA-M3G 2024: Multimodal Medical Answer Generation using Large Language Models

    Authors: Ronald Xie, Steven Palayew, Augustin Toma, Gary Bader, Bo Wang

    Abstract: This paper outlines our submission to the MEDIQA2024 Multilingual and Multimodal Medical Answer Generation (M3G) shared task. We report results for two standalone solutions under the English category of the task, the first involving two consecutive API calls to the Claude 3 Opus API and the second involving training an image-disease label joint embedding in the style of CLIP for image classificati…

    Submitted 22 April, 2024; originally announced April 2024.

  33. arXiv:2404.14544  [pdf, other]

    cs.CL

    WangLab at MEDIQA-CORR 2024: Optimized LLM-based Programs for Medical Error Detection and Correction

    Authors: Augustin Toma, Ronald Xie, Steven Palayew, Patrick R. Lawler, Bo Wang

    Abstract: Medical errors in clinical text pose significant risks to patient safety. The MEDIQA-CORR 2024 shared task focuses on detecting and correcting these errors across three subtasks: identifying the presence of an error, extracting the erroneous sentence, and generating a corrected sentence. In this paper, we present our approach that achieved top performance in all three subtasks. For the MS dataset,…

    Submitted 22 April, 2024; originally announced April 2024.

  34. arXiv:2404.14248  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  35. arXiv:2404.13866  [pdf, other]

    cs.CV math.PR

    Plug-and-Play Algorithm Convergence Analysis From The Standpoint of Stochastic Differential Equation

    Authors: Zhongqi Wang, Bingnan Wang, Maosheng Xiang

    Abstract: The Plug-and-Play (PnP) algorithm is popular for inverse image problem-solving. However, this algorithm lacks theoretical analysis of its convergence with more advanced plug-in denoisers. We demonstrate that discrete PnP iteration can be described by a continuous stochastic differential equation (SDE). We can also achieve this transformation through Markov process formulation of PnP. Then, we can…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 17 pages, Preprint, Under review

  36. SIGformer: Sign-aware Graph Transformer for Recommendation

    Authors: Sirui Chen, Jiawei Chen, Sheng Zhou, Bohao Wang, Shen Han, Chanfei Su, Yuqing Yuan, Can Wang

    Abstract: In recommender systems, most graph-based methods focus on positive user feedback, while overlooking the valuable negative feedback. Integrating both positive and negative feedback to form a signed graph can lead to a more comprehensive understanding of user preferences. However, the existing efforts to incorporate both types of feedback are sparse and face two main limitations: 1) They process pos…

    Submitted 6 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR2024

  37. arXiv:2404.11932  [pdf, other]

    cs.CL cs.AI

    CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment

    Authors: Geyu Lin, Bin Wang, Zhengyuan Liu, Nancy F. Chen

    Abstract: Multilingual proficiency presents a significant challenge for large language models (LLMs). English-centric models are usually suboptimal in other languages, particularly those that are linguistically distant from English. This performance discrepancy mainly stems from the imbalanced distribution of training data across languages during pre-training and instruction tuning stages. To address this p…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: 11 pages

  38. arXiv:2404.11051  [pdf]

    cs.CV

    WPS-Dataset: A benchmark for wood plate segmentation in bark removal processing

    Authors: Rijun Wang, Guanghao Zhang, Fulong Liang, Bo Wang, Xiangwei Mou, Yesheng Chen, Peng Sun, Canjin Wang

    Abstract: Using deep learning methods is a promising approach to improving bark removal efficiency and enhancing the quality of wood products. However, the lack of publicly available datasets for wood plate segmentation in bark removal processing poses challenges for researchers in this field. To address this issue, a benchmark for wood plate segmentation in bark removal processing named WPS-dataset is prop…

    Submitted 25 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Report number: b06d7e0b-306f-476a-a72d-59a8793ac232 | v.1.2

  39. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  40. arXiv:2404.09826  [pdf, other]

    cs.CV

    A Recipe for CAC: Mosaic-based Generalized Loss for Improved Class-Agnostic Counting

    Authors: Tsung-Han Chou, Brian Wang, Wei-Chen Chiu, Jun-Cheng Chen

    Abstract: Class agnostic counting (CAC) is a vision task that can be used to count the total occurrence number of any given reference objects in the query image. The task is usually formulated as a density map estimation problem through similarity computation among a few image samples of the reference object and the query image. In this paper, we point out a severe issue of the existing CAC framework: Given…

    Submitted 15 April, 2024; originally announced April 2024.

  41. arXiv:2404.09790  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results

    Authors: Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, Zhijuan Huang, Yajun Zou, Yuan Huang, Jiamin Lin, Bingnan Han, Xianyu Guan, Yongsheng Yu, Daoan Zhang, Xuanwu Yin, Kunlong Zuo, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou, et al. (63 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge i…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 webpage: https://cvlai.net/ntire/2024. Code: https://github.com/zhengchen1999/NTIRE2024_ImageSR_x4

  42. arXiv:2404.09754  [pdf, other]

    cs.CL

    Resilience of Large Language Models for Noisy Instructions

    Authors: Bin Wang, Chengwei Wei, Zhengyuan Liu, Geyu Lin, Nancy F. Chen

    Abstract: As the rapidly advancing domain of natural language processing (NLP), large language models (LLMs) have emerged as powerful tools for interpreting human commands and generating text across various tasks. Nonetheless, the resilience of LLMs to handle text containing inherent errors, stemming from human interactions and collaborative systems, has not been thoroughly explored. Our study investigates…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 12 pages

  43. arXiv:2404.09674  [pdf, ps, other]

    cs.DS cs.DB cs.FL

    A Circus of Circuits: Connections Between Decision Diagrams, Circuits, and Automata

    Authors: Antoine Amarilli, Marcelo Arenas, YooJung Choi, Mikaël Monet, Guy Van den Broeck, Benjie Wang

    Abstract: This document is an introduction to two related formalisms to define Boolean functions: binary decision diagrams, and Boolean circuits. It presents these formalisms and several of their variants studied in the setting of knowledge compilation. Last, it explains how these formalisms can be connected to the notions of automata over words and trees.

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 26 pages

  44. arXiv:2404.09613  [pdf, other]

    cs.ET cs.AI cs.AR

    Efficient and accurate neural field reconstruction using resistive memory

    Authors: Yifei Yu, Shaocong Wang, Woyu Zhang, Xinyuan Zhang, Xiuzhe Wu, Yangu He, Jichang Yang, Yue Zhang, Ning Lin, Bo Wang, Xi Chen, Songqi Wang, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: Human beings construct perception of space by integrating sparse observations into massively interconnected synapses and neurons, offering a superior parallelism and efficiency. Replicating this capability in AI finds wide applications in medical imaging, AR/VR, and embodied AI, where input data is often sparse and computing resources are limited. However, traditional signal reconstruction methods…

    Submitted 15 April, 2024; originally announced April 2024.

  45. arXiv:2404.09533  [pdf, other]

    cs.CV cs.AI cs.LG

    WiTUnet: A U-Shaped Architecture Integrating CNN and Transformer for Improved Feature Alignment and Local Information Fusion

    Authors: Bin Wang, Fei Deng, Peifan Jiang, Shuang Wang, Xiao Han, Zhixuan Zhang

    Abstract: Low-dose computed tomography (LDCT) has become the technology of choice for diagnostic medical imaging, given its lower radiation dose compared to standard CT, despite increasing image noise and potentially affecting diagnostic accuracy. To address this, advanced deep learning-based LDCT denoising algorithms have been developed, primarily using Convolutional Neural Networks (CNNs) or Transformer N…

    Submitted 29 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  46. arXiv:2404.09515  [pdf, other]

    cs.CV

    Revealing the structure-property relationships of copper alloys with FAGC

    Authors: Yuexing Han, Guanxin Wan, Tao Han, Bing Wang, Yi Liu

    Abstract: Understanding how the structure of materials affects their properties is a cornerstone of materials science and engineering. However, traditional methods have struggled to accurately describe the quantitative structure-property relationships for complex structures. In our study, we bridge this gap by leveraging machine learning to analyze images of materials' microstructures, thus offering a novel…

    Submitted 18 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  47. arXiv:2404.08968  [pdf, other]

    cs.CV cs.LG

    MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes

    Authors: Bor-Shiun Wang, Chien-Yi Wang, Wei-Chen Chiu

    Abstract: Recent advancements in post-hoc and inherently interpretable methods have markedly enhanced the explanations of black box classifier models. These methods operate either through post-analysis or by integrating concept learning during model training. Although being effective in bridging the semantic gap between a model's latent space and human interpretation, these explanation methods only partiall…

    Submitted 23 April, 2024; v1 submitted 13 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  48. arXiv:2404.06842  [pdf, other]

    cs.CV

    MoCha-Stereo: Motif Channel Attention Network for Stereo Matching

    Authors: Ziyang Chen, Wei Long, He Yao, Yongjun Zhang, Bingshu Wang, Yongbin Qin, Jia Wu

    Abstract: Learning-based stereo matching techniques have made significant progress. However, existing methods inevitably lose geometrical structure information during the feature channel generation process, resulting in edge detail mismatches. In this paper, the Motif Channel Attention Stereo Matching Network (MoCha-Stereo) is designed to address this problem. We provide the Motif Channel Correlation Volum…

    Submitted 11 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024

    Journal ref: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024

  49. arXiv:2404.06516  [pdf, other]

    cs.GT cs.LG

    Convergence to Nash Equilibrium and No-regret Guarantee in (Markov) Potential Games

    Authors: Jing Dong, Baoxiang Wang, Yaoliang Yu

    Abstract: In this work, we study potential games and Markov potential games under stochastic cost and bandit feedback. We propose a variant of the Frank-Wolfe algorithm with sufficient exploration and recursive gradient estimation, which provably converges to the Nash equilibrium while attaining sublinear regret for each individual player. Our algorithm simultaneously achieves a Nash regret and a regret bou…

    Submitted 3 April, 2024; originally announced April 2024.

  50. arXiv:2404.06512  [pdf, other]

    cs.CV cs.CL

    InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

    Authors: Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang

    Abstract: The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its progression has been hindered by challenges in comprehending fine-grained visual content due to limited resolution. Recent efforts have aimed to enhance the high-resolution understanding capabilities of LVLMs, yet they remain capped at approximately 1500 x 1500 pixels and constrained to a relatively narrow reso…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Code and models are publicly available at https://github.com/InternLM/InternLM-XComposer