
Showing 1–50 of 1,088 results for author: Wang, F

Searching in archive cs.
  1. arXiv:2405.05722  [pdf, other]

    cs.LG

    A Framework of SO(3)-equivariant Non-linear Representation Learning and its Application to Electronic-Structure Hamiltonian Prediction

    Authors: Shi Yin, Xinyang Pan, Fengyan Wang, Feng Wu, Lixin He

    Abstract: We present both a theoretical and a methodological framework that addresses a critical challenge in applying deep learning to physical systems: the reconciliation of non-linear expressiveness with SO(3)-equivariance in predictions of SO(3)-equivariant quantities, such as the electronic-structure Hamiltonian. Inspired by covariant theory in physics, we address this problem by exploring the mathemat…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.05590  [pdf, other]

    cs.CR cs.AR cs.LG

    TroLLoc: Logic Locking and Layout Hardening for IC Security Closure against Hardware Trojans

    Authors: Fangzhou Wang, Qijing Wang, Lilas Alrahis, Bangqi Fu, Shui Jiang, Xiaopeng Zhang, Ozgur Sinanoglu, Tsung-Yi Ho, Evangeline F. Y. Young, Johann Knechtel

    Abstract: Due to cost benefits, supply chains of integrated circuits (ICs) are largely outsourced nowadays. However, passing ICs through various third-party providers gives rise to many security threats, like piracy of IC intellectual property or insertion of hardware Trojans, i.e., malicious circuit modifications. In this work, we proactively and systematically protect the physical layouts of ICs against…

    Submitted 9 May, 2024; originally announced May 2024.

  3. arXiv:2405.05083  [pdf, other]

    cs.CC cs.GT

    Committee Elections with Candidate Attribute Constraints

    Authors: Aizhong Zhou, Fengbo Wang, Jiong Guo

    Abstract: In many real-world applications of committee elections, the candidates are associated with certain attributes and the chosen committee is required to satisfy some constraints posed on the candidate attributes. For instance, in dress collocation, it is generally acknowledged that when wearing a tie, you'd better wear a shirt, and when wearing a suit, you'd better wear leather shoes. Here, dresses are…

    Submitted 8 May, 2024; originally announced May 2024.

  4. arXiv:2405.05062  [pdf, ps, other]

    cs.CC cs.GT

    Controlling Borda Elections by Adding or Deleting either Votes or Candidates: Complete and Top-Truncated Votes

    Authors: Aizhong Zhou, Fengbo Wang, Jiong Guo

    Abstract: An election is defined as a pair of a set of candidates $C=\{c_1,\cdots,c_m\}$ and a multiset of votes $V=\{v_1,\cdots,v_n\}$, where each vote is a linear order of the candidates. The Borda election rule is characterized by a vector $\langle m-1,m-2,\cdots,0\rangle$, which means that the candidate ranked at the $i$-th position of a vote $v$ receives a score $m-i$ from $v$, and the candidate receiving the most s…

    Submitted 8 May, 2024; originally announced May 2024.

  5. arXiv:2405.04940  [pdf, other]

    cs.CV

    Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID

    Authors: Wentao Tan, Changxing Ding, Jiayu Jiang, Fei Wang, Yibing Zhan, Dapeng Tao

    Abstract: Text-to-image person re-identification (ReID) retrieves pedestrian images according to textual descriptions. Manually annotating textual descriptions is time-consuming, restricting the scale of existing datasets and therefore the generalization ability of ReID models. As a result, we study the transferable text-to-image ReID problem, where we train a model on our proposed large-scale database and…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  6. arXiv:2405.03372  [pdf, other]

    cs.NI cs.AI

    Snake Learning: A Communication- and Computation-Efficient Distributed Learning Framework for 6G

    Authors: Xiaoxue Yu, Xingfu Yi, Rongpeng Li, Fei Wang, Chenghui Peng, Zhifeng Zhao, Honggang Zhang

    Abstract: In the evolution towards 6G, integrating Artificial Intelligence (AI) with advanced network infrastructure emerges as a pivotal strategy for enhancing network intelligence and resource utilization. Existing distributed learning frameworks like Federated Learning and Split Learning often struggle with significant challenges in dynamic network environments including high synchronization demands, cos…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 7 pages, 6 figures

  7. arXiv:2404.17929  [pdf, other]

    cs.CV cs.AI cs.CL

    Spatio-Temporal Side Tuning Pre-trained Foundation Models for Video-based Pedestrian Attribute Recognition

    Authors: Xiao Wang, Qian Zhu, Jiandong Jin, Jun Zhu, Futian Wang, Bo Jiang, Yaowei Wang, Yonghong Tian

    Abstract: Existing pedestrian attribute recognition (PAR) algorithms are mainly developed based on a static image; however, the performance is unreliable in challenging scenarios, such as heavy occlusion, motion blur, etc. In this work, we propose to understand human attributes using video frames that can fully use temporal information by fine-tuning a pre-trained multi-modal foundation model efficiently. S…

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: Parameter Efficient Fine-Tuning Strategy for Video-based Pedestrian Attribute Recognition

  8. arXiv:2404.16831  [pdf, other]

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  9. arXiv:2404.16223  [pdf, other]

    cs.CV eess.IV

    Deep RAW Image Super-Resolution. A NTIRE 2024 Challenge Survey

    Authors: Marcos V. Conde, Florin-Alexandru Vasluianu, Radu Timofte, Jianxing Zhang, Jia Li, Fan Wang, Xiaopeng Li, Zikun Liu, Hyunhee Park, Sejun Song, Changho Kim, Zhijuan Huang, Hongyuan Yu, Cheng Wan, Wending Xiang, Jiamin Lin, Hang Zhong, Qiaosong Zhang, Yue Sun, Xuanwu Yin, Kunlong Zuo, Senyan Xu, Siyuan Jiang, Zhijing Sun, Jiaying Zhu, et al. (10 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW Super-Resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as nois…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 - NTIRE Workshop

  10. arXiv:2404.15294  [pdf]

    eess.SP cs.LG

    Multimodal Physical Fitness Monitoring (PFM) Framework Based on TimeMAE-PFM in Wearable Scenarios

    Authors: Junjie Zhang, Zheming Zhang, Huachen Xiang, Yangquan Tan, Linnan Huo, Fengyi Wang

    Abstract: Physical function monitoring (PFM) plays a crucial role in healthcare especially for the elderly. Traditional assessment methods such as the Short Physical Performance Battery (SPPB) have failed to capture the full dynamic characteristics of physical function. Wearable sensors such as smart wristbands offer a promising solution to this issue. However, challenges exist, such as the computational co…

    Submitted 25 March, 2024; originally announced April 2024.

    Comments: 5 pages, 6 figures

  11. arXiv:2404.14394  [pdf, other]

    cs.AI cs.CL cs.CV

    A Multimodal Automated Interpretability Agent

    Authors: Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang, Achyuta Rajaram, Evan Hernandez, Jacob Andreas, Antonio Torralba

    Abstract: This paper describes MAIA, a Multimodal Automated Interpretability Agent. MAIA is a system that uses neural models to automate neural model understanding tasks like feature interpretation and failure mode discovery. It equips a pre-trained vision-language model with a set of tools that support iterative experimentation on subcomponents of other models to explain their behavior. These include tools…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 25 pages, 13 figures

  12. arXiv:2404.13777  [pdf, other]

    cs.HC

    Explainable Interfaces for Rapid Gaze-Based Interactions in Mixed Reality

    Authors: Mengjie Yu, Dustin Harris, Ian Jones, Ting Zhang, Yue Liu, Naveen Sendhilnathan, Narine Kokhlikyan, Fulton Wang, Co Tran, Jordan L. Livingston, Krista E. Taylor, Zhenhong Hu, Mary A. Hood, Hrvoje Benko, Tanya R. Jonker

    Abstract: Gaze-based interactions offer a potential way for users to naturally engage with mixed reality (XR) interfaces. Black-box machine learning models enabled higher accuracy for gaze-based interactions. However, due to the black-box nature of the model, users might not be able to understand and effectively adapt their gaze behaviour to achieve high quality interaction. We posit that explainable AI (XA…

    Submitted 21 April, 2024; originally announced April 2024.

  13. arXiv:2404.13765  [pdf, other]

    cs.HC

    SciDaSynth: Interactive Structured Knowledge Extraction and Synthesis from Scientific Literature with Large Language Model

    Authors: Xingbo Wang, Samantha L. Huey, Rui Sheng, Saurabh Mehta, Fei Wang

    Abstract: Extraction and synthesis of structured knowledge from extensive scientific literature are crucial for advancing and disseminating scientific progress. Although many existing systems facilitate literature review and digest, they struggle to process multimodal, varied, and inconsistent information within and across the literature into structured data. We introduce SciDaSynth, a novel interactive sys…

    Submitted 21 April, 2024; originally announced April 2024.

    Comments: 15 pages, 7 figures

  14. arXiv:2404.13680  [pdf, other]

    cs.CV cs.AI

    PoseAnimate: Zero-shot high fidelity pose controllable character animation

    Authors: Bingwen Zhu, Fanyi Wang, Tianyi Lu, Peng Liu, Jingwen Su, Jinxiu Liu, Yanhao Zhang, Zuxuan Wu, Yu-Gang Jiang, Guo-Jun Qi

    Abstract: Image-to-video (I2V) generation aims to create a video sequence from a single image, which requires high temporal coherence and visual fidelity with the source image. However, existing approaches suffer from character appearance inconsistency and poor preservation of fine details. Moreover, they require a large amount of video data for training, which can be computationally demanding. To address thes…

    Submitted 30 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

  15. arXiv:2404.12642  [pdf, other]

    cs.CL cs.CV

    Cooperative Sentiment Agents for Multimodal Sentiment Analysis

    Authors: Shanmin Wang, Hui Shuai, Qingshan Liu, Fei Wang

    Abstract: In this paper, we propose a new Multimodal Representation Learning (MRL) method for Multimodal Sentiment Analysis (MSA), which facilitates the adaptive interaction between modalities through Cooperative Sentiment Agents, named Co-SA. Co-SA comprises two critical components: the Sentiment Agents Establishment (SAE) phase and the Sentiment Agents Cooperation (SAC) phase. During the SAE phase, each s…

    Submitted 19 April, 2024; originally announced April 2024.

  16. arXiv:2404.11706  [pdf, other]

    cs.AI

    Pretraining Billion-scale Geospatial Foundational Models on Frontier

    Authors: Aristeidis Tsaris, Philipe Ambrozio Dias, Abhishek Potnis, Junqi Yin, Feiyi Wang, Dalton Lunga

    Abstract: As AI workloads increase in scope, generalization capability becomes challenging for small task-specific models and their demand for large amounts of labeled training samples increases. On the contrary, Foundation Models (FMs) are trained with internet-scale unlabeled data via self-supervised learning and have been shown to adapt to various tasks with minimal fine-tuning. Although large FMs have d…

    Submitted 17 April, 2024; originally announced April 2024.

  17. arXiv:2404.11045  [pdf, other]

    cs.CL

    Offset Unlearning for Large Language Models

    Authors: James Y. Huang, Wenxuan Zhou, Fei Wang, Fred Morstatter, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: Despite the strong capabilities of Large Language Models (LLMs) to acquire knowledge from their training corpora, the memorization of sensitive information in the corpora such as copyrighted, harmful, and private content has led to ethical and legal concerns. In response to these challenges, unlearning has emerged as a potential remedy for LLMs affected by problematic training data. However, previ…

    Submitted 16 April, 2024; originally announced April 2024.

  18. arXiv:2404.09697  [pdf, other]

    cs.CV

    HSIDMamba: Exploring Bidirectional State-Space Models for Hyperspectral Denoising

    Authors: Yang Liu, Jiahua Xiao, Yu Guo, Peilin Jiang, Haiwei Yang, Fei Wang

    Abstract: Effectively discerning spatial-spectral dependencies in HSI denoising is crucial, but prevailing methods using convolution or transformers still face computational efficiency limitations. Recently, the emerging Selective State Space Model (Mamba) has risen with its nearly linear computational complexity in processing natural language sequences, which inspired us to explore its potential in handling…

    Submitted 15 April, 2024; originally announced April 2024.

  19. arXiv:2404.09172  [pdf, other]

    cs.CV cs.AI

    LoopAnimate: Loopable Salient Object Animation

    Authors: Fanyi Wang, Peng Liu, Haotian Hu, Dan Meng, Jingwen Su, Jinjin Xu, Yanhao Zhang, Xiaoming Ren, Zhiwang Zhang

    Abstract: Research on diffusion model-based video generation has advanced rapidly. However, limitations in object fidelity and generation length hinder its practical applications. Additionally, specific domains like animated wallpapers require seamless looping, where the first and last frames of the video match seamlessly. To address these challenges, this paper proposes LoopAnimate, a novel method for gene…

    Submitted 16 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

  20. The Survey on Multi-Source Data Fusion in Cyber-Physical-Social Systems: Foundational Infrastructure for Industrial Metaverses and Industries 5.0

    Authors: Xiao Wang, Yutong Wang, Jing Yang, Xiaofeng Jia, Lijun Li, Weiping Ding, Fei-Yue Wang

    Abstract: As the concept of Industries 5.0 develops, industrial metaverses are expected to operate in parallel with the actual industrial processes to offer "Human-Centric" Safe, Secure, Sustainable, Sensitive, Service, and Smartness "6S" manufacturing solutions. Industrial metaverses not only visualize the process of productivity in a dynamic and evolutional way, but also provide an immersive laboratory…

    Submitted 11 April, 2024; originally announced April 2024.

    Journal ref: Information Fusion 2024

  21. arXiv:2404.06362  [pdf, other]

    cs.CV cs.AI

    Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero shot Medical Image Segmentation

    Authors: Sidra Aleem, Fangyijie Wang, Mayug Maniparambil, Eric Arazo, Julia Dietlmeier, Guenole Silvestre, Kathleen Curran, Noel E. O'Connor, Suzanne Little

    Abstract: The Segment Anything Model (SAM) and CLIP are remarkable vision foundation models (VFMs). SAM, a prompt driven segmentation model, excels in segmentation tasks across diverse domains, while CLIP is renowned for its zero shot recognition capabilities. However, their unified potential has not yet been explored in medical image segmentation. To adapt SAM to medical imaging, existing methods primarily…

    Submitted 30 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  22. arXiv:2404.05384  [pdf, other]

    cs.CV cs.AI

    Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance

    Authors: Dazhong Shen, Guanglu Song, Zeyue Xue, Fu-Yun Wang, Yu Liu

    Abstract: Classifier-Free Guidance (CFG) has been widely used in text-to-image diffusion models, where the CFG scale is introduced to control the strength of text guidance on the whole image space. However, we argue that a global CFG scale results in spatial inconsistency on varying semantic strengths and suboptimal image quality. To address this problem, we present a novel approach, Semantic-aware Classifi…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: accepted by CVPR-2024

  23. arXiv:2404.03736  [pdf, other]

    cs.CV

    SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer

    Authors: Zijie Wu, Chaohui Yu, Yanqin Jiang, Chenjie Cao, Fan Wang, Xiang Bai

    Abstract: Recent advances in 2D/3D generative models enable the generation of dynamic 3D objects from a single-view video. Existing approaches utilize score distillation sampling to form the dynamic scene as dynamic NeRF or dense 3D Gaussians. However, these methods struggle to strike a balance among reference view alignment, spatio-temporal consistency, and motion fidelity under single-view conditions due…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Project Page: https://sc4d.github.io/

  24. arXiv:2404.02823  [pdf, other]

    cs.CL cs.AI cs.LG

    Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models

    Authors: Haoran Sun, Lixin Liu, Junjie Li, Fengyu Wang, Baohua Dong, Ran Lin, Ruohui Huang

    Abstract: The ability of large language models (LLMs) to follow instructions is crucial to real-world applications. Despite recent advances, several studies have highlighted that LLMs struggle when faced with challenging instructions, especially those that include complex constraints, hindering their effectiveness in various tasks. To address this challenge, we introduce Conifer, a novel instruction tuning…

    Submitted 3 April, 2024; originally announced April 2024.

  25. arXiv:2404.02360  [pdf, other]

    cs.LG q-bio.BM

    FraGNNet: A Deep Probabilistic Model for Mass Spectrum Prediction

    Authors: Adamo Young, Fei Wang, David Wishart, Bo Wang, Hannes Röst, Russ Greiner

    Abstract: The process of identifying a compound from its mass spectrum is a critical step in the analysis of complex mixtures. Typical solutions for the mass spectrum to compound (MS2C) problem involve matching the unknown spectrum against a library of known spectrum-molecule pairs, an approach that is limited by incomplete library coverage. Compound to mass spectrum (C2MS) models can improve retrieval rate…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: 21 pages, 4 figures, 9 tables

  26. arXiv:2404.01720  [pdf, other]

    cs.CL

    Self-Improvement Programming for Temporal Knowledge Graph Question Answering

    Authors: Zhuo Chen, Zhao Zhang, Zixuan Li, Fei Wang, Yutao Zeng, Xiaolong Jin, Yongjun Xu

    Abstract: Temporal Knowledge Graph Question Answering (TKGQA) aims to answer questions with temporal intent over Temporal Knowledge Graphs (TKGs). The core challenge of this task lies in understanding the complex semantic information regarding multiple types of time constraints (e.g., before, first) in questions. Existing end-to-end methods implicitly model the time constraints by learning time-aware embedd…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted by LREC-COLING 2024 (long paper)

  27. arXiv:2404.01154  [pdf, other]

    cs.CV cs.AI

    Uncovering the Text Embedding in Text-to-Image Diffusion Models

    Authors: Hu Yu, Hao Luo, Fan Wang, Feng Zhao

    Abstract: The correspondence between input text and the generated image exhibits opacity, wherein minor textual modifications can induce substantial deviations in the generated image. Meanwhile, text embedding, as the pivotal intermediary between text and images, remains relatively underexplored. In this paper, we address this research gap by delving into the text embedding space, unleashing its capacity for co…

    Submitted 1 April, 2024; originally announced April 2024.

  28. arXiv:2404.00133  [pdf, other]

    cs.RO

    An Optimization-Based Planner with B-spline Parameterized Continuous-Time Reference Signals

    Authors: Chuyuan Tao, Sheng Cheng, Yang Zhao, Fanxin Wang, Naira Hovakimyan

    Abstract: For the cascaded planning and control modules implemented for robot navigation, the frequency gap between the planner and controller has received limited attention. In this study, we introduce a novel B-spline parameterized optimization-based planner (BSPOP) designed to address the frequency gap challenge with limited onboard computational power in robots. The proposed planner generates continuous…

    Submitted 29 March, 2024; originally announced April 2024.

  29. arXiv:2403.19517  [pdf, other]

    cs.CV

    XScale-NVS: Cross-Scale Novel View Synthesis with Hash Featurized Manifold

    Authors: Guangyu Wang, Jinzhi Zhang, Fan Wang, Ruqi Huang, Lu Fang

    Abstract: We propose XScale-NVS for high-fidelity cross-scale novel view synthesis of real-world large-scale scenes. Existing representations based on explicit surface suffer from discretization resolution or UV distortion, while implicit volumetric representations lack scalability for large scenes due to the dispersed weight distribution and surface ambiguity. In light of the above challenges, we introduce…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024. Project page: xscalenvs.github.io/

  30. arXiv:2403.19193  [pdf, other]

    cs.CV

    Text Data-Centric Image Captioning with Interactive Prompts

    Authors: Yiyu Wang, Hao Luo, Jungang Xu, Yingfei Sun, Fan Wang

    Abstract: Supervised image captioning approaches have made great progress, but it is challenging to collect high-quality human-annotated image-text data. Recently, large-scale vision and language models (e.g., CLIP) and large-scale generative language models (e.g., GPT-2) have shown strong performances in various tasks, which also provide some new solutions for image captioning with web paired data, unpaire…

    Submitted 28 March, 2024; originally announced March 2024.

  31. arXiv:2403.18134  [pdf, other]

    eess.IV cs.CV

    Integrative Graph-Transformer Framework for Histopathology Whole Slide Image Representation and Classification

    Authors: Zhan Shi, Jingwei Zhang, Jun Kong, Fusheng Wang

    Abstract: In digital pathology, the multiple instance learning (MIL) strategy is widely used in the weakly supervised histopathology whole slide image (WSI) classification task where giga-pixel WSIs are only labeled at the slide level. However, existing attention-based MIL approaches often overlook contextual information and intrinsic spatial relationships between neighboring tissue tiles, while graph-based…

    Submitted 26 March, 2024; originally announced March 2024.

  32. arXiv:2403.16539  [pdf, other]

    cs.CV

    DOrA: 3D Visual Grounding with Order-Aware Referring

    Authors: Tung-Yu Wu, Sheng-Yu Huang, Yu-Chiang Frank Wang

    Abstract: 3D visual grounding aims to identify the target object within a 3D point cloud scene referred to by a natural language description. While previous works attempt to exploit the verbo-visual relation with proposed cross-modal transformers, unstructured natural utterances and scattered objects might lead to undesirable performances. In this paper, we introduce DOrA, a novel 3D visual grounding framew…

    Submitted 25 March, 2024; originally announced March 2024.

  33. arXiv:2403.16149  [pdf, other]

    cs.CR cs.AI cs.LG

    A Survey on Consumer IoT Traffic: Security and Privacy

    Authors: Yan Jia, Yuxin Song, Zihou Liu, Qingyin Tan, Fangming Wang, Yu Zhang, Zheli Liu

    Abstract: For the past few years, the Consumer Internet of Things (CIoT) has entered public lives. While CIoT has improved the convenience of people's daily lives, it has also brought new security and privacy concerns. In this survey, we try to figure out what researchers can learn about the security and privacy of CIoT by traffic analysis, a popular method in the security community. From the security and p…

    Submitted 24 March, 2024; originally announced March 2024.

  34. arXiv:2403.16038  [pdf, other]

    cs.CL

    Monotonic Paraphrasing Improves Generalization of Language Model Prompting

    Authors: Qin Liu, Fei Wang, Nan Xu, Tianyi Yan, Tao Meng, Muhao Chen

    Abstract: Performance of large language models (LLMs) may vary with different prompts or instructions of even the same task. One commonly recognized factor for this phenomenon is the model's familiarity with the given prompt or instruction, which is typically estimated by its perplexity. However, finding the prompt with the lowest perplexity is challenging, given the enormous space of possible prompting phr…

    Submitted 18 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

    Comments: Under review at ARR 2024 April

  35. arXiv:2403.14676  [pdf, other]

    cs.CY cs.AI cs.LG

    Unified Uncertainty Estimation for Cognitive Diagnosis Models

    Authors: Fei Wang, Qi Liu, Enhong Chen, Chuanren Liu, Zhenya Huang, Jinze Wu, Shijin Wang

    Abstract: Cognitive diagnosis models have been widely used in different areas, especially intelligent education, to measure users' proficiency levels on knowledge concepts, based on which users can get personalized instructions. As the measurement is not always reliable due to the weak links of the models and data, the uncertainty of measurement also offers important information for decisions. However, the…

    Submitted 9 March, 2024; originally announced March 2024.

  36. arXiv:2403.14084  [pdf, other]

    math.NA cs.LG

    Learning-based Multi-continuum Model for Multiscale Flow Problems

    Authors: Fan Wang, Yating Wang, Wing Tat Leung, Zongben Xu

    Abstract: Multiscale problems can usually be approximated through numerical homogenization by an equation with some effective parameters that can capture the macroscopic behavior of the original system on the coarse grid to speed up the simulation. However, this approach usually assumes scale separation and that the heterogeneity of the solution can be approximated by the solution average in each coarse blo…

    Submitted 20 March, 2024; originally announced March 2024.

  37. arXiv:2403.13745  [pdf, other]

    cs.CV

    Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation

    Authors: Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang, Xiaoyu Shi, Dazhong Shen, Guanglu Song, Yu Liu, Hongsheng Li

    Abstract: Video outpainting is a challenging task, aiming at generating video content outside the viewport of the input video while maintaining inter-frame and intra-frame consistency. Existing methods fall short in either generation quality or flexibility. We introduce MOTIA (Mastering Video Outpainting Through Input-Specific Adaptation), a diffusion-based pipeline that leverages both the intrinsic data-spec…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Code will be available at https://github.com/G-U-N/Be-Your-Outpainter

  38. arXiv:2403.12959  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG cs.RO

    WHAC: World-grounded Humans and Cameras

    Authors: Wanqi Yin, Zhongang Cai, Ruisi Wang, Fanzhou Wang, Chen Wei, Haiyi Mei, Weiye Xiao, Zhitao Yang, Qingping Sun, Atsushi Yamashita, Ziwei Liu, Lei Yang

    Abstract: Estimating human and camera trajectories with accurate scale in the world coordinate system from a monocular video is a highly desirable yet challenging and ill-posed problem. In this study, we aim to recover expressive parametric human models (i.e., SMPL-X) and corresponding camera poses jointly, by leveraging the synergy between three critical players: the world, the human, and the camera. Our a…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Homepage: https://wqyin.github.io/projects/WHAC/

  39. arXiv:2403.11808  [pdf, other]

    cs.CV

    Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation

    Authors: Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You

    Abstract: Existing parameter-efficient fine-tuning (PEFT) methods have achieved significant success on vision transformers (ViTs) adaptation by improving parameter efficiency. However, enhancing inference efficiency during adaptation remains underexplored. This limits the broader application of pre-trained ViT models, especially when the model is computationally extensive. In this paper,…

    Submitted 18 March, 2024; originally announced March 2024.

  40. arXiv:2403.11163  [pdf, ps, other]

    stat.ME cs.LG math.ST stat.CO

    A Selective Review on Statistical Methods for Massive Data Computation: Distributed Computing, Subsampling, and Minibatch Techniques

    Authors: Xuetong Li, Yuan Gao, Hong Chang, Danyang Huang, Yingying Ma, Rui Pan, Haobo Qi, Feifei Wang, Shuyuan Wu, Ke Xu, Jing Zhou, Xuening Zhu, Yingqiu Zhu, Hansheng Wang

    Abstract: This paper presents a selective review of statistical computation methods for massive data analysis. A large number of statistical methods for massive data computation have been rapidly developed in the past decades. In this work, we focus on three categories of statistical computation methods: (1) distributed computing, (2) subsampling methods, and (3) minibatch gradient techniques. The first clas…

    Submitted 17 March, 2024; originally announced March 2024.

  41. eKichabi v2: Designing and Scaling a Dual-Platform Agricultural Technology in Rural Tanzania

    Authors: Ananditha Raghunath, Alexander Metzger, Hans Easton, XunMei Liu, Fanchong Wang, Yunqi Wang, Yunwei Zhao, Hosea Mpogole, Richard Anderson

    Abstract: Although farmers in Sub-Saharan Africa are accessing feature phones and smartphones at historically high rates, they face challenges finding a robust network of agricultural contacts. With collaborators, we conduct a quantitative survey of 1014 agricultural households in Kagera, Tanzania to characterize technology access, use, and comfort levels in the region. Recognizing the paucity of research o…

    Submitted 14 March, 2024; originally announced March 2024.

  42. arXiv:2403.09338  [pdf, other]

    cs.CV cs.AI

    LocalMamba: Visual State Space Model with Windowed Selective Scan

    Authors: Tao Huang, Xiaohuan Pei, Shan You, Fei Wang, Chen Qian, Chang Xu

    Abstract: Recent advancements in state space models, notably Mamba, have demonstrated significant progress in modeling long sequences for tasks like language understanding. Yet, their application in vision tasks has not markedly surpassed the performance of traditional Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). This paper posits that the key to enhancing Vision Mamba (ViM) lies in…

    Submitted 14 March, 2024; originally announced March 2024.

  43. arXiv:2403.09296  [pdf, other]

    cs.CV

    Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models

    Authors: Yu-Chu Yu, Chi-Pin Huang, Jr-Jen Chen, Kai-Po Chang, Yung-Hsuan Lai, Fu-En Yang, Yu-Chiang Frank Wang

    Abstract: Large-scale vision-language models (VLMs) have shown a strong zero-shot generalization capability on unseen-domain data. However, when adapting pre-trained VLMs to a sequence of downstream tasks, they are prone to forgetting previously learned knowledge and degrade their zero-shot classification capability. To tackle this problem, we propose a unique Selective Dual-Teacher Knowledge Transfer frame…

    Submitted 14 March, 2024; originally announced March 2024.

  44. arXiv:2403.09121  [pdf, other]

    cs.HC

    OutlineSpark: Igniting AI-powered Presentation Slides Creation from Computational Notebooks through Outlines

    Authors: Fengjie Wang, Yanna Lin, Leni Yang, Haotian Li, Mingyang Gu, Min Zhu, Huamin Qu

    Abstract: Computational notebooks are widely utilized for exploration and analysis. However, creating slides to communicate analysis results from these notebooks is quite tedious and time-consuming. Researchers have proposed automatic systems for generating slides from notebooks, which, however, often do not consider the process of users conceiving and organizing their messages from massive code cells. Thos…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: To appear in Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI 2024)

  45. arXiv:2403.08826  [pdf, other]

    cs.HC cs.LG

    A Dataset for the Validation of Truth Inference Algorithms Suitable for Online Deployment

    Authors: Fei Wang, Haoyu Liu, Haoyang Bi, Xiangzhuang Shen, Renyu Zhu, Runze Wu, Minmin Lin, Tangjie Lv, Changjie Fan, Qi Liu, Zhenya Huang, Enhong Chen

    Abstract: For the purpose of efficient and cost-effective large-scale data labeling, crowdsourcing is increasingly being utilized. To guarantee the quality of data labeling, multiple annotations need to be collected for each data sample, and truth inference algorithms have been developed to accurately infer the true labels. Despite previous studies having released public datasets to evaluate the efficacy of…

    Submitted 10 March, 2024; originally announced March 2024.

  46. arXiv:2403.08271  [pdf, other]

    cs.CV cs.AI

    Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification

    Authors: Long Lan, Fengxiang Wang, Shuyan Li, Xiangtao Zheng, Zengmao Wang, Xinwang Liu

    Abstract: Fine-grained ship classification in remote sensing (RS-FGSC) poses a significant challenge due to the high similarity between classes and the limited availability of labeled data, limiting the effectiveness of traditional supervised classification methods. Recent advancements in large pre-trained Vision-Language Models (VLMs) have demonstrated impressive capabilities in few-shot or zero-shot learn…

    Submitted 13 March, 2024; originally announced March 2024.

  47. arXiv:2403.08002  [pdf, other]

    cs.CL cs.CV

    Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation

    Authors: Juan Manuel Zambrano Chaves, Shih-Cheng Huang, Yanbo Xu, Hanwen Xu, Naoto Usuyama, Sheng Zhang, Fei Wang, Yujia Xie, Mahmoud Khademi, Ziyi Yang, Hany Awadalla, Julia Gong, Houdong Hu, Jianwei Yang, Chunyuan Li, Jianfeng Gao, Yu Gu, Cliff Wong, Mu Wei, Tristan Naumann, Muhao Chen, Matthew P. Lungren, Serena Yeung-Levy, Curtis P. Langlotz, Sheng Wang, et al. (1 additional author not shown)

    Abstract: The scaling laws and extraordinary performance of large foundation models motivate the development and utilization of such models in biomedicine. However, despite early promising results on some biomedical benchmarks, there are still major challenges that need to be addressed before these models can be used in real-world clinics. Frontier general-domain models such as GPT-4V still have significant…

    Submitted 3 May, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  48. arXiv:2403.07560  [pdf, other]

    cs.CV

    Unleashing Network Potentials for Semantic Scene Completion

    Authors: Fengyun Wang, Qianru Sun, Dong Zhang, Jinhui Tang

    Abstract: Semantic scene completion (SSC) aims to predict complete 3D voxel occupancy and semantics from a single-view RGB-D image, and recent SSC methods commonly adopt multi-modal inputs. However, our investigation reveals two limitations: ineffective feature learning from single modalities and overfitting to limited datasets. To address these issues, this paper proposes a novel SSC framework - Adversaria…

    Submitted 14 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: accepted by CVPR2024

  49. arXiv:2403.07347  [pdf, other]

    cs.CV

    Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture

    Authors: Fei Wang, Dan Guo, Kun Li, Zhun Zhong, Meng Wang

    Abstract: Video Motion Magnification (VMM) aims to reveal subtle and imperceptible motion information of objects in the macroscopic world. Prior methods directly model the motion field from the Eulerian perspective by Representation Learning that separates shape and texture or Multi-domain Learning from phase fluctuations. Inspired by the frequency spectrum, we observe that the low-frequency components with…

    Submitted 24 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  50. arXiv:2403.07185  [pdf, other]

    cs.LG stat.ML

    Uncertainty in Graph Neural Networks: A Survey

    Authors: Fangxin Wang, Yuqing Liu, Kay Liu, Yibo Wang, Sourav Medya, Philip S. Yu

    Abstract: Graph Neural Networks (GNNs) have been extensively used in various real-world applications. However, the predictive uncertainty of GNNs stemming from diverse sources such as inherent randomness in data and model training errors can lead to unstable and erroneous predictions. Therefore, identifying, quantifying, and utilizing uncertainty are essential to enhance the performance of the model for the…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 13 main pages, 3 figures, 1 table. Under review