
Showing 1–50 of 839 results for author: Luo, J

Searching in archive cs.
  1. arXiv:2405.05616 [pdf, other]

    cs.CL cs.AI

    G-SAP: Graph-based Structure-Aware Prompt Learning over Heterogeneous Knowledge for Commonsense Reasoning

    Authors: Ruiting Dai, Yuqiao Tan, Lisi Mo, Shuang Liang, Guohao Huo, Jiayi Luo, Yao Cheng

    Abstract: Commonsense question answering has demonstrated considerable potential across various applications like assistants and social robots. Although fully fine-tuned pre-trained Language Models (LM) have achieved remarkable performance in commonsense reasoning, their tendency to excessively prioritize textual information hampers the precise transfer of structural knowledge and undermines interpretability…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.00361 [pdf, other]

    cs.CL

    AdaMoLE: Fine-Tuning Large Language Models with Adaptive Mixture of Low-Rank Adaptation Experts

    Authors: Zefang Liu, Jiahua Luo

    Abstract: We introduce AdaMoLE, a novel method for fine-tuning large language models (LLMs) through an Adaptive Mixture of Low-Rank Adaptation (LoRA) Experts. Moving beyond conventional methods that employ a static top-k strategy for activating experts, AdaMoLE dynamically adjusts the activation threshold using a dedicated threshold network, adaptively responding to the varying complexities of different tas…

    Submitted 1 May, 2024; originally announced May 2024.

  3. arXiv:2404.18528 [pdf, other]

    cs.LG

    Generation of Uncorrelated Residual Variables for Chemical Process Fault Diagnosis via Transfer Learning-based Input-Output Decoupled Network

    Authors: Zhuofu Pan, Qingkai Sui, Yalin Wang, Jiang Luo, Jie Chen, Hongtian Chen

    Abstract: Structural decoupling has played an essential role in model-based fault isolation and estimation in past decades, which facilitates accurate fault localization and reconstruction thanks to the diagonal transfer matrix design. However, traditional methods exhibit limited effectiveness in modeling high-dimensional nonlinearity and big data, and the decoupling idea has not been well-valued in data-dr…

    Submitted 29 April, 2024; originally announced April 2024.

  4. arXiv:2404.16821 [pdf, other]

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  5. arXiv:2404.15895 [pdf, other]

    cs.CY cs.CR

    Global Trends in Cryptocurrency Regulation: An Overview

    Authors: Xihan Xiong, Junliang Luo

    Abstract: Cryptocurrencies have evolved into an important asset class, providing a variety of benefits. However, they also present significant risks, such as market volatility and the potential for misuse in illegal activities. These risks underline the urgent need for a comprehensive regulatory framework to ensure consumer protection, market integrity, and financial stability. Yet, the global landscape of…

    Submitted 24 April, 2024; originally announced April 2024.

  6. arXiv:2404.15532 [pdf, other]

    cs.HC cs.AI cs.CL cs.CV cs.MA

    BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis

    Authors: Shuhang Lin, Wenyue Hua, Lingyao Li, Che-Jui Chang, Lizhou Fan, Jianchao Ji, Hang Hua, Mingyu Jin, Jiebo Luo, Yongfeng Zhang

    Abstract: This paper presents BattleAgent, an emulation system that combines the Large Vision-Language Model and Multi-agent System. This novel system aims to simulate complex dynamic interactions among multiple agents, as well as between agents and their environments, over a period of time. It emulates both the decision-making processes of leaders and the viewpoints of ordinary participants, such as soldie…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 26 pages, 14 figures. The data and code for this project are accessible at https://github.com/agiresearch/battleagent

  7. arXiv:2404.15272 [pdf, other]

    cs.CV cs.AI cs.CL

    CT-GLIP: 3D Grounded Language-Image Pretraining with CT Scans and Radiology Reports for Full-Body Scenarios

    Authors: Jingyang Lin, Yingda Xia, Jianpeng Zhang, Ke Yan, Le Lu, Jiebo Luo, Ling Zhang

    Abstract: Medical Vision-Language Pretraining (Med-VLP) establishes a connection between visual content from medical images and the relevant textual descriptions. Existing Med-VLP methods primarily focus on 2D images depicting a single body part, notably chest X-rays. In this paper, we extend the scope of Med-VLP to encompass 3D images, specifically targeting full-body scenarios, by using a multimodal datas…

    Submitted 28 April, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

    Comments: 12 pages, 5 figures, 3 tables

  8. arXiv:2404.14715 [pdf, other]

    cs.CV cs.CL

    FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

    Authors: Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo

    Abstract: Recent progress in large-scale pre-training has led to the development of advanced vision-language models (VLMs) with remarkable proficiency in comprehending and generating multimodal content. Despite the impressive ability to perform complex reasoning for VLMs, current models often struggle to effectively and precisely capture the compositional information on both the image and text sides. To add…

    Submitted 22 April, 2024; originally announced April 2024.

  9. arXiv:2404.14581 [pdf, other]

    cs.CV cs.AI cs.CR

    The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking

    Authors: Yuying Li, Zeyan Liu, Junyi Zhao, Liangqin Ren, Fengjun Li, Jiebo Luo, Bo Luo

    Abstract: Generative AI models can produce high-quality images based on text prompts. The generated images often appear indistinguishable from images generated by conventional optical photography devices or created by human artists (i.e., real images). While the outstanding performance of such generative models is generally well received, security concerns arise. For instance, such image generators could be…

    Submitted 22 April, 2024; originally announced April 2024.

  10. arXiv:2404.14047 [pdf, other]

    cs.LG

    How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study

    Authors: Wei Huang, Xudong Ma, Haotong Qin, Xingyu Zheng, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno

    Abstract: Meta's LLaMA family has become one of the most powerful open-source Large Language Model (LLM) series. Notably, LLaMA3 models have recently been released and achieve impressive performance across various with super-large scale pre-training on over 15T tokens of data. Given the wide application of low-bit quantization for LLMs in resource-limited scenarios, we explore LLaMA3's capabilities when qua…

    Submitted 22 April, 2024; originally announced April 2024.

  11. arXiv:2404.13820 [pdf, other]

    cs.CC cs.NE

    Prove Symbolic Regression is NP-hard by Symbol Graph

    Authors: Jinglu Song, Qiang Lu, Bozhou Tian, Jingwen Zhang, Jake Luo, Zhiguang Wang

    Abstract: Symbolic regression (SR) is the task of discovering a symbolic expression that fits a given data set from the space of mathematical expressions. Despite the abundance of research surrounding the SR problem, there's a scarcity of works that confirm its NP-hard nature. Therefore, this paper introduces the concept of a symbol graph as a comprehensive representation of the entire mathematical expressi…

    Submitted 21 April, 2024; originally announced April 2024.

  12. arXiv:2404.13441 [pdf]

    physics.app-ph cs.LG

    Machine Learning-Assisted Thermoelectric Cooling for On-Demand Multi-Hotspot Thermal Management

    Authors: Jiajian Luo, Jaeho Lee

    Abstract: The rapid emergence of System-on-Chip (SoC) technology introduces multiple dynamic hotspots with spatial and temporal evolution to the system, necessitating a more efficient, sophisticated, and intelligent approach to achieve on-demand thermal management. In this study, we present a novel machine learning-assisted optimization algorithm for thermoelectric coolers (TECs) that can achieve global opt…

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: This article has been submitted to the Journal of Applied Physics and is under review

  13. arXiv:2404.12353 [pdf, other]

    cs.CV cs.AI

    V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning

    Authors: Hang Hua, Yunlong Tang, Chenliang Xu, Jiebo Luo

    Abstract: Video summarization aims to create short, accurate, and cohesive summaries of longer videos. Despite the existence of various video summarization datasets, a notable limitation is their limited amount of source videos, which hampers the effective fine-tuning of advanced large vision-language models (VLMs). Additionally, most existing datasets are created for video-to-video summarization, overlooki…

    Submitted 18 April, 2024; originally announced April 2024.

  14. arXiv:2404.10234 [pdf, other]

    cs.AI cs.CV cs.IR

    Compressible and Searchable: AI-native Multi-Modal Retrieval System with Learned Image Compression

    Authors: Jixiang Luo

    Abstract: The burgeoning volume of digital content across diverse modalities necessitates efficient storage and retrieval methods. Conventional approaches struggle to cope with the escalating complexity and scale of multimedia data. In this paper, we proposed framework addresses this challenge by fusing AI-native multi-modal search capabilities with neural image compression. First we analyze the intricate r…

    Submitted 15 April, 2024; originally announced April 2024.

  15. arXiv:2404.09690 [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Harnessing GPT-4V(ision) for Insurance: A Preliminary Exploration

    Authors: Chenwei Lin, Hanjia Lyu, Jiebo Luo, Xian Xu

    Abstract: The emergence of Large Multimodal Models (LMMs) marks a significant milestone in the development of artificial intelligence. Insurance, as a vast and complex discipline, involves a wide variety of data forms in its operational processes, including text, images, and videos, thereby giving rise to diverse multimodal tasks. Despite this, there has been limited systematic exploration of multimodal tas…

    Submitted 15 April, 2024; originally announced April 2024.

  16. arXiv:2404.08001 [pdf, other]

    hep-ph cs.AI cs.CL cs.LG hep-ex physics.comp-ph

    Xiwu: A Basis Flexible and Learnable LLM for High Energy Physics

    Authors: Zhengde Zhang, Yiyu Zhang, Haodong Yao, Jianwen Luo, Rui Zhao, Bo Huang, Jiameng Zhao, Yipu Liao, Ke Li, Lina Zhao, Jun Cao, Fazhi Qi, Changzheng Yuan

    Abstract: Large Language Models (LLMs) are undergoing a period of rapid updates and changes, with state-of-the-art (SOTA) model frequently being replaced. When applying LLMs to a specific scientific field, it's challenging to acquire unique domain knowledge while keeping the model itself advanced. To address this challenge, a sophisticated large language model system named as Xiwu has been developed, allowi…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 15 pages, 8 figures

    ACM Class: I.2.7

  17. arXiv:2404.05781 [pdf, other]

    q-bio.NC cs.LG

    Group-specific discriminant analysis reveals statistically validated sex differences in lateralization of brain functional network

    Authors: Shuo Zhou, Junhao Luo, Yaya Jiang, Haolin Wang, Haiping Lu, Gaolang Gong

    Abstract: Lateralization is a fundamental feature of the human brain, where sex differences have been observed. Conventional studies in neuroscience on sex-specific lateralization are typically conducted on univariate statistical comparisons between male and female groups. However, these analyses often lack effective validation of group specificity. Here, we formulate modeling sex differences in lateralizat…

    Submitted 8 April, 2024; originally announced April 2024.

  18. arXiv:2404.05014 [pdf, other]

    cs.CV

    MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators

    Authors: Shenghai Yuan, Jinfa Huang, Yujun Shi, Yongqi Xu, Ruijie Zhu, Bin Lin, Xinhua Cheng, Li Yuan, Jiebo Luo

    Abstract: Recent advances in Text-to-Video generation (T2V) have achieved remarkable success in synthesizing high-quality general videos from textual descriptions. A largely overlooked problem in T2V is that existing models have not adequately encoded physical knowledge of the real world, thus generated videos tend to have limited motion and poor variations. In this paper, we propose MagicTime, a m…

    Submitted 7 April, 2024; originally announced April 2024.

  19. arXiv:2404.04848 [pdf, other]

    eess.IV cs.AI cs.CV

    Task-Aware Encoder Control for Deep Video Compression

    Authors: Xingtong Ge, Jixiang Luo, Xinjie Zhang, Tongda Xu, Guo Lu, Dailan He, Jing Geng, Yan Wang, Jun Zhang, Hongwei Qin

    Abstract: Prior research on deep video compression (DVC) for machine tasks typically necessitates training a unique codec for each specific task, mandating a dedicated decoder per task. In contrast, traditional video codecs employ a flexible encoder controller, enabling the adaptation of a single codec to different tasks through mechanisms like mode prediction. Drawing inspiration from this, we introduce an…

    Submitted 20 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  20. arXiv:2404.00034 [pdf, other]

    q-fin.ST cs.LG q-fin.GN

    Investigating Similarities Across Decentralized Financial (DeFi) Services

    Authors: Junliang Luo, Stefan Kitzler, Pietro Saggese

    Abstract: We explore the adoption of graph representation learning (GRL) algorithms to investigate similarities across services offered by Decentralized Finance (DeFi) protocols. Following existing literature, we use Ethereum transaction data to identify the DeFi building blocks. These are sets of protocol-specific smart contracts that are utilized in combination within single transactions and encapsulate t…

    Submitted 23 March, 2024; originally announced April 2024.

    Report number: Chainsci/2024/23

  21. arXiv:2403.19940 [pdf, other]

    cs.RO

    MoMa-Pos: Where Should Mobile Manipulators Stand in Cluttered Environment Before Task Execution?

    Authors: Beichen Shao, Yan Ding, Xingchen Wang, Xuefeng Xie, Fuqiang Gu, Jun Luo, Chao Chen

    Abstract: Mobile manipulators always need to determine feasible base positions prior to carrying out navigation-manipulation tasks. Real-world environments are often cluttered with various furniture, obstacles, and dozens of other objects. Efficiently computing base positions poses a challenge. In this work, we introduce a framework named MoMa-Pos to address this issue. MoMa-Pos first learns to predict a sm…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Submitted to IROS 2024

  22. arXiv:2403.19545 [pdf, other]

    cs.RO cs.AI

    Lamarckian Inheritance Improves Robot Evolution in Dynamic Environments

    Authors: Jie Luo, Karine Miras, Carlo Longhi, Oliver Weissl, Agoston E. Eiben

    Abstract: This study explores the integration of Lamarckian system into evolutionary robotics (ER), comparing it with the traditional Darwinian model across various environments. By adopting Lamarckian principles, where robots inherit learned traits, alongside Darwinian learning without inheritance, we investigate adaptation in dynamic settings. Our research, conducted in six distinct environmental setups,…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Nature. arXiv admin note: substantial text overlap with arXiv:2309.13099; text overlap with arXiv:2303.12594, arXiv:2309.14387

  23. arXiv:2403.19066 [pdf, other]

    cs.CV cs.AI

    Generative Quanta Color Imaging

    Authors: Vishal Purohit, Junjie Luo, Yiheng Chi, Qi Guo, Stanley H. Chan, Qiang Qiu

    Abstract: The astonishing development of single-photon cameras has created an unprecedented opportunity for scientific and industrial imaging. However, the high data throughput generated by these 1-bit sensors creates a significant bottleneck for low-power applications. In this paper, we explore the possibility of generating a color image from a single binary frame of a single-photon camera. We evidently fi…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: Accepted at IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  24. arXiv:2403.18784 [pdf, other]

    cs.CV

    SplatFace: Gaussian Splat Face Reconstruction Leveraging an Optimizable Surface

    Authors: Jiahao Luo, Jing Liu, James Davis

    Abstract: We present SplatFace, a novel Gaussian splatting framework designed for 3D human face reconstruction without reliance on accurate pre-determined geometry. Our method is designed to simultaneously deliver both high-quality novel view rendering and accurate 3D mesh reconstructions. We incorporate a generic 3D Morphable Model (3DMM) to provide a surface geometric structure, making it possible to reco…

    Submitted 29 March, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  25. Masked Multi-Domain Network: Multi-Type and Multi-Scenario Conversion Rate Prediction with a Single Model

    Authors: Wentao Ouyang, Xiuwu Zhang, Chaofeng Guo, Shukui Ren, Yupei Sui, Kun Zhang, Jinmei Luo, Yunfeng Chen, Dongbo Xu, Xiangzheng Liu, Yanlong Du

    Abstract: In real-world advertising systems, conversions have different types in nature and ads can be shown in different display scenarios, both of which highly impact the actual conversion rate (CVR). This results in the multi-type and multi-scenario CVR prediction problem. A desired model for this problem should satisfy the following requirements: 1) Accuracy: the model should achieve fine-grained accura…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: CIKM 2023 (larger figures)

  26. arXiv:2403.17000 [pdf, other]

    cs.CV cs.MM

    Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution

    Authors: Zhikai Chen, Fuchen Long, Zhaofan Qiu, Ting Yao, Wengang Zhou, Jiebo Luo, Tao Mei

    Abstract: Diffusion models are just at a tipping point for image super-resolution task. Nevertheless, it is not trivial to capitalize on diffusion models for video super-resolution which necessitates not only the preservation of visual appearance from low-resolution to high-resolution videos, but also the temporal consistency across video frames. In this paper, we propose a novel approach, pursuing Spatial…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  27. arXiv:2403.16855 [pdf, other]

    eess.SY cs.IT cs.LG cs.NI

    Semantic-Aware Remote Estimation of Multiple Markov Sources Under Constraints

    Authors: Jiping Luo, Nikolaos Pappas

    Abstract: This paper studies semantic-aware communication for remote estimation of multiple Markov sources over a lossy and rate-constrained channel. Unlike most existing studies that treat all source states equally, we exploit the semantics of information and consider that the remote actuator has different tolerances for the estimation errors of different states. We aim to find an optimal scheduling policy…

    Submitted 25 March, 2024; originally announced March 2024.

  28. arXiv:2403.16494 [pdf, other]

    cs.CV

    CT-Bound: Fast Boundary Estimation From Noisy Images Via Hybrid Convolution and Transformer Neural Networks

    Authors: Wei Xu, Junjie Luo, Qi Guo

    Abstract: We present CT-Bound, a fast boundary estimation method for noisy images using a hybrid Convolution and Transformer neural network. The proposed architecture decomposes boundary estimation into two tasks: local detection and global regularization of image boundaries. It first estimates a parametric representation of boundary structures only using the input image within a small receptive field and t…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 8 pages, 6 figures

  29. arXiv:2403.13667 [pdf, other]

    cs.CV cs.MM

    DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance

    Authors: Zixuan Wang, Jia Jia, Shikun Sun, Haozhe Wu, Rong Han, Zhenyu Li, Di Tang, Jiaqing Zhou, Jiebo Luo

    Abstract: Choreographers determine what the dances look like, while cameramen determine the final presentation of dances. Recently, various methods and datasets have showcased the feasibility of dance synthesis. However, camera movement synthesis with music and dance remains an unsolved challenging problem due to the scarcity of paired data. Thus, we present DCM, a new multi-modal 3D dataset, which for the…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  30. arXiv:2403.13301 [pdf]

    cs.HC cs.CL

    Reading Users' Minds from What They Say: An Investigation into LLM-based Empathic Mental Inference

    Authors: Qihao Zhu, Leah Chong, Maria Yang, Jianxi Luo

    Abstract: In human-centered design, developing a comprehensive and in-depth understanding of user experiences, i.e., empathic understanding, is paramount for designing products that truly meet human needs. Nevertheless, accurately comprehending the real underlying mental states of a large human population remains a significant challenge today. This difficulty mainly arises from the trade-off between depth a…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Submitted to IDETC-CIE2024

  31. arXiv:2403.13030 [pdf, other]

    eess.IV cs.CV

    Super-High-Fidelity Image Compression via Hierarchical-ROI and Adaptive Quantization

    Authors: Jixiang Luo, Yan Wang, Hongwei Qin

    Abstract: Learned Image Compression (LIC) has achieved dramatic progress regarding objective and subjective metrics. MSE-based models aim to improve objective metrics while generative models are leveraged to improve visual quality measured by subjective metrics. However, they all suffer from blurring or deformation at low bit rates, especially at below 0.2 bpp. Besides, deformation on human faces and text…

    Submitted 19 March, 2024; originally announced March 2024.

  32. arXiv:2403.13002 [pdf, other]

    cs.HC cs.AI cs.CL

    AutoTRIZ: Artificial Ideation with TRIZ and Large Language Models

    Authors: Shuo Jiang, Jianxi Luo

    Abstract: Researchers and innovators have made enormous efforts in developing ideation methods, such as morphological analysis and design-by-analogy, to aid engineering design ideation for problem solving and innovation. Among these, TRIZ stands out as the most well-known approach, widely applied for systematic innovation. However, the complexity of TRIZ resources and concepts, coupled with its reliance on…

    Submitted 2 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: 13 pages, 6 figures

    ACM Class: I.2.7; I.2.1

  33. arXiv:2403.12910 [pdf, other]

    cs.RO cs.AI cs.LG

    Yell At Your Robot: Improving On-the-Fly from Language Corrections

    Authors: Lucy Xiaoyang Shi, Zheyuan Hu, Tony Z. Zhao, Archit Sharma, Karl Pertsch, Jianlan Luo, Sergey Levine, Chelsea Finn

    Abstract: Hierarchical policies that combine language and low-level control have been shown to perform impressively long-horizon robotic tasks, by leveraging either zero-shot high-level planners like pretrained language and vision-language models (LLMs/VLMs) or models trained on annotated robotic demonstrations. However, for complex and dexterous skills, attaining high success rates on long-horizon tasks st…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project website: https://yay-robot.github.io/

  34. arXiv:2403.07798   

    cs.CV

    A Fourier Transform Framework for Domain Adaptation

    Authors: Le Luo, Bingrong Xu, Qingyong Zhang, Cheng Lian, Jie Luo

    Abstract: By using unsupervised domain adaptation (UDA), knowledge can be transferred from a label-rich source domain to a target domain that contains relevant information but lacks labels. Many existing UDA algorithms suffer from directly using raw images as input, resulting in models that overly focus on redundant information and exhibit poor generalization capability. To address this issue, we attempt to…

    Submitted 21 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: The paper contains significant errors and the experimental methodology is not rigorous. The experimental section and methodology need to be rewritten

  35. arXiv:2403.05326 [pdf, other]

    cs.CL cs.AI

    ChatASU: Evoking LLM's Reflexion to Truly Understand Aspect Sentiment in Dialogues

    Authors: Yiding Liu, Jingjing Wang, Jiamin Luo, Tao Zeng, Guodong Zhou

    Abstract: Aspect Sentiment Understanding (ASU) in interactive scenarios (e.g., Question-Answering and Dialogue) has attracted ever-more interest in recent years and achieved important progresses. However, existing studies on interactive ASU largely ignore the coreference issue for opinion targets (i.e., aspects), while this phenomenon is ubiquitous in interactive scenarios especially dialogues, limiting the…

    Submitted 10 April, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  36. arXiv:2403.04789 [pdf, other]

    cs.CL cs.AI cs.LG

    TopicDiff: A Topic-enriched Diffusion Approach for Multimodal Conversational Emotion Detection

    Authors: Jiamin Luo, Jingjing Wang, Guodong Zhou

    Abstract: Multimodal Conversational Emotion (MCE) detection, generally spanning across the acoustic, vision and language modalities, has attracted increasing interest in the multimedia community. Previous studies predominantly focus on learning contextual information in conversations with only a few considering the topic information in single language modality, while always neglecting the acoustic and visio…

    Submitted 10 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  37. arXiv:2403.01216 [pdf, other]

    cs.CL cs.AI cs.LG

    API Is Enough: Conformal Prediction for Large Language Models Without Logit-Access

    Authors: Jiayuan Su, Jing Luo, Hongwei Wang, Lu Cheng

    Abstract: This study aims to address the pervasive challenge of quantifying uncertainty in large language models (LLMs) without logit-access. Conformal Prediction (CP), known for its model-agnostic and distribution-free features, is a desired approach for various LLMs and data distributions. However, existing CP methods for LLMs typically assume access to the logits, which are unavailable for some API-only…

    Submitted 3 April, 2024; v1 submitted 2 March, 2024; originally announced March 2024.

  38. arXiv:2402.19116 [pdf, other]

    cs.CL cs.AI

    How to Understand "Support"? An Implicit-enhanced Causal Inference Approach for Weakly-supervised Phrase Grounding

    Authors: Jiamin Luo, Jianing Zhao, Jingjing Wang, Guodong Zhou

    Abstract: Weakly-supervised Phrase Grounding (WPG) is an emerging task of inferring the fine-grained phrase-region matching, while merely leveraging the coarse-grained sentence-image pairs for training. However, existing studies on WPG largely ignore the implicit phrase-region matching relations, which are crucial for evaluating the capability of models in understanding the deep multimodal semantics. To thi…

    Submitted 4 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  39. Complexity of Manipulation and Bribery in Premise-Based Judgment Aggregation with Simple Formulas

    Authors: Robert Bredereck, Junjie Luo

    Abstract: Judgment aggregation is a framework to aggregate individual opinions on multiple, logically connected issues into a collective outcome. These opinions are cast by judges, which can be for example referees, experts, advisors or jurors, depending on the application and context. It is open to manipulative attacks such as Manipulation where judges cast their judgments strategically. Previous…

    Submitted 29 March, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

    Journal ref: Information and Computation, 2024, 296: 105128

  40. arXiv:2402.15700 [pdf, other]

    cs.LG cs.AI cs.CL

    CoRelation: Boosting Automatic ICD Coding Through Contextualized Code Relation Learning

    Authors: Junyu Luo, Xiaochen Wang, Jiaqi Wang, Aofei Chang, Yaqing Wang, Fenglong Ma

    Abstract: Automatic International Classification of Diseases (ICD) coding plays a crucial role in the extraction of relevant information from clinical notes for proper recording and billing. One of the most important directions for boosting the performance of automatic ICD coding is modeling ICD code relations. However, current methods insufficiently model the intricate relationships among ICD codes and oft…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: LREC-Coling 2024

  41. arXiv:2402.14289 [pdf, other]

    cs.LG cs.CL

    TinyLLaVA: A Framework of Small-scale Large Multimodal Models

    Authors: Baichuan Zhou, Ying Hu, Xi Weng, Junlong Jia, Jie Luo, Xien Liu, Ji Wu, Lei Huang

    Abstract: We present the TinyLLaVA framework that provides a unified perspective in designing and analyzing the small-scale Large Multimodal Models (LMMs). We empirically study the effects of different vision encoders, connection modules, language models, training data and training recipes. Our extensive experiments showed that better quality of data combined with better training recipes, smaller LMMs can c…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Our model weights and codes will be made public at https://github.com/DLCV-BUAA/TinyLLaVABench

  42. arXiv:2402.13866 [pdf, other]

    cs.CL cs.AI

    Kuaiji: the First Chinese Accounting Large Language Model

    Authors: Jiayuan Luo, Songhua Yang, Xiaoling Qiu, Panyu Chen, Yufei Nai, Wenxuan Zeng, Wentao Zhang, Xinke Jiang

    Abstract: Large Language Models (LLMs) like ChatGPT and GPT-4 have demonstrated impressive proficiency in comprehending and generating natural language. However, they encounter difficulties when tasked with adapting to specialized domains such as accounting. To address this challenge, we introduce Kuaiji, a tailored Accounting Large Language Model. Kuaiji is meticulously fine-tuned using the Baichuan framew…

    Submitted 24 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

    Comments: version 2.0

  43. arXiv:2402.13022  [pdf, other]

    cs.CL cs.MM

    SoMeLVLM: A Large Vision Language Model for Social Media Processing

    Authors: Xinnong Zhang, Haoyu Kuang, Xinyi Mou, Hanjia Lyu, Kun Wu, Siming Chen, Jiebo Luo, Xuanjing Huang, Zhongyu Wei

    Abstract: The growth of social media, characterized by its multimodal nature, has led to the emergence of diverse phenomena and challenges, which call for an effective approach to uniformly solve automated tasks. Powerful Large Vision Language Models make it possible to handle a variety of tasks simultaneously, but even with carefully designed prompting methods, the general domain models often fall sho…

    Submitted 20 February, 2024; originally announced February 2024.

  44. arXiv:2402.09871  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music

    Authors: Zihao Wang, Shuyu Li, Tao Zhang, Qi Wang, Pengfei Yu, Jinyang Luo, Yan Liu, Ming Xi, Kejun Zhang

    Abstract: The rapidly evolving multimodal Large Language Models (LLMs) urgently require new benchmarks to uniformly evaluate their performance on understanding and textually describing music. However, due to semantic gaps between Music Information Retrieval (MIR) algorithms and human understanding, discrepancies between professionals and the public, and low precision of annotations, existing music descripti…

    Submitted 24 April, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

    Comments: Accepted by International Joint Conference on Artificial Intelligence 2024 (IJCAI 2024)

    MSC Class: 68Txx (Primary); 14F05; 91Fxx (Secondary)

    ACM Class: I.2.7; J.5

  45. arXiv:2402.07132  [pdf, other]

    cs.SE

    BAFLineDP: Code Bilinear Attention Fusion Framework for Line-Level Defect Prediction

    Authors: Shaojian Qiu, Huihao Huang, Jianxiang Luo, Yingjie Kuang, Haoyu Luo

    Abstract: Software defect prediction aims to identify defect-prone code, aiding developers in optimizing testing resource allocation. Most defect prediction approaches primarily focus on coarse-grained, file-level defect prediction, which fails to provide developers with the precision required to locate defective code. Recently, some researchers have proposed fine-grained, line-level defect prediction metho…

    Submitted 11 February, 2024; originally announced February 2024.

    Comments: Accepted by IEEE SANER 2024

  46. arXiv:2402.06646  [pdf]

    physics.ao-ph cs.LG physics.geo-ph

    Diffusion Model-based Probabilistic Downscaling for 180-year East Asian Climate Reconstruction

    Authors: Fenghua Ling, Zeyu Lu, Jing-Jia Luo, Lei Bai, Swadhin K. Behera, Dachao Jin, Baoxiang Pan, Huidong Jiang, Toshio Yamagata

    Abstract: As our planet enters the "global boiling" era, understanding regional climate change becomes imperative. Effective downscaling methods that provide localized insights are crucial for this goal. Traditional approaches, including computationally demanding regional dynamical models or statistical downscaling frameworks, are often susceptible to the influence of downscaling uncertainty. He…

    Submitted 5 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  47. arXiv:2402.05957  [pdf, other]

    cs.LG

    Accelerating PDE Data Generation via Differential Operator Action in Solution Space

    Authors: Huanshuo Dong, Hong Wang, Haoyang Liu, Jian Luo, Jie Wang

    Abstract: Recent advancements in data-driven approaches, such as Neural Operators (NO), have demonstrated their effectiveness in reducing the time needed to solve Partial Differential Equations (PDEs). However, one major challenge faced by these approaches is the requirement for a large amount of high-precision training data, which incurs significant computational costs to generate. To address th…

    Submitted 6 May, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  48. arXiv:2402.05445  [pdf, other]

    cs.LG cs.CL

    Accurate LoRA-Finetuning Quantization of LLMs via Information Retention

    Authors: Haotong Qin, Xudong Ma, Xingyu Zheng, Xiaoyang Li, Yang Zhang, Shouda Liu, Jie Luo, Xianglong Liu, Michele Magno

    Abstract: The LoRA-finetuning quantization of LLMs has been extensively studied to obtain accurate yet compact LLMs for deployment on resource-constrained hardware. However, existing methods cause the quantized LLM to degrade severely and even fail to benefit from LoRA finetuning. This paper proposes IR-QLoRA, a novel method that makes quantized LLMs with LoRA highly accurate through information reten…

    Submitted 8 February, 2024; originally announced February 2024.

  49. arXiv:2402.04504  [pdf, other]

    cs.CV

    Text2Street: Controllable Text-to-image Generation for Street Views

    Authors: Jinming Su, Songen Gu, Yiting Duan, Xingyue Chen, Junfeng Luo

    Abstract: Text-to-image generation has made remarkable progress with the emergence of diffusion models. However, generating street-view images from text remains difficult, mainly because street scenes have complex road topology, diverse traffic conditions, and varied weather, which conventional text-to-image models struggle to handle. To address these…

    Submitted 6 February, 2024; originally announced February 2024.

  50. arXiv:2402.03919  [pdf, other]

    cs.IT eess.SP

    Sensing Mutual Information with Random Signals in Gaussian Channels: Bridging Sensing and Communication Metrics

    Authors: Lei Xie, Fan Liu, Jiajin Luo, Shenghui Song

    Abstract: Sensing performance is typically evaluated by classical radar metrics, such as the Cramér-Rao bound and the signal-to-clutter-plus-noise ratio. The recent development of the integrated sensing and communication (ISAC) framework has motivated efforts to unify the performance metrics for sensing and communication, where mutual information (MI) was proposed as a sensing performance metric with deterministic s…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2311.07081