
Showing 1–50 of 211 results for author: Lu, T

Searching in archive cs.
  1. arXiv:2404.16821

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  2. arXiv:2404.14316

    cs.CL

    Automated Long Answer Grading with RiceChem Dataset

    Authors: Shashank Sonkar, Kangqi Ni, Lesa Tran Lu, Kristi Kincaid, John S. Hutchinson, Richard G. Baraniuk

    Abstract: We introduce a new area of study in the field of educational Natural Language Processing: Automated Long Answer Grading (ALAG). Distinguishing itself from Automated Short Answer Grading (ASAG) and Automated Essay Grading (AEG), ALAG presents unique challenges due to the complexity and multifaceted nature of fact-based long answers. To study ALAG, we introduce RiceChem, a dataset derived from a col…

    Submitted 22 April, 2024; originally announced April 2024.

  3. arXiv:2404.13680

    cs.CV cs.AI

    PoseAnimate: Zero-shot high fidelity pose controllable character animation

    Authors: Bingwen Zhu, Fanyi Wang, Tianyi Lu, Peng Liu, Jingwen Su, Jinxiu Liu, Yanhao Zhang, Zuxuan Wu, Yu-Gang Jiang, Guo-Jun Qi

    Abstract: Image-to-video (I2V) generation aims to create a video sequence from a single image, which requires high temporal coherence and visual fidelity with the source image. However, existing approaches suffer from character appearance inconsistency and poor preservation of fine details. Moreover, they require a large amount of video data for training, which can be computationally demanding. To address thes…

    Submitted 30 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

  4. arXiv:2404.11044

    cs.AR

    Asynchronous Memory Access Unit: Exploiting Massive Parallelism for Far Memory Access

    Authors: Luming Wang, Xu Zhang, Songyue Wang, Zhuolun Jiang, Tianyue Lu, Mingyu Chen, Siwei Luo, Keji Huang

    Abstract: The growing memory demands of modern applications have driven the adoption of far memory technologies in data centers to provide cost-effective, high-capacity memory solutions. However, far memory presents new performance challenges because its access latencies are significantly longer and more variable than local DRAM. For applications to achieve acceptable performance on far memory, a high degre…

    Submitted 16 April, 2024; originally announced April 2024.

  5. arXiv:2403.17898

    cs.CV

    Octree-GS: Towards Consistent Real-time Rendering with LOD-Structured 3D Gaussians

    Authors: Kerui Ren, Lihan Jiang, Tao Lu, Mulin Yu, Linning Xu, Zhangkai Ni, Bo Dai

    Abstract: The recent 3D Gaussian splatting (3D-GS) has shown remarkable rendering fidelity and efficiency compared to NeRF-based neural scene representations. While demonstrating the potential for real-time rendering, 3D-GS encounters rendering bottlenecks in large scenes with complex details due to an excessive number of Gaussian primitives located within the viewing frustum. This limitation is particularl…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Project page: https://city-super.github.io/octree-gs/

  6. arXiv:2403.16964

    cs.CV

    GSDF: 3DGS Meets SDF for Improved Rendering and Reconstruction

    Authors: Mulin Yu, Tao Lu, Linning Xu, Lihan Jiang, Yuanbo Xiangli, Bo Dai

    Abstract: Presenting a 3D scene from multiview images remains a core and long-standing challenge in computer vision and computer graphics. Two main requirements lie in rendering and reconstruction. Notably, SOTA rendering quality is usually achieved with neural volumetric rendering techniques, which rely on aggregated point/primitive-wise color and neglect the underlying scene geometry. Learning of neural i…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Project page: https://city-super.github.io/GSDF

  7. arXiv:2403.12995

    q-bio.BM cs.CE cs.LG

    Multi-Scale Protein Language Model for Unified Molecular Modeling

    Authors: Kangjie Zheng, Siyu Long, Tianyu Lu, Junwei Yang, Xinyu Dai, Ming Zhang, Zaiqing Nie, Wei-Ying Ma, Hao Zhou

    Abstract: Protein language models have demonstrated significant potential in the field of protein engineering. However, current protein language models primarily operate at the residue scale, which limits their ability to provide information at the atom level. This limitation prevents us from fully exploiting the capabilities of protein language models for applications involving both proteins and small mole…

    Submitted 5 March, 2024; originally announced March 2024.

  8. arXiv:2403.09626

    cs.CV

    Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

    Authors: Guo Chen, Yifei Huang, Jilan Xu, Baoqi Pei, Zhe Chen, Zhiqi Li, Jiahao Wang, Kunchang Li, Tong Lu, Limin Wang

    Abstract: Understanding videos is one of the fundamental directions in computer vision research, with extensive efforts dedicated to exploring various architectures such as RNN, 3D CNN, and Transformers. The newly proposed architecture of state space model, e.g., Mamba, shows promising traits to extend its success in long sequence modeling to video modeling. To assess whether Mamba can be a viable alternati…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Technical Report

  9. arXiv:2403.04247

    cs.CL

    UltraWiki: Ultra-fine-grained Entity Set Expansion with Negative Seed Entities

    Authors: Yangning Li, Qingsong Lv, Tianyu Yu, Yinghui Li, Shulin Huang, Tingwei Lu, Xuming Hu, Wenhao Jiang, Hai-Tao Zheng, Hui Wang

    Abstract: Entity Set Expansion (ESE) aims to identify new entities belonging to the same semantic class as a given set of seed entities. Traditional methods primarily relied on positive seed entities to represent a target semantic class, which poses a challenge for the representation of ultra-fine-grained semantic classes. Ultra-fine-grained semantic classes are defined based on fine-grained semantic classes…

    Submitted 23 April, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Initial Version

  10. arXiv:2403.03419

    cs.CL cs.AI

    Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization

    Authors: Shitong Duan, Xiaoyuan Yi, Peng Zhang, Tun Lu, Xing Xie, Ning Gu

    Abstract: Large language models (LLMs) have revolutionized the role of AI, yet also pose potential risks of propagating unethical content. Alignment technologies have been introduced to steer LLMs towards human preference, gaining increasing attention. Despite notable breakthroughs in this direction, existing methods heavily rely on high-quality positive-negative training pairs, suffering from noisy labels…

    Submitted 5 March, 2024; originally announced March 2024.

  11. arXiv:2403.02308

    cs.CV

    Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

    Authors: Yuchen Duan, Weiyun Wang, Zhe Chen, Xizhou Zhu, Lewei Lu, Tong Lu, Yu Qiao, Hongsheng Li, Jifeng Dai, Wenhai Wang

    Abstract: Transformers have revolutionized computer vision and natural language processing, but their high computational complexity limits their application in high-resolution image processing and long-context analysis. This paper introduces Vision-RWKV (VRWKV), a model adapted from the RWKV model used in the NLP field with necessary modifications for vision tasks. Similar to the Vision Transformer (ViT), o…

    Submitted 7 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  12. arXiv:2402.15991

    cs.CL

    $C^3$: Confidence Calibration Model Cascade for Inference-Efficient Cross-Lingual Natural Language Understanding

    Authors: Taixi Lu, Haoyu Wang, Huajie Shao, Jing Gao, Huaxiu Yao

    Abstract: Cross-lingual natural language understanding (NLU) is a critical task in natural language processing (NLP). Recent advancements have seen multilingual pre-trained language models (mPLMs) significantly enhance the performance of these tasks. However, mPLMs necessitate substantial resources and incur high computational costs during inference, posing challenges for deployment in real-world and real-t…

    Submitted 25 February, 2024; originally announced February 2024.

  13. arXiv:2402.08426

    cs.IR cs.LG

    Frequency-aware Graph Signal Processing for Collaborative Filtering

    Authors: Jiafeng Xia, Dongsheng Li, Hansu Gu, Tun Lu, Peng Zhang, Li Shang, Ning Gu

    Abstract: Graph Signal Processing (GSP) based recommendation algorithms have recently attracted lots of attention due to their high efficiency. However, these methods failed to consider the importance of various interactions that reflect unique user/item characteristics and failed to utilize user and item high-order neighborhood information to model user preference, thus leading to sub-optimal performance. To…

    Submitted 13 February, 2024; originally announced February 2024.

  14. arXiv:2402.02374

    cs.CV

    PromptRR: Diffusion Models as Prompt Generators for Single Image Reflection Removal

    Authors: Tao Wang, Wanglong Lu, Kaihao Zhang, Wenhan Luo, Tae-Kyun Kim, Tong Lu, Hongdong Li, Ming-Hsuan Yang

    Abstract: Existing single image reflection removal (SIRR) methods using deep learning tend to miss key low-frequency (LF) and high-frequency (HF) differences in images, affecting their effectiveness in removing reflections. To address this problem, this paper proposes a novel prompt-guided reflection removal (PromptRR) framework that uses frequency information as new visual prompts for better reflection per…

    Submitted 4 February, 2024; originally announced February 2024.

    Comments: 10 pages, 10 figures

  15. InteractOut: Leveraging Interaction Proxies as Input Manipulation Strategies for Reducing Smartphone Overuse

    Authors: Tao Lu, Hongxiao Zheng, Tianying Zhang, Xuhai Xu, Anhong Guo

    Abstract: Smartphone overuse poses risks to people's physical and mental health. However, current intervention techniques mainly focus on explicitly changing screen content (i.e., output) and often fail to persistently reduce smartphone overuse due to being over-restrictive or over-flexible. We present the design and implementation of InteractOut, a suite of implicit input manipulation techniques that lever…

    Submitted 19 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: CHI 2024

  16. arXiv:2401.15261

    cs.CV

    Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes

    Authors: Diandian Guo, Deng-Ping Fan, Tongyu Lu, Christos Sakaridis, Luc Van Gool

    Abstract: The estimation of implicit cross-frame correspondences and the high computational cost have long been major challenges in video semantic segmentation (VSS) for driving scenes. Prior works utilize keyframes, feature propagation, or cross-frame attention to address these issues. By contrast, we are the first to harness vanishing point (VP) priors for more effective segmentation. Intuitively, objects…

    Submitted 25 April, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: CVPR 2024 highlight

  17. arXiv:2401.10529

    cs.CV cs.AI cs.CL cs.LG

    Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences

    Authors: Xiyao Wang, Yuhang Zhou, Xiaoyu Liu, Hongjin Lu, Yuancheng Xu, Feihong He, Jaehong Yoon, Taixi Lu, Gedas Bertasius, Mohit Bansal, Huaxiu Yao, Furong Huang

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated proficiency in handling a variety of visual-language tasks. However, current MLLM benchmarks are predominantly designed to evaluate reasoning based on static information about a single image, and the ability of modern MLLMs to extrapolate from image sequences, which is essential for understanding our ever-changing world, has been less inve…

    Submitted 24 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: 27 pages, 23 figures

  18. arXiv:2401.10208

    cs.CV cs.CL

    MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

    Authors: Changyao Tian, Xizhou Zhu, Yuwen Xiong, Weiyun Wang, Zhe Chen, Wenhai Wang, Yuntao Chen, Lewei Lu, Tong Lu, Jie Zhou, Hongsheng Li, Yu Qiao, Jifeng Dai

    Abstract: Developing generative models for interleaved image-text data has both research and practical value. It requires models to understand the interleaved sequences and subsequently generate images and text. However, existing attempts are limited by the issue that the fixed number of visual tokens cannot efficiently capture image details, which is particularly problematic in multi-image scenarios. T…

    Submitted 2 April, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: 20 pages, 9 figures, 17 tables

  19. arXiv:2401.08036

    cs.CV

    3D Lane Detection from Front or Surround-View using Joint-Modeling & Matching

    Authors: Haibin Zhou, Jun Chang, Tao Lu, Huabing Zhou

    Abstract: 3D lanes offer a more comprehensive understanding of the road surface geometry than 2D lanes, thereby providing crucial references for driving decisions and trajectory planning. While many efforts aim to improve prediction accuracy, we recognize that an efficient network can bring results closer to lane modeling. However, if the modeling data is imprecise, the results might not accurately capture…

    Submitted 15 January, 2024; originally announced January 2024.

  20. arXiv:2401.06197

    cs.CV

    Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications

    Authors: Yuwen Xiong, Zhiqi Li, Yuntao Chen, Feng Wang, Xizhou Zhu, Jiapeng Luo, Wenhai Wang, Tong Lu, Hongsheng Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai

    Abstract: We introduce Deformable Convolution v4 (DCNv4), a highly efficient and effective operator designed for a broad spectrum of vision applications. DCNv4 addresses the limitations of its predecessor, DCNv3, with two key enhancements: 1. removing softmax normalization in spatial aggregation to enhance its dynamic property and expressive power and 2. optimizing memory access to minimize redundant operat…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: Tech report; Code: https://github.com/OpenGVLab/DCNv4

  21. arXiv:2401.01552

    cs.CV

    CRA-PCN: Point Cloud Completion with Intra- and Inter-level Cross-Resolution Transformers

    Authors: Yi Rong, Haoran Zhou, Lixin Yuan, Cheng Mei, Jiahao Wang, Tong Lu

    Abstract: Point cloud completion is an indispensable task for recovering complete point clouds due to incompleteness caused by occlusion, limited sensor resolution, etc. The family of coarse-to-fine generation architectures has recently exhibited great success in point cloud completion and has gradually become mainstream. In this work, we unveil one of the key ingredients behind these methods: meticulously devi…

    Submitted 14 February, 2024; v1 submitted 3 January, 2024; originally announced January 2024.

    Comments: Accepted to AAAI 2024

  22. arXiv:2312.17235

    cs.CV

    A Simple LLM Framework for Long-Range Video Question-Answering

    Authors: Ce Zhang, Taixi Lu, Md Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius

    Abstract: We present LLoVi, a language-based framework for long-range video question-answering (LVQA). Unlike prior long-range video understanding methods, which are often costly and require specialized long-range video modeling design (e.g., memory queues, state-space layers, etc.), our approach uses a frame/clip-level visual captioner (e.g., BLIP2, LaViLa, LLaVA) coupled with a Large Language Model (GPT-3…

    Submitted 26 February, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  23. arXiv:2312.15690

    cs.CV

    Word length-aware text spotting: Enhancing detection and recognition in dense text image

    Authors: Hao Wang, Huabing Zhou, Yanduo Zhang, Tao Lu, Jiayi Ma

    Abstract: Scene text spotting is essential in various computer vision applications, enabling the extraction and interpretation of textual information from images. However, existing methods often neglect the spatial semantics of word images, leading to suboptimal detection recall rates for long and short words within long-tailed word length distributions that exist prominently in dense scenes. In this paper, we prese…

    Submitted 25 December, 2023; originally announced December 2023.

  24. arXiv:2312.14238

    cs.CV

    InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

    Authors: Zhe Chen, Jiannan Wu, Wenhai Wang, Weijie Su, Guo Chen, Sen Xing, Muyan Zhong, Qinglong Zhang, Xizhou Zhu, Lewei Lu, Bin Li, Ping Luo, Tong Lu, Yu Qiao, Jifeng Dai

    Abstract: The exponential growth of large language models (LLMs) has opened up numerous possibilities for multimodal AGI systems. However, the progress in vision and vision-language foundation models, which are also critical elements of multi-modal AGI, has not kept pace with LLMs. In this work, we design a large-scale vision-language foundation model (InternVL), which scales up the vision foundation model…

    Submitted 15 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: 25 pages, 5 figures, 28 tables

  25. arXiv:2312.11805

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1320 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 2 April, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  26. arXiv:2312.03031

    cs.CV

    Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?

    Authors: Zhiqi Li, Zhiding Yu, Shiyi Lan, Jiahan Li, Jan Kautz, Tong Lu, Jose M. Alvarez

    Abstract: End-to-end autonomous driving recently emerged as a promising research direction to target autonomy from a full-stack perspective. Along this line, many of the latest works follow an open-loop evaluation setting on nuScenes to study the planning behavior. In this paper, we delve deeper into the problem by conducting thorough analyses and demystifying more devils in the details. We initially observ…

    Submitted 5 December, 2023; originally announced December 2023.

  27. arXiv:2312.00109

    cs.CV

    Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering

    Authors: Tao Lu, Mulin Yu, Linning Xu, Yuanbo Xiangli, Limin Wang, Dahua Lin, Bo Dai

    Abstract: Neural rendering methods have significantly advanced photo-realistic 3D scene rendering in various academic and industrial applications. The recent 3D Gaussian Splatting method has achieved the state-of-the-art rendering quality and speed combining the benefits of both primitive-based representations and volumetric representations. However, it often leads to heavily redundant Gaussians that try to…

    Submitted 30 November, 2023; originally announced December 2023.

    Comments: Project page: https://city-super.github.io/scaffold-gs/

  28. arXiv:2311.18251

    cs.HC

    Can Large Language Models Be Good Companions? An LLM-Based Eyewear System with Conversational Common Ground

    Authors: Zhenyu Xu, Hailin Xu, Zhouyang Lu, Yingying Zhao, Rui Zhu, Yujiang Wang, Mingzhi Dong, Yuhu Chang, Qin Lv, Robert P. Dick, Fan Yang, Tun Lu, Ning Gu, Li Shang

    Abstract: Developing chatbots as personal companions has long been a goal of artificial intelligence researchers. Recent advances in Large Language Models (LLMs) have delivered a practical solution for endowing chatbots with anthropomorphic language capabilities. However, it takes more than LLMs to enable chatbots that can act as companions. Humans use their understanding of individual personalities to driv…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 36 pages, 25 figures, Under review at ACM IMWUT

  29. arXiv:2311.17338

    cs.CV cs.AI

    VideoAssembler: Identity-Consistent Video Generation with Reference Entities using Diffusion Model

    Authors: Haoyu Zhao, Tianyi Lu, Jiaxi Gu, Xing Zhang, Zuxuan Wu, Hang Xu, Yu-Gang Jiang

    Abstract: Identity-consistent video generation seeks to synthesize videos that are guided by both textual prompts and reference images of entities. Current approaches typically utilize cross-attention layers to integrate the appearance of the entity, which predominantly captures semantic attributes, resulting in compromised fidelity of entities. Moreover, these methods necessitate iterative fine-tuning for…

    Submitted 30 November, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

  30. arXiv:2311.17277

    cs.LG cs.CY

    An Online Optimization-Based Decision Support Tool for Small Farmers in India: Learning in Non-stationary Environments

    Authors: Tuxun Lu, Aviva Prins

    Abstract: Crop management decision support systems are specialized tools for farmers that reduce the riskiness of revenue streams, especially valuable for use under the current climate changes that impact agricultural productivity. Unfortunately, small farmers in India, who could greatly benefit from these tools, do not have access to them. In this paper, we model an individual greenhouse as a Markov Decisi…

    Submitted 28 November, 2023; originally announced November 2023.

  31. arXiv:2311.09825

    cs.CL

    Human Still Wins over LLM: An Empirical Study of Active Learning on Domain-Specific Annotation Tasks

    Authors: Yuxuan Lu, Bingsheng Yao, Shao Zhang, Yun Wang, Peng Zhang, Tun Lu, Toby Jia-Jun Li, Dakuo Wang

    Abstract: Large Language Models (LLMs) have demonstrated considerable advances, and several claims have been made about their exceeding human performance. However, in real-world tasks, domain knowledge is often required. Low-resource learning methods like Active Learning (AL) have been proposed to tackle the cost of domain expert annotation, raising this question: Can LLMs surpass compact models trained wit…

    Submitted 16 November, 2023; originally announced November 2023.

  32. arXiv:2311.07911

    cs.CL cs.AI cs.LG

    Instruction-Following Evaluation for Large Language Models

    Authors: Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, Le Hou

    Abstract: One core capability of Large Language Models (LLMs) is to follow natural language instructions. However, the evaluation of such abilities is not standardized: Human evaluations are expensive, slow, and not objectively reproducible, while LLM-based auto-evaluation is potentially biased or limited by the ability of the evaluator LLM. To overcome these issues, we introduce Instruction-Following Eval…

    Submitted 14 November, 2023; originally announced November 2023.

    MSC Class: 68T50 (Primary), 68T99 (Secondary); ACM Class: I.2.7

  33. arXiv:2311.04635

    cs.IR

    Towards Deeper, Lighter and Interpretable Cross Network for CTR Prediction

    Authors: Fangye Wang, Hansu Gu, Dongsheng Li, Tun Lu, Peng Zhang, Ning Gu

    Abstract: Click Through Rate (CTR) prediction plays an essential role in recommender systems and online advertising. It is crucial to effectively model feature interactions to improve the prediction performance of CTR models. However, existing methods face three significant challenges. First, while most methods can automatically capture high-order feature interactions, their performance tends to diminish as…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: This paper was accepted to the Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM '23). The arXiv version adds further designs with associated experiments.

  34. arXiv:2311.04625

    cs.IR

    A Comprehensive Summarization and Evaluation of Feature Refinement Modules for CTR Prediction

    Authors: Fangye Wang, Hansu Gu, Dongsheng Li, Tun Lu, Peng Zhang, Li Shang, Ning Gu

    Abstract: Click-through rate (CTR) prediction is widely used in academia and industry. Most CTR tasks fall into a feature embedding & feature interaction paradigm, where the accuracy of CTR prediction is mainly improved by designing practical feature interaction structures. However, recent studies have argued that the fixed feature embedding learned only through the embedding layer limits the performance o…

    Submitted 1 December, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

  35. arXiv:2310.16400

    cs.CV cs.AI

    Fuse Your Latents: Video Editing with Multi-source Latent Diffusion Models

    Authors: Tianyi Lu, Xing Zhang, Jiaxi Gu, Hang Xu, Renjing Pei, Songcen Xu, Zuxuan Wu

    Abstract: Latent Diffusion Models (LDMs) are renowned for their powerful capabilities in image and video synthesis. Yet, video editing methods suffer from insufficient pre-training data or video-by-video re-training cost. In addressing this gap, we propose FLDM (Fused Latent Diffusion Model), a training-free framework to achieve text-guided video editing by applying off-the-shelf image editing methods in vi…

    Submitted 25 October, 2023; originally announced October 2023.

  36. arXiv:2310.11053

    cs.CL cs.AI cs.CY

    Denevil: Towards Deciphering and Navigating the Ethical Values of Large Language Models via Instruction Learning

    Authors: Shitong Duan, Xiaoyuan Yi, Peng Zhang, Tun Lu, Xing Xie, Ning Gu

    Abstract: Large Language Models (LLMs) have made unprecedented breakthroughs, yet their increasing integration into everyday life might raise societal risks due to generated unethical content. Despite extensive study on specific issues like bias, the intrinsic values of LLMs remain largely unexplored from a moral philosophy perspective. This work delves into ethical values utilizing Moral Foundation Theory.…

    Submitted 4 March, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Accepted by ICLR 2024

  37. arXiv:2310.08384

    cs.NE

    Towards Running Time Analysis of Interactive Multi-objective Evolutionary Algorithms

    Authors: Tianhao Lu, Chao Bian, Chao Qian

    Abstract: Evolutionary algorithms (EAs) are widely used for multi-objective optimization due to their population-based nature. Traditional multi-objective EAs (MOEAs) generate a large set of solutions to approximate the Pareto front, leaving a decision maker (DM) with the task of selecting a preferred solution. However, this process can be inefficient and time-consuming, especially when there are many objec…

    Submitted 15 October, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

  38. arXiv:2310.06435

    cs.CR cs.AR

    DASICS: Enhancing Memory Protection with Dynamic Compartmentalization

    Authors: Yue Jin, Yibin Xu, Chengyuan Yang, Han Wang, Tianyi Huang, Tianyue Lu, Mingyu Chen

    Abstract: In the existing software development ecosystem, security issues introduced by third-party code cannot be overlooked. Among these security concerns, memory access vulnerabilities stand out prominently, leading to risks such as the theft or tampering of sensitive data. To address this issue, software-based defense mechanisms have been established at the programming language, compiler, and operating…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: 16 pages, 6 figures

  39. arXiv:2310.00898

    cs.CL

    Enabling Language Models to Implicitly Learn Self-Improvement

    Authors: Ziqi Wang, Le Hou, Tianjian Lu, Yuexin Wu, Yunxuan Li, Hongkun Yu, Heng Ji

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in open-ended text generation tasks. However, the inherent open-ended nature of these tasks implies that there is always room for improvement in the quality of model responses. To address this challenge, various approaches have been proposed to enhance the performance of LLMs. There has been a growing focus on enabling LLMs to…

    Submitted 14 March, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: 28 pages, 5 figures, 4 tables

  40. UPL-SFDA: Uncertainty-aware Pseudo Label Guided Source-Free Domain Adaptation for Medical Image Segmentation

    Authors: Jianghao Wu, Guotai Wang, Ran Gu, Tao Lu, Yinan Chen, Wentao Zhu, Tom Vercauteren, Sébastien Ourselin, Shaoting Zhang

    Abstract: Domain Adaptation (DA) is important for deep learning-based medical image segmentation models to deal with testing images from a new target domain. As the source-domain data are usually unavailable when a trained model is deployed at a new center, Source-Free Domain Adaptation (SFDA) is appealing for data and annotation-efficient adaptation to the target domain. However, existing SFDA methods have…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: 12 pages, 6 figures, to be published in IEEE TMI

  41. arXiv:2309.07861

    cs.SD cs.AI cs.CL eess.AS

    CiwaGAN: Articulatory information exchange

    Authors: Gašper Beguš, Thomas Lu, Alan Zhou, Peter Wu, Gopala K. Anumanchipalli

    Abstract: Humans encode information into sounds by controlling articulators and decode information from sounds using the auditory apparatus. This paper introduces CiwaGAN, a model of human spoken language acquisition that combines unsupervised articulatory modeling with an unsupervised model of information exchange through the auditory modality. While prior research includes unsupervised articulatory modeli…

    Submitted 14 September, 2023; originally announced September 2023.

  42. arXiv:2309.04859  [pdf, other]

    cs.AR

    PyHGL: A Python-based Hardware Generation Language Framework

    Authors: Jintao Sun, Zeke Wang, Tao Lu, Wenzhi Chen

    Abstract: Hardware generation languages (HGLs) increase hardware design productivity by creating parameterized modules and test benches. Unfortunately, existing tools are not widely adopted due to several demerits, including limited support for asynchronous circuits and unknown states, lack of concise and efficient language features, and low integration of simulation and verification functions. This paper i…

    Submitted 9 September, 2023; originally announced September 2023.

  43. arXiv:2309.04752  [pdf, other]

    cs.CV

    Deep Video Restoration for Under-Display Camera

    Authors: Xuanxi Chen, Tao Wang, Ziqian Shao, Kaihao Zhang, Wenhan Luo, Tong Lu, Zikun Liu, Tae-Kyun Kim, Hongdong Li

    Abstract: Images or videos captured by the Under-Display Camera (UDC) suffer from severe degradation, such as saturation degeneration and color shift. While restoration for UDC has been a critical task, existing works of UDC restoration focus only on images. UDC video restoration (UDC-VR) has not been explored in the community. In this work, we first propose a GAN-based generation pipeline to simulate the r…

    Submitted 9 September, 2023; originally announced September 2023.

  44. arXiv:2309.03549  [pdf, other]

    cs.CV cs.AI cs.MM

    Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation

    Authors: Jiaxi Gu, Shicong Wang, Haoyu Zhao, Tianyi Lu, Xing Zhang, Zuxuan Wu, Songcen Xu, Wei Zhang, Yu-Gang Jiang, Hang Xu

    Abstract: Inspired by the remarkable success of Latent Diffusion Models (LDMs) for image synthesis, we study LDM for text-to-video generation, which is a formidable challenge due to the computational and memory constraints during both model training and inference. A single LDM is usually only capable of generating a very limited number of video frames. Some existing works focus on separate prediction models…

    Submitted 7 September, 2023; originally announced September 2023.

  45. arXiv:2309.01361  [pdf, other]

    cs.ET cs.CV cs.RO

    High Frequency, High Accuracy Pointing onboard Nanosats using Neuromorphic Event Sensing and Piezoelectric Actuation

    Authors: Yasir Latif, Peter Anastasiou, Yonhon Ng, Zebb Prime, Tien-Fu Lu, Matthew Tetlow, Robert Mahony, Tat-Jun Chin

    Abstract: As satellites become smaller, the ability to maintain stable pointing decreases as external forces acting on the satellite come into play. At the same time, reaction wheels used in the attitude determination and control system (ADCS) introduce high frequency jitter which can disrupt pointing stability. For space domain awareness (SDA) tasks that track objects tens of thousands of kilometres away,…

    Submitted 10 September, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

  46. arXiv:2308.09904  [pdf, other]

    cs.IR cs.AI

    RAH! RecSys-Assistant-Human: A Human-Centered Recommendation Framework with LLM Agents

    Authors: Yubo Shu, Haonan Zhang, Hansu Gu, Peng Zhang, Tun Lu, Dongsheng Li, Ning Gu

    Abstract: The rapid evolution of the web has led to an exponential growth in content. Recommender systems play a crucial role in Human-Computer Interaction (HCI) by tailoring content based on individual preferences. Despite their importance, challenges persist in balancing recommendation accuracy with user satisfaction, addressing biases while preserving user privacy, and solving cold-start problems in cros…

    Submitted 17 October, 2023; v1 submitted 19 August, 2023; originally announced August 2023.

  47. arXiv:2308.09244  [pdf, other]

    cs.CV

    SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos

    Authors: Haisong Liu, Yao Teng, Tao Lu, Haiguang Wang, Limin Wang

    Abstract: Camera-based 3D object detection in BEV (Bird's Eye View) space has drawn great attention over the past few years. Dense detectors typically follow a two-stage pipeline by first constructing a dense BEV feature and then performing object detection in BEV space, which suffers from complex view transformations and high computation cost. On the other side, sparse detectors follow a query-based paradi…

    Submitted 5 September, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023. This version fixes some typos

  48. arXiv:2308.07893  [pdf, other]

    cs.CV

    Memory-and-Anticipation Transformer for Online Action Understanding

    Authors: Jiahao Wang, Guo Chen, Yifei Huang, Limin Wang, Tong Lu

    Abstract: Most existing forecasting systems are memory-based methods, which attempt to mimic human forecasting ability by employing various memory mechanisms and have progressed in temporal modeling for memory dependency. Nevertheless, an obvious weakness of this paradigm is that it can only model limited historical dependence and can not transcend the past. In this paper, we rethink the temporal dependence…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: ICCV 2023 Camera Ready

  49. arXiv:2308.06878  [pdf, other]

    cs.IR cs.LG

    AutoSeqRec: Autoencoder for Efficient Sequential Recommendation

    Authors: Sijia Liu, Jiahao Liu, Hansu Gu, Dongsheng Li, Tun Lu, Peng Zhang, Ning Gu

    Abstract: Sequential recommendation demonstrates the capability to recommend items by modeling the sequential behavior of users. Traditional methods typically treat users as sequences of items, overlooking the collaborative relationships among them. Graph-based methods incorporate collaborative information by utilizing the user-item interaction graph. However, these methods sometimes face challenges in term…

    Submitted 13 August, 2023; originally announced August 2023.

    Comments: 10 pages, accepted by CIKM 2023

  50. PTransIPs: Identification of phosphorylation sites enhanced by protein PLM embeddings

    Authors: Ziyang Xu, Haitian Zhong, Bingrui He, Xueying Wang, Tianchi Lu

    Abstract: Phosphorylation is pivotal in numerous fundamental cellular processes and plays a significant role in the onset and progression of various diseases. The accurate identification of these phosphorylation sites is crucial for unraveling the molecular mechanisms within cells and during viral infections, potentially leading to the discovery of novel therapeutic targets. In this study, we develop PTrans…

    Submitted 13 March, 2024; v1 submitted 8 August, 2023; originally announced August 2023.