Showing 1–50 of 294 results for author: Zhou, G

Searching in archive cs.
  1. arXiv:2405.05957  [pdf, other]

    cs.CL

    OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning

    Authors: Dan Qiao, Yi Su, Pinzheng Wang, Jing Ye, Wenjing Xie, Yuechi Zhou, Yuyang Ding, Zecheng Tang, Jikai Wang, Yixin Ji, Yue Wang, Pei Guo, Zechen Sun, Zikang Zhang, Juntao Li, Pingfu Chao, Wenliang Chen, Guohong Fu, Guodong Zhou, Qiaoming Zhu, Min Zhang

    Abstract: Large Language Models (LLMs) have played an important role in many fields due to their powerful capabilities. However, their massive number of parameters leads to high deployment requirements and incurs significant inference costs, which impedes their practical applications. Training smaller models is an effective way to address this problem. Therefore, we introduce OpenBA-V2, a 3.4B model derived…

    Submitted 9 May, 2024; originally announced May 2024.

  2. arXiv:2405.04840  [pdf, other]

    cs.IR

    Federated Adaptation for Foundation Model-based Recommendations

    Authors: Chunxu Zhang, Guodong Long, Hongkuan Guo, Xiao Fang, Yang Song, Zhaojie Liu, Guorui Zhou, Zijian Zhang, Yang Liu, Bo Yang

    Abstract: With the recent success of large language models, particularly foundation models with generalization abilities, applying foundation models for recommendations becomes a new paradigm to improve existing recommendation systems. It becomes a new open challenge to enable the foundation model to capture user preference changes in a timely manner with reasonable communication and computation costs while…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted as a regular paper of IJCAI'24

  3. arXiv:2405.02880  [pdf, other]

    cs.CV cs.RO

    Blending Distributed NeRFs with Tri-stage Robust Pose Optimization

    Authors: Baijun Ye, Caiyun Liu, Xiaoyu Ye, Yuantao Chen, Yuhai Wang, Zike Yan, Yongliang Shi, Hao Zhao, Guyue Zhou

    Abstract: Due to the limited model capacity, leveraging distributed Neural Radiance Fields (NeRFs) for modeling extensive urban environments has become a necessity. However, current distributed NeRF registration approaches encounter aliasing artifacts, arising from discrepancies in rendering resolutions and suboptimal pose precision. These factors collectively deteriorate the fidelity of pose estimation wit…

    Submitted 5 May, 2024; originally announced May 2024.

  4. arXiv:2404.18192  [pdf, other]

    cs.RO

    Block-Map-Based Localization in Large-Scale Environment

    Authors: Yixiao Feng, Zhou Jiang, Yongliang Shi, Yunlong Feng, Xiangyu Chen, Hao Zhao, Guyue Zhou

    Abstract: Accurate localization is an essential technology for the flexible navigation of robots in large-scale environments. Both SLAM-based and map-based localization will increase the computing load due to the increase in map size, which will affect downstream tasks such as robot navigation and services. To this end, we propose a localization system based on Block Maps (BMs) to reduce the computational l…

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 7 pages, 4 figures, 4 tables, published to ICRA 2024

  5. arXiv:2404.16831  [pdf, other]

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  6. arXiv:2404.15807  [pdf, other]

    cs.CL

    One Subgraph for All: Efficient Reasoning on Opening Subgraphs for Inductive Knowledge Graph Completion

    Authors: Zhiwen Xie, Yi Zhang, Guangyou Zhou, Jin Liu, Xinhui Tu, Jimmy Xiangji Huang

    Abstract: Knowledge Graph Completion (KGC) has garnered massive research interest recently, and most existing methods are designed following a transductive setting where all entities are observed during training. Despite the great progress on the transductive KGC, these methods struggle to conduct reasoning on emerging KGs involving unseen entities. Thus, inductive KGC, which aims to deduce missing links am…

    Submitted 24 April, 2024; originally announced April 2024.

  7. arXiv:2404.13946  [pdf, other]

    cs.LG

    Dual Model Replacement:invisible Multi-target Backdoor Attack based on Federal Learning

    Authors: Rong Wang, Guichen Zhou, Mingjun Gao, Yunpeng Xiao

    Abstract: In recent years, the neural network backdoor hidden in the parameters of the federated learning model has been proved to have great security risks. Considering the characteristics of trigger generation, data poisoning and model training in backdoor attack, this paper designs a backdoor attack method based on federated learning. Firstly, aiming at the concealment of the backdoor trigger, a TrojanGa…

    Submitted 22 April, 2024; originally announced April 2024.

  8. arXiv:2404.13425  [pdf, other]

    cs.CV cs.AI

    AdvLoRA: Adversarial Low-Rank Adaptation of Vision-Language Models

    Authors: Yuheng Ji, Yue Liu, Zhicheng Zhang, Zhao Zhang, Yuting Zhao, Gang Zhou, Xingwei Zhang, Xinwang Liu, Xiaolong Zheng

    Abstract: Vision-Language Models (VLMs) are a significant technique for Artificial General Intelligence (AGI). With the fast growth of AGI, the security problem has become one of the most important challenges for VLMs. In this paper, through extensive experiments, we demonstrate the vulnerability of the conventional adaptation methods for VLMs, which may bring significant security risks. In addition, as the siz…

    Submitted 20 April, 2024; originally announced April 2024.

  9. arXiv:2404.06078  [pdf, other]

    cs.IR

    End-to-end training of Multimodal Model and ranking Model

    Authors: Xiuqi Deng, Lu Xu, Xiyao Li, Jinkai Yu, Erpeng Xue, Zhongyuan Wang, Di Zhang, Zhaojie Liu, Guorui Zhou, Yang Song, Na Mou, Shen Jiang, Han Li

    Abstract: Traditional recommender systems heavily rely on ID features, which often encounter challenges related to cold-start and generalization. Modeling pre-extracted content features can mitigate these issues, but is still a suboptimal solution due to the discrepancies between training tasks and model parameters. End-to-end training presents a promising solution for these problems, yet most of the existi…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 9 pages, 8 figures

  10. arXiv:2404.04579  [pdf, other]

    cs.HC

    TeleAware Robot: Designing Awareness-augmented Telepresence Robot for Remote Collaborative Locomotion

    Authors: Ruyi Li, Yaxin Zhu, Min Liu, Yihang Zeng, Shanning Zhuang, Jiayi Fu, Yi Lu, Guyue Zhou, Can Liu, Jiangtao Gong

    Abstract: Telepresence robots can be used to support users to navigate an environment remotely and share the visiting experience with their social partners. Although such systems allow users to see and hear the remote environment and communicate with their partners via live video feed, this does not provide enough awareness of the environment and their remote partner's activities. In this paper, we introduc…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 33 pages, 12 figures

    MSC Class: H.5.2

    Journal ref: IMUWT 2024

  11. arXiv:2404.04167  [pdf, other]

    cs.CL cs.AI

    Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model

    Authors: Xinrun Du, Zhouliang Yu, Songyang Gao, Ding Pan, Yuyang Cheng, Ziyang Ma, Ruibin Yuan, Xingwei Qu, Jiaheng Liu, Tianyu Zheng, Xinchen Luo, Guorui Zhou, Binhang Yuan, Wenhu Chen, Jie Fu, Ge Zhang

    Abstract: In this study, we introduce CT-LLM, a 2B large language model (LLM) that illustrates a pivotal shift towards prioritizing the Chinese language in developing LLMs. Uniquely initiated from scratch, CT-LLM diverges from the conventional methodology by primarily incorporating Chinese textual data, utilizing an extensive corpus of 1,200 billion tokens, including 800 billion Chinese tokens, 300 billion…

    Submitted 9 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

  12. arXiv:2404.03634  [pdf, other]

    cs.RO cs.CV

    PreAfford: Universal Affordance-Based Pre-Grasping for Diverse Objects and Environments

    Authors: Kairui Ding, Boyuan Chen, Ruihai Wu, Yuyang Li, Zongzheng Zhang, Huan-ang Gao, Siqi Li, Yixin Zhu, Guyue Zhou, Hao Dong, Hao Zhao

    Abstract: Robotic manipulation of ungraspable objects with two-finger grippers presents significant challenges due to the paucity of graspable features, while traditional pre-grasping techniques, which rely on repositioning objects and leveraging external aids like table edges, lack the adaptability across object categories and scenes. Addressing this, we introduce PreAfford, a novel pre-grasping planning f…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Project Page: https://air-discover.github.io/PreAfford/

  13. Efficient Multi-branch Segmentation Network for Situation Awareness in Autonomous Navigation

    Authors: Guan-Cheng Zhou, Chen Cheng, Yan-zhou Chen

    Abstract: Real-time and high-precision situational awareness technology is critical for autonomous navigation of unmanned surface vehicles (USVs). In particular, robust and fast obstacle semantic segmentation methods are essential. However, distinguishing between the sea and the sky is challenging due to the differences between port and maritime environments. In this study, we built a dataset that captured…

    Submitted 30 March, 2024; originally announced April 2024.

    Journal ref: Ocean Engineering 302 (2024) 117741

  14. arXiv:2403.16535  [pdf, other]

    cs.RO

    Arm-Constrained Curriculum Learning for Loco-Manipulation of the Wheel-Legged Robot

    Authors: Zifan Wang, Yufei Jia, Lu Shi, Haoyu Wang, Haizhou Zhao, Xueyang Li, Jinni Zhou, Jun Ma, Guyue Zhou

    Abstract: Incorporating a robotic manipulator into a wheel-legged robot enhances its agility and expands its potential for practical applications. However, the presence of potential instability and uncertainties presents additional challenges for control objectives. In this paper, we introduce an arm-constrained curriculum learning architecture to tackle the issues introduced by adding the manipulator. Firs…

    Submitted 28 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  15. arXiv:2403.14674  [pdf]

    cs.CY

    Packaging Up Media Mix Modeling: An Introduction to Robyn's Open-Source Approach

    Authors: Gufeng Zhou, Igor Skokan, Julian Runge

    Abstract: While attribution of user behavior across apps and websites had led to unseen levels of determinism in digital advertising measurement, privacy-centric changes to the digital data landscape are bringing probabilistic techniques such as marketing and media mix modeling en vogue again. Many small and midsize advertisers lack the scale and resources to invest in advanced proprietary modeling efforts…

    Submitted 8 March, 2024; originally announced March 2024.

  16. arXiv:2403.12787  [pdf, other]

    cs.CV

    DDSB: An Unsupervised and Training-free Method for Phase Detection in Echocardiography

    Authors: Zhenyu Bu, Yang Liu, Jiayu Huo, Jingjing Peng, Kaini Wang, Guangquan Zhou, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin

    Abstract: Accurate identification of End-Diastolic (ED) and End-Systolic (ES) frames is key for cardiac function assessment through echocardiography. However, traditional methods face several limitations: they require extensive amounts of data, extensive annotations by medical experts, significant training resources, and often lack robustness. Addressing these challenges, we proposed an unsupervised and tra…

    Submitted 19 March, 2024; originally announced March 2024.

  17. arXiv:2403.12386  [pdf]

    cs.CL cs.AI

    Pipelined Biomedical Event Extraction Rivaling Joint Learning

    Authors: Pengchao Wu, Xuefeng Li, Jinghang Gu, Longhua Qian, Guodong Zhou

    Abstract: Biomedical event extraction is an information extraction task to obtain events from biomedical text, whose targets include the type, the trigger, and the respective arguments involved in an event. Traditional biomedical event extraction usually adopts a pipelined approach, which contains trigger identification, argument role recognition, and finally event construction either using specific rules o…

    Submitted 18 March, 2024; originally announced March 2024.

  18. arXiv:2403.10319  [pdf, other]

    cs.NI cs.CR

    NetBench: A Large-Scale and Comprehensive Network Traffic Benchmark Dataset for Foundation Models

    Authors: Chen Qian, Xiaochang Li, Qineng Wang, Gang Zhou, Huajie Shao

    Abstract: In computer networking, network traffic refers to the amount of data transmitted in the form of packets between internetworked computers or Cyber-Physical Systems. Monitoring and analyzing network traffic is crucial for ensuring the performance, security, and reliability of a network. However, a significant challenge in network traffic analysis is to process diverse data packets including both cip…

    Submitted 18 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  19. arXiv:2403.07027  [pdf, ps, other]

    cs.LG

    FWin transformer for dengue prediction under climate and ocean influence

    Authors: Nhat Thanh Tran, Jack Xin, Guofa Zhou

    Abstract: Dengue fever is one of the most deadly mosquito-borne tropical infectious diseases. A detailed long-range forecast model is vital in controlling the spread of disease and making mitigation efforts. In this study, we examine methods used to forecast dengue cases for long range predictions. The dataset consists of local climate/weather in addition to global climate indicators of Singapore from 2000 to…

    Submitted 10 March, 2024; originally announced March 2024.

  20. arXiv:2403.05326  [pdf, other]

    cs.CL cs.AI

    ChatASU: Evoking LLM's Reflexion to Truly Understand Aspect Sentiment in Dialogues

    Authors: Yiding Liu, Jingjing Wang, Jiamin Luo, Tao Zeng, Guodong Zhou

    Abstract: Aspect Sentiment Understanding (ASU) in interactive scenarios (e.g., Question-Answering and Dialogue) has attracted ever-more interest in recent years and achieved important progress. However, existing studies on interactive ASU largely ignore the coreference issue for opinion targets (i.e., aspects), while this phenomenon is ubiquitous in interactive scenarios especially dialogues, limiting the…

    Submitted 10 April, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  21. arXiv:2403.04789  [pdf, other]

    cs.CL cs.AI cs.LG

    TopicDiff: A Topic-enriched Diffusion Approach for Multimodal Conversational Emotion Detection

    Authors: Jiamin Luo, Jingjing Wang, Guodong Zhou

    Abstract: Multimodal Conversational Emotion (MCE) detection, generally spanning across the acoustic, vision and language modalities, has attracted increasing interest in the multimedia community. Previous studies predominantly focus on learning contextual information in conversations with only a few considering the topic information in single language modality, while always neglecting the acoustic and visio…

    Submitted 10 March, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  22. arXiv:2403.02942  [pdf, other]

    cs.IT eess.SP

    Tensor Decomposition-based Time Varying Channel Estimation for mmWave MIMO-OFDM Systems

    Authors: Ruizhe Wang, Hong Ren, Cunhua Pan, Gui Zhou, Jiangzhou Wang

    Abstract: In this paper, we consider the time-varying channel estimation in millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems with hybrid beamforming architectures. Different from the existing contributions that considered single-carrier mmWave systems with high mobility, the wideband orthogonal frequency division multiplexing (OFDM) system is considered in this work. To solve the channel…

    Submitted 5 March, 2024; originally announced March 2024.

  23. arXiv:2403.01820  [pdf, other]

    math.NA cs.LG

    Macroscopic auxiliary asymptotic preserving neural networks for the linear radiative transfer equations

    Authors: Hongyan Li, Song Jiang, Wenjun Sun, Liwei Xu, Guanyu Zhou

    Abstract: We develop a Macroscopic Auxiliary Asymptotic-Preserving Neural Network (MA-APNN) method to solve the time-dependent linear radiative transfer equations (LRTEs), which have a multi-scale nature and high dimensionality. To achieve this, we utilize the Physics-Informed Neural Networks (PINNs) framework and design a new adaptive exponentially weighted Asymptotic-Preserving (AP) loss function, which i…

    Submitted 4 March, 2024; originally announced March 2024.

    Comments: 24 pages, 29 figures

  24. arXiv:2402.19116  [pdf, other]

    cs.CL cs.AI

    How to Understand "Support"? An Implicit-enhanced Causal Inference Approach for Weakly-supervised Phrase Grounding

    Authors: Jiamin Luo, Jianing Zhao, Jingjing Wang, Guodong Zhou

    Abstract: Weakly-supervised Phrase Grounding (WPG) is an emerging task of inferring the fine-grained phrase-region matching, while merely leveraging the coarse-grained sentence-image pairs for training. However, existing studies on WPG largely ignore the implicit phrase-region matching relations, which are crucial for evaluating the capability of models in understanding the deep multimodal semantics. To thi…

    Submitted 4 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  25. arXiv:2402.16915  [pdf, other]

    cs.LG cs.AI

    More Than Routing: Joint GPS and Route Modeling for Refine Trajectory Representation Learning

    Authors: Zhipeng Ma, Zheyan Tu, Xinhai Chen, Yan Zhang, Deguo Xia, Guyue Zhou, Yilun Chen, Yu Zheng, Jiangtao Gong

    Abstract: Trajectory representation learning plays a pivotal role in supporting various downstream tasks. To filter the noise in GPS trajectories, traditional methods tend to focus on routing-based methods that simplify the trajectories. However, this approach ignores the motion details contained in the GPS data, limiting the representation capability of trajectory representation learning. To fil…

    Submitted 25 February, 2024; originally announced February 2024.

  26. arXiv:2402.15852  [pdf, other]

    cs.CV cs.RO

    NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation

    Authors: Jiazhao Zhang, Kunyu Wang, Rongtao Xu, Gengze Zhou, Yicong Hong, Xiaomeng Fang, Qi Wu, Zhizheng Zhang, He Wang

    Abstract: Vision-and-Language Navigation (VLN) stands as a key research problem of Embodied AI, aiming at enabling agents to navigate in unseen environments following linguistic instructions. In this field, generalization is a long-standing challenge, either to out-of-distribution scenes or from Sim to Real. In this paper, we propose NaVid, a video-based large vision language model (VLM), to mitigate such a…

    Submitted 23 March, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

  27. arXiv:2402.15738  [pdf, other]

    cs.CR eess.SY

    Privacy-Preserving State Estimation in the Presence of Eavesdroppers: A Survey

    Authors: Xinhao Yan, Guanzhong Zhou, Daniel E. Quevedo, Carlos Murguia, Bo Chen, Hailong Huang

    Abstract: Networked systems are increasingly the target of cyberattacks that exploit vulnerabilities within digital communications, embedded hardware, and software. Arguably, the simplest class of attacks -- and often the first type before launching destructive integrity attacks -- are eavesdropping attacks, which aim to infer information by collecting system data and exploiting it for malicious purposes. A…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 16 pages, 5 figures, 4 tables

  28. CLIPose: Category-Level Object Pose Estimation with Pre-trained Vision-Language Knowledge

    Authors: Xiao Lin, Minghao Zhu, Ronghao Dang, Guangliang Zhou, Shaolong Shu, Feng Lin, Chengju Liu, Qijun Chen

    Abstract: Most existing category-level object pose estimation methods are devoted to learning the object category information from the point cloud modality. However, the scale of 3D datasets is limited due to the high cost of 3D data collection and annotation. Consequently, the category features extracted from these limited point cloud samples may not be comprehensive. This motivates us to investigate whether we…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 14 pages, 4 figures, 9 tables

  29. Understanding Human-AI Collaboration in Music Therapy Through Co-Design with Therapists

    Authors: Jingjing Sun, Jingyi Yang, Guyue Zhou, Yucheng Jin, Jiangtao Gong

    Abstract: The rapid development of musical AI technologies has expanded the creative potential of various musical activities, ranging from music style transformation to music generation. However, little research has investigated how musical AIs can support music therapists, who urgently need new technology support. This study used a mixed method, including semi-structured interviews and a participatory desi…

    Submitted 15 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 20 pages, 7 figures

    MSC Class: H.5.2

    Journal ref: CHI2024

  30. "It Must Be Gesturing Towards Me": Gesture-Based Interaction between Autonomous Vehicles and Pedestrians

    Authors: Xiang Chang, Zihe Chen, Xiaoyan Dong, Yuxin Cai, Tingmin Yan, Haolin Cai, Zherui Zhou, Guyue Zhou, Jiangtao Gong

    Abstract: Interacting with pedestrians understandably and efficiently is one of the toughest challenges faced by autonomous vehicles (AVs) due to the limitations of current algorithms and external human-machine interfaces (eHMIs). In this paper, we design eHMIs based on gestures inspired by the most popular method of interaction between pedestrians and human drivers. Eight common gestures were selected to c…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 26 pages, 22 figures

    MSC Class: H.5.2

    Journal ref: CHI2024

  31. arXiv:2402.14399  [pdf, other]

    cs.IR cs.AI

    Ensure Timeliness and Accuracy: A Novel Sliding Window Data Stream Paradigm for Live Streaming Recommendation

    Authors: Fengqi Liang, Baigong Zheng, Liqin Zhao, Guorui Zhou, Qian Wang, Yanan Niu

    Abstract: A live streaming recommender system is specifically designed to recommend real-time live streaming of interest to users. Due to the dynamic changes of live content, improving the timeliness of the live streaming recommender system is a critical problem. Intuitively, the timeliness of the data determines the upper bound of the timeliness that models can learn. However, none of the previous works addr…

    Submitted 22 February, 2024; originally announced February 2024.

  32. arXiv:2402.09055  [pdf, other]

    cs.CV cs.AI

    Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor Detection

    Authors: Yang Liu, Tongfei Shen, Dong Zhang, Qingying Sun, Shoushan Li, Guodong Zhou

    Abstract: The growing importance of multi-modal humor detection within affective computing correlates with the expanding influence of short-form video sharing on social media platforms. In this paper, we propose a novel two-branch hierarchical model for short-form video humor detection (SVHD), named Comment-aided Video-Language Alignment (CVLA) via data-augmented multi-modal contrastive pre-training. Notabl…

    Submitted 14 April, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted by ICMR 2024

  33. arXiv:2402.05869  [pdf, other]

    cs.CV

    Adaptive Surface Normal Constraint for Geometric Estimation from Monocular Images

    Authors: Xiaoxiao Long, Yuhang Zheng, Yupeng Zheng, Beiwen Tian, Cheng Lin, Lingjie Liu, Hao Zhao, Guyue Zhou, Wenping Wang

    Abstract: We introduce a novel approach to learn geometries such as depth and surface normal from images while incorporating geometric context. The difficulty of reliably capturing geometric context in existing methods impedes their ability to accurately enforce the consistency between the different geometric properties, thereby leading to a bottleneck of geometric estimation quality. We therefore propose t…

    Submitted 31 March, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted by TPAMI. arXiv admin note: substantial text overlap with arXiv:2103.15483

  34. arXiv:2402.04580  [pdf, other]

    cs.RO cs.AI cs.LG

    A Comprehensive Survey of Cross-Domain Policy Transfer for Embodied Agents

    Authors: Haoyi Niu, Jianming Hu, Guyue Zhou, Xianyuan Zhan

    Abstract: The burgeoning fields of robot learning and embodied AI have triggered an increasing demand for large quantities of data. However, collecting sufficient unbiased data from the target domain remains a challenge due to costly data collection processes and stringent safety requirements. Consequently, researchers often resort to data from easily accessible source domains, such as simulation and labora…

    Submitted 6 February, 2024; originally announced February 2024.

  35. arXiv:2402.02456  [pdf, other]

    cs.LG cs.CL

    Discovering More Effective Tensor Network Structure Search Algorithms via Large Language Models (LLMs)

    Authors: Junhua Zeng, Guoxu Zhou, Chao Li, Zhun Sun, Qibin Zhao

    Abstract: Tensor network structure search (TN-SS), aiming at searching for suitable tensor network (TN) structures in representing high-dimensional problems, largely promotes the efficacy of TN in various machine learning applications. Nonetheless, finding a satisfactory TN structure using existing algorithms remains challenging. To develop more effective algorithms and avoid the human labor-intensive devel…

    Submitted 4 February, 2024; originally announced February 2024.

  36. arXiv:2402.00632  [pdf, other]

    cs.CL

    Prosody in Cascade and Direct Speech-to-Text Translation: a case study on Korean Wh-Phrases

    Authors: Giulio Zhou, Tsz Kin Lam, Alexandra Birch, Barry Haddow

    Abstract: Speech-to-Text Translation (S2TT) has typically been addressed with cascade systems, where speech recognition systems generate a transcription that is subsequently passed to a translation model. While there has been a growing interest in developing direct speech translation systems to avoid propagating errors and losing non-verbal content, prior work in direct S2TT has struggled to conclusively es…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: Accepted at Findings of EACL 2024

  37. arXiv:2401.09716  [pdf, other]

    cs.CV cs.AI

    HCVP: Leveraging Hierarchical Contrastive Visual Prompt for Domain Generalization

    Authors: Guanglin Zhou, Zhongyi Han, Shiming Chen, Biwei Huang, Liming Zhu, Tongliang Liu, Lina Yao, Kun Zhang

    Abstract: Domain Generalization (DG) endeavors to create machine learning models that excel in unseen scenarios by learning invariant features. In DG, the prevalent practice of constraining models to a fixed structure or uniform parameterization to encapsulate invariant features can inadvertently blend specific aspects. Such an approach struggles with nuanced differentiation of inter-domain variations and m…

    Submitted 17 January, 2024; originally announced January 2024.

  38. arXiv:2401.05946  [pdf, other]

    cs.LG cs.AI

    Learning Cognitive Maps from Transformer Representations for Efficient Planning in Partially Observed Environments

    Authors: Antoine Dedieu, Wolfgang Lehrach, Guangyao Zhou, Dileep George, Miguel Lázaro-Gredilla

    Abstract: Despite their stellar performance on a wide range of tasks, including in-context tasks only revealed during inference, vanilla transformers and variants trained for next-token predictions (a) do not learn an explicit world model of their environment which can be flexibly queried and (b) cannot be used for planning or navigation. In this paper, we consider partially observed environments (POEs), wh…

    Submitted 11 January, 2024; originally announced January 2024.

  39. arXiv:2401.04942  [pdf, other]

    cs.CV

    Latency-aware Road Anomaly Segmentation in Videos: A Photorealistic Dataset and New Metrics

    Authors: Beiwen Tian, Huan-ang Gao, Leiyao Cui, Yupeng Zheng, Lan Luo, Baofeng Wang, Rong Zhi, Guyue Zhou, Hao Zhao

    Abstract: In the past several years, road anomaly segmentation has been actively explored in academia and has drawn growing attention in industry. The rationale is straightforward: if the autonomous car can brake before hitting an anomalous object, safety is promoted. However, this rationale naturally calls for a temporally informed setting while existing methods and benchmarks are designed in an unr…

    Submitted 10 January, 2024; originally announced January 2024.

  40. arXiv:2401.01193  [pdf, ps, other]

    cs.CC cs.DS

    Further Explanations on "SAT Requires Exhaustive Search"

    Authors: Qingxiu Dong, Guangyan Zhou, Ke Xu

    Abstract: Recently, Xu and Zhou [2023] introduced a constructive approach for exploring computational hardness, proving that SAT requires exhaustive search. In light of certain misinterpretations concerning the contributions and proofs in that paper, we focus on providing detailed explanations in this work. We begin by delineating the core innovation of the constructive approach, shedding light on the pivot…

    Submitted 2 January, 2024; originally announced January 2024.

  41. arXiv:2312.15820  [pdf, other]

    cs.CV

    WebVLN: Vision-and-Language Navigation on Websites

    Authors: Qi Chen, Dileepa Pitawela, Chongyang Zhao, Gengze Zhou, Hsiang-Ting Chen, Qi Wu

    Abstract: The Vision-and-Language Navigation (VLN) task aims to enable AI agents to accurately understand and follow natural language instructions to navigate through real-world environments, ultimately reaching specific target locations. We recognise a promising opportunity to extend VLN to a comparable navigation task that holds substantial significance in our daily lives, albeit within the virtual realm: nav…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI2024

  42. arXiv:2312.12791  [pdf, other]

    cs.RO cs.AI cs.LG

    Model-Based Control with Sparse Neural Dynamics

    Authors: Ziang Liu, Genggeng Zhou, Jeff He, Tobia Marcucci, Li Fei-Fei, Jiajun Wu, Yunzhu Li

    Abstract: Learning predictive models from observations using deep neural networks (DNNs) is a promising new approach to many real-world planning and control problems. However, common DNNs are too unstructured for effective planning, and current control methods typically rely on extensive sampling or local gradient descent. In this paper, we propose a new framework for integrated model learning and predictiv…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: Accepted at NeurIPS 2023. For tutorial code and additional visualizations, see https://robopil.github.io/Sparse-Dynamics/

  43. arXiv:2312.08994  [pdf, other]

    cs.LG cs.AR

    PANDA: Architecture-Level Power Evaluation by Unifying Analytical and Machine Learning Solutions

    Authors: Qijun Zhang, Shiyu Li, Guanglei Zhou, Jingyu Pan, Chen-Chia Chang, Yiran Chen, Zhiyao Xie

    Abstract: Power efficiency is a critical design objective in modern microprocessor design. To evaluate the impact of architectural-level design decisions, an accurate yet efficient architecture-level power model is desired. However, widely adopted data-independent analytical power models like McPAT and Wattch have been criticized for their unreliable accuracy. While some machine learning (ML) methods have b…

    Submitted 14 December, 2023; originally announced December 2023.

    Journal ref: IEEE/ACM International Conference on Computer-Aided Design (ICCAD) 2023

  44. arXiv:2312.07424  [pdf, other]

    cs.LG cs.AI cs.CV

    How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation

    Authors: Zhongyi Han, Guanglin Zhou, Rundong He, Jindong Wang, Tailin Wu, Yilong Yin, Salman Khan, Lina Yao, Tongliang Liu, Kun Zhang

    Abstract: In machine learning, generalization against distribution shifts -- where deployment conditions diverge from the training scenarios -- is crucial, particularly in fields like climate modeling, biomedicine, and autonomous driving. The emergence of foundation models, distinguished by their extensive pretraining and task versatility, has led to an increased interest in their adaptability to distributi…

    Submitted 25 February, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Added the investigation of Gemini. 66 pages, 41 figures

  45. arXiv:2312.04245  [pdf, other]

    cs.MA cs.AI

    Mastering Complex Coordination through Attention-based Dynamic Graph

    Authors: Guangchong Zhou, Zhiwei Xu, Zeren Zhang, Guoliang Fan

    Abstract: The coordination between agents in multi-agent systems has become a popular topic in many fields. To catch the inner relationship between agents, the graph structure is combined with existing methods and improves the results. But in large-scale tasks with numerous agents, an overly complex graph would lead to a boost in computational cost and a decline in performance. Here we present DAGMIX, a nov…

    Submitted 7 December, 2023; originally announced December 2023.

  46. arXiv:2312.00651  [pdf, other]

    cs.CV cs.AI

    TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models

    Authors: Pengxiang Li, Kai Chen, Zhili Liu, Ruiyuan Gao, Lanqing Hong, Guo Zhou, Hua Yao, Dit-Yan Yeung, Huchuan Lu, Xu Jia

    Abstract: Despite remarkable achievements in video synthesis, achieving granular control over complex dynamics, such as nuanced movement among multiple interacting objects, still presents a significant hurdle for dynamic world modeling, compounded by the necessity to manage appearance and disappearance, drastic scale changes, and ensure consistency for instances across frames. These challenges hinder the de…

    Submitted 20 March, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

  47. arXiv:2311.11041  [pdf, other]

    cs.IT eess.SP

    Channel Estimation for FAS-assisted Multiuser mmWave Systems

    Authors: Hao Xu, Gui Zhou, Kai-Kit Wong, Wee Kiat New, Chao Wang, Chan-Byoung Chae, Ross Murch, Shi Jin, Yangyang Zhang

    Abstract: This letter investigates the challenge of channel estimation in a multiuser millimeter-wave (mmWave) time-division duplexing (TDD) system. In this system, the base station (BS) employs a multi-antenna uniform linear array (ULA), while each mobile user is equipped with a fluid antenna system (FAS). Accurate channel state information (CSI) plays a crucial role in the precise placement of antennas in…

    Submitted 3 January, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

    Comments: 6 pages, 4 figures

  48. arXiv:2311.11037  [pdf, other]

    cs.IT

    Capacity Maximization for FAS-assisted Multiple Access Channels

    Authors: Hao Xu, Kai-Kit Wong, Wee Kiat New, Gui Zhou, Ross Murch, Chan-Byoung Chae, Yongxu Zhu, Shi Jin

    Abstract: This paper investigates a multiuser millimeter-wave (mmWave) uplink system in which each user is equipped with a multi-antenna fluid antenna system (FAS) while the base station (BS) has multiple fixed-position antennas. Our primary objective is to maximize the system capacity by optimizing the transmit covariance matrices and the antenna position vectors of the users jointly. To gain deeper insigh…

    Submitted 18 November, 2023; originally announced November 2023.

    Comments: 13 pages, 8 figures

  49. arXiv:2311.06211  [pdf, other]

    cs.CV cs.RO

    ASSIST: Interactive Scene Nodes for Scalable and Realistic Indoor Simulation

    Authors: Zhide Zhong, Jiakai Cao, Songen Gu, Sirui Xie, Weibo Gao, Liyi Luo, Zike Yan, Hao Zhao, Guyue Zhou

    Abstract: We present ASSIST, an object-wise neural radiance field as a panoptic representation for compositional and realistic simulation. Central to our approach is a novel scene node data structure that stores the information of each object in a unified fashion, allowing online interaction in both intra- and cross-scene settings. By incorporating a differentiable neural network along with the associated b…

    Submitted 10 November, 2023; originally announced November 2023.

  50. arXiv:2311.04150   

    cs.HC

    What Makes a Fantastic Passenger-Car Driver in Urban Contexts?

    Authors: Yueteng Yu, Zhijie Yi, Xinyu Yang, Mengdi Chu, Junrong Lu, Xiang Chang, Yiyao Liu, Jingli Qin, Ye Jin, Jialin Song, Xingrui Gu, Jirui Yuan, Guyue Zhou, Jiangtao Gong

    Abstract: The accurate evaluation of the quality of driving behavior is crucial for optimizing and implementing autonomous driving technology in practice. However, there is no comprehensive understanding of good driving behaviors currently. In this paper, we sought to understand driving behaviors from the perspectives of both drivers and passengers. We invited 10 expert drivers and 14 novice drivers to comp…

    Submitted 12 November, 2023; v1 submitted 7 November, 2023; originally announced November 2023.

    Comments: Part of the content of the paper will be modified. One of the authors has recommended its withdrawal due to personal reasons