Search | arXiv e-print repository

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual understanding capabilities and allowing it to be transferred and reused in different LLMs. (2) Dynamic High-Resolution: we divide images into 1 to 40 tiles of 448$\times$448 pixels according to the aspect ratio and resolution of the input images, which supports up to 4K resolution input. (3) High-Quality Bilingual Dataset: we carefully collected a high-quality bilingual dataset that covers common scenes and document images, and annotated it with English and Chinese question-answer pairs, significantly enhancing performance in OCR- and Chinese-related tasks. We evaluate InternVL 1.5 through a series of benchmarks and comparative studies. Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks. Code has been released at https://github.com/OpenGVLab/InternVL.

Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: Technical report
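The Dynamic High-Resolution step in the abstract above can be illustrated with a minimal sketch. Only the 448-pixel tile size and the 1-to-40 tile range come from the abstract; the function names `choose_grid` and `tile_boxes` and the selection rule (closest aspect ratio first, then a tile count near what the pixel area calls for) are assumptions made for illustration and are not taken from the InternVL code.

```python
import math

TILE = 448  # tile side length in pixels, as stated in the abstract


def choose_grid(width, height, max_tiles=40):
    """Return (cols, rows) for tiling an image of the given size.

    Assumed rule (hypothetical): minimise the aspect-ratio mismatch first,
    then prefer a tile count close to what the pixel area calls for.
    """
    target = width / height
    wanted = min(max_tiles, max(1, math.ceil(width * height / (TILE * TILE))))
    best_key, best_grid = None, None
    for rows in range(1, max_tiles + 1):
        for cols in range(1, max_tiles // rows + 1):
            key = (abs(cols / rows - target), abs(cols * rows - wanted))
            if best_key is None or key < best_key:
                best_key, best_grid = key, (cols, rows)
    return best_grid


def tile_boxes(cols, rows):
    """Crop boxes (left, top, right, bottom) once the image has been
    resized to exactly cols*TILE by rows*TILE pixels."""
    return [
        (c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE)
        for r in range(rows)
        for c in range(cols)
    ]


if __name__ == "__main__":
    cols, rows = choose_grid(3840, 2160)
    print(cols, rows, len(tile_boxes(cols, rows)))
```

Under this assumed rule the grid never exceeds 40 tiles, so very large inputs are downscaled to the chosen grid rather than tiled at native resolution.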

arXiv:2402.13533 [pdf, other]

FinGPT-HPC: Efficient Pretraining and Finetuning Large Language Models for Financial Applications with High-Performance Computing

Authors: Xiao-Yang Liu, Jie Zhang, Guoxuan Wang, Weiqing Tong, Anwar Walid

Abstract: Large language models (LLMs) are computationally intensive. The computation workload and the memory footprint grow quadratically with the dimension (layer width). Most of LLMs' parameters come from the linear layers of the transformer structure and are highly redundant. These linear layers contribute more than 80% of the computation workload and 99% of the model size. To pretrain and finetune LLMs efficiently, there are three major challenges to address: 1) reducing redundancy of the linear layers; 2) reducing GPU memory footprint; 3) improving GPU utilization when using distributed training. Prior methods, such as LoRA and QLoRA, utilized low-rank matrices and quantization to reduce the number of trainable parameters and model size, respectively. However, the resulting model still consumes a large amount of GPU memory. In this paper, we present high-performance GPU-based methods that exploit low-rank structures to pretrain and finetune LLMs for financial applications. We replace one conventional linear layer of the transformer structure with two narrower linear layers, which allows us to reduce the number of parameters by several orders of magnitude. By quantizing the parameters into low precision (8-bit and 4-bit), the memory consumption of the resulting model is further reduced. Compared with existing LLMs, our methods achieve a speedup of 1.3X and a model compression ratio of 2.64X for pretraining without accuracy drop. For finetuning, our methods achieve an average accuracy increase of 6.3% and 24.0% in general tasks and financial tasks, respectively, and GPU memory consumption ratio of 6.3X. The sizes of our models are smaller than 0.59 GB, allowing inference on a smartphone.

Submitted 21 February, 2024; originally announced February 2024.
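The parameter saving from replacing one linear layer with two narrower ones, as described in the abstract above, can be checked with simple arithmetic. The following is a minimal sketch: the layer width 4096 and rank 64 are illustrative values chosen here, not figures from the paper, and bias terms are ignored.

```python
def dense_params(d_in, d_out):
    """Weights in one conventional linear layer (bias ignored)."""
    return d_in * d_out


def low_rank_params(d_in, d_out, rank):
    """Weights in two narrower layers: d_in -> rank, then rank -> d_out."""
    return d_in * rank + rank * d_out


def memory_bytes(n_params, bits):
    """Storage needed when each weight is quantized to `bits` bits."""
    return n_params * bits // 8


if __name__ == "__main__":
    d, r = 4096, 64  # illustrative sizes, not taken from the paper
    full = dense_params(d, d)
    factored = low_rank_params(d, d, r)
    print("dense:", full, "low-rank:", factored, "ratio:", full / factored)
    print("4-bit bytes:", memory_bytes(factored, 4))
```

For a square layer of width d the ratio is d / (2r), so the saving grows as the rank shrinks relative to the width.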

arXiv:2402.04991 [pdf, other]

Exploring the Opportunity of Augmented Reality (AR) in Supporting Older Adults Explore and Learn Smartphone Applications

Authors: Xiaofu Jin, Wai Tong, Xiaoying Wei, Xian Wang, Emily Kuang, Xiaoyu Mo, Huamin Qu, Mingming Fan

Abstract: The global aging trend compels older adults to navigate the evolving digital landscape, presenting a substantial challenge in mastering smartphone applications. While Augmented Reality (AR) holds promise for enhancing learning and user experience, its role in aiding older adults' smartphone app exploration remains insufficiently explored. Therefore, we conducted a two-phase study: (1) a workshop with 18 older adults to identify app exploration challenges and potential AR interventions, and (2) tech-probe participatory design sessions with 15 participants to co-create AR support tools. Our research highlights AR's effectiveness in reducing physical and cognitive strain among older adults during app exploration, especially during multi-app usage and the trial-and-error learning process. We also examined their interactional experiences with AR, yielding design considerations on tailoring AR tools for smartphone app exploration. Ultimately, our study unveils the prospective landscape of AR in supporting the older demographic, both presently and in future scenarios.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2312.12381 [pdf, other]

Blockchain-Based Identity Authentication Oriented to Multi-Cluster UAV Networking

Authors: Zesong Dong, Wei Tong, Zhiwei Zhang, Jian Li, Weidong Yang, Yulong Shen

Abstract: Unmanned Aerial Vehicle (UAV) networking is increasingly used in field environments such as power inspection, agricultural plant protection, and emergency rescue. To guarantee UAV networking security, UAV identity authentication attracts wide attention, especially in field environments without perfect infrastructure. Some blockchain-based UAV identity authentication solutions have been proposed to establish decentralized and trusted authentication systems without relying on infrastructure. However, these solutions do not support reconnection of a disconnected UAV, or even disband a cluster directly after its head UAV disconnects, which compromises cluster robustness and task result integrity. In this paper, we propose a blockchain-based identity authentication solution oriented to multi-cluster UAV networking with a UAV disconnection mechanism and a task result backup mechanism. Specifically, we build a blockchain maintained by head UAVs of all clusters, managing identity information to guarantee the security of decentralized identity management. The UAV disconnection mechanism permits a verified distributed UAV reconnection to ensure the robustness of the UAV cluster, and on this basis, the task result backup mechanism ensures the integrity of the task results stored in a cluster even after any UAV disconnects. Finally, extensive experimental results prove the superiority of our solutions in terms of robustness, integrity, delay, and energy consumption.

Submitted 14 November, 2023; originally announced December 2023.

arXiv:2312.09245 [pdf, other]

DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

Authors: Wenhai Wang, Jiangwei Xie, ChuanYang Hu, Haoming Zou, Jianan Fan, Wenwen Tong, Yang Wen, Silei Wu, Hanming Deng, Zhiqi Li, Hao Tian, Lewei Lu, Xizhou Zhu, Xiaogang Wang, Yu Qiao, Jifeng Dai

Abstract: Large language models (LLMs) have opened up new possibilities for intelligent agents, endowing them with human-like thinking and cognitive abilities. In this work, we delve into the potential of LLMs in autonomous driving (AD). We introduce DriveMLM, an LLM-based AD framework that can perform closed-loop autonomous driving in realistic simulators. To this end, (1) we bridge the gap between the language decisions and the vehicle control commands by standardizing the decision states according to the off-the-shelf motion planning module. (2) We employ a multi-modal LLM (MLLM) to model the behavior planning module of a modular AD system, which uses driving rules, user commands, and inputs from various sensors (e.g., camera, lidar) as input, makes driving decisions, and provides explanations. This model can plug-and-play in existing AD systems such as Apollo for closed-loop driving. (3) We design an effective data engine to collect a dataset that includes decision state and corresponding explanation annotation for model training and evaluation. We conduct extensive experiments and show that our model achieves a driving score of 76.1 on the CARLA Town05 Long benchmark, and surpasses the Apollo baseline by 4.7 points under the same settings, demonstrating the effectiveness of our model. We hope this work can serve as a baseline for autonomous driving with LLMs. Code and models shall be released at https://github.com/OpenGVLab/DriveMLM.

Submitted 25 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Technical Report

arXiv:2312.06200 [pdf, ps, other]

Achieving the Fundamental Limit of Lossless Analog Compression via Polarization

Authors: Shuai Yuan, Liuquan Yao, Yuan Li, Huazi Zhang, Jun Wang, Wen Tong, Zhiming Ma

Abstract: In this paper, we study lossless analog compression for i.i.d. nonsingular signals via the polarization-based framework. We prove that for a nonsingular source, the error probability of maximum a posteriori (MAP) estimation polarizes under the Hadamard transform, which extends the polarization phenomenon to the analog domain. Building on this insight, we propose partial Hadamard compression and develop the corresponding analog successive cancellation (SC) decoder. The proposed scheme consists of deterministic measurement matrices and a non-iterative reconstruction algorithm, providing benefits in both space and computational complexity. Using the polarization of error probability, we prove that our approach achieves the information-theoretical limit for lossless analog compression developed by Wu and Verdú.

Submitted 19 January, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: 48 pages, 5 figures. This work was presented in part at the 2023 IEEE Global Communications Conference

arXiv:2311.13106 [pdf, other]

Ten issues of NetGPT

Authors: Wen Tong, Chenghui Peng, Tingting Yang, Fei Wang, Juan Deng, Rongpeng Li, Lu Yang, Honggang Zhang, Dong Wang, Ming Ai, Li Yang, Guangyi Liu, Yang Yang, Yao Xiao, Liexiang Yue, Wanfei Sun, Zexu Li, Wenwen Sun

Abstract: With the rapid development and application of foundation models (FMs), it is foreseeable that FMs will play an important role in future wireless communications. As current Artificial Intelligence (AI) algorithms applied in wireless networks are dedicated models that aim for different neural network architectures and objectives, drawbacks in aspects of generality, performance gain, management, collaboration, etc. need to be overcome. In this paper, we define NetGPT (Network Generative Pre-trained Transformer) -- the foundation models for wireless communications, and summarize ten issues regarding design and application of NetGPT.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.04320 [pdf, other]

Proprioceptive Invariant Robot State Estimation

Authors: Tzu-Yuan Lin, Tingjun Li, Wenzhe Tong, Maani Ghaffari

Abstract: This paper reports on developing a real-time invariant proprioceptive robot state estimation framework called DRIFT. A didactic introduction to invariant Kalman filtering is provided to make this cutting-edge symmetry-preserving approach accessible to a broader range of robotics applications. Furthermore, this work dives into the development of a proprioceptive state estimation framework for dead reckoning that only consumes data from an onboard inertial measurement unit and kinematics of the robot, with two optional modules, a contact estimator and a gyro filter for low-cost robots, enabling a significant capability on a variety of robotics platforms to track the robot's state over long trajectories in the absence of perceptual data. Extensive real-world experiments using a legged robot, an indoor wheeled robot, a field robot, and a full-size vehicle, as well as simulation results with a marine robot, are provided to understand the limits of DRIFT.

Submitted 20 February, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

arXiv:2310.04826 [pdf, other]

doi 10.1145/3313831.3376436

Augmenting Static Visualizations with PapARVis Designer

Authors: Chen Zhu-Tian, Wai Tong, Qianwen Wang, Benjamin Bach, Huamin Qu

Abstract: This paper presents an authoring environment for augmenting static visualizations with virtual content in augmented reality. Augmenting static visualizations can leverage the best of both physical and digital worlds, but its creation currently involves different tools and devices, without any means to explicitly design and debug both static and virtual content simultaneously. To address these issues, we design an environment that seamlessly integrates all steps of a design and deployment workflow through its main features: i) an extension to Vega, ii) a preview, and iii) debug hints that facilitate valid combinations of static and augmented content. We inform our design through a design space with four ways to augment static visualizations. We demonstrate the expressiveness of our tool through examples, including books, posters, projections, and wall-sized visualizations. A user study shows high user satisfaction with our environment and confirms that participants can create augmented visualizations in an average of 4.63 minutes.

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2309.02459 [pdf, other]

doi 10.21437/Interspeech.2023-1378

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation

Authors: Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, Zhao You, Dan Su, Dong Yu, Helen Meng

Abstract: Mapping two modalities, speech and text, into a shared representation space is a research topic aimed at using text-only data to improve end-to-end automatic speech recognition (ASR) performance in new domains. However, the lengths of the speech representation and the text representation are inconsistent. Although the previous method up-samples the text representation to align with the acoustic modality, it may not match the expected actual duration. In this paper, we propose a novel representation matching strategy that down-samples the acoustic representation to align with the text modality. By introducing a continuous integrate-and-fire (CIF) module that generates acoustic representations consistent with token length, our ASR model can better learn unified representations from both modalities, allowing for domain adaptation using text-only data of the target domain. Experimental results on new domain data demonstrate the effectiveness of the proposed method.

Submitted 7 October, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

Comments: Proceedings of Interspeech. arXiv admin note: text overlap with arXiv:2309.01437

arXiv:2306.02851 [pdf, other]

Scene as Occupancy

Authors: Chonghao Sima, Wenwen Tong, Tai Wang, Li Chen, Silei Wu, Hanming Deng, Yi Gu, Lewei Lu, Ping Luo, Dahua Lin, Hongyang Li

Abstract: Human drivers can easily describe a complex traffic scene through their visual system. Such an ability of precise perception is essential for a driver's planning. To achieve this, a geometry-aware representation that quantizes the physical 3D scene into a structured grid map with semantic labels per cell, termed 3D Occupancy, would be desirable. Compared to the form of bounding box, a key insight behind occupancy is that it could capture the fine-grained details of critical obstacles in the scene, and thereby facilitate subsequent tasks. Prior or concurrent literature mainly concentrates on a single scene completion task, whereas we argue that this occupancy representation might have a broader impact. In this paper, we propose OccNet, a multi-view vision-centric pipeline with a cascade and temporal voxel decoder to reconstruct 3D occupancy. At the core of OccNet is a general occupancy embedding to represent the 3D physical world. Such a descriptor could be applied towards a wide span of driving tasks, including detection, segmentation and planning. To validate the effectiveness of this new representation and our proposed algorithm, we propose OpenOcc, the first dense high-quality 3D occupancy benchmark built on top of nuScenes. Empirical experiments show that there are evident performance gains across multiple tasks, e.g., motion planning could witness a collision rate reduction of 15%-58%, demonstrating the superiority of our method.

Submitted 26 June, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: Project link: https://github.com/OpenDriveLab/OccNet

arXiv:2303.10340 [pdf, other]

3D Data Augmentation for Driving Scenes on Camera

Authors: Wenwen Tong, Jiangwei Xie, Tianyu Li, Hanming Deng, Xiangwei Geng, Ruoyi Zhou, Dingchen Yang, Bo Dai, Lewei Lu, Hongyang Li

Abstract: Driving scenes are so diverse and complicated that it is impossible to collect all cases with human effort alone. While data augmentation is an effective technique to enrich the training data, existing methods for camera data in autonomous driving applications are confined to the 2D image plane, which may not optimally increase data diversity in 3D real-world scenarios. To this end, we propose a 3D data augmentation approach termed Drive-3DAug, aiming at augmenting the driving scenes on camera in the 3D space. We first utilize Neural Radiance Field (NeRF) to reconstruct the 3D models of background and foreground objects. Then, augmented driving scenes can be obtained by placing the 3D objects with adapted location and orientation at the pre-defined valid region of backgrounds. As such, the training database could be effectively scaled up. However, the 3D object modeling is constrained by the image quality and the limited viewpoints. To overcome these problems, we modify the original NeRF by introducing a geometric rectified loss and a symmetric-aware training strategy. We evaluate our method for the camera-only monocular 3D detection task on the Waymo and nuScenes datasets. The proposed data augmentation approach contributes to a gain of 1.7% and 1.4% in terms of detection accuracy, on Waymo and nuScenes respectively. Furthermore, the constructed 3D models serve as digital driving assets and could be recycled for different detectors or other 3D perception tasks.

Submitted 18 March, 2023; originally announced March 2023.

arXiv:2302.13549 [pdf]

Random-Order Enumeration for Self-Reducible NP-Problems

Authors: Pengyu Chen, Dongjing Miao, Weitian Tong, Zizheng Guo, Jianzhong Li, Zhipeng Cai

Abstract: In plenty of data analysis tasks, a basic and time-consuming process is to produce a large number of solutions and feed them into downstream processing. Various enumeration algorithms have been developed for this purpose. An enumeration algorithm produces all solutions of a problem instance without repetition. To be a statistically meaningful representation of the solution space, solutions are required to be enumerated in uniformly random order. This paper studies a set of self-reducible NP-problems in three hierarchies, where the problems are polynomially countable ($Sr_{NP}^{FP}$), admit FPTAS ($Sr_{NP}^{FPTAS}$), and admit FPRAS ($Sr_{NP}^{FPRAS}$), respectively. The trivial algorithm based on an (almost) uniform generator is in fact inefficient. We provide a new insight that the (almost) uniform generator is not the end of the story. More efficient algorithmic frameworks are proposed to enumerate solutions in uniformly random order for problems in these three hierarchies. (1) For problems in $Sr_{NP}^{FP}$, we show a random-order enumeration algorithm with polynomial delay (PDREnum); (2) For problems in $Sr_{NP}^{FPTAS}$, we show a Las Vegas random-order enumeration algorithm with expected polynomial delay (PDLVREnum); (3) For problems in $Sr_{NP}^{FPRAS}$, we devise a fully polynomial delay Atlantic City random-order enumeration algorithm with expected delay polynomial in the input size and the given error probability $\delta$ (FPACREnum), which has a probability of at least $1-\delta$ of becoming a Las Vegas random-order enumeration algorithm. Finally, to further improve the efficiency of the random-order enumeration algorithms, based on the master/slave paradigm, we present a parallelization with 1.5-optimal enumeration delay and running time, along with the theoretical analysis.

Submitted 27 February, 2023; originally announced February 2023.
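To see why the abstract above calls the trivial generator-based approach inefficient, the following minimal sketch implements that baseline only, not any of the paper's algorithms (PDREnum, PDLVREnum, FPACREnum). It assumes a hypothetical toy problem whose 16 solutions are the integers 0 to 15, a perfectly uniform sampler, and a known solution count; the name `naive_random_order` is invented here.

```python
import random


def naive_random_order(sample_uniform, total, rng):
    """Baseline: draw uniform samples and emit each solution the first
    time it appears. Yields (solution, number_of_draws_so_far).

    The output order is uniformly random, but the gap between outputs
    grows as fewer unseen solutions remain.
    """
    seen = set()
    draws = 0
    while len(seen) < total:
        s = sample_uniform(rng)
        draws += 1
        if s not in seen:
            seen.add(s)
            yield s, draws


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy solution space (assumption): the integers 0..15.
    for solution, draws in naive_random_order(lambda r: r.randrange(16), 16, rng):
        print(solution, draws)
```

With N solutions, the final unseen solution takes N draws in expectation, so when N is exponential in the input size the delay between outputs is not polynomial.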

arXiv:2302.08743

Multi-View Clustering from the Perspective of Mutual Information

Authors: Fu Lele, Zhang Lei, Wang Tong, Chen Chuan, Zhang Chuanfu, Zheng Zibin

Abstract: Exploring the complementary information of multi-view data to improve clustering effects is a crucial issue in multi-view clustering. In this paper, we propose a novel model based on information theory termed Informative Multi-View Clustering (IMVC), which extracts the common and view-specific information hidden in multi-view data and constructs a clustering-oriented comprehensive representation. More specifically, we concatenate multiple features into a unified feature representation, then pass it through an encoder to retrieve the common representation across views. Simultaneously, the features of each view are sent to an encoder to produce a compact view-specific representation. We then constrain the mutual information between the common representation and view-specific representations to be minimal for obtaining multi-level information. Further, the common representation and view-specific representation are spliced to model the refined representation of each view, which is fed into a decoder to reconstruct the initial data while maximizing their mutual information. In order to form a comprehensive representation, the common representation and all view-specific representations are concatenated. Furthermore, to better adapt the comprehensive representation to the clustering task, we maximize the mutual information between an instance and its k-nearest neighbors to enhance the intra-cluster aggregation, thus inducing good separation of different clusters overall. Finally, we conduct extensive experiments on six benchmark datasets, and the experimental results indicate that the proposed IMVC outperforms other methods.

Submitted 29 May, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

Comments: We think the paper writing isn't good enough, so we would like to withdraw the paper and renew the writing manner

arXiv:2302.01966 [pdf, other]

Towards an Understanding of Distributed Asymmetric Collaborative Visualization on Problem-solving

Authors: Wai Tong, Meng Xia, Kam Kwai Wong, Doug A. Bowman, Ting-Chuen Pong, Huamin Qu, Yalong Yang

Abstract: This paper provided empirical knowledge of the user experience for using collaborative visualization in a distributed asymmetrical setting through controlled user studies. With the ability to access various computing devices, such as Virtual Reality (VR) head-mounted displays, scenarios emerge when collaborators have to or prefer to use different computing environments in different places. However, we still lack an understanding of using VR in an asymmetric setting for collaborative visualization. To get an initial understanding and better inform the designs for asymmetric systems, we first conducted a formative study with 12 pairs of participants. All participants collaborated in asymmetric (PC-VR) and symmetric settings (PC-PC and VR-VR). We then improved our asymmetric design based on the key findings and observations from the first study. Another ten pairs of participants collaborated with enhanced PC-VR and PC-PC conditions in a follow-up study. We found that a well-designed asymmetric collaboration system could be as effective as a symmetric system. Surprisingly, participants using PC perceived less mental demand and effort in the asymmetric setting (PC-VR) compared to the symmetric setting (PC-PC). We provided fine-grained discussions about the trade-offs between different collaboration settings.

Submitted 3 February, 2023; originally announced February 2023.

Comments: 11 pages, 12 figures, accepted at IEEE VR 2023

arXiv:2211.06769 [pdf, other]

Realistic Bokeh Effect Rendering on Mobile GPUs, Mobile AI & AIM 2022 challenge: Report

Authors: Andrey Ignatov, Radu Timofte, Jin Zhang, Feng Zhang, Gaocheng Yu, Zhe Ma, Hongbin Wang, Minsu Kwon, Haotian Qian, Wentao Tong, Pan Mu, Ziping Wang, Guangjing Yan, Brian Lee, Lei Fei, Huaijin Chen, Hyebin Cho, Byeongjun Kwon, Munchurl Kim, Mingyang Qian, Huixin Ma, Yanan Li, Xiaotao Wang, Lei Lei

Abstract: As mobile cameras with compact optics are unable to produce a strong bokeh effect, lots of interest is now devoted to deep learning-based solutions for this task. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based bokeh effect rendering approach that can run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale EBB! bokeh dataset consisting of 5K shallow / wide depth-of-field image pairs captured using the Canon 7D DSLR camera. The runtime of the resulting models was evaluated on the Kirin 9000's Mali GPU that provides excellent acceleration results for the majority of common deep learning ops. A detailed description of all models developed in this challenge is provided in this paper.

Submitted 7 November, 2022; originally announced November 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2211.03885; text overlap with arXiv:2105.07809, arXiv:2211.04470, arXiv:2211.05256, arXiv:2211.05910

arXiv:2209.15140 [pdf, other]

Fully Proprioceptive Slip-Velocity-Aware State Estimation for Mobile Robots via Invariant Kalman Filtering and Disturbance Observer

Authors: Xihang Yu, Sangli Teng, Theodor Chakhachiro, Wenzhe Tong, Tingjun Li, Tzu-Yuan Lin, Sarah Koehler, Manuel Ahumada, Jeffrey M. Walls, Maani Ghaffari

Abstract: This paper develops a novel slip estimator using the invariant observer design theory and Disturbance Observer (DOB). The proposed state estimator for mobile robots is fully proprioceptive and combines data from an inertial measurement unit and body velocity within a Right Invariant Extended Kalman Filter (RI-EKF). By embedding the slip velocity into $\mathrm{SE}_3(3)$ matrix Lie group, the developed DOB-based RI-EKF provides real-time velocity and slip velocity estimates on different terrains. Experimental results using a Husky wheeled robot confirm the mathematical derivations and effectiveness of the proposed method in estimating the observable state variables. Open-source software is available for download and reproducing the presented results.

Submitted 30 September, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: The work will be presented at IROS 2023. GitHub repository at https://github.com/UMich-CURLY/slip_detection_DOB. arXiv admin note: text overlap with arXiv:1805.10410 by other authors

arXiv:2208.10603 [pdf, other]

Exploring Interactions with Printed Data Visualizations in Augmented Reality

Authors: Wai Tong, Zhutian Chen, Meng Xia, Leo Yu-Ho Lo, Linping Yuan, Benjamin Bach, Huamin Qu

Abstract: This paper presents a design space of interaction techniques to engage with visualizations that are printed on paper and augmented through Augmented Reality. Paper sheets are widely used to deploy visualizations and provide a rich set of tangible affordances for interactions, such as touch, folding, tilting, or stacking. At the same time, augmented reality can dynamically update visualization content to provide commands such as pan, zoom, filter, or detail on demand. This paper is the first to provide a structured approach to mapping possible actions with the paper to interaction commands. This design space and the findings of a controlled user study have implications for future designs of augmented reality systems involving paper sheets and visualizations. Through workshops (N=20) and ideation, we identified 81 interactions that we classify in three dimensions: 1) commands that can be supported by an interaction, 2) the specific parameters provided by an (inter)action with paper, and 3) the number of paper sheets involved in an interaction. We tested user preference and viability of 11 of these interactions with a prototype implementation in a controlled study (N=12, HoloLens 2) and found that most of the interactions are intuitive and engaging to use. We summarized interactions (e.g., tilt to pan) that have strong affordance to complement "point" for data exploration, physical limitations and properties of paper as a medium, cases requiring redundancy and shortcuts, and other implications for design.

Submitted 22 August, 2022; originally announced August 2022.

Comments: 11 pages, 9 figures, 1 table, accepted at IEEE VIS 2022

arXiv:2207.11238 [pdf]

Improved lightweight identification of agricultural diseases based on MobileNetV3

Authors: Yuhang Jiang, Wenping Tong

Abstract: At present, the identification of agricultural pests and diseases has the problem that the model is not lightweight enough and difficult to apply. Based on MobileNetV3, this paper introduces the Coordinate Attention block. The parameters of MobileNetV3-large are reduced by 22%, the model size is reduced by 19.7%, and the accuracy is improved by 0.92%. The parameters of MobileNetV3-small are reduced by 23.4%, the model size is reduced by 18.3%, and the accuracy is increased by 0.40%. In addition, the improved MobileNetV3-small was migrated to Jetson Nano for testing. The accuracy increased by 2.48% to 98.31%, and the inference speed increased by 7.5%. It provides a reference for deploying the agricultural pest identification model to embedded devices.

Submitted 19 July, 2022; originally announced July 2022.

Comments: Accepted by CAIBDA 2022

arXiv:2206.06897 [pdf, other]

On the Message Passing Efficiency of Polar and Low-Density Parity-Check Decoders

Authors: Dawei Yin, Yuan Li, Xianbin Wang, Jiajie Tong, Huazi Zhang, Jun Wang, Guanghui Wang, Jun Chen, Guiying Yan, Zhiming Ma, Wen Tong

Abstract: This study focuses on the efficiency of message-passing-based decoding algorithms for polar and low-density parity-check (LDPC) codes. Both successive cancellation (SC) and belief propagation (BP) decoding algorithms are studied in the message-passing framework. Counter-intuitively, SC decoding demonstrates the highest decoding efficiency, although it was considered a weak decoder in terms of error-correction performance. We analyze the complexity-performance tradeoff to dynamically track the decoding efficiency, where the complexity is measured by the number of messages passed (NMP), and the performance is measured by the statistical distance to the maximum a posteriori (MAP) estimate. This study offers a new insight into the contribution of each message passed in decoding, and compares various decoding algorithms on a message-by-message level. The analysis corroborates recent results on terabits-per-second polar SC decoders, and might shed light on better scheduling strategies.

Submitted 20 April, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

arXiv:2205.14407 [pdf, ps, other]

An efficient polynomial-time approximation scheme for parallel multi-stage open shops

Authors: Jianming Dong, Ruyan Jin, Guohui Lin, Bing Su, Weitian Tong, Yao Xu

Abstract: Various new scheduling problems have been arising from practical production processes and spawning new research areas in the scheduling field. We study the parallel multi-stage open shops problem, which generalizes the classic open shop scheduling and parallel machine scheduling problems. Given m identical k-stage open shops and a set of n jobs, we aim to process all jobs on these open shops with the minimum makespan, i.e., the completion time of the last job, under the constraint that job preemption is not allowed. We present an efficient polynomial-time approximation scheme (EPTAS) for the case when both m and k are constant. The main idea for our EPTAS is the combination of several categorization, scaling, and linear programming rounding techniques. Jobs and/or operations are first scaled and then categorized carefully into multiple types so that different types of jobs and/or operations are scheduled appropriately without increasing the makespan too much.

Submitted 28 May, 2022; originally announced May 2022.
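
Illustrative note on the makespan objective mentioned in the abstract above. The sketch below is not the paper's EPTAS; it is the classic greedy list-scheduling heuristic for identical parallel machines (one of the problems the abstract says the studied problem generalizes), included only to make "makespan = completion time of the last job" concrete. The function name and the sample job lengths are hypothetical.

```python
import heapq


def list_schedule_makespan(job_lengths, num_machines):
    """Greedy list scheduling on identical machines (illustration only).

    Each job, taken in the given order and never preempted, is placed on
    the machine that currently finishes earliest. Returns the makespan,
    i.e. the completion time of the last job to finish.
    """
    if num_machines < 1:
        raise ValueError("need at least one machine")
    loads = [0] * num_machines  # current finishing time of each machine
    heapq.heapify(loads)
    for length in job_lengths:
        earliest = heapq.heappop(loads)         # least-loaded machine
        heapq.heappush(loads, earliest + length)
    return max(loads)


if __name__ == "__main__":
    # Hypothetical instance: five jobs on two machines.
    print(list_schedule_makespan([3, 3, 2, 2, 2], 2))
```

On this instance the greedy order yields a makespan of 7, while placing the two length-3 jobs together gives 6, which is why approximation schemes with provable guarantees are of interest.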

arXiv:2205.06523 [pdf, ps, other]

Deterministic Identification over Channels without CSI

Authors: Yuan Li, Xianbin Wang, Huazi Zhang, Jun Wang, Wen Tong, Guiying Yan, Zhiming Ma

Abstract: Identification capacities of randomized and deterministic identification were proved to exceed channel capacity for Gaussian channels with channel side information (CSI). In this work, we extend deterministic identification to the block fading channels without CSI by applying identification codes for both channel estimation and user identification. We prove that identification capacity is asymptotically higher than transmission capacity even in the absence of CSI. We also analyze the finite-length performance theoretically and numerically. The simulation results verify the feasibility of the proposed blind deterministic identification in finite blocklength regime.

Submitted 11 August, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

arXiv:2204.06049 [pdf, ps, other]

On the Rate-Distortion-Perception Function

Authors: Jun Chen, Lei Yu, Jia Wang, Wuxian Shi, Yiqun Ge, Wen Tong

Abstract: Rate-distortion-perception theory generalizes Shannon's rate-distortion theory by introducing a constraint on the perceptual quality of the output. The perception constraint complements the conventional distortion constraint and aims to enforce distribution-level consistencies. In this new theory, the information-theoretic limit is characterized by the rate-distortion-perception function. Although a coding theorem for the rate-distortion-perception function has recently been established, the fundamental nature of the optimal coding schemes remains unclear, especially regarding the role of randomness in encoding and decoding. It is shown in the present work that except for certain extreme cases, the rate-distortion-perception function is achievable by deterministic codes. This paper also clarifies the subtle differences between two notions of perfect perceptual quality and explores some alternative formulations of the perception constraint.

Submitted 12 April, 2022; originally announced April 2022.
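
Illustrative note on the rate-distortion-perception function named in the abstract above. The abstract does not state the definition; the form below is the one commonly used in the rate-distortion-perception literature, and the symbols ($\Delta$ for the distortion measure, $d$ for a divergence between distributions, $D$ and $P$ for the two thresholds) are assumptions introduced here, so the paper's exact notation may differ.

```latex
R(D, P) \;=\; \min_{p_{\hat{X}\mid X}} \; I(X; \hat{X})
\quad \text{subject to} \quad
\mathbb{E}\big[\Delta(X, \hat{X})\big] \le D,
\qquad
d\big(p_X, p_{\hat{X}}\big) \le P .
```

Dropping the perception constraint (letting $P \to \infty$) leaves Shannon's rate-distortion function $R(D)$, which is the sense in which the theory generalizes the classical one.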

arXiv:2204.00856 [pdf, other]

ComputableViz: Mathematical Operators as a Formalism for Visualization Processing and Analysis

Authors: Aoyu Wu, Wai Tong, Haotian Li, Dominik Moritz, Yong Wang, Huamin Qu

Abstract: Data visualizations are created and shared on the web at an unprecedented speed, raising new needs and questions for processing and analyzing visualizations after they have been generated and digitized. However, existing formalisms focus on operating on a single visualization instead of multiple visualizations, making it challenging to perform analysis tasks such as sorting and clustering visualizations. Through a systematic analysis of previous work, we abstract visualization-related tasks into mathematical operators such as union and propose a design space of visualization operations. We realize the design by developing ComputableViz, a library that supports operations on multiple visualization specifications. To demonstrate its usefulness and extensibility, we present multiple usage scenarios concerning processing and analyzing visualization, such as generating visualization embeddings and automatically making visualizations accessible. We conclude by discussing research opportunities and challenges for managing and exploiting the massive visualizations on the web.

Submitted 2 April, 2022; originally announced April 2022.

Comments: 15 pages, 12 figures. In the ACM Conference on Human Factors in Computing Systems (CHI) 2022

arXiv:2203.00573 [pdf, other]

doi 10.1103/PhysRevE.105.064118

Contrasting random and learned features in deep Bayesian linear regression

Authors: Jacob A. Zavatone-Veth, William L. Tong, Cengiz Pehlevan

Abstract: Understanding how feature learning affects generalization is among the foremost goals of modern deep learning theory. Here, we study how the ability to learn representations affects the generalization performance of a simple class of models: deep Bayesian linear neural networks trained on unstructured Gaussian data. By comparing deep random feature models to deep networks in which all layers are trained, we provide a detailed characterization of the interplay between width, depth, data density, and prior mismatch. We show that both models display sample-wise double-descent behavior in the presence of label noise. Random feature models can also display model-wise double-descent if there are narrow bottleneck layers, while deep networks do not show these divergences. Random feature models can have particular widths that are optimal for generalization at a given data density, while making neural networks as wide or as narrow as possible is always optimal. Moreover, we show that the leading-order correction to the kernel-limit learning curve cannot distinguish between random feature models and deep networks in which all layers are trained. Taken together, our findings begin to elucidate how architectural details affect generalization performance in this simple class of deep regression models.

Submitted 16 June, 2022; v1 submitted 1 March, 2022; originally announced March 2022.

Comments: 35 pages, 7 figures. v2: minor typos corrected and references added; published in PRE

Journal ref: Physical Review E 105, 064118 (2022)

arXiv:2201.10929 [pdf, other]

Task-Oriented Image Semantic Communication Based on Rate-Distortion Theory

Authors: Fangfang Liu, Wanjie Tong, Yang Yang, Zhengfen Sun, Caili Guo

Abstract: Task-oriented image semantic communication is a new communication paradigm, which aims to transmit semantics for artificial intelligence (AI) tasks while ignoring the reconstruction quality of the images. However, in some applications, such as autonomous driving, both image reconstruction quality and the performance of the subsequent AI tasks must be simultaneously considered. To tackle this challenge, this paper proposes a task-oriented semantic communication scheme with semantic reconstruction (TOSC-SR). Its main goal is to simultaneously minimize pixel-level and task-relevant semantic-level distortion during communications under a certain rate, which formulates a new rate-distortion optimization problem. To successfully measure the loss at the semantic level, a new form of semantic distortion measured by the mutual information between the semantic-reconstructed images and the task labels is proposed. Then, we derive an analytical solution for the formulated problem, where the self-consistent equations of the problem are obtained to determine the optimal mapping of the source and the semantic-reconstructed images. To implement TOSC-SR, we further obtain an extended rate-distortion form based on the variational approximation of mutual information, which is applicable to multiple AI tasks. Experimental results show that the proposed approach outperforms the traditional JPEG, JPEG2000, BPG, VVC-based image communication systems and deep learning based benchmarks in terms of image reconstruction quality, AI task performance, and multi-task generalization ability.

Submitted 1 December, 2022; v1 submitted 26 January, 2022; originally announced January 2022.

Comments: 17 pages, 8 figures

arXiv:2201.07784 [pdf, other]

On Distributed Lossy Coding of Symmetrically Correlated Gaussian Sources

Authors: Siyao Zhou, Sadaf Salehkalaibar, Jingjing Qian, Jun Chen, Wuxian Shi, Yiqun Ge, Wen Tong

Abstract: A distributed lossy compression network with $L$ encoders and a decoder is considered. Each encoder observes a source and sends a compressed version to the decoder. The decoder produces a joint reconstruction of target signals with the mean squared error distortion below a given threshold. It is assumed that the observed sources can be expressed as the sum of target signals and corruptive noises which are independently generated from two symmetric multivariate Gaussian distributions. The minimum compression rate of this network versus the distortion threshold is referred to as the rate-distortion function, for which an explicit lower bound is established by solving a minimization problem. Our lower bound matches the well-known Berger-Tung upper bound for some values of the distortion threshold. The asymptotic gap between the upper and lower bounds is characterized in the large $L$ limit.

Submitted 3 June, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

arXiv:2201.04196 [pdf, ps, other]

doi 10.1016/j.tcs.2022.04.044

A polynomial-time approximation scheme for parallel two-stage flowshops under makespan constraint

Authors: Weitian Tong, Yao Xu, Huili Zhang

Abstract: As a hybrid of the Parallel Two-stage Flowshop problem and the Multiple Knapsack problem, we investigate the scheduling of parallel two-stage flowshops under makespan constraint, which was motivated by applications in cloud computing and introduced by Chen et al. [3] recently. A set of two-stage jobs are selected and scheduled on parallel two-stage flowshops to achieve the maximum total profit while maintaining the given makespan constraint. We give a positive answer to an open question about its approximability proposed by Chen et al. [3]. More specifically, based on guessing strategies and rounding techniques for linear programs, we present a polynomial-time approximation scheme (PTAS) for the case when the number of flowshops is a fixed constant.

Submitted 18 May, 2022; v1 submitted 11 January, 2022; originally announced January 2022.

Comments: Theoretical Computer Science (2022)

arXiv:2201.01389 [pdf, other]

Semantic Communications: Principles and Challenges

Authors: Zhijin Qin, Xiaoming Tao, Jianhua Lu, Wen Tong, Geoffrey Ye Li

Abstract: Semantic communication, regarded as the breakthrough beyond the Shannon paradigm, aims at the successful transmission of semantic information conveyed by the source rather than the accurate reception of each single symbol or bit regardless of its meaning. This article provides an overview on semantic communications. After a brief review of Shannon information theory, we discuss semantic communications with theory, framework, and system design enabled by deep learning. Different from the symbol/bit error rate used for measuring conventional communication systems, performance metrics for semantic communications are also discussed. The article concludes with several open questions in semantic communications.

Submitted 27 June, 2022; v1 submitted 30 December, 2021; originally announced January 2022.

arXiv:2112.10087 [pdf, other]

Reasoning Structural Relation for Occlusion-Robust Facial Landmark Localization

Authors: Congcong Zhu, Xiaoqiang Li, Jide Li, Songmin Dai, Weiqin Tong

Abstract: In facial landmark localization tasks, various occlusions heavily degrade the localization accuracy due to the partial observability of facial features. This paper proposes a structural relation network (SRN) for occlusion-robust landmark localization. Unlike most existing methods that simply exploit the shape constraint, the proposed SRN aims to capture the structural relations among different facial components. These relations can be considered a more powerful shape constraint against occlusion. To achieve this, a hierarchical structural relation module (HSRM) is designed to hierarchically reason the structural relations that represent both long- and short-distance spatial dependencies. Compared with existing network architectures, HSRM can efficiently model the spatial relations by leveraging its geometry-aware network architecture, which reduces the semantic ambiguity caused by occlusion. Moreover, the SRN augments the training data by synthesizing occluded faces. To further extend our SRN for occluded video data, we formulate the occluded face synthesis as a Markov decision process (MDP). Specifically, it plans the movement of the dynamic occlusion based on an accumulated reward associated with the performance degradation of the pre-trained SRN. This procedure augments hard samples for robust facial landmark tracking. Extensive experimental results indicate that the proposed method achieves outstanding performance on occluded and masked faces. Code is available at https://github.com/zhuccly/SRN.

Submitted 19 December, 2021; originally announced December 2021.

Comments: Accepted by Pattern Recognition

arXiv:2110.12610 [pdf, other]

Antenna Array Enabled Space/Air/Ground Communications and Networking for 6G

Authors: Zhenyu Xiao, Zhu Han, Arumugam Nallanathan, Octavia A. Dobre, Bruno Clerckx, Jinho Choi, Chong He, Wen Tong

Abstract: Antenna arrays have a long history of more than 100 years and have evolved closely with the development of electronic and information technologies, playing an indispensable role in wireless communications and radar. With the rapid development of electronic and information technologies, the demand for all-time, all-domain, and full-space network services has exploded, and new communication requirements have been put forward on various space/air/ground platforms. To meet the ever increasing requirements of the future sixth generation (6G) wireless communications, such as high capacity, wide coverage, low latency, and strong robustness, it is promising to employ different types of antenna arrays with various beamforming technologies in space/air/ground communication networks, bringing in advantages such as considerable antenna gains, multiplexing gains, and diversity gains. However, enabling antenna arrays for space/air/ground communication networks poses specific, distinctive and tricky challenges, which have attracted extensive research attention. This paper aims to overview the field of antenna array enabled space/air/ground communications and networking. The technical potentials and challenges of antenna array enabled space/air/ground communications and networking are presented first. Subsequently, the antenna array structures and designs are discussed. We then discuss various emerging technologies facilitated by antenna arrays to meet the new communication requirements of space/air/ground communication systems. Enabled by these emerging technologies, the distinct characteristics, challenges, and solutions for space communications, airborne communications, and ground communications are reviewed. Finally, we present promising directions for future research in antenna array enabled space/air/ground communications and networking.

Submitted 26 March, 2022; v1 submitted 24 October, 2021; originally announced October 2021.

arXiv:2110.00931 [pdf]

doi 10.35833/MPCE.2022.000099

Exploration of Artificial Intelligence-oriented Power System Dynamic Simulators

Authors: Tannan Xiao, Ying Chen, Jianquan Wang, Shaowei Huang, Weilin Tong, Tirui He

Abstract: With the rapid development of artificial intelligence (AI), it is foreseeable that the accuracy and efficiency of dynamic analysis for future power systems will be greatly improved by the integration of dynamic simulators and AI. To explore the interaction mechanism of power system dynamic simulations and AI, a general design of an AI-oriented power system dynamic simulator is proposed, which consists of a high-performance simulator with neural network supportability and flexible external and internal application programming interfaces (APIs). With the support of APIs, simulation-assisted AI and AI-assisted simulation form a comprehensive interaction mechanism between power system dynamic simulations and AI. A prototype of this design is implemented and made public based on a highly efficient electromechanical simulator. Tests of this prototype are carried out under four scenarios including sample generation, AI-based stability prediction, data-driven dynamic component modeling, and AI-aided stability control, which prove the validity, flexibility, and efficiency of the design and implementation of the AI-oriented power system dynamic simulator.

Submitted 6 July, 2022; v1 submitted 3 October, 2021; originally announced October 2021.

Comments: 10 pages, 8 figures, 1 table. Accepted by Journal of Modern Power System and Clean Energy

arXiv:2109.11320 [pdf, other]

Nine Challenges in Artificial Intelligence and Wireless Communications for 6G

Authors: Wen Tong, Geoffrey Ye Li

Abstract: In recent years, techniques developed in artificial intelligence (AI), especially those in machine learning (ML), have been successfully applied in various areas, leading to a widespread belief that AI will collectively play an important role in future wireless communications. To accomplish the aspiration, we present nine challenges to be addressed by the interdisciplinary areas of AI/ML and wireless communications, with particular focus towards the sixth generation (6G) wireless networks. Specifically, this article classifies the nine challenges into computation in AI, distributed neural networks and learning, and ML enabled semantic communications.

Submitted 23 September, 2021; originally announced September 2021.

Comments: 6 pages

arXiv:2107.08607 [pdf, ps, other]

A unified polar decoder platform for low-power and low-cost devices

Authors: Jiajie Tong, Qifan Zhang, Huazi Zhang, Rong Li, Jun Wang, Wen Tong

Abstract: In this paper, we design a polar decoding platform for diverse application scenarios that require low-cost and low-power communications. Specifically, prevalent polar decoders such as successive cancellation (SC), SC-list (SCL) and Fano decoders are all supported under the same architecture. Unlike high-throughput or low-latency decoders that promote parallelism, this architecture promotes serialization by repeatedly calling a "sub-process" that is executed by a core module. The resulting serial SCL-8 decoder is only 3 times as big as an SC decoder. Cost and power are minimized through resource sharing and adaptive decoding techniques, etc. We carried out performance simulation and hardware implementation to evaluate the actual chip area and energy consumption.

Submitted 18 July, 2021; originally announced July 2021.

Comments: 6 pages, 8 figures. Part of this paper was presented in an invited talk at the 2021 International Symposium on Information Theory (ISIT)

arXiv:2107.08600 [pdf, ps, other]

Fast polar codes for terabits-per-second throughput communications

Authors: Jiajie Tong, Xianbin Wang, Qifan Zhang, Huazi Zhang, Rong Li, Jun Wang, Wen Tong

Abstract: Targeting high-throughput and low-power communications, we implement two successive cancellation (SC) decoders for polar codes. With $16nm$ ASIC technology, the area efficiency and energy efficiency are $4Tbps/mm^2$ and $0.63pJ/bit$, respectively, for the unrolled decoder, and $561Gbps/mm^2$ and $1.21pJ/bit$, respectively, for the recursive decoder. To achieve such a high throughput, a novel code construction, coined as fast polar codes, is proposed and jointly optimized with a highly-parallel SC decoding architecture. First, we reuse existing modules to fast decode more outer code blocks, and then modify code construction to facilitate faster decoding for all outer code blocks up to a degree of parallelism of $16$. Furthermore, parallel comparison circuits and bit quantization schemes are customized for hardware implementation. Collectively, they contribute to a $2.66\times$ area efficiency improvement and $33\%$ energy saving over the state of the art.

Submitted 18 July, 2021; originally announced July 2021.

Comments: 8 pages, 5 figures. Part of this paper was presented in an invited talk at the 2021 International Symposium on Information Theory (ISIT)

arXiv:2104.01026 [pdf, other]

SGBA: A Stealthy Scapegoat Backdoor Attack against Deep Neural Networks

Authors: Ying He, Zhili Shen, Chang Xia, Jingyu Hua, Wei Tong, Sheng Zhong

Abstract: Outsourced deep neural networks have been demonstrated to suffer from patch-based trojan attacks, in which an adversary poisons the training sets to inject a backdoor in the obtained model so that regular inputs can still be labeled correctly while those carrying a specific trigger are falsely given a target label. Due to the severity of such attacks, many backdoor detection and containment systems have recently been proposed for deep neural networks. One major category among them is model inspection schemes, which aim to detect backdoors before deploying models from non-trusted third parties. In this paper, we show that such state-of-the-art schemes can be defeated by a so-called Scapegoat Backdoor Attack, which introduces a benign scapegoat trigger in data poisoning to prevent the defender from reversing the real abnormal trigger. In addition, it confines the values of network parameters within the same variances as those from a clean model during training, which makes it significantly harder for the defender to learn the differences between legal and illegal models through machine-learning approaches. Our experiments on 3 popular datasets show that it can escape detection by all five state-of-the-art model inspection schemes. Moreover, this attack brings almost no side-effects on the attack effectiveness and guarantees the universal feature of the trigger compared with original patch-based trojan attacks.

Submitted 16 May, 2022; v1 submitted 2 April, 2021; originally announced April 2021.

arXiv:2103.14300 [pdf, other]

Robotic Guide Dog: Leading a Human with Leash-Guided Hybrid Physical Interaction

Authors: Anxing Xiao, Wenzhe Tong, Lizhi Yang, Jun Zeng, Zhongyu Li, Koushil Sreenath

Abstract: An autonomous robot that is able to physically guide humans through narrow and cluttered spaces could be a big boon to the visually-impaired. Most prior robotic guiding systems are based on wheeled platforms with large bases with actuated rigid guiding canes. The large bases and the actuated arms limit these prior approaches from operating in narrow and cluttered environments. We propose a method that introduces a quadrupedal robot with a leash to enable the robot-guiding human system to change its intrinsic dimension (by letting the leash go slack) in order to fit into narrow spaces. We propose a hybrid physical Human-Robot Interaction model that involves leash tension to describe the dynamical relationship in the robot-guiding human system. This hybrid model is utilized in a mixed-integer programming problem to develop a reactive planner that is able to utilize slack-taut switching to guide a blind-folded person to safely travel in a confined space. The proposed leash-guided robot framework is deployed on a Mini Cheetah quadrupedal robot and validated in experiments.

Submitted 28 June, 2021; v1 submitted 26 March, 2021; originally announced March 2021.

Comments: Accepted to 2021 International Conference on Robotics and Automation (ICRA 2021)

arXiv:2103.14215 [pdf, other]

The Complete Affine Automorphism Group of Polar Codes

Authors: Yuan Li, Huazi Zhang, Rong Li, Jun Wang, Wen Tong, Guiying Yan, Zhiming Ma

Abstract: Recently, a permutation-based successive cancellation (PSC) decoding framework for polar codes has attracted much attention. It decodes several permuted codewords with independent successive cancellation (SC) decoders. Its latency thus can be reduced to that of SC decoding. However, the PSC framework is ineffective for permutations falling into the lower-triangular affine (LTA) automorphism group, as they are invariant under SC decoding. As such, a larger block lower-triangular affine (BLTA) group that contains SC-variant permutations was discovered for decreasing polar codes. But it was unknown whether BLTA equals the complete automorphism group. In this paper, we prove that BLTA equals the complete automorphisms of decreasing polar codes that can be formulated as affine transformations.

Submitted 25 March, 2021; originally announced March 2021.

Comments: 6 pages, 5 figures

arXiv:2008.06678 [pdf, other]

MobileVisFixer: Tailoring Web Visualizations for Mobile Phones Leveraging an Explainable Reinforcement Learning Framework

Authors: Aoyu Wu, Wai Tong, Tim Dwyer, Bongshin Lee, Petra Isenberg, Huamin Qu

Abstract: We contribute MobileVisFixer, a new method to make visualizations more mobile-friendly. Although mobile devices have become the primary means of accessing information on the web, many existing visualizations are not optimized for small screens and can lead to a frustrating user experience. Currently, practitioners and researchers have to engage in a tedious and time-consuming process to ensure that their designs scale to screens of different sizes, and existing toolkits and libraries provide little support in diagnosing and repairing issues. To address this challenge, MobileVisFixer automates a mobile-friendly visualization re-design process with a novel reinforcement learning framework. To inform the design of MobileVisFixer, we first collected and analyzed SVG-based visualizations on the web, and identified five common mobile-friendly issues. MobileVisFixer addresses four of these issues on single-view Cartesian visualizations with linear or discrete scales by a Markov Decision Process model that is both generalizable across various visualizations and fully explainable. MobileVisFixer deconstructs charts into declarative formats, and uses a greedy heuristic based on Policy Gradient methods to find solutions to this difficult, multi-criteria optimization problem in reasonable time. In addition, MobileVisFixer can be easily extended with the incorporation of optimization algorithms for data visualizations. Quantitative evaluation on two real-world datasets demonstrates the effectiveness and generalizability of our method.

Submitted 15 August, 2020; originally announced August 2020.

Comments: Accepted at IEEE VIS 2020 (Info VIS)

arXiv:1905.10949 [pdf, other]

doi 10.1145/3292500.3330900

QuesNet: A Unified Representation for Heterogeneous Test Questions

Authors: Yu Yin, Qi Liu, Zhenya Huang, Enhong Chen, Wei Tong, Shijin Wang, Yu Su

Abstract: Understanding learning materials (e.g. test questions) is a crucial issue in online learning systems, which can promote many applications in the education domain. Unfortunately, many supervised approaches suffer from the problem of scarce human-labeled data, whereas abundant unlabeled resources are highly underutilized. To alleviate this problem, an effective solution is to use pre-trained representations for question understanding. However, existing pre-training methods in the NLP area are ill-suited to learning test question representations due to several domain-specific characteristics in education. First, questions usually comprise heterogeneous data including content text, images and side information. Second, there exists both basic linguistic information as well as domain logic and knowledge. To this end, in this paper, we propose a novel pre-training method, namely QuesNet, for comprehensively learning question representations. Specifically, we first design a unified framework to aggregate question information with its heterogeneous inputs into a comprehensive vector. Then we propose a two-level hierarchical pre-training algorithm to learn better understanding of test questions in an unsupervised way. Here, a novel holed language model objective is developed to extract low-level linguistic features, and a domain-oriented objective is proposed to learn high-level logic and knowledge. Moreover, we show that QuesNet has good capability of being fine-tuned in many question-based tasks. We conduct extensive experiments on large-scale real-world question data, where the experimental results clearly demonstrate the effectiveness of QuesNet for question understanding as well as its superior applicability.

Submitted 26 May, 2019; originally announced May 2019.

arXiv:1902.00650 [pdf, other]

doi 10.1016/j.cma.2019.112769

Volumetric Spline Parameterization for Isogeometric Analysis

Authors: Maodong Pan, Falai Chen, Weihua Tong

Abstract: Given the spline representation of the boundary of a three dimensional domain, constructing a volumetric spline parameterization of the domain (i.e., a map from a unit cube to the domain) with the given boundary is a fundamental problem in isogeometric analysis. A good domain parameterization should satisfy the following criteria: (1) the parameterization is a bijective map; and (2) the map has lowest possible distortion. However, none of the state-of-the-art volumetric parameterization methods has fully addressed the above issues. In this paper, we propose a three-stage approach for constructing volumetric parameterization satisfying the above criteria. Firstly, a harmonic map is computed between a unit cube and the computational domain. Then a bijective map modeled by a max-min optimization problem is computed in a coarse-to-fine way, and an algorithm based on a divide-and-conquer strategy is proposed to solve the optimization problem efficiently. Finally, to ensure high quality of the parameterization, the MIPS (Most Isometric Parameterizations) method is adopted to reduce the conformal distortion of the bijective map. We provide several examples to demonstrate the feasibility of our approach and to compare our approach with some state-of-the-art methods. The results show that our algorithm produces bijective parameterization with high quality even for complex domains.

Submitted 2 February, 2019; originally announced February 2019.

arXiv:1812.09353 [pdf, other]

A local search $4/3$-approximation algorithm for the minimum $3$-path partition problem

Authors: Yong Chen, Randy Goebel, Guohui Lin, Longcheng Liu, Bing Su, Weitian Tong, Yao Xu, An Zhang

Abstract: Given a graph $G = (V, E)$, the $3$-path partition problem is to find a minimum collection of vertex-disjoint paths each of order at most $3$ to cover all the vertices of $V$. It is different from but closely related to the well-known $3$-set cover problem. The best known approximation algorithm for the $3$-path partition problem was proposed recently and has a ratio $13/9$. Here we present a local search algorithm and show, by an amortized analysis, that it is a $4/3$-approximation. This ratio matches up to the best approximation ratio for the $3$-set cover problem.

Submitted 21 December, 2018; originally announced December 2018.

Comments: 16 pages, 21 figures

arXiv:1812.05536 [pdf, other]

doi 10.1109/JLT.2018.2875538

High-speed PAM4-based Optical SDM Interconnects with Directly Modulated Long-wavelength VCSEL

Authors: Joris Van Kerrebrouck, Xiaodan Pang, Oskars Ozolins, Rui Lin, Aleksejs Udalcovs, Lu Zhang, Haolin Li, Silvia Spiga, Markus-Christian Amann, Lin Gan, Ming Tang, Songnian Fu, Richard Schatz, Gunnar Jacobsen, Sergei Popov, Deming Liu, Weijun Tong, Guy Torfs, Johan Bauwelinck, Jiajia Chen, Xin Yin

Abstract: This paper reports the demonstration of high-speed PAM-4 transmission using a 1.5-μm single-mode vertical cavity surface emitting laser (SM-VCSEL) over multicore fiber with 7 cores over different distances. We have successfully generated up to 70 Gbaud 4-level pulse amplitude modulation (PAM-4) signals with a VCSEL in optical back-to-back, and transmitted 50 Gbaud PAM-4 signals over both 1-km dispersion-uncompensated and 10-km dispersion-compensated in each core, enabling a total data throughput of 700 Gbps over the 7-core fiber. Moreover, 56 Gbaud PAM-4 over 1-km has also been shown, whereby unfortunately not all cores provide the required $3.8 \times 10^{-3}$ bit error rate (BER) for the 7% overhead hard-decision forward error correction (7% OH-HDFEC). The limited bandwidth of the VCSEL and the adverse chromatic dispersion of the fiber are suppressed with pre-equalization based on accurate end-to-end channel characterizations. With a digital post-equalization, BER performance below the 7% OH-HDFEC limit is achieved over all cores. The demonstrated results show a great potential to realize high-capacity and compact short-reach optical interconnects for data centers.

Submitted 13 November, 2018; originally announced December 2018.

Comments: 7 pages, accepted for publication in Journal of Lightwave Technology (JLT)

arXiv:1811.04682 [pdf, other]

Learning Segmentation Masks with the Independence Prior

Authors: Songmin Dai, Xiaoqiang Li, Lu Wang, Pin Wu, Weiqin Tong, Yimin Chen

Abstract: An instance with a bad mask might make a composite image that uses it look fake. This encourages us to learn segmentation by generating realistic composite images. To achieve this, we propose a novel framework that exploits a newly proposed prior called the independence prior based on Generative Adversarial Networks (GANs). The generator produces an image with multiple category-specific instance providers, a layout module and a composition module. Firstly, each provider independently outputs a category-specific instance image with a soft mask. Then the provided instances' poses are corrected by the layout module. Lastly, the composition module combines these instances into a final image. Training with adversarial loss and penalty for mask area, each provider learns a mask that is as small as possible but enough to cover a complete category-specific instance. Weakly supervised semantic segmentation methods widely use grouping cues modeling the association between image parts, which are either artificially designed or learned with costly segmentation labels or only modeled on local pairs. Unlike them, our method automatically models the dependence between any parts and learns instance segmentation. We apply our framework in two cases: (1) Foreground segmentation on category-specific images with box-level annotation. (2) Unsupervised learning of instance appearances and masks with only one image of homogeneous object cluster (HOC). We get appealing results in both tasks, which shows that the independence prior is useful for instance segmentation and that it is possible to learn instance masks without supervision from only one image.

Submitted 13 November, 2018; v1 submitted 12 November, 2018; originally announced November 2018.

Comments: 7+5 pages, 13 figures, Accepted to AAAI 2019

arXiv:1704.05709 [pdf, ps, other]

$β$-expansion: A Theoretical Framework for Fast and Recursive Construction of Polar Codes

Authors: Gaoning He, Jean-Claude Belfiore, Xiaocheng Liu, Yiqun Ge, Ran Zhang, Ingmar Land, Ying Chen, Rong Li, Jun Wang, Ganghua Yang, Wen Tong

Abstract: In this work, we introduce $β$-expansion, a notion borrowed from number theory, as a theoretical framework to study fast construction of polar codes based on a recursive structure of universal partial order (UPO) and polarization weight (PW) algorithm. We show that polar codes can be recursively constructed from UPO by continuously solving several polynomial equations at each recursive step. From these polynomial equations, we can extract an interval for $β$, such that ranking the synthetic channels through a closed-form $β$-expansion preserves the property of nested frozen sets, which is a desired feature for low-complexity construction. In an example of AWGN channels, we show that this interval for $β$ converges to a constant close to $1.1892 \approx 2^{1/4}$ when the code block-length tends to infinity. Both asymptotic analysis and simulation results validate our theoretical claims.

Submitted 19 April, 2017; originally announced April 2017.

arXiv:1610.09778 [pdf, other]

DPPred: An Effective Prediction Framework with Concise Discriminative Patterns

Authors: Jingbo Shang, Meng Jiang, Wenzhu Tong, Jinfeng Xiao, Jian Peng, Jiawei Han

Abstract: In the literature, two series of models have been proposed to address prediction problems including classification and regression. Simple models, such as generalized linear models, have ordinary performance but strong interpretability on a set of simple features. The other series, including tree-based models, organize numerical, categorical and high dimensional features into a comprehensive structure with rich interpretable information in the data. In this paper, we propose a novel Discriminative Pattern-based Prediction framework (DPPred) to accomplish the prediction tasks by combining the advantages of both series: effectiveness and interpretability. Specifically, DPPred adopts the concise discriminative patterns that are on the prefix paths from the root to leaf nodes in the tree-based models. DPPred selects a limited number of the useful discriminative patterns by searching for the most effective pattern combination to fit generalized linear models. Extensive experiments show that in many scenarios, DPPred provides competitive accuracy with the state-of-the-art as well as the valuable interpretability for developers and experts. In particular, taking a clinical application dataset as a case study, our DPPred outperforms the baselines by using only 40 concise discriminative patterns out of a potentially exponentially large set of patterns.

Submitted 30 October, 2016; originally announced October 2016.

arXiv:1606.04157 [pdf, other]

Single machine scheduling with job-dependent machine deterioration

Authors: Wenchang Luo, Yao Xu, Weitian Tong, Guohui Lin

Abstract: We consider the single machine scheduling problem with job-dependent machine deterioration. In the problem, we are given a single machine with an initial non-negative maintenance level, and a set of jobs each with a non-preemptive processing time and a machine deterioration. Such a machine deterioration quantifies the decrement in the machine maintenance level after processing the job. To avoid machine breakdown, one should guarantee a non-negative maintenance level at any time point; and whenever necessary, a maintenance activity must be allocated for restoring the machine maintenance level. The goal of the problem is to schedule the jobs and the maintenance activities such that the total completion time of jobs is minimized. There are two variants of maintenance activities: in the partial maintenance case each activity can be allocated to increase the machine maintenance level to any level not exceeding the maximum; in the full maintenance case every activity must be allocated to increase the machine maintenance level to the maximum. In a recent work, the problem in the full maintenance case has been proven NP-hard; several special cases of the problem in the partial maintenance case were shown solvable in polynomial time, but the complexity of the general problem is left open. In this paper we first prove that the problem in the partial maintenance case is NP-hard, thus settling the open problem; we then design a $2$-approximation algorithm.

Submitted 13 June, 2016; originally announced June 2016.

Comments: 15 pages

Journal ref: Proceedings of ISAAC 2016, LIPIcs55, pages 1-13

arXiv:1307.7089 [pdf, ps, other]

An approximation algorithm for the Bandpass-2 problem

Authors: Weitian Tong, Zhi-Zhong Chen, Lusheng Wang, Yinfeng Xu, Jiuping Xu, Randy Goebel, Guohui Lin

Abstract: The general Bandpass-$B$ problem is NP-hard and can be approximated by a reduction into the weighted $B$-set packing problem, with a worst case performance ratio of $O(B^2)$. When $B = 2$, a maximum weight matching gives a 2-approximation to the problem. In this paper, we call the Bandpass-2 problem simply the Bandpass problem. The Bandpass problem can be viewed as a variation of the maximum traveling salesman problem, in which the edge weights are dynamic rather than given at the front. We present a $\frac{426}{227}$-approximation algorithm for the problem. Such an improved approximation is built on an intrinsic structural property proven for the optimal solution and several novel schemes to partition a $b$-matching into desired matchings.

Submitted 26 July, 2013; originally announced July 2013.

arXiv:1304.3653 [pdf, ps, other]

Algorithms for Cut Problems on Trees

Authors: Iyad Kanj, Guohui Lin, Tian Liu, Weitian Tong, Ge Xia, Jinhui Xu, Boting Yang, Fenghui Zhang, Peng Zhang, Binhai Zhu

Abstract: We study the {\sc multicut on trees} and the {\sc generalized multiway cut on trees} problems. For the {\sc multicut on trees} problem, we present a parameterized algorithm that runs in time $O^{*}(ρ^k)$, where $ρ= \sqrt{\sqrt{2} + 1} \approx 1.555$ is the positive root of the polynomial $x^4-2x^2-1$. This improves the current-best algorithm of Chen et al. that runs in time $O^{*}(1.619^k)$. For the {\sc generalized multiway cut on trees} problem, we show that this problem is solvable in polynomial time if the number of terminal sets is fixed; this answers an open question posed in a recent paper by Liu and Zhang. By reducing the {\sc generalized multiway cut on trees} problem to the {\sc multicut on trees} problem, our results give a parameterized algorithm that solves the {\sc generalized multiway cut on trees} problem in time $O^{*}(ρ^k)$, where $ρ= \sqrt{\sqrt{2} + 1} \approx 1.555$.

Submitted 12 April, 2013; originally announced April 2013.

MSC Class: 68Q25

Showing 1–49 of 49 results for author: Tong, W