Search | arXiv e-print repository

Digital ASIC Design with Ongoing LLMs: Strategies and Prospects

Authors: Maoyang Xiang, Emil Goh, T. Hui Teo

Abstract: The escalating complexity of modern digital systems has imposed significant challenges on integrated circuit (IC) design, necessitating tools that can simplify the IC design flow. The advent of Large Language Models (LLMs) has been seen as a promising development, with the potential to automate the generation of Hardware Description Language (HDL) code, thereby streamlining digital IC design. Howe… ▽ More The escalating complexity of modern digital systems has imposed significant challenges on integrated circuit (IC) design, necessitating tools that can simplify the IC design flow. The advent of Large Language Models (LLMs) has been seen as a promising development, with the potential to automate the generation of Hardware Description Language (HDL) code, thereby streamlining digital IC design. However, the practical application of LLMs in this area faces substantial hurdles. Notably, current LLMs often generate HDL code with small but critical syntax errors and struggle to accurately convey the high-level semantics of circuit designs. These issues significantly undermine the utility of LLMs for IC design, leading to misinterpretations and inefficiencies. In response to these challenges, this paper presents targeted strategies to harness the capabilities of LLMs for digital ASIC design. We outline approaches that improve the reliability and accuracy of HDL code generation by LLMs. As a practical demonstration of these strategies, we detail the development of a simple three-phase Pulse Width Modulation (PWM) generator. This project, part of the "Efabless AI-Generated Open-Source Chip Design Challenge," successfully passed the Design Rule Check (DRC) and was fabricated, showcasing the potential of LLMs to enhance digital ASIC design. This work underscores the feasibility and benefits of integrating LLMs into the IC design process, offering a novel approach to overcoming the complexities of modern digital systems. △ Less

Submitted 25 April, 2024; originally announced May 2024.

Comments: 8 pages, 2 figures, 1 table

arXiv:2405.00308 [pdf]

FPGA Digital Dice using Pseudo Random Number Generator

Authors: Michael Lim Kee Hian, Ten Wei Lin, Zachary Wu Xuan, Stephanie-Ann Loy, Maoyang Xiang, T. Hui Teo

Abstract: The goal of this project is to design a digital dice that displays dice numbers in real-time. The number is generated by a pseudo-random number generator (PRNG) using XORshift algorithm that is implemented in Verilog HDL on an FPGA. The digital dice is equipped with tilt sensor, display, power management circuit, and rechargeable battery hosted in a 3D printed dice casing. By shaking the digital d… ▽ More The goal of this project is to design a digital dice that displays dice numbers in real-time. The number is generated by a pseudo-random number generator (PRNG) using XORshift algorithm that is implemented in Verilog HDL on an FPGA. The digital dice is equipped with tilt sensor, display, power management circuit, and rechargeable battery hosted in a 3D printed dice casing. By shaking the digital dice, the tilt sensor signal produces a seed for the PRNG. This digital dice demonstrates a set of possible random numbers of 2, 4, 6, 8, 10, 12, 20, 100 that simulate the number of dice sides. The kit is named SUTDicey. △ Less

Submitted 1 May, 2024; originally announced May 2024.

Comments: 15 pages, 5 figures

arXiv:2404.19246 [pdf]

Logistic Map Pseudo Random Number Generator in FPGA

Authors: Mateo Jalen Andrew Calderon, Lee Jun Lei Lucas, Syarifuddin Azhar Bin Rosli, Stephanie See Hui Ying, Jarell Lim En Yu, Maoyang Xiang, T. Hui Teo

Abstract: This project develops a pseudo-random number generator (PRNG) using the logistic map, implemented in Verilog HDL on an FPGA and processes its output through a Central Limit Theorem (CLT) function to achieve a Gaussian distribution. The system integrates additional FPGA modules for real-time interaction and visualisation, including a clock generator, UART interface, XADC, and a 7-segment display dr… ▽ More This project develops a pseudo-random number generator (PRNG) using the logistic map, implemented in Verilog HDL on an FPGA and processes its output through a Central Limit Theorem (CLT) function to achieve a Gaussian distribution. The system integrates additional FPGA modules for real-time interaction and visualisation, including a clock generator, UART interface, XADC, and a 7-segment display driver. These components facilitate the direct display of PRNG values on the FPGA and the transmission of data to a laptop for histogram analysis, verifying the Gaussian nature of the output. This approach demonstrates the practical application of chaotic systems for generating Gaussian-distributed pseudo-random numbers in digital hardware, highlighting the logistic map's potential in PRNG design. △ Less

Submitted 30 April, 2024; originally announced April 2024.

Comments: 10 pages, 6 figures

arXiv:2404.18955 [pdf, other]

GARA: A novel approach to Improve Genetic Algorithms' Accuracy and Efficiency by Utilizing Relationships among Genes

Authors: Zhaoning Shi, Meng Xiang, Zhaoyang Hai, Xiabi Liu, Yan Pei

Abstract: Genetic algorithms have played an important role in engineering optimization. Traditional GAs treat each gene separately. However, biophysical studies of gene regulatory networks revealed direct associations between different genes. It inspires us to propose an improvement to GA in this paper, Gene Regulatory Genetic Algorithm (GRGA), which, to our best knowledge, is the first time to utilize rela… ▽ More Genetic algorithms have played an important role in engineering optimization. Traditional GAs treat each gene separately. However, biophysical studies of gene regulatory networks revealed direct associations between different genes. It inspires us to propose an improvement to GA in this paper, Gene Regulatory Genetic Algorithm (GRGA), which, to our best knowledge, is the first time to utilize relationships among genes for improving GA's accuracy and efficiency. We design a directed multipartite graph encapsulating the solution space, called RGGR, where each node corresponds to a gene in the solution and the edge represents the relationship between adjacent nodes. The edge's weight reflects the relationship degree and is updated based on the idea that the edges' weights in a complete chain as candidate solution with acceptable or unacceptable performance should be strengthened or reduced, respectively. The obtained RGGR is then employed to determine appropriate loci of crossover and mutation operators, thereby directing the evolutionary process toward faster and better convergence. We analyze and validate our proposed GRGA approach in a single-objective multimodal optimization problem, and further test it on three types of applications, including feature selection, text summarization, and dimensionality reduction. Results illustrate that our GARA is effective and promising. △ Less

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.16504 [pdf]

Hardware Implementation of Double Pendulum Pseudo Random Number Generator

Authors: Jarrod Lim, Tom Manuel Opalla Piccio, Chua Min Jie Michelle, Maoyang Xiang, T. Hui Teo

Abstract: The objective of this project is to utilize an FPGA board which is the CMOD A7 35t to obtain a pseudo random number which can be used for encryption. We aim to achieve this by leveraging the inherent randomness present in environmental data captured by sensors. This data will be used as a seed to initialize an algorithm implemented on the CMOD A7 35t FPGA board. The project will focus on interfaci… ▽ More The objective of this project is to utilize an FPGA board which is the CMOD A7 35t to obtain a pseudo random number which can be used for encryption. We aim to achieve this by leveraging the inherent randomness present in environmental data captured by sensors. This data will be used as a seed to initialize an algorithm implemented on the CMOD A7 35t FPGA board. The project will focus on interfacing the sensors with the FPGA and developing suitable algorithms to ensure the generated numbers exhibit strong randomness properties. △ Less

Submitted 25 April, 2024; originally announced April 2024.

Comments: 15 pages, 12 figure

arXiv:2404.15498 [pdf, other]

Drop-Connect as a Fault-Tolerance Approach for RRAM-based Deep Neural Network Accelerators

Authors: Mingyuan Xiang, Xuhan Xie, Pedro Savarese, Xin Yuan, Michael Maire, Yanjing Li

Abstract: Resistive random-access memory (RRAM) is widely recognized as a promising emerging hardware platform for deep neural networks (DNNs). Yet, due to manufacturing limitations, current RRAM devices are highly susceptible to hardware defects, which poses a significant challenge to their practical applicability. In this paper, we present a machine learning technique that enables the deployment of defect… ▽ More Resistive random-access memory (RRAM) is widely recognized as a promising emerging hardware platform for deep neural networks (DNNs). Yet, due to manufacturing limitations, current RRAM devices are highly susceptible to hardware defects, which poses a significant challenge to their practical applicability. In this paper, we present a machine learning technique that enables the deployment of defect-prone RRAM accelerators for DNN applications, without necessitating modifying the hardware, retraining of the neural network, or implementing additional detection circuitry/logic. The key idea involves incorporating a drop-connect inspired approach during the training phase of a DNN, where random subsets of weights are selected to emulate fault effects (e.g., set to zero to mimic stuck-at-1 faults), thereby equipping the DNN with the ability to learn and adapt to RRAM defects with the corresponding fault rates. Our results demonstrate the viability of the drop-connect approach, coupled with various algorithm and system-level design and trade-off considerations. We show that, even in the presence of high defect rates (e.g., up to 30%), the degradation of DNN accuracy can be as low as less than 1% compared to that of the fault-free version, while incurring minimal system-level runtime/energy costs. △ Less

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14381 [pdf, other]

TAVGBench: Benchmarking Text to Audible-Video Generation

Authors: Yuxin Mao, Xuyang Shen, Jing Zhang, Zhen Qin, Jinxing Zhou, Mochu Xiang, Yiran Zhong, Yuchao Dai

Abstract: The Text to Audible-Video Generation (TAVG) task involves generating videos with accompanying audio based on text descriptions. Achieving this requires skillful alignment of both audio and video elements. To support research in this field, we have developed a comprehensive Text to Audible-Video Generation Benchmark (TAVGBench), which contains over 1.7 million clips with a total duration of 11.8 th… ▽ More The Text to Audible-Video Generation (TAVG) task involves generating videos with accompanying audio based on text descriptions. Achieving this requires skillful alignment of both audio and video elements. To support research in this field, we have developed a comprehensive Text to Audible-Video Generation Benchmark (TAVGBench), which contains over 1.7 million clips with a total duration of 11.8 thousand hours. We propose an automatic annotation pipeline to ensure each audible video has detailed descriptions for both its audio and video contents. We also introduce the Audio-Visual Harmoni score (AVHScore) to provide a quantitative measure of the alignment between the generated audio and video modalities. Additionally, we present a baseline model for TAVG called TAVDiffusion, which uses a two-stream latent diffusion model to provide a fundamental starting point for further research in this area. We achieve the alignment of audio and video by employing cross-attention and contrastive learning. Through extensive experiments and evaluations on TAVGBench, we demonstrate the effectiveness of our proposed model under both conventional metrics and our proposed metrics. △ Less

Submitted 22 April, 2024; originally announced April 2024.

Comments: Technical Report. Project page:https://github.com/OpenNLPLab/TAVGBench

arXiv:2404.13866 [pdf, other]

Plug-and-Play Algorithm Convergence Analysis From The Standpoint of Stochastic Differential Equation

Authors: Zhongqi Wang, Bingnan Wang, Maosheng Xiang

Abstract: The Plug-and-Play (PnP) algorithm is popular for inverse image problem-solving. However, this algorithm lacks theoretical analysis of its convergence with more advanced plug-in denoisers. We demonstrate that discrete PnP iteration can be described by a continuous stochastic differential equation (SDE). We can also achieve this transformation through Markov process formulation of PnP. Then, we can… ▽ More The Plug-and-Play (PnP) algorithm is popular for inverse image problem-solving. However, this algorithm lacks theoretical analysis of its convergence with more advanced plug-in denoisers. We demonstrate that discrete PnP iteration can be described by a continuous stochastic differential equation (SDE). We can also achieve this transformation through Markov process formulation of PnP. Then, we can take a higher standpoint of PnP algorithms from stochastic differential equations, and give a unified framework for the convergence property of PnP according to the solvability condition of its corresponding SDE. We reveal that a much weaker condition, bounded denoiser with Lipschitz continuous measurement function would be enough for its convergence guarantee, instead of previous Lipschitz continuous denoiser condition. △ Less

Submitted 22 April, 2024; originally announced April 2024.

Comments: 17pages, Preprint, Under review

arXiv:2404.10091 [pdf, other]

Empowering Federated Learning with Implicit Gossiping: Mitigating Connection Unreliability Amidst Unknown and Arbitrary Dynamics

Authors: Ming Xiang, Stratis Ioannidis, Edmund Yeh, Carlee Joe-Wong, Lili Su

Abstract: Federated learning is a popular distributed learning approach for training a machine learning model without disclosing raw data. It consists of a parameter server and a possibly large collection of clients (e.g., in cross-device federated learning) that may operate in congested and changing environments. In this paper, we study federated learning in the presence of stochastic and dynamic communica… ▽ More Federated learning is a popular distributed learning approach for training a machine learning model without disclosing raw data. It consists of a parameter server and a possibly large collection of clients (e.g., in cross-device federated learning) that may operate in congested and changing environments. In this paper, we study federated learning in the presence of stochastic and dynamic communication failures wherein the uplink between the parameter server and client $i$ is on with unknown probability $p_i^t$ in round $t$. Furthermore, we allow the dynamics of $p_i^t$ to be arbitrary. We first demonstrate that when the $p_i^t$'s vary across clients, the most widely adopted federated learning algorithm, Federated Average (FedAvg), experiences significant bias. To address this observation, we propose Federated Postponed Broadcast (FedPBC), a simple variant of FedAvg. FedPBC differs from FedAvg in that the parameter server postpones broadcasting the global model till the end of each round. Despite uplink failures, we show that FedPBC converges to a stationary point of the original non-convex objective. On the technical front, postponing the global model broadcasts enables implicit gossiping among the clients with active links in round $t$. Despite the time-varying nature of $p_i^t$, we can bound the perturbation of the global model dynamics using techniques to control gossip-type information mixing errors. Extensive experiments have been conducted on real-world datasets over diversified unreliable uplink patterns to corroborate our analysis. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: This is a substantial extension of the conference paper "Towards Bias Correction of Fedavg over Nonuniform and Time-varying Communications", which was published in 2023 62nd IEEE Conference on Decision and Control (CDC), DOI: 10.1109/CDC49753.2023.10383258

arXiv:2403.07039 [pdf]

From English to ASIC: Hardware Implementation with Large Language Model

Authors: Emil Goh, Maoyang Xiang, I-Chyn Wey, T. Hui Teo

Abstract: In the realm of ASIC engineering, the landscape has been significantly reshaped by the rapid development of LLM, paralleled by an increase in the complexity of modern digital circuits. This complexity has escalated the requirements for HDL coding, necessitating a higher degree of precision and sophistication. However, challenges have been faced due to the less-than-optimal performance of modern la… ▽ More In the realm of ASIC engineering, the landscape has been significantly reshaped by the rapid development of LLM, paralleled by an increase in the complexity of modern digital circuits. This complexity has escalated the requirements for HDL coding, necessitating a higher degree of precision and sophistication. However, challenges have been faced due to the less-than-optimal performance of modern language models in generating hardware description code, a situation further exacerbated by the scarcity of the corresponding high-quality code datasets. These challenges have highlighted the gap between the potential of LLMs to revolutionize digital circuit design and their current capabilities in accurately interpreting and implementing hardware specifications. To address these challenges, a strategy focusing on the fine-tuning of the leading-edge nature language model and the reshuffling of the HDL code dataset has been developed. The fine-tuning aims to enhance models' proficiency in generating precise and efficient ASIC design, while the dataset reshuffling is intended to broaden the scope and improve the quality of training material. The model demonstrated significant improvements compared to the base model, with approximately 10% to 20% increase in accuracy across a wide range of temperature for the pass@1 metric. This approach is expected to facilitate a simplified and more efficient LLM-assisted framework for complex circuit design, leveraging their capabilities to meet the sophisticated demands of HDL coding and thus streamlining the ASIC development process. △ Less

Submitted 11 March, 2024; originally announced March 2024.

Comments: 15 pages, 1 figure

arXiv:2402.14849 [pdf]

Asynchronous and Segmented Bidirectional Encoding for NMT

Authors: Jingpu Yang, Zehua Han, Mengyu Xiang, Helin Wang, Yuxiao Huang, Miao Fang

Abstract: With the rapid advancement of Neural Machine Translation (NMT), enhancing translation efficiency and quality has become a focal point of research. Despite the commendable performance of general models such as the Transformer in various aspects, they still fall short in processing long sentences and fully leveraging bidirectional contextual information. This paper introduces an improved model based… ▽ More With the rapid advancement of Neural Machine Translation (NMT), enhancing translation efficiency and quality has become a focal point of research. Despite the commendable performance of general models such as the Transformer in various aspects, they still fall short in processing long sentences and fully leveraging bidirectional contextual information. This paper introduces an improved model based on the Transformer, implementing an asynchronous and segmented bidirectional decoding strategy aimed at elevating translation efficiency and accuracy. Compared to traditional unidirectional translations from left-to-right or right-to-left, our method demonstrates heightened efficiency and improved translation quality, particularly in handling long sentences. Experimental results on the IWSLT2017 dataset confirm the effectiveness of our approach in accelerating translation and increasing accuracy, especially surpassing traditional unidirectional strategies in long sentence translation. Furthermore, this study analyzes the impact of sentence length on decoding outcomes and explores the model's performance in various scenarios. The findings of this research not only provide an effective encoding strategy for the NMT field but also pave new avenues and directions for future studies. △ Less

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2311.16771 [pdf, other]

The HR-Calculus: Enabling Information Processing with Quaternion Algebra

Authors: Danilo P. Mandic, Sayed Pouria Talebi, Clive Cheong Took, Yili Xia, Dongpo Xu, Min Xiang, Pauline Bourigault

Abstract: From their inception, quaternions and their division algebra have proven to be advantageous in modelling rotation/orientation in three-dimensional spaces and have seen use from the initial formulation of electromagnetic filed theory through to forming the basis of quantum filed theory. Despite their impressive versatility in modelling real-world phenomena, adaptive information processing technique… ▽ More From their inception, quaternions and their division algebra have proven to be advantageous in modelling rotation/orientation in three-dimensional spaces and have seen use from the initial formulation of electromagnetic filed theory through to forming the basis of quantum filed theory. Despite their impressive versatility in modelling real-world phenomena, adaptive information processing techniques specifically designed for quaternion-valued signals have only recently come to the attention of the machine learning, signal processing, and control communities. The most important development in this direction is introduction of the HR-calculus, which provides the required mathematical foundation for deriving adaptive information processing techniques directly in the quaternion domain. In this article, the foundations of the HR-calculus are revised and the required tools for deriving adaptive learning techniques suitable for dealing with quaternion-valued signals, such as the gradient operator, chain and product derivative rules, and Taylor series expansion are presented. This serves to establish the most important applications of adaptive information processing in the quaternion domain for both single-node and multi-node formulations. The article is supported by Supplementary Material, which will be referred to as SM. △ Less

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2310.08303 [pdf, other]

Multimodal Variational Auto-encoder based Audio-Visual Segmentation

Authors: Yuxin Mao, Jing Zhang, Mochu Xiang, Yiran Zhong, Yuchao Dai

Abstract: We propose an Explicit Conditional Multimodal Variational Auto-Encoder (ECMVAE) for audio-visual segmentation (AVS), aiming to segment sound sources in the video sequence. Existing AVS methods focus on implicit feature fusion strategies, where models are trained to fit the discrete samples in the dataset. With a limited and less diverse dataset, the resulting performance is usually unsatisfactory.… ▽ More We propose an Explicit Conditional Multimodal Variational Auto-Encoder (ECMVAE) for audio-visual segmentation (AVS), aiming to segment sound sources in the video sequence. Existing AVS methods focus on implicit feature fusion strategies, where models are trained to fit the discrete samples in the dataset. With a limited and less diverse dataset, the resulting performance is usually unsatisfactory. In contrast, we address this problem from an effective representation learning perspective, aiming to model the contribution of each modality explicitly. Specifically, we find that audio contains critical category information of the sound producers, and visual data provides candidate sound producer(s). Their shared information corresponds to the target sound producer(s) shown in the visual data. In this case, cross-modal shared representation learning is especially important for AVS. To achieve this, our ECMVAE factorizes the representations of each modality with a modality-shared representation and a modality-specific representation. An orthogonality constraint is applied between the shared and specific representations to maintain the exclusive attribute of the factorized latent code. Further, a mutual information maximization regularizer is introduced to achieve extensive exploration of each modality. Quantitative and qualitative evaluations on the AVSBench demonstrate the effectiveness of our approach, leading to a new state-of-the-art for AVS, with a 3.84 mIOU performance leap on the challenging MS3 subset for multiple sound source segmentation. △ Less

Submitted 12 October, 2023; originally announced October 2023.

Comments: Accepted by ICCV2023,Project page(https://npucvr.github.io/MMVAE-AVS),Code(https://github.com/OpenNLPLab/MMVAE-AVS)

arXiv:2307.16579 [pdf, other]

Contrastive Conditional Latent Diffusion for Audio-visual Segmentation

Authors: Yuxin Mao, Jing Zhang, Mochu Xiang, Yunqiu Lv, Yiran Zhong, Yuchao Dai

Abstract: We propose a latent diffusion model with contrastive learning for audio-visual segmentation (AVS) to extensively explore the contribution of audio. We interpret AVS as a conditional generation task, where audio is defined as the conditional variable for sound producer(s) segmentation. With our new interpretation, it is especially necessary to model the correlation between audio and the final segme… ▽ More We propose a latent diffusion model with contrastive learning for audio-visual segmentation (AVS) to extensively explore the contribution of audio. We interpret AVS as a conditional generation task, where audio is defined as the conditional variable for sound producer(s) segmentation. With our new interpretation, it is especially necessary to model the correlation between audio and the final segmentation map to ensure its contribution. We introduce a latent diffusion model to our framework to achieve semantic-correlated representation learning. Specifically, our diffusion model learns the conditional generation process of the ground-truth segmentation map, leading to ground-truth aware inference when we perform the denoising process at the test stage. As a conditional diffusion model, we argue it is essential to ensure that the conditional variable contributes to model output. We then introduce contrastive learning to our framework to learn audio-visual correspondence, which is proven consistent with maximizing the mutual information between model prediction and the audio data. In this way, our latent diffusion model via contrastive learning explicitly maximizes the contribution of audio for AVS. Experimental results on the benchmark dataset verify the effectiveness of our solution. Code and results are online via our project page: https://github.com/OpenNLPLab/DiffusionAVS. △ Less

Submitted 31 July, 2023; originally announced July 2023.

arXiv:2307.09929 [pdf, other]

Measuring and Modeling Uncertainty Degree for Monocular Depth Estimation

Authors: Mochu Xiang, Jing Zhang, Nick Barnes, Yuchao Dai

Abstract: Effectively measuring and modeling the reliability of a trained model is essential to the real-world deployment of monocular depth estimation (MDE) models. However, the intrinsic ill-posedness and ordinal-sensitive nature of MDE pose major challenges to the estimation of uncertainty degree of the trained models. On the one hand, utilizing current uncertainty modeling methods may increase memory co… ▽ More Effectively measuring and modeling the reliability of a trained model is essential to the real-world deployment of monocular depth estimation (MDE) models. However, the intrinsic ill-posedness and ordinal-sensitive nature of MDE pose major challenges to the estimation of uncertainty degree of the trained models. On the one hand, utilizing current uncertainty modeling methods may increase memory consumption and are usually time-consuming. On the other hand, measuring the uncertainty based on model accuracy can also be problematic, where uncertainty reliability and prediction accuracy are not well decoupled. In this paper, we propose to model the uncertainty of MDE models from the perspective of the inherent probability distributions originating from the depth probability volume and its extensions, and to assess it more fairly with more comprehensive metrics. By simply introducing additional training regularization terms, our model, with surprisingly simple formations and without requiring extra modules or multiple inferences, can provide uncertainty estimations with state-of-the-art reliability, and can be further improved when combined with ensemble or sampling methods. A series of experiments demonstrate the effectiveness of our methods. △ Less

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2306.00280 [pdf, other]

Towards Bias Correction of FedAvg over Nonuniform and Time-Varying Communications

Authors: Ming Xiang, Stratis Ioannidis, Edmund Yeh, Carlee Joe-Wong, Lili Su

Abstract: Federated learning (FL) is a decentralized learning framework wherein a parameter server (PS) and a collection of clients collaboratively train a model via minimizing a global objective. Communication bandwidth is a scarce resource; in each round, the PS aggregates the updates from a subset of clients only. In this paper, we focus on non-convex minimization that is vulnerable to non-uniform and ti… ▽ More Federated learning (FL) is a decentralized learning framework wherein a parameter server (PS) and a collection of clients collaboratively train a model via minimizing a global objective. Communication bandwidth is a scarce resource; in each round, the PS aggregates the updates from a subset of clients only. In this paper, we focus on non-convex minimization that is vulnerable to non-uniform and time-varying communication failures between the PS and the clients. Specifically, in each round $t$, the link between the PS and client $i$ is active with probability $p_i^t$, which is $\textit{unknown}$ to both the PS and the clients. This arises when the channel conditions are heterogeneous across clients and are changing over time. We show that when the $p_i^t$'s are not uniform, $\textit{Federated Average}$ (FedAvg) -- the most widely adopted FL algorithm -- fails to minimize the global objective. Observing this, we propose $\textit{Federated Postponed Broadcast}$ (FedPBC) which is a simple variant of FedAvg. It differs from FedAvg in that the PS postpones broadcasting the global model till the end of each round. We show that FedPBC converges to a stationary point of the original objective. The introduced staleness is mild and there is no noticeable slowdown. Both theoretical analysis and numerical results are provided. On the technical front, postponing the global model broadcasts enables implicit gossiping among the clients with active links at round $t$. Despite $p_i^t$'s are time-varying, we are able to bound the perturbation of the global model dynamics via the techniques of controlling the gossip-type information mixing errors. △ Less

Submitted 31 May, 2023; originally announced June 2023.

arXiv:2305.19971 [pdf, other]

Federated Learning in the Presence of Adversarial Client Unavailability

Authors: Lili Su, Ming Xiang, Jiaming Xu, Pengkun Yang

Abstract: Federated learning is a decentralized machine learning framework that enables collaborative model training without revealing raw data. Due to the diverse hardware and software limitations, a client may not always be available for the computation requests from the parameter server. An emerging line of research is devoted to tackling arbitrary client unavailability. However, existing work still impo… ▽ More Federated learning is a decentralized machine learning framework that enables collaborative model training without revealing raw data. Due to the diverse hardware and software limitations, a client may not always be available for the computation requests from the parameter server. An emerging line of research is devoted to tackling arbitrary client unavailability. However, existing work still imposes structural assumptions on the unavailability patterns, impeding their applicability in challenging scenarios wherein the unavailability patterns are beyond the control of the parameter server. Moreover, in harsh environments like battlefields, adversaries can selectively and adaptively silence specific clients. In this paper, we relax the structural assumptions and consider adversarial client unavailability. To quantify the degrees of client unavailability, we use the notion of $ε$-adversary dropout fraction. We show that simple variants of FedAvg or FedProx, albeit completely agnostic to $ε$, converge to an estimation error on the order of $ε(G^2 + σ^2)$ for non-convex global objectives and $ε(G^2 + σ^2)/μ^2$ for $μ$ strongly convex global objectives, where $G$ is a heterogeneity parameter and $σ^2$ is the noise level. Conversely, we prove that any algorithm has to suffer an estimation error of at least $ε(G^2 + σ^2)/8$ and $ε(G^2 + σ^2)/(8μ^2)$ for non-convex global objectives and $μ$-strongly convex global objectives. Furthermore, the convergence speeds of the FedAvg or FedProx variants are $O(1/\sqrt{T})$ for non-convex objectives and $O(1/T)$ for strongly-convex objectives, both of which are the best possible for any first-order method that only has access to noisy gradients. △ Less

Submitted 19 February, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

arXiv:2304.07051 [pdf, other]

The Second Monocular Depth Estimation Challenge

Authors: Jaime Spencer, C. Stella Qian, Michaela Trescakova, Chris Russell, Simon Hadfield, Erich W. Graf, Wendy J. Adams, Andrew J. Schofield, James Elder, Richard Bowden, Ali Anwar, Hao Chen, Xiaozhi Chen, Kai Cheng, Yuchao Dai, Huynh Thai Hoa, Sadat Hossain, Jianmian Huang, Mohan Jing, Bo Li, Chao Li, Baojun Li, Zhiwen Liu, Stefano Mattoccia, Siegfried Mercelis , et al. (18 additional authors not shown)

Abstract: This paper discusses the results for the second edition of the Monocular Depth Estimation Challenge (MDEC). This edition was open to methods using any form of supervision, including fully-supervised, self-supervised, multi-task or proxy depth. The challenge was based around the SYNS-Patches dataset, which features a wide diversity of environments with high-quality dense ground-truth. This includes… ▽ More This paper discusses the results for the second edition of the Monocular Depth Estimation Challenge (MDEC). This edition was open to methods using any form of supervision, including fully-supervised, self-supervised, multi-task or proxy depth. The challenge was based around the SYNS-Patches dataset, which features a wide diversity of environments with high-quality dense ground-truth. This includes complex natural environments, e.g. forests or fields, which are greatly underrepresented in current benchmarks. The challenge received eight unique submissions that outperformed the provided SotA baseline on any of the pointcloud- or image-based metrics. The top supervised submission improved relative F-Score by 27.62%, while the top self-supervised improved it by 16.61%. Supervised submissions generally leveraged large collections of datasets to improve data diversity. Self-supervised submissions instead updated the network architecture and pretrained backbones. These results represent a significant progress in the field, while highlighting avenues for future research, such as reducing interpolation artifacts at depth boundaries, improving self-supervised indoor performance and overall natural image accuracy. △ Less

Submitted 26 April, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

Comments: Published at CVPRW2023

arXiv:2210.00665 [pdf, other]

Distributed Non-Convex Optimization with One-Bit Compressors on Heterogeneous Data: Efficient and Resilient Algorithms

Authors: Ming Xiang, Lili Su

Abstract: Federated Learning (FL) is a nascent decentralized learning framework under which a massive collection of heterogeneous clients collaboratively train a model without revealing their local data. Scarce communication, privacy leakage, and Byzantine attacks are the key bottlenecks of system scalability. In this paper, we focus on communication-efficient distributed (stochastic) gradient descent for n… ▽ More Federated Learning (FL) is a nascent decentralized learning framework under which a massive collection of heterogeneous clients collaboratively train a model without revealing their local data. Scarce communication, privacy leakage, and Byzantine attacks are the key bottlenecks of system scalability. In this paper, we focus on communication-efficient distributed (stochastic) gradient descent for non-convex optimization, a driving force of FL. We propose two algorithms, named {\em Adaptive Stochastic Sign SGD (Ada-StoSign)} and {\em $β$-Stochastic Sign SGD ($β$-StoSign)}, each of which compresses the local gradients into bit vectors. To handle unbounded gradients, Ada-StoSign uses a novel norm tracking function that adaptively adjusts a coarse estimation on the $\ell_{\infty}$ of the local gradients - a key parameter used in gradient compression. We show that Ada-StoSign converges in expectation with a rate $O(\log T/\sqrt{T} + 1/\sqrt{M})$, where $M$ is the number of clients. To the best of our knowledge, when $M$ is sufficiently large, Ada-StoSign outperforms the state-of-the-art sign-based method whose convergence rate is $O(T^{-1/4})$. Under bounded gradient assumption, $β$-StoSign achieves quantifiable Byzantine resilience and privacy assurances, and works with partial client participation and mini-batch gradients which could be unbounded. We corroborate and complement our theories by experiments on MNIST and CIFAR-10 datasets. △ Less

Submitted 17 February, 2023; v1 submitted 2 October, 2022; originally announced October 2022.

arXiv:2110.06427 [pdf, other]

Dense Uncertainty Estimation

Authors: Jing Zhang, Yuchao Dai, Mochu Xiang, Deng-Ping Fan, Peyman Moghadam, Mingyi He, Christian Walder, Kaihao Zhang, Mehrtash Harandi, Nick Barnes

Abstract: Deep neural networks can be roughly divided into deterministic neural networks and stochastic neural networks.The former is usually trained to achieve a mapping from input space to output space via maximum likelihood estimation for the weights, which leads to deterministic predictions during testing. In this way, a specific weights set is estimated while ignoring any uncertainty that may occur in… ▽ More Deep neural networks can be roughly divided into deterministic neural networks and stochastic neural networks.The former is usually trained to achieve a mapping from input space to output space via maximum likelihood estimation for the weights, which leads to deterministic predictions during testing. In this way, a specific weights set is estimated while ignoring any uncertainty that may occur in the proper weight space. The latter introduces randomness into the framework, either by assuming a prior distribution over model parameters (i.e. Bayesian Neural Networks) or including latent variables (i.e. generative models) to explore the contribution of latent variables for model predictions, leading to stochastic predictions during testing. Different from the former that achieves point estimation, the latter aims to estimate the prediction distribution, making it possible to estimate uncertainty, representing model ignorance about its predictions. We claim that conventional deterministic neural network based dense prediction tasks are prone to overfitting, leading to over-confident predictions, which is undesirable for decision making. In this paper, we investigate stochastic neural networks and uncertainty estimation techniques to achieve both accurate deterministic prediction and reliable uncertainty estimation. Specifically, we work on two types of uncertainty estimations solutions, namely ensemble based methods and generative model based methods, and explain their pros and cons while using them in fully/semi/weakly-supervised framework. Due to the close connection between uncertainty estimation and model calibration, we also introduce how uncertainty estimation can be used for deep model calibration to achieve well-calibrated models, namely dense model calibration. Code and data are available at https://github.com/JingZhang617/UncertaintyEstimation. △ Less

Submitted 12 October, 2021; originally announced October 2021.

Comments: Technical Report

arXiv:2106.13217 [pdf, other]

Exploring Depth Contribution for Camouflaged Object Detection

Authors: Mochu Xiang, Jing Zhang, Yunqiu Lv, Aixuan Li, Yiran Zhong, Yuchao Dai

Abstract: Camouflaged object detection (COD) aims to segment camouflaged objects hiding in the environment, which is challenging due to the similar appearance of camouflaged objects and their surroundings. Research in biology suggests depth can provide useful object localization cues for camouflaged object discovery. In this paper, we study the depth contribution for camouflaged object detection, where the… ▽ More Camouflaged object detection (COD) aims to segment camouflaged objects hiding in the environment, which is challenging due to the similar appearance of camouflaged objects and their surroundings. Research in biology suggests depth can provide useful object localization cues for camouflaged object discovery. In this paper, we study the depth contribution for camouflaged object detection, where the depth maps are generated with existing monocular depth estimation (MDE) methods. Due to the domain gap between the MDE dataset and our COD dataset, the generated depth maps are not accurate enough to be directly used. We then introduce two solutions to avoid the noisy depth maps from dominating the training process. Firstly, we present an auxiliary depth estimation branch ("ADE"), aiming to regress the depth maps. We find that "ADE" is especially necessary for our "generated depth" scenario. Secondly, we introduce a multi-modal confidence-aware loss function via a generative adversarial network to weigh the contribution of depth for camouflaged object detection. Our extensive experiments on various camouflaged object detection datasets explain that the existing "sensor depth" based RGB-D segmentation techniques work poorly with "generated depth", and our proposed two solutions work cooperatively, achieving effective depth contribution exploration for camouflaged object detection. △ Less

Submitted 13 January, 2022; v1 submitted 24 June, 2021; originally announced June 2021.

Comments: The first work in RGB-D Camouflaged object detection (COD)

arXiv:2006.08413 [pdf, other]

Reciprocal Adversarial Learning via Characteristic Functions

Authors: Shengxi Li, Zeyang Yu, Min Xiang, Danilo Mandic

Abstract: Generative adversarial nets (GANs) have become a preferred tool for tasks involving complicated distributions. To stabilise the training and reduce the mode collapse of GANs, one of their main variants employs the integral probability metric (IPM) as the loss function. This provides extensive IPM-GANs with theoretical support for basically comparing moments in an embedded domain of the \textit{cri… ▽ More Generative adversarial nets (GANs) have become a preferred tool for tasks involving complicated distributions. To stabilise the training and reduce the mode collapse of GANs, one of their main variants employs the integral probability metric (IPM) as the loss function. This provides extensive IPM-GANs with theoretical support for basically comparing moments in an embedded domain of the \textit{critic}. We generalise this by comparing the distributions rather than their moments via a powerful tool, i.e., the characteristic function (CF), which uniquely and universally comprising all the information about a distribution. For rigour, we first establish the physical meaning of the phase and amplitude in CF, and show that this provides a feasible way of balancing the accuracy and diversity of generation. We then develop an efficient sampling strategy to calculate the CFs. Within this framework, we further prove an equivalence between the embedded and data domains when a reciprocal exists, where we naturally develop the GAN in an auto-encoder structure, in a way of comparing everything in the embedded space (a semantically meaningful manifold). This efficient structure uses only two modules, together with a simple training strategy, to achieve bi-directionally generating clear images, which is referred to as the reciprocal CF GAN (RCF-GAN). Experimental results demonstrate the superior performances of the proposed RCF-GAN in terms of both generation and reconstruction. △ Less

Submitted 23 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

Comments: This work has been accepted to NeurIPS 2020

arXiv:2002.03541 [pdf, ps, other]

Resilient Consensus via Weight Learning and Its Application in Fault-Tolerant Clock Synchronization

Authors: Jian Hou, Zhiyong Chen, ZhiyunLin, Mengfan Xiang

Abstract: This paper addresses the distributed consensus problem in the presence of faulty nodes. A novel weight learning algorithm is introduced such that neither network connectivity nor a sequence of history records is required to achieve resilient consensus. The critical idea is to dynamically update the interaction weights among neighbors learnt from their credibility measurement. Basically, we define… ▽ More This paper addresses the distributed consensus problem in the presence of faulty nodes. A novel weight learning algorithm is introduced such that neither network connectivity nor a sequence of history records is required to achieve resilient consensus. The critical idea is to dynamically update the interaction weights among neighbors learnt from their credibility measurement. Basically, we define a reward function that is inversely proportional to the distance to its neighbor, and then adjust the credibility based on the reward derived at the present step and the previous credibility. In such a way, the interaction weights are updated at every step, which integrates the historic information and degrades the influences from faulty nodes. Both fixed and stochastic topologies are considered in this paper. Furthermore, we apply this novel approach in clock synchronization problem. By updating the logical clock skew and offset via the corresponding weight learning algorithms, respectively, the logical clock synchronization is eventually achieved regardless of faulty nodes. Simulations are provided to illustrate the effectiveness of the strategy. △ Less

Submitted 9 February, 2020; originally announced February 2020.

arXiv:1906.09951 [pdf]

Fast Calculation of Probabilistic Optimal Power Flow: A Deep Learning Approach

Authors: Yan Yang, Juan Yu, Zhifang Yang, Mingxu Xiang, Ren Liu

Abstract: Probabilistic optimal power flow (POPF) is an important analytical tool to ensure the secure and economic operation of power systems. POPF needs to solve enormous nonlinear and nonconvex optimization problems. The huge computational burden has become the major bottleneck for the practical application. This paper presents a deep learning approach to solve the POPF problem efficiently and accurately… ▽ More Probabilistic optimal power flow (POPF) is an important analytical tool to ensure the secure and economic operation of power systems. POPF needs to solve enormous nonlinear and nonconvex optimization problems. The huge computational burden has become the major bottleneck for the practical application. This paper presents a deep learning approach to solve the POPF problem efficiently and accurately. Taking advantage of the deep structure and reconstructive strategy of stacked denoising auto encoders (SDAE), a SDAE-based optimal power flow (OPF) is developed to extract the high-level nonlinear correlations between the system operating condition and the OPF solution. A training process is designed to learn the feature of POPF. The trained SDAE network can be utilized to conveniently calculate the OPF solution of random samples generated by Monte-Carlo simulation (MCS) without the need of optimization. A modified IEEE 118-bus power system is simulated to demonstrate the effectiveness of the proposed method. △ Less

Submitted 24 June, 2019; originally announced June 2019.

arXiv:1906.03700 [pdf, other]

doi 10.1609/aaai.v34i04.5897

Solving general elliptical mixture models through an approximate Wasserstein manifold

Authors: Shengxi Li, Zeyang Yu, Min Xiang, Danilo Mandic

Abstract: We address the estimation problem for general finite mixture models, with a particular focus on the elliptical mixture models (EMMs). Compared to the widely adopted Kullback-Leibler divergence, we show that the Wasserstein distance provides a more desirable optimisation space. We thus provide a stable solution to the EMMs that is both robust to initialisations and reaches a superior optimum by ada… ▽ More We address the estimation problem for general finite mixture models, with a particular focus on the elliptical mixture models (EMMs). Compared to the widely adopted Kullback-Leibler divergence, we show that the Wasserstein distance provides a more desirable optimisation space. We thus provide a stable solution to the EMMs that is both robust to initialisations and reaches a superior optimum by adaptively optimising along a manifold of an approximate Wasserstein distance. To this end, we first provide a unifying account of computable and identifiable EMMs, which serves as a basis to rigorously address the underpinning optimisation problem. Due to a probability constraint, solving this problem is extremely cumbersome and unstable, especially under the Wasserstein distance. To relieve this issue, we introduce an efficient optimisation method on a statistical manifold defined under an approximate Wasserstein distance, which allows for explicit metrics and computable operations, thus significantly stabilising and improving the EMM estimation. We further propose an adaptive method to accelerate the convergence. Experimental results demonstrate the excellent performance of the proposed EMM solver. △ Less

Submitted 7 October, 2020; v1 submitted 9 June, 2019; originally announced June 2019.

Comments: This work has been accepted to AAAI2020. Note that this version also corrects a small error on the Equation (16) in proof

arXiv:1705.09058 [pdf, other]

An Empirical Analysis of Approximation Algorithms for the Euclidean Traveling Salesman Problem

Authors: Yihui He, Ming Xiang

Abstract: With applications to many disciplines, the traveling salesman problem (TSP) is a classical computer science optimization problem with applications to industrial engineering, theoretical computer science, bioinformatics, and several other disciplines. In recent years, there have been a plethora of novel approaches for approximate solutions ranging from simplistic greedy to cooperative distributed a… ▽ More With applications to many disciplines, the traveling salesman problem (TSP) is a classical computer science optimization problem with applications to industrial engineering, theoretical computer science, bioinformatics, and several other disciplines. In recent years, there have been a plethora of novel approaches for approximate solutions ranging from simplistic greedy to cooperative distributed algorithms derived from artificial intelligence. In this paper, we perform an evaluation and analysis of cornerstone algorithms for the Euclidean TSP. We evaluate greedy, 2-opt, and genetic algorithms. We use several datasets as input for the algorithms including a small dataset, a mediumsized dataset representing cities in the United States, and a synthetic dataset consisting of 200 cities to test algorithm scalability. We discover that the greedy and 2-opt algorithms efficiently calculate solutions for smaller datasets. Genetic algorithm has the best performance for optimality for medium to large datasets, but generally have longer runtime. Our implementations is public available. △ Less

Submitted 25 May, 2017; originally announced May 2017.

Comments: 4 pages, 5 figures

arXiv:1705.00058 [pdf, ps, other]

Simultaneous diagonalisation of the covariance and complementary covariance matrices in quaternion widely linear signal processing

Authors: Min Xiang, Shirin Enshaeifar, Alexander E. Stott, Clive Cheong Took, Yili Xia, Sithan Kanna, Danilo P. Mandic

Abstract: Recent developments in quaternion-valued widely linear processing have established that the exploitation of complete second-order statistics requires consideration of both the standard covariance and the three complementary covariance matrices. Although such matrices have a tremendous amount of structure and their decomposition is a powerful tool in a variety of applications, the non-commutative n… ▽ More Recent developments in quaternion-valued widely linear processing have established that the exploitation of complete second-order statistics requires consideration of both the standard covariance and the three complementary covariance matrices. Although such matrices have a tremendous amount of structure and their decomposition is a powerful tool in a variety of applications, the non-commutative nature of the quaternion product has been prohibitive to the development of quaternion uncorrelating transforms. To this end, we introduce novel techniques for a simultaneous decomposition of the covariance and complementary covariance matrices in the quaternion domain, whereby the quaternion version of the Takagi factorisation is explored to diagonalise symmetric quaternion-valued matrices. This gives new insights into the quaternion uncorrelating transform (QUT) and forms a basis for the proposed quaternion approximate uncorrelating transform (QAUT) which simultaneously diagonalises all four covariance matrices associated with improper quaternion signals. The effectiveness of the proposed uncorrelating transforms is validated by simulations on both synthetic and real-world quaternion-valued signals. △ Less

Submitted 29 January, 2018; v1 submitted 28 April, 2017; originally announced May 2017.

Comments: 41 pages, single column, 10 figures

Showing 1–27 of 27 results for author: Xiang, M