Electrical Engineering and Systems Science
- [1] arXiv:2405.04591 [pdf, ps, html, other]
Title: Visually Guided Swarm Motion Coordination via Insect-inspired Small Target Motion Reactions
Comments: 12 pages journal, submitted for peer review
Subjects: Systems and Control (eess.SY)
Despite progress in developing experimentally-consistent models of insect in-flight sensing and feedback for individual agents, a lack of systematic understanding of the multi-agent and group performance of the resulting bio-inspired sensing and feedback approaches remains a barrier to robotic swarm implementations. This study introduces the small-target motion reactive (STMR) swarming approach by designing a concise engineering model of the small target motion detector (STMD) neurons found in insect lobula complexes. The STMD neuron model identifies the bearing angle at which peak optic flow magnitude occurs, and this angle is used to design an output feedback switched control system. A theoretical stability analysis provides bi-agent stability and state boundedness in group contexts. The approach is simulated and implemented on ground vehicles for validation and behavioral studies. The results indicate that, despite having the lowest connectivity of contemporary approaches (each agent instantaneously regards only a single neighbor), collective group motion can be achieved. STMR group level metric analysis also highlights continuously varying polarization and decreasing heading variance.
- [2] arXiv:2405.04595 [pdf, ps, other]
Title: An Advanced Features Extraction Module for Remote Sensing Image Super-Resolution
Comments: Preprint of paper from The 21st International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology or ECTI-CON 2024, Khon Kaen, Thailand
Journal-ref: ECTI-CON 2024, Khon Kaen Thailand
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
In recent years, convolutional neural networks (CNNs) have achieved remarkable advancement in the field of remote sensing image super-resolution due to the complexity and variability of textures and structures in remote sensing images (RSIs), which often repeat in the same images but differ across others. Current deep learning-based super-resolution models focus less on high-frequency features, which leads to suboptimal performance in capturing contours, textures, and spatial information. State-of-the-art CNN-based methods now focus on the feature extraction of RSIs using attention mechanisms. However, these methods are still incapable of effectively identifying and utilizing key content attention signals in RSIs. To solve this problem, we propose an advanced feature extraction module called Channel and Spatial Attention Feature Extraction (CSA-FE) for effectively extracting the features by using the channel and spatial attention incorporated with the standard vision transformer (ViT). The proposed method is trained on the UCMerced dataset at scales 2, 3, and 4. The experimental results show that our proposed method helps the model focus on the specific channels and spatial locations containing high-frequency information so that the model can focus on relevant features and suppress irrelevant ones, which enhances the quality of super-resolved images. Our model achieved superior performance compared to various existing models.
- [3] arXiv:2405.04610 [pdf, ps, html, other]
Title: Exploring Explainable AI Techniques for Improved Interpretability in Lung and Colon Cancer Classification
Comments: Accepted in 4th International Conference on Computing and Communication Networks (ICCCNet-2024)
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Lung and colon cancer are serious worldwide health challenges that require early and precise identification to reduce mortality risks. However, diagnosis, which is mostly dependent on histopathologists' competence, presents difficulties and hazards when expertise is insufficient. While diagnostic methods like imaging and blood markers contribute to early detection, histopathology remains the gold standard, although time-consuming and vulnerable to inter-observer mistakes. Limited access to high-end technology further limits patients' ability to receive immediate medical care and diagnosis. Recent advances in deep learning have generated interest in its application to medical imaging analysis, specifically the use of histopathological images to diagnose lung and colon cancer. The goal of this investigation is to use and adapt existing pre-trained CNN-based models, such as Xception, DenseNet201, ResNet101, InceptionV3, DenseNet121, DenseNet169, ResNet152, and InceptionResNetV2, to enhance classification through better augmentation strategies. The results show tremendous progress, with all eight models reaching impressive accuracy ranging from 97% to 99%. Furthermore, attention visualization techniques such as GradCAM, GradCAM++, ScoreCAM, Faster Score-CAM, and LayerCAM, as well as Vanilla Saliency and SmoothGrad, are used to provide insights into the models' classification decisions, thereby improving interpretability and understanding of malignant and benign image classification.
- [4] arXiv:2405.04627 [pdf, ps, html, other]
Title: SingIt! Singer Voice Transformation
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
In this paper, we propose a model which can generate a singing voice from a normal speech utterance by harnessing zero-shot, many-to-many style transfer learning. Our goal is to give anyone the opportunity to sing any song in a timely manner. We present a system comprising several available blocks, as well as a modified auto-encoder, and show how this highly-complex challenge can be achieved by combining rather simple solutions. We demonstrate the applicability of the proposed system using a group of 25 non-expert listeners. Samples of the data generated from our model are provided.
- [5] arXiv:2405.04629 [pdf, ps, other]
Title: ResNCT: A Deep Learning Model for the Synthesis of Nephrographic Phase Images in CT Urography
Authors: Syed Jamal Safdar Gardezi (1), Lucas Aronson (1), Peter Wawrzyn (2), Hongkun Yu (2), E. Jason Abel (3), Daniel D. Shapiro (3), Meghan G. Lubner (1), Joshua Warner (1), Giuseppe Toia (1), Lu Mao (4), Pallavi Tiwari (1,2), Andrew L. Wentland (1,2,5) ((1) Department of Radiology, University of Wisconsin School of Medicine & Public Health, Madison, WI, USA, (2) Department of Biomedical Engineering, University of Wisconsin Madison, Madison, WI, USA, (3) Department of Urology, University of Wisconsin School of Medicine & Public Health, Madison, WI, USA, (4) Department of Biostatistics, University of Wisconsin School of Medicine & Public Health, Madison, WI, USA, (5) Department of Medical Physics, University of Wisconsin School of Medicine & Public Health, Madison, WI, USA)
Comments: 19 pages, 5 Figures, 2 Tables
Subjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Medical Physics (physics.med-ph)
Purpose: To develop and evaluate a transformer-based deep learning model for the synthesis of nephrographic phase images in CT urography (CTU) examinations from the unenhanced and urographic phases.
Materials and Methods: This retrospective study was approved by the local Institutional Review Board. A dataset of 119 patients (mean $\pm$ SD age, 65 $\pm$ 12 years; 75/44 males/females) with three-phase CT urography studies was curated for deep learning model development. The three phases for each patient were aligned with an affine registration algorithm. A custom model, coined Residual transformer model for Nephrographic phase CT image synthesis (ResNCT), was developed and implemented with paired inputs of non-contrast and urographic sets of images trained to produce the nephrographic phase images, which were compared with the corresponding ground truth nephrographic phase images. The synthesized images were evaluated with multiple performance metrics, including peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), normalized cross correlation coefficient (NCC), mean absolute error (MAE), and root mean squared error (RMSE).
Results: The ResNCT model successfully generated synthetic nephrographic images from non-contrast and urographic image inputs. With respect to ground truth nephrographic phase images, the images synthesized by the model achieved high PSNR (27.8 $\pm$ 2.7 dB), SSIM (0.88 $\pm$ 0.05), and NCC (0.98 $\pm$ 0.02), and low MAE (0.02 $\pm$ 0.005) and RMSE (0.042 $\pm$ 0.016).
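Three of the error metrics named in this abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' evaluation code: the function names are hypothetical, images are treated as flat lists of intensities, and the PSNR helper assumes intensities normalized to [0, 1], which the abstract does not state.

```python
import math

def mae(ref, syn):
    """Mean absolute error between two equally sized intensity lists."""
    return sum(abs(r - s) for r, s in zip(ref, syn)) / len(ref)

def rmse(ref, syn):
    """Root mean squared error between two equally sized intensity lists."""
    return math.sqrt(sum((r - s) ** 2 for r, s in zip(ref, syn)) / len(ref))

def psnr_from_rmse(rmse_value, max_value=1.0):
    """PSNR in dB; max_value=1.0 is an assumed intensity range, not from the source."""
    return 20.0 * math.log10(max_value / rmse_value)

# Toy data (made up for illustration, not taken from the study).
ref = [0.10, 0.50, 0.90, 0.30]
syn = [0.12, 0.47, 0.93, 0.30]

print(mae(ref, syn), rmse(ref, syn), psnr_from_rmse(rmse(ref, syn)))
```

Under the [0, 1] assumption, `psnr_from_rmse(0.042)` evaluates to roughly 27.5 dB, which is in line with the reported PSNR of 27.8 $\pm$ 2.7 dB, although the mean of per-image PSNR values need not equal the PSNR computed from the mean RMSE.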
Conclusion: The ResNCT model synthesized nephrographic phase CT images with high similarity to ground truth images. The ResNCT model provides a means of eliminating the acquisition of the nephrographic phase with a resultant 33% reduction in radiation dose for CTU examinations.
- [6] arXiv:2405.04704 [pdf, ps, html, other]
Title: System Identification of the Upgraded LHPOST6 Reaction Mass at the University of California San Diego
Comments: 38 pages, 35 figures
Subjects: Signal Processing (eess.SP)
Upon completing the upgrade from one to six degrees of freedom of the Outdoor Shake Table at UCSD in 2019, forced vibration tests were carried out to identify the dynamic characteristics of the reaction mass and soil system. This report describes the motivation, execution, and results from such tests, which independently excited the reaction mass in four degrees of freedom: longitudinal, transverse, yaw, and vertical. The report discusses the frequency response curves and deformation patterns from which the natural frequencies, damping ratio, mode shapes, and rigid body motion were determined. The first objective of the study was to investigate whether the dynamic properties of the system had dramatically changed after the upgrade by comparing the results to those from forced vibration tests performed 20 years ago, during the construction of the facility. In addition, the most recent tests contributed results for the vertical degree of freedom, which had never been tested. The second objective was to obtain high-quality response data of the system that will be used to develop a high-fidelity computational model of the reaction mass in future research. A comparison of results showed a slight difference of 0.5 Hz in the natural frequency of 2 degrees of freedom. Moreover, maximum displacements in the recent tests were overall larger than the previous ones, with few exceptions. The report thoroughly discusses the several sources of discrepancy between the past and most recent results. Finally, test results allowed us to estimate the system's response if the shake table actuators were to be used at their maximum nominal capacity. Small displacement and high damping results were consistent with those of previous tests and further validated the design of the reaction mass.
- [7] arXiv:2405.04752 [pdf, ps, other]
Title: HILCodec: High Fidelity and Lightweight Neural Audio Codec
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
The recent advancement of end-to-end neural audio codecs enables compressing audio at very low bitrates while reconstructing the output audio with high fidelity. Nonetheless, such improvements often come at the cost of increased model complexity. In this paper, we identify and address the problems of existing neural audio codecs. We show that the performance of Wave-U-Net does not increase consistently as the network depth increases. We analyze the root cause of such a phenomenon and suggest a variance-constrained design. Also, we reveal various distortions in previous waveform domain discriminators and propose a novel distortion-free discriminator. The resulting model, HILCodec, is a real-time streaming audio codec that demonstrates state-of-the-art quality across various bitrates and audio types.
- [8] arXiv:2405.04757 [pdf, ps, html, other]
Title: Communication-efficient and Differentially-private Distributed Nash Equilibrium Seeking with Linear Convergence
Subjects: Systems and Control (eess.SY); Computer Science and Game Theory (cs.GT)
The distributed computation of a Nash equilibrium (NE) for non-cooperative games has recently gained increasing attention. Due to the nature of distributed systems, privacy and communication efficiency are two critical concerns. Traditional approaches often address these critical concerns in isolation. This work introduces a unified framework, named CDP-NES, designed to improve communication efficiency in the privacy-preserving NE seeking algorithm for distributed non-cooperative games over directed graphs. Leveraging both general compression operators and the noise-adding mechanism, CDP-NES perturbs local states with Laplacian noise and applies difference compression prior to their exchange among neighbors. We prove that CDP-NES not only achieves linear convergence to a neighborhood of the NE in games with restricted monotone mappings but also guarantees $\epsilon$-differential privacy, addressing privacy and communication efficiency simultaneously. Finally, simulations are provided to illustrate the effectiveness of the proposed method.
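The "perturb, then compress the difference" step described in this abstract can be illustrated with a generic sketch. Everything below is an assumption made for illustration and is not the CDP-NES algorithm itself: the function names are hypothetical, top-k sparsification stands in for the paper's general compression operators, and the shared `reference` vector is the usual device in difference compression (both sender and receiver add each message to it).

```python
import random

def laplace_noise(scale, rng):
    """Zero-mean Laplace sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def top_k(vec, k):
    """Keep the k largest-magnitude entries and zero the rest (an example compressor)."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]

def make_message(state, reference, noise_scale, k, rng):
    """Perturb the local state, then compress its difference from the shared reference."""
    noisy = [x + laplace_noise(noise_scale, rng) for x in state]
    diff = [y - h for y, h in zip(noisy, reference)]
    return top_k(diff, k)

def apply_message(reference, message):
    """Sender and receiver both update the reference with the compressed difference."""
    return [h + q for h, q in zip(reference, message)]

rng = random.Random(0)            # fixed seed so the sketch is reproducible
state = [1.0, -2.0, 0.5, 3.0]     # toy local state (made up)
reference = [0.0] * len(state)
message = make_message(state, reference, noise_scale=0.1, k=2, rng=rng)
reference = apply_message(reference, message)
print(message, reference)
```

Only the sparse `message` crosses the network, and it is computed from the already-noised state, so the raw local state is never transmitted in this sketch.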
- [9] arXiv:2405.04778 [pdf, ps, other]
Title: Teacher-Student Network for Real-World Face Super-Resolution with Progressive Embedding of Edge Information
Comments: Accepted by ICIP 2023
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Traditional face super-resolution (FSR) methods trained on synthetic datasets usually have poor generalization ability for real-world face images. Recent work has utilized complex degradation models or training networks to simulate the real degradation process, but this limits the performance of these methods due to the domain differences that still exist between the generated low-resolution images and the real low-resolution images. Moreover, because of the existence of a domain gap, the semantic feature information of the target domain may be affected when synthetic data and real data are utilized to train super-resolution models simultaneously. In this study, a real-world face super-resolution teacher-student model is proposed, which considers the domain gap between real and synthetic data and progressively includes diverse edge information by using the recurrent network's intermediate outputs. Extensive experiments demonstrate that our proposed approach surpasses state-of-the-art methods in obtaining high-quality face images for real-world FSR.
- [10] arXiv:2405.04806 [pdf, ps, other]
Title: A leadless power transfer and wireless telemetry solutions for an endovascular electrocorticography
Authors: Zhangyu Xu, Majid Khazaee, Nhan Duy Truong, Deniel Havenga, Armin Nikpour, Arman Ahnood, Omid Kavehei
Comments: 17 Pages, 12 figures
Subjects: Systems and Control (eess.SY)
Endovascular brain-computer interfaces (eBCIs) offer a minimally invasive way to connect the brain to external devices, merging neuroscience, engineering, and medical technology. Achieving wireless data and power transmission is crucial for the clinical viability of these implantable devices. Typically, solutions for endovascular electrocorticography (ECoG) include a sensing stent with multiple electrodes (e.g. in the superior sagittal sinus) in the brain, a subcutaneous chest implant for wireless energy harvesting and data telemetry, and a long (tens of centimetres) cable with a set of wires in between. This long cable presents risks and limitations, especially for younger patients or those with fragile vasculature. This work introduces a wireless and leadless telemetry and power transfer solution for endovascular ECoG. The proposed solution includes an optical telemetry module and a focused ultrasound (FUS) power transfer system. The proposed system can be miniaturised to fit in an endovascular stent. Our solution uses optical telemetry for high-speed data transmission (over 2 Mbit/s, capable of transmitting 41 ECoG channels at a 2 kHz sampling rate and 24-bit resolution), and the proposed power transfer scheme provides a power budget of up to 10 mW at the site of the endovascular implant under the safety limit. Tests on bovine tissues confirmed the system's effectiveness, suggesting that future custom circuit designs could further enhance eBCI applications by removing wires and auxiliary implants, minimising complications.
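The telemetry figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses the channel count, sampling rate, and resolution from the abstract; the link rate of exactly 2 Mbit/s and the absence of framing or coding overhead are assumptions made for illustration.

```python
# Figures quoted in the abstract.
channels = 41
sample_rate_hz = 2_000
bits_per_sample = 24

# Assumption: a link rate of exactly 2 Mbit/s and no framing or coding overhead.
link_rate_bps = 2_000_000

raw_rate_bps = channels * sample_rate_hz * bits_per_sample
max_channels = link_rate_bps // (sample_rate_hz * bits_per_sample)

print(f"raw payload: {raw_rate_bps / 1e6:.3f} Mbit/s")  # raw payload: 1.968 Mbit/s
print(f"channels that fit: {max_channels}")             # channels that fit: 41
```

Each channel needs 48 kbit/s, so 41 channels is the largest whole number that fits within 2 Mbit/s under these assumptions.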
- [11] arXiv:2405.04867 [pdf, ps, html, other]
Title: MIPI 2024 Challenge on Demosaic for HybridEVS Camera: Methods and Results
Authors: Yaqi Wu, Zhihao Fan, Xiaofeng Chu, Jimmy S. Ren, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangcheng Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha, Jun Cao, Cheng Li, Shu Chen, Liang Ma, Shiyang Zhou, Haijin Zeng, Kai Feng, Yongyong Chen, Jingyong Su, Xianyu Guan, Hongyuan Yu, Cheng Wan, Jiamin Lin, Binnan Han, Yajun Zou, Zhuoyuan Wu, Yuan Huang, Yongsheng Yu, Daoan Zhang, Jizhe Li, Xuanwu Yin, Kunlong Zuo, Yunfan Lu, Yijie Xu, Wenzong Ma, Weiyu Guo, Hui Xiong, Wei Yu, Bingchun Luo, Sabari Nathan, Priya Kansal
Comments: MIPI@CVPR2024. Website: this https URL
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at this https URL.
- [12] arXiv:2405.04902 [pdf, ps, other]
Title: HAGAN: Hybrid Augmented Generative Adversarial Network for Medical Image Synthesis
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Medical Image Synthesis (MIS) plays an important role in the intelligent medical field, which greatly saves the economic and time costs of medical diagnosis. However, due to the complexity of medical images and similar characteristics of different tissue cells, existing methods face great challenges in meeting their biological consistency. To this end, we propose the Hybrid Augmented Generative Adversarial Network (HAGAN) to maintain the authenticity of structural texture and tissue cells. HAGAN contains Attention Mixed (AttnMix) Generator, Hierarchical Discriminator and Reverse Skip Connection between Discriminator and Generator. The AttnMix consistency differentiable regularization encourages the perception in structural and textural variations between real and fake images, which improves the pathological integrity of synthetic images and the accuracy of features in local areas. The Hierarchical Discriminator introduces pixel-by-pixel discriminant feedback to generator for enhancing the saliency and discriminance of global and local details simultaneously. The Reverse Skip Connection further improves the accuracy for fine details by fusing real and synthetic distribution features. Our experimental evaluations on three datasets of different scales, i.e., COVID-CT, ACDC and BraTS2018, demonstrate that HAGAN outperforms the existing methods and achieves state-of-the-art performance in both high-resolution and low-resolution.
- [13] arXiv:2405.04962 [pdf, ps, other]
Title: Bistatic OFDM-based ISAC with Over-the-Air Synchronization: System Concept and Performance Analysis
Authors: David Brunner, Lucas Giroto de Oliveira, Charlotte Muth, Silvio Mandelli, Marcus Henninger, Axel Diewald, Yueheng Li, Mohamad Basim Alabd, Laurent Schmalen, Thomas Zwick, Benjamin Nuss
Subjects: Signal Processing (eess.SP)
Integrated sensing and communication (ISAC) has been defined as one goal for 6G mobile communication systems. In this context, this article introduces a bistatic ISAC system based on orthogonal frequency-division multiplexing (OFDM). While the bistatic architecture brings advantages such as not demanding full duplex operation with respect to the monostatic one, the need for synchronizing transmitter and receiver is imposed. In this context, this article introduces a bistatic ISAC signal processing framework where an incoming OFDM-based ISAC signal undergoes over-the-air synchronization based on preamble symbols and pilots. Afterwards, bistatic radar processing is performed using either only pilot subcarriers or the full OFDM frame. The latter approach requires estimation of the originally transmitted frame based on communication processing and therefore error-free communication, which can be achieved via appropriate channel coding. The performance and limitations of the introduced system based on both aforementioned approaches are assessed via an analysis of the impact of residual synchronization mismatches and data decoding failures on both communication and radar performances. Finally, the performed analyses are validated by proof-of-concept measurement results.
- [14] arXiv:2405.05007 [pdf, ps, html, other]
Title: HC-Mamba: Vision MAMBA with Hybrid Convolutional Techniques for Medical Image Segmentation
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Automatic medical image segmentation technology has the potential to expedite pathological diagnoses, thereby enhancing the efficiency of patient care. However, medical images often have complex textures and structures, and the models often face the problem of reduced image resolution and information loss due to downsampling. To address this issue, we propose HC-Mamba, a new medical image segmentation model based on the modern state space model Mamba. Specifically, we introduce the technique of dilated convolution in the HC-Mamba model to capture a more extensive range of contextual information without increasing the computational cost by extending the perceptual field of the convolution kernel. In addition, the HC-Mamba model employs depthwise separable convolutions, significantly reducing the number of parameters and the computational cost of the model. By combining dilated convolution and depthwise separable convolutions, HC-Mamba is able to process large-scale medical image data at a much lower computational cost while maintaining a high level of performance. We conduct extensive experiments on skin lesion segmentation using the ISIC17 and ISIC18 datasets to demonstrate the potential of the HC-Mamba model in medical image segmentation. The experimental results show that HC-Mamba exhibits competitive performance on all these datasets, thereby proving its effectiveness and usefulness in medical image segmentation.
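The two convolutional techniques named in this abstract have well-known back-of-the-envelope costs, sketched below. The formulas are the standard ones for 2D convolutions with square kernels and no bias terms; the function names and the layer sizes in the example are hypothetical and are not taken from HC-Mamba.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k convolution per input channel plus a 1 x 1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

def effective_kernel_size(k, dilation):
    """Span covered by a k-tap kernel with the given dilation; the weight count stays k."""
    return k + (k - 1) * (dilation - 1)

# Hypothetical layer: 64 input channels, 128 output channels, 3 x 3 kernel.
print(standard_conv_params(64, 128, 3))        # 73728
print(depthwise_separable_params(64, 128, 3))  # 8768
print(effective_kernel_size(3, 2))             # 5
```

For this hypothetical layer the separable variant uses roughly one eighth of the weights, while dilation widens the receptive field of a 3-tap kernel from 3 to 5 pixels without adding any weights.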
- [15] arXiv:2405.05028 [pdf, ps, other]
Title: Stability And Uncertainty Propagation In Power Networks: A Lyapunov-based Approach With Applications To Renewable Resources Allocation
Subjects: Systems and Control (eess.SY)
The rapid increase in the integration of intermittent and stochastic renewable energy resources (RER) introduces challenging issues related to power system stability. Interestingly, identifying grid nodes that can best support stochastic loads from RER has gained recent interest. Methods based on Lyapunov stability are commonly exploited to assess the stability of power networks. These strategies approach quantifying system stability while considering: (i) simplified reduced order power system models that do not model power flow constraints, or (ii) data-driven methods that are prone to measurement noise and hence can inaccurately depict stochastic loads as system instability. In this paper, while considering a nonlinear differential algebraic equation (NL-DAE) model, we introduce a new method for assessing the impact of uncertain renewable power injections on the stability of power system nodes/buses. The identification of stable nodes informs the operator/utility on how renewables injections affect the stability of the grid. The proposed method is based on optimizing metrics equivalent to the Lyapunov spectrum of exponents; its underlying properties result in a computationally efficient and scalable stable node identification algorithm for renewable energy resources allocation. The proposed method is validated on the IEEE 9-bus and 200-bus networks.
- [16] arXiv:2405.05030 [pdf, ps, other]
Title: Functional Specifications and Testing Requirements of Grid-Forming Type-IV Offshore Wind Power
Authors: Sulav Ghimire, Gabriel M.G. Guerreiro, Kanakesh V.K., Emerson D. Guest, Kim H. Jensen, Guangya Yang, Xiongfei Wang
Subjects: Systems and Control (eess.SY)
Throughout the past few years, various transmission system operators (TSOs) and research institutes have defined several functional specifications for grid-forming (GFM) converters via grid codes, white papers, and technical documents. These institutes and organisations also proposed testing requirements for general inverter-based resources (IBRs) and specific GFM converters. This paper initially reviews functional specifications and testing requirements from several sources to create an understanding of GFM capabilities in general. Furthermore, it proposes an outlook of the defined GFM capabilities, functional specifications, and testing requirements for offshore wind power plant (OF WPP) applications from an original equipment manufacturer (OEM) perspective. Finally, this paper briefly establishes the relevance of new testing methodologies for equipment-level certification and model validation, focusing on GFM functional specifications.
- [17] arXiv:2405.05036 [pdf, ps, other]
Title: Dissipativity Conditions for Maximum Dynamic Loadability
Subjects: Systems and Control (eess.SY)
In this paper we consider the possibility of stabilizing very fast electromagnetic interactions between Inverter Based Resources (IBRs), known as the Control Induced System Stability problems. We propose that when these oscillatory interactions are controlled, the ability of the grid to deliver power to loads at high rates will be greatly increased. We refer to this grid property as the dynamic grid loadability. The approach is to start by modeling the dynamical behavior of all components. Next, to avoid excessive complexity, interactions between components are captured in terms of unified technology-agnostic aggregate variables, instantaneous power and rate of change of instantaneous reactive power. Sufficient dissipativity conditions in terms of rate of change of energy conversion in components themselves and bounds on their rate of change of interactions are derived in support of achieving the maximum system loadability. These physically intuitive conditions are then used to derive methods to increase loadability using high switching frequency reactive power sources. Numerical simulations confirm the theoretical calculations and show that dynamic load-side reactive power support increases stable dynamic loadability regions.
- [18] arXiv:2405.05157 [pdf, ps, other]
Title: Filtering and smoothing estimation algorithms from uncertain nonlinear observations with time-correlated additive noise and random deception attacks
Journal-ref: International Journal of Systems Science, March 19 2024
Subjects: Signal Processing (eess.SP)
This paper discusses the problem of estimating a stochastic signal from nonlinear uncertain observations with time-correlated additive noise described by a first-order Markov process. Random deception attacks are assumed to be launched by an adversary, and both this phenomenon and the uncertainty in the observations are modelled by two sets of Bernoulli random variables. Under the assumption that the evolution model generating the signal to be estimated is unknown and only the mean and covariance functions of the processes involved in the observation equation are available, recursive algorithms based on linear approximations of the real observations are proposed for the least-squares filtering and fixed-point smoothing problems. Finally, the feasibility and effectiveness of the developed estimation algorithms are verified by a numerical simulation example, where the impact of uncertain observation and deception attack probabilities on estimation accuracy is evaluated.
- [19] arXiv:2405.05234 [pdf, ps, other]
Title: Performance Bounds for Velocity Estimation with Large Antenna Arrays
Comments: 5 pages, 5 figures
Subjects: Signal Processing (eess.SP)
Joint communication and sensing (JCS) is envisioned as an enabler of future 6G networks. One of the key features of these networks will be the use of extremely large aperture arrays (ELAAs) and high operating frequencies, which will result in significant near-field propagation effects. This unique property can be harnessed to improve sensing capabilities. In this paper, we focus on velocity sensing, as using ELAAs allows the estimation of not just the radial component but also the transverse component. We derive analytical performance bounds for both velocity components, demonstrating how they are affected by the different system parameters and geometries. These insights offer a foundational understanding of how near-field effects influence velocity sensing differently from the far field and from position estimation.
- [20] arXiv:2405.05235 [pdf, ps, other]
Title: RACH Traffic Prediction in Massive Machine Type Communications
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Traffic pattern prediction has emerged as a promising approach for efficiently managing and mitigating the impacts of event-driven bursty traffic in massive machine-type communication (mMTC) networks. However, achieving accurate predictions of bursty traffic remains a non-trivial task due to the inherent randomness of events, and these challenges intensify within live network environments. Consequently, there is a compelling imperative to design a lightweight and agile framework capable of assimilating continuously collected data from the network and accurately forecasting bursty traffic in mMTC networks. This paper addresses these challenges by presenting a machine learning-based framework tailored for forecasting bursty traffic in multi-channel slotted ALOHA networks. The proposed machine learning network comprises long short-term memory (LSTM) and a DenseNet with feed-forward neural network (FFNN) layers, where the residual connections enhance the training ability of the machine learning network in capturing complicated patterns. Furthermore, we develop a new low-complexity online prediction algorithm that updates the states of the LSTM network by leveraging frequently collected data from the mMTC network. Simulation results and complexity analysis demonstrate the superiority of our proposed algorithm in terms of both accuracy and complexity, making it well-suited for time-critical live scenarios. We evaluate the performance of the proposed framework in a network with a single base station and thousands of devices organized into groups with distinct traffic-generating characteristics. Comprehensive evaluations and simulations indicate that our proposed machine learning approach achieves a remarkable $52\%$ higher accuracy in long-term predictions compared to traditional methods, without imposing additional processing load on the system.
- [21] arXiv:2405.05236 [pdf, ps, other]
Title: Stability and Performance Analysis of Discrete-Time ReLU Recurrent Neural Networks
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
This paper presents sufficient conditions for the stability and $\ell_2$-gain performance of recurrent neural networks (RNNs) with ReLU activation functions. These conditions are derived by combining Lyapunov/dissipativity theory with Quadratic Constraints (QCs) satisfied by repeated ReLUs. We write a general class of QCs for repeated ReLUs using known properties for the scalar ReLU. Our stability and performance condition uses these QCs along with a "lifted" representation for the ReLU RNN. We show that the positive homogeneity property satisfied by a scalar ReLU does not expand the class of QCs for the repeated ReLU. We present examples to demonstrate the stability/performance condition and study the effect of the lifting horizon.
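For orientation, the standard properties of the scalar ReLU that quadratic constraints of this kind are usually built from can be written out as below. The abstract does not say which properties the paper uses, so this selection and the symbols $\phi$, $\Phi$, $\Lambda$, $v$, $w$ are assumptions made here for illustration, not the authors' notation.

```latex
% Scalar ReLU: \phi(v) = \max(0, v)
\begin{align}
  \phi(v) &\ge 0, \\
  \phi(v) &\ge v, \\
  \phi(v)\,\bigl(\phi(v) - v\bigr) &= 0, \\
  \phi(\beta v) &= \beta\,\phi(v) \quad \text{for all } \beta \ge 0 .
\end{align}
% Repeated ReLU w = \Phi(v), with w_i = \phi(v_i).
% The complementarity property holds entry by entry, so for any diagonal \Lambda:
\begin{equation}
  w^\top \Lambda\,(w - v) = \sum_i \Lambda_{ii}\, w_i\,(w_i - v_i) = 0 .
\end{equation}
```

The last identity is only the simplest example of a quadratic constraint on the repeated ReLU; the general class described in the abstract is broader.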
- [22] arXiv:2405.05239 [pdf, ps, other]
Title: Cellular Traffic Prediction Using Online Prediction Algorithms
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
The advent of 5G technology promises a paradigm shift in the realm of telecommunications, offering unprecedented speeds and connectivity. However, the efficient management of traffic in 5G networks remains a critical challenge. This is due to the dynamic and heterogeneous nature of network traffic, varying user behaviors, extended network size, and diverse applications, all of which demand highly accurate and adaptable prediction models to optimize network resource allocation and management. This paper investigates the efficacy of live prediction algorithms for forecasting cellular network traffic in real-time scenarios. We apply two live prediction algorithms to machine learning models, one of which is the recently proposed Fast LiveStream Prediction (FLSP) algorithm. We examine the performance of these algorithms under two distinct data gathering methodologies: synchronous, where all network cells report statistics simultaneously, and asynchronous, where reporting occurs across consecutive time slots. Our study delves into the impact of these gathering scenarios on the predictive performance of traffic models. Our study reveals that the FLSP algorithm can halve the required bandwidth for asynchronous data reporting compared to conventional online prediction algorithms, while simultaneously enhancing prediction accuracy and reducing processing load. Additionally, we conduct a thorough analysis of algorithmic complexity and memory requirements across various machine learning models. Through empirical evaluation, we provide insights into the trade-offs inherent in different prediction strategies, offering valuable guidance for network optimization and resource allocation in dynamic environments.
- [23] arXiv:2405.05244 [pdf, ps, other]
Title: SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan
Authors: You Zhang, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Tomoki Toda, Zhiyao Duan
Comments: Evaluation plan of the SVDD Challenge @ SLT 2024
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD)
The rapid advancement of AI-generated singing voices, which now closely mimic natural human singing and align seamlessly with musical scores, has led to heightened concerns for artists and the music industry. Unlike spoken voice, singing voice presents unique challenges due to its musical nature and the presence of strong background music, making singing voice deepfake detection (SVDD) a specialized field requiring focused attention. To promote SVDD research, we recently proposed the "SVDD Challenge," the very first research challenge focusing on SVDD for lab-controlled and in-the-wild bonafide and deepfake singing voice recordings. The challenge will be held in conjunction with the 2024 IEEE Spoken Language Technology Workshop (SLT 2024).
New submissions for Thursday, 9 May 2024 (showing 23 of 23 entries)
- [24] arXiv:2405.04535 (cross-list from cs.CV) [pdf, ps, other]
Title: Image Classification for CSSVD Detection in Cacao Plants
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
The detection of diseases within plants has attracted a lot of attention from computer vision enthusiasts. Despite the progress made in detecting diseases in many plants, there remains a research gap in training image classifiers to detect the cacao swollen shoot virus disease (CSSVD) in cacao plants. This gap has mainly been due to the unavailability of high-quality labeled training data. Moreover, institutions have been hesitant to share their data related to CSSVD. To fill these gaps, we propose the development of image classifiers to detect CSSVD-infected cacao plants. Our proposed solution is based on VGG16, ResNet50 and Vision Transformer (ViT). We evaluate the classifiers on the recently released and publicly accessible KaraAgroAI Cocoa dataset. Our best image classifier, based on ResNet50, achieves 95.39\% precision, 93.75\% recall, 94.34\% F1-score and 94\% accuracy in only 20 epochs. There is a +9.75\% improvement in recall when compared to previous works. Our results indicate that the image classifiers learn to identify cacao plants infected with CSSVD.
- [25] arXiv:2405.04539 (cross-list from stat.ML) [pdf, ps, other]
Title: Some variation of COBRA in sequential learning setup
Subjects: Machine Learning (stat.ML); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Signal Processing (eess.SP); Computational Finance (q-fin.CP)
This research paper introduces innovative approaches for multivariate time series forecasting based on different variations of the combined regression strategy. We use specific data preprocessing techniques, which make a radical change in the behaviour of prediction. We compare the performance of the model under two types of hyper-parameter tuning: Bayesian optimisation (BO) and the usual grid search. Our proposed methodologies outperform all state-of-the-art comparative models. We illustrate the methodologies through eight time series datasets from three categories: cryptocurrency, stock index, and short-term load forecasting.
- [26] arXiv:2405.04722 (cross-list from cs.CV) [pdf, ps, html, other]
Title: Detecting and Refining HiRISE Image Patches Obscured by Atmospheric Dust
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
HiRISE (High-Resolution Imaging Science Experiment) is a camera onboard the Mars Reconnaissance Orbiter responsible for photographing vast areas of the Martian surface in unprecedented detail. It can capture millions of incredible closeup images in minutes. However, Mars suffers from frequent regional and local dust storms that hamper this data-collection process and pipeline, resulting in loss of effort and crucial flight time. Removing these images manually requires a large amount of manpower. I filter out these images obstructed by atmospheric dust automatically by using a Dust Image Classifier fine-tuned on ResNet-50 with an accuracy of 94.05%. To further facilitate the seamless filtering of images, I design a prediction pipeline that classifies and stores these dusty patches. I also denoise partially obstructed images using an Auto Encoder-based denoiser and a Pix2Pix GAN, with SSIM indices of 0.75 and 0.99 respectively.
- [27] arXiv:2405.04821 (cross-list from cs.RO) [pdf, ps, html, other]
Title: ATDM: An Anthropomorphic Aerial Tendon-driven Manipulator with Low-Inertia and High-Stiffness
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Aerial Manipulator Systems (AMS) have garnered significant interest for their utility in aerial operations. Nonetheless, challenges related to the manipulator's limited stiffness and the coupling disturbance with manipulator movement persist. This paper introduces the Aerial Tendon-Driven Manipulator (ATDM), an innovative AMS that integrates a hexrotor Unmanned Aerial Vehicle (UAV) with a 4-degree-of-freedom (4-DOF) anthropomorphic tendon-driven manipulator. The design of the manipulator is anatomically inspired, emulating the human arm anatomy from the shoulder joint downward. To enhance the structural integrity and performance, finite element topology optimization and lattice optimization are employed on the links to replicate the radially graded structure characteristic of bone; this approach reduces weight and inertia while maximizing stiffness. A novel tensioning mechanism with adjustable tension is introduced to address cable relaxation, and a tension-amplification tendon mechanism is implemented to increase the manipulator's overall stiffness and output. The paper presents a kinematic model based on virtual coupled joints, a comprehensive workspace analysis, and detailed calculations of output torques and stiffness for individual arm joints.
The prototype arm has a total weight of 2.7 kg, with the end effector contributing only 0.818 kg. By positioning all actuators at the base, coupling disturbances are minimized. The paper includes a detailed mechanical design and validates the system's performance through semi-physical multi-body dynamics simulations, confirming the efficacy of the proposed design.
- [28] arXiv:2405.04837 (cross-list from cs.CR) [pdf, ps, other]
Title: Enhancing Data Integrity and Traceability in Industry Cyber Physical Systems (ICPS) through Blockchain Technology: A Comprehensive Approach
Subjects: Cryptography and Security (cs.CR); Systems and Control (eess.SY)
Blockchain technology, heralded as a transformative innovation, has far-reaching implications beyond its initial application in cryptocurrencies. This study explores the potential of blockchain in enhancing data integrity and traceability within Industry Cyber-Physical Systems (ICPS), a crucial aspect in the era of Industry 4.0. ICPS, integrating computational and physical components, is pivotal in managing critical infrastructure like manufacturing, power grids, and transportation networks. However, they face challenges in security, privacy, and reliability. With its inherent immutability, transparency, and distributed consensus, blockchain presents a groundbreaking approach to address these challenges. It ensures robust data reliability and traceability across ICPS, enhancing transaction transparency and facilitating secure data sharing. This research unearths various blockchain applications in ICPS, including supply chain management, quality control, contract management, and data sharing. Each application demonstrates blockchain's capacity to streamline processes, reduce fraud, and enhance system efficiency. In supply chain management, blockchain provides real-time auditing and compliance. For quality control, it establishes tamper-proof records, boosting consumer confidence. In contract management, smart contracts automate execution, enhancing efficiency. Blockchain also fosters secure collaboration in ICPS, which is crucial for system stability and safety. This study emphasizes the need for further research on blockchain's practical implementation in ICPS, focusing on challenges like scalability, system integration, and security vulnerabilities. It also suggests examining blockchain's economic and organizational impacts in ICPS to understand its feasibility and long-term advantages.
- [29] arXiv:2405.04865 (cross-list from cs.LG) [pdf, ps, html, other]
Title: Regime Learning for Differentiable Particle Filters
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Differentiable particle filters are an emerging class of models that combine sequential Monte Carlo techniques with the flexibility of neural networks to perform state space inference. This paper concerns the case where the system may switch between a finite set of state-space models, i.e. regimes. No prior approaches effectively learn both the individual regimes and the switching process simultaneously. In this paper, we propose the neural network based regime learning differentiable particle filter (RLPF) to address this problem. We further design a training procedure for the RLPF and other related algorithms. We demonstrate competitive performance compared to the previous state-of-the-art algorithms on a pair of numerical experiments.
- [30] arXiv:2405.04880 (cross-list from cs.SD) [pdf, ps, other]
Title: The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
Authors: Yuankun Xie, Yi Lu, Ruibo Fu, Zhengqi Wen, Zhiyong Wang, Jianhua Tao, Xin Qi, Xiaopeng Wang, Yukun Liu, Haonan Cheng, Long Ye, Yi Sun
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
With the proliferation of Audio Language Model (ALM) based deepfake audio, there is an urgent need for effective detection methods. Unlike traditional deepfake audio generation, which often involves multi-step processes culminating in vocoder usage, ALM directly utilizes neural codec methods to decode discrete codes into audio. Moreover, driven by large-scale data, ALMs exhibit remarkable robustness and versatility, posing a significant challenge to current audio deepfake detection (ADD) models. To effectively detect ALM-based deepfake audio, we focus on the mechanism of the ALM-based audio generation method, the conversion from neural codec to waveform. We first construct the Codecfake dataset, an open-source large-scale dataset that includes two languages, millions of audio samples, and various test conditions, tailored for ALM-based audio detection. Additionally, to achieve universal detection of deepfake audio and tackle the domain ascent bias issue of the original SAM, we propose the CSAM strategy to learn a domain-balanced and generalized minima. Experimental results demonstrate that co-training on the Codecfake dataset and a vocoded dataset with the CSAM strategy yields the lowest average Equal Error Rate (EER) of 0.616% across all test conditions compared to baseline models.
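As background on the metric quoted above, the Equal Error Rate is the operating point at which the false acceptance rate and the false rejection rate coincide. The sketch below is a minimal illustration with made-up scores, not the evaluation code of the Codecfake authors; the function name `equal_error_rate` and the convention that higher scores mean "more likely genuine" are assumptions.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by scanning every observed score as a threshold.

    scores: detector outputs, higher = more likely genuine (assumed convention).
    labels: truthy for genuine audio, falsy for deepfake audio.
    """
    genuine = [s for s, y in zip(scores, labels) if y]
    spoof = [s for s, y in zip(scores, labels) if not y]
    if not genuine or not spoof:
        raise ValueError("need at least one sample of each class")

    best_gap, eer = None, None
    for t in sorted(set(scores)):
        # False acceptance: a deepfake scored at or above the threshold.
        far = sum(s >= t for s in spoof) / len(spoof)
        # False rejection: a genuine sample scored below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical scores, for illustration only.
separable = equal_error_rate([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
overlapping = equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

With a finite set of scores the two error curves rarely cross exactly, which is why the sketch takes the midpoint at the closest threshold; published toolkits often interpolate instead.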
- [31] arXiv:2405.04976 (cross-list from cs.IT) [pdf, ps, other]
Title: RF-based Energy Harvesting: Nonlinear Models, Applications and Challenges
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
So far, various aspects associated with wireless energy harvesting (EH) have been investigated from diverse perspectives, including energy sources and models, usage protocols, energy scheduling and optimization, and EH implementation in different wireless communication systems. However, a comprehensive survey specifically focusing on models of radio frequency (RF)-based EH behaviors has not yet been presented. To address this gap, this article provides an overview of the mainstream mathematical models that capture the nonlinear behavior of practical EH circuits, serving as a valuable handbook of mathematical models for EH application research. Moreover, we summarize the application of each nonlinear EH model, including the associated challenges and precautions. We also analyze the impact and advancements of each EH model on RF-based EH systems in wireless communication, utilizing artificial intelligence (AI) techniques. Additionally, we highlight emerging research directions in the context of nonlinear RF-based EH. This article aims to contribute to the future application of RF-based EH in novel communication research domains to a significant extent.
- [32] arXiv:2405.04997 (cross-list from cs.CV) [pdf, ps, other]
Title: Bridging the Gap Between Saliency Prediction and Image Quality Assessment
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Over the past few years, deep neural models have made considerable advances in image quality assessment (IQA). However, the underlying reasons for their success remain unclear, owing to the complex nature of deep neural networks. IQA aims to describe how the human visual system (HVS) works and to create efficient approximations of it. On the other hand, the Saliency Prediction task aims to emulate the HVS by determining areas of visual interest. Thus, we believe that saliency plays a crucial role in human perception. In this work, we conduct an empirical study that reveals the relation between IQA and Saliency Prediction tasks, demonstrating that the former incorporates knowledge of the latter. Moreover, we introduce a novel SACID dataset of saliency-aware compressed images and conduct a large-scale comparison of classic and neural-based IQA methods. All supplementary code and data will be available at the time of publication.
- [33] arXiv:2405.05016 (cross-list from cs.CV) [pdf, ps, html, other]
Title: TGTM: TinyML-based Global Tone Mapping for HDR Sensors
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Advanced driver assistance systems (ADAS) relying on multiple cameras are increasingly prevalent in vehicle technology. Yet, conventional imaging sensors struggle to capture clear images in conditions with intense illumination contrast, such as tunnel exits, due to their limited dynamic range. Introducing high dynamic range (HDR) sensors addresses this issue. However, the process of converting HDR content to a displayable range via tone mapping often leads to inefficient computations when performed directly on pixel data. In this paper, we focus on HDR image tone mapping using a lightweight neural network applied to image histogram data. Our proposed TinyML-based global tone mapping method, termed TGTM, operates at 9,000 FLOPs per RGB image of any resolution. Additionally, TGTM offers a generic approach that can be incorporated into any classical tone mapping method. Experimental results demonstrate that TGTM outperforms state-of-the-art methods on real HDR camera images by up to 5.85 dB higher PSNR with orders of magnitude fewer computations.
- [34] arXiv:2405.05107 (cross-list from cs.ET) [pdf, ps, other]
Title: Leveraging AES Padding: dBs for Nothing and FEC for Free in IoT Systems
Authors: Jongchan Woo, Vipindev Adat Vasudevan, Benjamin D. Kim, Rafael G. L. D'Oliveira, Alejandro Cohen, Thomas Stahlbuhk, Ken R. Duffy, Muriel Médard
Subjects: Emerging Technologies (cs.ET); Hardware Architecture (cs.AR); Systems and Control (eess.SY)
The Internet of Things (IoT) represents a significant advancement in digital technology, with its rapidly growing network of interconnected devices. This expansion, however, brings forth critical challenges in data security and reliability, especially under the threat of increasing cyber vulnerabilities. To address the security concerns, the Advanced Encryption Standard (AES) is commonly employed for secure encryption in IoT systems. Our study explores an innovative use of AES, repurposing AES padding bits for error correction and thus introducing a dual-functional method that integrates error-correcting capabilities into the standard encryption process. The integration of the state-of-the-art Guessing Random Additive Noise Decoder (GRAND) in the receiver's architecture facilitates the joint decoding and decryption process. This approach not only preserves the existing structure of the transmitter but also significantly enhances communication reliability in noisy environments, achieving a gain of over 3 dB in Block Error Rate (BLER). This enhanced performance comes with a minimal power overhead at the receiver, less than 15% compared to the traditional decryption-only process, underscoring the efficiency of our hardware design for IoT applications. This paper presents a comprehensive analysis of our approach, particularly in energy efficiency and system performance, offering a novel and practical solution for reliable IoT communications.
- [35] arXiv:2405.05126 (cross-list from cs.SD) [pdf, ps, other]
Title: Exploring Speech Pattern Disorders in Autism using Machine Learning
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Diagnosing autism spectrum disorder (ASD) by identifying abnormal speech patterns from examiner-patient dialogues presents significant challenges due to the subtle and diverse manifestations of speech-related symptoms in affected individuals. This study presents a comprehensive approach to identify distinctive speech patterns through the analysis of examiner-patient dialogues. Utilizing a dataset of recorded dialogues, we extracted 40 speech-related features, categorized into frequency, zero-crossing rate, energy, spectral characteristics, Mel Frequency Cepstral Coefficients (MFCCs), and balance. These features encompass various aspects of speech such as intonation, volume, rhythm, and speech rate, reflecting the complex nature of communicative behaviors in ASD. We employed machine learning for both classification and regression tasks to analyze these speech features. The classification model aimed to differentiate between ASD and non-ASD cases, achieving an accuracy of 87.75%. Regression models were developed to predict speech pattern related variables and a composite score from all variables, facilitating a deeper understanding of the speech dynamics associated with ASD. The effectiveness of machine learning in interpreting intricate speech patterns and the high classification accuracy underscore the potential of computational methods in supporting the diagnostic processes for ASD. This approach not only aids in early detection but also contributes to personalized treatment planning by providing insights into the speech and communication profiles of individuals with ASD.
- [36] arXiv:2405.05133 (cross-list from cs.CV) [pdf, ps, other]
Title: Identifying every building's function in large-scale urban areas with multi-modality remote-sensing data
Comments: 5 pages, 7 figures, accepted by IGARSS 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Buildings, as fundamental man-made structures in urban environments, serve as crucial indicators for understanding various city function zones. Rapid urbanization has raised an urgent need for efficiently surveying building footprints and functions. In this study, we proposed a semi-supervised framework to identify every building's function in large-scale urban areas with multi-modality remote-sensing data. In detail, optical images, building height, and nighttime-light data are collected to describe the morphological attributes of buildings. Then, the area of interest (AOI) and building masks from the volunteered geographic information (VGI) data are collected to form sparsely labeled samples. Furthermore, the multi-modality data and weak labels are utilized to train a segmentation model with a semi-supervised strategy. Finally, results are evaluated by 20,000 validation points and statistical survey reports from the government. The evaluations reveal that the produced function maps achieve an OA of 82% and Kappa of 71% among 1,616,796 buildings in Shanghai, China. This study has the potential to support large-scale urban management and sustainable urban development. All collected data and produced maps are open access at this https URL.
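The two figures of merit reported above, overall accuracy (OA) and Cohen's kappa, are both derived from a confusion matrix. The sketch below shows the standard definitions on a small hypothetical matrix; the function name and the numbers are illustrative assumptions and are unrelated to the Shanghai results.

```python
def overall_accuracy_and_kappa(confusion):
    """Return (OA, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n)) / total

    # Chance agreement from the row (true) and column (predicted) marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class matrix, for illustration only.
oa, kappa = overall_accuracy_and_kappa([[40, 10], [10, 40]])
```

Kappa discounts the agreement that would be expected by chance, which is why it is lower than OA whenever the classifier is imperfect.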
- [37] arXiv:2405.05162 (cross-list from cs.RO) [pdf, ps, other]
Title: A Dual-Motor Actuator for Ceiling Robots with High Force and High Speed Capabilities
Comments: 9 pages, 11 figures
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Patient transfer devices allow patients to be moved passively in hospitals and care centers. Instead of hoisting the patient, it would be beneficial in some cases to assist their movement, enabling them to move by themselves. However, patient assistance requires devices capable of precisely controlling output forces at significantly higher speeds than those used for patient transfers alone, and a single-motor solution would be oversized and inefficient at performing both functions. This paper presents a dual-motor actuator and control schemes adapted for patient mobility equipment that can be used to transfer patients, assist patients in their movement, and help prevent falls. The prototype is shown to be able to lift patients weighing up to 318 kg and to assist a patient with a desired force of up to 100 kg with a precision of 7.8%. Also, a smart control scheme to manage falls is shown to be able to stop a patient who is falling by applying a desired deceleration.
- [38] arXiv:2405.05170 (cross-list from cs.MM) [pdf, ps, other]
Title: Picking watermarks from noise (PWFN): an improved robust watermarking model against intensive distortions
Comments: Accepted by ICME2024
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Digital watermarking is the process of embedding secret information by altering images in a way that is undetectable to the human eye. To increase the robustness of the model, many deep learning-based watermarking methods use the encoder-decoder architecture by adding different noises to the noise layer. The decoder then extracts the watermarked information from the distorted image. However, this method can only resist weak noise attacks. To improve the robustness of the algorithm against stronger noise, this paper proposes to introduce a denoise module between the noise layer and the decoder. The module is aimed at reducing noise and recovering some of the information lost during an attack. Additionally, the paper introduces the SE module to fuse the watermarking information pixel-wise and channel dimensions-wise, improving the encoder's efficiency. Experimental results show that our proposed method is comparable to existing models and outperforms state-of-the-art under different noise intensities. In addition, ablation experiments show the superiority of our proposed module.
- [39] arXiv:2405.05240 (cross-list from cs.SD) [pdf, ps, other]
Title: An LSTM-Based Chord Generation System Using Chroma Histogram Representations
Comments: 6 pages, 4 figures, 1 table
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
This paper proposes a system for chord generation to monophonic symbolic melodies using an LSTM-based model trained on chroma histogram representations of chords. Chroma representations promise more harmonically rich generation than chord label-based approaches, whilst maintaining a small number of dimensions in the dataset. This system is shown to be suitable for limited real-time use. While it does not meet the state-of-the-art for coherent long-term generation, it does show diatonic generation with cadential chord relationships. The need for further study into chroma histograms as an extracted feature in chord generation tasks is highlighted.
- [40] arXiv:2405.05252 (cross-list from cs.CV) [pdf, ps, other]
Title: Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable. To this end, we introduce the Attention-driven Training-free Efficient Diffusion Model (AT-EDM) framework that leverages attention maps to perform run-time pruning of redundant tokens, without the need for any retraining. Specifically, for single-denoising-step pruning, we develop a novel ranking algorithm, Generalized Weighted Page Rank (G-WPR), to identify redundant tokens, and a similarity-based recovery method to restore tokens for the convolution operation. In addition, we propose a Denoising-Steps-Aware Pruning (DSAP) approach to adjust the pruning budget across different denoising timesteps for better generation quality. Extensive evaluations show that AT-EDM performs favorably against prior art in terms of efficiency (e.g., 38.8% FLOPs saving and up to 1.53x speed-up over Stable Diffusion XL) while maintaining nearly the same FID and CLIP scores as the full model. Project webpage: this https URL.
Cross submissions for Thursday, 9 May 2024 (showing 17 of 17 entries)
- [41] arXiv:2211.04019 (replaced) [pdf, ps, other]
Title: Dynamic Sensor Placement Based on Graph Sampling Theory
Subjects: Signal Processing (eess.SP)
In this paper, we consider a sensor placement problem where sensors can move within a network over time. The sensor placement problem aims to select K sensor positions from N candidates, where K < N. Most existing methods assume that sensor positions are static, i.e., they do not move; however, many mobile sensors like drones, robots, and vehicles can change their positions over time. Moreover, the underlying measurement conditions could also change, which is difficult to cover with statically placed sensors. We tackle the problem by allowing the sensors to change their positions within their neighborhoods on the network. We sequentially learn the dictionary from a pool of observed signals on the network based on graph sampling theory. Using the learned dictionary, we dynamically determine the sensor positions such that the non-observed signals on the network can be best recovered from the observations. In experiments, we validate the effectiveness of the proposed method via the mean squared error (MSE) of the reconstructed signals. The proposed dynamic sensor placement outperforms the existing static ones for both synthetic and real data.
- [42] arXiv:2212.07484 (replaced) [pdf, ps, html, other]
Title: Joint Delay-Phase Precoding Under True-Time Delay Constraints in Wideband Sub-THz Hybrid Massive MIMO Systems
Subjects: Signal Processing (eess.SP)
In wideband sub-Terahertz (sub-THz) massive multiple-input multiple-output (MIMO) communication, the beam squint effect leads to a significant loss of array gain. A hybrid precoding approach based on true-time delay (TTD) and phase shifter (PS) has been proposed to mitigate this effect. Existing methods make the assumption that the TTD precoder can provide arbitrary time delay values and design it by fixing the PS precoder. This work presents a novel joint optimization framework for the TTD and PS precoder design, incorporating realistic time delay constraints for each TTD device. Unlike previous methods, our framework does not rely on the unbounded time delay assumption and optimizes both the TTD and PS values simultaneously to cope with the practical limitations. Moreover, we derive the minimum number of TTD devices required to achieve a given array gain target using our framework. Simulations confirm the proposed approach exhibits performance improvement, guarantees array gain, and achieves computational efficiency.
- [43] arXiv:2303.02408 (replaced) [pdf, ps, other]
Title: Data Augmentation for Generating Synthetic Electrogastrogram Time Series
Journal-ref: Med.Biol.Eng.Comput.(2024):1-13
Subjects: Signal Processing (eess.SP)
To address an emerging need for a large number of diverse datasets for rigorous evaluation of signal processing techniques, we developed and evaluated a new method for generating synthetic electrogastrogram time series. We used electrogastrography (EGG) data from an open database to set model parameters and statistical tests to evaluate synthesized data. Additionally, we illustrated method customization for generating artificial EGG time series alterations caused by simulator sickness. The proposed data augmentation method generates synthetic EGG data with specified duration, sampling frequency, recording state (postprandial or fasting state), overall noise and breathing artifact injection, and pauses in the gastric rhythm (arrhythmia occurrence), with a statistically significant difference between postprandial and fasting states in > 70% of cases while not accounting for individual differences. Features obtained from the synthetic EGG signal resembling simulator sickness occurrence displayed expected trends. The code for generating synthetic EGG time series is freely available; it can be further customized to assess signal processing algorithms and may also be used to increase data diversity for training artificial intelligence (AI) algorithms. The proposed approach is customized for EGG data synthesis but can be easily utilized for other biosignals of a similar nature, such as the electroencephalogram.
- [44] arXiv:2303.03295 (replaced) [pdf, ps, html, other]
-
Title: Probabilistic Game-Theoretic Traffic Routing
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
We examine the routing problem for self-interested vehicles using stochastic decision strategies. By approximating the road latency functions and applying a non-linear variable transformation, we frame the problem as an aggregative game. We characterize the approximation error and derive a new monotonicity condition for a broad category of games that encompasses the problem under consideration. Next, we propose a semi-decentralized algorithm to calculate the routing as a variational generalized Nash equilibrium and demonstrate the solution's benefits with numerical simulations. In the particular case of potential games, which emerges for linear latency functions, we explore a receding-horizon formulation of the routing problem, showing asymptotic convergence to destinations and analysing closed-loop performance dependence on horizon length through numerical simulations.
- [45] arXiv:2304.03483 (replaced) [pdf, ps, html, other]
-
Title: RED-PSM: Regularization by Denoising of Factorized Low Rank Models for Dynamic Imaging
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Dynamic imaging addresses the recovery of a time-varying 2D or 3D object at each time instant using its undersampled measurements. In particular, in the case of dynamic tomography, only a single projection at a single view angle may be available at a time, making the problem severely ill-posed. We propose an approach, RED-PSM, which combines for the first time two powerful techniques to address this challenging imaging problem. The first is non-parametric factorized low-rank models, also known as partially separable models (PSMs), which have been used to efficiently introduce a low-rank prior for the spatio-temporal object. The second is the recent Regularization by Denoising (RED), which provides a flexible framework to exploit the impressive performance of state-of-the-art image denoising algorithms for various inverse problems. We propose a partially separable objective with RED and a computationally efficient and scalable optimization scheme with variable splitting and ADMM. Theoretical analysis proves the convergence of our objective to a value corresponding to a stationary point satisfying the first-order optimality conditions. Convergence is accelerated by a particular projection-domain-based initialization. We demonstrate the performance and computational improvements of our proposed RED-PSM with a learned image denoiser by comparing it to a recent deep-prior-based method known as TD-DIP. Although the main focus is on dynamic tomography, we also show performance advantages of RED-PSM in a cardiac dynamic MRI setting.
- [46] arXiv:2308.08946 (replaced) [pdf, ps, other]
-
Title: FR2 5G Networks for Industrial Scenarios: Experimental Characterization and Beam Management Procedures in Operational Conditions
Alejandro Ramírez-Arroyo, Melisa López, Ignacio Rodríguez, Troels B. Sørensen, Samantha Caporal del Barrio, Pablo Padilla, Juan F. Valenzuela-Valdés, Preben Mogensen
Comments: Published in IEEE Transactions on Vehicular Technology, 2024
Subjects: Signal Processing (eess.SP)
Industrial environments constitute a challenge in terms of radio propagation due to the presence of machinery and the mobility of the different agents, especially at mmWave bands. This paper presents an experimental evaluation of an FR2 5G network deployed in an operational factory scenario at 26 GHz. The experimental characterization, performed with autonomous mobile robots that self-navigate the industrial lab, leads to the analysis of the received power along the factory and the evaluation of reference path gain models. The proposed assessment analyzes in depth the physical layer of the communication network under operational conditions. Thus, two different network configurations are assessed by measuring the power received in the entire factory, providing a comparison between deployments. Additionally, beam management procedures, such as beam recovery, beam sweeping or beam switching, are analyzed since they are crucial in environments where mobile agents are involved. They aim for a zero-interruption approach based on reliable communications. The analysis of the results shows that beam recovery procedures can switch to an alternative serving beam with power losses of less than 1.6 dB on average. Beam sweeping analysis demonstrates the prevalence of the direct component in Line-of-Sight conditions despite the strong scattering component and large-scale fading in the environment.
- [47] arXiv:2310.06259 (replaced) [pdf, ps, html, other]
-
Title: Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
Comments: 14 pages
Subjects: Image and Video Processing (eess.IV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Audio-Visual Segmentation (AVS) aims to extract the sounding object from a video frame, which is represented by a pixel-wise segmentation mask for application scenarios such as multi-modal video editing, augmented reality, and intelligent robot systems. The pioneering work conducts this task through dense feature-level audio-visual interaction, which ignores the dimension gap between different modalities. More specifically, the audio clip could only provide a Global semantic label in each sequence, but the video frame covers multiple semantic objects across different Local regions, which leads to mislocalization of the representationally similar but semantically different object. In this paper, we propose a Cross-modal Cognitive Consensus guided Network (C3N) to align the audio-visual semantics from the global dimension and progressively inject them into the local regions via an attention mechanism. Firstly, a Cross-modal Cognitive Consensus Inference Module (C3IM) is developed to extract a unified-modal label by integrating audio/visual classification confidence and similarities of modality-agnostic label embeddings. Then, we feed the unified-modal label back to the visual backbone as the explicit semantic-level guidance via a Cognitive Consensus guided Attention Module (CCAM), which highlights the local features corresponding to the interested object. Extensive experiments on the Single Sound Source Segmentation (S4) setting and Multiple Sound Source Segmentation (MS3) setting of the AVSBench dataset demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance. Code will be available at this https URL once accepted.
- [48] arXiv:2310.14465 (replaced) [pdf, ps, other]
-
Title: Channel State Information-Free Location-Privacy Enhancement: Delay-Angle Information Spoofing
Subjects: Signal Processing (eess.SP)
In this paper, a delay-angle information spoofing (DAIS) strategy is proposed for location-privacy enhancement. By shifting the location-relevant delays and angles without the aid of channel state information (CSI) at the transmitter, the eavesdropper is obfuscated by a physical location that is distinct from the true one. A precoder is designed to preserve location-privacy while the legitimate localizer can remove the obfuscation with the securely shared information. Then, a lower bound on the localization error is derived via the analysis of the geometric mismatch caused by DAIS, validating the enhanced location-privacy. The statistical hardness for the estimation of the shared information is also investigated to assess the robustness to the potential leakage of the designed precoder structure. Numerical comparisons show that the proposed DAIS scheme results in more than 15 dB performance degradation for the illegitimate localizer at high signal-to-noise ratios, which is comparable to a recently proposed CSI-free location-privacy enhancement strategy and is less sensitive to the precoder structure leakage than the prior approach.
- [49] arXiv:2311.05532 (replaced) [pdf, ps, html, other]
-
Title: Uncertainty-Aware Bayes' Rule and Its Applications
Subjects: Signal Processing (eess.SP); Methodology (stat.ME)
Bayes' rule has enabled innumerable powerful algorithms of statistical signal processing and statistical machine learning. However, when there exist model misspecifications in prior distributions and/or data distributions, the direct application of Bayes' rule is questionable. Philosophically, the key is to balance the relative importance of prior and data distributions when calculating posterior distributions: if prior (resp. data) distributions are overly conservative, we should upweight the prior belief (resp. data evidence); if prior (resp. data) distributions are overly opportunistic, we should downweight the prior belief (resp. data evidence). This paper derives a generalized Bayes' rule, called uncertainty-aware Bayes' rule, to technically realize the above philosophy, i.e., to combat the model uncertainties in prior distributions and/or data distributions. Simulated and real-world experiments on classification and estimation showcase the superiority of the presented uncertainty-aware Bayes' rule over the conventional Bayes' rule. In particular, the uncertainty-aware Bayes classifier, the uncertainty-aware Kalman filter, the uncertainty-aware particle filter, and the uncertainty-aware interactive-multiple-model filter are suggested and validated.
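The abstract does not state the form of the generalized rule, so the sketch below only illustrates the weighting idea with a power-weighted (tempered) posterior over a finite set of hypotheses. The exponents `alpha` and `beta`, the function name, and the example numbers are all assumptions for illustration and should not be read as the paper's uncertainty-aware Bayes' rule.

```python
def weighted_bayes(prior, likelihood, alpha=1.0, beta=1.0):
    """Posterior proportional to prior**alpha * likelihood**beta.

    alpha = beta = 1 recovers the conventional Bayes' rule. Exponents
    below 1 flatten (downweight) a term, exponents above 1 sharpen
    (upweight) it. This tempered form is an assumed stand-in, not the
    rule derived in the paper.
    """
    unnormalized = [(p ** alpha) * (l ** beta)
                    for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("all hypotheses have zero weight")
    return [u / total for u in unnormalized]


# Hypothetical two-hypothesis example.
prior = [0.9, 0.1]
likelihood = [0.2, 0.8]

conventional = weighted_bayes(prior, likelihood)
prior_downweighted = weighted_bayes(prior, likelihood, alpha=0.5)
data_only = weighted_bayes(prior, likelihood, alpha=0.0)
print(conventional, prior_downweighted, data_only)
```

Setting `alpha` to zero ignores the prior entirely, while intermediate values move the posterior smoothly between the data-only and conventional answers.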
- [50] arXiv:2311.12770 (replaced) [pdf, ps, html, other]
-
Title: Swift Parameter-free Attention Network for Efficient Super-Resolution
Comments: NTIRE2024 ESR winner
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Single Image Super-Resolution (SISR) is a crucial task in low-level computer vision, aiming to reconstruct high-resolution images from low-resolution counterparts. Conventional attention mechanisms have significantly improved SISR performance but often result in complex network structures and a large number of parameters, leading to slow inference speed and large model size. To address this issue, we propose the Swift Parameter-free Attention Network (SPAN), a highly efficient SISR model that balances parameter count, inference speed, and image quality. SPAN employs a novel parameter-free attention mechanism, which leverages symmetric activation functions and residual connections to enhance high-contribution information and suppress redundant information. Our theoretical analysis demonstrates the effectiveness of this design in achieving the attention mechanism's purpose. We evaluate SPAN on multiple benchmarks, showing that it outperforms existing efficient super-resolution models in terms of both image quality and inference speed, achieving a significant quality-speed trade-off. This makes SPAN highly suitable for real-world applications, particularly in resource-constrained scenarios. Notably, we won first place in both the overall performance track and the runtime track of the NTIRE 2024 efficient super-resolution challenge. Our code and models are made publicly available at this https URL.
- [51] arXiv:2311.18188 (replaced) [pdf, ps, html, other]
-
Title: Speech Understanding on Tiny Devices with A Learning Cache
Comments: accepted at MobiSys'24
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
This paper addresses spoken language understanding (SLU) on microcontroller-like embedded devices, integrating on-device execution with cloud offloading in a novel fashion. We leverage temporal locality in the speech inputs to a device and reuse recent SLU inferences accordingly. Our idea is simple: let the device match incoming inputs against cached results, and only offload inputs not matched to any cached ones to the cloud for full inference. Realization of this idea, however, is non-trivial: the device needs to compare acoustic features in a robust yet low-cost way. To this end, we present SpeechCache (or SC), a speech cache for tiny devices. It matches speech inputs at two levels of representations: first by sequences of clustered raw sound units, then as sequences of phonemes. Working in tandem, the two representations offer complementary tradeoffs between cost and efficiency. To boost accuracy even further, our cache learns to personalize: with the mismatched and then offloaded inputs, it continuously finetunes the device's feature extractors with the assistance of the cloud. We implement SC on an off-the-shelf STM32 microcontroller. The complete implementation has a small memory footprint of 2MB. Evaluated on challenging speech benchmarks, our system resolves 45%-90% of inputs on device, reducing the average latency by up to 80% compared to offloading to popular cloud speech recognition services. The benefit brought by our proposed SC is notable even in adversarial settings - noisy environments, cold cache, or one device shared by a number of users.
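The match-then-offload idea in the abstract can be sketched as a small cache keyed by sequences of discrete sound units. This is a simplified single-level illustration: the class name, the normalized edit-distance threshold, and the `offload` callable are hypothetical, and the two-level matching and on-device finetuning described in the paper are not modeled.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


class TinySpeechCache:
    """Reuse a cached result when a new input is close to a cached one."""

    def __init__(self, offload, max_norm_dist=0.2):
        self.offload = offload            # hypothetical cloud inference call
        self.max_norm_dist = max_norm_dist
        self.entries = []                 # list of (unit sequence, result)

    def resolve(self, units):
        """Return (result, served_on_device)."""
        for cached_units, result in self.entries:
            longest = max(len(units), len(cached_units), 1)
            if edit_distance(units, cached_units) / longest <= self.max_norm_dist:
                return result, True
        # Cache miss: offload for full inference, then remember the result.
        result = self.offload(units)
        self.entries.append((tuple(units), result))
        return result, False


offload_calls = []

def fake_cloud(units):
    # Stand-in for a cloud SLU service; always returns the same intent.
    offload_calls.append(tuple(units))
    return "lights_on"

cache = TinySpeechCache(fake_cloud)
first = cache.resolve([3, 7, 7, 1, 4, 9])    # miss, goes to the cloud
second = cache.resolve([3, 7, 2, 1, 4, 9])   # one unit differs, cache hit
print(first, second, len(offload_calls))
```

A linear scan is used here for brevity. A device with many cached entries would need a cheaper lookup, which is part of what makes the real problem non-trivial.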
- [52] arXiv:2312.06101 (replaced) [pdf, ps, other]
-
Title: Hundred-Kilobyte Lookup Tables for Efficient Single-Image Super-Resolution
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Conventional super-resolution (SR) schemes make heavy use of convolutional neural networks (CNNs), which involve intensive multiply-accumulate (MAC) operations and require specialized hardware such as graphics processing units. This contradicts the regime of edge AI, which often runs on devices constrained in power, computing, and storage resources. Such a challenge has motivated a series of lookup table (LUT)-based SR schemes that employ simple LUT readout and largely avoid CNN computation. Nonetheless, the multi-megabyte LUTs in existing methods still prohibit on-chip storage and necessitate off-chip memory transport. This work tackles this storage hurdle and introduces hundred-kilobyte LUT (HKLUT) models amenable to on-chip cache. Utilizing an asymmetric two-branch multistage network coupled with a suite of specialized kernel patterns, HKLUT demonstrates uncompromising performance and superior hardware efficiency over existing LUT schemes. Our implementation is publicly available at: this https URL.
- [53] arXiv:2401.02565 (replaced) [pdf, ps, html, other]
-
Title: Demonstration of an Adversarial Attack Against a Multimodal Vision Language Model for Pathology Imaging
Poojitha Thota, Jai Prakash Veerla, Partha Sai Guttikonda, Mohammad S. Nasr, Shirin Nilizadeh, Jacob M. Luber
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Tissues and Organs (q-bio.TO)
In the context of medical artificial intelligence, this study explores the vulnerabilities of the Pathology Language-Image Pretraining (PLIP) model, a Vision Language Foundation model, under targeted attacks. Leveraging the Kather Colon dataset with 7,180 H&E images across nine tissue types, our investigation employs Projected Gradient Descent (PGD) adversarial perturbation attacks to induce misclassifications intentionally. The outcomes reveal a 100% success rate in manipulating PLIP's predictions, underscoring its susceptibility to adversarial perturbations. The qualitative analysis of adversarial examples delves into the interpretability challenges, shedding light on nuanced changes in predictions induced by adversarial manipulations. These findings contribute crucial insights into the interpretability, domain adaptation, and trustworthiness of Vision Language Models in medical imaging. The study emphasizes the pressing need for robust defenses to ensure the reliability of AI models. The source code for this experiment can be found at this https URL.
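For readers unfamiliar with PGD, the following sketch shows the basic L-infinity projected gradient step on a toy linear classifier. The model, weights, input, and step sizes are invented for illustration. It is an untargeted attack on a two-feature input and does not involve PLIP, the Kather Colon data, or the authors' code.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def predict(x, w, b):
    """Label in {-1, +1} from a linear score."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def pgd_linf(x, y, w, b, eps, step, iters):
    """Untargeted L-infinity PGD against a linear classifier.

    Maximizes the margin loss -y * (w.x + b), whose gradient with
    respect to x is -y * w, and projects back onto the eps-ball
    around the clean input after every step.
    """
    x_adv = list(x)
    for _ in range(iters):
        grad = [-y * wi for wi in w]
        # Signed gradient ascent step on the loss.
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Projection: clip each coordinate to [x_i - eps, x_i + eps].
        x_adv = [min(max(xa, xi - eps), xi + eps)
                 for xa, xi in zip(x_adv, x)]
    return x_adv


# Hypothetical toy problem.
w, b = [1.0, -2.0], 0.0
x_clean, y_true = [0.5, 0.1], 1
x_adv = pgd_linf(x_clean, y_true, w, b, eps=0.2, step=0.05, iters=10)
print(predict(x_clean, w, b), predict(x_adv, w, b), x_adv)
```

With a deep model the gradient would come from automatic differentiation rather than a closed form, but the step-and-project loop has the same shape.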
- [54] arXiv:2402.00329 (replaced) [pdf, ps, other]
-
Title: Optimized Parameter Design for Channel State Information-Free Location Spoofing
Comments: arXiv admin note: text overlap with arXiv:2310.14465
Subjects: Signal Processing (eess.SP)
In this paper, an augmented analysis of delay-angle information spoofing (DAIS) is provided for location-privacy preservation, where the location-relevant delays and angles are artificially shifted to obfuscate the eavesdropper with an incorrect physical location. A simplified misspecified Cramér-Rao bound (MCRB) is derived, which clearly shows that not only estimation error but also the geometric mismatch introduced by DAIS can lead to a significant increase in localization error for an eavesdropper. Given an assumption of orthogonality among wireless paths, the simplified MCRB can be further expressed as a function of the delay-angle shifts in closed form, which enables more straightforward optimization of these design parameters for location-privacy enhancement. Numerical results are provided, validating the theoretical analysis and showing that the root-mean-square error of the eavesdropper's localization can be more than 150 m with the optimized delay-angle shifts for DAIS.
- [55] arXiv:2402.11983 (replaced) [pdf, ps, other]
-
Title: Antenna Array Design for Mono-Static ISAC
Comments: 5 pages, 5 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Signal Processing (eess.SP)
Mono-static sensing operations in ISAC perform joint beamforming between transmitter and receiver. However, in contrast to pure radar systems, ISAC must fulfill communications tasks and retain the corresponding design constraints for at least one half-duplex antenna array. This shifts the available degrees of freedom to the design of the second half-duplex array, which completes the mono-static sensing setup of the 6G ISAC system. Consequently, although it is still possible to achieve the gains foreseen by the radar sparse array literature, it is necessary to adapt these considerations to the new ISAC paradigm.
In this work, we propose a model to evaluate the angular capabilities of a mono-static setup, constrained to the shape of the communications array and its topology requirements in wireless networks. Accordingly, we enhance the joint angular capabilities by utilizing a sparse element topology of the sensing array with the same number of elements. Our analysis is validated by simulation experiments, confirming the value of our model in providing system designers with a tool to drastically improve the trade-off between angular capabilities for sensing and the cost of the deployed hardware.
- [56] arXiv:2402.17455 (replaced) [pdf, ps, other]
-
Title: CLAPSep: Leveraging Contrastive Pre-trained Model for Multi-Modal Query-Conditioned Target Sound Extraction
Subjects: Audio and Speech Processing (eess.AS)
Universal sound separation (USS) aims to extract arbitrary types of sounds from real-world recordings. This can be achieved by language-queried target sound extraction (TSE), which typically consists of two components: a query network that converts user queries into conditional embeddings, and a separation network that extracts the target sound accordingly. Existing methods commonly train models from scratch. As a consequence, substantial data and computational resources are required to improve the models' performance and generalizability. In this paper, we propose to integrate pre-trained models into TSE models to address the above issue. To be specific, we tailor and adapt the powerful contrastive language-audio pre-trained model (CLAP) for USS, denoted as CLAPSep. CLAPSep also accepts flexible user inputs, taking both positive and negative user prompts of uni- and/or multi-modalities for target sound extraction. These key features of CLAPSep can not only enhance the extraction performance but also improve the versatility of its application. We provide extensive experiments on 5 diverse datasets to demonstrate the superior performance and zero- and few-shot generalizability of our proposed CLAPSep with fast training convergence, surpassing previous methods by a significant margin. Full codes and some audio examples are released for reproduction and evaluation.
- [57] arXiv:2403.14179 (replaced) [pdf, ps, other]
-
Title: AdaProj: Adaptively Scaled Angular Margin Subspace Projections for Anomalous Sound Detection with Auxiliary Classification Tasks
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
The state-of-the-art approach for semi-supervised anomalous sound detection is to first learn an embedding space by using auxiliary classification tasks based on meta information or self-supervised learning and then estimate the distribution of normal data. In this work, AdaProj, a novel loss function for training the embedding model, is presented. In contrast to commonly used angular margin losses, which project data of each class as close as possible to their corresponding class centers, AdaProj learns to project data onto class-specific subspaces while still ensuring an angular margin between classes. By doing so, the resulting distributions of the embeddings belonging to normal data are not required to be as restrictive as with other loss functions, allowing a more detailed view of the data. In experiments conducted on the DCASE2022 and DCASE2023 anomalous sound detection datasets, it is shown that using AdaProj to learn an embedding space significantly outperforms other commonly used loss functions.
- [58] arXiv:2404.01155 (replaced) [pdf, ps, other]
-
Title: Dynamic Modeling and Stability Analysis for Repeated LVRT Process of Wind Turbine Based on Switched System Theory
Comments: 10 pages, 10 figures
Subjects: Systems and Control (eess.SY)
The significant electrical distance between wind power collection points and the main grid poses challenges for weak grid-connected wind power systems. A new type of voltage oscillation phenomenon induced by repeated low voltage ride-through (LVRT) of the wind turbine has been observed, threatening the safe and stable operation of such power systems. Therefore, exploring dynamic evolution mechanisms and developing stability analysis approaches for this phenomenon have become pressing imperatives. This paper introduces switched system theory for dynamic modeling, mechanism elucidation, and stability analysis of the repeated LVRT process. Firstly, considering the external connection impedance and internal control dynamics, a novel wind turbine grid-side converter (WT-GSC) switched system model is established to quantitatively characterize the evolution dynamic and mechanism of the voltage oscillation. Subsequently, a sufficient stability criterion and index grounded in the common Lyapunov function are proposed for stability analysis and assessment of the WT-GSC switched system. Moreover, to enhance the system stability, the Sobol' global sensitivity analysis method is adopted to identify dominant parameters, which can be further optimized via the particle swarm optimization (PSO) algorithm. Finally, simulations conducted on a modified IEEE 39-bus test system verify the effectiveness of the proposed dynamic modeling and stability analysis methods.
- [59] arXiv:2404.12818 (replaced) [pdf, ps, other]
-
Title: Aggregator of Electric Vehicles Bidding in Nordic FCR-D Markets: A Chance-Constrained Program
Comments: Requires major revisions
Subjects: Systems and Control (eess.SY)
Recently, two new innovative regulations in the Nordic ancillary service markets, the P90 rule and LER classification, were introduced to make the market more attractive for flexible stochastic resources. The regulations respectively relax market requirements related to the security and volume of flexible capacity from such resources. However, this incentivizes aggregators to exploit the rules when bidding flexible capacity. Considering the Nordic ancillary service Frequency Containment Reserve - Disturbance (FCR-D), we consider an aggregator with a portfolio of Electric Vehicles (EVs) using real-life data and present an optimization model that, new to the literature, uses Joint Chance-Constraints (JCCs) for bidding its flexible capacity while adhering to the new market regulations. Using different bundle sizes within the portfolio and the approximation methods of the JCCs, ALSO-X and Conditional Value at Risk (CVaR), we show that a significant synergy effect emerges when aggregating a portfolio of EVs, especially when applying ALSO-X which exploits the rules more than CVaR. We show that EV owners can earn a significant profit when participating in the aggregator portfolio.
- [60] arXiv:2404.15786 (replaced) [pdf, ps, html, other]
-
Title: Rethinking Model Prototyping through the MedMNIST+ Dataset Collection
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
The integration of deep learning based systems in clinical practice is often impeded by challenges rooted in limited and heterogeneous medical datasets. In addition, prioritization of marginal performance improvements on a few, narrowly scoped benchmarks over clinical applicability has slowed down meaningful algorithmic progress. This trend often results in excessive fine-tuning of existing methods to achieve state-of-the-art performance on selected datasets rather than fostering clinically relevant innovations. In response, this work presents a comprehensive benchmark for the MedMNIST+ database to diversify the evaluation landscape and conducts a thorough analysis of common convolutional neural networks (CNNs) and Transformer-based architectures for medical image classification. Our evaluation encompasses various medical datasets, training methodologies, and input resolutions, aiming to reassess the strengths and limitations of widely used model variants. Our findings suggest that computationally efficient training schemes and modern foundation models hold promise in bridging the gap between expensive end-to-end training and more resource-refined approaches. Additionally, contrary to prevailing assumptions, we observe that higher resolutions may not consistently improve performance beyond a certain threshold, advocating for the use of lower resolutions, particularly in prototyping stages, to expedite processing. Notably, our analysis reaffirms the competitiveness of convolutional models compared to ViT-based architectures, emphasizing the importance of comprehending the intrinsic capabilities of different model architectures. Moreover, we hope that our standardized evaluation framework will help enhance transparency, reproducibility, and comparability on the MedMNIST+ dataset collection as well as future research within the field. Code is available at this https URL.
- [61] arXiv:2404.18210 (replaced) [pdf, ps, other]
-
Title: Distributed Dissipativity-Based Controller and Topology Co-Design for DC Microgrids
Subjects: Systems and Control (eess.SY)
This paper presents a new distributed control approach for the voltage regulation problem in DC microgrids (MGs) comprised of interconnected distributed generators (DGs), distribution lines, and loads. First, we describe the closed-loop DC MG with a distributed controller as a networked system in which the DGs and lines are two subsystems interconnected via a static interconnection matrix. This matrix demonstrates the distributed controller gains as well as the communication topology of DC MG. To design the distributed controller gains, we use the dissipative properties of the subsystems and formulate a global linear matrix inequality (LMI) problem. To support the feasibility of global control design, we propose a local controller for each DG and line subsystem by formulating a local LMI problem. In contrast to existing controllers that segregate communication topology and controller gains, our proposed controller simultaneously designs both communication topology and distributed controllers. The proposed controller is also compositional, meaning that when a new subsystem is added to or removed from networked systems, the controller for the new subsystem is designed solely based on the dynamics of the new subsystem and the dissipativity information of its coupled subsystems. This ensures that the overall system remains stable during plug-and-play (PnP) operation.
- [62] arXiv:2404.18501 (replaced) [pdf, ps, other]
-
Title: Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Audio-visual target speaker extraction (AV-TSE) aims to extract a specific person's speech from an audio mixture given auxiliary visual cues. Previous methods usually search for the target voice through speech-lip synchronization. However, this strategy mainly focuses on the existence of target speech, while ignoring the variations of the noise characteristics. This may result in extracting noisy signals from the incorrect sound source in challenging acoustic situations. To this end, we propose a novel reverse selective auditory attention mechanism, which can suppress interfering speakers and non-speech signals to avoid incorrect speaker extraction. By estimating and utilizing the undesired noisy signal through this mechanism, we design an AV-TSE framework named Subtraction-and-ExtrAction network (SEANet) to suppress the noisy signals. We conduct extensive experiments by re-implementing three popular AV-TSE methods as the baselines and using nine metrics for evaluation. The experimental results show that our proposed SEANet achieves state-of-the-art results and performs well on all five datasets. We will release the code, models, and data logs.
- [63] arXiv:2405.04125 (replaced) [pdf, ps, other]
-
Title: Optimizing Prosumer Policies in Periodic Double Auctions Inspired by Equilibrium Analysis (Extended Version)
Comments: A small typo removed. A sentence in the first paragraph of Section 5 was removed, since it referred to the same extended version of the paper
Subjects: Systems and Control (eess.SY); Computer Science and Game Theory (cs.GT)
We consider a periodic double auction (PDA) wherein the main participants are wholesale suppliers and brokers representing retailers. The suppliers are represented by a composite supply curve and the brokers are represented by individual bids. Additionally, the brokers can participate in small-scale selling by placing individual asks; hence, they act as prosumers. Specifically, in a PDA, the prosumers who are net buyers have multiple opportunities to buy or sell multiple units of a commodity with the aim of minimizing the cost of buying across multiple rounds of the PDA. Formulating optimal bidding strategies for such a PDA setting involves planning across current and future rounds while considering the bidding strategies of other agents. In this work, we propose Markov perfect Nash equilibrium (MPNE) policies for a setup where multiple prosumers with knowledge of the composite supply curve compete to procure commodities. Thereafter, the MPNE policies are used to develop an algorithm called MPNE-BBS for the case wherein the prosumers need to re-construct an approximate composite supply curve using past auction information. The efficacy of the proposed algorithm is demonstrated on the PowerTAC wholesale market simulator against several baselines and state-of-the-art bidding policies.
- [64] arXiv:1005.2465 (replaced) [pdf, ps, other]
-
Title: Dichotic harmony for the musical practice
Comments: 14 pages, in Russian, links added
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
The dichotic listening method is adapted to the domain of musical harmony. An algorithm is proposed for separating dissonant voices into several separate groups. To increase the pleasantness of chords, the different groups of voices are played through different headphone channels. Two demonstration programs for PC have been created. Keywords: music, harmony, chord, dichotic listening, dissonance, consonance, headphones, pleasantness, midi.
- [65] arXiv:2304.07106 (replaced) [pdf, ps, html, other]
-
Title: Extremum Seeking Nonlinear Regulator with Concurrent Uncertainties in Exosystems and Control Directions
Comments: 11 pages, 7 figures
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY); Adaptation and Self-Organizing Systems (nlin.AO); Chaotic Dynamics (nlin.CD)
This paper proposes a non-adaptive control solution framework to the practical output regulation problem (PORP) for a class of nonlinear systems with uncertain parameters, unknown control directions and uncertain exosystem dynamics. The concurrence of the unknown control directions and uncertainties in both the system dynamics and the exosystem poses a significant challenge to the problem. In light of a nonlinear internal model approach, we first convert the robust PORP into a robust non-adaptive stabilization problem for the augmented system with integral Input-to-State Stable (iISS) inverse dynamics. By employing an extremum-seeking control (ESC) approach, the construction of our solution method avoids the use of Nussbaum-type gain techniques to address the robust PORP subject to unknown control directions with time-varying coefficients. The stability of the non-adaptive output regulation design is proven via a Lie bracket averaging technique where uniform ultimate boundedness of the closed-loop signals is guaranteed. As a result, both the estimation and tracking errors converge to zero exponentially, provided that the frequency of the dither signal goes to infinity. Finally, a simulation example with unknown coefficients is provided to exemplify the validity of the proposed control solution framework.
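For readers unfamiliar with extremum seeking, the sketch below shows the classic perturbation-based scheme on a static cost map: a sinusoidal dither probes the cost, and multiplying the measured cost by the same dither yields an averaged gradient estimate. This is a textbook stand-in, not the Lie-bracket-based regulator of the paper; the function name `extremum_seek`, the quadratic test cost, and all gains are assumptions chosen for illustration.

```python
import math
from typing import Callable


def extremum_seek(
    cost: Callable[[float], float],
    theta0: float,
    amplitude: float = 0.2,   # dither amplitude (assumed value)
    omega: float = 100.0,     # dither frequency in rad/s (assumed value)
    gain: float = 2.0,        # adaptation gain (assumed value)
    dt: float = 1e-3,
    t_end: float = 30.0,
) -> float:
    """Minimize an unknown static cost using only cost measurements."""
    theta = theta0
    for n in range(int(t_end / dt)):
        dither = math.sin(omega * n * dt)
        y = cost(theta + amplitude * dither)  # probe the cost map
        # Demodulation: on average, dither * y is proportional to the
        # local gradient, so this is an averaged gradient-descent step.
        theta -= dt * gain * dither * y
    return theta


# Hypothetical cost with its minimum at theta = 2.
theta_hat = extremum_seek(lambda x: (x - 2.0) ** 2, theta0=0.0)
print(round(theta_hat, 2))
```

The controller never evaluates the gradient or its sign directly, which is why schemes of this family are attractive when the control direction is unknown. A higher dither frequency makes the averaged approximation more accurate, in line with the abstract's remark about the dither frequency.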
- [66] arXiv:2308.05384 (replaced) [pdf, ps, other]
-
Title: Enhancing Deep Reinforcement Learning: A Tutorial on Generative Diffusion Models in Network Optimization
Authors: Hongyang Du, Ruichen Zhang, Yinqiu Liu, Jiacheng Wang, Yijing Lin, Zonghang Li, Dusit Niyato, Jiawen Kang, Zehui Xiong, Shuguang Cui, Bo Ai, Haibo Zhou, Dong In Kim
Comments: This paper has been accepted by IEEE Communications Surveys & Tutorials (COMST)
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
Generative Diffusion Models (GDMs) have emerged as a transformative force in the realm of Generative Artificial Intelligence (GenAI), demonstrating their versatility and efficacy across various applications. The ability to model complex data distributions and generate high-quality samples has made GDMs particularly effective in tasks such as image generation and reinforcement learning. Furthermore, their iterative nature, which involves a series of noise addition and denoising steps, is a powerful and unique approach to learning and generating data. This paper serves as a comprehensive tutorial on applying GDMs in network optimization tasks. We delve into the strengths of GDMs, emphasizing their wide applicability across various domains, such as vision, text, and audio generation. We detail how GDMs can be effectively harnessed to solve complex optimization problems inherent in networks. The paper first provides a basic background of GDMs and their applications in network optimization. This is followed by a series of case studies, showcasing the integration of GDMs with Deep Reinforcement Learning (DRL), incentive mechanism design, Semantic Communications (SemCom), Internet of Vehicles (IoV) networks, etc. These case studies underscore the practicality and efficacy of GDMs in real-world scenarios, offering insights into network design. We conclude with a discussion on potential future directions for GDM research and applications, providing major insights into how they can continue to shape the future of network optimization.
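The noise-addition step mentioned above can be sketched with the standard closed-form forward process of denoising diffusion models, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps. The linear beta schedule, the function names, and the toy data are assumptions for illustration. In a real GDM a trained network predicts the noise; here the true noise is reused only to show that the forward step is invertible once the noise is known.

```python
import math
import random
from typing import List, Tuple


def linear_beta_schedule(num_steps: int, start: float = 1e-4, end: float = 0.02) -> List[float]:
    """Assumed linear noise schedule beta_0 .. beta_{T-1}."""
    step = (end - start) / (num_steps - 1)
    return [start + k * step for k in range(num_steps)]


def cumulative_alpha(betas: List[float]) -> List[float]:
    """abar_t = product of (1 - beta_s) for s <= t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def forward_noise(x0: List[float], t: int, abar: List[float],
                  rng: random.Random) -> Tuple[List[float], List[float]]:
    """Sample x_t from q(x_t | x_0) in one shot; also return the noise used."""
    s, n = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    return [s * x + n * e for x, e in zip(x0, eps)], eps


def recover_x0(xt: List[float], eps: List[float], t: int, abar: List[float]) -> List[float]:
    """Invert the forward step given the noise (a network would estimate eps)."""
    s, n = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    return [(x - n * e) / s for x, e in zip(xt, eps)]


abar = cumulative_alpha(linear_beta_schedule(1000))
x0 = [0.5, -1.0, 2.0]  # toy "data" vector
xt, eps = forward_noise(x0, 500, abar, random.Random(0))
x0_hat = recover_x0(xt, eps, 500, abar)
```

Sampling runs this idea in reverse over many small steps, which is the iterative denoising the abstract refers to.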
- [67] arXiv:2310.16985 (replaced) [pdf, ps, other]
-
Title: TinyMPC: Model-Predictive Control on Resource-Constrained Microcontrollers
Comments: Accepted at ICRA 2024. Publicly available at this https URL
Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)
Model-predictive control (MPC) is a powerful tool for controlling highly dynamic robotic systems subject to complex constraints. However, MPC is computationally demanding, and is often impractical to implement on small, resource-constrained robotic platforms. We present TinyMPC, a high-speed MPC solver with a low memory footprint targeting the microcontrollers common on small robots. Our approach is based on the alternating direction method of multipliers (ADMM) and leverages the structure of the MPC problem for efficiency. We demonstrate TinyMPC's effectiveness by benchmarking against the state-of-the-art solver OSQP, achieving nearly an order of magnitude speed increase, as well as through hardware experiments on a 27 gram quadrotor, demonstrating high-speed trajectory tracking and dynamic obstacle avoidance. TinyMPC is publicly available at this https URL.
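To give a feel for the ADMM iteration that such solvers build on, the sketch below applies it to a toy box-constrained quadratic program with a diagonal cost, so every step is elementwise. It is not TinyMPC's algorithm and does not exploit any MPC structure; the function name `admm_box_qp`, the penalty `rho`, and the problem data are assumptions for illustration.

```python
from typing import List


def admm_box_qp(p_diag: List[float], q: List[float],
                lower: List[float], upper: List[float],
                rho: float = 1.0, iters: int = 200) -> List[float]:
    """Minimize sum(0.5 * p_i * x_i**2 + q_i * x_i) subject to lower <= x <= upper.

    Splitting: x carries the quadratic cost, z carries the box constraint,
    and u is the scaled dual variable enforcing x = z.
    """
    n = len(q)
    z = [0.0] * n
    u = [0.0] * n
    for _ in range(iters):
        # x-update: unconstrained quadratic minimization in closed form.
        x = [(rho * (z[i] - u[i]) - q[i]) / (p_diag[i] + rho) for i in range(n)]
        # z-update: projection onto the box (a simple clamp).
        z = [min(max(x[i] + u[i], lower[i]), upper[i]) for i in range(n)]
        # Dual update: accumulate the constraint violation x - z.
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z


# Unconstrained minimizer is [2.0, -0.5]; the box clips the first entry.
solution = admm_box_qp([1.0, 1.0], [-2.0, 0.5], [-1.0, -1.0], [1.0, 1.0])
print([round(v, 4) for v in solution])  # [1.0, -0.5]
```

Each iteration uses only multiply-adds and clamps, with no matrix factorization inside the loop, which hints at why ADMM-style methods suit small processors.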
- [68] arXiv:2312.02669 (replaced) [pdf, ps, html, other]
-
Title: Deep-learning-driven end-to-end metalens imaging
Authors: Joonhyuk Seo, Jaegang Jo, Joohoon Kim, Joonho Kang, Chanik Kang, Seongwon Moon, Eunji Lee, Jehyeong Hong, Junsuk Rho, Haejun Chung
Comments: 17 pages, 7 figures, 1 table
Subjects: Optics (physics.optics); Image and Video Processing (eess.IV)
Recent advances in metasurface lenses (metalenses) have shown great potential for opening a new era in compact imaging, photography, light detection and ranging (LiDAR), and virtual reality/augmented reality (VR/AR) applications. However, the fundamental trade-off between broadband focusing efficiency and operating bandwidth limits the performance of broadband metalenses, resulting in chromatic aberration, angular aberration, and a relatively low efficiency. In this study, a deep-learning-based image restoration framework is proposed to overcome these limitations and realize end-to-end metalens imaging, thereby achieving aberration-free full-color imaging for mass-produced metalenses with 10-mm diameter. Neural-network-assisted metalens imaging achieved a high resolution comparable to that of the ground truth image.
- [69] arXiv:2401.17450 (replaced) [pdf, ps, html, other]
-
Title: Qplacer: Frequency-Aware Component Placement for Superconducting Quantum Computers
Authors: Junyao Zhang, Hanrui Wang, Qi Ding, Jiaqi Gu, Reouven Assouly, William D. Oliver, Song Han, Kenneth R. Brown, Hai "Helen" Li, Yiran Chen
Subjects: Quantum Physics (quant-ph); Hardware Architecture (cs.AR); Systems and Control (eess.SY)
Noisy Intermediate-Scale Quantum (NISQ) computers face a critical limitation in qubit numbers, hindering their progression towards large-scale and fault-tolerant quantum computing. A significant challenge impeding scaling is crosstalk, characterized by unwanted interactions among neighboring components on quantum chips, including qubits, resonators, and substrate. We motivate a general approach to systematically resolving multifaceted crosstalks in a limited substrate area. We propose Qplacer, a frequency-aware electrostatic-based placement framework tailored for superconducting quantum computers, to alleviate crosstalk by isolating these components in spatial and frequency domains alongside compact substrate design. Qplacer commences with a frequency assigner that ensures frequency domain isolation for qubits and resonators. It then incorporates a padding strategy and resonator partitioning for layout flexibility. Central to our approach is the conceptualization of quantum components as charged particles, enabling strategic spatial isolation through a 'frequency repulsive force' concept. Our results demonstrate that Qplacer carefully crafts the physical component layout to mitigate various crosstalk impacts while maintaining a compact substrate size. On various device topologies and NISQ benchmarks, Qplacer improves fidelity by an average of 36.7x and reduces spatial violations (susceptible to crosstalk) by an average of 12.76x, compared to classical placement engines. Regarding area optimization, compared to manual designs, Qplacer can reduce the required layout area by 2.14x on average.
- [70] arXiv:2403.00473 (replaced) [pdf, ps, html, other]
-
Title: Computer-Controlled 3D Freeform Surface Weaving
Subjects: Graphics (cs.GR); Robotics (cs.RO); Systems and Control (eess.SY)
In this paper, we present a new computer-controlled weaving technology that enables the fabrication of woven structures in the shape of given 3D surfaces by using threads in non-traditional materials with high bending-stiffness, allowing for multiple applications with the resultant woven fabrics. A new weaving machine and a new manufacturing process are developed to realize the function of 3D surface weaving by the principle of short-row shaping. A computational solution is investigated to convert input 3D freeform surfaces into the corresponding weaving operations (indicated as W-code) to guide the operation of this system. A variety of examples using cotton threads, conductive threads and optical fibres are fabricated by our prototype system to demonstrate its functionality.
- [71] arXiv:2403.00987 (replaced) [pdf, ps, html, other]
-
Title: Composite Distributed Learning and Synchronization of Nonlinear Multi-Agent Systems with Complete Uncertain Dynamics
Subjects: Multiagent Systems (cs.MA); Robotics (cs.RO); Systems and Control (eess.SY)
This paper addresses the problem of composite synchronization and learning control in a network of multi-agent robotic manipulator systems with heterogeneous nonlinear uncertainties under a leader-follower framework. A novel two-layer distributed adaptive learning control strategy is introduced, comprising a first-layer distributed cooperative estimator and a second-layer decentralized deterministic learning controller. The first layer facilitates each robotic agent's estimation of the leader's information. The second layer is responsible both for controlling individual robot agents to track desired reference trajectories and for accurately identifying/learning their nonlinear uncertain dynamics. The proposed distributed learning control scheme advances the existing literature through its ability to manage robotic agents with completely uncertain dynamics, including uncertain mass matrices. This makes the robotic control environment-independent, so it can be used in various settings, from underwater to space, where identifying system dynamics parameters is challenging. The stability and parameter convergence of the closed-loop system are rigorously analyzed using the Lyapunov method. Numerical simulations validate the effectiveness of the proposed scheme.
- [72] arXiv:2403.01265 (replaced) [pdf, ps, other]
-
Title: Smooth Computation without Input Delay: Robust Tube-Based Model Predictive Control for Robot Manipulator Planning
Comments: arXiv admin note: text overlap with arXiv:2103.09693
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
Model Predictive Control (MPC) has exhibited remarkable capabilities in optimizing objectives and meeting constraints. However, the substantial computational burden associated with solving the Optimal Control Problem (OCP) at each triggering instant introduces significant delays between state sampling and control application. These delays limit the practicality of MPC in resource-constrained systems when engaging in complex tasks. The key idea of this paper is that, by predicting the successor state, the controller can solve the OCP one time step ahead of time, thus avoiding the delay of the next action. To this end, we compute deviations between real and nominal system states, predicting forthcoming real states as initial conditions for the imminent OCP solution. Anticipatory computation stores optimal control based on current nominal states, thus mitigating the delay effects. Additionally, we establish an upper bound for linearization error, effectively linearizing the nonlinear system, reducing OCP complexity, and enhancing response speed. We provide empirical validation through two numerical simulations and corresponding real-world robot tasks, demonstrating significant performance improvements and augmented response speed (up to $90\%$) resulting from the seamless integration of our proposed approach compared to conventional time-triggered MPC strategies.
- [73] arXiv:2403.11279 (replaced) [pdf, ps, html, other]
-
Title: Hybrid Feedback for Three-dimensional Convex Obstacle Avoidance (Extended version)
Comments: 12 pages, 5 figures. arXiv admin note: substantial text overlap with arXiv:2304.10598
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
We propose a hybrid feedback control scheme for the autonomous robot navigation problem in three-dimensional environments with arbitrarily-shaped convex obstacles. The proposed hybrid control strategy, which consists in switching between the move-to-target mode and the obstacle-avoidance mode, guarantees global asymptotic stability of the target location in the obstacle-free workspace. We also provide a procedure for the implementation of the proposed hybrid controller in a priori unknown environments and validate its effectiveness through simulation results.
- [74] arXiv:2403.19024 (replaced) [pdf, ps, other]
-
Title: Exploiting Symmetry in Dynamics for Model-Based Reinforcement Learning with Asymmetric Rewards
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY)
Recent work in reinforcement learning has leveraged symmetries in the model to improve sample efficiency in training a policy. A commonly used simplifying assumption is that the dynamics and reward both exhibit the same symmetry. However, in many real-world environments, the dynamical model exhibits symmetry independent of the reward model: the reward may not satisfy the same symmetries as the dynamics. In this paper, we investigate scenarios where only the dynamics are assumed to exhibit symmetry, extending the scope of problems in reinforcement learning and learning in control theory where symmetry techniques can be applied. We use Cartan's moving frame method to introduce a technique for learning dynamics which, by construction, exhibit specified symmetries. We demonstrate through numerical experiments that the proposed method learns a more accurate dynamical model.
- [75] arXiv:2403.19324 (replaced) [pdf, ps, other]
-
Title: Rapid nonlinear convex guidance using a monomial method
Comments: 34 pages, 16 figures
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
This paper introduces a framework by which the nonlinear trajectory optimization problem is posed as a path-planning problem in a space liberated from dynamics. In this space, general state constraints for continuous and impulsive control problems are encoded as linear constraints on the native overparameterized variables. This framework is enabled by nonlinear expansion in the vicinity of a reference in terms of fundamental solutions and a minimal nonlinear basis of mixed monomials in problem initial conditions. The former can be computed using state transition tensors, differential algebra, or analytic approaches, and the latter is computed analytically. Nonlinear guidance schemes are proposed taking advantage of this framework, including a successive convex programming scheme for delta-V minimizing trajectory optimization. This work enables a stable, easy-to-implement, and highly rapid nonlinear guidance implementation without the need for collocation or real-time integration.
- [76] arXiv:2405.01124 (replaced) [pdf, ps, html, other]
-
Title: Investigating Self-Supervised Image Denoising with Denaturation
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Statistics Theory (math.ST)
Self-supervised learning for image denoising problems in the presence of denaturation for noisy data is a crucial approach in machine learning. However, theoretical understanding of the performance of the approach that uses denatured data is lacking. To provide better understanding of the approach, in this paper, we analyze in depth a self-supervised denoising algorithm that uses denatured data, through theoretical analysis and numerical experiments. Through the theoretical analysis, we discuss how the algorithm finds desired solutions to the optimization problem with the population risk, while the guarantee for the empirical risk depends on the hardness of the denoising task in terms of denaturation levels. We also conduct several experiments to investigate the performance of an extended algorithm in practice. The results indicate that training the algorithm with denatured images works, and the empirical performance aligns with the theoretical results. These results suggest several insights for further improvement of self-supervised image denoising that uses denatured data.