Search | arXiv e-print repository

A Comprehensive Overview and Survey of O-RAN: Exploring Slicing-aware Architecture, Deployment Options, and Use Cases

Authors: Khurshid Alam, Mohammad Asif Habibi, Matthias Tammen, Dennis Krummacker, Walid Saad, Marco Di Renzo, Tommaso Melodia, Xavier Costa-Pérez, Mérouane Debbah, Ashutosh Dutta, Hans D. Schotten

Abstract: Open-radio access network (O-RAN) seeks to establish principles of openness, programmability, automation, intelligence, and hardware-software disaggregation with interoperable interfaces. It advocates for multi-vendorism and multi-stakeholderism within a cloudified and virtualized wireless infrastructure, aimed at enhancing the deployment, operation, and maintenance of RAN architecture. This enhan… ▽ More Open-radio access network (O-RAN) seeks to establish principles of openness, programmability, automation, intelligence, and hardware-software disaggregation with interoperable interfaces. It advocates for multi-vendorism and multi-stakeholderism within a cloudified and virtualized wireless infrastructure, aimed at enhancing the deployment, operation, and maintenance of RAN architecture. This enhancement promises increased flexibility, performance optimization, service innovation, energy efficiency, and cost efficiency in fifth-generation (5G), sixth-generation (6G), and future networks. One of the key features of the O-RAN architecture is its support for network slicing, which entails interaction with other slicing domains within a mobile network, notably the transport network (TN) domain and the core network (CN) domain, to realize end-to-end (E2E) network slicing. The study of this feature requires exploring the stances and contributions of diverse standards development organizations (SDOs). In this context, we note that despite the ongoing industrial deployments and standardization efforts, the research and standardization communities have yet to comprehensively address network slicing in O-RAN. To address this gap, this survey paper provides a comprehensive exploration of network slicing in O-RAN through an in-depth review of specification documents from O-RAN Alliance and research papers from leading industry and academic institutions. The paper commences with an overview of the ongoing standardization efforts and open-source contributions associated with O-RAN, subsequently delving into the latest O-RAN architecture with an emphasis on its slicing aspects. Further, the paper explores deployment scenarios for network slicing within O-RAN, examining options for the deployment and orchestration of O-RAN and TN network slice subnets... △ Less

Submitted 8 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: 45 pages, 12 figures, 4 tables, submitted to the IEEE for possible publication

arXiv:2402.03058 [pdf, other]

Array Geometry-Robust Attention-Based Neural Beamformer for Moving Speakers

Authors: Marvin Tammen, Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki, Simon Doclo

Abstract: Recently, a mask-based beamformer with attention-based spatial covariance matrix aggregator (ASA) was proposed, which was demonstrated to track moving sources accurately. However, the deep neural network model used in this algorithm is limited to a specific channel configuration, requiring a different model in case a different channel permutation, channel count, or microphone array geometry is con… ▽ More Recently, a mask-based beamformer with attention-based spatial covariance matrix aggregator (ASA) was proposed, which was demonstrated to track moving sources accurately. However, the deep neural network model used in this algorithm is limited to a specific channel configuration, requiring a different model in case a different channel permutation, channel count, or microphone array geometry is considered. Addressing this limitation, in this paper, we investigate three approaches to improve the robustness of the ASA-based tracking method against such variations: incorporating random channel configurations during the training process, employing the transform-average-concatenate (TAC) method to process multi-channel input features (allowing for any channel count and enabling permutation invariance), and utilizing input features that are robust against variations of the channel configuration. Our experiments, conducted using the CHiME-3 and DEMAND datasets, demonstrate improved robustness against mismatches in channel permutations, channel counts, and microphone array geometries compared to the conventional ASA-based tracking method without compromising performance in matched conditions, suggesting that the mask-based beamformer with ASA integrating the proposed approaches has the potential to track moving sources for arbitrary microphone arrays. △ Less

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2205.09017 [pdf, other]

doi 10.1109/IWAENC53105.2022.9914710

Dictionary-Based Fusion of Contact and Acoustic Microphones for Wind Noise Reduction

Authors: Marvin Tammen, Xilin Li, Simon Doclo, Lalin Theverapperuma

Abstract: In mobile speech communication applications, wind noise can lead to a severe reduction of speech quality and intelligibility. Since the performance of speech enhancement algorithms using acoustic microphones tends to substantially degrade in extremely challenging scenarios, auxiliary sensors such as contact microphones can be used. Although contact microphones offer a much lower recorded wind nois… ▽ More In mobile speech communication applications, wind noise can lead to a severe reduction of speech quality and intelligibility. Since the performance of speech enhancement algorithms using acoustic microphones tends to substantially degrade in extremely challenging scenarios, auxiliary sensors such as contact microphones can be used. Although contact microphones offer a much lower recorded wind noise level, they come at the cost of speech distortion and additional noise components. Aiming at exploiting the advantages of acoustic and contact microphones for wind noise reduction, in this paper we propose to extend conventional single-microphone dictionary-based speech enhancement approaches by simultaneously modeling the acoustic and contact microphone signals. We propose to train a single speech dictionary and two noise dictionaries and use a relative transfer function to model the relationship between the speech components at the microphones. Simulation results show that the proposed approach yields improvements in both speech quality and intelligibility compared to several baseline approaches, most notably approaches using only the contact microphones or only the acoustic microphone. △ Less

Submitted 14 November, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

Comments: accepted at IWAENC 22

arXiv:2205.08983 [pdf, other]

doi 10.1109/IWAENC53105.2022.9914742

Deep Multi-Frame MVDR Filtering for Binaural Noise Reduction

Authors: Marvin Tammen, Simon Doclo

Abstract: To improve speech intelligibility and speech quality in noisy environments, binaural noise reduction algorithms for head-mounted assistive listening devices are of crucial importance. Several binaural noise reduction algorithms such as the well-known binaural minimum variance distortionless response (MVDR) beamformer have been proposed, which exploit spatial correlations of both the target speech… ▽ More To improve speech intelligibility and speech quality in noisy environments, binaural noise reduction algorithms for head-mounted assistive listening devices are of crucial importance. Several binaural noise reduction algorithms such as the well-known binaural minimum variance distortionless response (MVDR) beamformer have been proposed, which exploit spatial correlations of both the target speech and the noise components. Furthermore, for single-microphone scenarios, multi-frame algorithms such as the multi-frame MVDR (MFMVDR) filter have been proposed, which exploit temporal instead of spatial correlations. In this contribution, we propose a binaural extension of the MFMVDR filter, which exploits both spatial and temporal correlations. The binaural MFMVDR filters are embedded in an end-to-end deep learning framework, where the required parameters, i.e., the speech spatio-temporal correlation vectors as well as the (inverse) noise spatio-temporal covariance matrix, are estimated by temporal convolutional networks (TCNs) that are trained by minimizing the mean spectral absolute error loss function. Simulation results comprising measured binaural room impulses and diverse noise sources at signal-to-noise ratios from -5 dB to 20 dB demonstrate the advantage of utilizing the binaural MFMVDR filter structure over directly estimating the binaural multi-frame filter coefficients with TCNs. △ Less

Submitted 14 November, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

Comments: accepted at IWAENC 2022

arXiv:2106.01902 [pdf, ps, other]

Joint Multi-Channel Dereverberation and Noise Reduction Using a Unified Convolutional Beamformer With Sparse Priors

Authors: Henri Gode, Marvin Tammen, Simon Doclo

Abstract: Recently, the convolutional weighted power minimization distortionless response (WPD) beamformer was proposed, which unifies multi-channel weighted prediction error dereverberation and minimum power distortionless response beamforming. To optimize the convolutional filter, the desired speech component is modeled with a time-varying Gaussian model, which promotes the sparsity of the desired speech… ▽ More Recently, the convolutional weighted power minimization distortionless response (WPD) beamformer was proposed, which unifies multi-channel weighted prediction error dereverberation and minimum power distortionless response beamforming. To optimize the convolutional filter, the desired speech component is modeled with a time-varying Gaussian model, which promotes the sparsity of the desired speech component in the short-time Fourier transform domain compared to the noisy microphone signals. In this paper we generalize the convolutional WPD beamformer by using an lp-norm cost function, introducing an adjustable shape parameter which enables to control the sparsity of the desired speech component. Experiments based on the REVERB challenge dataset show that the proposed method outperforms the conventional convolutional WPD beamformer in terms of objective speech quality metrics. △ Less

Submitted 13 March, 2023; v1 submitted 3 June, 2021; originally announced June 2021.

Comments: ITG Conference on Speech Communication

arXiv:1905.08492 [pdf, other]

doi 10.1109/ICASSP40776.2020.9054196

DNN-Based Speech Presence Probability Estimation for Multi-Frame Single-Microphone Speech Enhancement

Authors: Marvin Tammen, Dörte Fischer, Bernd T. Meyer, Simon Doclo

Abstract: Multi-frame approaches for single-microphone speech enhancement, e.g., the multi-frame minimum-power-distortionless-response (MFMPDR) filter, are able to exploit speech correlations across neighboring time frames. In contrast to single-frame approaches such as the Wiener gain, it has been shown that multi-frame approaches achieve a substantial noise reduction with hardly any speech distortion, pro… ▽ More Multi-frame approaches for single-microphone speech enhancement, e.g., the multi-frame minimum-power-distortionless-response (MFMPDR) filter, are able to exploit speech correlations across neighboring time frames. In contrast to single-frame approaches such as the Wiener gain, it has been shown that multi-frame approaches achieve a substantial noise reduction with hardly any speech distortion, provided that an accurate estimate of the correlation matrices and especially the speech interframe correlation (IFC) vector is available. Typical estimation procedures of the IFC vector require an estimate of the speech presence probability (SPP) in each time-frequency (TF) bin. In this paper, we propose to use a bi-directional long short-term memory deep neural network (DNN) to estimate the SPP for each TF bin. Aiming at achieving a robust performance, the DNN is trained for various noise types and within a large signal-to-noise-ratio range. Experimental results show that the MFMPDR in combination with the proposed data-driven SPP estimator yields an increased speech quality compared to a state-of-the-art model-based SPP estimator. Furthermore, it is confirmed that exploiting interframe correlations in the MFMPDR is beneficial when compared to the Wiener gain especially in adverse scenarios. △ Less

Submitted 14 November, 2022; v1 submitted 21 May, 2019; originally announced May 2019.

arXiv:1804.06196 [pdf, other]

Demystifying Deception Technology:A Survey

Authors: Daniel Fraunholz, Simon Duque Anton, Christoph Lipps, Daniel Reti, Daniel Krohmer, Frederic Pohl, Matthias Tammen, Hans Dieter Schotten

Abstract: Deception boosts security for systems and components by denial, deceit, misinformation, camouflage and obfuscation. In this work an extensive overview of the deception technology environment is presented. Taxonomies, theoretical backgrounds, psychological aspects as well as concepts, implementations, legal aspects and ethics are discussed and compared. Deception boosts security for systems and components by denial, deceit, misinformation, camouflage and obfuscation. In this work an extensive overview of the deception technology environment is presented. Taxonomies, theoretical backgrounds, psychological aspects as well as concepts, implementations, legal aspects and ethics are discussed and compared. △ Less

Submitted 17 April, 2018; originally announced April 2018.

Comments: 25 pages, 169 references

Showing 1–7 of 7 results for author: Tammen, M