Search | arXiv e-print repository

Markowitz Meets Bellman: Knowledge-distilled Reinforcement Learning for Portfolio Management

Abstract: Investment portfolios, central to finance, balance potential returns and risks. This paper introduces a hybrid approach combining Markowitz's portfolio theory with reinforcement learning, utilizing knowledge distillation for training agents. In particular, our proposed method, called KDD (Knowledge Distillation DDPG), consist of two training stages: supervised and reinforcement learning stages. Th… ▽ More Investment portfolios, central to finance, balance potential returns and risks. This paper introduces a hybrid approach combining Markowitz's portfolio theory with reinforcement learning, utilizing knowledge distillation for training agents. In particular, our proposed method, called KDD (Knowledge Distillation DDPG), consist of two training stages: supervised and reinforcement learning stages. The trained agents optimize portfolio assembly. A comparative analysis against standard financial models and AI frameworks, using metrics like returns, the Sharpe ratio, and nine evaluation indices, reveals our model's superiority. It notably achieves the highest yield and Sharpe ratio of 2.03, ensuring top profitability with the lowest risk in comparable return scenarios. △ Less

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.04120 [pdf, ps, other]

Movable Antennas-Enabled Two-User Multicasting: Do We Really Need Alternating Optimization for Minimum Rate Maximization?

Authors: Guojie Hu, Qingqing Wu, Donghui Xu, Kui Xu, Jiangbo Si, Yunlong Cai, Naofal Al-Dhahir

Abstract: Movable antenna (MA) technology, which can reconfigure wireless channels by flexibly moving antenna positions in a specified region, has great potential for improving communication performance. In this paper, we consider a new setup of MAs-enabled multicasting, where we adopt a simple setting in which a linear MA array-enabled source (${\rm{S}}$) transmits a common message to two single-antenna us… ▽ More Movable antenna (MA) technology, which can reconfigure wireless channels by flexibly moving antenna positions in a specified region, has great potential for improving communication performance. In this paper, we consider a new setup of MAs-enabled multicasting, where we adopt a simple setting in which a linear MA array-enabled source (${\rm{S}}$) transmits a common message to two single-antenna users ${\rm{U}}_1$ and ${\rm{U}}_2$. We aim to maximize the minimum rate among these two users, by jointly optimizing the transmit beamforming and antenna positions at ${\rm{S}}$. Instead of utilizing the widely-used alternating optimization (AO) approach, we reveal, with rigorous proof, that the above two variables can be optimized separately: i) the optimal antenna positions can be firstly determined via the successive convex approximation technique, based on the rule of maximizing the correlation between ${\rm{S}}$-${\rm{U}}_1$ and ${\rm{S}}$-${\rm{U}}_2$ channels; ii) afterwards, the optimal closed-form transmit beamforming can be derived via simple arguments. Compared to AO, this new approach yields the same performance but reduces the computational complexities significantly. Moreover, it can provide insightful conclusions which are not possible with AO. △ Less

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2404.16271 [pdf]

True random number generation using metastable 1T' molybdenum ditelluride

Authors: Yang Liu, Pengyu Liu, Yingyi Wen, Zihan Liang, Songwei Liu, Lekai Song, Jingfang Pei, Xiaoyue Fan, Teng Ma, Gang Wang, Shuo Gao, Kong-Pang Pun, Xiaolong Chen, Guohua Hu

Abstract: True random numbers play a critical role in secure cryptography. The generation relies on a stable and readily extractable entropy source. Here, from solution-processed structurally metastable 1T' MoTe2, we prove stable output of featureless, stochastic, and yet stable conductance noise at a broad temperature (down to 15 K) with minimal power consumption (down to 0.05 micro-W). Our characterizatio… ▽ More True random numbers play a critical role in secure cryptography. The generation relies on a stable and readily extractable entropy source. Here, from solution-processed structurally metastable 1T' MoTe2, we prove stable output of featureless, stochastic, and yet stable conductance noise at a broad temperature (down to 15 K) with minimal power consumption (down to 0.05 micro-W). Our characterizations and statistical analysis of the characteristics of the conductance noise suggest that the noise arises from the volatility of the stochastic polarization of the underlying ferroelectric dipoles in the 1T' MoTe2. Further, as proved in our experiments and indicated by our Monte Carlo simulation, the ferroelectric dipole polarization is a reliable entropy source with the stochastic polarization persistent and stable over time. Exploiting the conductance noise, we achieve the generation of true random numbers and demonstrate their use in common cryptographic applications, for example, password generation and data encryption. Besides, particularly, we show a privacy safeguarding approach to sensitive data that can be critical for the cryptography of neural networks. We believe our work will bring insights into the understanding of the metastable 1T' MoTe2 and, more importantly, underpin its great potential in secure cryptography. △ Less

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.12980 [pdf, other]

Ring-a-Pose: A Ring for Continuous Hand Pose Tracking

Authors: Tianhong Catherine Yu, Guilin Hu, Ruidong Zhang, Hyunchul Lim, Saif Mahmud, Chi-Jung Lee, Ke Li, Devansh Agarwal, Shuyang Nie, Jinseok Oh, François Guimbretière, Cheng Zhang

Abstract: We present Ring-a-Pose, a single untethered ring that tracks continuous 3D hand poses. Located in the center of the hand, the ring emits an inaudible acoustic signal that each hand pose reflects differently. Ring-a-Pose imposes minimal obtrusions on the hand, unlike multi-ring or glove systems. It is not affected by the choice of clothing that may cover wrist-worn systems. In a series of three use… ▽ More We present Ring-a-Pose, a single untethered ring that tracks continuous 3D hand poses. Located in the center of the hand, the ring emits an inaudible acoustic signal that each hand pose reflects differently. Ring-a-Pose imposes minimal obtrusions on the hand, unlike multi-ring or glove systems. It is not affected by the choice of clothing that may cover wrist-worn systems. In a series of three user studies with a total of 30 participants, we evaluate Ring-a-Pose's performance on pose tracking and micro-finger gesture recognition. Without collecting any training data from a user, Ring-a-Pose tracks continuous hand poses with a joint error of 14.1mm. The joint error decreases to 10.3mm for fine-tuned user-dependent models. Ring-a-Pose recognizes 7-class micro-gestures with a 90.60% and 99.27% accuracy for user-independent and user-dependent models, respectively. Furthermore, the ring exhibits promising performance when worn on any finger. Ring-a-Pose enables the future of smart rings to track and recognize hand poses using relatively low-power acoustic sensing. △ Less

Submitted 19 April, 2024; originally announced April 2024.

arXiv:2404.03395 [pdf, ps, other]

Movable Antennas-Assisted Secure Transmission Without Eavesdroppers' Instantaneous CSI

Authors: Guojie Hu, Qingqing Wu, Donghui Xu, Kui Xu, Jiangbo Si, Yunlong Cai, Naofal Al-Dhahir

Abstract: Movable antenna (MA) technology is highly promising for improving communication performance, due to its advantage of flexibly adjusting positions of antennas to reconfigure channel conditions. In this paper, we investigate MAs-assisted secure transmission under a legitimate transmitter Alice, a legitimate receiver Bob and multiple eavesdroppers. Specifically, we consider a practical scenario where… ▽ More Movable antenna (MA) technology is highly promising for improving communication performance, due to its advantage of flexibly adjusting positions of antennas to reconfigure channel conditions. In this paper, we investigate MAs-assisted secure transmission under a legitimate transmitter Alice, a legitimate receiver Bob and multiple eavesdroppers. Specifically, we consider a practical scenario where Alice has no any knowledge about the instantaneous non-line-of-sight component of the wiretap channel. Under this setup, we evaluate the secrecy performance by adopting the secrecy outage probability metric, the tight approximation of which is first derived by interpreting the Rician fading as a special case of Nakagami fading and concurrently exploiting the Laguerre series approximation. Then, we minimize the secrecy outage probability by jointly optimizing the transmit beamforming and positions of antennas at Alice. However, the problem is highly non-convex because the objective includes the complex incomplete gamma function. To tackle this challenge, we, for the first time, effectively approximate the inverse of the incomplete gamma function as a simple linear model. Based on this approximation, we arrive at a simplified problem with a clear structure, which can be solved via the developed alternating projected gradient ascent (APGA) algorithm. Considering the high complexity of the APGA, we further design another scheme where the zero-forcing based beamforming is adopted by Alice, and then we transform the problem into minimizing a simple function which is only related to positions of antennas at Alice.As demonstrated by simulations, our proposed schemes achieve significant performance gains compared to conventional schemes based on fixed-position antennas. △ Less

Submitted 4 April, 2024; originally announced April 2024.

Comments: Submitted for journal publication

arXiv:2404.00403 [pdf, other]

UniMEEC: Towards Unified Multimodal Emotion Recognition and Emotion Cause

Authors: Guimin Hu, Zhihong Zhu, Daniel Hershcovich, Hasti Seifi, Jiayuan Xie

Abstract: Multimodal emotion recognition in conversation (MERC) and multimodal emotion-cause pair extraction (MECPE) has recently garnered significant attention. Emotions are the expression of affect or feelings; responses to specific events, thoughts, or situations are known as emotion causes. Both are like two sides of a coin, collectively describing human behaviors and intents. However, most existing wor… ▽ More Multimodal emotion recognition in conversation (MERC) and multimodal emotion-cause pair extraction (MECPE) has recently garnered significant attention. Emotions are the expression of affect or feelings; responses to specific events, thoughts, or situations are known as emotion causes. Both are like two sides of a coin, collectively describing human behaviors and intents. However, most existing works treat MERC and MECPE as separate tasks, which may result in potential challenges in integrating emotion and cause in real-world applications. In this paper, we propose a Unified Multimodal Emotion recognition and Emotion-Cause analysis framework (UniMEEC) to explore the causality and complementarity between emotion and emotion cause. Concretely, UniMEEC reformulates the MERC and MECPE tasks as two mask prediction problems, enhancing the interaction between emotion and cause. Meanwhile, UniMEEC shares the prompt learning among modalities for probing modality-specific knowledge from the Pre-trained model. Furthermore, we propose a task-specific hierarchical context aggregation to control the information flow to the task. Experiment results on four public benchmark datasets verify the model performance on MERC and MECPE tasks and achieve consistent improvements compared with state-of-the-art methods. △ Less

Submitted 30 March, 2024; originally announced April 2024.

arXiv:2403.18791 [pdf, other]

Object Pose Estimation via the Aggregation of Diffusion Features

Authors: Tianfu Wang, Guosheng Hu, Hongguang Wang

Abstract: Estimating the pose of objects from images is a crucial task of 3D scene understanding, and recent approaches have shown promising results on very large benchmarks. However, these methods experience a significant performance drop when dealing with unseen objects. We believe that it results from the limited generalizability of image features. To address this problem, we have an in-depth analysis on… ▽ More Estimating the pose of objects from images is a crucial task of 3D scene understanding, and recent approaches have shown promising results on very large benchmarks. However, these methods experience a significant performance drop when dealing with unseen objects. We believe that it results from the limited generalizability of image features. To address this problem, we have an in-depth analysis on the features of diffusion models, e.g. Stable Diffusion, which hold substantial potential for modeling unseen objects. Based on this analysis, we then innovatively introduce these diffusion features for object pose estimation. To achieve this, we propose three distinct architectures that can effectively capture and aggregate diffusion features of different granularity, greatly improving the generalizability of object pose estimation. Our approach outperforms the state-of-the-art methods by a considerable margin on three popular benchmark datasets, LM, O-LM, and T-LESS. In particular, our method achieves higher accuracy than the previous best arts on unseen objects: 98.2% vs. 93.5% on Unseen LM, 85.9% vs. 76.3% on Unseen O-LM, showing the strong generalizability of our method. Our code is released at https://github.com/Tianfu18/diff-feats-pose. △ Less

Submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR2024

arXiv:2403.17676 [pdf]

Analysis on reservoir activation with the nonlinearity harnessed from solution-processed MoS2 devices

Authors: Songwei Liu, Yang Liu, Yingyi Wen, Jingfang Pei, Pengyu Liu, Lekai Song, Xiaoyue Fan, Wenchen Yang, Danmei Pan, Teng Ma, Yue Lin, Gang Wang, Guohua Hu

Abstract: Reservoir computing is a recurrent neural network that has been applied across various domains in machine learning. The implementation of reservoir computing, however, often demands heavy computations for activating the reservoir. Configuring physical reservoir networks and harnessing the nonlinearity from the underlying devices for activation is an emergent solution to address the computational c… ▽ More Reservoir computing is a recurrent neural network that has been applied across various domains in machine learning. The implementation of reservoir computing, however, often demands heavy computations for activating the reservoir. Configuring physical reservoir networks and harnessing the nonlinearity from the underlying devices for activation is an emergent solution to address the computational challenge. Herein, we analyze the feasibility of employing the nonlinearity from solution-processed molybdenum disulfide (MoS2) devices for reservoir activation. The devices, fabricated using liquid-phase exfoliated MoS2, exhibit a high-order nonlinearity achieved by Stark modulation of the MoS2 material. We demonstrate that this nonlinearity can be fitted and employed as the activation function to facilitate reservoir computing implementation. Notably, owing to the high-order nonlinearity, the network exhibits long-term synchronization and robust generalization abilities for approximating complex dynamical systems. Given the remarkable reservoir activation capability, coupled with the scalability of the device fabrication, our findings open the possibility for the physical realization of lightweight, efficient reservoir computing for, for instance, signal classification, motion tracking, and pattern recognition of complex time series as well as secure cryptography. As an example, we show the network can be appointed to generate chaotic random numbers for secure data encryption. △ Less

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.15274 [pdf]

Bioinformatics and Biomedical Informatics with ChatGPT: Year One Review

Authors: Jinge Wang, Zien Cheng, Qiuming Yao, Li Liu, Dong Xu, Gangqing Hu

Abstract: The year 2023 marked a significant surge in the exploration of applying large language model (LLM) chatbots, notably ChatGPT, across various disciplines. We surveyed the applications of ChatGPT in various sectors of bioinformatics and biomedical informatics throughout the year, covering omics, genetics, biomedical text mining, drug discovery, biomedical image understanding, bioinformatics programm… ▽ More The year 2023 marked a significant surge in the exploration of applying large language model (LLM) chatbots, notably ChatGPT, across various disciplines. We surveyed the applications of ChatGPT in various sectors of bioinformatics and biomedical informatics throughout the year, covering omics, genetics, biomedical text mining, drug discovery, biomedical image understanding, bioinformatics programming, and bioinformatics education. Our survey delineates the current strengths and limitations of this chatbot in bioinformatics and offers insights into potential avenues for future development. △ Less

Submitted 22 March, 2024; originally announced March 2024.

Comments: 19 pages, 3 Figures, 1 Table

arXiv:2403.14242 [pdf, other]

E-Syn: E-Graph Rewriting with Technology-Aware Cost Functions for Logic Synthesis

Authors: Chen Chen, Guangyu Hu, Dongsheng Zuo, Cunxi Yu, Yuzhe Ma, Hongce Zhang

Abstract: Logic synthesis plays a crucial role in the digital design flow. It has a decisive influence on the final Quality of Results (QoR) of the circuit implementations. However, existing multi-level logic optimization algorithms often employ greedy approaches with a series of local optimization steps. Each step breaks the circuit into small pieces (e.g., k-feasible cuts) and applies incremental changes… ▽ More Logic synthesis plays a crucial role in the digital design flow. It has a decisive influence on the final Quality of Results (QoR) of the circuit implementations. However, existing multi-level logic optimization algorithms often employ greedy approaches with a series of local optimization steps. Each step breaks the circuit into small pieces (e.g., k-feasible cuts) and applies incremental changes to individual pieces separately. These local optimization steps could limit the exploration space and may miss opportunities for significant improvements. To address the limitation, this paper proposes using e-graph in logic synthesis. The new workflow, named Esyn, makes use of the well-established e-graph infrastructure to efficiently perform logic rewriting. It explores a diverse set of equivalent Boolean representations while allowing technology-aware cost functions to better support delay-oriented and area-oriented logic synthesis. Experiments over a wide range of benchmark designs show our proposed logic optimization approach reaches a wider design space compared to the commonly used AIG-based logic synthesis flow. It achieves on average 15.29% delay saving in delay-oriented synthesis and 6.42% area saving for area-oriented synthesis. △ Less

Submitted 25 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: Accepted by DAC 2024; Please note that this is not the final camera-ready version

arXiv:2403.11656 [pdf, other]

LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model

Authors: Yuxin Cao, Jinghao Li, Xi Xiao, Derui Wang, Minhui Xue, Hao Ge, Wei Liu, Guangwu Hu

Abstract: Previous work has shown that well-crafted adversarial perturbations can threaten the security of video recognition systems. Attackers can invade such models with a low query budget when the perturbations are semantic-invariant, such as StyleFool. Despite the query efficiency, the naturalness of the minutia areas still requires amelioration, since StyleFool leverages style transfer to all pixels in… ▽ More Previous work has shown that well-crafted adversarial perturbations can threaten the security of video recognition systems. Attackers can invade such models with a low query budget when the perturbations are semantic-invariant, such as StyleFool. Despite the query efficiency, the naturalness of the minutia areas still requires amelioration, since StyleFool leverages style transfer to all pixels in each frame. To close the gap, we propose LocalStyleFool, an improved black-box video adversarial attack that superimposes regional style-transfer-based perturbations on videos. Benefiting from the popularity and scalably usability of Segment Anything Model (SAM), we first extract different regions according to semantic information and then track them through the video stream to maintain the temporal consistency. Then, we add style-transfer-based perturbations to several regions selected based on the associative criterion of transfer-based gradient information and regional area. Perturbation fine adjustment is followed to make stylized videos adversarial. We demonstrate that LocalStyleFool can improve both intra-frame and inter-frame naturalness through a human-assessed survey, while maintaining competitive fooling rate and query efficiency. Successful experiments on the high-resolution dataset also showcase that scrupulous segmentation of SAM helps to improve the scalability of adversarial attacks under high-resolution data. △ Less

Submitted 27 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: Accepted to 2024 IEEE Security and Privacy Workshops (SPW)

arXiv:2403.06249 [pdf, other]

No Language is an Island: Unifying Chinese and English in Financial Large Language Models, Instruction Data, and Benchmarks

Authors: Gang Hu, Ke Qin, Chenhan Yuan, Min Peng, Alejandro Lopez-Lira, Benyou Wang, Sophia Ananiadou, Wanlong Yu, Jimin Huang, Qianqian Xie

Abstract: While the progression of Large Language Models (LLMs) has notably propelled financial analysis, their application has largely been confined to singular language realms, leaving untapped the potential of bilingual Chinese-English capacity. To bridge this chasm, we introduce ICE-PIXIU, seamlessly amalgamating the ICE-INTENT model and ICE-FLARE benchmark for bilingual financial analysis. ICE-PIXIU un… ▽ More While the progression of Large Language Models (LLMs) has notably propelled financial analysis, their application has largely been confined to singular language realms, leaving untapped the potential of bilingual Chinese-English capacity. To bridge this chasm, we introduce ICE-PIXIU, seamlessly amalgamating the ICE-INTENT model and ICE-FLARE benchmark for bilingual financial analysis. ICE-PIXIU uniquely integrates a spectrum of Chinese tasks, alongside translated and original English datasets, enriching the breadth and depth of bilingual financial modeling. It provides unrestricted access to diverse model variants, a substantial compilation of diverse cross-lingual and multi-modal instruction data, and an evaluation benchmark with expert annotations, comprising 10 NLP tasks, 20 bilingual specific tasks, totaling 95k datasets. Our thorough evaluation emphasizes the advantages of incorporating these bilingual datasets, especially in translation tasks and utilizing original English data, enhancing both linguistic flexibility and analytical acuity in financial contexts. Notably, ICE-INTENT distinguishes itself by showcasing significant enhancements over conventional LLMs and existing financial LLMs in bilingual milieus, underscoring the profound impact of robust bilingual data on the accuracy and efficacy of financial NLP. △ Less

Submitted 16 April, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

Comments: 24 pages, 5 figures, 12 tables, including Appendix

arXiv:2403.04329 [pdf, other]

A mechanism-informed reinforcement learning framework for shape optimization of airfoils

Authors: Jingfeng Wang, Guanghui Hu

Abstract: In this study, we present the mechanism-informed reinforcement learning framework for airfoil shape optimization. By leveraging the twin delayed deep deterministic policy gradient algorithm for its notable stability, our approach addresses the complexities of optimizing shapes governed by fluid dynamics. The PDEs-based solver is adopted for its accuracy even when the configurations and geometries… ▽ More In this study, we present the mechanism-informed reinforcement learning framework for airfoil shape optimization. By leveraging the twin delayed deep deterministic policy gradient algorithm for its notable stability, our approach addresses the complexities of optimizing shapes governed by fluid dynamics. The PDEs-based solver is adopted for its accuracy even when the configurations and geometries are extraordinarily changed during the exploration. Dual-weighted residual-based mesh refinement strategy is applied to ensure the accurate calculation of target functionals. To streamline the iterative optimization process and handle geometric deformations, our approach integrates Laplacian smoothing, adaptive refinement, and a Bézier fitting strategy. This combination not only remits mesh tangling but also guarantees a precise manipulation of the airfoil geometry. Our neural network architecture leverages Bézier curves for efficient dimensionality reduction, thereby enhancing the learning process and ensuring the geometric accuracy of the airfoil shapes. An attention mechanism is embedded within the network to calculate potential action on the state as well. Furthermore, we have introduced different reward and penalty mechanisms tailored to the specific challenges of airfoil optimization. This algorithm is designed to support the optimization task, facilitating a more targeted and effective approach for airfoil shape optimization. △ Less

Submitted 7 March, 2024; originally announced March 2024.

Comments: 25 pages

arXiv:2402.19344 [pdf, other]

The 6th Affective Behavior Analysis in-the-wild (ABAW) Competition

Authors: Dimitrios Kollias, Panagiotis Tzirakis, Alan Cowen, Stefanos Zafeiriou, Irene Kotsia, Alice Baird, Chris Gagne, Chunchang Shao, Guanyu Hu

Abstract: This paper describes the 6th Affective Behavior Analysis in-the-wild (ABAW) Competition, which is part of the respective Workshop held in conjunction with IEEE CVPR 2024. The 6th ABAW Competition addresses contemporary challenges in understanding human emotions and behaviors, crucial for the development of human-centered technologies. In more detail, the Competition focuses on affect related bench… ▽ More This paper describes the 6th Affective Behavior Analysis in-the-wild (ABAW) Competition, which is part of the respective Workshop held in conjunction with IEEE CVPR 2024. The 6th ABAW Competition addresses contemporary challenges in understanding human emotions and behaviors, crucial for the development of human-centered technologies. In more detail, the Competition focuses on affect related benchmarking tasks and comprises of five sub-challenges: i) Valence-Arousal Estimation (the target is to estimate two continuous affect dimensions, valence and arousal), ii) Expression Recognition (the target is to recognise between the mutually exclusive classes of the 7 basic expressions and 'other'), iii) Action Unit Detection (the target is to detect 12 action units), iv) Compound Expression Recognition (the target is to recognise between the 7 mutually exclusive compound expression classes), and v) Emotional Mimicry Intensity Estimation (the target is to estimate six continuous emotion dimensions). In the paper, we present these Challenges, describe their respective datasets and challenge protocols (we outline the evaluation metrics) and present the baseline systems as well as their obtained performance. More information for the Competition can be found in: https://affective-behavior-analysis-in-the-wild.github.io/6th. △ Less

Submitted 12 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.16908 [pdf]

Lightweight, error-tolerant edge detection using memristor-enabled stochastic logics

Authors: Lekai Song, Pengyu Liu, Jingfang Pei, Yang Liu, Songwei Liu, Shengbo Wang, Leonard W. T. Ng, Tawfique Hasan, Kong-Pang Pun, Shuo Gao, Guohua Hu

Abstract: The demand for efficient edge vision has spurred the interest in developing stochastic computing approaches for performing image processing tasks. Memristors with inherent stochasticity readily introduce probability into the computations and thus enable stochastic image processing computations. Here, we present a stochastic computing approach for edge detection, a fundamental image processing tech… ▽ More The demand for efficient edge vision has spurred the interest in developing stochastic computing approaches for performing image processing tasks. Memristors with inherent stochasticity readily introduce probability into the computations and thus enable stochastic image processing computations. Here, we present a stochastic computing approach for edge detection, a fundamental image processing technique, facilitated with memristor-enabled stochastic logics. Specifically, we integrate the memristors with logic circuits and harness the stochasticity from the memristors to realize compact stochastic logics for stochastic number encoding and processing. The stochastic numbers, exhibiting well-regulated probabilities and correlations, can be processed to perform logic operations with statistical probabilities. This can facilitate lightweight stochastic edge detection for edge visual scenarios characterized with high-level noise errors. As a practical demonstration, we implement a hardware stochastic Roberts cross operator using the stochastic logics, and prove its exceptional edge detection performance, remarkably, with 95% less computational cost while withstanding 50% bit-flip errors. The results underscore the great potential of our stochastic edge detection approach in developing lightweight, error-tolerant edge vision hardware and systems for autonomous driving, virtual/augmented reality, medical imaging diagnosis, industrial automation, and beyond. △ Less

Submitted 20 March, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

arXiv:2402.16318 [pdf, other]

Gradient-Guided Modality Decoupling for Missing-Modality Robustness

Authors: Hao Wang, Shengda Luo, Guosheng Hu, Jianguo Zhang

Abstract: Multimodal learning with incomplete input data (missing modality) is practical and challenging. In this work, we conduct an in-depth analysis of this challenge and find that modality dominance has a significant negative impact on the model training, greatly degrading the missing modality performance. Motivated by Grad-CAM, we introduce a novel indicator, gradients, to monitor and reduce modality d… ▽ More Multimodal learning with incomplete input data (missing modality) is practical and challenging. In this work, we conduct an in-depth analysis of this challenge and find that modality dominance has a significant negative impact on the model training, greatly degrading the missing modality performance. Motivated by Grad-CAM, we introduce a novel indicator, gradients, to monitor and reduce modality dominance which widely exists in the missing-modality scenario. In aid of this indicator, we present a novel Gradient-guided Modality Decoupling (GMD) method to decouple the dependency on dominating modalities. Specifically, GMD removes the conflicted gradient components from different modalities to achieve this decoupling, significantly improving the performance. In addition, to flexibly handle modal-incomplete data, we design a parameter-efficient Dynamic Sharing (DS) framework which can adaptively switch on/off the network parameters based on whether one modality is available. We conduct extensive experiments on three popular multimodal benchmarks, including BraTS 2018 for medical segmentation, CMU-MOSI, and CMU-MOSEI for sentiment analysis. The results show that our method can significantly outperform the competitors, showing the effectiveness of the proposed solutions. Our code is released here: https://github.com/HaoWang420/Gradient-guided-Modality-Decoupling. △ Less

Submitted 26 February, 2024; originally announced February 2024.

Comments: AAAI24

arXiv:2402.12659 [pdf, other]

The FinBen: An Holistic Financial Benchmark for Large Language Models

Authors: Qianqian Xie, Weiguang Han, Zhengyu Chen, Ruoyu Xiang, Xiao Zhang, Yueru He, Mengxi Xiao, Dong Li, Yongfu Dai, Duanyu Feng, Yijing Xu, Haoqiang Kang, Ziyan Kuang, Chenhan Yuan, Kailai Yang, Zheheng Luo, Tianlin Zhang, Zhiwei Liu, Guojun Xiong, Zhiyang Deng, Yuechen Jiang, Zhiyuan Yao, Haohang Li, Yangyang Yu, Gang Hu , et al. (9 additional authors not shown)

Abstract: LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of thorough evaluations and the complexity of financial tasks. This along with the rapid development of LLMs, highlights the urgent need for a systematic financial evaluation benchmark for LLMs. In this paper, we introduce FinBen, the first comprehensive open-sourced evaluat… ▽ More LLMs have transformed NLP and shown promise in various fields, yet their potential in finance is underexplored due to a lack of thorough evaluations and the complexity of financial tasks. This along with the rapid development of LLMs, highlights the urgent need for a systematic financial evaluation benchmark for LLMs. In this paper, we introduce FinBen, the first comprehensive open-sourced evaluation benchmark, specifically designed to thoroughly assess the capabilities of LLMs in the financial domain. FinBen encompasses 35 datasets across 23 financial tasks, organized into three spectrums of difficulty inspired by the Cattell-Horn-Carroll theory, to evaluate LLMs' cognitive abilities in inductive reasoning, associative memory, quantitative reasoning, crystallized intelligence, and more. Our evaluation of 15 representative LLMs, including GPT-4, ChatGPT, and the latest Gemini, reveals insights into their strengths and limitations within the financial domain. The findings indicate that GPT-4 leads in quantification, extraction, numerical reasoning, and stock trading, while Gemini shines in generation and forecasting; however, both struggle with complex extraction and forecasting, showing a clear need for targeted enhancements. Instruction tuning boosts simple task performance but falls short in improving complex reasoning and forecasting abilities. FinBen seeks to continuously evaluate LLMs in finance, fostering AI development with regular updates of tasks and models. △ Less

Submitted 19 February, 2024; originally announced February 2024.

Comments: 19 pages, 10 figures

arXiv:2402.10186 [pdf, other]

Self-consistent Validation for Machine Learning Electronic Structure

Authors: Gengyuan Hu, Gengchen Wei, Zekun Lou, Philip H. S. Torr, Wanli Ouyang, Han-sen Zhong, Chen Lin

Abstract: Machine learning has emerged as a significant approach to efficiently tackle electronic structure problems. Despite its potential, there is less guarantee for the model to generalize to unseen data that hinders its application in real-world scenarios. To address this issue, a technique has been proposed to estimate the accuracy of the predictions. This method integrates machine learning with self-… ▽ More Machine learning has emerged as a significant approach to efficiently tackle electronic structure problems. Despite its potential, there is less guarantee for the model to generalize to unseen data that hinders its application in real-world scenarios. To address this issue, a technique has been proposed to estimate the accuracy of the predictions. This method integrates machine learning with self-consistent field methods to achieve both low validation cost and interpret-ability. This, in turn, enables exploration of the model's ability with active learning and instills confidence in its integration into real-world studies. △ Less

Submitted 15 February, 2024; originally announced February 2024.

Comments: 6 pages, 4 figures

arXiv:2402.09257 [pdf, other]

doi 10.1007/978-3-031-19833-5_17

TDViT: Temporal Dilated Video Transformer for Dense Video Tasks

Authors: Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson

Abstract: Deep video models, for example, 3D CNNs or video transformers, have achieved promising performance on sparse video tasks, i.e., predicting one result per video. However, challenges arise when adapting existing deep video models to dense video tasks, i.e., predicting one result per frame. Specifically, these models are expensive for deployment, less effective when handling redundant frames, and dif… ▽ More Deep video models, for example, 3D CNNs or video transformers, have achieved promising performance on sparse video tasks, i.e., predicting one result per video. However, challenges arise when adapting existing deep video models to dense video tasks, i.e., predicting one result per frame. Specifically, these models are expensive for deployment, less effective when handling redundant frames, and difficult to capture long-range temporal correlations. To overcome these issues, we propose a Temporal Dilated Video Transformer (TDViT) that consists of carefully designed temporal dilated transformer blocks (TDTB). TDTB can efficiently extract spatiotemporal representations and effectively alleviate the negative effect of temporal redundancy. Furthermore, by using hierarchical TDTBs, our approach obtains an exponentially expanded temporal receptive field and therefore can model long-range dynamics. Extensive experiments are conducted on two different dense video benchmarks, i.e., ImageNet VID for video object detection and YouTube VIS for video instance segmentation. Excellent experimental results demonstrate the superior efficiency, effectiveness, and compatibility of our method. The code is available at https://github.com/guanxiongsun/vfe.pytorch. △ Less

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.09241 [pdf, other]

doi 10.1007/978-3-031-19833-5_1

Efficient One-stage Video Object Detection by Exploiting Temporal Consistency

Authors: Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson

Abstract: Recently, one-stage detectors have achieved competitive accuracy and faster speed compared with traditional two-stage detectors on image data. However, in the field of video object detection (VOD), most existing VOD methods are still based on two-stage detectors. Moreover, directly adapting existing VOD methods to one-stage detectors introduces unaffordable computational costs. In this paper, we f… ▽ More Recently, one-stage detectors have achieved competitive accuracy and faster speed compared with traditional two-stage detectors on image data. However, in the field of video object detection (VOD), most existing VOD methods are still based on two-stage detectors. Moreover, directly adapting existing VOD methods to one-stage detectors introduces unaffordable computational costs. In this paper, we first analyse the computational bottlenecks of using one-stage detectors for VOD. Based on the analysis, we present a simple yet efficient framework to address the computational bottlenecks and achieve efficient one-stage VOD by exploiting the temporal consistency in video frames. Specifically, our method consists of a location-prior network to filter out background regions and a size-prior network to skip unnecessary computations on low-level feature maps for specific frames. We test our method on various modern one-stage detectors and conduct extensive experiments on the ImageNet VID dataset. Excellent experimental results demonstrate the superior effectiveness, efficiency, and compatibility of our method. The code is available at https://github.com/guanxiongsun/vfe.pytorch. △ Less

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.00738 [pdf, other]

FM3Q: Factorized Multi-Agent MiniMax Q-Learning for Two-Team Zero-Sum Markov Game

Authors: Guangzheng Hu, Yuanheng Zhu, Haoran Li, Dongbin Zhao

Abstract: Many real-world applications involve some agents that fall into two teams, with payoffs that are equal within the same team but of opposite sign across the opponent team. The so-called two-team zero-sum Markov games (2t0sMGs) can be resolved with reinforcement learning in recent years. However, existing methods are thus inefficient in light of insufficient consideration of intra-team credit assign… ▽ More Many real-world applications involve some agents that fall into two teams, with payoffs that are equal within the same team but of opposite sign across the opponent team. The so-called two-team zero-sum Markov games (2t0sMGs) can be resolved with reinforcement learning in recent years. However, existing methods are thus inefficient in light of insufficient consideration of intra-team credit assignment, data utilization and computational intractability. In this paper, we propose the individual-global-minimax (IGMM) principle to ensure the coherence between two-team minimax behaviors and the individual greedy behaviors through Q functions in 2t0sMGs. Based on it, we present a novel multi-agent reinforcement learning framework, Factorized Multi-Agent MiniMax Q-Learning (FM3Q), which can factorize the joint minimax Q function into individual ones and iteratively solve for the IGMM-satisfied minimax Q functions for 2t0sMGs. Moreover, an online learning algorithm with neural networks is proposed to implement FM3Q and obtain the deterministic and decentralized minimax policies for two-team players. A theoretical analysis is provided to prove the convergence of FM3Q. Empirically, we use three environments to evaluate the learning efficiency and final performance of FM3Q and show its superiority on 2t0sMGs. △ Less

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2402.00068 [pdf, other]

GPT4Battery: An LLM-driven Framework for Adaptive State of Health Estimation of Raw Li-ion Batteries

Authors: Yuyuan Feng, Guosheng Hu, Zhihong Zhang

Abstract: State of health (SOH) is a crucial indicator for assessing the degradation level of batteries that cannot be measured directly but requires estimation. Accurate SOH estimation enhances detection, control, and feedback for Li-ion batteries, allowing for safe and efficient energy management and guiding the development of new-generation batteries. Despite the significant progress in data-driven SOH e… ▽ More State of health (SOH) is a crucial indicator for assessing the degradation level of batteries that cannot be measured directly but requires estimation. Accurate SOH estimation enhances detection, control, and feedback for Li-ion batteries, allowing for safe and efficient energy management and guiding the development of new-generation batteries. Despite the significant progress in data-driven SOH estimation, the time and resource-consuming degradation experiments for generating lifelong training data pose a challenge in establishing one large model capable of handling diverse types of Li-ion batteries, e.g., cross-chemistry, cross-manufacturer, and cross-capacity. Hence, this paper utilizes the strong generalization capability of large language model (LLM) to proposes a novel framework for adaptable SOH estimation across diverse batteries. To match the real scenario where unlabeled data sequentially arrives in use with distribution shifts, the proposed model is modified by a test-time training technique to ensure estimation accuracy even at the battery's end of life. The validation results demonstrate that the proposed framework achieves state-of-the-art accuracy on four widely recognized datasets collected from 62 batteries. Furthermore, we analyze the theoretical challenges of cross-battery estimation and provide a quantitative explanation of the effectiveness of our method. △ Less

Submitted 30 January, 2024; originally announced February 2024.

arXiv:2401.09923 [pdf, other]

doi 10.1609/aaai.v35i3.16365

MAMBA: Multi-level Aggregation via Memory Bank for Video Object Detection

Authors: Guanxiong Sun, Yang Hua, Guosheng Hu, Neil Robertson

Abstract: State-of-the-art video object detection methods maintain a memory structure, either a sliding window or a memory queue, to enhance the current frame using attention mechanisms. However, we argue that these memory structures are not efficient or sufficient because of two implied operations: (1) concatenating all features in memory for enhancement, leading to a heavy computational cost; (2) frame-wi… ▽ More State-of-the-art video object detection methods maintain a memory structure, either a sliding window or a memory queue, to enhance the current frame using attention mechanisms. However, we argue that these memory structures are not efficient or sufficient because of two implied operations: (1) concatenating all features in memory for enhancement, leading to a heavy computational cost; (2) frame-wise memory updating, preventing the memory from capturing more temporal information. In this paper, we propose a multi-level aggregation architecture via memory bank called MAMBA. Specifically, our memory bank employs two novel operations to eliminate the disadvantages of existing methods: (1) light-weight key-set construction which can significantly reduce the computational cost; (2) fine-grained feature-wise updating strategy which enables our method to utilize knowledge from the whole video. To better enhance features from complementary levels, i.e., feature maps and proposals, we further propose a generalized enhancement operation (GEO) to aggregate multi-level features in a unified manner. We conduct extensive evaluations on the challenging ImageNetVID dataset. Compared with existing state-of-the-art methods, our method achieves superior performance in terms of both speed and accuracy. More remarkably, MAMBA achieves mAP of 83.7/84.6% at 12.6/9.1 FPS with ResNet-101. Code is available at https://github.com/guanxiongsun/vfe.pytorch. △ Less

Submitted 1 February, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Comments: update code url https://github.com/guanxiongsun/vfe.pytorch

Journal ref: In Proceedings of the AAAI Conference on Artificial Intelligence 2021 (Vol. 35, No. 3, pp. 2620-2627)

arXiv:2312.13594 [pdf, other]

Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA

Authors: Chengen Lai, Shengli Song, Shiqi Meng, Jingyang Li, Sitong Yan, Guangneng Hu

Abstract: Natural language explanation in visual question answer (VQA-NLE) aims to explain the decision-making process of models by generating natural language sentences to increase users' trust in the black-box systems. Existing post-hoc methods have achieved significant progress in obtaining a plausible explanation. However, such post-hoc explanations are not always aligned with human logical inference, s… ▽ More Natural language explanation in visual question answer (VQA-NLE) aims to explain the decision-making process of models by generating natural language sentences to increase users' trust in the black-box systems. Existing post-hoc methods have achieved significant progress in obtaining a plausible explanation. However, such post-hoc explanations are not always aligned with human logical inference, suffering from the issues on: 1) Deductive unsatisfiability, the generated explanations do not logically lead to the answer; 2) Factual inconsistency, the model falsifies its counterfactual explanation for answers without considering the facts in images; and 3) Semantic perturbation insensitivity, the model can not recognize the semantic changes caused by small perturbations. These problems reduce the faithfulness of explanations generated by models. To address the above issues, we propose a novel self-supervised \textbf{M}ulti-level \textbf{C}ontrastive \textbf{L}earning based natural language \textbf{E}xplanation model (MCLE) for VQA with semantic-level, image-level, and instance-level factual and counterfactual samples. MCLE extracts discriminative features and aligns the feature spaces from explanations with visual question and answer to generate more consistent explanations. We conduct extensive experiments, ablation analysis, and case study to demonstrate the effectiveness of our method on two VQA-NLE benchmarks. △ Less

Submitted 21 December, 2023; originally announced December 2023.

Comments: AAAI 2024

arXiv:2312.05763 [pdf, ps, other]

Fluid Antennas-Enabled Multiuser Uplink: A Low-Complexity Gradient Descent for Total Transmit Power Minimization

Authors: Guojie Hu, Qingqing Wu, Kui Xu, Jian Ouyang, Jiangbo Si, Yunlong Cai, Naofal Al-Dhahir

Abstract: We investigate multiuser uplink communication from multiple single-antenna users to a base station (BS), which is equipped with a movable-antenna (MA) array and adopts zero-forcing receivers to decode multiple signals. We aim to optimize the MAs' positions at the BS, to minimize the total transmit power of all users subject to the minimum rate requirement. After applying transformations, we show t… ▽ More We investigate multiuser uplink communication from multiple single-antenna users to a base station (BS), which is equipped with a movable-antenna (MA) array and adopts zero-forcing receivers to decode multiple signals. We aim to optimize the MAs' positions at the BS, to minimize the total transmit power of all users subject to the minimum rate requirement. After applying transformations, we show that the problem is equivalent to minimizing the sum of each eigenvalue's reciprocal of a matrix, which is a function of all MAs' positions. Subsequently, the projected gradient descent (PGD) method is utilized to find a locally optimal solution. In particular, different from the latest related work, we exploit the eigenvalue decomposition to successfully derive a closed-form gradient for the PGD, which facilitates the practical implementation greatly. We demonstrate by simulations that via careful optimization for all MAs' positions in our proposed design, the total transmit power of all users can be decreased significantly as compared to competitive benchmarks. △ Less

Submitted 8 January, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

arXiv:2312.05524 [pdf]

doi 10.1007/s11192-023-04917-w

Web of Science Core Collection's coverage expansion:The forgotten Arts & Humanities Citation Index?

Authors: Weishu Liu, Rong Ni, Guangyuan Hu

Abstract: The expansion of Web of Science Core Collection (WoSCC) over the recent years has partially accounted for the "norm" of growth of research output in many bibliometric analysis studies. However, the expansion patterns of different citation indexes may be different, which may benefit some disciplines but hinder others. Utilizing Science Citation Index Expanded (SCIE), Social Sciences Citation Index… ▽ More The expansion of Web of Science Core Collection (WoSCC) over the recent years has partially accounted for the "norm" of growth of research output in many bibliometric analysis studies. However, the expansion patterns of different citation indexes may be different, which may benefit some disciplines but hinder others. Utilizing Science Citation Index Expanded (SCIE), Social Sciences Citation Index (SSCI), and Arts & Humanities Citation Index (A&HCI), this study attempts to elaborate on WoSCC's coverage expansion patterns among these three databases from 2001 to 2020. Results show that different from SCIE/SSCI, both the annual publication volumes in the A&HCI database and all A&HCI journals have remained relatively stagnant in all document types considered scenario or have gained relatively slight increases in only citable items considered scenario. Although the number of A&HCI journals also has increased remarkably, the average journal publication volume of A&HCI journals has decreased gradually if all document types are considered or kept relatively stagnant when citable items only are considered. Besides, the A&HCI database has ceased the systematic index of individually selected items from SCIE/SSCI journals since 2018. The study finally discusses the possible causes and consequences of the unbalanced expansion of WoSCC's different citation indexes. △ Less

Submitted 9 December, 2023; originally announced December 2023.

Comments: Forthcoming in Scientometrics

Journal ref: Scientometrics 129, 933-955 (2024)

arXiv:2311.11814 [pdf, ps, other]

Movable-Antenna-Array-Enabled Communications with CoMP Reception

Authors: Guojie Hu, Qingqing Wu, Jian Ouyang, Kui Xu, Yunlong Cai, Naofal Al-Dhahir

Abstract: We consider the movable-antenna (MA) arrayenabled wireless communication with coordinate multi-point (CoMP) reception, where multiple destinations adopt the maximal ratio combination technique to jointly decode the common message sent from the transmitter equipped with the MA array. Our goal is to maximize the effective received signal-to-noise ratio, by jointly optimizing the transmit beamforming… ▽ More We consider the movable-antenna (MA) arrayenabled wireless communication with coordinate multi-point (CoMP) reception, where multiple destinations adopt the maximal ratio combination technique to jointly decode the common message sent from the transmitter equipped with the MA array. Our goal is to maximize the effective received signal-to-noise ratio, by jointly optimizing the transmit beamforming and the positions of the MA array. Although the formulated problem is highly non-convex, we reveal that it is fundamental to maximize the principal eigenvalue of a hermite channel matrix which is a function of the positions of the MA array. The corresponding sub-problem is still non-convex, for which we develop a computationally efficient algorithm. Afterwards, the optimal transmit beamforming is determined with a closed-form solution. In addition, the theoretical performance upper bound is analyzed. Since the MA array brings an additional spatial degree of freedom by flexibly adjusting all antennas' positions, it achieves significant performance gain compared to competitive benchmarks. △ Less

Submitted 25 January, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

arXiv:2311.07104 [pdf, ps, other]

Secure Wireless Communication via Movable-Antenna Array

Authors: Guojie Hu, Qingqing Wu, Kui Xu, Jiangbo Si, Naofal Al-Dhahir

Abstract: Movable antenna (MA) array is a novel technology recently developed where positions of transmit/receive antennas can be flexibly adjusted in the specified region to reconfigure the wireless channel and achieve a higher capacity. In this letter, we, for the first time, investigate the MA array-assisted physical-layer security where the confidential information is transmitted from a MA array-enabled… ▽ More Movable antenna (MA) array is a novel technology recently developed where positions of transmit/receive antennas can be flexibly adjusted in the specified region to reconfigure the wireless channel and achieve a higher capacity. In this letter, we, for the first time, investigate the MA array-assisted physical-layer security where the confidential information is transmitted from a MA array-enabled Alice to a single-antenna Bob, in the presence of multiple single-antenna and colluding eavesdroppers. We aim to maximize the achievable secrecy rate by jointly designing the transmit beamforming and positions of all antennas at Alice subject to the transmit power budget and specified regions for positions of all transmit antennas. The resulting problem is highly non-convex, for which the projected gradient ascent (PGA) and the alternating optimization methods are utilized to obtain a high-quality suboptimal solution. Simulation results demonstrate that since the additional spatial degree of freedom (DoF) can be fully exploited, the MA array significantly enhances the secrecy rate compared to the conventional fixed-position antenna (FPA) array. △ Less

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.02376 [pdf, ps, other]

Intelligent Reflecting Surface-Aided Wireless Communication with Movable Elements

Authors: Guojie Hu, Qingqing Wu, Dognhui Xu, Kui Xu, Jiangbo Si, Yunlong Cai, Naofal Al-Dhahir

Abstract: Intelligent reflecting surface (IRS) has been recognized as a powerful technology for boosting communication performance. To reduce manufacturing and control costs, it is preferable to consider discrete phase shifts (DPSs) for IRS, which are set by default as uniformly distributed in the range of $[ - π,π)$ in the literature. Such setting, however, cannot achieve a desirable performance over the g… ▽ More Intelligent reflecting surface (IRS) has been recognized as a powerful technology for boosting communication performance. To reduce manufacturing and control costs, it is preferable to consider discrete phase shifts (DPSs) for IRS, which are set by default as uniformly distributed in the range of $[ - π,π)$ in the literature. Such setting, however, cannot achieve a desirable performance over the general Rician fading where the channel phase concentrates in a narrow range with a higher probability. Motivated by this drawback, we in this paper design optimal non-uniform DPSs for IRS to achieve a desirable performance level. The fundamental challenge is the \textit{possible offset in phase distribution across different cascaded source-element-destination channels}, if adopting conventional IRS where the position of each element is fixed. Such phenomenon leads to different patterns of optimal non-uniform DPSs for each IRS element and thus causes huge manufacturing costs especially when the number of IRS elements is large. Driven by the recently emerging fluid antenna system (or movable antenna technology), we demonstrate that if the position of each IRS element can be flexibly adjusted, the above phase distribution offset can be surprisingly eliminated, leading to the same pattern of DPSs for each IRS element. Armed with this, we then determine the form of unified non-uniform DPSs based on a low-complexity iterative algorithm. Simulations show that our proposed design significantly improves the system performance compared to competitive benchmarks. △ Less

Submitted 4 November, 2023; originally announced November 2023.

arXiv:2309.16172 [pdf, other]

Random and Safe Cache Architecture to Defeat Cache Timing Attacks

Authors: Guangyuan Hu, Ruby B. Lee

Abstract: Caches have been exploited to leak secret information due to the different times they take to handle memory accesses. Cache timing attacks include non-speculative cache side and covert channel attacks and cache-based speculative execution attacks. We first present a systematic view of the attack and defense space and show that no existing defense has addressed all cache timing attacks, which we do… ▽ More Caches have been exploited to leak secret information due to the different times they take to handle memory accesses. Cache timing attacks include non-speculative cache side and covert channel attacks and cache-based speculative execution attacks. We first present a systematic view of the attack and defense space and show that no existing defense has addressed all cache timing attacks, which we do in this paper. We propose Random and Safe (RaS) cache architectures to decorrelate cache state changes from memory requests. RaS fills the cache with ``safe'' cache lines that are likely to be used in the future, rather than with demand-fetched, security-sensitive lines. RaS lifts the restriction on cache fills for accesses that become safe when speculative execution is resolved and authorized. Our RaS-Spec design against cache-based speculative execution attacks has a low 3.8% average performance overhead. RaS+ variants against both speculative and non-speculative attacks have security-performance trade-offs ranging from 7.9% to 45.2% average overhead. △ Less

Submitted 21 April, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.08835 [pdf]

Intelligent machines work in unstructured environments by differential neuromorphic computing

Authors: Shengbo Wang, Shuo Gao, Chenyu Tang, Edoardo Occhipinti, Cong Li, Shurui Wang, Jiaqi Wang, Hubin Zhao, Guohua Hu, Arokia Nathan, Ravinder Dahiya, Luigi Occhipinti

Abstract: Efficient operation of intelligent machines in the real world requires methods that allow them to understand and predict the uncertainties presented by the unstructured environments with good accuracy, scalability and generalization, similar to humans. Current methods rely on pretrained networks instead of continuously learning from the dynamic signal properties of working environments and suffer… ▽ More Efficient operation of intelligent machines in the real world requires methods that allow them to understand and predict the uncertainties presented by the unstructured environments with good accuracy, scalability and generalization, similar to humans. Current methods rely on pretrained networks instead of continuously learning from the dynamic signal properties of working environments and suffer inherent limitations, such as data-hungry procedures, and limited generalization capabilities. Herein, we present a memristor-based differential neuromorphic computing, perceptual signal processing and learning method for intelligent machines. The main features of environmental information such as amplification (>720%) and adaptation (<50%) of mechanical stimuli encoded in memristors, are extracted to obtain human-like processing in unstructured environments. The developed method takes advantage of the intrinsic multi-state property of memristors and exhibits good scalability and generalization, as confirmed by validation in two different application scenarios: object grasping and autonomous driving. In the former, a robot hand experimentally realizes safe and stable grasping through fast learning (in ~1 ms) the unknown object features (e.g., sharp corner and smooth surface) with a single memristor. In the latter, the decision-making information of 10 unstructured environments in autonomous driving (e.g., overtaking cars, pedestrians) is accurately (94%) extracted with a 40*25 memristor array. By mimicking the intrinsic nature of human low-level perception mechanisms, the electronic memristive neuromorphic circuit-based method, presented here shows the potential for adapting to diverse sensing technologies and helping intelligent machines generate smart high-level decisions in the real world. △ Less

Submitted 17 November, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

Comments: 16 pages, 5 figures

arXiv:2308.02815 [pdf, other]

The changing rule of human bone density with aging based on a novel definition and mensuration of bone density with computed tomography

Authors: Linmi Tao, Ruiyang Liu, Yuanbiao Wang, Yuezhi Zhou, Li Huo, Guilan Hu, Xiangsong Zhang, Zuo-Xiang He

Abstract: Osteoporosis and fragility fractures have emerged as major public health concerns in an aging population. However, measuring age-related changes in bone density using dual-energy X-ray absorptiometry has limited personalized risk assessment due to susceptibility to interference from various factors. In this study, we propose an innovative statistical model of bone pixel distribution in fine-segmen… ▽ More Osteoporosis and fragility fractures have emerged as major public health concerns in an aging population. However, measuring age-related changes in bone density using dual-energy X-ray absorptiometry has limited personalized risk assessment due to susceptibility to interference from various factors. In this study, we propose an innovative statistical model of bone pixel distribution in fine-segmented computed tomography (CT) images, along with a novel approach to measuring bone density based on CT values of bone pixels. Our findings indicate that bone density exhibits a linear decline with age during adulthood between the ages of 39 and 80, with the rate of decline being approximately 1.6 times faster in women than in men. This contradicts the widely accepted notion that bone density starts declining in women at menopause and in men at around 50 years of age. The linearity of age-related changes provides further insights into the dynamics of the aging human body. Consequently, our findings suggest that the definition of osteoporosis by the World Health Organization should be revised to the standard deviation of age-based bone density. Furthermore, these results open up new avenues for research in bone health care and clinical investigation of osteoporosis. △ Less

Submitted 5 August, 2023; originally announced August 2023.

arXiv:2307.01995 [pdf, other]

Dynamic Feature-based Deep Reinforcement Learning for Flow Control of Circular Cylinder with Sparse Surface Pressure Sensing

Authors: Qiulei Wang, Lei Yan, Gang Hu, Wenli Chen, Bernd R. Noack

Abstract: This study proposes a self-learning algorithm for closed-loop cylinder wake control targeting lower drag and lower lift fluctuations with the additional challenge of sparse sensor information, taking deep reinforcement learning as the starting point. DRL performance is significantly improved by lifting the sensor signals to dynamic features (DF), which predict future flow states. The resulting dyn… ▽ More This study proposes a self-learning algorithm for closed-loop cylinder wake control targeting lower drag and lower lift fluctuations with the additional challenge of sparse sensor information, taking deep reinforcement learning as the starting point. DRL performance is significantly improved by lifting the sensor signals to dynamic features (DF), which predict future flow states. The resulting dynamic feature-based DRL (DF-DRL) automatically learns a feedback control in the plant without a dynamic model. Results show that the drag coefficient of the DF-DRL model is 25% less than the vanilla model based on direct sensor feedback. More importantly, using only one surface pressure sensor, DF-DRL can reduce the drag coefficient to a state-of-the-art performance of about 8% at Re = 100 and significantly mitigate lift coefficient fluctuations. Hence, DF-DRL allows the deployment of sparse sensing of the flow without degrading the control performance. This method also shows good robustness in controlling flow under higher Reynolds numbers, which reduces the drag coefficient by 32.2% and 46.55% at Re = 500 and 1000, respectively, indicating the broad applicability of the method. Since surface pressure information is more straightforward to measure in realistic scenarios than flow velocity information, this study provides a valuable reference for experimentally designing the active flow control of a circular cylinder based on wall pressure signals, which is an essential step toward further developing intelligent control in realistic multi-input multi-output (MIMO) system. △ Less

Submitted 28 July, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

arXiv:2306.14646 [pdf, other]

Multi-View Attention Learning for Residual Disease Prediction of Ovarian Cancer

Authors: Xiangneng Gao, Shulan Ruan, Jun Shi, Guoqing Hu, Wei Wei

Abstract: In the treatment of ovarian cancer, precise residual disease prediction is significant for clinical and surgical decision-making. However, traditional methods are either invasive (e.g., laparoscopy) or time-consuming (e.g., manual analysis). Recently, deep learning methods make many efforts in automatic analysis of medical images. Despite the remarkable progress, most of them underestimated the im… ▽ More In the treatment of ovarian cancer, precise residual disease prediction is significant for clinical and surgical decision-making. However, traditional methods are either invasive (e.g., laparoscopy) or time-consuming (e.g., manual analysis). Recently, deep learning methods make many efforts in automatic analysis of medical images. Despite the remarkable progress, most of them underestimated the importance of 3D image information of disease, which might brings a limited performance for residual disease prediction, especially in small-scale datasets. To this end, in this paper, we propose a novel Multi-View Attention Learning (MuVAL) method for residual disease prediction, which focuses on the comprehensive learning of 3D Computed Tomography (CT) images in a multi-view manner. Specifically, we first obtain multi-view of 3D CT images from transverse, coronal and sagittal views. To better represent the image features in a multi-view manner, we further leverage attention mechanism to help find the more relevant slices in each view. Extensive experiments on a dataset of 111 patients show that our method outperforms existing deep-learning methods. △ Less

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.11027 [pdf, other]

JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving

Authors: Wayne Xin Zhao, Kun Zhou, Beichen Zhang, Zheng Gong, Zhipeng Chen, Yuanhang Zhou, Ji-Rong Wen, Jing Sha, Shijin Wang, Cong Liu, Guoping Hu

Abstract: Although pre-trained language models~(PLMs) have recently advanced the research progress in mathematical reasoning, they are not specially designed as a capable multi-task solver, suffering from high cost for multi-task deployment (\eg a model copy for a task) and inferior performance on complex mathematical problems in practical applications. To address these issues, in this paper, we propose \te… ▽ More Although pre-trained language models~(PLMs) have recently advanced the research progress in mathematical reasoning, they are not specially designed as a capable multi-task solver, suffering from high cost for multi-task deployment (\eg a model copy for a task) and inferior performance on complex mathematical problems in practical applications. To address these issues, in this paper, we propose \textbf{JiuZhang~2.0}, a unified Chinese PLM specially for multi-task mathematical problem solving. Our idea is to maintain a moderate-sized model and employ the \emph{cross-task knowledge sharing} to improve the model capacity in a multi-task setting. Specially, we construct a Mixture-of-Experts~(MoE) architecture for modeling mathematical text, so as to capture the common mathematical knowledge across tasks. For optimizing the MoE architecture, we design \emph{multi-task continual pre-training} and \emph{multi-task fine-tuning} strategies for multi-task adaptation. These training strategies can effectively decompose the knowledge from the task data and establish the cross-task sharing via expert networks. In order to further improve the general capacity of solving different complex tasks, we leverage large language models~(LLMs) as complementary models to iteratively refine the generated solution by our PLM, via in-context learning. Extensive experiments have demonstrated the effectiveness of our model. △ Less

Submitted 19 June, 2023; originally announced June 2023.

Comments: Accepted by KDD 2023 ADS track, the 2.0 version of JiuZhang (arxiv:2206.06315v1)

arXiv:2306.10042 [pdf, other]

A Pairing Enhancement Approach for Aspect Sentiment Triplet Extraction

Authors: Fan Yang, Mian Zhang, Gongzhen Hu, Xiabing Zhou

Abstract: Aspect Sentiment Triplet Extraction (ASTE) aims to extract the triplet of an aspect term, an opinion term, and their corresponding sentiment polarity from the review texts. Due to the complexity of language and the existence of multiple aspect terms and opinion terms in a single sentence, current models often confuse the connections between an aspect term and the opinion term describing it. To add… ▽ More Aspect Sentiment Triplet Extraction (ASTE) aims to extract the triplet of an aspect term, an opinion term, and their corresponding sentiment polarity from the review texts. Due to the complexity of language and the existence of multiple aspect terms and opinion terms in a single sentence, current models often confuse the connections between an aspect term and the opinion term describing it. To address this issue, we propose a pairing enhancement approach for ASTE, which incorporates contrastive learning during the training stage to inject aspect-opinion pairing knowledge into the triplet extraction model. Experimental results demonstrate that our approach performs well on four ASTE datasets (i.e., 14lap, 14res, 15res and 16res) compared to several related classical and state-of-the-art triplet extraction methods. Moreover, ablation studies conduct an analysis and verify the advantage of contrastive learning over other pairing enhancement approaches. △ Less

Submitted 11 June, 2023; originally announced June 2023.

Comments: 12 pages, 4 figures

arXiv:2305.12389 [pdf, other]

SHINE: Syntax-augmented Hierarchical Interactive Encoder for Zero-shot Cross-lingual Information Extraction

Authors: Jun-Yu Ma, Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu, Guoping Hu

Abstract: Zero-shot cross-lingual information extraction(IE) aims at constructing an IE model for some low-resource target languages, given annotations exclusively in some rich-resource languages. Recent studies based on language-universal features have shown their effectiveness and are attracting increasing attention. However, prior work has neither explored the potential of establishing interactions betwe… ▽ More Zero-shot cross-lingual information extraction(IE) aims at constructing an IE model for some low-resource target languages, given annotations exclusively in some rich-resource languages. Recent studies based on language-universal features have shown their effectiveness and are attracting increasing attention. However, prior work has neither explored the potential of establishing interactions between language-universal features and contextual representations nor incorporated features that can effectively model constituent span attributes and relationships between multiple spans. In this study, a syntax-augmented hierarchical interactive encoder (SHINE) is proposed to transfer cross-lingual IE knowledge. The proposed encoder is capable of interactively capturing complementary information between features and contextual information, to derive language-agnostic representations for various IE tasks. Concretely, a multi-level interaction network is designed to hierarchically interact the complementary information to strengthen domain adaptability. Besides, in addition to the well-studied syntax features of part-of-speech and dependency relation, a new syntax feature of constituency structure is introduced to model the constituent span information which is crucial for IE. Experiments across seven languages on three IE tasks and four benchmarks verify the effectiveness and generalization ability of the proposed method. △ Less

Submitted 21 May, 2023; originally announced May 2023.

Comments: 15pages

arXiv:2305.09360 [pdf, other]

GIFT: Graph-Induced Fine-Tuning for Multi-Party Conversation Understanding

Authors: Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu, Guoping Hu

Abstract: Addressing the issues of who saying what to whom in multi-party conversations (MPCs) has recently attracted a lot of research attention. However, existing methods on MPC understanding typically embed interlocutors and utterances into sequential information flows, or utilize only the superficial of inherent graph structures in MPCs. To this end, we present a plug-and-play and lightweight method nam… ▽ More Addressing the issues of who saying what to whom in multi-party conversations (MPCs) has recently attracted a lot of research attention. However, existing methods on MPC understanding typically embed interlocutors and utterances into sequential information flows, or utilize only the superficial of inherent graph structures in MPCs. To this end, we present a plug-and-play and lightweight method named graph-induced fine-tuning (GIFT) which can adapt various Transformer-based pre-trained language models (PLMs) for universal MPC understanding. In detail, the full and equivalent connections among utterances in regular Transformer ignore the sparse but distinctive dependency of an utterance on another in MPCs. To distinguish different relationships between utterances, four types of edges are designed to integrate graph-induced signals into attention mechanisms to refine PLMs originally designed for processing sequential texts. We evaluate GIFT by implementing it into three PLMs, and test the performance on three downstream tasks including addressee recognition, speaker identification and response selection. Experimental results show that GIFT can significantly improve the performance of three PLMs on three downstream tasks and two benchmarks with only 4 additional parameters per encoding layer, achieving new state-of-the-art performance on MPC understanding. △ Less

Submitted 17 July, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

Comments: Accepted by ACL 2023. arXiv admin note: substantial text overlap with arXiv:2106.01541

arXiv:2305.08586 [pdf, other]

GCNSLIM: Graph Convolutional Network with Sparse Linear Methods for E-government Service Recommendation

Authors: Lingyuan Kong, Hao Ding, Guangwei Hu

Abstract: Graph Convolutional Networks have made significant strides in Collabora-tive Filtering recommendations. However, existing GCN-based CF methods are mainly based on matrix factorization and incorporate some optimization tech-niques to enhance performance, which are not enough to handle the complexities of diverse real-world recommendation scenarios. E-government service recommendation is a crucial a… ▽ More Graph Convolutional Networks have made significant strides in Collabora-tive Filtering recommendations. However, existing GCN-based CF methods are mainly based on matrix factorization and incorporate some optimization tech-niques to enhance performance, which are not enough to handle the complexities of diverse real-world recommendation scenarios. E-government service recommendation is a crucial area for recommendation re-search as it involves rigid aspects of people's lives. However, it has not received ad-equate attention in comparison to other recommendation scenarios like news and music recommendation. We empirically find that when existing GCN-based CF methods are directly applied to e-government service recommendation, they are limited by the MF framework and showing poor performance. This is because MF's equal treatment of users and items is not appropriate for scenarios where the number of users and items is unbalanced. In this work, we propose a new model, GCNSLIM, which combines GCN and sparse linear methods instead of combining GCN and MF to accommodate e-government service recommendation. In particular, GCNSLIM explicitly injects high-order collaborative signals obtained from multi-layer light graph convolutions into the item similarity matrix in the SLIM frame-work, effectively improving the recommendation accuracy. In addition, we propose two optimization measures, removing layer 0 embedding and adding nonlinear acti-vation, to further adapt to the characteristics of e-government service recommenda-tion scenarios. Furthermore, we propose a joint optimization mode to adapt to more diverse recommendation scenarios. We conduct extensive experiments on a real e-government service dataset and a common public dataset and demonstrate the ef-fectiveness of GCNSLIM in recommendation accuracy and operational performance. △ Less

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2304.03864 [pdf, other]

SGDP: A Stream-Graph Neural Network Based Data Prefetcher

Authors: Yiyuan Yang, Rongshang Li, Qiquan Shi, Xijun Li, Gang Hu, Xing Li, Mingxuan Yuan

Abstract: Data prefetching is important for storage system optimization and access performance improvement. Traditional prefetchers work well for mining access patterns of sequential logical block address (LBA) but cannot handle complex non-sequential patterns that commonly exist in real-world applications. The state-of-the-art (SOTA) learning-based prefetchers cover more LBA accesses. However, they do not… ▽ More Data prefetching is important for storage system optimization and access performance improvement. Traditional prefetchers work well for mining access patterns of sequential logical block address (LBA) but cannot handle complex non-sequential patterns that commonly exist in real-world applications. The state-of-the-art (SOTA) learning-based prefetchers cover more LBA accesses. However, they do not adequately consider the spatial interdependencies between LBA deltas, which leads to limited performance and robustness. This paper proposes a novel Stream-Graph neural network-based Data Prefetcher (SGDP). Specifically, SGDP models LBA delta streams using a weighted directed graph structure to represent interactive relations among LBA deltas and further extracts hybrid features by graph neural networks for data prefetching. We conduct extensive experiments on eight real-world datasets. Empirical results verify that SGDP outperforms the SOTA methods in terms of the hit ratio by 6.21%, the effective prefetching ratio by 7.00%, and speeds up inference time by 3.13X on average. Besides, we generalize SGDP to different variants by different stream constructions, further expanding its application scenarios and demonstrating its robustness. SGDP offers a novel data prefetching solution and has been verified in commercial hybrid storage systems in the experimental phase. Our codes and appendix are available at https://github.com/yyysjz1997/SGDP/. △ Less

Submitted 11 October, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

Comments: Accepted by International Joint Conference on Neural Networks (IJCNN 2023)

arXiv:2303.12319 [pdf, other]

NeuronsMAE: A Novel Multi-Agent Reinforcement Learning Environment for Cooperative and Competitive Multi-Robot Tasks

Authors: Guangzheng Hu, Haoran Li, Shasha Liu, Mingjun Ma, Yuanheng Zhu, Dongbin Zhao

Abstract: Multi-agent reinforcement learning (MARL) has achieved remarkable success in various challenging problems. Meanwhile, more and more benchmarks have emerged and provided some standards to evaluate the algorithms in different fields. On the one hand, the virtual MARL environments lack knowledge of real-world tasks and actuator abilities, and on the other hand, the current task-specified multi-robot… ▽ More Multi-agent reinforcement learning (MARL) has achieved remarkable success in various challenging problems. Meanwhile, more and more benchmarks have emerged and provided some standards to evaluate the algorithms in different fields. On the one hand, the virtual MARL environments lack knowledge of real-world tasks and actuator abilities, and on the other hand, the current task-specified multi-robot platform has poor support for the generality of multi-agent reinforcement learning algorithms and lacks support for transferring from simulation to the real environment. Bridging the gap between the virtual MARL environments and the real multi-robot platform becomes the key to promoting the practicability of MARL algorithms. This paper proposes a novel MARL environment for real multi-robot tasks named NeuronsMAE (Neurons Multi-Agent Environment). This environment supports cooperative and competitive multi-robot tasks and is configured with rich parameter interfaces to study the multi-agent policy transfer from simulation to reality. With this platform, we evaluate various popular MARL algorithms and build a new MARL benchmark for multi-robot tasks. We hope that this platform will facilitate the research and application of MARL algorithms for real robot tasks. Information about the benchmark and the open-source code will be released. △ Less

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.05034 [pdf, other]

Multi-Stage Coarse-to-Fine Contrastive Learning for Conversation Intent Induction

Authors: Caiyuan Chu, Ya Li, Yifan Liu, Jia-Chen Gu, Quan Liu, Yongxin Ge, Guoping Hu

Abstract: Intent recognition is critical for task-oriented dialogue systems. However, for emerging domains and new services, it is difficult to accurately identify the key intent of a conversation due to time-consuming data annotation and comparatively poor model transferability. Therefore, the automatic induction of dialogue intention is very important for intelligent dialogue systems. This paper presents… ▽ More Intent recognition is critical for task-oriented dialogue systems. However, for emerging domains and new services, it is difficult to accurately identify the key intent of a conversation due to time-consuming data annotation and comparatively poor model transferability. Therefore, the automatic induction of dialogue intention is very important for intelligent dialogue systems. This paper presents our solution to Track 2 of Intent Induction from Conversations for Task-Oriented Dialogue at the Eleventh Dialogue System Technology Challenge (DSTC11). The essence of intention clustering lies in distinguishing the representation of different dialogue utterances. The key to automatic intention induction is that, for any given set of new data, the sentence representation obtained by the model can be well distinguished from different labels. Therefore, we propose a multi-stage coarse-to-fine contrastive learning model training scheme including unsupervised contrastive learning pre-training, supervised contrastive learning pre-training, and fine-tuning with joint contrastive learning and clustering to obtain a better dialogue utterance representation model for the clustering task. In the released DSTC11 Track 2 evaluation results, our proposed system ranked first on both of the two subtasks of this Track. △ Less

Submitted 8 March, 2023; originally announced March 2023.

Comments: Ranked 1st on Track 2 at DSTC 11, Accepted by DSTC 11 Workshop

arXiv:2302.12417 [pdf, other]

Emotion Prediction Oriented method with Multiple Supervisions for Emotion-Cause Pair Extraction

Authors: Guimin Hu, Yi Zhao, Guangming Lu

Abstract: Emotion-cause pair extraction (ECPE) task aims to extract all the pairs of emotions and their causes from an unannotated emotion text. The previous works usually extract the emotion-cause pairs from two perspectives of emotion and cause. However, emotion extraction is more crucial to the ECPE task than cause extraction. Motivated by this analysis, we propose an end-to-end emotion-cause extraction… ▽ More Emotion-cause pair extraction (ECPE) task aims to extract all the pairs of emotions and their causes from an unannotated emotion text. The previous works usually extract the emotion-cause pairs from two perspectives of emotion and cause. However, emotion extraction is more crucial to the ECPE task than cause extraction. Motivated by this analysis, we propose an end-to-end emotion-cause extraction approach oriented toward emotion prediction (EPO-ECPE), aiming to fully exploit the potential of emotion prediction to enhance emotion-cause pair extraction. Considering the strong dependence between emotion prediction and emotion-cause pair extraction, we propose a synchronization mechanism to share their improvement in the training process. That is, the improvement of emotion prediction can facilitate the emotion-cause pair extraction, and then the results of emotion-cause pair extraction can also be used to improve the accuracy of emotion prediction simultaneously. For the emotion-cause pair extraction, we divide it into genuine pair supervision and fake pair supervision, where the genuine pair supervision learns from the pairs with more possibility to be emotion-cause pairs. In contrast, fake pair supervision learns from other pairs. In this way, the emotion-cause pairs can be extracted directly from the genuine pair, thereby reducing the difficulty of extraction. Experimental results show that our approach outperforms the 13 compared systems and achieves new state-of-the-art performance. △ Less

Submitted 23 February, 2023; originally announced February 2023.

Comments: accepted by TASLP

arXiv:2302.10756 [pdf, other]

doi 10.1109/TGRS.2023.3277973

Unsupervised Seismic Footprint Removal With Physical Prior Augmented Deep Autoencoder

Authors: Feng Qian, Yuehua Yue, Yu He, Hongtao Yu, Yingjie Zhou, Jinliang Tang, Guangmin Hu

Abstract: Seismic acquisition footprints appear as stably faint and dim structures and emerge fully spatially coherent, causing inevitable damage to useful signals during the suppression process. Various footprint removal methods, including filtering and sparse representation (SR), have been reported to attain promising results for surmounting this challenge. However, these methods, e.g., SR, rely solely on… ▽ More Seismic acquisition footprints appear as stably faint and dim structures and emerge fully spatially coherent, causing inevitable damage to useful signals during the suppression process. Various footprint removal methods, including filtering and sparse representation (SR), have been reported to attain promising results for surmounting this challenge. However, these methods, e.g., SR, rely solely on the handcrafted image priors of useful signals, which is sometimes an unreasonable demand if complex geological structures are contained in the given seismic data. As an alternative, this article proposes a footprint removal network (dubbed FR-Net) for the unsupervised suppression of acquired footprints without any assumptions regarding valuable signals. The key to the FR-Net is to design a unidirectional total variation (UTV) model for footprint acquisition according to the intrinsically directional property of noise. By strongly regularizing a deep convolutional autoencoder (DCAE) using the UTV model, our FR-Net transforms the DCAE from an entirely data-driven model to a \textcolor{black}{prior-augmented} approach, inheriting the superiority of the DCAE and our footprint model. Subsequently, the complete separation of the footprint noise and useful signals is projected in an unsupervised manner, specifically by optimizing the FR-Net via the backpropagation (BP) algorithm. We provide qualitative and quantitative evaluations conducted on three synthetic and field datasets, demonstrating that our FR-Net surpasses the previous state-of-the-art (SOTA) methods. △ Less

Submitted 8 February, 2023; originally announced February 2023.

Report number: 2302.10756

Journal ref: IEEE Transactions on Geoscience and Remote Sensing,2023

arXiv:2302.10747 [pdf, other]

doi 10.1109/ICC45041.2023.10279434

Clustered Data Sharing for Non-IID Federated Learning over Wireless Networks

Authors: Gang Hu, Yinglei Teng, Nan Wang, F. Richard Yu

Abstract: Federated Learning (FL) is a novel distributed machine learning approach to leverage data from Internet of Things (IoT) devices while maintaining data privacy. However, the current FL algorithms face the challenges of non-independent and identically distributed (non-IID) data, which causes high communication costs and model accuracy declines. To address the statistical imbalances in FL, we propose… ▽ More Federated Learning (FL) is a novel distributed machine learning approach to leverage data from Internet of Things (IoT) devices while maintaining data privacy. However, the current FL algorithms face the challenges of non-independent and identically distributed (non-IID) data, which causes high communication costs and model accuracy declines. To address the statistical imbalances in FL, we propose a clustered data sharing framework which spares the partial data from cluster heads to credible associates through device-to-device (D2D) communication. Moreover, aiming at diluting the data skew on nodes, we formulate the joint clustering and data sharing problem based on the privacy-preserving constrained graph. To tackle the serious coupling of decisions on the graph, we devise a distribution-based adaptive clustering algorithm (DACA) basing on three deductive cluster-forming conditions, which ensures the maximum yield of data sharing. The experiments show that the proposed framework facilitates FL on non-IID datasets with better convergence and model accuracy under a limited communication environment. △ Less

Submitted 1 March, 2023; v1 submitted 17 February, 2023; originally announced February 2023.

ACM Class: I.2.11; C.2.4

Journal ref: ICC 2023 - IEEE International Conference on Communications

arXiv:2302.00732 [pdf, other]

Protecting Cache States Against Both Speculative Execution Attacks and Side-channel Attacks

Authors: Guangyuan Hu, Ruby B. Lee

Abstract: Hardware caches are essential performance optimization features in modern processors to reduce the effective memory access time. Unfortunately, they are also the prime targets for attacks on computer processors because they are high-bandwidth and reliable side or covert channels for leaking secrets. Conventional cache timing attacks typically leak secret encryption keys, while recent speculative e… ▽ More Hardware caches are essential performance optimization features in modern processors to reduce the effective memory access time. Unfortunately, they are also the prime targets for attacks on computer processors because they are high-bandwidth and reliable side or covert channels for leaking secrets. Conventional cache timing attacks typically leak secret encryption keys, while recent speculative execution attacks typically leak arbitrary illegally-obtained secrets through cache timing channels. While many hardware defenses have been proposed for each class of attacks, we show that those for conventional (non-speculative) cache timing channels do not work for all speculative execution attacks, and vice versa. We maintain that a cache is not secure unless it can defend against both of these major attack classes. We propose a new methodology and framework for covering such relatively large attack surfaces to produce a Speculative and Timing Attack Resilient (STAR) cache subsystem. We use this to design two comprehensive secure cache architectures, STAR-FARR and STAR-NEWS, that have very low performance overheads of 5.6% and 6.8%, respectively. To the best of our knowledge, these are the first secure cache designs that cover both non-speculative cache side channels and cache-based speculative execution attacks. Our methodology can be used to compose and check other secure cache designs. It can also be extended to other attack classes and hardware systems. Additionally, we also highlight the intrinsic security and performance benefits of a randomized cache like a real Fully Associative cache with Random Replacement (FARR) and a lower-latency, speculation-aware version (NEWS). △ Less

Submitted 4 June, 2023; v1 submitted 1 February, 2023; originally announced February 2023.

arXiv:2301.03822 [pdf, ps, other]

Transmission Design for Active RIS-Aided Simultaneous Wireless Information and Power Transfer

Authors: Hong Ren, Zhiwei Chen, Guosheng Hu, Zhangjie Peng, Cunhua Pan, Jiangzhou Wang

Abstract: Reconfigurable intelligent surface (RIS) is a revolutionary technology to enhance both the spectral efficiency and energy efficiency of wireless communication systems. However, most of the existing contributions mainly focused on the study of passive RIS, which suffers from the ``double fading'' effect. On the other hand, active RIS, which is equipped with amplifiers, can effectively address this… ▽ More Reconfigurable intelligent surface (RIS) is a revolutionary technology to enhance both the spectral efficiency and energy efficiency of wireless communication systems. However, most of the existing contributions mainly focused on the study of passive RIS, which suffers from the ``double fading'' effect. On the other hand, active RIS, which is equipped with amplifiers, can effectively address this issue. In this paper, we propose an active RIS-aided simultaneous wireless information and power transfer (SWIPT) system. Specifically, we maximize the weighted sum rate of the information receivers, subject to the minimum power received at all energy receivers, amplification power constraint at the active RIS, and the maximum transmit power constraint at the base station (BS). By adopting alternating optimization framework, suboptimal solutions are obtained. Simulation results show that the active RIS-aided SWIPT system has higher performance gain with the same power budget. △ Less

Submitted 10 January, 2023; originally announced January 2023.

Comments: Accepted by IEEE Wireless Communications Letters. Keywords: Reconfigurable intelligent surface (RIS), active RIS, wireless information and power transfer (SWIPT)

arXiv:2301.03724 [pdf, other]

doi 10.1109/SEED51797.2021.00023

SoK: Hardware Defenses Against Speculative Execution Attacks

Authors: Guangyuan Hu, Zecheng He, Ruby Lee

Abstract: Speculative execution attacks leverage the speculative and out-of-order execution features in modern computer processors to access secret data or execute code that should not be executed. Secret information can then be leaked through a covert channel. While software patches can be installed for mitigation on existing hardware, these solutions can incur big performance overhead. Hardware mitigation… ▽ More Speculative execution attacks leverage the speculative and out-of-order execution features in modern computer processors to access secret data or execute code that should not be executed. Secret information can then be leaked through a covert channel. While software patches can be installed for mitigation on existing hardware, these solutions can incur big performance overhead. Hardware mitigation is being studied extensively by the computer architecture community. It has the benefit of preserving software compatibility and the potential for much smaller performance overhead than software solutions. This paper presents a systematization of the hardware defenses against speculative execution attacks that have been proposed. We show that speculative execution attacks consist of 6 critical attack steps. We propose defense strategies, each of which prevents a critical attack step from happening, thus preventing the attack from succeeding. We then summarize 20 hardware defenses and overhead-reducing features that have been proposed. We show that each defense proposed can be classified under one of our defense strategies, which also explains why it can thwart the attack from succeeding. We discuss the scope of the defenses, their performance overhead, and the security-performance trade-offs that can be made. △ Less

Submitted 9 January, 2023; originally announced January 2023.

Comments: Accepted to 2021 International Symposium on Secure and Private Execution Environment Design (SEED)

Journal ref: 2021 International Symposium on Secure and Private Execution Environment Design (SEED), 2021, pp. 108-120

arXiv:2211.11256 [pdf, other]

UniMSE: Towards Unified Multimodal Sentiment Analysis and Emotion Recognition

Authors: Guimin Hu, Ting-En Lin, Yi Zhao, Guangming Lu, Yuchuan Wu, Yongbin Li

Abstract: Multimodal sentiment analysis (MSA) and emotion recognition in conversation (ERC) are key research topics for computers to understand human behaviors. From a psychological perspective, emotions are the expression of affect or feelings during a short period, while sentiments are formed and held for a longer period. However, most existing works study sentiment and emotion separately and do not fully… ▽ More Multimodal sentiment analysis (MSA) and emotion recognition in conversation (ERC) are key research topics for computers to understand human behaviors. From a psychological perspective, emotions are the expression of affect or feelings during a short period, while sentiments are formed and held for a longer period. However, most existing works study sentiment and emotion separately and do not fully exploit the complementary knowledge behind the two. In this paper, we propose a multimodal sentiment knowledge-sharing framework (UniMSE) that unifies MSA and ERC tasks from features, labels, and models. We perform modality fusion at the syntactic and semantic levels and introduce contrastive learning between modalities and samples to better capture the difference and consistency between sentiments and emotions. Experiments on four public benchmark datasets, MOSI, MOSEI, MELD, and IEMOCAP, demonstrate the effectiveness of the proposed method and achieve consistent improvements compared with state-of-the-art methods. △ Less

Submitted 21 November, 2022; originally announced November 2022.

Comments: Accepted to EMNLP 2022 main conference

arXiv:2210.12361 [pdf]

doi 10.2147/JMDH.S417068

MS-DCANet: A Novel Segmentation Network For Multi-Modality COVID-19 Medical Images

Authors: Xiaoyu Pan, Huazheng Zhu, Jinglong Du, Guangtao Hu, Baoru Han, Yuanyuan Jia

Abstract: The Coronavirus Disease 2019 (COVID-19) pandemic has increased the public health burden and brought profound disaster to humans. For the particularity of the COVID-19 medical images with blurred boundaries, low contrast and different infection sites, some researchers have improved the accuracy by adding more complexity. Also, they overlook the complexity of lesions, which hinder their ability to c… ▽ More The Coronavirus Disease 2019 (COVID-19) pandemic has increased the public health burden and brought profound disaster to humans. For the particularity of the COVID-19 medical images with blurred boundaries, low contrast and different infection sites, some researchers have improved the accuracy by adding more complexity. Also, they overlook the complexity of lesions, which hinder their ability to capture the relationship between segmentation sites and the background, as well as the edge contours and global context. However, increasing the computational complexity, parameters and inference speed is unfavorable for model transfer from laboratory to clinic. A perfect segmentation network needs to balance the above three factors completely. To solve the above issues, this paper propose a symmetric automatic segmentation framework named MS-DCANet. We introduce Tokenized MLP block, a novel attention scheme that use a shift-window mechanism to conditionally fuse local and global features to get more continuous boundaries and spatial positioning capabilities. It has greater understanding of irregular lesions contours. MS-DCANet also uses several Dual Channel blocks and a Res-ASPP block to improve the ability to recognize small targets. On multi-modality COVID-19 tasks, MS-DCANet achieved state-of-the-art performance compared with other baselines. It can well trade off the accuracy and complexity. To prove the strong generalization ability of our proposed model, we apply it to other tasks (ISIC 2018 and BAA) and achieve satisfactory results. △ Less

Submitted 19 July, 2023; v1 submitted 22 October, 2022; originally announced October 2022.

Comments: 21pages,13 figures,9 tables

Journal ref: J Multidiscip Healthc. 2023;16:2023-2043

Showing 1–50 of 180 results for author: Hu, G