Smart Reaction Templating: A Graph-Based Method for Automated Molecular Dynamics Input Generation

Abstract: Accurately modeling chemical reactions in molecular dynamics simulations requires detailed pre- and post-reaction templates, often created through labor-intensive manual workflows. This work introduces a Python-based algorithm that automates the generation of reaction templates for the LAMMPS REACTION package, leveraging graph-theoretical principles and sub-graph isomorphism techniques. By representing molecular systems as mathematical graphs, the method enables automated identification of conserved molecular domains, reaction sites, and atom mappings, significantly reducing manual effort. The algorithm was validated on three case studies: poly-addition, poly-condensation, and chain polymerization, demonstrating its ability to map conserved regions, identify reaction-initiating atoms, and resolve challenges such as symmetric reactants and indistinguishable atoms. Additionally, the generated templates were optimized for computational efficiency by retaining only essential reactive domains, ensuring scalability and consistency in high-throughput workflows for computational chemistry, materials science, and machine learning applications. Future work will focus on extending the method to mixed organic-inorganic systems, incorporating adaptive scoring mechanisms, and integrating quantum mechanical calculations to enhance its applicability.

Submitted 4 March, 2025; originally announced March 2025.

Comments: 21 pages, 4 figures

arXiv:2409.02303 [pdf, other]

A Lesion-aware Edge-based Graph Neural Network for Predicting Language Ability in Patients with Post-stroke Aphasia

Authors: Zijian Chen, Maria Varkanitsa, Prakash Ishwar, Janusz Konrad, Margrit Betke, Swathi Kiran, Archana Venkataraman

Abstract: We propose a lesion-aware graph neural network (LEGNet) to predict language ability from resting-state fMRI (rs-fMRI) connectivity in patients with post-stroke aphasia. Our model integrates three components: an edge-based learning module that encodes functional connectivity between brain regions, a lesion encoding module, and a subgraph learning module that leverages functional similarities for prediction. We use synthetic data derived from the Human Connectome Project (HCP) for hyperparameter tuning and model pretraining. We then evaluate the performance using repeated 10-fold cross-validation on an in-house neuroimaging dataset of post-stroke aphasia. Our results demonstrate that LEGNet outperforms baseline deep learning methods in predicting language ability. LEGNet also exhibits superior generalization ability when tested on a second in-house dataset that was acquired under a slightly different neuroimaging protocol. Taken together, the results of this study highlight the potential of LEGNet in effectively learning the relationships between rs-fMRI connectivity and language ability in a patient cohort with brain lesions for improved post-stroke aphasia evaluation.

Submitted 3 September, 2024; originally announced September 2024.

Comments: Accepted at MICCAI 2024 International Workshop on Machine Learning in Clinical Neuroimaging (MLCN)

arXiv:2408.01878 [pdf, other]

FBINeRF: Feature-Based Integrated Recurrent Network for Pinhole and Fisheye Neural Radiance Fields

Authors: Yifan Wu, Tianyi Cheng, Peixu Xin, Janusz Konrad

Abstract: Previous studies aiming to optimize and bundle-adjust camera poses using Neural Radiance Fields (NeRFs), such as BARF and DBARF, have demonstrated impressive capabilities in 3D scene reconstruction. However, these approaches have been designed for pinhole-camera pose optimization and do not perform well under radial image distortions such as those in fisheye cameras. Furthermore, inaccurate depth initialization in DBARF results in erroneous geometric information affecting the overall convergence and quality of results. In this paper, we propose adaptive GRUs with a flexible bundle-adjustment method adapted to radial distortions and incorporate feature-based recurrent neural networks to generate continuous novel views from fisheye datasets. Other NeRF methods for fisheye images, such as SCNeRF and OMNI-NeRF, use projected ray distance loss for distorted pose refinement, causing severe artifacts, long rendering time, and are difficult to use in downstream tasks, where the dense voxel representation generated by a NeRF method needs to be converted into a mesh representation. We also address depth initialization issues by adding MiDaS-based depth priors for pinhole images. Through extensive experiments, we demonstrate the generalization capacity of FBINeRF and show high-fidelity results for both pinhole-camera and fisheye-camera NeRFs.

Submitted 3 August, 2024; originally announced August 2024.

Comments: 18 pages

arXiv:2303.11520 [pdf, other]

doi 10.5220/0011653100003417

Estimating Distances Between People using a Single Overhead Fisheye Camera with Application to Social-Distancing Oversight

Authors: Zhangchi Lu, Mertcan Cokbas, Prakash Ishwar, Janusz Konrad

Abstract: Unobtrusive monitoring of distances between people indoors is a useful tool in the fight against pandemics. Surveillance cameras are a natural resource for accomplishing this. Unlike previous distance estimation methods, we use a single, overhead, fisheye camera with wide area coverage and propose two approaches. One method leverages a geometric model of the fisheye lens, whereas the other method uses a neural network to predict the 3D-world distance from people-locations in a fisheye image. To evaluate our algorithms, we collected a first-of-its-kind dataset using a single fisheye camera that comprises a wide range of distances between people (1-58 ft) and will be made publicly available. The algorithms achieve 1-2 ft distance error and over 95% accuracy in detecting social-distance violations.

Submitted 20 March, 2023; originally announced March 2023.

Journal ref: In Proceedings of the 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 5: VISAPP (2023), pages 528-535

arXiv:2212.11477 [pdf, other]

Spatio-Visual Fusion-Based Person Re-Identification for Overhead Fisheye Images

Authors: Mertcan Cokbas, Prakash Ishwar, Janusz Konrad

Abstract: Person re-identification (PRID) has been thoroughly researched in typical surveillance scenarios where various scenes are monitored by side-mounted, rectilinear-lens cameras. To date, few methods have been proposed for fisheye cameras mounted overhead and their performance is lacking. In order to close this performance gap, we propose a multi-feature framework for fisheye PRID where we combine deep-learning, color-based and location-based features by means of novel feature fusion. We evaluate the performance of our framework for various feature combinations on FRIDA, a public fisheye PRID dataset. The results demonstrate that our multi-feature approach outperforms recent appearance-based deep-learning methods by almost 18% points and location-based methods by almost 3% points in matching accuracy. We also demonstrate the potential application of the proposed PRID framework to people counting in large, crowded indoor spaces.

Submitted 25 April, 2023; v1 submitted 21 December, 2022; originally announced December 2022.

arXiv:2212.09377 [pdf, other]

doi 10.18653/v1/2022.naacl-demo.5

Flowstorm: Open-Source Platform with Hybrid Dialogue Architecture

Authors: Jan Pichl, Petr Marek, Jakub Konrád, Petr Lorenc, Ondřej Kobza, Tomáš Zajíček, Jan Šedivý

Abstract: This paper presents a conversational AI platform called Flowstorm. Flowstorm is an open-source SaaS project suitable for creating, running, and analyzing conversational applications. Thanks to the fast and fully automated build process, the dialogues created within the platform can be executed in seconds. Furthermore, we propose a novel dialogue architecture that uses a combination of tree structures with generative models. The tree structures are also used for training NLU models suitable for specific dialogue scenarios. However, the generative models are globally used across applications and extend the functionality of the dialogue trees. Moreover, the platform functionality benefits from out-of-the-box components, such as the one responsible for extracting data from utterances or working with crawled data. Additionally, it can be extended using a custom code directly in the platform. One of the essential features of the platform is the possibility to reuse the created assets across applications. There is a library of prepared assets where each developer can contribute. All of the features are available through a user-friendly visual editor.

Submitted 19 December, 2022; originally announced December 2022.

Journal ref: NAACL Demo Track (2022) 39-45

arXiv:2210.01582 [pdf, other]

FRIDA: Fisheye Re-Identification Dataset with Annotations

Authors: Mertcan Cokbas, John Bolognino, Janusz Konrad, Prakash Ishwar

Abstract: Person re-identification (PRID) from side-mounted rectilinear-lens cameras is a well-studied problem. On the other hand, PRID from overhead fisheye cameras is new and largely unstudied, primarily due to the lack of suitable image datasets. To fill this void, we introduce the "Fisheye Re-IDentification Dataset with Annotations" (FRIDA), with 240k+ bounding-box annotations of people, captured by 3 time-synchronized, ceiling-mounted fisheye cameras in a large indoor space. Due to a field-of-view overlap, PRID in this case differs from a typical PRID problem, which we discuss in depth. We also evaluate the performance of 10 state-of-the-art PRID algorithms on FRIDA. We show that for 6 CNN-based algorithms, training on FRIDA boosts the performance by up to 11.64% points in mAP compared to training on a common rectilinear-camera PRID dataset.

Submitted 19 October, 2022; v1 submitted 4 October, 2022; originally announced October 2022.

Comments: 8 pages

arXiv:2204.10849 [pdf, other]

Metric Learning and Adaptive Boundary for Out-of-Domain Detection

Authors: Petr Lorenc, Tommaso Gargiani, Jan Pichl, Jakub Konrád, Petr Marek, Ondřej Kobza, Jan Šedivý

Abstract: Conversational agents are usually designed for closed-world environments. Unfortunately, users can behave unexpectedly. Based on the open-world environment, we often encounter the situation that the training and test data are sampled from different distributions. Then, data from different distributions are called out-of-domain (OOD). A robust conversational agent needs to react to these OOD utterances adequately. Thus, the importance of robust OOD detection is emphasized. Unfortunately, collecting OOD data is a challenging task. We have designed an OOD detection algorithm independent of OOD data that outperforms a wide range of current state-of-the-art algorithms on publicly available datasets. Our algorithm is based on a simple but efficient approach of combining metric learning with adaptive decision boundary. Furthermore, compared to other algorithms, we have found that our proposed algorithm has significantly improved OOD performance in a scenario with a lower number of classes while preserving the accuracy for in-domain (IND) classes.

Submitted 22 April, 2022; originally announced April 2022.

Comments: Accepted to The 27th International Conference on Natural Language & Information Systems (NLDB) 2022

arXiv:2109.07968 [pdf, other]

Alquist 4.0: Towards Social Intelligence Using Generative Models and Dialogue Personalization

Authors: Jakub Konrád, Jan Pichl, Petr Marek, Petr Lorenc, Van Duy Ta, Ondřej Kobza, Lenka Hýlová, Jan Šedivý

Abstract: The open domain-dialogue system Alquist has a goal to conduct a coherent and engaging conversation that can be considered as one of the benchmarks of social intelligence. The fourth version of the system, developed within the Alexa Prize Socialbot Grand Challenge 4, brings two main innovations. The first addresses coherence, and the second addresses the engagingness of the conversation. For innovations regarding coherence, we propose a novel hybrid approach combining hand-designed responses and a generative model. The proposed approach utilizes hand-designed dialogues, out-of-domain detection, and a neural response generator. Hand-designed dialogues walk the user through high-quality conversational flows. The out-of-domain detection recognizes that the user diverges from the predefined flow and prevents the system from producing a scripted response that might not make sense for unexpected user input. Finally, the neural response generator generates a response based on the context of the dialogue that correctly reacts to the unexpected user input and returns the dialogue to the boundaries of hand-designed dialogues. The innovations for engagement that we propose are mostly inspired by the famous exploration-exploitation dilemma. To conduct an engaging conversation with the dialogue partners, one has to learn their preferences and interests -- exploration. Moreover, to engage the partner, we have to utilize the knowledge we have already learned -- exploitation. In this work, we present the principles and inner workings of individual components of the open-domain dialogue system Alquist developed within the Alexa Prize Socialbot Grand Challenge 4 and the experiments we have conducted to evaluate them.

Submitted 16 September, 2021; originally announced September 2021.

Comments: 20 pages

arXiv:2104.10454 [pdf, other]

doi 10.14712/00326585.012

Text Summarization of Czech News Articles Using Named Entities

Authors: Petr Marek, Štěpán Müller, Jakub Konrád, Petr Lorenc, Jan Pichl, Jan Šedivý

Abstract: The foundation for the research of summarization in the Czech language was laid by the work of Straka et al. (2018). They published the SumeCzech, a large Czech news-based summarization dataset, and proposed several baseline approaches. However, it is clear from the achieved results that there is a large space for improvement. In our work, we focus on the impact of named entities on the summarization of Czech news articles. First, we annotate SumeCzech with named entities. We propose a new metric ROUGE_NE that measures the overlap of named entities between the true and generated summaries, and we show that it is still challenging for summarization systems to reach a high score in it. We propose an extractive summarization approach Named Entity Density that selects a sentence with the highest ratio between a number of entities and the length of the sentence as the summary of the article. The experiments show that the proposed approach reached results close to the solid baseline in the domain of news articles selecting the first sentence. Moreover, we demonstrate that the selected sentence reflects the style of reports concisely identifying to whom, when, where, and what happened. We propose that such a summary is beneficial in combination with the first sentence of an article in voice applications presenting news articles. We propose two abstractive summarization approaches based on Seq2Seq architecture. The first approach uses the tokens of the article. The second approach has access to the named entity annotations. The experiments show that both approaches exceed state-of-the-art results previously reported by Straka et al. (2018), with the latter achieving slightly better results on SumeCzech's out-of-domain testing set.

Submitted 21 April, 2021; originally announced April 2021.

Journal ref: The Prague Bulletin of Mathematical Linguistics 2021 116

arXiv:2101.09585 [pdf, other]

BSUV-Net 2.0: Spatio-Temporal Data Augmentations for Video-Agnostic Supervised Background Subtraction

Authors: M. Ozan Tezcan, Prakash Ishwar, Janusz Konrad

Abstract: Background subtraction (BGS) is a fundamental video processing task which is a key component of many applications. Deep learning-based supervised algorithms achieve very good performance in BGS; however, most of these algorithms are optimized for either a specific video or a group of videos, and their performance decreases dramatically when applied to unseen videos. Recently, several papers addressed this problem and proposed video-agnostic supervised BGS algorithms. However, nearly all of the data augmentations used in these algorithms are limited to the spatial domain and do not account for temporal variations that naturally occur in video data. In this work, we introduce spatio-temporal data augmentations and apply them to one of the leading video-agnostic BGS algorithms, BSUV-Net. We also introduce a new cross-validation training and evaluation strategy for the CDNet-2014 dataset that makes it possible to fairly and easily compare the performance of various video-agnostic supervised BGS algorithms. Our new model trained using the proposed data augmentations, named BSUV-Net 2.0, significantly outperforms state-of-the-art algorithms evaluated on unseen videos of CDNet-2014. We also evaluate the cross-dataset generalization capacity of BSUV-Net 2.0 by training it solely on CDNet-2014 videos and evaluating its performance on LASIESTA dataset. Overall, BSUV-Net 2.0 provides a ~5% improvement in the F-score over state-of-the-art methods on unseen videos of CDNet-2014 and LASIESTA datasets. Furthermore, we develop a real-time variant of our model, that we call Fast BSUV-Net 2.0, whose performance is close to the state of the art.

Submitted 24 February, 2021; v1 submitted 23 January, 2021; originally announced January 2021.

arXiv:2011.09825 [pdf, ps, other]

doi 10.1007/s10579-021-09563-3

Do We Need Online NLU Tools?

Authors: Petr Lorenc, Petr Marek, Jan Pichl, Jakub Konrád, Jan Šedivý

Abstract: The intent recognition is an essential algorithm of any conversational AI application. It is responsible for the classification of an input message into meaningful classes. In many bot development platforms, we can configure the NLU pipeline. Several intent recognition services are currently available as an API, or we choose from many open-source alternatives. However, there is no comparison of intent recognition services and open-source algorithms. Many factors make the selection of the right approach to the intent recognition challenging in practice. In this paper, we suggest criteria to choose the best intent recognition algorithm for an application. We present a dataset for evaluation. Finally, we compare selected public NLU services with selected open-source algorithms for intent recognition.

Submitted 19 November, 2020; originally announced November 2020.

Comments: 8 pages, 9 tables

arXiv:2011.03261 [pdf, other]

Alquist 3.0: Alexa Prize Bot Using Conversational Knowledge Graph

Authors: Jan Pichl, Petr Marek, Jakub Konrád, Petr Lorenc, Van Duy Ta, Jan Šedivý

Abstract: The third version of the open-domain dialogue system Alquist developed within the Alexa Prize 2020 competition is designed to conduct coherent and engaging conversations on popular topics. The main novel contribution is the introduction of a system leveraging an innovative approach based on a conversational knowledge graph and adjacency pairs. The conversational knowledge graph allows the system to utilize knowledge expressed during the dialogue in consequent turns and across conversations. Dialogue adjacency pairs divide the conversation into small conversational structures, which can be combined and allow the system to react to a wide range of user inputs flexibly. We discuss and describe Alquist's pipeline, data acquisition and processing, dialogue manager, NLG, knowledge aggregation, and a hierarchy of adjacency pairs. We present the experimental results of the individual parts of the system.

Submitted 6 November, 2020; originally announced November 2020.

arXiv:2011.03259 [pdf, other]

Alquist 2.0: Alexa Prize Socialbot Based on Sub-Dialogue Models

Authors: Jan Pichl, Petr Marek, Jakub Konrád, Martin Matulík, Jan Šedivý

Abstract: This paper presents the second version of the dialogue system named Alquist competing in Amazon Alexa Prize 2018. We introduce a system leveraging ontology-based topic structure called topic nodes. Each of the nodes consists of several sub-dialogues, and each sub-dialogue has its own LSTM-based model for dialogue management. The sub-dialogues can be triggered according to the topic hierarchy or a user intent which allows the bot to create a unique experience during each session.

Submitted 6 November, 2020; originally announced November 2020.

arXiv:2005.11623 [pdf, other]

RAPiD: Rotation-Aware People Detection in Overhead Fisheye Images

Authors: Zhihao Duan, M. Ozan Tezcan, Hayato Nakamura, Prakash Ishwar, Janusz Konrad

Abstract: Recent methods for people detection in overhead, fisheye images either use radially-aligned bounding boxes to represent people, assuming people always appear along image radius or require significant pre-/post-processing which radically increases computational complexity. In this work, we develop an end-to-end rotation-aware people detection method, named RAPiD, that detects people using arbitrarily-oriented bounding boxes. Our fully-convolutional neural network directly regresses the angle of each bounding box using a periodic loss function, which accounts for angle periodicities. We have also created a new dataset with spatio-temporal annotations of rotated bounding boxes, for people detection as well as other vision tasks in overhead fisheye videos. We show that our simple, yet effective method outperforms state-of-the-art results on three fisheye-image datasets. Code and dataset are available at http://vip.bu.edu/rapid .

Submitted 23 May, 2020; originally announced May 2020.

Comments: CVPR 2020 OmniCV Workshop paper extended version

arXiv:2004.05685 [pdf, other]

Low-Resolution Overhead Thermal Tripwire for Occupancy Estimation

Authors: Mertcan Cokbas, Prakash Ishwar, Janusz Konrad

Abstract: Smart buildings use occupancy sensing for various tasks ranging from energy-efficient HVAC and lighting to space-utilization analysis and emergency response. We propose a people counting system which uses a low-resolution thermal sensor. Unlike previous people-counting systems based on thermal sensors, we use an overhead tripwire configuration at entryways to detect and track transient entries or exits. We develop two distinct people counting algorithms for this configuration. To evaluate our algorithms, we have collected and labeled a low-resolution thermal video dataset using the proposed system. The dataset, the first of its kind, is public and available for download. We also propose new evaluation metrics that are more suitable for systems that are subject to drift and jitter.

Submitted 5 May, 2020; v1 submitted 12 April, 2020; originally announced April 2020.

arXiv:2003.00641 [pdf, ps, other]

doi 10.1109/MLSP.2019.8918926

VAE/WGAN-Based Image Representation Learning For Pose-Preserving Seamless Identity Replacement In Facial Images

Authors: Hiroki Kawai, Jiawei Chen, Prakash Ishwar, Janusz Konrad

Abstract: We present a novel variational generative adversarial network (VGAN) based on Wasserstein loss to learn a latent representation from a face image that is invariant to identity but preserves head-pose information. This facilitates synthesis of a realistic face image with the same head pose as a given input image, but with a different identity. One application of this network is in privacy-sensitive scenarios; after identity replacement in an image, utility, such as head pose, can still be recovered. Extensive experimental validation on synthetic and real human-face image datasets performed under 3 threat scenarios confirms the ability of the proposed network to preserve head pose of the input image, mask the input identity, and synthesize a good-quality realistic face image of a desired identity. We also show that our network can be used to perform pose-preserving identity morphing and identity-preserving pose morphing. The proposed method improves over a recent state-of-the-art method in terms of quantitative metrics as well as synthesized image quality.

Submitted 1 March, 2020; originally announced March 2020.

Comments: 6 pages, 5 figures, 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP)

Journal ref: 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP)

arXiv:1907.11371 [pdf, other]

BSUV-Net: A Fully-Convolutional Neural Network for Background Subtraction of Unseen Videos

Authors: M. Ozan Tezcan, Prakash Ishwar, Janusz Konrad

Abstract: Background subtraction is a basic task in computer vision and video processing often applied as a pre-processing step for object tracking, people recognition, etc. Recently, a number of successful background-subtraction algorithms have been proposed, however nearly all of the top-performing ones are supervised. Crucially, their success relies upon the availability of some annotated frames of the test video during training. Consequently, their performance on completely "unseen" videos is undocumented in the literature. In this work, we propose a new, supervised, background-subtraction algorithm for unseen videos (BSUV-Net) based on a fully-convolutional neural network. The input to our network consists of the current frame and two background frames captured at different time scales along with their semantic segmentation maps. In order to reduce the chance of overfitting, we also introduce a new data-augmentation technique which mitigates the impact of illumination difference between the background frames and the current frame. On the CDNet-2014 dataset, BSUV-Net outperforms state-of-the-art algorithms evaluated on unseen videos in terms of several metrics including F-measure, recall and precision.

Submitted 14 January, 2020; v1 submitted 25 July, 2019; originally announced July 2019.

Comments: 10 pages

arXiv:1906.09313 [pdf, other]

A Cyclically-Trained Adversarial Network for Invariant Representation Learning

Authors: Jiawei Chen, Janusz Konrad, Prakash Ishwar

Abstract: Recent studies show that deep neural networks are vulnerable to adversarial examples which can be generated via certain types of transformations. Being robust to a desired family of adversarial attacks is then equivalent to being invariant to a family of transformations. Learning invariant representations then naturally emerges as an important goal to achieve, which we explore in this paper within specific application contexts. Specifically, we propose a cyclically-trained adversarial network to learn a mapping from image space to latent representation space and back such that the latent representation is invariant to a specified factor of variation (e.g., identity). The learned mapping assures that the synthesized image is not only realistic, but has the same values for unspecified factors (e.g., pose and illumination) as the original image and a desired value of the specified factor. Unlike disentangled representation learning, which requires two latent spaces, one for specified and another for unspecified factors, invariant representation learning needs only one such space. We encourage invariance to a specified factor by applying adversarial training using a variational autoencoder in the image space as opposed to the latent space. We strengthen this invariance by introducing a cyclic training process (forward and backward cycle). We also propose a new method to evaluate conditional generative networks. It compares how well different factors of variation can be predicted from the synthesized, as opposed to real, images. In quantitative terms, our approach attains state-of-the-art performance in experiments spanning three datasets with factors such as identity, pose, illumination or style. Our method produces sharp, high-quality synthetic images with few visible artefacts compared to previous approaches.

Submitted 16 April, 2020; v1 submitted 21 June, 2019; originally announced June 2019.

arXiv:1804.06705 [pdf, other]

Alquist: The Alexa Prize Socialbot

Authors: Jan Pichl, Petr Marek, Jakub Konrád, Martin Matulík, Hoang Long Nguyen, Jan Šedivý

Abstract: This paper describes Alquist, a new open-domain dialogue system developed as part of the Alexa Prize competition for the Amazon Echo line of products. The Alquist dialogue system is designed to conduct a coherent and engaging conversation on popular topics. We present a hybrid system combining several machine learning and rule-based approaches. We discuss and describe the Alquist pipeline, data acquisition and processing, dialogue manager, NLG, knowledge aggregation, and hierarchy of sub-dialogs. We present some of the experimental results.

Submitted 18 April, 2018; originally announced April 2018.

arXiv:1803.07100 [pdf, ps, other]

VGAN-Based Image Representation Learning for Privacy-Preserving Facial Expression Recognition

Authors: Jiawei Chen, Janusz Konrad, Prakash Ishwar

Abstract: Reliable facial expression recognition plays a critical role in human-machine interactions. However, most of the facial expression analysis methodologies proposed to date pay little or no attention to the protection of a user's privacy. In this paper, we propose a Privacy-Preserving Representation-Learning Variational Generative Adversarial Network (PPRL-VGAN) to learn an image representation that is explicitly disentangled from the identity information. At the same time, this representation is discriminative from the standpoint of facial expression recognition and generative as it allows expression-equivalent face image synthesis. We evaluate the proposed model on two public datasets under various threat scenarios. Quantitative and qualitative results demonstrate that our approach strikes a balance between the preservation of privacy and data utility. We further demonstrate that our model can be effectively applied to other tasks such as expression morphing and image completion.

Submitted 7 September, 2018; v1 submitted 19 March, 2018; originally announced March 2018.

arXiv:1610.03898 [pdf, other]

Semi-Coupled Two-Stream Fusion ConvNets for Action Recognition at Extremely Low Resolutions

Authors: Jiawei Chen, Jonathan Wu, Janusz Konrad, Prakash Ishwar

Abstract: Deep convolutional neural networks (ConvNets) have been recently shown to attain state-of-the-art performance for action recognition on standard-resolution videos. However, less attention has been paid to recognition performance at extremely low resolutions (eLR) (e.g., 16 x 12 pixels). Reliable action recognition using eLR cameras would address privacy concerns in various application environments such as private homes, hospitals, nursing/rehabilitation facilities, etc. In this paper, we propose a semi-coupled filter-sharing network that leverages high resolution (HR) videos during training in order to assist an eLR ConvNet. We also study methods for fusing spatial and temporal ConvNets customized for eLR videos in order to take advantage of appearance and motion information. Our method outperforms state-of-the-art methods at extremely low resolutions on IXMAS (93.7%) and HMDB (29.2%) datasets.

Submitted 5 October, 2018; v1 submitted 12 October, 2016; originally announced October 2016.

arXiv:0910.2917 [pdf, ps, other]

Behavior Subtraction

Authors: P. M. Jodoin, V. Saligrama, J. Konrad

Abstract: Background subtraction has been a driving engine for many computer vision and video analytics tasks. Although many variants exist, they all share the underlying assumption that photometric scene properties are either static or exhibit temporal stationarity. While this works in some applications, the model fails when one is interested in discovering changes in scene dynamics rather than those in a static background; detection of unusual pedestrian and motor traffic patterns is but one example. We propose a new model and computational framework that address this failure by considering stationary scene dynamics as a "background" with which observed scene dynamics are compared. Central to our approach is the concept of an event, which we define as short-term scene dynamics captured over a time window at a specific spatial location in the camera field of view. We compute events by time-aggregating motion labels, obtained by background subtraction, as well as object descriptors (e.g., object size). Subsequently, we characterize events probabilistically, but use low-memory, low-complexity surrogates in practical implementation. Using these surrogates amounts to behavior subtraction, a new algorithm with some surprising properties. As demonstrated here, behavior subtraction is an effective tool in anomaly detection and localization. It is resilient to spurious background motion, such as that due to camera jitter, and is content-blind, i.e., it works equally well on humans, cars, animals, and other objects in both uncluttered and highly-cluttered scenes. Clearly, treating video as a collection of events rather than colored pixels opens new possibilities for video analytics.

Submitted 15 October, 2009; originally announced October 2009.

Showing 1–23 of 23 results for author: Konrád, J