Showing 1–50 of 74 results for author: Roth, S

  1. arXiv:2404.16818  [pdf, other]

    cs.CV

    Boosting Unsupervised Semantic Segmentation with Principal Mask Proposals

    Authors: Oliver Hahn, Nikita Araslanov, Simone Schaub-Meyer, Stefan Roth

    Abstract: Unsupervised semantic segmentation aims to automatically partition images into semantically meaningful regions by identifying global categories within an image corpus without any form of annotation. Building upon recent advances in self-supervised representation learning, we focus on how to leverage these large pre-trained models for the downstream task of unsupervised segmentation. We present Pri…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: Code: https://github.com/visinf/primaps

  2. arXiv:2404.12330  [pdf, other]

    cs.CV cs.MM

    A Perspective on Deep Vision Performance with Standard Image and Video Codecs

    Authors: Christoph Reich, Oliver Hahn, Daniel Cremers, Stefan Roth, Biplob Debnath

    Abstract: Resource-constrained hardware, such as edge devices or cell phones, often relies on cloud servers to provide the required computational resources for inference in deep vision models. However, transferring image and video data from an edge or mobile device to a cloud server requires coding to deal with network constraints. The use of standardized codecs, such as JPEG or H.264, is prevalent and requir…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024 Workshop on AI for Streaming (AIS)

  3. arXiv:2402.13773  [pdf, other]

    cs.CR

    Spatial-Domain Wireless Jamming with Reconfigurable Intelligent Surfaces

    Authors: Philipp Mackensen, Paul Staat, Stefan Roth, Aydin Sezgin, Christof Paar, Veelasha Moonsamy

    Abstract: Today, we rely heavily on the constant availability of wireless communication systems. As a result, wireless jamming continues to prevail as an imminent threat: Attackers can create deliberate radio interference to overshadow desired signals, leading to denial of service. Although the broadcast nature of radio signal propagation makes such an attack possible in the first place, it likewise poses a…

    Submitted 21 February, 2024; originally announced February 2024.

  4. arXiv:2312.14791  [pdf, other]

    cs.IT

    EMF-Constrained Artificial Noise for Secrecy Rates with Stochastic Eavesdropper Channels

    Authors: Stefan Roth, Aydin Sezgin

    Abstract: An information-theoretic confidential communication is achievable if the eavesdropper has a degraded channel compared to the legitimate receiver. In wireless channels, beamforming and artificial noise can enable such confidentiality. However, only distribution knowledge of the eavesdropper channels can be assumed. Moreover, the transmission of artificial noise can lead to an increased electromagne…

    Submitted 22 December, 2023; originally announced December 2023.

  5. arXiv:2310.07706  [pdf, other]

    cs.RO cs.AI

    Pixel State Value Network for Combined Prediction and Planning in Interactive Environments

    Authors: Sascha Rosbach, Stefan M. Leupold, Simon Großjohann, Stefan Roth

    Abstract: Automated vehicles operating in urban environments have to reliably interact with other traffic participants. Planning algorithms often utilize separate prediction modules forecasting probabilistic, multi-modal, and interactive behaviors of objects. Designing prediction and planning as two separate modules introduces significant challenges, particularly due to the interdependence of these modules.…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  6. arXiv:2308.09472  [pdf, other]

    cs.CV cs.AI

    Vision Relation Transformer for Unbiased Scene Graph Generation

    Authors: Gopika Sudhakaran, Devendra Singh Dhami, Kristian Kersting, Stefan Roth

    Abstract: Recent years have seen a growing interest in Scene Graph Generation (SGG), a comprehensive visual scene understanding task that aims to predict entity relationships using a relation encoder-decoder pipeline stacked on top of an object encoder-decoder backbone. Unfortunately, current SGG methods suffer from an information loss regarding the entities' local-level cues during the relation encoding pro…

    Submitted 18 August, 2023; originally announced August 2023.

    Comments: Accepted for publication in ICCV 2023

  7. arXiv:2308.06248  [pdf, other]

    cs.CV cs.LG

    FunnyBirds: A Synthetic Vision Dataset for a Part-Based Analysis of Explainable AI Methods

    Authors: Robin Hesse, Simone Schaub-Meyer, Stefan Roth

    Abstract: The field of explainable artificial intelligence (XAI) aims to uncover the inner workings of complex deep neural models. While being crucial for safety-critical domains, XAI inherently lacks ground-truth explanations, making its automatic evaluation an unsolved problem. We address this challenge by proposing a novel synthetic vision dataset, named FunnyBirds, and accompanying automatic evaluation…

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: Accepted at ICCV 2023. Code: https://github.com/visinf/funnybirds

  8. arXiv:2305.09504  [pdf, other]

    cs.CV cs.LG

    Content-Adaptive Downsampling in Convolutional Neural Networks

    Authors: Robin Hesse, Simone Schaub-Meyer, Stefan Roth

    Abstract: Many convolutional neural networks (CNNs) rely on progressive downsampling of their feature maps to increase the network's receptive field and decrease computational cost. However, this comes at the price of losing granularity in the feature maps, limiting the ability to correctly understand images or recover fine detail in dense prediction tasks. To address this, common practice is to replace the…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: Accepted at CVPR 2023 Workshop on Efficient Deep Learning for Computer Vision (ECV). Code: https://github.com/visinf/cad

  9. arXiv:2302.01998  [pdf, ps, other]

    cs.IT eess.SY

    Integrated Communication and Control Systems: A Data Significance Perspective

    Authors: Stefan Roth, Yasemin Karacora, Christina Chaccour, Aydin Sezgin, Walid Saad

    Abstract: The interconnected smart devices and industrial internet of things devices require low-latency communication to fulfill control objectives despite limited resources. In essence, such devices have a time-critical nature but also require a highly accurate data input based on its significance. In this paper, we investigate various coordinated and distributed semantic scheduling schemes with a data si…

    Submitted 3 February, 2023; originally announced February 2023.

  10. arXiv:2211.14005  [pdf, other]

    cs.CV

    Efficient Feature Extraction for High-resolution Video Frame Interpolation

    Authors: Moritz Nottebaum, Stefan Roth, Simone Schaub-Meyer

    Abstract: Most deep learning methods for video frame interpolation consist of three main components: feature extraction, motion estimation, and image synthesis. Existing approaches are mainly distinguishable in terms of how these modules are designed. However, when interpolating high-resolution images, e.g. at 4K, the design choices for achieving high accuracy within reasonable memory requirements are limit…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: Accepted to BMVC 2022. Code: https://github.com/visinf/fldr-vfi

  11. arXiv:2211.12209  [pdf, other]

    cs.CV

    $S^2$-Flow: Joint Semantic and Style Editing of Facial Images

    Authors: Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth

    Abstract: The high-quality images yielded by generative adversarial networks (GANs) have motivated investigations into their application for image editing. However, GANs are often limited in the control they provide for performing specific edits. One of the principal challenges is the entangled latent space of GANs, which is not directly suitable for performing independent and detailed edits. Recent editing…

    Submitted 22 November, 2022; originally announced November 2022.

    Comments: Accepted to BMVC 2022

  12. Optimizing the Age of Information in Mixed-Critical Wireless Communication Networks

    Authors: Robert-Jeron Reifert, Stefan Roth, Aydin Sezgin

    Abstract: Beyond fifth generation wireless communication networks (B5G) are applied in many use-cases, such as industrial control systems, smart public transport, and power grids. Those applications require innovative techniques for timely transmission and increased wireless network capacities. Hence, this paper proposes optimizing the data freshness measured by the age of information (AoI) in dense interne…

    Submitted 10 November, 2022; originally announced November 2022.

    Comments: 6 pages, 5 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

    Journal ref: ICC 2023 - IEEE International Conference on Communications

  13. arXiv:2208.05991  [pdf, ps, other]

    cs.IT eess.SP

    Approximation-based Threshold Optimization from Single Antenna to Massive SIMO Authentication

    Authors: Stefan Roth, Aydin Sezgin, Roman Bessel, H. Vincent Poor

    Abstract: In a wireless sensor network, data from various sensors are gathered to estimate the system-state of the process system. However, adversaries aim at distorting the system-state estimate, for which they may infiltrate sensors or position additional devices in the environment. To authenticate the received process values, the integrity of the measurements from different sensors can be evaluated joint…

    Submitted 11 August, 2022; originally announced August 2022.

  14. arXiv:2208.05788  [pdf, other]

    cs.CV

    Semantic Self-adaptation: Enhancing Generalization with a Single Sample

    Authors: Sherwin Bahmani, Oliver Hahn, Eduard Zamfir, Nikita Araslanov, Daniel Cremers, Stefan Roth

    Abstract: The lack of out-of-domain generalization is a critical weakness of deep networks for semantic segmentation. Previous studies relied on the assumption of a static model, i.e., once the training process is complete, model parameters remain fixed at test time. In this work, we challenge this premise with a self-adaptive approach for semantic segmentation that adjusts the inference process to each in…

    Submitted 13 December, 2023; v1 submitted 10 August, 2022; originally announced August 2022.

    Comments: Published in TMLR (July 2023) | OpenReview: https://openreview.net/forum?id=ILNqQhGbLx | Code: https://github.com/visinf/self-adaptive | Video: https://youtu.be/s4DG65ic0EA

  15. arXiv:2205.01813  [pdf, other]

    cs.CV cs.LG

    Diverse Image Captioning with Grounded Style

    Authors: Franz Klein, Shweta Mahajan, Stefan Roth

    Abstract: Stylized image captioning as presented in prior work aims to generate captions that reflect characteristics beyond a factual description of the scene composition, such as sentiments. Such prior work relies on given sentiment identifiers, which are used to express a certain global style in the caption, e.g. positive or negative, however without taking into account the stylistic content of the visua…

    Submitted 3 May, 2022; originally announced May 2022.

    Comments: In the 43rd DAGM German Conference on Pattern Recognition (GCPR) 2021

    Journal ref: In Proceedings of the German Conference on Pattern Recognition (GCPR), Ed. by C. Bauckhage, J. Gall, and A. G. Schwing, Vol. 13024, Lecture Notes in Computer Science, Springer, 2021, pp. 421-436

  16. Comeback Kid: Resilience for Mixed-Critical Wireless Network Resource Management

    Authors: Robert-Jeron Reifert, Stefan Roth, Alaa Alameer Ahmad, Aydin Sezgin

    Abstract: The future sixth generation (6G) of communication systems is envisioned to provide numerous applications in safety-critical contexts, e.g., driverless traffic, modular industry, and smart cities, which require outstanding performance, high reliability and fault tolerance, as well as autonomy. Ensuring criticality awareness for diverse functional safety applications and providing fault tolerance in…

    Submitted 11 June, 2022; v1 submitted 25 April, 2022; originally announced April 2022.

    Comments: 16 pages, 13 figures. Submitted to IEEE for possible publication

    Journal ref: IEEE Transactions on Vehicular Technology, 2023

  17. Energy Efficiency in Rate-Splitting Multiple Access with Mixed Criticality

    Authors: Robert-Jeron Reifert, Stefan Roth, Alaa Alameer Ahmad, Aydin Sezgin

    Abstract: Future sixth generation (6G) wireless communication networks face the need to similarly meet unprecedented quality of service (QoS) demands while also providing a larger energy efficiency (EE) to minimize their carbon footprint. Moreover, due to the diverseness of network participants, mixed criticality QoS levels are assigned to the users of such networks. In this work, with a focus on a cloud-ra…

    Submitted 16 February, 2022; originally announced February 2022.

    Comments: 7 pages, 6 figures, 1 table. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

    Journal ref: 2022 IEEE International Conference on Communications Workshops (ICC Workshops)

  18. arXiv:2201.01882  [pdf]

    cs.RO

    Trust-based Symbolic Motion Planning for Multi-robot Bounding Overwatch

    Authors: Huanfei Zheng, Jonathon M. Smereka, Dariusz Mikulski, Stephanie Roth, Yue Wang

    Abstract: Multi-robot bounding overwatch requires timely coordination of robot team members. Symbolic motion planning (SMP) can provide provably correct solutions for robot motion planning with high-level temporal logic task requirements. This paper aims to develop a framework for safe and reliable SMP of multi-robot systems (MRS) to satisfy complex bounding overwatch tasks constrained by temporal logics. A…

    Submitted 5 January, 2022; originally announced January 2022.

  19. arXiv:2112.01967  [pdf, other]

    cs.CR

    IRShield: A Countermeasure Against Adversarial Physical-Layer Wireless Sensing

    Authors: Paul Staat, Simon Mulzer, Stefan Roth, Veelasha Moonsamy, Markus Heinrichs, Rainer Kronberger, Aydin Sezgin, Christof Paar

    Abstract: Wireless radio channels are known to contain information about the surrounding propagation environment, which can be extracted using established wireless sensing methods. Thus, today's ubiquitous wireless devices are attractive targets for passive eavesdroppers to launch reconnaissance attacks. In particular, by overhearing standard communication signals, eavesdroppers obtain estimations of wirele…

    Submitted 7 April, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

  20. arXiv:2111.07668  [pdf, other]

    cs.LG cs.CV

    Fast Axiomatic Attribution for Neural Networks

    Authors: Robin Hesse, Simone Schaub-Meyer, Stefan Roth

    Abstract: Mitigating the dependence on spurious correlations present in the training dataset is a quickly emerging and important topic of deep learning. Recent approaches include priors on the feature attribution of a deep neural network (DNN) into the training process to reduce the dependence on unwanted features. However, until now one needed to trade off high-quality attributions, satisfying desirable ax…

    Submitted 15 November, 2021; originally announced November 2021.

    Comments: To appear at NeurIPS*2021. Project page and code: https://visinf.github.io/fast-axiomatic-attribution

  21. arXiv:2111.06265  [pdf, other]

    cs.CV cs.LG

    Dense Unsupervised Learning for Video Segmentation

    Authors: Nikita Araslanov, Simone Schaub-Meyer, Stefan Roth

    Abstract: We present a novel approach to unsupervised learning for video object segmentation (VOS). Unlike previous work, our formulation allows learning dense feature representations directly in a fully convolutional regime. We rely on uniform grid sampling to extract a set of anchors and train our model to disambiguate between them on both inter- and intra-video levels. However, a naive scheme to train su…

    Submitted 11 November, 2021; originally announced November 2021.

    Comments: To appear at NeurIPS*2021. Code: https://github.com/visinf/dense-ulearn-vos

  22. arXiv:2110.08787  [pdf, other]

    cs.CV cs.LG eess.IV

    PixelPyramids: Exact Inference Models from Lossless Image Pyramids

    Authors: Shweta Mahajan, Stefan Roth

    Abstract: Autoregressive models are a class of exact inference approaches with highly flexible functional forms, yielding state-of-the-art density estimates for natural images. Yet, the sequential ordering on the dimensions makes these models computationally expensive and limits their applicability to low-resolution imagery. In this work, we propose Pixel-Pyramids, a block-autoregressive approach employing…

    Submitted 17 October, 2021; originally announced October 2021.

    Comments: To appear at ICCV 2021

  23. arXiv:2109.06082  [pdf, other]

    cs.CL

    xGQA: Cross-Lingual Visual Question Answering

    Authors: Jonas Pfeiffer, Gregor Geigle, Aishwarya Kamath, Jan-Martin O. Steitz, Stefan Roth, Ivan Vulić, Iryna Gurevych

    Abstract: Recent advances in multimodal vision and language modeling have predominantly focused on the English language, mostly due to the lack of multilingual multimodal datasets to steer modeling efforts. In this work, we address this gap and provide xGQA, a new multilingual evaluation benchmark for the visual question answering task. We extend the established English GQA dataset to 7 typologically divers…

    Submitted 17 March, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

    Comments: Findings of ACL 2022

  24. arXiv:2109.04422  [pdf, other]

    cs.CV cs.CL

    TxT: Crossmodal End-to-End Learning with Transformers

    Authors: Jan-Martin O. Steitz, Jonas Pfeiffer, Iryna Gurevych, Stefan Roth

    Abstract: Reasoning over multiple modalities, e.g. in Visual Question Answering (VQA), requires an alignment of semantic concepts across domains. Despite the widespread success of end-to-end learning, today's multimodal pipelines by and large leverage pre-extracted, fixed features from object detectors, typically Faster R-CNN, as representations of the visual world. The obvious downside is that the visual r…

    Submitted 9 September, 2021; originally announced September 2021.

    Comments: To appear at the 43rd DAGM German Conference on Pattern Recognition (GCPR) 2021

  25. arXiv:2105.02216  [pdf, other]

    cs.CV cs.LG cs.RO

    Self-Supervised Multi-Frame Monocular Scene Flow

    Authors: Junhwa Hur, Stefan Roth

    Abstract: Estimating 3D scene flow from a sequence of monocular images has been gaining increased attention due to the simple, economical capture setup. Owing to the severe ill-posedness of the problem, the accuracy of current methods has been limited, especially that of efficient, real-time approaches. In this paper, we introduce a multi-frame monocular scene flow network based on self-supervised learning,…

    Submitted 5 May, 2021; originally announced May 2021.

    Comments: To appear at CVPR 2021. Code available: https://github.com/visinf/multi-mono-sf

  26. arXiv:2105.00097  [pdf, other]

    cs.CV cs.LG

    Self-supervised Augmentation Consistency for Adapting Semantic Segmentation

    Authors: Nikita Araslanov, Stefan Roth

    Abstract: We propose an approach to domain adaptation for semantic segmentation that is both practical and highly accurate. In contrast to previous work, we abandon the use of computationally involved adversarial objectives, network ensembles and style transfer. Instead, we employ standard data augmentation techniques $-$ photometric noise, flipping and scaling $-$ and ensure consistency of the semantic pre…

    Submitted 30 April, 2021; originally announced May 2021.

    Comments: To appear at CVPR 2021. Code: https://github.com/visinf/da-sac

  27. arXiv:2103.09962  [pdf, other]

    cs.CV

    Deep Wiener Deconvolution: Wiener Meets Deep Learning for Image Deblurring

    Authors: Jiangxin Dong, Stefan Roth, Bernt Schiele

    Abstract: We present a simple and effective approach for non-blind image deblurring, combining classical techniques and deep learning. In contrast to existing methods that deblur the image directly in the standard image space, we propose to perform an explicit deconvolution process in a feature space by integrating a classical Wiener deconvolution framework with learned deep features. A multi-scale feature…

    Submitted 17 March, 2021; originally announced March 2021.

    Comments: Accepted to NeurIPS 2020 as an oral presentation. Project page: https://gitlab.mpi-klsb.mpg.de/jdong/dwdn

  28. arXiv:2103.08497  [pdf, other]

    cs.LG cs.CV

    Sampling-free Variational Inference for Neural Networks with Multiplicative Activation Noise

    Authors: Jannik Schmitt, Stefan Roth

    Abstract: To adopt neural networks in safety critical domains, knowing whether we can trust their predictions is crucial. Bayesian neural networks (BNNs) provide uncertainty estimates by averaging predictions with respect to the posterior weight distribution. Variational inference methods for BNNs approximate the intractable weight posterior with a tractable distribution, yet mostly rely on sampling from th…

    Submitted 16 March, 2021; v1 submitted 15 March, 2021; originally announced March 2021.

  29. arXiv:2012.07727  [pdf, ps, other]

    cs.IT cs.CR eess.SP

    Localization Attack by Precoder Feedback Overhearing in 5G Networks and Countermeasures

    Authors: Stefan Roth, Stefano Tomasin, Marco Maso, Aydin Sezgin

    Abstract: In fifth-generation (5G) cellular networks, users feed back to the base station the index of the precoder (from a codebook) to be used for downlink transmission. The precoder is strongly related to the user channel and in turn to the user position within the cell. We propose a method by which an external attacker determines the user position by passively overhearing this unencrypted layer-2 feedba…

    Submitted 14 December, 2020; originally announced December 2020.

  30. arXiv:2011.00966  [pdf, other]

    cs.CV cs.LG

    Diverse Image Captioning with Context-Object Split Latent Spaces

    Authors: Shweta Mahajan, Stefan Roth

    Abstract: Diverse image captioning models aim to learn one-to-many mappings that are innate to cross-domain datasets, such as of images and texts. Current methods for this task are based on generative latent variable models, e.g. VAEs with structured latent spaces. Yet, the amount of multimodality captured by prior work is limited to that of the paired training data -- the true diversity of the underlying g…

    Submitted 2 November, 2020; originally announced November 2020.

    Comments: To appear at NeurIPS 2020

  31. arXiv:2010.07548  [pdf, other]

    cs.CV

    MOTChallenge: A Benchmark for Single-Camera Multiple Target Tracking

    Authors: Patrick Dendorfer, Aljoša Ošep, Anton Milan, Konrad Schindler, Daniel Cremers, Ian Reid, Stefan Roth, Laura Leal-Taixé

    Abstract: Standardized benchmarks have been crucial in pushing the performance of computer vision algorithms, especially since the advent of deep learning. Although leaderboards should not be over-claimed, they often provide the most objective measure of performance and are therefore important guides for research. We present MOTChallenge, a benchmark for single-camera Multiple Object Tracking (MOT) launched…

    Submitted 8 December, 2020; v1 submitted 15 October, 2020; originally announced October 2020.

    Comments: Accepted at IJCV

  32. Planning on the fast lane: Learning to interact using attention mechanisms in path integral inverse reinforcement learning

    Authors: Sascha Rosbach, Xing Li, Simon Großjohann, Silviu Homoceanu, Stefan Roth

    Abstract: General-purpose trajectory planning algorithms for automated driving utilize complex reward functions to perform a combined optimization of strategic, behavioral, and kinematic features. The specification and tuning of a single reward function is a tedious task and does not generalize over a large set of traffic situations. Deep learning approaches based on path integral inverse reinforcement lear…

    Submitted 12 September, 2020; v1 submitted 11 July, 2020; originally announced July 2020.

    Comments: To appear in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, October 2020

    Journal ref: 2020 IEEE/RSJ Int. Conf. on Intelligent Robots and Syst. (IROS), Las Vegas, USA, 2020, pp. 5187-5193

  33. arXiv:2005.14264  [pdf, other]

    cs.CV

    LR-CNN: Local-aware Region CNN for Vehicle Detection in Aerial Imagery

    Authors: Wentong Liao, Xiang Chen, Jingfeng Yang, Stefan Roth, Michael Goesele, Michael Ying Yang, Bodo Rosenhahn

    Abstract: State-of-the-art object detection approaches such as Fast/Faster R-CNN, SSD, or YOLO have difficulties detecting dense, small targets with arbitrary orientation in large aerial images. The main reason is that using interpolation to align RoI features can result in a lack of accuracy or even loss of location information. We present the Local-aware Region Convolutional Neural Network (LR-CNN), a nov…

    Submitted 28 May, 2020; originally announced May 2020.

    Comments: 8 pages

  34. arXiv:2005.08104  [pdf, other]

    cs.CV cs.LG

    Single-Stage Semantic Segmentation from Image Labels

    Authors: Nikita Araslanov, Stefan Roth

    Abstract: Recent years have seen a rapid growth in new approaches improving the accuracy of semantic segmentation in a weakly supervised setting, i.e. with only image-level labels available for training. However, this has come at the cost of increased model complexity and sophisticated multi-stage training procedures. This is in contrast to earlier work that used only a single stage $-$ training one segment…

    Submitted 16 May, 2020; originally announced May 2020.

    Comments: To appear at CVPR 2020; minor corrections in Eq. (9). Code: https://github.com/visinf/1-stage-wseg

  35. arXiv:2005.05292  [pdf, ps, other]

    cs.IT cs.NI eess.SP

    Remote Short Blocklength Process Monitoring: Trade-off Between Resolution and Data Freshness

    Authors: Stefan Roth, Ahmed Arafa, H. Vincent Poor, Aydin Sezgin

    Abstract: In cyber-physical systems, as in 5G and beyond, multiple physical processes require timely online monitoring at a remote device. There, the received information is used to estimate current and future process values. When transmitting the process data over a communication channel, source-channel coding is used in order to reduce data errors. During transmission, a high data resolution is helpful to…

    Submitted 11 May, 2020; originally announced May 2020.

    Comments: To appear in the 2020 IEEE International Conference on Communications

  36. arXiv:2004.04143  [pdf, other]

    cs.CV cs.LG cs.RO

    Self-Supervised Monocular Scene Flow Estimation

    Authors: Junhwa Hur, Stefan Roth

    Abstract: Scene flow estimation has been receiving increasing attention for 3D environment perception. Monocular scene flow estimation -- obtaining 3D structure and 3D motion from two temporally consecutive images -- is a highly ill-posed problem, and practical solutions are lacking to date. We propose a novel monocular scene flow method that yields competitive accuracy and real-time performance. By taking…

    Submitted 15 April, 2020; v1 submitted 8 April, 2020; originally announced April 2020.

    Comments: To appear at CVPR 2020 (Oral); a typo corrected in the reference section

  37. arXiv:2004.03891  [pdf, other]

    cs.LG cs.CV stat.ML

    Normalizing Flows with Multi-Scale Autoregressive Priors

    Authors: Shweta Mahajan, Apratim Bhattacharyya, Mario Fritz, Bernt Schiele, Stefan Roth

    Abstract: Flow-based generative models are an important class of exact inference models that admit efficient inference and sampling for image synthesis. Owing to the efficiency constraints on the design of the flow layers, e.g. split coupling flow layers in which approximately half the pixels do not undergo further transformations, they have limited expressiveness for modeling long-range data dependencies c…

    Submitted 8 April, 2020; originally announced April 2020.

    Comments: To appear in CVPR 2020

  38. arXiv:2004.02853  [pdf, other]

    cs.CV cs.LG

    Optical Flow Estimation in the Deep Learning Age

    Authors: Junhwa Hur, Stefan Roth

    Abstract: Akin to many subareas of computer vision, the recent advances in deep learning have also significantly influenced the literature on optical flow. Previously, the literature had been dominated by classical energy-based models, which formulate optical flow estimation as an energy minimization problem. However, as the practical benefits of Convolutional Neural Networks (CNNs) over conventional method…

    Submitted 6 April, 2020; originally announced April 2020.

    Comments: To appear as a book chapter in Modelling Human Motion, N. Noceti, A. Sciutti and F. Rea, Eds., Springer, 2020

  39. arXiv:2003.14407  [pdf, other]

    cs.CV

    Probabilistic Pixel-Adaptive Refinement Networks

    Authors: Anne S. Wannenwetsch, Stefan Roth

    Abstract: Encoder-decoder networks have found widespread use in various dense prediction tasks. However, the strong reduction of spatial resolution in the encoder leads to a loss of location information as well as boundary artifacts. To address this, image-adaptive post-processing methods have proven beneficial by leveraging the high-resolution input image(s) as guidance data. We extend such approaches by co…

    Submitted 31 March, 2020; originally announced March 2020.

    Comments: To appear at CVPR 2020

  40. arXiv:2003.09003  [pdf, other]

    cs.CV

    MOT20: A benchmark for multi object tracking in crowded scenes

    Authors: Patrick Dendorfer, Hamid Rezatofighi, Anton Milan, Javen Shi, Daniel Cremers, Ian Reid, Stefan Roth, Konrad Schindler, Laura Leal-Taixé

    Abstract: Standardized benchmarks are crucial for the majority of computer vision applications. Although leaderboards and ranking tables should not be over-claimed, benchmarks often provide the most objective measure of performance and are therefore important guides for research. The benchmark for Multiple Object Tracking, MOTChallenge, was launched with the goal to establish a standardized evaluation of mu…

    Submitted 19 March, 2020; originally announced March 2020.

    Comments: The sequences of the new MOT20 benchmark were previously presented in the CVPR 2019 tracking challenge ( arXiv:1906.04567 ). The differences between the two challenges are: - New and corrected annotations - New sequences, as we had to crop and transform some old sequences to achieve higher quality in the annotations. - New baselines evaluations and different sets of public detections

  41. arXiv:2002.06661  [pdf, other]

    cs.CV

    Latent Normalizing Flows for Many-to-Many Cross-Domain Mappings

    Authors: Shweta Mahajan, Iryna Gurevych, Stefan Roth

    Abstract: Learned joint representations of images and text form the backbone of several important cross-domain tasks such as image captioning. Prior work mostly maps both domains into a common latent representation in a purely supervised fashion. This is rather restrictive, however, as the two domains follow distinct generative processes. Therefore, we propose a novel semi-supervised framework, which models…

    Submitted 16 February, 2020; originally announced February 2020.

    Comments: Published as a conference paper at ICLR 2020

  42. Driving Style Encoder: Situational Reward Adaptation for General-Purpose Planning in Automated Driving

    Authors: Sascha Rosbach, Vinit James, Simon Großjohann, Silviu Homoceanu, Xing Li, Stefan Roth

    Abstract: General-purpose planning algorithms for automated driving combine mission, behavior, and local motion planning. Such planning algorithms map features of the environment and driving kinematics into complex reward functions. To achieve this, planning experts often rely on linear reward functions. The specification and tuning of these reward functions is a tedious process and requires significant exp…

    Submitted 13 September, 2020; v1 submitted 7 December, 2019; originally announced December 2019.

    Comments: To appear in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Paris, France, June 2020 (Virtual Conference). Accepted version. Corrected figure font

    Journal ref: IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 2020, pp. 6419-6425

  43. arXiv:1909.12400

    cs.CV

    Markov Decision Process for Video Generation

    Authors: Vladyslav Yushchenko, Nikita Araslanov, Stefan Roth

    Abstract: We identify two pathological cases of temporal inconsistencies in video generation: video freezing and video looping. To better quantify the temporal diversity, we propose a class of complementary metrics that are effective, easy to implement, data agnostic, and interpretable. Further, we observe that current state-of-the-art models are trained on video samples of fixed length, thereby inhibiting l…

    Submitted 26 September, 2019; originally announced September 2019.

    Comments: To appear at 2019 ICCV Workshop on Large Scale Holistic Video Understanding

  44. arXiv:1909.12196

    cs.CV cs.LG

    Deep Video Deblurring: The Devil is in the Details

    Authors: Jochen Gast, Stefan Roth

    Abstract: Video deblurring for hand-held cameras is a challenging task, since the underlying blur is caused by both camera shake and object motion. State-of-the-art deep networks exploit temporal information from neighboring frames, either by means of spatio-temporal transformers or by recurrent architectures. In contrast to these involved models, we found that a simple baseline CNN can perform astonishingl…

    Submitted 26 September, 2019; originally announced September 2019.

    Comments: To appear at ICCVW 2019

  45. arXiv:1909.06635

    cs.CV cs.CL cs.LG

    Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings

    Authors: Shweta Mahajan, Teresa Botschen, Iryna Gurevych, Stefan Roth

    Abstract: One of the key challenges in learning joint embeddings of multiple modalities, e.g. of images and text, is to ensure coherent cross-modal semantics that generalize across datasets. We propose to address this through joint Gaussian regularization of the latent representations. Building on Wasserstein autoencoders (WAEs) to encode the input in each domain, we enforce the latent embeddings to be simi…

    Submitted 14 September, 2019; originally announced September 2019.

    Comments: Accepted at ICCV 2019 Workshop on Cross-Modal Learning in Real World

  46. arXiv:1909.03677

    cs.CV

    Learning Task-Specific Generalized Convolutions in the Permutohedral Lattice

    Authors: Anne S. Wannenwetsch, Martin Kiefel, Peter V. Gehler, Stefan Roth

    Abstract: Dense prediction tasks typically employ encoder-decoder architectures, but the prevalent convolutions in the decoder are not image-adaptive and can lead to boundary artifacts. Different generalized convolution operations have been introduced to counteract this. We go beyond these by leveraging guidance data to redefine their inherent notion of proximity. Our proposed network layer builds on the pe…

    Submitted 9 September, 2019; originally announced September 2019.

    Comments: To appear at GCPR 2019

  47. arXiv:1906.04567

    cs.CV cs.LG

    CVPR19 Tracking and Detection Challenge: How crowded can it get?

    Authors: Patrick Dendorfer, Hamid Rezatofighi, Anton Milan, Javen Shi, Daniel Cremers, Ian Reid, Stefan Roth, Konrad Schindler, Laura Leal-Taixe

    Abstract: Standardized benchmarks are crucial for the majority of computer vision applications. Although leaderboards and ranking tables should not be over-claimed, benchmarks often provide the most objective measure of performance and are therefore important guides for research. The benchmark for Multiple Object Tracking, MOTChallenge, was launched with the goal to establish a standardized evaluation of…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: arXiv admin note: substantial text overlap with arXiv:1603.00831, arXiv:1504.01942

  48. Driving with Style: Inverse Reinforcement Learning in General-Purpose Planning for Automated Driving

    Authors: Sascha Rosbach, Vinit James, Simon Großjohann, Silviu Homoceanu, Stefan Roth

    Abstract: Behavior and motion planning play an important role in automated driving. Traditionally, behavior planners instruct local motion planners with predefined behaviors. Due to the high scene complexity in urban environments, unpredictable situations may occur in which behavior planners fail to match predefined behavior templates. Recently, general-purpose planners have been introduced, combining behav…

    Submitted 12 September, 2020; v1 submitted 1 May, 2019; originally announced May 2019.

    Comments: Appeared at IROS 2019. Accepted version. Added/updated footnote, minor correction in preliminaries

    Journal ref: 2019 IEEE/RSJ Int. Conf. on Intelligent Robots and Syst. (IROS), Macau, China, 2019, pp. 2658-2665

  49. arXiv:1904.05290

    cs.CV cs.LG

    Iterative Residual Refinement for Joint Optical Flow and Occlusion Estimation

    Authors: Junhwa Hur, Stefan Roth

    Abstract: Deep learning approaches to optical flow estimation have seen rapid progress over the recent years. One common trait of many networks is that they refine an initial flow estimate either through multiple stages or across the levels of a coarse-to-fine representation. While leading to more accurate results, the downside of this is an increased number of parameters. Taking inspiration from both class…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: To appear in CVPR 2019

  50. arXiv:1904.05126

    cs.CV cs.LG

    Actor-Critic Instance Segmentation

    Authors: Nikita Araslanov, Constantin Rothkopf, Stefan Roth

    Abstract: Most approaches to visual scene analysis have emphasised parallel processing of the image elements. However, one area in which the sequential nature of vision is apparent is that of segmenting multiple, potentially similar and partially occluded objects in a scene. In this work, we revisit the recurrent formulation of this challenging problem in the context of reinforcement learning. Motivated by…

    Submitted 10 April, 2019; originally announced April 2019.

    Comments: To appear at CVPR 2019