
Showing 1–50 of 69 results for author: Bowden, R

Searching in archive cs.
  1. arXiv:2405.04164

    cs.CV

    Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation

    Authors: Ryan Wong, Necati Cihan Camgoz, Richard Bowden

    Abstract: Automatic Sign Language Translation requires the integration of both computer vision and natural language processing to effectively bridge the communication gap between sign and spoken languages. However, the lack of large-scale training data to support sign language translation means we need to leverage resources from spoken language. We introduce Sign2GPT, a novel framework for sign langu…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted at ICLR2024

  2. arXiv:2404.16831

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora, et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e., supervised or self-supervised. The challenge received a total of 19 su…

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  3. arXiv:2404.11532

    cs.CL

    Select and Reorder: A Novel Approach for Neural Sign Language Production

    Authors: Harry Walsh, Ben Saunders, Richard Bowden

    Abstract: Sign languages, often categorised as low-resource languages, face significant challenges in achieving accurate translation due to the scarcity of parallel annotated datasets. This paper introduces Select and Reorder (S&R), a novel approach that addresses data scarcity by breaking down the translation process into two distinct steps: Gloss Selection (GS) and Gloss Reordering (GR). Our method levera…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 8 Pages, 5 Figures, 7 Tables, LREC-COLING 2024

  4. arXiv:2404.11499

    cs.CL cs.AI

    A Data-Driven Representation for Sign Language Production

    Authors: Harry Walsh, Abolfazl Ravanshad, Mariam Rahmani, Richard Bowden

    Abstract: Phonetic representations are used when recording spoken languages, but no equivalent exists for recording signed languages. As a result, linguists have proposed several annotation systems that operate on the gloss or sub-unit level; however, these resources are notably irregular and scarce. Sign Language Production (SLP) aims to automatically translate spoken language sentences into continuous s…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 8 Pages, 3 Figures, 7 Tables, 18th IEEE International Conference on Automatic Face and Gesture Recognition 2024

  5. arXiv:2404.05414

    cs.CV

    Two Hands Are Better Than One: Resolving Hand to Hand Intersections via Occupancy Networks

    Authors: Maksym Ivashechkin, Oscar Mendez, Richard Bowden

    Abstract: 3D hand pose estimation from images has attracted considerable interest in the literature, with new methods improving overall 3D accuracy. One current challenge is to address hand-to-hand interaction where self-occlusions and finger articulation pose a significant problem to estimation. Little work has applied physical constraints that minimize the hand intersections that occur as a result of noisy e…

    Submitted 8 April, 2024; originally announced April 2024.

  6. arXiv:2403.10731

    cs.CV cs.LG

    Giving a Hand to Diffusion Models: a Two-Stage Approach to Improving Conditional Human Image Generation

    Authors: Anton Pelykh, Ozge Mercanoglu Sincan, Richard Bowden

    Abstract: Recent years have seen significant progress in human image generation, particularly with the advancements in diffusion models. However, existing diffusion methods encounter challenges when producing consistent hand anatomy and the generated images often lack precise control over the hand pose. To address this limitation, we introduce a novel approach to pose-conditioned human image generation, div…

    Submitted 30 April, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  7. arXiv:2403.10434

    cs.CV

    Using an LLM to Turn Sign Spottings into Spoken Language Sentences

    Authors: Ozge Mercanoglu Sincan, Necati Cihan Camgoz, Richard Bowden

    Abstract: Sign Language Translation (SLT) is a challenging task that aims to generate spoken language sentences from sign language videos. In this paper, we introduce a hybrid SLT approach, Spotter+GPT, that utilizes a sign spotter and a pretrained large language model to improve SLT performance. Our method builds upon the strengths of both components. The videos are first processed by the spotter, which is…

    Submitted 15 March, 2024; originally announced March 2024.

  8. arXiv:2403.01569

    cs.CV cs.AI cs.RO

    Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV

    Authors: Jaime Spencer, Chris Russell, Simon Hadfield, Richard Bowden

    Abstract: Self-supervised learning is the key to unlocking generic computer vision systems. By eliminating the reliance on ground-truth annotations, it allows scaling to much larger data quantities. Unfortunately, self-supervised monocular depth estimation (SS-MDE) has been limited by the absence of diverse training data. Existing datasets have focused exclusively on urban driving in densely populated citie…

    Submitted 3 March, 2024; originally announced March 2024.

  9. arXiv:2308.09622

    cs.CV

    Is context all you need? Scaling Neural Sign Language Translation to Large Domains of Discourse

    Authors: Ozge Mercanoglu Sincan, Necati Cihan Camgoz, Richard Bowden

    Abstract: Sign Language Translation (SLT) is a challenging task that aims to generate spoken language sentences from sign language videos, both of which have different grammar and word/gloss order. From a Neural Machine Translation (NMT) perspective, the straightforward way of training translation models is to use sign language phrase-spoken language sentence pairs. However, human interpreters heavily rely…

    Submitted 18 August, 2023; originally announced August 2023.

  10. arXiv:2308.09525

    cs.CV

    Improving 3D Pose Estimation for Sign Language

    Authors: Maksym Ivashechkin, Oscar Mendez, Richard Bowden

    Abstract: This work addresses 3D human pose reconstruction in single images. We present a method that combines Forward Kinematics (FK) with neural networks to ensure a fast and valid prediction of 3D pose. Pose is represented as a hierarchical tree/graph with nodes corresponding to human joints that model their physical limits. Given a 2D detection of keypoints in the image, we lift the skeleton to 3D using…

    Submitted 18 August, 2023; originally announced August 2023.

  11. arXiv:2308.09523

    cs.CV

    Denoising Diffusion for 3D Hand Pose Estimation from Images

    Authors: Maksym Ivashechkin, Oscar Mendez, Richard Bowden

    Abstract: Hand pose estimation from a single image has many applications. However, approaches to full 3D body pose estimation are typically trained on day-to-day activities or actions. As such, detailed hand-to-hand interactions are poorly represented, especially during motion. We see this in the failure cases of techniques such as OpenPose or MediaPipe. However, accurate hand pose estimation is crucial for…

    Submitted 18 August, 2023; originally announced August 2023.

  12. arXiv:2308.09515

    cs.CV

    Learnt Contrastive Concept Embeddings for Sign Recognition

    Authors: Ryan Wong, Necati Cihan Camgoz, Richard Bowden

    Abstract: In natural language processing (NLP) of spoken languages, word embeddings have been shown to be a useful method to encode the meaning of words. Sign languages are visual languages, which require sign embeddings to capture the visual and linguistic semantics of sign. Unlike many common approaches to Sign Recognition, we focus on explicitly creating sign embeddings that bridge the gap between sign l…

    Submitted 18 August, 2023; originally announced August 2023.

  13. arXiv:2308.04248

    cs.CL cs.AI

    Gloss Alignment Using Word Embeddings

    Authors: Harry Walsh, Ozge Mercanoglu Sincan, Ben Saunders, Richard Bowden

    Abstract: Capturing and annotating sign language datasets is a time-consuming and costly process. Current datasets are orders of magnitude too small to successfully train unconstrained Sign Language Translation (SLT) models. As a result, research has turned to TV broadcast content as a source of large-scale training data, consisting of both the sign language interpreter and the associated audio subtitle. However, lack of sign la…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: 4 pages, 4 figures, 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)

  14. arXiv:2307.10713

    cs.CV cs.AI cs.RO

    Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV

    Authors: Jaime Spencer, Chris Russell, Simon Hadfield, Richard Bowden

    Abstract: Self-supervised monocular depth estimation (SS-MDE) has the potential to scale to vast quantities of data. Unfortunately, existing approaches limit themselves to the automotive domain, resulting in models incapable of generalizing to complex environments such as natural or indoor settings. To address this, we propose a large-scale SlowTV dataset curated from YouTube, containing an order of magni…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: Accepted to ICCV2023

  15. arXiv:2307.09065

    cs.CV cs.LG

    Learning Adaptive Neighborhoods for Graph Neural Networks

    Authors: Avishkar Saha, Oscar Mendez, Chris Russell, Richard Bowden

    Abstract: Graph convolutional networks (GCNs) enable end-to-end learning on graph-structured data. However, many works assume a given graph structure. When the input graph is noisy or unavailable, one approach is to construct or learn a latent graph structure. These methods typically fix the choice of node degree for the entire graph, which is suboptimal. Instead, we propose a novel end-to-end differentiabl…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: ICCV 2023

  16. arXiv:2304.07051

    cs.CV cs.AI

    The Second Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, C. Stella Qian, Michaela Trescakova, Chris Russell, Simon Hadfield, Erich W. Graf, Wendy J. Adams, Andrew J. Schofield, James Elder, Richard Bowden, Ali Anwar, Hao Chen, Xiaozhi Chen, Kai Cheng, Yuchao Dai, Huynh Thai Hoa, Sadat Hossain, Jianmian Huang, Mohan Jing, Bo Li, Chao Li, Baojun Li, Zhiwen Liu, Stefano Mattoccia, Siegfried Mercelis, et al. (18 additional authors not shown)

    Abstract: This paper discusses the results for the second edition of the Monocular Depth Estimation Challenge (MDEC). This edition was open to methods using any form of supervision, including fully-supervised, self-supervised, multi-task or proxy depth. The challenge was based around the SYNS-Patches dataset, which features a wide diversity of environments with high-quality dense ground-truth. This includes…

    Submitted 26 April, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

    Comments: Published at CVPRW2023

  17. arXiv:2303.16821

    cs.RO cs.AI cs.LG

    Decision Making for Autonomous Driving in Interactive Merge Scenarios via Learning-based Prediction

    Authors: Salar Arbabi, Davide Tavernini, Saber Fallah, Richard Bowden

    Abstract: Autonomous agents that drive on roads shared with human drivers must reason about the nuanced interactions among traffic participants. This poses a highly challenging decision-making problem, since human behavior is influenced by a multitude of factors (e.g., human intentions and emotions) that are hard to model. This paper presents a decision-making approach for autonomous driving, focusing on the…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: 12 pages, 12 figures

  18. Novel View Synthesis of Humans using Differentiable Rendering

    Authors: Guillaume Rochette, Chris Russell, Richard Bowden

    Abstract: We present a new approach for synthesizing novel views of people in new poses. Our novel differentiable renderer enables the synthesis of highly realistic images from any viewpoint. Rather than operating over mesh-based structures, our renderer makes use of diffuse Gaussian primitives that directly represent the underlying skeletal structure of a human. Rendering these primitives gives results in…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: Accepted at IEEE Transactions on Biometrics, Behavior, and Identity Science, 10 pages, 11 figures. arXiv admin note: substantial text overlap with arXiv:2111.12731

  19. arXiv:2211.12174

    cs.CV

    The Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, C. Stella Qian, Chris Russell, Simon Hadfield, Erich Graf, Wendy Adams, Andrew J. Schofield, James Elder, Richard Bowden, Heng Cong, Stefano Mattoccia, Matteo Poggi, Zeeshan Khan Suri, Yang Tang, Fabio Tosi, Hao Wang, Youmin Zhang, Yusheng Zhang, Chaoqiang Zhao

    Abstract: This paper summarizes the results of the first Monocular Depth Estimation Challenge (MDEC) organized at WACV2023. This challenge evaluated the progress of self-supervised monocular depth estimation on the challenging SYNS-Patches dataset. The challenge was organized on CodaLab and received submissions from 4 valid teams. Participants were provided a devkit containing updated reference implementati…

    Submitted 22 November, 2022; originally announced November 2022.

    Comments: WACV-Workshops 2023

  20. arXiv:2210.06312

    cs.CL cs.AI

    Changing the Representation: Examining Language Representation for Neural Sign Language Production

    Authors: Harry Walsh, Ben Saunders, Richard Bowden

    Abstract: Neural Sign Language Production (SLP) aims to automatically translate from spoken language sentences to sign language videos. Historically, the SLP task has been broken into two steps: first, translating from a spoken language sentence to a gloss sequence and, second, producing a sign language video given a sequence of glosses. In this paper we apply Natural Language Processing techniques to the…

    Submitted 16 September, 2022; originally announced October 2022.

    Comments: 8 pages, 4 figures, 5 tables, SLTAT 2022

    MSC Class: 68T50 (Primary)

  21. arXiv:2210.00951

    cs.CV

    Hierarchical I3D for Sign Spotting

    Authors: Ryan Wong, Necati Cihan Camgöz, Richard Bowden

    Abstract: Most of the vision-based sign language research to date has focused on Isolated Sign Language Recognition (ISLR), where the objective is to predict a single sign class given a short video clip. Although there has been significant progress in ISLR, its real-life applications are limited. In this paper, we focus on the challenging task of Sign Spotting instead, where the goal is to simultaneously id…

    Submitted 3 October, 2022; originally announced October 2022.

  22. arXiv:2208.01489

    cs.CV cs.CG cs.LG

    Deconstructing Self-Supervised Monocular Reconstruction: The Design Decisions that Matter

    Authors: Jaime Spencer, Chris Russell, Simon Hadfield, Richard Bowden

    Abstract: This paper presents an open and comprehensive framework to systematically evaluate state-of-the-art contributions to self-supervised monocular depth estimation. This includes pretraining, backbone, architectural design choices and loss functions. Many papers in this field claim novelty in either architecture design or loss formulation. However, simply updating the backbone of historical systems re…

    Submitted 21 December, 2022; v1 submitted 2 August, 2022; originally announced August 2022.

    Comments: https://github.com/jspenmar/monodepth_benchmark

    Journal ref: Transactions of Machine Learning Research 2022

  23. arXiv:2208.00516

    cs.RO

    Learning an Interpretable Model for Driver Behavior Prediction with Inductive Biases

    Authors: Salar Arbabi, Davide Tavernini, Saber Fallah, Richard Bowden

    Abstract: To plan safe maneuvers and act with foresight, autonomous vehicles must be capable of accurately predicting the uncertain future. In the context of autonomous driving, deep neural networks have been successfully applied to learning predictive models of human driving behavior from data. However, the predictions suffer from cascading errors, resulting in large inaccuracies over long time horizons. F…

    Submitted 31 July, 2022; originally announced August 2022.

  24. arXiv:2206.12946

    cs.CV cs.RO

    AFT-VO: Asynchronous Fusion Transformers for Multi-View Visual Odometry Estimation

    Authors: Nimet Kaygusuz, Oscar Mendez, Richard Bowden

    Abstract: Motion estimation approaches typically employ sensor fusion techniques, such as the Kalman Filter, to handle individual sensor failures. More recently, deep learning-based fusion approaches have been proposed, improving performance and requiring fewer model-specific implementations. However, current deep fusion approaches often assume that sensors are synchronised, which is not always practica…

    Submitted 16 September, 2022; v1 submitted 26 June, 2022; originally announced June 2022.

  25. arXiv:2205.00135

    math.NT cs.CR

    Failing to hash into supersingular isogeny graphs

    Authors: Jeremy Booher, Ross Bowden, Javad Doliskani, Tako Boris Fouotsa, Steven D. Galbraith, Sabrina Kunzweiler, Simon-Philipp Merz, Christophe Petit, Benjamin Smith, Katherine E. Stange, Yan Bo Ti, Christelle Vincent, José Felipe Voloch, Charlotte Weitkämper, Lukas Zobernig

    Abstract: An important open problem in supersingular isogeny-based cryptography is to produce, without a trusted authority, concrete examples of "hard supersingular curves", that is, equations for supersingular curves for which computing the endomorphism ring is as difficult as it is for random supersingular curves. A related open problem is to produce a hash function to the vertices of the supersingular…

    Submitted 8 May, 2024; v1 submitted 29 April, 2022; originally announced May 2022.

    Comments: 34 pages, 8 figures

    MSC Class: 11G05; 11T71; 14G50; 14K02; 81P94; 94A60; 68Q12

  26. arXiv:2204.05698

    cs.LG

    Medusa: Universal Feature Learning via Attentional Multitasking

    Authors: Jaime Spencer, Richard Bowden, Simon Hadfield

    Abstract: Recent approaches to multi-task learning (MTL) have focused on modelling connections between tasks at the decoder level. This leads to a tight coupling between tasks, which need retraining if a new task is inserted or removed. We argue that MTL is a stepping stone towards universal feature learning (UFL), which is the ability to learn generic features that can be applied to new tasks without retra…

    Submitted 12 April, 2022; originally announced April 2022.

    Comments: Accepted @ CVPRW 2022 (CLVision, 3rd Edition)

  27. arXiv:2204.02944

    cs.CV

    "The Pedestrian next to the Lamppost" Adaptive Object Graphs for Better Instantaneous Mapping

    Authors: Avishkar Saha, Oscar Mendez, Chris Russell, Richard Bowden

    Abstract: Estimating a semantically segmented bird's-eye-view (BEV) map from a single image has become a popular technique for autonomous control and navigation. However, such approaches show an increase in localization error with distance from the camera. While such an increase in error is entirely expected - localization is harder at distance - much of the drop in performance can be attributed to the cues used by cu…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: Accepted to CVPR 2022

  28. arXiv:2203.15354

    cs.CV

    Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production

    Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden

    Abstract: Sign languages are visual languages, with vocabularies as rich as their spoken language counterparts. However, current deep-learning based Sign Language Production (SLP) models produce under-articulated skeleton pose sequences from constrained vocabularies and this limits applicability. To be understandable and accepted by the deaf, an automatic SLP system must be able to generate co-articulated p…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: arXiv admin note: text overlap with arXiv:2011.09846

  29. arXiv:2202.09096

    cs.LG stat.ME stat.ML

    A Free Lunch with Influence Functions? Improving Neural Network Estimates with Concepts from Semiparametric Statistics

    Authors: Matthew J. Vowels, Sina Akbari, Necati Cihan Camgoz, Richard Bowden

    Abstract: Parameter estimation in empirical fields is usually undertaken using parametric models, and such models readily facilitate statistical inference. Unfortunately, they are unlikely to be sufficiently flexible to be able to adequately model real-world phenomena, and may yield biased estimates. Conversely, non-parametric approaches are flexible but do not readily facilitate statistical inference and m…

    Submitted 10 June, 2022; v1 submitted 18 February, 2022; originally announced February 2022.

  30. Multi-Camera Sensor Fusion for Visual Odometry using Deep Uncertainty Estimation

    Authors: Nimet Kaygusuz, Oscar Mendez, Richard Bowden

    Abstract: Visual Odometry (VO) estimation is an important source of information for vehicle state estimation and autonomous driving. Recently, deep learning based approaches have begun to appear in the literature. However, in the context of driving, single sensor based approaches are often prone to failure because of degraded image quality due to environmental factors, camera placement, etc. To address this…

    Submitted 23 December, 2021; originally announced December 2021.

    Journal ref: 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), 2021, pp. 2944-2949

  31. MDN-VO: Estimating Visual Odometry with Confidence

    Authors: Nimet Kaygusuz, Oscar Mendez, Richard Bowden

    Abstract: Visual Odometry (VO) is used in many applications including robotics and autonomous systems. However, traditional approaches based on feature matching are computationally expensive and do not directly address failure cases, instead relying on heuristic methods to detect failure. In this work, we propose a deep learning-based VO model to efficiently estimate 6-DoF poses, as well as a confidence mod…

    Submitted 23 December, 2021; originally announced December 2021.

    Journal ref: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021, pp. 3528-3533

  32. arXiv:2112.05277

    cs.CV cs.CL

    Skeletal Graph Self-Attention: Embedding a Skeleton Inductive Bias into Sign Language Production

    Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden

    Abstract: Recent approaches to Sign Language Production (SLP) have adopted spoken language Neural Machine Translation (NMT) architectures, applied without sign-specific modifications. In addition, these works represent sign language as a sequence of skeleton pose vectors, projected to an abstract representation with no inherent skeletal structure. In this paper, we represent sign language sequences as a ske…

    Submitted 6 December, 2021; originally announced December 2021.

  33. arXiv:2111.12731

    cs.CV

    Human Pose Manipulation and Novel View Synthesis using Differentiable Rendering

    Authors: Guillaume Rochette, Chris Russell, Richard Bowden

    Abstract: We present a new approach for synthesizing novel views of people in new poses. Our novel differentiable renderer enables the synthesis of highly realistic images from any viewpoint. Rather than operating over mesh-based structures, our renderer makes use of diffuse Gaussian primitives that directly represent the underlying skeletal structure of a human. Rendering these primitives gives results in…

    Submitted 20 February, 2022; v1 submitted 24 November, 2021; originally announced November 2021.

    Comments: Accepted at Face and Gesture 2021, 8 pages, 7 figures

  34. arXiv:2110.00966

    cs.CV

    Translating Images into Maps

    Authors: Avishkar Saha, Oscar Mendez Maldonado, Chris Russell, Richard Bowden

    Abstract: We approach instantaneous mapping, converting images to a top-down view of the world, as a translation problem. We show how a novel form of transformer network can be used to map from images and video directly to an overhead map or bird's-eye-view (BEV) of the world, in a single end-to-end network. We assume a 1-1 correspondence between a vertical scanline in the image, and rays passing through th…

    Submitted 30 March, 2022; v1 submitted 3 October, 2021; originally announced October 2021.

    Comments: Accepted to ICRA 2022

  35. arXiv:2108.04229

    cs.CV

    Looking for the Signs: Identifying Isolated Sign Instances in Continuous Video Footage

    Authors: Tao Jiang, Necati Cihan Camgoz, Richard Bowden

    Abstract: In this paper, we focus on the task of one-shot sign spotting, i.e. given an example of an isolated sign (query), we want to identify whether/where this sign appears in a continuous, co-articulated sign language video (target). To achieve this goal, we propose a transformer-based network, called SignLookup. We employ 3D Convolutional Neural Networks (CNNs) to extract spatio-temporal representation…

    Submitted 20 November, 2021; v1 submitted 21 July, 2021; originally announced August 2021.

    Comments: 8 pages, 2 figures

  36. arXiv:2107.11857

    cs.RO cs.CV

    Improving Robot Localisation by Ignoring Visual Distraction

    Authors: Oscar Mendez, Matthew Vowels, Richard Bowden

    Abstract: Attention is an important component of modern deep learning. However, less emphasis has been put on its inverse: ignoring distraction. Our daily lives require us to explicitly avoid giving attention to salient visual features that confound the task we are trying to accomplish. This visual prioritisation allows us to concentrate on important tasks while ignoring visual distractors. In this work,…

    Submitted 25 July, 2021; originally announced July 2021.

    Comments: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

  37. arXiv:2107.11317

    cs.CV

    Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives

    Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden

    Abstract: It is common practice to represent spoken languages at their phonetic level. However, for sign languages, this implies breaking motion into its constituent motion primitives. Avatar based Sign Language Production (SLP) has traditionally done just this, building up animation from sequences of hand motions, shapes and facial expressions. However, more recent deep learning based solutions to SLP have…

    Submitted 26 July, 2021; v1 submitted 23 July, 2021; originally announced July 2021.

    Journal ref: International Conference on Computer Vision (ICCV 2021)

  38. arXiv:2107.10685

    cs.CV

    AnonySIGN: Novel Human Appearance Synthesis for Sign Language Video Anonymisation

    Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden

    Abstract: The visual anonymisation of sign language data is an essential task to address privacy concerns raised by large-scale dataset collection. Previous anonymisation techniques have either significantly affected sign comprehension or required manual, labour-intensive work. In this paper, we formally introduce the task of Sign Language Video Anonymisation (SLVA) as an automatic method to anonymise the…

    Submitted 23 July, 2021; v1 submitted 22 July, 2021; originally announced July 2021.

    Journal ref: Face and Gesture Conference 2021

  39. arXiv:2107.04487

    cs.LG cs.AI cs.MA eess.SY

    ARC: Adversarially Robust Control Policies for Autonomous Vehicles

    Authors: Sampo Kuutti, Saber Fallah, Richard Bowden

    Abstract: Deep neural networks have demonstrated their capability to learn control policies for a variety of tasks. However, these neural network-based policies have been shown to be susceptible to exploitation by adversarial agents. Therefore, there is a need to develop techniques to learn control policies that are robust against adversaries. We introduce Adversarially Robust Control (ARC), which trains th…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: Accepted in IEEE Intelligent Transportation Systems Conference (ITSC) 2021

  40. arXiv:2107.04485

    cs.LG cs.AI eess.SY

    Adversarial Mixture Density Networks: Learning to Drive Safely from Collision Data

    Authors: Sampo Kuutti, Saber Fallah, Richard Bowden

    Abstract: Imitation learning has been widely used to learn control policies for autonomous driving based on pre-recorded data. However, imitation learning based policies have been shown to be susceptible to compounding errors when encountering states outside of the training distribution. Further, these agents have been demonstrated to be easily exploitable by adversarial road users aiming to create collisio…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: Accepted in IEEE Intelligent Transportation Systems Conference (ITSC) 2021

  41. arXiv:2106.00371

    cs.RO cs.CV cs.LG

    Markov Localisation using Heatmap Regression and Deep Convolutional Odometry

    Authors: Oscar Mendez, Simon Hadfield, Richard Bowden

    Abstract: In the context of self-driving vehicles there is strong competition between approaches based on visual localisation and LiDAR. While LiDAR provides important depth information, it is sparse in resolution and expensive. On the other hand, cameras are low-cost and recent developments in deep learning mean they can provide high localisation performance. However, several fundamental problems remain, p…

    Submitted 1 June, 2021; originally announced June 2021.

    Comments: IEEE International Conference on Robotics and Automation (ICRA) 2021

  42. arXiv:2105.02351

    cs.CV cs.CL

    Content4All Open Research Sign Language Translation Datasets

    Authors: Necati Cihan Camgoz, Ben Saunders, Guillaume Rochette, Marco Giovanelli, Giacomo Inches, Robin Nachtrab-Ribback, Richard Bowden

    Abstract: Computational sign language research lacks the large-scale datasets that enable the creation of useful real-life applications. To date, most research has been limited to prototype systems on small domains of discourse, e.g. weather forecasts. To address this issue and to push the field forward, we release six datasets comprised of 190 hours of footage on the larger domain of news. From this, 20 ho…

    Submitted 5 May, 2021; originally announced May 2021.

  43. arXiv:2104.11712  [pdf, other]

    cs.CV

    Skeletor: Skeletal Transformers for Robust Body-Pose Estimation

    Authors: Tao Jiang, Necati Cihan Camgoz, Richard Bowden

    Abstract: Predicting 3D human pose from a single monoscopic video can be highly challenging due to factors such as low resolution, motion blur and occlusion, in addition to the fundamental ambiguity in estimating 3D from 2D. Approaches that directly regress the 3D pose from independent images can be particularly susceptible to these factors and result in jitter, noise and/or inconsistencies in skeletal esti…

    Submitted 23 April, 2021; originally announced April 2021.

  44. arXiv:2104.10166  [pdf, other]

    cs.CL

    Evaluating the Immediate Applicability of Pose Estimation for Sign Language Recognition

    Authors: Amit Moryossef, Ioannis Tsochantaridis, Joe Dinn, Necati Cihan Camgöz, Richard Bowden, Tao Jiang, Annette Rios, Mathias Müller, Sarah Ebling

    Abstract: Signed languages are visual languages produced by the movement of the hands, face, and body. In this paper, we evaluate representations based on skeleton poses, as these are explainable, person-independent, privacy-preserving, low-dimensional representations. Basically, skeletal representations generalize over an individual's appearance and background, allowing us to focus on the recognition of mo…

    Submitted 20 April, 2021; originally announced April 2021.

  45. arXiv:2104.08183  [pdf, other]

    cs.CV stat.AP stat.ML

    Shadow-Mapping for Unsupervised Neural Causal Discovery

    Authors: Matthew J. Vowels, Necati Cihan Camgoz, Richard Bowden

    Abstract: An important goal across most scientific fields is the discovery of causal structures underlying a set of observations. Unfortunately, causal discovery methods which are based on correlation or mutual information can often fail to identify causal links in systems which exhibit dynamic relationships. Such dynamic systems (including the famous coupled logistic map) exhibit 'mirage' correlations which…

    Submitted 28 April, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

  46. There and Back Again: Self-supervised Multispectral Correspondence Estimation

    Authors: Celyn Walters, Oscar Mendez, Mark Johnson, Richard Bowden

    Abstract: Across a wide range of applications, from autonomous vehicles to medical imaging, multi-spectral images provide an opportunity to extract additional information not present in color images. One of the most important steps in making this information readily available is the accurate estimation of dense correspondences between different spectra. Due to the nature of cross-spectral images, most cor…

    Submitted 26 May, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

    Comments: To be published in IEEE/RSJ International Conference on Robotics and Automation (ICRA) 2021

  47. arXiv:2103.09726  [pdf, other]

    cs.LG eess.SY

    Weakly Supervised Reinforcement Learning for Autonomous Highway Driving via Virtual Safety Cages

    Authors: Sampo Kuutti, Richard Bowden, Saber Fallah

    Abstract: The use of neural networks and reinforcement learning has become increasingly popular in autonomous vehicle control. However, the opaqueness of the resulting control policies presents a significant barrier to deploying neural network-based control in autonomous vehicles. In this paper, we present a reinforcement learning based approach to autonomous vehicle longitudinal control, where the rule-bas…

    Submitted 17 March, 2021; originally announced March 2021.

    Comments: Published in Sensors

    Journal ref: Sensors 2021, 21, 2032

  48. A Robust Extrinsic Calibration Framework for Vehicles with Unscaled Sensors

    Authors: Celyn Walters, Oscar Mendez, Simon Hadfield, Richard Bowden

    Abstract: Accurate extrinsic sensor calibration is essential for both autonomous vehicles and robots. Traditionally this is an involved process requiring calibration targets, known fiducial markers and is generally performed in a lab. Moreover, even a small change in the sensor layout requires recalibration. With the anticipated arrival of consumer autonomous vehicles, there is demand for a system which can…

    Submitted 17 March, 2021; originally announced March 2021.

    Journal ref: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 2019, pp. 36-42

  49. arXiv:2103.07292  [pdf, other]

    cs.CV cs.LG

    VDSM: Unsupervised Video Disentanglement with State-Space Modeling and Deep Mixtures of Experts

    Authors: Matthew J. Vowels, Necati Cihan Camgoz, Richard Bowden

    Abstract: Disentangled representations support a range of downstream tasks including causal reasoning, generative modeling, and fair machine learning. Unfortunately, disentanglement has been shown to be impossible without the incorporation of supervision or inductive bias. Given that supervision is often expensive or infeasible to acquire, we choose to incorporate structural inductive bias and present an un…

    Submitted 15 December, 2021; v1 submitted 12 March, 2021; originally announced March 2021.

  50. arXiv:2103.06982  [pdf, other]

    cs.CV

    Continuous 3D Multi-Channel Sign Language Production via Progressive Transformers and Mixture Density Networks

    Authors: Ben Saunders, Necati Cihan Camgoz, Richard Bowden

    Abstract: Sign languages are multi-channel visual languages, where signers use a continuous 3D space to communicate. Sign Language Production (SLP), the automatic translation from spoken to sign languages, must embody both the continuous articulation and full morphology of sign to be truly understandable by the Deaf community. Previous deep learning-based SLP works have produced only a concatenation of isola…

    Submitted 11 March, 2021; originally announced March 2021.