Showing 1–50 of 109 results for author: Ross, A

Searching in archive cs.
  1. arXiv:2405.04726

    cs.CL

    Learning Phonotactics from Linguistic Informants

    Authors: Canaan Breiss, Alexis Ross, Amani Maina-Kilaas, Roger Levy, Jacob Andreas

    Abstract: We propose an interactive approach to language learning that utilizes linguistic acceptability judgments from an informant (a competent language user) to learn a grammar. Given a grammar formalism and a framework for synthesizing data, our model iteratively selects or synthesizes a data-point according to one of a range of information-theoretic policies, asks the informant for a binary judgment, a…

    Submitted 7 May, 2024; originally announced May 2024.

  2. arXiv:2405.04495

    cs.CL cs.AI cs.LG

    Toward In-Context Teaching: Adapting Examples to Students' Misconceptions

    Authors: Alexis Ross, Jacob Andreas

    Abstract: When a teacher provides examples for a student to study, these examples must be informative, enabling a student to progress from their current state toward a target concept or skill. Good teachers must therefore simultaneously infer what students already know and adapt their teaching to students' changing state of knowledge. There is increasing interest in using computational models, particularly…

    Submitted 7 May, 2024; originally announced May 2024.

  3. arXiv:2404.17105

    cs.CV

    Synthesizing Iris Images using Generative Adversarial Networks: Survey and Comparative Analysis

    Authors: Shivangi Yadav, Arun Ross

    Abstract: Biometric systems based on iris recognition are currently being used in border control applications and mobile devices. However, research in iris recognition is stymied by various factors such as limited datasets of bonafide irides and presentation attack instruments; restricted intra-class variations; and privacy concerns. Some of these issues can be mitigated by the use of synthetic iris data. I…

    Submitted 25 April, 2024; originally announced April 2024.

  4. arXiv:2404.16255

    cs.CR cs.CV

    Enhancing Privacy in Face Analytics Using Fully Homomorphic Encryption

    Authors: Bharat Yalavarthi, Arjun Ramesh Kaushik, Arun Ross, Vishnu Boddeti, Nalini Ratha

    Abstract: Modern face recognition systems utilize deep neural networks to extract salient features from a face. These features denote embeddings in latent space and are often stored as templates in a face recognition system. These embeddings are susceptible to data leakage and, in some cases, can even be used to reconstruct the original face image. To prevent compromising identities, template protection sch…

    Submitted 24 April, 2024; originally announced April 2024.

  5. arXiv:2403.12047

    cs.CV

    Alpha-wolves and Alpha-mammals: Exploring Dictionary Attacks on Iris Recognition Systems

    Authors: Sudipta Banerjee, Anubhav Jain, Zehua Jiang, Nasir Memon, Julian Togelius, Arun Ross

    Abstract: A dictionary attack in a biometric system entails the use of a small number of strategically generated images or templates to successfully match with a large number of identities, thereby compromising security. We focus on dictionary attacks at the template level, specifically the IrisCodes used in iris recognition systems. We present a hitherto unknown vulnerability wherein we mix IrisCodes usin…

    Submitted 20 November, 2023; originally announced March 2024.

    Comments: 8 pages, 5 figures, 13 tables, Workshop on Manipulation, Adversarial, and Presentation Attacks in Biometrics, Winter Conference on Applications of Computer Vision

  6. arXiv:2403.05024

    eess.IV cs.CV cs.LG

    A Probabilistic Hadamard U-Net for MRI Bias Field Correction

    Authors: Xin Zhu, Hongyi Pan, Yury Velichko, Adam B. Murphy, Ashley Ross, Baris Turkbey, Ahmet Enis Cetin, Ulas Bagci

    Abstract: Magnetic field inhomogeneity correction remains a challenging task in MRI analysis. Most established techniques are designed for brain MRI by supposing that image intensities in the identical tissue follow a uniform distribution. Such an assumption cannot be easily applied to other organs, especially those that are small in size and heterogeneous in texture (large variations in intensity), such as…

    Submitted 7 March, 2024; originally announced March 2024.

  7. arXiv:2403.01248

    cs.CV cs.AI cs.CL cs.LG

    SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code

    Authors: Ziniu Hu, Ahmet Iscen, Aashi Jain, Thomas Kipf, Yisong Yue, David A. Ross, Cordelia Schmid, Alireza Fathi

    Abstract: This paper introduces SceneCraft, a Large Language Model (LLM) Agent converting text descriptions into Blender-executable Python scripts which render complex scenes with up to a hundred 3D assets. This process requires complex spatial planning and arrangement. We tackle these challenges through a combination of advanced abstraction, strategic planning, and library learning. SceneCraft first models…

    Submitted 2 March, 2024; originally announced March 2024.

  8. arXiv:2402.13217

    cs.CV cs.AI

    VideoPrism: A Foundational Visual Encoder for Video Understanding

    Authors: Long Zhao, Nitesh B. Gundavarapu, Liangzhe Yuan, Hao Zhou, Shen Yan, Jennifer J. Sun, Luke Friedman, Rui Qian, Tobias Weyand, Yue Zhao, Rachel Hornung, Florian Schroff, Ming-Hsuan Yang, David A. Ross, Huisheng Wang, Hartwig Adam, Mikhail Sirotenko, Ting Liu, Boqing Gong

    Abstract: We introduce VideoPrism, a general-purpose video encoder that tackles diverse video understanding tasks with a single frozen model. We pretrain VideoPrism on a heterogeneous corpus containing 36M high-quality video-caption pairs and 582M video clips with noisy parallel text (e.g., ASR transcripts). The pretraining approach improves upon masked autoencoding by global-local distillation of semantic…

    Submitted 20 February, 2024; originally announced February 2024.

  9. arXiv:2402.06497

    cs.CV

    Iris-SAM: Iris Segmentation Using a Foundation Model

    Authors: Parisa Farmanifard, Arun Ross

    Abstract: Iris segmentation is a critical component of an iris biometric system and it involves extracting the annular iris region from an ocular image. In this work, we develop a pixel-level iris segmentation model from a foundational model, viz., Segment Anything Model (SAM), that has been successfully used for segmenting arbitrary objects. The primary contribution of this work lies in the integration of…

    Submitted 25 April, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

  10. arXiv:2401.16587

    cs.CL cs.AI cs.CY

    A Linguistic Comparison between Human and ChatGPT-Generated Conversations

    Authors: Morgan Sandler, Hyesun Choung, Arun Ross, Prabu David

    Abstract: This study explores linguistic differences between human and LLM-generated dialogues, using 19.5K dialogues generated by ChatGPT-3.5 as a companion to the EmpathicDialogues dataset. The research employs Linguistic Inquiry and Word Count (LIWC) analysis, comparing ChatGPT-generated conversations with human conversations across 118 linguistic categories. Results show greater variability and authenti…

    Submitted 25 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Proceedings of the 4th International Conference on Pattern Recognition and Artificial Intelligence (ICPRAI), Jeju, Korea, 2024

  11. arXiv:2312.14874

    cs.DC

    Parallel Prefix Sum with SIMD

    Authors: Wangda Zhang, Yanbin Wang, Kenneth A. Ross

    Abstract: The prefix sum operation is a useful primitive with a broad range of applications. For database systems, it is a building block of many important operators including join, sort and filter queries. In this paper, we study different methods of computing prefix sums with SIMD instructions and multiple threads. For SIMD, we implement and compare horizontal and vertical computations, as well as a theor…

    Submitted 22 December, 2023; originally announced December 2023.

  12. arXiv:2312.14125

    cs.CV cs.AI

    VideoPoet: A Large Language Model for Zero-Shot Video Generation

    Authors: Dan Kondratyuk, Lijun Yu, Xiuye Gu, José Lezama, Jonathan Huang, Grant Schindler, Rachel Hornung, Vighnesh Birodkar, Jimmy Yan, Ming-Chang Chiu, Krishna Somandepalli, Hassan Akbari, Yair Alon, Yong Cheng, Josh Dillon, Agrim Gupta, Meera Hahn, Anja Hauth, David Hendon, Alonso Martinez, David Minnen, Mikhail Sirotenko, Kihyuk Sohn, Xuan Yang, Hartwig Adam, et al. (6 additional authors not shown)

    Abstract: We present VideoPoet, a language model capable of synthesizing high-quality video, with matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder-only transformer architecture that processes multimodal inputs -- including images, videos, text, and audio. The training protocol follows that of Large Language Models (LLMs), consisting of two stages: pretraining and tas…

    Submitted 22 March, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Project page: http://sites.research.google/videopoet/

  13. arXiv:2311.12773

    cs.CV

    Iris Presentation Attack: Assessing the Impact of Combining Vanadium Dioxide Films with Artificial Eyes

    Authors: Darshika Jauhari, Renu Sharma, Cunjian Chen, Nelson Sepulveda, Arun Ross

    Abstract: Iris recognition systems, operating in the near infrared spectrum (NIR), have demonstrated vulnerability to presentation attacks, where an adversary uses artifacts such as cosmetic contact lenses, artificial eyes or printed iris images in order to circumvent the system. At the same time, a number of effective presentation attack detection (PAD) methods have been developed. These methods have demon…

    Submitted 21 November, 2023; originally announced November 2023.

  14. arXiv:2311.12764

    cs.CV

    Investigating Weight-Perturbed Deep Neural Networks With Application in Iris Presentation Attack Detection

    Authors: Renu Sharma, Redwan Sony, Arun Ross

    Abstract: Deep neural networks (DNNs) exhibit superior performance in various machine learning tasks, e.g., image classification, speech recognition, biometric recognition, object detection, etc. However, it is essential to analyze their sensitivity to parameter perturbations before deploying them in real-world applications. In this work, we assess the sensitivity of DNNs against perturbations to their weig…

    Submitted 22 November, 2023; v1 submitted 21 November, 2023; originally announced November 2023.

  15. arXiv:2311.04323

    cs.RO

    Incident Angle Study for Designing an Endoscopic Tool for Intraoperative Brain Tumor Detection

    Authors: Kent Y. Yamamoto, Tanner J. Zachem, Weston A. Ross, Patrick J. Codd

    Abstract: In neurosurgical procedures, maximizing the resection of tumor tissue while avoiding healthy tissue is of paramount importance and a difficult task due to many factors, such as surrounding eloquent brain. Swiftly identifying tumor tissue for removal could improve surgical outcomes. The TumorID is a laser-induced fluorescence spectroscopy device that utilizes endogenous fluorophores such as NADH an…

    Submitted 7 November, 2023; originally announced November 2023.

    Comments: Accepted for publication in Hamlyn Symposium on Medical Robotics, 2023

  16. arXiv:2310.05737

    cs.CV cs.AI cs.MM

    Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation

    Authors: Lijun Yu, José Lezama, Nitesh B. Gundavarapu, Luca Versari, Kihyuk Sohn, David Minnen, Yong Cheng, Vighnesh Birodkar, Agrim Gupta, Xiuye Gu, Alexander G. Hauptmann, Boqing Gong, Ming-Hsuan Yang, Irfan Essa, David A. Ross, Lu Jiang

    Abstract: While Large Language Models (LLMs) are the dominant models for generative tasks in language, they do not perform as well as diffusion models on image and video generation. To effectively use LLMs for visual generation, one crucial component is the visual tokenizer that maps pixel-space inputs to discrete tokens appropriate for LLM learning. In this paper, we introduce MAGVIT-v2, a video tokenizer…

    Submitted 29 March, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  17. arXiv:2309.02404

    cs.SD cs.CV eess.AS

    Voice Morphing: Two Identities in One Voice

    Authors: Sushanta K. Pani, Anurag Chowdhury, Morgan Sandler, Arun Ross

    Abstract: In a biometric system, each biometric sample or template is typically associated with a single identity. However, recent research has demonstrated the possibility of generating "morph" biometric samples that can successfully match more than a single identity. Morph attacks are now recognized as a potential security threat to biometric systems. However, most morph attacks have been studied on biome…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted oral paper at BIOSIG 2023

  18. arXiv:2308.02065

    cs.CV cs.AI cs.LG

    On the Biometric Capacity of Generative Face Models

    Authors: Vishnu Naresh Boddeti, Gautam Sreekumar, Arun Ross

    Abstract: There has been tremendous progress in generating realistic faces with high fidelity over the past few years. Despite this progress, a crucial question remains unanswered: "Given a generative face model, how many unique identities can it generate?" In other words, what is the biometric capacity of the generative face model? A scientific basis for answering this question will benefit evaluating and…

    Submitted 3 August, 2023; originally announced August 2023.

    Comments: IJCB 2023

  19. arXiv:2307.03789

    cs.CV

    Synthesizing Forestry Images Conditioned on Plant Phenotype Using a Generative Adversarial Network

    Authors: Debasmita Pal, Arun Ross

    Abstract: Plant phenology and phenotype prediction using remote sensing data is increasingly gaining the attention of the plant science community to improve agricultural productivity. This work aims to generate synthetic forestry images that satisfy certain phenotypic attributes, viz. canopy greenness. We harness a Generative Adversarial Network (GAN) to synthesize biologically plausible and phenotypically…

    Submitted 9 February, 2024; v1 submitted 7 July, 2023; originally announced July 2023.

  20. arXiv:2307.02477

    cs.CL cs.AI

    Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks

    Authors: Zhaofeng Wu, Linlu Qiu, Alexis Ross, Ekin Akyürek, Boyuan Chen, Bailin Wang, Najoung Kim, Jacob Andreas, Yoon Kim

    Abstract: The impressive performance of recent language models across a wide range of tasks suggests that they possess a degree of abstract reasoning skills. Are these skills general and transferable, or specialized to specific tasks seen during pretraining? To disentangle these effects, we propose an evaluation framework based on "counterfactual" task variants that deviate from the default assumptions unde…

    Submitted 28 March, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: NAACL 2024

  21. arXiv:2307.01753

    astro-ph.CO cs.LG physics.comp-ph physics.data-an

    Local primordial non-Gaussianity from the large-scale clustering of photometric DESI luminous red galaxies

    Authors: Mehdi Rezaie, Ashley J. Ross, Hee-Jong Seo, Hui Kong, Anna Porredon, Lado Samushia, Edmond Chaussidon, Alex Krolewski, Arnaud de Mattia, Florian Beutler, Jessica Nicole Aguilar, Steven Ahlen, Shadab Alam, Santiago Avila, Benedict Bahr-Kalus, Jose Bermejo-Climent, David Brooks, Todd Claybaugh, Shaun Cole, Kyle Dawson, Axel de la Macorra, Peter Doel, Andreu Font-Ribera, Jaime E. Forero-Romero, Satya Gontcho A Gontcho, et al. (23 additional authors not shown)

    Abstract: We use angular clustering of luminous red galaxies from the Dark Energy Spectroscopic Instrument (DESI) imaging surveys to constrain the local primordial non-Gaussianity parameter fNL. Our sample comprises over 12 million targets, covering 14,000 square degrees of the sky, with redshifts in the range 0.2 < z < 1.35. We identify Galactic extinction, survey depth, and astronomical seeing as the prima…

    Submitted 4 July, 2023; originally announced July 2023.

    Comments: 19 pages, 15 figures, 6 tables (Appendix excluded). Submitted to MNRAS

  22. arXiv:2306.17842

    cs.CV cs.CL cs.MM

    SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

    Authors: Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang

    Abstract: In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos. SPAE converts between raw pixels and interpretable lexical tokens (or words) extracted from the LLM's vocabulary. The resulting tokens capture both the semantic meaning and the fine-grained details n…

    Submitted 28 October, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 spotlight

  23. arXiv:2306.17206

    cs.CV

    FarSight: A Physics-Driven Whole-Body Biometric System at Large Distance and Altitude

    Authors: Feng Liu, Ryan Ashbaugh, Nicholas Chimitt, Najmul Hassan, Ali Hassani, Ajay Jaiswal, Minchul Kim, Zhiyuan Mao, Christopher Perry, Zhiyuan Ren, Yiyang Su, Pegah Varghaei, Kai Wang, Xingguang Zhang, Stanley Chan, Arun Ross, Humphrey Shi, Zhangyang Wang, Anil Jain, Xiaoming Liu

    Abstract: Whole-body biometric recognition is an important area of research due to its vast applications in law enforcement, border security, and surveillance. This paper presents the end-to-end design, development and evaluation of FarSight, an innovative software system designed for whole-body (fusion of face, gait and body shape) biometric recognition. FarSight accepts videos from elevated platforms and…

    Submitted 6 September, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: 11 pages, 7 figures, accepted in WACV 2024

  24. arXiv:2306.12587

    cs.CL

    ARIES: A Corpus of Scientific Paper Edits Made in Response to Peer Reviews

    Authors: Mike D'Arcy, Alexis Ross, Erin Bransom, Bailey Kuehl, Jonathan Bragg, Tom Hope, Doug Downey

    Abstract: Revising scientific papers based on peer feedback is a challenging task that requires not only deep scientific knowledge and reasoning, but also the ability to recognize the implicit requests in high-level feedback and to choose the best of many possible ways to update the manuscript in response. We introduce this task for large language models and release ARIES, a dataset of review comments and t…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: 11 pages, 2 figures

  25. arXiv:2306.09479

    cs.CL cs.AI cs.CY

    Inverse Scaling: When Bigger Isn't Better

    Authors: Ian R. McKenzie, Alexander Lyzhov, Michael Pieler, Alicia Parrish, Aaron Mueller, Ameya Prabhu, Euan McLean, Aaron Kirtland, Alexis Ross, Alisa Liu, Andrew Gritsevskiy, Daniel Wurgaft, Derik Kauffman, Gabriel Recchia, Jiacheng Liu, Joe Cavanagh, Max Weiss, Sicong Huang, The Floating Droid, Tom Tseng, Tomasz Korbak, Xudong Shen, Yuhui Zhang, Zhengping Zhou, Najoung Kim, et al. (2 additional authors not shown)

    Abstract: Work on scaling laws has found that large language models (LMs) show predictable improvements to overall loss with increased scale (model size, training data, and compute). Here, we present evidence for the claim that LMs may show inverse scaling, or worse task performance with increased scale, e.g., due to flaws in the training objective and data. We present empirical evidence of inverse scaling…

    Submitted 15 June, 2023; originally announced June 2023.

  26. arXiv:2306.08129

    cs.CV cs.AI cs.CL

    AVIS: Autonomous Visual Information Seeking with Large Language Model Agent

    Authors: Ziniu Hu, Ahmet Iscen, Chen Sun, Kai-Wei Chang, Yizhou Sun, David A Ross, Cordelia Schmid, Alireza Fathi

    Abstract: In this paper, we propose an autonomous information seeking visual question answering framework, AVIS. Our method leverages a Large Language Model (LLM) to dynamically strategize the utilization of external tools and to investigate their outputs, thereby acquiring the indispensable knowledge needed to provide answers to the posed questions. Responding to visual questions that necessitate external…

    Submitted 2 November, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

    Comments: Published on NeurIPS 2023

  27. arXiv:2306.01736

    cs.CV cs.AI cs.LG

    DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model

    Authors: Xiuye Gu, Yin Cui, Jonathan Huang, Abdullah Rashwan, Xuan Yang, Xingyi Zhou, Golnaz Ghiasi, Weicheng Kuo, Huizhong Chen, Liang-Chieh Chen, David A Ross

    Abstract: Observing the close relationship among panoptic, semantic and instance segmentation tasks, we propose to train a universal multi-dataset multi-task segmentation model: DaTaSeg. We use a shared representation (mask proposals with class predictions) for all tasks. To tackle task discrepancy, we adopt different merge operations and post-processing for different tasks. We also leverage weak-supervision…

    Submitted 2 June, 2023; originally announced June 2023.

  28. arXiv:2305.17075

    cs.CL

    CREST: A Joint Framework for Rationalization and Counterfactual Text Generation

    Authors: Marcos Treviso, Alexis Ross, Nuno M. Guerreiro, André F. T. Martins

    Abstract: Selective rationales and counterfactual examples have emerged as two effective, complementary classes of interpretability methods for analyzing and training NLP models. However, prior work has not explored how these methods can be integrated to combine their complementary advantages. We overcome this limitation by introducing CREST (ContRastive Edits with Sparse raTionalization), a joint framework…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2023 (main)

  29. arXiv:2305.12596

    cs.CV

    iWarpGAN: Disentangling Identity and Style to Generate Synthetic Iris Images

    Authors: Shivangi Yadav, Arun Ross

    Abstract: Generative Adversarial Networks (GANs) have shown success in approximating complex distributions for synthetic image generation. However, current GAN-based methods for generating biometric images, such as iris, have certain limitations: (a) the synthetic images often closely resemble images in the training dataset; (b) the generated images lack diversity in terms of the number of unique identities…

    Submitted 29 August, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

  30. arXiv:2305.07997

    eess.AS cs.SD

    Vocal Style Factorization for Effective Speaker Recognition in Affective Scenarios

    Authors: Morgan Sandler, Arun Ross

    Abstract: The accuracy of automated speaker recognition is negatively impacted by change in emotions in a person's speech. In this paper, we hypothesize that speaker identity is composed of various vocal style factors that may be learned from unlabeled data and re-combined using a neural network to generate a holistic speaker identity representation for affective scenarios. In this regard, we propose the E-…

    Submitted 3 August, 2023; v1 submitted 13 May, 2023; originally announced May 2023.

    Comments: Proceedings of the IEEE 2023 International Joint Conference on Biometrics (IJCB)

  31. arXiv:2302.01328

    cs.CV cs.AI cs.CL cs.LG

    IC3: Image Captioning by Committee Consensus

    Authors: David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, John Canny

    Abstract: If you ask a human to describe an image, they might do so in a thousand different ways. Traditionally, image captioning models are trained to generate a single "best" (most like a reference) image caption. Unfortunately, doing so encourages captions that are "informationally impoverished," and focus on only a subset of the possible details, while ignoring other potentially useful information in th…

    Submitted 19 October, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: To Appear at EMNLP 2023

  32. arXiv:2212.13792

    cs.CV

    Periocular Biometrics: A Modality for Unconstrained Scenarios

    Authors: Fernando Alonso-Fernandez, Josef Bigun, Julian Fierrez, Naser Damer, Hugo Proença, Arun Ross

    Abstract: Periocular refers to the externally visible region of the face that surrounds the eye socket. This feature-rich area can provide accurate identification in unconstrained or uncooperative scenarios, where the iris or face modalities may not offer sufficient biometric cues due to factors such as partial occlusion or high subject-to-camera distance. The COVID-19 pandemic has further highlighted its i…

    Submitted 20 July, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

    Comments: Published at IEEE Computer journal

  33. arXiv:2212.10596

    cs.CV

    Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features

    Authors: Vivek Rathod, Bryan Seybold, Sudheendra Vijayanarasimhan, Austin Myers, Xiuye Gu, Vighnesh Birodkar, David A. Ross

    Abstract: Detecting actions in untrimmed videos should not be limited to a small, closed set of classes. We present a simple, yet effective strategy for open-vocabulary temporal action detection utilizing pretrained image-text co-embeddings. Despite being trained on static images rather than videos, we show that image-text co-embeddings enable open-vocabulary performance competitive with fully-supervised mod…

    Submitted 10 January, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  34. arXiv:2212.05221

    cs.CV cs.AI

    REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory

    Authors: Ziniu Hu, Ahmet Iscen, Chen Sun, Zirui Wang, Kai-Wei Chang, Yizhou Sun, Cordelia Schmid, David A. Ross, Alireza Fathi

    Abstract: In this paper, we propose an end-to-end Retrieval-Augmented Visual Language Model (REVEAL) that learns to encode world knowledge into a large-scale memory, and to retrieve from it to answer knowledge-intensive queries. REVEAL consists of four key components: the memory, the encoder, the retriever and the generator. The large-scale memory encodes various sources of multimodal world knowledge (e.g.…

    Submitted 3 April, 2023; v1 submitted 10 December, 2022; originally announced December 2022.

    Comments: Published on CVPR 2023

  35. arXiv:2211.08213

    eess.AS cs.SD

    Is Style All You Need? Dependencies Between Emotion and GST-based Speaker Recognition

    Authors: Morgan Sandler, Arun Ross

    Abstract: In this work, we study the hypothesis that speaker identity embeddings extracted from speech samples may be used for detection and classification of emotion. In particular, we show that emotions can be effectively identified by learning speaker identities by use of a 1-D Triplet Convolutional Neural Network (CNN) & Global Style Token (GST) scheme (e.g., DeepTalk Network) and reusing the trained sp…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  36. arXiv:2211.03659

    cs.ET

    Multilayer spintronic neural networks with radio-frequency connections

    Authors: Andrew Ross, Nathan Leroux, Arnaud de Riz, Danijela Marković, Dédalo Sanz-Hernández, Juan Trastoy, Paolo Bortolotti, Damien Querlioz, Leandro Martins, Luana Benetti, Marcel S. Claro, Pedro Anacleto, Alejandro Schulman, Thierry Taris, Jean-Baptiste Begueret, Sylvain Saïghi, Alex S. Jenkins, Ricardo Ferreira, Adrien F. Vincent, Alice Mizrahi, Julie Grollier

    Abstract: Spintronic nano-synapses and nano-neurons perform complex cognitive computations with high accuracy thanks to their rich, reproducible and controllable magnetization dynamics. These dynamical nanodevices could transform artificial intelligence hardware, provided that they implement state-of-the-art deep neural networks. However, there is today no scalable way to connect them in multilayers. Here w…

    Submitted 7 November, 2022; originally announced November 2022.

  37. arXiv:2210.13575

    cs.CL cs.AI

    Does Self-Rationalization Improve Robustness to Spurious Correlations?

    Authors: Alexis Ross, Matthew E. Peters, Ana Marasović

    Abstract: Rationalization is fundamental to human reasoning and learning. NLP models trained to produce rationales along with predictions, called self-rationalization models, have been investigated for their interpretability and utility to end-users. However, the extent to which training with human-written rationales facilitates learning remains an under-explored question. We ask whether training models to…

    Submitted 24 October, 2022; originally announced October 2022.

  38. arXiv:2210.00113

    cs.CL cs.AI cs.IR

    Institutional Foundations of Adaptive Planning: Exploration of Flood Planning in the Lower Rio Grande Valley, Texas, USA

    Authors: Ashley D. Ross, Ali Nejat, Virgie Greb

    Abstract: Adaptive planning is ideally suited for the deep uncertainties presented by climate change. While there is robust scholarship on the theory and methods of adaptive planning, this has largely neglected how adaptive planning is affected by existing planning institutions and how to move forward within the constraints of traditional planning organizations. This study asks: How do existing traditiona…

    Submitted 30 September, 2022; originally announced October 2022.

  39. arXiv:2209.07518

    cs.CL cs.AI cs.CV cs.LG

    Distribution Aware Metrics for Conditional Natural Language Generation

    Authors: David M Chan, Yiming Ni, David A Ross, Sudheendra Vijayanarasimhan, Austin Myers, John Canny

    Abstract: Traditional automated metrics for evaluating conditional natural language generation use pairwise comparisons between a single generated text and the best-matching gold-standard ground truth text. When multiple ground truths are available, scores are aggregated using an average or max operation across references. While this approach works well when diversity in the ground truth data (i.e. dispersi…

    Submitted 29 September, 2022; v1 submitted 15 September, 2022; originally announced September 2022.

  40. arXiv:2209.02941

    cs.CV

    Can GAN-induced Attribute Manipulations Impact Face Recognition?

    Authors: Sudipta Banerjee, Aditi Aggarwal, Arun Ross

    Abstract: Impact due to demographic factors such as age, sex, race, etc., has been studied extensively in automated face recognition systems. However, the impact of digitally modified demographic and facial attributes on face recognition is relatively under-explored. In this work, we study the effect of attribute manipulations induced via generative adversarial networks (GANs) on face recognition p…

    Submitted 7 September, 2022; originally announced September 2022.

  41. arXiv:2209.02933  [pdf, other]

    cs.CV

    Facial De-morphing: Extracting Component Faces from a Single Morph

    Authors: Sudipta Banerjee, Prateek Jaiswal, Arun Ross

    Abstract: A face morph is created by strategically combining two or more face images corresponding to multiple identities. The intention is for the morphed image to match with multiple identities. Current morph attack detection strategies can detect morphs but cannot recover the images or identities used in creating them. The task of deducing the individual face images from a morphed face image is known as…

    Submitted 7 September, 2022; originally announced September 2022.

  42. arXiv:2208.09500  [pdf, other]

    cs.CV

    Causality-Inspired Taxonomy for Explainable Artificial Intelligence

    Authors: Pedro C. Neto, Tiago Gonçalves, João Ribeiro Pinto, Wilson Silva, Ana F. Sequeira, Arun Ross, Jaime S. Cardoso

    Abstract: As two sides of the same coin, causality and explainable artificial intelligence (xAI) were initially proposed and developed with different goals. However, the latter can only be complete when seen through the lens of the causality framework. As such, we propose a novel causality-inspired framework for xAI that creates an environment for the development of xAI approaches. To show its applicability…

    Submitted 4 March, 2024; v1 submitted 19 August, 2022; originally announced August 2022.

  43. arXiv:2208.07241  [pdf, other]

    cs.CV cs.CR

    HEFT: Homomorphically Encrypted Fusion of Biometric Templates

    Authors: Luke Sperling, Nalini Ratha, Arun Ross, Vishnu Naresh Boddeti

    Abstract: This paper proposes a non-interactive end-to-end solution for secure fusion and matching of biometric templates using fully homomorphic encryption (FHE). Given a pair of encrypted feature vectors, we perform the following ciphertext operations, i) feature concatenation, ii) fusion and dimensionality reduction through a learned linear projection, iii) scale normalization to unit $\ell_2$-norm, and…

    Submitted 15 August, 2022; originally announced August 2022.

    Comments: IJCB 2022

  44. arXiv:2205.06253  [pdf, other]

    cs.CV cs.CL

    What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics

    Authors: David M. Chan, Austin Myers, Sudheendra Vijayanarasimhan, David A. Ross, Bryan Seybold, John F. Canny

    Abstract: While there have been significant gains in the field of automated video description, the generalization performance of automated description models to novel domains remains a major barrier to using these systems in the real world. Most visual description methods are known to capture and exploit patterns in the training data leading to evaluation metric increases, but what are those patterns? In th…

    Submitted 12 January, 2023; v1 submitted 12 May, 2022; originally announced May 2022.

    Comments: The 1st Workshop on Vision Datasets Understanding, IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR), 2022

  45. arXiv:2203.15203  [pdf, other]

    cs.CV

    Periocular Biometrics and its Relevance to Partially Masked Faces: A Survey

    Authors: Renu Sharma, Arun Ross

    Abstract: The performance of face recognition systems can be negatively impacted in the presence of masks and other types of facial coverings that have become prevalent due to the COVID-19 pandemic. In such cases, the periocular region of the human face becomes an important biometric cue. In this article, we present a detailed review of periocular biometrics. We first examine the various face and periocular…

    Submitted 28 March, 2022; originally announced March 2022.

  46. Trust in AI and Its Role in the Acceptance of AI Technologies

    Authors: Hyesun Choung, Prabu David, Arun Ross

    Abstract: As AI-enhanced technologies become common in a variety of domains, there is an increasing need to define and examine the trust that users have in such technologies. Given the progress in the development of AI, a correspondingly sophisticated understanding of trust in the technology is required. This paper addresses this need by explaining the role of trust on the intention to use AI technologies.…

    Submitted 23 March, 2022; originally announced March 2022.

  47. arXiv:2201.04435  [pdf, other]

    cs.CV

    Beyond the Visible: A Survey on Cross-spectral Face Recognition

    Authors: David Anghelone, Cunjian Chen, Arun Ross, Antitza Dantcheva

    Abstract: Cross-spectral face recognition (CFR) refers to recognizing individuals using face images stemming from different spectral bands, such as infrared vs. visible. While CFR is inherently more challenging than classical face recognition due to significant variation in facial appearance caused by the modality gap, it is useful in many scenarios including night-vision biometrics and detecting presentati…

    Submitted 5 May, 2022; v1 submitted 12 January, 2022; originally announced January 2022.

  48. arXiv:2201.03080  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG

    The State of Aerial Surveillance: A Survey

    Authors: Kien Nguyen, Clinton Fookes, Sridha Sridharan, Yingli Tian, Feng Liu, Xiaoming Liu, Arun Ross

    Abstract: The rapid emergence of airborne platforms and imaging sensors are enabling new forms of aerial surveillance due to their unprecedented advantages in scale, mobility, deployment and covert observation capabilities. This paper provides a comprehensive overview of human-centric aerial surveillance tasks from a computer vision and pattern recognition perspective. It aims to provide readers with an in-…

    Submitted 12 January, 2022; v1 submitted 9 January, 2022; originally announced January 2022.

  49. arXiv:2109.11043  [pdf, other]

    cs.LG

    Learning Predictive and Interpretable Timeseries Summaries from ICU Data

    Authors: Nari Johnson, Sonali Parbhoo, Andrew Slavin Ross, Finale Doshi-Velez

    Abstract: Machine learning models that utilize patient data across time (rather than just the most recent measurements) have increased performance for many risk stratification tasks in the intensive care unit. However, many of these models and their learned representations are complex and therefore difficult for clinicians to interpret, creating challenges for validation. Our work proposes a new procedure t…

    Submitted 22 September, 2021; originally announced September 2021.

    Comments: 10 pages, 3 figures, AMIA 2021 Annual Symposium

  50. arXiv:2107.07150  [pdf, other]

    cs.CL

    Tailor: Generating and Perturbing Text with Semantic Controls

    Authors: Alexis Ross, Tongshuang Wu, Hao Peng, Matthew E. Peters, Matt Gardner

    Abstract: Controlled text perturbation is useful for evaluating and improving model generalizability. However, current techniques rely on training a model for every target perturbation, which is expensive and hard to generalize. We present Tailor, a semantically-controlled text generation system. Tailor builds on a pretrained seq2seq model and produces textual outputs conditioned on control codes derived fr…

    Submitted 17 March, 2022; v1 submitted 15 July, 2021; originally announced July 2021.