Search | arXiv e-print repository

arXiv:2412.20666 [pdf, other]

Recurrence-based Vanishing Point Detection

Authors: Skanda Bharadwaj, Robert Collins, Yanxi Liu

Abstract: Classical approaches to Vanishing Point Detection (VPD) rely solely on the presence of explicit straight lines in images, while recent supervised deep learning approaches need labeled datasets for training. We propose an alternative unsupervised approach: Recurrence-based Vanishing Point Detection (R-VPD) that uses implicit lines discovered from recurring correspondences in addition to explicit lines. Furthermore, we contribute two Recurring-Pattern-for-Vanishing-Point (RPVP) datasets: 1) a Synthetic Image dataset with 3,200 ground truth vanishing points and camera parameters, and 2) a Real-World Image dataset with 1,400 human annotated vanishing points. We compare our method with two classical methods and two state-of-the-art deep learning-based VPD methods. We demonstrate that our unsupervised approach outperforms all the methods on the synthetic images dataset, outperforms the classical methods, and is on par with the supervised learning approaches on real-world images.

Submitted 31 December, 2024; v1 submitted 29 December, 2024; originally announced December 2024.

Comments: WACV 2025
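
As a rough illustration of the geometry behind vanishing point detection in general, the sketch below intersects two image lines using homogeneous coordinates. It is a textbook construction, not the R-VPD algorithm from the paper; the function names `line_through` and `intersect` are hypothetical, and the two input lines could come either from explicit edges or from segments joining corresponding points.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def line_through(p, q):
    """Homogeneous line (a, b, c) with a*x + b*y + c = 0 through image points p and q."""
    return cross((p[0], p[1], 1.0), (q[0], q[1], 1.0))


def intersect(l1, l2, eps=1e-12):
    """Intersection of two homogeneous lines, or None if they are parallel in the image."""
    x, y, w = cross(l1, l2)
    if abs(w) < eps:
        # The lines meet at infinity: no finite candidate vanishing point.
        return None
    return (x / w, y / w)


# Two image lines that converge; their intersection is a candidate vanishing point.
candidate = intersect(
    line_through((0.0, 0.0), (2.0, 2.0)),
    line_through((0.0, 2.0), (2.0, 0.0)),
)
```

A practical detector would aggregate many such pairwise candidates, for example by voting or robust estimation, rather than trust a single pair.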

arXiv:2412.16177 [pdf, other]

Mining Math Conjectures from LLMs: A Pruning Approach

Authors: Jake Chuharski, Elias Rojas Collins, Mark Meringolo

Abstract: We present a novel approach to generating mathematical conjectures using Large Language Models (LLMs). Focusing on the solubilizer, a relatively recent construct in group theory, we demonstrate how LLMs such as ChatGPT, Gemini, and Claude can be leveraged to generate conjectures. These conjectures are pruned by allowing the LLMs to generate counterexamples. Our results indicate that LLMs are capable of producing original conjectures that, while not groundbreaking, are either plausible or falsifiable via counterexamples, though they exhibit limitations in code execution.

Submitted 9 December, 2024; originally announced December 2024.

Comments: 23 pages, 10 figures, NeurIPS MathAI Workshop 2024

arXiv:2410.00548 [pdf, ps, other]

doi 10.4230/LIPIcs.MFCS.2025.29

The complexity of separability for semilinear sets and Parikh automata

Authors: Elias Rojas Collins, Chris Köcher, Georg Zetzsche

Abstract: In a \emph{separability problem}, we are given two sets $K$ and $L$ from a class $\mathcal{C}$, and we want to decide whether there exists a set $S$ from a class $\mathcal{S}$ such that $K\subseteq S$ and $S\cap L=\emptyset$. In this case, we speak of \emph{separability of sets in $\mathcal{C}$ by sets in $\mathcal{S}$}. We study two types of separability problems. First, we consider separability of semilinear sets (i.e. subsets of $\mathbb{N}^d$ for some $d$) by sets definable by quantifier-free monadic Presburger formulas (or equivalently, the recognizable subsets of $\mathbb{N}^d$). Here, a formula is monadic if each atom uses at most one variable. Second, we consider separability of languages of Parikh automata by regular languages. A Parikh automaton is a machine with access to counters that can only be incremented, and have to meet a semilinear constraint at the end of the run. Both of these separability problems are known to be decidable with elementary complexity. Our main results are that both problems are coNP-complete. In the case of semilinear sets, coNP-completeness holds regardless of whether the input sets are specified by existential Presburger formulas, quantifier-free formulas, or semilinear representations. Our results imply that recognizable separability of rational subsets of $Σ^*\times\mathbb{N}^d$ (shown decidable by Choffrut and Grigorieff) is coNP-complete as well. Another application is that regularity of deterministic Parikh automata (where the target set is specified using a quantifier-free Presburger formula) is coNP-complete as well.

Submitted 1 July, 2025; v1 submitted 1 October, 2024; originally announced October 2024.

Comments: accepted for MFCS 2025

arXiv:2406.09419 [pdf]

Sentient House: Designing for Discourse

Authors: Robert Collins

Abstract: The Sentient House project is an investigation into approaches that the artist-designer can take to better involve the public in developing a critical perspective on pervasive technology in the home and the surrounding environment. Using Interaction Design approaches including workshops, surveys, rapid prototyping and critical thinking, this thesis suggests a framework for developing a more participatory atmosphere for Critical Design. As the world becomes more connected, and smarter, citizens' concerns are being sidelined in favour of rapid progress and solutionism. Many of these initiatives are backed by government and commercial concerns who may not have the public's best interest at heart. The designs and approaches generated from this public participation seek to provide an outlet for a more agonistic debate and to develop tools and approaches to engage the public in questioning and addressing how technology affects them in the future. The outcomes of this research suggest that the public is receptive to a more active involvement in designing their digital future, and that the designer can be a critical component in revealing hidden consequences and alternative pathways for a more transparent and desirable future.

Submitted 14 February, 2024; originally announced June 2024.

Comments: Masters Thesis - 2015

arXiv:2405.12258 [pdf]

Scientific Hypothesis Generation by a Large Language Model: Laboratory Validation in Breast Cancer Treatment

Authors: Abbi Abdel-Rehim, Hector Zenil, Oghenejokpeme Orhobor, Marie Fisher, Ross J. Collins, Elizabeth Bourne, Gareth W. Fearnley, Emma Tate, Holly X. Smith, Larisa N. Soldatova, Ross D. King

Abstract: Large language models (LLMs) have transformed AI and achieved breakthrough performance on a wide range of tasks. In science, the most interesting application of LLMs is for hypothesis formation. A feature of LLMs, which results from their probabilistic structure, is that the output text is not necessarily a valid inference from the training text. These are termed hallucinations and are harmful in many applications. In science, some hallucinations may be useful: novel hypotheses whose validity may be tested by laboratory experiments. Here we experimentally test the application of LLMs as a source of scientific hypotheses using the domain of breast cancer treatment. We applied the LLM GPT4 to hypothesize novel synergistic pairs of FDA-approved noncancer drugs that target the MCF7 breast cancer cell line relative to the nontumorigenic breast cell line MCF10A. In the first round of laboratory experiments, GPT4 succeeded in discovering three drug combinations, out of twelve tested, with synergy scores above the positive controls. GPT4 then generated new combinations based on its initial results; this generated three more combinations with positive synergy scores out of four tested. We conclude that LLMs are a valuable source of scientific hypotheses.

Submitted 8 May, 2025; v1 submitted 20 May, 2024; originally announced May 2024.

Comments: 12 pages, 6 tables, 1 figure. Supplementary information available

arXiv:2303.04244 [pdf, other]

A Light-Weight Contrastive Approach for Aligning Human Pose Sequences

Authors: Robert T. Collins

Abstract: We present a simple unsupervised method for learning an encoder mapping short 3D pose sequences into embedding vectors suitable for sequence-to-sequence alignment by dynamic time warping. Training samples consist of temporal windows of frames containing 3D body points such as mocap markers or skeleton joints. A light-weight, 3-layer encoder is trained using a contrastive loss function that encourages embedding vectors of augmented sample pairs to have cosine similarity 1, and similarity 0 with all other samples in a minibatch. When multiple scripted training sequences are available, temporal alignments inferred from an initial round of training are harvested to extract additional, cross-performance match pairs for a second phase of training to refine the encoder. In addition to being simple, the proposed method is fast to train, making it easy to adapt to new data using different marker sets or skeletal joint layouts. Experimental results illustrate ease of use, transferability, and utility of the learned embeddings for comparing and analyzing human behavior sequences.

Submitted 7 March, 2023; originally announced March 2023.
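
The abstract above describes a contrastive objective in which augmented pairs should have cosine similarity 1 and all other pairs in the minibatch similarity 0. One way such an objective could be written is sketched below, in plain Python for readability. The squared-error penalty against an identity target and the name `identity_target_loss` are assumptions made for illustration; the abstract does not state the exact loss form, and embeddings are assumed to be nonzero vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity of two nonzero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def identity_target_loss(za, zb):
    """Mean squared gap between the cosine-similarity matrix and the identity.

    za[i] and zb[i] are embeddings of two augmentations of sample i, so the
    target similarity is 1 on the diagonal and 0 everywhere else.
    (Assumed loss form; not taken from the paper.)
    """
    n = len(za)
    total = 0.0
    for i in range(n):
        for j in range(n):
            target = 1.0 if i == j else 0.0
            total += (cosine(za[i], zb[j]) - target) ** 2
    return total / (n * n)
```

In a real training loop the same computation would normally be expressed with a differentiable array library so that gradients can flow back to the encoder.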

arXiv:2211.04656 [pdf, other]

MEVID: Multi-view Extended Videos with Identities for Video Person Re-Identification

Authors: Daniel Davila, Dawei Du, Bryon Lewis, Christopher Funk, Joseph Van Pelt, Roderick Collins, Kellie Corona, Matt Brown, Scott McCloskey, Anthony Hoogs, Brian Clipp

Abstract: In this paper, we present the Multi-view Extended Videos with Identities (MEVID) dataset for large-scale, video person re-identification (ReID) in the wild. To our knowledge, MEVID represents the most-varied video person ReID dataset, spanning an extensive indoor and outdoor environment across nine unique dates in a 73-day window, various camera viewpoints, and entity clothing changes. Specifically, we label the identities of 158 unique people wearing 598 outfits taken from 8,092 tracklets, average length of about 590 frames, seen in 33 camera views from the very large-scale MEVA person activities dataset. While other datasets have more unique identities, MEVID emphasizes a richer set of information about each individual, such as: 4 outfits/identity vs. 2 outfits/identity in CCVID, 33 viewpoints across 17 locations vs. 6 in 5 simulated locations for MTA, and 10 million frames vs. 3 million for LS-VID. Being based on the MEVA video dataset, we also inherit data that is intentionally demographically balanced to the continental United States. To accelerate the annotation process, we developed a semi-automatic annotation framework and GUI that combines state-of-the-art real-time models for object detection, pose estimation, person ReID, and multi-object tracking. We evaluate several state-of-the-art methods on MEVID challenge problems and comprehensively quantify their robustness in terms of changes of outfit, scale, and background location. Our quantitative analysis on the realistic, unique aspects of MEVID shows that there are significant remaining challenges in video person ReID and indicates important directions for future research.

Submitted 10 November, 2022; v1 submitted 8 November, 2022; originally announced November 2022.

Comments: This paper was accepted to WACV 2023

arXiv:2210.07991 [pdf, other]

Novel 3D Scene Understanding Applications From Recurrence in a Single Image

Authors: Shimian Zhang, Skanda Bharadwaj, Keaton Kraiger, Yashasvi Asthana, Hong Zhang, Robert Collins, Yanxi Liu

Abstract: We demonstrate the utility of recurring pattern discovery from a single image for spatial understanding of a 3D scene in terms of (1) vanishing point detection, (2) hypothesizing 3D translation symmetry and (3) counting the number of RP instances in the image. Furthermore, we illustrate the feasibility of leveraging RP discovery output to form a more precise, quantitative text description of the scene. Our quantitative evaluations on a new 1K+ Recurring Pattern (RP) benchmark with diverse variations show that visual perception of recurrence from one single view leads to scene understanding outcomes that are as good as or better than existing supervised methods and/or unsupervised methods that use millions of images.

Submitted 14 October, 2022; originally announced October 2022.

arXiv:2206.11443 [pdf, other]

Image-based Stability Quantification

Authors: Jesse Scott, John Challis, Robert T. Collins, Yanxi Liu

Abstract: Quantitative evaluation of human stability using foot pressure/force measurement hardware and motion capture (mocap) technology is expensive, time consuming, and restricted to the laboratory. We propose a novel image-based method to estimate three key components for stability computation: Center of Mass (CoM), Base of Support (BoS), and Center of Pressure (CoP). Furthermore, we quantitatively validate our image-based methods for computing two classic stability measures, CoMtoCoP and CoMtoBoS distances, against values generated directly from laboratory-based sensor output (ground truth) using a publicly available, multi-modality (mocap, foot pressure, two-view videos), ten-subject human motion dataset. Using Leave One Subject Out (LOSO) cross-validation, experimental results show: 1) our image-based CoM estimation method (CoMNet) consistently outperforms state-of-the-art inertial sensor-based CoM estimation techniques; 2) stability computed by our image-based method combined with insole foot pressure sensor data produces consistent, strong, and statistically significant correlation with ground truth stability measures (CoMtoCoP r = 0.79 p < 0.001, CoMtoBoS r = 0.75 p < 0.001); 3) our fully image-based estimation of stability produces consistent, positive, and statistically significant correlation on the two stability metrics (CoMtoCoP r = 0.31 p < 0.001, CoMtoBoS r = 0.22 p < 0.043). Our study provides promising quantitative evidence for the feasibility of image-based stability evaluation in natural environments.

Submitted 2 November, 2022; v1 submitted 22 June, 2022; originally announced June 2022.
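
For readers unfamiliar with the Leave One Subject Out (LOSO) protocol mentioned in the abstract above, the following minimal sketch shows how such splits are typically formed: every sample from one subject is held out for testing while the remaining subjects form the training set. The helper name `loso_splits` and the toy subject labels are hypothetical and are not drawn from the paper's dataset.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    subject_ids[i] is the subject that produced sample i. No subject ever
    appears in both the training and the test indices of the same split.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Toy example: four samples recorded from three subjects.
splits = list(loso_splits(["a", "a", "b", "c"]))
```

Reported metrics are then usually averaged over the per-subject splits, so that each score reflects performance on a person the model has never seen.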

arXiv:2012.00914 [pdf, other]

MEVA: A Large-Scale Multiview, Multimodal Video Dataset for Activity Detection

Authors: Kellie Corona, Katie Osterdahl, Roderic Collins, Anthony Hoogs

Abstract: We present the Multiview Extended Video with Activities (MEVA) dataset, a new and very-large-scale dataset for human activity recognition. Existing security datasets either focus on activity counts by aggregating public video disseminated due to its content, which typically excludes same-scene background video, or they achieve persistence by observing public areas and thus cannot control for activity content. Our dataset is over 9300 hours of untrimmed, continuous video, scripted to include diverse, simultaneous activities, along with spontaneous background activity. We have annotated 144 hours for 37 activity types, marking bounding boxes of actors and props. Our collection observed approximately 100 actors performing scripted scenarios and spontaneous background activity over a three-week period at an access-controlled venue, collecting in multiple modalities with overlapping and non-overlapping indoor and outdoor viewpoints. The resulting data includes video from 38 RGB and thermal IR cameras, 42 hours of UAV footage, as well as GPS locations for the actors. 122 hours of annotation are sequestered in support of the NIST Activity in Extended Video (ActEV) challenge; the other 22 hours of annotation and the corresponding video are available on our website, along with an additional 306 hours of ground camera data, 4.6 hours of UAV data, and 9.6 hours of GPS logs. Additional derived data includes camera models geo-registering the outdoor cameras and a dense 3D point cloud model of the outdoor scene. The data was collected with IRB oversight and approval and released under a CC-BY-4.0 license.

Submitted 1 December, 2020; originally announced December 2020.

Comments: 9 pages, 11 figures, to appear at WACV 2021. Dataset is available at https://mevadata.org

arXiv:2001.00657 [pdf, other]

From Kinematics To Dynamics: Estimating Center of Pressure and Base of Support from Video Frames of Human Motion

Authors: Jesse Scott, Christopher Funk, Bharadwaj Ravichandran, John H. Challis, Robert T. Collins, Yanxi Liu

Abstract: To gain an understanding of the relation between a given human pose image and the corresponding physical foot pressure of the human subject, we propose and validate two end-to-end deep learning architectures, PressNet and PressNet-Simple, to regress foot pressure heatmaps (dynamics) from 2D human pose (kinematics) derived from a video frame. A unique video and foot pressure data set of 813,050 synchronized pairs, composed of 5-minute long choreographed Taiji movement sequences of 6 subjects, is collected and used for leaving-one-subject-out cross validation. Our initial experimental results demonstrate reliable and repeatable foot pressure prediction from a single image, setting the first baseline for such a complex cross modality mapping problem in computer vision. Furthermore, we compute and quantitatively validate the Center of Pressure (CoP) and Base of Support (BoS) from predicted foot pressure distribution, obtaining key components in pose stability analysis from images with potential applications in kinesiology, medicine, sports and robotics.

Submitted 2 January, 2020; originally announced January 2020.

arXiv:1912.04368 [pdf, other]

Learning Non-Markovian Quantum Noise from Moiré-Enhanced Swap Spectroscopy with Deep Evolutionary Algorithm

Authors: Murphy Yuezhen Niu, Vadim Smelyanskyi, Paul Klimov, Sergio Boixo, Rami Barends, Julian Kelly, Yu Chen, Kunal Arya, Brian Burkett, Dave Bacon, Zijun Chen, Ben Chiaro, Roberto Collins, Andrew Dunsworth, Brooks Foxen, Austin Fowler, Craig Gidney, Marissa Giustina, Rob Graff, Trent Huang, Evan Jeffrey, David Landhuis, Erik Lucero, Anthony Megrant, Josh Mutus, et al. (8 additional authors not shown)

Abstract: Two-level-system (TLS) defects in amorphous dielectrics are a major source of noise and decoherence in solid-state qubits. Gate-dependent non-Markovian errors caused by TLS-qubit coupling are detrimental to fault-tolerant quantum computation and have not been rigorously treated in the existing literature. In this work, we derive the non-Markovian dynamics between TLS and qubits during a SWAP-like two-qubit gate and the associated average gate fidelity for frequency-tunable Transmon qubits. This gate-dependent error model facilitates using qubits as sensors to simultaneously learn practical imperfections in both the qubit's environment and control waveforms. We combine a state-of-the-art machine learning algorithm with Moiré-enhanced swap spectroscopy to achieve robust learning using noisy experimental data. Deep neural networks are used to represent the functional map from experimental data to TLS parameters and are trained through an evolutionary algorithm. Our method achieves the highest learning efficiency and robustness against experimental imperfections to date, representing an important step towards in-situ quantum control optimization over environmental and control defects.

Submitted 9 December, 2019; originally announced December 2019.

arXiv:1903.06694 [pdf, other]

Tuning Hyperparameters without Grad Students: Scalable and Robust Bayesian Optimisation with Dragonfly

Authors: Kirthevasan Kandasamy, Karun Raju Vysyaraju, Willie Neiswanger, Biswajit Paria, Christopher R. Collins, Jeff Schneider, Barnabas Poczos, Eric P. Xing

Abstract: Bayesian Optimisation (BO) refers to a suite of techniques for global optimisation of expensive black box functions, which use introspective Bayesian models of the function to efficiently search for the optimum. While BO has been applied successfully in many applications, modern optimisation tasks usher in new challenges where conventional methods fail spectacularly. In this work, we present Dragonfly, an open source Python library for scalable and robust BO. Dragonfly incorporates multiple recently developed methods that allow BO to be applied in challenging real world settings; these include better methods for handling higher dimensional domains, methods for handling multi-fidelity evaluations when cheap approximations of an expensive function are available, methods for optimising over structured combinatorial spaces, such as the space of neural network architectures, and methods for handling parallel evaluations. Additionally, we develop new methodological improvements in BO for selecting the Bayesian model, selecting the acquisition function, and optimising over complex domains with different variable types and additional constraints. We compare Dragonfly to a suite of other packages and algorithms for global optimisation and demonstrate that when the above methods are integrated, they enable significant improvements in the performance of BO. The Dragonfly library is available at dragonfly.github.io.

Submitted 19 April, 2020; v1 submitted 15 March, 2019; originally announced March 2019.

Comments: Journal of Machine Learning Research 2020, Special Issue on Bayesian Optimization

arXiv:1811.12607 [pdf, other]

Learning Dynamics from Kinematics: Estimating 2D Foot Pressure Maps from Video Frames

Authors: Christopher Funk, Savinay Nagendra, Jesse Scott, Bharadwaj Ravichandran, John H. Challis, Robert T. Collins, Yanxi Liu

Abstract: Pose stability analysis is the key to understanding locomotion and control of body equilibrium, with applications in numerous fields such as kinesiology, medicine, and robotics. In biomechanics, Center of Pressure (CoP) is used in studies of human postural control and gait. We propose and validate a novel approach to learn CoP from pose of a human body to aid stability analysis. More specifically, we propose an end-to-end deep learning architecture to regress foot pressure heatmaps, and hence the CoP locations, from 2D human pose derived from video. We have collected a set of long (5 min+) choreographed Taiji (Tai Chi) sequences of multiple subjects with synchronized foot pressure and video data. The derived human pose data and corresponding foot pressure maps are used jointly in training a convolutional neural network with residual architecture, named PressNET. Cross-subject validation results show promising performance of PressNET, significantly outperforming the baseline method of K-Nearest Neighbors. Furthermore, we demonstrate that our computation of center of pressure (CoP) from PressNET is not only significantly more accurate than those obtained from the baseline approach but also meets the expectations of corresponding lab-based measurements of stability studies in kinesiology.

Submitted 28 May, 2019; v1 submitted 29 November, 2018; originally announced November 2018.

arXiv:1801.09108 [pdf, other]

Deep Neural Networks In Fully Connected CRF For Image Labeling With Social Network Metadata

Authors: Chengjiang Long, Roddy Collins, Eran Swears, Anthony Hoogs

Abstract: We propose a novel method for predicting image labels by fusing image content descriptors with the social media context of each image. An image uploaded to a social media site such as Flickr often has meaningful, associated information, such as comments and other images the user has uploaded, that is complementary to pixel content and helpful in predicting labels. Prediction challenges such as ImageNet~\cite{imagenet_cvpr09} and MSCOCO~\cite{LinMBHPRDZ:ECCV14} use only pixels, while other methods make predictions purely from social media context \cite{McAuleyECCV12}. Our method is based on a novel fully connected Conditional Random Field (CRF) framework, where each node is an image, and consists of two deep Convolutional Neural Networks (CNN) and one Recurrent Neural Network (RNN) that model both textual and visual node/image information. The edge weights of the CRF graph represent textual similarity and link-based metadata such as user sets and image groups. We model the CRF as an RNN for both learning and inference, and incorporate the weighted ranking loss and cross entropy loss into the CRF parameter optimization to handle the training data imbalance issue. Our proposed approach is evaluated on the MIR-9K dataset and experimentally outperforms current state-of-the-art approaches.

Submitted 27 January, 2018; originally announced January 2018.

Showing 1–15 of 15 results for author: Collins, R