Showing 1–23 of 23 results for author: Tsutsui, S

Searching in archive cs.
  1. arXiv:2401.13516

    cs.CV cs.CR

    Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces

    Authors: Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou

    Abstract: Deepfake videos are becoming increasingly realistic, showing few tampering traces on facial areas that vary between frames. Consequently, existing Deepfake detection methods struggle to detect unknown domain Deepfake videos while accurately locating the tampered region. To address this limitation, we propose Delocate, a novel Deepfake detection model that can both recognize and localize unknown domai…

    Submitted 5 May, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2308.09921, arXiv:2305.05943

  2. arXiv:2308.09921

    cs.CV cs.AI

    Recap: Detecting Deepfake Video with Unpredictable Tampered Traces via Recovering Faces and Mapping Recovered Faces

    Authors: Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou

    Abstract: The exploitation of Deepfake techniques for malicious intentions has driven significant research interest in Deepfake detection. Deepfake manipulations frequently introduce random tampered traces, leading to unpredictable outcomes in different facial regions. However, existing detection methods heavily rely on specific forgery indicators, and as the forgery mode improves, these traces become incre…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2305.05943

  3. arXiv:2306.13531

    cs.CV

    WBCAtt: A White Blood Cell Dataset Annotated with Detailed Morphological Attributes

    Authors: Satoshi Tsutsui, Winnie Pang, Bihan Wen

    Abstract: The examination of blood samples at a microscopic level plays a fundamental role in clinical diagnostics, influencing a wide range of medical conditions. For instance, an in-depth study of White Blood Cells (WBCs), a crucial component of our blood, is essential for diagnosing blood-related diseases such as leukemia and anemia. While multiple datasets containing WBC images have been proposed, they…

    Submitted 25 December, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: Neural Information Processing Systems 2023

  4. arXiv:2305.05943

    cs.MM

    Mover: Mask and Recovery based Facial Part Consistency Aware Method for Deepfake Video Detection

    Authors: Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou

    Abstract: Deepfake techniques have been widely used for malicious purposes, prompting extensive research interest in developing Deepfake detection methods. Deepfake manipulations typically involve tampering with facial parts, which can result in inconsistencies across different parts of the face. For instance, Deepfake techniques may change smiling lips to an upset lip, while the eyes remain smiling. Existi…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.01740

  5. arXiv:2303.01777

    eess.IV cs.CV

    Benchmarking White Blood Cell Classification Under Domain Shift

    Authors: Satoshi Tsutsui, Zhengyang Su, Bihan Wen

    Abstract: Recognizing the types of white blood cells (WBCs) in microscopic images of human blood smears is a fundamental task in the fields of pathology and hematology. Although previous studies have made significant contributions to the development of methods and datasets, few papers have investigated benchmarks or baselines that others can easily refer to. For instance, we observed notable variations in t…

    Submitted 19 May, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

    Comments: Accepted to the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2023. More datasets are cited

  6. arXiv:2303.01740

    cs.CV cs.MM

    Mover: Mask and Recovery based Facial Part Consistency Aware Method for Deepfake Video Detection

    Authors: Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou

    Abstract: Deepfake techniques have been widely used for malicious purposes, prompting extensive research interest in developing Deepfake detection methods. Deepfake manipulations typically involve tampering with facial parts, which can result in inconsistencies across different parts of the face. For instance, Deepfake techniques may change smiling lips to an upset lip, while the eyes remain smiling. Existi…

    Submitted 5 May, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

  7. arXiv:2208.09023

    cs.CV

    Single-Stage Open-world Instance Segmentation with Cross-task Consistency Regularization

    Authors: Xizhe Xue, Dongdong Yu, Lingqiao Liu, Yu Liu, Satoshi Tsutsui, Ying Li, Zehuan Yuan, Ping Song, Mike Zheng Shou

    Abstract: Open-World Instance Segmentation (OWIS) is an emerging research topic that aims to segment class-agnostic object instances from images. The mainstream approaches use a two-stage segmentation framework, which first locates the candidate object bounding boxes and then performs instance segmentation. In this work, we instead promote a single-stage framework for OWIS. We argue that the end-to-end trai…

    Submitted 18 October, 2022; v1 submitted 18 August, 2022; originally announced August 2022.

  8. arXiv:2208.07344

    cs.CV

    Action Recognition based on Cross-Situational Action-object Statistics

    Authors: Satoshi Tsutsui, Xizi Wang, Guangyuan Weng, Yayun Zhang, David Crandall, Chen Yu

    Abstract: Machine learning models of visual action recognition are typically trained and tested on data from specific situations where actions are associated with certain objects. It is an open question how action-object associations in the training set influence a model's ability to generalize beyond trained situations. We set out to identify properties of training data that lead to action recognition mode…

    Submitted 15 August, 2022; originally announced August 2022.

    Comments: Accepted to International Conference on Development and Learning (ICDL) 2022

  9. arXiv:2205.15595

    cs.CV

    Novel View Synthesis for High-fidelity Headshot Scenes

    Authors: Satoshi Tsutsui, Weijia Mao, Sijing Lin, Yunyi Zhu, Murong Ma, Mike Zheng Shou

    Abstract: Rendering scenes with a high-quality human face from arbitrary viewpoints is a practical and useful technique for many real-world applications. Recently, Neural Radiance Fields (NeRF), a rendering technique that uses neural networks to approximate classical ray tracing, have been considered as one of the promising approaches for synthesizing novel views from a sparse set of images. We find that Ne…

    Submitted 31 May, 2022; originally announced May 2022.

  10. Reinforcing Generated Images via Meta-learning for One-Shot Fine-Grained Visual Recognition

    Authors: Satoshi Tsutsui, Yanwei Fu, David Crandall

    Abstract: One-shot fine-grained visual recognition often suffers from the problem of having few training examples for new fine-grained classes. To alleviate this problem, off-the-shelf image generation techniques based on Generative Adversarial Networks (GANs) can potentially create additional training images. However, these GAN-generated images are often not helpful for actually improving the accuracy of o…

    Submitted 22 April, 2022; originally announced April 2022.

    Comments: Accepted to PAMI 2022. arXiv admin note: substantial text overlap with arXiv:1911.07164

  11. arXiv:2111.14448

    cs.CV cs.MM eess.AS

    AVA-AVD: Audio-Visual Speaker Diarization in the Wild

    Authors: Eric Zhongcong Xu, Zeyang Song, Satoshi Tsutsui, Chao Feng, Mang Ye, Mike Zheng Shou

    Abstract: Audio-visual speaker diarization aims at detecting "who spoke when" using both auditory and visual signals. Existing audio-visual diarization datasets are mainly focused on indoor environments like meeting rooms or news studios, which are quite different from in-the-wild videos in many scenarios such as movies, documentaries, and audience sitcoms. To develop diarization methods for these challengi…

    Submitted 16 July, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

    Comments: ACMMM 2022

  12. arXiv:2110.01680

    cs.CV

    How You Move Your Head Tells What You Do: Self-supervised Video Representation Learning with Egocentric Cameras and IMU Sensors

    Authors: Satoshi Tsutsui, Ruta Desai, Karl Ridgeway

    Abstract: Understanding users' activities from head-mounted cameras is a fundamental task for Augmented and Virtual Reality (AR/VR) applications. A typical approach is to train a classifier in a supervised manner using data labeled by humans. This approach has limitations due to the expensive annotation cost and the closed coverage of activity labels. A potential way to address these limitations is to use s…

    Submitted 4 October, 2021; originally announced October 2021.

    Comments: Accepted to 2021 ICCV Workshop on Egocentric Perception, Interaction and Computing (EPIC)

  13. arXiv:2106.06694

    cs.CV

    Reverse-engineer the Distributional Structure of Infant Egocentric Views for Training Generalizable Image Classifiers

    Authors: Satoshi Tsutsui, David Crandall, Chen Yu

    Abstract: We analyze egocentric views of attended objects from infants. This paper shows 1) empirical evidence that children's egocentric views have more diverse distributions compared to adults' views, 2) we can computationally simulate the infants' distribution, and 3) the distribution is beneficial for training more generalized image classifiers not only for infant egocentric vision but for third-person…

    Submitted 12 June, 2021; originally announced June 2021.

    Comments: Accepted to 2021 CVPR Workshop on Egocentric Perception, Interaction and Computing (EPIC)

  14. arXiv:2011.08900

    cs.CV

    Whose hand is this? Person Identification from Egocentric Hand Gestures

    Authors: Satoshi Tsutsui, Yanwei Fu, David Crandall

    Abstract: Recognizing people by faces and other biometrics has been extensively studied in computer vision. But these techniques do not work for identifying the wearer of an egocentric (first-person) camera because that person rarely (if ever) appears in their own first-person view. But while one's own face is not frequently visible, their hands are: in fact, hands are among the most common objects in one's…

    Submitted 17 November, 2020; originally announced November 2020.

    Comments: Accepted to IEEE Winter Conference on Applications of Computer Vision (WACV) 2021 (First round acceptance)

  15. arXiv:2006.02802

    cs.CV

    A Computational Model of Early Word Learning from the Infant's Point of View

    Authors: Satoshi Tsutsui, Arjun Chandrasekaran, Md Alimoor Reza, David Crandall, Chen Yu

    Abstract: Human infants have the remarkable ability to learn the associations between object names and visual objects from inherently ambiguous experiences. Researchers in cognitive science and developmental psychology have built formal models that implement in-principle learning algorithms, and then used pre-selected and pre-cleaned datasets to test the abilities of the models to find statistical regularit…

    Submitted 4 June, 2020; originally announced June 2020.

    Comments: Accepted by Annual Conference of the Cognitive Science Society (CogSci) 2020. (Oral Acceptance Rate = 177/811 = 22%)

  16. arXiv:1911.07164

    cs.CV

    Meta-Reinforced Synthetic Data for One-Shot Fine-Grained Visual Recognition

    Authors: Satoshi Tsutsui, Yanwei Fu, David Crandall

    Abstract: One-shot fine-grained visual recognition often suffers from the problem of training data scarcity for new fine-grained classes. To alleviate this problem, an off-the-shelf image generator can be applied to synthesize additional training images, but these synthesized images are often not helpful for actually improving the accuracy of one-shot fine-grained recognition. This paper proposes a meta-lea…

    Submitted 17 November, 2019; originally announced November 2019.

    Comments: Accepted by Conference on Neural Information Processing System 2019

  17. arXiv:1906.01415

    cs.CV

    Active Object Manipulation Facilitates Visual Object Learning: An Egocentric Vision Study

    Authors: Satoshi Tsutsui, Dian Zhi, Md Alimoor Reza, David Crandall, Chen Yu

    Abstract: Inspired by the remarkable ability of the infant visual learning system, a recent study collected first-person images from children to analyze the `training data' that they receive. We conduct a follow-up study that investigates two additional directions. First, given that infants can quickly learn to recognize a new object without much supervision (i.e. few-shot learning), we limit the number of…

    Submitted 4 June, 2019; originally announced June 2019.

    Comments: Accepted at 2019 CVPR Workshop on Egocentric Perception, Interaction and Computing (EPIC)

  18. arXiv:1809.02269

    cs.IR

    edge2vec: Representation learning using edge semantics for biomedical knowledge discovery

    Authors: Zheng Gao, Gang Fu, Chunping Ouyang, Satoshi Tsutsui, Xiaozhong Liu, Jeremy Yang, Christopher Gessner, Brian Foote, David Wild, Qi Yu, Ying Ding

    Abstract: Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology for richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domai…

    Submitted 27 May, 2019; v1 submitted 6 September, 2018; originally announced September 2018.

    Comments: 10 pages

  19. arXiv:1806.00264

    cs.CV

    Combining Pyramid Pooling and Attention Mechanism for Pelvic MR Image Semantic Segmentaion

    Authors: Ting-Ting Liang, Satoshi Tsutsui, Liangcai Gao, Jing-Jing Lu, Mengyan Sun

    Abstract: One of the time-consuming routine work for a radiologist is to discern anatomical structures from tomographic images. For assisting radiologists, this paper develops an automatic segmentation method for pelvic magnetic resonance (MR) images. The task has three major challenges 1) A pelvic organ can have various sizes and shapes depending on the axial image, which requires local contexts to segment…

    Submitted 28 June, 2018; v1 submitted 1 June, 2018; originally announced June 2018.

    Comments: 12 pages

  20. arXiv:1711.05998

    cs.CV

    Minimizing Supervision for Free-space Segmentation

    Authors: Satoshi Tsutsui, Tommi Kerola, Shunta Saito, David J. Crandall

    Abstract: Identifying "free-space," or safely driveable regions in the scene ahead, is a fundamental task for autonomous navigation. While this task can be addressed using semantic segmentation, the manual labor involved in creating pixelwise annotations to train the segmentation model is very costly. Although weakly supervised segmentation addresses this issue, most methods are not designed for free-space.…

    Submitted 8 December, 2018; v1 submitted 16 November, 2017; originally announced November 2017.

    Comments: Link to source code added; Typo fixed from the version published in CVPR 2018 Workshop on Autonomous Driving (WAD)

  21. arXiv:1708.06118

    cs.CV

    Distantly Supervised Road Segmentation

    Authors: Satoshi Tsutsui, Tommi Kerola, Shunta Saito

    Abstract: We present an approach for road segmentation that only requires image-level annotations at training time. We leverage distant supervision, which allows us to train our model using images that are different from the target domain. Using large publicly available image databases as distant supervisors, we develop a simple method to automatically generate weak pixel-wise road masks. These are used to…

    Submitted 21 August, 2017; originally announced August 2017.

    Comments: Accepted for ICCV workshop CVRSUAD2017

  22. arXiv:1706.06275

    cs.CV

    Using Artificial Tokens to Control Languages for Multilingual Image Caption Generation

    Authors: Satoshi Tsutsui, David Crandall

    Abstract: Recent work in computer vision has yielded impressive results in automatically describing images with natural language. Most of these systems generate captions in a single language, requiring multiple language-specific models to build a multilingual captioning system. We propose a very simple technique to build a single unified model across languages, using artificial tokens to control the langu…

    Submitted 20 June, 2017; originally announced June 2017.

    Comments: This work appears as an Extended Abstract at the 2017 CVPR Language and Vision Workshop

  23. arXiv:1703.05105

    cs.CV

    A Data Driven Approach for Compound Figure Separation Using Convolutional Neural Networks

    Authors: Satoshi Tsutsui, David Crandall

    Abstract: A key problem in automatic analysis and understanding of scientific papers is to extract semantic information from non-textual paper components like figures, diagrams, tables, etc. Much of this work requires a very first preprocessing step: decomposing compound multi-part figures into individual subfigures. Previous work in compound figure separation has been based on manually designed features an…

    Submitted 21 August, 2017; v1 submitted 15 March, 2017; originally announced March 2017.

    Comments: Accepted to The International Conference on Document Analysis and Recognition (ICDAR) 2017