Showing 1–50 of 90 results for author: Wetzstein, G

Searching in archive cs.
  1. arXiv:2404.16829

    cs.CV cs.AI cs.CL

    Make-it-Real: Unleashing Large Multimodal Model's Ability for Painting 3D Objects with Realistic Materials

    Authors: Ye Fang, Zeyi Sun, Tong Wu, Jiaqi Wang, Ziwei Liu, Gordon Wetzstein, Dahua Lin

    Abstract: Physically realistic materials are pivotal in augmenting the realism of 3D assets across various applications and lighting conditions. However, existing 3D assets and generative models often lack authentic material properties. Manual assignment of materials using graphic software is a tedious and time-consuming task. In this paper, we exploit advancements in Multimodal Large Language Models (MLLMs…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Project Page: https://sunzey.github.io/Make-it-Real/

  2. arXiv:2404.11810

    cs.GR

    Holographic Parallax Improves 3D Perceptual Realism

    Authors: Dongyeon Kim, Seung-Woo Nam, Suyeon Choi, Jong-Mo Seo, Gordon Wetzstein, Yoonchan Jeong

    Abstract: Holographic near-eye displays are a promising technology to solve long-standing challenges in virtual and augmented reality display systems. Over the last few years, many different computer-generated holography (CGH) algorithms have been proposed that are supervised by different types of target content, such as 2.5D RGB-depth maps, 3D focal stacks, and 4D light fields. It is unclear, however, what…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 33 pages, 34 figures

  3. arXiv:2404.06493

    cs.CV eess.IV

    Flying with Photons: Rendering Novel Views of Propagating Light

    Authors: Anagh Malik, Noah Juravsky, Ryan Po, Gordon Wetzstein, Kiriakos N. Kutulakos, David B. Lindell

    Abstract: We present an imaging and neural rendering technique that seeks to synthesize videos of light propagating through a scene from novel, moving camera viewpoints. Our approach relies on a new ultrafast imaging setup to capture a first-of-its-kind, multi-viewpoint video dataset with picosecond-level temporal resolution. Combined with this dataset, we introduce an efficient neural volume rendering fram…

    Submitted 9 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    Comments: Project page: https://anaghmalik.com/FlyingWithPhotons/

  4. arXiv:2404.04421

    cs.GR cs.CV

    PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations

    Authors: Yang Zheng, Qingqing Zhao, Guandao Yang, Wang Yifan, Donglai Xiang, Florian Dubost, Dmitry Lagun, Thabo Beeler, Federico Tombari, Leonidas Guibas, Gordon Wetzstein

    Abstract: Modeling and rendering photorealistic avatars is of crucial importance in many applications. Existing methods that build a 3D avatar from visual observations, however, struggle to reconstruct clothed humans. We introduce PhysAvatar, a novel framework that combines inverse rendering with inverse physics to automatically estimate the shape and appearance of a human from multi-view video data along w…

    Submitted 9 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: Project Page: https://qingqing-zhao.github.io/PhysAvatar

  5. arXiv:2404.02101

    cs.CV

    CameraCtrl: Enabling Camera Control for Text-to-Video Generation

    Authors: Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang

    Abstract: Controllability plays a crucial role in video generation since it allows users to create desired content. However, existing models largely overlooked the precise control of camera pose that serves as a cinematic language to express deeper narrative nuances. To alleviate this issue, we introduce CameraCtrl, enabling accurate camera pose control for text-to-video (T2V) models. After precisely paramet…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Project page: https://hehao13.github.io/projects-CameraCtrl/ Code: https://github.com/hehao13/CameraCtrl

  6. arXiv:2403.17920

    cs.CV

    TC4D: Trajectory-Conditioned Text-to-4D Generation

    Authors: Sherwin Bahmani, Xian Liu, Yifan Wang, Ivan Skorokhodov, Victor Rong, Ziwei Liu, Xihui Liu, Jeong Joon Park, Sergey Tulyakov, Gordon Wetzstein, Andrea Tagliasacchi, David B. Lindell

    Abstract: Recent techniques for text-to-4D generation synthesize dynamic 3D scenes using supervision from pre-trained text-to-video models. However, existing representations for motion, such as deformation models or time-dependent neural representations, are limited in the amount of motion they can generate; they cannot synthesize motion extending far beyond the bounding box used for volume rendering. The la…

    Submitted 10 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Project Page: https://sherwinbahmani.github.io/tc4d

  7. arXiv:2403.14621

    cs.CV

    GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation

    Authors: Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein

    Abstract: We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0.1s. GRM is a feed-forward transformer-based model that efficiently incorporates multi-view information to translate the input pixels into pixel-aligned Gaussians, which are unprojected to create a set of densely distributed 3D Gaussians representing a scene. Together, our transformer…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Project page: https://justimyhxu.github.io/projects/grm/ Code: https://github.com/justimyhxu/GRM

  8. arXiv:2403.12032

    cs.CV cs.GR

    Generic 3D Diffusion Adapter Using Controlled Multi-View Editing

    Authors: Hansheng Chen, Ruoxi Shi, Yulin Liu, Bokui Shen, Jiayuan Gu, Gordon Wetzstein, Hao Su, Leonidas Guibas

    Abstract: Open-domain 3D object synthesis has been lagging behind image synthesis due to limited data and higher computational complexity. To bridge this gap, recent works have investigated multi-view diffusion but often fall short in either 3D consistency, visual quality, or efficiency. This paper proposes MVEdit, which functions as a 3D counterpart of SDEdit, employing ancestral sampling to jointly denois…

    Submitted 19 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: V2 note: Fix missing acknowledgements. Project page: https://lakonik.github.io/mvedit

  9. arXiv:2402.14000

    cs.CV

    Real-time 3D-aware Portrait Editing from a Single Image

    Authors: Qingyan Bai, Zifan Shi, Yinghao Xu, Hao Ouyang, Qiuyu Wang, Ceyuan Yang, Xuan Wang, Gordon Wetzstein, Yujun Shen, Qifeng Chen

    Abstract: This work presents 3DPE, a practical method that can efficiently edit a face image following given prompts, like reference images or text descriptions, in a 3D-aware manner. To this end, a lightweight module is distilled from a 3D portrait generator and a text-to-image model, which provide prior knowledge of face geometry and superior editing capability, respectively. Such a design brings two comp…

    Submitted 2 April, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  10. arXiv:2401.17217

    cs.HC cs.CV

    GazeGPT: Augmenting Human Capabilities using Gaze-contingent Contextual AI for Smart Eyewear

    Authors: Robert Konrad, Nitish Padmanaban, J. Gabriel Buckmaster, Kevin C. Boyle, Gordon Wetzstein

    Abstract: Multimodal large language models (LMMs) excel in world knowledge and problem-solving abilities. Through the use of a world-facing camera and contextual AI, emerging smart accessories aim to provide a seamless interface between humans and LMMs. Yet, these wearable computing systems lack an understanding of the user's attention. We introduce GazeGPT as a new user interaction paradigm for contextual…

    Submitted 31 January, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: Project video: https://youtu.be/AuDFHHTK_m8

  11. arXiv:2401.04092

    cs.CV

    GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation

    Authors: Tong Wu, Guandao Yang, Zhibing Li, Kai Zhang, Ziwei Liu, Leonidas Guibas, Dahua Lin, Gordon Wetzstein

    Abstract: Despite recent advances in text-to-3D generative methods, there is a notable absence of reliable evaluation metrics. Existing metrics usually focus on a single criterion each, such as how well the asset aligned with the input text. These metrics lack the flexibility to generalize to different evaluation criteria and might not align well with human preferences. Conducting user preference studies is…

    Submitted 9 January, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

    Comments: Project page: https://gpteval3d.github.io/ ; Code: https://github.com/3DTopia/GPTEval3D

  12. arXiv:2312.14432

    cs.CV cs.LG q-bio.BM

    Scalable 3D Reconstruction From Single Particle X-Ray Diffraction Images Based on Online Machine Learning

    Authors: Jay Shenoy, Axel Levy, Frédéric Poitevin, Gordon Wetzstein

    Abstract: X-ray free-electron lasers (XFELs) offer unique capabilities for measuring the structure and dynamics of biomolecules, helping us understand the basic building blocks of life. Notably, high-repetition-rate XFELs enable single particle imaging (X-ray SPI) where individual, weakly scattering biomolecules are imaged under near-physiological conditions with the opportunity to access fleeting states th…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: Project page: http://jayshenoy.com/xrai

  13. arXiv:2312.02432

    cs.CV

    Orthogonal Adaptation for Modular Customization of Diffusion Models

    Authors: Ryan Po, Guandao Yang, Kfir Aberman, Gordon Wetzstein

    Abstract: Customization techniques for text-to-image models have paved the way for a wide range of previously unattainable applications, enabling the generation of specific concepts across diverse contexts and styles. While existing methods facilitate high-fidelity customization for individual concepts or a limited, pre-defined set of them, they fall short of achieving scalability, where a single model can…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Project page: https://ryanpo.com/ortha/

  14. arXiv:2312.01409

    cs.CV cs.AI cs.GR

    Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

    Authors: Shengqu Cai, Duygu Ceylan, Matheus Gadelha, Chun-Hao Paul Huang, Tuanfeng Yang Wang, Gordon Wetzstein

    Abstract: Traditional 3D content creation tools empower users to bring their imagination to life by giving them direct control over a scene's geometry, appearance, motion, and camera path. Creating computer-generated videos, however, is a tedious manual process, which can be automated by emerging text-to-video diffusion models. Despite great promise, video diffusion models are difficult to control, hinderin…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: Project page: https://primecai.github.io/generative_rendering/

  15. arXiv:2311.17984

    cs.CV

    4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling

    Authors: Sherwin Bahmani, Ivan Skorokhodov, Victor Rong, Gordon Wetzstein, Leonidas Guibas, Peter Wonka, Sergey Tulyakov, Jeong Joon Park, Andrea Tagliasacchi, David B. Lindell

    Abstract: Recent breakthroughs in text-to-4D generation rely on pre-trained text-to-image and text-to-video models to generate dynamic 3D scenes. However, current text-to-4D methods face a three-way tradeoff between the quality of scene appearance, 3D structure, and motion. For example, text-to-image models and their 3D-aware variants are trained on internet-scale image datasets and can be used to produce s…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: Project page: https://sherwinbahmani.github.io/4dfy

  16. arXiv:2311.17857

    cs.CV cs.GR

    Gaussian Shell Maps for Efficient 3D Human Generation

    Authors: Rameen Abdal, Wang Yifan, Zifan Shi, Yinghao Xu, Ryan Po, Zhengfei Kuang, Qifeng Chen, Dit-Yan Yeung, Gordon Wetzstein

    Abstract: Efficient generation of 3D digital humans is important in several industries, including virtual reality, social media, and cinematic production. 3D generative adversarial networks (GANs) have demonstrated state-of-the-art (SOTA) quality and diversity for generated assets. Current 3D GAN architectures, however, typically rely on volume representations, which are slow to render, thereby hampering th…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: Project page: https://rameenabdal.github.io/GaussianShellMaps/

  17. arXiv:2311.13177

    physics.med-ph cs.CV

    Volumetric Reconstruction Resolves Off-Resonance Artifacts in Static and Dynamic PROPELLER MRI

    Authors: Annesha Ghosh, Gordon Wetzstein, Mert Pilanci, Sara Fridovich-Keil

    Abstract: Off-resonance artifacts in magnetic resonance imaging (MRI) are visual distortions that occur when the actual resonant frequencies of spins within the imaging volume differ from the expected frequencies used to encode spatial information. These discrepancies can be caused by a variety of factors, including magnetic field inhomogeneities, chemical shifts, or susceptibility differences within the ti…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: Code is available at https://github.com/sarafridov/volumetric-propeller

  18. arXiv:2311.09217

    cs.CV

    DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model

    Authors: Yinghao Xu, Hao Tan, Fujun Luan, Sai Bi, Peng Wang, Jiahao Li, Zifan Shi, Kalyan Sunkavalli, Gordon Wetzstein, Zexiang Xu, Kai Zhang

    Abstract: We propose DMV3D, a novel 3D generation approach that uses a transformer-based 3D large reconstruction model to denoise multi-view diffusion. Our reconstruction model incorporates a triplane NeRF representation and can denoise noisy multi-view images via NeRF reconstruction and rendering, achieving single-stage 3D generation in ~30s on a single A100 GPU. We train DMV3D on larg…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: Project Page: https://justimyhxu.github.io/projects/dmv3d/

  19. arXiv:2310.20249

    cs.CV cs.GR cs.LG

    Pose-to-Motion: Cross-Domain Motion Retargeting with Pose Prior

    Authors: Qingqing Zhao, Peizhuo Li, Wang Yifan, Olga Sorkine-Hornung, Gordon Wetzstein

    Abstract: Creating believable motions for various characters has long been a goal in computer graphics. Current learning-based motion synthesis methods depend on extensive motion datasets, which are often challenging, if not impossible, to obtain. On the other hand, pose data is more accessible, since static posed characters are easier to create and can even be extracted from images using recent advancement…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: Project page: https://cyanzhao42.github.io/pose2motion

  20. arXiv:2310.07204

    cs.AI cs.CV cs.GR cs.LG

    State of the Art on Diffusion Models for Visual Computing

    Authors: Ryan Po, Wang Yifan, Vladislav Golyanik, Kfir Aberman, Jonathan T. Barron, Amit H. Bermano, Eric Ryan Chan, Tali Dekel, Aleksander Holynski, Angjoo Kanazawa, C. Karen Liu, Lingjie Liu, Ben Mildenhall, Matthias Nießner, Björn Ommer, Christian Theobalt, Peter Wonka, Gordon Wetzstein

    Abstract: The field of visual computing is rapidly advancing due to the emergence of generative artificial intelligence (AI), which unlocks unprecedented capabilities for the generation, editing, and reconstruction of images, videos, and 3D scenes. In these domains, diffusion models are the generative AI architecture of choice. Within the last year alone, the literature on diffusion-based tools and applicat…

    Submitted 11 October, 2023; originally announced October 2023.

  21. arXiv:2310.03956

    cs.CV math.OC physics.med-ph

    Gradient Descent Provably Solves Nonlinear Tomographic Reconstruction

    Authors: Sara Fridovich-Keil, Fabrizio Valdivia, Gordon Wetzstein, Benjamin Recht, Mahdi Soltanolkotabi

    Abstract: In computed tomography (CT), the forward model consists of a linear Radon transform followed by an exponential nonlinearity based on the attenuation of light according to the Beer-Lambert Law. Conventional reconstruction often involves inverting this nonlinearity as a preprocessing step and then solving a convex inverse problem. However, this nonlinear measurement preprocessing required to use the…

    Submitted 5 October, 2023; originally announced October 2023.

  22. arXiv:2309.01811

    cs.CV

    Instant Continual Learning of Neural Radiance Fields

    Authors: Ryan Po, Zhengyang Dong, Alexander W. Bergman, Gordon Wetzstein

    Abstract: Neural radiance fields (NeRFs) have emerged as an effective method for novel-view synthesis and 3D scene reconstruction. However, conventional training methods require access to all training views during scene optimization. This assumption may be prohibitive in continual learning scenarios, where new data is acquired in a sequential manner and a continuous update of the NeRF is desired, as in auto…

    Submitted 5 September, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

    Comments: For project page please visit https://ryanpo.com/icngp/

  23. arXiv:2307.15055

    cs.CV

    PointOdyssey: A Large-Scale Synthetic Dataset for Long-Term Point Tracking

    Authors: Yang Zheng, Adam W. Harley, Bokui Shen, Gordon Wetzstein, Leonidas J. Guibas

    Abstract: We introduce PointOdyssey, a large-scale synthetic dataset, and data generation framework, for the training and evaluation of long-term fine-grained tracking algorithms. Our goal is to advance the state-of-the-art by placing emphasis on long videos with naturalistic motion. Toward the goal of naturalism, we animate deformable characters using real-world motion capture data, we build 3D scenes to m…

    Submitted 27 July, 2023; originally announced July 2023.

  24. arXiv:2307.05462

    cs.CV

    Efficient 3D Articulated Human Generation with Layered Surface Volumes

    Authors: Yinghao Xu, Wang Yifan, Alexander W. Bergman, Menglei Chai, Bolei Zhou, Gordon Wetzstein

    Abstract: Access to high-quality and diverse 3D articulated digital human assets is crucial in various applications, ranging from virtual reality to social platforms. Generative approaches, such as 3D generative adversarial networks (GANs), are rapidly replacing laborious manual content creation tools. However, existing 3D GAN frameworks typically rely on scene representations that leverage either template…

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: Project page: https://www.computationalimaging.org/publications/lsv/ Demo: https://www.youtube.com/watch?v=vahgMFCM3j4

  25. arXiv:2307.04859

    cs.CV cs.GR cs.LG

    Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models

    Authors: Alexander W. Bergman, Wang Yifan, Gordon Wetzstein

    Abstract: The ability to generate diverse 3D articulated head avatars is vital to a plethora of applications, including augmented reality, cinematography, and education. Recent work on text-guided 3D object generation has shown great promise in addressing these needs. These methods directly leverage pre-trained 2D text-to-image diffusion models to generate 3D-multi-view-consistent radiance fields of generic…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: Project website: http://www.computationalimaging.org/publications/articulated-diffusion/

  26. Single-Shot Implicit Morphable Faces with Consistent Texture Parameterization

    Authors: Connor Z. Lin, Koki Nagano, Jan Kautz, Eric R. Chan, Umar Iqbal, Leonidas Guibas, Gordon Wetzstein, Sameh Khamis

    Abstract: There is a growing demand for the accessible creation of high-quality 3D avatars that are animatable and customizable. Although 3D morphable models provide intuitive control for editing and animation, and robustness for single-view face reconstruction, they cannot easily capture geometric and appearance details. Methods based on neural implicit representations, such as signed distance functions (S…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: SIGGRAPH 2023, Project Page: https://research.nvidia.com/labs/toronto-ai/ssif

  27. arXiv:2305.01122

    cs.LG cs.CE

    Learning Controllable Adaptive Simulation for Multi-resolution Physics

    Authors: Tailin Wu, Takashi Maruyama, Qingqing Zhao, Gordon Wetzstein, Jure Leskovec

    Abstract: Simulating the time evolution of physical systems is pivotal in many scientific and engineering problems. An open challenge in simulating such systems is their multi-resolution dynamics: a small fraction of the system is extremely dynamic, and requires very fine-grained resolution, while a majority of the system is changing slowly and can be modeled by coarser spatial scales. Typical learning-base…

    Submitted 1 May, 2023; originally announced May 2023.

    Comments: ICLR 2023, notable top-25% (spotlight), 19 pages, 9 figures

  28. arXiv:2304.13153

    cs.CV cs.GR cs.LG

    LumiGAN: Unconditional Generation of Relightable 3D Human Faces

    Authors: Boyang Deng, Yifan Wang, Gordon Wetzstein

    Abstract: Unsupervised learning of 3D human faces from unstructured 2D image data is an active research area. While recent works have achieved an impressive level of photorealism, they commonly lack control of lighting, which prevents the generated assets from being deployed in novel environments. To this end, we introduce LumiGAN, an unconditional Generative Adversarial Network (GAN) for 3D human faces wit…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: Project page: https://boyangdeng.com/projects/lumigan

  29. arXiv:2304.05440

    cs.CV

    PixelRNN: In-pixel Recurrent Neural Networks for End-to-end-optimized Perception with Neural Sensors

    Authors: Haley M. So, Laurie Bose, Piotr Dudek, Gordon Wetzstein

    Abstract: Conventional image sensors digitize high-resolution images at fast frame rates, producing a large amount of data that needs to be transmitted off the sensor for further processing. This is challenging for perception systems operating on edge devices, because communication is power inefficient and induces latency. Fueled by innovations in stacked image sensor fabrication, emerging sensor-processors…

    Submitted 11 April, 2023; originally announced April 2023.

  30. arXiv:2304.02602

    cs.CV cs.AI cs.GR

    Generative Novel View Synthesis with 3D-Aware Diffusion Models

    Authors: Eric R. Chan, Koki Nagano, Matthew A. Chan, Alexander W. Bergman, Jeong Joon Park, Axel Levy, Miika Aittala, Shalini De Mello, Tero Karras, Gordon Wetzstein

    Abstract: We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image. Our model samples from the distribution of possible renderings consistent with the input and, even in the presence of ambiguity, is capable of rendering diverse and plausible novel views. To achieve this, our method makes use of existing 2D diffusion backbones but, crucially, incorp…

    Submitted 5 April, 2023; originally announced April 2023.

    Comments: Project page: https://nvlabs.github.io/genvs

  31. arXiv:2303.12218

    cs.CV

    Compositional 3D Scene Generation using Locally Conditioned Diffusion

    Authors: Ryan Po, Gordon Wetzstein

    Abstract: Designing complex 3D scenes has been a tedious, manual process requiring domain expertise. Emerging text-to-3D generative models show great promise for making this task more intuitive, but existing approaches are limited to object-level generation. We introduce locally conditioned diffusion as an approach to compositional scene diffusion, providing control over semantic parts using text p…

    Submitted 22 March, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: For project page, see https://ryanpo.com/comp3d/

  32. arXiv:2303.12074

    cs.CV

    CC3D: Layout-Conditioned Generation of Compositional 3D Scenes

    Authors: Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Xingguang Yan, Gordon Wetzstein, Leonidas Guibas, Andrea Tagliasacchi

    Abstract: In this work, we introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts, trained using single-view images. Different from most existing 3D GANs that limit their applicability to aligned single objects, we focus on generating complex scenes with multiple objects, by modeling the compositional nature of 3D scenes. By devising a 2D l…

    Submitted 8 September, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

    Comments: ICCV 2023; Webpage: https://sherwinbahmani.github.io/cc3d/

  33. arXiv:2303.11364

    cs.CV

    DehazeNeRF: Multiple Image Haze Removal and 3D Shape Reconstruction using Neural Radiance Fields

    Authors: Wei-Ting Chen, Wang Yifan, Sy-Yen Kuo, Gordon Wetzstein

    Abstract: Neural radiance fields (NeRFs) have demonstrated state-of-the-art performance for 3D computer vision tasks, including novel view synthesis and 3D shape reconstruction. However, these methods fail in adverse weather conditions. To address this challenge, we introduce DehazeNeRF as a framework that robustly operates in hazy conditions. DehazeNeRF extends the volume rendering equation by adding physi…

    Submitted 20 March, 2023; originally announced March 2023.

    Comments: including supplemental material; project page: https://www.computationalimaging.org/publications/dehazenerf

  34. arXiv:2303.08096

    cs.CV

    MELON: NeRF with Unposed Images in SO(3)

    Authors: Axel Levy, Mark Matthews, Matan Sela, Gordon Wetzstein, Dmitry Lagun

    Abstract: Neural radiance fields enable novel-view synthesis and scene reconstruction with photorealistic quality from a few images, but require known and accurate camera poses. Conventional pose estimation algorithms fail on smooth or self-similar scenes, while methods performing inverse rendering from unposed views require a rough initialization of the camera orientations. The main difficulty of pose esti…

    Submitted 19 July, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

  35. arXiv:2303.04291

    eess.IV cs.CV

    Diffusion in the Dark: A Diffusion Model for Low-Light Text Recognition

    Authors: Cindy M. Nguyen, Eric R. Chan, Alexander W. Bergman, Gordon Wetzstein

    Abstract: Capturing images is a key part of automation for high-level tasks such as scene text recognition. Low-light conditions pose a challenge for high-level perception stacks, which are often optimized on well-lit, artifact-free images. Reconstruction methods for low-light images can produce well-lit counterparts, but typically at the cost of high-frequency details critical for downstream tasks. We prop…

    Submitted 30 October, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: WACV 2024. Project website: https://ccnguyen.github.io/diffusion-in-the-dark/

  36. arXiv:2302.01368

    cs.HC cs.GR eess.IV

    Towards Attention-aware Foveated Rendering

    Authors: Brooke Krajancich, Petr Kellnhofer, Gordon Wetzstein

    Abstract: Foveated graphics is a promising approach to solving the bandwidth challenges of immersive virtual and augmented reality displays by exploiting the falloff in spatial acuity in the periphery of the visual field. However, the perceptual models used in these applications neglect the effects of higher-level cognitive processing, namely the allocation of visual attention, and are thus overestimating s…

    Submitted 10 May, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: 10 pages, 6 figures

  37. arXiv:2212.10699

    cs.CV cs.GR

    PaletteNeRF: Palette-based Appearance Editing of Neural Radiance Fields

    Authors: Zhengfei Kuang, Fujun Luan, Sai Bi, Zhixin Shu, Gordon Wetzstein, Kalyan Sunkavalli

    Abstract: Recent advances in neural radiance fields have enabled the high-fidelity 3D reconstruction of complex scenes for novel view synthesis. However, it remains underexplored how the appearance of such representations can be efficiently edited while maintaining photorealism. In this work, we present PaletteNeRF, a novel method for photorealistic appearance editing of neural radiance fields (NeRF) base…

    Submitted 24 January, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

  38. arXiv:2212.08377

    cs.CV cs.GR

    PointAvatar: Deformable Point-based Head Avatars from Videos

    Authors: Yufeng Zheng, Wang Yifan, Gordon Wetzstein, Michael J. Black, Otmar Hilliges

    Abstract: The ability to create realistic, animatable and relightable head avatars from casual video sequences would open up wide ranging applications in communication and entertainment. Current methods either build on explicit 3D morphable meshes (3DMM) or exploit neural implicit representations. The former are limited by fixed topology, while the latter are non-trivial to deform and inefficient to render.…

    Submitted 28 February, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: Project page: https://zhengyuf.github.io/PointAvatar/ Code base: https://github.com/zhengyuf/pointavatar

  39. arXiv:2212.04096

    cs.CV

    ALTO: Alternating Latent Topologies for Implicit 3D Reconstruction

    Authors: Zhen Wang, Shijie Zhou, Jeong Joon Park, Despoina Paschalidou, Suya You, Gordon Wetzstein, Leonidas Guibas, Achuta Kadambi

    Abstract: This work introduces alternating latent topologies (ALTO) for high-fidelity reconstruction of implicit 3D surfaces from noisy point clouds. Previous work identifies that the spatial arrangement of latent encodings is important to recover detail. One school of thought is to encode a latent vector for each point (point latents). Another school of thought is to project point latents into a grid (grid…

    Submitted 8 December, 2022; originally announced December 2022.

  40. arXiv:2211.17260

    cs.CV cs.AI cs.GR cs.LG

    SinGRAF: Learning a 3D Generative Radiance Field for a Single Scene

    Authors: Minjung Son, Jeong Joon Park, Leonidas Guibas, Gordon Wetzstein

    Abstract: Generative models have shown great promise in synthesizing photorealistic 3D objects, but they require large amounts of training data. We introduce SinGRAF, a 3D-aware generative model that is trained with a few input images of a single scene. Once trained, SinGRAF generates different realizations of this 3D scene that preserve the appearance of the input while varying scene layout. For this purpo…

    Submitted 2 April, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

    Comments: CVPR 2023. Project page: https://www.computationalimaging.org/publications/singraf/

  41. arXiv:2211.16677  [pdf, other]

    cs.CV cs.AI cs.GR

    3D Neural Field Generation using Triplane Diffusion

    Authors: J. Ryan Shue, Eric Ryan Chan, Ryan Po, Zachary Ankner, Jiajun Wu, Gordon Wetzstein

    Abstract: Diffusion models have emerged as the state-of-the-art for image generation, among other tasks. Here, we present an efficient diffusion-based model for 3D-aware generation of neural fields. Our approach pre-processes training data, such as ShapeNet meshes, by converting them to continuous occupancy fields and factoring them into a set of axis-aligned triplane feature representations. Thus, our 3D t…

    Submitted 29 November, 2022; originally announced November 2022.

    Comments: Project page: https://jryanshue.com/nfd

  42. arXiv:2211.12131  [pdf, other]

    cs.CV

    DiffDreamer: Towards Consistent Unsupervised Single-view Scene Extrapolation with Conditional Diffusion Models

    Authors: Shengqu Cai, Eric Ryan Chan, Songyou Peng, Mohamad Shahbazi, Anton Obukhov, Luc Van Gool, Gordon Wetzstein

    Abstract: Scene extrapolation -- the idea of generating novel views by flying into a given image -- is a promising, yet challenging task. For each predicted frame, a joint inpainting and 3D refinement problem has to be solved, which is ill posed and includes a high level of ambiguity. Moreover, training data for long-range scenes is difficult to obtain and usually lacks sufficient views to infer accurate ca…

    Submitted 18 March, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

  43. arXiv:2210.08676  [pdf, other]

    cs.CV cs.LG

    Scale-Agnostic Super-Resolution in MRI using Feature-Based Coordinate Networks

    Authors: Dave Van Veen, Rogier van der Sluijs, Batu Ozturkler, Arjun Desai, Christian Bluethgen, Robert D. Boutin, Marc H. Willis, Gordon Wetzstein, David Lindell, Shreyas Vasanawala, John Pauly, Akshay S. Chaudhari

    Abstract: We propose using a coordinate network decoder for the task of super-resolution in MRI. The continuous signal representation of coordinate networks enables this approach to be scale-agnostic, i.e. one can train over a continuous range of scales and subsequently query at arbitrary resolutions. Due to the difficulty of performing super-resolution on inherently noisy data, we analyze network behavior…

    Submitted 17 October, 2022; v1 submitted 16 October, 2022; originally announced October 2022.

    Journal ref: Medical Imaging with Deep Learning. 2022

  44. arXiv:2210.07387  [pdf, other]

    cs.CV cs.LG q-bio.BM

    Amortized Inference for Heterogeneous Reconstruction in Cryo-EM

    Authors: Axel Levy, Gordon Wetzstein, Julien Martel, Frederic Poitevin, Ellen D. Zhong

    Abstract: Cryo-electron microscopy (cryo-EM) is an imaging modality that provides unique insights into the dynamics of proteins and other building blocks of life. The algorithmic challenge of jointly estimating the poses, 3D structure, and conformational heterogeneity of a biomolecule from millions of noisy and randomly oriented 2D projections in a computationally efficient manner, however, remains unsolved…

    Submitted 13 October, 2022; originally announced October 2022.

    Journal ref: Neural Information Processing Systems (NeurIPS) 2022

  45. arXiv:2209.15121  [pdf, other]

    q-bio.BM cs.CV eess.IV physics.chem-ph

    Heterogeneous reconstruction of deformable atomic models in Cryo-EM

    Authors: Youssef Nashed, Ariana Peck, Julien Martel, Axel Levy, Bongjin Koo, Gordon Wetzstein, Nina Miolane, Daniel Ratner, Frédéric Poitevin

    Abstract: Cryogenic electron microscopy (cryo-EM) provides a unique opportunity to study the structural heterogeneity of biomolecules. Being able to explain this heterogeneity with atomic models would help our understanding of their functional mechanisms but the size and ruggedness of the structural space (the space of atomic 3D cartesian coordinates) presents an immense challenge. Here, we describe a heter…

    Submitted 29 September, 2022; originally announced September 2022.

    Comments: 8 pages, 1 figure

  46. arXiv:2207.08890  [pdf, other]

    cs.CV cs.GR cs.LG

    NeuForm: Adaptive Overfitting for Neural Shape Editing

    Authors: Connor Z. Lin, Niloy J. Mitra, Gordon Wetzstein, Leonidas Guibas, Paul Guerrero

    Abstract: Neural representations are popular for representing shapes, as they can be learned from sensor data and used for data cleanup, model completion, shape editing, and shape synthesis. Current neural representations can be categorized as either overfitting to a single object instance, or representing a collection of objects. However, neither allows accurate editing of neural scene representations: on…

    Submitted 18 July, 2022; originally announced July 2022.

  47. arXiv:2206.14797  [pdf, other]

    cs.CV cs.LG

    3D-Aware Video Generation

    Authors: Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Hao Tang, Gordon Wetzstein, Leonidas Guibas, Luc Van Gool, Radu Timofte

    Abstract: Generative models have emerged as an essential building block for many image synthesis and editing tasks. Recent advances in this field have also enabled high-quality 3D or video content to be generated that exhibits either multi-view or temporal consistency. With our work, we explore 4D generative adversarial networks (GANs) that learn unconditional generation of 3D-aware videos. By combining neu…

    Submitted 9 August, 2023; v1 submitted 29 June, 2022; originally announced June 2022.

    Comments: TMLR 2023; Project page: https://sherwinbahmani.github.io/3dvidgen

  48. arXiv:2206.14314  [pdf, other]

    cs.CV cs.GR

    Generative Neural Articulated Radiance Fields

    Authors: Alexander W. Bergman, Petr Kellnhofer, Wang Yifan, Eric R. Chan, David B. Lindell, Gordon Wetzstein

    Abstract: Unsupervised learning of 3D-aware generative adversarial networks (GANs) using only collections of single-view 2D photographs has very recently made much progress. These 3D GANs, however, have not been demonstrated for human bodies and the generated radiance fields of existing frameworks are not directly editable, limiting their applicability in downstream tasks. We propose a solution to these cha…

    Submitted 9 January, 2023; v1 submitted 28 June, 2022; originally announced June 2022.

    Comments: Project website: http://www.computationalimaging.org/publications/gnarf/

  49. arXiv:2206.00711  [pdf, other]

    cs.LG

    Learning to Solve PDE-constrained Inverse Problems with Graph Networks

    Authors: Qingqing Zhao, David B. Lindell, Gordon Wetzstein

    Abstract: Learned graph neural networks (GNNs) have recently been established as fast and accurate alternatives for principled solvers in simulating the dynamics of physical systems. In many application domains across science and engineering, however, we are not only interested in a forward simulation but also in solving inverse problems with constraints defined by a partial differential equation (PDE). Her…

    Submitted 1 June, 2022; originally announced June 2022.

  50. arXiv:2205.02367  [pdf, other]

    cs.GR cs.AI physics.comp-ph physics.optics

    Time-multiplexed Neural Holography: A flexible framework for holographic near-eye displays with fast heavily-quantized spatial light modulators

    Authors: Suyeon Choi, Manu Gopakumar, Yifan Peng, Jonghyun Kim, Matthew O'Toole, Gordon Wetzstein

    Abstract: Holographic near-eye displays offer unprecedented capabilities for virtual and augmented reality systems, including perceptually important focus cues. Although artificial intelligence--driven algorithms for computer-generated holography (CGH) have recently made much progress in improving the image quality and synthesis efficiency of holograms, these algorithms are not directly applicable to emergi…

    Submitted 4 May, 2022; originally announced May 2022.

    Comments: Project page with more details: http://www.computationalimaging.org/publications/time-multiplexed-neural-holography/