Showing 1–50 of 141 results for author: Wetzstein, G

Searching in archive cs.

  1. arXiv:2506.21117  [pdf, ps, other]

    cs.CV

    CL-Splats: Continual Learning of Gaussian Splatting with Local Optimization

    Authors: Jan Ackermann, Jonas Kulhanek, Shengqu Cai, Haofei Xu, Marc Pollefeys, Gordon Wetzstein, Leonidas Guibas, Songyou Peng

    Abstract: In dynamic 3D environments, accurately updating scene representations over time is crucial for applications in robotics, mixed reality, and embodied AI. As scenes evolve, efficient methods to incorporate changes are needed to maintain up-to-date, high-quality reconstructions without the computational overhead of re-optimizing the entire scene. This paper introduces CL-Splats, which incrementally u…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: ICCV 2025, Project Page: https://cl-splats.github.io

  2. arXiv:2506.05284  [pdf, ps, other]

    cs.CV

    Video World Models with Long-term Spatial Memory

    Authors: Tong Wu, Shuai Yang, Ryan Po, Yinghao Xu, Ziwei Liu, Dahua Lin, Gordon Wetzstein

    Abstract: Emerging world models autoregressively generate video frames in response to actions, such as camera movements and text prompts, among other control signals. Due to limited temporal context window sizes, these models often struggle to maintain scene consistency during revisits, leading to severe forgetting of previously generated environments. Inspired by the mechanisms of human memory, we introduc…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Project page: https://spmem.github.io/

  3. arXiv:2506.05210  [pdf, ps, other]

    cs.CV

    Towards Vision-Language-Garment Models for Web Knowledge Garment Understanding and Generation

    Authors: Jan Ackermann, Kiyohiro Nakayama, Guandao Yang, Tong Wu, Gordon Wetzstein

    Abstract: Multimodal foundation models have demonstrated strong generalization, yet their ability to transfer knowledge to specialized domains such as garment generation remains underexplored. We introduce VLG, a vision-language-garment model that synthesizes garments from textual descriptions and visual imagery. Our experiments assess VLG's zero-shot generalization, investigating its ability to transfer we…

    Submitted 30 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

    Comments: Presented at MMFM CVPRW'25, Project Page: https://www.computationalimaging.org/publications/vision-language-garment-models/

  4. arXiv:2506.04490  [pdf, ps, other]

    cs.LG q-bio.BM

    Multiscale guidance of AlphaFold3 with heterogeneous cryo-EM data

    Authors: Rishwanth Raghu, Axel Levy, Gordon Wetzstein, Ellen D. Zhong

    Abstract: Protein structure prediction models are now capable of generating accurate 3D structural hypotheses from sequence alone. However, they routinely fail to capture the conformational diversity of dynamic biomolecular complexes, often requiring heuristic MSA subsampling approaches for generating alternative states. In parallel, cryo-electron microscopy (cryo-EM) has emerged as a powerful tool for imag…

    Submitted 4 June, 2025; originally announced June 2025.

  5. arXiv:2506.03107  [pdf, ps, other]

    cs.CV

    ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions

    Authors: Di Chang, Mingdeng Cao, Yichun Shi, Bo Liu, Shengqu Cai, Shijie Zhou, Weilin Huang, Gordon Wetzstein, Mohammad Soleymani, Peng Wang

    Abstract: Editing images with instructions to reflect non-rigid motions, camera viewpoint shifts, object deformations, human articulations, and complex interactions, poses a challenging yet underexplored problem in computer vision. Existing approaches and datasets predominantly focus on static scenes or rigid transformations, limiting their capacity to handle expressive edits involving dynamic motion. To ad…

    Submitted 11 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

    Comments: Website: https://boese0601.github.io/bytemorph Dataset: https://huggingface.co/datasets/ByteDance-Seed/BM-6M Benchmark: https://huggingface.co/datasets/ByteDance-Seed/BM-Bench Code: https://github.com/ByteDance-Seed/BM-code Demo: https://huggingface.co/spaces/Boese0601/ByteMorph-Demo

  6. arXiv:2505.20171  [pdf, ps, other]

    cs.CV

    Long-Context State-Space Video World Models

    Authors: Ryan Po, Yotam Nitzan, Richard Zhang, Berlin Chen, Tri Dao, Eli Shechtman, Gordon Wetzstein, Xun Huang

    Abstract: Video diffusion models have recently shown promise for world modeling through autoregressive frame prediction conditioned on actions. However, they struggle to maintain long-term memory due to the high computational cost associated with processing extended sequences in attention layers. To overcome this limitation, we propose a novel architecture leveraging state-space models (SSMs) to extend temp…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Project website: https://ryanpo.com/ssm_wm

  7. arXiv:2505.18151  [pdf, ps, other]

    cs.GR cs.AI cs.CV

    WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions

    Authors: Zizhang Li, Hong-Xing Yu, Wei Liu, Yin Yang, Charles Herrmann, Gordon Wetzstein, Jiajun Wu

    Abstract: WonderPlay is a novel framework integrating physics simulation with video generation for generating action-conditioned dynamic 3D scenes from a single image. While prior works are restricted to rigid body or simple elastic dynamics, WonderPlay features a hybrid generative simulator to synthesize a wide range of 3D dynamics. The hybrid generative simulator first uses a physics solver to simulate co…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: The first two authors contributed equally. Project website: https://kyleleey.github.io/WonderPlay/

  8. arXiv:2505.17353  [pdf, ps, other]

    cs.CV cs.AI cs.LG eess.IV

    Dual Ascent Diffusion for Inverse Problems

    Authors: Minseo Kim, Axel Levy, Gordon Wetzstein

    Abstract: Ill-posed inverse problems are fundamental in many domains, ranging from astrophysics to medical imaging. Emerging diffusion models provide a powerful prior for solving these problems. Existing maximum-a-posteriori (MAP) or posterior sampling approaches, however, rely on different computational approximations, leading to inaccurate or suboptimal samples. To address this issue, we introduce a new a…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 23 pages, 15 figures, 5 tables

  9. arXiv:2505.15800  [pdf, ps, other]

    cs.CV

    Interspatial Attention for Efficient 4D Human Video Generation

    Authors: Ruizhi Shao, Yinghao Xu, Yujun Shen, Ceyuan Yang, Yang Zheng, Changan Chen, Yebin Liu, Gordon Wetzstein

    Abstract: Generating photorealistic videos of digital humans in a controllable manner is crucial for a plethora of applications. Existing approaches either build on methods that employ template-based 3D representations or emerging video generation models but suffer from poor quality or limited consistency and identity preservation when generating individual or multiple digital humans. In this paper, we intr…

    Submitted 25 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

    Comments: Project page: https://dsaurus.github.io/isa4d/

  10. arXiv:2505.06582  [pdf, ps, other]

    cs.GR physics.comp-ph physics.optics

    Gaussian Wave Splatting for Computer-Generated Holography

    Authors: Suyeon Choi, Brian Chao, Jacqueline Yang, Manu Gopakumar, Gordon Wetzstein

    Abstract: State-of-the-art neural rendering methods optimize Gaussian scene representations from a few photographs for novel-view synthesis. Building on these representations, we develop an efficient algorithm, dubbed Gaussian Wave Splatting, to turn these Gaussians into holograms. Unlike existing computer-generated holography (CGH) algorithms, Gaussian Wave Splatting supports accurate occlusions and view-d…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: Project page with more details: https://bchao1.github.io/gaussian-wave-splatting/

  11. arXiv:2505.02018  [pdf, ps, other]

    cs.CV

    R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation

    Authors: Meng-Hao Guo, Jiajun Xu, Yi Zhang, Jiaxi Song, Haoyang Peng, Yi-Xuan Deng, Xinzhi Dong, Kiyohiro Nakayama, Zhengyang Geng, Chen Wang, Bolin Ni, Guo-Wei Yang, Yongming Rao, Houwen Peng, Han Hu, Gordon Wetzstein, Shi-min Hu

    Abstract: Reasoning stands as a cornerstone of intelligence, enabling the synthesis of existing knowledge to solve complex problems. Despite remarkable progress, existing reasoning benchmarks often fail to rigorously evaluate the nuanced reasoning capabilities required for complex, real-world problem-solving, particularly in multi-disciplinary and multimodal contexts. In this paper, we introduce a graduate-l…

    Submitted 4 May, 2025; originally announced May 2025.

    Comments: 18 pages

  12. arXiv:2504.13457  [pdf, other]

    cs.CV cs.ET eess.IV

    Neural Ganglion Sensors: Learning Task-specific Event Cameras Inspired by the Neural Circuit of the Human Retina

    Authors: Haley M. So, Gordon Wetzstein

    Abstract: Inspired by the data-efficient spiking mechanism of neurons in the human eye, event cameras were created to achieve high temporal resolution with minimal power and bandwidth requirements by emitting asynchronous, per-pixel intensity changes rather than conventional fixed-frame rate images. Unlike retinal ganglion cells (RGCs) in the human eye, however, which integrate signals from multiple photore…

    Submitted 18 April, 2025; originally announced April 2025.

  13. arXiv:2504.08727  [pdf, other]

    cs.CV cs.AI cs.CY

    Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images

    Authors: Boyang Deng, Songyou Peng, Kyle Genova, Gordon Wetzstein, Noah Snavely, Leonidas Guibas, Thomas Funkhouser

    Abstract: We present a system using Multimodal LLMs (MLLMs) to analyze a large database with tens of millions of images captured at different times, with the aim of discovering patterns in temporal changes. Specifically, we aim to capture frequent co-occurring changes ("trends") across a city over a certain period. Unlike previous visual analyses, our analysis answers open-ended queries (e.g., "what are the…

    Submitted 14 April, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

    Comments: Project page: https://boyangdeng.com/visual-chronicles , second and third listed authors have equal contributions

  14. arXiv:2504.07083  [pdf, other]

    cs.CV

    GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography

    Authors: Mengchen Zhang, Tong Wu, Jing Tan, Ziwei Liu, Gordon Wetzstein, Dahua Lin

    Abstract: Camera trajectory design plays a crucial role in video production, serving as a fundamental tool for conveying directorial intent and enhancing visual storytelling. In cinematography, Directors of Photography meticulously craft camera movements to achieve expressive and intentional framing. However, existing methods for camera trajectory generation remain limited: Traditional approaches rely on ge…

    Submitted 10 April, 2025; v1 submitted 9 April, 2025; originally announced April 2025.

  15. arXiv:2504.05304  [pdf, other]

    cs.LG cs.CV

    Gaussian Mixture Flow Matching Models

    Authors: Hansheng Chen, Kai Zhang, Hao Tan, Zexiang Xu, Fujun Luan, Leonidas Guibas, Gordon Wetzstein, Sai Bi

    Abstract: Diffusion models approximate the denoising distribution as a Gaussian and predict its mean, whereas flow matching models reparameterize the Gaussian mean as flow velocity. However, they underperform in few-step sampling due to discretization error and tend to produce over-saturated colors under classifier-free guidance (CFG). To address these limitations, we propose a novel Gaussian mixture flow m…

    Submitted 1 May, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

    Comments: ICML 2025. Code: https://github.com/Lakonik/GMFlow

  16. arXiv:2503.22020  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

    Authors: Qingqing Zhao, Yao Lu, Moo Jin Kim, Zipeng Fu, Zhuoyang Zhang, Yecheng Wu, Zhaoshuo Li, Qianli Ma, Song Han, Chelsea Finn, Ankur Handa, Ming-Yu Liu, Donglai Xiang, Gordon Wetzstein, Tsung-Yi Lin

    Abstract: Vision-language-action models (VLAs) have shown potential in leveraging pretrained vision-language models and diverse robot demonstrations for learning generalizable sensorimotor control. While this paradigm effectively utilizes large-scale data from both robotic and non-robotic sources, current VLAs primarily focus on direct input-output mappings, lacking the intermediate reasoning steps crucial…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: Project website: https://cot-vla.github.io/

    Journal ref: CVPR 2025

  17. arXiv:2503.21745  [pdf, other]

    cs.CV

    3DGen-Bench: Comprehensive Benchmark Suite for 3D Generative Models

    Authors: Yuhan Zhang, Mengchen Zhang, Tong Wu, Tengfei Wang, Gordon Wetzstein, Dahua Lin, Ziwei Liu

    Abstract: 3D generation is experiencing rapid advancements, while the development of 3D evaluation has not kept pace. How to keep automatic evaluation equitably aligned with human perception has become a well-recognized challenge. Recent advances in the field of language and image generation have explored human preferences and showcased respectable fitting ability. However, the 3D domain still lacks such a…

    Submitted 19 May, 2025; v1 submitted 27 March, 2025; originally announced March 2025.

  18. arXiv:2503.10597  [pdf, other]

    cs.GR cs.CV

    GroomLight: Hybrid Inverse Rendering for Relightable Human Hair Appearance Modeling

    Authors: Yang Zheng, Menglei Chai, Delio Vicini, Yuxiao Zhou, Yinghao Xu, Leonidas Guibas, Gordon Wetzstein, Thabo Beeler

    Abstract: We present GroomLight, a novel method for relightable hair appearance modeling from multi-view images. Existing hair capture methods struggle to balance photorealistic rendering with relighting capabilities. Analytical material models, while physically grounded, often fail to fully capture appearance details. Conversely, neural rendering approaches excel at view synthesis but generalize poorly to…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: Project Page: https://syntec-research.github.io/GroomLight

  19. arXiv:2503.10592  [pdf, other]

    cs.CV

    CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models

    Authors: Hao He, Ceyuan Yang, Shanchuan Lin, Yinghao Xu, Meng Wei, Liangke Gui, Qi Zhao, Gordon Wetzstein, Lu Jiang, Hongsheng Li

    Abstract: This paper introduces CameraCtrl II, a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model. Previous camera-conditioned video generative models suffer from diminished video dynamics and limited range of viewpoints when generating videos with large camera movement. We take an approach that progressively expands the generation of dynamic sce…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: Project page: https://hehao13.github.io/Projects-CameraCtrl-II/

  20. arXiv:2502.12138  [pdf, other]

    cs.CV

    FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views

    Authors: Shangzhan Zhang, Jianyuan Wang, Yinghao Xu, Nan Xue, Christian Rupprecht, Xiaowei Zhou, Yujun Shen, Gordon Wetzstein

    Abstract: We present FLARE, a feed-forward model designed to infer high-quality camera poses and 3D geometry from uncalibrated sparse-view images (i.e., as few as 2-8 inputs), which is a challenging yet practical setting in real-world applications. Our solution features a cascaded learning paradigm with camera pose serving as the critical bridge, recognizing its essential role in mapping 3D structures onto…

    Submitted 24 March, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: CVPR 2025. Website: https://zhanghe3z.github.io/FLARE/

  21. arXiv:2502.10377  [pdf, other]

    cs.CV cs.GR

    ReStyle3D: Scene-Level Appearance Transfer with Semantic Correspondences

    Authors: Liyuan Zhu, Shengqu Cai, Shengyu Huang, Gordon Wetzstein, Naji Khosravan, Iro Armeni

    Abstract: We introduce ReStyle3D, a novel framework for scene-level appearance transfer from a single style image to a real-world scene represented by multiple views. The method combines explicit semantic correspondences with multi-view consistency to achieve precise and coherent stylization. Unlike conventional stylization methods that apply a reference style globally, ReStyle3D uses open-vocabulary segmen…

    Submitted 25 April, 2025; v1 submitted 14 February, 2025; originally announced February 2025.

    Comments: SIGGRAPH 2025. Project page: https://restyle3d.github.io/

  22. arXiv:2502.09563  [pdf, other]

    cs.CV cs.GR

    Self-Calibrating Gaussian Splatting for Large Field of View Reconstruction

    Authors: Youming Deng, Wenqi Xian, Guandao Yang, Leonidas Guibas, Gordon Wetzstein, Steve Marschner, Paul Debevec

    Abstract: In this paper, we present a self-calibrating framework that jointly optimizes camera parameters, lens distortion and 3D Gaussian representations, enabling accurate and efficient scene reconstruction. In particular, our technique enables high-quality scene reconstruction from large field-of-view (FOV) imagery taken with wide-angle lenses, allowing the scene to be modeled from a smaller number of im…

    Submitted 3 April, 2025; v1 submitted 13 February, 2025; originally announced February 2025.

    Comments: Project Page: https://denghilbert.github.io/self-cali/

  23. arXiv:2501.16330  [pdf, other]

    cs.CV cs.AI

    RelightVid: Temporal-Consistent Diffusion Model for Video Relighting

    Authors: Ye Fang, Zeyi Sun, Shangzhan Zhang, Tong Wu, Yinghao Xu, Pan Zhang, Jiaqi Wang, Gordon Wetzstein, Dahua Lin

    Abstract: Diffusion models have demonstrated remarkable success in image generation and editing, with recent advancements enabling albedo-preserving image relighting. However, applying these models to video relighting remains challenging due to the lack of paired video relighting datasets and the high demands for output fidelity and temporal consistency, further complicated by the inherent randomness of dif…

    Submitted 27 January, 2025; originally announced January 2025.

  24. arXiv:2501.10021  [pdf, other]

    cs.CV

    X-Dyna: Expressive Dynamic Human Image Animation

    Authors: Di Chang, Hongyi Xu, You Xie, Yipeng Gao, Zhengfei Kuang, Shengqu Cai, Chenxu Zhang, Guoxian Song, Chao Wang, Yichun Shi, Zeyuan Chen, Shijie Zhou, Linjie Luo, Gordon Wetzstein, Mohammad Soleymani

    Abstract: We introduce X-Dyna, a novel zero-shot, diffusion-based pipeline for animating a single human image using facial expressions and body movements derived from a driving video, that generates realistic, context-aware dynamics for both the subject and the surrounding environment. Building on prior approaches centered on human pose control, X-Dyna addresses key shortcomings causing the loss of dynamic…

    Submitted 20 January, 2025; v1 submitted 17 January, 2025; originally announced January 2025.

    Comments: Project page: https://x-dyna.github.io/xdyna.github.io/ Code: https://github.com/bytedance/X-Dyna Model: https://huggingface.co/Boese0601/X-Dyna

  25. arXiv:2501.07917  [pdf]

    cs.ET physics.app-ph physics.optics

    Roadmap on Neuromorphic Photonics

    Authors: Daniel Brunner, Bhavin J. Shastri, Mohammed A. Al Qadasi, H. Ballani, Sylvain Barbay, Stefano Biasi, Peter Bienstman, Simon Bilodeau, Wim Bogaerts, Fabian Böhm, G. Brennan, Sonia Buckley, Xinlun Cai, Marcello Calvanese Strinati, B. Canakci, Benoit Charbonnier, Mario Chemnitz, Yitong Chen, Stanley Cheung, Jeff Chiles, Suyeon Choi, Demetrios N. Christodoulides, Lukas Chrostowski, J. Chu, J. H. Clegg, et al. (125 additional authors not shown)

    Abstract: This roadmap consolidates recent advances while exploring emerging applications, reflecting the remarkable diversity of hardware platforms, neuromorphic concepts, and implementation philosophies reported in the field. It emphasizes the critical role of cross-disciplinary collaboration in this rapidly evolving field.

    Submitted 16 January, 2025; v1 submitted 14 January, 2025; originally announced January 2025.

  26. arXiv:2412.10523  [pdf, other]

    cs.CV

    The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion

    Authors: Changan Chen, Juze Zhang, Shrinidhi K. Lakshmikanth, Yusu Fang, Ruizhi Shao, Gordon Wetzstein, Li Fei-Fei, Ehsan Adeli

    Abstract: Human communication is inherently multimodal, involving a combination of verbal and non-verbal cues such as speech, facial expressions, and body gestures. Modeling these behaviors is essential for understanding human interaction and for creating virtual characters that can communicate naturally in applications like games, films, and virtual reality. However, existing motion generation models are t…

    Submitted 13 December, 2024; originally announced December 2024.

    Comments: Project page: languageofmotion.github.io

  27. arXiv:2412.09420  [pdf, other]

    cs.LG

    Mixture of neural fields for heterogeneous reconstruction in cryo-EM

    Authors: Axel Levy, Rishwanth Raghu, David Shustin, Adele Rui-Yang Peng, Huan Li, Oliver Biggs Clarke, Gordon Wetzstein, Ellen D. Zhong

    Abstract: Cryo-electron microscopy (cryo-EM) is an experimental technique for protein structure determination that images an ensemble of macromolecules in near-physiological contexts. While recent advances enable the reconstruction of dynamic conformations of a single biomolecular complex, current methods do not adequately model samples with mixed conformational and compositional heterogeneity. In particula…

    Submitted 12 December, 2024; originally announced December 2024.

  28. arXiv:2412.07674  [pdf, other]

    cs.CV

    FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models

    Authors: Tong Wu, Yinghao Xu, Ryan Po, Mengchen Zhang, Guandao Yang, Jiaqi Wang, Ziwei Liu, Dahua Lin, Gordon Wetzstein

    Abstract: Recent advances in text-to-image generation have enabled the creation of high-quality images with diverse applications. However, accurately describing desired visual attributes can be challenging, especially for non-experts in art and photography. An intuitive solution involves adopting favorable attributes from the source images. Current methods attempt to distill identity and style from source i…

    Submitted 10 December, 2024; originally announced December 2024.

    Comments: NeurIPS 2024 (Datasets and Benchmarks Track); Project page: https://fiva-dataset.github.io/

  29. arXiv:2412.03937  [pdf, other]

    cs.CV

    AIpparel: A Multimodal Foundation Model for Digital Garments

    Authors: Kiyohiro Nakayama, Jan Ackermann, Timur Levent Kesdogan, Yang Zheng, Maria Korosteleva, Olga Sorkine-Hornung, Leonidas J. Guibas, Guandao Yang, Gordon Wetzstein

    Abstract: Apparel is essential to human life, offering protection, mirroring cultural identities, and showcasing personal style. Yet, the creation of garments remains a time-consuming process, largely due to the manual work involved in designing them. To simplify this process, we introduce AIpparel, a multimodal foundation model for generating and editing sewing patterns. Our model fine-tunes state-of-the-a…

    Submitted 5 April, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

    Comments: The project website is at https://georgenakayama.github.io/AIpparel/

  30. arXiv:2411.18625  [pdf, ps, other]

    cs.CV

    Textured Gaussians for Enhanced 3D Scene Appearance Modeling

    Authors: Brian Chao, Hung-Yu Tseng, Lorenzo Porzi, Chen Gao, Tuotuo Li, Qinbo Li, Ayush Saraf, Jia-Bin Huang, Johannes Kopf, Gordon Wetzstein, Changil Kim

    Abstract: 3D Gaussian Splatting (3DGS) has recently emerged as a state-of-the-art 3D reconstruction and rendering technique due to its high-quality results and fast training and rendering time. However, pixels covered by the same Gaussian are always shaded in the same color up to a Gaussian falloff scaling factor. Furthermore, the finest geometric detail any individual Gaussian can represent is a simple ell…

    Submitted 28 May, 2025; v1 submitted 27 November, 2024; originally announced November 2024.

    Comments: Will be presented at CVPR 2025. Project website: https://textured-gaussians.github.io/

  31. arXiv:2411.18616  [pdf, other]

    cs.CV cs.AI cs.GR cs.LG

    Diffusion Self-Distillation for Zero-Shot Customized Image Generation

    Authors: Shengqu Cai, Eric Chan, Yunzhi Zhang, Leonidas Guibas, Jiajun Wu, Gordon Wetzstein

    Abstract: Text-to-image diffusion models produce impressive results but are frustrating tools for artists who desire fine-grained control. For example, a common use case is to create images of a specific instance in novel contexts, i.e., "identity-preserving generation". This setting, along with many other tasks (e.g., relighting), is a natural fit for image+text-conditional generative models. However, ther…

    Submitted 27 November, 2024; originally announced November 2024.

    Comments: Project page: https://primecai.github.io/dsd/

  32. arXiv:2411.17249  [pdf, other]

    cs.CV cs.AI

    Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors

    Authors: Zhengfei Kuang, Tianyuan Zhang, Kai Zhang, Hao Tan, Sai Bi, Yiwei Hu, Zexiang Xu, Milos Hasan, Gordon Wetzstein, Fujun Luan

    Abstract: We present Buffer Anytime, a framework for estimation of depth and normal maps (which we call geometric buffers) from video that eliminates the need for paired video-depth and video-normal training data. Instead of relying on large-scale annotated video datasets, we demonstrate high-quality video buffer estimation by leveraging single-image priors with temporal consistency constraints. Our zero-…

    Submitted 26 November, 2024; originally announced November 2024.

  33. arXiv:2411.13525  [pdf, other]

    cs.CV

    Geometric Algebra Planes: Convex Implicit Neural Volumes

    Authors: Irmak Sivgin, Sara Fridovich-Keil, Gordon Wetzstein, Mert Pilanci

    Abstract: Volume parameterizations abound in recent literature, from the classic voxel grid to the implicit neural representation and everything in between. While implicit representations have shown impressive capacity and better memory efficiency compared to voxel grids, to date they require training via nonconvex optimization. This nonconvex training process can be slow to converge and sensitive to initia…

    Submitted 21 November, 2024; v1 submitted 20 November, 2024; originally announced November 2024.

    Comments: Code is available at https://github.com/sivginirmak/Geometric-Algebra-Planes

  34. arXiv:2410.18974  [pdf, other]

    cs.CV cs.AI

    3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation

    Authors: Hansheng Chen, Bokui Shen, Yulin Liu, Ruoxi Shi, Linqi Zhou, Connor Z. Lin, Jiayuan Gu, Hao Su, Gordon Wetzstein, Leonidas Guibas

    Abstract: Multi-view image diffusion models have significantly advanced open-domain 3D object generation. However, most existing models rely on 2D network architectures that lack inherent 3D biases, resulting in compromised geometric consistency. To address this challenge, we introduce 3D-Adapter, a plug-in module designed to infuse 3D geometry awareness into pretrained image diffusion models. Central to ou…

    Submitted 19 February, 2025; v1 submitted 24 October, 2024; originally announced October 2024.

    Comments: Project page: https://lakonik.github.io/3d-adapter/

  35. arXiv:2410.02786  [pdf, other]

    cs.CV cs.AI cs.GR

    Robust Symmetry Detection via Riemannian Langevin Dynamics

    Authors: Jihyeon Je, Jiayi Liu, Guandao Yang, Boyang Deng, Shengqu Cai, Gordon Wetzstein, Or Litany, Leonidas Guibas

    Abstract: Symmetries are ubiquitous across all kinds of objects, whether in nature or in man-made creations. While these symmetries may seem intuitive to the human eye, detecting them with a machine is nontrivial due to the vast search space. Classical geometry-based methods work by aggregating "votes" for each symmetry but struggle with noise. In contrast, learning-based methods may be more robust to noise…

    Submitted 17 September, 2024; originally announced October 2024.

    Comments: Project page: https://symmetry-langevin.github.io/

  36. arXiv:2409.15394  [pdf, other]

    cs.LG cs.AI cs.GR math.NA

    Neural Control Variates with Automatic Integration

    Authors: Zilu Li, Guandao Yang, Qingqing Zhao, Xi Deng, Leonidas Guibas, Bharath Hariharan, Gordon Wetzstein

    Abstract: This paper presents a method to leverage arbitrary neural network architecture for control variates. Control variates are crucial in reducing the variance of Monte Carlo integration, but they hinge on finding a function that both correlates with the integrand and has a known analytical integral. Traditional approaches rely on heuristics to choose this function, which might not be expressive enough…

    Submitted 23 September, 2024; originally announced September 2024.

    Journal ref: SIGGRAPH Conference Papers 2024

  37. arXiv:2409.03143  [pdf, other]

    cs.GR eess.IV physics.optics

    Large Étendue 3D Holographic Display with Content-adaptive Dynamic Fourier Modulation

    Authors: Brian Chao, Manu Gopakumar, Suyeon Choi, Jonghyun Kim, Liang Shi, Gordon Wetzstein

    Abstract: Emerging holographic display technology offers unique capabilities for next-generation virtual reality systems. Current holographic near-eye displays, however, only support a small étendue, which results in a direct tradeoff between achievable field of view and eyebox size. Étendue expansion has recently been explored, but existing approaches are either fundamentally limited in the image quality t…

    Submitted 23 November, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: 12 pages, 7 figures, to be published in SIGGRAPH Asia 2024. Project website: https://bchao1.github.io/holo_dfm/

  38. arXiv:2408.13252  [pdf, other]

    cs.CV

    LayerPano3D: Layered 3D Panorama for Hyper-Immersive Scene Generation

    Authors: Shuai Yang, Jing Tan, Mengchen Zhang, Tong Wu, Yixuan Li, Gordon Wetzstein, Ziwei Liu, Dahua Lin

    Abstract: 3D immersive scene generation is a challenging yet critical task in computer vision and graphics. A desired virtual 3D scene should 1) exhibit omnidirectional view consistency, and 2) allow for free exploration in complex scene hierarchies. Existing methods either rely on successive scene expansion via inpainting or employ panorama representation to represent large FOV scene environments. However,…

    Submitted 21 February, 2025; v1 submitted 23 August, 2024; originally announced August 2024.

    Comments: Project page: https://ys-imtech.github.io/projects/LayerPano3D/

  39. arXiv:2407.15337  [pdf, other]

    cs.CV

    ThermalNeRF: Thermal Radiance Fields

    Authors: Yvette Y. Lin, Xin-Yi Pan, Sara Fridovich-Keil, Gordon Wetzstein

    Abstract: Thermal imaging has a variety of applications, from agricultural monitoring to building inspection to imaging under poor visibility, such as in low light, fog, and rain. However, reconstructing thermal scenes in 3D presents several challenges due to the comparatively lower resolution and limited features present in long-wave infrared (LWIR) images. To overcome these challenges, we propose a unifie…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Presented at ICCP 2024; project page at https://yvette256.github.io/thermalnerf

  40. arXiv:2407.15208  [pdf, other]

    cs.RO cs.AI

    Flow as the Cross-Domain Manipulation Interface

    Authors: Mengda Xu, Zhenjia Xu, Yinghao Xu, Cheng Chi, Gordon Wetzstein, Manuela Veloso, Shuran Song

    Abstract: We present Im2Flow2Act, a scalable learning framework that enables robots to acquire real-world manipulation skills without the need for real-world robot training data. The key idea behind Im2Flow2Act is to use object flow as the manipulation interface, bridging domain gaps between different embodiments (i.e., human and robot) and training environments (i.e., real-world and simulated). Im2Flow2Act…

    Submitted 4 October, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: Conference on Robot Learning 2024

  41. arXiv:2407.13759  [pdf, other]

    cs.CV cs.GR

    Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion

    Authors: Boyang Deng, Richard Tucker, Zhengqi Li, Leonidas Guibas, Noah Snavely, Gordon Wetzstein

    Abstract: We present a method for generating Streetscapes: long sequences of views through an on-the-fly synthesized city-scale scene. Our generation is conditioned by language input (e.g., city name, weather), as well as an underlying map/layout hosting the desired trajectory. Compared to recent models for video generation or 3D view synthesis, our method can scale to much longer-range camera trajectories,…

    Submitted 25 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: *Equal Contributions; Fixed few duplicated references from 1st upload; Project Page: https://boyangdeng.com/streetscapes

  42. arXiv:2407.04191  [pdf, other]

    cs.CV cs.AI cs.GR

    GazeFusion: Saliency-Guided Image Generation

    Authors: Yunxiang Zhang, Nan Wu, Connor Z. Lin, Gordon Wetzstein, Qi Sun

    Abstract: Diffusion models offer unprecedented image generation power given just a text prompt. While emerging approaches for controlling diffusion models have enabled users to specify the desired spatial layouts of the generated content, they cannot predict or control where viewers will pay more attention due to the complexity of human vision. Recognizing the significance of attention-controllable image ge…

    Submitted 15 February, 2025; v1 submitted 16 March, 2024; originally announced July 2024.

    Comments: ACM Transactions on Applied Perception (ACM Symposium on Applied Perception 2024)

  43. arXiv:2406.19126  [pdf, other]

    physics.optics cs.AI

    Super-resolution imaging using super-oscillatory diffractive neural networks

    Authors: Hang Chen, Sheng Gao, Zejia Zhao, Zhengyang Duan, Haiou Zhang, Gordon Wetzstein, Xing Lin

    Abstract: Optical super-oscillation enables far-field super-resolution imaging beyond diffraction limits. However, existing super-oscillatory lenses for spatial super-resolution imaging systems still face critical performance limitations due to the lack of more advanced design methods and the limited design degrees of freedom. Here, we propose an optical super-oscillatory diffractive neural net…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 18 pages, 7 figures, 1 table

  44. Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos

    Authors: Colton Stearns, Adam Harley, Mikaela Uy, Florian Dubost, Federico Tombari, Gordon Wetzstein, Leonidas Guibas

    Abstract: Gaussian splatting has become a popular representation for novel-view synthesis, exhibiting clear strengths in efficiency, photometric quality, and compositional editability. Following its success, many works have extended Gaussians to 4D, showing that dynamic Gaussians maintain these benefits while also tracking scene geometry far better than alternative representations. Yet, these methods assume d…

    Submitted 10 September, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  45. arXiv:2406.11819  [pdf, other]

    cs.CV

    MegaScenes: Scene-Level View Synthesis at Scale

    Authors: Joseph Tung, Gene Chou, Ruojin Cai, Guandao Yang, Kai Zhang, Gordon Wetzstein, Bharath Hariharan, Noah Snavely

    Abstract: Scene-level novel view synthesis (NVS) is fundamental to many vision and graphics applications. Recently, pose-conditioned diffusion models have led to significant progress by extracting 3D information from 2D foundation models, but these methods are limited by the lack of scene-level training data. Common dataset choices either consist of isolated objects (Objaverse), or of object-centric scenes…

    Submitted 21 August, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted at ECCV 2024. Our project page is at https://megascenes.github.io

  46. arXiv:2406.10454  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG eess.SY

    HumanPlus: Humanoid Shadowing and Imitation from Humans

    Authors: Zipeng Fu, Qingqing Zhao, Qi Wu, Gordon Wetzstein, Chelsea Finn

    Abstract: One of the key arguments for building robots that have similar form factors to human beings is that we can leverage the massive human data for training. Yet, doing so has remained challenging in practice due to the complexities in humanoid perception and control, lingering physical gaps between humanoids and humans in morphologies and actuation, and lack of a data pipeline for humanoids to learn a…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: project website: https://humanoid-ai.github.io/

  47. arXiv:2406.09413  [pdf, other]

    cs.CV cs.GR cs.LG

    Interpreting the Weight Space of Customized Diffusion Models

    Authors: Amil Dravid, Yossi Gandelsman, Kuan-Chieh Wang, Rameen Abdal, Gordon Wetzstein, Alexei A. Efros, Kfir Aberman

    Abstract: We investigate the space of weights spanned by a large collection of customized diffusion models. We populate this space by creating a dataset of over 60,000 models, each of which is a base model fine-tuned to insert a different person's visual identity. We model the underlying manifold of these weights as a subspace, which we term weights2weights. We demonstrate three immediate applications of th…

    Submitted 22 November, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Project Page: https://snap-research.github.io/weights2weights

  48. arXiv:2406.04239  [pdf, other]

    cs.LG

    Solving Inverse Problems in Protein Space Using Diffusion-Based Priors

    Authors: Axel Levy, Eric R. Chan, Sara Fridovich-Keil, Frédéric Poitevin, Ellen D. Zhong, Gordon Wetzstein

    Abstract: The interaction of a protein with its environment can be understood and controlled via its 3D structure. Experimental methods for protein structure determination, such as X-ray crystallography or cryogenic electron microscopy, shed light on biological processes but introduce challenging inverse problems. Learning-based approaches have emerged as accurate and efficient methods to solve these invers…

    Submitted 23 April, 2025; v1 submitted 6 June, 2024; originally announced June 2024.

  49. arXiv:2405.18424  [pdf, other]

    cs.CV

    3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting

    Authors: Qihang Zhang, Yinghao Xu, Chaoyang Wang, Hsin-Ying Lee, Gordon Wetzstein, Bolei Zhou, Ceyuan Yang

    Abstract: Scene image editing is crucial for entertainment, photography, and advertising design. Existing methods focus solely on either individual 2D object editing or global 3D scene editing. This results in a lack of a unified approach to effectively control and manipulate scenes at the 3D level with different levels of granularity. In this work, we propose 3DitScene, a novel and unified scene editing framework…

    Submitted 28 May, 2024; originally announced May 2024.

  50. arXiv:2405.17531  [pdf, other]

    cs.CV

    Evolutive Rendering Models

    Authors: Fangneng Zhan, Hanxue Liang, Yifan Wang, Michael Niemeyer, Michael Oechsle, Adam Kortylewski, Cengiz Oztireli, Gordon Wetzstein, Christian Theobalt

    Abstract: The landscape of computer graphics has undergone significant transformations with the recent advances of differentiable rendering models. These rendering models often rely on heuristic designs that may not fully align with the final rendering objectives. We address this gap by pioneering evolutive rendering models, a methodology where rendering models possess the ability to evolve and ada…

    Submitted 6 December, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Project page: https://fnzhan.com/Evolutive-Rendering-Models/