Computer vision has been rapidly transformed by advances in generative models, particularly text-to-image generation with models like Imagen 3, Stable Diffusion 3, Flux, and DALL-E 3, as well as text-to-video models such as Sora, Stable Video Diffusion, and Meta Movie Gen. In 3D generation, models like Zero-123, Instant3D, and the Large Reconstruction Model (LRM) have pushed the boundaries of 3D content creation. These innovations have enabled highly realistic and diverse synthetic visual datasets, complete with annotations and rich variations, which are invaluable for training and evaluating algorithms for object detection, segmentation, representation learning, and scene understanding. The second SyntaGen Workshop aims to foster collaboration and knowledge exchange across the field, bringing together experts and practitioners to propel the development of generative models and synthetic visual datasets to new heights. Through talks, paper presentations, poster sessions, and panel discussions, the workshop will catalyze breakthroughs at the intersection of generative models and computer vision applications.

Speakers

Varun Jampani, VP Research, Stability AI
Nathan Carr, Adobe Fellow, Adobe Research
Jia-Bin Huang, Associate Professor, University of Maryland College Park
Sergey Tulyakov, Director of Research, Snap Inc.
Shobhita Sundaram, MIT

Schedule

Time  | Event | Duration | Speaker
8:30  | Introduction | 10 mins |
8:35  | Oral presentation | 25 mins |
9:00  | Invited talk 1: Inventing Data: An Industry Perspective | 25 mins | Nathan Carr
9:25  | Invited talk 2: Beyond 3D: Generating Volumetric Scenes with Motion | 25 mins | Sergey Tulyakov
9:50  | Break | 10 mins |
10:00 | Invited talk 3: Personalized Representation from Personalized Generation | 25 mins | Shobhita Sundaram
10:25 | Invited talk 4: Learning to Recreate Reality | 25 mins | Jia-Bin Huang
10:50 | Invited talk 5: Diffusion Dialed in: Light and Heavy Adaptation of Diffusion Models for Complex Vision Tasks | 25 mins | Varun Jampani
11:15 | Panel discussion | 25 mins | TBD
11:40 | Poster Session | 60 mins |

Accepted Papers

[Oral] NeIn: Telling What You Don’t Want. Nhat-Tan Bui, Dinh-Hieu Hoang, Quoc-Huy Trinh, Minh-Triet Tran, Truong Nguyen, Susan Gauch
[Oral] Latent Video Dataset Distillation. Ning Li, Antai Andy Liu, Jingran Zhang, Justin Cui
[Oral] Eye Tell the Truth: GazeVal Highlights Shortcomings of Generative AI in Medical Imaging. David Wong, Bin Wang, Gorkem Durak, Marouane Tliba, Akshay S Chaudhari, Aladine Chetouani, Ahmet Enis Cetin, Cagdas Topel, Nicolo Gennaro, Camila Lopes Vendrami, Tugce Agirlar Trabzonlu, Amir Ali Rahsepar, Laetitia Perronne, Matthew Antalek, Onural Ozturk, Gokcan Okur, Andrew C Gordon, Ayis Pyrros, Frank H Miller, Amir A. Borhani, Hatice Savas, Eric Hart, Drew Torigian, Jayaram K Udupa, Elizabeth Anne Krupinski, Ulas Bagci 
[Poster] Bridging 3D Editing and Geometry-Consistent Paired Dataset Creation for 2D Nighttime-to-Daytime Translation. Xiao Cao, Yuyang Zhao, Robby T. Tan, Zhiyong Huang
[Poster] VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors. Juil Koo, Paul Guerrero, Chun-Hao Paul Huang, Duygu Ceylan, Minhyuk Sung
[Poster] Noise Consistency Regularization for Improved Subject-Driven Image Synthesis. Yao Ni, Song Wen, Piotr Koniusz, Anoop Cherian
[Poster] AnomalyHybrid: A Domain-agnostic Generative Framework for General Anomaly Detection. Ying Zhao
[Poster] SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation. Yonwoo Choi
[Poster] good4cir: Generating Detailed Synthetic Captions for Composed Image Retrieval. Pranavi Kolouju, Eric Xing, Robert Pless, Nathan Jacobs, Abby Stylianou
[Poster] Syn3DTxt: Embedding 3D Cues for Scene Text Generation. Li-Syun Hsiung, Jun Kai Tu, Kuan-wu Chu, Yu-Hsuan Chiu, Yan-Tsung Peng, Sheng-Luen Chung, Gee-Sern Jison Hsu

Call for Papers

We invite papers that propel the development of generative models and/or the use of their synthetic visual datasets for training and evaluating computer vision models. Accepted papers will be presented at the workshop's poster session. We welcome submissions along two tracks:

  • Full papers: Up to 8 pages, excluding references, with the option of inclusion in the proceedings.
  • Short papers: Up to 4 pages, excluding references, not for the proceedings.

Only full papers will be considered for awards: we offer a Best Paper award and a Best Paper Runner-up award, both with oral presentations. All accepted papers not included in the proceedings are non-archival.

Topics

The main objective of the SyntaGen workshop is to offer a space for researchers, practitioners, and enthusiasts to investigate, discuss, and collaborate on the development, use, and potential applications of synthetic visual datasets created with generative models. The workshop will cover various topics, including but not limited to:

  • Leveraging pre-trained generative models to generate data and annotations for perception-driven tasks, including image classification, representation learning, object detection, semantic and instance segmentation, relationship detection, action recognition, object tracking, and 3D shape reconstruction and recognition.
  • Extending the generative capacity of large-scale pre-trained text-to-image models to other domains, such as videos, 3D, and 4D spaces.
  • Exploring new research directions in generative models, including GANs, VAEs, diffusion models, and autoregressive models, to advance visual content generation.
  • Synergizing expansive synthetic datasets with minimally annotated real datasets to enhance model performance across scenarios including unsupervised, semi-supervised, weakly-supervised, and zero-shot/few-shot learning.
  • Enhancing data quality and improving synthesis methodologies in the context of pre-trained text-to-image (T2I), text-to-video (T2V), text-to-3D, and text-to-4D models.
  • Evaluating the quality and effectiveness of the generated datasets, particularly on metrics, challenges, and open problems related to benchmarking synthetic visual datasets.
  • Ethical implications of using synthetic annotated data, strategies for mitigating biases, verifying and protecting generated visual content, and ensuring responsible data generation and annotation practices.

Submission Instructions

Submissions should be anonymized and formatted using the CVPR 2025 template and uploaded as a single PDF.

Notes for registering a new OpenReview account

  • New profiles created without an institutional email will go through a moderation process that can take up to two weeks.
  • New profiles created with an institutional email will be activated automatically.

Supplemental Material

Supplemental material can optionally be submitted along with the paper manuscript by the submission deadline. It must be anonymized and uploaded as either a single PDF or a ZIP file.

OpenReview submission link

Important workshop dates

  • Submission deadline: March 22nd, 11:59 PM PST
  • Review and decision release: March 30th, 11:59 PM PST (updated from April 3rd, 11:59 PM PST)
  • Metadata submission for inclusion in the CVPR workshop proceedings (new): March 31st, 11:59 PM PST
  • Camera ready: April 14th, 11:59 PM PST for papers both included and not included in the CVPR workshop proceedings (previously April 7th for included and April 14th for not included)
  • Workshop date: June 12th

Workshop Sponsors

Adobe

Organizers

Khoi Nguyen, Qualcomm AI Research, Vietnam
Anh Tuan Tran, Qualcomm AI Research, Vietnam
Binh Son Hua, Trinity College Dublin, Ireland
Supasorn Suwajanakorn, VISTEC, Thailand
Yi Zhou, Adobe
