Visual Autoregressive Modeling


An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

VAR (Visual Autoregressive Modeling) is an image generation model built on "next-scale prediction": instead of generating an image piece by piece in raster-scan order, it generates the image coarse to fine, one resolution scale at a time. The approach scales well, has achieved state-of-the-art results, and was recognized with a Best Paper award at NeurIPS 2024. The codebase is designed to be simple and user-friendly, and it includes demo tools for generating images interactively.

Specifically, VAR trains an autoregressive model to predict the next, finer scale of an image from the coarser scales already generated, rather than predicting the next element of a raster scan. Pre-trained models are available for download, and the provided code supports both experimentation and training on custom image datasets. The project is open source under the MIT license, which encourages collaboration and further work on visual generation.
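As a rough illustration of why coarse-to-fine prediction changes the shape of the generation loop, the sketch below counts autoregressive steps for a raster scan versus a next-scale schedule. It is not code from the VAR repository: the helper names `raster_scan_steps` and `next_scale_plan` are hypothetical, and the scale schedule is an assumption chosen for a 16x16 final token grid.

```python
# Minimal sketch (not the VAR repository's code): compare how many
# autoregressive steps a raster scan needs versus next-scale prediction.

def raster_scan_steps(side):
    """A raster scan emits one token per step, so a side x side grid needs side*side steps."""
    return side * side


def next_scale_plan(scales):
    """Build a coarse-to-fine plan: one step per scale.

    Each step emits a whole side x side token map at once, conditioned on
    every token produced at the coarser scales before it.
    """
    plan = []
    context = 0  # tokens already generated at coarser scales
    for step, side in enumerate(scales, start=1):
        tokens = side * side
        plan.append({"step": step, "side": side, "tokens": tokens, "context": context})
        context += tokens
    return plan


# Assumed schedule of token-map side lengths, ending at a 16x16 grid.
SCALES = (1, 2, 3, 4, 5, 6, 8, 10, 13, 16)

plan = next_scale_plan(SCALES)
total_tokens = sum(p["tokens"] for p in plan)

for p in plan:
    print(f"step {p['step']:2d}: {p['side']:2d}x{p['side']:<2d} map, "
          f"{p['tokens']:3d} tokens, conditioned on {p['context']:3d} earlier tokens")
print("next-scale steps:", len(plan))
print("raster-scan steps for the final grid:", raster_scan_steps(SCALES[-1]))
print("total tokens across all scales:", total_tokens)
```

Under this assumed schedule, the coarse-to-fine loop runs once per scale while the raster scan runs once per token of the final grid. The trade-off is that the coarse-to-fine plan produces more tokens in total, since every intermediate scale is also generated.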

https://github.com/FoundationVision/VAR
