

If you haven't already done so, open Blender and select one of the orthogonal view angles by pressing NUM7 (top), NUM3 (side), or NUM1 (front). At the bottom of the 3D viewport, on the left, there are some menus; click View -> Background Image. Some of the available options listed in the above link might be used to get a 3D model that you could then import into Blender.

Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs. Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D assets and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. We introduce a loss based on probability density distillation that enables the use of a 2D diffusion model as a prior for optimization of a parametric image generator. Using this loss in a DeepDream-like procedure, we optimize a randomly initialized 3D model (a Neural Radiance Field, or NeRF) via gradient descent such that its 2D renderings from random angles achieve a low loss. The resulting 3D model of the given text can be viewed from any angle, relit by arbitrary illumination, or composited into any 3D environment. Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.
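To make the optimization loop above concrete, here is a minimal sketch of one distillation-style update, assuming a generic epsilon-prediction diffusion model in PyTorch. The names `sds_step`, `render`, `eps_model`, and `alpha_bar` are placeholders rather than the actual implementation, and the toy stand-ins at the bottom exist only so the snippet runs end to end:

```python
import torch

def sds_step(params, render, eps_model, text_emb, alpha_bar, optimizer):
    """One distillation-style update of the 3D model parameters.

    A rendering from a random viewpoint is noised and scored by the frozen
    2D diffusion prior; only the (eps_hat - noise) residual is backpropagated
    into the renderer, never into the diffusion model itself.
    """
    image = render(params)                              # 2D rendering, shape (1, 3, H, W)
    t = torch.randint(0, len(alpha_bar), (1,))          # random diffusion timestep
    a = alpha_bar[t].view(1, 1, 1, 1)
    noise = torch.randn_like(image)
    noisy = a.sqrt() * image + (1 - a).sqrt() * noise   # forward-diffuse the rendering
    with torch.no_grad():
        eps_hat = eps_model(noisy, t, text_emb)         # frozen, text-conditioned noise prediction
    grad = (1 - a) * (eps_hat - noise)                  # weighted residual, w(t) * (eps_hat - eps)
    optimizer.zero_grad()
    image.backward(gradient=grad)                       # gradient flows through the renderer only
    optimizer.step()

# Toy stand-ins so the sketch runs end to end; a real setup would use a NeRF
# renderer and a pretrained text-to-image diffusion model instead.
params = torch.nn.Parameter(torch.rand(1, 3, 64, 64))   # "3D model" collapsed to one image for illustration
render = lambda p: p                                    # identity "renderer"
eps_model = lambda x, t, y: torch.randn_like(x)         # stand-in for the frozen diffusion model
alpha_bar = torch.linspace(0.999, 0.01, 1000)
optimizer = torch.optim.Adam([params], lr=1e-2)
for _ in range(100):
    sds_step(params, render, eps_model, text_emb=None,
             alpha_bar=alpha_bar, optimizer=optimizer)
```

The key design choice in this sketch is that gradients flow only through the rendering: the diffusion model is queried under `torch.no_grad()` and acts purely as a frozen prior, which is consistent with needing no 3D training data and no modification of the image model.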

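The background-image step at the start of this section can also be scripted. A minimal sketch, assuming a recent Blender (2.8+), where the old View -> Background Image menu no longer exists and background images are instead attached to the active camera; the file path is a placeholder:

```python
import bpy

# Load the reference image from disk (placeholder path) and attach it to the
# active camera as a background image; it shows when the viewport is in
# camera view (NUM0).
img = bpy.data.images.load("/path/to/reference.png")

cam = bpy.context.scene.camera.data
cam.show_background_images = True
bg = cam.background_images.new()
bg.image = img
bg.alpha = 0.5  # blend the reference with the viewport
```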