Simple fix helps AI create realistic 3D models faster and better

The new technique enables the generation of sharper, more lifelike 3D shapes—like these robotic bees—without the need to retrain or finetune a generative AI model. Credit: Massachusetts Institute of Technology.

Creating lifelike 3D models is essential for virtual reality, movies, and engineering design, but it’s often a slow and complex process.

Generative AI tools like DALL-E have made it easier to create 2D images from text prompts, but they struggle to produce realistic 3D shapes.

A team of researchers at MIT has found a simple fix that could make 3D modeling faster, more efficient, and more accessible.

The challenge of 3D shape generation

Diffusion models like DALL-E are trained by gradually adding noise to images and learning to remove it; to generate a picture, they start from pure noise and denoise it step by step until a detailed 2D image emerges.
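For intuition, here is a minimal sketch of that training-time noising step, written in PyTorch. The linear noise schedule and tensor shapes are illustrative assumptions, not the specifics of DALL-E or any particular model:

    import torch

    def add_noise(x0, t, alphas_cumprod):
        # Forward diffusion: blend a clean image x0 with Gaussian noise.
        # Larger t means more of the image is replaced by noise.
        noise = torch.randn_like(x0)
        a = alphas_cumprod[t]
        x_t = a.sqrt() * x0 + (1.0 - a).sqrt() * noise
        return x_t, noise  # training teaches the model to predict `noise` from x_t

    # Example: noise a stand-in "image" most of the way through a 1,000-step schedule
    betas = torch.linspace(1e-4, 0.02, 1000)           # a common linear schedule
    alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
    x0 = torch.rand(3, 64, 64)
    x_t, eps = add_noise(x0, t=750, alphas_cumprod=alphas_cumprod)

Generation runs this process in reverse: the trained model repeatedly estimates the noise and subtracts it.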

These models excel at producing high-quality images but fall short when it comes to generating realistic 3D shapes. That’s because there isn’t enough 3D data available to train them effectively.

To overcome this, a technique called Score Distillation Sampling (SDS) was developed in 2022. SDS uses pretrained 2D diffusion models to combine multiple 2D views into a 3D shape.

It works by starting with a rough 3D representation, rendering 2D images of it from different angles, and nudging the shape so that each rendering looks more like an image the 2D model would produce. This loop is repeated until a complete 3D model emerges. However, the results often look blurry or cartoonish.
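In code, one refinement step of SDS looks roughly like the sketch below. The helpers `render` (a differentiable renderer) and `unet_eps` (the pretrained 2D model's noise predictor) are hypothetical stand-ins, and the per-timestep weighting used in practice is omitted:

    import torch

    def sds_step(shape_params, optimizer, render, unet_eps, alphas_cumprod):
        # One iteration: render a view, noise it, and nudge the shape toward
        # whatever the 2D diffusion model considers a plausible image.
        t = int(torch.randint(20, 980, (1,)))     # random diffusion timestep
        image = render(shape_params)              # 2D view from a random camera
        noise = torch.randn_like(image)           # <-- the random noise SDS relies on
        a = alphas_cumprod[t]
        noisy = a.sqrt() * image + (1.0 - a).sqrt() * noise
        with torch.no_grad():
            eps_pred = unet_eps(noisy, t)         # the 2D model's noise estimate
        grad = eps_pred - noise                   # the SDS update direction
        optimizer.zero_grad()
        image.backward(gradient=grad)             # backpropagate through the renderer only
        optimizer.step()

Note the `torch.randn_like` call: fresh random noise is drawn on every iteration, and that choice turns out to matter.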

“This has been a bottleneck for a while,” says Artem Lukoianov, a graduate student in electrical engineering and computer science at MIT. “We knew the models could do better, but no one knew why 3D shapes turned out this way.”

The simple fix

The researchers discovered that the problem lay in a mismatch between how SDS and 2D diffusion models handle noise.

In 2D image generation, a precise formula dictates exactly which noise the model should remove at each step, and following it produces clear, sharp images. SDS, however, replaces one term in this formula with freshly drawn random noise, and that shortcut is what degrades the 3D results.
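For readers who want the math: in the standard notation of the published DDIM and DreamFusion papers (these are the well-known formulas, not this work's exact derivation), the deterministic 2D sampler updates

    x_{t-1} = \sqrt{\bar\alpha_{t-1}} \, \hat{x}_0 + \sqrt{1 - \bar\alpha_{t-1}} \, \hat{\epsilon}_\phi(x_t, t)

reusing the model's own noise estimate \hat{\epsilon}_\phi from step to step, while SDS optimizes the shape parameters \theta with

    \nabla_\theta \mathcal{L}_{SDS} = w(t) \, ( \hat{\epsilon}_\phi(x_t, t) - \epsilon ) \, \partial x / \partial \theta, \qquad \epsilon \sim \mathcal{N}(0, I)

substituting a fresh random \epsilon at every step. That substitution is the mismatch the researchers identified.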

Solving the formula exactly would have been too complicated, so the MIT team tested various approximation methods instead.

They found a way to infer the missing noise term by analyzing the current rendering of the 3D shape being built. This fix enabled SDS to create sharper, more realistic 3D models without the need for costly retraining or extra processing.
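A sketch of one way to realize that idea, continuing the earlier example: instead of drawing random noise, walk the current rendering back up the noise schedule using the model's own predictions (a DDIM-style inversion) and read off the noise this trajectory implies. The helper names, step count, and exact inversion scheme are assumptions for illustration, not the authors' code:

    import torch

    def inferred_noise(image, t, unet_eps, alphas_cumprod, n_steps=8):
        # Deterministically re-noise the clean rendering up to timestep t,
        # letting the 2D model choose the direction at each sub-step.
        x = image.detach()
        ts = torch.linspace(0, t, n_steps + 1).long()
        for t_cur, t_next in zip(ts[:-1], ts[1:]):
            eps = unet_eps(x, int(t_cur))                 # model's noise estimate
            a_cur, a_next = alphas_cumprod[t_cur], alphas_cumprod[t_next]
            x0_hat = (x - (1.0 - a_cur).sqrt() * eps) / a_cur.sqrt()
            x = a_next.sqrt() * x0_hat + (1.0 - a_next).sqrt() * eps
        a_t = alphas_cumprod[t]
        # The noise that, blended with the rendering, lands on x at timestep t
        return (x - a_t.sqrt() * image.detach()) / (1.0 - a_t).sqrt()

Swapping this in for `torch.randn_like(image)` in the earlier `sds_step` sketch is essentially the whole change; the pretrained 2D model and the rest of the pipeline stay untouched.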

Additionally, the researchers boosted quality by increasing the resolution of the rendered images and fine-tuning some model parameters.

Big results with less effort

Thanks to these improvements, the team was able to use existing diffusion models to create 3D shapes that are as sharp and realistic as those produced by other methods, but with far less effort. This breakthrough could make AI-powered 3D modeling more practical for industries like gaming, virtual reality, and design.

“Now we understand the issue and can find even better ways to solve it,” says Lukoianov. “In the long run, this will help AI become a co-pilot for designers, making it easier to create realistic 3D shapes.”

While their approach still inherits the strengths and weaknesses of the underlying AI model, the researchers see plenty of potential for future improvements. They are now exploring how their insights might also enhance other areas, like AI-powered image editing. With this fix, the road to better, faster 3D modeling just got a lot clearer.

Source: MIT.