Flux Enters the Arena
Black Forest Labs, whose team is known for its contributions to Stable Diffusion, has released Flux, a powerful new open-source text-to-image AI model. With 12 billion parameters, Flux generates images comparable in quality to Midjourney's and may surpass existing models.
Three versions of Flux are available:
- Flux Dev: Open-source with a non-commercial license, aimed at community development.
- Flux Schnell: A faster, distilled version optimized for speed, released under the Apache 2.0 license.
- Flux Pro: A closed-source, API-accessible version for professional use.
Benchmark Performance and Accessibility
Benchmark tests position Flux Dev and Pro as leaders in image synthesis, ahead of Midjourney v6.0, DALL-E 3 (HD), and SD3 Ultra on several criteria. However, the open-source models weigh in at about 23GB, so running them locally requires a powerful GPU with roughly 24GB of VRAM.
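As a rough sanity check on the 23GB figure, the weight footprint of a 12-billion-parameter model can be estimated by multiplying the parameter count by the bytes stored per parameter. The sketch below assumes 16-bit weights (2 bytes each) and ignores activations, text encoders, and runtime overhead, so it estimates only the space the weights occupy, not the full memory Flux uses during generation.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumption: weights use a 16-bit format (2 bytes per parameter).
# Activations, text encoders, and framework overhead are NOT included.

def weight_memory_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Return the number of bytes needed to store the weights."""
    return num_params * bytes_per_param


params = 12_000_000_000  # 12 billion parameters, as stated for Flux
size = weight_memory_bytes(params)

print(f"{size / 1e9:.1f} GB")     # decimal gigabytes
print(f"{size / 2**30:.1f} GiB")  # binary gibibytes
```

The result, about 24 GB (roughly 22.4 GiB), lands close to the stated 23GB, which is consistent with 16-bit weights; actual VRAM use while generating images will be higher.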
To lower this hardware barrier, Black Forest Labs partnered with Fal AI to offer cloud-based generation. Users can also test the models for free on Replicate.com; once the daily quota is used up, $1 buys 33 Flux Pro generations or 333 Flux Schnell generations.
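The quoted pricing converts into a per-image cost with simple division. The sketch below takes the 33 and 333 generations-per-dollar figures from the text and assumes cost scales linearly with the number of images; the dictionary and function names are hypothetical illustrations, not part of any Replicate API.

```python
# Per-generation cost implied by "$1 for 33 Flux Pro generations or
# 333 Flux Schnell generations" (figures taken from the article).
# Assumption: price scales linearly with the number of generations.
# Names below are hypothetical, for illustration only.

GENERATIONS_PER_DOLLAR = {
    "Flux Pro": 33,
    "Flux Schnell": 333,
}


def cost_per_generation(model: str) -> float:
    """Dollars per single image for the given model."""
    return 1.0 / GENERATIONS_PER_DOLLAR[model]


def cost_for(model: str, images: int) -> float:
    """Estimated dollars for a batch of images beyond the free quota."""
    return images * cost_per_generation(model)


for model in GENERATIONS_PER_DOLLAR:
    print(f"{model}: ${cost_per_generation(model):.4f} per image")

print(f"100 Pro images: ${cost_for('Flux Pro', 100):.2f}")
print(f"100 Schnell images: ${cost_for('Flux Schnell', 100):.2f}")
```

At these rates a Flux Pro image costs about 3 cents and a Flux Schnell image about 0.3 cents, so Schnell is roughly ten times cheaper per generation.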
Comparative Analysis: Flux vs. the Competition
The article compares Flux’s image generation capabilities with those of SD3 Medium and AuraFlow, using a set of prompts to evaluate performance across different categories.
Illustration: Conveying Horror and Atmosphere
Prompt: “Hand-drawn illustration of a giant spider chasing a woman in the jungle, extremely scary, anguish, dark and creepy scenery, horror, hints of analog photography influence, sketch.”
- Flux: Excels at atmospheric lighting, a menacing spider design, and the woman’s anguish, while keeping anatomy accurate.
- AuraFlow: Creates an eerie atmosphere with its color palette but lacks the darkness the prompt calls for, and its spider is less frightening.
- SD3 Medium: Strong sketch-like style with a detailed spider, but falls short in capturing the “analog photography” aspect.
Spatial Awareness: Faithful Representation of Elements
Prompt: “A dog standing on top of a TV showing the word ‘Decrypt’ on the screen. On the left there is a woman in a business suit holding a coin, on the right there is a robot standing on top of a first aid box. The overall scenery is surreal.”
- Flux: Accurately positions all elements with a balanced composition and a surreal atmosphere.
- SD3 Medium: Understands the elements but introduces variations, such as a cartoonish style and the dog sitting instead of standing.
- AuraFlow: Takes creative liberties, deviating from the prompt in character design, object placement, and overall style.
Realism: Capturing a Bustling Cityscape
Prompt: “A high-resolution photograph of a bustling city street at night, neon signs illuminating the scene, people walking along the sidewalks, cars driving by, a street vendor selling hot dogs, reflections of lights on wet pavement, the overall style is hyper-realistic with attention to detail and lighting, a neon sign says ‘Decrypt.’”
- Flux: Delivers a bustling city street with realistic lighting, reflections, and a clear “Decrypt” sign.
- AuraFlow: Creates a vibrant atmosphere but falls short of hyper-realism, with cartoonish street vendors, blurry neon signs, and perspective issues.
- SD3 Medium: Captures the main elements but exhibits unrealistic details, such as pedestrians walking in the roadway rather than on the sidewalks.
Head-to-Head: Flux vs. Midjourney
The article further compares Flux with Midjourney using prompts from Midjourney’s “discovery” page, focusing on realism and prompt adherence.
Realism: Posing and Anatomical Accuracy
Prompt: “A black and white photo of a woman with long straight hair, wearing an all-black outfit that accentuates her curves, sitting on the floor in front of a modern sofa. She is posing confidently for the camera, showcasing her slender legs as she crouches down…”
- Midjourney: Creates a dynamic pose with high detail but suffers from anatomical inaccuracies in the woman’s limbs.
- Flux: Delivers a more natural pose with accurate anatomy, contextual background, and detailed rendering.
Prompt Adherence: Cat Musician in a Studio Setting
Prompt: “A white cat playing the piano, wearing sunglasses and a hat, wearing purple Hawaiian style, full body shot against a grey studio background, commercial video screengrab. Credit: Chestnutmuffin.”
- Midjourney: Captures the whimsical nature but deviates from the full-body shot and studio background requirements.
- Flux: Faithfully adheres to the prompt, depicting the cat with all specified details, a full-body shot, and a grey studio background.
Conclusion: Flux’s Rise in the AI Art Landscape
Flux emerges as a strong contender in AI image generation, consistently outperforming its open-source counterparts and challenging Midjourney’s dominance. Although it works best with more specific prompts, Flux rewards users with accurate, realistic, and faithful renderings of their vision. Its Pro version offers a compelling alternative to other paid services, while the open-source versions make much of that power available to a wider audience.