This is an NF4-quantized version of Qwen-Image-Edit-2509 that runs on GPUs with 20 GB of VRAM, and on lower-VRAM cards (e.g. 16 GB) with CPU offload. Other NF4 quantizations made the mistake of blindly quantizing every layer in the transformer; this one does not. We keep selected layers at full precision to preserve output quality.
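For reference, here is a minimal sketch of how a selectively quantized transformer can be produced with diffusers' bitsandbytes integration. The skip list below is an illustrative assumption; this card does not enumerate the exact layers kept at full precision:

```python
# Sketch of selective NF4 quantization via diffusers' bitsandbytes support.
# The modules listed in llm_int8_skip_modules are HYPOTHETICAL examples.
import torch
from diffusers import BitsAndBytesConfig, QwenImageTransformer2DModel

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # NF4 weight format
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute in bf16
    llm_int8_skip_modules=["norm_out", "proj_out"],  # hypothetical: layers left unquantized
)

transformer = QwenImageTransformer2DModel.from_pretrained(
    "Qwen/Qwen-Image-Edit-2509",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
# The quantized transformer can then be passed to the pipeline:
# QwenImageEditPlusPipeline.from_pretrained(..., transformer=transformer)
```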
You can use the original Qwen-Image-Edit parameters.

This model is not yet available for inference at JustLab.ai.

Model tested: working perfectly even at 10 steps. Contact JustLab.ai for commercial support.
Performance on an RTX 4090:
- 20 steps: about 78 seconds.
- 10 steps: about 40 seconds.
Interestingly, I was under the impression that the Qwen-VL text encoder could not be quantized, which is why several projects ship it as the full 15 GB model. Here I have quantized it too, and it seems to be working fine.
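A sketch of how the text encoder can be NF4-quantized on its own, assuming it is the Qwen2.5-VL model stored in the checkpoint's `text_encoder` subfolder (settings here are illustrative):

```python
# Hedged sketch: quantizing the Qwen-VL text encoder with bitsandbytes via
# transformers. Assumes a Qwen2.5-VL model in the "text_encoder" subfolder.
import torch
from transformers import BitsAndBytesConfig, Qwen2_5_VLForConditionalGeneration

te_quant = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
text_encoder = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen-Image-Edit-2509",
    subfolder="text_encoder",
    quantization_config=te_quant,
    torch_dtype=torch.bfloat16,
)
# Pass it into the pipeline:
# QwenImageEditPlusPipeline.from_pretrained(..., text_encoder=text_encoder)
```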
Sample script (about 20 GB VRAM when run fully on GPU; less with the CPU offload used below):
```python
import os

import torch
from PIL import Image
from diffusers import QwenImageEditPlusPipeline

model_path = "ovedrive/Qwen-Image-Edit-2509-4bit"

pipeline = QwenImageEditPlusPipeline.from_pretrained(model_path, torch_dtype=torch.bfloat16)
print("pipeline loaded")
pipeline.set_progress_bar_config(disable=None)
# Do not move the pipeline to CUDA here; offloading manages device placement.
pipeline.enable_model_cpu_offload()  # with 20+ GB VRAM, replace this line with pipeline.to("cuda")

image = Image.open("./example.png").convert("RGB")
prompt = "Remove the lady head with white hair"
inputs = {
    "image": image,
    "prompt": prompt,
    "generator": torch.manual_seed(0),
    "true_cfg_scale": 4.0,
    "negative_prompt": " ",
    "num_inference_steps": 20,  # even 10 steps is enough in many cases
}

with torch.inference_mode():
    output = pipeline(**inputs)
    output_image = output.images[0]
    output_image.save("output_image_edit.png")
    print("image saved at", os.path.abspath("output_image_edit.png"))
```
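If you are still short on VRAM (e.g. under 16 GB), sequential offload is a further option; a minimal sketch, noting that it is noticeably slower and the actual savings depend on your setup:

```python
# Replace enable_model_cpu_offload() with the more aggressive variant below.
# Slower, but keeps only one submodule on the GPU at a time.
pipeline.enable_sequential_cpu_offload()
```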
The original Qwen-Image-Edit-2509 attributions are included verbatim below.
💜 Qwen Chat | 🤗 Hugging Face | 🤖 ModelScope | 📑 Tech Report | 📑 Blog
🖥️ Demo | 💬 WeChat (微信) | 🫨 Discord | Github
Introduction
This September, we are pleased to introduce Qwen-Image-Edit-2509, the monthly iteration of Qwen-Image-Edit. To experience the latest model, please visit Qwen Chat and select the "Image Editing" feature. Compared with Qwen-Image-Edit released in August, the main improvements of Qwen-Image-Edit-2509 include:
- Multi-image Editing Support: For multi-image inputs, Qwen-Image-Edit-2509 builds upon the Qwen-Image-Edit architecture and is further trained via image concatenation to enable multi-image editing. It supports various combinations such as "person + person," "person + product," and "person + scene." Optimal performance is currently achieved with 1 to 3 input images.
- Enhanced Single-image Consistency: For single-image inputs, Qwen-Image-Edit-2509 significantly improves editing consistency, specifically in the following areas:
  - Improved Person Editing Consistency: Better preservation of facial identity, supporting various portrait styles and pose transformations;
  - Improved Product Editing Consistency: Better preservation of product identity, supporting product poster editing;
  - Improved Text Editing Consistency: In addition to modifying text content, it also supports editing text fonts, colors, and materials;
- Native Support for ControlNet: Including depth maps, edge maps, keypoint maps, and more.
Quick Start
Install the latest version of diffusers:

```bash
pip install git+https://github.com/huggingface/diffusers
```
The following code snippet illustrates how to use Qwen-Image-Edit-2509:
```python
import os

import torch
from PIL import Image
from diffusers import QwenImageEditPlusPipeline

pipeline = QwenImageEditPlusPipeline.from_pretrained("Qwen/Qwen-Image-Edit-2509", torch_dtype=torch.bfloat16)
print("pipeline loaded")
pipeline.to("cuda")
pipeline.set_progress_bar_config(disable=None)

image1 = Image.open("input1.png")
image2 = Image.open("input2.png")
prompt = "The magician bear is on the left, the alchemist bear is on the right, facing each other in the central park square."
inputs = {
    "image": [image1, image2],
    "prompt": prompt,
    "generator": torch.manual_seed(0),
    "true_cfg_scale": 4.0,
    "negative_prompt": " ",
    "num_inference_steps": 40,
    "guidance_scale": 1.0,
    "num_images_per_prompt": 1,
}

with torch.inference_mode():
    output = pipeline(**inputs)
    output_image = output.images[0]
    output_image.save("output_image_edit_plus.png")
    print("image saved at", os.path.abspath("output_image_edit_plus.png"))
```
Showcase
The primary update in Qwen-Image-Edit-2509 is support for multi-image inputs.
Let’s first look at a "person + person" example:
Here is a "person + scene" example:
Below is a "person + object" example:
In fact, multi-image input also supports commonly used ControlNet keypoint maps—for example, changing a person’s pose:
Similarly, the following examples demonstrate results using three input images:
Another major update in Qwen-Image-Edit-2509 is enhanced consistency.
First, regarding person consistency, Qwen-Image-Edit-2509 shows significant improvement over Qwen-Image-Edit. Below are examples generating various portrait styles:
For instance, changing a person’s pose while maintaining excellent identity consistency:
Leveraging this improvement along with Qwen-Image’s unique text rendering capability, we find that Qwen-Image-Edit-2509 excels at creating meme images:
Of course, even with longer text, Qwen-Image-Edit-2509 can still render it while preserving the person’s identity:
Person consistency is also evident in old photo restoration. Below are two examples:
Naturally, besides real people, generating cartoon characters and cultural creations is also possible:
Second, Qwen-Image-Edit-2509 specifically enhances product consistency. We find that the model can naturally generate product posters from plain-background product images:
Third, Qwen-Image-Edit-2509 specifically enhances text consistency and supports editing font type, font color, and font material:
Moreover, the ability for precise text editing has been significantly enhanced:
It is worth noting that text editing can often be seamlessly integrated with image editing—for example, in this poster editing case:
The final update in Qwen-Image-Edit-2509 is native support for commonly used ControlNet image conditions, such as keypoint control and sketches:
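As a hedged sketch, keypoint control can be driven through the same multi-image editing API shown in Quick Start; the file names and prompt below are illustrative assumptions, not from the original card:

```python
# Hedged sketch: pass a ControlNet-style keypoint (pose) map alongside the
# person image, reusing the `pipeline` loaded in Quick Start.
import torch
from PIL import Image

person = Image.open("person.png")            # hypothetical input image
pose_map = Image.open("pose_keypoints.png")  # hypothetical keypoint condition map

with torch.inference_mode():
    output = pipeline(
        image=[person, pose_map],
        prompt="Change the person's pose to match the keypoint map",
        generator=torch.manual_seed(0),
        true_cfg_scale=4.0,
        negative_prompt=" ",
        num_inference_steps=40,
    )
output.images[0].save("pose_edit.png")
```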
License Agreement
Qwen-Image is licensed under Apache 2.0.
Citation
We kindly encourage citation of our work if you find it useful.
```bibtex
@misc{wu2025qwenimagetechnicalreport,
  title={Qwen-Image Technical Report},
  author={Chenfei Wu and Jiahao Li and Jingren Zhou and Junyang Lin and Kaiyuan Gao and Kun Yan and Sheng-ming Yin and Shuai Bai and Xiao Xu and Yilei Chen and Yuxiang Chen and Zecheng Tang and Zekai Zhang and Zhengyi Wang and An Yang and Bowen Yu and Chen Cheng and Dayiheng Liu and Deqing Li and Hang Zhang and Hao Meng and Hu Wei and Jingyuan Ni and Kai Chen and Kuan Cao and Liang Peng and Lin Qu and Minggang Wu and Peng Wang and Shuting Yu and Tingkun Wen and Wensen Feng and Xiaoxiao Xu and Yi Wang and Yichang Zhang and Yongqiang Zhu and Yujia Wu and Yuxuan Cai and Zenan Liu},
  year={2025},
  eprint={2508.02324},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.02324},
}
```