Important: This model is recommended for use with the stock Flux, Chroma and HiDream pipelines, in order to achieve identical results compared to running the stock pipeline at full precision. For SD3.5 and Bria, it is recommended to use this version made from FP16 instead. Also, the original T5XXL weights are currently required to initialize the model correctly: random initialization (via no_init_weights()) will lead to unpredictable results, even with the same seed.

For more information (including how to compress models yourself), check out https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11

After successfully compressing Cosmos-Predict2-14B-Text2Image and Chroma, I wanted to try compressing the text encoders used in diffusion pipelines. The benefits of doing so are as follows:

  1. It further reduces the VRAM footprint if the entire pipeline is loaded onto the GPU, or, in the case of enable_model_cpu_offload(), the total system RAM footprint. Some diffusion models, like SD3.5 Medium, Cosmos-Predict2-2B and Bria 3.2, are actually smaller than the text encoder they use, so compressing the text encoder yields an even larger benefit.
  2. The text encoding stage of the pipeline is very fast in my experience, so with enable_model_cpu_offload() the (almost insignificant) speed penalty during text encoding is often more than outweighed by the significantly faster loading and unloading of the compressed text encoder, since less data is shuffled between VRAM and system RAM.

Unfortunately, this was an absolute nightmare to get working. It took many failed attempts to get the compression code working, and then many more to produce a compressed model that loads successfully and produces outputs identical to the uncompressed model. For T5XXL, I was unable to save the compressed weights in a single file because of a complaint about shared tensors (most likely due to my own incompetence and inexperience, so I welcome any advice in this area). Also, the compressed weights cannot be loaded directly via text_encoder_df11 = DFloat11Model.from_pretrained(); they require specifying the bfloat16_model to load the weights into, as sketched below.
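
For illustration, here is a minimal sketch of loading the compressed weights into a standalone text encoder outside a full pipeline. It assumes the pipeline's T5XXL encoder lives in the text_encoder_2 subfolder of the shuttle-jaguar repository (the Flux-style layout used in the full example further down); adjust the repository and subfolder for your pipeline:

    import torch
    from transformers import T5EncoderModel
    from dfloat11 import DFloat11Model

    # The text encoder must be initialized from the original bf16 weights,
    # not with no_init_weights() (see the note at the top of this card).
    text_encoder = T5EncoderModel.from_pretrained(
        "shuttleai/shuttle-jaguar",
        subfolder="text_encoder_2",
        torch_dtype=torch.bfloat16
    )

    # DFloat11Model.from_pretrained() cannot be used on its own here; the
    # bfloat16_model argument tells it which bf16 module to load the
    # DF11-compressed weights into.
    DFloat11Model.from_pretrained(
        "mingyi456/t5-v1_1-xxl-DF11",
        device="cpu",
        bfloat16_model=text_encoder
    )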

At least for now, using the DF11 compression of the T5XXL weights saves ~2.5GB of VRAM/RAM. This allows pipelines like Bria to run with pipe.to("cuda") instead of pipe.enable_model_cpu_offload() on 24GB VRAM setups; otherwise, the uncompressed pipeline exceeds 24GB during the VAE decode stage. SD3.5 Medium also exceeds 24GB when generating 1440x1440 images (which it seems somewhat capable of doing). As usual, do let me know if you run into any problems.
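
As a rough sketch of that fully-on-GPU setup, the snippet below keeps an SD3.5 Medium pipeline entirely on the GPU, with the DF11-compressed T5XXL injected into text_encoder_3. The DF11 repository id is a placeholder: for SD3.5 and Bria, use the version made from FP16 mentioned at the top of this card, not this one.

    import torch
    from diffusers import StableDiffusion3Pipeline
    from dfloat11 import DFloat11Model

    pipe = StableDiffusion3Pipeline.from_pretrained(
        "stabilityai/stable-diffusion-3.5-medium",
        torch_dtype=torch.bfloat16
    )

    # Placeholder id: substitute the DF11 T5XXL made from the FP16 weights.
    DFloat11Model.from_pretrained(
        "<DF11-T5XXL-made-from-FP16>",
        device="cpu",
        bfloat16_model=pipe.text_encoder_3
    )

    # With ~2.5GB saved on the text encoder, the whole pipeline fits on a 24GB GPU,
    # so enable_model_cpu_offload() is no longer needed.
    pipe.to("cuda")

    image = pipe(
        "A futuristic cityscape at sunset, with flying cars and neon lights",
        num_inference_steps=28,
        guidance_scale=4.5,
        generator=torch.Generator("cpu").manual_seed(0)
    ).images[0]
    image.save("sd3.5-medium.png")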

How to Use

diffusers

  1. Install the DFloat11 pip package (installs the CUDA kernel automatically; requires a CUDA-compatible GPU and PyTorch installed):

    pip install dfloat11[cuda12]
    # or if you have CUDA version 11:
    # pip install dfloat11[cuda11]
    
  2. To use the DFloat11 model, run the following example code in Python:

    import torch
    from diffusers import FluxPipeline, FluxTransformer2DModel
    from transformers.modeling_utils import no_init_weights
    from dfloat11 import DFloat11Model
    with no_init_weights(): # IMPORTANT! Only the transformer should be initialized this way! The text_encoder currently requires full bf16 weights to load correctly!
      transformer = FluxTransformer2DModel.from_config(
          FluxTransformer2DModel.load_config(
              "shuttleai/shuttle-jaguar",
              subfolder="transformer"
          ),
          torch_dtype=torch.bfloat16
      ).to(torch.bfloat16)
    
    pipe = FluxPipeline.from_pretrained(
        "shuttleai/shuttle-jaguar",
        transformer=transformer,
        torch_dtype=torch.bfloat16
    )
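    # Inject the DF11-compressed weights into the bf16 transformer and the T5XXL text encoder (text_encoder_2)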
    DFloat11Model.from_pretrained('mingyi456/shuttle-jaguar-DF11', device='cpu', bfloat16_model=pipe.transformer)
    DFloat11Model.from_pretrained('mingyi456/t5-v1_1-xxl-DF11', device='cpu', bfloat16_model=pipe.text_encoder_2)
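    # Offload idle components to system RAM; the compressed text encoder shuffles between RAM and VRAM faster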
    pipe.enable_model_cpu_offload()
    prompt = "A futuristic cityscape at sunset, with flying cars, neon lights, and reflective water canals"
    image = pipe(
        prompt,
        guidance_scale=3.5,
        num_inference_steps=30,
        max_sequence_length=256,
        generator=torch.Generator("cpu").manual_seed(0)
    ).images[0]
    image.save("shuttle-jaguar.png")
    

ComfyUI

Unfortunately, this is unlikely to be supported in the near future. Due to my limited experience in this field, I do not think I can make this work unless the original developer steps in.
