TRL documentation

Model Utilities

clone_chat_template

trl.clone_chat_template

( model: PreTrainedModel tokenizer: PreTrainedTokenizer source_tokenizer_path: str resize_to_multiple_of: typing.Optional[int] = 64 )

Parameters

  • model (PreTrainedModel) — Model to update.
  • tokenizer (PreTrainedTokenizer) — Tokenizer to update.
  • source_tokenizer_path (str) — Path or identifier of the pretrained tokenizer to clone from.
  • resize_to_multiple_of (int or None, optional, defaults to 64) — The model's embedding layer is resized to the new vocabulary size. If this is not None, the new vocabulary size is rounded up to the nearest multiple of this value before resizing.

Returns

  • model (PreTrainedModel) — Updated model with resized token embeddings and the EOS token configured.
  • tokenizer (PreTrainedTokenizer) — Updated tokenizer with the chat template and special tokens applied.

Clones a chat template from a source tokenizer to the target tokenizer and updates the model accordingly.

This function:

  • Copies the chat template from a source tokenizer to the target tokenizer.
  • Adds any new tokens from the source tokenizer to the target tokenizer.
  • Sets and synchronizes the EOS token across the tokenizer and model.
  • Resizes the model’s token embeddings to match the new vocabulary size, optionally rounding it up to a multiple of a specified value.

Example:

from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import clone_chat_template

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
model, tokenizer = clone_chat_template(model, tokenizer, "Qwen/Qwen3-0.6B")
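As a follow-up sketch (continuing the example above, not part of the documented example), the effect of the clone can be checked directly: the target tokenizer now formats conversations with the source chat template, and the embedding matrix reflects the rounded-up vocabulary size.

# The target tokenizer now uses the source model's chat template.
messages = [{"role": "user", "content": "Hello!"}]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))

# The embedding matrix was resized to the new vocabulary size, rounded up to a
# multiple of 64 (the default resize_to_multiple_of).
print(model.get_input_embeddings().weight.shape[0])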

get_act_offloading_ctx_manager

trl.models.get_act_offloading_ctx_manager

( model: Module use_pin_memory: bool = True use_streams: bool = True min_offload_size: int = 1024 max_fwd_stash_size: int = 5 warn_if_no_head: bool = True ) contextlib.ContextDecorator

Parameters

  • model (nn.Module) — Model to wrap with the activation offloading context manager.
  • use_pin_memory (bool, optional, defaults to True) — Whether the offloaded tensors are placed in pinned memory on the CPU. Pinned memory allows tensors to be moved back onto the GPU more quickly, but it is a limited resource.
  • use_streams (bool, optional, defaults to True) — Whether to use streams for performance optimization, so that communication is overlapped with computation. Requires a torch build after torch-2.5.0.
  • min_offload_size (int, optional, defaults to 1024) — Minimum size, in bytes, a tensor must have to qualify for offloading. Tensors below this threshold are kept on the GPU, since moving them to CPU and back would waste bandwidth and resources.
  • max_fwd_stash_size (int, optional, defaults to 5) — Maximum size of the forward stash, or the maximum number of consecutive activations to keep alive during the forward pass. This number must be at least 1. Keeping alive more activations will potentially allow more overlap between the communication and compute streams at the cost of increasing memory usage. Keeping alive fewer activations will conserve memory, but may cause poor overlap between the streams, increasing runtime.
  • warn_if_no_head (bool, optional, defaults to True) — Whether to warn if no output head is detected in the model. If set to False, no warning is raised.
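
As an illustration of these parameters, here is a minimal construction sketch with explicit keyword arguments; the values shown are arbitrary examples, not recommendations:

from trl.models import get_act_offloading_ctx_manager

act_offloading_ctx = get_act_offloading_ctx_manager(
    model,                  # any nn.Module with an output head
    use_pin_memory=True,    # stage offloaded tensors in pinned CPU memory
    use_streams=True,       # overlap CPU<->GPU transfers with compute (torch > 2.5.0)
    min_offload_size=1024,  # skip tensors smaller than 1 KiB
    max_fwd_stash_size=5,   # keep at most 5 consecutive activations alive on GPU
)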

Returns

contextlib.ContextDecorator

Activation offloading context manager for the model.

Returns the activation offloading context manager for the model. All but the last output Linear in every step will be offloaded.

If activation offloading is enabled, we return the OffloadActivations context manager. If activation offloading is disabled, we return a NoOpManager context manager.
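
A minimal end-to-end usage sketch, assuming a CUDA device is available and using a small causal LM purely for illustration: the forward pass runs inside the context manager so that eligible activations are offloaded to CPU, and they are fetched back automatically when the backward pass needs them.

from transformers import AutoModelForCausalLM, AutoTokenizer
from trl.models import get_act_offloading_ctx_manager

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B").cuda()
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

act_offloading_ctx = get_act_offloading_ctx_manager(model)

inputs = tokenizer("Activation offloading trades GPU memory for transfer time.", return_tensors="pt").to("cuda")

# Run the forward pass inside the context manager; activations saved for the
# backward pass are offloaded to CPU and brought back when backward() needs them.
with act_offloading_ctx:
    loss = model(**inputs, labels=inputs["input_ids"]).loss

loss.backward()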
