nvidia/Llama-3_3-Nemotron-Super-49B-v1

@@ -93,6 +93,7 @@ Llama-3.3-Nemotron-Super-49B-v1 is a general purpose reasoning and chat model in
 2. We recommend setting temperature to `0.6`, and Top P to `0.95` for Reasoning ON mode
 3. We recommend using greedy decoding for Reasoning OFF mode
 4. We have provided a list of prompts to use for evaluation for each benchmark where a specific template is required
+5. In Reasoning ON mode, the model will include an empty `<think></think>` block when no reasoning was necessary; this is expected behaviour
 You can try this model out through the preview API, using this link: [Llama-3_3-Nemotron-Super-49B-v1](https://build.nvidia.com/nvidia/llama-3_3-nemotron-super-49b-v1).
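
The recommendations in this hunk can be sketched in client code as follows. This is a minimal illustration, not part of the model card: the helper names `sampling_params` and `split_reasoning` are hypothetical, and mapping greedy decoding to `temperature = 0.0` is an assumption about the serving API (some stacks expose greedy decoding through a separate flag instead).

```python
import re

# Matches the first <think>...</think> block, including an empty one.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def sampling_params(reasoning_on: bool) -> dict:
    """Return sampling settings following the recommendations above."""
    if reasoning_on:
        # Recommended for Reasoning ON mode.
        return {"temperature": 0.6, "top_p": 0.95}
    # Reasoning OFF: greedy decoding. Assumption: the serving API
    # treats temperature 0.0 as greedy.
    return {"temperature": 0.0}


def split_reasoning(text: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer).

    An empty <think></think> block is expected in Reasoning ON mode
    when no reasoning was needed, so it yields an empty reasoning string.
    """
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

For example, `split_reasoning("<think></think>Paris")` yields an empty reasoning string and the answer `"Paris"`, so downstream code does not need to special-case the empty block.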