This is an FP8-Dynamic quantized version of Phi-4-reasoning-plus.

Phi-4-reasoning-plus, developed by Microsoft Research, is a state-of-the-art language model specialized in reasoning and logic, with particular strength in math, science, and coding. Finetuned from Phi-4, it combines supervised fine-tuning with reinforcement learning to improve accuracy while remaining usable in memory-constrained and latency-sensitive environments. Responses follow a distinct two-section format: a detailed chain-of-thought reasoning section followed by a concise solution. Despite its relatively compact 14 billion parameters, the model performs strongly across a wide range of complex reasoning tasks and maintains coherence over long inputs, making it well suited to deep, multi-step reasoning applications.
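For illustration, a typical response has the following shape (a sketch; the exact delimiters come from the chat template, and the Phi-4-reasoning family marks the reasoning section with <think> tags):

<think>
Step-by-step reasoning: decompose the problem, try an approach, verify intermediate results...
</think>
The concise final solution.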

Evaluations

This quantized model achieves an accuracy recovery of 100.0% relative to the unquantized baseline, averaged over the benchmarks below.

English     Phi-4-reasoning-plus    Phi-4-reasoning-plus-FP8-Dynamic (this)
Avg.        70.77                   70.77
ARC         65.7                    65.5
Hellaswag   69.0                    69.5
MMLU        77.61                   77.3
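Accuracy recovery here is presumably the ratio of the quantized model's average score to the baseline's: 70.77 / 70.77 × 100% = 100.0%. Per-benchmark deltas stay within ±0.5 points (ARC −0.2, Hellaswag +0.5, MMLU −0.31).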

We did not check for data contamination. Evaluation was performed using the LM Evaluation Harness with limit=1000.
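A command along these lines reproduces that setup (a sketch; the exact task names and the vLLM backend are assumptions, as the card does not specify them):

lm_eval --model vllm \
    --model_args pretrained=cortecs/Phi-4-reasoning-plus-FP8-Dynamic \
    --tasks arc_challenge,hellaswag,mmlu \
    --limit 1000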

Usage

Install vLLM and start the OpenAI-compatible server:

pip install vllm
python -m vllm.entrypoints.openai.api_server \
    --model cortecs/Phi-4-reasoning-plus-FP8-Dynamic \
    --max-model-len 32768 \
    --gpu-memory-utilization 0.95

Access the model:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Phi-4-reasoning-plus-FP8-Dynamic",
        "prompt": "San Francisco is a"
    }'
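
Because Phi-4-reasoning-plus is chat-tuned, the chat endpoint, which applies the model's chat template, is usually the better entry point. A minimal sketch (the prompt and max_tokens value are illustrative; reasoning chains need a generous token budget):

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/Phi-4-reasoning-plus-FP8-Dynamic",
        "messages": [{"role": "user", "content": "How many primes are less than 100?"}],
        "max_tokens": 4096
    }'

The returned message content contains the <think> reasoning block followed by the final answer.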