🧠 Neutrino-Instruct (7B)

Neutrino-Instruct is a 7B-parameter instruction-tuned LLM developed by Fardeen NB.
It is designed for conversational AI, multi-step reasoning, and instruction-following tasks, and is fine-tuned to maintain coherent, contextual dialogue across multiple turns.

✨ Model Details

  • Model Name: Neutrino-Instruct
  • Developer: Fardeen NB
  • License: Apache-2.0
  • Language(s): English
  • Format: GGUF (optimized for llama.cpp and Ollama)
  • Base Model: Neutrino
  • Version: 2.0
  • Parameters: 7.25B
  • Architecture: Llama
  • Task: Text Generation (chat, Q&A, instruction-following)

🚀 Quick Start

Run with llama.cpp

# Clone and build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make
# Note: newer llama.cpp builds use CMake, and the CLI binary is named
# llama-cli rather than ./main; adjust the commands below accordingly.

# Run a single prompt
./main -m ./neutrino-instruct.gguf -p "Hello, who are you?"

# Run in interactive mode
./main -m ./neutrino-instruct.gguf -i -p "Let's chat."

# Control output length
./main -m ./neutrino-instruct.gguf -n 256 -p "Write a poem about stars."

# Change creativity (temperature)
./main -m ./neutrino-instruct.gguf --temp 0.7 -p "Explain quantum computing simply."

# Enable GPU acceleration (if compiled with CUDA/Metal)
./main -m ./neutrino-instruct.gguf --gpu-layers 50 -p "Summarize this article."
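
llama.cpp also ships an HTTP server with an OpenAI-compatible endpoint. A minimal sketch, assuming your build produced the server binary (./server in older builds, llama-server in newer ones); the port and context size are arbitrary choices:

# Serve the model over HTTP with a 4096-token context window
./server -m ./neutrino-instruct.gguf -c 4096 --port 8080

# Query the OpenAI-compatible chat endpoint from another terminal
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello, who are you?"}]}'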

Run with Ollama

ollama run fardeen0424/neutrino
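
If you have the GGUF file locally instead of pulling from the registry, you can import it with a Modelfile. A minimal sketch; the model name neutrino-local is arbitrary, and depending on the model's chat format you may also need a TEMPLATE line:

# Modelfile
FROM ./neutrino-instruct.gguf
PARAMETER temperature 0.7

# Create and run the local model
ollama create neutrino-local -f Modelfile
ollama run neutrino-local "Hello, who are you?"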

Run in Python (llama-cpp-python)

from llama_cpp import Llama

# Load the Neutrino-Instruct model (n_ctx sets the context window)
llm = Llama(model_path="./neutrino-instruct.gguf", n_ctx=4096)

# Run a single completion (max_tokens avoids the short default cutoff)
response = llm("Who are you?", max_tokens=128)
print(response["choices"][0]["text"])

# Stream output tokens as they are generated
for token in llm("Tell me a story about Neutrino:", max_tokens=256, stream=True):
    print(token["choices"][0]["text"], end="", flush=True)
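
llama-cpp-python also exposes a chat-style interface. A minimal sketch reusing the llm instance from above; create_chat_completion applies the model's chat template, message roles follow the OpenAI convention, and the system prompt here is illustrative:

# Multi-turn chat via the high-level chat API
messages = [
    {"role": "system", "content": "You are Neutrino, a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing simply."},
]
chat_response = llm.create_chat_completion(messages=messages, max_tokens=256)
print(chat_response["choices"][0]["message"]["content"])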

📊 System Requirements

  • CPU-only: 32–64GB RAM recommended (runs on modern laptops, though inference is slower than on a GPU).

  • GPU acceleration (a rough memory-sizing sketch follows this list):

    • 4GB VRAM → 4-bit quantized (Q4) models
    • 8GB VRAM → 5-bit/8-bit (Q5/Q8) models
    • 12GB+ VRAM → FP16 full precision
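
These tiers follow from bits per weight. A back-of-the-envelope sketch for the 7.25B-parameter model; the bits-per-weight figures are rough assumptions that include quantization overhead, and actual GGUF file sizes vary by scheme:

# Approximate memory footprint per precision level for 7.25B parameters
params = 7.25e9
bits_per_weight = {"Q4": 4.5, "Q5": 5.5, "Q8": 8.5, "FP16": 16}  # rough estimates

for name, bits in bits_per_weight.items():
    gib = params * bits / 8 / 2**30  # bits -> bytes -> GiB
    print(f"{name}: ~{gib:.1f} GiB")
# Q4 ~3.8 GiB, Q5 ~4.6 GiB, Q8 ~7.2 GiB, FP16 ~13.5 GiB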

🧩 Potential Use Cases

  • Conversational AI assistants
  • Research prototypes
  • Instruction-following agents
  • Chatbots with identity-awareness

⚠️ Out of Scope: Use in critical decision-making, legal, or medical contexts.

πŸ› οΈ Development Notes

  • Model uploaded in GGUF format for portability & performance.
  • Compatible with llama.cpp, Ollama, and llama-cpp-python.
  • Supports multiple quantization levels (Q4, Q5, Q8) for deployment on resource-constrained devices; see the re-quantization sketch below.
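
A higher-precision GGUF can be converted to a smaller variant with llama.cpp's quantize tool (named llama-quantize in newer builds). A minimal sketch; the output filename is arbitrary:

# Produce a 4-bit (Q4_K_M) variant from a higher-precision GGUF
./quantize ./neutrino-instruct.gguf ./neutrino-instruct-q4_k_m.gguf Q4_K_M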

📖 Citation

If you use Neutrino in your research or projects, please cite:

@misc{fardeennb2025neutrino,
  title = {Neutrino-Instruct: A 7B Instruction-Tuned Conversational Model},
  author = {Fardeen NB},
  year = {2025},
  howpublished = {Hugging Face},
  url = {https://huggingface.co/neuralcrew/neutrino-instruct}
}