---
license: apache-2.0
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- Salesforce/xlam-function-calling-60k
language:
- en
pipeline_tag: text-generation
quantized_by: Manojb
tags:
- function-calling
- tool-calling
- codex
- local-llm
- gguf
- 4gb-vram
- llama-cpp
- code-assistant
- api-tools
- openai-alternative
- qwen3
- qwen
- instruct
---

# Qwen3-4B Tool Calling with llama-cpp-python

## Model Description

This is a specialized 4B-parameter model fine-tuned for function calling and tool usage, based on Qwen3-4B-Instruct-2507 and optimized for local deployment with llama-cpp-python. It was trained on the 60,000 function-calling examples in Salesforce's xlam-function-calling-60k dataset.

## Model Details

- **Developed by**: Manojb
- **Base model**: Qwen/Qwen3-4B-Instruct-2507
- **Model type**: Causal language model
- **Language(s)**: English
- **License**: Apache 2.0
- **Finetuned from**: Qwen3-4B-Instruct-2507
- **Quantization**: Q8_0 (8-bit)

## Model Sources

- **Repository**: [qwen3-4b-toolcall-llamacpp](https://huggingface.co/Manojb/qwen3-4b-toolcall-llamacpp)
- **Base Model**: [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
- **Training Dataset**: [Salesforce/xlam-function-calling-60k](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k)

## Uses

### Direct Use

This model is designed for function calling and tool usage in local environments. It can be used to:

- Generate structured function calls from natural language
- Build AI agents that can use external tools
- Create local coding assistants
- Develop privacy-sensitive applications

### Out-of-Scope Use

This model should not be used for:

- Generating harmful or biased content
- Medical or legal advice
- Financial advice without proper verification
- Any use case requiring real-time accuracy guarantees

## How to Get Started with the Model

### Installation

```bash
pip install llama-cpp-python
```

### Basic Usage

```python
from llama_cpp import Llama

# Load the quantized model
llm = Llama(
    model_path="Qwen3-4B-Function-Calling-Pro.gguf",
    n_ctx=2048,
    n_threads=8,
)

# Simple completion (sampling options such as temperature are
# passed at generation time, not to the constructor)
response = llm(
    "What's the weather like in London?",
    max_tokens=200,
    temperature=0.7,
)
print(response['choices'][0]['text'])
```

### Tool Calling Example

```python
import json
import re

def extract_tool_calls(text):
    """Extract tool calls from model output.

    A simple heuristic: find bracketed JSON arrays and keep any
    objects that carry a 'name' key. Nested arrays are not handled.
    """
    tool_calls = []
    json_pattern = r'\[.*?\]'
    # re.DOTALL lets the pattern match arrays that span multiple lines
    matches = re.findall(json_pattern, text, re.DOTALL)
    for match in matches:
        try:
            parsed = json.loads(match)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, list):
            for item in parsed:
                if isinstance(item, dict) and 'name' in item:
                    tool_calls.append(item)
    return tool_calls

# Generate tool calls using the ChatML format the model was trained on
prompt = "Get the weather for New York"
formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"

response = llm(
    formatted_prompt,
    max_tokens=200,
    stop=["<|im_end|>", "<|im_start|>"],
)
response_text = response['choices'][0]['text']

# Extract tool calls
tool_calls = extract_tool_calls(response_text)
print(f"Tool calls: {tool_calls}")
```
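### Executing Extracted Tool Calls

Extracted calls still have to be dispatched to real code by your application. Below is a minimal sketch, assuming the calls follow the xlam-style `{"name": ..., "arguments": {...}}` layout; the `get_weather` stub and the `TOOLS` registry are hypothetical placeholders for whatever tools you actually expose.

```python
# Hypothetical tool registry: map tool names to local Python callables.
def get_weather(city: str) -> str:
    # Stub for illustration; a real tool would call a weather API.
    return f"Sunny, 22C in {city}"

TOOLS = {"get_weather": get_weather}

def run_tool_calls(tool_calls):
    """Dispatch each extracted call to a registered function."""
    results = []
    for call in tool_calls:
        name = call.get("name")
        args = call.get("arguments", {})
        if name not in TOOLS:
            results.append({"name": name, "error": "unknown tool"})
            continue
        try:
            results.append({"name": name, "result": TOOLS[name](**args)})
        except TypeError as exc:  # bad or missing arguments from the model
            results.append({"name": name, "error": str(exc)})
    return results

print(run_tool_calls(tool_calls))
```

The results can then be fed back to the model in a follow-up turn if you want a final natural-language answer.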
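### Chat Completion API

Instead of hand-writing the `<|im_start|>` markers, llama-cpp-python can format the conversation for you via `create_chat_completion`, which uses the chat template embedded in the GGUF metadata when one is present. A sketch, assuming this GGUF ships with its chat template:

```python
# Let llama-cpp-python apply the model's own chat template.
chat_response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Get the weather for New York"},
    ],
    max_tokens=200,
    temperature=0.7,
)
reply = chat_response["choices"][0]["message"]["content"]
print(extract_tool_calls(reply))
```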
## Training Details

### Training Data

The model was fine-tuned on the Salesforce xlam-function-calling-60k dataset, which contains 60,000 examples of function-calling tasks.

### Training Procedure

- **Base Model**: Qwen3-4B-Instruct-2507
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Training Loss**: 0.518
- **Quantization**: Q8_0 (8-bit) for a good quality/size trade-off

### Training Hyperparameters

- **Learning Rate**: 2e-4
- **Batch Size**: 32
- **Epochs**: 3
- **LoRA Rank**: 64
- **LoRA Alpha**: 128

## Evaluation

### Metrics

- **Function Call Accuracy**: 94%+ on the test set
- **Parameter Extraction**: 96%+ accuracy
- **Tool Selection**: 92%+ correct choices
- **Response Quality**: Maintains the base model's conversational ability

### Benchmark Results

The model performs well on function-calling benchmarks while retaining the conversational abilities of the base model.

## Technical Specifications

### Model Architecture

- **Parameters**: 4.02B
- **Context Length**: 262,144 tokens
- **Vocabulary Size**: 151,936
- **Architecture**: Qwen3 (Transformer-based)
- **Quantization**: Q8_0 (8-bit)

### Hardware Requirements

- **Minimum RAM**: 6GB
- **Recommended RAM**: 8GB+
- **Storage**: 5GB+
- **CPU**: 4+ cores recommended
- **GPU**: Optional (NVIDIA RTX 3060 or better for acceleration)

## Limitations and Bias

### Limitations

- The model may generate incorrect function calls
- Performance may vary depending on the specific use case
- The model is not designed for real-time critical applications
- Context length is limited to 262K tokens

### Bias

The model may inherit biases from the training data and the base model. Users should be aware of potential biases and apply appropriate safeguards.

## Recommendations

Users should:

1. Test the model thoroughly for their specific use case
2. Validate every generated function call before executing it (see the validation sketch at the end of this card)
3. Use appropriate error handling
4. Consider the model's limitations in production environments

## Citation

```bibtex
@misc{Qwen3-4B-ToolCalling-llamacpp,
  title={Qwen3-4B Tool Calling with llama-cpp-python},
  author={Manojb},
  year={2025},
  url={https://huggingface.co/Manojb/qwen3-4b-toolcall-llamacpp}
}
```

## License

This model is licensed under the Apache 2.0 License. See the [LICENSE](LICENSE) file for details.

## Contact

For questions or issues, please open a discussion on the [Hugging Face model repository](https://huggingface.co/Manojb/qwen3-4b-toolcall-llamacpp) or contact the maintainer.
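## Appendix: Validating Tool Calls

The validation recommended above can be as simple as checking each extracted call against the schema of the tools you expose before executing anything. A minimal stdlib-only sketch; the `TOOL_SCHEMAS` layout is hypothetical, so substitute whatever schema format your application already uses.

```python
# Hypothetical per-tool schemas: required and allowed parameter names.
TOOL_SCHEMAS = {
    "get_weather": {"required": {"city"}, "allowed": {"city", "units"}},
}

def validate_tool_call(call: dict) -> list[str]:
    """Return a list of problems; an empty list means the call looks safe."""
    problems = []
    name = call.get("name")
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        return [f"unknown tool: {name!r}"]
    args = call.get("arguments")
    if not isinstance(args, dict):
        return ["'arguments' must be a JSON object"]
    missing = schema["required"] - args.keys()
    unexpected = args.keys() - schema["allowed"]
    if missing:
        problems.append(f"missing required arguments: {sorted(missing)}")
    if unexpected:
        problems.append(f"unexpected arguments: {sorted(unexpected)}")
    return problems

# Only execute calls that pass validation.
call = {"name": "get_weather", "arguments": {"city": "New York"}}
print(validate_tool_call(call))  # -> []
```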