---
library_name: transformers
license: apache-2.0
datasets:
- Open-Orca/slimorca-deduped-cleaned-corrected
language:
- en
base_model:
- Felladrin/Minueza-2-96M
tags:
- llama-factory
---

# Minueza-2-96M-Instruct (Variant 07)

This model is a fine-tuned version of [Felladrin/Minueza-2-96M](https://huggingface.co/Felladrin/Minueza-2-96M) on the English [Open-Orca/slimorca-deduped-cleaned-corrected](https://huggingface.co/datasets/Open-Orca/slimorca-deduped-cleaned-corrected) dataset.

## Usage

```sh
pip install transformers==4.51.1 torch==2.6.0
```

```python
from transformers import pipeline, TextStreamer
import torch

generate_text = pipeline(
    "text-generation",
    model="Felladrin/Minueza-2-96M-Instruct-Variant-07",
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
)

messages = [
    {
        "role": "system",
        "content": "You are an AI assistant that follows instruction extremely well. Help as much as you can.",
    },
    {
        "role": "user",
        "content": "Could you explain how the Internet works?",
    },
]

generate_text(
    generate_text.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    ),
    streamer=TextStreamer(generate_text.tokenizer, skip_special_tokens=True),
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    top_k=0,
    min_p=0.1,
    repetition_penalty=1.17,
)
```

## Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5.8e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- optimizer: adamw_torch with betas=(0.9, 0.95) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3

An illustrative mapping of these settings onto `transformers.TrainingArguments` is shown at the end of this card.

## Framework versions

- Transformers 4.51.1
- PyTorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.0

## License

This model is licensed under the Apache License 2.0.
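
## Training configuration sketch

As an illustration only, the hyperparameters listed above can be expressed as a `transformers.TrainingArguments` object. The fine-tuning itself was run through LLaMA-Factory (see the `llama-factory` tag), so this is a rough equivalent rather than the exact configuration used; the `output_dir` value is a placeholder, and any setting not listed in the hyperparameters section keeps its default.

```python
from transformers import TrainingArguments

# Rough mapping of the hyperparameters listed in this card onto
# transformers.TrainingArguments. The actual run used LLaMA-Factory;
# output_dir is a placeholder and unlisted settings keep their defaults.
training_args = TrainingArguments(
    output_dir="minueza-2-96m-instruct-variant-07",  # placeholder path
    learning_rate=5.8e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=32,  # 4 x 32 = 128 total train batch size
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3,
)
```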