---
library_name: transformers
license: mit
datasets:
- SciPhi/textbooks-are-all-you-need-lite
- nampdn-ai/tiny-textbooks
- nampdn-ai/tiny-strange-textbooks
- nampdn-ai/tiny-codes
- nampdn-ai/tiny-math-textbooks
- nampdn-ai/tiny-webtext
- nampdn-ai/tiny-orca-textbooks
- nampdn-ai/tiny-lessons
- roneneldan/TinyStories
- ajibawa-2023/Children-Stories-Collection
- ajibawa-2023/General-Stories-Collection
- kerinin/hackernews-stories
- lucadiliello/wikipedia_512_pretraining
- Salesforce/wikitext
- ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions
- iamtarun/python_code_instructions_18k_alpaca
- prithivMLmods/Step-Instruction-Gx
- LinhDuong/chatdoctor-200k
- MBZUAI/LaMini-instruction
- qwedsacf/grade-school-math-instructions
- TigerResearch/tigerbot-stackexchange-qa-en-0.5m
language:
- en
---
# amusktweewt/tiny-model-700M-chat
This is a general-purpose transformer-based language model tailored for conversational tasks, story generation, and code-related interactions. It builds upon earlier models in the "tiny" series with increased model size, improved attention efficiency, and optimized training setup.
On the author's internal benchmark it scores well over twice as high as the 500M model and offers a noticeably better user experience. It knows more facts and is the first model in this series capable of performing basic arithmetic.
## Model Details
### Model Description
- **Model type:** LlamaForCausalLM
- **Hidden size:** 816
- **Layers:** 26
- **Attention heads:** 12
- **Key/Value heads:** 6
- **Intermediate size:** 9856
- **Total Parameters:** 706M
- **Tokenizer vocab size:** 32,768
- **Max sequence length:** 2048 tokens
- **Rotary Positional Encoding:** Dynamic (factor: 2.0)
- **Activation:** SiLU
- **Attention Implementation:** Flash Attention 2
- **Other optimizations:**
- Scaled dot-product attention
- Memory-efficient attention
- No bias in MLP or attention layers
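As a sanity check, the architecture numbers above can be turned into a back-of-the-envelope parameter count. This is a sketch, assuming tied input/output embeddings, SwiGLU-style MLPs (gate, up, and down projections), and grouped-query attention with 6 KV heads, and ignoring the small norm parameters:

```python
# Rough parameter count from the architecture listed above
# (assumes tied embeddings and GQA with 6 of 12 heads as K/V heads).
hidden, layers, inter, vocab = 816, 26, 9856, 32768
head_dim = hidden // 12          # 68 dims per attention head
kv_dim = 6 * head_dim            # 408: reduced K/V projection width

attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # Q, O full; K, V reduced
mlp = 3 * hidden * inter                          # gate, up, down projections
embed = vocab * hidden                            # shared with the LM head

total = layers * (attn + mlp) + embed
print(f"{total / 1e6:.0f}M")  # 706M
```

The result lands on ~706M, consistent with the stated total.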
## Training Details
### Training Configuration
- **Optimizer:** AdamW with 8-bit precision (`adamw_bnb_8bit`)
- **Learning rate:** 8e-5
- **Scheduler:** Cosine
- **Warmup ratio:** 15%
- **Weight decay:** 0.01
- **Batch size:** 6 (train), 2 (eval) per device
- **Gradient accumulation:** 2 steps
- **Mixed precision:** bfloat16
- **Epochs:** 1
- **Training tokens:** 43.6B
- **Seed:** 42
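The hyperparameters above map directly onto the standard `transformers` `TrainingArguments` API. The following is a minimal sketch of that mapping; the `output_dir` name and any options not listed in this card are assumptions:

```python
from transformers import TrainingArguments

# Sketch of the training setup described above, expressed as
# TrainingArguments; unlisted defaults are left to the library.
args = TrainingArguments(
    output_dir="tiny-model-700M-chat",   # hypothetical path
    optim="adamw_bnb_8bit",              # 8-bit AdamW
    learning_rate=8e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.15,
    weight_decay=0.01,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,
    bf16=True,                           # bfloat16 mixed precision
    num_train_epochs=1,
    seed=42,
    torch_compile=True,
    torch_compile_backend="inductor",
)
```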
### Training Hardware
- **Hardware:** assumed comparable to a single RTX 4090-class GPU
- **Torch Compile:** Enabled (inductor backend)
## Evaluation
- **Perplexity:** 2.177
- **Eval loss:** 0.7776
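The two figures above are consistent with each other: perplexity is simply the exponential of the evaluation loss.

```python
import math

# Perplexity is exp(cross-entropy loss); the card's two
# evaluation numbers agree up to rounding.
eval_loss = 0.7776
perplexity = math.exp(eval_loss)
print(round(perplexity, 3))  # 2.176
```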
On the author's custom benchmark for small models, this model achieves the highest score of any model in the series.
### Intelligence Score Comparison
| Model | Intelligence Score |
|-------------------------------------|-------------------:|
| Gemma-3-27B *(for comparison)*      | 8.30 |
| tiny-model-700M-chat                | 4.43 |
| tiny-model-141M-chat *(unreleased)* | 2.70 |
| tiny-model-500M-chat-v2             | 2.51 |
| tiny-model-500M-chat-v2-5-exp       | 2.08 |
## Usage and Applications
### Direct Use
This model is suitable for:
- Text and dialogue generation
- Educational tasks
- Code completion and explanation
- Story creation
### Not Recommended For
- High factual precision tasks
- Sensitive or critical domains without human supervision
## How to Get Started
```python
import torch
from transformers import pipeline, set_seed

# Set up the text-generation pipeline
model_name = "amusktweewt/tiny-model-700M-chat"
chatbot = pipeline(
    "text-generation",
    model=model_name,
    device=0 if torch.cuda.is_available() else -1,
)

# Ensure that bos_token and eos_token are explicitly set as strings
chatbot.tokenizer.bos_token = "<sos>"
chatbot.tokenizer.eos_token = "<|endoftext|>"

# Set seed for reproducibility (optional)
set_seed(42)

print("Chatbot is ready! Type 'exit' to end the conversation.")

# Initialize the conversation history with the system prompt
conversation_history = [
    {
        "role": "system",
        "content": (
            "You are a highly intelligent and helpful AI assistant named Tiny Chat, "
            "developed by amusktweewt. Always refer to yourself like that. Your "
            "responses should be clear, concise, and accurate. Always prioritize user "
            "needs, provide well-structured answers, and maintain a friendly yet "
            "professional tone. Adapt to the user's preferences and communication "
            "style. When needed, ask clarifying questions to ensure the best response. "
            "Be honest about limitations and avoid making assumptions. Keep "
            "interactions engaging, informative, and efficient."
        ),
    }
]

while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "exit":
        print("Exiting chat. Goodbye!")
        break

    # Append the user message to the conversation history
    conversation_history.append({"role": "user", "content": user_input})

    # Format the prompt from the history plus an empty assistant turn
    messages = conversation_history + [{"role": "assistant", "content": ""}]
    prompt = chatbot.tokenizer.apply_chat_template(messages, tokenize=False)

    # Generate text using the formatted prompt
    response = chatbot(
        prompt,
        do_sample=True,
        max_new_tokens=512,
        top_k=50,
        temperature=0.6,
        num_return_sequences=1,
        repetition_penalty=1.1,
        pad_token_id=chatbot.tokenizer.eos_token_id,
        min_new_tokens=20,
    )

    # 'generated_text' includes the prompt, so strip the prompt portion
    full_text = response[0]["generated_text"]
    bot_response = full_text[len(prompt):].strip()

    # Keep the assistant's reply in the history for multi-turn context
    conversation_history.append({"role": "assistant", "content": bot_response})

    print(f"Bot: {bot_response}")
```
## Contact
**Author:** amusktweewt
For issues or feedback, please reach out via the author's Hugging Face profile.