Update README.md

---
library_name: transformers
license: mit
datasets:
- SciPhi/textbooks-are-all-you-need-lite
- nampdn-ai/tiny-textbooks
- nampdn-ai/tiny-strange-textbooks
- nampdn-ai/tiny-codes
- nampdn-ai/tiny-math-textbooks
- nampdn-ai/tiny-webtext
- nampdn-ai/tiny-orca-textbooks
- nampdn-ai/tiny-lessons
- roneneldan/TinyStories
- ajibawa-2023/Children-Stories-Collection
- ajibawa-2023/General-Stories-Collection
- kerinin/hackernews-stories
- lucadiliello/wikipedia_512_pretraining
- Salesforce/wikitext
- ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions
- iamtarun/python_code_instructions_18k_alpaca
- prithivMLmods/Step-Instruction-Gx
- LinhDuong/chatdoctor-200k
- MBZUAI/LaMini-instruction
- qwedsacf/grade-school-math-instructions
- TigerResearch/tigerbot-stackexchange-qa-en-0.5m
language:
- en
---

# amusktweewt/tiny-model-700M-chat

This is a general-purpose transformer-based language model tailored for conversational tasks, story generation, and code-related interactions. It builds on earlier models in the "tiny" series with increased model size, improved attention efficiency, and an optimized training setup.

It scores nearly twice as high as the 500M model on the author's internal benchmark (see the Intelligence Score Comparison below), with a significantly better user experience. It knows more facts and is the first model in this series capable of performing basic arithmetic.

## Model Details

### Model Description

- **Model type:** LlamaForCausalLM
- **Hidden size:** 816
- **Layers:** 26
- **Attention heads:** 12
- **Key/Value heads:** 6
- **Intermediate size:** 9856
- **Total parameters:** 706M
- **Tokenizer vocab size:** 32,768
- **Max sequence length:** 2048 tokens
- **Rotary positional encoding:** Dynamic (factor: 2.0)
- **Activation:** SiLU
- **Attention implementation:** Flash Attention 2
- **Other optimizations:**
  - Scaled dot-product attention
  - Memory-efficient attention
  - No bias in MLP or attention layers
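
For orientation, here is a minimal sketch of how these dimensions could be expressed as a `transformers` `LlamaConfig`. This is an illustration inferred from the list above, not the model's shipped config file; in particular the `rope_scaling` dict format and the bias flags are assumptions.

```python
from transformers import LlamaConfig

# Sketch of a config matching the specs above (assumed, not the official file).
config = LlamaConfig(
    vocab_size=32768,              # tokenizer vocab size
    hidden_size=816,               # hidden size
    num_hidden_layers=26,          # layers
    num_attention_heads=12,        # attention heads
    num_key_value_heads=6,         # grouped-query attention (6 KV heads)
    intermediate_size=9856,        # MLP intermediate size
    max_position_embeddings=2048,  # max sequence length
    hidden_act="silu",             # SiLU activation
    rope_scaling={"type": "dynamic", "factor": 2.0},
    attention_bias=False,          # no bias in attention layers
    mlp_bias=False,                # no bias in MLP layers
)
```

As a sanity check, these shapes sum to roughly 706M weights with tied input/output embeddings (about 26.7M for the embedding table plus about 26.1M per layer across 26 layers), which lines up with the stated total.
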
## Training Details

### Training Configuration

- **Optimizer:** AdamW with 8-bit precision (`adamw_bnb_8bit`)
- **Learning rate:** 8e-5
- **Scheduler:** Cosine
- **Warmup ratio:** 15%
- **Weight decay:** 0.01
- **Batch size:** 6 (train), 2 (eval) per device
- **Gradient accumulation:** 2 steps
- **Mixed precision:** bfloat16
- **Epochs:** 1
- **Training tokens:** 43.6B
- **Seed:** 42

### Training Hardware

- **Hardware:** Assumed similar to a 4090-class GPU
- **Torch compile:** Enabled (inductor backend)
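
As a rough sketch, the recipe above maps onto Hugging Face `TrainingArguments` along these lines. This is illustrative only; `output_dir` and anything not listed above are placeholders rather than values from the actual run.

```python
from transformers import TrainingArguments

# Illustrative only: mirrors the hyperparameters listed above.
args = TrainingArguments(
    output_dir="tiny-model-700M-chat",  # placeholder path
    optim="adamw_bnb_8bit",             # 8-bit AdamW
    learning_rate=8e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.15,
    weight_decay=0.01,
    per_device_train_batch_size=6,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,
    bf16=True,                          # bfloat16 mixed precision
    num_train_epochs=1,
    seed=42,
    torch_compile=True,
    torch_compile_backend="inductor",
)
```
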
## Evaluation

- **Perplexity:** 2.177
- **Eval loss:** 0.7776
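
These two figures are consistent, since perplexity is the exponential of the mean cross-entropy eval loss; the quick check below (a generic sanity check, not project code) confirms it.

```python
import math

eval_loss = 0.7776
perplexity = math.exp(eval_loss)  # exp of mean cross-entropy loss
print(round(perplexity, 3))       # 2.176, matching the reported 2.177 up to rounding
```
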

On my own custom-made benchmark for small models, this model gets the highest grade of all my models.

### Intelligence Score Comparison

| Model                               | Intelligence Score |
|-------------------------------------|-------------------:|
| Gemma-3-27B *(for comparison)*      |                8.3 |
| tiny-model-700M-chat                |            4.42841 |
| tiny-model-141M-chat *(unreleased)* |                2.7 |
| tiny-model-500M-chat-v2             |            2.50909 |
| tiny-model-500M-chat-v2-5-exp       |            2.08295 |

## Usage and Applications

### Direct Use

This model is suitable for:

- Text and dialogue generation
- Educational tasks
- Code completion and explanation
- Story creation

### Not Recommended For

- High factual precision tasks
- Sensitive or critical domains without human supervision

## How to Get Started

```python
import torch
from transformers import pipeline, set_seed

# Set up the text-generation pipeline
model_name = "amusktweewt/tiny-model-700M-chat"
chatbot = pipeline(
    "text-generation",
    model=model_name,
    device=0 if torch.cuda.is_available() else -1
)

# Ensure that bos_token and eos_token are explicitly set as strings
chatbot.tokenizer.bos_token = "<sos>"
chatbot.tokenizer.eos_token = "<|endoftext|>"

# Set seed for reproducibility (optional)
set_seed(42)

print("Chatbot is ready! Type 'exit' to end the conversation.")

# Initialize the conversation history with the system prompt
conversation_history = [{
    "role": "system",
    "content": (
        "You are a highly intelligent and helpful AI assistant named Tiny Chat, "
        "developed by amusktweewt. Always refer to yourself like that. Your "
        "responses should be clear, concise, and accurate. Always prioritize user "
        "needs, provide well-structured answers, and maintain a friendly yet "
        "professional tone. Adapt to the user's preferences and communication "
        "style. When needed, ask clarifying questions to ensure the best response. "
        "Be honest about limitations and avoid making assumptions. Keep "
        "interactions engaging, informative, and efficient."
    )
}]

while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "exit":
        print("Exiting chat. Goodbye!")
        break

    # Append the user message to the conversation history
    conversation_history.append({"role": "user", "content": user_input})

    # Prepare the messages with the conversation history and an empty assistant turn
    messages = conversation_history + [{"role": "assistant", "content": ""}]

    # Use the tokenizer's apply_chat_template() method to format the prompt
    prompt = chatbot.tokenizer.apply_chat_template(messages, tokenize=False)

    # Generate text using the formatted prompt
    response = chatbot(
        prompt,
        do_sample=True,
        max_new_tokens=512,
        top_k=50,
        temperature=0.6,
        num_return_sequences=1,
        repetition_penalty=1.1,
        pad_token_id=chatbot.tokenizer.eos_token_id,
        min_new_tokens=20
    )

    # The returned 'generated_text' includes the prompt plus the generation,
    # so extract the assistant's response by removing the prompt portion.
    full_text = response[0]["generated_text"]
    bot_response = full_text[len(prompt):].strip()
    print(f"Bot: {bot_response}")

    # Keep the assistant's reply in the history so later turns can see it
    conversation_history.append({"role": "assistant", "content": bot_response})
```
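
As an aside, passing `return_full_text=False` in the pipeline call makes it return only the newly generated continuation, which avoids slicing the prompt off manually.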

## Contact

**Author:** amusktweewt

For issues or feedback, please reach out via the author's Hugging Face profile.