Commit cd56dfa · verified · committed by Zardos · 1 Parent(s): 0f6f872

Update README.md

Files changed (1):
  1. README.md (+111 -1)
README.md CHANGED
@@ -6,11 +6,121 @@ tags:
  - text-generation-inference
  - transformers
  - llama3
+ - llama
  - trl
  base_model: unsloth/llama-3-8b-Instruct
  ---

  # Uploaded model

- - **Developed by:** Zardos
+ - **Finetuned by:** Zardos
  - **License:** apache-2.0
+
+
+ ## How to use
+
+ This repository contains two versions of Meta-Llama-3-8B-Instruct, for use with transformers and with the original `llama3` codebase.
+
+ ### Use with transformers
+
+ You can run conversational inference using the Transformers pipeline abstraction, or by leveraging the Auto classes with the `generate()` function. Let's see examples of both.
+
+ #### Transformers pipeline
+
+ ```python
+ import transformers
+ import torch
+
+ model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
+
+ pipeline = transformers.pipeline(
+     "text-generation",
+     model=model_id,
+     model_kwargs={"torch_dtype": torch.bfloat16},
+     device_map="auto",
+ )
+
+ messages = [
+     {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+     {"role": "user", "content": "Who are you?"},
+ ]
+
+ prompt = pipeline.tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ terminators = [
+     pipeline.tokenizer.eos_token_id,
+     pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
+ ]
+
+ outputs = pipeline(
+     prompt,
+     max_new_tokens=256,
+     eos_token_id=terminators,
+     do_sample=True,
+     temperature=0.6,
+     top_p=0.9,
+ )
+ print(outputs[0]["generated_text"][len(prompt):])
+ ```
+
+ #### Transformers AutoModelForCausalLM
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
+
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+ )
+
+ messages = [
+     {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
+     {"role": "user", "content": "Who are you?"},
+ ]
+
+ input_ids = tokenizer.apply_chat_template(
+     messages,
+     add_generation_prompt=True,
+     return_tensors="pt"
+ ).to(model.device)
+
+ terminators = [
+     tokenizer.eos_token_id,
+     tokenizer.convert_tokens_to_ids("<|eot_id|>")
+ ]
+
+ outputs = model.generate(
+     input_ids,
+     max_new_tokens=256,
+     eos_token_id=terminators,
+     do_sample=True,
+     temperature=0.6,
+     top_p=0.9,
+ )
+ response = outputs[0][input_ids.shape[-1]:]
+ print(tokenizer.decode(response, skip_special_tokens=True))
+ ```
+
+
+ ### Use with `llama3`
+
+ Please follow the instructions in the [repository](https://github.com/meta-llama/llama3).
+
+ To download the original checkpoints, see the example command below leveraging `huggingface-cli`:
+
+ ```
+ huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --include "original/*" --local-dir Meta-Llama-3-8B-Instruct
+ ```
+
+ For Hugging Face support, we recommend using transformers or TGI, but a similar command works.
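As a sketch of that similar command for the Hugging Face format weights, the same download can simply skip the `original/*` folder; the `--exclude` pattern here is an assumption, shown for illustration:

```
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --exclude "original/*" --local-dir Meta-Llama-3-8B-Instruct
```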
+
+ ## Hardware and Software