Text Generation
Generate text based on a prompt.
If you are interested in a Chat Completion task, which generates a response based on a list of messages, check out the chat-completion task.
For more details about the text-generation task, check out its dedicated page! You will find examples and related materials.
Recommended models
- google/gemma-2-2b-it: A text-generation model trained to follow instructions.
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B: Smaller variant of one of the most powerful models.
- meta-llama/Meta-Llama-3.1-8B-Instruct: Very powerful text generation model trained to follow instructions.
- microsoft/phi-4: Powerful text generation model by Microsoft.
- Qwen/Qwen2.5-7B-Instruct-1M: Strong conversational model that supports very long instructions.
- Qwen/Qwen2.5-Coder-32B-Instruct: Text generation model used to write code.
- deepseek-ai/DeepSeek-R1: Powerful reasoning-based open large language model.
Explore all available models and find the one that suits you best here.
Using the API
import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="featherless-ai",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="mistralai/Magistral-Small-2506",
    messages=[
        {
            "role": "user",
            "content": "Can you please let us know more details about your ",
        }
    ],
)

print(completion.choices[0].message)
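If you want the raw text-generation task instead (a single prompt continued as-is, with no chat template applied), huggingface_hub also exposes InferenceClient.text_generation. A minimal sketch, assuming the same HF_TOKEN environment variable; the prompt, model and parameter values are illustrative:

import os

from huggingface_hub import InferenceClient

client = InferenceClient(api_key=os.environ["HF_TOKEN"])

# Plain text generation: the prompt is continued directly.
# Model and sampling values here are illustrative, not recommendations.
output = client.text_generation(
    "The answer to the ultimate question of life is",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    max_new_tokens=50,
    temperature=0.7,
)
print(output)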
API specification
Request
Headers
authorization
string
Authentication header in the form 'Bearer hf_****', where hf_**** is a personal user access token with “Inference Providers” permission. You can generate one from your settings page.
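For direct HTTP calls, the header can be assembled from the same HF_TOKEN environment variable used in the example above (a minimal sketch):

import os

# Bearer token taken from the HF_TOKEN environment variable.
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}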
Payload
inputs*
string
parameters
object
adapter_id
string
LoRA adapter id.
best_of
integer
Generate best_of sequences and return the one with the highest token logprobs.
decoder_input_details
boolean
Whether to return decoder input token logprobs and ids.
details
boolean
Whether to return generation details.
do_sample
boolean
Activate logits sampling.
frequency_penalty
number
The parameter for frequency penalty. 1.0 means no penalty. Penalizes new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
grammar
unknown
One of the following:
(#1)
object
type*
enum
Possible values: json.
value*
unknown
A string that represents a JSON Schema. JSON Schema is a declarative language that allows you to annotate JSON documents with types and descriptions.
(#2)
object
type*
enum
Possible values: regex.
value*
string
(#3)
object
type*
enum
Possible values: json_schema.
value*
object
name
string
Optional name identifier for the schema
schema*
unknown
The actual JSON schema definition
max_new_tokens
integer
Maximum number of tokens to generate.
repetition_penalty
number
The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.
return_full_text
boolean
Whether to prepend the prompt to the generated text.
seed
integer
Random sampling seed.
stop
string[]
Stop generating tokens if a member of stop is generated.
temperature
number
The value used to modulate the logits distribution.
top_k
integer
The number of highest probability vocabulary tokens to keep for top-k-filtering.
top_n_tokens
integer
The number of highest probability vocabulary tokens to keep for top-n-filtering.
top_p
number
Top-p value for nucleus sampling.
truncate
integer
Truncate input tokens to the given size.
typical_p
number
Typical Decoding mass. See Typical Decoding for Natural Language Generation for more information.
watermark
boolean
Watermarking with A Watermark for Large Language Models.
stream
boolean
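As a sketch of how these payload fields fit together, a request body might look like the following; only inputs is required, and every value shown is illustrative rather than a recommendation:

import json

# Example text-generation request body: "parameters" holds the optional knobs
# documented above, while "stream" sits at the top level next to "inputs".
payload = {
    "inputs": "The quick brown fox",
    "parameters": {
        "max_new_tokens": 100,
        "temperature": 0.7,
        "top_p": 0.9,
        "repetition_penalty": 1.1,
        "stop": ["\n\n"],
        "return_full_text": False,
        "details": True,
    },
    "stream": False,
}

print(json.dumps(payload, indent=2))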
Response
Output type depends on the stream input parameter. If stream is false (default), the response will be a JSON object with the following fields:
Body
details
object
best_of_sequences
object[]
finish_reason
enum
Possible values: length, eos_token, stop_sequence.
generated_text
string
generated_tokens
integer
prefill
object[]
id
integer
logprob
number
text
string
seed
integer
tokens
object[]
id
integer
logprob
number
special
boolean
text
string
top_tokens
array[]
id
integer
logprob
number
special
boolean
text
string
finish_reason
enum
Possible values: length, eos_token, stop_sequence.
generated_tokens
integer
prefill
object[]
id
integer
logprob
number
text
string
seed
integer
tokens
object[]
id
integer
logprob
number
special
boolean
text
string
top_tokens
array[]
id
integer
logprob
number
special
boolean
text
string
generated_text
string
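As a sketch of reading these fields from Python, huggingface_hub returns them as attributes when details=True is passed (prompt, model and parameter values are illustrative):

import os

from huggingface_hub import InferenceClient

client = InferenceClient(api_key=os.environ["HF_TOKEN"])

# details=True asks the server to return generation details along with the text.
result = client.text_generation(
    "The quick brown fox",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    max_new_tokens=20,
    details=True,
)

print(result.generated_text)
print(result.details.finish_reason)      # "length", "eos_token" or "stop_sequence"
print(result.details.generated_tokens)   # number of tokens generated
print(result.details.tokens[0].logprob)  # per-token log probability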
If stream is true, generated tokens are returned as a stream, using Server-Sent Events (SSE). For more information about streaming, check out this guide.
Body
details
object
finish_reason
enum
Possible values: length, eos_token, stop_sequence.
generated_tokens
integer
input_length
integer
seed
integer
generated_text
string
index
integer
token
object
id
integer
logprob
number
special
boolean
text
string
top_tokens
object[]
id
integer
logprob
number
special
boolean
text
string
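A minimal streaming sketch with huggingface_hub: stream=True yields one event per generated token, and the final event also carries the details object described above (prompt, model and parameter values are illustrative):

import os

from huggingface_hub import InferenceClient

client = InferenceClient(api_key=os.environ["HF_TOKEN"])

# stream=True returns an iterator of per-token events instead of a single object.
for event in client.text_generation(
    "The quick brown fox",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    max_new_tokens=20,
    stream=True,
    details=True,
):
    print(event.token.text, end="", flush=True)
    if event.details is not None:  # only set on the final event
        print()
        print("finish_reason:", event.details.finish_reason)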
Chat Completion