Text Generation
Generate text based on a prompt.
If you are interested in a Chat Completion task, which generates a response based on a list of messages, check out the chat-completion task.
For more details about the text-generation task, check out its dedicated page! You will find examples and related materials.
Recommended models
- google/gemma-2-2b-it: A text-generation model trained to follow instructions.
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B: Smaller variant of one of the most powerful models.
- meta-llama/Meta-Llama-3.1-8B-Instruct: Very powerful text generation model trained to follow instructions.
- microsoft/phi-4: Powerful text generation model by Microsoft.
- Qwen/Qwen2.5-7B-Instruct-1M: Strong conversational model that supports very long instructions.
- Qwen/Qwen2.5-Coder-32B-Instruct: Text generation model used to write code.
- deepseek-ai/DeepSeek-R1: Powerful reasoning-based open large language model.
Explore all available models and find the one that suits you best here.
Using the API
import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="featherless-ai",
    api_key=os.environ["HF_TOKEN"],
)

completion = client.chat.completions.create(
    model="mistralai/Magistral-Small-2506",
    messages=[
        {
            "role": "user",
            "content": "Can you please let us know more details about your ",
        }
    ],
)

print(completion.choices[0].message)
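If you want the raw text-generation task instead (a single prompt continued as-is, with no chat template applied), huggingface_hub also exposes InferenceClient.text_generation. A minimal sketch, assuming the same HF_TOKEN environment variable; the prompt, model and parameter values are illustrative:

import os

from huggingface_hub import InferenceClient

client = InferenceClient(api_key=os.environ["HF_TOKEN"])

# Plain text generation: the prompt is continued directly.
# Model and sampling values here are illustrative, not recommendations.
output = client.text_generation(
    "The answer to the ultimate question of life is",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    max_new_tokens=50,
    temperature=0.7,
)
print(output)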
API specification
Request
Headers
authorization
string
Authentication header in the form 'Bearer hf_****', where hf_**** is a personal user access token with “Inference Providers” permission. You can generate one from your settings page.
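For direct HTTP calls, the header can be assembled from the same HF_TOKEN environment variable used in the example above (a minimal sketch):

import os

# Bearer token taken from the HF_TOKEN environment variable.
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}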
Payload
inputs*
string
parameters
object
adapter_id
string
LoRA adapter id.
best_of
integer
Generate best_of sequences and return the one with the highest token logprobs.
decoder_input_details
boolean
Whether to return decoder input token logprobs and ids.
details
boolean
Whether to return generation details.
do_sample
boolean
Activate logits sampling.
frequency_penalty
number
The parameter for frequency penalty. 1.0 means no penalty. Penalizes new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.
grammar
unknown
One of the following:
(#1)
object
type*
enum
Possible values: json.
value*
unknown
A string that represents a JSON Schema. JSON Schema is a declarative language that allows you to annotate JSON documents with types and descriptions.
(#2)
object
type*
enum
Possible values: regex.
value*
string
(#3)
object
type*
enum
Possible values: json_schema.
value*
object
name
string
Optional name identifier for the schema
schema*
unknown
The actual JSON schema definition
max_new_tokens
integer
Maximum number of tokens to generate.
repetition_penalty
number
The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.
return_full_text
boolean
Whether to prepend the prompt to the generated text.
seed
integer
Random sampling seed.
stop
string[]
Stop generating tokens if a member of stop is generated.
temperature
number
The value used to modulate the logits distribution.
top_k
integer
The number of highest probability vocabulary tokens to keep for top-k-filtering.
top_n_tokens
integer
The number of highest probability vocabulary tokens to keep for top-n-filtering.
top_p
number
Top-p value for nucleus sampling.
truncate
integer
Truncate input tokens to the given size.
typical_p
number
Typical Decoding mass. See Typical Decoding for Natural Language Generation for more information.
watermark
boolean
Watermarking with A Watermark for Large Language Models.
stream
boolean
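As a sketch of how these payload fields fit together, a request body might look like the following; only inputs is required, and every value shown is illustrative rather than a recommendation:

import json

# Example text-generation request body: "parameters" holds the optional knobs
# documented above, while "stream" sits at the top level next to "inputs".
payload = {
    "inputs": "The quick brown fox",
    "parameters": {
        "max_new_tokens": 100,
        "temperature": 0.7,
        "top_p": 0.9,
        "repetition_penalty": 1.1,
        "stop": ["\n\n"],
        "return_full_text": False,
        "details": True,
    },
    "stream": False,
}

print(json.dumps(payload, indent=2))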
Response
Output type depends on the stream input parameter. If stream is false (default), the response will be a JSON object with the following fields:
Body
details
object
best_of_sequences
object[]
finish_reason
enum
Possible values: length, eos_token, stop_sequence.
generated_text
string
generated_tokens
integer
prefill
object[]
id
integer
logprob
number
text
string
seed
integer
tokens
object[]
id
integer
logprob
number
special
boolean
text
string
top_tokens
array[]
id
integer
logprob
number
special
boolean
text
string
finish_reason
enum
Possible values: length, eos_token, stop_sequence.
generated_tokens
integer
prefill
object[]
id
integer
logprob
number
text
string
seed
integer
tokens
object[]
id
integer
logprob
number
special
boolean
text
string
top_tokens
array[]
id
integer
logprob
number
special
boolean
text
string
generated_text
string
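As a sketch of reading these fields from Python, huggingface_hub returns them as attributes when details=True is passed (prompt, model and parameter values are illustrative):

import os

from huggingface_hub import InferenceClient

client = InferenceClient(api_key=os.environ["HF_TOKEN"])

# details=True asks the server to return generation details along with the text.
result = client.text_generation(
    "The quick brown fox",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    max_new_tokens=20,
    details=True,
)

print(result.generated_text)
print(result.details.finish_reason)      # "length", "eos_token" or "stop_sequence"
print(result.details.generated_tokens)   # number of tokens generated
print(result.details.tokens[0].logprob)  # per-token log probability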
If stream is true, generated tokens are returned as a stream, using Server-Sent Events (SSE). For more information about streaming, check out this guide.
Body
details
object
finish_reason
enum
Possible values: length, eos_token, stop_sequence.
generated_tokens
integer
input_length
integer
seed
integer
generated_text
string
index
integer
token
object
id
integer
logprob
number
special
boolean
text
string
top_tokens
object[]
id
integer
logprob
number
special
boolean
text
string
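A minimal streaming sketch with huggingface_hub: stream=True yields one event per generated token, and the final event also carries the details object described above (prompt, model and parameter values are illustrative):

import os

from huggingface_hub import InferenceClient

client = InferenceClient(api_key=os.environ["HF_TOKEN"])

# stream=True returns an iterator of per-token events instead of a single object.
for event in client.text_generation(
    "The quick brown fox",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    max_new_tokens=20,
    stream=True,
    details=True,
):
    print(event.token.text, end="", flush=True)
    if event.details is not None:  # only set on the final event
        print()
        print("finish_reason:", event.details.finish_reason)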
Chat Completion