
Text Generation

Generate text based on a prompt.

If you are interested in a Chat Completion task, which generates a response based on a list of messages, check out the chat-completion task.

For more details about the text-generation task, check out its dedicated page! You will find examples and related materials.

Recommended models

Explore all available models and find the one that suits you best here.

Using the API


The following example uses the huggingface_hub Python client with the Featherless AI provider:

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="featherless-ai",
    api_key=os.environ["HF_TOKEN"],
)

# messages must be a list of chat messages, not a raw string.
completion = client.chat.completions.create(
    model="mistralai/Magistral-Small-2506",
    messages=[
        {"role": "user", "content": "Can you please let us know more details about your "}
    ],
)

print(completion.choices[0].message)
```
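For raw text generation (prompt in, continuation out) rather than the chat-style call above, the same client also exposes a `text_generation` helper. A minimal sketch, assuming the selected provider also serves the raw text-generation task for this model:

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="featherless-ai",
    api_key=os.environ["HF_TOKEN"],
)

# Returns the generated continuation as a plain string.
output = client.text_generation(
    "Can you please let us know more details about your ",
    model="mistralai/Magistral-Small-2506",
    max_new_tokens=100,
)

print(output)
```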

API specification

Request

Headers

authorization (string): Authentication header in the form 'Bearer hf_****', where hf_**** is a personal user access token with “Inference Providers” permission. You can generate one from your settings page.

Payload

inputs* (string)

parameters (object)

        adapter_id (string): LoRA adapter ID.

        best_of (integer): Generate best_of sequences and return the one with the highest token logprobs.

        decoder_input_details (boolean): Whether to return decoder input token logprobs and ids.

        details (boolean): Whether to return generation details.

        do_sample (boolean): Activate logits sampling.

        frequency_penalty (number): The parameter for frequency penalty. 1.0 means no penalty. Penalizes new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim.

        grammar (unknown): Constrains generation to a given format (see the request sketch after this parameter list). One of the following:

                (#1) object
                        type* (enum): Possible values: json.
                        value* (unknown): A string that represents a JSON Schema. JSON Schema is a declarative language that allows you to annotate JSON documents with types and descriptions.
                (#2) object
                        type* (enum): Possible values: regex.
                        value* (string)
                (#3) object
                        type* (enum): Possible values: json_schema.
                        value* (object)
                                name (string): Optional name identifier for the schema.
                                schema* (unknown): The actual JSON schema definition.

        max_new_tokens (integer): Maximum number of tokens to generate.

        repetition_penalty (number): The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.

        return_full_text (boolean): Whether to prepend the prompt to the generated text.

        seed (integer): Random sampling seed.

        stop (string[]): Stop generating tokens if a member of stop is generated.

        temperature (number): The value used to modulate the logits distribution.

        top_k (integer): The number of highest probability vocabulary tokens to keep for top-k filtering.

        top_n_tokens (integer): The number of highest probability vocabulary tokens to return at each generation step (populates top_tokens in the response).

        top_p (number): Top-p value for nucleus sampling.

        truncate (integer): Truncate input tokens to the given size.

        typical_p (number): Typical decoding mass. See Typical Decoding for Natural Language Generation for more information.

        watermark (boolean): Watermarking with A Watermark for Large Language Models.

stream (boolean)
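Putting the pieces together, here is a sketch of a raw HTTP request, including a grammar-constrained payload (variant #3 above). The endpoint URL follows the classic Serverless Inference API pattern and is an assumption, as is the example schema; check your provider's documentation for the exact path:

```python
import os
import requests

# Assumed endpoint: classic Serverless Inference API URL pattern.
API_URL = "https://api-inference.huggingface.co/models/mistralai/Magistral-Small-2506"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

payload = {
    "inputs": "Can you please let us know more details about your ",
    "parameters": {
        "max_new_tokens": 100,
        "temperature": 0.7,
        "return_full_text": False,
        # Grammar-constrained decoding: force JSON output matching a schema.
        # The schema itself is purely illustrative.
        "grammar": {
            "type": "json_schema",
            "value": {
                "name": "contact_details",
                "schema": {
                    "type": "object",
                    "properties": {"email": {"type": "string"}},
                    "required": ["email"],
                },
            },
        },
    },
    "stream": False,
}

response = requests.post(API_URL, headers=headers, json=payload)
response.raise_for_status()

data = response.json()
# The spec below describes a JSON object with "generated_text";
# some deployments wrap it in a single-element list.
if isinstance(data, list):
    data = data[0]
print(data["generated_text"])
```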

Response

Output type depends on the stream input parameter. If stream is false (default), the response will be a JSON object with the following fields:

Body

details (object)

        best_of_sequences (object[])

                finish_reason (enum): Possible values: length, eos_token, stop_sequence.

                generated_text (string)

                generated_tokens (integer)

                prefill (object[]): each with id (integer), logprob (number), text (string).

                seed (integer)

                tokens (object[]): each with id (integer), logprob (number), special (boolean), text (string).

                top_tokens (array[]): each with id (integer), logprob (number), special (boolean), text (string).

        finish_reason (enum): Possible values: length, eos_token, stop_sequence.

        generated_tokens (integer)

        prefill (object[]): each with id (integer), logprob (number), text (string).

        seed (integer)

        tokens (object[]): each with id (integer), logprob (number), special (boolean), text (string).

        top_tokens (array[]): each with id (integer), logprob (number), special (boolean), text (string).

generated_text (string)
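A short sketch of reading these fields through the huggingface_hub client, which returns them as a typed object when details=True is passed (assuming the provider supports the details flag):

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="featherless-ai",
    api_key=os.environ["HF_TOKEN"],
)

# details=True asks the server for the full generation details described above.
out = client.text_generation(
    "Can you please let us know more details about your ",
    model="mistralai/Magistral-Small-2506",
    max_new_tokens=20,
    details=True,
)

print(out.generated_text)
print(out.details.finish_reason, out.details.generated_tokens)
for tok in out.details.tokens:
    print(tok.id, repr(tok.text), tok.logprob, tok.special)
```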

If stream is true, generated tokens are returned as a stream, using Server-Sent Events (SSE). For more information about streaming, check out this guide.

Body

details (object)

        finish_reason (enum): Possible values: length, eos_token, stop_sequence.

        generated_tokens (integer)

        input_length (integer)

        seed (integer)

generated_text (string)

index (integer)

token (object)

        id (integer)

        logprob (number)

        special (boolean)

        text (string)

top_tokens (object[]): each with id (integer), logprob (number), special (boolean), text (string).
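A sketch of consuming the stream directly over HTTP. The endpoint URL is the same assumption as in the request sketch above; each SSE line starts with a data: prefix followed by a JSON event carrying the fields listed here:

```python
import json
import os
import requests

API_URL = "https://api-inference.huggingface.co/models/mistralai/Magistral-Small-2506"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

payload = {
    "inputs": "Can you please let us know more details about your ",
    "parameters": {"max_new_tokens": 50},
    "stream": True,
}

with requests.post(API_URL, headers=headers, json=payload, stream=True) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line or not line.startswith(b"data:"):
            continue
        event = json.loads(line[len(b"data:"):])
        # Each event carries one token; "details" is only set on the final event.
        print(event["token"]["text"], end="", flush=True)
        if event.get("details"):
            print("\nfinish_reason:", event["details"]["finish_reason"])
```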
