Model Summary
Alisia-7B-it is a 7-billion-parameter instruction-tuned language model. It is designed for general-purpose conversational AI and assistant-style tasks, demonstrating strong performance in factual knowledge, commonsense reasoning, and mathematical problem-solving.
Evaluation
Custom Benchmark Results
The model was evaluated on a manually curated suite of 25 questions across 5 categories. The results for the base model are summarized below:
| Benchmark Category | Score | Notes |
|---|---|---|
| Knowledge & Comprehension | 100% | Excellent factual recall. |
| Commonsense Reasoning | 100% | Strong understanding of everyday scenarios. |
| Mathematical Reasoning | 100% | Proficient in arithmetic and algebra. |
| Linguistic Semantics | 80% | Struggles with complex pronoun resolution. |
| Logical & Creative Reasoning | 60% | Primary weakness. Fails on abstract logic and spatial puzzles. |
| **Overall Score** | **88%** | A capable generalist with a clear performance profile. |
Further evaluation on standard benchmarks (e.g., MMLU, HellaSwag, ARC-Challenge) is recommended to confirm these findings at a larger scale.
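As one possible route, such an evaluation could be run with EleutherAI's lm-evaluation-harness. The sketch below is an assumption based on recent 0.4.x releases; the task names and the `simple_evaluate` signature should be verified against the installed version:

```python
# Hypothetical evaluation sketch using lm-evaluation-harness (pip install lm-eval).
# Task names and arguments are assumptions; verify against your installed version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                    # Hugging Face backend
    model_args="pretrained=Gems234/Alisia-7B-it",  # model under test
    tasks=["mmlu", "hellaswag", "arc_challenge"],
)
print(results["results"])
```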
Training Details
Training Data
The model was fine-tuned on a mixture of publicly available instruction datasets, including but not limited to cleaned versions of the Alpaca dataset. This data primarily consists of instruction-response pairs designed to teach the model to follow user commands.
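For illustration, records in Alpaca-style datasets typically consist of an instruction, an optional input, and a target output. The field names below follow the common Alpaca schema and are shown as a representative, hypothetical example rather than an actual record from the training mixture:

```python
# A representative (hypothetical) Alpaca-style training record.
example = {
    "instruction": "Summarize the following paragraph in one sentence.",
    "input": "The Eiffel Tower, completed in 1889, was initially criticized by many artists.",
    "output": "Despite early criticism, the Eiffel Tower became a beloved Paris landmark.",
}
```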
Uses
Direct Use
This model is intended for direct use in the following applications:
- Conversational AI: As a chatbot or interactive assistant.
- Question Answering: Providing factual information and explanations.
- Text Generation: Creative writing, summarization, and ideation.
- Educational Tool: Assisting with homework, particularly in mathematics and general knowledge subjects.
Out-of-Scope Use
The model should not be used for:
- Critical decision-making in legal, medical, or financial contexts.
- Generating highly technical or scientific content without human verification.
- Tasks requiring flawless logical or spatial reasoning (see Limitations).
How to Get Started with the Model
Use the code below to get started with the model. Ensure you have the required libraries installed (e.g., via `pip install "transformers>=4.56.2" torch accelerate`):
Requirements
- transformers>=4.56.2
- torch
- accelerate (required for `device_map="auto"`)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Gems234/Alisia-7B-it"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Create a prompt
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a response
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
You can also use the chat template format:
```python
# Chat template
messages = [
    {"role": "user", "content": "Write me a poem about Machine Learning."},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Generate a response
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
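Note that decoding the full sequence includes the prompt as well as the reply. A minimal follow-up sketch (reusing `inputs` and `outputs` from the snippet above) that prints only the newly generated tokens:

```python
# Slice off the prompt tokens so only the model's reply is decoded.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
```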
Instruction format
To take full advantage of the model's performance, we recommend using the Alpaca format:
```
### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}
```
For example:

```python
import torch

# Build the prompt from the Alpaca template above
alpaca_prompt = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

instruction = "You are Alisia. Be concise and helpful."
input_text = "Where is the Eiffel Tower?"
output_text = ""  # left empty so the model completes the response

prompt = alpaca_prompt.format(instruction, input_text, output_text)

# Tokenize
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
Limitations
A manual evaluation on a custom benchmark suite revealed the following performance profile for Alisia-7B-it:
Identified Strengths
- Knowledge & Comprehension (MMLU-like): Achieved a perfect score (100%), demonstrating excellent recall of factual information across history, science, and literature.
- Commonsense Reasoning (HellaSwag-like): Achieved a perfect score (100%), showing a robust understanding of everyday physical and social causality.
- Mathematical Reasoning (GSM8K-like): Achieved a perfect score (100%), excelling at basic arithmetic, algebra, and problem-solving.
Identified Weaknesses
- Logical & Creative Reasoning (ARC-like): Achieved a score of 60%. The model struggles with formal logic puzzles (e.g., syllogisms) and non-intuitive spatial reasoning problems. It is not recommended for applications requiring infallible abstract reasoning.
- Linguistic Semantics (Winogrande-like): Achieved a score of 80%. While generally very good, the model can occasionally fail to resolve complex pronoun coreference ambiguities, potentially leading to minor misunderstandings in narrative text or dialogue.
Overall Benchmark Score: 88% (22/25 correct). The model is a robust generalist with a specific, predictable profile of strengths and weaknesses.
Bias, Risks, and Recommendations
Known Biases
- Identity Bias: Due to the nature of its training data (which includes datasets like alpaca-cleaned), the model may occasionally incorrectly identify itself as "ChatGPT" or another AI system. This is a known artifact and does not reflect its actual origin or capabilities.
- As with all large language models, Alisia-7B-it may reflect and amplify social biases present in its training data. Outputs should not be assumed to be free from bias.
Recommendations
Users should:
- Be aware of the model's limitations in logical and spatial reasoning.
- Critically evaluate its outputs, especially for critical applications.
- Use a safety classifier or content filter in production environments.
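As one illustration of the last point, here is a minimal sketch of gating generations behind a content check before returning them. The `is_safe` keyword check is a hypothetical placeholder; in production it should be replaced with a real moderation model or classifier:

```python
# Hypothetical safety gate around generation; replace the placeholder
# keyword check with a real safety classifier in production.
BLOCKLIST = {"example_banned_term"}  # placeholder terms, not a real blocklist

def is_safe(text: str) -> bool:
    return not any(term in text.lower() for term in BLOCKLIST)

def safe_generate(prompt: str, max_new_tokens: int = 128) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return text if is_safe(text) else "[response withheld by content filter]"
```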
Model tree
Base model: Gems234/Alisia-7B-Instruct-V1.0-private