Model Summary

Alisia-7B-it is a 7 billion parameter instruction-tuned language model. It is designed for general-purpose conversational AI and assistant-like tasks, demonstrating strong performance in factual knowledge, commonsense reasoning, and mathematical problem-solving.

Evaluation

Custom Benchmark Results

The model was evaluated on a manually curated suite of 25 questions across 5 categories. The results for the base model are summarized below:

| Benchmark Category | Score | Notes |
|---|---|---|
| Knowledge & Comprehension | 100% | Excellent factual recall. |
| Commonsense Reasoning | 100% | Strong understanding of everyday scenarios. |
| Mathematical Reasoning | 100% | Proficient in arithmetic and algebra. |
| Linguistic Semantics | 80% | Struggles with complex pronoun resolution. |
| Logical & Creative Reasoning | 60% | Primary weakness. Fails on abstract logic and spatial puzzles. |
| Overall Score | 88% | A capable generalist with a clear performance profile. |
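
The overall score is simply the question-weighted average of the category scores; a minimal sanity check of that arithmetic, assuming the 25 questions are split evenly across the 5 categories (5 each):

# Sanity check of the overall score, assuming 5 questions per category (25 / 5).
category_scores = {
    "Knowledge & Comprehension": 1.00,
    "Commonsense Reasoning": 1.00,
    "Mathematical Reasoning": 1.00,
    "Linguistic Semantics": 0.80,
    "Logical & Creative Reasoning": 0.60,
}

questions_per_category = 5
correct = sum(s * questions_per_category for s in category_scores.values())
total = questions_per_category * len(category_scores)
print(f"{correct:.0f}/{total} correct -> {correct / total:.0%} overall")  # 22/25 -> 88%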

Further evaluation on standard benchmarks (e.g., MMLU, HellaSwag, ARC-Challenge) is recommended to confirm these findings at a larger scale.
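
One option for reproducing such results is EleutherAI's lm-evaluation-harness; below is a minimal sketch using its Python entry point. The task names, model arguments, and float16 setting are assumptions based on recent harness versions, not a validated configuration:

# Sketch only: assumes `pip install lm-eval` (EleutherAI lm-evaluation-harness, v0.4+ API).
# Task names and model_args may differ between harness versions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=Gems234/Alisia-7B-it,dtype=float16",
    tasks=["mmlu", "hellaswag", "arc_challenge"],
    batch_size=8,
)

for task, metrics in results["results"].items():
    print(task, metrics)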

Training Details

Training Data

The model was fine-tuned on a mixture of publicly available instruction datasets, including but not limited to cleaned versions of the Alpaca dataset. This data primarily consists of instruction-response pairs designed to teach the model to follow user commands.
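
For reference, entries in such instruction datasets typically follow the Alpaca schema of instruction / optional input / output fields; the record below is an illustrative sketch, not an actual training example:

# Illustrative (made-up) record in the Alpaca-style instruction schema.
example_record = {
    "instruction": "Summarize the following paragraph in one sentence.",
    "input": "The Eiffel Tower, completed in 1889, is a wrought-iron lattice tower on the Champ de Mars in Paris.",
    "output": "The Eiffel Tower is an 1889 wrought-iron lattice tower on the Champ de Mars in Paris.",
}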

Uses

Direct Use

This model is intended for direct use in the following applications:

  • Conversational AI: As a chatbot or interactive assistant.
  • Question Answering: Providing factual information and explanations.
  • Text Generation: Creative writing, summarization, and ideation.
  • Educational Tool: Assisting with homework, particularly in mathematics and general knowledge subjects.

Out-of-Scope Use

The model should not be used for:

  • Critical decision-making in legal, medical, or financial contexts.
  • Generating highly technical or scientific content without human verification.
  • Tasks requiring flawless logical or spatial reasoning (see Limitations).

How to Get Started with the Model

Use the code below to get started with the model. Ensure you have the required libraries installed:

Requirements

  • transformers>=4.56.2
  • torch

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "Gems234/Alisia-7B-it"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Create a prompt
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate a response
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
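
If GPU memory is tight, the model can usually be loaded in 4-bit precision via bitsandbytes; a minimal sketch is shown below (it assumes `bitsandbytes` is installed and is not an officially validated configuration):

# Optional: 4-bit quantized loading to reduce VRAM usage (assumes `pip install bitsandbytes`).
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "Gems234/Alisia-7B-it",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Gems234/Alisia-7B-it")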

You can also use the chat template format:

# Chat template
messages = [
    {"role": "user", "content": "Write me a poem about Machine Learning."},
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn so the model replies
    return_tensors="pt",
    return_dict=True
).to(model.device)

# Generate a response
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
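
Note that decoding `outputs[0]` as above returns the prompt plus the reply; to print only the newly generated text, slice off the prompt tokens first:

# Decode only the tokens generated after the prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))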

Instruction format

To get the best results from the model, we recommend using the Alpaca prompt format:

### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}

For example:


import torch

# Alpaca-style prompt template matching the format above
alpaca_prompt = """### Instruction:
{}

### Input:
{}

### Response:
{}"""

instruction = "You are Alisia. Be concise and helpful."
input_text = "Where is the Eiffel Tower?"
output_text = ""  # left empty so the model completes the Response section

prompt = alpaca_prompt.format(instruction, input_text, output_text)

# Tokenize the prompt
inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,
        temperature=0.7,
        top_p=0.9,
        do_sample=True
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Limitations

A manual evaluation on a custom benchmark suite revealed the following performance profile for Alisia-7B-it:

Identified Strengths

  • Knowledge & Comprehension (MMLU-like): Achieved a perfect score (100%), demonstrating excellent recall of factual information across history, science, and literature.
  • Commonsense Reasoning (HellaSwag-like): Achieved a perfect score (100%), showing a robust understanding of everyday physical and social causality.
  • Mathematical Reasoning (GSM8K-like): Achieved a perfect score (100%), excelling at basic arithmetic, algebra, and problem-solving.

Identified Weaknesses

  • Logical & Creative Reasoning (ARC-like): Achieved a score of 60%. The model struggles with formal logic puzzles (e.g., syllogisms) and non-intuitive spatial reasoning problems. It is not recommended for applications requiring infallible abstract reasoning.
  • Linguistic Semantics (Winogrande-like): Achieved a score of 80%. While generally very good, the model can occasionally fail to resolve complex pronoun coreference ambiguities, potentially leading to minor misunderstandings in narrative text or dialogue.

Overall Benchmark Score: 88% (22/25 correct). The model is a robust generalist with a specific, predictable profile of strengths and weaknesses.

Bias, Risks, and Recommendations

Known Biases

  • Identity Bias: Due to the nature of its training data (which includes datasets like alpaca-cleaned), the model may occasionally incorrectly identify itself as "ChatGPT" or another AI system. This is a known artifact and does not reflect its actual origin or capabilities.
  • As with all large language models, Alisia-7B-it may reflect and amplify social biases present in its training data. Outputs should not be assumed to be free from bias.

Recommendations

Users should:

  • Be aware of the model's limitations in logical and spatial reasoning.
  • Critically evaluate its outputs, especially in high-stakes applications.
  • Use a safety classifier or content filter in production environments (see the sketch below).
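
As an illustration of the last point, the sketch below screens generated text with an off-the-shelf toxicity classifier before returning it. The classifier checkpoint (unitary/toxic-bert) and the 0.5 threshold are illustrative assumptions, not components of Alisia-7B-it:

# Sketch only: screen model outputs with a toxicity classifier before returning them.
# The checkpoint "unitary/toxic-bert" and the 0.5 threshold are illustrative choices.
from transformers import pipeline

safety_classifier = pipeline("text-classification", model="unitary/toxic-bert")

def filter_reply(text: str, threshold: float = 0.5) -> str:
    # The pipeline returns the classifier's top label and its score for the text.
    top = safety_classifier(text, truncation=True)[0]
    if top["score"] >= threshold:
        return "[response withheld by content filter]"
    return text

candidate = "Paris is the capital of France."
print(filter_reply(candidate))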