Afri-Aya Gemma 3 4B Vision - LoRA Adapters 🌍

This repository contains the LoRA (Low-Rank Adaptation) adapters for the Afri-Aya Gemma 3 4B Vision model, fine-tuned on the Afri-Aya dataset for African cultural visual question answering.

Model Details

  • Base Model: unsloth/gemma-3-4b-pt
  • Training Dataset: CohereLabsCommunity/afri-aya (2,466 images, 13 African languages)
  • Fine-tuning Method: LoRA with Unsloth
  • Languages Supported: English + 13 African languages
  • LoRA Rank: 16
  • Training Framework: Unsloth + TRL

Repository Contents

This repository contains the LoRA adapter weights that can be applied to the base Gemma 3 4B model:

  • adapter_config.json - LoRA configuration
  • adapter_model.safetensors - LoRA adapter weights
  • README.md - This documentation
  • Other supporting files for the LoRA adapters
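To inspect the adapter configuration (rank, alpha, target modules) without downloading the base model, PEFT can read adapter_config.json directly. A minimal sketch, assuming a standard PEFT install:

from peft import PeftConfig

# Fetches and parses adapter_config.json from the Hub
config = PeftConfig.from_pretrained("Bronsn/afri-aya-gemma-3-4b-vision-lora")
print(config.base_model_name_or_path)  # expected: unsloth/gemma-3-4b-pt
print(config.r, config.lora_alpha)     # LoRA rank and alpha (16 and 32 here)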

Usage

Option 1: Load LoRA Adapters with Unsloth

from unsloth import FastVisionModel

# Load base model with LoRA adapters
model, processor = FastVisionModel.from_pretrained(
    model_name="Bronsn/afri-aya-gemma-3-4b-vision-lora",
    load_in_4bit=True,
)

# Enable inference mode
FastVisionModel.for_inference(model)

Option 2: Use with PEFT/Transformers

import torch
from transformers import AutoModelForVision2Seq, AutoProcessor
from peft import PeftModel

# Load base model
base_model = AutoModelForVision2Seq.from_pretrained(
    "unsloth/gemma-3-4b-pt",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load LoRA adapters
model = PeftModel.from_pretrained(base_model, "Bronsn/afri-aya-gemma-3-4b-vision-lora")
processor = AutoProcessor.from_pretrained("unsloth/gemma-3-4b-pt")

Option 3: Merge and Use

For production use, you might want to merge the adapters with the base model:

from unsloth import FastVisionModel

# Load with LoRA in full precision so the merged weights are exact
# (merging on top of a 4-bit load would bake in quantization error)
model, processor = FastVisionModel.from_pretrained(
    model_name="Bronsn/afri-aya-gemma-3-4b-vision-lora",
    load_in_4bit=False,
)

# Merge the adapters into the base weights, then save (PEFT API)
model = model.merge_and_unload()
model.save_pretrained("merged_model")
processor.save_pretrained("merged_model")

Merged Model

For convenience, we also provide a merged version of this model at: Bronsn/afri-aya-gemma-3-4b-vision

The merged model is ready to use without requiring LoRA loading.
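Loading the merged model needs no PEFT step at all; a minimal sketch mirroring the Option 2 loading code, just pointed at the merged repository:

import torch
from transformers import AutoModelForVision2Seq, AutoProcessor

# The merged checkpoint loads like any standalone vision-language model
model = AutoModelForVision2Seq.from_pretrained(
    "Bronsn/afri-aya-gemma-3-4b-vision",
    torch_dtype=torch.float16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Bronsn/afri-aya-gemma-3-4b-vision")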

Training Details

  • LoRA Rank: 16
  • LoRA Alpha: 32
  • Target Modules: Vision and text projection layers
  • Learning Rate: 2e-4
  • Batch Size: 1 (with gradient accumulation)
  • Epochs: 1
  • Training Framework: Unsloth for efficient fine-tuning
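As a rough illustration, the hyperparameters above map onto an Unsloth LoRA setup along these lines. This is a hedged sketch, not the exact training script; the dropout, bias, and random-seed values are assumptions:

from unsloth import FastVisionModel

model, processor = FastVisionModel.from_pretrained(
    "unsloth/gemma-3-4b-pt",
    load_in_4bit=True,
)

# Attach LoRA adapters; r and lora_alpha match the values listed above,
# the remaining arguments are illustrative defaults
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=True,
    finetune_language_layers=True,
    r=16,
    lora_alpha=32,
    lora_dropout=0,     # assumption
    bias="none",        # assumption
    random_state=3407,  # assumption
)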

Dataset

Trained on the Afri-Aya dataset, which includes:

  • 2,466 images from 13 African cultures
  • Bilingual captions (English + local languages)
  • Cultural Q&A pairs for each image
  • 13 categories: Food, Festivals, Notable Figures, Music, etc.
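The dataset can be pulled from the Hub with the datasets library. The split name below is an assumption; check the dataset card for the exact schema:

from datasets import load_dataset

# Load the Afri-Aya dataset (split name assumed; see the dataset card)
ds = load_dataset("CohereLabsCommunity/afri-aya", split="train")
print(ds)            # dataset size and column names
print(ds[0].keys())  # schema of a single example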

Languages Covered

Luganda, Kinyarwanda, Arabic, Twi, Hausa, Nyankore, Yoruba, Kirundi, Zulu, Swahili, Gishu, Krio, Igbo

Example Usage

from unsloth import FastVisionModel
from transformers import TextStreamer
from PIL import Image

# Load model with LoRA adapters
model, processor = FastVisionModel.from_pretrained(
    "Bronsn/afri-aya-gemma-3-4b-vision-lora",
    load_in_4bit=True,
)
FastVisionModel.for_inference(model)

# Prepare input
image = Image.open("african_cultural_image.jpg")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What cultural significance does this image have?"},
            {"type": "image"},
        ],
    }
]

# Generate response
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")

text_streamer = TextStreamer(processor.tokenizer, skip_prompt=True)
result = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128)
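The streamer prints tokens as they arrive; to also keep the full response as a string, decode the returned ids while slicing off the prompt portion:

# Decode only the newly generated tokens (drop the prompt ids)
generated = result[:, inputs["input_ids"].shape[1]:]
print(processor.batch_decode(generated, skip_special_tokens=True)[0])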

Model Performance

This model has been fine-tuned specifically for:

  • African cultural image understanding
  • Multilingual visual question answering
  • Cultural context recognition
  • Traditional and modern African life scenarios

Citation

@misc{afri_aya_lora_2024,
  title={Afri-Aya Gemma 3 4B Vision LoRA: African Cultural VQA Adapters},
  author={Cohere Labs Regional Africa Community},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/Bronsn/afri-aya-gemma-3-4b-vision-lora}
}

License

Apache 2.0

Acknowledgments

  • Dataset: Afri-Aya dataset by Cohere Labs Regional Africa Community
  • Base Model: Gemma 3 4B by Google
  • Training Framework: Unsloth for efficient LoRA fine-tuning
  • Community: Expedition Aya challenge participants

LoRA adapters created with ❤️ for African culture preservation and education
