QLoRA Fine-tuned PaliGemma-3B for Visual Reasoning on CLEVR-CoGen

This repository contains the QLoRA adapters for the google/paligemma-3b-pt-224 model, fine-tuned for a Visual Question Answering (VQA) task on the leonardPKU/clevr_cogen_a_train dataset.

Compared to the base PaliGemma model, this fine-tuned model shows significantly improved performance on questions that require spatial and logical reasoning about complex scenes containing multiple objects. Because QLoRA quantizes the base weights to 4-bit, the model can be trained and run on consumer-grade hardware.

Model Description

  • Base Model: google/paligemma-3b-pt-224
  • Fine-tuning Technique: QLoRA (Quantized Low-Rank Adaptation); a configuration sketch follows this list
  • Task: Visual Question Answering (VQA)
  • Dataset: A subset of leonardPKU/clevr_cogen_a_train
  • Key Improvement: Enhanced ability to perform complex reasoning, counting, and attribute identification in visual scenes.
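
For reference, QLoRA keeps the base model's weights frozen in 4-bit precision and trains only small low-rank adapter matrices on top of them. The sketch below shows what such a setup typically looks like with transformers and peft; the specific quantization options and LoRA hyperparameters (rank, alpha, target modules) are illustrative assumptions, not the exact values used to train this checkpoint.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# Illustrative 4-bit quantization settings (NF4 with double quantization
# and bf16 compute is a common QLoRA recipe, not a published detail of this model)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Illustrative LoRA settings; r, lora_alpha, and target_modules are assumptions
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```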

How to Use

To use this model, you must load the 4-bit quantized base model and then apply the PEFT adapters from this repository.

Installation

First, ensure you have the necessary libraries installed:

```bash
pip install -q transformers peft bitsandbytes accelerate Pillow requests
```
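
Loading the Model and Running Inference

The sketch below loads the base model in 4-bit with bitsandbytes and applies the adapters from this repository with peft. The quantization settings, the image URL, and the example question are illustrative assumptions; substitute your own image and CLEVR-style question.

```python
import torch
import requests
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration, BitsAndBytesConfig
from peft import PeftModel

base_model_id = "google/paligemma-3b-pt-224"
adapter_id = "tahamajs/paligemma-clevr-cogen-qlora"

# 4-bit quantization for the frozen base weights (illustrative settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

processor = AutoProcessor.from_pretrained(base_model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Apply the QLoRA adapters from this repository on top of the quantized base
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Hypothetical example image and question; replace with your own scene
url = "https://example.com/clevr_scene.png"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
prompt = "How many cubes are the same color as the large sphere?"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=20)

# Decode only the newly generated tokens (the answer)
answer = processor.decode(
    output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(answer)
```

CLEVR-style answers are usually a single word or number, so a small `max_new_tokens` limit is sufficient.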