metadata

title: Ocr Llm Test
emoji: 🌍
colorFrom: pink
colorTo: gray
sdk: gradio
sdk_version: 5.16.0
app_file: app.py
pinned: false
short_description: Technical Assessment

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

OCR LLM Classifier

This project provides a Gradio interface that runs Optical Character Recognition (OCR) on an uploaded image and then classifies the extracted text with a deep learning model. It supports three OCR methods (PaddleOCR, EasyOCR, and KerasOCR) and uses a DistilBERT model to label the extracted text as "Spam" or "Not Spam".

Features

Extract text from images using OCR.
Classify extracted text as either "Spam" or "Not Spam."
Save the extracted text and classification results to local JSON and CSV files.

How It Works

OCR: The app uses one of the three OCR methods to extract text from the uploaded image:
- PaddleOCR
- EasyOCR
- KerasOCR
Classification: The extracted text is passed to a pre-trained DistilBERT model that classifies the text as either "Spam" or "Not Spam."
Save Results: The extracted text and classification results are saved locally in both JSON and CSV formats, allowing easy retrieval and review.
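
The three steps above can be sketched as a single function. This is a minimal illustration, not the code in app.py: `run_pipeline`, the `ocr_fn` and `classify_fn` callables, and the output file names `ocr_results.json` and `ocr_results.csv` are all assumptions made for the example. The OCR engine and the DistilBERT classifier are passed in as plain callables so the sketch stays independent of any particular library.

```python
import csv
import json
from pathlib import Path
from typing import Callable, Tuple


def run_pipeline(
    image_path: str,
    ocr_fn: Callable[[str], str],       # hypothetical: image path -> extracted text
    classify_fn: Callable[[str], str],  # hypothetical: text -> "Spam" or "Not Spam"
    out_dir: str = ".",
) -> Tuple[str, str]:
    """Extract text, classify it, and append the result to JSON and CSV files."""
    text = ocr_fn(image_path)    # step 1: OCR
    label = classify_fn(text)    # step 2: classification
    record = {"text": text, "label": label}

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Step 3a: JSON. Keep a list of records and rewrite the file on each call.
    json_path = out / "ocr_results.json"  # assumed file name
    if json_path.exists():
        records = json.loads(json_path.read_text(encoding="utf-8"))
    else:
        records = []
    records.append(record)
    json_path.write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )

    # Step 3b: CSV. Append one row; write the header only for a new file.
    csv_path = out / "ocr_results.csv"  # assumed file name
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "label"])
        if is_new:
            writer.writeheader()
        writer.writerow(record)

    return text, label
```

Passing the OCR and classification steps as arguments makes it straightforward to switch between PaddleOCR, EasyOCR, and KerasOCR, or to substitute stub functions when testing the save logic.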

Installation

To get started with this project, follow these steps:

1. Clone the Repository

git clone https://github.com/yourusername/ocr-llm-test.git
cd ocr-llm-test

2. Install Dependencies

You can install the required dependencies using pip:

pip install -r requirements.txt

3. Run the App

To run the Gradio interface locally, execute:

python app.py

Once the app is running, open http://localhost:7860 (Gradio's default port) in your web browser.

API Documentation

1. API Endpoint

The main endpoint for this API is /predict.

2. API Call Example

Install the Python Client

If you don't already have it installed, run the following command:

pip install gradio_client

Make an API Call

from gradio_client import Client, handle_file

client = Client("winamnd/ocr-llm-test")
result = client.predict(
    method="PaddleOCR",
    img=handle_file('https://raw.githubusercontent.com/gradio-app/gradio/main/test/test_files/bus.png'),
    api_name="/predict"
)
print(result)

3. Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `method` | `Literal['PaddleOCR', 'EasyOCR', 'KerasOCR']` | The OCR method used for text extraction. Default is "PaddleOCR". |
| `img` | `dict` | The image input, which can be provided as a URL, a path, or a base64-encoded image. |
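
Because `method` accepts only three values, it can help to validate it on the client side before making a request. The following is a small sketch under that assumption; `OCRMethod`, `VALID_METHODS`, and `check_method` are hypothetical names introduced here and are not part of the Space or of `gradio_client`.

```python
from typing import Literal, get_args

# The three values listed in the parameter table above.
OCRMethod = Literal["PaddleOCR", "EasyOCR", "KerasOCR"]
VALID_METHODS = get_args(OCRMethod)


def check_method(method: str = "PaddleOCR") -> str:
    """Return `method` unchanged if it is a supported OCR method, else raise."""
    if method not in VALID_METHODS:
        raise ValueError(
            f"Unsupported OCR method {method!r}; expected one of {VALID_METHODS}"
        )
    return method
```

The checked value can then be passed as the `method` argument of `client.predict`.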

Image Input Details

path: Path to a local file.
url: Publicly available URL for the image.
size: The size of the image (in bytes).
orig_name: Original filename.
mime_type: MIME type of the image.
is_stream: Always set to False.
meta: Metadata.
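
To make the fields above concrete, here is a sketch that builds such a dictionary for a local file using only the standard library. It is illustrative only: `describe_local_image` is a hypothetical helper, and the `meta` value shown is an assumption about what Gradio expects. In practice, `handle_file` from `gradio_client` (used in the API call example) prepares the image input for you.

```python
import mimetypes
import os
from typing import Any, Dict


def describe_local_image(path: str) -> Dict[str, Any]:
    """Build a dict with the image input fields for a local file."""
    mime_type, _ = mimetypes.guess_type(path)  # guessed from the file extension
    return {
        "path": path,                           # path to the local file
        "url": None,                            # no public URL for a local file
        "size": os.path.getsize(path),          # size in bytes
        "orig_name": os.path.basename(path),    # original filename
        "mime_type": mime_type,                 # e.g. "image/png"
        "is_stream": False,                     # always False
        "meta": {"_type": "gradio.FileData"},   # assumed metadata marker
    }
```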

4. Returns

The API returns a tuple with two elements:

Extracted Text (str): The text extracted from the image.
Spam Classification (str): The classification result ("Spam" or "Not Spam").
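
Since the result is a two-element tuple, it can be unpacked directly. The helper below is a minimal sketch: `unpack_result` is a hypothetical name, and it assumes the label is exactly "Spam" or "Not Spam" as described above, apart from case and surrounding whitespace.

```python
from typing import Sequence, Tuple


def unpack_result(result: Sequence[str]) -> Tuple[str, bool]:
    """Split the API result into the extracted text and an is-spam flag."""
    text, label = result  # (extracted text, "Spam" or "Not Spam")
    is_spam = label.strip().lower() == "spam"
    return text, is_spam
```

For example, `text, is_spam = unpack_result(result)` can be applied to the value returned by `client.predict` in the API call example.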