---
title: Image-to-Story-Generation
emoji: π
colorFrom: indigo
colorTo: blue
sdk: streamlit
app_file: streamlit_app.py
pinned: false
license: apache-2.0
---
# Image-to-Story-Generation
Image-to-Story-Generation is a Streamlit-based application that transforms images into captivating stories and narrates them aloud. The application leverages state-of-the-art AI models for image captioning, story generation, and text-to-speech conversion.
## Features
- Image Captioning: Automatically generates captions for uploaded images using a Vision-Encoder-Decoder model.
- Story Generation: Creates a short, coherent story based on the generated caption using a language model.
- Text-to-Speech: Converts the generated story into audio using Google Text-to-Speech (gTTS).
- Streamlit Interface: Provides an intuitive web interface for users to upload images and interact with the pipeline.
## How It Works
- Upload an Image: Users upload an image via the Streamlit interface.
- Generate Caption: The app generates a caption for the image using a pre-trained Vision-Encoder-Decoder model.
- Generate Story: A short story is created based on the caption using a language model.
- Text-to-Speech: The story is converted into an audio file, which can be played directly in the app (a minimal sketch of the full pipeline follows this list).
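
A minimal end-to-end sketch of this pipeline, assuming the standard `transformers` `pipeline` API and `gTTS`; the function name `image_to_story_audio`, the prompt text, and the output path are illustrative, not the app's actual code:

```python
# Illustrative caption -> story -> audio pipeline; not the app's exact code.
from transformers import pipeline
from gtts import gTTS

def image_to_story_audio(image_path: str, out_path: str = "outputs/story.mp3"):
    # 1. Caption the image with the Vision-Encoder-Decoder model.
    captioner = pipeline("image-to-text", model="ydshieh/vit-gpt2-coco-en")
    caption = captioner(image_path)[0]["generated_text"]

    # 2. Expand the caption into a short story with the language model.
    generator = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    prompt = f"Write a short story about this scene: {caption}\n\nStory:"
    story = generator(prompt, max_new_tokens=200, do_sample=True)[0]["generated_text"]

    # 3. Narrate the story with Google Text-to-Speech and save it as MP3.
    gTTS(text=story, lang="en").save(out_path)
    return caption, story, out_path
```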
## Installation
### Prerequisites
- Python 3.11 or higher
- Hugging Face account with an API token
- Streamlit installed
### Steps
1. Clone the repository:

   ```bash
   git clone https://github.com/diptaraj23/Image-to-Story-Generation.git genai-storyteller
   cd genai-storyteller
   ```
2. Create a virtual environment and activate it:

   ```bash
   python -m venv venv
   venv\Scripts\activate        # On Windows
   source venv/bin/activate     # On macOS/Linux
   ```
3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```
4. Add your Hugging Face API token to `.streamlit/secrets.toml` (the app reads it back via `st.secrets`, as shown after these steps):

   ```toml
   HF_TOKEN = "your_hugging_face_api_token"
   ```
5. Run the Streamlit app:

   ```bash
   streamlit run streamlit_app.py
   ```
6. Open the app in your browser at `http://localhost:8501`.
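
Once the token is in place, the app can read it back through Streamlit's built-in secrets API; a minimal example:

```python
import streamlit as st

# st.secrets exposes the keys defined in .streamlit/secrets.toml;
# indexing a missing key raises an error at runtime.
hf_token = st.secrets["HF_TOKEN"]
```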
## File Structure
```text
genai-storyteller/
├── app/
│   ├── __init__.py
│   ├── captioning.py        # Image captioning logic
│   ├── storytelling.py      # Story generation logic
│   ├── tts.py               # Text-to-speech conversion
│   └── logger.py            # Logging utility
├── assets/                  # Directory for input images
├── outputs/                 # Directory for generated audio files
├── logs/                    # Directory for log files
├── tests/                   # Unit tests for the application
│   ├── __init__.py
│   ├── test_captioning.py   # Tests for captioning
│   ├── test_story.py        # Tests for story generation
│   └── test_tts.py          # Tests for text-to-speech
├── .streamlit/
│   └── config.toml          # Streamlit configuration
├── .devcontainer/
│   └── devcontainer.json    # Dev container configuration
├── requirements.txt         # Python dependencies
├── run_pipeline.py          # CLI pipeline for image-to-audio
├── streamlit_app.py         # Main Streamlit application
└── .gitignore               # Git ignore file
```
## Usage
### Streamlit Interface
- Upload an image in `.jpg`, `.jpeg`, or `.png` format.
- Click the "Generate Story" button to process the image.
- View the generated caption and story.
- Listen to the story in audio format (see the sketch after this list).
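
A minimal sketch of how this flow can be wired up in Streamlit; the widget labels and the `image_to_story_audio` helper (the illustrative function from "How It Works") are assumptions, not the app's exact code:

```python
import streamlit as st

st.title("Image-to-Story-Generation")
uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])

if uploaded is not None and st.button("Generate Story"):
    # Persist the upload so path-based model loaders can read it.
    image_path = "assets/" + uploaded.name
    with open(image_path, "wb") as f:
        f.write(uploaded.getbuffer())

    caption, story, audio_path = image_to_story_audio(image_path)
    st.write(f"Caption: {caption}")
    st.write(story)
    st.audio(audio_path)  # inline player for the generated narration
```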
### Command-Line Interface
You can also run the pipeline via the command line:

```bash
python run_pipeline.py <image_filename>
```

Replace `<image_filename>` with the name of an image file in the `assets/` directory. A hypothetical sketch of this script follows.
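
One plausible shape for `run_pipeline.py`, assuming it chains the three app modules; the imported function names are guesses at the module APIs, not confirmed ones:

```python
import sys
from pathlib import Path

# Assumed function names; the real modules may expose a different API.
from app.captioning import generate_caption
from app.storytelling import generate_story
from app.tts import synthesize_speech

if __name__ == "__main__":
    image = Path("assets") / sys.argv[1]
    caption = generate_caption(str(image))
    story = generate_story(caption)
    audio_path = synthesize_speech(story, out_path="outputs/story.mp3")
    print(f"Caption: {caption}\nSaved narration to {audio_path}")
```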
## Models Used

- Image Captioning: `ydshieh/vit-gpt2-coco-en` (Vision-Encoder-Decoder model; a direct-usage sketch follows this list)
- Story Generation: `TinyLlama/TinyLlama-1.1B-Chat-v1.0` (language model)
- Text-to-Speech: Google Text-to-Speech (gTTS)
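
The captioning model can also be driven directly through the standard `VisionEncoderDecoderModel` API rather than a `pipeline`; this sketch follows the usual transformers usage (the image path is illustrative) and is not taken from the app's code:

```python
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

model = VisionEncoderDecoderModel.from_pretrained("ydshieh/vit-gpt2-coco-en")
processor = ViTImageProcessor.from_pretrained("ydshieh/vit-gpt2-coco-en")
tokenizer = AutoTokenizer.from_pretrained("ydshieh/vit-gpt2-coco-en")

# Encode the image, generate caption token ids, and decode them to text.
image = Image.open("assets/example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
output_ids = model.generate(pixel_values, max_length=16, num_beams=4)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
print(caption)
```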
## Logging

Logs are stored in the `logs/` directory. The application logs both to the console and to a file named `pipeline.log`.
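
A plausible shape for `app/logger.py` that matches this behaviour (one console handler, one file handler); the actual implementation may differ:

```python
import logging
from pathlib import Path

def get_logger(name: str = "genai-storyteller") -> logging.Logger:
    Path("logs").mkdir(exist_ok=True)  # make sure the log directory exists
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on Streamlit reruns
        fmt = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        for handler in (logging.StreamHandler(), logging.FileHandler("logs/pipeline.log")):
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger
```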
## Contributing
Contributions are welcome! Please fork the repository and submit a pull request.
## License

This project is licensed under the MIT License. See the `LICENSE` file for details.