---
title: Image-to-Story-Generation
emoji: 🚀
colorFrom: indigo
colorTo: blue
sdk: streamlit
app_file: streamlit_app.py
pinned: false
license: apache-2.0
---
# Image-to-Story-Generation
Image-to-Story-Generation is a Streamlit-based application that transforms images into captivating stories and narrates them aloud. The application leverages state-of-the-art AI models for image captioning, story generation, and text-to-speech conversion.
## Features
- **Image Captioning**: Automatically generates captions for uploaded images using a Vision-Encoder-Decoder model.
- **Story Generation**: Creates a short, coherent story based on the generated caption using a language model.
- **Text-to-Speech**: Converts the generated story into audio using Google Text-to-Speech (gTTS).
- **Streamlit Interface**: Provides an intuitive web interface for users to upload images and interact with the pipeline.
## How It Works
1. **Upload an Image**: Users upload an image via the Streamlit interface.
2. **Generate Caption**: The app generates a caption for the image using a pre-trained Vision-Encoder-Decoder model.
3. **Generate Story**: A short story is created based on the caption using a language model.
4. **Text-to-Speech**: The story is converted into an audio file, which can be played directly in the app.
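The four steps above can be sketched as a simple pipeline. The function below is illustrative, not the repository's actual code: each stage is passed in as a callable (the names `caption_fn`, `story_fn`, and `tts_fn` are assumptions), so the flow can be read, and tested, independently of any model.

```python
from pathlib import Path
from typing import Callable

def run_story_pipeline(
    image_path: str,
    caption_fn: Callable[[str], str],    # image path -> caption
    story_fn: Callable[[str], str],      # caption -> story
    tts_fn: Callable[[str, str], None],  # (story text, audio path) -> writes file
    output_dir: str = "outputs",
) -> dict:
    """Run caption -> story -> audio and return the intermediate results."""
    caption = caption_fn(image_path)
    story = story_fn(caption)
    audio_path = str(Path(output_dir) / (Path(image_path).stem + ".mp3"))
    tts_fn(story, audio_path)
    return {"caption": caption, "story": story, "audio": audio_path}
```

Keeping the stages injectable like this also mirrors the repo's module layout (`captioning.py`, `storytelling.py`, `tts.py`), where each file owns one stage.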
## Installation
### Prerequisites
- Python 3.11 or higher
- [Hugging Face account](https://huggingface.co/) with an API token
- [Streamlit](https://streamlit.io/) installed
### Steps
1. Clone the repository:
```bash
git clone https://github.com/diptaraj23/Image-to-Story-Generation.git genai-storyteller
cd genai-storyteller
```
2. Create a virtual environment and activate it:
```bash
python -m venv venv
venv\Scripts\activate       # On Windows
source venv/bin/activate    # On macOS/Linux
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Add your Hugging Face API token to `.streamlit/secrets.toml`:
```toml
HF_TOKEN = "your_hugging_face_api_token"
```
5. Run the Streamlit app:
```bash
streamlit run streamlit_app.py
```
6. Open the app in your browser at `http://localhost:8501`.
## File Structure
```
genai-storyteller/
├── app/
│   ├── __init__.py
│   ├── captioning.py        # Image captioning logic
│   ├── storytelling.py      # Story generation logic
│   ├── tts.py               # Text-to-speech conversion
│   └── logger.py            # Logging utility
├── assets/                  # Directory for input images
├── outputs/                 # Directory for generated audio files
├── logs/                    # Directory for log files
├── tests/                   # Unit tests for the application
│   ├── __init__.py
│   ├── test_captioning.py   # Tests for captioning
│   ├── test_story.py        # Tests for story generation
│   └── test_tts.py          # Tests for text-to-speech
├── .streamlit/
│   └── config.toml          # Streamlit configuration
├── .devcontainer/
│   └── devcontainer.json    # Dev container configuration
├── requirements.txt         # Python dependencies
├── run_pipeline.py          # CLI pipeline for image-to-audio
├── streamlit_app.py         # Main Streamlit application
└── .gitignore               # Git ignore file
```
## Usage
### Streamlit Interface
1. Upload an image in `.jpg`, `.jpeg`, or `.png` format.
2. Click the "Generate Story" button to process the image.
3. View the generated caption and story.
4. Listen to the story in audio format.
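A small helper for enforcing the accepted formats might look like the sketch below. The function name is a hypothetical example, not the app's actual code; in practice Streamlit's `st.file_uploader` can also restrict formats via its `type` parameter.

```python
from pathlib import Path

# Image formats the uploader accepts
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def is_supported_image(filename: str) -> bool:
    """Return True if the filename has one of the accepted extensions."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
```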
### Command-Line Interface
You can also run the pipeline via the command line:
```bash
python run_pipeline.py <image_filename>
```
Replace `<image_filename>` with the name of the image file in the `assets/` directory.
## Models Used
- **Image Captioning**: `ydshieh/vit-gpt2-coco-en` (Vision-Encoder-Decoder model)
- **Story Generation**: `TinyLlama/TinyLlama-1.1B-Chat-v1.0` (Language model)
- **Text-to-Speech**: Google Text-to-Speech (gTTS)
## Logging
Logs are stored in the `logs/` directory. The application logs both to the console and to a file named `pipeline.log`.
<!--
## Testing
Unit tests are located in the `tests/` directory. To run the tests:
```bash
pytest tests/
```
-->
## Contributing
Contributions are welcome! Please fork the repository and submit a pull request.
## License
This project is licensed under the Apache License 2.0 (as declared in the front matter above). See the `LICENSE` file for details.
## Acknowledgments
- [Hugging Face Transformers](https://huggingface.co/transformers/)
- [Streamlit](https://streamlit.io/)
- [Google Text-to-Speech](https://pypi.org/project/gTTS/)