|
--- |
|
title: Image-to-Story-Generation |
|
emoji: π |
|
colorFrom: indigo |
|
colorTo: blue |
|
sdk: streamlit |
|
app_file: streamlit_app.py |
|
pinned: false |
|
license: apache-2.0 |
|
--- |
|
|
|
# Image-to-Story-Generation |
|
Image-to-Story-Generation is a Streamlit-based application that transforms images into captivating stories and narrates them aloud. The application leverages state-of-the-art AI models for image captioning, story generation, and text-to-speech conversion. |
|
|
|
## Features |
|
|
|
- **Image Captioning**: Automatically generates captions for uploaded images using a Vision-Encoder-Decoder model. |
|
- **Story Generation**: Creates a short, coherent story based on the generated caption using a language model. |
|
- **Text-to-Speech**: Converts the generated story into audio using Google Text-to-Speech (gTTS). |
|
- **Streamlit Interface**: Provides an intuitive web interface for users to upload images and interact with the pipeline. |
|
|
|
## How It Works |
|
|
|
1. **Upload an Image**: Users upload an image via the Streamlit interface. |
|
2. **Generate Caption**: The app generates a caption for the image using a pre-trained Vision-Encoder-Decoder model. |
|
3. **Generate Story**: A short story is created based on the caption using a language model. |
|
4. **Text-to-Speech**: The story is converted into an audio file, which can be played directly in the app. |
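
The steps above can be captured in a short Python sketch using the models listed under "Models Used" below. The real implementation is split across `app/captioning.py`, `app/storytelling.py`, and `app/tts.py`, so the structure and the example file names here (e.g. `assets/example.jpg`) are illustrative only:

```python
# End-to-end sketch: image -> caption -> story -> narrated audio.
from transformers import pipeline
from gtts import gTTS

# Step 2: caption the image with the Vision-Encoder-Decoder model.
captioner = pipeline("image-to-text", model="ydshieh/vit-gpt2-coco-en")
caption = captioner("assets/example.jpg")[0]["generated_text"]

# Step 3: expand the caption into a short story with the TinyLlama chat model.
storyteller = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
prompt = f"Write a short, coherent story inspired by this scene: {caption}\n\nStory:"
story = storyteller(prompt, max_new_tokens=200, do_sample=True)[0]["generated_text"]

# Step 4: narrate the story with gTTS and save the audio alongside the other outputs.
gTTS(text=story, lang="en").save("outputs/story.mp3")
```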
|
|
|
## Installation |
|
|
|
### Prerequisites |
|
|
|
- Python 3.11 or higher |
|
- [Hugging Face account](https://huggingface.co/) with an API token |
|
- [Streamlit](https://streamlit.io/) installed |
|
|
|
### Steps |
|
|
|
1. Clone the repository: |
|
```bash |
|
git clone https://github.com/diptaraj23/Image-to-Story-Generation.git genai-storyteller
|
cd genai-storyteller |
|
``` |
|
|
|
2. Create a virtual environment and activate it: |
|
```bash |
|
python -m venv venv |
|
venv\Scripts\activate       # On Windows
source venv/bin/activate    # On macOS/Linux
|
``` |
|
|
|
3. Install dependencies: |
|
```bash |
|
pip install -r requirements.txt |
|
``` |
|
|
|
4. Add your Hugging Face API token to `.streamlit/secrets.toml`: |
|
```toml |
|
HF_TOKEN = "your_hugging_face_api_token"
|
``` |
|
|
|
5. Run the Streamlit app: |
|
```bash |
|
streamlit run streamlit_app.py |
|
``` |
|
|
|
6. Open the app in your browser at `http://localhost:8501`. |
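
Once the token is stored in `.streamlit/secrets.toml` (step 4), the app can read it through Streamlit's secrets API. A minimal sketch (the exact variable names used in `streamlit_app.py` may differ):

```python
import os

import streamlit as st

# Read the Hugging Face token saved in .streamlit/secrets.toml (step 4) and
# expose it to Hugging Face libraries that look for the HF_TOKEN environment variable.
hf_token = st.secrets["HF_TOKEN"]
os.environ["HF_TOKEN"] = hf_token
```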
|
|
|
## File Structure |
|
|
|
``` |
|
genai-storyteller/
├── app/
│   ├── __init__.py
│   ├── captioning.py        # Image captioning logic
│   ├── storytelling.py      # Story generation logic
│   ├── tts.py               # Text-to-speech conversion
│   └── logger.py            # Logging utility
├── assets/                  # Directory for input images
├── outputs/                 # Directory for generated audio files
├── logs/                    # Directory for log files
├── tests/                   # Unit tests for the application
│   ├── __init__.py
│   ├── test_captioning.py   # Tests for captioning
│   ├── test_story.py        # Tests for story generation
│   └── test_tts.py          # Tests for text-to-speech
├── .streamlit/
│   └── config.toml          # Streamlit configuration
├── .devcontainer/
│   └── devcontainer.json    # Dev container configuration
├── requirements.txt         # Python dependencies
├── run_pipeline.py          # CLI pipeline for image-to-audio
├── streamlit_app.py         # Main Streamlit application
└── .gitignore               # Git ignore file
|
``` |
|
|
|
## Usage |
|
|
|
### Streamlit Interface |
|
|
|
1. Upload an image in `.jpg`, `.jpeg`, or `.png` format. |
|
2. Click the "Generate Story" button to process the image. |
|
3. View the generated caption and story. |
|
4. Listen to the story in audio format. |
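
Under the hood, this flow maps roughly onto the following Streamlit calls. The helper imports are illustrative stand-ins for the code in `app/`; the real function names may differ:

```python
import streamlit as st

# Illustrative stand-ins for app/captioning.py, app/storytelling.py and app/tts.py.
from app.captioning import generate_caption
from app.storytelling import generate_story
from app.tts import text_to_speech

uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
if uploaded is not None and st.button("Generate Story"):
    st.image(uploaded)
    caption = generate_caption(uploaded)   # image -> caption
    story = generate_story(caption)        # caption -> short story
    audio_path = text_to_speech(story)     # story -> MP3 file in outputs/
    st.subheader("Caption")
    st.write(caption)
    st.subheader("Story")
    st.write(story)
    st.audio(audio_path)                   # play the narration in the browser
```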
|
|
|
### Command-Line Interface |
|
|
|
You can also run the pipeline via the command line: |
|
|
|
```bash |
|
python run_pipeline.py <image_filename> |
|
``` |
|
|
|
Replace `<image_filename>` with the name of the image file in the `assets/` directory. |
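
`run_pipeline.py` itself is not reproduced here, but a CLI entry point of this kind could look roughly like the sketch below; as above, the helper imports are illustrative stand-ins for the code in `app/`:

```python
import sys
from pathlib import Path

# Illustrative stand-ins for the real helpers in app/.
from app.captioning import generate_caption
from app.storytelling import generate_story
from app.tts import text_to_speech


def main() -> None:
    # The image is expected to live in assets/, matching the usage described above.
    image_path = Path("assets") / sys.argv[1]
    caption = generate_caption(image_path)
    story = generate_story(caption)
    audio_path = text_to_speech(story)  # writes an MP3 under outputs/
    print(f"Caption: {caption}")
    print(f"Story: {story}")
    print(f"Audio saved to: {audio_path}")


if __name__ == "__main__":
    main()
```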
|
|
|
## Models Used |
|
|
|
- **Image Captioning**: `ydshieh/vit-gpt2-coco-en` (Vision-Encoder-Decoder model) |
|
- **Story Generation**: `TinyLlama/TinyLlama-1.1B-Chat-v1.0` (Language model) |
|
- **Text-to-Speech**: Google Text-to-Speech (gTTS) |
|
|
|
## Logging |
|
|
|
Logs are stored in the `logs/` directory. The application logs both to the console and to a file named `pipeline.log`. |
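
A minimal configuration matching that behavior (console output plus `logs/pipeline.log`) might look like this; the actual setup lives in `app/logger.py` and may differ in detail:

```python
import logging
from pathlib import Path

# Log to the console and to logs/pipeline.log, as described above.
Path("logs").mkdir(exist_ok=True)
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    handlers=[
        logging.StreamHandler(),                   # console
        logging.FileHandler("logs/pipeline.log"),  # file
    ],
)

logger = logging.getLogger("pipeline")
logger.info("Pipeline started")
```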
|
|
|
<!-- |
|
## Testing |
|
|
|
Unit tests are located in the `tests/` directory. To run the tests: |
|
|
|
```bash |
|
pytest tests/ |
|
``` |
|
--> |
|
## Contributing |
|
|
|
Contributions are welcome! Please fork the repository and submit a pull request. |
|
|
|
## License |
|
|
|
This project is licensed under the Apache 2.0 License. See the `LICENSE` file for details.
|
|
|
## Acknowledgments |
|
|
|
- [Hugging Face Transformers](https://huggingface.co/transformers/) |
|
- [Streamlit](https://streamlit.io/) |
|
- [Google Text-to-Speech](https://pypi.org/project/gTTS/) |