---
tags:
- bertopic
library_name: bertopic
---
# BERTopic_Multimodal
This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.
This model was trained on the 8,091 images of the Flickr8k dataset **without** their captions. It demonstrates how BERTopic can perform topic modeling with images as the only input.
A few examples of generated topics:
!["multimodal.png"](multimodal.png)
## Usage
To use this model, please install BERTopic:
```bash
pip install -U "bertopic[vision]"
pip install -U safetensors
```
You can use the model as follows:
```python
from bertopic import BERTopic
topic_model = BERTopic.load("MaartenGr/BERTopic_Multimodal")
topic_model.get_topic_info()
```
You can view all information about a topic as follows:
```python
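topic_id = 0  # any topic ID from the topic overview, e.g. 0
# full=True also returns the additional aspects ("KeyBERT" and "Visual_Aspect")
topic_model.get_topic(topic_id, full=True)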
```
## Topic overview
* Number of topics: 29 (including the -1 outlier topic)
* Number of training documents: 8091
<details>
<summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | while - air - the - in - jumping | 34 | -1_while_air_the_in |
| 0 | bench - sitting - people - woman - street | 1132 | 0_bench_sitting_people_woman |
| 1 | grass - running - dog - grassy - field | 1693 | 1_grass_running_dog_grassy |
| 2 | boy - girl - little - young - holding | 1290 | 2_boy_girl_little_young |
| 3 | dog - frisbee - running - water - mouth | 1224 | 3_dog_frisbee_running_water |
| 4 | skateboard - ramp - doing - trick - cement | 415 | 4_skateboard_ramp_doing_trick |
| 5 | snow - dog - covered - running - through | 309 | 5_snow_dog_covered_running |
| 6 | mountain - range - slope - standing - person | 205 | 6_mountain_range_slope_standing |
| 7 | pool - blue - boy - toy - water | 189 | 7_pool_blue_boy_toy |
| 8 | trail - bike - down - riding - person | 166 | 8_trail_bike_down_riding |
| 9 | snowboarder - mid - jump - air - after | 126 | 9_snowboarder_mid_jump_air |
| 10 | rock - climbing - up - wall - tree | 124 | 10_rock_climbing_up_wall |
| 11 | wave - surfboard - top - riding - of | 112 | 11_wave_surfboard_top_riding |
| 12 | beach - surfboard - people - with - walking | 102 | 12_beach_surfboard_people_with |
| 13 | jumping - track - horse - racquet - dog | 98 | 13_jumping_track_horse_racquet |
| 14 | snowboard - snow - girl - hill - slope | 95 | 14_snowboard_snow_girl_hill |
| 15 | game - being - football - played - professional | 91 | 15_game_being_football_played |
| 16 | soccer - kicking - team - ball - player | 80 | 16_soccer_kicking_team_ball |
| 17 | dirt - bike - person - rider - going | 75 | 17_dirt_bike_person_rider |
| 18 | soccer - boys - field - ball - kicking | 69 | 18_soccer_boys_field_ball |
| 19 | baseball - player - bat - swinging - into | 63 | 19_baseball_player_bat_swinging |
| 20 | basketball - up - and - playing - jumping | 59 | 20_basketball_up_and_playing |
| 21 | bird - body - flying - over - long | 55 | 21_bird_body_flying_over |
| 22 | motorcycle - track - race - racer - racing | 55 | 22_motorcycle_track_race_racer |
| 23 | boat - sitting - water - lake - hose | 53 | 23_boat_sitting_water_lake |
| 24 | street - riding - down - bike - woman | 52 | 24_street_riding_down_bike |
| 25 | paddle - suit - paddling - water - in | 49 | 25_paddle_suit_paddling_water |
| 26 | pair - scissors - stage - white - shirt | 42 | 26_pair_scissors_stage_white |
| 27 | tennis - court - racket - racquet - swinging | 34 | 27_tennis_court_racket_racquet |
</details>
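The table follows two simple patterns: each Label appears to be the topic ID followed by the first four keywords joined with underscores, and the Topic Frequency column counts images per topic, summing to the 8,091 training documents. The stdlib-only sketch below checks both observations against values copied from the table; `make_label` is a hypothetical helper written for illustration, not a BERTopic function.

```python
# Hypothetical helper: rebuild a label from a topic ID and its keywords,
# mirroring the pattern visible in the topic overview table.
def make_label(topic_id: int, keywords: list[str], n_words: int = 4) -> str:
    return "_".join([str(topic_id)] + keywords[:n_words])


# A few rows copied verbatim from the topic overview table.
rows = [
    (-1, "while - air - the - in - jumping", "-1_while_air_the_in"),
    (0, "bench - sitting - people - woman - street", "0_bench_sitting_people_woman"),
    (27, "tennis - court - racket - racquet - swinging", "27_tennis_court_racket_racquet"),
]
for topic_id, keyword_cell, label in rows:
    assert make_label(topic_id, keyword_cell.split(" - ")) == label

# Topic frequencies from the table, ordered by topic ID (-1 through 27).
frequencies = [34, 1132, 1693, 1290, 1224, 415, 309, 205, 189, 166, 126, 124,
               112, 102, 98, 95, 91, 80, 75, 69, 63, 59, 55, 55, 53, 52, 49, 42, 34]
print(len(frequencies), sum(frequencies))  # 29 8091
```

Every non-outlier topic also contains at least 30 images, consistent with `min_topic_size=30`.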
## Training Procedure
The data was retrieved as follows:
```python
import os
import glob
import zipfile
from tqdm import tqdm
from sentence_transformers import util

# Flickr 8k images
img_folder = 'photos/'
caps_folder = 'captions/'

if not os.path.exists(img_folder) or len(os.listdir(img_folder)) == 0:
    os.makedirs(img_folder, exist_ok=True)

    # Download the dataset if it does not exist yet
    if not os.path.exists('Flickr8k_Dataset.zip'):
        util.http_get('https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_Dataset.zip', 'Flickr8k_Dataset.zip')
        util.http_get('https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_text.zip', 'Flickr8k_text.zip')

    # Extract the images and the caption files into their own folders
    for folder, file in [(img_folder, 'Flickr8k_Dataset.zip'), (caps_folder, 'Flickr8k_text.zip')]:
        with zipfile.ZipFile(file, 'r') as zf:
            for member in tqdm(zf.infolist(), desc='Extracting'):
                zf.extract(member, folder)

# Paths to all images (the extracted folder is spelled "Flicker8k_Dataset")
images = list(glob.glob('photos/Flicker8k_Dataset/*.jpg'))
```
Then, to perform topic modeling on multimodal data with BERTopic:
```python
from bertopic import BERTopic
from bertopic.backend import MultiModalBackend
from bertopic.representation import VisualRepresentation, KeyBERTInspired
# Image embedding model
embedding_model = MultiModalBackend('clip-ViT-B-32', batch_size=32)
# Image to text representation model
representation_model = {
"Visual_Aspect": VisualRepresentation(image_to_text_model="nlpconnect/vit-gpt2-image-captioning", image_squares=True),
"KeyBERT": KeyBERTInspired()
}
# Train our model with images only
topic_model = BERTopic(
    embedding_model=embedding_model,
    representation_model=representation_model,
    min_topic_size=30,
    verbose=True,
)
topics, probs = topic_model.fit_transform(documents=None, images=images)
```
The input above consists of images only. BERTopic embeds and clusters the images, then extracts a small subset of representative images from each cluster. These representative images are captioned with `"nlpconnect/vit-gpt2-image-captioning"`, producing a small textual dataset on which c-TF-IDF and the additional `KeyBERTInspired` representation model can run.
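To make the c-TF-IDF step more concrete, here is a minimal, stdlib-only sketch of the weighting described in the BERTopic documentation: all captions of a topic are treated as a single class document, term frequencies are L1-normalized per topic, and each term is weighted by log(1 + A / f_t), where A is the average number of words per topic and f_t is the term's total frequency across all topics. This is a simplification (whitespace tokenization, no stop-word handling), not BERTopic's `ClassTfidfTransformer`, and the captions are made-up toy examples rather than output of the captioning model.

```python
import math
from collections import Counter


def c_tf_idf(captions_per_topic: dict[int, list[str]]) -> dict[int, dict[str, float]]:
    """Simplified c-TF-IDF: all captions of a topic form one class document."""
    # Term counts per topic (lowercased, whitespace tokenization)
    counts = {
        topic: Counter(" ".join(captions).lower().split())
        for topic, captions in captions_per_topic.items()
    }
    # f_t: frequency of each term summed over all topics
    term_totals = Counter()
    for topic_counts in counts.values():
        term_totals.update(topic_counts)
    # A: average number of words per topic
    avg_words = sum(sum(c.values()) for c in counts.values()) / len(counts)

    scores = {}
    for topic, topic_counts in counts.items():
        n_words = sum(topic_counts.values())
        # L1-normalized term frequency times log(1 + A / f_t)
        scores[topic] = {
            term: (n / n_words) * math.log(1 + avg_words / term_totals[term])
            for term, n in topic_counts.items()
        }
    return scores


def top_words(scores: dict[int, dict[str, float]], topic: int, k: int = 3) -> list[str]:
    """Return the k highest-scoring terms of a topic (ties broken alphabetically)."""
    ranked = sorted(scores[topic].items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Made-up toy captions standing in for captions of representative images
captions = {
    0: ["a dog running through snow", "a dog covered with snow"],
    1: ["a boy doing skateboard trick", "a skateboard trick on ramp"],
}
scores = c_tf_idf(captions)
print(top_words(scores, 0, k=2))  # ['dog', 'snow']
print(top_words(scores, 1, k=2))  # ['skateboard', 'trick']
```

Both "a" and "dog" occur twice in topic 0, but "a" also occurs in topic 1, so it receives a lower weight. This cross-topic penalty is what makes c-TF-IDF keywords distinctive for each cluster.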
## Training hyperparameters
* calculate_probabilities: False
* language: None
* low_memory: False
* min_topic_size: 30
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: True
## Framework versions
* Numpy: 1.23.5
* HDBSCAN: 0.8.29
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.29.2
* Numba: 0.56.4
* Plotly: 5.14.1
* Python: 3.10.10