---
language:
- en
license: mit
library_name: xgboost
pipeline_tag: text-classification
tags:
- xgboost
- multiclass
- cuisine
- region-classification
- kaggle
metrics:
- accuracy
- f1
model-index:
- name: CuisineClassifier
  results:
  - task:
      type: text-classification
      name: Cuisine (20 classes)
    dataset:
      name: What's Cooking? (Kaggle)
      type: whats-
      url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
      split: test
    metrics:
    - type: accuracy
      value: 0.77
    - type: f1
      value: 0.69
  - task:
      type: text-classification
      name: Region (5 classes)
    dataset:
      name: What's Cooking? (Kaggle) - aggregated to regions
      type: whats-cooking
      url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
      split: test
    metrics:
    - type: accuracy
      value: 0.89
---
🍽 Cuisine Classifier (XGBoost)
This model classifies dishes based on their ingredients and assigns them either to a Cuisine (20 classes) or a Region (5 classes).
It uses an XGBoost classifier trained on normalized ingredient data.
📊 Model Overview
- Task: Multiclass Classification (Cuisines & Regions)
- Input: List of ingredients (["salt", "flour", "sugar", ...])
- Output: Cuisine class (e.g. "italian") or Region (e.g. "Central Europe")
- Algorithm: XGBoost
- Training Data: Kaggle What’s Cooking? dataset, ingredients normalized using AllRecipes dataset
- Train/Test Split: 80 / 20, stratified
- Cross Validation: 5-fold CV with random_state=42
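To illustrate what a stratified 80 / 20 split means, here is a minimal standard-library sketch. It is not the author's training code (which presumably relied on a library routine). The helper name `stratified_split`, the toy labels, and the reuse of seed 42 for the split are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both parts."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels, not the real dataset: 10 "italian" and 5 "mexican" recipes.
toy_labels = ["italian"] * 10 + ["mexican"] * 5
train_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(test_idx))
```

With these toy labels the test set receives two "italian" and one "mexican" recipe, so both classes keep their 2:1 ratio in each part.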
🌍 Region Mapping
Region | Cuisines |
---|---|
Central Europe | british, french, greek, irish, italian, russian, spanish |
North America | cajun_creole, southern_us |
Asia | chinese, filipino, indian, japanese, korean, thai, vietnamese |
Middle East | moroccan |
Latin America | mexican, jamaican, brazilian |
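The mapping above can be written as a lookup table for aggregating cuisine labels into regions. This is a sketch built from the table; the names `REGION_TO_CUISINES`, `CUISINE_TO_REGION`, and `to_region` are illustrative and not taken from the model repository.

```python
# Region -> cuisines, copied from the table above.
REGION_TO_CUISINES = {
    "Central Europe": ["british", "french", "greek", "irish", "italian", "russian", "spanish"],
    "North America": ["cajun_creole", "southern_us"],
    "Asia": ["chinese", "filipino", "indian", "japanese", "korean", "thai", "vietnamese"],
    "Middle East": ["moroccan"],
    "Latin America": ["mexican", "jamaican", "brazilian"],
}

# Inverted lookup: cuisine -> region.
CUISINE_TO_REGION = {
    cuisine: region
    for region, cuisines in REGION_TO_CUISINES.items()
    for cuisine in cuisines
}


def to_region(cuisine):
    """Map one of the 20 cuisine labels to its region; raises KeyError for unknown labels."""
    return CUISINE_TO_REGION[cuisine]


print(to_region("thai"))
print(len(CUISINE_TO_REGION))
```

The inverted dictionary covers all 20 cuisine classes, each assigned to exactly one of the 5 regions.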
🧪 Performance
Model Comparison
Metric | Stratified Baseline | Logistic Regression | XGBoost |
---|---|---|---|
Precision (20 cuisines) | 0.05 | 0.65 | 0.75 |
Recall (20 cuisines) | 0.05 | 0.69 | 0.66 |
Macro F1 (20 cuisines) | 0.05 | 0.67 | 0.69 |
Accuracy (20 cuisines) | 0.10 | 0.75 | 0.77 |
Accuracy (5 regions) | 0.27 | 0.89 | 0.89 |
✅ Conclusion:
XGBoost achieves the best accuracy (0.77), precision (0.75), and macro F1 (0.69) for the 20-class cuisine classification, although Logistic Regression has higher recall (0.69 vs. 0.66). Both models clearly outperform the stratified baseline.
For the 5-region setting, Logistic Regression and XGBoost reach the same accuracy (0.89), with XGBoost giving more consistent results across classes.
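The baseline column can be read as follows, assuming "Stratified Baseline" means predicting labels at random in proportion to the training class frequencies (the card does not spell this out). Under that assumption the expected accuracy is the sum of squared class shares, which equals 1/20 = 0.05 only for perfectly balanced classes and is higher otherwise. The sketch below uses made-up toy labels, not the real dataset, and the function names are illustrative.

```python
import random
from collections import Counter


def stratified_baseline_predict(train_labels, n_predictions, seed=42):
    """Draw random predictions that follow the training class frequencies."""
    counts = Counter(train_labels)
    classes = sorted(counts)
    weights = [counts[c] for c in classes]
    rng = random.Random(seed)
    return rng.choices(classes, weights=weights, k=n_predictions)


def expected_accuracy(labels):
    """Expected accuracy of the stratified baseline: sum of squared class shares."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())


# Toy, imbalanced label distribution (shares 0.5, 0.3, 0.2).
toy_labels = ["italian"] * 5 + ["mexican"] * 3 + ["thai"] * 2
preds = stratified_baseline_predict(toy_labels, n_predictions=10)
print(preds)
print(expected_accuracy(toy_labels))  # about 0.38, above the balanced value of 1/3
```

This would be consistent with a baseline accuracy of 0.10 sitting above the macro-averaged scores of 0.05 in the table, since the cuisine classes are not equally frequent.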
Per-Region Metrics (5 Classes)
Region | Precision (XGB) | Recall (XGB) | F1 (XGB) |
---|---|---|---|
Asia | 0.94 | 0.92 | 0.93 |
Central Europe | 0.85 | 0.93 | 0.89 |
Latin America | 0.92 | 0.88 | 0.90 |
Middle East | 0.88 | 0.74 | 0.81 |
North America | 0.87 | 0.76 | 0.81 |
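As a quick consistency check, each F1 value in the table can be recomputed from its precision and recall with F1 = 2PR / (P + R). The sketch below uses only the rounded figures from the table, so small deviations (below 0.01) are expected. The unweighted mean of the five F1 scores is derived here and is not a number reported by the author.

```python
# (precision, recall, reported F1) per region, copied from the table above.
rows = {
    "Asia": (0.94, 0.92, 0.93),
    "Central Europe": (0.85, 0.93, 0.89),
    "Latin America": (0.92, 0.88, 0.90),
    "Middle East": (0.88, 0.74, 0.81),
    "North America": (0.87, 0.76, 0.81),
}


def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for region, (p, r, reported) in rows.items():
    computed = f1_from_pr(p, r)
    print(f"{region}: computed {computed:.3f}, reported {reported:.2f}")

# Unweighted (macro) mean of the reported per-region F1 scores.
macro_f1 = sum(f1 for _, _, f1 in rows.values()) / len(rows)
print(f"macro F1 over regions: {macro_f1:.3f}")
```

The two weakest regions, Middle East and North America, are also the ones with the fewest cuisines in the region mapping, and both lose mainly on recall.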
🚀 How to Use
```python
from huggingface_hub import hf_hub_download
import joblib


class CuisineClassifier:
    def __init__(self, classifier="region"):
        print("Initializing CuisineClassifier...")
        components = ["cuisine_pipeline", "label_encoder"]
        # Pick the repository subfolder for the requested classifier;
        # any value other than "cuisine" falls back to the region model.
        folder = "cuisine_classifier" if classifier == "cuisine" else "region_classifier"
        paths = {}

        print("Downloading files from Hugging Face Hub...")
        for name in components:
            print(f"Downloading {name}.joblib ...")
            try:
                paths[name] = hf_hub_download(
                    repo_id="NoahMeissner/CuisineClassifier",
                    filename=f"{folder}/{name}.joblib",
                )
                print(f"{name} downloaded.")
            except Exception as e:
                print(f"Failed to download {name}: {e}")
                raise

        print("Loading model components with joblib...")
        try:
            self.model = joblib.load(paths["cuisine_pipeline"])
            print("Model loaded.")
            self.label_encoder = joblib.load(paths["label_encoder"])
            print("Label encoder loaded.")
        except Exception as e:
            print(f"Failed to load components: {e}")
            raise
        print("All components loaded successfully.")

    def classify(self, text_input):
        # Join the ingredient list into one space-separated string for the pipeline.
        data = " ".join(text_input)
        predicted_class = self.model.predict([data])
        predicted_label = self.label_encoder.inverse_transform(predicted_class)
        return predicted_label
```
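`classify` returns whatever `inverse_transform` produces, which is a sequence with one label per input rather than a bare string (assuming the stored label encoder behaves like scikit-learn's `LabelEncoder`). A small wrapper can unpack it; `top_label` is a hypothetical helper, not part of the published repository.

```python
def top_label(classifier, ingredients):
    """Return the first predicted label as a plain string.

    `classifier` is any object whose classify(list_of_str) method returns a
    sequence of labels, for example the CuisineClassifier defined above.
    """
    labels = classifier.classify(ingredients)
    return str(labels[0])


# Intended usage (needs network access and the CuisineClassifier class above):
#   clf = CuisineClassifier(classifier="cuisine")
#   print(top_label(clf, ["salt", "flour", "sugar"]))
```

Passing `classifier="cuisine"` selects the 20-class model. Any other value, including the default `"region"`, loads the 5-region model.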