🍽 Cuisine Classifier (XGBoost)

This model classifies dishes from their ingredient lists. It predicts either a cuisine (20 classes) or a region (5 classes).
It uses an XGBoost classifier trained on normalized ingredient data.

📊 Model Overview

Task: Multiclass Classification (Cuisines & Regions)
Input: List of ingredients (["salt", "flour", "sugar", ...])
Output: Cuisine class (e.g. "italian") or Region (e.g. "Central Europe")
Algorithm: XGBoost
Training Data: Kaggle "What's Cooking?" dataset, with ingredients normalized using the AllRecipes dataset
Train/Test Split: 80 / 20, stratified
Cross Validation: 5-fold CV with random_state=42
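
The stratified 80 / 20 split listed above keeps each class at roughly the same share in the training and test parts. The card does not say which tooling produced the split; in practice scikit-learn's `train_test_split(..., stratify=y)` is the usual choice. The following is only a dependency-free sketch of the idea, with a hypothetical helper `stratified_split` and toy labels that are not taken from the dataset.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with each class split at test_fraction."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Take the same fraction of every class for the test part.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels for illustration only (not the real dataset).
labels = ["italian"] * 10 + ["mexican"] * 5
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

The same grouping idea extends to 5-fold cross validation by cutting each class into five parts instead of two.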

🌍 Region Mapping

Region	Cuisines
Central Europe	british, french, greek, irish, italian, russian, spanish
North America	cajun_creole, southern_us
Asia	chinese, filipino, indian, japanese, korean, thai, vietnamese
Middle East	moroccan
Latin America	mexican, jamaican, brazilian
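
The mapping above can be written as a lookup table, for example to relabel cuisine labels as regions. This is a minimal sketch; the names `REGION_TO_CUISINES`, `CUISINE_TO_REGION`, and `to_region` are illustrative and are not part of the published model.

```python
# Region table copied from the mapping above.
REGION_TO_CUISINES = {
    "Central Europe": ["british", "french", "greek", "irish", "italian", "russian", "spanish"],
    "North America": ["cajun_creole", "southern_us"],
    "Asia": ["chinese", "filipino", "indian", "japanese", "korean", "thai", "vietnamese"],
    "Middle East": ["moroccan"],
    "Latin America": ["mexican", "jamaican", "brazilian"],
}

# Invert the table so that each cuisine points to its region.
CUISINE_TO_REGION = {
    cuisine: region
    for region, cuisines in REGION_TO_CUISINES.items()
    for cuisine in cuisines
}


def to_region(cuisine):
    """Look up the region for a cuisine label; raises KeyError for unknown labels."""
    return CUISINE_TO_REGION[cuisine]


print(to_region("thai"))
print(len(CUISINE_TO_REGION))
```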

🧪 Performance

Model Comparison

Metric	Stratified Baseline	Logistic Regression	XGBoost
Precision (20 cuisines)	0.05	0.65	0.75
Recall (20 cuisines)	0.05	0.69	0.66
Macro F1 (20 cuisines)	0.05	0.67	0.69
Accuracy (20 cuisines)	0.10	0.75	0.77
Accuracy (5 regions)	0.27	0.89	0.89
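
To put the baseline column in context: assuming the stratified baseline draws labels at random according to the class frequencies (as scikit-learn's `DummyClassifier(strategy="stratified")` does; the card does not name the implementation), its expected accuracy is the sum of the squared class shares. The sketch below uses toy labels and a hypothetical helper `expected_stratified_accuracy`.

```python
from collections import Counter


def expected_stratified_accuracy(labels):
    """Expected accuracy when predictions are sampled from the class frequencies."""
    counts = Counter(labels)
    total = len(labels)
    # A prediction is correct when the drawn class equals the true class,
    # which happens with probability p_k * p_k for each class k.
    return sum((n / total) ** 2 for n in counts.values())


# Toy class distribution for illustration only.
toy_labels = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
toy_accuracy = expected_stratified_accuracy(toy_labels)

# Twenty equally frequent classes, one sample each.
uniform_accuracy = expected_stratified_accuracy(list(range(20)))
print(toy_accuracy, uniform_accuracy)
```

Under this assumption, 20 perfectly balanced classes would give an expected accuracy of 0.05, so a baseline accuracy above that value points to an imbalanced class distribution.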

✅ Conclusion:
XGBoost achieves the best accuracy (0.77), precision (0.75), and macro F1 (0.69) for the 20-class cuisine classification and clearly outperforms the baseline; Logistic Regression has slightly higher recall (0.69 vs. 0.66).
For the 5-region setting, Logistic Regression and XGBoost reach the same accuracy (0.89); however, XGBoost provides more consistent results across classes.

Per-Region Metrics (5 Classes)

Region	Precision (XGB)	Recall (XGB)	F1 (XGB)
Asia	0.94	0.92	0.93
Central Europe	0.85	0.93	0.89
Latin America	0.92	0.88	0.90
Middle East	0.88	0.74	0.81
North America	0.87	0.76	0.81
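
Per-class precision, recall, and F1 as in the table above, and the macro F1 reported earlier, can be computed from true and predicted labels as follows. This is a plain-Python sketch with made-up labels; the helper names are illustrative, and the card does not state which library produced its numbers.

```python
def per_class_metrics(y_true, y_pred):
    """Return {class: {"precision", "recall", "f1"}} from two label lists."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[c] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


def macro_f1(metrics):
    """Unweighted mean of the per-class F1 scores."""
    return sum(m["f1"] for m in metrics.values()) / len(metrics)


# Made-up labels for illustration only.
y_true = ["Asia", "Asia", "Asia", "Middle East", "Middle East"]
y_pred = ["Asia", "Asia", "Middle East", "Middle East", "Asia"]
metrics = per_class_metrics(y_true, y_pred)
print(metrics["Asia"], macro_f1(metrics))
```

Macro F1 averages the per-class F1 scores, so it generally differs from the harmonic mean of the macro precision and macro recall.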

🚀 How to Use

from huggingface_hub import hf_hub_download
import joblib

class CuisineClassifier:

    def __init__(self, classifier="region"):
        print("Initializing CuisineClassifier...")

        components = ["cuisine_pipeline", "label_encoder"]
        paths = {}

        print("Downloading files from Hugging Face Hub...")
        for name in components:
            print(f"Downloading {name}.joblib ...")
            try:
                # Pick the repo subfolder that matches the requested classifier
                folder = "cuisine_classifier" if classifier == "cuisine" else "region_classifier"
                paths[name] = hf_hub_download(
                    repo_id="NoahMeissner/CuisineClassifier",
                    filename=f"{folder}/{name}.joblib",
                )
                print(f"{name} downloaded.")
            except Exception as e:
                print(f"Failed to download {name}: {e}")
                raise

        print("Loading model components with joblib...")
        try:
            self.model = joblib.load(paths["cuisine_pipeline"])
            print("Model loaded.")
            self.label_encoder = joblib.load(paths["label_encoder"])
            print("Label encoder loaded.")
        except Exception as e:
            print(f"Failed to load components: {e}")
            raise

        print("All components loaded successfully.")

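    def classify(self, text_input):
        # text_input is a list of ingredient strings, e.g. ["salt", "flour", "sugar"]
        data = " ".join(text_input)
        predicted_class = self.model.predict([data])
        # Map the encoded class back to its label name
        predicted_label = self.label_encoder.inverse_transform(predicted_class)
        return predicted_label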

Evaluation results (test set, self-reported)

Metric	Dataset	Value
Accuracy	What's Cooking? (Kaggle)	0.770
F1	What's Cooking? (Kaggle)	0.690
Accuracy	What's Cooking? (Kaggle), aggregated to regions	0.890
