metadata
language:
  - en
license: mit
library_name: xgboost
pipeline_tag: text-classification
tags:
  - xgboost
  - multiclass
  - cuisine
  - region-classification
  - kaggle
metrics:
  - accuracy
  - f1
model-index:
  - name: CuisineClassifier
    results:
      - task:
          type: text-classification
          name: Cuisine (20 classes)
        dataset:
          name: What's Cooking? (Kaggle)
          type: whats-
          url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
          split: test
        metrics:
          - type: accuracy
            value: 0.77
          - type: f1
            value: 0.69
      - task:
          type: text-classification
          name: Region (5 classes)
        dataset:
          name: What's Cooking? (Kaggle), aggregated to regions
          type: whats-cooking
          url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
          split: test
        metrics:
          - type: accuracy
            value: 0.89

🍽 Cuisine Classifier (XGBoost)

This model classifies dishes by their ingredients, assigning each dish either to a cuisine (20 classes) or to a region (5 classes).
Each task has its own XGBoost classifier, trained on normalized ingredient lists.


📊 Model Overview

  • Task: Multiclass Classification (Cuisines & Regions)
  • Input: List of ingredients (["salt", "flour", "sugar", ...])
  • Output: Cuisine class (e.g. "italian") or Region (e.g. "Central Europe")
  • Algorithm: XGBoost
  • Training Data: Kaggle What’s Cooking? dataset; ingredients normalized using the AllRecipes dataset
  • Train/Test Split: 80 / 20, stratified
  • Cross Validation: 5-fold CV with random_state=42
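The card does not say which tool produced the split and folds listed above. Scikit-learn's `train_test_split(..., stratify=y)` and `StratifiedKFold` are the usual choices. The sketch below is dependency-free and only illustrates what "stratified" means here: each class keeps roughly the same share in train and test, and in every fold. The helpers `stratified_split` and `stratified_folds` are hypothetical. They mirror the stated 80/20, 5-fold and seed-42 settings but will not reproduce the author's exact partition.

```python
import random
from collections import Counter, defaultdict


def _indices_by_class(labels):
    """Group sample indices by their class label."""
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    return groups


def stratified_split(labels, test_size=0.2, seed=42):
    """Hypothetical helper: split indices so every class keeps ~the same share in train and test."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in _indices_by_class(labels).values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)  # e.g. 20% of each class goes to test
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def stratified_folds(labels, n_splits=5, seed=42):
    """Hypothetical helper: deal each class's shuffled indices round-robin into n_splits folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for indices in _indices_by_class(labels).values():
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[j % n_splits].append(idx)
    return folds


# Toy labels (not the real dataset): 50 italian, 30 mexican, 20 thai
labels = ["italian"] * 50 + ["mexican"] * 30 + ["thai"] * 20
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 80 20
print(Counter(labels[i] for i in test_idx))  # per-class counts in the test part
```

Because each class is split separately, small classes such as moroccan still appear in both parts in proportion to their size.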

🌍 Region Mapping

| Region | Cuisines |
|---|---|
| Central Europe | british, french, greek, irish, italian, russian, spanish |
| North America | cajun_creole, southern_us |
| Asia | chinese, filipino, indian, japanese, korean, thai, vietnamese |
| Middle East | moroccan |
| Latin America | mexican, jamaican, brazilian |
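The region labels are a fixed many-to-one grouping of the 20 cuisine labels; the metadata describes the region data as the same dataset aggregated to regions. The following is a minimal sketch of that aggregation. It assumes the lowercase label spellings shown in the mapping (e.g. `cajun_creole`, `southern_us`). The names `REGIONS`, `REGION_OF_CUISINE` and `to_regions` are hypothetical and are not objects shipped with the model.

```python
# Region -> cuisines, transcribed from the Region Mapping table
REGIONS = {
    "Central Europe": ["british", "french", "greek", "irish", "italian", "russian", "spanish"],
    "North America": ["cajun_creole", "southern_us"],
    "Asia": ["chinese", "filipino", "indian", "japanese", "korean", "thai", "vietnamese"],
    "Middle East": ["moroccan"],
    "Latin America": ["mexican", "jamaican", "brazilian"],
}

# Invert to a cuisine -> region lookup (each cuisine belongs to exactly one region)
REGION_OF_CUISINE = {
    cuisine: region for region, cuisines in REGIONS.items() for cuisine in cuisines
}


def to_regions(cuisine_labels):
    """Hypothetical helper: map 20-class cuisine labels onto the 5 region labels."""
    return [REGION_OF_CUISINE[c] for c in cuisine_labels]


print(to_regions(["italian", "thai", "moroccan"]))
# ['Central Europe', 'Asia', 'Middle East']
```

An unknown label raises `KeyError`, which surfaces spelling mismatches early instead of silently dropping a dish.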

🧪 Performance

Model Comparison

| Metric | Stratified Baseline | Logistic Regression | XGBoost |
|---|---|---|---|
| Precision (20 cuisines) | 0.05 | 0.65 | 0.75 |
| Recall (20 cuisines) | 0.05 | 0.69 | 0.66 |
| Macro F1 (20 cuisines) | 0.05 | 0.67 | 0.69 |
| Accuracy (20 cuisines) | 0.10 | 0.75 | 0.77 |
| Accuracy (5 regions) | 0.27 | 0.89 | 0.89 |
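The baseline column can be sanity-checked analytically. Assume the "Stratified Baseline" is a random classifier that draws each prediction from the training class frequencies p_k, as scikit-learn's `DummyClassifier(strategy="stratified")` does. Its expected accuracy is then Σ p_k², which equals 1/K only for balanced classes and is larger otherwise. Per-class precision and recall both tend to p_k, so the macro averages tend to 1/K. For K = 20 that is 0.05, which matches the baseline's precision, recall and F1 entries. The sketch below computes both quantities and checks the accuracy by simulation. The class counts are toy values, not the dataset's, and the function names are hypothetical.

```python
import random


def expected_stratified_scores(counts):
    """Expected accuracy and macro score of a stratified random classifier.

    counts: mapping class -> number of training samples.
    """
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    accuracy = sum(p * p for p in probs)  # P(pred == true) = sum_k p_k^2
    macro = sum(probs) / len(probs)       # per-class P and R (hence F1) tend to p_k; their mean is 1/K
    return accuracy, macro


def simulated_accuracy(counts, n=100_000, seed=42):
    """Monte Carlo check: draw true and predicted labels independently from the same prior."""
    rng = random.Random(seed)
    labels, weights = list(counts), list(counts.values())
    truth = rng.choices(labels, weights=weights, k=n)
    preds = rng.choices(labels, weights=weights, k=n)
    return sum(t == p for t, p in zip(truth, preds)) / n


toy_counts = {"italian": 50, "mexican": 30, "thai": 20}  # toy values, not the real class counts
acc, macro = expected_stratified_scores(toy_counts)
print(round(acc, 2), round(macro, 2))  # 0.38 0.33
print(simulated_accuracy(toy_counts))  # close to the analytic accuracy
```

The gap between accuracy and macro scores for the baseline comes only from class imbalance. The same effect explains why the baseline's region accuracy sits above 1/5.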

Conclusion:
XGBoost achieves the best precision, macro F1, and accuracy for the 20-class cuisine classification (Logistic Regression has slightly higher recall) and clearly outperforms the baseline.
For the 5-region setting, Logistic Regression and XGBoost reach the same accuracy (0.89); however, XGBoost provides more consistent results across classes.


Per-Region Metrics (5 Classes)

| Region | Precision (XGB) | Recall (XGB) | F1 (XGB) |
|---|---|---|---|
| Asia | 0.94 | 0.92 | 0.93 |
| Central Europe | 0.85 | 0.93 | 0.89 |
| Latin America | 0.92 | 0.88 | 0.90 |
| Middle East | 0.88 | 0.74 | 0.81 |
| North America | 0.87 | 0.76 | 0.81 |
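Each F1 value in the per-region table is the harmonic mean of the precision and recall in its row, F1 = 2PR / (P + R). The macro F1 in the model comparison conventionally averages per-class F1 scores with equal weight. The dependency-free sketch below computes these quantities from label lists; scikit-learn's `precision_recall_fscore_support` and `f1_score(average="macro")` are the usual library equivalents. The helper names are hypothetical. Recomputing F1 from the rounded two-decimal values can differ by 0.01. For Middle East, 2 · 0.88 · 0.74 / (0.88 + 0.74) ≈ 0.80, versus the reported 0.81.

```python
from collections import Counter


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        metrics[label] = (precision, recall, f1(precision, recall))
    return metrics


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_metrics(y_true, y_pred)
    return sum(score[2] for score in scores.values()) / len(scores)


print(round(f1(0.94, 0.92), 2))  # 0.93, as in the Asia row
y_true = ["Asia", "Asia", "Asia", "Central Europe"]
y_pred = ["Asia", "Asia", "Central Europe", "Central Europe"]
print(per_class_metrics(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 3))  # 0.733
```

Because every class counts equally in the macro average, weak minority classes such as Middle East pull macro F1 down more than they pull down accuracy.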

🚀 How to Use

from huggingface_hub import hf_hub_download
import joblib

class CuisineClassifier:

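    def __init__(self, classifier="region"):
        print("Initializing CuisineClassifier...")

        # Pick the model folder on the Hub: "region" (5 classes, default) or "cuisine" (20 classes)
        if classifier not in ("region", "cuisine"):
            raise ValueError(f'classifier must be "region" or "cuisine", got {classifier!r}')
        folder = "cuisine_classifier" if classifier == "cuisine" else "region_classifier"

        components = ["cuisine_pipeline", "label_encoder"]
        paths = {}

        print("Downloading files from Hugging Face Hub...")
        for name in components:
            print(f"Downloading {name}.joblib ...")
            try:
                # Download (or reuse the locally cached copy of) one component file
                paths[name] = hf_hub_download(
                    repo_id="NoahMeissner/CuisineClassifier",
                    filename=f"{folder}/{name}.joblib",
                )
                print(f"{name} downloaded.")
            except Exception as e:
                print(f"Failed to download {name}: {e}")
                raise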

        print("Loading model components with joblib...")
        try:
            self.model = joblib.load(paths["cuisine_pipeline"])
            print("Model loaded.")
            self.label_encoder = joblib.load(paths["label_encoder"])
            print("Label encoder loaded.")
        except Exception as e:
            print(f"Failed to load components: {e}")
            raise

        print("All components loaded successfully.")

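    def classify(self, text_input):
        # The pipeline is called with one string per dish, so join the ingredient list with spaces
        data = " ".join(text_input)
        predicted_class = self.model.predict([data])
        # Decode the predicted class index back to its label; returns a one-element array
        predicted_label = self.label_encoder.inverse_transform(predicted_class)
        return predicted_label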