language:
  - en
license: mit
library_name: xgboost
pipeline_tag: text-classification
tags:
  - xgboost
  - multiclass
  - cuisine
  - region-classification
  - kaggle
metrics:
  - accuracy
  - f1
model-index:
  - name: CuisineClassifier
    results:
      - task:
          type: text-classification
          name: Cuisine (20 classes)
        dataset:
          name: What's Cooking? (Kaggle)
          type: whats-
          url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
          split: test
        metrics:
          - type: accuracy
            value: 0.77
          - type: f1
            value: 0.69
      - task:
          type: text-classification
          name: Region (5 classes)
        dataset:
          name: What's Cooking? (Kaggle) — aggregated to regions
          type: whats-cooking
          url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
          split: test
        metrics:
          - type: accuracy
            value: 0.89

🍽 Cuisine Classifier (XGBoost)

This model classifies dishes based on their ingredients and assigns them either to a Cuisine (20 classes) or a Region (5 classes).
It uses an XGBoost classifier trained on normalized ingredient data.

📊 Model Overview

Task: Multiclass Classification (Cuisines & Regions)
Input: List of ingredients (["salt", "flour", "sugar", ...])
Output: Cuisine class (e.g. "italian") or Region (e.g. "Central Europe")
Algorithm: XGBoost
Training Data: Kaggle What’s Cooking? dataset, with ingredients normalized using the AllRecipes dataset
Train/Test Split: 80 / 20, stratified
Cross Validation: 5-fold CV with random_state=42
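
The training code itself is not part of this README, so the following is only a minimal sketch of what a stratified 80/20 split means: each class is split separately so the test set keeps the class proportions. The function name, the toy labels, and the seed are assumptions for illustration. In practice, scikit-learn's train_test_split with its stratify argument does the same job.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Split indices so each class keeps roughly the same train/test ratio."""
    rng = random.Random(seed)  # seed value is an illustrative assumption

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels, not the real dataset.
labels = ["italian"] * 50 + ["mexican"] * 30 + ["thai"] * 20
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx))
print(dict(test_counts))
```

With the toy labels, every class contributes 20% of its samples to the test set, which is what keeps rare cuisines represented in evaluation.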

🌍 Region Mapping

Region	Cuisines
Central Europe	british, french, greek, irish, italian, russian, spanish
North America	cajun_creole, southern_us
Asia	chinese, filipino, indian, japanese, korean, thai, vietnamese
Middle East	moroccan
Latin America	mexican, jamaican, brazilian
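
The mapping above can be written as a lookup table, for example to aggregate cuisine labels into regions. This is a sketch: the dictionary contents come from the table, while the names REGION_TO_CUISINES, CUISINE_TO_REGION, and to_region are hypothetical and not part of the published model.

```python
# Region -> cuisines, copied from the Region Mapping table.
REGION_TO_CUISINES = {
    "Central Europe": ["british", "french", "greek", "irish", "italian", "russian", "spanish"],
    "North America": ["cajun_creole", "southern_us"],
    "Asia": ["chinese", "filipino", "indian", "japanese", "korean", "thai", "vietnamese"],
    "Middle East": ["moroccan"],
    "Latin America": ["mexican", "jamaican", "brazilian"],
}

# Invert the table so each cuisine points to its region.
CUISINE_TO_REGION = {
    cuisine: region
    for region, cuisines in REGION_TO_CUISINES.items()
    for cuisine in cuisines
}


def to_region(cuisine: str) -> str:
    """Return the region for a cuisine label; raises KeyError if unknown."""
    return CUISINE_TO_REGION[cuisine]


print(to_region("italian"))
print(len(CUISINE_TO_REGION))
```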

🧪 Performance

Model Comparison

Metric	Stratified Baseline	Logistic Regression	XGBoost
Precision (20 cuisines)	0.05	0.65	0.75
Recall (20 cuisines)	0.05	0.69	0.66
Macro F1 (20 cuisines)	0.05	0.67	0.69
Accuracy (20 cuisines)	0.10	0.75	0.77
Accuracy (5 regions)	0.27	0.89	0.89

✅ Conclusion:
XGBoost achieves the best precision, macro F1, and accuracy for the 20-class cuisine classification and clearly outperforms the stratified baseline; only recall is slightly higher for Logistic Regression (0.69 vs. 0.66).
For the 5-region setting, Logistic Regression and XGBoost reach the same accuracy (0.89); however, XGBoost provides more consistent results across classes.

Per-Region Metrics (5 Classes)

Region	Precision (XGB)	Recall (XGB)	F1 (XGB)
Asia	0.94	0.92	0.93
Central Europe	0.85	0.93	0.89
Latin America	0.92	0.88	0.90
Middle East	0.88	0.74	0.81
North America	0.87	0.76	0.81
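
To make the per-class columns easier to interpret, the sketch below recomputes F1 as the harmonic mean of precision and recall for each region and then takes the unweighted (macro) mean across regions. The numbers are the rounded values from the table above, so the results are approximate. The helper names are hypothetical.

```python
# (precision, recall, f1) per region, copied from the table above.
PER_REGION = {
    "Asia": (0.94, 0.92, 0.93),
    "Central Europe": (0.85, 0.93, 0.89),
    "Latin America": (0.92, 0.88, 0.90),
    "Middle East": (0.88, 0.74, 0.81),
    "North America": (0.87, 0.76, 0.81),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def macro_average(values):
    """Unweighted mean: every class counts equally, regardless of its size."""
    values = list(values)
    return sum(values) / len(values)


# Recompute F1 from the rounded precision and recall of each region.
recomputed_f1 = {
    region: f1_score(p, r) for region, (p, r, _) in PER_REGION.items()
}

macro_precision = macro_average(p for p, _, _ in PER_REGION.values())
macro_recall = macro_average(r for _, r, _ in PER_REGION.values())
macro_f1 = macro_average(f for _, _, f in PER_REGION.values())

for region, value in recomputed_f1.items():
    print(f"{region}: F1 = {value:.2f}")
print(f"macro P={macro_precision:.3f} R={macro_recall:.3f} F1={macro_f1:.3f}")
```

Because the inputs are already rounded to two decimals, a recomputed F1 can differ from the reported one in the last digit.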

🚀 How to Use

from huggingface_hub import hf_hub_download
import joblib

class CuisineClassifier:

    def __init__(self, classifier="region"):
        print("Initializing CuisineClassifier...")

        components = ["cuisine_pipeline", "label_encoder"]
        paths = {}

        # Files are stored under "region_classifier/" (default) or,
        # when classifier == "cuisine", under "cuisine_classifier/".
        folder = "cuisine_classifier" if classifier == "cuisine" else "region_classifier"

        print("Downloading files from Hugging Face Hub...")
        for name in components:
            print(f"Downloading {name}.joblib ...")
            try:
                paths[name] = hf_hub_download(
                    repo_id="NoahMeissner/CuisineClassifier",
                    filename=f"{folder}/{name}.joblib",
                )
                print(f"{name} downloaded.")
            except Exception as e:
                print(f"Failed to download {name}: {e}")
                raise

        print("Loading model components with joblib...")
        try:
            self.model = joblib.load(paths["cuisine_pipeline"])
            print("Model loaded.")
            self.label_encoder = joblib.load(paths["label_encoder"])
            print("Label encoder loaded.")
        except Exception as e:
            print(f"Failed to load components: {e}")
            raise

        print("All components loaded successfully.")

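    def classify(self, text_input):
        """Predict a label for a list of ingredient strings."""
        # The pipeline takes a single string, so join the ingredients with spaces.
        data = " ".join(text_input)
        predicted_class = self.model.predict([data])
        # Map the predicted class index back to its label name.
        predicted_label = self.label_encoder.inverse_transform(predicted_class)
        return predicted_label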
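
With the class above, a real call would look like CuisineClassifier(classifier="cuisine").classify(["salt", "flour", "sugar"]), which downloads the model files first. The sketch below only illustrates the input and output shapes of classify without any download. StubPipeline and StubLabelEncoder are hypothetical stand-ins, not the published model, and their predictions are meaningless.

```python
class StubPipeline:
    """Hypothetical stand-in for the downloaded pipeline."""

    def predict(self, texts):
        # Always predicts class index 0, one prediction per input string.
        return [0 for _ in texts]


class StubLabelEncoder:
    """Hypothetical stand-in for the downloaded label encoder."""

    classes_ = ["Asia", "Central Europe"]

    def inverse_transform(self, indices):
        return [self.classes_[i] for i in indices]


def classify(model, label_encoder, ingredients):
    # Same steps as CuisineClassifier.classify: join, predict, decode.
    data = " ".join(ingredients)
    predicted_class = model.predict([data])
    return label_encoder.inverse_transform(predicted_class)


result = classify(StubPipeline(), StubLabelEncoder(), ["soy sauce", "ginger", "rice"])
print(result[0])  # the result holds one label per input, so take the first
```

Note that classify returns a one-element sequence rather than a bare string, so callers usually index into it.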