---
language:
- en
license: mit
library_name: xgboost
pipeline_tag: text-classification
tags:
- xgboost
- multiclass
- cuisine
- region-classification
- kaggle
metrics:
- accuracy
- f1
model-index:
- name: CuisineClassifier
  results:
  - task:
      type: text-classification
      name: Cuisine (20 classes)
    dataset:
      name: What's Cooking? (Kaggle)
      type: whats-cooking
      url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
      split: test
    metrics:
    - type: accuracy
      value: 0.77
    - type: f1
      value: 0.69
  - task:
      type: text-classification
      name: Region (5 classes)
    dataset:
      name: What's Cooking? (Kaggle) — aggregated to regions
      type: whats-cooking
      url: https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset
      split: test
    metrics:
    - type: accuracy
      value: 0.89
---

# 🍽 Cuisine Classifier (XGBoost)
This model classifies dishes by their ingredients and assigns each one to either a **Cuisine (20 classes)** or a **Region (5 classes)**, depending on which classifier is loaded.  
It uses an **XGBoost classifier** trained on normalized ingredient data.

---

## 📊 Model Overview

- **Task**: Multiclass Classification (Cuisines & Regions)  
- **Input**: List of ingredients (`["salt", "flour", "sugar", ...]`)  
- **Output**: Cuisine class (e.g. `"italian"`) or Region (e.g. `"Central Europe"`)  
- **Algorithm**: [XGBoost](https://xgboost.ai/)  
- **Training Data**: Kaggle [*What’s Cooking?*](https://www.kaggle.com/datasets/kaggle/recipe-ingredients-dataset) dataset, with ingredients normalized using the AllRecipes dataset  
- **Train/Test Split**: 80 / 20, stratified  
- **Cross Validation**: 5-fold CV with `random_state=42`

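The stratified 80 / 20 split listed above keeps each cuisine's share roughly equal in the training and test sets. The following is a minimal standard-library sketch of that idea only; the function name `stratified_split` and the toy recipes are illustrative assumptions, not the code used to train this model. In practice, scikit-learn's `train_test_split(..., stratify=labels)` is the usual way to do this.

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, test_fraction=0.2, seed=42):
    """Split so that each label keeps about `test_fraction` of its samples in the test set."""
    # Group sample indices by their class label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Keep at least one test sample per class.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    train = [(samples[i], labels[i]) for i in train_idx]
    test = [(samples[i], labels[i]) for i in test_idx]
    return train, test


# Toy data (hypothetical): 10 "italian" and 5 "thai" recipes.
recipes = [["pasta", "basil"]] * 10 + [["lemongrass", "chili"]] * 5
cuisines = ["italian"] * 10 + ["thai"] * 5

train, test = stratified_split(recipes, cuisines)
print(len(train), len(test))
```
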
### 🌍 Region Mapping
| Region          | Cuisines                                                   |
|-----------------|-----------------------------------------------------------|
| Central Europe  | british, french, greek, irish, italian, russian, spanish  |
| North America   | cajun_creole, southern_us                                 |
| Asia            | chinese, filipino, indian, japanese, korean, thai, vietnamese |
| Middle East     | moroccan                                                  |
| Latin America   | mexican, jamaican, brazilian                              |

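The region table above can be written as a lookup, which is handy for aggregating cuisine labels into regions yourself. This is a sketch built from the table; the names `REGION_TO_CUISINES`, `CUISINE_TO_REGION`, and `to_region` are illustrative and not part of the published model files.

```python
# Region mapping copied from the table above.
REGION_TO_CUISINES = {
    "Central Europe": ["british", "french", "greek", "irish", "italian", "russian", "spanish"],
    "North America": ["cajun_creole", "southern_us"],
    "Asia": ["chinese", "filipino", "indian", "japanese", "korean", "thai", "vietnamese"],
    "Middle East": ["moroccan"],
    "Latin America": ["mexican", "jamaican", "brazilian"],
}

# Invert the table so each cuisine label points to its region.
CUISINE_TO_REGION = {
    cuisine: region
    for region, cuisines in REGION_TO_CUISINES.items()
    for cuisine in cuisines
}


def to_region(cuisine_label):
    """Return the region for a cuisine label; raises KeyError for unknown labels."""
    return CUISINE_TO_REGION[cuisine_label]


print(to_region("italian"))
print(len(CUISINE_TO_REGION))
```
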

---

## 🧪 Performance

### Model Comparison

| Metric | Stratified Baseline | Logistic Regression | XGBoost |
|-------|----------------------|---------------------|---------|
| **Precision (20 cuisines)** | 0.05 | 0.65 | **0.75** |
| **Recall (20 cuisines)**    | 0.05 | **0.69** | 0.66 |
| **Macro F1 (20 cuisines)**  | 0.05 | 0.67 | **0.69** |
| **Accuracy (20 cuisines)**  | 0.10 | 0.75 | **0.77** |
| **Accuracy (5 regions)**    | 0.27 | **0.89** | **0.89** |

✅ **Conclusion:**  
XGBoost achieves the best results for the 20-class cuisine classification and clearly outperforms the baseline.  
For the 5-region setting, Logistic Regression and XGBoost reach the same accuracy (0.89). XGBoost, however, provides more consistent results across classes.

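To put the baseline column in context, the sketch below computes what a stratified random guesser is expected to score. It assumes the baseline draws labels at random in proportion to the class frequencies (as scikit-learn's `DummyClassifier(strategy="stratified")` does); the card does not confirm this. Under that assumption, expected accuracy is the sum of squared class priors, and macro recall is always `1 / n_classes`, which for 20 cuisines is 0.05. The class counts in the example are hypothetical, not taken from the dataset.

```python
def stratified_baseline_expectations(class_counts):
    """Expected accuracy and macro recall of a guesser that samples labels from the class priors."""
    total = sum(class_counts.values())
    priors = {label: count / total for label, count in class_counts.items()}

    # P(correct) = sum over classes of P(true = k) * P(predicted = k) = sum of p_k squared.
    accuracy = sum(p * p for p in priors.values())

    # Recall for class k equals P(predicted = k) = p_k, so the macro average is 1 / n_classes.
    macro_recall = sum(priors.values()) / len(priors)
    return accuracy, macro_recall


# Hypothetical, imbalanced class counts (not the real dataset distribution).
counts = {"italian": 600, "mexican": 300, "thai": 100}
accuracy, macro_recall = stratified_baseline_expectations(counts)
print(round(accuracy, 2), round(macro_recall, 2))
```
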
---

### Per-Region Metrics (5 Classes)

| Region          | Precision (XGB) | Recall (XGB) | F1 (XGB) |
|-----------------|------------------|--------------|----------|
| Asia           | 0.94 | 0.92 | 0.93 |
| Central Europe | 0.85 | **0.93** | 0.89 |
| Latin America  | 0.92 | 0.88 | 0.90 |
| Middle East    | **0.88** | 0.74 | 0.81 |
| North America  | **0.87** | 0.76 | 0.81 |

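The F1 column is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded values in the table; because the inputs are already rounded to two decimals, a recomputed value can differ from the reported one by about 0.01. The helper name `f1_from_pr` is illustrative.

```python
# (precision, recall, reported F1) per region, copied from the table above.
reported = {
    "Asia": (0.94, 0.92, 0.93),
    "Central Europe": (0.85, 0.93, 0.89),
    "Latin America": (0.92, 0.88, 0.90),
    "Middle East": (0.88, 0.74, 0.81),
    "North America": (0.87, 0.76, 0.81),
}


def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for region, (precision, recall, f1_reported) in reported.items():
    f1 = f1_from_pr(precision, recall)
    print(f"{region}: recomputed F1 = {f1:.2f}, reported = {f1_reported:.2f}")
```
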
---

## 🚀 How to Use

```python
from huggingface_hub import hf_hub_download
import joblib

class CuisineClassifier:

    def __init__(self, classifier="region"):
        print("Initializing CuisineClassifier...")

        components = ["cuisine_pipeline", "label_encoder"]
        paths = {}

        # Pick the repo subfolder: the cuisine model (20 classes) when requested,
        # otherwise the region model (5 classes, the default).
        folder = "cuisine_classifier" if classifier == "cuisine" else "region_classifier"

        print("Downloading files from Hugging Face Hub...")
        for name in components:
            print(f"Downloading {name}.joblib ...")
            try:
                paths[name] = hf_hub_download(
                    repo_id="NoahMeissner/CuisineClassifier",
                    filename=f"{folder}/{name}.joblib",
                )
                print(f"{name} downloaded.")
            except Exception as e:
                print(f"Failed to download {name}: {e}")
                raise

        print("Loading model components with joblib...")
        try:
            self.model = joblib.load(paths["cuisine_pipeline"])
            print("Model loaded.")
            self.label_encoder = joblib.load(paths["label_encoder"])
            print("Label encoder loaded.")
        except Exception as e:
            print(f"Failed to load components: {e}")
            raise

        print("All components loaded successfully.")

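    def classify(self, text_input):
        """Predict the label for one dish given as a list of ingredient strings."""
        # The pipeline expects a single string, so join the ingredients with spaces.
        data = " ".join(text_input)
        predicted_class = self.model.predict([data])
        # Map the encoded class back to its name; the result is an array with one label.
        predicted_label = self.label_encoder.inverse_transform(predicted_class)
        return predicted_label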