🍽 Cuisine Classifier (XGBoost)
This model classifies dishes based on their ingredients and assigns them either to a Cuisine (20 classes) or a Region (5 classes).
It uses an XGBoost classifier trained on normalized ingredient data.
📊 Model Overview
- Task: Multiclass Classification (Cuisines & Regions)
- Input: List of ingredients (["salt", "flour", "sugar", ...])
- Output: Cuisine class (e.g. "italian") or Region (e.g. "Central Europe")
- Algorithm: XGBoost
- Training Data: Kaggle What’s Cooking? dataset, ingredients normalized using AllRecipes dataset
- Train/Test Split: 80 / 20, stratified
- Cross Validation: 5-fold CV with random_state=42
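The card does not show the splitting code, so the following is only a minimal standard-library sketch of what a stratified 80 / 20 split does: each class is shuffled and split separately, so class proportions are roughly preserved in both parts. The function name `stratified_split` and the toy labels are hypothetical; in practice scikit-learn's `train_test_split(..., stratify=y)` is the usual tool.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with roughly equal class proportions."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within the class, then cut off the test share.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels, made up for illustration: 10 italian, 5 thai, 5 mexican.
labels = ["italian"] * 10 + ["thai"] * 5 + ["mexican"] * 5
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

With these toy labels, each class contributes 20% of its samples to the test part, which is the property that keeps rare cuisines represented in evaluation.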
🌍 Region Mapping
| Region | Cuisines |
|---|---|
| Central Europe | british, french, greek, irish, italian, russian, spanish |
| North America | cajun_creole, southern_us |
| Asia | chinese, filipino, indian, japanese, korean, thai, vietnamese |
| Middle East | moroccan |
| Latin America | mexican, jamaican, brazilian |
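The region mapping can be written as a lookup table. The sketch below copies the table into a dictionary and inverts it; the names `REGION_TO_CUISINES`, `CUISINE_TO_REGION`, and `to_region` are hypothetical. The card does not say whether the region model was trained on relabelled data or whether cuisine predictions are mapped afterwards, so this only illustrates the mapping itself.

```python
# Region -> cuisines, copied from the Region Mapping table.
REGION_TO_CUISINES = {
    "Central Europe": ["british", "french", "greek", "irish",
                       "italian", "russian", "spanish"],
    "North America": ["cajun_creole", "southern_us"],
    "Asia": ["chinese", "filipino", "indian", "japanese",
             "korean", "thai", "vietnamese"],
    "Middle East": ["moroccan"],
    "Latin America": ["mexican", "jamaican", "brazilian"],
}

# Inverted view: cuisine -> region.
CUISINE_TO_REGION = {
    cuisine: region
    for region, cuisines in REGION_TO_CUISINES.items()
    for cuisine in cuisines
}


def to_region(cuisine):
    """Look up the region for one of the 20 cuisine labels."""
    return CUISINE_TO_REGION[cuisine]


print(to_region("thai"))
```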
🧪 Performance
Model Comparison
| Metric | Stratified Baseline | Logistic Regression | XGBoost |
|---|---|---|---|
| Precision (20 cuisines) | 0.05 | 0.65 | 0.75 |
| Recall (20 cuisines) | 0.05 | 0.69 | 0.66 |
| Macro F1 (20 cuisines) | 0.05 | 0.67 | 0.69 |
| Accuracy (20 cuisines) | 0.10 | 0.75 | 0.77 |
| Accuracy (5 regions) | 0.27 | 0.89 | 0.89 |
✅ Conclusion:
XGBoost achieves the best precision, macro F1, and accuracy for the 20-class cuisine classification, although Logistic Regression has the higher recall (0.69 vs. 0.66). Both models clearly outperform the stratified baseline.
For the 5-region setting, Logistic Regression and XGBoost reach the same accuracy (0.89); however, XGBoost provides more consistent results across classes.
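A note on reading the macro scores: macro F1 is conventionally the unweighted mean of the per-class F1 values, which is generally not the harmonic mean of macro precision and macro recall. The sketch below shows this on made-up counts; the class names and numbers are hypothetical, and it assumes the precision and recall rows of the comparison table are macro-averaged as well.

```python
def per_class_scores(tp, fp, fn):
    """Precision, recall and F1 for one class from its confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical (tp, fp, fn) counts for three classes, for illustration only.
counts = {"a": (90, 10, 10), "b": (40, 10, 40), "c": (5, 15, 5)}
scores = [per_class_scores(*c) for c in counts.values()]

# Macro averaging: every class has the same weight, regardless of its size.
macro_precision = sum(s[0] for s in scores) / len(scores)
macro_recall = sum(s[1] for s in scores) / len(scores)
macro_f1 = sum(s[2] for s in scores) / len(scores)

# For comparison: harmonic mean of the two macro averages.
harmonic_of_macros = (
    2 * macro_precision * macro_recall / (macro_precision + macro_recall)
)

print(f"macro F1 = {macro_f1:.3f}, harmonic of macros = {harmonic_of_macros:.3f}")
```

Because small classes count as much as large ones, a few weak minority cuisines can pull macro F1 well below accuracy, which fits the gap between 0.69 and 0.77 in the XGBoost column.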
Per-Region Metrics (5 Classes)
| Region | Precision (XGB) | Recall (XGB) | F1 (XGB) |
|---|---|---|---|
| Asia | 0.94 | 0.92 | 0.93 |
| Central Europe | 0.85 | 0.93 | 0.89 |
| Latin America | 0.92 | 0.88 | 0.90 |
| Middle East | 0.88 | 0.74 | 0.81 |
| North America | 0.87 | 0.76 | 0.81 |
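As a quick consistency check, the F1 column can be recomputed from the precision and recall columns with F1 = 2PR / (P + R). The sketch below uses only the rounded values from the per-region table, so small deviations are expected; for Middle East it yields about 0.80 against the reported 0.81, which is within what rounding of the inputs can explain.

```python
# (precision, recall, reported F1) per region, copied from the table.
rows = {
    "Asia": (0.94, 0.92, 0.93),
    "Central Europe": (0.85, 0.93, 0.89),
    "Latin America": (0.92, 0.88, 0.90),
    "Middle East": (0.88, 0.74, 0.81),
    "North America": (0.87, 0.76, 0.81),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


recomputed = {region: f1_score(p, r) for region, (p, r, _) in rows.items()}

for region, (_, _, reported) in rows.items():
    print(f"{region:15s} reported={reported:.2f} recomputed={recomputed[region]:.3f}")
```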
🚀 How to Use
from huggingface_hub import hf_hub_download
import joblib


class CuisineClassifier:
    def __init__(self, classifier="region"):
        print("Initializing CuisineClassifier...")
        components = ["cuisine_pipeline", "label_encoder"]
        # Pick the repository subfolder for the requested classifier;
        # anything other than "cuisine" falls back to the region model.
        folder = "cuisine_classifier" if classifier == "cuisine" else "region_classifier"
        paths = {}

        print("Downloading files from Hugging Face Hub...")
        for name in components:
            print(f"Downloading {name}.joblib ...")
            try:
                paths[name] = hf_hub_download(
                    repo_id="NoahMeissner/CuisineClassifier",
                    filename=f"{folder}/{name}.joblib",
                )
                print(f"{name} downloaded.")
            except Exception as e:
                print(f"Failed to download {name}: {e}")
                raise

        print("Loading model components with joblib...")
        try:
            self.model = joblib.load(paths["cuisine_pipeline"])
            print("Model loaded.")
            self.label_encoder = joblib.load(paths["label_encoder"])
            print("Label encoder loaded.")
        except Exception as e:
            print(f"Failed to load components: {e}")
            raise
        print("All components loaded successfully.")

    def classify(self, text_input):
        # The pipeline expects one space-joined string per dish.
        data = " ".join(text_input)
        predicted_class = self.model.predict([data])
        # Map the encoded class id back to its cuisine or region name.
        predicted_label = self.label_encoder.inverse_transform(predicted_class)
        return predicted_label
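To illustrate the data flow of `classify` without downloading anything, here is a sketch that replaces the real pipeline and label encoder with stand-ins. `StubPipeline`, `StubLabelEncoder`, and their toy rule are hypothetical and say nothing about how the real model behaves; the point is only that the ingredient list is joined into one string, a batch of one is predicted, and the result is a one-element sequence.

```python
class StubPipeline:
    """Stand-in for the downloaded pipeline: one class id per input string."""

    def predict(self, texts):
        # Toy rule for illustration only: "soy" -> class 0, otherwise class 1.
        return [0 if "soy" in text else 1 for text in texts]


class StubLabelEncoder:
    """Stand-in for the label encoder: maps class ids back to names."""

    classes_ = ["Asia", "Central Europe"]

    def inverse_transform(self, class_ids):
        return [self.classes_[i] for i in class_ids]


def classify(model, label_encoder, ingredients):
    # Same three steps as CuisineClassifier.classify.
    data = " ".join(ingredients)
    predicted_class = model.predict([data])
    return label_encoder.inverse_transform(predicted_class)


result = classify(StubPipeline(), StubLabelEncoder(), ["soy sauce", "ginger", "rice"])
label = result[0]  # the result holds one label per input dish
print(label)
```

With the real class, the call would look like `CuisineClassifier(classifier="cuisine").classify(["salt", "flour", "sugar"])`, and the first element of the returned array-like is the predicted label.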