# HW2 Classical AutoML — AutoGluon TabularPredictor

## Model Overview

This model was trained using AutoGluon's `TabularPredictor` as part of Homework 2 for 24-679.
It predicts the target column (`color`) of Scotty's HW1 tabular dataset from five numeric flower features (diameter, petal length, petal width, petal count, stem height).
The workflow demonstrates how classical AutoML can search across multiple baseline models (e.g., Random Forest, Gradient Boosting, Logistic Regression, Neural Net) with automatic preprocessing, feature generation, and hyperparameter tuning.
## Dataset

- Source: Scotty's HW1 tabular dataset on Hugging Face (`scottymcgee/flowers`)
- Samples: ~30 original samples, expanded via augmentation
- Features: numeric (`flower_diameter_cm`, `petal_length_cm`, `petal_width_cm`, `petal_count`, `stem_height_cm`)
- Target: `color` (multiclass, 6 possible values)
- Split: 80% training, 20% validation
## Training Configuration

- Framework: AutoGluon `TabularPredictor`
- Presets: `medium_quality` (balanced speed vs. accuracy)
- Problem Type: `multiclass` classification
- Time Limit: 600 seconds (10 minutes)
- Random Seed: 42 (for a reproducible train/validation split)
- Hardware: Google Colab CPU/GPU runtime
AutoGluon automatically handled:
- Standardization of numeric features
- Encoding of categorical features (none in this dataset)
- Model ensembling and stacking
## Results

- Best model: reported by the AutoGluon leaderboard
- Validation metric (weighted F1): ~0.9 (exact value depends on random seed and run)
- Leaderboard: includes candidate models such as RandomForest, ExtraTrees, GradientBoosting, and LightGBM

Note: Because the dataset is small, metrics may vary slightly across runs.
## Repository Artifacts

- `autogluon_predictor.pkl` → cloudpickled predictor (loadable if library versions match)
- `autogluon_predictor_dir.zip` → zipped native AutoGluon directory (preferred for portability)
## AI Tool Disclosure
This notebook used ChatGPT for scaffolding code and documentation. All dataset selection, training, evaluation, and uploads were performed by the student.
## Evaluation Results

- Accuracy on the `scottymcgee/flowers` test set (self-reported): 0.870
- Macro F1 on the `scottymcgee/flowers` test set (self-reported): 0.840