Model Card for Image AutoML Predictor
Binary/multiclass image classifier trained with AutoGluon MultiModal on the augmented split of ccm/2025-24679-image-dataset to predict survey-derived image labels. Metrics are reported on a held-out test portion of the augmented split, and the model is additionally evaluated on the original split as external validation. Artifacts include (1) a zipped native AutoGluon predictor directory (recommended) and (2) a cloudpickled predictor (for convenience).
Model Details
Model Description
- Developed by: Fall 2025 24-679 (CMU), instructor: Christopher McComb
- Shared by: Christopher McComb
- Model type: AutoML (AutoGluon MultiModalPredictor with ResNet18 backbone)
- Task: Image classification
- Target column: label
- License: MIT
- Framework: autogluon.multimodal
- Repo artifacts:
  - autogluon_image_predictor_dir.zip (zipped native predictor directory)
  - autogluon_image_predictor.pkl (cloudpickled predictor)
Uses
Direct Use
- Classroom demos of AutoML for image classification
- Baseline experiments for augmentation vs. generalization
- Comparing augmented vs original split performance
Out-of-Scope Use
- Production deployment where decisions carry sensitive or real-world stakes
- Generalization beyond course context or survey-specific images
Bias, Risks, and Limitations
- Synthetic data inflation: Augmented data may artificially boost in-split accuracy.
- Limited representativeness: The original dataset is small, student-generated, and not diverse.
- Label noise: Survey/image associations may be noisy or inconsistent.
Recommendations
- Always report both augmented-test and original-validation metrics.
- Emphasize didactic use cases (education, experimentation).
- Use consistent random seeds and splits for reproducibility.
How to Get Started with the Model
import pathlib
import shutil
import zipfile

import huggingface_hub as hf
from autogluon.multimodal import MultiModalPredictor

REPO = "ccm/2025-24679-image-autogluon-predictor"
ZIPNAME = "autogluon_image_predictor_dir.zip"

# Download the zipped native predictor directory from the Hub.
dest = pathlib.Path("hf_download")
dest.mkdir(exist_ok=True)
zip_path = hf.hf_hub_download(
    repo_id=REPO,
    filename=ZIPNAME,
    repo_type="model",
    local_dir=str(dest),
    local_dir_use_symlinks=False,
)

# Extract into a clean directory so stale files never mix with the new download.
extract_dir = dest / "predictor_dir"
if extract_dir.exists():
    shutil.rmtree(extract_dir)
extract_dir.mkdir(parents=True, exist_ok=True)
with zipfile.ZipFile(zip_path, "r") as zf:
    zf.extractall(str(extract_dir))

# Load the predictor and run inference.
# test_df is not defined here: it is assumed to be a pandas DataFrame with an
# "image" column in the same format that was used at training time.
predictor = MultiModalPredictor.load(str(extract_dir))
preds = predictor.predict(test_df[["image"]])
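The cloudpickled artifact can be loaded without unzipping anything. The following is a minimal sketch, assuming an AutoGluon installation compatible with the one used for training and a file from a trusted source, since unpickling can execute arbitrary code. The helper names `load_pickled_predictor` and `download_and_load` are hypothetical; the repo and file names are the ones listed in this card.

```python
import pickle
from pathlib import Path


def load_pickled_predictor(pkl_path):
    """Load a (cloud)pickled object from disk.

    Objects written by cloudpickle can be read back with the standard
    pickle module, but the classes they reference (here, AutoGluon) must
    be importable. Only unpickle files you trust.
    """
    pkl_path = Path(pkl_path)
    if not pkl_path.is_file():
        raise FileNotFoundError(f"No pickle file at {pkl_path}")
    with pkl_path.open("rb") as fh:
        return pickle.load(fh)


def download_and_load(
    repo_id="ccm/2025-24679-image-autogluon-predictor",
    filename="autogluon_image_predictor.pkl",
):
    """Fetch the pickled predictor from the Hub and load it."""
    # Imported lazily so load_pickled_predictor works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id=repo_id, filename=filename, repo_type="model"
    )
    return load_pickled_predictor(local_path)
```

The zipped native directory remains the recommended route, because pickles tend to be more sensitive to library version differences between the saving and loading environments.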
Training Details
Training Data
- Dataset: ccm/2025-24679-image-dataset
- Splits:
  - Augmented: 80/20 train/test with stratification (random_state=42)
  - Validation: 20% of the training portion held out as a validation split
  - External validation: Entire original split (unused in training)
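The split scheme above can be sketched with the standard library alone. This is an illustration of stratified 80/20 splitting followed by a 20% validation carve-out, not the card's actual code: the `random_state=42` naming suggests scikit-learn's `train_test_split`, which would produce a different partition than this stand-in. The function name `stratified_split` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_frac, seed=42):
    """Return (train_idx, test_idx) with each class split at ~test_frac."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    # Iterate classes in a fixed order so the result depends only on the seed.
    for label in sorted(by_class, key=str):
        members = by_class[label]
        rng.shuffle(members)
        n_test = round(len(members) * test_frac)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels standing in for the augmented split (hypothetical data).
labels = ["a"] * 50 + ["b"] * 50

# Step 1: 80/20 train/test on the augmented split.
trainval_idx, test_idx = stratified_split(labels, test_frac=0.2, seed=42)

# Step 2: hold out 20% of the training portion as a validation split.
trainval_labels = [labels[i] for i in trainval_idx]
fit_pos, val_pos = stratified_split(trainval_labels, test_frac=0.2, seed=42)
fit_idx = [trainval_idx[i] for i in fit_pos]
val_idx = [trainval_idx[i] for i in val_pos]

print(len(fit_idx), len(val_idx), len(test_idx))  # 64 16 20
```

The original split never enters either step, which is what lets it serve as external validation.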
Training Procedure
- Library: AutoGluon MultiModal
- Presets: "medium_quality"
- Backbone: timm_image (resnet18 checkpoint)
- Training time limit: default (training finishes in a few minutes)
- Eval metric: Accuracy
Hyperparameters
- model.names: timm_image
- checkpoint: resnet18
- presets: medium_quality
- random_state: 42
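The settings above might translate into an AutoGluon call along the following lines. This is a sketch under assumptions, not the training script used for this model: the full key `model.timm_image.checkpoint_name` is assumed to be what the card abbreviates as "checkpoint", passing 42 as the `seed` argument of `fit()` is assumed to correspond to `random_state`, and the helper names `build_fit_kwargs` and `train` are hypothetical.

```python
def build_fit_kwargs(seed=42):
    """Collect the fit() arguments implied by the settings listed above."""
    return {
        "presets": "medium_quality",
        "hyperparameters": {
            # Restrict the model to the timm image backbone ...
            "model.names": ["timm_image"],
            # ... with the ResNet18 checkpoint. The full key name is an
            # assumption; the card lists it only as "checkpoint".
            "model.timm_image.checkpoint_name": "resnet18",
        },
        # Assumed mapping of the card's random_state onto fit()'s seed.
        "seed": seed,
    }


def train(train_df, val_df=None, label="label"):
    """Fit a MultiModalPredictor on DataFrames with image and label columns."""
    # Imported lazily so the config helper can be inspected without AutoGluon.
    from autogluon.multimodal import MultiModalPredictor

    predictor = MultiModalPredictor(label=label, eval_metric="accuracy")
    predictor.fit(train_data=train_df, tuning_data=val_df, **build_fit_kwargs())
    return predictor
```

Keeping the configuration in one function makes it easy to log alongside the reported metrics, which supports the reproducibility recommendation earlier in this card.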
Evaluation
Testing Data
- Augmented test: Held-out 20% of augmented split
- External validation: Entire original split
Metrics
- Accuracy: Fraction of predictions that are correct
- Weighted F1: Per-class F1 (harmonic mean of precision and recall), averaged with weights proportional to each class's support
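To make the two metrics concrete, here is a small dependency-free sketch. It is meant to mirror the usual definitions (for example, scikit-learn's `f1_score` with `average="weighted"`), not to reproduce the evaluation code behind the numbers in this card. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    support = Counter(y_true)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        n_pred = sum(p == cls for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        # F1 is defined as 0 when there are no true positives.
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += f1 * n_true
    return total / len(y_true)


# Toy example (hypothetical labels): 3 cats and 1 dog, one cat misclassified.
y_true = ["cat", "cat", "cat", "dog"]
y_pred = ["cat", "cat", "dog", "dog"]
print(accuracy(y_true, y_pred))               # 0.75
print(round(weighted_f1(y_true, y_pred), 4))  # 0.7667
```

In the toy example, "cat" has F1 = 0.8 with support 3 and "dog" has F1 = 2/3 with support 1, so the weighted average is (0.8 × 3 + 2/3 × 1) / 4 ≈ 0.7667.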
Results (example; replace with actuals)
- Augmented test: Accuracy = 0.7429, Weighted F1 = 0.7392
- Original validation: Accuracy = 0.8621, Weighted F1 = 0.8620
Environmental Impact
- Hardware: Single GPU (short run)
- Training wall-time: < 10 minutes
- Estimated emissions: negligible
- Cloud provider: N/A (depends on student setup)
See the ML CO2 calculator for custom estimates.
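For a rough back-of-the-envelope figure, emissions can be approximated as energy consumed multiplied by the carbon intensity of the local grid. The sketch below shows that arithmetic. The function name is hypothetical, and the inputs (a 250 W GPU, a 10-minute run, 0.4 kg CO2e per kWh) are illustrative placeholders, not measurements from this model's training run.

```python
def estimate_emissions_kg(power_watts, hours, carbon_intensity_kg_per_kwh, pue=1.0):
    """Estimate CO2-equivalent emissions as energy used times grid intensity.

    pue is the data-center power usage effectiveness (1.0 means no overhead).
    """
    if min(power_watts, hours, carbon_intensity_kg_per_kwh, pue) < 0:
        raise ValueError("inputs must be non-negative")
    energy_kwh = power_watts / 1000.0 * hours * pue
    return energy_kwh * carbon_intensity_kg_per_kwh


# Illustrative inputs only (assumptions, not measured values).
estimate = estimate_emissions_kg(
    power_watts=250,
    hours=10 / 60,
    carbon_intensity_kg_per_kwh=0.4,
)
print(f"{estimate:.4f} kg CO2e")
```

Substituting the actual hardware power draw and regional grid intensity gives an estimate specific to a given student setup.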
Model Card Contact
Christopher McComb, ccm@cmu.edu