🎡 Spotify Song Popularity Prediction

Predict the popularity of a song based on its audio features and estimate potential Spotify royalties.

License: MIT


📖 Project Overview

This project explores machine learning models to predict the popularity of songs using publicly available features such as danceability, energy, tempo, and valence. It also demonstrates a prototype pricing tool that estimates potential Spotify revenue based on predicted popularity.

Accurately forecasting popularity is difficult because it depends on factors that change over time. Even so, our models show that a song's minimum popularity and expected revenue can be estimated with machine learning techniques.


📊 Dataset

  • Source:
    • Spotify Web API
    • Original Dataset (114,000 songs) expanded to **2 million songs**
  • Features:
    • Acoustic features (energy, danceability, valence, etc.)
    • Target variable: popularity (integer from 0–100)

🔬 Methods

  • Data Cleaning and Preparation:
    • Removed zero-popularity entries, duplicates (~8% of rows), and outliers
    • Standardized genres using clustering
  • Exploratory Data Analysis (EDA):
    • Analyzed distributions, correlations, and cumulative trends
  • Modeling:
    • Linear Regression, Ridge Regression
    • Decision Tree, Random Forest, AdaBoost (best recall: 86% on popular songs)
    • XGBoost (binning) and Neural Networks
  • Revenue Estimation:
    • Quadratic regression fit between predicted popularity and play counts
    • Prototype pricing tool predicting Spotify revenue for songs
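
The cleaning step listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the field names `track_name` and `artist`, the choice of `tempo` as the feature to screen, and the 1.5 × IQR outlier rule are all assumptions, since the source does not say how duplicates or outliers were identified.

```python
from statistics import quantiles


def clean_tracks(tracks, key_fields=("track_name", "artist"), feature="tempo"):
    """Drop zero-popularity rows, duplicates, and IQR outliers on one feature.

    `tracks` is a list of dicts; the field names are hypothetical.
    """
    # 1. Remove entries whose popularity is zero.
    nonzero = [t for t in tracks if t["popularity"] > 0]

    # 2. Remove duplicates, keeping the first row seen for each key.
    seen, unique = set(), []
    for t in nonzero:
        key = tuple(t[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(t)

    # 3. Remove outliers with the 1.5 * IQR rule (an assumed criterion).
    if len(unique) < 4:
        return unique  # too few rows for meaningful quartiles
    q1, _, q3 = quantiles([t[feature] for t in unique], n=4)
    spread = 1.5 * (q3 - q1)
    return [t for t in unique if q1 - spread <= t[feature] <= q3 + spread]
```

On the full dataset the same three steps would more naturally be written with pandas (`drop_duplicates` and boolean masks); the plain-Python version is used here only to keep the sketch dependency-free.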

🏆 Results

| Model | Highlights |
|---|---|
| Linear/Ridge Regression | Poor fit due to complex, noisy data |
| Random Forest | Best overall stability (recall on popular songs) |
| AdaBoost (weighted) | Best performance: 86% recall for popular songs |
| Neural Networks | Struggled because "popularity" is an unstable target |

  • Predicted revenue for a song with popularity 55 ≈ $357,000 CAD.
  • Pricing tool demonstrated practical viability despite prediction limitations.
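
For readers unfamiliar with the metric, recall on popular songs is the fraction of truly popular songs that the model also labels popular. The sketch below shows one way to compute it from popularity scores. The cutoff of 50 is a hypothetical choice; the source does not state which threshold defines a "popular" song.

```python
def recall_for_popular(y_true, y_pred, threshold=50):
    """Recall of the 'popular' class after thresholding popularity scores.

    `threshold` is an assumed cutoff, not the one used in the project.
    """
    actual = [score >= threshold for score in y_true]
    predicted = [score >= threshold for score in y_pred]
    true_pos = sum(a and p for a, p in zip(actual, predicted))
    false_neg = sum(a and not p for a, p in zip(actual, predicted))
    if true_pos + false_neg == 0:
        return 0.0  # no popular songs in y_true
    return true_pos / (true_pos + false_neg)
```

A high recall on this class means few genuinely popular songs are missed, which matters more for a revenue tool than overall accuracy on a dataset dominated by low-popularity tracks.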

📈 Example

Predicting a song's revenue based on its feature vector:

# Example (simplified)
predicted_popularity = model.predict(features)
predicted_revenue = pricing_function(predicted_popularity)
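
As a rough illustration of what a `pricing_function` like the one above might look like, the sketch below fits a quadratic from popularity to play counts by least squares and multiplies the predicted plays by a per-stream payout. The helper names `fit_quadratic` and `make_pricing_function`, and the `payout_per_stream` parameter, are hypothetical; the source gives neither the fitted coefficients nor the payout rate.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c via the normal equations."""
    # Power sums S[k] = sum(x**k) and moment sums T[k] = sum(y * x**k).
    S = [sum(x ** k for x in xs) for k in range(5)]
    T = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x3 system for the unknowns (a, b, c).
    M = [
        [S[4], S[3], S[2], T[2]],
        [S[3], S[2], S[1], T[1]],
        [S[2], S[1], S[0], T[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting
    # (needs at least three distinct x values).
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(3):
            if row != col:
                factor = M[row][col] / M[col][col]
                M[row] = [v - factor * p for v, p in zip(M[row], M[col])]
    return tuple(M[i][3] / M[i][i] for i in range(3))


def make_pricing_function(coeffs, payout_per_stream):
    """Build popularity -> revenue from quadratic coefficients and a payout rate."""
    a, b, c = coeffs

    def pricing_function(popularity):
        # Clamp at zero so the quadratic never yields negative play counts.
        plays = max(0.0, a * popularity ** 2 + b * popularity + c)
        return plays * payout_per_stream

    return pricing_function
```

In use, `fit_quadratic` would be called on (popularity, play count) pairs from the dataset, and the resulting coefficients passed to `make_pricing_function` together with whatever payout rate is assumed.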

🚀 How to Run

# Clone this repo
git clone https://huggingface.co/username/spotify-popularity-prediction

# Install dependencies
pip install -r requirements.txt

# Train or evaluate models
python train_models.py
python evaluate_models.py

# Predict song revenue
python pricing_tool.py

The scripts can be adapted to different model types (AdaBoost, Random Forest, neural network).


🤔 Limitations

  • Song features alone are not sufficient for high-accuracy predictions.
  • "Popularity" is a time-dependent and dynamic metric.
  • Genre diversity (>5000 unique genres) complicated modeling.

🧠 Future Work

  • Predict play count directly instead of popularity.
  • Fine-tune XGBoost and deep neural networks on larger datasets.
  • Integrate time-evolution models for dynamic popularity changes.
  • Improve genre classification with unsupervised learning (e.g., genre embeddings).

popularity_predictor.pth

This neural network model is extremely weak. I was not good at data science when I made this.

Iterations

null:

Trained for 500 epochs on 2.1 million songs from the Spotify database.

import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd


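# Assumes `df` is a pandas DataFrame of songs that is already loaded, and
# `numerical_features` is a list of numeric column names ending with 'popularity'.
# Split the data into features and target variable
X = df[numerical_features[:-1]].values  # all except popularity
y = df['popularity'].values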

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Convert to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train)
y_train_tensor = torch.FloatTensor(y_train).view(-1, 1)  # shape to (N, 1)
X_test_tensor = torch.FloatTensor(X_test)
y_test_tensor = torch.FloatTensor(y_test).view(-1, 1)

# Define the neural network model
class PopularityPredictor(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.fc1 = nn.Linear(input_dim, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 32)
        self.fc4 = nn.Linear(32, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.relu(self.fc3(x))
        # Final layer maps to a single regression output of shape (N, 1),
        # matching the shape of the target tensor.
        return self.fc4(x)

# Create an instance of the model, sized to the number of input features
model = PopularityPredictor(X_train.shape[1])

# Define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

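# Train the model with full-batch gradient steps
# (the checkpoint notes above report 500 epochs; this snippet runs 100)
num_epochs = 100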
for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()
    
    # Forward pass
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)
    
    # Backward pass and optimization
    loss.backward()
    optimizer.step()
    
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

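# Evaluate the model on the held-out test set
model.eval()
with torch.no_grad():
    predicted = model(X_test_tensor)
    test_loss = criterion(predicted, y_test_tensor)
print(f'Test MSE: {test_loss.item():.4f}')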

📚 Citation

If you use this project, please cite:

@misc{bhuiyan2024spotify,
  title={Spotify Song Popularity Prediction},
  author={Ashiful Bhuiyan and Blanca Fernández Méndez and Nazanin Ghelichi and Pavle Curcin},
  year={2024},
  institution={York University},
}

🧑‍💻 Authors

  • Ashiful Bhuiyan
  • Blanca Elvira Fernández Méndez
  • Nazanin Ghelichi
  • Pavle Curcin

📄 License

This project is licensed under the MIT License.

🏷 Tags

#spotify #machine-learning #music-prediction #data-science #regression #classification #popularity-analysis
