Showing posts from May, 2016

Can you predict when that song was released!!

Generalized Linear Models in PySpark

In this exercise, I will be using a subset of the Million Song Dataset from the UCI Machine Learning Repository. The goal is to train a linear regression model to predict the release year of a song given a set of audio features. (The feature vectors have been center-scaled prior to loading in this environment.) This exercise will cover:

Part 1: Read and parse the initial dataset
  Visualization 1: Features
  Visualization 2: Shifting labels
Part 2: Create and evaluate a baseline model
  Visualization 3: Predicted vs. actual
Part 3: Train (via gradient descent) and evaluate a linear regression model
  Visualization 4: Training error
Part 4: Train using MLlib and tune hyperparameters via grid search
  Visualization 5: Best model's predictions
  Visualization 6: Hyperparameter heat map
Part 5: Add interactions between features

For reference, you can look up the details of the relevant Spark methods in Spark's Pytho…
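Parts 1 to 3 come down to a small amount of arithmetic: shift the labels so the earliest year becomes 0, predict the average shifted label as a baseline, fit weights by gradient descent, and compare both with root mean squared error (RMSE). The sketch below shows that math in plain Python rather than PySpark, so it is not the post's actual Spark code. The helper names (`rmse`, `dot`, `linreg_gradient_descent`), the fixed step size, the explicit bias column and the toy data are all assumptions made for illustration.

```python
import math


def rmse(labels_and_preds):
    """Root mean squared error over (label, prediction) pairs."""
    pairs = list(labels_and_preds)
    return math.sqrt(sum((y - p) ** 2 for y, p in pairs) / len(pairs))


def dot(w, x):
    """Dot product of a weight vector and a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))


def linreg_gradient_descent(data, alpha=0.1, num_iters=500):
    """Hypothetical helper: batch gradient descent on half the mean squared error.

    data: list of (label, features) pairs; features is a list of floats.
    alpha: fixed step size (an assumption; a decaying step size also works).
    """
    n = len(data)
    d = len(data[0][1])
    w = [0.0] * d
    for _ in range(num_iters):
        grad = [0.0] * d
        for y, x in data:
            err = dot(w, x) - y  # residual for this example
            for j in range(d):
                grad[j] += err * x[j]
        # Average the gradient over the n examples and step downhill.
        w = [wj - alpha * gj / n for wj, gj in zip(w, grad)]
    return w


# Hypothetical toy data: (release_year, [bias, f1, f2]); not from the real dataset.
raw = [(2000.0 + 3 * a - 2 * b, [1.0, a, b])
       for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
min_year = min(y for y, _ in raw)
data = [(y - min_year, x) for y, x in raw]  # shift labels so they start at 0

# Baseline: always predict the average shifted label.
avg = sum(y for y, _ in data) / len(data)
baseline_rmse = rmse((y, avg) for y, _ in data)

# Linear model fitted by gradient descent.
w = linreg_gradient_descent(data)
model_rmse = rmse((y, dot(w, x)) for y, x in data)
print(round(baseline_rmse, 3), round(model_rmse, 6))
```

On real data the `(label, features)` pairs would sit in an RDD and the per-example gradient terms would be combined with a `map` followed by a `reduce` (or `sum`), but the update rule itself would not change.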
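Part 5 expands the feature set with pairwise interaction terms. This lets a model that is still linear in its weights capture effects like `x_i * x_j`. The minimal sketch below assumes interactions are built from unordered pairs, squares included. The post may instead use every ordered pair, which produces both `x_i * x_j` and `x_j * x_i`. The function name is hypothetical.

```python
from itertools import combinations_with_replacement


def add_two_way_interactions(features):
    """Hypothetical helper: append x_i * x_j for every i <= j to the features.

    With d input features the result has d + d*(d+1)/2 entries.
    """
    pairs = [a * b for a, b in combinations_with_replacement(features, 2)]
    return list(features) + pairs


print(add_two_way_interactions([2.0, 3.0]))  # [2.0, 3.0, 4.0, 6.0, 9.0]
```

Because the number of columns grows quadratically with d, the interaction model is noticeably more expensive to train than the plain linear one.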