Showing posts from August, 2015
Data & The Art of Beer Recommendation 

I have been wanting to write this post for some time now. It continues an earlier post in which I used Trifacta to wrangle the beer reviews data and get it ready for some predictive fun!

The Dataset: Crowdsourced beer reviews from the website, where beer aficionados from all over the world have rated and critiqued beers. There are close to 1.6 million reviews from 1999 to 2012, spanning almost 66,000 different beers globally.

Task: Recommend some awesome beers for Mr Data Wrangler

Platform: R (hosted on Amazon EC2 m4.2xlarge)

So let the fun begin :)

1. Set the environment and load the required libraries

library(ggplot2)
library(data.table)
library(reshape2)
library(reshape)
library(Matrix)
library(dummies)
library(plyr)
setwd("/home/vulcan/Python")
load("beermerge.Rda")

2. Calculate the weighted average review score and filter to select only the beers t…