Showing posts from 2014

Back In Action

It's been quite some time since my last post. A lot has been going on since this past February/March time frame: relocated to the SFO Bay Area, new job, travelling, family. Hence the long period of silence while I was still settling in. Anyway, back to business.

So, the data-set I am going to be working with is the New York City Taxi and Limousine data that was made public courtesy of the FOIL request by Chris Whong.

The data-set contains trip and fare information for each and every trip in 2013. Some of the fields of interest that I will be exploring are trip origin and destination geo-coordinates, pickup and drop-off times, tip, payment method, trip distance, and trip time.

Luckily, it is a pretty clean data-set with almost no missing values and very few outliers. For the calculations, data aggregations, and geo-spatial analysis, I used Alteryx and then pushed the outputs as Tableau data extracts to build the visualizations.

So let's get rolling.

1. Rela…

Followup to Superbowl Tweets

Comparison Cloud

I classified the tweets on the basis of hashtags as coming from either Broncos or Seahawks fans.
Broncos Fans: "#GoBroncos", "#BroncosNation", "#BroncosFan", "#GoManning", "#PeytonManningRocks", "#BroncosWin"
Seahawks Fans: "#GoSeahawks", "#SeahawksNation", "#SeahawksFan", "#GoSeattle", "#SeahawksWin", "#SeattleSeahawksRule", "#CrushBroncos", "#CrushManning", "#Manningchokes"
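
For anyone who wants to try the same kind of hashtag-based split, here is a minimal Python sketch. It is not the code I used for the post; the function name `classify_tweet`, the case-insensitive matching, and the choice to leave tweets that match both lists (or neither) unclassified are all assumptions made for illustration.

```python
import re

# Hashtag lists copied from the post; lower-cased so matching is
# case-insensitive (an assumption, not something the post specifies).
BRONCOS_TAGS = {t.lower() for t in [
    "#GoBroncos", "#BroncosNation", "#BroncosFan",
    "#GoManning", "#PeytonManningRocks", "#BroncosWin",
]}
SEAHAWKS_TAGS = {t.lower() for t in [
    "#GoSeahawks", "#SeahawksNation", "#SeahawksFan", "#GoSeattle",
    "#SeahawksWin", "#SeattleSeahawksRule", "#CrushBroncos",
    "#CrushManning", "#Manningchokes",
]}


def classify_tweet(text):
    """Return "broncos", "seahawks", or None for a tweet's text.

    Hypothetical helper: a tweet is assigned to a fan group only when
    its hashtags match exactly one of the two lists.
    """
    tags = {t.lower() for t in re.findall(r"#\w+", text)}
    is_broncos = bool(tags & BRONCOS_TAGS)
    is_seahawks = bool(tags & SEAHAWKS_TAGS)
    if is_broncos and not is_seahawks:
        return "broncos"
    if is_seahawks and not is_broncos:
        return "seahawks"
    return None  # no match, or hashtags from both lists
```

Calling `classify_tweet("Here we go! #GoBroncos")` would put that tweet in the Broncos group, while a tweet carrying only `#SuperBowl` stays unclassified.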

Broncos Fan Tweets Geo-plot

Seahawks Fan Tweets Geo-plot

Tweet Sentiment Comparison
I took a rolling mean of the sentiment scores over 100 tweets for Broncos and Seahawks fans. The excitement of Broncos fans was pretty short-lived, and as the game progressed, the blue line for the Seahawks consistently showed a better sentiment score than the red line for the Broncos.

By the end of the game, there were rapid spurts of tweets with Broncos hashtags that were loaded with negative words, and the red line tells that story.
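
The rolling mean itself is simple to reproduce. Below is a small Python sketch under stated assumptions: the per-tweet sentiment scores are already computed and ordered by time (the scoring method is not covered here), and a value is emitted only once a full window of tweets is available. The function name `rolling_mean` is made up for this example.

```python
from collections import deque


def rolling_mean(scores, window=100):
    """Return the mean of each run of `window` consecutive scores.

    `scores` is assumed to be a time-ordered sequence of per-tweet
    sentiment scores. No value is produced until the window is full.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque(maxlen=window)  # holds the most recent `window` scores
    total = 0.0
    means = []
    for score in scores:
        if len(buf) == window:
            total -= buf[0]  # oldest score is about to be evicted
        buf.append(score)
        total += score
        if len(buf) == window:
            means.append(total / window)
    return means
```

Running it separately on the Broncos and Seahawks score streams would give the two series to plot against each other.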

SuperBowl 48 Tweets

Analysis of Tweets from Superbowl 48
Wordcloud from tweets with #superbowlcommercials

Coca-Cola's "America is Beautiful", Maserati, Bud Light, Budweiser, Radio Shack, Cheerios

Bruno Mars Wordcloud

Bruno Mars Tweet Sentiment
I took a rolling mean (100 tweets) of the sentiment scores to get the time series below. These were for all the tweets where the text contained "Bruno".

Top 30 Tweeters

Y axis: # of tweets from a user

Geo Plots
About 4.1% of total tweeters had their geo-coordinates on.
Centered   - Nor…

Geoplotting Twitter Users can get creepy

So, about 3-4% of people on average seem to tweet with geo-coordinates turned on on their mobile devices. Thanks to Google Street View and the ggmap package, that information can be precious to someone running a marketing campaign, opening a new retail store, promoting happy hours at a bar/restaurant, or someone who is just very curious.

Around Black Friday 2013, I started grabbing tweets from people tweeting with #Blackfriday, #Blackfriday 2013deals to do some trend and sentiment analysis of major brands like Amazon, Target, Walmart, Sony, Dell, etc.

As I started to dig deeper to see people who were constantly tweeting good/bad things with co-ordinates turned on, I saw this user close to Gainesville, Florida who was tweeting almost every 5 minutes. At the street map level, I could see this person stopping at stores like Kohl's, Walmart, and Best Buy and talking about deals. Then I saw some tweets coming from a residential address about finally reaching home, how much shopping the user did, what door busters th…

Facebook Page User Stats & Engagement

I started to play around with the R package "Rfacebook" to connect to Facebook's APIs and access social graph data.
I am posting some of the social graph visualizations that I created using Gephi and ggplot2 (a visualization package in R).

This is a heterogeneous social graph of the Facebook page for my favourite RSS news reader, Feedly.

Heterogeneous Graph: The nodes (spheres/dots) on the graph to the left are
1. Facebook users
2. Facebook posts

The edges on this graph depict connections between nodes, and for this particular graph, I looked at the "likes".

The nodes have also been color coded: blue represents male, pink represents female, and green is undisclosed. The green nodes are the Facebook page posts by the Feedly Facebook admin.
The node sizes are also scaled in proportion to the in-degree (# of people who liked a post), so the larger green nodes were the posts that generated a …

Twitter Streaming Visualizations !! #AAM, #AAMAADMI, #JANTADARBAR

Back again... Quite a few things going on @ work, family, etc.

Anyway, I got inspired by André Panisson's awesome visualization of the Egypt revolution from the Twitter feeds using Gephi and Python. I made some changes to his version of the code to model the nodes and edges slightly differently. I have uploaded the video on YouTube, and here is the link:

It is a directed social graph where each blue sphere is a Twitter ID and the edge between two of them is a re-tweet. An incoming arrow on a node signifies that the node's tweet was re-tweeted by the node from which the edge originated.
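
To make the edge direction concrete, here is a short Python sketch of how such a re-tweet graph could be assembled before handing it to Gephi. It is not the modified code mentioned above; the function names and the `(retweeter, original_author)` input format are hypothetical, and how those pairs are pulled out of the Twitter stream is left out.

```python
from collections import Counter


def build_retweet_edges(retweets):
    """Count directed edges retweeter -> original author.

    `retweets` is assumed to be an iterable of
    (retweeter, original_author) pairs of Twitter IDs.
    """
    edges = Counter()
    for retweeter, author in retweets:
        edges[(retweeter, author)] += 1  # edge weight = number of re-tweets
    return edges


def in_degree(edges):
    """Number of distinct accounts that re-tweeted each author."""
    degree = Counter()
    for _retweeter, author in edges:
        degree[author] += 1
    return degree


# Made-up IDs for illustration only.
sample = [("a", "c"), ("b", "c"), ("a", "c"), ("c", "d")]
edges = build_retweet_edges(sample)
degrees = in_degree(edges)
```

In this sample, node "c" has two incoming arrows (from "a" and "b"), so it is the account whose tweets were re-tweeted by the most distinct users.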

I am using the Force Atlas layout for the visualization.

Twitter search on: #AAM, #AAMAADMI, #JANTADARBAR on Jan 10, early in the morning, when Arvind Kejriwal was holding the rally that ended in chaos. (For those who don't know the context of AAM AADMI, it's a newly formed ruling party in Delhi led by @Arvind Kejriwal.)