Back In Action

It's been quite some time since my last post. A lot has been going on since this past February/March time frame: I relocated to the San Francisco Bay Area, started a new job, and have been travelling and spending time with family. Hence the long period of silence while I was still settling in. Anyway, back to business.

So, the data-set I am going to be working with is the New York City Taxi and Limousine Commission data that was made public courtesy of a FOIL request by Chris Whong.

The data-set contains trip and fare information for every trip in 2013. Some of the fields of interest that I will be exploring are trip origination and destination geo-coordinates, pickup and drop-off times, tip, payment method, trip distance, and trip time.

Luckily, it is a pretty clean data-set with almost no missing values and very few outliers. For the calculations, data aggregations, and geo-spatial analysis, I used Alteryx and then pushed the outputs as Tableau data extracts to build the visualizations.

So let's get rolling.

1. Relationship between Tip % and Pick Up Time: Trips originating between 9 PM and 10 PM have the highest average Tip %, at 11.82. Night trips in general yield higher tips. The lowest bucket is the trips originating in the wee hours of the morning, between 4 AM and 5 AM.
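For anyone who wants to reproduce this kind of hourly bucket outside Alteryx, here is a minimal Python sketch. It is not the workflow I used: the sample rows are made up, and it assumes Tip % means tip divided by fare, times 100, which may differ from the exact definition behind the chart.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: (pickup_datetime, fare_amount, tip_amount).
trips = [
    ("2013-01-05 21:15:00", 10.0, 1.5),
    ("2013-01-05 21:40:00", 20.0, 2.0),
    ("2013-01-06 04:10:00", 8.0, 0.0),
    ("2013-01-06 04:55:00", 12.0, 1.2),
]

def average_tip_pct_by_hour(rows):
    """Return {pickup_hour: mean tip % of trips starting in that hour}.

    Assumption: tip % = 100 * tip / fare.
    """
    pcts = defaultdict(list)
    for pickup, fare, tip in rows:
        if fare <= 0:
            continue  # tip % is undefined without a positive fare
        hour = datetime.strptime(pickup, "%Y-%m-%d %H:%M:%S").hour
        pcts[hour].append(100.0 * tip / fare)
    return {hour: sum(values) / len(values) for hour, values in sorted(pcts.items())}

hourly = average_tip_pct_by_hour(trips)
best_hour = max(hourly, key=hourly.get)  # hour bucket with the highest average
```

Each trip gets the same weight here, so a bucket's average is the mean of per-trip percentages, not total tips over total fares.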

2. Taxi demand at different hours of the day and the mode of payment (Card vs Cash)

No surprises here. Demand kicks in starting at 6 AM and goes through an initial peak period in the morning hours.
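The hourly demand split by payment mode boils down to a count per (hour, payment type) pair. A rough Python sketch of that count follows. The rows and the "Card"/"Cash" labels are illustrative assumptions, not the actual codes in the raw file.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows: (pickup_datetime, payment_type).
trips = [
    ("2013-03-01 06:05:00", "Card"),
    ("2013-03-01 06:30:00", "Cash"),
    ("2013-03-01 08:10:00", "Card"),
    ("2013-03-01 08:45:00", "Card"),
]

def trips_by_hour_and_payment(rows):
    """Count trips for each (pickup hour, payment type) pair."""
    counts = Counter()
    for pickup, payment in rows:
        hour = datetime.strptime(pickup, "%Y-%m-%d %H:%M:%S").hour
        counts[(hour, payment)] += 1
    return counts

demand = trips_by_hour_and_payment(trips)
```

A `Counter` returns 0 for pairs that never occur, which is handy when an hour has no cash (or no card) trips.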

3. Relationship between Tip % (Average & Median) and day of the week

Wednesday tops the list, with Tuesday and Thursday coming pretty close. There is a big gap between Median and Average on Saturday and Sunday.

4. Relationship between number of trips and day of the week

Sunday tops the list and Tuesday is at the bottom.

5. Analysis of demographic information: So far I had been exploring the basic variables. Next, I pulled in demographic cluster information (residential Mosaic groups) from Experian's syndicated data set using Alteryx. Here is some detailed info on the Mosaic groups as defined by Experian.

This information was appended to the pickup locations that came in with the original data set.

The thought behind this analysis is to look at which strata of household demographics use cabs the most, and whether that plays a role in Tip %.

Pastoral Pride, Families in Motion, and Promising Families have almost negligible sample sizes, so I am ignoring them in any further analysis.

Now let's see how much Tip % (Median) is generated by each of these demographic strata.

Power Elite and Young City Solos seem to fit the overall average of around 11% Tip. A bunch of the other Mosaic clusters show a median Tip % of 0.

Let's now look at average Tip % to see how it changes the picture.

Quite a difference!
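One way a median of 0 can sit next to a much higher average: if more than half of a group's trips show no tip, the median is 0 no matter how generous the remaining trips are. The Python sketch below illustrates this with made-up numbers and placeholder group names. Whether this is what drives the gap in the charts here is an assumption I have not verified.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical sample rows: (mosaic_group, tip_pct). Group names are placeholders.
trips = [
    ("Group A", 0.0), ("Group A", 0.0), ("Group A", 0.0),
    ("Group A", 20.0), ("Group A", 25.0),
    ("Group B", 10.0), ("Group B", 11.0), ("Group B", 12.0),
]

def summarize_tip_pct(rows):
    """Return {group: {"median": ..., "mean": ...}} of tip % per group."""
    by_group = defaultdict(list)
    for group, pct in rows:
        by_group[group].append(pct)
    return {
        group: {"median": median(values), "mean": mean(values)}
        for group, values in by_group.items()
    }

summary = summarize_tip_pct(trips)
```

In this toy data, Group A has a median of 0 and a mean of 9, while Group B, with no zero-tip trips, has a median and mean that agree at 11.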

Part 2 of this analysis will follow soon, where I will try to predict the Tip % for a particular trip using a few different models. Stay tuned.



