Micro Projects

1. Binary Image Classification: Classifying Indian Railways electric locomotives

Jupyter notebook

Brief Summary: This was part of the final project of the first course in Coursera’s TensorFlow in Practice Specialization. I am passionate about railways, so I downloaded images of two electric locomotives, the WAP-4 and the WAP-7, and trained a convolutional network to tell them apart.

Things learnt: Image classification, hands-on practice with Keras

2. Bank Marketing Analysis: Choosing predictive model having best ROI

Jupyter notebook, Original Data

Brief Summary: In this small notebook I use a ‘Cumulative Gains’ graph to check which of two fitted predictive models will give a better ROI on future term deposit campaigns for a bank. I am working iteratively to improve the analysis.

Things learnt: Using ROI to select predictive models, the ‘modelplotpy’ package
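To make the idea behind a cumulative gains comparison concrete, here is a minimal sketch in plain Python. It does not use modelplotpy and is not the notebook's code; the labels, the two score lists and the function name `cumulative_gains` are made up for illustration. It ranks customers by model score and reports what share of all subscribers is captured after contacting each successive slice.

```python
def cumulative_gains(y_true, scores, n_bins=10):
    """Share of all positives captured after each bin, contacting
    customers in order of descending model score."""
    ranked = [y for _, y in sorted(zip(scores, y_true),
                                   key=lambda pair: pair[0], reverse=True)]
    total_pos = sum(ranked)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    n = len(ranked)
    gains = []
    for b in range(1, n_bins + 1):
        cutoff = round(b * n / n_bins)  # number of customers contacted so far
        gains.append(sum(ranked[:cutoff]) / total_pos)
    return gains

# Toy data (assumption): 1 = customer subscribed to the term deposit.
y = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
model_a = [0.9, 0.2, 0.8, 0.3, 0.1, 0.7, 0.4, 0.25, 0.15, 0.05]
model_b = [0.6, 0.7, 0.5, 0.8, 0.1, 0.9, 0.4, 0.3, 0.2, 0.05]

gains_a = cumulative_gains(y, model_a, n_bins=5)
gains_b = cumulative_gains(y, model_b, n_bins=5)
print(gains_a)
print(gains_b)
```

On this toy data, contacting the top 20% of customers captures two thirds of the subscribers with model A and one third with model B. At a fixed contact budget, the model with the higher gain at that cutoff would be expected to return more per contact.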

3. Shiny R: Visualizing Web KPIs

Link to the app. Note: The app takes a minute or two to load.

Brief Summary: This is a basic Shiny app that lets users see how 5 key web performance metrics are doing at a daily and monthly level. I am working to improve the app and add more functionality.

Things learnt: Shiny app development, using HTML & CSS in Shiny, deploying Shiny app to the cloud

4. Predictive Analytics using Spark Data Frame API

Github repo, Jupyter notebook, Original Data

Brief Summary: Created predictive models in a Big Data environment using Spark. The target variable was Readmission (Y=1) or No Readmission (Y=0). Steps used: treating “?” as a missing value using RDD.map(), integer encoding, one-hot encoding, then fitting Logistic Regression and Random Forest models.

Things learnt: Working with the Spark ML DataFrame API and with RDDs.
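The preprocessing steps listed in the summary can be sketched in plain Python. This is an illustration of the logic only, not the Spark code from the notebook: it uses Python's built-in `map` in place of RDD.map(), and the column values and helper names (`clean`, `integer_encode`, `one_hot`) are hypothetical.

```python
def clean(value):
    """Treat the placeholder "?" as a missing value."""
    return None if value == "?" else value

def integer_encode(values):
    """Map each distinct non-missing category to an integer index.
    Categories are sorted so the mapping is deterministic."""
    categories = sorted({v for v in values if v is not None})
    index = {cat: i for i, cat in enumerate(categories)}
    return [index.get(v) for v in values], index

def one_hot(code, n_categories):
    """Turn an integer code into a 0/1 vector; a missing code stays all zeros."""
    vec = [0] * n_categories
    if code is not None:
        vec[code] = 1
    return vec

# Hypothetical categorical column containing one "?" placeholder.
raw = ["Emergency", "?", "Elective", "Emergency"]

cleaned = list(map(clean, raw))
codes, index = integer_encode(cleaned)
encoded = [one_hot(code, len(index)) for code in codes]
print(cleaned)
print(codes)
print(encoded)
```

In Spark ML, the `StringIndexer` and `OneHotEncoder` transformers cover the integer-encoding and one-hot steps. Whether the notebook uses those transformers or hand-written code is not stated here.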

5. US Domestic Airline Delays Visualization

Viewing at 1.5x is advised

6. Time Series in R: Predicting Bitcoin Closing Values

Check out the notebook here

Brief Summary: Used Time Series Analysis in R to forecast the daily closing values of Bitcoin. Fitted exponential smoothing and ARIMA models, compared every fitted model against naive forecasts, and selected the best one. Also check out a short blog post on the decisions to make before starting a Time Series Analysis project.

Things learnt: Time Series, auto.arima, exponential smoothing.
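Comparing a fitted model against a naive forecast can be sketched as follows. The project itself is in R; this sketch is in Python, uses made-up numbers rather than Bitcoin prices, and assumes mean absolute error as the comparison metric, which the summary does not specify. The smoothing parameter `alpha=0.5` is also an arbitrary choice for illustration.

```python
def naive_forecasts(series):
    """One-step-ahead naive forecast: the next value equals the current one."""
    return series[:-1]

def ses_forecasts(series, alpha):
    """One-step-ahead simple exponential smoothing forecasts for series[1:]."""
    level = series[0]
    forecasts = []
    for observed in series[1:]:
        forecasts.append(level)  # forecast is made before seeing `observed`
        level = alpha * observed + (1 - alpha) * level
    return forecasts

def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

closes = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]  # made-up closing values
actual = closes[1:]

naive_mae = mae(actual, naive_forecasts(closes))
ses_mae = mae(actual, ses_forecasts(closes, alpha=0.5))
print(naive_mae, ses_mae)
```

On this toy series the naive forecast has the lower error, so a smoothing model with this parameter would not be worth keeping. With `alpha=1.0` simple exponential smoothing reduces to the naive forecast, which makes the naive method a natural baseline.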

7. Data cleaning: Finding, manipulating and joining external data

Jupyter notebook, Original Loans Data, External Data Source

Brief Summary: In this notebook I connect multiple external data sources to the analysis table (philp_loans table). Neither the external data nor the main data was in a form that allowed a direct join, so I use text manipulation to bring the keys into the right form and then join the tables together.

Things learnt: Text manipulation/data cleaning, bringing keys into the appropriate form for a join.
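A minimal sketch of bringing keys into a joinable form, in plain Python. The rows, column names (`region`, `area`, `indicator`) and the `normalize_key` helper are all hypothetical and are not taken from the notebook or its data; they only show the pattern of cleaning both sides to a common key before a left join.

```python
import re

def normalize_key(name):
    """Lower-case, replace punctuation with spaces and collapse whitespace,
    so differently formatted names compare equal."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(name.split())

# Hypothetical rows from the analysis table.
loans = [
    {"region": "North Valley, Zone 1", "amount": 250},
    {"region": "EAST-HARBOR", "amount": 400},
    {"region": "Unknown Place", "amount": 100},
]

# Hypothetical rows from an external source, keyed in a different style.
external = [
    {"area": "north valley  zone 1", "indicator": 0.31},
    {"area": "East Harbor", "indicator": 0.12},
]

# Index the external rows by cleaned key, then left-join onto the loans.
lookup = {normalize_key(row["area"]): row for row in external}
joined = []
for loan in loans:
    match = lookup.get(normalize_key(loan["region"]), {})
    joined.append({**loan, "indicator": match.get("indicator")})

for row in joined:
    print(row)
```

Loans without a match keep a `None` indicator rather than being dropped, which makes unmatched keys easy to inspect and fix in a further cleaning pass.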


1. BNP Paribas Cardif Claims Management - Kaggle

Rank: 225/2926 (Top 8%)

Competition home page, Github Repo of the code

Things learnt: How the bias-variance tradeoff works, feature engineering, model tuning and using cross-validation.

2. Santander Customer Satisfaction - Kaggle

Rank: 540/5123 (Top 11%)

Competition home page

Things learnt: AUC-ROC, F-measure, different ways to analyze a binary classification problem, working with class imbalance
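As a small illustration of why AUC-ROC and the F-measure matter under class imbalance, here is a sketch in plain Python. The labels and scores are toy values, not competition data, and the function names are my own. AUC is computed directly from its pairwise definition: the probability that a randomly chosen positive is scored above a randomly chosen negative.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    tied scores count as half a correct pair."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_at_threshold(y_true, scores, threshold):
    """F1 score when every score >= threshold is predicted positive."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy imbalanced data (assumption): only 2 of 10 cases are positive.
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.35, 0.6, 0.55, 0.8]

auc = roc_auc(y, scores)
f1 = f1_at_threshold(y, scores, threshold=0.5)
f1_all_negative = f1_at_threshold(y, scores, threshold=1.0)
print(auc, f1, f1_all_negative)
```

A classifier that predicts every case as negative would be 80% accurate on this data yet score an F1 of 0, which is why accuracy alone is a poor guide when classes are imbalanced.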