Micro Projects
1. Binary Image Classification: Classifying Indian Railways electric locomotives
Brief Summary: This was part of the final project for the first course in Coursera’s TensorFlow in Practice Specialization. I am passionate about railways, so I downloaded images of two electric locomotives, the WAP-4 and the WAP-7, and trained a convolutional neural network to tell them apart.
Things learnt: Image classification, hands-on practice with Keras
2. Bank Marketing Analysis: Choosing predictive model having best ROI
Jupyter notebook, Original Data
Brief Summary: In this small notebook I use the ‘Cumulative Gains’ graph to check which of two fitted predictive models would give a better ROI on future Term Deposit campaigns for a bank. I am working iteratively to improve the analysis.
Things learnt: Using ROI to select predictive models, package ‘modelplotpy’
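To make the idea concrete, here is a minimal sketch in plain Python of how cumulative gains and a simple campaign ROI can be compared for two models. It does not use modelplotpy and is not the notebook's code: the labels, scores, contact cost and revenue per success are invented toy values, and the function names are hypothetical.

```python
def top_slice(y_true, scores, fraction):
    """Return the labels of the top `fraction` of customers, ranked by score."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_top = int(round(fraction * len(ranked)))
    return [label for _, label in ranked[:n_top]]


def cumulative_gains(y_true, scores, fraction):
    """Share of all positives that fall inside the top `fraction` of customers."""
    return sum(top_slice(y_true, scores, fraction)) / sum(y_true)


def campaign_roi(y_true, scores, fraction, cost_per_contact, revenue_per_success):
    """(revenue - cost) / cost when only the top `fraction` is contacted."""
    contacted = top_slice(y_true, scores, fraction)
    cost = len(contacted) * cost_per_contact
    revenue = sum(contacted) * revenue_per_success
    return (revenue - cost) / cost


# Toy data (assumed): 1 = customer subscribed to a term deposit.
y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
model_a = [0.9, 0.2, 0.1, 0.8, 0.3, 0.4, 0.2, 0.7, 0.1, 0.3]
model_b = [0.6, 0.7, 0.1, 0.9, 0.8, 0.4, 0.2, 0.3, 0.1, 0.3]

gains_a = cumulative_gains(y_true, model_a, 0.3)
gains_b = cumulative_gains(y_true, model_b, 0.3)
roi_a = campaign_roi(y_true, model_a, 0.3, cost_per_contact=1.0, revenue_per_success=5.0)
roi_b = campaign_roi(y_true, model_b, 0.3, cost_per_contact=1.0, revenue_per_success=5.0)
```

On this toy data, model A captures every subscriber in the top 30% of contacts while model B captures a third of them, so A gives the higher ROI for the same campaign budget.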
3. Shiny R: Visualizing Web KPIs
Link to the app. Note: App load will take a minute or two
Brief Summary: This is a basic Shiny app that lets users see how 5 key web performance metrics are doing at a daily and monthly level. I am working to improve the app and add more functionality.
Things learnt: Shiny app development, using HTML & CSS in Shiny, deploying Shiny app to the cloud
4. Predictive Analytics using Spark Data Frame API
Github repo, Jupyter notebook, Original Data
Brief Summary: Created predictive models in a Big Data environment using Spark. The target variable was Readmission (Y=1) or No Readmission (Y=0). Steps used: replacing “?” with missing values using RDD.map(), integer encoding, one-hot encoding, and fitting Logistic Regression and Random Forest models.
Things learnt: Working with the Spark ML DataFrame API and with RDDs.
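The preprocessing steps above can be illustrated without a Spark cluster. The sketch below is plain Python, not the Spark API and not the notebook's code: it only shows the logic of turning “?” into a missing value, integer-encoding a categorical column, and one-hot encoding the result. The column name `admission_type` and the sample rows are hypothetical, and the alphabetical category ordering is a choice made for this sketch.

```python
def clean_missing(rows, marker="?"):
    """Replace the placeholder marker with None in every field of every row."""
    return [{k: (None if v == marker else v) for k, v in row.items()} for row in rows]


def integer_encode(values):
    """Map each distinct non-missing category to an integer (alphabetical order)."""
    categories = sorted({v for v in values if v is not None})
    index = {category: i for i, category in enumerate(categories)}
    codes = [index[v] if v is not None else None for v in values]
    return codes, categories


def one_hot(codes, n_categories):
    """Turn integer codes into 0/1 vectors; missing codes become all zeros."""
    vectors = []
    for code in codes:
        vector = [0] * n_categories
        if code is not None:
            vector[code] = 1
        vectors.append(vector)
    return vectors


# Hypothetical sample rows; "readmitted" plays the role of the target Y.
rows = [
    {"admission_type": "Emergency", "readmitted": 1},
    {"admission_type": "?", "readmitted": 0},
    {"admission_type": "Elective", "readmitted": 0},
    {"admission_type": "Emergency", "readmitted": 1},
]

cleaned = clean_missing(rows)
codes, categories = integer_encode([row["admission_type"] for row in cleaned])
vectors = one_hot(codes, len(categories))
```

The resulting vectors, together with the target column, are the kind of input a Logistic Regression or Random Forest model would then be trained on.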
5. US Domestic Airline Delays Visualization
Viewing at 1.5x is advised
6. Time Series in R: Predicting Bitcoin Closing Values
Brief Summary: Used Time Series Analysis in R to forecast the daily closing values of Bitcoin. Used exponential smoothing and ARIMA models. Compared all fitted models to naive forecasts and selected the best model. Also check out a small blog on decisions to be taken before we start a Time Series Analysis project.
Things learnt: Time Series, auto-arima, exponential smoothing.
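As an illustration of comparing a fitted model against a naive forecast, here is a small sketch in plain Python rather than R. It implements one-step-ahead naive forecasts and simple exponential smoothing by hand and scores both with mean absolute error. The closing values and the smoothing parameter `alpha` are invented for the example, not taken from the project.

```python
def naive_forecasts(series):
    """One-step-ahead naive forecast: tomorrow's value equals today's value."""
    return series[:-1]


def ses_forecasts(series, alpha):
    """One-step-ahead simple exponential smoothing, initialised at the first value."""
    level = series[0]
    forecasts = []
    for value in series[1:]:
        forecasts.append(level)  # forecast issued before `value` is observed
        level = alpha * value + (1 - alpha) * level
    return forecasts


def mae(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy closing values (assumed, not real Bitcoin prices).
closes = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]
actual = closes[1:]

naive_mae = mae(actual, naive_forecasts(closes))
ses_mae = mae(actual, ses_forecasts(closes, alpha=0.5))
```

On this toy series the naive forecast has the lower error, which is exactly the kind of result that makes the naive benchmark worth checking before adopting a more complex model.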
7. Data cleaning: Finding, manipulation and joining external data
Jupyter notebook, Original Loans Data, External Data Source
Brief Summary: In this notebook I connect multiple external data sets to the analysis table (the philp_loans table). Neither the external data nor the main data was in a form suitable for a direct join, so I use text manipulation to bring the keys into a matching form and then join the data together.
Things learnt: Text manipulation and data cleaning, bringing keys into an appropriate form for a join.
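A minimal sketch of the general pattern, in plain Python: normalise the join key on both sides, then do a left join so unmatched rows stay visible. The region names, the `metric` field, and the suffixes stripped by `normalize_key` are all hypothetical and do not come from the notebook.

```python
import re


def normalize_key(name):
    """Lower-case, drop assumed suffixes and punctuation, collapse whitespace."""
    key = name.strip().lower()
    key = re.sub(r"\b(province|city)\b", "", key)  # hypothetical suffixes
    key = re.sub(r"[^a-z0-9]+", " ", key)
    return " ".join(key.split())


def left_join(left_rows, right_rows, left_key, right_key, value_field):
    """Attach `value_field` from right_rows to left_rows via normalised keys."""
    lookup = {normalize_key(row[right_key]): row for row in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(normalize_key(row[left_key]))
        merged = dict(row)
        merged[value_field] = match[value_field] if match else None
        joined.append(merged)
    return joined


# Invented example rows: the raw keys would not match in a direct join.
loans = [
    {"loan_id": 1, "region": "Alpha Province "},
    {"loan_id": 2, "region": "BETA-CITY"},
    {"loan_id": 3, "region": "Gamma"},
]
external = [
    {"region_name": "alpha", "metric": 10},
    {"region_name": " Beta", "metric": 20},
]

joined = left_join(loans, external, "region", "region_name", "metric")
```

Keeping the join as a left join means rows with no external match (loan 3 here) carry a `None` value, which makes it easy to count how many keys still need cleaning.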
Competitions
1. BNP Paribas Cardif Claims Management - Kaggle
Rank: 225/2926 (Top 8%)
Competition home page, Github Repo of the code
Things learnt: How the bias-variance tradeoff works, feature engineering, model tuning, and using cross-validation.
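For readers unfamiliar with cross-validation, the sketch below shows how k-fold splits can be generated in plain Python. It is a generic illustration, not the competition code; the function name `k_fold_indices` is hypothetical, and no shuffling or stratification is applied.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size.

    Returns a list of (train_indices, validation_indices) pairs.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, validation))
        start += size
    return folds


folds = k_fold_indices(10, 3)
validation_sizes = [len(validation) for _, validation in folds]
```

Each sample lands in exactly one validation fold, so averaging a model's score across the folds gives an estimate of how it would perform on unseen data.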
2. Santander Customer Satisfaction - Kaggle
Rank: 540/5123 (Top 11%)
Things learnt: AUC-ROC, F-measure, different ways to analyze a binary classification problem, working with class imbalance.
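To show why these metrics matter under class imbalance, here is a small plain-Python sketch that computes AUC-ROC (as the probability that a random positive is scored above a random negative) and the F1 score by hand. The labels and scores are invented toy values, and the 0.5 threshold is an assumption made for the example.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, y_true) if y == 1]
    negatives = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy imbalanced data (assumed): only 2 of 10 customers are positive.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.15, 0.3, 0.05, 0.4, 0.35, 0.6, 0.55, 0.9]

auc = roc_auc(y_true, scores)
y_pred = [1 if s >= 0.5 else 0 for s in scores]
f1 = f1_score(y_true, y_pred)

# A model that always predicts the majority class looks fine on accuracy alone.
always_negative = [0] * len(y_true)
baseline_accuracy = accuracy(y_true, always_negative)
baseline_f1 = f1_score(y_true, always_negative)
```

The always-negative baseline reaches 80% accuracy on this data but an F1 of zero, which is why accuracy alone is a poor guide when one class is rare.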