Projects
Titanic Survival Prediction
The Titanic competition is one of the most popular machine learning competitions on Kaggle. The goal is to predict which passengers aboard this "unsinkable" ship survived.
For this project I:
- Imputed missing values using groupby statements (e.g. replacing a missing fare with the median fare for the same class and title)
- Used regex to extract the title from the passenger name feature
- Cleaned the title feature further by correcting wrongly labeled titles and grouping rare titles together
- Identified passengers traveling together using their last name and ticket number
- Attempted (but failed) to identify the nationality of passengers from their last names
- Explored the relation between survival and several features using box plots and bar plots
- Optimized a random forest classifier using GridSearchCV to obtain a top 9% model with 79.
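The title extraction and group-wise fare imputation described above can be sketched as follows. This is a minimal illustration with pandas, not the project's actual code: the rows are invented, the column names (`Name`, `Pclass`, `Fare`) are assumed to follow the Kaggle Titanic data, and the regex is one plausible pattern for names written as "Last, Title. First".

```python
import pandas as pd

# Toy rows shaped like the Kaggle Titanic data (values invented for illustration).
df = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley",
        "Heikkinen, Miss. Laina",
        "Allen, Mr. William Henry",
        "Moran, Mr. James",
        "Palsson, Miss. Torborg",
    ],
    "Pclass": [3, 1, 3, 3, 3, 3],
    "Fare": [7.25, 71.28, 7.92, 8.05, None, None],
})

# Assumed pattern: the title sits between the comma and the first period.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# Median fare per (class, title) group, aligned to the original rows.
# Missing fares are ignored when the median is computed.
group_median = df.groupby(["Pclass", "Title"])["Fare"].transform("median")

# Fill only the missing fares; a group with no known fare would stay NaN.
df["Fare"] = df["Fare"].fillna(group_median)

print(df[["Name", "Pclass", "Title", "Fare"]])
```

Using `transform` rather than `agg` keeps the result the same length as the original frame, so it can be passed straight to `fillna`.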
Ames House Price Prediction
With over 45,000 participating teams and individuals, the Housing Prices competition is one of the most popular machine learning competitions on Kaggle. The goal of this competition is to predict the sale price of residential homes in Ames, Iowa, using 79 explanatory variables.
For this project I performed a variety of actions, including:
- Imputing missing data using a simple imputer, a KNN imputer and a MICE imputer
- Creating a variety of new binary features, features representing the number of years since the house was last remodeled and features indicating the proximity to the train station
- Exploring the effect of features on the sale price using scatter and bar plots
- Testing the effect on RMSE of using a log-transformed versus an untransformed sale price
- Optimizing a Ridge regression using GridSearchCV to obtain a model with a top 9% score on the public Kaggle leaderboard
Click here to view this project on GitHub.
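The comparison between a log-transformed and an untransformed sale price, combined with a grid search over the Ridge penalty, could look roughly like the sketch below. It is an illustration under stated assumptions rather than the project's code: the data is synthetic, the `alpha` grid is arbitrary, and wrapping the model in scikit-learn's `TransformedTargetRegressor` is one possible way to set up the comparison so that both RMSE values are reported in the original price units.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the housing data (invented for illustration):
# five numeric features and a strictly positive, right-skewed "price".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.exp(12 + 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.1, size=200))

alphas = [0.1, 1.0, 10.0]  # arbitrary grid, chosen only for the example

# Ridge fitted directly on the untransformed price.
raw_search = GridSearchCV(
    Ridge(),
    {"alpha": alphas},
    scoring="neg_root_mean_squared_error",
    cv=5,
)
raw_search.fit(X, y)

# Ridge fitted on log1p(price); predictions are mapped back with expm1,
# so the cross-validated RMSE stays in the original price units.
log_model = TransformedTargetRegressor(
    regressor=Ridge(), func=np.log1p, inverse_func=np.expm1
)
log_search = GridSearchCV(
    log_model,
    {"regressor__alpha": alphas},
    scoring="neg_root_mean_squared_error",
    cv=5,
)
log_search.fit(X, y)

# The scorer returns negative RMSE, so flip the sign for reporting.
rmse_raw = -raw_search.best_score_
rmse_log = -log_search.best_score_
print(f"untransformed target: RMSE={rmse_raw:.0f}, best {raw_search.best_params_}")
print(f"log-transformed target: RMSE={rmse_log:.0f}, best {log_search.best_params_}")
```

Because both searches use the same folds and the same scoring, the two RMSE values can be compared directly.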
Market Data Dashboard
I am responsible for making market data available to the marketing and sales teams. I generally create a variety of pivot tables and use extensive conditional formatting and calculated fields to improve readability. I realized that some colleagues were struggling to find relevant information and to interpret tables correctly. I challenged myself to create a dashboard in Excel 365 to make it easier for my colleagues to draw conclusions from this dataset.