Below you will find pages that utilize the taxonomy term “scene”
Projects
Adventure Works Sales Report in Power BI
In preparation for my DA-100 exam, I retook the Power BI course from Maven Analytics on Udemy. Less than a year ago, I created a dynamic dashboard in Excel. Compared with some of the formulas I wrote then to make the dashboard behave the way I needed, I can say that Power BI and DAX are an absolute delight.
For this project I
- Created a Sales Report in Power BI giving detailed insights on best-selling products and customers
- Imported data from Excel and used Power Query to transform the data
- Created a data model in Power BI and used DAX to create calculated columns and measures
- Improved user experience by applying conditional formatting, drillthrough filters and visual interactions

Click here to view this project on GitHub.
Projects
Exploring Breeding Bird Census Data
CBS (Statistics Netherlands) publishes reliable statistical information and data that offer insights into social issues. Its OData API gives users consistent access to this data.
The Breeding bird dataset provides insights into the breeding trends of endemic species that regularly breed in the Netherlands. My goal was to use the CBS API to explore this dataset and determine which birds show the strongest positive and negative trends over the past 12 years.
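As a rough illustration of what "strongest positive and negative trend" could mean in practice, the sketch below fits a straight line to each species' yearly index and ranks the slopes. The column names (`species`, `year`, `breeding_index`) and the toy numbers are placeholders made up for this example, not values from the CBS dataset, and a least-squares slope is only one possible trend measure.

```python
import numpy as np
import pandas as pd

# Toy data standing in for records fetched from the API (values are invented).
df = pd.DataFrame({
    "species": ["species_a"] * 4 + ["species_b"] * 4 + ["species_c"] * 4,
    "year": [2010, 2011, 2012, 2013] * 3,
    "breeding_index": [100, 110, 120, 130,
                       100, 90, 80, 70,
                       100, 101, 102, 103],
})


def trend_slopes(frame):
    """Least-squares slope of breeding_index over year, one value per species."""
    slopes = {
        name: np.polyfit(group["year"], group["breeding_index"], 1)[0]
        for name, group in frame.groupby("species")
    }
    return pd.Series(slopes).sort_values()


slopes = trend_slopes(df)
strongest_decline = slopes.idxmin()
strongest_increase = slopes.idxmax()
```

Sorting the slopes makes it easy to read off the extremes at both ends of the series.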
Projects
Bike Sharing Demand Prediction
Bike sharing systems are a means of renting bicycles. The goal of the Bike Sharing Demand competition is to predict demand by combining historical usage patterns with weather data.
For this project I
- Explored the effect of features on the bike rental count using line and point plots
- Tested a variety of regression models, including Linear regression, Ridge regression, Random forest regression, KNN and XGBoost
- Optimized the performance of the best performing models using GridSearchCV
- Used a voting ensemble on the optimized models to boost model performance, resulting in a top 5% score

Click here to view this project on GitHub.
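The tune-then-vote workflow listed above can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the project's actual code: the parameter grids, the choice of three models and the scoring metric are all assumptions picked to keep the example small.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic regression data standing in for the bike rental features.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate models with small, illustrative parameter grids.
candidates = {
    "ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "rf": (RandomForestRegressor(random_state=0),
           {"n_estimators": [50], "max_depth": [4, 8]}),
    "knn": (KNeighborsRegressor(), {"n_neighbors": [3, 5, 9]}),
}

# Tune each model separately and keep its best configuration.
tuned = []
for name, (model, grid) in candidates.items():
    search = GridSearchCV(model, grid, cv=3,
                          scoring="neg_root_mean_squared_error")
    search.fit(X_train, y_train)
    tuned.append((name, search.best_estimator_))

# Average the predictions of the tuned models.
ensemble = VotingRegressor(estimators=tuned)
ensemble.fit(X_train, y_train)
predictions = ensemble.predict(X_test)
```

`VotingRegressor` refits clones of the tuned estimators and averages their predictions, which tends to help most when the individual models make different kinds of errors.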
Projects
Titanic Survival Prediction
The Titanic competition is one of the most popular machine learning competitions on Kaggle. The goal is to predict the fate of the passengers on board this "unsinkable" ship.
For this project I
- Imputed missing values using a groupby statement (e.g. replace missing fare by the median fare by class and title)
- Used regex to extract the title from the passenger name feature
- Cleaned the title feature further by correcting wrongly labeled titles and grouping rare titles together
- Identified passengers traveling together using their last name and ticket number
- Attempted (but failed) to identify the nationality of passengers by their last names
- Explored the relation between survival and several features using box plots and bar plots
- Optimized Random forest Classifier using GridSearchCV to obtain a top 9% model with 79.
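The first two steps (title extraction with a regex, then a grouped median fill) might look roughly like this in pandas. The rows are invented for the example; the column names `Name`, `Pclass` and `Fare` follow the Kaggle Titanic dataset, and the regex is one plausible pattern rather than the one used in the project.

```python
import pandas as pd

# Invented passengers; only the column names mirror the Kaggle data.
df = pd.DataFrame({
    "Name": ["Smith, Mr. John", "Jones, Mr. Alan", "Brown, Mr. Peter",
             "Clark, Mrs. Anna", "Reed, Miss. Ella"],
    "Pclass": [3, 3, 3, 1, 3],
    "Fare": [8.0, 10.0, None, 70.0, 7.5],
})

# The title sits between the comma after the last name and the next period.
df["Title"] = df["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()

# Fill each missing fare with the median fare of the same class and title.
group_median = df.groupby(["Pclass", "Title"])["Fare"].transform("median")
df["Fare"] = df["Fare"].fillna(group_median)
```

Using `transform` keeps the medians aligned with the original rows, so `fillna` only touches the missing entries.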
Projects
Ames House Price Prediction
With over 45,000 participating teams and individuals, the Housing Prices competition is one of the most popular machine learning competitions on Kaggle. The goal of this competition is to predict the sale price of residential homes in Ames (Iowa) using 79 explanatory variables.
For this project I performed a variety of actions including:
- Imputing missing data using Simple imputer, KNN imputer and MICE imputer
- Creating a variety of new binary features, features representing the number of years since the house was last remodeled and features indicating the proximity to the train station
- Exploring the effect of features on the sale price using scatter and bar plots
- Testing the difference of using log-transformed vs untransformed sale price on RMSE
- Optimizing Ridge regression using GridSearchCV to obtain a model with a top 9% score on the public Kaggle leaderboard

Click here to view this project on GitHub.
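One way to compare a log-transformed target against an untransformed one is sketched below with scikit-learn's `TransformedTargetRegressor`. The data is synthetic (a right-skewed target standing in for sale price), and the Ridge `alpha` and the 5-fold setup are arbitrary assumptions, so this shows the shape of the comparison rather than the project's own experiment.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
# Synthetic, strictly positive, right-skewed target (not real house prices).
coefficients = np.array([0.3, 0.2, 0.1, 0.0, -0.1])
y = np.exp(11 + X @ coefficients + rng.normal(scale=0.1, size=300))


def cv_rmse(model):
    """Mean cross-validated RMSE, measured on the original price scale."""
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()


# Ridge fitted directly on the raw target.
plain = Ridge(alpha=1.0)
# Ridge fitted on log1p(y); predictions are mapped back with expm1.
logged = TransformedTargetRegressor(regressor=Ridge(alpha=1.0),
                                    func=np.log1p, inverse_func=np.expm1)

rmse_plain = cv_rmse(plain)
rmse_logged = cv_rmse(logged)
```

Because the wrapper inverts the transform at prediction time, both RMSE values are on the same scale and can be compared directly.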