Portfolio

I consider myself a data science enthusiast who is always eager to learn new topics and implement new ideas. I also write about data science, machine learning, deep learning and statistics. Below is a list of the projects I have done and some of the stories from my blog on Medium.

Projects

Image Classification with Deep Learning

Motivation: Computer vision is a highly important field in data science, with applications ranging from self-driving cars to cancer diagnosis. Convolutional neural networks (CNNs) are commonly used for computer vision and image classification tasks. I implemented a CNN using Keras to perform a binary classification task, and explained the concepts behind each step of a convolutional neural network along with the underlying theory.
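A binary-classification CNN of this kind could be sketched in Keras as below. This is a minimal illustration, not the exact architecture from the project: the input size, number of filters, and dense-layer width are assumptions.

```python
# Minimal sketch of a binary-classification CNN in Keras.
# Layer sizes and input shape are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(64, 64, 3)),               # assumed image size
    layers.Conv2D(32, (3, 3), activation="relu"),  # learn local features
    layers.MaxPooling2D((2, 2)),                   # downsample feature maps
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),         # single probability for binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

The sigmoid output with binary cross-entropy loss is the standard pairing for two-class problems.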

Data: The images are taken from the Caltech101 dataset.

Achievements:

Model accuracy

GitHub repo of the project

Blog post of the project

Cryptocurrency Prediction with Deep Learning

Motivation: Although the first decentralized cryptocurrency (Bitcoin) was created in 2009, the idea of digital money arose in the 1980s. In recent years, cryptocurrencies have gained tremendous popularity. As with traditional currencies, the value of a cryptocurrency changes over time. Using historical data, I implemented a recurrent neural network with LSTM (long short-term memory) layers to predict the future price trend of a cryptocurrency.

Data: There is a huge dataset of cryptocurrency market prices on Kaggle. I used only a part of it: the historical price data of Litecoin.

Achievements:

Test Set: Actual vs Predicted

How to Improve: We can build a more robust and accurate model by collecting more data. We can also adjust the number of nodes in a layer, add more LSTM layers, or increase the number of timesteps, which was 90 in our model.
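A model along these lines could be sketched in Keras as follows. The 90-step window matches the figure above, but the layer count, unit sizes, and the assumption of a single (univariate) price feature are illustrative, not the project's exact configuration.

```python
# Sketch of a stacked-LSTM price model; unit counts are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

timesteps, n_features = 90, 1  # 90-step window of prices (assumed univariate)

model = keras.Sequential([
    layers.Input(shape=(timesteps, n_features)),
    layers.LSTM(64, return_sequences=True),  # return_sequences feeds the next LSTM layer
    layers.LSTM(32),                         # final LSTM returns only the last state
    layers.Dense(1),                         # next-step price estimate
])
model.compile(optimizer="adam", loss="mse")
```

Adding LSTM layers or changing the unit counts here corresponds directly to the improvements suggested above; increasing `timesteps` lengthens the history each prediction sees.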

GitHub repo of the project

Blog post of the project

Churn Prediction

Motivation: Churn prediction is a common use case in the machine learning domain. It is critical for a business to understand why and when customers are likely to churn (i.e. leave the company). A robust and accurate churn prediction model helps businesses take action to prevent customers from leaving.

Data: I used the Telco Customer Churn dataset available on Kaggle. The dataset includes 20 features (independent variables) and 1 target (dependent) variable for 7043 customers.

Achievements:

How to Improve: Data is the fuel of machine learning models, so collecting more data is always helpful in improving the model. We can also try a wider range of parameters in GridSearchCV, because a small adjustment in a parameter may slightly improve the model's performance.
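Widening the search space in GridSearchCV could look like the sketch below. The estimator choice and parameter grid are assumptions for illustration, and the data is a synthetic stand-in for the churn dataset.

```python
# Illustrative GridSearchCV tuning sketch; the estimator, grid values,
# and synthetic data are assumptions, not the project's exact setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the churn data (20 features, binary target)
X, y = make_classification(n_samples=300, n_features=20, random_state=42)

param_grid = {
    "n_estimators": [50, 100],  # widen these lists to search a larger space
    "max_depth": [3, 6],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid, cv=3, scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Every added value multiplies the number of fitted models, so wider grids trade runtime for a more thorough search.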

GitHub repo of the project

Blog post of the project

Predicting Used Car Prices

Motivation: In Turkey, used cars are usually sold on a website called “sahibinden”, which means “from the owner”. Dealers also use this website to sell or buy used cars, so it shapes the used car market to some extent. The most critical part of selling a used car is determining the optimal price. Many websites give an estimate of a used car’s value, but it is better to also survey the market before setting the price. Moreover, other factors affect the price, such as location, how fast you want to sell the car, smoking in the car, and so on. Before posting an ad, it is best to look through the prices of similar cars; however, this can be exhausting because there are lots of ads online. Therefore, I decided to take advantage of the convenience offered by machine learning to create a model that predicts used car prices based on the data available on “sahibinden”.

Data: I scraped the data of a particular brand and model from the “sahibinden.com” website. The dataset includes 7 features and the price (target variable) of 6731 cars.

Achievements:

How to Improve: There are many ways to improve a machine learning model. The most fundamental and effective one is to gather more data. In our case, we can (1) collect data for more cars, (2) collect more information about the cars in the current dataset, or both. For the first, there are other websites for selling used cars, so we can increase the size of our dataset by adding cars from them. For the second, we can scrape more data about each car from the “sahibinden” website: clicking on an ad opens another page with detailed information and pictures, where sellers describe the car’s problems, any previous accidents or repairs, and so on.

Another way to improve the model is to tune its hyperparameters. We can use RandomizedSearchCV to find optimal hyperparameter values.
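A RandomizedSearchCV run could be sketched as below. Unlike a grid search, it samples a fixed number of candidates from distributions. The estimator, distributions, and synthetic data here are assumptions standing in for the scraped car data.

```python
# Illustrative RandomizedSearchCV sketch for a price-regression task;
# the estimator, distributions, and synthetic data are assumptions.
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the scraped car data (7 features, price target)
X, y = make_regression(n_samples=300, n_features=7, noise=10.0, random_state=0)

param_dist = {
    "n_estimators": randint(20, 100),  # sampled, not exhaustively enumerated
    "max_depth": randint(3, 12),
}
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_dist, n_iter=5, cv=3, random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

Because only `n_iter` candidates are fitted, randomized search scales to much larger hyperparameter spaces than an exhaustive grid.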

GitHub repo of the project

Blog post of the project

Blog

Machine Learning and Deep Learning

Data Analysis

Statistics and Math