We will contact you and answer all your questions about the course.
DATA SCIENTIST WORK SIMULATOR
You have completed some online courses, but you do not feel prepared for real work? This course will help you gain confidence.
VLAD GROZIN
YOUR DATA SCIENCE MENTOR
ABOUT ///
You are hired by a small online shop to build a system that recommends items to users.

Your goal is to build a robust, production-ready microservice that recommends the right products. You must maximize the revenue it generates over 3 months, and you will compete with other course participants!

The setting mirrors real work: you will perform typical industry tasks such as analytics, pipeline writing, model deployment, and reading and implementing papers.

You will work with data from users who react to your recommendations, and your code will be peer-reviewed.
FOR WHOM ://
DATA SCIENCE ENTHUSIASTS AND JUNIORS
Online courses do not teach all the skills required to work in a company. In this practice-heavy course, you will "work" as a generalist Data Scientist. This will let you experience the profession, understand which skills to focus on in further training, and learn how to apply modeling skills and mathematics to business problems. A completed project will be a great addition to your resume.
MIDDLE DATA SCIENTISTS
Do you want to learn more about the common stack? Do you have some knowledge, but want a holistic understanding of practical data science? This course will help you do exactly that. The lecture material contains a comprehensive review of commonly used technologies and shows how they can be combined to solve tasks.
DATA ANALYSTS
You will learn about schedulers and how they can help you automate your work. You will also learn more about metrics and data science practices, which will help you communicate your insights better.
MENTOR >>>
VLAD GROZIN
Developed search and personalization engines for large online retailers such as Sephora, Bonobos, Backcountry, Ulmart, Sportmaster and Mediamarkt.
Over 6 years of practical data science experience.
[PREREQUISITES]
PROGRAM ://
1. Introduction
Let’s get to know each other!
2. Theory / Data architecture of modern companies
How modern companies store data and assemble data processing pipelines. What a REST service is, what model deployment is, and which tools and technologies fit best in each situation.

Homework:
Quiz on the lecture contents.
3. Theory / Evolution of recommender systems
How recommender systems came into being, and how they evolved over time. This part is important to understand the context of the course, as it touches on milestone papers and approaches.

Homework:
Quiz on the lecture contents.
4. Baseline / Analytics with Jupyter. Calculating popular products
Here, you will learn about what data scientists do on a daily basis: analytics and model deployment. You will build the first iteration of the pipeline and service. It won’t have a complex algorithm, but it will have all the crucial components in place, and you will improve them in later iterations.

Jupyter is a popular tool for analytics and development. You will learn how it can be used together with Spark for data analytics and code debugging.

Homework:
Write a function that calculates product popularity and stores it in the database.
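A minimal sketch of this homework, assuming popularity is simply the number of events per item. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs without a database server; the names `compute_popularity`, `store_popularity` and the `item_popularity` table are hypothetical.

```python
import sqlite3
from collections import Counter


def compute_popularity(events):
    """Count how many times each item appears in a list of (user_id, item_id) events."""
    return Counter(item_id for _user_id, item_id in events)


def store_popularity(conn, popularity):
    """Write item scores into a hypothetical item_popularity table, replacing old rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS item_popularity (item_id TEXT PRIMARY KEY, score INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO item_popularity (item_id, score) VALUES (?, ?)",
        popularity.items(),
    )
    conn.commit()


if __name__ == "__main__":
    events = [("u1", "apple"), ("u2", "apple"), ("u1", "pear"), ("u3", "apple")]
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for PostgreSQL
    store_popularity(conn, compute_popularity(events))
    print(conn.execute(
        "SELECT item_id, score FROM item_popularity ORDER BY score DESC"
    ).fetchall())
```

In the Jupyter and Spark setup from this lesson, the counting step would usually be a `groupBy("item_id").count()` on a Spark DataFrame; the storage step is the same idea with a PostgreSQL driver instead of `sqlite3`.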
5. Baseline / Wrapping popularity calculation into an Airflow task
Airflow is a popular scheduler. In this lesson, we will show how it can be used to define pipelines.

Homework:
Write a pipeline that runs every day, calculates product popularity over the last week, and stores the results in PostgreSQL.
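The core of such a daily task is a function that counts events in a seven-day window ending at the run date. A minimal sketch, assuming events arrive as `(item_id, event_date)` pairs. Airflow can pass a run's date to a Python callable as the `ds` string in `YYYY-MM-DD` format, which is what the hypothetical `popularity_task` accepts; the DAG definition and the PostgreSQL write are left out.

```python
from collections import Counter
from datetime import date, timedelta


def weekly_popularity(events, run_date, window_days=7):
    """Count item events whose date falls in [run_date - window_days, run_date)."""
    start = run_date - timedelta(days=window_days)
    return Counter(
        item_id for item_id, event_date in events if start <= event_date < run_date
    )


def popularity_task(ds, events):
    """Hypothetical entry point for a daily run; `ds` is the run date as 'YYYY-MM-DD'."""
    run_date = date.fromisoformat(ds)
    return weekly_popularity(events, run_date)


if __name__ == "__main__":
    events = [
        ("apple", date(2024, 5, 1)),
        ("apple", date(2024, 5, 6)),
        ("pear", date(2024, 4, 20)),  # older than a week: ignored
        ("pear", date(2024, 5, 7)),   # the run date itself: excluded
    ]
    print(popularity_task("2024-05-07", events))
```

The window excludes `run_date` itself so that each run only counts complete days. Whether that matches your schedule depends on how your Airflow version defines a run's data interval, so treat it as a choice to verify rather than a rule.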
6. Baseline / Writing microservice with Flask
This lesson completes the baseline. You will learn how to create microservices in Python and develop a Flask service that outputs recommendations. After this step, you will have a fully functioning recommender service.

Homework:
Develop a microservice that outputs recommendations.
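Flask applications are WSGI applications, so the sketch below uses only the standard library's WSGI conventions to show the request/response shape such a service has; in Flask, the same handler would be a function decorated with `@app.route("/recommendations")`. The `/recommendations` path, the `k` query parameter and the hard-coded `POPULARITY` scores are hypothetical.

```python
import json
from urllib.parse import parse_qs

# Hypothetical precomputed scores; in the course they would be read from PostgreSQL.
POPULARITY = {"apple": 3, "pear": 1, "plum": 2}


def recommend(popularity, k):
    """Return the k most popular item ids, highest score first (ties by item id)."""
    ranked = sorted(popularity.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item_id for item_id, _score in ranked[:k]]


def app(environ, start_response):
    """WSGI app: GET /recommendations?k=N returns {"items": [...]} as JSON."""
    headers = [("Content-Type", "application/json")]
    if environ.get("PATH_INFO") != "/recommendations":
        start_response("404 Not Found", headers)
        return [b'{"error": "not found"}']
    query = parse_qs(environ.get("QUERY_STRING", ""))
    try:
        k = int(query.get("k", ["10"])[0])
        if k < 1:
            raise ValueError
    except ValueError:
        start_response("400 Bad Request", headers)
        return [b'{"error": "k must be a positive integer"}']
    body = json.dumps({"items": recommend(POPULARITY, k)}).encode("utf-8")
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serve locally; try http://127.0.0.1:8000/recommendations?k=2
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

Keeping the ranking in a plain `recommend` function, separate from the HTTP layer, makes it easy to test and to swap in a better algorithm later without touching the endpoint.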
7. Personalization / Metrics. Offline and online evaluation
Our system is working, but it’s simple, too simple. Let’s improve the recommendation algorithm so that it produces better recommendations!

But what is "quality"? It is the "goodness" of a product, usually measured with several metrics. This lesson will show you how to measure it, both offline and online.
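Offline evaluation usually means hiding part of each user's history and checking whether the recommender would have suggested the hidden items. As one common example (not necessarily the metric the course uses), here is a minimal sketch of precision@k averaged over users; the data layout, a dict of recommendation lists and a dict of held-out item sets, is an assumption.

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


def mean_precision_at_k(recommendations, holdout, k):
    """Average precision@k over users with at least one held-out item.

    recommendations: {user_id: [item_id, ...]} ranked best first
    holdout: {user_id: {item_id, ...}} interactions hidden from the model
    """
    scores = [
        precision_at_k(recommendations.get(user, []), items, k)
        for user, items in holdout.items()
        if items
    ]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    recs = {"u1": ["a", "b", "c"], "u2": ["c", "d", "e"]}
    held_out = {"u1": {"a", "c"}, "u2": {"x"}}
    print(mean_precision_at_k(recs, held_out, k=3))
```

Dividing by `k` rather than by the number of returned items penalizes a recommender that returns fewer than `k` items, which is usually the desired behavior.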

Homework:
Write a function that computes online metrics for each day.
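Online metrics are computed from what real users did with the recommendations they were shown. A minimal sketch, assuming the event log is a list of `(day, event_type, amount)` tuples with the hypothetical event types `"impression"`, `"click"` and `"purchase"`; it reports impressions, clicks, click-through rate and revenue per day, revenue being the quantity this course asks you to maximize.

```python
from collections import defaultdict


def daily_online_metrics(events):
    """Aggregate impressions, clicks, revenue and CTR for each day.

    events: iterable of (day, event_type, amount). amount is the purchase value
    and is only used for 'purchase' events; other event types are ignored.
    """
    stats = defaultdict(lambda: {"impressions": 0, "clicks": 0, "revenue": 0.0})
    for day, event_type, amount in events:
        day_stats = stats[day]
        if event_type == "impression":
            day_stats["impressions"] += 1
        elif event_type == "click":
            day_stats["clicks"] += 1
        elif event_type == "purchase":
            day_stats["revenue"] += amount

    result = {}
    for day in sorted(stats):
        s = stats[day]
        # Click-through rate; defined as 0.0 on days without impressions.
        ctr = s["clicks"] / s["impressions"] if s["impressions"] else 0.0
        result[day] = {**s, "ctr": ctr}
    return result


if __name__ == "__main__":
    log = [
        ("2024-05-01", "impression", 0),
        ("2024-05-01", "impression", 0),
        ("2024-05-01", "click", 0),
        ("2024-05-01", "purchase", 20.0),
        ("2024-05-02", "impression", 0),
    ]
    for day, metrics in daily_online_metrics(log).items():
        print(day, metrics)
```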
8. Personalization / Improving the recommender system
Let’s implement proper personalization.

Homework:
Develop a pipeline that calculates the top-10 similar items for each item and stores this information in PostgreSQL, along with the popularity score.
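One common way to define "similar items" is item-to-item cosine similarity over the sets of users who interacted with each item. The sketch below assumes that measure (the course may use another) and treats interactions as binary `(user_id, item_id)` pairs; writing the result to PostgreSQL is left out.

```python
import math
from collections import defaultdict


def top_similar_items(interactions, n=10):
    """For each item, return up to n (item_id, similarity) pairs, most similar first.

    Similarity is cosine similarity between binary user vectors:
    |users(a) & users(b)| / sqrt(|users(a)| * |users(b)|).
    A repeated (user, item) pair counts once.
    """
    users_by_item = defaultdict(set)
    for user_id, item_id in interactions:
        users_by_item[item_id].add(user_id)

    items = sorted(users_by_item)
    result = {}
    for a in items:
        scores = []
        for b in items:
            if a == b:
                continue
            overlap = len(users_by_item[a] & users_by_item[b])
            if overlap == 0:
                continue  # no shared users: similarity 0, skip
            sim = overlap / math.sqrt(len(users_by_item[a]) * len(users_by_item[b]))
            scores.append((b, sim))
        # Highest similarity first; ties broken by item id for stable output.
        scores.sort(key=lambda pair: (-pair[1], pair[0]))
        result[a] = scores[:n]
    return result


if __name__ == "__main__":
    pairs = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "a"), ("u3", "c")]
    for item, neighbours in top_similar_items(pairs).items():
        print(item, neighbours)
```

This double loop compares every pair of items, so its cost grows quadratically with the catalogue; that is fine for a small shop, while larger catalogues usually call for sparse matrix operations or a distributed job such as Spark.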
9. Free swimming / Read & Implement
You may not believe this, but you have all the tools and knowledge required for working in industrial Data Science. You have analyzed the data to find the patterns. You can write pipelines, and deploy models into production.
Here, we will show you how these skills can be combined to solve real-world problems. You will improve the recommender system, but now without hand-holding.

Data Science is a very dynamic field, so reading scientific papers is part of a data scientist’s job.
Here, we present several papers which can be used to improve the recommender system, as well as short summaries of these papers.

Homework:
Reflect on how these papers relate to your findings from Lesson 7. Which approach should improve the metrics the most? Implement the approach described in one of the papers discussed above.
10. Free swimming / Beat the baseline!
Implement an algorithm that generates more revenue than the baseline solution.

You can try another approach described in Lesson 9, or go back to Lesson 7 and implement your own ideas. You can use any method you like: all is fair in war!