Module 00 - Prerequisites: Engineering Tools for Data Roles
This module will set you up for success in your data career 🚀. You will learn about the proper setup of your workplace 🖥️ and the first steps to take before diving into analytics and data engineering 📊.
The objectives of this module are:
- Ensure you have a good working machine that is performant for data use cases and convenient for work and study 🖥️.
- Understand what an Integrated Development Environment (IDE) is and get one. If you’ve never used one, we’ll start with Visual Studio Code 📝.
- Know what the Command Line Interface (CLI) is and master basic commands to navigate your computer and modify/create files 📂.
- Learn about version control systems like git and set up your own GitHub account to save all your work, potentially using it as a future portfolio for your data projects 🌐.
- Understand what a container is, create a simple Docker container, and become familiar with Docker terminology 🐳.
Overall, you should become comfortable with “engineering” concepts used in all companies. You’ll be able to browse the organization’s code repository without being overwhelmed by terms and files like .pre-commit-config.yaml and .dockerignore, and you’ll understand what code linting is, how to ensure code quality, and why Code Review is beneficial 🛠️.
If you don’t grasp everything on the first attempt, don’t worry. It took me a long time too, because no one covered these topics for me in the past. This is a gentle introduction, and I bet you’ll benefit from this knowledge and these skills 💡.
Book recommendation for the module: The Missing README by Chris Riccomini & Dmitriy Ryaboy
Module 0.1: Choosing the Best Laptop, Monitor, Chair, and Lighting for Data Professionals
In this video, I will show the best possible setup for your data work and share a tip for a cost-effective laptop solution. Before we dive into learning actual analytics and engineering skills, we want to make sure we have a laptop with sufficient performance, a comfortable chair, a 4K monitor of the right size, and good lighting.
Video Lesson: Choosing the Best Laptop, Monitor, Chair, and Lighting for Data Professionals
Module 0.2: Getting Started with VSCode IDE
Before proceeding with any analytics or engineering work, let’s choose the best IDE. In this video, I will present a couple of options and demonstrate how to download and configure Visual Studio Code (VS Code). We will utilize this IDE for the remainder of the course.
Video Lesson: Getting Started with VSCode IDE
Module 0.3: Just Enough CLI
The Command Line Interface (CLI) or Terminal might seem intimidating for those not in engineering 🛠️. However, it’s not as complicated as it appears and is incredibly useful for a wide range of data roles. This is especially true for non-technical positions, such as Data Analysts or BI Developers. Possessing CLI skills can provide a competitive edge in the job market and pave the way towards engineering roles like Data Engineer or Analytics Engineer. It also helps in understanding how to utilize popular open-source tools for data analytics and work with version control systems like Git 🚀.
In this video, I will cover:
- Common use cases for Data Analyst and Data Engineer roles 🔍
- Basic commands for the CLI 💻
- CLI editors like Vim, Nano, and Emacs 📝
- Comparing Zsh and Bash 🤔
- Plugins and Themes for ZSH to enhance your experience 🎨
The goal of this lesson is to give you an overview of the CLI and help set up the environment. This will enable you to use the CLI throughout the course and encourage further practice 🌟.
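As a rough illustration of the basic commands mentioned above, here is a minimal sketch of a first terminal session. It should work in Bash or Zsh, and it runs inside a temporary directory so it does not touch your own files. The directory and file names (projects/module00, notes.txt) are made up for this example and are not taken from the video.

```shell
set -e                                   # stop at the first failing command

workdir="$(mktemp -d)"                   # temporary sandbox (hypothetical location)
cd "$workdir"
pwd                                      # print the directory you are in

mkdir -p projects/module00               # create nested directories in one step
cd projects/module00                     # move into the new directory
touch notes.txt                          # create an empty file
echo "Just Enough CLI" > notes.txt       # '>' overwrites the file
echo "pwd, ls, cd, mkdir" >> notes.txt   # '>>' appends a line
ls -la                                   # list files, including hidden ones
cat notes.txt                            # print the file contents

cp notes.txt archive.txt                 # copy a file
mv archive.txt old_notes.txt             # rename (move) it
rm old_notes.txt                         # delete it
cd ../..                                 # go back up two levels
```

Running man followed by a command name (for example, man ls) opens the manual page for that command, which is a good way to explore its options.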
Video Lesson - Just Enough CLI
GitHub Repository - Just Enough CLI
Module 0.4: Just Enough GitHub
No matter your role, a version control system is now a hallmark of quality work 🌟. Code, documentation, and even your own resume and pet projects can all live in a repository. Nearly every company uses a Git-based system, and understanding how to navigate a code repository and its documentation can greatly benefit you 📚.
Another advantage of such systems is the development workflow they enable, which aligns everyone on the same page and fosters transparency through Code Reviews. They also set a standard for code and data quality with linting and Continuous Integration processes 🛠️.
In this video, I will cover:
- The most popular Git system - GitHub 🌍
- How to create your own GitHub account and your first repository 🏁
- Understanding Branches, Commits, Merges, and Pull Requests 🌲
- A typical development workflow using the CLI (as referenced in the previous video) 💻
- Ensuring code quality with Linting and Continuous Integration processes (GitHub Workflow) ✔️
The aim is to get you started on using GitHub for everything, as you learn and complete exercises for Surfalytics 🏄‍♂️.
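The branch, commit, and merge steps listed above can be tried locally before involving GitHub at all. This is a minimal sketch, assuming Git is installed. The repository location, the branch name (add-notes), the file names, and the user name and email are placeholders invented for the example. On GitHub you would normally push the branch and open a Pull Request instead of merging locally.

```shell
set -e                                     # stop at the first failing command

repo="$(mktemp -d)"                        # throwaway directory for the demo repository
cd "$repo"
git init --quiet                           # turn the directory into a Git repository
git config user.name "Demo User"           # placeholder identity, stored for this repo only
git config user.email "demo@example.com"

echo "# My data project" > README.md
git add README.md                          # stage the file
git commit --quiet -m "Add README"         # record the first commit
git branch -M main                         # make sure the default branch is called main

git checkout --quiet -b add-notes          # create a feature branch and switch to it
echo "Notes from Module 0" > notes.md
git add notes.md
git commit --quiet -m "Add module notes"

git checkout --quiet main                  # go back to main
git merge --quiet --no-ff -m "Merge add-notes" add-notes   # merge with an explicit merge commit
git log --oneline --graph                  # show the resulting history
```

The --no-ff flag is used here only so the merge shows up as its own commit in the log, which is similar to what you see after merging a Pull Request.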
Video Lesson - Just Enough GitHub
GitHub Repository - Just Enough GitHub
Module 0.5: Just Enough Docker
In this video, we’ll delve into a fundamental piece of the analytics landscape: containers.
No matter your role in the data field, you’ll encounter containers, and Docker is the go-to solution.
Even if you’re not directly building containers, you may still need to navigate your organization’s codebase, where containers are prevalent.
That’s why I’m excited to cover this essential topic, just as we’ve explored CLI, GitHub, and IDEs before. 📦🚀
Here’s what you’ll learn:
- The difference between Virtual Machines and Containers 🤔
- How to run Worms World Party on a Mac 😄
- Understanding the concepts of containers, images, registries, and repositories in Docker 🐳
- How to create a Docker container 🛠️
- An introduction to Docker Hub 🌐
- Exploring images of popular analytics open-source products 🔍
- Insights into .dockerignore, Dockerfile, YAML, and Docker Compose 📝
- How to create a new branch and commit code into the main branch on GitHub 🌿➡️🔖
- How to create your first release in GitHub 🎉
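To make the Dockerfile and .dockerignore items above more concrete, here is a minimal sketch that writes both files for a tiny Python script. Everything in it is an assumption for illustration: the file names, the image tag surf-hello, and the choice of the python:3.12-slim base image from Docker Hub. The build and run commands are left as comments because they need Docker installed and running.

```shell
set -e                                   # stop at the first failing command

demo="$(mktemp -d)"                      # throwaway directory for the demo files
cd "$demo"

# A one-line program to run inside the container
echo 'print("Hello from a container")' > hello.py

# Dockerfile: the recipe Docker follows to build an image
cat > Dockerfile <<'EOF'
# Start from an official Python image hosted on Docker Hub
FROM python:3.12-slim
# Work inside /app in the image
WORKDIR /app
# Copy the script from the build context into the image
COPY hello.py .
# Command executed when a container starts
CMD ["python", "hello.py"]
EOF

# .dockerignore: paths that are left out of the build context
cat > .dockerignore <<'EOF'
.git
__pycache__/
*.csv
EOF

ls -a                                    # Dockerfile, .dockerignore, hello.py

# With Docker running, build an image and start a container from it:
#   docker build -t surf-hello .
#   docker run --rm surf-hello
```

The image is the built template and the container is a running instance of it. Once Docker is available, docker images and docker ps list each of them.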
This video concludes Module 0, the prerequisites for the Surfalytics course. Keep practicing your CLI, Docker, GitHub, and IDE skills, and your career will surely benefit from it.
Video Lesson - Just Enough Docker
GitHub Repository - Just Enough Docker