Module 02 - Getting started with databases and SQL
Let’s get acquainted with databases, and understand their advantage for working with data compared to Excel/Google Sheets.
Module 2.0 Introduction
The second module of the course will focus on databases. The module aims to provide students with a solid understanding of databases, including the different types that exist. The primary focus will be on gaining hands-on experience with databases and mastering Structured Query Language (SQL). The module will cover various topics, such as key query patterns, keywords, and SQL exercises. Students will have the opportunity to practice SQL through real-world exercises and learn how to think about SQL queries without external assistance. The module will also explore different databases like SQL Server, PostgreSQL, and MySQL, both on-premises and in Docker containers. Additionally, students will deploy a database in the cloud and create a small BI solution using a free BI tool to access the cloud database, write SQL queries, and create dashboards. The module emphasizes understanding the full end-to-end process of working with databases, from creation to data insertion and querying, providing students with a competitive advantage in interviews and their future work.
Video lesson - Module 2.0 - Introduction
Module 2.1 What is Database?
The first lesson in Module Two of the course is an introduction to databases. The lesson begins by defining what a database is and the different types of databases that exist, including relational, analytical, and NoSQL databases. The lesson then focuses on relational databases, which are the most commonly used type of database for data analytics. The lesson explains the basic structure of a relational database, which consists of tables, rows, and columns. The lesson also introduces SQL, the language used to query relational databases. Finally, the lesson provides a real-world example of how databases are used to store and manage data.
Video lesson - Module 2.1 - What is Database?
Module 2.2 Connecting to Databases
The second lesson of the module focuses on connecting to databases and executing SQL queries. The lesson emphasizes that the ability to connect to and query databases is a crucial skill for data analysts and BI engineers, regardless of the specific database architecture or type. The lesson provides a hands-on demonstration of how to connect to a PostgreSQL database using the DBeaver SQL client and also explores the use of Docker containers for database setup and management. The lesson also covers essential SQL operations, including DDL (Data Definition Language), DML (Data Manipulation Language), and DQL (Data Query Language), and provides examples of common SQL queries and joins. The lesson concludes by recommending several resources for further learning and practice in SQL, highlighting the importance of hands-on experience in mastering this essential skill for data professionals.
Video lesson - Module 2.2 - Connecting to Databases GitHub Repository - Connecting to Databases
Module 2.3 Data Models in Databases
Lesson 2.3 focuses on data modeling and databases. The lesson explains the importance of data models as maps of the data in databases. It then goes over how to transform business processes into data models, using the example of a relational data model in a star schema. The lesson also introduces different data modeling techniques, including dimensional modeling, third normal form, and data vault modeling. It also touches on the wide table technique. The lesson then compares and contrasts different data modeling techniques, including star vs snowflake schema and normalized vs denormalized data. It also explains the concept of a data mart and its role in data warehousing. Finally, the lesson provides an overview of the design sequence in a data warehouse, from conceptual to physical models, and concludes with a discussion of various data modeling tools.
Video lesson - Module 2.3 - Data Models in Databases
Module 2.4 Databases in a Cloud
This lesson of the course focuses on cloud databases and how to use them. In the lesson Dmitry emphasizes the importance of SQL skills for data professionals and the need to understand databases. He then moves on to the cloud, explaining why it is important and how it is used by most companies today. The lesson also provides a brief overview of cloud providers such as AWS, Azure, and GCP, and the different types of databases they offer. It then dives into the specifics of creating a cloud database on AWS and Azure, including how to set up a free trial account, navigate the console, and create a database instance. The lesson also includes a homework assignment for students to create their own cloud database and connect it to DBeaver, which will be used in the next lesson for data visualization.
Video lesson - Module 2.4 - Databases in a Cloud
Module 2.5 Visualisation elements for Databases
This is a data engineering lesson that will teach you how to visualize data from databases. You will learn about different BI tools and how to use them to connect to your data and create dashboards. You will also learn about the importance of using the right metrics and how to define active users. Finally, you will get a chance to practice your SQL skills and learn how to use a NoSQL database.
Video lesson - Module 2.5 Visualisation elements for Databases
Resources mentioned in the video: • https://app.diagrams.net/ • https://mode.com/ • https://www.metabase.com/ • https://redash.io/ • https://aws.amazon.com/ru/pm/quicksight/#Learn_More_About_Amazon_QuickSight • https://sql-ex.ru/
Module 2.6 NoSQL Databases for Data Analyst and Data Engineer
This lesson is about NoSQL databases and their use cases for data analysts and data engineers. The lesson will cover topics such as the different types of NoSQL databases, the benefits of using a NoSQL database, and how to use a NoSQL database for real-world applications. The lesson will also discuss the importance of understanding the business context when working with data and the different ways to extract data from a NoSQL database. Finally, the lesson will provide a brief overview of MongoDB, a popular NoSQL database.
Video lesson - Module 2.6 NoSQL Databases for Data Analyst and Data Engineer
• Link to repo: https://github.com/surfalytics/analytics-course/blob/main/02_getting_started_with_databases/06_nosql_databases/mongodb/readme.md • Install MongoDB Community Edition: https://www.mongodb.com/docs/manual/administration/install-community/ • MongoDB Atlas CLI Installation: https://www.mongodb.com/docs/atlas/cli/current/install-atlas-cli/ • MongoDB Compass Download: https://www.mongodb.com/try/download/compass • GUI Client Options: MongoDB Compass, Robo 3T: https://robomongo.org/ • Fivetran Connector to MongoDB: https://www.fivetran.com/connectors/mongodb • Matillion ETL Connector to MongoDB: https://docs.matillion.com/metl/docs/2391882/
Module 2.7 Just enough DuckDB for Data Analyst
This lesson is about DuckDB, an open-source database that is simple, portable, fast, extensible, and free. The lesson will cover topics such as how to install DuckDB, how to use DuckDB to query data from different sources, and how to use DuckDB for different use cases. The lesson will also discuss the benefits of using DuckDB over other databases, such as its ability to work with larger files than you have operation memory. Finally, the lesson will provide a brief overview of some of the technical advantages of DuckDB, such as its columnar storage and vectorized query execution.
Video lesson - Module 2.7 Just enough DuckDB for Data Analyst
Resources used: • https://github.com/surfalytics/analytics-course/tree/5b6c52f0ba66d63eb46dd508543e88e5f7b2c9df/02_getting_started_with_databases/07_duckdb • https://duckdb.org/ • https://aws.amazon.com/compare/the-difference-between-olap-and-oltp/