Udemy - Batch Processing with Apache Beam in Python - TutFlix

Easy to follow, hands-on introduction to batch data processing in Python

What you'll learn

Core concepts of the Apache Beam framework
How to design a pipeline in Apache Beam
How to install Apache Beam locally
How to build a real-world ETL pipeline in Apache Beam
How to read and write CSV data from Apache Beam
How to apply built-in and custom transformations on a dataset
How to deploy your pipeline to Cloud Dataflow on Google Cloud
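
To give a feel for what these topics look like in code, here is a minimal sketch of a batch pipeline that reads a CSV file, applies one custom and one built-in transformation, and writes the result. It is not taken from the course. It assumes Apache Beam is installed (for example with `pip install apache-beam`) and that the input is a hypothetical two-column file with a header row and `customer,amount` records. The names `parse_order`, `format_total`, `run`, and the file paths are made up for illustration.

```python
import csv


def parse_order(line):
    """Custom step: parse one CSV line into a (customer, amount) pair."""
    customer, amount = next(csv.reader([line]))
    return customer.strip(), float(amount)


def format_total(pair):
    """Custom step: render a (customer, total) pair back into a CSV line."""
    customer, total = pair
    return f"{customer},{total:.2f}"


def run(input_path, output_prefix):
    # Imported here so the helpers above can be used and tested
    # even on a machine where Beam is not installed.
    import apache_beam as beam

    # With no pipeline options, Beam runs locally on the DirectRunner.
    with beam.Pipeline() as pipeline:
        (
            pipeline
            # Read the file line by line, skipping the header row.
            | "Read" >> beam.io.ReadFromText(input_path, skip_header_lines=1)
            # Apply the custom parsing function to every line.
            | "Parse" >> beam.Map(parse_order)
            # Built-in transform: sum the amounts for each customer.
            | "SumPerCustomer" >> beam.CombinePerKey(sum)
            | "Format" >> beam.Map(format_total)
            # Output may be split into several sharded files.
            | "Write" >> beam.io.WriteToText(output_prefix, file_name_suffix=".csv")
        )


if __name__ == "__main__":
    # Hypothetical paths; replace with your own files.
    run("orders.csv", "totals")
```

The same pipeline code can in principle be submitted to Cloud Dataflow by passing pipeline options such as `--runner=DataflowRunner` together with a project, region, and temp location. The exact values depend on your Google Cloud setup.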

Requirements

Python programming experience
Some familiarity with distributed data processing (for example, prior experience with Spark)
Conda (or another virtual environment manager) installed on your machine

Description
Apache Beam is an open-source programming model for defining large-scale ETL, batch, and streaming data processing pipelines. It is used by companies like Google, Discord and PayPal.
In this course you will learn Apache Beam in a practical manner: every lecture comes with a full coding screencast. By the end of the course you'll be able to build your own custom batch data processing pipeline in Apache Beam.
This course includes 20 concise, bite-size lectures and a real-life coding project that you can add to your GitHub portfolio. You're expected to follow the instructor and code along with her.
You will learn:

How to install Apache Beam on your machine
Basic and advanced Apache Beam concepts
How to develop a real-world batch processing pipeline
How to define custom transformation steps
How to deploy your pipeline on Cloud Dataflow

This course is for all levels. You do not need any previous knowledge of Apache Beam or Cloud Dataflow.
Who this course is for:

Data Engineers
Aspiring Data Engineers
Python developers interested in Apache Beam
