Stream big data in real time with Spark and integrate any data source from Kafka to Twitter.
Nothing static, everything in motion.
You probably already know it: Spark is the most popular big data computing engine, the best supported, and with a proven performance record. It's up to 100 times faster than the old MapReduce paradigm, and it can be easily extended with machine learning, streaming, and more.
In this course, we will take a natural step forward: we will process big data as it becomes available.
What awaits you:
- You'll find out how Spark Structured Streaming and Spark's "regular" batch processing are similar and how they differ (see the sketch right after this list).
- You'll work with another set of streaming abstractions (DStreams) for low-level, high-control processing.
- You'll integrate Kafka, JDBC, Cassandra and Akka Streams (!), so you can integrate whatever you like afterwards.
- You'll work with powerful state-tracking APIs that few people know how to use correctly.
- You'll have access to all the code I write on camera (2200+ lines of code).
- (coming soon) You'll have access to the slides.
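To give you a taste of that first point, here is a minimal sketch of a Structured Streaming job whose transformations read exactly like batch code; the socket source, host, and port are placeholders I picked for illustration, not the course's exact setup.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-word-count")
      .master("local[2]") // local run, just for illustration
      .getOrCreate()

    import spark.implicits._

    // A streaming DataFrame supports the same select/filter/groupBy API
    // as a batch one; only the execution model differs.
    val lines = spark.readStream
      .format("socket")           // toy source; Kafka etc. plug in the same way
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    val counts = lines
      .select(explode(split($"value", " ")).as("word"))
      .groupBy($"word")
      .count()

    counts.writeStream
      .format("console")
      .outputMode("complete")     // reprint the full aggregation on every trigger
      .start()
      .awaitTermination()
  }
}
```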
Project 1: Twitter
In this project we will integrate live data from Twitter. We'll create a customizable data source to plug into Spark and run a variety of analyses: the length of tweets, the most commonly used hashtags, all in real time. You can use this project as a template for any data source you might want to integrate. At the very end, we'll use Stanford's NLP library to analyze the sentiment of tweets and find out the general mood of social media.
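As a taste of that last step, here is a rough sketch of scoring text with Stanford CoreNLP's sentiment annotator; the annotator list and the 0-to-4 score scale reflect CoreNLP's standard sentiment pipeline, not necessarily the exact code written in the course.

```scala
import java.util.Properties
import scala.jdk.CollectionConverters._ // Scala 2.13; use JavaConverters on 2.12
import edu.stanford.nlp.ling.CoreAnnotations
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations
import edu.stanford.nlp.pipeline.{Annotation, StanfordCoreNLP}
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations

object SentimentSketch {
  // sentiment needs tokenization, sentence splitting and parsing first
  private val props = new Properties()
  props.setProperty("annotators", "tokenize, ssplit, parse, sentiment")
  private lazy val pipeline = new StanfordCoreNLP(props)

  /** One score per sentence: 0 = very negative ... 4 = very positive. */
  def sentiment(text: String): List[Int] = {
    val annotation = new Annotation(text)
    pipeline.annotate(annotation)
    annotation
      .get(classOf[CoreAnnotations.SentencesAnnotation])
      .asScala
      .map { sentence =>
        val tree = sentence.get(classOf[SentimentCoreAnnotations.SentimentAnnotatedTree])
        RNNCoreAnnotations.getPredictedClass(tree)
      }
      .toList
  }
}
```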
You'll learn:
- how to set up your own data receiver, which you manage yourself to pull in new data (sketched right after this list)
- how to create a DStream from your custom code
- how to get data from Twitter
- how to aggregate tweets
- how to use the Stanford CoreNLP library for sentiment analysis
- how to apply sentiment analysis to tweets in real time
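For the first two points, here is a minimal sketch of Spark's custom Receiver API; fetchNextRecord is a made-up stand-in for the Twitter client built in the course.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.receiver.Receiver

// A custom receiver: you decide when data is fetched and pushed into Spark.
class MySourceReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {
  override def onStart(): Unit =
    // poll on a dedicated thread so onStart can return immediately
    new Thread("my-source-poller") {
      override def run(): Unit =
        while (!isStopped()) {
          store(fetchNextRecord()) // hand each record over to Spark
        }
    }.start()

  override def onStop(): Unit = () // the polling loop checks isStopped()

  // hypothetical stand-in for a real Twitter/HTTP client
  private def fetchNextRecord(): String = { Thread.sleep(500); "hello" }
}

object CustomReceiverApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("custom-receiver")
    val ssc  = new StreamingContext(conf, Seconds(1))

    // a DStream backed entirely by our own code
    val stream = ssc.receiverStream(new MySourceReceiver)
    stream.map(_.length).print() // e.g. tweet lengths, batch by batch

    ssc.start()
    ssc.awaitTermination()
  }
}
```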
Project 2: The Science Project
In this project, we will write a full-featured web application supporting multiple users who take part in a scientific experiment: we study the effects of alcohol/substances/insert_your_addictive_drug_like_Scala on reflexes and reaction times. Users will submit data through a web interface connected to a REST endpoint; from there the data will pass through a Kafka broker and finally reach the Spark Streaming backend, which will process it (a sketch of this REST-to-Kafka leg follows below). You can use this app as a template for any full-featured application that combines and processes real-time data with Spark Streaming from any number of concurrent users.
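To illustrate the web-to-Kafka leg of that pipeline, here is a minimal sketch of an Akka HTTP endpoint that forwards POSTed payloads to Kafka; the route, port, and topic name ("science-events") are placeholders I invented, and the server API assumes Akka HTTP 10.2+.

```scala
import java.util.Properties
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object RestToKafka {
  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("rest-to-kafka")

    // plain Kafka producer with the standard string serializers
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)

    // POST /data with a text body -> record on the Kafka topic
    val route =
      path("data") {
        post {
          entity(as[String]) { payload =>
            producer.send(new ProducerRecord[String, String]("science-events", payload))
            complete("OK")
          }
        }
      }

    Http().newServerAt("localhost", 8080).bind(route)
  }
}
```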
You'll learn:
- how to set up an HTTP server in minutes with Akka HTTP
- how to manually send data via Kafka
- how to aggregate data in ways that are nearly impossible to express in SQL
- how to write a full-featured application with a web interface, Akka HTTP, Kafka and Spark Streaming
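And for the Kafka-to-Spark leg, here is a minimal sketch of subscribing to a topic from Spark Structured Streaming; the broker address and topic name are the same illustrative placeholders as above, and the job assumes the spark-sql-kafka connector is on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object KafkaToSpark {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-spark")
      .master("local[2]")
      .getOrCreate()

    // subscribe to the topic the web layer writes to (placeholder names)
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "science-events")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // from here you would parse the payload and aggregate as the project requires
    events.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```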