Rock the JVM - Spark Streaming with Scala

Stream big data in real time with Spark and integrate any data source from Kafka to Twitter.


Nothing static, everything in motion.

As you probably already know, Spark is the most popular big data computing engine, one of the most actively maintained, and it has a proven performance record. It can be up to 100 times faster than the old MapReduce paradigm for in-memory workloads, and it is easily extended with machine learning, streaming, and more.

In this course, we will take a natural step forward: we will process big data as it becomes available.

What awaits you:

  • You'll find out how Spark Structured Streaming and Spark's "regular" batch processing are similar and how they differ.
  • You'll work with Spark's lower-level streaming abstraction (DStreams) for high-control processing.
  • You'll integrate Kafka, JDBC, Cassandra and Akka Streams (!), so you can integrate whatever you like afterwards.
  • You'll work with powerful state-tracking APIs that few people know how to use correctly.
And some additional perks:

  • You'll have access to all the code I write on camera (2200+ LOC)
  • (soon) You'll have access to the slides
This course is for Scala and Spark programmers who need to process streaming data rather than one-off batch data. If you have never studied Scala or Spark, this course is not for you.

Project 1: Twitter


In this project, we will integrate live data from Twitter. We'll create a custom data source that we'll use with Spark, and run a variety of analyses in real time, from tweet lengths to the most commonly used hashtags. You can use this project as a model for any data source you might want to integrate. At the very end, we'll use Stanford's CoreNLP library to analyze the sentiment of tweets and gauge the general mood on social media.

You'll learn:

  • how to set up your own data receiver, which you control yourself and which "pulls" new data
  • how to create a DStream from your custom code
  • how to get data from Twitter
  • how to aggregate tweets
  • how to use the Stanford CoreNLP library for sentiment analysis
  • how to apply sentiment analysis to real-time tweets
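The hashtag and tweet-length analyses above are, at their core, simple aggregations over each batch of tweets. The following is a minimal plain-Scala sketch of that logic, not the course's code: it assumes tweets are already available as strings and uses neither Spark nor any Twitter client. The object and method names (`HashtagCounts`, `hashtags`, `topHashtags`, `averageLength`) are hypothetical; in a streaming job, the same transformation would be applied to each micro-batch.

```scala
// Plain-Scala sketch (no Spark, no Twitter client): aggregate one batch of tweet texts.
object HashtagCounts {
  // Assumption: a hashtag is '#' followed by ASCII letters, digits or underscores.
  private val Hashtag = "#(\\w+)".r

  // Extract the lower-cased hashtags from a single tweet.
  def hashtags(tweet: String): List[String] =
    Hashtag.findAllMatchIn(tweet).map(_.group(1).toLowerCase).toList

  // Return the n most frequent hashtags in a batch; ties are broken alphabetically.
  def topHashtags(batch: Seq[String], n: Int): List[(String, Int)] =
    batch
      .flatMap(hashtags)
      .groupBy(identity)
      .map { case (tag, occurrences) => tag -> occurrences.size }
      .toList
      .sortBy { case (tag, count) => (-count, tag) }
      .take(n)

  // Average tweet length in characters, or None for an empty batch.
  def averageLength(batch: Seq[String]): Option[Double] =
    if (batch.isEmpty) None
    else Some(batch.map(_.length).sum.toDouble / batch.size)

  def main(args: Array[String]): Unit = {
    val batch = Seq(
      "Learning #Spark and #Scala today",
      "#spark streaming is fun",
      "Nothing static #scala #spark"
    )
    println(topHashtags(batch, 2)) // List((spark,3), (scala,2))
    println(averageLength(batch))
  }
}
```

Sorting by `(-count, tag)` keeps the ranking deterministic when two hashtags share the same count, which matters if the top-N list is displayed or compared between batches.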
Project 2: A Science Project

In this project, we will write a full-featured web application that supports multiple users taking part in a scientific test. We'll study the effects of alcohol/substances/insert_your_addictive_drug_like_Scala on reflexes and reaction times. Users submit data through a web interface connected to a REST endpoint; the data then passes through a Kafka broker and finally reaches the Spark Streaming backend, which processes it. You can use this app as a model for any full-featured app that ingests and processes real-time data with Spark Streaming from any number of concurrent users.

You'll learn:

  • how to set up an HTTP server in minutes with Akka HTTP
  • how to manually send data through Kafka
  • how to aggregate data in ways that are almost impossible in SQL
  • how to write a full-featured application with a web interface, Akka HTTP, Kafka and Spark Streaming
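Tracking each user's results over time is a stateful aggregation: every new micro-batch updates state that was built from earlier batches, which is awkward to express as a single SQL query. Spark exposes this pattern through operators such as `updateStateByKey` (DStreams) and `mapGroupsWithState` (Structured Streaming). The sketch below shows only the underlying idea in plain Scala, without Spark or Kafka; the `Reaction` event type, the `Stats` state and the millisecond unit are hypothetical assumptions, not the course's data model.

```scala
// Plain-Scala sketch (no Spark, no Kafka): keep per-user state across micro-batches.
object ReactionStats {
  // Hypothetical event: one reaction-time measurement (in ms) for a user.
  final case class Reaction(userId: String, millis: Long)

  // Running state per user: number of measurements and their total.
  final case class Stats(count: Long, totalMillis: Long) {
    def add(millis: Long): Stats = Stats(count + 1, totalMillis + millis)
    def average: Double = totalMillis.toDouble / count
  }

  // Fold one micro-batch into the existing state.
  // Users absent from the batch keep their previous state unchanged.
  def updateState(state: Map[String, Stats], batch: Seq[Reaction]): Map[String, Stats] =
    batch.foldLeft(state) { (acc, r) =>
      val current = acc.getOrElse(r.userId, Stats(0L, 0L))
      acc.updated(r.userId, current.add(r.millis))
    }

  def main(args: Array[String]): Unit = {
    val batches = Seq(
      Seq(Reaction("alice", 300), Reaction("bob", 450)),
      Seq(Reaction("alice", 500))
    )
    // Process batches in arrival order, carrying the state forward.
    val finalState = batches.foldLeft(Map.empty[String, Stats])(updateState)
    finalState.toList.sortBy(_._1).foreach { case (user, s) =>
      println(f"$user%s: count=${s.count}%d, average=${s.average}%.1f ms")
    }
  }
}
```

In a real streaming job, the framework (not a local `foldLeft`) would hold this state between batches and would also need a policy for expiring users who stop sending data.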
Author: TUTProfessor