They say Spark is fast. How do I make the most of it? Master Spark internals so that your jobs run at full speed and your cluster can handle the heaviest loads.
Does this sound like you?
- You run 3 large jobs on the same DataFrame, so you try to cache it, but then you look in the Spark UI and it's nowhere to be found
- You finally got the cluster you asked for... and then you ask, "How many executors should I pick?"
- You have a simple job with 1GB of data that takes 5 minutes for 1149 tasks... and 3 hours for the last task
- You have a big dataset and you know you have to partition it properly, but you can't pick a number between 2 and 50,000 because you can find good reasons for either end
- You search for "caching," "serialization," "partitioning," "tuning" and find only obscure blog posts and narrow StackOverflow questions.
In Spark Optimization 1, you learned how to write performant code. Now it's time to kick into high gear and tune Spark to be the best it can be. You're looking at the only course on the Internet that makes the most of Spark's features and capabilities for performance. With the techniques you learn here, you'll save time, money and energy, and spare yourself a lot of headaches.
In this course, we cut the weeds at the root. We dive deep into Spark to understand what tools you have at your disposal, and you may be surprised at how much leverage you have. You'll learn 20+ techniques and optimization strategies. Each of them individually can at least double the performance of your jobs (some of them give as much as a 10x boost), and I show it on camera.
What's in it for you:
- You'll understand Spark internals well enough to explain why Spark is already pretty darn fast
- You'll be able to predict in advance whether a job will take a long time
- You'll diagnose hanging jobs, stages and tasks
- You'll spot and fix data skews
- You'll make the right tradeoffs between speed, memory usage and fault tolerance
- You'll be able to configure your cluster with optimal resources
- You'll save hours of computation time in this course alone (not to mention in production!)
- You'll control the parallelism of your jobs with the right partitioning