Welcome to the AMP Camp 6 hands-on exercises! These exercises are extended and enhanced from those given at previous AMP Camp Big Data Bootcamps. They were written by volunteer graduate students and postdocs in the UC Berkeley AMPLab. Many of those same graduate students are present today as teaching assistants. The exercises we cover today will have you working directly with the Spark-specific components of the AMPLab’s open-source software stack, called the Berkeley Data Analytics Stack (BDAS).
In order to get the most out of this course, we assume:
- You have experience programming in Python or Scala
- You have a laptop
- Your laptop has Java 7 or 8 installed
If you would like a quick primer on Scala, check out the following doc in the appendix:
In several of the training modules that follow, you can choose which language to use as you work through the exercises and gain experience with the tools. The following table shows which languages this mini course supports for each section. You are welcome to mix and match languages depending on your preferences and interests.
|SparkR||R only||R only||R only|
|Spark Time Series||yes||no||no|
The modules we will cover at the AMP Camp training are listed below. They can be done in any order according to your interests, though we recommend that new users start with Spark.
Note: Please follow the setup instructions on the Getting Started page before starting any of the exercises.
|Spark||Use the Spark shell to write interactive queries||Short||Programming Guide|
|Spark SQL||Use the Spark shell to write interactive SQL queries||Short||Programming Guide|
|IndexedRDD||Use mutable RDDs||Medium||GitHub|
|Tachyon||Deploy Tachyon and try out its basic functionality||Medium||Project Website|
|SparkR||Interactive Data Analytics using Spark in R||Short||Programming Guide|
|Succinct||Query compressed data with Succinct||Medium||Project Page|
|KeystoneML||Text and Image classification with KeystoneML||Medium||Project Page|
|Splash||Use Splash to run stochastic learning algorithms||Short||Project Page|
|Spark Time Series||Analyze time series data||Long|
IRC Chat Room
A chat room is available for participants to connect with each other and get real-time help with the exercises. The room can be joined here or by using an IRC client to connect to the #ampcamp channel on the FreeNode (irc.freenode.net) network.