AMP Camp 4 consisted of talks and hands-on tutorials at the 2014 O’Reilly Strata Conference in Santa Clara, CA on February 11, 2014.
AMP Camp 4 included two parts:
- Part 1 – Talks: Faster and Smarter Big Data Analysis with BlinkDB, MLlib, GraphX and Tachyon: New components of the Berkeley Data Analytics Stack (BDAS) (Morning session)
- Part 2 – Tutorials: Hands-on training with the newest BDAS components: Learn BlinkDB, GraphX, Tachyon, MLlib, Spark, Spark Streaming, and Shark (Afternoon Session)
The first part of this two-part big data analysis training series presented four new, cutting-edge components of the Berkeley Data Analytics Stack (BDAS) and provided a brief introduction to the stack as a whole. We started by covering Spark, Spark Streaming, and Shark, then took a deeper dive into the four newly released components of BDAS: BlinkDB, MLlib, GraphX, and Tachyon.
- Introduction to BDAS stack – Ion Stoica, UC Berkeley
- Introduction to Spark – Patrick Wendell, Databricks
- Tachyon – Ali Ghodsi, UC Berkeley
- GraphX – Joseph Gonzalez, UC Berkeley
- BlinkDB – Sameer Agarwal, UC Berkeley
- MLlib – Evan Sparks, UC Berkeley
For the second part of the camp, we provided hands-on training for BlinkDB, MLlib, GraphX, Tachyon, Spark, and Shark. We gave each attendee access to an EC2 cluster preloaded with real-world datasets and walked them through hands-on exercises analyzing the data with these technologies.
Attendees also learned to use the more mature components of the stack, including the Spark and Shark command-line interfaces for ad hoc analysis, and Spark Streaming, the real-time component of Spark.