AMP Camp 4

AMP Camp 4 consisted of talks and hands-on tutorials at the 2014 O’Reilly Strata Conference in Santa Clara, CA on February 11, 2014.

AMP Camp 4 included two parts:

Talks 

The first part of this two-part big data analysis training series presented four new, cutting-edge components of the Berkeley Data Analytics Stack (BDAS) and provided a brief introduction to the stack as a whole. We started by covering Spark, Spark Streaming, and Shark, then took a deeper dive into the four newly released components: BlinkDB, MLlib, GraphX, and Tachyon.

Presentations:

  • Introduction to BDAS stack – Ion Stoica, UC Berkeley – pdf
  • Introduction to Spark – Patrick Wendell, Databricks – ppt
  • Tachyon – Ali Ghodsi, UC Berkeley – ppt
  • GraphX – Joseph Gonzalez, UC Berkeley – ppt
  • BlinkDB – Sameer Agarwal, UC Berkeley – ppt
  • MLlib – Evan Sparks, UC Berkeley – pdf

Exercises

For the second part of the camp, we provided hands-on training for BlinkDB, MLlib, GraphX, Tachyon, Spark, and Shark. We gave each audience member access to an EC2 cluster preloaded with real-world datasets and walked them through hands-on exercises that analyzed the data using these technologies.

Additionally, attendees learned to use the more mature components of the stack. These included the Spark and Shark command-line interfaces for ad-hoc analysis, and Spark Streaming, the real-time component of Spark.