AMP Camp One – Big Data Bootcamp Berkeley 2012 was held in Berkeley, California, in August 2012. It was also broadcast live, and the video archives are available for free. Links to the slides and video archives of the AMP Camp talks are listed below.
Day 1
8:00am Continental Breakfast
9:00am Introduction to Big Data & the AMPLab (Michael Franklin)
An overview of the Big Data problem and its main concerns, such as data acquisition, cleaning, analysis, public data, and crowdsourcing.
9:20am Overview of the AMP Camp Curriculum (Andy Konwinski)
A preview of the next two days, including the AMP Camp agenda and vision.
9:30am Warehouse-scale computing (Ion Stoica)
A discussion of the hardware and software typical for big data processing, the costs and capacities of different hardware resources, and an overview of the AMPLab’s BDAS stack.
10:00am Parallel programming with Spark – Part 1 (Matei Zaharia)
A brief introduction to Scala and to exploring data in the Spark Shell.
10:55am Coffee Break
11:10am Parallel programming with Spark – Part 2 (Matei Zaharia)
Writing standalone Spark programs using Scala or Java.
An introduction to the Hive data model and metadata management, as well as querying data in Shark.
2:00pm Machine Learning – Part 1 (Ariel Kleiner)
An overview of Machine Learning and an introduction to classification.
3:20pm Coffee Break
3:40pm Machine Learning – Part 2 (Tamara Broderick)
A continuation of the Machine Learning session, covering clustering.
4:45pm Hands-on Exercises using EC2
Manipulate multi-gigabyte datasets with Spark, Shark, and the distributed Machine Learning algorithms covered today.
6:45pm – 8:30pm Reception (Wozniak Lounge, Soda Hall)
Day 2
8:00am Continental Breakfast
9:00am Crowdsourcing (Tim Kraska)
A survey of crowdsourcing techniques and technologies including CrowdDB.
9:30am Advanced Spark Features (Matei Zaharia)
An introduction to advanced Spark features such as controllable partitioning, caching formats, and serialization.
10:20am Spark Python API (Josh Rosen)
A preview of the features of the new Python API for Spark, currently in development.
10:45am Coffee Break
11:00am Spark Streaming (Tathagata Das)
An introduction to large-scale, near-real-time stream processing with the soon-to-be-released Spark Streaming.
11:30am Managing Twitter clusters with Mesos (Benjamin Hindman)
An introduction to running frameworks on Mesos, including Hadoop and Spark, and an overview of how Twitter uses Mesos to build and manage production services.
1:30pm Carat – Collaborative Energy Debugging (Adam Oliner)
Detecting and diagnosing misbehavior in mobile apps using big data + cloud + crowd.
2:00pm Spark User Applications
- 2:00pm Conviva (Dilip Joseph)
- 2:30pm Quantifind (Erich Nachbar)
- 3:00pm Mobile Millennium (Tim Hunter)
3:30pm – 3:40pm Wrap-up & Future Directions (Michael Franklin)
We will be using piazza.com to manage Q&A throughout both days of AMP Camp. Attendees who register before opening day will be enrolled in the Piazza class, where they can post questions and collaboratively edit responses to them. AMP Camp instructors will also answer questions, endorse good answers, and moderate. U.C. Berkeley students should enroll as students in the AMP Camp Piazza class. Non-students who register before August 21st will be enrolled automatically.