DSC 232R. Big Data Analytics Using Spark (4 units)
Link to catalog page: https://catalog.ucsd.edu/courses/DSC.html#dsc232r
Description
This course covers techniques for achieving scalability in data analysis, using tools such as MapReduce, Hadoop, and Spark. Topics include programming Spark using PySpark; identifying the computational tradeoffs in a Spark application; performing data loading and cleaning using Spark and Parquet; modeling data through statistical and machine learning methods; and mitigating bottlenecks that arise in massively parallel computations by using the Spark framework. This is a distance education course. Prerequisites: DSC 255R. Restricted to major codes DS78 and DS79. All other students with graduate standing may be considered as space permits.
Prerequisite courses
DSC 255R
Successor courses
No courses have DSC 232R as a prerequisite.