DSC 232R. Big Data Analytics Using Spark (4 units)
Link to catalog page: https://catalog.ucsd.edu/courses/DSC.html#dsc232r
Description
This course covers techniques for achieving scalability in data analysis, using tools such as MapReduce, Hadoop, and Spark. Topics include programming Spark using PySpark; identifying the computational tradeoffs in a Spark application; performing data loading and cleaning using Spark and Parquet; modeling data through statistical and machine learning methods; and mitigating bottlenecks that arise in massively parallel computations by using the Spark framework. This is a distance education course. Prerequisites: DSC 255R. Restricted to major code DS77. All other students with graduate standing may be considered as space permits.
Prerequisite courses
DSC 255R
Successor courses
No courses have DSC 232R as a prerequisite.