Common Python data science libraries such as NumPy, pandas, and scikit-learn work well when the dataset fits into the RAM of a single machine. With larger datasets, working around memory constraints can be a challenge. This course module provides an introduction to scalable and accelerated data science with Dask and RAPIDS. Dask provides a framework and libraries that can handle large datasets on a single multi-core machine or across multiple machines in a cluster, while RAPIDS can offload analytics workloads to GPUs to accelerate your data science and analytics toolchain with minimal code changes.
Live online classes will take place on Wed. Nov. 8, Mon. Nov. 13, and Wed. Nov. 15 from 1pm to 2pm Eastern Time. Recordings of the live sessions will be available afterwards for self-paced learning.
- Teacher: Jinhui Qin