Some common libraries for data analytics in Python, such as Numpy, Pandas, Scikit-Learn, etc. usually work well if the dataset fits into the existing RAM on a single machine. However, when dealing with large datasets, it can be a significant challenge to work around such memory constraints. This is where Dask can help. Dask provides a framework and libraries that can handle large datasets on a single multi-core machine or on a cluster.
This course provides an introduction to Dask.
Live Session Dates: Nov. 15 & 18
NOTE: If enrolling after the live session dates, recordings of such can be viewed online.
- Teacher: Jinhui Qin
Access is restricted to Digital Research Alliance of Canada (formerly Compute Canada) authenticated users only: Yes