Course Description

This course is an introduction to big data tools with a focus on the Hadoop ecosystem. Students will gain a clear understanding of Hadoop concepts, technology landscapes, and market trends. They will learn how to construct queries of moderate to high complexity using Pig and Hive to work with big data. Students will get hands-on experience working with real-world datasets such as geo-tagged social media posts, check-in datasets, and business tracking platform review data. The course draws on a variety of use cases so that students can apply what they learn in their daily work.

What Will You Learn?

Develop a comprehensive understanding of big data and its industrial and sectoral applications.

Learn how to:

  • Engage with big data and AI computing (cloud computing) and their industrial applications.
  • Utilize the Hadoop ecosystem for big data.
  • Employ Linux file systems, bash commands, and regular expressions.
  • Write complex Apache Hive queries against data stored in various databases and file systems that integrate with Hadoop.
  • Write scripts and analyze data using Apache Spark to efficiently run streaming and machine learning workloads on big data.
  • Apply network analysis to relevant use cases.


Students are advised to check the important dates for the current Chang School term before enrolling in the course and paying the fees. Notably, the Azure Virtual Desktop assigned to students will be accessible two days after the course’s starting date, and swapping between sections will not be permitted.

Students are also encouraged to download the Microsoft Remote Desktop app to access the software needed to complete this course’s requirements. Students should also test the compatibility of the computer they plan to use before the first session, as machines managed by a third-party administrator, such as laptops provided by a workplace, may not allow access to the required software or downloads. International students might need to use virtual private network (VPN) software if they cannot connect to University resources.

Students who wish to take CIND 719 and CIND 840 Practical Approaches in Machine Learning concurrently, as part of the Practical Data Science and Machine Learning Certificate, should contact Ceni Babaoglu, Assistant Program Director, Data Science, at cenibabaoglu@torontomu.ca.


Department Consent Required. Please submit the Request Department Consent form, or contact Ceni Babaoglu, Assistant Program Director, Data Science, at cenibabaoglu@torontomu.ca for more information.

Department consent may be provided if the student has specific professional experience.


Thank you for your interest in this course. Unfortunately, this course currently isn't available.

Please browse our courses for more options or check back later for updated scheduling.