CanPath Student Dataset

The CanPath Student Dataset, now known as the CanPath Synthetic Dataset, has been a valuable resource for students and educators alike. This page showcases how students have used the dataset in their academic projects and research.

What is the Synthetic Dataset?

Previously known as the CanPath Student Dataset, the CanPath Synthetic Dataset was manipulated to mimic CanPath’s nationally harmonized data but does not include or reveal actual data of any CanPath participants.

Canadian university and college instructors can use the CanPath Synthetic Dataset free of charge in their academic courses. CanPath provides the Synthetic Dataset along with a supporting data dictionary.

“The dataset was easy to use, and the number of variables it included would be beneficial in many analyses. Also, the dataset included a very large number of observations that made a strong analysis possible.”

MPH Student, Dalla Lana School of Public Health, University of Toronto

Student Project Examples

In summer 2023, the Artificial Intelligence for Public Health (AI4PH) Summer Institute hosted twenty-two graduate students, post-doctoral fellows, and early-career researchers at the Fields Institute at the University of Toronto (U of T). The trainees were given the CanPath Synthetic Dataset. Over five days, they took part in a series of learning sessions, including a data challenge that progressed from selecting and preparing data to building machine-learning models, evaluating model accuracy, and ultimately drawing causal conclusions.

“This synthesis allows for a comprehensive understanding of health-related problems, formulating them with the right techniques, and eventually solving them.”

Hassan Maleki Golandouz, PhD Student at the University of Manitoba and participant at the AI4PH Summer Institute
Students at the AI4PH Summer Institute who took part in the CanPath Student Dataset challenge
Students and faculty members at the AI4PH Summer Institute

Other examples of student projects:

Please note: The CanPath Synthetic Dataset is for training purposes only and cannot be used for publication. Students interested in finding out if their project results can be replicated using real CanPath data for potential publication can apply through the regular CanPath Access Process. A reduced fee is available to students and trainees applying for access to CanPath data and biosamples.
