What is the Synthetic Dataset?
Previously known as the CanPath Student Dataset, the CanPath Synthetic Dataset was manipulated to mimic CanPath’s nationally harmonized data but does not include or reveal actual data of any CanPath participants.
Canadian university and college instructors can use the CanPath Synthetic Dataset free of charge in their academic courses. CanPath will provide the Synthetic Dataset and a supporting data dictionary.
“The dataset was easy to use, and the number of variables it included would be beneficial in many analyses. Also, the dataset included a very large number of observations that made a strong analysis possible.”
MPH Student, Dalla Lana School of Public Health, University of Toronto
Student Project Examples
In Summer 2023, the Artificial Intelligence for Public Health (AI4PH) Summer Institute hosted twenty-two graduate students, post-doctoral fellows, and early-career researchers at the Fields Institute at the University of Toronto (U of T). The trainees were supplied with the CanPath Synthetic Dataset. Over five days, they took part in a range of learning sessions, including a data challenge that progressed from selecting and preparing data to building machine-learning models, evaluating model accuracy, and ultimately drawing causal conclusions.
“This synthesis allows for a comprehensive understanding of health-related problems, formulating them with the right techniques, and eventually solving them.”
Hassan Maleki Golandouz, PhD Student at the University of Manitoba and participant at the AI4PH Summer Institute
Other examples of student projects:
- Work Schedule and Binge Drinking
- Fruit and Vegetable Intake and Colorectal Cancer
- Smoking and Multiple Sclerosis
- In Vitro Fertilization and Cardiovascular Disease
- Anxiety and Migraines
- Green Space and Obesity
- Education and Blood Pressure
- Anxiety and Addiction
Please note: The CanPath Synthetic Dataset is for training purposes only and cannot be used for publication. Students who want to find out whether their project results can be replicated using real CanPath data for potential publication can apply through the regular CanPath Access Process. A reduced fee is available to students and trainees applying for access to CanPath data and biosamples.