What is the Student Dataset?
The CanPath Student Dataset is a synthetic dataset built to mimic CanPath’s nationally harmonized data. It does not include or reveal the actual data of any CanPath participant.
Instructors at Canadian universities and colleges can use the CanPath Student Dataset free of charge in their academic courses. CanPath provides the Student Dataset together with a supporting data dictionary.
Advantages of the CanPath Student Dataset
- Large sample size (over 40,000 participants)
- Modelled on real-world, population-level Canadian data
- A variety of subject areas, allowing for a wide range of research topics
- No cost to faculty members
- Potential for students to apply for real CanPath data to publish their findings
“The dataset was easy to use, and the number of variables it included would be beneficial in many analyses. Also, the dataset included a very large number of observations that made a strong analysis possible.”
MPH Student, Dalla Lana School of Public Health, University of Toronto
What’s available?
Student Dataset Access Process
Completed applications and supporting documents can be submitted by email to apply@canpath.ca. Applications will be reviewed within two weeks.
Eligibility Criteria
- Applicant must be an instructor at a Canadian university or college;
- The dataset is being requested for use in an academic course;
- The course objectives are relevant to CanPath’s purpose, vision and mission;
- The CanPath dataset aligns with course objectives and methods.
Required Documents
- Completed Application Form
- Copy of REB application*
- REB decision letter or proof of exemption
- Brief CV of Applicant (2 pages)
- Course syllabus**
*The REB application and the decision letter (or proof of exemption) are required only if another dataset is used alongside the CanPath Student Dataset in the course.
**The course syllabus must cite the use of the CanPath Student Dataset.
After each iteration of the course, users must give CanPath feedback on their use of the dataset through the Synthetic Dataset Utilization Form.
Student Project Examples
In summer 2023, the Artificial Intelligence for Public Health (AI4PH) Summer Institute hosted twenty-two graduate students, post-doctoral fellows, and early-career researchers at the Fields Institute at the University of Toronto (U of T). The trainees were given the CanPath Student Dataset. Over five days, they took part in a range of learning sessions, including a data challenge that progressed from selecting and preparing data to building machine-learning models, evaluating model accuracy, and ultimately drawing causal conclusions.
“This synthesis allows for a comprehensive understanding of health-related problems, formulating them with the right techniques, and eventually solving them.”
Hassan Maleki Golandouz, PhD Student at the University of Manitoba and participant at the AI4PH Summer Institute
Other examples of student projects:
- Work Schedule and Binge Drinking
- Fruit and Vegetable Intake and Colorectal Cancer
- Smoking and Multiple Sclerosis
- In Vitro Fertilization and Cardiovascular Disease
- Anxiety and Migraines
- Green Space and Obesity
- Education and Blood Pressure
- Anxiety and Addiction
Please note: The CanPath Student Dataset is for training purposes only and cannot be used for publication. Students who want to find out whether their project results can be replicated with real CanPath data, for potential publication, can apply through the regular CanPath Access Process. A reduced fee is available to students and trainees applying for access to CanPath data and biosamples.