CanPath Student Dataset

The CanPath Student Dataset provides students with the unique opportunity to gain hands-on experience working with CanPath data.

What is the Student Dataset?

The CanPath Student Dataset is a synthetic dataset built to mimic CanPath’s nationally harmonized data. It does not include or reveal the actual data of any CanPath participant.

Canadian university and college instructors can use the CanPath Student Dataset free of charge in their academic courses. CanPath will provide the Student Dataset and a supporting data dictionary.

Advantages of the CanPath Student Dataset

“The dataset was easy to use, and the number of variables it included would be beneficial in many analyses. Also, the dataset included a very large number of observations that made a strong analysis possible.”

MPH Student, Dalla Lana School of Public Health, University of Toronto

What’s available?

Canadian Data

The synthetic dataset is similar to a random sample of CanPath data, which includes participants from the BC Generations Project, Alberta’s Tomorrow Project, the Ontario Health Study, CARTaGENE, and Atlantic PATH.

The student dataset includes over 40,000 observations with 403 categorical variables from the CanPath Baseline and Additional Diseases Questionnaires.

Areas of Information

Variables include socio-demographic and economic information, lifestyle and behaviour (e.g. tobacco use, alcohol use, nutrition), perception of health, and select self-reported diseases such as high blood pressure, arthritis, and first cancer.

CANUE Environmental Exposure Variables

The student dataset also includes environmental variables from the Canadian Urban Environmental Health Research Consortium (CANUE) dataset, such as the material deprivation index and annual average exposure to ambient air pollution.

Student Dataset Access Process

Completed applications and supporting documents can be submitted by email. Applications will be reviewed within two weeks.

Eligibility Criteria

  • Applicant must be an instructor at a Canadian university or college;
  • The dataset is being requested for use in an academic course;
  • The course objectives are relevant to CanPath’s purpose, vision and mission;
  • The CanPath dataset aligns with course objectives and methods.

Required Documents

  1. Completed Application Form
  2. Copy of REB application*
    • REB decision letter or proof of exemption  
  3. Brief CV of Applicant (2 pages) 
  4. Course syllabus** 

*An REB application and an REB decision letter or proof of exemption are required only if another dataset is used alongside the CanPath Student Dataset in the course.

**The course syllabus must cite the use of the CanPath Student Dataset.

After each iteration of the course, users are required to give CanPath feedback on their use of the dataset by completing the Synthetic Dataset Utilization Form.

Student Project Examples

In Summer 2023, the Artificial Intelligence for Public Health (AI4PH) Summer Institute hosted twenty-two graduate students, post-doctoral fellows, and early-career researchers at the Fields Institute at the University of Toronto (U of T). The trainees were supplied with the CanPath Student Dataset. Over five days, they took part in learning sessions that included a data challenge, which progressed from selecting and preparing data to building machine-learning models, evaluating model accuracy, and ultimately drawing causal conclusions.

“This synthesis allows for a comprehensive understanding of health-related problems, formulating them with the right techniques, and eventually solving them.”

Hassan Maleki Golandouz, PhD Student at the University of Manitoba and participant at the AI4PH Summer Institute
Students at the AI4PH Summer Institute, who partook in the CanPath Student Dataset challenge
Students and faculty members at the AI4PH Summer Institute

Other examples of student projects:

Please note: The CanPath Student Dataset is for training purposes only and cannot be used for publication. Students interested in finding out if their project results can be replicated using real CanPath data for potential publication can apply through the regular CanPath Access Process. A reduced fee is available to students and trainees applying for access to CanPath data and biosamples.

