What is synthetic data?
Synthetic data is designed to replicate the statistical properties and structure of real-world data without compromising privacy. Generated by computer simulations and statistical algorithms rather than collected from real people, synthetic data offers a secure and versatile alternative for researchers and data scientists.
- Privacy-Preserving: Synthetic data protects individual privacy because it does not contain any real personal information.
- Realistic Representation: It mirrors the statistical characteristics of actual data, making it ideal for testing and training machine learning models.
- Wide Applicability: From healthcare to finance, synthetic data is invaluable for validating models and conducting experiments without the risk of exposing sensitive information.
What is the CanPath Synthetic Dataset?
The CanPath Synthetic Dataset was generated to mimic CanPath's nationally harmonized data, but it does not include or reveal the actual data of any CanPath participant.
How was it developed?
The synthetic dataset was created using an open-source R software package called "synthpop," which was developed to generate synthetic versions of longitudinal survey data. Rather than copying participant records, synthpop fits statistical models to the original CanPath data one variable at a time and then draws new values from those models. As a result, the synthetic dataset preserves the statistical patterns (i.e., relationships between variables) of the CanPath data but none of the real-world records.
What are the advantages of the CanPath Synthetic Dataset?
- Large sample size (over 40,000 participants)
- Modelled on real-world, population-level Canadian data
- A variety of areas of information, allowing for a wide range of research topics
- No cost to faculty members
- Potential for students to apply for real CanPath data to publish their findings
What’s available?
Examples of Use
Canadian university and college instructors can use the CanPath Synthetic Dataset free of charge in their academic courses. CanPath will provide the Synthetic Dataset and a supporting data dictionary.
Access Process
Completed applications and supporting documents can be submitted by email to apply@canpath.ca. Applications will be reviewed within two weeks.
Eligibility Criteria
- Applicant must be an instructor at a Canadian university or college;
- The dataset is being requested for use in an academic course;
- The course objectives are relevant to CanPath’s purpose, vision and mission;
- The CanPath dataset aligns with course objectives and methods.
Required Documents
- Completed Application Form
- Copy of Research Ethics Board (REB) application*
- REB decision letter or proof of exemption
- Brief CV of Applicant (2 pages)
- Course syllabus**
*A copy of the REB application and the REB decision letter or proof of exemption are required only if another dataset is used alongside the CanPath Synthetic Dataset in the course.
**The course syllabus must cite the use of the CanPath Synthetic Dataset.
After each offering of the course, users must submit feedback to CanPath on their use of the dataset through the Synthetic Dataset Utilization Form.
Next Steps
Following application approval, users must review and sign the Synthetic Dataset User Acknowledgement.
For all other inquiries, please connect with the CanPath Access Office.