Training tomorrow’s analysts: CanPath workshop showcases the power of synthetic data

What does it take to make population health data more accessible for research and education?
At its inaugural Synthetic Dataset Workshop, CanPath welcomed a diverse group of researchers, instructors, and trainees for a full day of hands-on learning. Together, they explored how synthetic data and cloud-based tools can lower barriers to population health research and teaching, while still offering powerful, real-world insights.
CanPath is Canada’s largest population health study, following over 330,000 Canadians to explore how genetics, environment, lifestyle, and behaviour influence chronic disease. But how can researchers and instructors gain experience with this kind of real-world data without navigating the lengthy process of ethics and data access approvals?
The CanPath Synthetic Dataset and its secure Trusted Research Environment offer one solution. The synthetic dataset is a free teaching and research tool that mimics the statistical structure of real CanPath data, without revealing any participant information. It’s hosted on the Lifebit platform, a cloud-based environment where users can build cohorts, run statistical models, and get hands-on practice in a secure, scalable space.

On June 24, 2025, at CanPath’s inaugural Synthetic Dataset Workshop, attendees got the chance to try it all firsthand.
The day kicked off with an overview of CanPath and the synthetic dataset, delivered by Noah Frank, CanPath Research Operations Manager. Created using the open-source R package synthpop, the Synthetic Dataset mirrors real CanPath data without containing any actual participant information. It preserves statistical patterns between variables, making it ideal for safe exploration in both research and education.

After an orientation to the Lifebit platform, participants launched into a guided genome-wide association study (GWAS) mini-exercise. While some found the early steps daunting, staff from CanPath and Lifebit moved from table to table offering guidance and reassurance.
“The analysis was a bit advanced for me, but I came to learn about the user experience, and I got exactly what I needed,” said Sophie Hogeveen, Data Access Officer from the Canadian Longitudinal Study on Aging (CLSA). “It gave me a lot to think about as CLSA prepares to move to a cloud-based environment.”
Lifebit’s Sangram Keshari Sahu walked participants through the technical side of the platform, highlighting how to build cohorts, query variables, and run pipelines. While some examples leaned toward advanced applications, they helped spark ideas about what the platform could do, especially for instructors thinking about integrating the synthetic dataset into a course.

“A couple of attendees mentioned they’d usually do this kind of work locally,” said Jeff Brabec, Senior Client Success Manager at Lifebit. “But that’s the whole point of using a cloud-based Trusted Research Environment. Synthetic or not, when you’re working with real health data, security matters.”
Later in the day, CanPath Data Manager Sheraz Cheema demonstrated how to use the dataset for public health research questions, walking through real-world case studies. His patient, step-by-step approach helped attendees see how to adapt the tool for their own teaching and learning needs.

“I came in expecting more of a focus on genomics and transcriptomics,” said Phuong Nguyen, a first-year PhD student at the University of Toronto’s Institute of Medical Science, “but I was surprised, in a good way, to learn about the environmental and clinical data available. Seeing how those factors play together opens up new possibilities for my research.”
The 22 participants brought a range of perspectives: researchers in genomics, environmental health, and clinical science, as well as educators, support staff, and students. They joined from across Ontario and Saskatchewan, with presenters travelling from the UK and the US. Career stages ranged from undergraduate students to senior faculty members.

“I took a lot of notes during the stats portions, especially around regression modeling,” Phuong added. “It brought back a lot of concepts, and I’m excited to brush up on the math and apply it to my work. I even had a great chat with a postdoc who suggested some machine learning approaches to try. It was really encouraging.”
“One of the challenges with workshops like this is the range of technical levels in the room,” said Jeff. “But everyone was able to ask questions at their own level, and we got them all answered. That’s the best outcome.”
One of the highlights was the collaborative spirit during the hands-on session. Staff walked around the room, supporting participants in real time.

“For me, the most valuable part was meeting people from so many different fields,” Sangram reflected. “It really opens up new opportunities and helps you understand what others are working on. The in-person component made a big difference. It was definitely worth flying so many hours from London to be here. And from a tech side, it was a great stress test. Our team was checking everything in the background to make sure it all ran smoothly, and it did.”

This was CanPath’s first workshop of its kind, and interest in synthetic data and trusted research environments continues to grow. Workshop participants can continue using the platform and were offered individualized Office Hours for personalized support. Their feedback will help shape future sessions.
“I had attended training and read through lots of documentation about the cloud environment previously, but still didn’t fully understand how researchers would actually use the platform,” said Maya Vu, Program and Policy Consultant at Healthy Future Sask, CanPath’s Saskatchewan cohort. “Doing it live with staff support really helped it click.”
Making population health data accessible for teaching and research takes more than a dataset. It takes tools, training, and a sense of community, something this workshop proved is well within reach.
Want to explore the synthetic dataset and platform yourself?
Questions? Email us at apply@canpath.ca
For more information, please contact:
Megan Fleming
Communications & Knowledge Translation Officer
Canadian Partnership for Tomorrow’s Health (CanPath)
info@canpath.ca