Data Bites

Your Research Data Management Refresher

News bites about important research data management principles that connect researchers to helpful practical resources. Data Bites is a regular feature in Compass.

Compass Issue 30 – December 4, 2025

The Rise of the Agents: Agentic AI

What is Agentic AI? 

AI tools are shifting from chatbot AI (passive assistants) to agentic AI, which acts as an autonomous actor. AI agents are increasingly capable of independently executing complex research workflows, such as automating data collection or generating metadata.

Why Does this Matter for Research Data Management?

While efficiency increases, so does the risk of “agentic misalignment.” Research shows that when autonomous agents face pressure to complete a goal, they can act deceptively or recklessly. This effectively makes them a digital “insider threat.”

Unintended actions: An agent tasked with “optimizing server costs” might inadvertently shut down critical security logs or delete raw data to save space.
Agent hijacking: Attackers have already started to target the agents themselves. If they compromise your AI assistant, they gain the ability to execute actions on your behalf.

How can I Prepare?

Demand explainability: Prioritize tools that feature Explainable AI (XAI) and “chain of thought” transparency. This logs the step-by-step reasoning of the agent so you can audit why it made a specific decision. This is critical for research reproducibility.
Install “circuit breakers”: Just as you wouldn’t give an intern keys to a secure server space, don’t give an AI agent unlimited autonomy. Ensure there is human approval before deleting files or changing access permissions on shared datasets.
Limit access: Apply the principle of “least privilege” to your AI tools. Only grant them access to the specific files they need for the immediate task.
Sandbox the environment: Run agentic workflows in isolated environments, such as virtual machines, rather than on your main system. If an agent is hijacked or executes a destructive command, the damage is contained to the sandbox and your original research data files would remain protected.

In short, verify then trust. Never grant autonomy to a system that cannot explain its work.
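
As a rough illustration of the “circuit breaker” idea above, the sketch below pauses for a human decision before any destructive action and records every request for later audit. The class name `ApprovalGate`, the action names in `DESTRUCTIVE_ACTIONS`, and the `ask_human` callback are all hypothetical and are not part of any particular agent framework.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical action names treated as destructive; adapt to your own tools.
DESTRUCTIVE_ACTIONS = {"delete_file", "change_permissions"}


@dataclass
class ApprovalGate:
    """Runs agent-requested actions, pausing for a human on destructive ones."""

    ask_human: Callable[[str, str], bool]  # (action, target) -> approved?
    audit_log: list = field(default_factory=list)

    def request(self, action: str, target: str, run: Callable[[], None]) -> bool:
        needs_approval = action in DESTRUCTIVE_ACTIONS
        approved = self.ask_human(action, target) if needs_approval else True
        # Record every request, approved or not, so decisions can be audited.
        self.audit_log.append((action, target, approved))
        if approved:
            run()
        return approved


# Example: a gate whose human reviewer declines every destructive request.
gate = ApprovalGate(ask_human=lambda action, target: False)
gate.request("delete_file", "raw_data.csv", run=lambda: None)  # not executed
```

In practice `ask_human` would prompt a person (for example, through a ticket or a console prompt) rather than return a fixed answer.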

Where Can I Find Support?

The Research Information Security Program is available for consultations and other support services geared towards the University’s research community. You can also register for the Cyber Security and Your Research(ers) Office Hours. For additional resources and community support related to AI, visit AI research landscape at U of T.

Compass Issue 29 – October 2, 2025

Less is More: Data Minimization in Research

What is Data Minimization?

Data minimization is the principle of collecting, using and storing only the data that is absolutely necessary to achieve a specific research aim. Before collecting any information, it encourages you to ask the critical question: “Do I truly need this piece of data to answer my research question?”

Why is Data Minimization Important?

Every record or piece of data you hold is a potential liability. The more data you collect, especially sensitive or personal information, the greater the potential harm if a security breach occurs.

Reduces Risk

By limiting your data collection, you reduce the potential target for a data breach. If the data doesn’t exist, it can’t be stolen. This is often referred to as reducing the “attack surface.”

Minimizes Impact

In the event of a security breach, the damage is significantly lessened if the compromised dataset is minimal. Losing a small amount of essential, de-identified data is far less of a concern than losing vast amounts of detailed personal information.

Simplifies Compliance

Data minimization is a core principle across many data privacy regulations and ethics frameworks, helping you meet legal and professional obligations.

How can I Practice Data Minimization?

Plan: Before beginning your research, create a clear data management plan that includes the essential data points to be collected.
Avoid “just-in-case” data: Resist the temptation to collect extra data that you think you might need later.
De-identify or pseudonymize early: Whenever possible, remove direct identifiers (like names or addresses) and indirect identifiers (like postal code or date of birth) from your data early in the process. If later re-identification may be necessary, keep keys separate.
Set retention limits and delete securely: Establish a schedule, document it, and dispose of data securely once it’s no longer needed for research, contractual or regulatory requirements.

In short, collect only what you need, protect what you keep, and securely remove the rest.
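
The “de-identify or pseudonymize early” step above could look something like the following minimal sketch. It assumes records are plain dictionaries; the field names in `DIRECT_IDENTIFIERS` and `INDIRECT_IDENTIFIERS` and the `participant_code` column are hypothetical placeholders. The function returns the research rows and the re-identification key as two separate objects so they can be stored in different places.

```python
import secrets

# Hypothetical field names; replace with the identifiers in your own dataset.
DIRECT_IDENTIFIERS = ("name", "address")
INDIRECT_IDENTIFIERS = ("postal_code", "date_of_birth")


def pseudonymize(records):
    """Return (research_rows, key_table).

    research_rows: copies of the records with identifiers removed and a random
    participant code added. key_table: code -> removed identifiers, intended
    to be stored separately from the research data.
    """
    removed = set(DIRECT_IDENTIFIERS) | set(INDIRECT_IDENTIFIERS)
    research_rows, key_table = [], {}
    for record in records:
        code = secrets.token_hex(8)  # random, not derived from the identifiers
        key_table[code] = {k: v for k, v in record.items() if k in removed}
        row = {k: v for k, v in record.items() if k not in removed}
        row["participant_code"] = code
        research_rows.append(row)
    return research_rows, key_table
```

If re-identification will never be needed, the key table can simply be discarded instead of stored.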

Where can I find Support?

The Research Information Security Program is available for consultations and other support services geared towards the University’s research community. Register for upcoming Cyber Security and Your Research(ers) Office Hours with an information security expert.

Compass Issue 27 – June 5, 2025

Where Is My Data and What Laws Apply to It? 

What is the Difference Between Data Residency and Data Sovereignty?

Data Residency

Refers to the physical or geographical location where data is stored and processed. For example, cloud services like Amazon Web Services (AWS), Azure and Google Cloud Platform (GCP) can be configured to store data exclusively in Canada, though some features may be limited by this.

Data Sovereignty

Refers to the legal jurisdiction governing your data, particularly when stored in another country or with foreign-owned cloud providers. Even if data is stored in Canada, using a foreign-owned cloud provider could subject your data to foreign laws due to company ownership.

How is this Relevant to my Research Data? 

Funding, data sharing or other research agreements may require Canadian data residency. Researchers need to ensure their storage and processing environments meet these compliance conditions. For particularly sensitive projects or proprietary work, Canadian data sovereignty may also be required. Due to concerns about foreign access, interference or influence, some researchers may opt for sovereign solutions regardless of external requirements. 

Who Can I Contact for More Information?   

Divisional or departmental IT: First point of contact for guidance selecting suitable infrastructure. 
Research Information Security Program: Offers consultations on securing research data.
Information Technology Services:
- Offers some Azure Cloud Services with Canadian residency enforcement. 
- Offers a Private Cloud Service hosted at the University to ensure sovereignty. 
SciNet: High-performance computing for researchers with Canadian data sovereignty.

Ongoing Sessions  

Have questions about data residency, data sovereignty, or other data security topics? Register for an upcoming Cyber Security and Your Research(ers) Office Hours session to speak to a Research Information Security Specialist.

Compass Issue 26 – April 3, 2025

Research Data Management and AI Tools

AI tools powered by Large Language Models (LLMs) are revolutionizing the way we handle research data. They can analyze documents, generate code, and create text, images and videos from prompts, making them potentially helpful tools in your research workflow. But before you dive in, let’s talk about two crucial risks you need to be aware of.

Data Protection

Most LLMs operate in the cloud, meaning the company behind them can retain a copy of everything you share with it. This could lead to unintended disclosures of private information. If you are working with non-public data, don’t share it with AI tools without contractual assurance that your data will not be saved or used to train the model. Your research data’s security is paramount!

Data Integrity

LLMs can produce incredibly useful content, but they can also make mistakes that aren’t easy to spot. AI tools can generate false statements, draw inaccurate conclusions, and fabricate non-existent references. These errors undermine research integrity and compromise research replicability. The onus is on researchers to ensure the accuracy of all their research outputs. If you choose to use an AI tool, you must always verify that AI-generated and AI-informed content is accurate and complete.

University Resources for Safe AI Use

University of Toronto researchers have access to the enterprise version of Microsoft Copilot, which is safe to use for up to Level 3 data and ensures your data won’t be retained or used for training. For more details, visit the Microsoft Copilot Tool Guide.

Learn how to use U of T library-licensed AI tools to power your literature exploration by watching the recording and accessing resources from the Introduction to Scopus AI and Web of Science Research Assistant to Explore Literature webinar on April 9, 2025.

Generative AI Considerations in Academic Research

For more information about using GenAI in your research workflow, check out the CRIS guide on Generative AI Considerations in Academic Research.

Compass Issue 25 – February 6, 2025

Data Ownership

Data Ownership – Whose Data Is It, Anyway?

Did you know that Feb 10-14 is International Love Data Week 2025? With the theme of Whose Data Is It, Anyway?, this year’s organizers invite researchers to reflect on where their data came from and who owns the data.

Some key questions to consider about data ownership:

What rights do I have to use and publish the data that my research team has collected?
What legal and ethical obligations do I have regarding the data?
What rights do research participants have to their data?
If I’m conducting research involving Indigenous peoples, how has data ownership been established to respect Indigenous data sovereignty?
How are the contributions of collaborators and students considered?
What are the academic institution’s policies on data ownership?
What happens to the data if I leave the institution or project?
Research data can sometimes be considered intellectual property (IP). How does the institution’s IP policy impact data ownership?
Are there any funder or publisher policies that impact data ownership?
If I’m using existing data, are there any restrictions on data usage due to licensing, data use agreements, or other permission limitations?

These questions don’t have a one-size-fits-all answer. Consult the University of Toronto Institutional Research Data Management Strategy, or connect with the Innovations & Partnerships Office (IPO) or the Centre for Research & Innovation Support (CRIS) for support. Contact the University of Toronto Libraries for more information about data licenses.

Want to Learn More?

Browse the full list of International Love Data Week 2025 events and activities, and view recorded sessions on the ICPSR YouTube channel.

Indigenous Data Sovereignty

Learn about and reflect on Indigenous data sovereignty in CRIS’s Learning Together Discussion Group: The Fundamentals of OCAP® (Ownership, Control, Access and Possession) program.

Compass Issue 24 – December 5, 2024

How to Prevent Data Loss

How Can Data Loss Occur?

While data can be lost through malicious interference or attack, data loss is most commonly caused by simple human error or a system failure. While the hope is that our data will always be there for us, a good backup strategy will help to minimize such loss and allow you to recover faster to carry on your research.

Strategies for Preventing Data Loss

A simple and effective backup strategy for protecting against data loss is commonly known as the 3-2-1 strategy.

3 Copies

You should have at least 3 copies of your data.

2 Mediums

Improve resiliency by storing your backups on at least 2 different storage mediums. For example, 1 backup hosted in the cloud and 1 backup on an external drive.

1 Off-site

Don’t keep all of your backups in close proximity to each other, as a fire, flood or natural disaster could render them all unusable.
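
One way to sanity-check a backup plan against the 3-2-1 strategy is to write the plan down as a small inventory and test it. The sketch below is illustrative only: the `DataCopy` class and `check_3_2_1` function are hypothetical names, and it assumes the working copy counts as one of the 3 copies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    label: str
    medium: str     # e.g. "laptop", "external drive", "cloud"
    off_site: bool  # stored away from the primary working location?


def check_3_2_1(copies):
    """Return a list of unmet 3-2-1 conditions (empty means the plan passes)."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({c.medium for c in copies}) < 2:
        problems.append("fewer than 2 storage mediums")
    if not any(c.off_site for c in copies):
        problems.append("no off-site copy")
    return problems


plan = [
    DataCopy("working copy", "laptop", off_site=False),
    DataCopy("backup 1", "external drive", off_site=False),
    DataCopy("backup 2", "cloud", off_site=True),
]
print(check_3_2_1(plan))  # an empty list means all three conditions are met
```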

How can I back up my data?

Your divisional or departmental IT staff are a fantastic resource regarding what tools, techniques and services might already be available to you to back up your data. Otherwise, the University’s Information Security Handbook provides practical how-tos on developing a backup strategy and backing up your data.

Securing and Maintaining my Backups

Remember that your backups are a copy of your data and intellectual property. Just as you encrypt your working data, it is essential to encrypt your backups as well.

Having access to a working backup when you need it could be the difference between marginal inconvenience and the irrecoverable loss of years of work. It’s important to periodically test your backups by recovering from them and ensuring that they are usable.
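
A simple way to test a backup, sketched below under the assumption that you can restore a file to a scratch location, is to compare a checksum of the restored file with a checksum of the original. The function names are hypothetical; `hashlib.sha256` is part of the Python standard library.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restored_copy_matches(original_path, restored_path):
    """True if the restored file is byte-for-byte identical to the original."""
    return sha256_of(original_path) == sha256_of(restored_path)
```

A matching checksum only shows the file was restored intact; it is still worth opening a few restored files to confirm they are usable.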

Compass Issue 23 – October 3, 2024

Data Management Plan (DMP)

Creating Your Data Management Plan (DMP)

A Data Management Plan (DMP) is more than just a requirement—it’s a helpful tool for keeping your research organized and on track.

A DMP is a formal document that outlines how you will handle your research data throughout your project, covering everything from storage and security to sharing and preservation. It helps ensure your data is well-organized, easy to find, and secure, reducing the risk of pitfalls like data loss or duplication.

Why Bother with a DMP?

Think of it as your guide to making smart decisions about your data. It helps you plan for challenges, estimate resource needs, and keep your research team aligned. Plus, a solid DMP can make your life easier when meeting funder requirements. In fact, major funding agencies, including the Tri-Agencies, are starting to request a DMP as part of grant applications.

Not Sure Where to Start?

Visit the DRI Portal and The University of Toronto Libraries’ RDM website to take the first step toward a smoother research process and ensure your DMP meets both your project needs and funder expectations. You can also use the DMP Assistant, a free collaborative tool that provides customized templates and guidance. Or check out the McMaster Data Management Plan Database to see a variety of example DMPs.

Sign up to our Compass Newsletter

To stay up to date on important research data management principles, subscribe to the CRIS mailing list to receive our newsletters.

Questions?

Please email cris@utoronto.ca if you have any questions.
