Data Connection: Data Services

Research Data Lifecycle

Data Lifespan: Plan, Create, Process, Analyze, Disseminate, Preserve, Reuse

Managing Original Data

If you are conducting original research and will be collecting, storing, and disseminating data, there are a few things you should know about research data management and the research data lifecycle. To learn more about appropriate research methods for collecting data, see Understanding Research Methods.

Managing Your Data

The Portage Network provides excellent information on managing the entire data lifecycle. The aim of Portage is to coordinate and expand existing expertise, services, and infrastructure so that all academic researchers in Canada have access to the support they need for research data management (RDM).

Research Data Management

Research data management (or RDM) is a term that describes the organization, storage, preservation, and sharing of data collected and used in a research project. It involves the everyday management of research data during the lifetime of a research project (for example, using consistent file naming conventions). It also involves decisions about how data will be preserved and shared after the project is completed (for example, depositing the data in a repository for long-term archiving and access).
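As an illustration of a consistent file naming convention, the following is a minimal Python sketch. The convention itself (project, description, collection date, and version, joined by underscores) and the names slugify and build_filename are assumptions made for this example, not a standard prescribed by this guide; adapt the pattern to whatever your project or repository requires.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase the text and replace runs of other characters with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, description: str, collected: date,
                   version: int, ext: str) -> str:
    """Build a name such as project_description_YYYY-MM-DD_vNN.ext.

    This pattern is a hypothetical convention chosen for illustration.
    """
    parts = [
        slugify(project),
        slugify(description),
        collected.isoformat(),   # ISO 8601 dates sort chronologically
        f"v{version:02d}",       # zero-padded so v02 sorts before v10
    ]
    return "_".join(parts) + "." + ext.lstrip(".").lower()


print(build_filename("Soil Study", "Field Survey", date(2024, 3, 15), 2, "csv"))
```

Generating names with one function, rather than typing them by hand, keeps every file in a project sortable and searchable in the same way.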

There are a host of reasons why research data management is important:

  • Data, like journal articles and books, is a scholarly product.
  • Data (especially digital data) is fragile and easily lost.
  • There are growing research data requirements imposed by funders and publishers.
  • Research data management saves time and resources in the long run.
  • Good management helps to prevent errors and increases the quality of your analyses.
  • Well-managed and accessible data allows others to validate and replicate findings.
  • Research data management facilitates sharing of research data and, when shared, data can lead to valuable discoveries by others outside of the original research team.

Canada's Open Data Principles

The Government of Canada provides a list of standards for open data, which can be generally applied to most types of datasets. These include:

Datasets should be as complete as possible, reflecting the entirety of what is recorded about a particular subject. All raw information from a dataset should be released to the public, unless there are Access to Information or Privacy issues. Metadata that defines and explains the raw data should be included, along with explanations for how the data was calculated.

Datasets should come from a primary source. This includes the original information collected by the Government of Canada and available details on how the data was collected. Public dissemination will allow users to verify that information was collected properly and recorded accurately.

Datasets released by the Government of Canada should be made available to the public in a timely fashion. Whenever feasible, information collected by the Government of Canada should be released as quickly as it is gathered and collected. Priority should be given to data whose utility is time sensitive.

Datasets released by the Government of Canada should be as accessible as possible, with accessibility defined as the ease with which information can be obtained. Barriers to electronic access include making data accessible only via submitted forms or systems that require browser-oriented technologies (e.g., Flash, JavaScript, cookies or Java applets). By contrast, providing an interface for users to make specific calls for data through an Application Programming Interface (API) makes data much more readily accessible.

Machines can handle certain kinds of inputs much better than others. Datasets released by the Government of Canada should be stored in widely used file formats that easily lend themselves to machine processing (e.g., CSV, XML). These files should be accompanied by documentation related to the format and how to use it in relation to the data.
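To show what a machine-readable file with accompanying documentation might look like in practice, here is a small Python sketch using only the standard library. The sample records, the column names, and the data_dictionary structure are invented for illustration and do not come from any Government of Canada dataset.

```python
import csv
import io

# Hypothetical records; in practice these would come from your own data.
rows = [
    {"site_id": "A1", "sample_date": "2024-03-15", "ph": 6.8},
    {"site_id": "B2", "sample_date": "2024-03-16", "ph": 7.1},
]

# A simple data dictionary: one plain-language description per column.
data_dictionary = {
    "site_id": "Identifier of the sampling site",
    "sample_date": "Date the sample was taken (YYYY-MM-DD)",
    "ph": "Measured pH value (unitless)",
}


def to_csv(records, fieldnames):
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv(rows, list(data_dictionary))

# Any CSV-aware tool can read the file back without special software.
parsed = list(csv.DictReader(io.StringIO(csv_text)))
print(csv_text)
print(parsed[0]["site_id"])
```

Note that values read back from CSV are plain text, which is one reason the accompanying documentation should state each column's type and units.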

Non-discrimination refers to who can access data and how they must do so. Barriers to use of data can include registration or membership requirements. Datasets released by the Government of Canada should have as few barriers to use as possible. Non-discriminatory access to data should enable any person to access the data at any time without having to identify themselves or provide any justification for doing so.

Commonly owned standards refer to who owns the format in which data is stored. For example, if only one company manufactures the program that can read a file where data is stored, access to that information is dependent upon use of that company's program. Sometimes that program is unavailable to the public at any cost, or is available, but for a fee. Removing this cost makes the data available to a wider pool of potential users. Datasets released by the Government of Canada should be in freely available file formats as often as possible.

The Government of Canada releases datasets under the Open Government Licence – Canada agreement. The licence is designed to increase openness and minimize restrictions on the use of the data.

The capability of finding information over time is referred to as permanence. For best use by the public, information made available online should remain online, with appropriate version-tracking and archiving over time.

The Government of Canada releases the data on the Open Government site free of charge.

Research Ethics

The Sheridan Research Ethics Board (SREB) is responsible for granting approval to prospective researchers, monitoring projects, facilitating amendments, and accommodating appeals of previous Board decisions. Visit the SREB website for information regarding the SREB process, forms and templates, FAQs, and more.