Practical tips and lessons learned

For health data related research, development, and innovation projects

Data is the new oil, they say. This applies to both research and business. However, using data as oil requires specific approaches to planning and executing your research, development, and innovation projects and to disseminating their results, especially if you are using personal or health data!

Here we have gathered some practical tips and lessons that we and our friends have learned while conducting research and development projects based on personal or health data. The objective is to provide insights into matters that must be noted, useful links to further information, and practical ‘how-to’ solutions to common challenges. Have a look, and do not hesitate to contact us if you have any comments or questions or need more information!

Contents of this page:

It’s a multidisciplinary science
The research plan is the base for what you’re allowed to do, so take your time
Make sure you have a right to use the data and that it is adequate in terms of quality and quantity
Is the data usable for machine learning?
Combining data from several sources
Make sure the processing of data happens in a safe and secure manner and place
Commercializing research results

These tips are for informational purposes only. The contents should not be used as legal guidance or considered legal advice or an interpretation of any existing legislation.


It’s a multidisciplinary science

When working with health data, you are going to need diverse expertise and roles in your research team. Medical expertise is needed to understand the needs and special features of the medical and health domain, and data scientists, AI experts, and programmers are needed to harness the power of the latest technologies of data-based research.



The research plan is the base for what you’re allowed to do, so take your time

Allocate enough time and resources for preparing your research plan, as it defines the purpose and scope of data usage. Be clear about what kind of data you will need and use – personal (health) data, patient record data, or data from one or more registries – as this affects the later processes. Major changes to the research plan during the project are difficult to make without major effort and delays to the project schedule, especially if they concern matters described in the consent form. You can make other changes to the research plan as long as they are documented and you keep track of the research plan versions.

When you are preparing the research plan, use the data protection risk assessment template (found on the Data Protection site, link below) to find out how sensitive your situation is, and proceed as the tool guides you. If needed, contact your faculty’s data specialist or the data protection ombudsman.

More info for University of Oulu personnel (internal links):


Make sure you have a right to use the data and that it is adequate in terms of quality and quantity

Health data is always collected for a specific, explicit, and legitimate purpose. It is critical that any changes to this purpose are documented and that the research plan is updated and given a new version number. In addition to the general laws and regulations on using health data, note any data management requirements set by the funder of your research, for example the Academy of Finland’s data management plan.

If you are using data based on individuals’ consent, keep in mind that the consent form needs to include a description of the data processing (collection, processing, storing). Note that it is not possible to change the data processing or any other aspect of data collection afterward without asking for new consent.

The purpose of collecting and processing data must be written in such a way that the consent covers the data processing you are planning to do. For example, if there is a possibility that a third party processes the data, it must be mentioned in the consent form. On the other hand, the data processing description can’t be wider than necessary for the research.

If you are collecting and using personal data, consult a GDPR legal expert when writing the consent forms to ensure that your form(s) cover everything essential about data collection and processing.

Here are examples of data sources and the permit or other basis required to use each of them for research:

  • Open source data: the data’s license terms
  • Collected personal data: consent (under the GDPR)
  • Patient register data or other social and health data in Findata’s datasets
  • Turku University Hospital patient register data: permit from the register controller (operator: TurkuCRC)
  • Oulu University Hospital patient register data (also other university hospitals): permit from the register controller, Managing Doctor (johtajaylilääkäri)
  • Private healthcare providers (e.g. Terveystalo, Mehiläinen, YTHS): Findata data permit
  • Findata’s registries: Findata data request or data permit
  • Combined data from several public/private registries: Findata data request or data permit
  • Data in the Kanta data lake: Findata data permit

Please note that getting permits for data in patient registers may take a long time (on the order of months).

More info:

More info for University of Oulu personnel:


Is the data usable for machine learning?

Using AI and machine learning methods in addition to, or instead of, more traditional statistical analysis tools makes it possible to include more variables in the analysis and can produce deeper insights and richer results for your research.

Several factors, such as the amount and quality of the data and the quality of its metadata, affect how usable the data is for machine learning tools. Use the Health Data Assessment to assess the quality and integrity of your health data sources.

For example:

  • As a showpiece of good metadata, you may take a look at NHANES III data descriptions.
  • The opportunities and strengths of machine learning tools in research are demonstrated in Nature’s article Statistics versus machine learning. AI can bring results even if the data is skewed or has missing values.
  • Machine learning has also been used to predict acute kidney injury and, for example, to develop many tools that assist in diagnostics.
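
The usability factors mentioned above (amount of data, missing values, skew) can be checked with a few lines of code before any modelling starts. The following is a minimal sketch using only the Python standard library; the function name `summarize_quality`, the field names, and the toy records are hypothetical and not part of any assessment tool mentioned on this page.

```python
from collections import Counter


def summarize_quality(rows, label_field):
    """Report row count, share of missing values per field, and label balance."""
    if not rows:
        raise ValueError("no rows to summarize")
    n = len(rows)
    fields = sorted({key for row in rows for key in row})
    # A value counts as missing if the field is absent, None, or an empty string.
    missing_share = {
        field: sum(1 for row in rows if row.get(field) in (None, "")) / n
        for field in fields
    }
    # Count how often each outcome label occurs, to reveal skewed classes.
    label_counts = Counter(
        row[label_field] for row in rows if row.get(label_field) not in (None, "")
    )
    return {
        "n_rows": n,
        "missing_share": missing_share,
        "label_counts": dict(label_counts),
    }


# Hypothetical toy records; real health data must stay in a secure environment.
records = [
    {"age": 54, "creatinine": 1.1, "aki": 0},
    {"age": 61, "creatinine": None, "aki": 0},
    {"age": 47, "creatinine": 0.9, "aki": 0},
    {"age": 70, "creatinine": 2.3, "aki": 1},
]
report = summarize_quality(records, "aki")
print(report)
```

A high share of missing values in a key variable, or a label that occurs only a handful of times, is a signal to discuss the data with a data specialist before choosing a method.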


Combining data from several sources

If you plan to use several data sources in your research, you have to describe all of them in your research plan, and whenever you collect personal data, you also have to mention all the other data sources (and the data processing) in the consent form. Please consult a data specialist or GDPR legal expert when writing the consent forms to ensure that your form(s) cover everything essential about data collection and processing.

When collecting data from several sources or registries, note that Findata is the health and social data permit authority that provides access to different data sources. Findata also provides a safe environment for combining self-collected data sets. Data from a single register is requested either directly from the controller of that register, who grants the permit, or from Findata.
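
As an illustration only, the logic of combining two sources on a shared pseudonymous identifier can be sketched as below. The function `combine_sources`, the key name `pseudo_id`, and the toy records are assumptions made for this example; the actual combining of permitted data happens inside the secure environment described above, not on a personal computer.

```python
def combine_sources(primary, secondary, key="pseudo_id"):
    """Join two lists of records on a shared key and report unmatched keys."""
    secondary_by_key = {row[key]: row for row in secondary}
    combined, unmatched = [], []
    for row in primary:
        match = secondary_by_key.get(row[key])
        if match is None:
            # Keep track of people found in only one source.
            unmatched.append(row[key])
        else:
            # Fields from the primary source win if both sources share a name.
            combined.append({**match, **row})
    return combined, unmatched


# Hypothetical toy data: a self-collected survey and a register extract.
survey = [
    {"pseudo_id": "a1", "smoker": False},
    {"pseudo_id": "b2", "smoker": True},
]
register = [
    {"pseudo_id": "a1", "diagnosis": "I10"},
]
combined, unmatched = combine_sources(survey, register)
print(combined)
print(unmatched)
```

Reporting the unmatched identifiers makes it visible how much of the data is lost in the join, which is worth documenting alongside the research plan.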


Make sure the processing of data happens in a safe and secure manner and place

When collecting data, keep in mind that the data needs to be handled in a secure manner and place in all phases of the research. A secure place for processing data can be, for example, the CSC ePouta service (for large amounts of data) or Findata’s processing platform (when you are combining data from several registries).

In general, health data cannot be used as such; it needs to be anonymized so that the people behind the data cannot be identified. In some exceptional cases, pseudonymization is advisable instead: for example, if identifiers are needed for cleaning or combining datasets, or if being unable to identify the person could harm their health or person, as in genetic risk or pharmaceutical research.

Consider anonymizing the data if you do not need personal or pseudonymous ID codes. If pseudonymization is needed, the code key that links the codes back to the personal data has to be stored in a safe place, and access to it must be kept limited. All processes also need to be planned and documented properly. If you have any doubts, consult a specialist.
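
The idea of pseudonymization with a separately stored code key can be sketched as follows. This is a minimal illustration under stated assumptions, not an approved procedure: the function `pseudonymize`, the field names `personal_id` and `pseudo_id`, and the toy records are hypothetical. Random codes come from Python's standard `secrets` module.

```python
import secrets


def pseudonymize(records, id_field="personal_id"):
    """Replace the direct identifier with a random code.

    Returns the pseudonymized records and the code key (identifier -> code).
    The code key must be stored separately, with limited access.
    """
    key_table = {}
    pseudonymized = []
    for record in records:
        original_id = record[id_field]
        if original_id not in key_table:
            # A random code reveals nothing about the original identifier.
            key_table[original_id] = secrets.token_hex(8)
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["pseudo_id"] = key_table[original_id]
        pseudonymized.append(cleaned)
    return pseudonymized, key_table


# Hypothetical toy records; the identifiers are made up.
records = [
    {"personal_id": "ID-001", "hba1c": 6.1},
    {"personal_id": "ID-002", "hba1c": 7.4},
    {"personal_id": "ID-001", "hba1c": 5.9},
]
pseudonymized, key_table = pseudonymize(records)
print(pseudonymized)
```

The same person always receives the same code, so measurements can still be linked, while re-identification requires the code key. Note that pseudonymized data still counts as personal data; removing the direct identifier alone does not make it anonymous.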

For example:

  • In one small-scale research case, the personal data was anonymized properly (anonymization must be documented well). After proper anonymization, the data could be transferred to a third party for processing and research using the standard agreement procedure, and the GDPR requirements thus no longer applied to it.
  • is a company providing services for health data de-identification.
  • ARX is a comprehensive open-source software for anonymizing sensitive personal data.
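
One common way to reason about whether a data set might still identify people is k-anonymity: every combination of indirectly identifying attributes (quasi-identifiers) should be shared by at least k records. The sketch below is a simplified illustration of that single criterion, not a description of how any tool mentioned above works; the function `k_anonymity_report`, the field names, and the toy records are hypothetical.

```python
from collections import Counter


def k_anonymity_report(records, quasi_identifiers, k):
    """Return the smallest group size and the groups with fewer than k records."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    if not groups:
        return 0, []
    smallest = min(groups.values())
    too_small = sorted(group for group, size in groups.items() if size < k)
    return smallest, too_small


# Hypothetical toy records with generalized age and region.
records = [
    {"age_band": "40-49", "region": "North", "diagnosis": "A"},
    {"age_band": "40-49", "region": "North", "diagnosis": "B"},
    {"age_band": "50-59", "region": "North", "diagnosis": "A"},
    {"age_band": "50-59", "region": "North", "diagnosis": "C"},
    {"age_band": "60-69", "region": "South", "diagnosis": "B"},
]
smallest, too_small = k_anonymity_report(records, ["age_band", "region"], k=2)
print(smallest, too_small)
```

Here one person is alone in their age band and region, so the data would need further generalization or suppression. Passing such a check is not by itself proof of anonymity, so the result should be reviewed with a specialist and documented.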

More information for University of Oulu personnel:


Commercializing research results

If you are planning to commercialize your research results, or might end up doing so, be aware of exactly what product you are commercializing, as your research plan or consent forms might affect this. If the result of your research plan, the product being developed, is not connected to any personal data, you should not face obstacles to commercialization. However, if the product itself contains data used in the research, it is much more crucial to include the possibility of commercialization from day zero. So be aware of what your product is and what it contains.

Commercialization possibilities are also affected by research funding instruments, as the terms of public funding require the results to be public too. This constrains the protection of IPR, which must be taken into account when planning to commercialize such research results.

For example:

  • Personal data could have been used to develop and train a machine learning model, but if the model itself does not contain any personal data, it can be commercialized. A researcher who wants to proceed with commercialization should collaborate with the University Innovation Centre.

More information for University of Oulu personnel:

Last updated: 7.9.2020