Max Petzold (University of Gothenburg, Sweden): Open data in epidemiology
Managing research data so that they are as open and accessible as possible can make research more effective and produce more results, faster. Data that contain no personal data and are not otherwise protected can be made searchable and freely available. Data that are restricted in some way, whether by legal regulation or by ownership, can also be made searchable but must be handled in a special way when they are delivered. I will present three different contributions that can simplify open science in this latter case.
Infrastructures such as the Swedish National Data Service and our sister organizations around the world work together to provide searchable, structured and compatible metadata catalogues in which research data are assigned persistent identifiers. Combined with advice on, and secure handling and disclosure of, sensitive data, this gives our researchers the basis for meeting the FAIR criteria for restricted data as well.
When working with sensitive data, it is preferable not to deliver the research data to any third party. Once the content of different databases has been harmonized, it can be analyzed in a federated analysis, in which only summary statistics are transferred from each database and are then weighted together centrally. This allows advanced statistical analysis without the data owner having to transfer research data to any other party; the owner contributes only harmonization and searchability.
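The federated step can be illustrated with a minimal sketch, assuming a single harmonized numeric variable and assuming each site is allowed to release its count, mean and sum of squared deviations. The names `SiteSummary`, `summarize_site`, `combine` and `pooled_estimates` are hypothetical, and the pooling rule shown (the parallel variance update of Chan et al.) is just one way to weight such summaries together, not necessarily what any particular infrastructure uses.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate statistics released by one data holder; no individual records."""
    n: int       # number of observations at the site
    mean: float  # local mean
    m2: float    # local sum of squared deviations from the local mean


def summarize_site(values: list[float]) -> SiteSummary:
    """Runs inside each data holder's environment; only the summary leaves the site."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return SiteSummary(n, mean, m2)


def combine(a: SiteSummary, b: SiteSummary) -> SiteSummary:
    """Merge two summaries (Chan et al. parallel update); means are weighted by n."""
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta ** 2 * a.n * b.n / n
    return SiteSummary(n, mean, m2)


def pooled_estimates(summaries: list[SiteSummary]) -> tuple[float, float]:
    """Central step: pooled mean and sample variance across all sites."""
    total = reduce(combine, summaries)
    return total.mean, total.m2 / (total.n - 1)


if __name__ == "__main__":
    site_a = [1.0, 2.0, 3.0, 4.0]              # hypothetical data held at site A
    site_b = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]   # hypothetical data held at site B
    mean, var = pooled_estimates([summarize_site(site_a), summarize_site(site_b)])
    print(mean, var)
```

The pooled mean and variance here are the same as those computed on all ten values together, yet no individual value leaves either site. The same summaries cannot, however, yield a pooled median, which reflects the limitation for rank-based analyses.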
Federated analysis does have limitations, however; for example, analyses based on the ranking of observations are impossible. When sensitive data need to be delivered to a third party, there are methods to minimize the risks. These include simple methods such as removing direct identifiers, decoupling variables and, of course, delivering only the minimal research data content needed. There are also more advanced methods for preserving confidentiality, such as adding random variation, simulating data that preserve relations, variability and levels, or delivering only the residuals after modelling. However, these methods need further assessment and testing to determine whether confidentiality can be guaranteed.
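Two of the methods mentioned, removing direct identifiers and adding random variation, can be combined in a minimal sketch. It assumes records are plain dictionaries, that the identifier field names in `DIRECT_IDENTIFIERS` are hypothetical, and that zero-mean Gaussian noise with an arbitrarily chosen standard deviation is an acceptable form of random variation. The function `prepare_delivery` is likewise hypothetical and, as the text cautions, does not by itself guarantee confidentiality.

```python
import random
from typing import Any, Optional

# Hypothetical field names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"personal_id", "name", "address"}


def prepare_delivery(
    records: list[dict[str, Any]],
    noisy_fields: set[str],
    noise_sd: float,
    seed: Optional[int] = None,
) -> list[dict[str, Any]]:
    """Return new records with direct identifiers removed and zero-mean
    Gaussian noise added to the selected numeric fields.

    The input records are not modified."""
    rng = random.Random(seed)
    delivered = []
    for record in records:
        # Drop direct identifiers by copying only the remaining fields.
        out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        # Perturb each selected numeric field independently.
        for field in noisy_fields:
            out[field] = out[field] + rng.gauss(0.0, noise_sd)
        delivered.append(out)
    return delivered


if __name__ == "__main__":
    source = [  # hypothetical toy records
        {"personal_id": "A1", "name": "Example One", "age": 54, "sbp": 132.0},
        {"personal_id": "A2", "name": "Example Two", "age": 61, "sbp": 145.0},
    ]
    for row in prepare_delivery(source, noisy_fields={"sbp"}, noise_sd=2.0, seed=1):
        print(row)
```

Independent zero-mean noise leaves the expected mean unchanged but adds `noise_sd**2` to the variance of the perturbed variable, so the choice of scale trades analytical precision against protection; whether a given scale is sufficient is exactly the kind of question that needs further assessment.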