Abstracts

Presentations in Concurrent Sessions

(in alphabetical order by the last name of the first author)

Improving metadata quality of the European Social Survey using DDI

Benjamin Beuster (NSD – Norwegian Centre for Research Data) (to schedule)
Track: Reusing and Sharing Metadata – Session Type: Short Presentation – Project Report

The first edition of round 9 data and documentation from the European Social Survey (ESS) will be released in October 2019. The metadata is produced in large part with DDI-based tools. The data is published in both DDI-Codebook (NESSTAR) and DDI-Lifecycle (QVDB/Colectica), making the datasets more accessible to researchers and facilitating meaningful interpretation and use.

In particular, this presentation describes how DDI is used in the production of ESS9 data and documentation. This includes practical examples, such as the production of ESS Codebooks and Data Setup files based on DDI3.2 XML, and illustrates how metadata components such as variables and codelists are reused over time (ESS1 to ESS9).

Opportunities and Challenges in Moving from DDI-Codebook to DDI-Lifecycle: the case of the Generations and Gender Programme

Arianna Caporali (French Institute for Demographic Studies (INED)) (to schedule)
Track: Data Harmonization – Session Type: Short Presentation – Community Impact

In the context of cross-national and longitudinal surveys, it is key to document the comparability of the data across countries and subsequent waves. It is also important to provide metadata on the procedure used to harmonize the data between countries. DDI-Lifecycle is specifically suited to documenting this type of survey. What are the limits if DDI-Codebook is used for these surveys? What are the challenges of implementing DDI-Lifecycle?

This paper presents the case of the Generations and Gender Programme (GGP), a cross-national and longitudinal demographic survey run in 20 countries in Europe and beyond. GGP is based on post hoc data harmonization and uses DDI-Codebook and the software Nesstar to document the datasets. After an overview of the arguments in favor of DDI (DDI-Lifecycle in particular) for documenting cross-national longitudinal surveys, we present the GGP methodology. We describe the procedure for documenting GGP data harmonization and data comparability across countries and waves with DDI-Codebook, along with its strengths and limitations. We then report on a test we carried out with DDI-Lifecycle, and we discuss how its implementation could enhance data documentation in the GGP (e.g., through Conceptual and Represented Variables). We conclude with some reflections on the challenges linked to such a development.

Challenges and benefits of implementing the CESSDA Core Metadata Model in an existing environment

Alina Danciu (Center for Socio-Political Data, Sciences Po Paris), Alexandre Mairot (Center for Socio-Political Data, Sciences Po Paris) (to schedule)
Track: User Needs, Efficient Infrastructures and Improved Quality – Session Type: Short Presentation – Project Report

Providing access to social science research data and metadata is one of the missions of the Center for Socio-Political Data (CDSP), a service unit of Sciences Po and of the French National Center for Scientific Research (CNRS). The CDSP is a member of PROGEDO, an infrastructure for SSH data, which is the French CESSDA provider.

The CDSP is currently reviewing its documentation practices and dissemination tools in order to make its data FAIRer and more usable. This process, which began a few years ago with DDI-L experimentation, is continuing with the adoption of a Dataverse for the dissemination of our data and metadata.

In parallel, we are working on a DDI-L question bank that should in future be harvested by the European EQB. In this evolving environment, the European experts’ recommendations have been of great help to us. The CESSDA Metadata Model and the controlled vocabularies of the DDI Alliance are two of the resources that we are using to improve our processes and to be harvested in the CESSDA data catalogue.

What do you hear when I say data curation? A study on data curation perceptions and practices in France

Alina Danciu (Center for Socio-Political Data, Sciences Po Paris), Nicolas Sauger (Center for Socio-Political Data, Sciences Po Paris) (to schedule)
Track: Other – Session Type: Regular Presentation – Community Impact

The secondary use of research data in scientific research is more and more common today in France. In parallel, the value of research data curation is increasingly recognised.

New data services are emerging rapidly as the legislative context changes, strongly encouraging both researchers and institutions to share data. Different professionals are dealing with research data curation: data managers, archivists, librarians and others. Some have a “traditional” role, others are newer to this field.

To understand the current state of SSH data curation in France, we plan to conduct a quantitative survey and question these professionals on their perceptions of data curation. Our assumption is that, along with the national context, the educational and professional background of the data curators as well as their perceptions influence the data “business model” in France.

In parallel, we plan to perform an analysis of online resources and institutional policies of the French actors dealing with research data. Also, how is the French community adhering to the best practices provided by actors like CESSDA, DDI Alliance or RDA? Is the French curation model a “disciplinary” and/or an “institutional” model? Or do we have a larger, more “universal”, model?

This paper will present the outcome of this research.

Facilitating metadata needs for multiple research domains

Johan Fihn Marberg (SND – Swedish National Data Service), Sara Svensson (SND – Swedish National Data Service), Sofia Agnesten (SND – Swedish National Data Service) (to schedule)
Track: User Needs, Efficient Infrastructures and Improved Quality – Session Type: Regular Presentation – Community Impact

Different research domains have different needs. When talking to researchers and designing applications for them to use, you very soon realize you need to adapt your vocabulary to the domain the researchers belong to.

During spring 2019 SND launched five research domain specific metadata profiles (social science, earth and environment, language, medicine and health, and archaeology and history) to be used in our systems, as well as a common profile to be used for any research domain. We took into account domain specific metadata standards, infrastructure needs from e.g. CESSDA and CLARIN, controlled vocabularies and ontologies, community specific metadata, data type specific metadata, as well as the need for community specific definitions of metadata fields.

The profiles also include mapping between metadata standards used by the various research domains. For the Social Science domain DDI and the CESSDA CMM are used, while for others other metadata standards are better suited. Most fields in the common metadata profile are also present in the DDI-L standard.

This paper presents experiences from the project and the results of the metadata adaptations, and explains the process that led to the end result. We will also present the drawbacks and the delimitations that were made to reach a compromise that worked for all research domains.

The Matter of Meta in Research Data Management: Introducing the CESSDA Metadata Office Project

André Förster (GESIS – Leibniz Institute for the Social Sciences), Kerrin Borschewski (GESIS – Leibniz Institute for the Social Sciences), Sharon Bolton (UK Data Service), Taina Jääskeläinen (Finnish Social Science Data Archive), Jeannine Beeken (UK Data Service) (to schedule)
Track: Reusing and Sharing Metadata – Session Type: Full Paper – Community Impact

The provision and maintenance of metadata – understood as data about (research) data – has a key role in contextualizing, understanding, and preserving research data within Research Data Management (RDM). Acknowledging the importance of metadata in the social sciences, the Consortium of European Social Science Data Archives started the Metadata Office project (MDO) in 2019. This presentation covers the various activities and impact of the MDO, including metadata models, controlled vocabularies and thesauri, and introduces plans on how the project may develop. The MDO collaborates with the DDI Alliance on multilingual translations of DDI vocabularies for CESSDA Service Providers and provides communication, training and advice on metadata and DDI use across CESSDA.

DDI vocabularies and their language versions

Taina Jääskeläinen (Finnish Social Science Data Archive), Sharon Bolton (UK Data Service) (to schedule)
Track: User Needs, Efficient Infrastructures and Improved Quality – Session Type: Regular Presentation

Controlled vocabularies (CVs) promote the consistency of metadata across organisations and languages. We will present the controlled vocabularies maintained by the DDI Alliance.

The presentation discusses the benefits of using the DDI vocabularies and how they are used in the Consortium of European Social Science Data Archives, CESSDA. The challenges relating to controlled vocabulary elements in cross-national data catalogues are also discussed.

CESSDA Vocabulary Service for managing vocabulary content

Taina Jääskeläinen (Finnish Social Science Data Archive), Sharon Bolton (UK Data Service) (to schedule)
Track: Software / Tools – Session Type: Regular Presentation – Community Impact

As a result of strong collaboration between CESSDA and the DDI Alliance, the CESSDA Vocabulary Service (CVS) was launched this year. The CVS allows searching, browsing and downloading controlled vocabularies in a variety of languages. The CVS is a useful and timely resource that promotes international standardisation by providing a source of controlled vocabularies to suit varied metadata systems.

This presentation demonstrates how users can browse, create, edit and translate controlled vocabularies in the CVS. In addition, we will discuss how the tool manages user access, version control and enables the management of organisation-specific vocabularies as well.

Metadata Disclosure Framework – open science vs privacy

Jon Johnson (CLOSER, UCL, Institute of Education) (to schedule)
Track: Privacy and Access Control – Session Type: Regular Presentation – Scientific Method

Archives have been at the forefront of the provision of accessible data, in particular in the social sciences, and over the last 50 years have developed robust governance mechanisms to guard the privacy of individual data.

This has led to a comparable typology across archives of open (no restrictions), licenced (institutional based licences) and restricted (access via secure settings). A ‘safety first’ principle has meant that availability of metadata for these restricted datasets has been minimal, even though the metadata for individual variables may not be (and often are not) disclosive.

The presentation will offer up a framework that could be actioned in both DDI-Codebook and DDI-Lifecycle elements (or by extensions to the existing standards) to allow the publication of variable statistics that meets the demands of open science and the FAIR principles, but respects the privacy and disclosure requirements of data providers.

DDI Lifecycle 3.3 – Public Review – Question and Answers

Jon Johnson (CLOSER, UCL, Institute of Education), Wendy Thomas (Minnesota Population Center, University of Minnesota) (to schedule)
Track: Software / Tools – Session Type: Report

DDI Lifecycle 3.3 will be out for Public Review during the Conference.

There will be a presentation on the new features introduced since the last version, and it is also an opportunity for delegates to ask questions and provide feedback during the Review period.

DDI based Questionnaire Editor

Claus-Peter Klas (GESIS – Leibniz Institute for the Social Sciences), Oliver Hopt (GESIS – Leibniz Institute for the Social Sciences), Sigit Nugraha (GESIS – Leibniz Institute for the Social Sciences) (to schedule)
Track: Software / Tools – Session Type: Regular Presentation – Community Impact

We presented the questionnaire editor at prior EDDI conferences (EDDI 2016 & EDDI 2017) as a theoretical tool and project. We are now able to roll it out as a web portal, developed initially for the use case of the German national election studies (GLES).

The features of the questionnaire editor are:

  • Creation and editing of questionnaires with questions, question grids and free statements
  • Structuring order of questionnaires
  • Translation of questions, answers, interviewer instruction, filter statements and free statements into (European) languages
  • Workflow support through statistics on questions, e.g. from “not-worked-on” up to “in-field-questionnaire”
  • Sophisticated administration of users and roles
  • Simple discussion tool to discuss questions e.g. for question development
  • Version history of any change on questions
  • Native in DDI-L and export to Word files with templates
  • Direct indexing and searching within the European Question Bank (EQB)

The editor can be evaluated at: https://multiweb.gesis.org/labs/apps/qeditor/

  • Credentials user: eddi, pw: eddi2019

The process of creating an in-house interface for converting Blaise output into DDI metadata

Alexandre Mairot (Center for Socio-Political Data, Sciences Po Paris) (to schedule)
Track: Reusing and Sharing Metadata – Session Type: Regular Presentation – Project Report

ELIPSS (Longitudinal Internet Studies for Social Sciences) is a longitudinal probabilistic French panel. To release the 70 datasets produced, we had to deal with numerous issues in an environment with limited resources and a highly fragmented set of tools: Blaise for the surveys, DDI-C for the documentation, SPSS for producing the datasets and SAS for the weights of the respondents. Heterogeneous survey programming and documentation practices also had to be taken into account. Our primary goal was to promote the reuse of data and metadata.

To handle the challenges of this project, it was necessary to design and implement new processes and tools that are adapted to the context and have the smallest possible impact on the existing environment, in order to increase the quality and efficiency of data management. The solution we found was to create an interface, a link between all the tools that streamlines the production and dissemination cycle of the datasets. It is based on the output and input files of the software, which are connected to one another by a tool developed in R.

Our presentation will describe the process of building this solution. We will also report on how it has performed over 5 years in our environment.

The challenges and opportunities of documenting longitudinal data and questionnaires

Hayley Mills (CLOSER, UCL, Institute of Education), Jon Johnson (CLOSER, UCL, Institute of Education) (to schedule)
Track: Reusing and Sharing Metadata – Session Type: Regular Presentation – Project Report

CLOSER brings together eight world-leading UK longitudinal studies to maximize their use, value and impact. A major output has been CLOSER Discovery, which currently holds metadata for 250+ questionnaires, 300+ datasets, 100,000+ variables and 35,000+ questions. CLOSER Discovery allows users to search and browse this questionnaire and dataset metadata.

As these metadata are fully documented using DDI-Lifecycle, they can be further utilised to allow new data and metadata management possibilities going forward. For example, capturing summated scales or standardised measures allows users to find the original questions used, and also allows the generation of high quality documentation to accompany data sharing. Creating lists of questions and variables allows users to save them for planning and for evaluation of their efficacy, as well as for reuse in new questionnaires.

New functionality in Discovery, which makes these lists both persistent and identifiable, allows them to be referenced in publications, working papers, and in multi-centre collaborations, increasing both efficiency and research transparency.

This presentation will report on these new and planned future features of CLOSER Discovery as well as the successes and problems faced in using the DDI-Lifecycle metadata standard to achieve these ambitions.

Data description with the DDI4 Core

Hilde Orten (NSD – Norwegian Centre for Research Data), Larry Hoyle (University of Kansas) (to schedule)
Track: Other – Session Type: Regular Presentation – Scientific Method

Whilst earlier versions of DDI let users document data from rectangular data files and NCubes, DDI4 takes steps to handle a broader range of data structures as well as data from different sources.

Based on the DDI4 prototype released October 2018, the Modelling Representation and Testing (MRT) working group has this year been working to make core features of the DDI4 ready for production release. The DDI4 Core comprises conceptual components, data description and process.

This presentation sums up the main outcomes of the work of the MRT group regarding data description, and provides examples of how the DDI4 Core can be used to document data from different structures and sources.

How individual research projects can benefit from adopting standardized metadata like DDI

Anja Perry (GESIS – Leibniz Institute for the Social Sciences), Oliver Watteler (GESIS – Leibniz Institute for the Social Sciences), Wolfgang Zenk-Möltgen (GESIS – Leibniz Institute for the Social Sciences), Arofan Gregory (Consultant), Barry T. Radler (University of Wisconsin-Madison) (to schedule)
Track: User Needs, Efficient Infrastructures and Improved Quality – Session Type: Regular Presentation – Community Impact

To our knowledge, very few small and medium-sized research projects have adopted DDI so far. Challenges to adoption include high initial costs to implement DDI as part of new project management workflows, as well as disruptions to the traditional working processes of researchers. Initial investments in DDI pay dividends in the long-term by facilitating secondary analysis of the data, for follow-up projects, for re-purposing of metadata, etc. Often these benefits are not immediately apparent to the researchers.

We present straightforward tools and approaches for small- and medium-sized research projects in implementing DDI throughout the data life cycle to realize benefits from the start. We examine the challenges and how DDI could realistically be used through integration in working practices. We also identify those areas where existing software could be improved to better support metadata standards and realize the benefits of their use.

We explore ways to make data management more efficient from the researcher’s perspective. DDI can be an effective tool, applied by researchers in small and medium-sized projects. In addition, knowledge of DDI can help these researchers in larger projects later in their careers, and strengthen the DDI community as a whole.

DDI 4 Core: describing and managing data for traditional and modern data platforms

Flavio Rizzolo (Statistics Canada) (to schedule)
Track: Other – Session Type: Regular Presentation – Project Report

Following the DDI 4 Prototype review, DDI 4 Core was launched early this year to address the feedback received and to include new requirements from an ever-evolving data space.

DDI 4 Core has a narrower scope than the DDI 4 Prototype had and an emphasis on short-term delivery. The specification, to be released early next year, will support data description in different formats, from traditional wide, unit data files to data cubes, key-value pairs and other advanced data structures prevalent in modern data platforms. It will also include conceptual aspects of variables and classifications as well as the ability to describe data management and lineage in real-world use cases.

DDI 4 Core is a production-ready version with an XML representation, to enable interdisciplinary, cross-domain and interoperable metadata-driven solutions for complex data processing, like those existing in National Statistical Offices and other government agencies.

This presentation will provide an overview of the specification to date and a vision for the future.

 

Implementation of DDI in Statistics Estonia

Aivi Saar, Kaia Kulla (Statistics Estonia) (to schedule)
Track: Official Statistics – Session Type: Regular Presentation – Project Report

According to Statistics Estonia’s strategy, our goal is to produce high quality statistics with as low an administrative burden and as high an efficiency as possible. To achieve this, it is essential to have a metadata-driven statistical production process.

At the beginning of 2018 Statistics Estonia (SE) started piloting Colectica software in search of a new metainformation system. Our goal was to test as much functionality as possible and to deliver the pilot report to the management.

In the early days of 2019 it was decided that Colectica is suitable software for implementing the DDI standard in SE. Currently we are using a metadata management system that is based on the Neuchâtel model. The main reason for choosing DDI is the lifecycle model, which makes it possible to version and reuse different metadata objects and to monitor data lineage.

SE has decided to start the implementation of DDI step-by-step. The implementation process is divided into sub-projects. The first sub-project is managing statistical classifications in Colectica and developing website view through Colectica portal. The statistical classifications project began in April 2019 and the new classifications webpage should be live at the beginning of 2020.

Documenting the Questionnaire Design Process Using the Questionnaire Design and Documentation Tool (QDDT): Experiences from ESS Round 10

Luca Salini (European Social Survey ERIC HQ City, University of London) (to schedule)
Track: Software / Tools – Session Type: Regular Presentation – Project Report

The European Social Survey has developed over the years a carefully designed model for cross-national questionnaire design, employing a combination of qualitative and quantitative pre-testing strategies during the design process of each rotating module to try and achieve optimal comparability across countries.

The design and development process lasts for almost two years – from the appointment of the successful question module design team through to the release of the source questionnaire for the round. It incorporates expert review from members of the ESS Core Scientific Team as well as the national teams, alongside coding item characteristics to predict their validity and reliability using the Survey Quality Predictor (SQP), cognitive interviewing, advance translation and quantitative testing on omnibus surveys and in a two-nation pilot survey.

In all rounds between ESS Round 4 (2008) and ESS Round 9 (2018), the design and development of rotating modules has been fully documented through a purposely designed question module design template, which shows the concepts underpinning the design of the module as well as the wording of the questions included in the survey.

In ESS Round 10 (2020), the design and documentation of rotating modules is for the first time taking place within the Questionnaire Design and Documentation Tool (QDDT) developed under the SERISS project, which allows elements to be documented, versioned and reused following the Data Documentation Initiative (DDI).

The presentation focuses on early findings, challenges and opportunities deriving from applying a structured metadata approach to the documentation of a complex multi-stakeholder questionnaire design process for an established cross-national social survey. It also showcases the latest developments of the QDDT and how they are being applied to the ESS Round 10 questionnaire design process.

Statistics Canada’s Metadata Driven Ecosystem

Farrah Sanjari (Statistics Canada) (to schedule)
Track: Official Statistics – Session Type: Regular Presentation – Project Report

With the rise of digital technologies, the manner by which national statistical agencies collect, manage and govern metadata – including how it is accessed and shared – must change. Metadata has always been a vital component at Statistics Canada but increasingly, it has become evident that the evolution of metadata standards across the Agency has become a fundamental enabler for the implementation of government wide solutions. As a national statistical agency, Statistics Canada is moving toward playing a greater role in defining standards, such as through its work with stakeholders in the private and public sector.

This presentation will explore Statistics Canada’s usage of DDI throughout the agency in relation to the agency’s use of the Generic Statistical Business Process Model. Numerous modernization initiatives across the Agency are in progress, such as Secure Managed Data Platforms, Modern Collaborative Services for Open Statistical Standards (MCSOSS) and Data Analytics as a Service (DAaaS), which are in turn expanding the reach of DDI and other standards. These initiatives are moving the organization one step closer to realizing the vision of an interoperable metadata-driven ecosystem.

Enrichment of DDI support in the Dataverse data repository

Vyacheslav Tykhonov (Data Archiving and Networked Services (DANS)) (to schedule)
Track: Software / Tools – Session Type: Regular Presentation – Project Report

The SSHOC Dataverse team is working on the extension of the Dataverse data repository with new tools intended for the DDI community. DDI explorer was fully integrated in the latest version of Dataverse and introduced to the CESSDA partners as a part of the basic functionality, which can be used either locally or deployed in the Cloud.

The new DDI converter tool was designed to help the DDI community migrate their datasets to Dataverse from other systems like NESSTAR and use DDI explorer features to browse data visually and do basic statistical analysis online.

The migration process relies on XSLT mappings curated by the DDI community and reused by a networked application that can be connected to any Dataverse instance.

Case study: ReShare, a mature implementation of DDI

Anca Daniela Vlad (UK Data Service) (to schedule)
Track: User Needs, Efficient Infrastructures and Improved Quality – Session Type: Regular Presentation – Project Report

ReShare is the self-deposit data repository of the UK Data Service, holding over 1500 collections of data, spanning a wide disciplinary range of research assets.

Now in its 5th year, ReShare’s metadata profile is based on the DDI schema. It has become the primary publishing system for social science research data in the UK and over time has sought to continuously improve its publishing process. Areas of focus have been on enabling an easy-to-use interface and an intuitive workflow, and adding useful metrics for us as data publishers. The tool itself has also been an excellent opportunity to raise awareness of creating high quality sharable data, as well as helping in our mission to support and train researchers in good data management practices.

This presentation will look at ReShare, as a mature implementation of the DDI Schema (DDI2.5). It will cover its main features, such as various CESSDA controlled vocabularies, provide an overview of data collections processed to date, and possible future improvements.

Using Colectica Designer for questionnaire specification – challenges, progress and future plans

Catherine Yuen (Institute for Social and Economic Research (ISER), University of Essex), Nicole James (Institute for Social and Economic Research (ISER), University of Essex) (to schedule)
Track: Software / Tools – Session Type: Regular Presentation – Project Report

Understanding Society is the largest longitudinal household panel study of its kind and provides crucial information for researchers and policymakers on the changes and stability of people’s lives in the UK.

Every year, the households in the sample are either visited by an interviewer or complete the survey online. The questions asked cover a wide range of themes such as family life, education, employment, finance, health and wellbeing. Some questions and modules are asked every year, whilst others are asked occasionally.

The questionnaire has very complex routings and can have over 50 different modules each wave. Therefore, it was necessary to find a tool that would be able to cope with the questionnaires from a large longitudinal survey and enable us to specify those questionnaires in a standardised format (using DDI).

Colectica Designer is the software we have decided to adopt and this presentation focuses on the challenges we faced, the progress we have made so far and our future plans.

Towards maintenance and publication of thesauri in controlled vocabulary manager systems

Benjamin Zapilko (GESIS – Leibniz Institute for the Social Sciences), Tanja Friedrich (GESIS – Leibniz Institute for the Social Sciences), Behnam Ghavimi (GESIS – Leibniz Institute for the Social Sciences), Stefan Jakowatz (GESIS – Leibniz Institute for the Social Sciences), Reiner Mauer (GESIS – Leibniz Institute for the Social Sciences), Thomas Müller (GESIS – Leibniz Institute for the Social Sciences), Sigit Nugraha (GESIS – Leibniz Institute for the Social Sciences), Claus-Peter Klas (GESIS – Leibniz Institute for the Social Sciences) (to schedule)
Track: User Needs, Efficient Infrastructures and Improved Quality – Session Type: Regular Presentation – Project Report

A thesaurus is a specific type of controlled vocabulary which is used to index the content of e.g. publications and research datasets.

However, thesauri hold particular relations between terms like related terms and further attributes like scope and editorial notes which are often not part of general controlled vocabularies or systems to maintain them. Moreover, they may be organized in an additional systematic classification.

In this presentation, we present a thesaurus manager which addresses additional requirements of thesauri regarding controlled vocabulary manager systems in terms of technical aspects and necessary content-wise changes.

The thesaurus manager is based on the CESSDA CV manager and uses the DDI-FlatDB, developed by GESIS, as storage.

Posters / Software Demonstrations

(in alphabetical order by the last name of the first author)

Penna – A Tool for Collecting Qualitative Textual Data

Jukka Ala-Fossi (Finnish Social Science Data Archive (FSD)), Enna Raerinne (Finnish Social Science Data Archive (FSD)), Sirkku Seitamäki (Finnish Social Science Data Archive (FSD)), Matti Heinonen (Finnish Social Science Data Archive (FSD)) (to schedule)
Track: Software / Tools – Session Type: Poster/Software Demonstration – Project Report

Penna is a web application for collecting self-administered writings for research purposes. It was built at the Finnish Social Science Data Archive (FSD) and has been in production since fall 2017. FSD offers Penna as a service for researchers to collect metadata and writings easily, and FSD gets to archive the collected data. FSD and the researcher together construct a questionnaire, a web form that is shared with potential respondents. The respondents fill in the form and submit it. Once the collection period is over, FSD can export the questionnaire and the writings as a package that can be delivered to both FSD and the researcher.

The exported questionnaires can later be imported to create new questionnaires. Additionally, a DDI Codebook of the questionnaire is included in the export package. The DDI file consists of the questions/variables of the questionnaire and metadata about the collection. The file is compatible with DDI-C versions 2.1 and 2.5. The DDI file can be used as a basis for documenting the data.
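As a rough illustration of what such a file might contain, the following Python sketch builds a minimal DDI-Codebook 2.5 style document with a study title and one variable per question. The function name and the sample question are hypothetical and are not taken from Penna; only the namespace and the element names (codeBook, stdyDscr, dataDscr, var, labl, qstn, qstnLit) come from the DDI-Codebook schema, and the output is not claimed to be complete or schema-valid.

```python
import xml.etree.ElementTree as ET

DDI_NS = "ddi:codebook:2_5"  # DDI-Codebook 2.5 namespace


def build_codebook(title, questions):
    """Return a minimal DDI-Codebook-style XML string.

    `questions` is a list of (variable_name, label, question_text) tuples.
    Hypothetical helper for illustration; not Penna's actual export code.
    """
    # Serialize the DDI namespace as the default namespace (no prefix).
    ET.register_namespace("", DDI_NS)

    def q(tag):
        # Qualify a tag name with the DDI namespace.
        return f"{{{DDI_NS}}}{tag}"

    code_book = ET.Element(q("codeBook"))

    # Study-level description: only the title is filled in here.
    stdy = ET.SubElement(code_book, q("stdyDscr"))
    citation = ET.SubElement(stdy, q("citation"))
    titl_stmt = ET.SubElement(citation, q("titlStmt"))
    ET.SubElement(titl_stmt, q("titl")).text = title

    # Variable-level description: one <var> per question.
    data_dscr = ET.SubElement(code_book, q("dataDscr"))
    for name, label, text in questions:
        var = ET.SubElement(data_dscr, q("var"), {"name": name})
        ET.SubElement(var, q("labl")).text = label
        qstn = ET.SubElement(var, q("qstn"))
        ET.SubElement(qstn, q("qstnLit")).text = text

    return ET.tostring(code_book, encoding="unicode")


# Hypothetical sample data.
xml_text = build_codebook(
    "Sample writing collection",
    [("q1", "Year of birth", "In what year were you born?")],
)
```

A real export would also carry the collection metadata mentioned above (for example producer and collection dates) in the study description.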

This poster will present Penna’s features and technical details from the perspective of different types of users. We will also demonstrate the use of Penna.

Toward the construction of a data catalog using DDI-C in Japan

Makoto Asaoka (National Institute of Informatics), Yukio Maeda (Japan Society for the Promotion of Science / University of Tokyo) (to schedule)
Track: User Needs, Efficient Infrastructures and Improved Quality – Session Type: Poster/Software Demonstration – Project Report

In this poster, we introduce our activities toward constructing a data catalog for the humanities and social sciences (HSS) in Japan.

The Japan Society for the Promotion of Science (JSPS) started a data infrastructure project for the HSS in 2018. One of the aims of this project is to build an infrastructure for sharing Japanese HSS research data among researchers across research fields and countries.

JSPS and the National Institute of Informatics (NII) are trying to construct a data catalog that can retrieve HSS data held by several research repositories according to researchers' purposes. We collaborated with four research institutions that hold distinctive HSS data and surveyed the metadata used by each institution.

As a result, we adopted DDI-Codebook as the common metadata format for our data catalog, for reasons of machine readability and interoperability. Currently, we are defining the mandatory elements and controlled vocabulary for this metadata in consultation with each organization.

We plan to collect the metadata of each institution via OAI-PMH using JAIRO Cloud, the institutional repository service provided by NII, and to provide these metadata to international institutions in the future.
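As an illustration of what harvesting via OAI-PMH involves, the following minimal sketch builds a `ListRecords` request URL. The base URL and the metadata prefix `oai_ddi25` are assumptions made for the example only; each repository advertises its own supported prefixes through the `ListMetadataFormats` verb, and nothing here reflects the actual JAIRO Cloud configuration.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # optional selective harvesting by set
    if from_date:
        params["from"] = from_date    # optional lower bound on the datestamp
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and prefix, for illustration only.
url = list_records_url(
    "https://repository.example.org/oai",
    "oai_ddi25",
    from_date="2019-01-01",
)
print(url)
```

A harvester would issue this request, read the returned records, and follow any `resumptionToken` until the list is complete.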

When RDF is not enough – the gremlins in DDI4

Darren Stephen Bell (UK Data Archive) (to schedule)
Track: Open Data and Linked Open Data – Session Type: Regular Presentation – Scientific Method

The UK Data Archive has experimented with two serialisations of DDI4 for high-volume data, first with a “traditional” RDF approach and latterly with a directed property graph.

This presentation outlines some of the issues associated with RDF and SPARQL when working with data at scale and contrasts them with property graphs and the Gremlin query language, which can offer a more performant solution but also come with some trade-offs. The possible consequences for future bindings of DDI are also described, including options for a property graph schema.

QVDB: A Metadata Portal for the ESS

Benjamin Beuster (NSD – Norwegian Centre for Research Data) (to schedule)
Track: Software / Tools – Session Type: Poster/Software Demonstration – Project Report

The Question Variable Database (QVDB), with the Colectica platform as its technical backbone, can be described as a system for the storage and retrieval of metadata components. The overall aim of the QVDB is to serve the European Social Survey (ESS) in the work of specifying, documenting, versioning and disseminating survey data.

At this poster, staff from the ESS archive will be available to demonstrate the system and to discuss how the metadata can best be used in the production of survey data and documentation.

Crosswalk 4.1

Lynda Kellam (Cornell Institute for Social and Economic Research (CISER)) (to schedule)
Track: Software / Tools – Session Type: Poster/Software Demonstration – Project Report

CISER has been using Crosswalk 3.0 to create setup files and datasets in various formats for our legacy collection in ASCII format.

Using a simple Excel spreadsheet that lists each variable's information and its location in the ASCII file, we can create outputs in SAS, SPSS, STATA, R, DDI 2.5 Codebook, and DDI 3.2 Study Unit quickly and efficiently.

For EDDI 2019, we have created Crosswalk 4.1. In addition to the original features, Crosswalk 4.1 will create a DDI 3.2 Resource Package that incorporates variable-level metadata and statistics. It will also create a SQL query that can be used to read the ASCII data and transform it into a database in Postgres.
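To make the idea of generating SQL from a variable layout concrete, here is a minimal sketch. It is not Crosswalk's actual output: the function name, the layout tuples, and the staging table `raw_lines` (one text column named `line`, holding one fixed-width record per row) are all assumptions made for illustration. Identifiers are not quoted or validated, so the layout must come from a trusted source.

```python
def fixed_width_sql(table, layout, staging="raw_lines"):
    """Generate a Postgres statement that slices fixed-width lines into columns.

    layout: list of (column_name, start, width, sql_type), with start 1-based.
    """
    columns = []
    for name, start, width, sql_type in layout:
        # Blank fields become NULL before the cast to the target type.
        piece = f"NULLIF(trim(substring(line from {start} for {width})), '')"
        columns.append(f"    {piece}::{sql_type} AS {name}")
    select_list = ",\n".join(columns)
    return (
        f"CREATE TABLE {table} AS\n"
        f"SELECT\n{select_list}\n"
        f"FROM {staging};"
    )


# Hypothetical layout, as it might be read from the spreadsheet.
layout = [
    ("caseid", 1, 5, "integer"),
    ("age", 6, 3, "integer"),
    ("region", 9, 2, "text"),
]
sql = fixed_width_sql("survey", layout)
print(sql)
```

Loading the ASCII file into the staging table first keeps the column positions in one place, so changing the layout only means regenerating the statement.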

In this poster we will show examples of the original process and demonstrate Crosswalk 4.1’s capabilities.

DDI Implementation Projects at SSJDA: Core Institution for Constructing Data Infrastructure for the Humanities and Social Sciences.

Takenori Konaka (Center for Social Research and Data Archives Institute of Social Science, The University of Tokyo) (to schedule)
Track: Other – Session Type: Poster/Software Demonstration – Project Report

Social Science Japan Data Archive (SSJDA) has been developing DDI implementation projects. We present ongoing projects and a future task at SSJDA. Currently, users can access more than 1300 datasets and their metadata, but these are not based on the DDI format. In 2018, we were authorized by the Japan Society for the Promotion of Science (JSPS) as a Core Institution for Constructing Data Infrastructure for the Humanities and Social Sciences.

We plan to produce metadata in the DDI format (DDI 2.5), but we face some problems in implementing DDI. For example, the level of granularity of SSJDA metadata is lower than that of the DDI elements. At the same time, we would like to integrate metadata in the DDI format into our existing metadata browsing and data download system and improve the data archiving workflow.

We would like to discuss these projects and future issues with the EDDI19 participants.

The current situation of DDIR: R package to utilize DDI as personal tools for social research data analysis

Yasuto NAKANO (Kwansei Gakuin University) (to schedule)
Track: Software / Tools – Session Type: Poster/Software Demonstration – Project Report

‘DDIR’ is an R package that handles information in DDI-format metadata files in the R environment. It is one of several projects with the same purpose and has been under development since 2014.

It contains import and export functions and related tools. The import function generates a tibble (data frame) from a DDI metadata file and a raw CSV file, together with related information (e.g. missing values, value labels, variable labels, question texts). Variables with value labels can be imported as factor-type variables. The export function generates an XML metadata file in DDI format with the necessary information (e.g. document description, study description, file description, data description). With this environment, we can use DDI as a personal tool in social research data analysis projects.
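The import step can be sketched outside of R as well. The following Python example is not DDIR's code or API; it only illustrates the general technique of reading variable labels and value labels from a DDI-Codebook 2.5 fragment and applying them to raw CSV values. The sample XML, the CSV text, and the function names are invented for the illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {"ddi": "ddi:codebook:2_5"}  # DDI-Codebook 2.5 namespace

# Invented sample metadata and data, for illustration only.
DDI_XML = """<codeBook xmlns="ddi:codebook:2_5">
  <dataDscr>
    <var name="sex">
      <labl>Sex of respondent</labl>
      <catgry><catValu>1</catValu><labl>Male</labl></catgry>
      <catgry><catValu>2</catValu><labl>Female</labl></catgry>
    </var>
    <var name="age">
      <labl>Age in years</labl>
    </var>
  </dataDscr>
</codeBook>"""

CSV_TEXT = "sex,age\n1,34\n2,51\n"


def read_variables(xml_text):
    """Return {variable name: {"label": ..., "values": {code: label}}}."""
    root = ET.fromstring(xml_text)
    variables = {}
    for var in root.findall(".//ddi:var", NS):
        label = var.findtext("ddi:labl", default="", namespaces=NS)
        values = {
            cat.findtext("ddi:catValu", namespaces=NS):
                cat.findtext("ddi:labl", namespaces=NS)
            for cat in var.findall("ddi:catgry", NS)
        }
        variables[var.get("name")] = {"label": label, "values": values}
    return variables


def apply_value_labels(csv_text, variables):
    """Replace coded values with their labels where a label is defined."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            name: variables.get(name, {}).get("values", {}).get(value, value)
            for name, value in row.items()
        })
    return rows


variables = read_variables(DDI_XML)
rows = apply_value_labels(CSV_TEXT, variables)
print(variables["sex"]["label"])
print(rows)
```

Variables without categories, such as `age` here, pass through unchanged, which roughly mirrors the distinction between labelled (factor-like) and plain variables.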

In this version of DDIR, each function has been recoded internally in tidyverse style, which improves stability, speed, and memory consumption.

The Questionnaire Design and Documentation Tool (QDDT) – a DDI based tool for assisting questionnaire design teams in their work

Stig Norland (NSD – Norwegian Centre for Research Data), Luca Salini (ESS HQ, City University of London), Hilde Orten (NSD – Norwegian Centre for Research Data) (to schedule)
Track: Software / Tools – Session Type: Poster/Software Demonstration – Project Report

Successful large-scale international survey projects often put lots of effort into the questionnaire design and development process.

The Questionnaire Design and Documentation Tool (QDDT) is a web-based tool developed to assist questionnaire development teams in their work. Focusing on the process of developing concepts and questions for topical questionnaire modules, the tool allows for the reuse of items over time, keeping track of their development history, as well as publishing content at various milestones. The questionnaire design process of the European Social Survey is the primary use case for the tool.

The conceptual model of the QDDT is based on the metadata standard DDI3.2, which facilitates reuse of items and interoperability with other tools.

This poster provides a live demonstration of the tool as of December 2019.

Tutorials

(in alphabetical order by the last name of the first author)

Let’s take a look at the DDI-documented paradata collected by the information system of a longitudinal panel!

Geneviève Michaud (Center for Socio-Political Data, Sciences Po Paris), Baptiste Rouxel (Center for Socio-Political Data, Sciences Po Paris) (to schedule)
Track: Software / Tools – Session Type: Tutorials or Workshop

The Center for Socio-Political Data (CDSP) created the ELIPSS web panel in 2012. Its probability-based sample and scientific purpose made the project the first of its kind in France. Its mobile web dimension made it the first in Europe.

The CDSP’s IT Team conceived and developed several survey panel tools for ELIPSS. For example, to support multi-channel interactions between the panel members and the panel managers, a set of features managing phone calls, electronic and traditional mail, text messages, and in-app notification messages was developed and integrated into the ELIPSS application suite over time. These interactions were captured in a massive set of raw data.

We’ll begin the workshop with a brief tour of the ELIPSS in-house application system. One of the challenges we dealt with was high staff turnover. We’ll share feedback on the best practices we implemented to make the processes more fluid and to help the CDSP teams capture and release the data, DDI metadata and paradata of the project in due time. A DDI metadata creation exercise, describing a paradata file, will end the first block of this session.

The workshop will continue with a paradata visualization experiment. As previously mentioned, the ELIPSS application suite recorded the online and offline interactions between the panel managers and the panelists. Thus, a massive set of raw data was collected. We’ll provide further insight into the curation and DDI-C metadata creation process we went through to arrive at a ready-to-use paradata set. We’ll then invite participants to join a collaborative Jupyter Notebook to analyze and visualize it.

No previous knowledge of the DDI standard is required to participate. This workshop is based on an experience of more than a decade of producing data and metadata collection tools and repositories at the CDSP.

Perspective

IT project and paradata management perspective.

Intended audience

No previous knowledge of the DDI standard is required to participate. Some basic coding skills would be appreciated, but are not mandatory, for the second block of the session.

Prerequisites

A laptop with Internet access is advised to take full advantage of the hands-on session, during which we’ll open a collaborative Jupyter notebook.

This workshop will be machine-learning and AI free!

What can DDI do for you? An introduction to the DDI

Hilde Orten (NSD – Norwegian Centre for Research Data) (to schedule)
Track: Other – Session Type: Tutorials or Workshop

Are you interested in learning what DDI can do for your organization or institution? DDI is an international standard for describing data from the social, economic and behavioral sciences, and it is currently moving into new fields. The standard contains metadata items that can be used to develop and document data at different stages of the data lifecycle, from the first conceptualization through data collection, processing, dissemination and archiving.

This course provides an overview of the work products of the DDI Alliance. The conceptual basis of DDI will be described, introducing the participants to the main building blocks and items of the standard. Practical examples of how DDI can be used beneficially in the business processes of organizations and institutions that manage research data will also be shown.

The overall approach of the course is DDI version agnostic. The examples shown will, however, be based on specific DDI versions (DDI-Codebook, DDI-Lifecycle or the DDI4 Core under development).

The main focus will be on the following areas:

  • Data description and variable management
  • Questionnaire design and implementation
  • Question and variable banking
  • Controlled Vocabularies
  • Making your data and metadata FAIR (Findable, Accessible, Interoperable and Reusable) using DDI

Beyond documenting a codebook…using DDI to support the management and processing of metadata

Wendy Thomas (Minnesota Population Center, University of Minnesota) (to schedule)
Track: Reusing and Sharing Metadata – Session Type: Tutorials or Workshop

Much of the focus of past workshops and documentation has been on the use of DDI to document a study via the creation of a codebook or on the production of questionnaires. However, most of the questions posed by users ask about the use of DDI to

  • document secondary sources such as administrative data or analysis results
  • link studies/data sets intellectually within an archive or library to facilitate discovery
  • use geographic metadata to link across data sets and document geography down to the data item
  • record intended or actual processing activities
  • capture provenance from the level of the study down through the datum

This workshop will focus on these aspects of using DDI to support the management and processing of metadata. We will look primarily at DDI Lifecycle (including some exciting new features of DDI-L 3.3) as well as explore the extent to which these approaches can be applied to a collection of DDI Codebook instances. Attendees will receive a packet of recommended best practices and examples. Examples provided by attendees will be reviewed as time permits.

Generating DDI 4 in the Language of your Choice

Joachim Wackerow (GESIS – Leibniz Institute for the Social Sciences) (to schedule)
Track: Software / Tools – Session Type: Tutorials or Workshop

Model Driven Architecture for Metadata Standards in Practice

Half-day tutorial, target group: software developers

This tutorial provides the knowledge and skills needed to generate a syntax representation of the DDI 4 UML model in any language. The focus is on the DDI 4 Core model, which is in many important respects independent of the social science domain and therefore suitable for cross-domain purposes (the public review of DDI 4 Core is planned for the beginning of 2020).

The first part of the tutorial gives an overview of the model-driven approach, the model itself, the UML modeling techniques used, and the generation of representations in XML Schema (for XML instances) and in OWL (for RDF instances). It describes how the UML model can be used as a basis to generate a syntax representation in a chosen language.
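The general idea of generating a syntax representation from a model can be shown with a toy example. The sketch below is not the DDI 4 toolchain, which the tutorial bases on Eclipse modeling tools; it assumes a tiny hand-written class model (a plain dictionary standing in for UML classes with attribute names, types and cardinalities) and derives XML Schema complex types from it. All class and attribute names are invented.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Toy model: class name -> list of (attribute, XSD type, lower, upper).
# An upper bound of None stands for an unbounded cardinality.
MODEL = {
    "Concept": [("name", "string", 1, 1), ("definition", "string", 0, 1)],
    "Variable": [("name", "string", 1, 1), ("label", "string", 0, None)],
}


def model_to_xsd(model):
    """Render each model class as an xs:complexType with a sequence of elements."""
    ET.register_namespace("xs", XS)
    schema = ET.Element(f"{{{XS}}}schema")
    for class_name, attributes in model.items():
        complex_type = ET.SubElement(schema, f"{{{XS}}}complexType", name=class_name)
        sequence = ET.SubElement(complex_type, f"{{{XS}}}sequence")
        for attr_name, xsd_type, lower, upper in attributes:
            ET.SubElement(
                sequence,
                f"{{{XS}}}element",
                name=attr_name,
                type=f"xs:{xsd_type}",
                minOccurs=str(lower),
                maxOccurs="unbounded" if upper is None else str(upper),
            )
    return ET.tostring(schema, encoding="unicode")


xsd = model_to_xsd(MODEL)
print(xsd)
```

Because the model is the single source, a second generator for another target (for example an OWL vocabulary) could walk the same dictionary without touching the model itself.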

The second part provides time for exercises. Eclipse modeling tools are used for this purpose. Participants are encouraged to choose a language in which they wish to use DDI 4. You should bring your own laptop and be ready to install the required software (a detailed list will be provided) prior to the tutorial.

Agenda

  • Part I
    1. Overview
    2. Model-driven approach
    3. DDI 4 Core model – UML subset and model structure
    4. Generation of XML Schema for XML instances
    5. Generation of OWL Vocabulary for RDF instances
    6. Approach for the generation of any syntax representations
  • Part II
    1. Eclipse environment
    2. Exercises

Side Meetings

(in alphabetical order by the last name of the first author)

Dagstuhl Train the Trainers workshop follow-up meeting

Hayley Mills (CLOSER, UCL, Institute of Education), Jon Johnson (CLOSER, UCL, Institute of Education) (to schedule)
Track: Other – Session Type: Side Meeting

The goal of the Train the Trainers workshop held in 2018 was to build up more training capacity on DDI, enabling the new trainers to teach specific tutorials for their institution or general purpose tutorials for a broader audience.

The Train the Trainers working group was set up to build on the materials produced at the workshop and make these available for others to use.

This side meeting provides the opportunity for those who attended the Dagstuhl Train the Trainers workshop 2018 to discuss and give feedback on their experiences of delivering the training materials, and will allow us to discuss the next steps.

CESSDA Technical Infrastructure workshop

John William Shepherdson (CESSDA ERIC) (to schedule)
Track: Software / Tools – Session Type: Side Meeting

Type: Closed meeting – invitation only

Morning session 4 hours (09:00 – 13:00)

  • deploying an application on the CESSDA Technical Infrastructure (from commit, via build and test in development and staging to deployment in the production environment)

Afternoon session 3.5 hours (13:30 – 17:00)

  • using the CESSDA public APIs;
  • distributed logging;
  • service to service communication.