5 Reproducibility and data management
This assessment adheres to the FAIR (Findable, Accessible, Interoperable, Reusable) principles, which aim to ensure that data are discoverable, accessible, interoperable, and reusable. To achieve this, we used programming tools, including the R programming language and GitHub.
Programming languages such as R offer several advantages over tools such as ArcGIS. They are highly flexible: whole projects can be modified much more efficiently, and new elements can be integrated into a project without manually recomputing a series of complex analytical steps. This flexibility is not limited to analysis, since all steps of a project, from the integration of raw data to the production of this report, are integrated and readily modifiable with the appropriate computational skills. This facilitates the integration of comments or new recommendations arising from engagement processes.
We also used GitHub, a version control platform, for documentation, quality control, and the full history of programming modifications relevant to the entire project. We created a public repository entitled nceadfo, which serves as the research compendium of the assessment, i.e. the collection of all parts of the research project, including text, figures, data, and code, that ensures the reproducibility of the assessment. A detailed description of the structure of the assessment’s research compendium is available in Appendix 3 and on the nceadfo GitHub repository webpage.
Although data are not included directly in the nceadfo research compendium, the compendium contains all the resources necessary to access, transform, and prepare the data to perform the assessment. This is done through the pipedat R package, an experimental tool that is still in the early stages of development. The pipedat package provides analytical pipelines to access, load, and format a variety of data from multiple sources programmatically. Its intent is to facilitate building the integrated datasets necessary to perform ecosystem-level assessments such as cumulative effects assessments and marine spatial planning. For now, the package only contains the pipelines needed to access the raw data that are used or were considered for this assessment, and in its current form it can only be used for this particular assessment. Future iterations should provide many more functionalities. While data are not included directly in nceadfo, all the metadata and bibliographic files related to the data used are available, so that a user may know which data are used.
All analytical pipelines used to access the datasets for this assessment are identified by an 8-digit unique identifier, randomly generated by the pipedat package, of the form name_of_data-########. These unique identifiers are available in Appendix 2 along with a list of all datasets used for this assessment. All data and metadata included in the assessment’s research compendium use these unique identifiers to reference the databases used. In total, 30 databases were used for the cumulative effects assessment and are accessible through the research compendium. See Appendix 2 for more details on these databases and Appendix 3 for a list of organizations and experts for each dataset.
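To make the identifier format concrete, the following base R sketch shows one way an identifier of the form name_of_data-######## could be generated, validated, and split into its parts. It is an illustration only and does not reproduce the pipedat implementation: the function names make_dataset_id, is_dataset_id, and split_dataset_id are hypothetical, and the sketch assumes that the suffix consists of 8 decimal digits and that the name part contains only letters, digits, and underscores.

```r
# Hypothetical helpers for illustration; these are NOT pipedat functions.

# Build an identifier of the form name_of_data-######## using random digits.
make_dataset_id <- function(name, n_digits = 8) {
  digits <- sample(0:9, size = n_digits, replace = TRUE)
  paste0(name, "-", paste(digits, collapse = ""))
}

# Check that a string matches the assumed name_of_data-######## pattern.
is_dataset_id <- function(id, n_digits = 8) {
  pattern <- paste0("^[A-Za-z0-9_]+-[0-9]{", n_digits, "}$")
  grepl(pattern, id)
}

# Split a valid identifier into its dataset name and its numeric suffix.
split_dataset_id <- function(id) {
  stopifnot(is_dataset_id(id))
  list(
    name = sub("-[0-9]+$", "", id),  # drop the trailing "-########"
    uid  = sub("^.*-", "", id)       # keep only what follows the hyphen
  )
}

set.seed(1)  # fixed seed so the example is repeatable
id <- make_dataset_id("name_of_data")
print(id)
print(is_dataset_id(id))
print(split_dataset_id(id))
```

Because the suffix is drawn at random, uniqueness is not guaranteed by construction; a production tool would presumably check a new identifier against those already assigned, which this sketch does not attempt.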