USE CASE 3 - Ocean
Observing the ocean is challenging: missions at sea are costly, different scales of processes interact, and the conditions are constantly changing. This is why scientists say that “a measurement not made today is lost forever”. For these reasons, it is fundamental to properly store both the data and metadata, so that access to them can be guaranteed for the widest community, in line with the FAIR principles: Findable, Accessible, Interoperable and Reusable.
The “Ocean” use case aims to achieve three main goals:
- The improvement of long-term stewardship of marine in situ data. The SEANOE service allows users to upload, archive and publish their data, to which a permanent identifier (DOI) is assigned so the dataset can be cited and referenced. Efforts will focus on scalability, the exchanges between data centres in charge of related data types and the protection of long-term archives. The long-tail data (measurements acquired irregularly, e.g. during a scientific cruise or manual work) are of particular interest.
- The improvement of data storage for services to users. The goal is to provide users with (1) fast and interoperable access to data from multiple sources, for visualisation and subsetting purposes; (2) parallel processing capabilities on dedicated high-performance computing resources, using, for example, Jupyter notebooks or the PANGEO software ecosystem.
- Marine data processing workflows for on-demand processing. The objective is that users can access data, software tools and computing resources in a seamless way to create added-value products, for example quality-controlled, merged datasets or gridded fields.
The work towards these objectives is led by IFREMER, together with Europe’s leading research groups in ocean studies, such as the Université de Liège, MARIS, CNRS, CSC and the Finnish Environment Institute, with the coordination of CINES, the leading HPC centre in France.
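To make the idea of a "gridded field" in the third goal more concrete, here is a minimal sketch that averages scattered observations into regular longitude/latitude cells. It is only a simple bin average, not the method used in the project, and it makes no attempt to fill empty cells. The function name `grid_mean`, the cell size and the sample values are hypothetical.

```python
from collections import defaultdict
from math import floor


def grid_mean(observations, cell_size_deg):
    """Average scattered observations onto a regular lon/lat grid.

    observations: iterable of (lon, lat, value) tuples.
    Returns {(lon_index, lat_index): mean_value} for non-empty cells only.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lon, lat, value in observations:
        # Index of the grid cell that contains this observation.
        cell = (floor(lon / cell_size_deg), floor(lat / cell_size_deg))
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Hypothetical salinity observations: (longitude, latitude, salinity).
sample = [
    (5.1, 43.2, 38.0),
    (5.4, 43.7, 38.5),
    (6.3, 43.1, 37.5),
]
gridded = grid_mean(sample, cell_size_deg=1.0)
```

With a 1-degree cell size, the first two observations fall in the same cell and are averaged, while the third one is alone in its cell. A real product would also need quality control and a way to handle cells without data, which is what dedicated interpolation tools provide.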
To fulfil the scientific goals of the use case, the work plan focuses mostly on technical developments and the implementation of tools. In particular, tools have to be developed or improved for the long-term archiving of both data and metadata, and for the storage and archiving of large salinity datasets from in situ measurements (SeaDataCloud) and from satellite (SMOS mission). The team has listed the following topics as part of its work plan for the forthcoming months, each with a proposed solution.
- Service scalability: Due to technical limitations, the maximum size allowed for a data upload is currently 0.2 TB.
- Solution: use other protocols, possibly asynchronous ones such as Virtual File Systems, and share the allocation of the necessary storage resources across different infrastructures.
- Back-office exchanges: In order to make long-tail data available in data collections, many exchanges (performed manually at present) between the involved Data Centres are necessary.
- Solution: implementing iRODS (Integrated Rule-Oriented Data System) data flows to automate these exchanges and make them more efficient.
- Securing long-term archive: Data Centre infrastructures for data archiving are not always suitable for long-term archiving and dedicated staff are not always available.
- Solution: rely on professional long-term repositories instead and replicate datasets across geographically distributed repositories. This could be achieved by using, for instance, iRODS data flows.
- Fast access to datasets: In situ datasets are made available across a wide range of systems, making the assembly of multidisciplinary datasets more difficult for users.
- Solution: use a working copy of the data, called a technical cache or Data Lake, with a suitable structure in order to speed up and facilitate data processing. The Data Lake will be periodically synchronised with the Data Centres and data publication services.
- On-demand processing: Using specialised tools requires the installation of software and the availability of computing resources. The former can be time-consuming for users.
- Solution: deployment of the DIVAnd interpolation software tool (Deliverable 6.3.1) in a virtual machine in order to provide a significant improvement on what researchers or data experts typically have access to from their office.
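The periodic synchronisation of the Data Lake with the Data Centres could, for example, be driven by comparing file manifests. The sketch below is an assumption about how such a step might look, not a description of the project's implementation: it compares two hypothetical `{path: checksum}` manifests and returns which files the cache should fetch again and which it should drop. The function name `plan_sync`, the paths and the checksums are all illustrative.

```python
def plan_sync(source_manifest, cache_manifest):
    """Plan a cache refresh from two {path: checksum} manifests.

    Returns (to_fetch, to_delete):
      to_fetch  - paths that are new or whose checksum changed at the source
      to_delete - paths present in the cache but no longer at the source
    """
    to_fetch = sorted(
        path
        for path, checksum in source_manifest.items()
        if cache_manifest.get(path) != checksum
    )
    to_delete = sorted(
        path for path in cache_manifest if path not in source_manifest
    )
    return to_fetch, to_delete


# Hypothetical manifests: what a data centre publishes vs. what the cache holds.
source = {
    "ctd/cruise_a.nc": "a1",
    "ctd/cruise_b.nc": "b2",
    "argo/float_1.nc": "c3",
}
cache = {
    "ctd/cruise_a.nc": "a1",
    "ctd/cruise_b.nc": "old",
    "glider/g1.nc": "d4",
}
to_fetch, to_delete = plan_sync(source, cache)
```

Here the cache would fetch the new `argo/float_1.nc` and the updated `ctd/cruise_b.nc`, and remove `glider/g1.nc`. Comparing checksums rather than timestamps keeps the plan independent of clock differences between infrastructures, at the cost of having to compute the checksums on both sides.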