USE CASE 3 - Ocean


Observing the ocean is challenging: missions at sea are costly, different scales of processes interact, and the conditions are constantly changing. This is why scientists say that “a measurement not made today is lost forever”. For these reasons, it is fundamental to properly store both the data and metadata, so that access to them can be guaranteed for the widest community, in line with the FAIR principles: Findable, Accessible, Interoperable and Reusable.
 

GOALS

The “Ocean” use case has three main goals:

  1. The improvement of long-term stewardship of marine in situ data. The SEANOE service allows users to upload, archive and publish their data, and assigns each dataset a permanent identifier (DOI) so that it can be cited and referenced. Efforts will focus on scalability, on exchanges between the data centres in charge of related data types, and on the protection of long-term archives. Long-tail data (measurements acquired more irregularly, e.g. during a scientific cruise or manual work) are of particular interest.
  2. The improvement of data storage for services to users. The goal is to provide users with (1) fast and interoperable access to data from multiple sources, for visualisation and subsetting purposes; (2) parallel processing capabilities on dedicated high-performance computing resources, using, for example, Jupyter notebooks or the PANGEO software ecosystem.
  3. Marine data processing workflows for on-demand processing. The objective is for users to access data, software tools and computing resources seamlessly in order to create added-value products, for example quality-controlled, merged datasets or gridded fields.

The work towards these objectives is led by IFREMER, together with Europe’s leading research groups in ocean studies, such as the Université de Liège, MARIS, CNRS, CSC and the Finnish Environment Institute, with the coordination of CINES, the leading HPC centre in France.
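
As an illustration of the "gridded fields" mentioned in goal 3, the snippet below is a minimal sketch of how scattered in situ observations might be turned into a gridded product. It uses plain bin averaging on a regular longitude/latitude grid, which is a deliberately simple stand-in and not the method used by the project's tools. The function name `grid_mean`, the `cell_deg` parameter and the sample observations are all hypothetical.

```python
from collections import defaultdict
from math import floor


def grid_mean(observations, cell_deg=1.0):
    """Average (lon, lat, value) observations into regular lon/lat cells.

    Returns a dict mapping the (lon, lat) of each cell's south-west corner
    to the mean of the values that fall inside that cell.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lon, lat, value in observations:
        # Integer index of the cell that contains this observation.
        key = (floor(lon / cell_deg), floor(lat / cell_deg))
        sums[key] += value
        counts[key] += 1
    return {
        (i * cell_deg, j * cell_deg): sums[(i, j)] / counts[(i, j)]
        for (i, j) in sums
    }


# Hypothetical salinity observations: (longitude, latitude, salinity).
obs = [
    (5.2, 43.1, 38.0),
    (5.7, 43.4, 38.4),
    (6.3, 43.9, 37.9),
]
field = grid_mean(obs, cell_deg=1.0)
```

Here the first two observations fall in the same one-degree cell and are averaged, while the third ends up in a cell of its own. A real product would also need quality control and a proper interpolation scheme to fill cells that contain no observations.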

 

FUTURE PLANS

In order to fulfil the scientific goals of the use case, the work plan focuses mostly on technical developments and the implementation of tools. In particular, tools for the long-term archiving of both data and metadata, and for the storage and archiving of large salinity datasets from in situ measurements (SeaDataCloud) and from satellite (SMOS mission), have to be developed or improved. The team has listed the topics in its work plan for the forthcoming months, each with a proposed solution.

  • Service scalability: Due to technical limitations, the maximum allowed size for data upload is presently 0.2 Tbytes.
  • Solution: use other protocols, which can be asynchronous (for example, virtual file systems), and share the allocation of the necessary storage resources across different infrastructures.
  • Back-office exchanges: In order to make long-tail data available in data collections, many exchanges (performed manually at present) between the involved Data Centres are necessary.
  • Solution: implementing iRODS (Integrated Rule-Oriented Data System) data flows to automate these exchanges and make them more efficient.
  • Securing long-term archive: Data Centre infrastructures for data archiving are not always suitable for long-term archiving and dedicated staff are not always available.
  • Solution: rely on professional long-term repositories instead and distribute dataset storage across several geographically separate repositories. This could be achieved by using, for instance, iRODS data flows.
  • Fast access to datasets: In situ datasets are made available among a wide range of systems, making the assembly of multidisciplinary datasets more difficult for users.
  • Solution: use a working copy of the data, called a technical cache or Data Lake, with a suitable structure in order to speed up and facilitate data processing. The Data Lake will be periodically synchronised with the Data Centres and data publication services.
  • On-demand processing: Using specialised tools requires the installation of software and the availability of computing resources. The former can be time-consuming for users.
  • Solution: deploy the DIVAnd interpolation software tool (Deliverable 6.3.1) on a virtual machine, giving researchers and data experts significantly more computing capability than they typically have access to from their office.
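
The periodic synchronisation of the Data Lake with the Data Centres, listed among the solutions above, could be sketched as follows. This is only an illustration under simple assumptions: both sides are ordinary directories, and a file is copied again whenever its SHA-256 checksum differs from the cached copy. The function names `sha256_of` and `synchronise` and the directory layout are hypothetical, and the actual project may rely on other mechanisms such as iRODS data flows.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def synchronise(source_dir, cache_dir):
    """Copy new or changed files from source_dir into cache_dir.

    Returns the relative paths of the files that were copied, in sorted order.
    """
    source_dir, cache_dir = Path(source_dir), Path(cache_dir)
    copied = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = cache_dir / rel
        # Skip files whose cached copy already has identical content.
        if dst.exists() and sha256_of(dst) == sha256_of(src):
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied
```

Calling `synchronise("datacentre", "datalake")` on a schedule would keep the working copy up to date while transferring only the files that changed. This sketch never deletes cached files that have disappeared from the source; whether the cache should mirror deletions is a design decision left open here.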


