Digital tools addressing the pain points scientists face
C4IR Ocean is proud to announce that two of our colleagues presented at the International Ocean Data Conference. One of them was our Senior Data Scientist, Tara Zeynep Baris.
The International Ocean Data Conference, "The Data We Need for the Ocean We Want", kicked off on the 14th of February. The conference focuses on the global ocean data ecosystem and on how evolving user needs will shape our national, regional and global data systems.
Tara Zeynep Baris presented "Progress and learnings from building a cloud-based architecture for ocean data (Ocean Data Platform)". The Ocean Data Platform (ODP) was founded in 2019 to unlock ocean data and provide easy access to a wide range of users. The ODP will launch for general availability in the summer of 2022.
The ODP is a cloud-based architecture that mirrors ocean data and allows large amounts of data to be transformed and contextualized.
A development team of ten experienced software developers and data scientists has spent two years creating this platform for ocean data sharing. The platform includes built-in analytical and visualization tools, software development kits, and a connection to a geospatial database via APIs. This architecture facilitates the ingestion of data that is currently available only on the servers of different organizations and institutions.
“When building the ODP, we experimented with a multitude of big data tools. Some of these tools enabled us to create extremely fast and efficient data ingest pipelines, but the complexity was quite high, leading to very long development cycles. Instead, we have opted for a framework enabling our data engineers and scientists alike to rapidly create and deploy data pipelines using familiar environments and tools. A system of data integration pipelines also checks external servers for updates to keep ODP data content up to date.”
-Tara Zeynep Baris, Senior Data Scientist
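To illustrate the idea of pipelines that check external servers for updates, here is a minimal Python sketch. It is not the ODP implementation: the function names (`fingerprint`, `sync_datasets`), the dataset name `wod_sample`, and the in-memory "remote" are all hypothetical, and the change detection by content hash is an assumption about one way such a check could work.

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(payload: bytes) -> str:
    """Return a SHA-256 hex digest used to detect changed content."""
    return hashlib.sha256(payload).hexdigest()


def sync_datasets(
    sources: Dict[str, Callable[[], bytes]],
    seen: Dict[str, str],
    ingest: Callable[[str, bytes], None],
) -> List[str]:
    """Re-ingest every dataset whose content changed since the last run.

    `sources` maps a dataset name to a function that downloads it,
    `seen` holds the fingerprint recorded on the previous run, and
    `ingest` loads new content into the mirror. All are hypothetical.
    """
    updated = []
    for name, fetch in sources.items():
        payload = fetch()
        digest = fingerprint(payload)
        if seen.get(name) != digest:  # new dataset or changed content
            ingest(name, payload)
            seen[name] = digest
            updated.append(name)
    return updated


# Stand-in for an external server; a real fetcher would make a network request.
remote = {"wod_sample": b"cast,depth,temp\n1,0,12.5\n"}
mirror: Dict[str, bytes] = {}   # the mirrored copy
state: Dict[str, str] = {}      # fingerprints from earlier runs
sources = {"wod_sample": lambda: remote["wod_sample"]}

first_run = sync_datasets(sources, state, mirror.__setitem__)
second_run = sync_datasets(sources, state, mirror.__setitem__)
remote["wod_sample"] += b"2,10,12.1\n"  # the external server publishes an update
third_run = sync_datasets(sources, state, mirror.__setitem__)
```

In this sketch the first run ingests the dataset, the second run finds nothing new, and the third run picks up the update. A production pipeline would more likely compare server-side metadata, such as modification times, before downloading anything.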
The ODP mirrors a catalog of datasets, including the World Ocean Database (WOD), in Microsoft's cloud platform, Azure. It provides access to the data through the Ocean Data Connector, a tool for exploring and analyzing WOD and Ocean Biodiversity Information System (OBIS) data. The connector is powered by the Jupyter ecosystem and Dask, an open-source library for parallel computing.
The platform supports a variety of hardware configurations, ranging from low-resource environments such as a personal laptop to high-resource environments with the option of adding graphics processing units (GPUs) and other specialized hardware. Users can also create further compute resources in the form of clusters: personalized, on-demand supercomputers supported by Dask. Users will be able to share and collaborate in real time in the same environment, and all results will be reproducible and backed by full data lineage, which makes published results easier to replicate.
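For readers unfamiliar with Dask, the short sketch below shows the general pattern of splitting a calculation into tasks that can run in parallel. It is an illustration only and does not use the Ocean Data Connector: the temperature values are made up, the helper names `chunk_stats` and `combine` are hypothetical, and the example runs on local threads, as it would on a laptop.

```python
from dask import delayed

# Made-up temperature readings (deg C) standing in for chunks of a large dataset.
chunks = [[12.5, 12.1, 11.8], [9.4, 9.0], [4.2, 4.1, 4.0, 3.9]]


@delayed
def chunk_stats(values):
    """Return the sum and count for one chunk; chunks are independent tasks."""
    return sum(values), len(values)


@delayed
def combine(stats):
    """Merge the per-chunk (sum, count) pairs into one overall mean."""
    total = sum(s for s, _ in stats)
    count = sum(n for _, n in stats)
    return total / count


# Nothing is computed yet: this only builds a task graph.
mean_temp = combine([chunk_stats(c) for c in chunks])

# Run the graph on a local thread pool.
result = mean_temp.compute(scheduler="threads")
```

Because the work is described as a task graph, the same code could be sent to a cluster by connecting a `dask.distributed` client and dropping the `scheduler` argument. How clusters are requested on the ODP itself is not covered here.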
“We hope that through the tools we have created, we can address a lot of the pain points that scientists face with access to relevant data so they can concentrate on the downstream value of this data.”
-Tara Zeynep Baris
How is your organization playing a part in ensuring data access? We’d love to hear your strategy for sharing data.