Enabling Cloud-Based Paleoclimatology

The world is moving towards cloud-based workflows for nearly everything, and there is no reason that paleoclimate research should be left behind. To be sure, there are significant hurdles, with most data still living in disparate databases and the output of Earth System Models (ESMs) occupying .

The old paradigm of bringing data to scientists is unsustainable in the age of Big Data; in its place, the emerging paradigm is to bring scientists to the data. PaleoCube, recently funded by the National Science Foundation EarthCube program, will build on existing cyberinfrastructure (LinkedEarth, Pangeo, Project Pythia, Project Jupyter), emerging data standards, and the scientific Python ecosystem to bring cutting-edge capabilities to the fingertips of climate scientists. We will make these tools accessible and interoperable with the scientific Python stack, and build a large library of reproducible scientific workflows that inexperienced users can emulate and modify to serve their own purposes. In the process, PaleoCube will bring together scientists with diverse perspectives, educational backgrounds and levels, and degrees of computer literacy to collaborate on a level playing field. PaleoCube aims to revolutionize interdisciplinary work in the climate sciences, bringing paleoclimatologists closer to the core of climatology and helping close a longstanding divide.

A user’s view of PaleoCube. Users will interact with PaleoCube through a web browser, which gives them access to Pythia’s educational materials and research-grade workflows; they will be able to run and modify these workflows on a dedicated JupyterHub connected to cloud storage of relevant ESM simulations. The project rests on data standards and an extensive stack of cloud-aware open-source software built on them. This stack brings users to data that are in an analysis-ready, cloud-optimized form, so the experience is smooth and the underlying infrastructure is largely invisible to them.

PaleoCube will extend and develop cyberinfrastructure for the climate sciences, enhancing the computer literacy of the geoscientific workforce. In turn, this will pave the way for a deeper embedding of reproducibility into geoscientific research, truly harnessing the data revolution. PaleoCube will ensure that all scientific output generated by the LinkedEarth Community is Findable, Accessible, Interoperable, and Reusable (FAIR).

By providing easy and free access to interactive computing at scale, coupled with didactic examples and hackathons, PaleoCube will broaden the participation of under-represented groups in the geosciences, enrich the STEM pipeline, and provide transferable data science skills to geoscience practitioners and enthusiasts.

PaleoCube is led by Deborah Khider in collaboration with J.E.G. and Nick McKay. For more details, see this blog post. If you are interested in participating, get in touch.