ushering in the future of past climates


Just because the paleogeosciences study the past doesn’t mean they need to stay stuck in it. Until recently, most people used Excel as their main tool to store, share, and analyze paleoclimate data. While spreadsheets have their place in the paleo workflow, they severely limit how well humans and machines can interpret the underlying data, and they allow only the most rudimentary data analysis. In the third millennium CE, we should be able to do quite a bit better than spend 80% of our time wrangling data. Much of my recent and present work addresses these challenges by leveraging information technology.

Data Standards

The biggest hurdle to more integrative paleoclimatology is the lack of a well-accepted data standard. That involves three things:

  1. A common structure to hold the data & metadata

  2. A common language to label the contents of that structure

  3. A common set of practices specifying which data and metadata to report to enable reproducible research

To develop a common structure, Nick McKay & J.E.G introduced a data format for paleo data, called LiPD (pronounced “lipid”). LiPD provides a flexible structure that contains and describes any paleoclimatic or paleoenvironmental dataset, the metadata that describe the details and complexity of the data (at any level from observations to collections), and the models that accompany the data together with their output, such as age models and their ensemble output. This powers efficient, 21st-century scientific workflows, and enables open science and reproducible research.
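To make the structure described above more concrete, here is a minimal Python sketch of a dataset organized along those lines: site-level metadata, observation tables, and an age model with ensemble output. It is an illustration only. The key names are approximations and may not match the LiPD specification exactly, all values are made-up placeholders, and the helper list_variables is hypothetical rather than part of any LiPD library.

```python
import json

# Illustrative, simplified dataset. Key names are assumptions and may differ
# from the actual LiPD specification; every value is a made-up placeholder.
dataset = {
    "dataSetName": "Example.MarineCore",
    "archiveType": "marine sediment",
    "geo": {"latitude": -5.0, "longitude": 120.0, "elevation": -2100.0},
    # Observations: one table, each column carrying its own metadata.
    "paleoData": [
        {
            "measurementTable": [
                {
                    "columns": [
                        {"variableName": "depth", "units": "cm",
                         "values": [0.5, 1.5, 2.5]},
                        {"variableName": "d18O", "units": "permil",
                         "values": [-1.2, -1.0, -1.4]},
                    ]
                }
            ]
        }
    ],
    # Chronology: an age model and (a tiny stand-in for) its ensemble output.
    "chronData": [
        {
            "model": [
                {
                    "method": "placeholder age model",
                    "ensembleTable": [
                        {"variableName": "ageEnsemble", "units": "yr BP",
                         "values": [[100.0, 110.0], [210.0, 190.0], [305.0, 320.0]]}
                    ],
                }
            ]
        }
    ],
}


def list_variables(ds):
    """Return (variableName, units) for every column in the paleoData tables."""
    found = []
    for entry in ds.get("paleoData", []):
        for table in entry.get("measurementTable", []):
            for column in table.get("columns", []):
                found.append((column["variableName"], column["units"]))
    return found


variables = list_variables(dataset)

# Because everything is plain nested data, the whole record serializes to JSON.
serialized = json.dumps(dataset, indent=2)
```

Keeping units and variable names next to the values they describe is what lets both people and programs interpret a column without consulting a separate README.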

This is why LiPD has been used in multiple data-intensive PAGES working groups, including the 2k temperature project (PAGES 2k Consortium, 2017), and Iso2k. Being able to rely on consistently structured data with rich metadata has greatly reduced the “time to science” for projects relying on the PAGES 2k database, such as a recent global temperature reconstruction intercomparison (Neukom et al., 2019), and the Last Millennium Reanalysis project. We expect similar benefits for a host of other paleo synthesis projects. For more details see LiPD’s website.

Modules of the Linked Earth Ontology and their use in representing paleotemperature observations from a marine sediment core.

To develop a common language, the LinkedEarth project proposed the first paleoclimate ontology, which supports the LinkedEarth platform. It is highly synergistic with NOAA’s PaST Thesaurus. For more information, see LinkedEarth.

To develop a common set of practices, we started a community initiative that culminated in a 135-author paper.

Paleoclimate analytics

A wide range of tools that leverage these emerging standards has been developed. These include the LiPD Utilities, which provide basic functionality for reading, writing, and querying LiPD data in R, Matlab, and Python, and which serve as the base layer for more sophisticated packages, including GeoChronR and Pyleoclim. A rich set of interactive, graphical, web-based tools for creating and modifying LiPD files has also been created. The LiPDverse hosts several thousand LiPD-formatted datasets and provides basic visualization capabilities for each of them. CSciBox, an AI-powered tool for age modeling, uses LiPD as an input and output format. The upcoming PaleoCube project will further enhance these analytical capabilities.
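As a rough illustration of what "querying" consistently structured data can look like, the sketch below filters a small collection of metadata records in plain Python. It does not use the LiPD Utilities API or any of the packages named above. The function filter_datasets, the record layout, and all values are hypothetical and exist only for this example.

```python
# Made-up metadata records standing in for a collection of datasets.
# The field names are assumptions chosen for this sketch.
records = [
    {"dataSetName": "SiteA", "archiveType": "marine sediment",
     "latitude": -5.0, "variables": ["depth", "d18O"]},
    {"dataSetName": "SiteB", "archiveType": "wood",
     "latitude": 45.0, "variables": ["ringWidth"]},
    {"dataSetName": "SiteC", "archiveType": "marine sediment",
     "latitude": 60.0, "variables": ["depth", "MgCa"]},
    {"dataSetName": "SiteD", "archiveType": "coral",
     "latitude": 10.0, "variables": ["d18O"]},
]


def filter_datasets(recs, archive_type=None, lat_range=None, variable=None):
    """Return names of records matching every criterion that is not None."""
    matches = []
    for rec in recs:
        if archive_type is not None and rec["archiveType"] != archive_type:
            continue
        if lat_range is not None:
            south, north = lat_range
            if not (south <= rec["latitude"] <= north):
                continue
        if variable is not None and variable not in rec["variables"]:
            continue
        matches.append(rec["dataSetName"])
    return matches


# Marine sediment records between 30°S and 30°N.
tropical_marine = filter_datasets(
    records, archive_type="marine sediment", lat_range=(-30.0, 30.0)
)

# Any record that reports an oxygen isotope variable.
with_d18o = filter_datasets(records, variable="d18O")
```

A query like this is only a few lines long because every record exposes the same fields; with ad hoc spreadsheets, most of the effort would go into locating and reconciling those fields first.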

An illustration of the Pyleoclim user interface, part of a growing collection of software tools to grease the wheels of exploring, visualizing and analyzing paleoclimate data.

With these tools in hand, the future of past climates looks quite bright. Our lab has a number of opportunities in this area, starting at the undergraduate level. Please contact us if you want to contribute. For more context, please watch this video: