Chris Turner, who has worked on an array of National Science Foundation-funded data science projects in Alaska for many years, recently spoke with West Hub science writer Kim Bruch about his management of environmental systems data, strategies for implementing the FAIR (Findability, Accessibility, Interoperability, and Reusability) data principles, and how the COVID-19 pandemic has affected his work. Chris is a data librarian in Anchorage, Alaska, at Axiom Data Science, an informatics and software development firm that develops cyberinfrastructure for a variety of federal, private, academic, and non-governmental organizations that conduct research and run monitoring systems in the ecological, geological, and ocean sciences.
KB: What type of big data projects have you been working on lately and how has the pandemic impacted your work?
CT: One of my biggest data management projects funded by the National Science Foundation is called NGA LTER, short for Northern Gulf of Alaska (NGA) Long Term Ecological Research (LTER). It is one of 28 sites throughout the U.S. that together form a network with a wide array of collected data. Our LTER site is focused on the ecosystem features and processes that drive production and foster resilience in the Northern Gulf of Alaska – this includes seasonal and decadal variability in the environmental, oceanographic, and biological components of the NGA ecosystem, and the marine ecosystem’s response to those variations. One of my tasks is to work with all the NGA LTER researchers, as well as their technicians and students, to ensure that the site data is as FAIR as possible now, and to plan for how we can make the data even FAIR-er in the future. We are working really hard right now on standardizing incoming data and the associated metadata so that they meet those FAIR principles. This is all aimed at making the data more immediately reusable, and ensuring that it remains understandable and usable in the future. Ultimately, this data will be available through repositories in the DataONE network, where we hope it will remain available and understandable forever. For-ev-er.
KB: Does your work involve other publicly available datasets or mostly sensor-collected data reflecting ocean temperature, acidity, depth, and that sort of information?
CT: Yes, we handle lots of publicly available data, from in situ sensors, satellite imagery, project- and cruise-based sampling, etc. A few years ago, Axiom became the Data Assembly Center (DAC) for the Integrated Ocean Observing System (IOOS) Animal Telemetry Network. As such, it’s our job to create data products and identifiers so that the FAIR principles are followed for all of the complicated and related telemetry data products. We also manage lots of more conventional datasets from field scientists, like field observations of birds and marine mammals or nutrient and plankton abundance from water samples. Working with all of these different data types, it’s challenging and interesting to create data management processes and data systems that responsibly handle the data throughout its lifecycle while incorporating the FAIR principles. Just before the pandemic, I attended a FAIR workshop that was hosted by West Hub and South Hub, and it was helpful to learn from other scientists and librarians about how they have adopted FAIR principles in their data management work.
KB: You mentioned the Advancing FAIR and GO FAIR in the USA Workshop that you attended last year. Participants, including you, gathered from across the nation. What was your biggest takeaway from the workshop?
CT: For starters, that was the last ‘normal’ meeting that I attended prior to the current pandemic situation, so I have vivid memories of the two-day workshop and the interactions that I experienced there at Georgia Tech. I think I have two core takeaways from that workshop. The first, which is really a reinforcement of something I think we’d all learned through our work, was the need to develop relationships with the working scientists or modellers who collect or generate the data that we want to make FAIR. Sometimes it is difficult to build initial trust in communities when you are asking them to consider changing the way they format or manage their data. To adhere (or aspire) to the FAIR principles, it’s also necessary to be able to make the case for FAIR-ness and develop that trust with the program funders, managers, and administrators who ultimately need to approve any new requirements for data and metadata deliverables. The second big takeaway for me was the huge and essential role that semantic data and metadata play in making these resources truly FAIR. Because the infrastructure for and broad community adoption of semantic data and metadata aren’t as widespread as we’d need them to be for widely accessible FAIR data systems, I hadn’t planned much of our data and metadata around semantic triples. The workshop helped clarify for me that in order for data to be truly FAIR, there needs to be a rich semantic component, rich enough for machine understandability. I still think that’s a long way off, but having recognized the need, it’s something that we’ve begun planning for and building into our data and metadata improvements.
KB: Would you be interested in presenting your data management tactics for applying FAIR principles to others in an upcoming workshop?
CT: Probably! I am always eager to share information with (and learn from) others in the science data management community, and I’m thankful to the West Hub, South Hub, and GO FAIR teams for the great workshop last year. Let’s hope we can reconvene sometime in the near future to update one another on our work with FAIR efforts.
About the West Big Data Innovation Hub:
The West Big Data Innovation Hub is one of four regional hubs funded by the National Science Foundation (NSF) to build and strengthen strategic partnerships across industry, academia, nonprofits, and government. The West Hub community aims to catalyze and scale data science for societal needs – connecting research, education, and practice in thematic areas such as natural resources and hazards, metro data science, health, and data-enabled discovery and learning. Coordinated by UC Berkeley’s Division of Computing, Data Science, and Society, the San Diego Supercomputer Center, and the University of Washington, the West Hub region includes contributors and data enthusiasts from Alaska, Arizona, California, Colorado, Hawaii, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington, Wyoming, and a global network of partners.
West Big Data Innovation Hub: westbigdatahub.org
National Science Foundation: www.nsf.gov/
The Big Data Innovation Hubs: bigdatahubs.org