# Packages

Pangeo is not one specific software package. A Pangeo environment is made up of many different open-source software packages. Each of these packages has its own repository, documentation, and development team.

Note

Many people and organizations were involved in the development of the software described on this page. Most of them have nothing to do with Pangeo. By listing software packages on this website, we are in no way claiming credit for their hard work. Full information about the developers responsible for creating these tools can be found by following the links below.

## Pangeo Core Packages

Because of the Technical Architecture of Pangeo environments, the following core packages are considered essential to the project.

### Xarray

Xarray is an open source project and Python package that provides a toolkit for working with labeled multi-dimensional arrays of data. Xarray adopts the Common Data Model for self-describing scientific data in widespread use in the Earth sciences: xarray.Dataset is an in-memory representation of a netCDF file. Xarray provides the basic data structures used by many other Pangeo packages, as well as powerful tools for computation and visualization.
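As a minimal sketch of what "labeled multi-dimensional arrays" means in practice, the example below builds a small `xarray.Dataset` by hand. The variable name `temperature`, the coordinate values, and the units are invented for illustration and do not come from any real data set.

```python
import numpy as np
import xarray as xr

# Hypothetical data: 3 time steps on a 2 x 2 latitude/longitude grid.
values = np.arange(12, dtype=float).reshape(3, 2, 2)

ds = xr.Dataset(
    data_vars={"temperature": (("time", "lat", "lon"), values)},
    coords={
        "time": np.array(
            ["2000-01-01", "2000-01-02", "2000-01-03"], dtype="datetime64[ns]"
        ),
        "lat": [10.0, 20.0],
        "lon": [100.0, 110.0],
    },
    attrs={"title": "illustrative dataset"},
)
ds["temperature"].attrs["units"] = "degC"

# Select by coordinate label rather than by integer position.
point = ds["temperature"].sel(lat=20.0, lon=100.0)

# Reduce over a named dimension instead of an axis number.
time_mean = ds["temperature"].mean(dim="time")
```

Here `point` is a one-dimensional time series at a single grid cell and `time_mean` is a two-dimensional `(lat, lon)` field. In both cases the dimension names and coordinates travel with the data.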

### Iris

Iris seeks to provide a powerful, easy to use, and community-driven Python library for analysing and visualising meteorological and oceanographic data sets.

With Iris you can:

• Use a single API to work on your data, irrespective of its original format.
• Read and write (CF-)netCDF, GRIB, and PP files.
• Easily produce graphs and maps via integration with matplotlib and cartopy.

Iris is an alternative to Xarray. Iris is developed primarily by the UK Met Office (http://www.metoffice.gov.uk/) Analysis, Visualisation and Data team (AVD).

### Dask

Dask is a flexible parallel computing library for analytics. Dask is the key to the scalability of the Pangeo platform; its data structures are capable of representing extremely large datasets without actually loading them in memory, and its distributed schedulers permit supercomputers and cloud computing clusters to efficiently parallelize computations across many nodes.
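The idea of representing a large dataset without loading it can be sketched with `dask.array`. The array shape and chunk size below are arbitrary choices for illustration, and the example runs on Dask's default local scheduler rather than on a distributed cluster.

```python
import dask.array as da

# A 10000 x 10000 float64 array would need about 800 MB if materialized.
# Dask only records how to build each 1000 x 1000 chunk.
x = da.ones((10_000, 10_000), chunks=(1_000, 1_000))

# Still lazy: this extends the task graph but performs no arithmetic.
total = (x + 1).sum()

# Computation happens only here, one chunk at a time.
result = total.compute()
```

Until `compute()` is called, `total` is a description of work to be done and not a number. The same pattern applies when a distributed scheduler spreads the chunks across many nodes.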

### Jupyter

Project Jupyter exists to develop open-source software, open standards, and services for interactive computing across dozens of programming languages. Jupyter provides the interactive layer to the Pangeo platform, allowing scientists to interact with remote systems where data and computing resources live.

## Pangeo Affiliated Packages

There are many other Python packages that can work with the core packages to provide additional functionality. We plan to eventually catalog these packages here on the Pangeo website. For now, please refer to the Xarray list of related projects.

## Guidelines for New Packages

Our vision for the Pangeo project is an ecosystem of mutually compatible geoscience Python packages which follow open-source best practices. These practices are well established across the scientific Python community.

### General Best Practices for Open Source

1. Use an open-source license. See Jake VanderPlas’ article on licensing scientific code or these more general guidelines.
2. Use version control for source code (for example, on GitHub)
3. Provide thorough test coverage and continuous integration of testing
4. Maintain comprehensive documentation
5. Establish a code of conduct for contributors

The open-source guide provides some great advice on building and maintaining open-source projects.

### Best Practices for Pangeo Projects

To address the needs of geoscience researchers, we have developed some additional recommendations.

1. Solve a general problem: packages should solve a general problem that is encountered by a relatively broad group of researchers.
2. Clearly defined scope: packages should have a clear and relatively narrow scope, solving the specific problem(s) identified in the point above (rather than attempting to cover every possible aspect of geoscience research computing).
3. No duplication: developers should try to leverage existing packages as much as possible to avoid duplication of effort. (In early-stage development and experimentation, however, some duplication will be inevitable as developers try implementing different solutions to the same general problems.)
4. Consume and Produce Xarray Objects: Xarray data structures facilitate mutual interoperability between packages.
5. Operate Lazily: whenever possible, packages should avoid explicitly triggering computation on Dask objects.
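The last two recommendations can be illustrated with a small sketch. The function `anomaly` below is a hypothetical helper, not part of any Pangeo package. It accepts and returns an `xarray.DataArray`, and it never triggers computation itself, so a Dask-backed input stays Dask-backed on output.

```python
import dask.array as da
import numpy as np
import xarray as xr


def anomaly(data: xr.DataArray, dim: str = "time") -> xr.DataArray:
    """Return `data` minus its mean along `dim` (hypothetical helper)."""
    # Only xarray operations are used, so a Dask-backed input yields a
    # Dask-backed output; nothing here calls .compute(), .load() or .values.
    result = data - data.mean(dim=dim)
    result.attrs.update(data.attrs)  # carry metadata through explicitly
    return result


# Dask-backed input with illustrative values.
lazy_input = xr.DataArray(
    da.from_array(np.arange(8.0).reshape(4, 2), chunks=(2, 2)),
    dims=("time", "x"),
    attrs={"units": "K"},
)

lazy_output = anomaly(lazy_input)

# The caller, not the library function, decides when to compute.
computed = lazy_output.compute()
```

Leaving the `compute()` call to the caller lets users chain several such functions into a single task graph before any data is read or processed.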