Saturday, December 14th, 2024
Please join us for a Pangeo working meeting / hackathon on December 14 in Washington, DC! This event serves to bring together members of the Pangeo community to solve problems related to scientific research and programming.
Goals:
Zarr-Python 3 and the Icechunk storage backend are just about to be released. This breakout will focus on two topics:
Some knowledge of Zarr and Xarray will help, but we can support anyone who knows Python and wants to get started with Zarr.
Come hack on the earthaccess Python library to close some of its outstanding issues and extend its functionality within the Pangeo ecosystem. This includes improving access patterns to mission data (Zarr and services). Bring your own ideas too; the work is not limited to these topics!
A friendly mindset and engineering, documentation, or coordination skills to support this work!
Setup and installation of tooling, even really basic tooling like an IDE, is a barrier to use and learning. I have been working on making scientific Docker images (like the Pangeo notebook image) easier to use outside of JupyterHubs. I am a JupyterHub admin, but most people don't have access to one. This project shows some ideas for creating links that spin up a JupyterLab-based image in Codespaces: https://nmfs-opensci.github.io/container-images/, but it is painfully slow. In this hack session, I'd like to work on other ideas for running Pangeo Docker images and test them out. This might mean improving the how-to documentation for running images locally, or getting images running (and debugging them) in various VMs, for example on Google Cloud. We could also write Bash scripts that give users an easier start-up experience when running Docker images locally, or try to get a pangeo-notebook running on a Google Workstation. We'll basically be working on this page: https://pangeo-docker-images.readthedocs.io/en/latest/howto/launch.html#how-to-launch-a-notebook-using-these-images.
Experience running Docker images, Bash scripts, or devcontainers is helpful. No experience is fine too: you can join as a tester.
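As a starting point for the "easier local start-up" idea, a launcher script could be as small as the sketch below. It assumes the `pangeo/pangeo-notebook:latest` image tag and a `jupyter lab --ip 0.0.0.0` start command; the function name `build_docker_command` and its parameters are made up for illustration. The script only builds and prints the command and never invokes Docker itself.

```python
import shlex
from pathlib import Path
from typing import List, Optional


def build_docker_command(
    image: str = "pangeo/pangeo-notebook:latest",  # assumed image tag
    port: int = 8888,
    workdir: Optional[str] = None,
) -> List[str]:
    """Return a `docker run` argument list that starts JupyterLab in `image`.

    Hypothetical helper for illustration; not part of any Pangeo package.
    """
    # Map the chosen host port to JupyterLab's default port (8888) in the container.
    cmd = ["docker", "run", "-it", "--rm", "-p", f"{port}:8888"]
    host_dir = None
    if workdir is not None:
        # Mount the host directory at the same path inside the container.
        host_dir = str(Path(workdir).expanduser())
        cmd += ["--volume", f"{host_dir}:{host_dir}"]
    # Listen on all interfaces so the host browser can reach the server.
    cmd += [image, "jupyter", "lab", "--ip", "0.0.0.0"]
    if host_dir is not None:
        # Open JupyterLab in the mounted directory.
        cmd.append(host_dir)
    return cmd


default_cmd = build_docker_command()
mounted_cmd = build_docker_command(port=9999, workdir="/data/project")

# Print copy-pasteable shell commands.
print(shlex.join(default_cmd))
print(shlex.join(mounted_cmd))
```

To actually start the container, the returned list could be passed to `subprocess.run`. Keeping command construction separate from execution makes the script easy to test on machines without Docker.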
We’ll work on VirtualiZarr as a tool for creating cloud-optimized virtual datasets which point to pre-existing multidimensional scientific data.
Topic ideas:
Users just need a dataset they want to virtualize. Developers should ideally have some familiarity with Zarr, Kerchunk, or VirtualiZarr, though we will take anyone eager!
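For anyone new to the idea of a "virtual" dataset, the toy sketch below illustrates the underlying concept using only the Python standard library. It does not use the VirtualiZarr API. The file layout, the `HEADER` constant, and the functions `write_toy_file`, `build_manifest`, and `read_virtual_chunk` are all invented for this example. The point is that a manifest of (path, offset, length) references lets a reader fetch individual chunks from a pre-existing file without copying or rewriting the data.

```python
import os
import struct
import tempfile

HEADER = b"TOY-FORMAT-V1\n"   # invented header for a made-up legacy format
VALUES_PER_CHUNK = 4          # each chunk holds four float64 values
CHUNK_BYTES = VALUES_PER_CHUNK * 8


def write_toy_file(path, chunks):
    """Write a toy 'legacy' file: a header followed by fixed-size chunks."""
    with open(path, "wb") as f:
        f.write(HEADER)
        for values in chunks:
            f.write(struct.pack(f"<{VALUES_PER_CHUNK}d", *values))


def build_manifest(path):
    """Scan the file layout and record where each chunk lives (no data copied)."""
    n_chunks = (os.path.getsize(path) - len(HEADER)) // CHUNK_BYTES
    return {
        str(i): {
            "path": path,
            "offset": len(HEADER) + i * CHUNK_BYTES,
            "length": CHUNK_BYTES,
        }
        for i in range(n_chunks)
    }


def read_virtual_chunk(manifest, key):
    """Fetch one chunk using only its byte-range reference."""
    ref = manifest[key]
    with open(ref["path"], "rb") as f:
        f.seek(ref["offset"])
        raw = f.read(ref["length"])
    return list(struct.unpack(f"<{VALUES_PER_CHUNK}d", raw))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "legacy.bin")
    write_toy_file(path, [[0.0, 1.0, 2.0, 3.0],
                          [4.0, 5.0, 6.0, 7.0],
                          [8.0, 9.0, 10.0, 11.0]])
    manifest = build_manifest(path)
    second_chunk = read_virtual_chunk(manifest, "1")

print(second_chunk)
```

Real tools store such references for chunks in formats like NetCDF or HDF5 and expose them through the Zarr data model, so the byte-range read here stands in for a ranged request against cloud object storage.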
A definite breakout group, stay tuned!
Let's make the integration between xarray and GPU libraries (e.g. CuPy, PyTorch, JAX) more seamless!
Topic ideas:
We'll be working with GPU libraries, so it will help if you have access to a GPU (on your laptop or a cloud/HPC server) to run things. If not, feel free to join anyway, and we can work something out together as a group (e.g. pair programming).
The Pangeo community strives to host inclusive, accessible events. To be respectful of those with allergies and environmental sensitivities, please refrain from wearing strong fragrances. To request an accommodation or for inquiries about accessibility, please contact Max Jones (@maxrjones).
All participants must abide by the Pangeo Code of Conduct.