Photo-z Server
Tutorial Notebook 0 - Introduction
Contact author: Julia Gschwend
Last verified run: 2024-Jul-22
The PZ Server
Introduction
The PZ Server is an online service available to the LSST Community for hosting and sharing lightweight photo-z (PZ)-related data products. Data and metadata can be uploaded and downloaded at the website pz-server.linea.org.br (during the development phase, a test environment is available at pz-server-dev.linea.org.br). There, you will find two separate pages, each containing a list of data products: one for Rubin Observatory Data Management’s official data products and the other for user-generated data products. The registered data products can also be accessed directly from Python code using the PZ Server’s data access API, as demonstrated below.
The PZ Server is developed and delivered as part of the in-kind contribution program BRA-LIN, from LIneA to the Rubin Observatory’s LSST project. The service is hosted in the Brazilian IDAC and is not directly connected to the Rubin Science Platform (RSP). However, user authorization requires the same credentials as the RSP. For comprehensive documentation about the PZ Server, please visit the PZ Server’s documentation page, where you will also find an overview of all LIneA’s contributions related to PZ production. The internal documentation of the API functions is available on the API’s documentation page.
How to upload a data product on the PZ Server website
To upload a data product, click on the button NEW PRODUCT at the top left of the User-generated Data Products page and fill in the Upload Form with the relevant metadata. Alternatively, users can upload files to the PZ Server programmatically via the pzserver Python library (described below).
The photo-z-related products are organized into four categories (product types):
Spec-z Catalog: Catalog of spectroscopic redshifts and positions (usually equatorial coordinates).
Training Set: Sample for training photo-z algorithms (tabular data). It usually contains magnitudes, errors, and true redshifts.
Photo-z Validation Results: The results of a photo-z validation procedure (free format). They usually contain photo-z estimates (single estimates and/or PDFs) of a validation set, photo-z validation metrics, validation plots, etc.
Photo-z Table: The results of a photo-z estimation procedure. Ideally, the data should be in the same format as the photo-z tables delivered by DM as part of the LSST data releases. If the data is larger than the file upload limit (200 MB), the product entry stores only the metadata, and instructions for accessing the data should be provided in the description field. In exceptional cases, storage space can be provided for larger tables, depending on the science project’s justification (to be evaluated by the IDAC’s management committee).
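Before uploading, you can check locally whether a file fits under the 200 MB upload limit mentioned above. The sketch below does this. It assumes the limit means 200 × 1024 × 1024 bytes, although the server may count megabytes differently. The helper `fits_upload_limit` is hypothetical and is not part of pzserver.

```python
import os

# Assumed upload limit: "200 MB" interpreted as 200 * 1024 * 1024 bytes.
# The PZ Server may define "MB" differently; adjust if needed.
UPLOAD_LIMIT_BYTES = 200 * 1024 * 1024


def fits_upload_limit(path, limit=UPLOAD_LIMIT_BYTES):
    """Return True if the file at `path` is no larger than `limit` bytes.

    Hypothetical helper (not part of pzserver). A Photo-z Table above the
    limit would be registered as a metadata-only entry.
    """
    return os.path.getsize(path) <= limit
```

If this returns False for a Photo-z Table, plan to explain in the description field where the full data can be accessed.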
How to download a data product from the PZ Server website
To download a data product available on the PZ Server, go to one of the two pages by clicking on the card “Rubin Observatory PZ Data Products” (for official products released by the Rubin Data Management Team) or “User-generated Data Products” (for products uploaded by members of the LSST community). The download button is on the right side of each data product (each row of the list). There are also buttons to share, remove, and edit the metadata of a given data product.
The PZ Server API (Python library pzserver)
Installation
For regular users
The PZ Server API is available on pip as pzserver. To install the API and its dependencies, type on the Terminal:
$ pip install pzserver
or, from a notebook cell:
! pip install pzserver
For developers
Alternatively, clone the repository:
$ git clone https://github.com/linea-it/pzserver.git
Then enter the cloned folder and install the API and its dependencies (including the development extras):
$ cd pzserver
$ pip install '.[dev]'
Note: you might need to restart the notebook kernel before the newly installed library can be imported.
Imports and Setup
from pzserver import PzServer
import matplotlib.pyplot as plt
%reload_ext autoreload
%autoreload 2
Python code connects to the PZ Server through an object of the class PzServer. To create an instance of PzServer, users must provide an API token. The token is generated from the top-right menu of the PZ Server website (during the development phase, on the test environment).
# pz_server = PzServer(token="<your token>", host="pz-dev") # "pz-dev" is the temporary host for test phase
For convenience, the token can be saved in a file named token.txt (which is already listed in the .gitignore file of this repository).
with open('token.txt', 'r') as file:
    token = file.read().strip()  # strip() drops a trailing newline from the token
pz_server = PzServer(token=token, host="pz-dev") # "pz-dev" is the temporary host for test phase
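The sketch below is a slightly more defensive way to load the token. It strips surrounding whitespace, because a trailing newline in token.txt would otherwise be sent as part of the token. If the file is missing or empty, it falls back to an environment variable. The variable name `PZSERVER_TOKEN` and the function `load_token` are hypothetical choices for this example, not conventions of the pzserver library.

```python
import os
from pathlib import Path


def load_token(path="token.txt", env_var="PZSERVER_TOKEN"):
    """Read the PZ Server API token from a file, else from an environment variable.

    Hypothetical helper: `env_var` is an illustrative name, not a pzserver
    convention. Raises RuntimeError if no token is found.
    """
    token_file = Path(path)
    if token_file.is_file():
        token = token_file.read_text().strip()  # drop trailing newline/spaces
        if token:
            return token
    token = os.environ.get(env_var, "").strip()
    if token:
        return token
    raise RuntimeError(f"No token found in {path!r} or ${env_var}")
```

Usage: `pz_server = PzServer(token=load_token(), host="pz-dev")`.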
How to get general info from PZ Server
The object pz_server created above provides access to the data and metadata stored in the PZ Server. It also offers useful methods for navigating the available contents. The methods with the prefix get_ return the result of a query on the PZ Server database as a Python dictionary and are best suited for programmatic use (see details on the API documentation page).
Alternatively, those with the prefix display_ show the results as styled pandas DataFrames, optimized for Jupyter Notebook (note: column names might change in the display version). For instance:
Display the list of product types supported with a short description;
pz_server.display_product_types()
Display the list of users who uploaded data products to the server;
pz_server.display_users()
Display the list of data releases available at the time;
pz_server.display_releases()
Display all data products available (WARNING: this list can rapidly grow during the survey’s operation).
pz_server.display_products_list()
The information about product types, users, and releases shown above can be used to filter the data products of interest. For that, the methods display_products_list and get_products_list accept the argument filters: a dictionary mapping product attributes to their values.
pz_server.display_products_list(filters={"release": "LSST DP0",
"product_type": "Training Set"})
Filtering also works with a string pattern that is only part of the value. For instance, just “DP0” instead of “LSST DP0”:
pz_server.display_products_list(filters={"release": "DP0"})
To search for multiple values of the same attribute, add the suffix __or (two underscores + “or”) to the search key and pass a list of values. For instance, to get spec-z catalogs and training sets in the same search (notice that filtering is not case-sensitive):
pz_server.display_products_list(filters={"product_type__or": ["Spec-z Catalog", "training set"]})
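To make the filter rules above concrete, the sketch below applies them to a local list of records. A value matches if the filter string appears in it, ignoring case. A key ending in `__or` accepts a list of alternatives. This only illustrates the behavior described in this notebook, and it assumes that different keys are combined with AND. It is not how the PZ Server implements filtering. The function `match_filters` and the sample records are made up.

```python
def _matches(value, pattern):
    """Case-insensitive substring match, as described for PZ Server filters."""
    return str(pattern).lower() in str(value).lower()


def match_filters(record, filters):
    """Return True if `record` (a dict) satisfies every entry in `filters`.

    Keys ending in '__or' take a list of patterns; any one may match.
    Different keys are combined with AND (an assumption of this sketch).
    """
    for key, pattern in filters.items():
        if key.endswith("__or"):
            field = key[: -len("__or")]
            if not any(_matches(record.get(field, ""), p) for p in pattern):
                return False
        elif not _matches(record.get(key, ""), pattern):
            return False
    return True


# Made-up records, for illustration only.
products = [
    {"name": "a", "release": "LSST DP0", "product_type": "Training Set"},
    {"name": "b", "release": "LSST DP0", "product_type": "Spec-z Catalog"},
    {"name": "c", "release": "", "product_type": "Photo-z Table"},
]
```

For example, `[p["name"] for p in products if match_filters(p, {"release": "DP0"})]` selects the two DP0 records.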
To fetch the results of a search and assign them to a variable, just replace the prefix display_ with get_, like this:
search_results = pz_server.get_products_list(filters={"product_type": "results"}) # PZ Validation results
search_results
How to upload a data product via the Python API (alternative method)
The default way to upload a data product to the PZ Server is the upload tool on the PZ Server website, as shown above. Alternatively, data products can be sent to the host service using the pzserver Python library.
First, prepare a dictionary with the relevant information about your data product:
data_to_upload = {
    "name": "example upload via lib",
    "product_type": "specz_catalog",  # Product type
    "release": None,  # LSST release; use None if not LSST data
    "main_file": "example.csv",  # path to the main data file
    "auxiliary_files": ["example.html", "example.ipynb"],  # paths to auxiliary files
}
upload = pz_server.upload(**data_to_upload)
upload.product_id
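Because `upload()` receives the file paths from `data_to_upload`, it can be worth checking them locally before sending anything. The sketch below is a hypothetical pre-flight check (`check_upload_files` is not part of pzserver). It reports which of the main or auxiliary files do not exist on disk.

```python
from pathlib import Path


def check_upload_files(data):
    """Return the paths in `data` (main_file and auxiliary_files) that are missing.

    Hypothetical helper: it only inspects the local file system and does not
    contact the PZ Server.
    """
    paths = [data["main_file"]] + list(data.get("auxiliary_files") or [])
    return [p for p in paths if not Path(p).is_file()]
```

Usage: run `missing = check_upload_files(data_to_upload)` and call `pz_server.upload(**data_to_upload)` only when `missing` is empty.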
How to display the metadata of a data product
The metadata of a given data product is the information provided by the user on the upload form. This information is attached to the data product contents and can be consulted on the PZ Server page or through this Python API (pzserver).
All data products stored on the PZ Server are identified by a unique id number and by a unique name, a string called internal_name. The internal_name is created automatically at upload time: the name given by the owner is converted to lowercase, blank spaces are replaced with “_”, special characters are removed, and the product id plus “_” is prepended.
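The naming rule above can be sketched as follows. The notebook does not specify which characters count as "special", so this sketch assumes that anything other than lowercase ASCII letters, digits, and underscores is removed. The real server may differ in edge cases, such as accented letters. `make_internal_name` is a hypothetical helper.

```python
import re


def make_internal_name(product_id, name):
    """Approximate the PZ Server internal_name rule: '<id>_<normalized name>'.

    Assumption: 'special characters' means anything other than a-z, 0-9, '_'.
    """
    normalized = name.lower().replace(" ", "_")
    normalized = re.sub(r"[^a-z0-9_]", "", normalized)
    return f"{product_id}_{normalized}"
```

For instance, `make_internal_name(6, "Simple training set")` returns `"6_simple_training_set"`, which has the same shape as the internal name used in the examples below.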
The PzServer’s method get_product_metadata() returns a dictionary with the attributes stored in the PZ Server about a given data product, identified by its id or internal_name. For use in a Jupyter notebook, the equivalent display_product_metadata() shows the results in a formatted table.
# pz_server.display_product_metadata(<id (int or str) or internal_name (str)>)
# pz_server.display_product_metadata(6)
# pz_server.display_product_metadata("6")
pz_server.display_product_metadata("6_simple_training_set")
How to download data products as .zip files
To download any data product stored in the PZ Server, use the PzServer’s method download_product. Pass it the product’s id or internal_name and the path where the file will be saved (the default is the current folder). This method downloads a compressed .zip file containing all the files uploaded by the user, including data, ancillary files, and description files. The download time depends on the internet connection between the user and the host. Let’s try it with a small data product.
pz_server.download_product(14, save_in=".")
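Once the .zip file is on disk, you can inspect or extract its contents with Python’s standard zipfile module. This notebook does not fix the name of the downloaded file, so substitute the actual path. `list_zip_contents` is a hypothetical helper.

```python
import zipfile
from pathlib import Path


def list_zip_contents(zip_path, extract_to=None):
    """Return the member names of a downloaded product archive.

    If `extract_to` is given, the archive is also extracted into that folder.
    """
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        if extract_to is not None:
            Path(extract_to).mkdir(parents=True, exist_ok=True)
            archive.extractall(extract_to)
    return names
```

Only extract archives from sources you trust, since `extractall` writes whatever paths the archive contains.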
How to share data products with other RSP users
All data products uploaded to the PZ Server are immediately available and visible to all PZ Server users (people with RSP credentials), through the PZ Server website or via the API. Besides sharing the product id or internal_name for programmatic access, you can share the product’s URL, which leads to the product’s download page. The URL is composed of the PZ Server website address + /product/ + internal_name:
https://pz-server.linea.org.br/product/ + internal_name
or, if still in the development phase,
https://pz-server-dev.linea.org.br/product/ + internal_name
For example:
https://pz-server-dev.linea.org.br/product/6_simple_training_set
WARNING: The URL works only with the complete internal name, not with just the id number.
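Building the share link programmatically is a simple string concatenation. The sketch below uses a hypothetical helper, `product_url`. Because the URL works only with the complete internal_name, as warned above, the helper rejects a bare id.

```python
PROD_BASE = "https://pz-server.linea.org.br/product/"
DEV_BASE = "https://pz-server-dev.linea.org.br/product/"


def product_url(internal_name, dev=False):
    """Return the PZ Server page URL for a product, given its internal_name."""
    name = str(internal_name)
    if name.isdigit():
        # A bare id does not resolve; the full internal_name is required.
        raise ValueError("Use the full internal_name (e.g. '6_simple_training_set'), not the id.")
    return (DEV_BASE if dev else PROD_BASE) + name
```

For example, `product_url("6_simple_training_set", dev=True)` reproduces the development URL shown above.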
How to retrieve the contents of data products (working in memory)
The PZ Server API also lets users retrieve the contents of a given data product and work with them in memory, by assigning the result of the method get_product() to a variable in the code. This feature is available only for tabular data (product types: Spec-z Catalog and Training Set).
By default, the method get_product returns an object of a particular class, depending on the product’s type. The classes SpeczCatalog and TrainingSet are lightweight extensions of pandas.DataFrame built via class composition: they hold a DataFrame rather than inherit from it. They add a couple of attributes and methods, such as the attribute metadata and the method display_metadata(). Let’s see an example:
catalog = pz_server.get_product(8)
catalog
catalog.display_metadata()
The tabular data is stored in the attribute data, which is a pandas.DataFrame.
catalog.data
type(catalog.data)
The data attribute therefore provides all the useful methods of pandas.DataFrame, such as:
catalog.data.info()
catalog.data.describe()
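To illustrate what "extension via class composition" means, here is a minimal, hypothetical wrapper in the same spirit as SpeczCatalog and TrainingSet. The table lives in a `data` attribute and the metadata in a `metadata` dictionary, instead of the class inheriting from pandas.DataFrame. This is not the pzserver implementation. The class name `ProductSketch` and the toy values are made up.

```python
import pandas as pd


class ProductSketch:
    """Hypothetical composition-based wrapper: holds a DataFrame, does not subclass it."""

    def __init__(self, data, metadata):
        self.data = pd.DataFrame(data)  # tabular contents
        self.metadata = dict(metadata)  # upload-form information

    def display_metadata(self):
        """Return the metadata as a two-column DataFrame (key, value)."""
        return pd.DataFrame(list(self.metadata.items()), columns=["key", "value"])

    def __repr__(self):
        return f"{type(self).__name__}(name={self.metadata.get('name')!r}, rows={len(self.data)})"
```

Because the DataFrame is kept in `data`, pandas methods are reached through it (for example, `obj.data.describe()`), as with `catalog.data` above. The trade-off is one extra attribute access in exchange for not subclassing pandas.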
The prod-types tutorial notebooks give details about these specific classes. For those who prefer working with astropy.Table or a plain pandas.DataFrame, the method get_product() lets you choose the output format (fmt="pandas" or fmt="astropy").
dataframe = pz_server.get_product(8, fmt="pandas")
print(type(dataframe))
dataframe
table = pz_server.get_product(8, fmt="astropy")
print(type(table))
table
Specific features for each product type
Please take a look at the other tutorial notebooks for specific examples of how to use the pzserver library to access and manipulate data from the PZ Server.
User feedback
Is something important missing? Please open an issue in the PZ Server library repository on GitHub (https://github.com/linea-it/pzserver).