Photo-z Server

Tutorial Notebook 2 - Training Sets

Contact author: Julia Gschwend

Last verified run: 2024-Jul-22

Introduction

Welcome to the PZ Server tutorials. If you are reading this notebook for the first time, we recommend not to skip the introduction notebook: 0_introduction.ipynb also available in this same repository.

Imports and Setup

from pzserver import PzServer
import matplotlib.pyplot as plt
%reload_ext autoreload
%autoreload 2
# pz_server = PzServer(token="<your token>", host="pz-dev") # "pz-dev" is the temporary host for test phase

For convenience, the token can be saved into a file named as token.txt (which is already listed in the .gitignore file in this repository).

with open('token.txt', 'r') as file:
    token = file.read()
pz_server = PzServer(token=token, host="pz-dev") # "pz-dev" is the temporary host for test phase

Product types

The PZ Server API provides Python classes with useful methods to handle particular product types. Let’s recap the product types available:

pz_server.display_product_types()
Product type Description
Spec-z Catalog Catalog of spectroscopic redshifts and positions (usually equatorial coordinates).
Training Set Training set for photo-z algorithms (tabular data). It usually contains magnitudes, errors, and true redshifts.
Validation Results Results of a photo-z validation procedure (free format). Usually contains photo-z estimates (single estimates and/or pdf) of a validation set and photo-z validation metrics.
Photo-z Table Results of a photo-z estimation procedure. If the data is larger than the file upload limit (200MB), the product entry stores only the metadata (instructions on accessing the data should be provided in the description field.

Training Sets

In the context of the PZ Server, Training Sets are defined as the product of matching (spatially) a given Spec-z Catalog (single survey or compilation) to the photometric data, in this case, the LSST Objects Catalog. The PZ Server API offers a tool called Training Set Maker for users to build customized Training Sets based on the Spec-z Catalogs available. Please see the companion Jupyter Notebook pz_tsm_tutorial.ipynb for details.

Note 1: Commonly the training set is split into two or more subsets for photo-z validation purposes. If the Training Set owner has previously defined which objects should belong to each subset (trainining and validation/test sets), this information must be available as an extra column in the table or as clear instructions for reproducing the subsets separation in the data product description.

Note 2: The PZ Server only supports catalog-level Training Sets. Image-based Training Sets, e.g., for deep-learning algorithms, are not supported yet.

Mandatory column: * Spectroscopic (or true) redshift - float

Other expected columns * Object ID from LSST Objects Catalog - integer * Observables: magnitudes (and/or colors, or fluxes) from LSST Objects Catalog - float * Observable errors: magnitude errors (and/or color errors, or flux errors) from LSST Objects Catalog - float * Right ascension [degrees] - float * Declination [degrees] - float * Quality Flag - integer, float, or string * Subset Flag - integer, float, or string

Training Sets can be uploaded by users on PZ Server website or via the pzserver library. Also, they can be created as the spatial cross-matching between a given Spec-z Catalog previously registered in the system and an Object table from a given LSST Data Release available in the Brazilian IDAC by the PZ Sever’s pipeline “Training Set Maker” (under development). Any Training Set built by the pipeline is automatically registered as a regular user-generated data product and has no difference from the uploaded ones.

train_goldenspike = pz_server.get_product(9)
Connecting to PZ Server...
Done!
train_goldenspike.display_metadata()
key value
id 9
release None
product_type Training Set
uploaded_by gschwend
internal_name 9_goldenspike_train_data_hdf5
product_name Goldenspike train data hdf5
official_product False
pz_code
description A mock training set created using the example notebook goldenspike.ipynb available in RAIL's repository. Test upload of files in hdf5 format.
created_at 2023-03-29T19:12:59.746096Z
main_file goldenspike_train_data.hdf5

Display basic statistics

train_goldenspike.data.describe()
mag_err_g_lsst mag_err_i_lsst mag_err_r_lsst mag_err_u_lsst mag_err_y_lsst mag_err_z_lsst mag_g_lsst mag_i_lsst mag_r_lsst mag_u_lsst mag_y_lsst mag_z_lsst redshift
count 62.000000 62.000000 62.000000 61.000000 61.000000 62.000000 62.000000 62.000000 62.000000 61.000000 61.000000 62.000000 62.000000
mean 0.038182 0.016165 0.018770 0.188050 0.054682 0.021478 24.820000 23.384804 24.003970 25.446008 22.932354 23.074481 0.780298
std 0.036398 0.010069 0.013750 0.193747 0.115875 0.014961 1.314112 1.381587 1.387358 1.269277 1.540284 1.400673 0.355365
... ... ... ... ... ... ... ... ... ... ... ... ... ...
50% 0.028309 0.013390 0.016660 0.133815 0.034199 0.018540 25.069970 23.748506 24.470215 25.577029 23.293384 23.514185 0.764600
75% 0.049576 0.024650 0.025802 0.238859 0.063585 0.032557 25.705486 24.488654 24.985225 26.263284 23.993010 24.165944 0.948494
max 0.198195 0.036932 0.065360 1.154073 0.909230 0.051883 27.296152 24.949645 26.036958 28.482391 27.342151 24.693132 1.755764

8 rows × 13 columns

The training set object has a very basic plot method for quick visualization of catalog properties. For advanced interactive data visualization tips, we recommend the notebook DP02_06b_Interactive_Catalog_Visualization.ipynb from Rubin Observatory’s DP0.2 tutorial-notebooks repository.

train_goldenspike.plot(mag_name="mag_i_lsst")
../../_images/output_18_0.png

Users feedback

Is something important missing? Click here to open an issue in the PZ Server library repository on GitHub.