Photo-z Server

Tutorial Notebook 2 - Training Sets

Last verified run: 2024-Jul-22

Introduction

Welcome to the PZ Server tutorials. If you are reading this notebook for the first time, we recommend starting with the introduction notebook, 0_introduction.ipynb, which is available in this same repository.

Imports and Setup

from pzserver import PzServer
import matplotlib.pyplot as plt
%reload_ext autoreload
%autoreload 2

# pz_server = PzServer(token="<your token>", host="pz-dev") # "pz-dev" is the temporary host for test phase

For convenience, the token can be saved in a file named token.txt (which is already listed in the .gitignore file in this repository).

with open('token.txt', 'r') as file:
    token = file.read().strip()  # drop the trailing newline that editors usually append
pz_server = PzServer(token=token, host="pz-dev") # "pz-dev" is the temporary host for test phase

Product types

The PZ Server API provides Python classes with useful methods to handle particular product types. Let’s recap the product types available:

pz_server.display_product_types()

Product type	Description
Spec-z Catalog	Catalog of spectroscopic redshifts and positions (usually equatorial coordinates).
Training Set	Training set for photo-z algorithms (tabular data). It usually contains magnitudes, errors, and true redshifts.
Validation Results	Results of a photo-z validation procedure (free format). Usually contains photo-z estimates (single estimates and/or pdf) of a validation set and photo-z validation metrics.
Photo-z Table	Results of a photo-z estimation procedure. If the data is larger than the file upload limit (200MB), the product entry stores only the metadata (instructions on accessing the data should be provided in the description field).

Training Sets

In the context of the PZ Server, Training Sets are defined as the product of matching (spatially) a given Spec-z Catalog (single survey or compilation) to the photometric data, in this case, the LSST Objects Catalog. The PZ Server API offers a tool called Training Set Maker for users to build customized Training Sets based on the Spec-z Catalogs available. Please see the companion Jupyter Notebook pz_tsm_tutorial.ipynb for details.

Note 1: The training set is commonly split into two or more subsets for photo-z validation purposes. If the Training Set owner has already defined which objects belong to each subset (training and validation/test sets), this information must be available either as an extra column in the table or as clear instructions in the data product description for reproducing the split.
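As an illustration of the "extra column" option in Note 1, the sketch below adds a reproducible subset flag to a table with pandas. The function name add_subset_flag, the column name "subset", the labels "train"/"test", and the toy table are all assumptions made for this example; the PZ Server does not prescribe them.

```python
import numpy as np
import pandas as pd


def add_subset_flag(table, test_fraction=0.3, seed=42, column="subset"):
    """Return a copy of `table` with a column marking each row as 'train' or 'test'.

    The column name and the labels are hypothetical; use whatever convention
    you document in the product description.
    """
    rng = np.random.default_rng(seed)  # fixed seed makes the split reproducible
    n_rows = len(table)
    n_test = int(round(test_fraction * n_rows))
    flags = np.array(["train"] * n_rows, dtype=object)
    # Pick n_test distinct row positions for the test subset
    test_idx = rng.choice(n_rows, size=n_test, replace=False)
    flags[test_idx] = "test"
    flagged = table.copy()
    flagged[column] = flags
    return flagged


# Toy table with made-up values, only to exercise the function
toy = pd.DataFrame(
    {
        "redshift": np.linspace(0.1, 1.0, 10),
        "mag_i_lsst": np.linspace(20.0, 24.5, 10),
    }
)
flagged = add_subset_flag(toy, test_fraction=0.3, seed=42)
print(flagged["subset"].value_counts())
```

Storing the flag in the table, rather than only describing the procedure, lets other users recover exactly the same subsets without rerunning the random draw.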

Note 2: The PZ Server only supports catalog-level Training Sets. Image-based Training Sets, e.g., for deep-learning algorithms, are not supported yet.

Mandatory column:

* Spectroscopic (or true) redshift - float

Other expected columns:

* Object ID from LSST Objects Catalog - integer
* Observables: magnitudes (and/or colors, or fluxes) from LSST Objects Catalog - float
* Observable errors: magnitude errors (and/or color errors, or flux errors) from LSST Objects Catalog - float
* Right ascension [degrees] - float
* Declination [degrees] - float
* Quality Flag - integer, float, or string
* Subset Flag - integer, float, or string
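Before uploading a table, it can be useful to check it against the column expectations listed above. The sketch below is a local check written with pandas, not part of the pzserver library; the helper name check_training_set and the column names "ra" and "dec" are hypothetical, while "redshift", "mag_i_lsst", and "mag_err_i_lsst" follow the naming of the example product used later in this notebook.

```python
import pandas as pd
from pandas.api.types import is_float_dtype


def check_training_set(table, redshift_col="redshift", expected_cols=()):
    """Check the mandatory redshift column and report missing expected columns.

    Returns (has_redshift, missing), where has_redshift is True only if the
    redshift column exists and holds floats, and missing lists the expected
    columns that are absent from the table.
    """
    has_redshift = bool(
        redshift_col in table.columns and is_float_dtype(table[redshift_col])
    )
    missing = [col for col in expected_cols if col not in table.columns]
    return has_redshift, missing


# Toy table with made-up values
toy = pd.DataFrame(
    {
        "redshift": [0.5, 0.8],
        "mag_i_lsst": [23.1, 24.0],
        "mag_err_i_lsst": [0.01, 0.02],
    }
)
ok, missing = check_training_set(
    toy, expected_cols=["mag_i_lsst", "mag_err_i_lsst", "ra", "dec"]
)
print("redshift column OK:", ok)
print("missing expected columns:", missing)
```

Only the redshift column is treated as a hard requirement here, mirroring the distinction between the mandatory and the other expected columns.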

Training Sets can be uploaded by users through the PZ Server website or via the pzserver library. They can also be created by the PZ Server’s pipeline “Training Set Maker” (under development), which spatially cross-matches a Spec-z Catalog previously registered in the system with an Object table from an LSST Data Release available in the Brazilian IDAC. Any Training Set built by the pipeline is automatically registered as a regular user-generated data product and is no different from an uploaded one.

train_goldenspike = pz_server.get_product(9)

Connecting to PZ Server...
Done!

train_goldenspike.display_metadata()

key	value
id	9
release	None
product_type	Training Set
uploaded_by	gschwend
internal_name	9_goldenspike_train_data_hdf5
product_name	Goldenspike train data hdf5
official_product	False
pz_code
description	A mock training set created using the example notebook goldenspike.ipynb available in RAIL's repository. Test upload of files in hdf5 format.
created_at	2023-03-29T19:12:59.746096Z
main_file	goldenspike_train_data.hdf5

Display basic statistics

train_goldenspike.data.describe()

	mag_err_g_lsst	mag_err_i_lsst	mag_err_r_lsst	mag_err_u_lsst	mag_err_y_lsst	mag_err_z_lsst	mag_g_lsst	mag_i_lsst	mag_r_lsst	mag_u_lsst	mag_y_lsst	mag_z_lsst	redshift
count	62.000000	62.000000	62.000000	61.000000	61.000000	62.000000	62.000000	62.000000	62.000000	61.000000	61.000000	62.000000	62.000000
mean	0.038182	0.016165	0.018770	0.188050	0.054682	0.021478	24.820000	23.384804	24.003970	25.446008	22.932354	23.074481	0.780298
std	0.036398	0.010069	0.013750	0.193747	0.115875	0.014961	1.314112	1.381587	1.387358	1.269277	1.540284	1.400673	0.355365
...	...	...	...	...	...	...	...	...	...	...	...	...	...
50%	0.028309	0.013390	0.016660	0.133815	0.034199	0.018540	25.069970	23.748506	24.470215	25.577029	23.293384	23.514185	0.764600
75%	0.049576	0.024650	0.025802	0.238859	0.063585	0.032557	25.705486	24.488654	24.985225	26.263284	23.993010	24.165944	0.948494
max	0.198195	0.036932	0.065360	1.154073	0.909230	0.051883	27.296152	24.949645	26.036958	28.482391	27.342151	24.693132	1.755764

8 rows × 13 columns
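In the statistics above, the u- and y-band columns report a count of 61 while the others report 62. Since describe() counts only non-missing entries, those columns each have one missing value. A minimal sketch of how to locate such gaps with pandas follows; the toy table and its values are made up for illustration, and the same calls should work on any pandas DataFrame.

```python
import numpy as np
import pandas as pd

# Toy table with made-up values; one u-band magnitude is missing (NaN)
toy = pd.DataFrame(
    {
        "mag_u_lsst": [25.4, np.nan, 26.1],
        "mag_i_lsst": [23.3, 23.9, 24.4],
        "redshift": [0.45, 0.78, 1.10],
    }
)

# Number of missing entries in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Keep only the rows that have a value in every column
complete_rows = toy.dropna()
print(len(complete_rows), "of", len(toy), "rows are complete")
```

Whether to drop incomplete rows or keep them depends on the photo-z algorithm; some codes handle non-detections explicitly, so dropping is only one option.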

The training set object has a very basic plot method for quick visualization of catalog properties. For advanced interactive data visualization tips, we recommend the notebook DP02_06b_Interactive_Catalog_Visualization.ipynb from Rubin Observatory’s DP0.2 tutorial-notebooks repository.

train_goldenspike.plot(mag_name="mag_i_lsst")

User feedback

Is something important missing? Click here to open an issue in the PZ Server library repository on GitHub.