Photo-z Server

Tutorial Notebook 1 - Spec-z Catalogs

Contact author: Julia Gschwend

Last verified run: 2024-Jul-22

Introduction

Welcome to the PZ Server tutorials. If you are reading this notebook for the first time, we recommend starting with the introduction notebook, 0_introduction.ipynb, available in this same repository.

Imports and Setup

from pzserver import PzServer
import matplotlib.pyplot as plt
%reload_ext autoreload
%autoreload 2
# pz_server = PzServer(token="<your token>", host="pz-dev") # "pz-dev" is the temporary host for test phase

For convenience, the token can be saved in a file named token.txt (which is already listed in the .gitignore file in this repository).

with open('token.txt', 'r') as file:
    token = file.read().strip()  # drop surrounding whitespace, e.g. a trailing newline
pz_server = PzServer(token=token, host="pz-dev") # "pz-dev" is the temporary host for test phase

Product types

The PZ Server API provides Python classes with useful methods to handle particular product types. Let’s recap the product types available:

pz_server.display_product_types()
Product type Description
Spec-z Catalog Catalog of spectroscopic redshifts and positions (usually equatorial coordinates).
Training Set Training set for photo-z algorithms (tabular data). It usually contains magnitudes, errors, and true redshifts.
Validation Results Results of a photo-z validation procedure (free format). Usually contains photo-z estimates (single estimates and/or pdf) of a validation set and photo-z validation metrics.
Photo-z Table Results of a photo-z estimation procedure. If the data is larger than the file upload limit (200MB), the product entry stores only the metadata (instructions on accessing the data should be provided in the description field).

Spec-z Catalogs

In the context of the PZ Server, a Spec-z Catalog is any catalog containing spherical equatorial coordinates and spectroscopic redshift measurements (or, analogously, true redshifts from simulations). A Spec-z Catalog can include data from a single spectroscopic survey or a combination of data from several sources. To count as a single Spec-z Catalog, the data should be provided as a single file to the PZ Server’s upload tool. For multi-survey catalogs, it is recommended to add the survey name or identification as an extra column.

Mandatory columns:
* Right ascension [degrees] - float
* Declination [degrees] - float
* Spectroscopic or true redshift - float

Recommended columns:
* Spectroscopic redshift error - float
* Quality flag - integer, float, or string
* Survey name (recommended for compilations of data from different surveys)
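Before uploading, it can help to check that a file actually carries the mandatory columns as floats. The sketch below does this for a CSV file using only the standard library. The column names RA, DEC and Z are an assumption borrowed from the GAMA example in this notebook, and check_specz_csv is a hypothetical helper, not part of the pzserver library.

```python
import csv
import io

# Assumed column names, borrowed from the GAMA example in this notebook;
# a real catalog may name its columns differently.
MANDATORY_COLUMNS = ("RA", "DEC", "Z")


def check_specz_csv(handle, columns=MANDATORY_COLUMNS):
    """Return a list of problems found in a CSV spec-z catalog (empty if none)."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = [f"missing column: {name}" for name in columns if name not in header]
    if problems:
        return problems
    # Data rows start on line 2, since the header occupies line 1.
    for line_number, row in enumerate(reader, start=2):
        for name in columns:
            try:
                float(row[name])
            except (TypeError, ValueError):
                problems.append(
                    f"line {line_number}: {name}={row[name]!r} is not a float"
                )
    return problems


# A small in-memory file with one malformed declination.
sample = io.StringIO("RA,DEC,Z\n180.14,-0.48,0.2178\n215.83,abc,0.2918\n")
problems = check_specz_csv(sample)
print(problems)
```

To check a file on disk, pass an open file handle instead of the in-memory sample, for instance the result of open("my_catalog.csv", newline="").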

Spec-z Catalogs can be uploaded by users through the PZ Server website or via the pzserver library. They can also be created by the PZ Server’s pipeline “Combine Spec-z Catalogs” (under development), which combines a list of Spec-z Catalogs previously registered in the system. Any catalog built by the pipeline is automatically registered as a regular user-generated data product and is treated no differently from uploaded ones.
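For a compilation assembled by hand, the recommended survey column can be added with plain pandas before upload. This is only a minimal sketch with made-up data: the column name SURVEY and the two toy tables are assumptions, and it does not reproduce the “Combine Spec-z Catalogs” pipeline, whose behavior is not described here.

```python
import pandas as pd

# Two tiny made-up catalogs standing in for real surveys.
survey_a = pd.DataFrame({"RA": [10.1, 10.2], "DEC": [-1.0, -1.1], "Z": [0.21, 0.35]})
survey_b = pd.DataFrame({"RA": [150.3], "DEC": [2.2], "Z": [0.48]})

# Tag every row with its survey of origin, then stack the tables.
# "SURVEY" is a hypothetical column name chosen for this example.
catalogs = {"survey_a": survey_a, "survey_b": survey_b}
combined = pd.concat(
    [table.assign(SURVEY=name) for name, table in catalogs.items()],
    ignore_index=True,
)
print(combined)
```

Note that this simply concatenates rows; objects observed by more than one survey would appear more than once.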

Let’s see an example of a Spec-z Catalog:

gama = pz_server.get_product(14)
Connecting to PZ Server...
Done!
gama.display_metadata()
key value
id 14
release None
product_type Spec-z Catalog
uploaded_by gschwend
internal_name 14_gama_specz_subsample
product_name GAMA spec-z subsample
official_product False
pz_code
description A small subsample of the GAMA DR3 spec-z catalog (Baldry et al. 2018) as an example of a typical spec-z catalog from the literature.
created_at 2023-03-29T20:02:45.223568Z
main_file specz_subsample_gama_example.csv

Display basic statistics

gama.data.describe()
ID RA DEC Z ERR_Z FLAG_DES
count 2.576000e+03 2576.000000 2576.000000 2576.000000 2576.0 2576.000000
mean 1.105526e+06 154.526343 -1.101865 0.224811 99.0 3.949534
std 4.006668e+04 70.783868 2.995036 0.102571 0.0 0.218947
... ... ... ... ... ... ...
50% 1.103558e+06 180.140145 -0.480830 0.217804 99.0 4.000000
75% 1.140619e+06 215.836583 1.170363 0.291810 99.0 4.000000
max 1.176440e+06 223.497080 2.998180 0.728717 99.0 4.000000

8 rows × 6 columns
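In the statistics above, ERR_Z has a mean of 99.0 and zero standard deviation, so every row holds the same value. That pattern suggests a placeholder rather than a measured error, although this is an inference from the table and not something stated in the catalog metadata. Under that assumption, one way to keep the placeholder out of later statistics is to mask it, shown here on a small made-up DataFrame:

```python
import numpy as np
import pandas as pd

# Assumed placeholder value, inferred from the describe() output above.
SENTINEL = 99.0

# Made-up rows that mimic the column layout of the example catalog.
df = pd.DataFrame({"Z": [0.12, 0.22, 0.29], "ERR_Z": [99.0, 0.0004, 99.0]})

# Replace the placeholder with NaN so pandas skips it in count, mean, etc.
df["ERR_Z"] = df["ERR_Z"].replace(SENTINEL, np.nan)
print(df["ERR_Z"].describe())
```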

The spec-z catalog object has a very basic plot method for quick visualization of catalog properties. For advanced interactive data visualization tips, we recommend the notebook DP02_06b_Interactive_Catalog_Visualization.ipynb from Rubin Observatory’s DP0.2 tutorial-notebooks repository.

gama.plot()
../../_images/output_20_0.png

The data attribute, which is a pandas DataFrame, preserves the plot method from pandas.

gama.data.plot(x="RA", y="DEC", kind="scatter")
<Axes: xlabel='RA', ylabel='DEC'>
../../_images/output_22_1.png

User feedback

Is something important missing? Please open an issue in the PZ Server library repository on GitHub.