hipscatalog_gen.io package
Submodules
hipscatalog_gen.io.input module
Input readers for Parquet/CSV/TSV and HATS/LSDB catalogs.
- compute_column_report_sample(ddf_like, sample_rows=200000)[source]
Build a small column summary from a sample.
Uses sampling to keep the computation fast and scalable. Works with Dask DataFrames and LSDB catalogs.
- Parameters:
ddf_like (Any) – Dask-like collection or LSDB catalog.
sample_rows (int) – Approximate maximum number of rows to materialize.
- Returns:
Nested dict with basic column statistics and examples.
- Return type:
Dict
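The sampling idea can be sketched in plain Python. This is an illustration only, not the package's implementation: the helper name summarize_sample, its list-of-dicts input, and the keys of the returned dict are assumptions made for the example, whereas the real function works on Dask DataFrames or LSDB catalogs.

```python
from itertools import islice
from typing import Any, Dict, Iterable


def summarize_sample(rows: Iterable[Dict[str, Any]], sample_rows: int = 200_000,
                     n_examples: int = 3) -> Dict[str, Dict[str, Any]]:
    """Summarize columns from at most `sample_rows` rows (hypothetical helper)."""
    report: Dict[str, Dict[str, Any]] = {}
    # islice stops early, so only the sample is ever read.
    for row in islice(rows, sample_rows):
        for name, value in row.items():
            col = report.setdefault(name, {"n_rows": 0, "n_null": 0, "examples": []})
            col["n_rows"] += 1
            if value is None:
                col["n_null"] += 1
            elif len(col["examples"]) < n_examples:
                col["examples"].append(value)
    return report


rows = [{"ra": 10.5, "mag": 18.2}, {"ra": 11.0, "mag": None}, {"ra": 12.3, "mag": 17.9}]
sample_report = summarize_sample(rows, sample_rows=2)
```

Because only the first sample_rows rows are visited, the cost is bounded by the sample size rather than by the catalog size.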
- compute_column_report_global(ddf_like)[source]
Build a column summary using global Dask-based statistics.
Computes min, max, mean and null counts using a single Dask graph.
- Parameters:
ddf_like (Any) – Dask-like collection or LSDB catalog.
- Returns:
Nested dict with global column statistics and examples.
- Return type:
Dict
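Computing all statistics in one graph means the data is traversed once rather than once per statistic. A minimal single-pass sketch of the same idea in plain Python follows; global_numeric_stats and its return keys are hypothetical and do not reflect the package's actual Dask code.

```python
from typing import Any, Dict, Iterable, Optional


def global_numeric_stats(values: Iterable[Optional[float]]) -> Dict[str, Any]:
    """One pass over a column: min, max, mean, and null count (hypothetical helper)."""
    count = 0
    nulls = 0
    total = 0.0
    lo: Optional[float] = None
    hi: Optional[float] = None
    for v in values:
        if v is None:
            nulls += 1
            continue
        count += 1
        total += v
        lo = v if lo is None or v < lo else lo
        hi = v if hi is None or v > hi else hi
    # The mean is undefined when every value is null.
    mean = total / count if count else None
    return {"min": lo, "max": hi, "mean": mean, "n_null": nulls}


stats = global_numeric_stats([18.0, None, 17.0, 19.0])
```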
hipscatalog_gen.io.output module
Writers for HiPS tiles, metadata, MOC, and density map products.
- class TSVTileWriter(out_dir, depth, header_line)[source]
Bases: object
Helper for writing HiPS catalogue tiles in TSV format.
- Parameters:
out_dir (Path) – HiPS root output directory.
depth (int) – HiPS order (NorderX).
header_line (str) – Header line for TSV tiles (without completeness line).
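The NorderX naming refers to the HiPS directory layout, in which each tile lives under Norder<order>/Dir<D>/Npix<N>, with D the tile index rounded down to a multiple of 10000. The sketch below shows that layout; tile_path is a hypothetical helper, and whether TSVTileWriter builds its paths this way internally is an assumption.

```python
from pathlib import Path


def tile_path(out_dir: Path, depth: int, ipix: int, ext: str = "tsv") -> Path:
    """Return the HiPS tile path for HEALPix cell `ipix` at order `depth`."""
    # HiPS groups tiles into directories of 10000 cells each.
    dir_index = (ipix // 10000) * 10000
    return out_dir / f"Norder{depth}" / f"Dir{dir_index}" / f"Npix{ipix}.{ext}"


p = tile_path(Path("hips_out"), depth=7, ipix=12345)
```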
- write_properties(out_dir, output_cfg, level_limit, n_src, tile_format='tsv')[source]
Write HiPS ‘properties’ file for a catalogue HiPS.
- Parameters:
out_dir (Path) – HiPS root output directory.
output_cfg (OutputCfg) – Output configuration object.
level_limit (int) – Deepest HiPS order.
n_src (int) – Total number of catalogue sources.
tile_format (str) – Tile format string (usually “tsv”).
- Return type:
None
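A HiPS properties file is a plain-text list of "key = value" lines. The sketch below writes a small one to show how level_limit, n_src, and tile_format could map onto it. The function name, the title argument, the equatorial frame, and the exact set of keywords are assumptions for illustration; the keywords the package actually emits may differ.

```python
import tempfile
from pathlib import Path


def write_properties_sketch(out_dir: Path, title: str, level_limit: int,
                            n_src: int, tile_format: str = "tsv") -> Path:
    """Write a minimal HiPS-style properties file (hypothetical helper)."""
    props = {
        "obs_title": title,
        "dataproduct_type": "catalog",
        "hips_version": "1.4",
        "hips_frame": "equatorial",        # assumed frame
        "hips_order": str(level_limit),    # deepest HiPS order
        "hips_tile_format": tile_format,
        "hips_cat_nrows": str(n_src),      # total number of sources
    }
    path = out_dir / "properties"
    lines = [f"{key:<20} = {value}" for key, value in props.items()]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    props_path = write_properties_sketch(Path(tmp), "Demo catalogue", 7, 1000)
    props_text = props_path.read_text(encoding="utf-8")

# Parse the file back into a dict to check the round trip.
parsed = {k.strip(): v.strip() for k, v in
          (line.split("=", 1) for line in props_text.splitlines())}
```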
- write_arguments(out_dir, args_text)[source]
Write command-line arguments used to run the pipeline.
- Parameters:
out_dir (Path) – HiPS root output directory.
args_text (str) – Text blob to persist under arguments.
- Return type:
None
- write_metadata_xml(out_dir, columns, ra_idx, dec_idx)[source]
Write VOTable metadata (metadata.xml and Metadata.xml).
Marks RA/DEC columns with appropriate UCDs.
- Parameters:
out_dir (Path) – HiPS root output directory.
columns (List[tuple[str, str, str | None]]) – List of (name, dtype, ucd) tuples.
ra_idx (int) – Index of RA column in columns.
dec_idx (int) – Index of DEC column in columns.
- Return type:
None
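To illustrate how (name, dtype, ucd) tuples and the RA/DEC indices could turn into VOTable FIELD elements, here is a minimal sketch using the standard library. The function name, the bare VOTABLE skeleton without namespaces, and the choice of pos.eq.ra;meta.main and pos.eq.dec;meta.main as the position UCDs are assumptions; the dtype strings are passed through as if they were already VOTable datatypes.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def build_metadata_votable(columns: List[Tuple[str, str, Optional[str]]],
                           ra_idx: int, dec_idx: int) -> str:
    """Return a small VOTable header with one FIELD per column (hypothetical helper)."""
    votable = ET.Element("VOTABLE", {"version": "1.3"})
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE")
    for i, (name, dtype, ucd) in enumerate(columns):
        # Position columns get a main-position UCD regardless of the input.
        if i == ra_idx:
            ucd = "pos.eq.ra;meta.main"
        elif i == dec_idx:
            ucd = "pos.eq.dec;meta.main"
        attrs = {"name": name, "datatype": dtype}
        if ucd is not None:
            attrs["ucd"] = ucd
        ET.SubElement(table, "FIELD", attrs)
    return ET.tostring(votable, encoding="unicode")


xml_text = build_metadata_votable(
    [("ra", "double", None), ("dec", "double", None), ("mag", "float", None)],
    ra_idx=0, dec_idx=1,
)
fields = list(ET.fromstring(xml_text).iter("FIELD"))
```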
- write_moc(out_dir, moc_order, dens_counts)[source]
Build and write MOC from densmap counts.
Outputs both FITS (Moc.fits) and JSON (Moc.json) representations.
- Parameters:
out_dir (Path) – HiPS root output directory.
moc_order (int) – HEALPix order used for the MOC.
dens_counts (ndarray) – Densmap counts at the MOC order.
- Raises:
RuntimeError – If MOC construction fails for all attempted mocpy builders.
- Return type:
None
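The relation between density counts and a MOC can be sketched without mocpy: every cell with a non-zero count belongs to the coverage, and the MOC JSON serialization maps an order (as a string key) to a list of cell indices. The helper below is hypothetical and keeps all cells at a single order; it does not merge sibling cells into their parents as a normalized MOC would, and it assumes a full-sky counts array.

```python
import json
from typing import Dict, List, Sequence


def moc_json_from_counts(moc_order: int, dens_counts: Sequence[int]) -> Dict[str, List[int]]:
    """Build a single-order MOC JSON dict from per-cell counts (hypothetical helper)."""
    expected = 12 * 4 ** moc_order  # number of HEALPix cells at this order
    if len(dens_counts) != expected:
        raise ValueError(f"expected {expected} counts, got {len(dens_counts)}")
    cells = [ipix for ipix, n in enumerate(dens_counts) if n > 0]
    return {str(moc_order): cells}


counts = [0] * 12
counts[3] = 5
counts[7] = 1
moc = moc_json_from_counts(0, counts)
moc_text = json.dumps(moc)
```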
- write_densmap_fits(out_dir, depth, counts)[source]
Write densmap_o<depth>.fits for depths < 13.
- Parameters:
out_dir (Path) – HiPS root output directory.
depth (int) – HEALPix order (depth).
counts (ndarray) – Counts per pixel at this depth.
- Return type:
None
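A HEALPix map at order depth has 12 * 4**depth cells, so the counts array grows by a factor of four with each order. The sketch below shows that arithmetic together with a guard mirroring the "depths < 13" condition; should_write_densmap is a hypothetical helper, and the assumption that counts is a full-sky array is mine, not the documentation's.

```python
def npix_for_order(depth: int) -> int:
    """Number of HEALPix cells at a given order: 12 * 4**depth."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return 12 * 4 ** depth


def should_write_densmap(depth: int, n_counts: int, depth_limit: int = 13) -> bool:
    """Hypothetical guard: write only below `depth_limit`, with one count per cell."""
    if depth >= depth_limit:
        return False
    if n_counts != npix_for_order(depth):
        raise ValueError(f"expected {npix_for_order(depth)} counts, got {n_counts}")
    return True


sizes = {d: npix_for_order(d) for d in (0, 1, 12)}
```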
- write_index_html(out_dir, output_cfg)[source]
Write a simple index.html page for quick local catalog inspection.
- Parameters:
out_dir (Path)
output_cfg (OutputCfg)
- Return type:
None
- finalize_write_tiles(out_dir, depth, header_line, ra_col, dec_col, counts, selected, order_desc, allsky_collect=False)[source]
Write one TSV per HEALPix cell and build optional Allsky dataframe.
- The function:
Writes a completeness header line.
Writes a single header line per tile.
Writes rows in the same column order as the header.
Uses atomic rename to avoid partial tiles.
- Parameters:
out_dir (Path) – HiPS root output directory.
depth (int) – HiPS order.
header_line (str) – Header line for tiles.
ra_col (str) – RA column name (unused here but kept for interface stability).
dec_col (str) – DEC column name (unused here but kept for interface stability).
counts (ndarray) – Densmap counts for this depth.
selected (DataFrame) – DataFrame with selected rows for this depth.
order_desc (bool) – Whether scores were sorted in descending order (unused here).
allsky_collect (bool) – If True, return a concatenated Allsky dataframe.
- Returns:
Tuple (written, allsky_df), where:
written: Mapping ipix -> number of rows written.
allsky_df: DataFrame with all rows (if allsky_collect=True), else None.
- Return type:
Tuple
- Raises:
OSError – If tile files cannot be written or renamed.
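The write-then-rename pattern listed above can be sketched as follows. This is not the package's code: write_tile_atomic, the placeholder completeness line, and the row format are assumptions. The temporary file is created in the tile's own directory so that os.replace stays on one filesystem, and readers see either the old tile or the complete new one, never a partial file.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable, Sequence


def write_tile_atomic(path: Path, completeness_line: str, header_line: str,
                      rows: Iterable[Sequence[object]]) -> int:
    """Write one TSV tile via a temp file and atomic rename (hypothetical helper)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    n_rows = 0
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as fh:
            fh.write(completeness_line.rstrip("\n") + "\n")
            fh.write(header_line.rstrip("\n") + "\n")
            for row in rows:
                # Values are written in the same order as the header columns.
                fh.write("\t".join(str(v) for v in row) + "\n")
                n_rows += 1
        os.replace(tmp_name, path)  # atomic when source and target share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # never leave a partial tile behind
        raise
    return n_rows


with tempfile.TemporaryDirectory() as tmp:
    tile = Path(tmp) / "Norder7" / "Dir10000" / "Npix12345.tsv"
    n_written = write_tile_atomic(tile, "# placeholder completeness line",
                                  "ra\tdec", [(10.5, -3.2), (10.6, -3.1)])
    tile_lines = tile.read_text(encoding="utf-8").splitlines()
    leftovers = [p.name for p in tile.parent.iterdir() if p.suffix == ".tmp"]
```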
Module contents
Input loaders and HiPS output writers.
- compute_column_report_global(ddf_like)[source]
Build a column summary using global Dask-based statistics.
Computes min, max, mean and null counts using a single Dask graph.
- Parameters:
ddf_like (Any) – Dask-like collection or LSDB catalog.
- Returns:
Nested dict with global column statistics and examples.
- Return type:
Dict
- compute_column_report_sample(ddf_like, sample_rows=200000)[source]
Build a small column summary from a sample.
Uses sampling to keep the computation fast and scalable. Works with Dask DataFrames and LSDB catalogs.
- Parameters:
ddf_like (Any) – Dask-like collection or LSDB catalog.
sample_rows (int) – Approximate maximum number of rows to materialize.
- Returns:
Nested dict with basic column statistics and examples.
- Return type:
Dict
- class TSVTileWriter(out_dir, depth, header_line)[source]
Bases: object
Helper for writing HiPS catalogue tiles in TSV format.
- Parameters:
out_dir (Path) – HiPS root output directory.
depth (int) – HiPS order (NorderX).
header_line (str) – Header line for TSV tiles (without completeness line).
- build_header_line_from_keep(keep_cols)[source]
Build header line from a list of column names.
- Parameters:
keep_cols (List[str])
- Return type:
str
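Since the tiles are TSV, a header line is presumably the kept column names joined by tabs. The sketch below shows that under stated assumptions: header_line_sketch is a hypothetical stand-in, and both the tab separator and the absence of a trailing newline are guesses about the real function's behaviour.

```python
from typing import List


def header_line_sketch(keep_cols: List[str]) -> str:
    """Join column names with tabs to form a TSV header (hypothetical helper)."""
    for name in keep_cols:
        # A tab inside a name would silently shift every following column.
        if "\t" in name:
            raise ValueError(f"column name contains a tab: {name!r}")
    return "\t".join(keep_cols)


header = header_line_sketch(["ra", "dec", "mag_r"])
```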
- finalize_write_tiles(out_dir, depth, header_line, ra_col, dec_col, counts, selected, order_desc, allsky_collect=False)[source]
Write one TSV per HEALPix cell and build optional Allsky dataframe.
- The function:
Writes a completeness header line.
Writes a single header line per tile.
Writes rows in the same column order as the header.
Uses atomic rename to avoid partial tiles.
- Parameters:
out_dir (Path) – HiPS root output directory.
depth (int) – HiPS order.
header_line (str) – Header line for tiles.
ra_col (str) – RA column name (unused here but kept for interface stability).
dec_col (str) – DEC column name (unused here but kept for interface stability).
counts (ndarray) – Densmap counts for this depth.
selected (DataFrame) – DataFrame with selected rows for this depth.
order_desc (bool) – Whether scores were sorted in descending order (unused here).
allsky_collect (bool) – If True, return a concatenated Allsky dataframe.
- Returns:
Tuple (written, allsky_df), where:
written: Mapping ipix -> number of rows written.
allsky_df: DataFrame with all rows (if allsky_collect=True), else None.
- Return type:
Tuple
- Raises:
OSError – If tile files cannot be written or renamed.
- write_arguments(out_dir, args_text)[source]
Write command-line arguments used to run the pipeline.
- Parameters:
out_dir (Path) – HiPS root output directory.
args_text (str) – Text blob to persist under arguments.
- Return type:
None
- write_densmap_fits(out_dir, depth, counts)[source]
Write densmap_o<depth>.fits for depths < 13.
- Parameters:
out_dir (Path) – HiPS root output directory.
depth (int) – HEALPix order (depth).
counts (ndarray) – Counts per pixel at this depth.
- Return type:
None
- write_index_html(out_dir, output_cfg)[source]
Write a simple index.html page for quick local catalog inspection.
- Parameters:
out_dir (Path)
output_cfg (OutputCfg)
- Return type:
None
- write_metadata_xml(out_dir, columns, ra_idx, dec_idx)[source]
Write VOTable metadata (metadata.xml and Metadata.xml).
Marks RA/DEC columns with appropriate UCDs.
- Parameters:
out_dir (Path) – HiPS root output directory.
columns (List[tuple[str, str, str | None]]) – List of (name, dtype, ucd) tuples.
ra_idx (int) – Index of RA column in columns.
dec_idx (int) – Index of DEC column in columns.
- Return type:
None
- write_moc(out_dir, moc_order, dens_counts)[source]
Build and write MOC from densmap counts.
Outputs both FITS (Moc.fits) and JSON (Moc.json) representations.
- Parameters:
out_dir (Path) – HiPS root output directory.
moc_order (int) – HEALPix order used for the MOC.
dens_counts (ndarray) – Densmap counts at the MOC order.
- Raises:
RuntimeError – If MOC construction fails for all attempted mocpy builders.
- Return type:
None
- write_properties(out_dir, output_cfg, level_limit, n_src, tile_format='tsv')[source]
Write HiPS ‘properties’ file for a catalogue HiPS.
- Parameters:
out_dir (Path) – HiPS root output directory.
output_cfg (OutputCfg) – Output configuration object.
level_limit (int) – Deepest HiPS order.
n_src (int) – Total number of catalogue sources.
tile_format (str) – Tile format string (usually “tsv”).
- Return type:
None