hipscatalog_gen.io package

Submodules

hipscatalog_gen.io.input module

Input readers for Parquet/CSV/TSV and HATS/LSDB catalogs.

compute_column_report_sample(ddf_like, sample_rows=200000)

Build a small column summary from a sample.

Uses sampling to keep the computation fast and scalable. Works with Dask DataFrames and LSDB catalogs.

Parameters:
  • ddf_like (Any) – Dask-like collection or LSDB catalog.

  • sample_rows (int) – Approximate maximum number of rows to materialize.

Returns:

Nested dict with basic column statistics and examples.

Return type:

Dict
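
As a rough illustration of sample-based reporting, the sketch below caps the number of materialized rows before summarizing columns. It works on plain Python row dicts rather than a Dask or LSDB collection, and the helper name column_report_from_sample and the report keys (n_null, examples) are hypothetical, not the package's actual output format.

```python
import random
from typing import Any, Dict, Iterable, List


def column_report_from_sample(
    rows: Iterable[Dict[str, Any]], sample_rows: int = 200_000, seed: int = 0
) -> Dict[str, Dict[str, Any]]:
    """Summarize columns from at most `sample_rows` rows (hypothetical helper)."""
    rng = random.Random(seed)
    sample: List[Dict[str, Any]] = []
    # Reservoir sampling: keeps a uniform sample without holding every row.
    for i, row in enumerate(rows):
        if len(sample) < sample_rows:
            sample.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < sample_rows:
                sample[j] = row

    # Nested dict: column name -> null count and a few example values.
    report: Dict[str, Dict[str, Any]] = {}
    for row in sample:
        for name, value in row.items():
            col = report.setdefault(name, {"n_null": 0, "examples": []})
            if value is None:
                col["n_null"] += 1
            elif len(col["examples"]) < 3:
                col["examples"].append(value)
    return report
```

Because only the sample is held in memory, the cost of the summary step depends on sample_rows rather than on the catalog size.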

compute_column_report_global(ddf_like)

Build a column summary using global Dask-based statistics.

Computes min, max, mean and null counts using a single Dask graph.

Parameters:

ddf_like (Any) – Dask-like collection or LSDB catalog.

Returns:

Nested dict with global column statistics and examples.

Return type:

Dict
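
The statistics named above (min, max, mean, null count) can all be accumulated in one pass over a column. The sketch below is a plain-Python analogue under that assumption; global_numeric_stats is a hypothetical name, and the real function evaluates the aggregations through Dask rather than a Python loop.

```python
import math
from typing import Dict, Iterable, Optional


def global_numeric_stats(values: Iterable[Optional[float]]) -> Dict[str, float]:
    """One-pass min, max, mean and null count for a single numeric column."""
    n, n_null, total = 0, 0, 0.0
    lo, hi = math.inf, -math.inf
    for v in values:
        # Treat both None and NaN as nulls (an assumption of this sketch).
        if v is None or (isinstance(v, float) and math.isnan(v)):
            n_null += 1
            continue
        n += 1
        total += v
        lo, hi = min(lo, v), max(hi, v)
    if n == 0:
        # No valid values: report NaN instead of the +/-inf sentinels.
        return {"min": math.nan, "max": math.nan, "mean": math.nan, "n_null": n_null}
    return {"min": lo, "max": hi, "mean": total / n, "n_null": n_null}
```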

hipscatalog_gen.io.output module

Writers for HiPS tiles, metadata, MOC, and density map products.

class TSVTileWriter(out_dir, depth, header_line)

Bases: object

Helper for writing HiPS catalogue tiles in TSV format.

Parameters:
  • out_dir (Path) – HiPS root output directory.

  • depth (int) – HiPS order (NorderX).

  • header_line (str) – Header line for TSV tiles (without completeness line).

allsky_tmp()

Return temporary path for Allsky.tsv.

Return type:

Path

allsky_path()

Return final path for Allsky.tsv.

Return type:

Path

cell_tmp(ipix)

Return temporary path for a given Npix tile.

Parameters:

ipix (int)

Return type:

Path

cell_path(ipix)

Return final path for a given Npix tile.

Parameters:

ipix (int)

Return type:

Path
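
For orientation, the standard HiPS layout stores each tile under Norder<depth>/Dir<D>/Npix<ipix>, where D groups tiles in blocks of 10000. The sketch below reproduces that layout with pathlib. The free functions hips_cell_path and hips_cell_tmp are hypothetical stand-ins for the methods above, and the ".tmp" suffix for temporary files is an assumption, not confirmed by this documentation.

```python
from pathlib import Path


def hips_cell_path(out_dir: Path, depth: int, ipix: int, ext: str = "tsv") -> Path:
    """Final tile path following the standard HiPS directory layout."""
    dir_index = (ipix // 10_000) * 10_000  # tiles are grouped in blocks of 10000
    return out_dir / f"Norder{depth}" / f"Dir{dir_index}" / f"Npix{ipix}.{ext}"


def hips_cell_tmp(out_dir: Path, depth: int, ipix: int, ext: str = "tsv") -> Path:
    """Temporary sibling of the final path (the '.tmp' suffix is assumed)."""
    final = hips_cell_path(out_dir, depth, ipix, ext)
    return final.with_name(final.name + ".tmp")
```

Keeping the temporary file in the same directory as the final one means the later rename stays on a single filesystem.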

write_properties(out_dir, output_cfg, level_limit, n_src, tile_format='tsv')

Write HiPS ‘properties’ file for a catalogue HiPS.

Parameters:
  • out_dir (Path) – HiPS root output directory.

  • output_cfg (OutputCfg) – Output configuration object.

  • level_limit (int) – Deepest HiPS order.

  • n_src (int) – Total number of catalogue sources.

  • tile_format (str) – Tile format string (usually “tsv”).

Return type:

None
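
A HiPS properties file is a plain-text list of "key = value" lines. The sketch below renders a few keywords from the HiPS standard that plausibly correspond to the parameters above; the mapping of level_limit, n_src and tile_format to specific keywords is an assumption, render_hips_properties is a hypothetical helper, and the identifier and title values are made up for illustration.

```python
from typing import Dict


def render_hips_properties(fields: Dict[str, object]) -> str:
    """Render HiPS properties as aligned 'key = value' lines."""
    width = max(len(key) for key in fields)
    return "".join(f"{key:<{width}} = {value}\n" for key, value in fields.items())


example = render_hips_properties(
    {
        "creator_did": "ivo://example/P/my-catalog",  # hypothetical identifier
        "obs_title": "My catalogue",  # hypothetical title
        "dataproduct_type": "catalog",
        "hips_frame": "equatorial",
        "hips_order": 11,  # assumed to come from level_limit
        "hips_tile_format": "tsv",  # assumed to come from tile_format
        "hips_cat_nrows": 1_234_567,  # assumed to come from n_src
    }
)
print(example)
```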

write_arguments(out_dir, args_text)

Write command-line arguments used to run the pipeline.

Parameters:
  • out_dir (Path) – HiPS root output directory.

  • args_text (str) – Text blob to persist under arguments.

Return type:

None

write_metadata_xml(out_dir, columns, ra_idx, dec_idx)

Write VOTable metadata (metadata.xml and Metadata.xml).

Marks RA/DEC columns with appropriate UCDs.

Parameters:
  • out_dir (Path) – HiPS root output directory.

  • columns (List[tuple[str, str, str | None]]) – List of (name, dtype, ucd) tuples.

  • ra_idx (int) – Index of RA column in columns.

  • dec_idx (int) – Index of DEC column in columns.

Return type:

None
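
To show what marking RA/DEC with UCDs can look like, the sketch below builds a minimal VOTable skeleton with one FIELD per column. It assumes the dtype strings are already VOTable datatype names (for example "double" or "char") and that the positional UCDs are pos.eq.ra;meta.main and pos.eq.dec;meta.main; build_metadata_votable is a hypothetical helper and the real function may emit a richer document.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def build_metadata_votable(
    columns: List[Tuple[str, str, Optional[str]]], ra_idx: int, dec_idx: int
) -> str:
    """Return a minimal VOTable document describing the catalogue columns."""
    votable = ET.Element("VOTABLE", {"version": "1.3"})
    table = ET.SubElement(ET.SubElement(votable, "RESOURCE"), "TABLE")
    for i, (name, dtype, ucd) in enumerate(columns):
        # Positional columns get the main-position UCDs (assumed values).
        if i == ra_idx:
            ucd = "pos.eq.ra;meta.main"
        elif i == dec_idx:
            ucd = "pos.eq.dec;meta.main"
        attrs = {"name": name, "datatype": dtype}
        if dtype == "char":
            attrs["arraysize"] = "*"  # variable-length string
        if ucd:
            attrs["ucd"] = ucd
        ET.SubElement(table, "FIELD", attrs)
    return ET.tostring(votable, encoding="unicode")
```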

write_moc(out_dir, moc_order, dens_counts)

Build and write MOC from densmap counts.

Outputs both FITS (Moc.fits) and JSON (Moc.json) representations.

Parameters:
  • out_dir (Path) – HiPS root output directory.

  • moc_order (int) – HEALPix order used for the MOC.

  • dens_counts (ndarray) – Densmap counts at the MOC order.

Raises:

RuntimeError – If MOC construction fails for all attempted mocpy builders.

Return type:

None
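
Conceptually, a MOC derived from a density map is the set of HEALPix cells with at least one source. The sketch below builds the JSON form (a mapping from order to a list of cell indices) using only the standard library. It does not use mocpy, does not merge complete sibling groups into coarser cells, and assumes dens_counts holds one value per cell at moc_order in NESTED numbering; moc_json_from_counts is a hypothetical name.

```python
import json
from typing import Sequence


def moc_json_from_counts(moc_order: int, dens_counts: Sequence[int]) -> str:
    """Serialize the non-empty cells of a density map as a JSON MOC."""
    expected = 12 * 4**moc_order  # number of HEALPix cells at this order
    if len(dens_counts) != expected:
        raise ValueError(f"expected {expected} counts, got {len(dens_counts)}")
    cells = [ipix for ipix, n in enumerate(dens_counts) if n > 0]
    return json.dumps({str(moc_order): cells})


counts = [0] * 12  # order 0 has 12 cells
counts[3], counts[7] = 5, 1
print(moc_json_from_counts(0, counts))  # {"0": [3, 7]}
```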

write_densmap_fits(out_dir, depth, counts)

Write densmap_o<depth>.fits for depths < 13.

Parameters:
  • out_dir (Path) – HiPS root output directory.

  • depth (int) – HEALPix order (depth).

  • counts (ndarray) – Counts per pixel at this depth.

Return type:

None
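
For a sense of scale, a full-sky HEALPix map at a given order has 12 * 4**order pixels, so the counts array quadruples with each extra depth. The short sketch below prints that size for a few depths; it assumes counts holds one entry per pixel, which this documentation implies but does not state outright.

```python
def healpix_npix(depth: int) -> int:
    """Number of HEALPix pixels on the full sky at a given order."""
    return 12 * 4**depth


for depth in (0, 8, 12):
    print(depth, healpix_npix(depth))
# 0 12
# 8 786432
# 12 201326592
```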

write_index_html(out_dir, output_cfg)

Write a simple index.html page for quick local catalog inspection.

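Parameters:
  • out_dir (Path) – HiPS root output directory.

  • output_cfg (OutputCfg) – Output configuration object.

Return type: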

None

finalize_write_tiles(out_dir, depth, header_line, ra_col, dec_col, counts, selected, order_desc, allsky_collect=False)

Write one TSV file per HEALPix cell and optionally build an Allsky DataFrame.

The function:
  • Writes a completeness header line.

  • Writes a single header line per tile.

  • Writes rows in the same column order as the header.

  • Uses atomic rename to avoid partial tiles.

Parameters:
  • out_dir (Path) – HiPS root output directory.

  • depth (int) – HiPS order.

  • header_line (str) – Header line for tiles.

  • ra_col (str) – RA column name (unused here but kept for interface stability).

  • dec_col (str) – DEC column name (unused here but kept for interface stability).

  • counts (ndarray) – Densmap counts for this depth.

  • selected (DataFrame) – DataFrame with selected rows for this depth.

  • order_desc (bool) – Whether scores were sorted in descending order (unused here).

  • allsky_collect (bool) – If True, return a concatenated Allsky dataframe.

Returns:

Tuple (written, allsky_df), where written is a mapping ipix -> number of rows written and allsky_df is a DataFrame with all rows if allsky_collect=True, else None.

Return type:

Tuple

Raises:

OSError – If tile files cannot be written or renamed.
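
The write steps listed above can be sketched as follows: write the completeness line, the header and the rows to a temporary file, then rename it over the final path. The helper name write_tile_atomically, the ".tmp" suffix and the exact "# Completeness = n / total" wording are assumptions made for illustration, not the package's confirmed behaviour.

```python
import os
from pathlib import Path
from typing import Iterable, Sequence


def write_tile_atomically(
    final_path: Path,
    header_line: str,
    rows: Iterable[Sequence[object]],
    n_written: int,
    n_total: int,
) -> None:
    """Write a TSV tile to a temporary file, then rename it into place."""
    final_path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = final_path.with_name(final_path.name + ".tmp")
    with open(tmp_path, "w", encoding="utf-8", newline="\n") as fh:
        # Completeness line first (format assumed), then one header line.
        fh.write(f"# Completeness = {n_written} / {n_total}\n")
        fh.write(header_line.rstrip("\n") + "\n")
        for row in rows:
            # Values are expected in the same column order as the header.
            fh.write("\t".join(str(v) for v in row) + "\n")
    # os.replace is atomic when source and target share a filesystem,
    # so readers never observe a partially written tile.
    os.replace(tmp_path, final_path)
```

If writing fails midway, only the temporary file is affected and any previously written tile at the final path is left intact.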

build_header_line_from_keep(keep_cols)

Build header line from a list of column names.

Parameters:

keep_cols (List[str])

Return type:

str

Module contents

Input loaders and HiPS output writers.

compute_column_report_global(ddf_like)

Build a column summary using global Dask-based statistics.

Computes min, max, mean and null counts using a single Dask graph.

Parameters:

ddf_like (Any) – Dask-like collection or LSDB catalog.

Returns:

Nested dict with global column statistics and examples.

Return type:

Dict

compute_column_report_sample(ddf_like, sample_rows=200000)

Build a small column summary from a sample.

Uses sampling to keep the computation fast and scalable. Works with Dask DataFrames and LSDB catalogs.

Parameters:
  • ddf_like (Any) – Dask-like collection or LSDB catalog.

  • sample_rows (int) – Approximate maximum number of rows to materialize.

Returns:

Nested dict with basic column statistics and examples.

Return type:

Dict

class TSVTileWriter(out_dir, depth, header_line)

Bases: object

Helper for writing HiPS catalogue tiles in TSV format.

Parameters:
  • out_dir (Path) – HiPS root output directory.

  • depth (int) – HiPS order (NorderX).

  • header_line (str) – Header line for TSV tiles (without completeness line).

allsky_tmp()

Return temporary path for Allsky.tsv.

Return type:

Path

allsky_path()

Return final path for Allsky.tsv.

Return type:

Path

cell_tmp(ipix)

Return temporary path for a given Npix tile.

Parameters:

ipix (int)

Return type:

Path

cell_path(ipix)

Return final path for a given Npix tile.

Parameters:

ipix (int)

Return type:

Path

build_header_line_from_keep(keep_cols)

Build header line from a list of column names.

Parameters:

keep_cols (List[str])

Return type:

str

finalize_write_tiles(out_dir, depth, header_line, ra_col, dec_col, counts, selected, order_desc, allsky_collect=False)

Write one TSV file per HEALPix cell and optionally build an Allsky DataFrame.

The function:
  • Writes a completeness header line.

  • Writes a single header line per tile.

  • Writes rows in the same column order as the header.

  • Uses atomic rename to avoid partial tiles.

Parameters:
  • out_dir (Path) – HiPS root output directory.

  • depth (int) – HiPS order.

  • header_line (str) – Header line for tiles.

  • ra_col (str) – RA column name (unused here but kept for interface stability).

  • dec_col (str) – DEC column name (unused here but kept for interface stability).

  • counts (ndarray) – Densmap counts for this depth.

  • selected (DataFrame) – DataFrame with selected rows for this depth.

  • order_desc (bool) – Whether scores were sorted in descending order (unused here).

  • allsky_collect (bool) – If True, return a concatenated Allsky dataframe.

Returns:

Tuple (written, allsky_df), where written is a mapping ipix -> number of rows written and allsky_df is a DataFrame with all rows if allsky_collect=True, else None.

Return type:

Tuple

Raises:

OSError – If tile files cannot be written or renamed.

write_arguments(out_dir, args_text)

Write command-line arguments used to run the pipeline.

Parameters:
  • out_dir (Path) – HiPS root output directory.

  • args_text (str) – Text blob to persist under arguments.

Return type:

None

write_densmap_fits(out_dir, depth, counts)

Write densmap_o<depth>.fits for depths < 13.

Parameters:
  • out_dir (Path) – HiPS root output directory.

  • depth (int) – HEALPix order (depth).

  • counts (ndarray) – Counts per pixel at this depth.

Return type:

None

write_index_html(out_dir, output_cfg)

Write a simple index.html page for quick local catalog inspection.

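Parameters:
  • out_dir (Path) – HiPS root output directory.

  • output_cfg (OutputCfg) – Output configuration object.

Return type: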

None

write_metadata_xml(out_dir, columns, ra_idx, dec_idx)

Write VOTable metadata (metadata.xml and Metadata.xml).

Marks RA/DEC columns with appropriate UCDs.

Parameters:
  • out_dir (Path) – HiPS root output directory.

  • columns (List[tuple[str, str, str | None]]) – List of (name, dtype, ucd) tuples.

  • ra_idx (int) – Index of RA column in columns.

  • dec_idx (int) – Index of DEC column in columns.

Return type:

None

write_moc(out_dir, moc_order, dens_counts)

Build and write MOC from densmap counts.

Outputs both FITS (Moc.fits) and JSON (Moc.json) representations.

Parameters:
  • out_dir (Path) – HiPS root output directory.

  • moc_order (int) – HEALPix order used for the MOC.

  • dens_counts (ndarray) – Densmap counts at the MOC order.

Raises:

RuntimeError – If MOC construction fails for all attempted mocpy builders.

Return type:

None

write_properties(out_dir, output_cfg, level_limit, n_src, tile_format='tsv')

Write HiPS ‘properties’ file for a catalogue HiPS.

Parameters:
  • out_dir (Path) – HiPS root output directory.

  • output_cfg (OutputCfg) – Output configuration object.

  • level_limit (int) – Deepest HiPS order.

  • n_src (int) – Total number of catalogue sources.

  • tile_format (str) – Tile format string (usually “tsv”).

Return type:

None