hipscatalog_gen.selection package

Submodules

hipscatalog_gen.selection.common module

Shared selection utilities for HEALPix-aware slicing.

targets_per_tile(counts_depth, depth_total, bias)

Distribute depth_total across active tiles with optional density bias.

Parameters:
  • counts_depth (ndarray)

  • depth_total (int)

  • bias (float)

Return type:

Dict[int, int]
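The docstring does not state how bias enters the allocation, so the following is only a minimal sketch under stated assumptions: each active tile (count > 0) is weighted by count ** bias, and largest-remainder rounding makes the integer targets sum to depth_total. The name targets_per_tile_sketch and the weighting formula are hypothetical and may differ from the package's behavior.

```python
from typing import Dict, Sequence


def targets_per_tile_sketch(
    counts_depth: Sequence[int], depth_total: int, bias: float
) -> Dict[int, int]:
    """Split depth_total across tiles with non-zero counts (illustrative only)."""
    # Active tiles are those containing at least one source.
    active = [ipix for ipix, n in enumerate(counts_depth) if n > 0]
    if not active or depth_total <= 0:
        return {}

    # ASSUMPTION: weight = count ** bias (0 -> uniform, 1 -> density-proportional).
    weights = {ipix: float(counts_depth[ipix]) ** bias for ipix in active}
    w_sum = sum(weights.values())

    # Ideal (fractional) share per tile, floored to integers first.
    ideal = {ipix: depth_total * w / w_sum for ipix, w in weights.items()}
    targets = {ipix: int(share) for ipix, share in ideal.items()}

    # Largest-remainder rounding: hand leftover units to the biggest fractions.
    leftover = depth_total - sum(targets.values())
    by_fraction = sorted(active, key=lambda i: (targets[i] - ideal[i], i))
    for ipix in by_fraction[:leftover]:
        targets[ipix] += 1
    return targets
```

Under this assumed weighting, bias=0.0 spreads the budget evenly over active tiles, while bias=1.0 follows the source density.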

reduce_topk_by_group_dask(ddf_like, group_col, score_col, order_desc, k_per_group, ra_col, dec_col, tie_col=None)

Keep up to k_per_group rows per group, sorted by score then RA/DEC.

Uses a two-stage exact strategy: 1) per-partition local top-k pruning (no global shuffle), 2) global top-k by group on the pruned collection.

Parameters:
  • ddf_like (Any)

  • group_col (str)

  • score_col (str)

  • order_desc (bool)

  • k_per_group (Dict[int, int])

  • ra_col (str)

  • dec_col (str)

  • tie_col (str | None)

add_ipix_column(pdf, depth, ra_col, dec_col)

Attach __ipix__ for a given depth.

Parameters:
  • pdf (DataFrame)

  • depth (int)

  • ra_col (str)

  • dec_col (str)

Return type:

DataFrame

hipscatalog_gen.selection.levels module

Assign targets per HEALPix level and depth.

assign_level_edges(densmaps, depths_sel, fixed_targets, cdf_hist, score_edges_hist, score_min, score_max, n_tot_score, log_fn, label)

Compute cumulative targets per depth and corresponding score edges.

Parameters:
  • densmaps (Dict[int, ndarray])

  • depths_sel (List[int])

  • fixed_targets (Dict[int, float])

  • cdf_hist (ndarray)

  • score_edges_hist (ndarray)

  • score_min (float)

  • score_max (float)

  • n_tot_score (float)

  • label (str)

Return type:

Tuple[ndarray, ndarray]
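One plausible way to turn cumulative targets into score edges is to invert the empirical CDF of the score histogram. The sketch below assumes that cdf_hist holds cumulative counts per bin (one entry per bin), that score_edges_hist holds the matching bin edges (one more entry than bins), and that rows are uniformly distributed inside each bin. The function name score_edges_from_cdf and the cumulative_targets argument are hypothetical; the package may compute its edges differently.

```python
from bisect import bisect_left
from typing import List, Sequence


def score_edges_from_cdf(
    cdf_hist: Sequence[float],
    score_edges_hist: Sequence[float],
    cumulative_targets: Sequence[float],
) -> List[float]:
    """Map cumulative row targets to score thresholds via an empirical CDF."""
    edges_out = []
    for target in cumulative_targets:
        # Clamp so targets beyond the data fall on the outermost edges.
        target = min(max(target, 0.0), cdf_hist[-1])
        # First bin whose cumulative count reaches the target.
        b = bisect_left(cdf_hist, target)
        below = cdf_hist[b - 1] if b > 0 else 0.0
        in_bin = cdf_hist[b] - below
        # Interpolate linearly inside the bin (assumes uniform density per bin).
        frac = (target - below) / in_bin if in_bin > 0 else 0.0
        lo, hi = score_edges_hist[b], score_edges_hist[b + 1]
        edges_out.append(lo + frac * (hi - lo))
    return edges_out
```

Each returned edge is the score below which roughly the requested cumulative number of rows falls, to within the resolution of the histogram.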

hipscatalog_gen.selection.score module

Score computations, histograms, and sentinel handling for selection modes.

add_score_column(ddf, score_expr, output_col='__score__')

Attach a numeric score column derived from a column or expression.

Parameters:
  • ddf (Any)

  • score_expr (str)

  • output_col (str)

Return type:

Any

compute_score_histogram_ddf(ddf_like, score_col, score_min, score_max, nbins, *, keep_invalid=False, sentinel=None)

Compute a 1D histogram for score-like columns (Dask/LSDB friendly).

Parameters:
  • ddf_like (Any)

  • score_col (str)

  • score_min (float)

  • score_max (float)

  • nbins (int)

  • keep_invalid (bool)

  • sentinel (float | None)

Return type:

tuple[ndarray, ndarray, int]

compute_histogram_ddf(ddf_like, value_col, value_min, value_max, nbins, *, keep_invalid=False, sentinel=None)

Generic 1D histogram computation for Dask DataFrames or LSDB catalogs.

Parameters:
  • ddf_like (Any) – Dask-like collection or LSDB catalog with the target column.

  • value_col (str) – Column name to histogram.

  • value_min (float) – Lower bound (inclusive).

  • value_max (float) – Upper bound (inclusive).

  • nbins (int) – Number of bins.

  • keep_invalid (bool) – When True, replace NaN/Inf with sentinel instead of dropping.

  • sentinel (float | None) – Sentinel value for invalid entries (used only when keep_invalid is True).

Returns:

Tuple of (hist, edges, n_total) where:

  • hist: numpy array with bin counts.

  • edges: numpy array with bin edges.

  • n_total: total number of rows inspected (including invalid rows).

Return type:

tuple[ndarray, ndarray, int]
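The parameter semantics above (inclusive bounds, keep_invalid, sentinel, and an n_total that counts invalid rows) can be sketched in plain Python on a list of values. This is an illustration only: the name histogram_sketch is hypothetical, value_max is assumed to be greater than value_min, and raising when keep_invalid is set without a sentinel is an assumption about error handling, not documented behavior.

```python
import math
from typing import List, Optional, Sequence, Tuple


def histogram_sketch(
    values: Sequence[float],
    value_min: float,
    value_max: float,
    nbins: int,
    *,
    keep_invalid: bool = False,
    sentinel: Optional[float] = None,
) -> Tuple[List[int], List[float], int]:
    """Fixed-range 1D histogram returning (hist, edges, n_total)."""
    # ASSUMPTION: a missing sentinel is treated as an error in this sketch.
    if keep_invalid and sentinel is None:
        raise ValueError("sentinel is required when keep_invalid=True")

    width = (value_max - value_min) / nbins
    edges = [value_min + i * width for i in range(nbins + 1)]
    hist = [0] * nbins
    n_total = 0

    for v in values:
        n_total += 1  # counts every inspected row, valid or not
        if not math.isfinite(v):
            if not keep_invalid:
                continue  # drop NaN/Inf
            v = sentinel  # substitute the sentinel instead of dropping
        if value_min <= v <= value_max:
            # Both bounds inclusive: value_max lands in the last bin.
            idx = min(int((v - value_min) / width), nbins - 1)
            hist[idx] += 1
    return hist, edges, n_total
```

Note that a sentinel outside [value_min, value_max] still contributes to n_total but to no bin in this sketch.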

resolve_value_range(ddf, value_col, range_mode, min_cfg, max_cfg, hist_nbins, compute_hist_fn, diag_ctx, log_fn, label)

Resolve [min, max] for score-like columns with optional histogram peak.

Parameters:
  • ddf (Any) – Dask-like collection with the target column.

  • value_col (str) – Column name to inspect.

  • range_mode (str) – Either "complete" or "hist_peak".

  • min_cfg (float | None) – Optional configured minimum.

  • max_cfg (float | None) – Optional configured maximum.

  • hist_nbins (int) – Number of bins for histogram estimation.

  • compute_hist_fn (Callable[[Any, str, float, float, int], Tuple[ndarray, ndarray, int]]) – Callable to compute histograms (signature-compatible with compute_histogram_ddf).

  • diag_ctx – Diagnostics context factory.

  • log_fn – Logging callback.

  • label (str) – Human-readable label for logging and error messages.

Returns:

Tuple (min_value, max_value) resolved according to the mode.

Raises:
  • ValueError – When ranges are invalid, non-finite, or histogram estimation fails.

  • RuntimeError – When required bounds are missing for the chosen mode.

Return type:

tuple[float, float]

hipscatalog_gen.selection.slicing module

Slice selections by value or score with HEALPix-aware ordering.

select_by_value_slices(remainder_ddf, densmaps, depths_sel, keep_cols, ra_col, dec_col, value_col, order_desc, label, out_dir, diag_ctx, log_fn, *, level_edges=None, tie_col=None, compute_hist_fn=None, value_min=None, value_max=None, hist_nbins=None, fixed_targets=None, hist_diag_ctx_name=None, depth_diag_prefix=None)

Slice by per-depth value ranges and write tiles.

Returns:

Dict with per-depth write summaries (currently depth_totals/depth_tiles).

Parameters:
  • remainder_ddf (Any)

  • densmaps (Dict[int, ndarray])

  • depths_sel (Sequence[int])

  • keep_cols (List[str])

  • ra_col (str)

  • dec_col (str)

  • value_col (str)

  • order_desc (bool)

  • label (str)

  • level_edges (ndarray | None)

  • tie_col (str | None)

  • value_min (float | None)

  • value_max (float | None)

  • hist_nbins (int | None)

  • fixed_targets (Dict[int, float] | None)

  • hist_diag_ctx_name (str | None)

  • depth_diag_prefix (str | None)

Return type:

dict[str, dict[str, int]]
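To make "slice by per-depth value ranges" concrete, the sketch below buckets values into depths using a precomputed edge array, as level_edges would supply. It assumes ascending edges with one more entry than depths_sel, half-open intervals [lo, hi) except for a closed last interval, and it ignores order_desc, tile writing, and tie-breaking entirely. The function name slice_by_level_edges is hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def slice_by_level_edges(
    values: Sequence[float],
    depths_sel: Sequence[int],
    level_edges: Sequence[float],
) -> Dict[int, List[float]]:
    """Bucket values into depths using ascending edges (len(depths_sel) + 1)."""
    slices: Dict[int, List[float]] = {depth: [] for depth in depths_sel}
    for v in values:
        if v < level_edges[0] or v > level_edges[-1]:
            continue  # outside the overall [min, max] range
        # ASSUMPTION: intervals are [lo, hi), except the last, which is closed.
        i = min(bisect_right(level_edges, v) - 1, len(depths_sel) - 1)
        slices[depths_sel[i]].append(v)
    return slices
```

For example, with depths_sel = [1, 2, 3] and level_edges = [0, 1, 3, 10], a value of 2.9 is assigned to depth 2 and a value of 10 to depth 3.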

select_by_score_slices(remainder_ddf, densmaps, depths_sel, keep_cols, ra_col, dec_col, score_col, score_min, score_max, hist_nbins, out_dir, diag_ctx, log_fn, label, order_desc, fixed_targets=None, hist_diag_ctx_name=None, depth_diag_prefix=None, tie_col=None)

Score-specialized wrapper around select_by_value_slices.

Parameters:
  • remainder_ddf (Any)

  • densmaps (Dict[int, ndarray])

  • depths_sel (Sequence[int])

  • keep_cols (List[str])

  • ra_col (str)

  • dec_col (str)

  • score_col (str)

  • score_min (float)

  • score_max (float)

  • hist_nbins (int)

  • label (str)

  • order_desc (bool)

  • fixed_targets (Dict[int, float] | None)

  • hist_diag_ctx_name (str | None)

  • depth_diag_prefix (str | None)

  • tie_col (str | None)

Return type:

dict[str, dict[str, int]]

Module contents

Selection helpers for HEALPix slicing, histograms, and score handling.

add_ipix_column(pdf, depth, ra_col, dec_col)

Attach __ipix__ for a given depth.

Parameters:
  • pdf (DataFrame)

  • depth (int)

  • ra_col (str)

  • dec_col (str)

Return type:

DataFrame

assign_level_edges(densmaps, depths_sel, fixed_targets, cdf_hist, score_edges_hist, score_min, score_max, n_tot_score, log_fn, label)

Compute cumulative targets per depth and corresponding score edges.

Parameters:
  • densmaps (Dict[int, ndarray])

  • depths_sel (List[int])

  • fixed_targets (Dict[int, float])

  • cdf_hist (ndarray)

  • score_edges_hist (ndarray)

  • score_min (float)

  • score_max (float)

  • n_tot_score (float)

  • label (str)

Return type:

Tuple[ndarray, ndarray]

reduce_topk_by_group_dask(ddf_like, group_col, score_col, order_desc, k_per_group, ra_col, dec_col, tie_col=None)

Keep up to k_per_group rows per group, sorted by score then RA/DEC.

Uses a two-stage exact strategy: 1) per-partition local top-k pruning (no global shuffle), 2) global top-k by group on the pruned collection.

Parameters:
  • ddf_like (Any)

  • group_col (str)

  • score_col (str)

  • order_desc (bool)

  • k_per_group (Dict[int, int])

  • ra_col (str)

  • dec_col (str)

  • tie_col (str | None)

targets_per_tile(counts_depth, depth_total, bias)

Distribute depth_total across active tiles with optional density bias.

Parameters:
  • counts_depth (ndarray)

  • depth_total (int)

  • bias (float)

Return type:

Dict[int, int]

add_score_column(ddf, score_expr, output_col='__score__')

Attach a numeric score column derived from a column or expression.

Parameters:
  • ddf (Any)

  • score_expr (str)

  • output_col (str)

Return type:

Any

compute_score_histogram_ddf(ddf_like, score_col, score_min, score_max, nbins, *, keep_invalid=False, sentinel=None)

Compute a 1D histogram for score-like columns (Dask/LSDB friendly).

Parameters:
  • ddf_like (Any)

  • score_col (str)

  • score_min (float)

  • score_max (float)

  • nbins (int)

  • keep_invalid (bool)

  • sentinel (float | None)

Return type:

tuple[ndarray, ndarray, int]

resolve_value_range(ddf, value_col, range_mode, min_cfg, max_cfg, hist_nbins, compute_hist_fn, diag_ctx, log_fn, label)

Resolve [min, max] for score-like columns with optional histogram peak.

Parameters:
  • ddf (Any) – Dask-like collection with the target column.

  • value_col (str) – Column name to inspect.

  • range_mode (str) – Either "complete" or "hist_peak".

  • min_cfg (float | None) – Optional configured minimum.

  • max_cfg (float | None) – Optional configured maximum.

  • hist_nbins (int) – Number of bins for histogram estimation.

  • compute_hist_fn (Callable[[Any, str, float, float, int], Tuple[ndarray, ndarray, int]]) – Callable to compute histograms (signature-compatible with compute_histogram_ddf).

  • diag_ctx – Diagnostics context factory.

  • log_fn – Logging callback.

  • label (str) – Human-readable label for logging and error messages.

Returns:

Tuple (min_value, max_value) resolved according to the mode.

Raises:
  • ValueError – When ranges are invalid, non-finite, or histogram estimation fails.

  • RuntimeError – When required bounds are missing for the chosen mode.

Return type:

tuple[float, float]

compute_histogram_ddf(ddf_like, value_col, value_min, value_max, nbins, *, keep_invalid=False, sentinel=None)

Generic 1D histogram computation for Dask DataFrames or LSDB catalogs.

Parameters:
  • ddf_like (Any) – Dask-like collection or LSDB catalog with the target column.

  • value_col (str) – Column name to histogram.

  • value_min (float) – Lower bound (inclusive).

  • value_max (float) – Upper bound (inclusive).

  • nbins (int) – Number of bins.

  • keep_invalid (bool) – When True, replace NaN/Inf with sentinel instead of dropping.

  • sentinel (float | None) – Sentinel value for invalid entries (used only when keep_invalid is True).

Returns:

Tuple of (hist, edges, n_total) where:

  • hist: numpy array with bin counts.

  • edges: numpy array with bin edges.

  • n_total: total number of rows inspected (including invalid rows).

Return type:

tuple[ndarray, ndarray, int]

select_by_score_slices(remainder_ddf, densmaps, depths_sel, keep_cols, ra_col, dec_col, score_col, score_min, score_max, hist_nbins, out_dir, diag_ctx, log_fn, label, order_desc, fixed_targets=None, hist_diag_ctx_name=None, depth_diag_prefix=None, tie_col=None)

Score-specialized wrapper around select_by_value_slices.

Parameters:
  • remainder_ddf (Any)

  • densmaps (Dict[int, ndarray])

  • depths_sel (Sequence[int])

  • keep_cols (List[str])

  • ra_col (str)

  • dec_col (str)

  • score_col (str)

  • score_min (float)

  • score_max (float)

  • hist_nbins (int)

  • label (str)

  • order_desc (bool)

  • fixed_targets (Dict[int, float] | None)

  • hist_diag_ctx_name (str | None)

  • depth_diag_prefix (str | None)

  • tie_col (str | None)

Return type:

dict[str, dict[str, int]]

select_by_value_slices(remainder_ddf, densmaps, depths_sel, keep_cols, ra_col, dec_col, value_col, order_desc, label, out_dir, diag_ctx, log_fn, *, level_edges=None, tie_col=None, compute_hist_fn=None, value_min=None, value_max=None, hist_nbins=None, fixed_targets=None, hist_diag_ctx_name=None, depth_diag_prefix=None)

Slice by per-depth value ranges and write tiles.

Returns:

Dict with per-depth write summaries (currently depth_totals/depth_tiles).

Parameters:
  • remainder_ddf (Any)

  • densmaps (Dict[int, ndarray])

  • depths_sel (Sequence[int])

  • keep_cols (List[str])

  • ra_col (str)

  • dec_col (str)

  • value_col (str)

  • order_desc (bool)

  • label (str)

  • level_edges (ndarray | None)

  • tie_col (str | None)

  • value_min (float | None)

  • value_max (float | None)

  • hist_nbins (int | None)

  • fixed_targets (Dict[int, float] | None)

  • hist_diag_ctx_name (str | None)

  • depth_diag_prefix (str | None)

Return type:

dict[str, dict[str, int]]