hipscatalog_gen.selection package
Submodules
hipscatalog_gen.selection.common module
Shared selection utilities for HEALPix-aware slicing.
- targets_per_tile(counts_depth, depth_total, bias)[source]
Distribute depth_total across active tiles with optional density bias.
- Parameters:
counts_depth (ndarray)
depth_total (int)
bias (float)
- Return type:
Dict[int, int]
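As a rough illustration of distributing a per-depth total across tiles, the sketch below weights each active tile (count > 0) by count**bias and rounds with a largest-remainder scheme so the targets sum to depth_total. The weighting formula, the rounding scheme, and the name targets_per_tile_sketch are assumptions made for this example. They are not taken from the package source.

```python
from typing import Dict, Sequence


def targets_per_tile_sketch(
    counts_depth: Sequence[int], depth_total: int, bias: float
) -> Dict[int, int]:
    """Hypothetical helper: split depth_total over tiles with count > 0."""
    # Active tiles are those that actually contain rows.
    active = [(ipix, c) for ipix, c in enumerate(counts_depth) if c > 0]
    if not active or depth_total <= 0:
        return {}

    # Assumed weighting: bias = 0 gives a uniform split, bias = 1 a split
    # proportional to tile density.
    weights = [float(c) ** bias for _, c in active]
    total_w = sum(weights)
    raw = [depth_total * w / total_w for w in weights]

    # Largest-remainder rounding keeps the integer targets summing to depth_total.
    base = [int(r) for r in raw]
    leftover = depth_total - sum(base)
    by_fraction = sorted(range(len(raw)), key=lambda i: raw[i] - base[i], reverse=True)
    for i in by_fraction[:leftover]:
        base[i] += 1

    return {ipix: t for (ipix, _), t in zip(active, base)}
```

With counts [0, 10, 30], a total of 8, and bias 1.0, this sketch assigns 2 targets to tile 1 and 6 to tile 2. The empty tile 0 receives no entry.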
- reduce_topk_by_group_dask(ddf_like, group_col, score_col, order_desc, k_per_group, ra_col, dec_col, tie_col=None)[source]
Keep up to k_per_group rows per group, sorted by score then RA/DEC.
Uses a two-stage exact strategy: 1) per-partition local top-k pruning (no global shuffle), 2) global top-k by group on the pruned collection.
- Parameters:
ddf_like (Any)
group_col (str)
score_col (str)
order_desc (bool)
k_per_group (Dict[int, int])
ra_col (str)
dec_col (str)
tie_col (str | None)
hipscatalog_gen.selection.levels module
Assign targets per HEALPix level and depth.
- assign_level_edges(densmaps, depths_sel, fixed_targets, cdf_hist, score_edges_hist, score_min, score_max, n_tot_score, log_fn, label)[source]
Compute cumulative targets per depth and corresponding score edges.
- Parameters:
densmaps (Dict[int, ndarray])
depths_sel (List[int])
fixed_targets (Dict[int, float])
cdf_hist (ndarray)
score_edges_hist (ndarray)
score_min (float)
score_max (float)
n_tot_score (float)
label (str)
- Return type:
Tuple[ndarray, ndarray]
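One way to picture the relationship between cumulative targets and score edges is to invert a histogram CDF. The sketch below assumes ascending scores, a CDF given as cumulative counts per bin, and linear interpolation inside a bin. The helper name edges_from_cdf and these conventions are illustrative assumptions, not the package's documented behavior.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Sequence, Tuple


def edges_from_cdf(
    targets: Sequence[float],
    cdf_hist: Sequence[float],
    score_edges_hist: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Hypothetical: map cumulative per-depth targets to score thresholds.

    cdf_hist[i] is the number of rows with score <= score_edges_hist[i + 1].
    """
    cum_targets = list(accumulate(targets))
    edges = [score_edges_hist[0]]
    for t in cum_targets:
        t = min(t, cdf_hist[-1])  # cannot ask for more rows than exist
        i = bisect_left(cdf_hist, t)  # first bin whose cumulative count reaches t
        below = cdf_hist[i - 1] if i > 0 else 0.0
        in_bin = cdf_hist[i] - below
        # Linear interpolation within the bin (assumes rows spread evenly in it).
        frac = (t - below) / in_bin if in_bin > 0 else 0.0
        lo, hi = score_edges_hist[i], score_edges_hist[i + 1]
        edges.append(lo + frac * (hi - lo))
    return cum_targets, edges
```

For four equal bins of 10 rows over [0, 4] and per-depth targets of 5, 15, and 20, the cumulative targets are 5, 20, and 40, and the interpolated edges are 0, 0.5, 2.0, and 4.0.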
hipscatalog_gen.selection.score module
Score computations, histograms, and sentinel handling for selection modes.
- add_score_column(ddf, score_expr, output_col='__score__')[source]
Attach a numeric score column derived from a column or expression.
- Parameters:
ddf (Any)
score_expr (str)
output_col (str)
- Return type:
Any
- compute_score_histogram_ddf(ddf_like, score_col, score_min, score_max, nbins, *, keep_invalid=False, sentinel=None)[source]
Compute a 1D histogram for score-like columns (Dask/LSDB friendly).
- Parameters:
ddf_like (Any)
score_col (str)
score_min (float)
score_max (float)
nbins (int)
keep_invalid (bool)
sentinel (float | None)
- Return type:
tuple[ndarray, ndarray, int]
- compute_histogram_ddf(ddf_like, value_col, value_min, value_max, nbins, *, keep_invalid=False, sentinel=None)[source]
Generic 1D histogram computation for Dask DataFrames or LSDB catalogs.
- Parameters:
ddf_like (Any) – Dask-like collection or LSDB catalog with the target column.
value_col (str) – Column name to histogram.
value_min (float) – Lower bound (inclusive).
value_max (float) – Upper bound (inclusive).
nbins (int) – Number of bins.
keep_invalid (bool) – When True, replace NaN/Inf with sentinel instead of dropping.
sentinel (float | None) – Sentinel value for invalid entries (used only when keep_invalid is True).
- Returns:
Tuple of (hist, edges, n_total), where:
hist: numpy array with bin counts.
edges: numpy array with bin edges.
n_total: total number of rows inspected (including invalid rows).
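The parameter semantics above can be sketched on a plain list of floats. This is a minimal illustration, not the Dask/LSDB implementation: the name histogram_sketch, the equal-width bins, the choice to drop finite values outside [value_min, value_max], and the error raised when keep_invalid is set without a sentinel are all assumptions.

```python
import math
from typing import Iterable, List, Optional, Tuple


def histogram_sketch(
    values: Iterable[float],
    value_min: float,
    value_max: float,
    nbins: int,
    *,
    keep_invalid: bool = False,
    sentinel: Optional[float] = None,
) -> Tuple[List[int], List[float], int]:
    if nbins <= 0 or not value_max > value_min:
        raise ValueError("need nbins > 0 and value_max > value_min")
    if keep_invalid and sentinel is None:
        raise ValueError("keep_invalid=True requires a sentinel")  # assumed policy

    width = (value_max - value_min) / nbins
    edges = [value_min + i * width for i in range(nbins)] + [value_max]
    hist = [0] * nbins
    n_total = 0
    for v in values:
        n_total += 1  # every inspected row counts, valid or not
        if not math.isfinite(v):
            if not keep_invalid:
                continue  # default: drop NaN/Inf
            v = sentinel  # otherwise histogram the sentinel instead
        if v < value_min or v > value_max:
            continue  # assumption: out-of-range values are not binned
        # Both bounds are inclusive, so value_max lands in the last bin.
        idx = min(int((v - value_min) / width), nbins - 1)
        hist[idx] += 1
    return hist, edges, n_total
```

Note that n_total counts all inspected rows, so it can exceed the sum of the bin counts when invalid or out-of-range rows are present.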
- resolve_value_range(ddf, value_col, range_mode, min_cfg, max_cfg, hist_nbins, compute_hist_fn, diag_ctx, log_fn, label)[source]
Resolve [min, max] for score-like columns with optional histogram peak.
- Parameters:
ddf (Any) – Dask-like collection with the target column.
value_col (str) – Column name to inspect.
range_mode (str) – Either "complete" or "hist_peak".
min_cfg (float | None) – Optional configured minimum.
max_cfg (float | None) – Optional configured maximum.
hist_nbins (int) – Number of bins for histogram estimation.
compute_hist_fn (Callable[[Any, str, float, float, int], Tuple[ndarray, ndarray, int]]) – Callable to compute histograms (signature-compatible with compute_histogram_ddf).
diag_ctx – Diagnostics context factory.
log_fn – Logging callback.
label (str) – Human-readable label for logging and error messages.
- Returns:
Tuple (min_value, max_value) resolved according to the mode.
- Raises:
ValueError – When ranges are invalid, non-finite, or histogram estimation fails.
RuntimeError – When required bounds are missing for the chosen mode.
- Return type:
tuple[float, float]
hipscatalog_gen.selection.slicing module
Slice selections by value or score with HEALPix-aware ordering.
- select_by_value_slices(remainder_ddf, densmaps, depths_sel, keep_cols, ra_col, dec_col, value_col, order_desc, label, out_dir, diag_ctx, log_fn, *, level_edges=None, tie_col=None, compute_hist_fn=None, value_min=None, value_max=None, hist_nbins=None, fixed_targets=None, hist_diag_ctx_name=None, depth_diag_prefix=None)[source]
Slice by per-depth value ranges and write tiles.
- Returns:
Dict with per-depth write summaries (currently depth_totals/depth_tiles).
- Parameters:
remainder_ddf (Any)
densmaps (Dict[int, ndarray])
depths_sel (Sequence[int])
keep_cols (List[str])
ra_col (str)
dec_col (str)
value_col (str)
order_desc (bool)
label (str)
level_edges (ndarray | None)
tie_col (str | None)
value_min (float | None)
value_max (float | None)
hist_nbins (int | None)
fixed_targets (Dict[int, float] | None)
hist_diag_ctx_name (str | None)
depth_diag_prefix (str | None)
- Return type:
dict[str, dict[str, int]]
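To make the idea of per-depth value ranges concrete, the sketch below assigns each value to the depth whose range contains it, given one more edge than there are depths. The half-open intervals, the ascending order, the handling of out-of-range values, and the name assign_depth_slices are assumptions for illustration only. The real function also writes tiles and handles order_desc, tie_col, and diagnostics.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence


def assign_depth_slices(
    values: Sequence[float],
    level_edges: Sequence[float],
    depths_sel: Sequence[int],
) -> Dict[int, List[float]]:
    """Hypothetical: bucket values into depths by [edge_i, edge_{i+1}) ranges."""
    if len(level_edges) != len(depths_sel) + 1:
        raise ValueError("need exactly one more edge than depths")
    slices: Dict[int, List[float]] = {d: [] for d in depths_sel}
    last = len(depths_sel) - 1
    for v in values:
        if v < level_edges[0] or v > level_edges[-1]:
            continue  # assumption: values outside the overall range are skipped
        # bisect_right makes each range closed on the left; the final range
        # is closed on both sides so the maximum edge is not lost.
        i = min(bisect_right(level_edges, v) - 1, last)
        slices[depths_sel[i]].append(v)
    return slices
```

With edges [0, 0.5, 2.0, 4.0] and depths [1, 2, 3], a value of 0.5 falls in depth 2 and a value of 4.0 falls in depth 3.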
- select_by_score_slices(remainder_ddf, densmaps, depths_sel, keep_cols, ra_col, dec_col, score_col, score_min, score_max, hist_nbins, out_dir, diag_ctx, log_fn, label, order_desc, fixed_targets=None, hist_diag_ctx_name=None, depth_diag_prefix=None, tie_col=None)[source]
Score-specialized wrapper around select_by_value_slices.
- Parameters:
remainder_ddf (Any)
densmaps (Dict[int, ndarray])
depths_sel (Sequence[int])
keep_cols (List[str])
ra_col (str)
dec_col (str)
score_col (str)
score_min (float)
score_max (float)
hist_nbins (int)
label (str)
order_desc (bool)
fixed_targets (Dict[int, float] | None)
hist_diag_ctx_name (str | None)
depth_diag_prefix (str | None)
tie_col (str | None)
- Return type:
dict[str, dict[str, int]]
Module contents
Selection helpers for HEALPix slicing, histograms, and score handling.
- add_ipix_column(pdf, depth, ra_col, dec_col)[source]
Attach __ipix__ for a given depth.
- Parameters:
pdf (DataFrame)
depth (int)
ra_col (str)
dec_col (str)
- Return type:
DataFrame
- assign_level_edges(densmaps, depths_sel, fixed_targets, cdf_hist, score_edges_hist, score_min, score_max, n_tot_score, log_fn, label)[source]
Compute cumulative targets per depth and corresponding score edges.
- Parameters:
densmaps (Dict[int, ndarray])
depths_sel (List[int])
fixed_targets (Dict[int, float])
cdf_hist (ndarray)
score_edges_hist (ndarray)
score_min (float)
score_max (float)
n_tot_score (float)
label (str)
- Return type:
Tuple[ndarray, ndarray]
- reduce_topk_by_group_dask(ddf_like, group_col, score_col, order_desc, k_per_group, ra_col, dec_col, tie_col=None)[source]
Keep up to k_per_group rows per group, sorted by score then RA/DEC.
Uses a two-stage exact strategy: 1) per-partition local top-k pruning (no global shuffle), 2) global top-k by group on the pruned collection.
- Parameters:
ddf_like (Any)
group_col (str)
score_col (str)
order_desc (bool)
k_per_group (Dict[int, int])
ra_col (str)
dec_col (str)
tie_col (str | None)
- targets_per_tile(counts_depth, depth_total, bias)[source]
Distribute depth_total across active tiles with optional density bias.
- Parameters:
counts_depth (ndarray)
depth_total (int)
bias (float)
- Return type:
Dict[int, int]
- add_score_column(ddf, score_expr, output_col='__score__')[source]
Attach a numeric score column derived from a column or expression.
- Parameters:
ddf (Any)
score_expr (str)
output_col (str)
- Return type:
Any
- compute_score_histogram_ddf(ddf_like, score_col, score_min, score_max, nbins, *, keep_invalid=False, sentinel=None)[source]
Compute a 1D histogram for score-like columns (Dask/LSDB friendly).
- Parameters:
ddf_like (Any)
score_col (str)
score_min (float)
score_max (float)
nbins (int)
keep_invalid (bool)
sentinel (float | None)
- Return type:
tuple[ndarray, ndarray, int]
- resolve_value_range(ddf, value_col, range_mode, min_cfg, max_cfg, hist_nbins, compute_hist_fn, diag_ctx, log_fn, label)[source]
Resolve [min, max] for score-like columns with optional histogram peak.
- Parameters:
ddf (Any) – Dask-like collection with the target column.
value_col (str) – Column name to inspect.
range_mode (str) – Either "complete" or "hist_peak".
min_cfg (float | None) – Optional configured minimum.
max_cfg (float | None) – Optional configured maximum.
hist_nbins (int) – Number of bins for histogram estimation.
compute_hist_fn (Callable[[Any, str, float, float, int], Tuple[ndarray, ndarray, int]]) – Callable to compute histograms (signature-compatible with compute_histogram_ddf).
diag_ctx – Diagnostics context factory.
log_fn – Logging callback.
label (str) – Human-readable label for logging and error messages.
- Returns:
Tuple (min_value, max_value) resolved according to the mode.
- Raises:
ValueError – When ranges are invalid, non-finite, or histogram estimation fails.
RuntimeError – When required bounds are missing for the chosen mode.
- Return type:
tuple[float, float]
- compute_histogram_ddf(ddf_like, value_col, value_min, value_max, nbins, *, keep_invalid=False, sentinel=None)[source]
Generic 1D histogram computation for Dask DataFrames or LSDB catalogs.
- Parameters:
ddf_like (Any) – Dask-like collection or LSDB catalog with the target column.
value_col (str) – Column name to histogram.
value_min (float) – Lower bound (inclusive).
value_max (float) – Upper bound (inclusive).
nbins (int) – Number of bins.
keep_invalid (bool) – When True, replace NaN/Inf with sentinel instead of dropping.
sentinel (float | None) – Sentinel value for invalid entries (used only when keep_invalid is True).
- Returns:
Tuple of (hist, edges, n_total), where:
hist: numpy array with bin counts.
edges: numpy array with bin edges.
n_total: total number of rows inspected (including invalid rows).
- select_by_score_slices(remainder_ddf, densmaps, depths_sel, keep_cols, ra_col, dec_col, score_col, score_min, score_max, hist_nbins, out_dir, diag_ctx, log_fn, label, order_desc, fixed_targets=None, hist_diag_ctx_name=None, depth_diag_prefix=None, tie_col=None)[source]
Score-specialized wrapper around select_by_value_slices.
- Parameters:
remainder_ddf (Any)
densmaps (Dict[int, ndarray])
depths_sel (Sequence[int])
keep_cols (List[str])
ra_col (str)
dec_col (str)
score_col (str)
score_min (float)
score_max (float)
hist_nbins (int)
label (str)
order_desc (bool)
fixed_targets (Dict[int, float] | None)
hist_diag_ctx_name (str | None)
depth_diag_prefix (str | None)
tie_col (str | None)
- Return type:
dict[str, dict[str, int]]
- select_by_value_slices(remainder_ddf, densmaps, depths_sel, keep_cols, ra_col, dec_col, value_col, order_desc, label, out_dir, diag_ctx, log_fn, *, level_edges=None, tie_col=None, compute_hist_fn=None, value_min=None, value_max=None, hist_nbins=None, fixed_targets=None, hist_diag_ctx_name=None, depth_diag_prefix=None)[source]
Slice by per-depth value ranges and write tiles.
- Returns:
Dict with per-depth write summaries (currently depth_totals/depth_tiles).
- Parameters:
remainder_ddf (Any)
densmaps (Dict[int, ndarray])
depths_sel (Sequence[int])
keep_cols (List[str])
ra_col (str)
dec_col (str)
value_col (str)
order_desc (bool)
label (str)
level_edges (ndarray | None)
tie_col (str | None)
value_min (float | None)
value_max (float | None)
hist_nbins (int | None)
fixed_targets (Dict[int, float] | None)
hist_diag_ctx_name (str | None)
depth_diag_prefix (str | None)
- Return type:
dict[str, dict[str, int]]