hipscatalog_gen.io.input.compute_column_report_sample

compute_column_report_sample(ddf_like, sample_rows=200000)[source]

Build a small column summary from a sample.

Uses sampling to keep the computation fast and scalable. Works with Dask DataFrames and LSDB catalogs.

Parameters:
  • ddf_like (Any) – Dask-like collection or LSDB catalog.

  • sample_rows (int) – Approximate maximum number of rows to materialize.

Returns:

Nested dict with basic column statistics and examples.

Return type:

Dict