astro_prost.associate

Attributes

NPROCESS_MAX

MAX_RETRIES

DEFAULT_RELEASES

ONLY_OFFSET_CATS

Functions

chunks(lst, n)

Yield successive n-sized chunks from lst.

infer_skycoord(row, coord_cols)

Infers a list of SkyCoord objects from the rows of a pandas DataFrame.

consolidate_results(results, transient_catalog)

Updates the original transient catalog with the host properties retrieved during association.

save_results(transient_catalog[, run_name, save_path, ...])

Save the transient catalog results to a CSV file with a timestamp (and optional run name).

log_host_properties(logger, transient_name, cat, ...)

Log selected host galaxy properties for a transient.

get_catalogs(user_input)

Convert user input into a dictionary mapping catalog names to release versions.

associate_transient(idx, row, glade_catalog, ...[, ...])

Associates a transient with its most likely host galaxy.

associate_sample(transient_catalog, catalogs[, ...])

Wrapper function for associating a sample of transients.

safe_associate_transient(*args, **kwargs)

Safely executes associate_transient while handling errors.

Module Contents

NPROCESS_MAX
MAX_RETRIES = 3
chunks(lst, n)

Yield successive n-sized chunks from lst.
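
The chunking behaviour described above can be illustrated with a minimal sketch. This is not the module's own source: `chunks_sketch` is a hypothetical stand-in name, and the check on `n` is an added assumption.

```python
def chunks_sketch(lst, n):
    """Yield successive n-sized chunks from lst (the last chunk may be shorter)."""
    if n < 1:
        # Assumption: reject non-positive chunk sizes instead of looping oddly.
        raise ValueError("n must be a positive integer")
    for start in range(0, len(lst), n):
        yield lst[start:start + n]


rows = list(range(7))
batches = list(chunks_sketch(rows, 3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

Because the function is a generator, batches are produced lazily, which suits handing slices of a large catalog to worker processes one at a time.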

DEFAULT_RELEASES
ONLY_OFFSET_CATS
infer_skycoord(row, coord_cols)

Infers a list of SkyCoord objects from the rows of a pandas DataFrame.

Parameters:
  • row (pandas.DataFrame row) – Row of transient_catalog that will be associated (containing properties of 1 transient).

  • coord_cols (tuple of two strings) – Name of the coordinate columns in the pandas.DataFrame.

Returns:

List of retrieved coordinates for transients to associate.

Return type:

array of astropy.coordinates.SkyCoord objects

consolidate_results(results, transient_catalog)

Updates the original transient catalog with the host properties retrieved during association.

Parameters:
  • results (dictionary) – Results from association; keys are row indices, and values are dictionaries of returned properties.

  • transient_catalog (pd.DataFrame) – The dataset containing names, coordinates, and (optionally) redshift information for transients.

Returns:

Original transient catalog, with host columns concatenated.

Return type:

pd.DataFrame
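
One way to picture the consolidation step is an index-aligned join between the catalog and the per-row result dictionaries. The sketch below assumes pandas is installed; `consolidate_sketch` and the column name `host_ra` are hypothetical, and the real function may merge columns differently.

```python
import pandas as pd


def consolidate_sketch(results, transient_catalog):
    """Attach per-row host properties to the catalog by matching row indices."""
    # Keys of `results` become the index, inner dictionary keys become columns.
    host_props = pd.DataFrame.from_dict(results, orient="index")
    # A left join keeps every transient; rows without a result get NaN.
    return transient_catalog.join(host_props, how="left")


catalog = pd.DataFrame({"name": ["SN_a", "SN_b", "SN_c"]})
results = {0: {"host_ra": 10.0}, 2: {"host_ra": 30.0}}
merged = consolidate_sketch(results, catalog)
```

In this sketch the transient at index 1 has no entry in `results`, so its `host_ra` value is left as NaN.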

save_results(transient_catalog, run_name=None, save_path='./', drop_unassociated=True)

Save the transient catalog results to a CSV file with a timestamp (and optional run name).

Parameters:
  • transient_catalog (pandas.DataFrame) – A DataFrame containing the transient catalog data.

  • run_name (str, optional) – A string identifier for the current run.

  • save_path (str, optional) – The directory path where the CSV file will be saved. Defaults to the current directory (‘./’).

  • drop_unassociated (bool, optional) – If True, drops unassociated transients before saving. Defaults to True.

Return type:

None
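
The timestamped naming and the drop_unassociated filter can be sketched with the standard library alone. Everything specific here is an assumption made for illustration: the `results_` filename prefix, the timestamp format, the `host_id` key used to detect unassociated rows, and the use of a list of dictionaries in place of a DataFrame. The sketch also returns the path for convenience, whereas the documented function returns None.

```python
import csv
import os
from datetime import datetime


def build_results_path(run_name=None, save_path="./", now=None):
    """Build a CSV path containing a timestamp and, optionally, the run name."""
    now = now or datetime.now()
    stamp = now.strftime("%Y%m%d_%H%M%S")  # hypothetical timestamp format
    stem = f"{run_name}_{stamp}" if run_name else stamp
    return os.path.join(save_path, f"results_{stem}.csv")  # hypothetical prefix


def save_rows(rows, run_name=None, save_path="./", drop_unassociated=True,
              host_key="host_id"):
    """Write rows (a list of dicts) to a timestamped CSV and return its path."""
    if drop_unassociated:
        # Assumption: a row is unassociated when its host key is missing or None.
        rows = [row for row in rows if row.get(host_key) is not None]
    path = build_results_path(run_name, save_path)
    fieldnames = sorted({key for row in rows for key in row})
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Putting the timestamp in the filename means repeated runs with the same run_name do not overwrite each other.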

log_host_properties(logger, transient_name, cat, host_idx, title, print_props, calc_host_props, condition_props)

Log selected host galaxy properties for a transient.

Parameters:
  • logger (logging.Logger) – Logger instance to output messages.

  • transient_name (str) – Name of the transient.

  • cat (GalaxyCatalog) – Catalog containing candidate host galaxies.

  • host_idx (int) – Index of the host galaxy in the catalog.

  • title (str) – Header text for the log output.

  • print_props (list of str) – List of property names to log directly (e.g., ‘objID’, ‘ra’, ‘dec’).

  • calc_host_props (list of str) – List of properties (e.g., ‘redshift’, ‘absmag’, ‘offset’) for which mean and std are logged.

  • condition_props (list of str) – List of properties (e.g., ‘redshift’, ‘absmag’, ‘offset’) for which posterior values are logged.

Returns:

Logs the formatted host properties.

Return type:

None

get_catalogs(user_input)

Convert user input into a dictionary mapping catalog names to release versions.

Parameters:

user_input (iterable) – An iterable of catalog entries, where each entry is either a string (catalog name) or a tuple (catalog name, release version).

Returns:

A dictionary with keys as sanitized catalog names and values as the corresponding release version.

Return type:

dict
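
The conversion from mixed string and tuple entries to a name-to-release mapping might look like the sketch below. It is an illustration, not the library's code: `get_catalogs_sketch` is a hypothetical name, the defaults are passed in explicitly rather than read from DEFAULT_RELEASES, the release strings are placeholders, and "sanitized" is assumed to mean stripped and lower-cased.

```python
def get_catalogs_sketch(user_input, default_releases):
    """Map catalog names to release versions, filling in defaults when absent."""
    catalogs = {}
    for entry in user_input:
        if isinstance(entry, str):
            name, release = entry, None
        else:
            name, release = entry  # expects a (name, release) pair
        name = name.strip().lower()  # assumed sanitization
        if release is None:
            release = default_releases.get(name)
        catalogs[name] = release
    return catalogs


# Placeholder release labels, used only for this example.
defaults = {"glade": "latest", "decals": "dr9"}
selected = get_catalogs_sketch(["GLADE ", ("decals", "dr10")], defaults)
print(selected)  # {'glade': 'latest', 'decals': 'dr10'}
```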

associate_transient(idx, row, glade_catalog, n_samples, priors, likes, cosmo, catalogs, cat_priority, name_col, coord_cols, redshift_col, cat_cols, log_fn, n_hosts=2, calc_host_props=False, verbose=0, coord_err_cols=('ra_err', 'dec_err'), strict_checking=False, warn_on_fallback=True, plot_match=False, best_redshift=False)

Associates a transient with its most likely host galaxy.

Parameters:
  • idx (int) – Index of the transient from a larger catalog (used to cross-match properties after association).

  • row (pandas Series) – Full row of transient properties.

  • glade_catalog (pandas.DataFrame) – GLADE catalog of galaxies, with sizes and photo-zs.

  • n_samples (int) – Number of samples for the Monte Carlo sampling of associations.

  • priors (dict) – Dictionary of priors for the run (at least one of redshift, offset, absolute magnitude).

  • likes (dict) – Dictionary of likelihoods for the run (at least one of offset, absolute magnitude).

  • cosmo (astropy.cosmology) – Assumed cosmology for the run (defaults to LambdaCDM if unspecified).

  • catalogs (dict) – Dict of source catalogs to query, with required key “name” and optional key “release”.

  • cat_priority (dict) – The priority order in which to run the associations (the catalog with value 1 runs first, the catalog with value 2 runs second, and so on). If None, defaults to the order in which the catalogs are provided.

  • cat_cols (boolean) – If true, concatenates the source catalog fields to the returned dataframe.

  • log_fn (str, optional) – The filename associated with the logging.Logger object.

  • calc_host_props (boolean) – If true, calculates host galaxy properties even if not needed for association.

  • verbose (int) – The verbosity level of the output.

  • coord_err_cols (tuple of strings) – The column names associated with positional uncertainties on the transient positions.

  • strict_checking (boolean, optional) – If true, raises error if catalog doesn’t support conditioning on a property requested.

  • warn_on_fallback (boolean, optional) – If true, raises warning if catalog doesn’t support conditioning on a property requested.

  • plot_match (boolean, optional) – If true, attempts to generate a plot image.

  • best_redshift (boolean, optional) – If True, queries NED for spectroscopic redshift when host is found within 1 arcsec. Default is False.

Returns:

Properties of the first and second-best host galaxy matches, and a dictionary of catalog columns (empty if cat_cols=False).

Return type:

tuple
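
The cat_priority rule in the parameter list above (lower values run first, with the input order used when no priorities are given) can be sketched as a stable sort. `order_catalogs` is a hypothetical helper, the release strings are placeholders, and sending catalogs without a priority to the end of the queue is an assumption.

```python
def order_catalogs(catalogs, cat_priority=None):
    """Return catalog names in the order in which they would be queried."""
    names = list(catalogs)
    if cat_priority is None:
        # No priorities supplied: keep the order the catalogs were provided in.
        return names
    # sorted() is stable, so catalogs with equal priority keep their input order.
    # Assumption: catalogs without an explicit priority are queried last.
    return sorted(names, key=lambda name: cat_priority.get(name, float("inf")))


catalogs = {"glade": "latest", "decals": "dr9", "panstarrs": "dr2"}
priority = {"panstarrs": 1, "glade": 2, "decals": 3}
print(order_catalogs(catalogs, priority))  # ['panstarrs', 'glade', 'decals']
print(order_catalogs(catalogs))            # ['glade', 'decals', 'panstarrs']
```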

associate_sample(transient_catalog, catalogs, name_col=None, coord_cols=None, redshift_col=None, cat_priority=None, run_name=None, priors=None, likes=None, n_samples=1000, verbose=1, n_hosts=2, parallel=True, save=True, save_path='./', log_path=None, cat_cols=False, progress_bar=False, cosmology=None, n_processes=None, calc_host_props=True, coord_err_cols=None, best_redshift=False)

Wrapper function for associating a sample of transients.

Parameters:
  • transient_catalog (pandas.DataFrame) – Dataframe containing transient name and coordinates.

  • priors (dict) – Dictionary of prior distributions on redshift, fractional offset, and/or absolute magnitude.

  • likes (dict) – Dictionary of likelihood distributions on redshift, fractional offset, and/or absolute magnitude.

  • catalogs (list) – List of catalogs to query (can include ‘glade’, ‘decals’, ‘panstarrs’)

  • cat_priority (dict) – Dict of catalog priority (determines what gets run first)

  • run_name (str or None) – Optional name for the run, used to name log files.

  • n_samples (int) – Number of samples to draw for Monte Carlo association.

  • verbose (int) – Verbosity level for logging, from 0 to 3.

  • n_hosts (int) – Number of potential hosts to return.

  • parallel (boolean) – If True, runs in parallel with multiprocessing. Cannot be used with IPython.

  • save (boolean) – If True, saves resulting association table to save_path.

  • save_path (str) – Path where the association table should be saved (when save=True).

  • log_path (str) – Path where the logfile should be saved. If None, logs everything to the screen.

  • cat_cols (boolean) – If True, concatenates catalog columns to the resulting DataFrame.

  • progress_bar (boolean) – If True, prints a loading bar for each association (when parallel=True).

  • cosmology (astropy.cosmology) – Assumed cosmology for the run (defaults to LambdaCDM if unspecified).

  • n_processes (int) – Number of parallel processes to run when parallel=True (defaults to n_cores-4 if unspecified).

  • calc_host_props (boolean) – If True, calculates all host properties (redshift, absmag, and fractional offset) regardless of whether or not they’re needed for association.

  • best_redshift (boolean, optional) – If True, queries NED for spectroscopic redshift when host is found within 1 arcsec. Default is False.

Returns:

The transient DataFrame with added columns describing the associated host galaxies.

Return type:

pandas.DataFrame
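
The n_processes default noted above (n_cores - 4 when unspecified) can be written as a small helper. This is a sketch under stated assumptions: `resolve_n_processes` is a hypothetical name, and clamping the result to between 1 and the number of available cores is an added safeguard that the documentation does not confirm. Any relationship to NPROCESS_MAX is not modelled.

```python
import os


def resolve_n_processes(n_processes=None, reserve=4):
    """Pick a worker count, leaving `reserve` cores free when none is requested."""
    n_cores = os.cpu_count() or 1  # cpu_count() can return None
    if n_processes is None:
        n_processes = n_cores - reserve
    # Assumption: never go below one worker or above the available cores.
    return max(1, min(n_processes, n_cores))


workers = resolve_n_processes()
```

On a machine with four or fewer cores the unclamped default would be zero or negative, which is why the sketch falls back to a single worker.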

safe_associate_transient(*args, **kwargs)

Safely executes associate_transient while handling errors.

Parameters:
  • *args (tuple) – Positional arguments to be passed directly to associate_transient. The first argument (args[0]) is expected to be the transient’s catalog index.

  • **kwargs (dict) – Keyword arguments passed to associate_transient.

Returns:

The output of associate_transient if successful, otherwise None.

Return type:

dict or None
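
The error-handling pattern described here, returning None rather than letting one failed transient abort the whole run, can be sketched generically. `safe_call` is a hypothetical helper rather than the library's implementation, and logging the failure with the first positional argument as the catalog index is an assumption based on the parameter description above.

```python
import logging


def safe_call(func, *args, logger=None, **kwargs):
    """Call func and return its result, or None if it raises an exception."""
    logger = logger or logging.getLogger(__name__)
    try:
        return func(*args, **kwargs)
    except Exception:
        # By convention the first positional argument is the catalog index.
        idx = args[0] if args else None
        logger.exception("Association failed for catalog index %s", idx)
        return None


ok = safe_call(lambda idx: {"idx": idx}, 3)
failed = safe_call(lambda idx: 1 / 0, 3)
```

Callers then need to skip None entries when collecting results, since a failed transient produces no host properties.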