Host-Galaxy Association with Pröst
This notebook shows the basics of using Pröst for host-galaxy association.
First, let’s import some relevant packages. We’ll need distributions to define our priors and likelihoods.
[2]:
import pandas as pd
from scipy.stats import gamma, halfnorm, uniform
from astropy.cosmology import LambdaCDM
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
#pretty plotting
sns.set_context("poster")
# render plots inline in the notebook
%matplotlib inline
Pröst also provides a custom distribution object for the expected redshift of a transient with a given brightness and fixed volumetric rate.
[3]:
from astro_prost.helpers import PriorzObservedTransients, SnRateAbsmag
Next, we import the functions that do the bulk of the work.
[4]:
from astro_prost.associate import associate_sample
Now let’s read in a transient catalog, which should have at least the name and coordinates of the transients. Here, we use the ZTF BTS sample.
[9]:
transient_catalog = pd.read_csv("../../src/astro_prost/data/ZTFBTS_TransientTable.csv")
# take a single, randomly selected event
transient_catalog = transient_catalog.sample(n=1)
print(transient_catalog[['IAUID', 'RA', 'Dec']])
IAUID RA Dec
4887 SN2023kzm 15:16:54.65 +28:18:26.4
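The catalog stores coordinates as sexagesimal strings, while the log output further down reports them in decimal degrees. How Pröst performs that conversion internally is not shown here; the following is only a minimal, standard-library sketch of the arithmetic, with `sexagesimal_to_degrees` being a hypothetical helper name introduced for illustration.

```python
def sexagesimal_to_degrees(ra_hms, dec_dms):
    """Convert 'HH:MM:SS.s' and '+DD:MM:SS.s' strings to decimal degrees."""
    # Right ascension is given in hours; one hour corresponds to 15 degrees.
    hours, minutes, seconds = (float(part) for part in ra_hms.split(":"))
    ra_deg = 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)

    # Take the sign from the string so that values like '-00:30:00' work.
    sign = -1.0 if dec_dms.strip().startswith("-") else 1.0
    degrees, arcmin, arcsec = (abs(float(part)) for part in dec_dms.split(":"))
    dec_deg = sign * (degrees + arcmin / 60.0 + arcsec / 3600.0)
    return ra_deg, dec_deg


# Coordinates of the event printed above
ra_deg, dec_deg = sexagesimal_to_degrees("15:16:54.65", "+28:18:26.4")
print(f"RA = {ra_deg:.6f} deg, Dec = {dec_deg:.6f} deg")
```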
Next, we define the priors for the association. By default, Pröst defines priors on the following observed (!) properties of a transient:
Redshift distribution
Fractional radial offset from its host galaxy (defined in units of the host’s Directional Light Radius, DLR)
Host galaxy brightness, in absolute magnitude (\(B\)-band if associating with the GLADE catalog, else the median across \(griz\))
We’ll keep things simple for now and assume that we detect fewer events at higher redshift, with broad uniform priors for brightness and fractional offset.
[6]:
# define priors for properties
priorfunc_z = halfnorm(loc=0.0001, scale=0.5)
priorfunc_offset = uniform(loc=0, scale=10)
priorfunc_absmag = uniform(loc=-30, scale=20)
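One detail of these priors is easy to misread: scipy's `uniform(loc, scale)` covers the interval from `loc` to `loc + scale`, not from `loc` to `scale`. The short check below (a sketch that only uses the frozen scipy distributions defined above) confirms the ranges the priors actually span and that the half-normal redshift prior falls off with increasing redshift.

```python
from scipy.stats import halfnorm, uniform

priorfunc_z = halfnorm(loc=0.0001, scale=0.5)
priorfunc_offset = uniform(loc=0, scale=10)
priorfunc_absmag = uniform(loc=-30, scale=20)

# support() returns the (lower, upper) bounds of a frozen distribution.
offset_lo, offset_hi = priorfunc_offset.support()
absmag_lo, absmag_hi = priorfunc_absmag.support()
print("offset prior spans", offset_lo, "to", offset_hi)
print("absmag prior spans", absmag_lo, "to", absmag_hi)

# The half-normal density is highest near loc and decreases with redshift.
z_pdf_low = priorfunc_z.pdf(0.05)
z_pdf_high = priorfunc_z.pdf(1.0)
print("p(z=0.05) =", z_pdf_low, " p(z=1.0) =", z_pdf_high)
```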
If you would rather base the redshift prior on an observed distribution of transients within a given absolute magnitude range, Pröst can build an empirical distribution by uniformly distributing transients in a cosmological volume between \(z_{min}\) and \(z_{max}\) and labeling the subset with peak brightness above \(mag_{cutoff}\) as “observed”.
By default, the code draws each transient’s peak brightness from a Gaussian with mean \(absmag_{mean}\), truncated to the range \(absmag_{min}\) to \(absmag_{max}\).
[7]:
cosmo = LambdaCDM(H0=70, Om0=0.3, Ode0=0.7)
priorfunc_z = PriorzObservedTransients(z_min=0, z_max=1, mag_cutoff=19, absmag_mean=-19, absmag_min=-24, absmag_max=-17, cosmo=cosmo)
We can then plot the resulting distribution:
[8]:
priorfunc_z.plot()
The distribution can be sampled, and its PDF can be evaluated at the sampled redshifts:
[8]:
z_samples = priorfunc_z.rvs(10)
print(z_samples)
[0.00783099 0.28430976 0.38713431 0.16235576 0.33584785 0.16172475
0.12945304 0.26553305 0.47913023 0.1086904 ]
[9]:
priorfunc_z.pdf(z_samples)
[9]:
array([0.30819933, 1.83619906, 0.84822711, 3.69584314, 1.29049897,
3.70244304, 3.88463503, 2.08064422, 0.4326591 , 3.7741049 ])
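To make the construction of the empirical redshift prior more concrete, here is a rough numpy-only sketch of the recipe described above: place transients uniformly in comoving volume, draw peak absolute magnitudes from a truncated Gaussian, and keep the events whose apparent peak magnitude is brighter than the cutoff. This is not the internals of `PriorzObservedTransients`. The function name, the `absmag_sigma` width, the flat cosmology, and the neglect of k-corrections and time dilation are all assumptions made for illustration.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light in km/s


def sketch_observed_redshifts(n_draws, z_min=1e-4, z_max=1.0, mag_cutoff=19.0,
                              absmag_mean=-19.0, absmag_sigma=1.0,
                              absmag_min=-24.0, absmag_max=-17.0,
                              h0=70.0, om0=0.3, seed=0):
    """Return redshifts of simulated transients brighter than mag_cutoff."""
    rng = np.random.default_rng(seed)

    # Tabulate comoving distance (Mpc) for a flat LambdaCDM cosmology by
    # trapezoidal integration of c / H(z).
    z_grid = np.linspace(0.0, z_max, 4001)
    inv_e = 1.0 / np.sqrt(om0 * (1.0 + z_grid) ** 3 + (1.0 - om0))
    dz = z_grid[1] - z_grid[0]
    steps = 0.5 * (inv_e[1:] + inv_e[:-1]) * dz
    d_c = np.concatenate([[0.0], np.cumsum(steps)]) * C_KM_S / h0

    # In a flat universe the comoving volume scales as d_c**3, so inverse-CDF
    # sampling on this table places transients uniformly in volume.
    vol = d_c ** 3
    v_lo = np.interp(z_min, z_grid, vol)
    z = np.interp(rng.uniform(v_lo, vol[-1], size=n_draws), vol, z_grid)

    # Truncated Gaussian for peak absolute magnitude, via rejection sampling.
    absmag = rng.normal(absmag_mean, absmag_sigma, size=n_draws)
    bad = (absmag < absmag_min) | (absmag > absmag_max)
    while bad.any():
        absmag[bad] = rng.normal(absmag_mean, absmag_sigma, size=bad.sum())
        bad = (absmag < absmag_min) | (absmag > absmag_max)

    # Apparent magnitude from the distance modulus (luminosity distance in Mpc).
    d_l = (1.0 + z) * np.interp(z, z_grid, d_c)
    appmag = absmag + 5.0 * np.log10(d_l) + 25.0

    # "Observed" events are those brighter (numerically smaller) than the cutoff.
    return z[appmag < mag_cutoff]


z_obs = sketch_observed_redshifts(50000)
print(f"{z_obs.size} of 50000 simulated transients pass the magnitude cut")
```

Because the volume grows steeply with redshift while the magnitude cut removes distant events, only a small fraction of the simulated transients survive, and the surviving redshifts pile up at the low end of the range.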
Next, we set the likelihoods. Note that we only set these for fractional offset and brightness; the redshift likelihood comes from comparing the photometric redshifts of candidate galaxies with the redshift of the transient (if available).
[10]:
likefunc_offset = gamma(a=0.75)
likefunc_absmag = SnRateAbsmag(a=-25, b=20)
priors = {"offset": priorfunc_offset, "absmag": priorfunc_absmag, "redshift": priorfunc_z}
likes = {"offset": likefunc_offset, "absmag": likefunc_absmag}
[11]:
plt.hist(likefunc_offset.rvs(size=10000), density=True);
plt.xlabel(r"Fractional Offset ($\theta$/DLR)");
plt.ylabel("Likelihood PDF");
Our likelihood for fractional offset sharply peaks near 0: if a transient is sitting on top of a galaxy, odds are very good that it’s the host. The likelihood for host galaxy brightness is set here to a supernova-based likelihood, which increases with absolute magnitude.
[12]:
plt.hist(likefunc_absmag.rvs(size=1000), density=True);
plt.xlabel(r"Host Absolute Magnitude");
plt.ylabel("Likelihood PDF");
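The behavior of the offset likelihood can also be checked numerically rather than from a histogram. The sketch below uses only the same frozen scipy `gamma(a=0.75)` distribution defined above; the specific offsets evaluated are arbitrary choices for illustration.

```python
from scipy.stats import gamma

likefunc_offset = gamma(a=0.75)

# Evaluate the likelihood density at a few fractional offsets (in DLR units).
offsets = [0.1, 1.0, 3.0]
densities = likefunc_offset.pdf(offsets)
for offset, density in zip(offsets, densities):
    print(f"offset = {offset:>4} DLR -> density = {density:.4f}")

# Fraction of the likelihood mass that lies within one DLR of the galaxy center.
frac_within_1 = likefunc_offset.cdf(1.0)
print(f"mass within 1 DLR: {frac_within_1:.3f}")
```

With a shape parameter below 1, the gamma density rises without bound toward zero offset, which is why candidates lying almost directly under the transient are so strongly favored.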
Next, let’s set up the properties of the run:
[13]:
# list of catalogs to search -- options are (in order) glade, decals, panstarrs
# If multiple are listed, the code stops whenever it finds a high-probability host
catalogs = ["panstarrs"]
# The name of the coordinate columns in the dataframe
coord_cols = ("RA", "Dec")
# the column corresponding to transient names
name_col = "IAUID"
# the column corresponding to redshift
redshift_col = 'redshift'
# verbosity level; can be 0, 1, or 2
verbose = 1
# If true, enables multiprocessing with mpire (cannot be run in this notebook)
parallel = False
# If true, saves the results of the run to disk (alternative is to return them directly)
save = False
# If true, shows a progress bar for each association (only available when parallel=True)
progress_bar = False
# If true, concatenates the source properties from the matched catalog to the returned results
cat_cols = True
[14]:
# cosmology can be specified, else flat lambdaCDM is assumed with H0=70, Om0=0.3, Ode0=0.7
transient_catalog_with_hosts = associate_sample(
    transient_catalog,
    priors=priors,
    likes=likes,
    catalogs=catalogs,
    parallel=parallel,
    name_col=name_col,
    coord_cols=coord_cols,
    redshift_col=redshift_col,
    verbose=verbose,
    save=save,
    progress_bar=progress_bar,
    cat_cols=cat_cols,
)
Associating SN2022vpn at RA, DEC = 335.593042, 25.839917
Removing panstarrs shreds.
Removed 9 flagged panstarrs sources.
20/20 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step
Association successful!
Chosen panstarrs galaxy has catalog ID of 139003355927227995 and RA, DEC = 335.592710, 25.839512
Association of all transients is complete.
Let’s look at the results:
[19]:
transient_catalog_with_hosts[['name', 'host_id', 'host_ra', 'host_dec', 'host_prob', 'smallcone_prob', 'missedcat_prob']]
[19]:
| | name | host_id | host_ra | host_dec | host_prob | smallcone_prob | missedcat_prob |
|---|---|---|---|---|---|---|---|
| 0 | SN2022vpn | 139003355927228000 | 335.59271 | 25.839512 | 0.999914 | 0.0 | 0.000047 |
Pröst calculates the posterior probability (host_prob) that the matched galaxy at (host_ra, host_dec) is the true host. smallcone_prob is the probability that the cone search is too small to include the true host’s center (based on the assumed transient redshift), and missedcat_prob is the probability that the true host lies within the cone but is absent from the catalog (e.g., because it is fainter than the limiting magnitude or outside the footprint).
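As a way to build intuition for how these three quantities relate, the toy calculation below normalizes unnormalized posterior weights for each candidate galaxy together with weights for the two "no host in the catalog" outcomes. It is not Pröst's implementation: the function name, the candidate labels, and all numerical weights are made up for illustration, and the assumption that the probabilities share a single normalization is ours.

```python
def normalize_posteriors(candidate_weights, w_outside_cone, w_missing_from_catalog):
    """Normalize hypothetical prior-times-likelihood weights into probabilities."""
    total = sum(candidate_weights.values()) + w_outside_cone + w_missing_from_catalog
    host_probs = {name: weight / total for name, weight in candidate_weights.items()}
    return host_probs, w_outside_cone / total, w_missing_from_catalog / total


# Made-up weights: one strong candidate, one weak candidate, and small
# contributions from a host outside the cone or missing from the catalog.
weights = {"galaxy_A": 8.0, "galaxy_B": 1.5}
host_probs, smallcone_prob, missedcat_prob = normalize_posteriors(
    weights, w_outside_cone=0.1, w_missing_from_catalog=0.4
)

best = max(host_probs, key=host_probs.get)
print("best candidate:", best, "with host_prob =", host_probs[best])
print("smallcone_prob =", smallcone_prob, " missedcat_prob =", missedcat_prob)
```

Under this picture, a host_prob close to 1 alongside small smallcone_prob and missedcat_prob values, as in the table above, indicates that neither a competing galaxy nor a missing host is a plausible alternative.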