Tuesday, January 23, 2024

[FIXED] How to cross-match two dataframes by (Cartesian) coordinates in Python?

January 23, 2024 astropy, cross-match, database, pandas, python

Issue

I have 2 astronomical catalogues containing galaxies with their respective sky coordinates (ra, dec). I handle the catalogues as data frames. The catalogues come from different observational surveys, and some galaxies appear in both. I want to cross-match these galaxies and put them in a new catalogue. How can I do this with Python? I thought there should be some easy way with numpy, pandas, astropy or another package, but I couldn't find a solution. Thanks!

Solution

After a lot of research, the easiest way I have found is to use a package called astroML, for which a tutorial is available. I have used it in the notebooks cross_math_data_and_colour_cuts_.ipynb and PS_data_cleaning_and_processing.ipynb.

import numpy as np
import pandas as pd
from astroML.crossmatch import crossmatch_angular
# if you are using Google Colab, first run the line "!pip install astroml"

df_1 = pd.read_csv('catalog_1.csv')
df_2 = pd.read_csv('catalog_2.csv')

# crossmatch catalogs
max_radius = 1. / 3600  # 1 arcsec
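# note that crossmatch_angular reads (ra, dec) in degrees from the first 2 columns of each array;
# selecting the columns by name (assumed here to be 'ra' and 'dec') avoids relying on the column order
# also, df_1 should be the longer of the 2 catalogs, else there will be index errors
dist, ind = crossmatch_angular(df_1[['ra', 'dec']].values, df_2[['ra', 'dec']].values, max_radius)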
match = ~np.isinf(dist)
# THE DESIRED SOLUTION IS THEN:
df_crossed = df_1[match]


# ALTERNATIVELY:
# ind contains, for each galaxy in the first catalog, the index of its match in the second catalog;
# when there is no match, the ind value is the length of the second catalog
# so if you have to work with the indices of the second catalog, instead of the first, do:
df_1['new_var'] = [df_2['old_var'].iloc[i] if i < len(df_2) else -999 for i in ind]
# that way whenever you have a match 'new_var' will contain the correct value from 'old_var'
# and whenever you have a mismatch it will contain -999 as a flag
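
To make explicit what an angular cross-match of this kind computes, here is a minimal brute-force sketch in plain Python. It is not astroML's implementation (which is much faster on large catalogs); the function names and the toy coordinates are made up for illustration. It only mirrors the convention described above: an unmatched source gets an infinite distance and an index equal to the length of the second catalog.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def brute_force_crossmatch(cat1, cat2, max_radius_deg):
    """For each (ra, dec) in cat1, return (distance, index of the nearest cat2 source).

    Sources without a neighbour inside max_radius_deg get (inf, len(cat2)).
    """
    results = []
    for ra1, dec1 in cat1:
        best_dist, best_idx = math.inf, len(cat2)
        for j, (ra2, dec2) in enumerate(cat2):
            d = angular_separation_deg(ra1, dec1, ra2, dec2)
            if d <= max_radius_deg and d < best_dist:
                best_dist, best_idx = d, j
        results.append((best_dist, best_idx))
    return results

# hypothetical toy catalogs, coordinates in degrees
cat1 = [(10.0, 20.0), (150.0, -5.0), (200.0, 45.0)]
cat2 = [(200.0001, 45.0), (10.0, 20.0002)]

matches = brute_force_crossmatch(cat1, cat2, max_radius_deg=1.0 / 3600)  # 1 arcsec
matched_rows = [i for i, (d, _) in enumerate(matches) if not math.isinf(d)]
```

Here the first and third sources of cat1 find a partner within 1 arcsec, while the second one does not. The nested loop scales as the product of the catalog sizes, so it is only meant for understanding the logic on small inputs.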

If both dataframes contain not only coordinates but also matching source IDs, one can cross-match easily with the pandas .merge() function. Let's say df_1 has the columns 'ID', 'ra', 'dec', 'object_class' and df_2 has the columns 'ID', 'ra', 'dec', 'r_mag'; then we can cross-match with

df_crossed = pd.merge(df_1, df_2, on='ID')

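By default this performs an inner join, keeping only the IDs present in both dataframes (see the pandas merge documentation for details). Because 'ra' and 'dec' exist in both dataframes but are not merge keys here, pandas keeps both copies and adds suffixes: the resulting df_crossed will have the columns 'ID', 'ra_x', 'dec_x', 'object_class', 'ra_y', 'dec_y', 'r_mag'.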
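
A small example with made-up values shows what the inner merge on 'ID' returns, including how pandas treats the 'ra' and 'dec' columns that exist in both dataframes. Dropping the duplicate coordinates from df_2 before merging is one possible way, assuming the df_1 coordinates are the ones to keep, to end up with a single 'ra' and 'dec' column.

```python
import pandas as pd

# hypothetical toy catalogs
df_1 = pd.DataFrame({
    'ID': [1, 2, 3],
    'ra': [10.0, 150.0, 200.0],
    'dec': [20.0, -5.0, 45.0],
    'object_class': ['galaxy', 'galaxy', 'star'],
})
df_2 = pd.DataFrame({
    'ID': [2, 3, 4],
    'ra': [150.0, 200.0, 75.0],
    'dec': [-5.0, 45.0, 12.0],
    'r_mag': [18.2, 19.5, 17.1],
})

# inner merge on 'ID': only IDs 2 and 3 are present in both catalogs
df_crossed = pd.merge(df_1, df_2, on='ID')
print(list(df_crossed.columns))
# ['ID', 'ra_x', 'dec_x', 'object_class', 'ra_y', 'dec_y', 'r_mag']

# keep a single set of coordinates by dropping them from df_2 first
df_single = pd.merge(df_1, df_2.drop(columns=['ra', 'dec']), on='ID')
print(list(df_single.columns))
# ['ID', 'ra', 'dec', 'object_class', 'r_mag']
```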

You can also cross-match on multiple columns, e.g. on 'ID', 'ra' and 'dec' (the values in all of these columns must then be exactly equal in both dataframes), by writing:

df_crossed = pd.merge(df_1, df_2, on=['ID', 'ra', 'dec'])
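
One thing to keep in mind with a multi-column merge is that it compares the key values for exact equality, which matters for floating-point coordinates. The sketch below uses made-up values in which the second source has the same ID in both catalogs but a tiny difference in 'ra', so it is dropped from the result.

```python
import pandas as pd

# hypothetical toy catalogs; the 'ra' of ID 2 differs slightly between them
df_1 = pd.DataFrame({
    'ID': [1, 2],
    'ra': [10.0, 150.0],
    'dec': [20.0, -5.0],
    'object_class': ['galaxy', 'star'],
})
df_2 = pd.DataFrame({
    'ID': [1, 2],
    'ra': [10.0, 150.0000001],
    'dec': [20.0, -5.0],
    'r_mag': [18.2, 19.5],
})

df_crossed = pd.merge(df_1, df_2, on=['ID', 'ra', 'dec'])
print(df_crossed['ID'].tolist())   # only ID 1 survives the exact match
print(list(df_crossed.columns))    # key columns appear once, without suffixes
```

If the coordinates come from different surveys, they will rarely agree to the last digit, so merging on 'ID' alone or using a positional cross-match with a tolerance is usually the safer choice.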

Answered By - NeStack

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
