Issue
I have two kinds of objects, F and P, each carrying tags. I want to calculate the distance between every pair of tags that belong to different categories, and then build a dataframe with one row per such pair together with its distance. The code below seems to do what I want:
import itertools
import operator

import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

i = np.sqrt(2)
j = 2 * i

# dicts mapping (category, tag) to (x, y) coordinates
timeframe_f = {('F1', 'tag1f1'): (0, 0), ('F2', 'tag1f2'): (-i, -i)}
timeframe_p = {('B1', 'tag1b1'): (i, i), ('B2', 'tag1b2'): (j, j),
               ('B2', 'tag2b2'): (2 * j, 2 * j)}

# calculate the distances ('sqeuclidean' returns squared Euclidean distances)
distances = cdist(np.array(list(timeframe_f.values())),
                  np.array(list(timeframe_p.values())), 'sqeuclidean')
print('distances:\n', distances, '\n')

# here is the matrix with the MultiIndex
distances_matrix = pd.DataFrame(
    data=distances,
    index=pd.MultiIndex.from_tuples(timeframe_f.keys(), names=['F', 'Ftags']),
    columns=pd.MultiIndex.from_tuples(timeframe_p.keys(), names=['P', 'Ptags']))
print('distances_matrix:\n', distances_matrix, '\n')

# hacky construction of the data frame
index = list(map(lambda x: operator.add(*x),
                 itertools.product(timeframe_f.keys(), timeframe_p.keys())))
# print(index)
multi_index = pd.MultiIndex.from_tuples(index)
distances_df = pd.DataFrame(data=distances.ravel(),
                            index=multi_index).reset_index()
print('distances_df:\n', distances_df)
It prints:
distances:
 [[  4.  16.  64.]
 [ 16.  36. 100.]]

distances_matrix:
P             B1     B2
Ptags     tag1b1 tag1b2 tag2b2
F  Ftags
F1 tag1f1    4.0   16.0   64.0
F2 tag1f2   16.0   36.0  100.0

distances_df:
  level_0 level_1 level_2 level_3      0
0      F1  tag1f1      B1  tag1b1    4.0
1      F1  tag1f1      B2  tag1b2   16.0
2      F1  tag1f1      B2  tag2b2   64.0
3      F2  tag1f2      B1  tag1b1   16.0
4      F2  tag1f2      B2  tag1b2   36.0
5      F2  tag1f2      B2  tag2b2  100.0
but I would like to find a way to do this directly from distances_matrix. I had a look at various other questions, such as:
- How to flatten a hierarchical index in columns: but this manipulates the column names as strings while I want to construct the index using the column product
- Pandas Multiindex Groupby on Columns: here we don't have a multiindex in columns, although having a way to use this would be great as I eventually want to group by category
Solution
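Is this what you need? reset_index() turns the row levels F and Ftags into ordinary columns, and melt() then unpivots the column MultiIndex into P, Ptags and value: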
distances_matrix.reset_index().melt(id_vars=['F','Ftags'])
Out[434]:
    F   Ftags   P   Ptags  value
0  F1  tag1f1  B1  tag1b1    4.0
1  F2  tag1f2  B1  tag1b1   16.0
2  F1  tag1f1  B2  tag1b2   16.0
3  F2  tag1f2  B2  tag1b2   36.0
4  F1  tag1f1  B2  tag2b2   64.0
5  F2  tag1f2  B2  tag2b2  100.0
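Another route, sketched here as an assumption and not part of the original answer, is to stack both column levels into the row index and then reset it. The snippet rebuilds distances_matrix from the values printed in the question so that it runs on its own. The names long_df, mean_by_category and the column name "distance" are made up for illustration.

```python
import pandas as pd

# Rebuild distances_matrix from the values printed in the question.
distances_matrix = pd.DataFrame(
    [[4.0, 16.0, 64.0], [16.0, 36.0, 100.0]],
    index=pd.MultiIndex.from_tuples(
        [("F1", "tag1f1"), ("F2", "tag1f2")], names=["F", "Ftags"]),
    columns=pd.MultiIndex.from_tuples(
        [("B1", "tag1b1"), ("B2", "tag1b2"), ("B2", "tag2b2")],
        names=["P", "Ptags"]),
)

# Move both column levels into the row index (this yields a Series),
# name the values, then turn all four index levels into columns.
# "distance" is a hypothetical column name chosen for this sketch.
long_df = (
    distances_matrix
    .stack(["P", "Ptags"])
    .rename("distance")
    .reset_index()
)
print(long_df)

# Grouping by category is then an ordinary groupby on the long frame.
mean_by_category = long_df.groupby(["F", "P"])["distance"].mean()
print(mean_by_category)
```

Since the question mentions eventually grouping by category, the long frame is convenient: F and P are plain columns, so no column-level groupby is needed.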
Answered By - BENY