Tuesday, March 1, 2022

[FIXED] Editing values in a dataframe based on the information in another dataframe

March 01, 2022 dataframe, pandas, python, python-2.x

Issue

I have a dataframe called _df1, which looks like this. Please note that this is only part of the dataframe, not the whole thing.

_df1:

frame   id  x1      y1  x2  y2  
1        1  1363    569 103 241   
2        1  1362    568 103 241  
3        1  1362    568 103 241  
4        1  1362    568 103 241   
964      5  925     932 80  255   
965      5  925     932 79  255   
966      5  925     932 79  255   
967      5  924     932 80  255  
968      5  924     932 79  255   
16       6  631     761 100 251   
17       6  631     761 100 251  
18       6  631     761 100 251   
19       6  631     761 100 251
20       6  631     761 100 251
21       6  631     761 100 251
88       7  623     901 144 123
89       7  623     901 144 123
90       7  623     901 144 123
91       7  623     901 144 123
92       7  623     901 144 123
93       7  623     901 144 123
94       7  623     901 144 123

The full dataframe has 108003 rows and 141 unique IDs. An ID represents a specific object, and the ID is repeated for every frame that contains that object. In other words, my data has 141 different objects and 108003 frames. I wrote code to identify objects that are the same but are labelled with different IDs. The result is saved in another dataframe called _df2, which looks like this. Again, this is only part of the dataframe, not the whole thing.

_df2:

indexID  matchID    
   4        5
   6        7
   8        9
   12       13
   18       19
   20       21
       .
       .
       .

The second dataframe shows which IDs have been wrongly classified as different objects. The ID in 'matchID' is actually the same object as the ID in 'indexID'. The 'indexID' values in _df2 correspond to 'id' in _df1.

Taking the first line of _df2 as an example, it says that IDs 4 and 5 are the same object. Therefore, I need to change the 'id' value of every frame in _df1 that has 'id' 5 to 4. This is an example of what the final table should look like, since 5 has to be reclassified as 4 and 7 has to be reclassified as 6.

Output:

frame   id  x1      y1  x2  y2  
1        1  1363    569 103 241   
2        1  1362    568 103 241  
3        1  1362    568 103 241  
4        1  1362    568 103 241   
964      4  925     932 80  255   
965      4  925     932 79  255   
966      4  925     932 79  255   
967      4  924     932 80  255  
968      4  924     932 79  255   
16       6  631     761 100 251   
17       6  631     761 100 251  
18       6  631     761 100 251   
19       6  631     761 100 251
20       6  631     761 100 251
21       6  631     761 100 251
88       6  623     901 144 123
89       6  623     901 144 123
90       6  623     901 144 123
91       6  623     901 144 123
92       6  623     901 144 123
93       6  623     901 144 123
94       6  623     901 144 123

Solution

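Use replace with a dictionary that maps each matchID to the indexID it should become:

# df1 and df2 are the question's _df1 and _df2; keys are the ids to replace, values the ids to keep
df1['id'] = df1['id'].replace(dict(zip(df2.matchID, df2.indexID)))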

Answered By - BENY
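
For readers who want to try the replace approach end to end, here is a minimal sketch built on a few rows copied from the question's tables. It is not part of the original answer. The names df1, df2 and id_map are chosen for illustration, and the sample is assumed to be representative of the full data (in particular, no ID appears in both the indexID and matchID columns).

```python
import pandas as pd

# A few rows copied from the question's _df1 (not the full 108003-row dataframe).
df1 = pd.DataFrame({
    "frame": [1, 2, 964, 965, 16, 17, 88, 89],
    "id":    [1, 1, 5, 5, 6, 6, 7, 7],
    "x1":    [1363, 1362, 925, 925, 631, 631, 623, 623],
    "y1":    [569, 568, 932, 932, 761, 761, 901, 901],
    "x2":    [103, 103, 80, 79, 100, 100, 144, 144],
    "y2":    [241, 241, 255, 255, 251, 251, 123, 123],
})

# The rows of the question's _df2 that are shown in the post.
df2 = pd.DataFrame({
    "indexID": [4, 6, 8, 12, 18, 20],
    "matchID": [5, 7, 9, 13, 19, 21],
})

# Keys are the ids to be replaced (matchID); values are the ids to keep (indexID).
id_map = dict(zip(df2["matchID"], df2["indexID"]))

# Ids that are not keys of id_map (here 1 and 6) are left unchanged.
df1["id"] = df1["id"].replace(id_map)

print(df1["id"].tolist())  # [1, 1, 4, 4, 6, 6, 6, 6]
```

Only the 'id' column is rewritten; the frame numbers and coordinates stay as they were, which matches the expected output in the question.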

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
