Sunday, November 14, 2021

[FIXED] Pandas empty dataframe resulting from an isin function that keeps objects with an ID if the ID is present in a dataFrame of just IDs


Issue

I've got two DataFrames: workspacesDF, with one column (currentWorkspaceGuid), and extrasDF, with four columns (currentWorkspaceGuid, modelGuid, memoryUsage, lastModified). I'm trying to use isin to get a DataFrame containing only the rows of extrasDF whose currentWorkspaceGuid exists in workspacesDF. The following code gives me an empty DataFrame:

import pandas as pd

extrasDF = pd.read_csv("~/downloads/Extras.csv")
workspacesDF = pd.read_csv("~/downloads/workspaces.csv")

not_in_workspaces = extrasDF[(extrasDF.currentWorkspaceGuid.isin(workspacesDF))]

print(not_in_workspaces)

I tried adding print statements to verify that the column matches when it should and doesn't when it shouldn't, but the code still returns nothing.

Once this works, my end goal is to return a list of the items that don't exist in workspacesDF. I think I can do that just by adding ~ in front of the isin expression, which is why I'm not doing a join or merge.

EDIT:

Adding example data from both files for clarification:

from workspaces.csv:

currentWorkspaceGuid
8a81b09c56cdf89c0157345759d75644
8a81948240d60b1901417a266a536462
402882f738cf7433013b612dc5f60bbd
8a8194884c860a53014ca1f6596d54e9
8a8194884a34d3ff014a4f31bea3705a

from Extras.csv:

currentWorkspaceGuid,modelGuid,memoryUsage,lastModified
8a81b09c56cdf89c0157345759d75644,635D5FAAC46D4856AAFD21AC6386DDCA,1191785,"2018-08-08 17:57:45"
8a81948240d60b1901417a266a536462,4076B1A8B1E34D549FFFE9F5FFE4538A,5400000,"2016-09-13 18:32:50"
402882f738cf7433013b612dc5f60bbd,4CA3CDC12CD349ABA8658365480073CA,550000,"2017-11-23 16:26:10"
8a8194884c860a53014ca1f6596d54e9,15E3E6B6087A4CA6838616A418E9657A,830000,"2018-05-22 17:35:50"
8a8194884a34d3ff014a4f31bea3705a,C47D186A479140BFAB24AF8D24E8B2BA,816686,"2018-07-31 09:39:16"

Solution

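I think you need to compare column with column (Series with Series). Passing the whole workspacesDF to isin does not compare against the values stored in its column, so select currentWorkspaceGuid on both sides: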

# Boolean mask: True where the GUID in extrasDF also appears in workspacesDF
mask = extrasDF['currentWorkspaceGuid'].isin(workspacesDF['currentWorkspaceGuid'])
in_workspaces = extrasDF[mask]
print(in_workspaces)
               currentWorkspaceGuid                         modelGuid  \
0  8a81b09c56cdf89c0157345759d75644  635D5FAAC46D4856AAFD21AC6386DDCA   
1  8a81948240d60b1901417a266a536462  4076B1A8B1E34D549FFFE9F5FFE4538A   
2  402882f738cf7433013b612dc5f60bbd  4CA3CDC12CD349ABA8658365480073CA   
3  8a8194884c860a53014ca1f6596d54e9  15E3E6B6087A4CA6838616A418E9657A   
4  8a8194884a34d3ff014a4f31bea3705a  C47D186A479140BFAB24AF8D24E8B2BA   

   memoryUsage         lastModified  
0      1191785  2018-08-08 17:57:45  
1      5400000  2016-09-13 18:32:50  
2       550000  2017-11-23 16:26:10  
3       830000  2018-05-22 17:35:50  
4       816686  2018-07-31 09:39:16
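
As for why the original call returned nothing: iterating a DataFrame yields its column labels, so passing workspacesDF itself to isin appears to test each GUID against the label 'currentWorkspaceGuid' rather than against the GUIDs in that column. A minimal sketch, assuming in-memory CSV text (via io.StringIO) in place of the two files and only the first two sample rows:

```python
import io

import pandas as pd

# In-memory stand-ins for workspaces.csv and Extras.csv (first two sample rows).
workspaces_csv = """currentWorkspaceGuid
8a81b09c56cdf89c0157345759d75644
8a81948240d60b1901417a266a536462
"""
extras_csv = """currentWorkspaceGuid,modelGuid,memoryUsage,lastModified
8a81b09c56cdf89c0157345759d75644,635D5FAAC46D4856AAFD21AC6386DDCA,1191785,"2018-08-08 17:57:45"
8a81948240d60b1901417a266a536462,4076B1A8B1E34D549FFFE9F5FFE4538A,5400000,"2016-09-13 18:32:50"
"""

workspacesDF = pd.read_csv(io.StringIO(workspaces_csv))
extrasDF = pd.read_csv(io.StringIO(extras_csv))

# Iterating a DataFrame yields its column labels, not its cell values.
labels = list(workspacesDF)

# Whole DataFrame passed to isin: no GUID equals the column label, so nothing matches.
frame_mask = extrasDF["currentWorkspaceGuid"].isin(workspacesDF)

# Column (Series) passed to isin: GUIDs are compared with GUIDs.
series_mask = extrasDF["currentWorkspaceGuid"].isin(workspacesDF["currentWorkspaceGuid"])

print(labels)
print(frame_mask.tolist())
print(series_mask.tolist())
```

Indexing extrasDF with the first mask selects no rows, which matches the empty DataFrame described in the question.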

To get the rows that do not match, invert the boolean mask with ~ (every GUID in the sample data matches, so the result here is empty):

not_in_workspaces = extrasDF[~mask]
print(not_in_workspaces)
Empty DataFrame
Columns: [currentWorkspaceGuid, modelGuid, memoryUsage, lastModified]
Index: []
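
The result above is empty only because every GUID in the sample Extras.csv also appears in workspaces.csv. A small sketch with one made-up row shows the inverted mask returning something; the GUID "ffffffffffffffffffffffffffffffff" and its modelGuid are hypothetical values added purely for illustration:

```python
import pandas as pd

workspacesDF = pd.DataFrame({
    "currentWorkspaceGuid": [
        "8a81b09c56cdf89c0157345759d75644",
        "8a81948240d60b1901417a266a536462",
    ]
})

# Second row is hypothetical: its GUID is deliberately absent from workspacesDF.
extrasDF = pd.DataFrame({
    "currentWorkspaceGuid": [
        "8a81b09c56cdf89c0157345759d75644",
        "ffffffffffffffffffffffffffffffff",
    ],
    "modelGuid": [
        "635D5FAAC46D4856AAFD21AC6386DDCA",
        "00000000000000000000000000000000",
    ],
    "memoryUsage": [1191785, 1],
})

mask = extrasDF["currentWorkspaceGuid"].isin(workspacesDF["currentWorkspaceGuid"])

# Rows of extrasDF whose workspace GUID is not in workspacesDF.
not_in_workspaces = extrasDF[~mask]

# The plain list of missing GUIDs that the question asks for.
missing_guids = not_in_workspaces["currentWorkspaceGuid"].tolist()
print(missing_guids)
```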

Details:

print(mask)
0    True
1    True
2    True
3    True
4    True
Name: currentWorkspaceGuid, dtype: bool

print(~mask)
0    False
1    False
2    False
3    False
4    False
Name: currentWorkspaceGuid, dtype: bool
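
For comparison with the join/merge route mentioned in the question, a left merge with indicator=True can produce the same split. This is a sketch of an alternative, not part of the original answer, and the second extrasDF row (GUID "ffffffffffffffffffffffffffffffff") is a hypothetical value added so that there is something to find:

```python
import pandas as pd

workspacesDF = pd.DataFrame({
    "currentWorkspaceGuid": [
        "8a81b09c56cdf89c0157345759d75644",
        "8a81948240d60b1901417a266a536462",
    ]
})

# Second row is hypothetical: its GUID is deliberately absent from workspacesDF.
extrasDF = pd.DataFrame({
    "currentWorkspaceGuid": [
        "8a81b09c56cdf89c0157345759d75644",
        "ffffffffffffffffffffffffffffffff",
    ],
    "memoryUsage": [1191785, 1],
})

# drop_duplicates keeps repeated workspace GUIDs from multiplying rows in the merge.
merged = extrasDF.merge(
    workspacesDF.drop_duplicates(),
    on="currentWorkspaceGuid",
    how="left",
    indicator=True,  # adds a "_merge" column: "both" or "left_only" for a left merge
)

# Keep only rows found in extrasDF alone, then drop the helper column.
not_in_workspaces = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(not_in_workspaces)
```

The isin mask is usually the shorter option when only one key column is involved; the merge form may be more convenient when matching on several columns at once.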

Answered By - jezrael

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
