Issue
I've got two DataFrames: workspacesDF, with one column (currentWorkspaceGuid), and extrasDF, with four columns (currentWorkspaceGuid, modelGuid, memoryUsage, lastModified). I'm trying to use isin to get a DataFrame containing the rows of extrasDF only if their currentWorkspaceGuid exists in workspacesDF. It's giving me an empty DataFrame when I use the following code:
import pandas as pd
extrasDF = pd.read_csv("~/downloads/Extras.csv")
workspacesDF = pd.read_csv("~/downloads/workspaces.csv")
not_in_workspaces = extrasDF[(extrasDF.currentWorkspaceGuid.isin(workspacesDF))]
print(not_in_workspaces)
I tried adding print statements to verify that the column values match when they should and don't when they shouldn't, but it still returns nothing.
Once this works, my end goal is to return a list of the items that don't exist in workspacesDF. I think I can do that just by adding ~ in front of the isin call, which is why I'm not using a join or merge.
EDIT:
Adding example data from both files for clarification:
from workspaces.csv:
currentWorkspaceGuid
8a81b09c56cdf89c0157345759d75644
8a81948240d60b1901417a266a536462
402882f738cf7433013b612dc5f60bbd
8a8194884c860a53014ca1f6596d54e9
8a8194884a34d3ff014a4f31bea3705a
from Extras.csv:
currentWorkspaceGuid,modelGuid,memoryUsage,lastModified
8a81b09c56cdf89c0157345759d75644,635D5FAAC46D4856AAFD21AC6386DDCA,1191785,"2018-08-08 17:57:45"
8a81948240d60b1901417a266a536462,4076B1A8B1E34D549FFFE9F5FFE4538A,5400000,"2016-09-13 18:32:50"
402882f738cf7433013b612dc5f60bbd,4CA3CDC12CD349ABA8658365480073CA,550000,"2017-11-23 16:26:10"
8a8194884c860a53014ca1f6596d54e9,15E3E6B6087A4CA6838616A418E9657A,830000,"2018-05-22 17:35:50"
8a8194884a34d3ff014a4f31bea3705a,C47D186A479140BFAB24AF8D24E8B2BA,816686,"2018-07-31 09:39:16"
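To reproduce this without the original files, here is a minimal sketch that loads the sample rows above from in-memory strings with io.StringIO. Inlining the data is an assumption made for illustration; the question itself reads the CSVs from ~/downloads.

```python
import io

import pandas as pd

# Sample rows from workspaces.csv and Extras.csv above, inlined so the
# example runs without the original files (illustrative assumption).
workspaces_csv = """currentWorkspaceGuid
8a81b09c56cdf89c0157345759d75644
8a81948240d60b1901417a266a536462
402882f738cf7433013b612dc5f60bbd
8a8194884c860a53014ca1f6596d54e9
8a8194884a34d3ff014a4f31bea3705a
"""

extras_csv = """currentWorkspaceGuid,modelGuid,memoryUsage,lastModified
8a81b09c56cdf89c0157345759d75644,635D5FAAC46D4856AAFD21AC6386DDCA,1191785,"2018-08-08 17:57:45"
8a81948240d60b1901417a266a536462,4076B1A8B1E34D549FFFE9F5FFE4538A,5400000,"2016-09-13 18:32:50"
402882f738cf7433013b612dc5f60bbd,4CA3CDC12CD349ABA8658365480073CA,550000,"2017-11-23 16:26:10"
8a8194884c860a53014ca1f6596d54e9,15E3E6B6087A4CA6838616A418E9657A,830000,"2018-05-22 17:35:50"
8a8194884a34d3ff014a4f31bea3705a,C47D186A479140BFAB24AF8D24E8B2BA,816686,"2018-07-31 09:39:16"
"""

# read_csv accepts any file-like object, so StringIO stands in for the files
workspacesDF = pd.read_csv(io.StringIO(workspaces_csv))
extrasDF = pd.read_csv(io.StringIO(extras_csv))

print(workspacesDF.columns.tolist())  # ['currentWorkspaceGuid']
print(extrasDF.shape)                 # (5, 4)
```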
Solution
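I think you need to compare against the column (a Series), not the whole DataFrame. Iterating over a DataFrame yields its column labels, so isin(workspacesDF) checks each GUID against the string 'currentWorkspaceGuid', no row matches, and the result is empty: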
mask = extrasDF['currentWorkspaceGuid'].isin(workspacesDF['currentWorkspaceGuid'])
in_workspaces = extrasDF[mask]
print (in_workspaces)
currentWorkspaceGuid modelGuid \
0 8a81b09c56cdf89c0157345759d75644 635D5FAAC46D4856AAFD21AC6386DDCA
1 8a81948240d60b1901417a266a536462 4076B1A8B1E34D549FFFE9F5FFE4538A
2 402882f738cf7433013b612dc5f60bbd 4CA3CDC12CD349ABA8658365480073CA
3 8a8194884c860a53014ca1f6596d54e9 15E3E6B6087A4CA6838616A418E9657A
4 8a8194884a34d3ff014a4f31bea3705a C47D186A479140BFAB24AF8D24E8B2BA
memoryUsage lastModified
0 1191785 2018-08-08 17:57:45
1 5400000 2016-09-13 18:32:50
2 550000 2017-11-23 16:26:10
3 830000 2018-05-22 17:35:50
4 816686 2018-07-31 09:39:16
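A minimal sketch of why the original code returned an empty DataFrame, using two of the sample GUIDs in cut-down frames (for illustration only). When isin receives a DataFrame, it ends up iterating over it, and iterating over a DataFrame gives its column labels rather than its values.

```python
import pandas as pd

# Cut-down versions of the question's frames (two sample GUIDs only)
workspacesDF = pd.DataFrame({"currentWorkspaceGuid": [
    "8a81b09c56cdf89c0157345759d75644",
    "8a81948240d60b1901417a266a536462",
]})
extrasDF = pd.DataFrame({
    "currentWorkspaceGuid": [
        "8a81b09c56cdf89c0157345759d75644",
        "8a81948240d60b1901417a266a536462",
    ],
    "memoryUsage": [1191785, 5400000],
})

# Iterating over a DataFrame yields its column labels, not its values
print(list(workspacesDF))  # ['currentWorkspaceGuid']

# So isin(workspacesDF) tests each GUID against the label 'currentWorkspaceGuid'
wrong_mask = extrasDF["currentWorkspaceGuid"].isin(workspacesDF)
print(wrong_mask.tolist())  # [False, False]

# Passing the column (a Series) compares against the actual GUIDs
right_mask = extrasDF["currentWorkspaceGuid"].isin(workspacesDF["currentWorkspaceGuid"])
print(right_mask.tolist())  # [True, True]
```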
To filter the non-matched values, add ~ to invert the boolean mask:
not_in_workspaces = extrasDF[~mask]
print (not_in_workspaces)
Empty DataFrame
Columns: [currentWorkspaceGuid, modelGuid, memoryUsage, lastModified]
Index: []
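For the end goal in the question (a plain list of the GUIDs that are not in workspacesDF), the inverted mask can be combined with tolist(). Because every GUID in the sample data matches, this sketch adds one hypothetical all-f GUID that is not part of the question's data, so the result is non-empty.

```python
import pandas as pd

workspacesDF = pd.DataFrame({"currentWorkspaceGuid": [
    "8a81b09c56cdf89c0157345759d75644",
    "8a81948240d60b1901417a266a536462",
]})
extrasDF = pd.DataFrame({
    "currentWorkspaceGuid": [
        "8a81b09c56cdf89c0157345759d75644",
        "8a81948240d60b1901417a266a536462",
        # Hypothetical GUID NOT in workspacesDF (not from the question's data)
        "ffffffffffffffffffffffffffffffff",
    ],
    "modelGuid": [
        "635D5FAAC46D4856AAFD21AC6386DDCA",
        "4076B1A8B1E34D549FFFE9F5FFE4538A",
        "HYPOTHETICAL_MODEL_GUID",
    ],
})

mask = extrasDF["currentWorkspaceGuid"].isin(workspacesDF["currentWorkspaceGuid"])

# Rows whose workspace GUID is missing from workspacesDF
not_in_workspaces = extrasDF[~mask]

# Plain Python list of the missing workspace GUIDs (duplicates dropped, order kept)
missing_guids = not_in_workspaces["currentWorkspaceGuid"].drop_duplicates().tolist()
print(missing_guids)  # ['ffffffffffffffffffffffffffffffff']
```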
Details:
print (mask)
0 True
1 True
2 True
3 True
4 True
Name: currentWorkspaceGuid, dtype: bool
print (~mask)
0 False
1 False
2 False
3 False
4 False
Name: currentWorkspaceGuid, dtype: bool
Answered By - jezrael