Issue
I have a pandas DataFrame (also available as a CSV file) with this layout:
df = {
'colA': ['aa', 'bb', 'cc', 'dd', 'ee'],
'colB': ['aa', 'bb', 'dd', 'qq', 'ee'],
'colC': ['aa', 'bb', 'cc', 'ee', 'dd'],
'colD': ['aa', 'bb', 'ee', 'cc', 'dd']
}
The goal is to get a list (or a column) of the values that appear in every column, in other words the entries common to all columns.
Required output:
col
aa
bb
dd
ee
or an output list of the common values:
common_list = ['aa', 'bb', 'dd', 'ee']
I have a naive solution, but it doesn't seem to be correct: I don't get the result I want when I apply it to my DataFrame.
import pandas as pd
df = pd.read_csv('Bus Names Concat test.csv')  # input CSV file (pandas DataFrame saved as CSV)
df = df.stack().value_counts()
core_list = df[df>2].index.tolist()  # treat the common list as the core list
print(len(core_list))
df_core = pd.DataFrame(core_list)
print(df_core)
Any help, suggestions, or feedback on getting the required output would be appreciated.
Solution
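In your case, melt the DataFrame into long format, count how many distinct columns each value appears in, and keep the values whose count equals the number of columns (4 here):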
s = df.melt().groupby('value')['variable'].nunique()
outlist = s[s==4].index.tolist()
Out[307]: ['aa', 'bb', 'dd', 'ee']
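The snippet above hardcodes 4 as the column count. The following is a minimal, self-contained sketch of the same idea that reads the column count from `df.shape[1]` instead, plus a plain set-intersection variant for comparison. It assumes the sample data from the question is built directly as a DataFrame rather than read from the CSV file, and the names `common_list` and `common_sorted` are illustrative only.

```python
import pandas as pd

# Sample data from the question, built in memory (assumption: the CSV has the same layout).
df = pd.DataFrame({
    'colA': ['aa', 'bb', 'cc', 'dd', 'ee'],
    'colB': ['aa', 'bb', 'dd', 'qq', 'ee'],
    'colC': ['aa', 'bb', 'cc', 'ee', 'dd'],
    'colD': ['aa', 'bb', 'ee', 'cc', 'dd'],
})

# Variant 1: melt to long format ('variable' = column name, 'value' = cell value),
# then count the number of distinct columns in which each value occurs.
s = df.melt().groupby('value')['variable'].nunique()

# Keep the values that occur in every column; df.shape[1] is the column count.
common_list = s[s == df.shape[1]].index.tolist()

# Variant 2: intersect the per-column sets of values (NaN dropped first).
common_set = set.intersection(*(set(df[c].dropna()) for c in df.columns))
common_sorted = sorted(common_set)

print(common_list)    # ['aa', 'bb', 'dd', 'ee']
print(common_sorted)  # ['aa', 'bb', 'dd', 'ee']
```

Counting distinct columns with `nunique()` rather than raw occurrences matters when a value repeats within a single column: a plain `value_counts()` over the stacked frame would count such a value several times even though it appears in only one column.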
Answered By - BENY