Issue
I have df
technologies = {'Writer1': ['Spark', 'PySpark', 'Hadoop', 'Python'],
'Location1': ['LA', 'RIV', 'CHV', 'LA'],
'Area1': ['L', 'R', 'C', 'L'],
'Writer2': ['Spark', 'Dil', 'Chee', 'Python'],
'Location2': ['RIV', 'RIV', 'LA', 'RIV'],
'Area2': ['R', 'R', 'L', 'R'],}
df = pd.DataFrame(technologies)
I want to create a new Location column indexing location 1/2 but only taking locations in Area1/2 "R" and "C"
So I would want
technologies = {'Writer1': ['Spark', 'PySpark', 'Hadoop', 'Python'],
'Location1': ['LA', 'RIV', 'CHV', 'LA'],
'Area1': ['L', 'R', 'C', 'L'],
'Writer2': ['Spark', 'Dil', 'Chee', 'Python'],
'Location2': ['RIV', 'RIV', 'LA', 'RIV'],
'Area2': ['R', 'R', 'L', 'R'],
'Location3': ['RIV', 'RIV', 'CHV', 'RIV']}
Is this possible? I am stuck and can't think of what would work for so many requirements.
Any help appreciated Thank you ===EDIT Sorry I did not include vital detail. I would like the location to index with the Writer1/2. For example if I index PySpark with RIV, I also want Dil to index with RIV. The code should not bypass a Writer if they both are in RIV or CHV.
Solution
For a generic method to use with any number of Location/Area pairs (in order), you can use:
lst = ['R', 'C']
df = pd.DataFrame(technologies)
df2 = (df.filter(like='Location')
.where(df.filter(like='Area').isin(lst).values)
)
df['Location3'] = df2.stack().groupby(level=0).first()
# first() for first value as preference if many
or, using a MultiIndex:
idx = pd.MultiIndex.from_frame(df.columns.str.extract('(\w+)(\d+)'))
df2 = df.set_axis(idx, axis=1)
df['LocationX'] = (df2['Location'].where(df2['Area'].isin(lst)).stack()
.groupby(level=0).first()
)
output:
Writer1 Location1 Area1 Writer2 Location2 Area2 Location3
0 Spark LA L Spark RIV R RIV
1 PySpark RIV R Dil LA L RIV
2 Hadoop CHV C Chee LA L CHV
3 Python LA L Python RIV R RIV
duplicates
if there are multiple possibilities and you want to keep all
as concatenated values:
lst = ['R', 'C']
df = pd.DataFrame(technologies)
df2 = (df.filter(like='Location')
.where(df.filter(like='Area').isin(lst).values)
)
df['LocationX'] = df2.stack().groupby(level=0).agg(','.join)
output:
Writer1 Location1 Area1 Writer2 Location2 Area2 LocationX
0 Spark LA L Spark RIV R RIV
1 PySpark RIV R Dil RIV R RIV,RIV
2 Hadoop CHV C Chee LA L CHV
3 Python LA L Python RIV R RIV
or, as multiple rows:
lst = ['R', 'C']
df = pd.DataFrame(technologies)
df2 = (df.filter(like='Location')
.where(df.filter(like='Area').isin(lst).values)
)
df = df.join(df2.stack().rename('LocationX').droplevel(1))
output:
Writer1 Location1 Area1 Writer2 Location2 Area2 LocationX
0 Spark LA L Spark RIV R RIV
1 PySpark RIV R Dil RIV R RIV
1 PySpark RIV R Dil RIV R RIV
2 Hadoop CHV C Chee LA L CHV
3 Python LA L Python RIV R RIV
Answered By - mozway
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.