Issue
For example, let's say that I have the dataframe:
NAME = ['BOB', 'BOB', 'BOB', 'SUE', 'SUE', 'MARY', 'JOHN', 'JOHN', 'MARK', 'MARK', 'MARK', 'MARK']
STATE = ['CA','CA','CA','DC','DC','PA','GA','GA','NY','NY','NY','NY']
MAJOR = ['MARKETING','BUSINESS ADM',np.nan,'ECONOMICS','MATH','PSYCHOLOGY','HISTORY','BUSINESS ADM','MATH', 'MEDICAL SCIENCES',np.nan,np.nan]
SCHOOL = ['UCLA','UCSB','CAL STATE','HARVARD','WISCONSIN','YALE','CHICAGO','MIT','UCSD','UCLA','CAL STATE','COMMUNITY']
data = {'NAME':NAME, 'STATE':STATE,'MAJOR':MAJOR, 'SCHOOL':SCHOOL}
df = pd.DataFrame(data)
I am to concatenate rows with multiple unique values for the same name.
I tried:
gr_columns = [x for x in df1.columns if x not in ['MAJOR','SCHOOL']]
df1 = df1.groupby(gr_columns).agg(lambda col: '|'.join(col))
and expected
I am trying to concatenate rows in columns where the NAME field is the same. Conveniently, the STATE field is static for each NAME. I would like the output to look like:
NAME | STATE | MAJOR | SCHOOL |
---|---|---|---|
BOB | CA | MARKETING,BUSINESS ADM | UCLA,UCSB,CAL STATE |
SUE | DC | ECONOMICS,MATH | HARVARD,WISCONSIN |
MARY | PA | PSYCHOLOGY | YALE |
JOHN | GA | HISTORY,BUSINESS ADM | CHICAGO,MIT |
MARK | NY | MATH,MEDICAL SCIENCES | UCSD,UCLA,CAL STATE,COMMUNITY |
but instead, I get a single column containing the concatenated schools.
Solution
It is because your np.nan
cannot be converted to str, so it is dropped automatically by pandas. You need to convert its type to str first:
df.groupby(['NAME', 'STATE']).agg(lambda x: ','.join(x.astype(str)))
To drop na and keep NAME and STATE as columns:
df.groupby(['NAME', 'STATE']).agg(lambda x: ','.join(x.dropna())).reset_index()
Answered By - Z Li
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.