Sunday, May 15, 2022

[FIXED] Python Pandas SUMIF excel equivalent

May 15, 2022 excel, pandas, python-3.x No comments

Issue

I can't figure out how to achieve a certain task in my python script.

I have a dataframe that contains media coverage for a specific topic. One of my columns names the author of the specific article for example...

I am trying to create a pivot table to show the total count for each journalist in the column like so

datajournalist = company1_topline.pivot_table(index='AuthorUsername', values='ContentID', aggfunc= np.count_nonzero )

which would give me something like

AuthorUsername                                                 count_nonzero
Aaron Mehta                                                      1              
Aamer Madhani                                                    1               
Aamer Madhani ; Ben Fox                                          1

What I am looking for is a way to have the pivot table also count the names that appear in multiple value cells, to get a true count of each author. So for example the row that has "Aamer Madhani ; Ben Fox" would also count towards the count for just "Aamer Madhani" so the row for just "Aamer Madhani" would have a count of 2 instead of 1 etc.... Is there a way to do this? In excel this can be achieved with a SUMIF, but I do not know how to do this with Python/Pandas.

Desired output

AuthorUsername                                                 count_nonzero
Aaron Mehta                                                      1              
Aamer Madhani                                                    2               
Aamer Madhani ; Ben Fox                                          1

If anyone can point me in the right direction it would be much appreciated.

Solution

With your DataFrame having the AuthorUsername column like this:

            AuthorUsername
0              Aaron Mehta
1            Aamer Madhani
2  Aamer Madhani ; Ben Fox

You can do:

import collections

# Remove leading and trailing spaces (if any).
df['AuthorUsername'] = df['AuthorUsername'].str.strip()

# Get unique authors and their counts.
authors_counts = collections.Counter(df['AuthorUsername'].str.split('\s*;\s*').sum())

# Add to new column.
real_counts = collections.defaultdict(lambda: 1, authors_counts)
df['count_nonzero'] = [real_counts[a] for a in df['AuthorUsername']]

print(df)

Result:

            AuthorUsername  count_nonzero
0              Aaron Mehta              1
1            Aamer Madhani              2
2  Aamer Madhani ; Ben Fox              1

Edit after comments - sum more metrics:

After the comments, here is a more versatile version to also sum the Metrics column, and potentially others.

Input dataframe:

            AuthorUsername  Metrics
0              Aaron Mehta      1.3
1            Aamer Madhani      2.0
2  Aamer Madhani ; Ben Fox      0.5

Code:

df['AuthorUsername'] = df['AuthorUsername'].str.strip()
df['single_authors'] = df['AuthorUsername'].str.split('\s*;\s*')

df['count_nonzero'] = 1
single_metrics = df.explode('single_authors').groupby('single_authors').sum()
multiple_metrics = df[df['single_authors'].map(len) > 1].groupby('AuthorUsername').sum()

all_metrics = pd.concat([single_metrics, multiple_metrics]).rename_axis('AuthorUsername').reset_index()

df = df.drop(columns=['Metrics', 'count_nonzero', 'single_authors']).merge(all_metrics, how='left', on='AuthorUsername')

print(df)

Result:

            AuthorUsername  Metrics  count_nonzero
0              Aaron Mehta      1.3              1
1            Aamer Madhani      2.5              2
2  Aamer Madhani ; Ben Fox      0.5              1

Answered By - user2246849

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Sunday, May 15, 2022

[FIXED] Python Pandas SUMIF excel equivalent

Issue

Solution

Edit after comments - sum more metrics:

0 comments:

Post a Comment

Popular Posts

Labels