Tuesday, October 4, 2022

[FIXED] How to count the instances of unique values in Python Dataframe

Labels: pandas, python

Issue

I have a dataframe with 2 million rows, like the one below. The sample data can be found here.

The list of matches in each row can contain any numbers between 1 and 761. I want to count how many times each number between 1 and 761 occurs in the matches column across all rows. For example, the result for the above data will be:

If a particular id is not found, its count should be 0 in the output. I tried a for-loop approach, but it is quite slow.

import pandas as pd

def readData():
    df = pd.read_excel(file_path)

    pattern_match_count = [0] * 761
    for index, row in df.iterrows():
        matches = row["matches"]

        for pattern_id in range(1, 762):
            if pattern_id in matches:
                pattern_match_count[pattern_id - 1] += 1

    return pattern_match_count

Is there any better approach with pandas to make the implementation faster?

Solution

You can use the .explode() method to expand each list into one row per element, and then call .value_counts() to count how often each value occurs.

import pandas as pd

def readData():
    df = pd.read_excel(file_path)
    # One row per list element, then count each distinct value
    return df.loc[:, "matches"].explode().value_counts()
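Note that value_counts() only reports ids that actually occur, while the question asks for a count of 0 for missing ids. The following is a minimal sketch of one way to fill those in with reindex(). The helper name count_pattern_ids, the max_id parameter, and the sample data are hypothetical, and the string-parsing step rests on the assumption that list cells read from Excel may arrive as text such as "[1, 3]".

```python
import ast

import pandas as pd


def count_pattern_ids(matches: pd.Series, max_id: int = 761) -> pd.Series:
    """Count how often each id in 1..max_id appears across all lists."""
    # Assumption: cells read from Excel may be strings such as "[1, 3]".
    # Parse those back into lists; real lists pass through unchanged.
    parsed = matches.map(
        lambda v: ast.literal_eval(v) if isinstance(v, str) else v
    )
    # One row per list element; empty lists become NaN and are dropped.
    exploded = parsed.explode().dropna().astype("int64")
    counts = exploded.value_counts()
    # reindex adds the ids that never occur, with a count of 0.
    return counts.reindex(range(1, max_id + 1), fill_value=0)


# Hypothetical sample data, not the asker's file.
sample = pd.DataFrame({"matches": [[1, 2, 3], [2, 3], "[3, 5]"]})
result = count_pattern_ids(sample["matches"], max_id=5)
print(result.tolist())  # prints [1, 2, 3, 0, 1]
```

One difference from the loop in the question: explode() counts every occurrence, so an id repeated within a single list is counted more than once, whereas the loop counts it once per row. If lists can contain duplicates and per-row counts are wanted, deduplicating each list before exploding would be one option.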

Answered By - Benjamin Rio

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
