Issue
I have a dataframe like below where I have 2 million rows. The sample data can be found here.
The list of matches in every row can be any number between 1 to 761. I want to count the occurrences of every number between 1 to 761 in the matches column altogether. For example, the result of the above data will be:
If a particular id is not found, then the count will be 0 in the output. I tried using for loop approach but it is quite slow.
def readData():
df = pd.read_excel(file_path)
pattern_match_count = [0] * 761
for index, row in df.iterrows():
matches = row["matches"]
for pattern_id in range(1, 762):
if(pattern_id in matches):
pattern_match_count[pattern_id - 1] = pattern_match_count[pattern_id - 1] + 1
Is there any better approach with pandas to make the implementation faster?
Solution
You can use the .explode()
method to "explode" the lists into new rows.
def readData():
df = pd.read_excel(file_path)
return df.loc[:, "count"].explode().value_counts()
Answered By - Benjamin Rio
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.