Issue
Consider an array of about 200,000 elements, where each element is a position given as (x, y) coordinates. The task is to find the position that occurs most frequently in the array, along with its count.
For example, given the array A = [(1, 2), (2, 3), (1, 2), (4, 5)], the position (1, 2) occurs most frequently, with a count of 2.
I tried numpy.unique() for this task, but it is still slow on such a large dataset. Are there faster alternatives?
import time
from collections import Counter
import numpy as np

# sub_res is an (N, 2) array of (x, y) positions, created elsewhere
print(f"Array shape: {sub_res.shape}")
t6 = time.perf_counter()
# Count identical rows with NumPy
unique_values, counts = np.unique(sub_res, axis=0, return_counts=True)
sorted_indices = np.argsort(-counts)
max_counts = np.max(counts[sorted_indices])
t7 = time.perf_counter()
print("'np unique' time : {}".format(round(t7 - t6, 2)))
# Convert each row to a hashable tuple so that Counter can count it
sub_res_list_tuple = list(map(tuple, sub_res))
counts_res = Counter(sub_res_list_tuple)
most_common_temp = counts_res.most_common(1)[0]
unique_values_2, counts_2 = most_common_temp[0], most_common_temp[1]
t8 = time.perf_counter()
print("'Counter' time : {}".format(round(t8 - t7, 2)))
Output:
Array shape: (218820, 2)
'np unique' time : 0.12
'Counter' time : 0.19
Solution
For large datasets, you can use the Counter class from the collections module in Python to efficiently count occurrences.
from collections import Counter
A = [(1, 2), (2, 3), (1, 2), (4, 5)]
# Use Counter to count occurrences
counts = Counter(A)
# Find the most common position and its count
most_common_position, count = counts.most_common(1)[0]
print("Most common position:", most_common_position)
print("Count:", count)
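The example above starts from a list of tuples. If the data is a NumPy array, as in the question, each row has to become a hashable tuple first. The following is a minimal sketch of one way to do that; the helper name most_common_row is hypothetical, and the array is assumed to be a non-empty 2-D array. Calling tolist() before building the tuples generally avoids creating one small array object per row, but it is worth timing on your own data.

```python
from collections import Counter

import numpy as np


def most_common_row(points):
    """Return ((x, y), count) for the most frequent row of an (N, 2) array."""
    # tolist() yields nested lists of plain Python numbers;
    # tuple() then makes each row hashable so Counter can count it.
    rows = map(tuple, points.tolist())
    return Counter(rows).most_common(1)[0]


sub_res = np.array([(1, 2), (2, 3), (1, 2), (4, 5)])
position, count = most_common_row(sub_res)
print("Most common position:", position)  # (1, 2)
print("Count:", count)  # 2
```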
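Counter can be fast because, in CPython, its counting loop is implemented in C. Many Python built-ins and standard-library functions are written in C, so they usually run faster than an equivalent loop written by hand in Python. Note that the Counter timing in the question also includes converting every array row to a tuple, so it is worth timing both approaches on your own data. Source: https://www.reddit.com/r/leetcode/comments/wy506t/why_is_python_collectionscounter_so_much_faster/?rdt=35953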
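If the coordinates are integers, another option is to stay in NumPy and pack each (x, y) pair into a single integer key, so that numpy.unique() works on a 1-D array instead of comparing rows with axis=0. This is a different technique from the Counter approach above, sketched here under stated assumptions: the helper name most_common_position is hypothetical, the array is non-empty, and the coordinate ranges are small enough that the packed key fits in a 64-bit integer. Whether it is faster depends on the data, so measure before relying on it.

```python
import numpy as np


def most_common_position(points):
    """Return ((x, y), count) for the most frequent row of an (N, 2) integer array."""
    points = np.asarray(points, dtype=np.int64)
    # Shift each column so its minimum is 0, which keeps the keys non-negative.
    mins = points.min(axis=0)
    shifted = points - mins
    # width is the size of the y range, so x * width + y is unique per (x, y) pair.
    width = int(shifted[:, 1].max()) + 1
    keys = shifted[:, 0] * width + shifted[:, 1]
    # Count the 1-D integer keys instead of whole rows.
    unique_keys, counts = np.unique(keys, return_counts=True)
    best = int(np.argmax(counts))
    # Unpack the winning key back into the original coordinates.
    x, y = divmod(int(unique_keys[best]), width)
    return (x + int(mins[0]), y + int(mins[1])), int(counts[best])


A = [(1, 2), (2, 3), (1, 2), (4, 5)]
print(most_common_position(A))  # ((1, 2), 2)
```

If several positions share the highest count, this sketch returns the one with the smallest packed key.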
Answered By - Oyinlade Demola