Issue
I am working on visualizing how my NER model has performed. The data I currently have looks like this:
counter_list = [
('Name', {'p':0.56,'r':0.56,'f':0.56}),
('Designation', {'p':0.10,'r':0.20,'f':0.14}),
('Location', {'p':0.56,'r':0.56,'f':0.56}),
('Name', {'p':0.14,'r':0.14,'f':0.14}),
('Designation', {'p':0.10,'r':0.20,'f':0.14}),
('Location', {'p':0.56,'r':0.56,'f':0.56})
]
I would like to eliminate the duplicates by summing the values of entries that share the same label, so that each label appears only once. The output should look like this:
[
('Name', {'p':0.7,'r':0.7,'f':0.7}),
('Designation', {'p':0.2,'r':0.4,'f':0.28}),
('Location', {'p':1.12,'r':1.12,'f':1.12})
]
I have tried to use the reduce function, but it only gives me the output for the 'Name' entry.
result = functools.reduce(lambda x, y: (x[0], Counter(x[1])+Counter(y[1])) if x[0]==y[0] else (x[0],x[1]), counter_list)
What would be the right approach? I am trying to create some visuals with the final results, to determine which item has the highest 'f', 'p' or 'r' value.
Solution
Why not use pandas and its DataFrame.groupby method?
>>> import pandas as pd
>>> keys, data = zip(*counter_list)
>>> df = pd.DataFrame(data=data, index=keys).groupby(level=0).sum()
>>> df
                p     r     f
Designation  0.20  0.40  0.28
Location     1.12  1.12  1.12
Name         0.70  0.70  0.70
and then convert the result back to a list of (label, dict) tuples:
>>> list(df.T.to_dict().items())
[
('Designation', {'p': 0.2, 'r': 0.4, 'f': 0.28}),
('Location', {'p': 1.12, 'r': 1.12, 'f': 1.12}),
('Name', {'p': 0.7, 'r': 0.7, 'f': 0.7})
]
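If pandas is not an option, the same grouping can be sketched in plain Python. The reduce attempt in the question folds the whole list into a single accumulator, so it can only ever return one tuple (the first label); accumulating into a dictionary keyed by label avoids that. The helper name sum_by_label below is hypothetical, and the sketch assumes every entry is a (label, dict of numeric scores) pair as in the question.

```python
from collections import defaultdict

counter_list = [
    ('Name', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
    ('Designation', {'p': 0.10, 'r': 0.20, 'f': 0.14}),
    ('Location', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
    ('Name', {'p': 0.14, 'r': 0.14, 'f': 0.14}),
    ('Designation', {'p': 0.10, 'r': 0.20, 'f': 0.14}),
    ('Location', {'p': 0.56, 'r': 0.56, 'f': 0.56}),
]


def sum_by_label(pairs):
    """Sum the score dicts of entries that share a label (hypothetical helper)."""
    # Outer key: label; inner key: metric name; missing metrics start at 0.0.
    totals = defaultdict(lambda: defaultdict(float))
    for label, scores in pairs:
        for metric, value in scores.items():
            totals[label][metric] += value
    # Convert back to plain dicts; labels keep their first-seen order.
    return [(label, dict(metrics)) for label, metrics in totals.items()]


result = sum_by_label(counter_list)

# Entry with the largest summed 'f' score.
best_f = max(result, key=lambda item: item[1]['f'])
```

Here best_f picks out the entry with the largest summed 'f'; swapping in 'p' or 'r' gives the other two rankings. Because the values are floats, the sums may carry a small rounding error, so round them before displaying if needed.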
Answered By - keepAlive