Issue
I need to revise the values in a pandas Series (column) using a function.
While iterating, once I have the result, I don't want to look up the Series a second time, because I suspect it wastes time and is unnecessary.
For example:
import pandas as pd
s = pd.Series(['A', 'B', 'C'])
for index, value in s.items():
    s[index] = func_hard_to_vectorized(value)  # lookup again!!!
In C++ terms: how do I get a reference to that cell?
What I want looks like:
import pandas as pd
s = pd.Series(['A', 'B', 'C'])
for index, value in s.items():
    value = func_hard_to_vectorized(value)  # change in place
    assert_equal(s[index], value)
The same problem exists for DataFrame, where it probably affects performance even more.
How can I get a reference to a row of a pandas DataFrame?
Solution
You can assign your data once, rather than at each step:
s[:] = [func_hard_to_vectorized(v) for v in s]
Or:
s[:] = s.apply(func_hard_to_vectorized)
This way the assignment happens only once, for all items together.
If you don't mind creating a new Series (i.e. if no other name refers to the original Series):
s = s.apply(func_hard_to_vectorized)
can also be used.
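The difference between the two forms can be checked with a second name bound to the same Series. This is a minimal sketch: `func_hard_to_vectorize` here is a hypothetical stand-in for the function in the question, and `alias` is an illustrative name.

```python
import pandas as pd

def func_hard_to_vectorize(value):
    # Hypothetical stand-in for the function in the question
    return value.lower() * 2

s = pd.Series(['A', 'B', 'C'])
alias = s  # second name bound to the same Series object

# Slice assignment writes the new values into the existing Series
s[:] = s.apply(func_hard_to_vectorize)
same_object_after_slice_assign = alias is s
alias_values_after_slice_assign = alias.tolist()

# Plain assignment rebinds the name s to a new Series; alias keeps the old one
s = s.apply(func_hard_to_vectorize)
same_object_after_rebind = alias is s
```

After the slice assignment `alias` sees the updated values; after the rebinding, `alias` and `s` are two different objects.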
Example using both index and value:
s = pd.Series(['A', 'B', 'C'])
def f(idx, v):
    return f'{v}_{idx}'
s[:] = [f(idx, v) for idx, v in s.items()]
Modified s:
0 A_0
1 B_1
2 C_2
dtype: object
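The question also mentions DataFrame rows. The same compute-everything-then-assign-once idea could plausibly be applied there; the sketch below assumes a hypothetical row-wise function `row_func` and illustrative column names `x`, `y`, and `z`, none of which come from the question.

```python
import pandas as pd

def row_func(x, y):
    # Hypothetical row-wise function that is hard to vectorize
    return f'{x}-{y}'

df = pd.DataFrame({'x': ['A', 'B', 'C'], 'y': [1, 2, 3]})

# Compute all results first, then assign the whole column in one step
df['z'] = [row_func(x, y) for x, y in zip(df['x'], df['y'])]
```

Iterating over the columns with `zip` avoids building a Series object for each row, which is what `DataFrame.iterrows` would do.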
Answered By - mozway