Issue
Given a Pandas DataFrame column in which each value is a string repeated twice back to back (such as XOMXOM), how can I turn it into this:
XOM
ZM
AAPL
SOFI
NKLA
TIGR
Although these strings appear to be at most 4 characters long, I can't rely on that. I want to be able to take a string like ABCDEFGHIJABCDEFGHIJ and still turn it into ABCDEFGHIJ in one column calculation, preferably WITHOUT for-looping/iterating through the rows.
Solution
You can use a regex pattern like r'\b(\w+)\1\b' with str.extract, as shown below:
import pandas as pd

df = pd.DataFrame({'Symbol': ['ZOMZOM', 'ZMZM', 'SOFISOFI',
                              'ABCDEFGHIJABCDEFGHIJ', 'NOTDUPLICATED']})
# Capture a run of word characters that is immediately followed by itself
print(df['Symbol'].str.extract(r'\b(\w+)\1\b'))
Output:
            0
0         ZOM
1          ZM
2        SOFI
3  ABCDEFGHIJ
4         NaN  # <- from `NOTDUPLICATED`
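Note that str.extract returns a DataFrame by default and gives NaN wherever the pattern does not match. The following is a minimal sketch, assuming you want a single column back and want non-duplicated values left as they are; the column name Deduped is only illustrative and not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'Symbol': ['ZOMZOM', 'ZMZM', 'SOFISOFI',
                              'ABCDEFGHIJABCDEFGHIJ', 'NOTDUPLICATED']})

# expand=False returns a Series (there is one capture group) instead of a DataFrame
halves = df['Symbol'].str.extract(r'\b(\w+)\1\b', expand=False)

# Rows that did not match are NaN; fall back to the original value there.
# 'Deduped' is a hypothetical column name chosen for this example.
df['Deduped'] = halves.fillna(df['Symbol'])
print(df)
```

If you would rather keep NaN as a marker for values that were not duplicated, drop the fillna step and assign halves directly.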
Explanation:
\b is a word boundary
(\w+) captures a word
\1 refers back to the text captured by the first group, (\w+)
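The pieces of the pattern can also be checked outside pandas with Python's built-in re module. This is only a sketch to illustrate the backreference; first_half is a hypothetical helper name, not something from pandas or from the answer.

```python
import re

# Same pattern as in the answer: a run of word characters immediately repeated
pattern = re.compile(r'\b(\w+)\1\b')

def first_half(text):
    """Return group 1 if text contains a word made of two identical halves, else None."""
    # first_half is a hypothetical helper used only for this illustration
    match = pattern.search(text)
    return match.group(1) if match else None

for sample in ['ZOMZOM', 'ABCDEFGHIJABCDEFGHIJ', 'NOTDUPLICATED']:
    print(sample, '->', first_half(sample))
```

Because \w+ is greedy, the regex engine first tries the whole word as group 1, fails to find it repeated, and backtracks until the captured text is followed by an identical copy that ends at a word boundary.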
Answered By - I'mahdi