Issue
I have a sentence like this: Give me 4 of ABCD_X and then do something
I need to extract ABCD_X, i.e. any set of characters after "Give me 4 of" and before the next space. The number can be any size.
I am able to do it with this expression (taken from this question):
(?<=^Give me \d of )(.*?)(?=\s)
But the number can be 10 or greater, and
(?<=^Give me \d+ of )(.*?)(?=\s)
raises an error in Python (on a pandas column) saying that a positive lookbehind must be fixed width.
Is there a way to extract those characters without a positive lookbehind?
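The limitation can be reproduced with Python's built-in `re` module, which is the engine pandas' `.str` methods use. This minimal sketch reuses the two sample sentences from the answer below: the single-digit lookbehind compiles but cannot match "10", while the `\d+` version is rejected when the pattern is compiled.

```python
import re

sample_4 = 'Give me 4 of ABCD_X and then do something'
sample_10 = 'Give me 10 of ABCD_Y and then do something'

# Fixed-width lookbehind: \d always matches exactly one character, so this compiles.
fixed = re.compile(r'(?<=^Give me \d of )(.*?)(?=\s)')
print(fixed.search(sample_4).group(1))  # ABCD_X
print(fixed.search(sample_10))          # None: "10" has two digits

# Variable-width lookbehind: \d+ may match 1, 2, ... characters,
# which re refuses at compile time.
try:
    re.compile(r'(?<=^Give me \d+ of )(.*?)(?=\s)')
except re.error as exc:
    print('re.error:', exc)
```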
Solution
You could try:
^Give me \d+ of (\S+)
^ - Start-of-line anchor.
Give me \d+ of  - Literally your search string (including the trailing space), with 1+ digits.
(\S+) - A capture group with 1+ non-whitespace characters.
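The same pattern works outside pandas with plain `re`. In this sketch, `extract_item` is a hypothetical helper name; it returns the captured group or `None` when the sentence does not start with the expected text.

```python
import re

# Anchor, literal text with 1+ digits, then capture 1+ non-whitespace characters.
PATTERN = re.compile(r'^Give me \d+ of (\S+)')

def extract_item(sentence):
    """Hypothetical helper: return the captured token, or None if there is no match."""
    m = PATTERN.match(sentence)
    return m.group(1) if m else None

for s in ['Give me 4 of ABCD_X and then do something',
          'Give me 10 of ABCD_Y and then do something',
          'Give me 123 of ABCD_Z',         # token at the very end, no trailing space
          'Please give me 4 of ABCD_X']:   # does not start with "Give me"
    print(repr(extract_item(s)))
```

A side effect of using `\S+` instead of `(.*?)(?=\s)` is that the token is still captured when it is the last word of the string, where the lookahead for whitespace would find nothing and fail.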
For example:
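import pandas as pd

# Sample data: the number before "of" can have any number of digits
s = pd.Series(['Give me 4 of ABCD_X and then do something', 'Give me 10 of ABCD_Y and then do something'])
# Capture the run of non-whitespace characters after "Give me <digits> of "
df = s.str.extract(r'^Give me \d+ of (\S+)')
print(df)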
Prints:
        0
0  ABCD_X
1  ABCD_Y
Note: If you use a named capture group, the column header will be the group's name instead of its integer index.
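A minimal sketch of that note, assuming a group named `item` (the name is arbitrary). It also shows `expand=False`, which makes `str.extract` return a Series rather than a one-column DataFrame when the pattern has a single group.

```python
import pandas as pd

s = pd.Series(['Give me 4 of ABCD_X and then do something',
               'Give me 10 of ABCD_Y and then do something'])

# A named group (?P<name>...) becomes the column header of the result.
named = s.str.extract(r'^Give me \d+ of (?P<item>\S+)')
print(named)

# With one group and expand=False, the result is a Series instead of a DataFrame.
items = s.str.extract(r'^Give me \d+ of (\S+)', expand=False)
print(items.tolist())  # ['ABCD_X', 'ABCD_Y']
```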
Answered By - JvdV