Issue
Here is my sample data:
import pandas as pd
import re
cars = pd.DataFrame({'Engine Information': {0: 'Honda 2.4L 4 cylinder 190 hp 162 ft-lbs',
1: 'Aston Martin 4.7L 8 cylinder 420 hp 346 ft-lbs',
2: 'Dodge 5.7L 8 Cylinder 390hp 407 ft-lbs',
3: 'MINI 1.6L 4 Cylinder 118 hp 114 ft-lbs',
4: 'Ford 5.0L 8 Cylinder 360hp 380 ft-lbs FFV',
5: 'GMC 6.0L 8 Cylinder 352 hp 382 ft-lbs'},
'HP': {0: None, 1: None, 2: None, 3: None, 4: None, 5: None}})
Here is my desired output:
I have created a new column called 'HP' where I want to extract the horsepower figure from the original column ('Engine Information')
Here is the code I have tried to do this:
cars['HP'] = cars['Engine Information'].apply(lambda x: re.match(r'\\d+(?=\\shp|hp)', str(x)))
The idea is I want to regex match the pattern: 'a sequence of numbers that come before either 'hp' or ' hp'. This is because some of the cells have no 'space' in between the number and 'hp' as showed in my example.
I'm sure the regex is correct, because I have successfully done a similar process in R. However, I have tried functions such as str.extract
, re.findall
, re.search
, re.match
. Either returning errors or 'None' values (as shown in the sample). So here I am a bit lost.
Thanks!
Solution
You can use str.extract
:
cars['HP'] = cars['Engine Information'].str.extract(r'(\d+)\s*hp\b', flags=re.I)
Details
(\d+)\s*hp\b
- matches and captures into Group 1 one or more digits, then just matches 0 or more whitespaces (\s*
) andhp
(in a case insensitive way due toflags=re.I
) as a whole word (since\b
marks a word boundary)str.extract
only returns the captured value if there is a capturing group in the pattern, so thehp
and whitespaces are not part of the result.
Python demo results:
>>> cars
Engine Information HP
0 Honda 2.4L 4 cylinder 190 hp 162 ft-lbs 190
1 Aston Martin 4.7L 8 cylinder 420 hp 346 ft-lbs 420
2 Dodge 5.7L 8 Cylinder 390hp 407 ft-lbs 390
3 MINI 1.6L 4 Cylinder 118 hp 114 ft-lbs 118
4 Ford 5.0L 8 Cylinder 360hp 380 ft-lbs FFV 360
5 GMC 6.0L 8 Cylinder 352 hp 382 ft-lbs 352
Answered By - Wiktor Stribiżew
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.