Wednesday, January 24, 2024

[FIXED] ValueError: No valid number words found! Please enter a valid number word (eg. two million twenty three thousand and forty nine)

January 24, 2024 · pandas, python

Issue

How can I split these columns, and the data in them, into separate columns? I have attached a sample for reference.

Example: the column cement_water, with values such as "three hundred and two; 203.0", has to be split into two columns named cement and water, with values 302.0 and 203.0 respectively. The columns use different delimiters (semicolon, comma and underscore), which have to be handled, and some values are written as words, which have to be converted into numbers using word2number.
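Splitting on whichever of the three delimiters appears can be done with a regex character class. A stdlib-only sketch on values from the sample (split_pair is a hypothetical helper name, not from the question):

```python
import re

# One of ';', ',' or '_' separates the two values; spaces around it are dropped.
DELIMITER = re.compile(r"\s*[;,_]\s*")


def split_pair(value):
    """Split a combined cell into its two parts at the first delimiter."""
    left, right = DELIMITER.split(value, maxsplit=1)
    return left, right


for raw in ["three hundred and two; 203.0",
            "three hundred and sixty-two_164.9",
            "974.0,817.0"]:
    print(split_pair(raw))
```

Splitting at the first delimiter assumes the number words themselves never contain a semicolon, comma or underscore, which holds for the sample rows.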

The original columns look like this:

cement_water                        coarse_fine_aggregate
three hundred and two;203.0         974.0,817.0
one hundred and fifty-one;184.4     992.0;815.9
three hundred and sixty-two_164.9   944.7;755.8

They have to be converted into the following:

cement  water  coarse_aggregate  fine_aggregate
302.0   203.0  974.0             817.0
151.0   184.4  992.0             815.9
362.0   164.9  944.7             755.8

import pandas as pd
from word2number import w2n

df = pd.read_csv('test.csv - Sheet1.csv')
def convert_words_to_numbers(text):
    words = text.replace('_', ' ').replace(';', ' ').replace(',', ' ').split()
    converted_words = [str(w2n.word_to_num(word)) if word.isalpha() else word for word in words]
    return ' '.join(converted_words)
df['cement_water'] = df['cement_water'].apply(lambda x: convert_words_to_numbers(x))
df[['cement', 'water']] = df['cement_water'].str.split(' ', expand=True)
df[['coarse_aggregate', 'fine_aggregate']] = df['coarse_fine_aggregate'].str.split(';', expand=True)
df = df.drop(['cement_water', 'coarse_fine_aggregate'], axis=1)
df = df.apply(pd.to_numeric, errors='ignore')
print(df)

Error: ValueError: No valid number words found! Please enter a valid number word (eg. two million twenty three thousand and forty nine)
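The error comes from converting one token at a time. After the delimiters are replaced with spaces, filler words such as "and" are purely alphabetic, so w2n.word_to_num("and") gets called; it finds no number words in that input and raises this ValueError. A stdlib-only sketch of the question's tokenizing step (tokens_sent_to_w2n is a hypothetical helper name) lists which tokens would reach word_to_num:

```python
def tokens_sent_to_w2n(text):
    """Mimic the question's tokenizing: which tokens reach w2n.word_to_num?"""
    words = text.replace('_', ' ').replace(';', ' ').replace(',', ' ').split()
    # Only purely alphabetic tokens are converted; 'fifty-one' fails isalpha()
    return [word for word in words if word.isalpha()]


print(tokens_sent_to_w2n('three hundred and two;203.0'))
# ['three', 'hundred', 'and', 'two']
print(tokens_sent_to_w2n('one hundred and fifty-one;184.4'))
# ['one', 'hundred', 'and']
```

Besides the filler word, hyphenated words such as "fifty-one" fail isalpha() and are never converted, and converting "three", "hundred" and "two" separately could not give 302 anyway, so the whole phrase has to be passed to w2n.word_to_num in one call.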

Solution

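This works for me with the following variant, which extracts the number words and the trailing number with a regex and converts the whole phrase at once: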

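import pandas as pd
from word2number import w2n

# df is the DataFrame loaded from the CSV in the question.
out = (pd.concat([
           # cement = everything before the delimiter, water = the trailing number
           df['cement_water'].str.extract(r'(?P<cement>.*)[;,_](?P<water>\d+\.?\d*)$'),
           # split on ';' or ',' into two numbered columns, then name them
           df['coarse_fine_aggregate'].str.split('[;,]', expand=True)
             .rename(columns={0: 'coarse_aggregate', 1: 'fine_aggregate'}),
       ], axis=1)
       # convert the whole cement phrase at once, e.g. 'three hundred and two' -> 302
       .assign(cement=lambda d: d['cement'].map(w2n.word_to_num))
       .astype(float)
      )
print(out)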

Output:

   cement  water  coarse_aggregate  fine_aggregate
0   302.0  203.0             974.0           817.0
1   151.0  184.4             992.0           815.9
2   362.0  164.9             944.7           755.8
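The str.extract pattern can be checked on its own with Python's re module: the greedy (?P<cement>.*) runs up to the last delimiter, and pandas uses the named groups as the new column names. The pattern below is a hedged tweak, not the answer's exact regex: it escapes the decimal point (so "12x3" is rejected) and allows optional whitespace after the delimiter, as in the question's "three hundred and two; 203.0".

```python
import re

# Assumed variant of the answer's pattern: escaped '.' and optional space after the delimiter.
PATTERN = re.compile(r"(?P<cement>.*)[;,_]\s*(?P<water>\d+\.?\d*)$")

samples = [
    "three hundred and two;203.0",
    "one hundred and fifty-one;184.4",
    "three hundred and sixty-two_164.9",
    "three hundred and two; 203.0",
]
for raw in samples:
    match = PATTERN.match(raw)
    # cement holds the number words, water the trailing numeric string
    print(match.group("cement"), "|", match.group("water"))
```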

More generic code, with an additional example

Here, cement_water contains a mix of number words and numeric strings, so let's first identify the numbers and only parse the words:

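import pandas as pd
from word2number import w2n

# split off the trailing water value; cement may be digits or words
tmp = df['cement_water'].str.extract(r'(?P<cement>.*)[;,_](?P<water>\d+\.?\d*)$')
# numeric cement values parse here; number words become NaN
s = pd.to_numeric(tmp['cement'], errors='coerce')
m = s.isna() & tmp['cement'].notna()
# convert only the extracted cement words, not the whole 'words;number' string
tmp.loc[m, 'cement'] = tmp.loc[m, 'cement'].map(w2n.word_to_num)


out = (pd.concat([tmp,
                  df['coarse_fine_aggregate'].str.split('[;,_]', expand=True)
                    .rename(columns={0: 'coarse_aggregate', 1: 'fine_aggregate'})],
                 axis=1)
         .astype(float)
      )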

Output:

     cement  water  coarse_aggregate  fine_aggregate
0     200.0  159.2            1043.6           771.9
1     200.0  192.0             965.4           806.2
2     446.0  162.0             967.0           712.0
3     380.0  158.0             903.0           768.0
4     141.0  173.5             882.6           785.3
..      ...    ...               ...             ...
222   200.0  192.0             965.4           806.2
223   270.0  160.6             973.9           875.6
224   150.0  185.7            1040.6           734.3
225   330.0  174.9             944.7           755.8
226   288.0  177.4             907.9           829.5

[227 rows x 4 columns]
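The idea behind the mask m above, trying a numeric parse first and sending only the failures to w2n.word_to_num, can be sketched in plain Python. to_number and fake_word_to_num are hypothetical names; in real code you would pass w2n.word_to_num, but a small lookup stands in for it here so the sketch runs without word2number, and the cement values are made-up samples:

```python
def to_number(value, word_to_num):
    """Return value as a float, converting number words only when float() fails."""
    try:
        return float(value)  # numeric strings such as '200.0' or '446'
    except ValueError:
        return float(word_to_num(value))  # number words such as 'three hundred and two'


def fake_word_to_num(text):
    """Stand-in for w2n.word_to_num, only for this sketch."""
    return {"three hundred and two": 302, "one hundred and fifty-one": 151}[text]


cement = ["200.0", "three hundred and two", "446", "one hundred and fifty-one"]
print([to_number(v, fake_word_to_num) for v in cement])
# [200.0, 302.0, 446.0, 151.0]
```

This mirrors pd.to_numeric(..., errors='coerce') followed by isna(): anything that does not parse as a number is treated as number words.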

Answered By - mozway

This answer was collected from Stack Overflow and tested by the PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
