Issue
I have a dataframe in which one column's values are lists of strings. Here is the structure of the file to read:
[
  {
    "key1":"value1 ",
    "key2":"2",
    "key3":["a","b 2 "," exp white space 210"]
  },
  {
    "key1":"value1 ",
    "key2":"2",
    "key3":[]
  }
]
I need to remove the extra white space from each item: leading and trailing spaces, and any run of more than one white space. Expected output:
[
  {
    "key1":"value1",
    "key2":"2",
    "key3":["a","b2","exp white space 210"]
  },
  {
    "key1":"value1",
    "key2":"2",
    "key3":[]
  }
]
Note:
Some values are empty in some rows, e.g. "key3":[]
Solution
If I understand correctly, some of your dataframe cells have list-type values.
The content of file_name.json is below:
[
  {
    "key1": "value1 ",
    "key2": "2",
    "key3": ["a", "b 2 ", " exp white space 210"]
  },
  {
    "key1": "value1 ",
    "key2": "2",
    "key3": []
  }
]
A possible solution in this case is the following:
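import re

import pandas as pd

df = pd.read_json("file_name.json")


def cleanup_data(value):
    # Strip the ends and collapse runs of whitespace to a single space.
    if value and isinstance(value, list):
        return [re.sub(r'\s+', ' ', x.strip()) for x in value]
    elif value and isinstance(value, str):
        return re.sub(r'\s+', ' ', value.strip())
    else:
        # Empty lists, numbers and missing values pass through unchanged.
        return value


# Apply the cleanup function to every cell in the dataframe.
# Note: in pandas 2.1+ applymap is deprecated in favor of df.map.
df = df.applymap(cleanup_data)
df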
Returns
     key1  key2                           key3
0  value1     2  [a, b 2, exp white space 210]
1  value1     2                             []
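If you only need the cleaned records and not a dataframe, the same idea can be sketched with the standard library alone. This is a minimal illustration, not part of the original answer: the inline `raw` string stands in for the file content, and the extra spaces inside the third list item are added here only to show the collapsing step.

```python
import json
import re


def cleanup_data(value):
    """Strip the ends of strings and collapse whitespace runs to one space.

    Lists are cleaned item by item; anything else is returned unchanged.
    """
    if isinstance(value, str):
        return re.sub(r"\s+", " ", value.strip())
    if isinstance(value, list):
        return [cleanup_data(item) for item in value]
    return value


# Hypothetical inline sample standing in for file_name.json
# (extra spaces added for illustration).
raw = """
[
  {"key1": "value1 ", "key2": "2", "key3": ["a", "b 2 ", " exp   white  space 210"]},
  {"key1": "value1 ", "key2": "2", "key3": []}
]
"""

records = json.loads(raw)
cleaned = [{key: cleanup_data(val) for key, val in rec.items()} for rec in records]
print(json.dumps(cleaned, indent=2))
```

Because the function recurses into lists, an empty list such as "key3":[] simply comes back as an empty list, so no special case is needed for it.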
Answered By - GreyMurav