Issue
I have a CSV file that contains car loan data.
The columns are the car make and model, the total cost, and the APR.
I am using Python and pyplot to make one pie chart per make. Each chart will contain that manufacturer's models, and the size of each slice will be determined by how high the APR is and how much the total cost is.
I am trying to use this library: matplotlib.pyplot
But it doesn't like the dollar signs or percentages. Is there a way to fix this?
SyntaxError: invalid syntax
Toyota, Rav4,"$25,814.73",$315.00,3.24%
^
Here is the script:
import pandas as pd
import matplotlib.pyplot as plt

# Read data
file_path = 'input_data.csv'
data = pd.read_csv(file_path, encoding='unicode_escape')
columns = data.columns

for column in columns:
    if pd.api.types.is_numeric_dtype(data[column]):
        # Ignore symbols
        data[column] = pd.to_numeric(data[column].replace('[\%,]', '', regex=True), errors='coerce')
        # Plot
        colors = plt.cm.Set3.colors
        plt.pie(data[column], labels=data.index, autopct='%1.1f%%', colors=colors, startangle=140)
        plt.title(f'Pie Chart for {column}')
        plt.show()
    else:
        print(f"Skipping non-numeric column: {column}")
Full error:
SyntaxWarning: invalid escape sequence '\%'
data[column] = pd.to_numeric(data[column].replace('[\%,]', '', regex=True), errors='coerce')
Solution
You should remove the % and $ symbols from your data.
For instance, let's suppose we have the following dataframe:
Car Model Price Rebate %Rebate
0 Toyota Rav4 $25,814.73 $315.00 3.24%
A possible solution is to first strip the % and $ symbols (along with the thousands separators) and then convert the columns to numbers:
import pandas as pd

data = {
    'Car': ["Toyota"],
    'Model': ["Rav4"],
    'Price': ["$25,814.73"],
    'Rebate': ["$315.00"],
    '%Rebate': ["3.24%"]
}
df = pd.DataFrame(data)

# Remove dollar signs and thousands separators
df['Price'] = pd.to_numeric(df['Price'].replace('[$,]', '', regex=True), errors='coerce')
df['Rebate'] = pd.to_numeric(df['Rebate'].replace('[$,]', '', regex=True), errors='coerce')
# Remove percent signs
df['%Rebate'] = pd.to_numeric(df['%Rebate'].replace('[%,]', '', regex=True), errors='coerce')
print(df)
Result:
Car Model Price Rebate %Rebate
0 Toyota Rav4 25814.73 315.0 3.24
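The same cleanup can be applied right after loading a CSV. What follows is a minimal sketch, not the asker's actual file: it reads the sample row from the question out of an in-memory string instead of input_data.csv, and the header names are the ones assumed in this answer. Columns holding values such as "$25,814.73" are loaded as text, so a numeric-dtype check on them is False until they have been converted. Writing the pattern as a raw string also avoids the invalid escape sequence warning, since % needs no backslash inside a character class.

```python
import io

import pandas as pd

# Stand-in for input_data.csv: the sample row from the question, with the
# header names assumed in this answer (not confirmed by the asker).
csv_text = '''Car,Model,Price,Rebate,%Rebate
Toyota, Rav4,"$25,814.73",$315.00,3.24%
'''

# skipinitialspace drops the blank after the comma in "Toyota, Rav4".
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Currency and percentage columns are read as text, not as numbers.
symbol_columns = ['Price', 'Rebate', '%Rebate']
print(df[symbol_columns].dtypes)

for col in symbol_columns:
    # Raw string: $, % and , are all literal inside a character class.
    cleaned = df[col].str.replace(r'[$%,]', '', regex=True)
    df[col] = pd.to_numeric(cleaned, errors='coerce')

print(df)
```

With errors='coerce', any value that still cannot be parsed becomes NaN instead of raising, so it is worth checking for NaN before plotting.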
Suggestion:
You could write a function to simplify the task, like this:
def remove_symb(column, symbol):
    return pd.to_numeric(column.replace(f'[{symbol},]', '', regex=True), errors='coerce')

df['Price'] = remove_symb(df['Price'], symbol='$')
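Once the columns are numeric, one pie chart per make could be drawn along these lines. This is only a sketch under assumptions: apart from the Rav4 row, the data is invented for illustration, and the slices are sized by Price alone, since the question does not say how APR and total cost should be combined. The Agg backend is selected so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data only: the first row comes from the question,
# the other rows are invented.
df = pd.DataFrame({
    'Car': ['Toyota', 'Toyota', 'Honda', 'Honda'],
    'Model': ['Rav4', 'Corolla', 'Civic', 'CR-V'],
    'Price': ['$25,814.73', '$21,000.00', '$23,500.00', '$29,000.00'],
})

# Strip dollar signs and thousands separators, then convert to numbers.
df['Price'] = pd.to_numeric(
    df['Price'].str.replace(r'[$,]', '', regex=True), errors='coerce'
)

# One figure per make; each slice is a model, sized by its price.
figures = {}
for make, group in df.groupby('Car'):
    fig, ax = plt.subplots()
    ax.pie(group['Price'], labels=group['Model'],
           autopct='%1.1f%%', startangle=140)
    ax.set_title(f'Price share by model: {make}')
    figures[make] = fig
```

Each figure in figures can then be written to disk with fig.savefig(...) or, with an interactive backend, displayed with plt.show().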
Answered By - Laurent B.