Issue
I am cleaning a data set and want to replace the outlier value -9999.9 with the median of the column it appears in. Each column represents a month, and the code below is what I have written to handle the outlier. Once the outliers are replaced, I concatenate the reformatted columns with the columns I left untouched:
    import pandas as pd
    import numpy as np
    #Abottsford British Columbia
    abottsfordbc = pd.read_csv("/Users/name/Desktop/Python_Scripts/wind_classifier/data_sets/canadian_windspeeds/abottsford_bc.csv", engine = 'python', sep = ',')
    df_abottsfordbc = pd.DataFrame(abottsfordbc)
    dataframe_labels = df_abottsfordbc[["Jan","Mar","Apr","May","Jun","Jul","Aug","Sep","Nov","Dec","Annual","Winter","Spring","Summer","Autumn"]]
    for i in df_abottsfordbc["Feb"]:
        if i == -9999.9:
            column_median = df_abottsfordbc["Feb"].median()
            outlier_convert = df_abottsfordbc["Feb"].replace(to_replace = [-9999.9], value = [0])
            zero_to_medianFeb = outlier_convert.replace(to_replace = [0], value = [column_median])
    for i in dataframe_labels["Oct"]:
        if i == -9999.9:
            column_median = df_abottsfordbc["Oct"].median()
            outlier_convert = df_abottsfordbc["Oct"].replace(to_replace = [-9999.9], value = [0])
            zero_to_medianOct = outlier_convert.replace(to_replace = [0], value = [column_median])
    abottsford_bc_concat = pd.concat([dataframe_labels, zero_to_medianFeb, zero_to_medianOct], axis = 1)
I'm wondering if anyone could help me with this issue. I recently moved the data from a Windows 10 computer to a Mac running macOS Catalina, and I'm not sure why the script runs fine on Windows 10 but not on macOS. I'm using Spyder 4.1.4 and Python 3.8 on my Mac. I checked the .csv file I am reading in, and it looks completely fine in Microsoft Excel: all the column names are where they are supposed to be. However, when I print the dataframe I get this:
    print(abottsfordbc)
    Year
    1953 16.4 9.5 11.9 10.0 10.0 8.1 8.9 8.6 9.2 8.6 12.7 11.9 10.5 12.9 10.6 8.6 10.2
    1954 17.9 16.5 12.6 13.5 10.8 10.5 8.9 7.9 7.9 11.6 11.8 13.1 11.9 15.4 12.3 9.1 10.4
    1955 8.6 10.2 11.5 11.2 8.8 7.2 7.1 6.6 5.9 8.7 14.3 11.1 9.3 10.7 10.5 7.0 9.6
    1956 10.5 10.0 16.1 13.6 12.6 13.4 10.8 9.9 11.4 14.0 11.1 18.4 12.7 10.5 14.1 11.4 12.2
    1957 17.9 18.4 14.9 13.0 10.7 12.4 12.1 9.4 9.5 14.8 11.1 18.4 13.6 18.3 12.9 11.3 11.8
    ...
    2010 10.1 9.0 10.9 13.2 10.3 9.5 9.8 8.5 8.5 7.9 12.3 10.9 10.1 9.4 11.4 9.3 9.6
    2011 10.2 14.5 13.2 11.7 9.0 10.4 9.3 7.8 8.0 7.4 10.5 7.7 10.0 11.9 11.3 9.2 8.6
    2012 12.9 11.1 13.2 9.8 10.4 9.6 9.2 7.8 6.6 10.6 9.7 10.2 10.1 10.6 11.2 8.8 9.0
    2013 7.5 10.4 10.6 11.7 9.0 9.2 10.8 8.4 9.1 7.9 9.2 11.4 9.6 9.3 10.5 9.5 8.7
    2014 9.9 17.5 11.5 11.4 9.6 10.3 10.1 8.6 8.7 10.0 13.4 11.4 11.0 12.9 10.8 9.7 10.7
    [62 rows x 1 columns]
The error I keep getting is shown below. It makes sense, because when I print the dataframe I can see that "Feb" isn't a column. Does anyone know why read_csv() is not reading my .csv file properly? Why does it treat "Year" as the only column, placed last instead of first, and leave the rest of the column names blank? When I open the .csv file in Microsoft Excel it is formatted correctly, and it is saved as a CSV UTF-8 file. Any help would be much appreciated.
runfile('/Users/name/Desktop/Python_Scripts/wind_classifier/cleanwind.py', wdir='/Users/name/Desktop/Python_Scripts/wind_classifier')
Traceback (most recent call last):
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Feb'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/bryanekeh/Desktop/Python_Scripts/wind_classifier/cleanwind.py", line 11, in <module>
for i in abottsfordbc["Feb"]:
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 2800, in __getitem__
indexer = self.columns.get_loc(key)
File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Feb'
Solution
How do you know the file isn't being read in properly? Are you printing it before or after the column operations?
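One way to answer those questions is to look at the raw text and at the parsed column names before doing anything else. The snippet below is only a minimal sketch: `sample_csv` is a made-up stand-in for the real file, and the three-column layout is an assumption for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: a tiny CSV with a similar layout.
sample_csv = "Year,Jan,Feb\n1953,16.4,9.5\n1954,17.9,-9999.9\n"

# Look at the raw text first: repr() exposes stray characters, unexpected
# delimiters, or unusual line endings that a spreadsheet view can hide.
raw_lines = sample_csv.splitlines(keepends=True)[:2]
for line in raw_lines:
    print(repr(line))

# Then check what pandas actually parsed, before any column operations.
df = pd.read_csv(io.StringIO(sample_csv))
print(df.columns.tolist())
print(df.shape)
```

With the real file, the same check can be done by opening the path with `open(path, newline="")` and printing `repr()` of the first couple of lines. If the header line does not show the same delimiter as the data lines, that would explain why "Feb" never becomes a column.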
And for your code:

1. You don't need to cast the result of pd.read_csv() to pd.DataFrame. It is already a DataFrame object.
2. Do not iterate over the column values. In pandas there is almost always a better way. In this case, try

       df_abottsfordbc["Feb"] = df_abottsfordbc["Feb"].replace(-9999.9, df_abottsfordbc["Feb"].median())

   and a similar process for df_abottsfordbc["Oct"]. Additionally, you won't have to perform a concat.
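As a rough illustration of the vectorized approach, here is a small self-contained sketch. The data frame is made-up sample data, not the real wind-speed file, and `SENTINEL` is a name introduced only for this example. It also varies the suggestion slightly: the sentinel is converted to NaN before the median is taken, so that -9999.9 itself does not pull the median down.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the wind-speed file.
df = pd.DataFrame({
    "Year": [1953, 1954, 1955, 1956, 1957],
    "Feb": [9.5, -9999.9, 10.2, 10.0, 18.4],
    "Oct": [8.6, 11.6, -9999.9, 14.0, 14.8],
})

SENTINEL = -9999.9  # placeholder value used for missing readings

# Looping over column *names* is fine; the work on the values is vectorized.
for col in ["Feb", "Oct"]:
    # Mark the sentinel as missing so it does not influence the median.
    cleaned = df[col].replace(SENTINEL, np.nan)
    # Series.median() skips NaN by default.
    df[col] = cleaned.fillna(cleaned.median())

print(df)
```

Because the columns are updated in place on the same data frame, no separate concat step is needed afterwards.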
Hopefully this helps.
Answered By - Julian L