Saturday, March 5, 2022

[FIXED] Reading csv into pandas dataframe and all the columns except for the very first one get deleted

csv, dataframe, pandas, python-3.x, spyder

Issue

I am cleaning a data set and I want to replace the outlier -9999.9 with the median value of that specific column. Each column represents a month and this is what I have written to address the outlier. When I am done replacing the outlier with the median I concatenate the reformatted columns and the columns I left untouched. See code below:

import pandas as pd
import numpy as np
#Abottsford British Columbia
abottsfordbc = pd.read_csv("/Users/name/Desktop/Python_Scripts/wind_classifier/data_sets/canadian_windspeeds/abottsford_bc.csv", engine = 'python', sep = ',')
df_abottsfordbc = pd.DataFrame(abottsfordbc)
dataframe_labels = df_abottsfordbc[["Jan","Mar","Apr","May","Jun","Jul","Aug","Sep","Nov","Dec","Annual","Winter","Spring","Summer","Autumn"]]
for i in df_abottsfordbc["Feb"]:
    if i == -9999.9:
        column_median = df_abottsfordbc["Feb"].median()
        outlier_convert = df_abottsfordbc["Feb"].replace(to_replace = [-9999.9], value = [0])
        zero_to_medianFeb = outlier_convert.replace(to_replace = [0], value = [column_median])
for i in dataframe_labels["Oct"]:
    if i == -9999.9:
        column_median = df_abottsfordbc["Oct"].median()
        outlier_convert = df_abottsfordbc["Oct"].replace(to_replace = [-9999.9], value = [0])
        zero_to_medianOct = outlier_convert.replace(to_replace = [0], value = [column_median])
abottsford_bc_concat = pd.concat([dataframe_labels, zero_to_medianFeb, zero_to_medianOct], axis = 1)

I'm wondering if anyone could help me with this issue. I recently downloaded the data from a Windows 10 computer to a Mac running macOS Catalina, and I'm not sure why the script runs fine on Windows 10 but not on macOS. I'm using Spyder 4.1.4 and Python 3.8 on my Mac. I checked the .csv file I am reading in and it looks completely fine in Microsoft Excel: all the column names are where they are supposed to be. However, when I print the dataframe I get this:

print(abottsfordbc)
                                                                                     Year
1953 16.4 9.5  11.9 10.0 10.0 8.1  8.9  8.6 9.2  8.6  12.7 11.9 10.5 12.9 10.6 8.6   10.2
1954 17.9 16.5 12.6 13.5 10.8 10.5 8.9  7.9 7.9  11.6 11.8 13.1 11.9 15.4 12.3 9.1   10.4
1955 8.6  10.2 11.5 11.2 8.8  7.2  7.1  6.6 5.9  8.7  14.3 11.1 9.3  10.7 10.5 7.0    9.6
1956 10.5 10.0 16.1 13.6 12.6 13.4 10.8 9.9 11.4 14.0 11.1 18.4 12.7 10.5 14.1 11.4  12.2
1957 17.9 18.4 14.9 13.0 10.7 12.4 12.1 9.4 9.5  14.8 11.1 18.4 13.6 18.3 12.9 11.3  11.8
                                                                                  ...
2010 10.1 9.0  10.9 13.2 10.3 9.5  9.8  8.5 8.5  7.9  12.3 10.9 10.1 9.4  11.4 9.3    9.6
2011 10.2 14.5 13.2 11.7 9.0  10.4 9.3  7.8 8.0  7.4  10.5 7.7  10.0 11.9 11.3 9.2    8.6
2012 12.9 11.1 13.2 9.8  10.4 9.6  9.2  7.8 6.6  10.6 9.7  10.2 10.1 10.6 11.2 8.8    9.0
2013 7.5  10.4 10.6 11.7 9.0  9.2  10.8 8.4 9.1  7.9  9.2  11.4 9.6  9.3  10.5 9.5    8.7
2014 9.9  17.5 11.5 11.4 9.6  10.3 10.1 8.6 8.7  10.0 13.4 11.4 11.0 12.9 10.8 9.7   10.7
[62 rows x 1 columns]

The error I keep getting is shown below. It makes sense, because when I print the dataframe I can see that "Feb" isn't a column. Does anyone know why read_csv() is not reading my .csv file properly? It seems to treat "Year" as the last and only column, rather than the first, and leaves out the rest of the column names. When I open the .csv file in Microsoft Excel it is formatted correctly, and it is saved as a .csv UTF-8 file. Any help would be much appreciated.

    runfile('/Users/name/Desktop/Python_Scripts/wind_classifier/cleanwind.py', wdir='/Users/name/Desktop/Python_Scripts/wind_classifier')
    Traceback (most recent call last):
    
      File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
        return self._engine.get_loc(key)
    
      File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
    
      File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
    
      File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
    
      File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
    
    KeyError: 'Feb'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):

      File "/Users/bryanekeh/Desktop/Python_Scripts/wind_classifier/cleanwind.py", line 11, in <module>
        for i in abottsfordbc["Feb"]:

      File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py", line 2800, in __getitem__
        indexer = self.columns.get_loc(key)

      File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
        return self._engine.get_loc(self._maybe_cast_indexer(key))

      File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc

      File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc

      File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item

      File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item

    KeyError: 'Feb'

Solution

How do you know the file isn't being read in properly? Are you printing it before or after the column operations?
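
One way to check how the file was actually parsed is to look at the raw header line and at the column labels pandas produced. Below is a minimal sketch using a made-up in-memory sample rather than your file; the semicolon delimiter is only an assumed example of a mismatch, not a diagnosis of your data.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real file. The semicolon
# delimiter is an assumption chosen only to illustrate a mismatch.
raw = "Year;Jan;Feb\n1953;16.4;9.5\n1954;17.9;16.5\n"

# Look at the raw text first: repr() exposes the delimiter that is
# really in the header line, which a spreadsheet view can hide.
first_line = raw.splitlines()[0]
print(repr(first_line))

# Parsing with a separator that does not occur in the file collapses
# every row into a single column.
wrong = pd.read_csv(io.StringIO(raw), sep=",")
print(wrong.columns.tolist(), wrong.shape)

# sep=None together with engine="python" asks pandas to detect the
# delimiter itself (it uses csv.Sniffer on the first valid line).
sniffed = pd.read_csv(io.StringIO(raw), sep=None, engine="python")
print(sniffed.columns.tolist(), sniffed.shape)
```

If `columns.tolist()` shows one long label or something other than the month names, the problem is in how the file is being read, not in the replacement code further down.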

And for your code,

You don't need to cast pd.read_csv() to pd.DataFrame. It is already a DataFrame object.
Do not iterate over the column values. In pandas there is almost always a better way. In this case, try

df_abottsfordbc["Feb"] = df_abottsfordbc["Feb"].replace(-9999.9, df_abottsfordbc["Feb"].median())

and a similar process for df_abottsfordbc["Oct"]. Additionally, you won't have to perform a concat.
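
The same replacement can be applied to both columns in one step, without a loop or a concat. Here is a minimal sketch on a made-up frame; the sample values, and the choice to compute each median only from the rows that are not -9999.9, are assumptions for illustration rather than part of the answer above.

```python
import numpy as np
import pandas as pd

SENTINEL = -9999.9  # missing-value marker described in the question

# Hypothetical stand-in for the real data set.
df = pd.DataFrame(
    {
        "Year": [1953, 1954, 1955, 1956],
        "Feb": [9.5, SENTINEL, 10.2, 10.0],
        "Oct": [8.6, 11.6, SENTINEL, 14.0],
    }
)

months = ["Feb", "Oct"]

# Turn the sentinel into NaN so it does not pull the median down,
# then fill each column's gaps with that column's own median.
cleaned = df[months].replace(SENTINEL, np.nan)
df[months] = cleaned.fillna(cleaned.median())

print(df)
```

Because the cleaned columns are assigned back into the same DataFrame, the untouched columns stay where they are and nothing needs to be stitched together afterwards.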

Hopefully this helps.

Answered By - Julian L

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
