Issue
I am new to data analysis in Python and ran into a problem while working on a project. Some of the values in the csv file contain the delimiter inside double quotes, so pandas can't split them correctly.
import csv
import pandas as pd
top = pd.read_csv(r"C:\Users\User\Desktop\data analytics\Project\Analysis-Spotify-Top-2000\Spotify-2000.csv", delimiter=",",
                  encoding="UTF-8", doublequote=True, engine="python", quotechar='"', quoting=csv.QUOTE_ALL)
I found which records cause the problem:
My teacher advised me to create a new dataframe with the same columns from these values, delete the records that have a delimiter inside double quotes from the original, and then merge the new df back into the original.
But honestly, I don't know how to do it properly (I tried some weird things, see screen2)
is_title_null = pd.isnull(top["Title"])
missing_list = top[is_title_null]["Index"].tolist()
list_of_missing_list = []
for i in missing_list:
    l = i.split(', ')
    list_of_missing_list.append(l)
list_of_missing_list
missing_df = pd.DataFrame(np.empty((0, 15)))
missing_df.columns = ["Index", "Title", "Artist", "Top Genre", "Year", "Beats Per Minute (BPM)",
                      "Energy", "Danceability", "Loudness (dB)", "Liveness", "Valence",
                      "Length (Duration)", "Acousticness", "Speechiness", "Popularity"]
missing_df.append(list_of_missing_list,ignore_index = True)
Here is my project link in GitHub (here you can see the problem): https://github.com/Sabina-Karenkina/Analysis-Spotify-Top-2000
Solution
OK. This is not a really elegant way to do it, but as I mentioned in my earlier comment, you will not fix the problem by creating the dataframe first, because the file is corrupt to begin with. I found an easy way to solve it.
Open your Spotify-2000 file with Excel and use Text to Columns. When asked which delimiter, choose , (comma). Save your file as a new csv file (spotify2.csv), but make sure to use ; as the delimiter (this is because you might have titles that include commas).
Now use pandas to read this new file:
top = pd.read_csv(r"C:/Users/k_sego/spotify2.csv",delimiter = ";",
encoding = "iso-8859-1", doublequote=True, engine="python")
top.head(100)
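If you would rather not go through Excel, the same kind of repair can be sketched in Python. This is only a sketch under an assumption I have not verified against your file: that each corrupt row was wrapped as a whole in double quotes, with the inner quotes doubled (which would explain why the entire row ends up in the Index column). repair_line and read_repaired_csv are hypothetical helper names, not pandas functions.

```python
import csv
import io


def repair_line(line: str) -> str:
    """Unwrap a row that was quoted as a whole (the assumed corruption)."""
    stripped = line.rstrip("\r\n")
    if stripped.startswith('"') and stripped.endswith('"'):
        # A wholly quoted row parses as exactly one CSV field; parsing it
        # also turns the doubled quotes ("") back into single quotes.
        fields = next(csv.reader([stripped]))
        if len(fields) == 1:
            return fields[0]
    # Rows that already parse into several fields are left untouched.
    return stripped


def read_repaired_csv(path: str, encoding: str = "utf-8"):
    """Repair every line of the file, then hand the text to pandas."""
    import pandas as pd  # imported here so repair_line works without pandas

    with open(path, encoding=encoding) as handle:
        repaired = "\n".join(repair_line(line) for line in handle)
    return pd.read_csv(io.StringIO(repaired))
```

With this, something like top = read_repaired_csv("Spotify-2000.csv") should keep the comma delimiter and leave titles that contain commas inside their own quotes. If the corrupt rows in your file look different from the assumed pattern, repair_line would need to be adjusted.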
Answered By - Serge de Gosson de Varennes