Issue
I am trying to read a CSV that sits inside a zip file. I need to read rad_15min.csv, but when I pass the zip file's URL (copied from the Download button) to read_csv, I get an error:
Code:
import pandas as pd
df = pd.read_csv('https://www.kaggle.com/datasets/lucafrance/bike-traffic-in-munich/download?datasetVersionNumber=7')
Error: ParserError: Error tokenizing data. C error: Expected 1 fields in line 9, saw 2
Data: https://www.kaggle.com/datasets/lucafrance/bike-traffic-in-munich
Zip file Link: https://www.kaggle.com/datasets/lucafrance/bike-traffic-in-munich/download?datasetVersionNumber=7
I have to read this CSV dynamically. I don't want to download it by hand; I just want to take the download link and read the CSV from it. Is there another approach I can try?
Solution
I tried using the Kaggle API, but I don't want to download the data, just read it dynamically.
I want to read only one file in the zip, named rad15_min.csv, with pandas.
You can try making a request with the __Host-KAGGLEID cookie.
I'm not sure whether there is a programmatic way to get this cookie, but you can always hardcode it. Make sure you are logged in to Kaggle, press CTRL+SHIFT+I to open your browser's Developer Tools, go to Application/Cookies, and copy the value of that cookie.
from io import BytesIO
from zipfile import ZipFile

import pandas as pd
import requests

url = "https://www.kaggle.com/datasets/" \
      "lucafrance/bike-traffic-in-munich/" \
      "download?datasetVersionNumber=7"
# Session cookie copied from the browser (value truncated here)
cookies = {"__Host-KAGGLEID": "CfDJ8IPkmlRqhQhDn1PidxljKKQWcrozwJuFfsIn..."}
response = requests.get(url, cookies=cookies)
response.raise_for_status()  # stop early on an HTTP error

# Open the archive in memory and read only the wanted member
with ZipFile(BytesIO(response.content)) as zf:
    df = pd.read_csv(zf.open("rad_15min.csv"))  # not rad15_min.csv
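The original ParserError most likely means the URL returned something other than a zip (for example an HTML login page) when no cookie was sent, although the source does not confirm this. A minimal sketch of a sanity check follows: it confirms the downloaded bytes are a zip archive and lists the member names, so the exact file name can be verified before calling read_csv. The helper name list_zip_members and the demo archive contents (including rad_tage.csv) are made up for illustration.

```python
from io import BytesIO
from zipfile import ZipFile, is_zipfile


def list_zip_members(payload: bytes) -> list[str]:
    """Return the member names if payload is a zip archive, else raise ValueError."""
    buffer = BytesIO(payload)
    if not is_zipfile(buffer):
        # Showing the first bytes helps spot an HTML page served instead of the archive
        raise ValueError(f"Not a zip archive; payload starts with {payload[:40]!r}")
    with ZipFile(buffer) as zf:
        return zf.namelist()


# Demo on an in-memory archive (hypothetical contents, standing in for response.content)
demo = BytesIO()
with ZipFile(demo, "w") as zf:
    zf.writestr("rad_15min.csv", "datum,gesamt\n2017.01.01,0\n")
    zf.writestr("rad_tage.csv", "datum,gesamt\n2017.01.01,0\n")

members = list_zip_members(demo.getvalue())
print(members)
```

With the real download, list_zip_members(response.content) would be called in place of the demo bytes.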
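NB: If the dataset is not an archive (i.e., a single CSV), you can pass BytesIO(response.content) directly to read_csv. The same works if the zip contains only one CSV, but you will likely also need compression="zip", because pandas infers compression from a file name and not from an in-memory buffer.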
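As a rough illustration of the single-CSV case, the sketch below builds a tiny zip in memory (made-up data standing in for response.content) and reads it straight from a buffer. It assumes compression="zip" has to be passed explicitly for an in-memory buffer.

```python
from io import BytesIO
from zipfile import ZipFile

import pandas as pd

# Build a zip holding exactly one CSV (hypothetical data for the demo)
archive = BytesIO()
with ZipFile(archive, "w") as zf:
    zf.writestr("rad_15min.csv", "datum,gesamt\n2017.01.01,0\n2017.01.02,3\n")

# With a single member, pandas can open the archive itself
df_single = pd.read_csv(BytesIO(archive.getvalue()), compression="zip")
print(df_single.shape)
```

If the archive held more than one file, this call would raise an error, and opening the member explicitly with ZipFile would be the way to go.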
Output :
print(df)
              datum uhrzeit_start  ...  richtung_2  gesamt
0        2017.01.01         00:00  ...           0       0
1        2017.01.01         00:00  ...           0       0
2        2017.01.01         00:00  ...           0       0
3        2017.01.01         00:00  ...           0       0
4        2017.01.01         00:00  ...           0       0
...             ...           ...  ...         ...     ...
1255761  2022.12.31         23:45  ...           2       7
1255762  2022.12.31         23:45  ...           0       0
1255763  2022.12.31         23:45  ...           0       0
1255764  2022.12.31         23:45  ...           0       0
1255765  2022.12.31         23:45  ...           5      17

[1255766 rows x 7 columns]
Answered By - Timeless