Issue
I have a folder (in the same directory as the Python script) with a lot of csv files covering 1st Jan to 31st Dec. I want to read only the csv files within a certain date range from that folder into Python and then append them to a list.
The files are named as below, with one file for each day across multiple months:
BANK_NIFTY_5MINs_2020-02-01.csv, BANK_NIFTY_5MINs_2020-02-02.csv, ... BANK_NIFTY_5MINs_2020-02-28.csv, BANK_NIFTY_5MINs_2020-03-01.csv, ... BANK_NIFTY_5MINs_2020-03-31.csv and so on.
Currently, I have code that fetches the csv files for the whole month of March using 'startswith' and 'endswith'. However, this lets me target files for only one month at a time. I want to be able to read multiple months of csv files within a specified date range, for example Oct, Nov and Dec, or Feb and March (basically start and end at any month).
The following code gets only the files for March. I then fetch the files from the list and merge them into a dataframe.
# Accessing csv files from directory
import os
from datetime import datetime

startdate = datetime.strptime("2022-05-01", "%Y-%m-%d")
enddate = datetime.strptime("2022-06-30", "%Y-%m-%d")
all_files = []
# folder that contains this script
path = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))
for root, dirs, files in os.walk(path):
    for file in files:
        # os.walk yields bare file names, so the prefix has no leading "/"
        if file.startswith("BANK_NIFTY_5MINs_") and file.endswith(".csv"):
            file_date = datetime.strptime(file, "BANK_NIFTY_5MINs_%Y-%m-%d.csv")
            if startdate <= file_date <= enddate:
                all_files.append(os.path.join(root, file))
The output of the above looks like 'BANK_NIFTY_5MINs_2020-03-01.csv' and so on, but each entry should be the entire path, for example: 'c:\Users\User123\Desktop\Myfolder\2020\BANK\BANK_NIFTY_5MINs_2020-03-01.csv'. The merge function requires the list to hold complete paths in this format in order to process the files further.
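To illustrate the difference between the two outputs described above, here is a minimal sketch. It assumes a throwaway folder created with tempfile; the nested folder names are only illustrative. It shows that os.walk yields bare file names and that joining each name with root gives the full path.

```python
import os
import tempfile

# Build a throwaway folder tree so the example is self-contained.
# (The nested folder names are illustrative, not taken from a real setup.)
with tempfile.TemporaryDirectory() as base:
    sub = os.path.join(base, "2020", "BANK")
    os.makedirs(sub)
    open(os.path.join(sub, "BANK_NIFTY_5MINs_2020-03-01.csv"), "w").close()

    bare_names = []
    full_paths = []
    for root, dirs, files in os.walk(base):
        for name in files:
            bare_names.append(name)                      # what os.walk yields
            full_paths.append(os.path.join(root, name))  # name joined with its folder

    print(bare_names)
    print(full_paths)
```

The first list holds only the file name, while the second holds a path that a reader such as a csv loader can open from any working directory.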
Solution
I would take a different approach for more flexibility: wrap the search in a function that takes the folder and the date range as arguments.
import os
from datetime import datetime
from pprint import pprint
from typing import List, Union


def quick_str_to_date(s: str) -> datetime:
    return datetime.strptime(s, "%Y-%m-%d")


def get_file_by_date_range(path: str, startdate: Union[datetime, str], enddate: Union[datetime, str]) -> List[str]:
    # accept both datetime objects and "YYYY-MM-DD" strings
    if isinstance(startdate, str):
        startdate = quick_str_to_date(startdate)
    if isinstance(enddate, str):
        enddate = quick_str_to_date(enddate)
    result = []
    for root, dirs, files in os.walk(path):
        for filename in files:
            if filename.startswith("BANK_NIFTY_5MINs_") and filename.lower().endswith(".csv"):
                file_date = datetime.strptime(filename, "BANK_NIFTY_5MINs_%Y-%m-%d.csv")
                if startdate <= file_date <= enddate:
                    # join with root so the list holds full paths, not bare names
                    result.append(os.path.join(root, filename))
    return result


print("all")
pprint(get_file_by_date_range("/full/path/to/files", "2000-01-01", "2100-12-31"))
print("\nFebruary")
pprint(get_file_by_date_range("/full/path/to/files", "2020-02-01", "2020-02-28"))
print("\none day")
pprint(get_file_by_date_range("/full/path/to/files", "2020-02-01", "2020-02-01"))

Output

all
['/full/path/to/files/BANK_NIFTY_5MINs_2020-02-02.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-02-28.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-03-01.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-03-31.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-02-01.csv']

February
['/full/path/to/files/BANK_NIFTY_5MINs_2020-02-02.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-02-28.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-02-01.csv']

one day
['/full/path/to/files/BANK_NIFTY_5MINs_2020-02-01.csv']
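As the output shows, os.walk returns files in no particular order. If the merge step needs them chronologically, one possible variation is sketched below. It swaps os.walk for pathlib, and the names files_in_range and PREFIX are hypothetical, introduced only for this example. It returns absolute paths sorted by date and skips files whose names carry the prefix but no valid date.

```python
from datetime import date, datetime
from pathlib import Path
from typing import List

PREFIX = "BANK_NIFTY_5MINs_"  # file name prefix taken from the question


def files_in_range(folder: str, start: str, end: str) -> List[str]:
    """Return absolute paths of CSV files dated within [start, end], oldest first."""
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    matches = []
    # rglob searches the folder and all of its subfolders
    for p in Path(folder).rglob(PREFIX + "*.csv"):
        try:
            # p.stem is the file name without the ".csv" suffix
            file_date = datetime.strptime(p.stem, PREFIX + "%Y-%m-%d").date()
        except ValueError:
            continue  # prefix matches but the rest is not a valid date
        if first <= file_date <= last:
            matches.append((file_date, str(p.resolve())))
    # sort on the date so the files come back in chronological order
    return [path for _, path in sorted(matches)]
```

A call such as files_in_range("/full/path/to/files", "2020-02-01", "2020-03-31") would then cover February and March in one go, and the resulting list can be passed to the merge step unchanged.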
Answered By - Edo Akse