Thursday, November 10, 2022

[FIXED] How to select specific csv files for specified date range from a folder in python?

Labels: dataframe, python, regex

Issue

I have a folder (in the same directory as the Python script) containing many CSV files, one per day from 1 January to 31 December. I want to read only the files that fall within a certain date range and append them to a list.

The files are named as below and there are files for each day of multiple months:

BANK_NIFTY_5MINs_2020-02-01.csv, BANK_NIFTY_5MINs_2020-02-02.csv, ... BANK_NIFTY_5MINs_2020-02-28.csv, BANK_NIFTY_5MINs_2020-03-01.csv, ... BANK_NIFTY_5MINs_2020-03-31.csv and so on.

Currently, my code fetches the CSV files for the whole month of March using 'startswith' and 'endswith'. However, this lets me target only one month at a time. I want to read the CSV files for several months within a specified date range, for example October, November and December, or February and March (basically, start and end at any month).

The following code gets only the files for March. I then fetch the files from the list and merge them into a dataframe.

#Accessing csv files from directory
startdate  = datetime.strptime("2022-05-01", "%Y-%m-%d")
enddate = datetime.strptime("2022-06-30", "%Y-%m-%d")
all_files = []
path = os.path.realpath(os.path.join(os.getcwd(),os.path.dirname('__file__')))
for root, dirs, files in os.walk(path):
    for file in files:
        if file.startswith("/BANK_NIFTY_5MINs_") and file.endswith(".csv"):
             file_date = datetime.strptime(os.path.basename(file), "BANK_NIFTY_5MINs_%Y-%m-%d.csv")
             if startdate <= file_date <= enddate:
                  all_files.append(os.path.join(root, file))

The output of the above looks like 'BANK_NIFTY_5MINs_2020-03-01.csv' and so on, but each entry should be the entire path, for example: 'c:\Users\User123\Desktop\Myfolder\2020\BANK\BANK_NIFTY_5MINs_2020-03-01.csv'. The merge function requires every entry in the list to be a complete path in this format.

Solution

I would take a different approach for more flexibility:

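import os
from datetime import datetime
from pprint import pprint
from typing import Union


def quick_str_to_date(s: str) -> datetime:
    # Parse a "YYYY-MM-DD" string into a datetime.
    return datetime.strptime(s, "%Y-%m-%d")


def get_file_by_date_range(path: str, startdate: Union[datetime, str], enddate: Union[datetime, str]) -> list:
    # Accept the bounds either as datetime objects or as "YYYY-MM-DD" strings.
    if isinstance(startdate, str):
        startdate = quick_str_to_date(startdate)
    if isinstance(enddate, str):
        enddate = quick_str_to_date(enddate)
    result = []
    # os.walk also descends into subfolders of path.
    for root, dirs, files in os.walk(path):
        for filename in files:
            if filename.startswith("BANK_NIFTY_5MINs_") and filename.lower().endswith(".csv"):
                # The date is embedded in the file name, so parse it straight from there.
                file_date = datetime.strptime(filename, "BANK_NIFTY_5MINs_%Y-%m-%d.csv")
                if startdate <= file_date <= enddate:
                    # Join with root so the list holds full paths, not bare file names.
                    result.append(os.path.join(root, filename))
    return result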


print("all")
pprint(get_file_by_date_range("/full/path/to/files", "2000-01-01", "2100-12-31"))

print("\nFebruary")
pprint(get_file_by_date_range("/full/path/to/files", "2020-02-01", "2020-02-28"))

print("\none day")
pprint(get_file_by_date_range("/full/path/to/files", "2020-02-01", "2020-02-01"))

Output:

all
['/full/path/to/files/BANK_NIFTY_5MINs_2020-02-02.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-02-28.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-03-01.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-03-31.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-02-01.csv']

February
['/full/path/to/files/BANK_NIFTY_5MINs_2020-02-02.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-02-28.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-02-01.csv']

one day
['/full/path/to/files/BANK_NIFTY_5MINs_2020-02-01.csv']
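
As the listing above shows, os.walk returns files in no particular order. If the files need to be processed chronologically, one possible variation is sketched below. It is not part of the original answer: the names find_files_in_range and FILE_PATTERN are made up for illustration, and it assumes the file names follow the BANK_NIFTY_5MINs_YYYY-MM-DD.csv pattern from the question. It uses pathlib and a regular expression to pull the date out of each name, skips names that do not contain a valid date, and returns absolute paths sorted by date.

```python
import re
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical pattern: captures the ISO date embedded in the file name.
FILE_PATTERN = re.compile(r"BANK_NIFTY_5MINs_(\d{4}-\d{2}-\d{2})\.csv", re.IGNORECASE)


def find_files_in_range(folder, start, end):
    """Return absolute paths of matching CSV files dated within [start, end], oldest first."""
    start_date = date.fromisoformat(start)
    end_date = date.fromisoformat(end)
    matches = []
    # rglob("*") walks the folder and all of its subfolders.
    for candidate in Path(folder).rglob("*"):
        found = FILE_PATTERN.fullmatch(candidate.name)
        if found is None or not candidate.is_file():
            continue
        try:
            file_date = date.fromisoformat(found.group(1))
        except ValueError:
            continue  # looks like a date but is not one, e.g. 2020-02-31
        if start_date <= file_date <= end_date:
            matches.append((file_date, str(candidate.resolve())))
    # Sorting the (date, path) pairs orders the result chronologically.
    return [path for _, path in sorted(matches)]


if __name__ == "__main__":
    # Small demonstration on empty files in a temporary folder.
    with tempfile.TemporaryDirectory() as tmp:
        for day in ("2020-01-31", "2020-02-01", "2020-03-31", "2020-04-01"):
            (Path(tmp) / f"BANK_NIFTY_5MINs_{day}.csv").touch()
        for path in find_files_in_range(tmp, "2020-02-01", "2020-03-31"):
            print(path)
```

The demonstration should print only the paths for 2020-02-01 and 2020-03-31, in that order. Because every entry is a full path, the list can be handed directly to whatever function merges the files.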

Answered By - Edo Akse

This answer was collected from Stack Overflow and tested by PythonFixing community admins; it is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
