Tuesday, June 14, 2022

[FIXED] How to iterate through pairs of csvs in a directory?

June 14, 2022 | numpy, pandas, python-3.x

Issue

I have a directory of 25 CSV files that I am trying to iterate through in pairs. I have the sorting algorithm implemented, but I cannot figure out how to loop through the files pairwise.
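For scale: assuming every file should be paired with every other file exactly once, and never with itself, the number of pairs for n files is the binomial coefficient "n choose 2". For the 25 files here that gives:

$$\binom{n}{2} = \frac{n(n-1)}{2}, \qquad \binom{25}{2} = \frac{25 \cdot 24}{2} = 300$$

Each file then takes part in n - 1 = 24 of those pairs.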

I have tried the code below:

# This works

import os

a_list = []
path = os.getcwd()
for filename in os.listdir(path):
    if filename.endswith('.csv'):
        a_list.append(filename)
    
# This does not work

list_of_used_dfs = []
for element_1, element_2 in enumerate(a_list):
    # If name of elements is same, cycle to next element
    if element_1 == element_2:
        element_2 = a_list[element_2 + 1]
        # if element_1 or element_2 are in list_of_used_dfs, cycle to next non-matching pairs
        # I don't believe this is the correct approach
        if element_1, element_2 in list_of_used_dfs:
            element_1 = element_1 + 1
            element_2 = element_2 + 1
    else:
        list_of_used_dfs.append(element_1)
        list_of_used_dfs.append(element_2)

        # dataframes to be used for analysis
        df1 = pd.read_csv(element_1)
        df2 = pd.read_csv(element_2)
    
        

        # *** algorithm ***
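Besides the syntax error on the `if element_1, element_2 in list_of_used_dfs:` line, one reason this attempt cannot work is that `enumerate()` yields each item together with its index, not two items from the list. In the loop above, `element_1` is therefore an integer and `element_2` a file name, so they are never equal. A quick check:

```python
names = ["df1.csv", "df2.csv", "df3.csv"]

# enumerate() pairs each item with its position in the list,
# so the first loop variable is an int, not a second file name.
for index, name in enumerate(names):
    print(index, name)
# prints:
# 0 df1.csv
# 1 df2.csv
# 2 df3.csv
```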

For example, the directory might look like this:

df1.csv
df2.csv
df3.csv
df4.csv
df5.csv
df6.csv

Listing it should give a list of the CSV file names:

['df1.csv','df2.csv','df3.csv','df4.csv','df5.csv','df6.csv']

I already get this list from the first snippet above.
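A minimal variant of the listing code using `pathlib`, assuming only plain files ending in `.csv` are wanted. `list_csv_files` is a hypothetical helper name. Sorting matters for pairing: `os.listdir()` returns entries in arbitrary order, so without it the pairs could come out in a different order on each run.

```python
from pathlib import Path


def list_csv_files(directory="."):
    """Return the names of the .csv files in `directory`, sorted by name."""
    # glob("*.csv") selects names ending in .csv, like the endswith() check;
    # is_file() skips any sub-directory whose name happens to end in .csv.
    # sorted() makes the order reproducible across runs and platforms.
    return sorted(p.name for p in Path(directory).glob("*.csv") if p.is_file())
```

Calling `list_csv_files()` with no argument lists the current working directory, like `os.getcwd()` in the original. Note that the sort is lexicographic, so `df10.csv` comes before `df2.csv`; that does not affect which pairs are produced, only their order.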

From there, I want to cycle through the files two at a time and run my code once for each pair:

FOR each pair in a_list:
df_a = df1.csv
df_b = df2.csv
# Run code

LOOP
df_a = df1.csv
df_b = df3.csv
# Run code

LOOP
df_a = df1.csv
df_b = df4.csv
# Run code
...
LOOP
df_a = df2.csv
df_b = df3.csv
# Run code
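The iteration sketched above pairs each file with every file that comes after it in the list. Written out with plain index loops, it might look like the sketch below; `all_pairs` is a hypothetical helper name, and the inner loop starting at `i + 1` is what prevents a file from being paired with itself or a pair from appearing twice in reversed order.

```python
def all_pairs(names):
    """Yield each unordered pair of names once: (df1, df2), (df1, df3), ..., (df2, df3), ..."""
    for i in range(len(names)):
        # Start j after i: skips (df1, df1) and never revisits (df2, df1)
        # after (df1, df2) has already been produced.
        for j in range(i + 1, len(names)):
            yield names[i], names[j]


for df_a, df_b in all_pairs(["df1.csv", "df2.csv", "df3.csv", "df4.csv"]):
    print(df_a, df_b)
```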

I am unsure if the approach is correct, but I've been scratching my head with this for a few hours, so any help or advice is appreciated.

Here is sample data that can be used for each CSV file:

type  timestamp    open    high    low     close   base_volume  quote_volume  num_orders
pair  1.59896E+12  6.08    15.97   6.08    10.68   1973917.073  2.30E+07      33610
pair  1.59896E+12  10.679  11.49   10.1    10.906  756307.741   8.24E+06      13534
pair  1.59897E+12  10.905  10.918  9.463   9.934   801510.45    8.17E+06      15155
pair  1.59897E+12  9.917   10.87   9.35    10.848  784286.785   7.83E+06      13810
pair  1.59897E+12  10.848  10.985  9.88    10.22   709953.023   7.33E+06      12660
pair  1.59898E+12  10.22   10.888  9.805   10.129  497659.567   5.09E+06      10409
pair  1.59898E+12  10.121  10.917  10.051  10.451  392647.768   4.10E+06      8496
pair  1.59898E+12  10.46   10.67   10.364  10.366  185948.208   1.95E+06      4352
pair  1.59899E+12  10.38   10.415  10.037  10.166  246411.785   2.53E+06      3919
pair  1.59899E+12  10.168  10.769  9.95    10.5    389719.541   4.01E+06      7963
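To experiment with the sample without real files, it can be loaded straight from a string. This sketch copies the first two rows and assumes whitespace-separated columns as pasted here; real `.csv` files would normally be comma-separated and need only `pd.read_csv(path)`.

```python
import io

import pandas as pd

# First two rows of the sample above, whitespace-separated as pasted.
SAMPLE = """\
type timestamp open high low close base_volume quote_volume num_orders
pair 1.59896E+12 6.08 15.97 6.08 10.68 1973917.073 2.30E+07 33610
pair 1.59896E+12 10.679 11.49 10.1 10.906 756307.741 8.24E+06 13534
"""

# sep=r"\s+" splits each line on any run of whitespace.
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")
print(df)
```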

Solution

Unless I misunderstood the problem, you want to generate every unique pair of your files. I used itertools.combinations, as suggested by @Jon Clements:

from itertools import combinations

df_paths = ['path_to_df1', 'path_to_df2', 'path_to_df3', 'path_to_df4']
# combinations(..., r=2) yields each unordered pair once and never pairs a path with itself
for path_a, path_b in combinations(df_paths, r=2):
    print(path_a, path_b)
    # Run your analysis on the pair here

Output:
path_to_df1 path_to_df2
path_to_df1 path_to_df3
path_to_df1 path_to_df4
path_to_df2 path_to_df3
path_to_df2 path_to_df4
path_to_df3 path_to_df4
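Building on this answer, a minimal sketch that reads each CSV only once. With 25 files every file appears in 24 pairs, so calling `pd.read_csv` inside the pair loop would parse each file 24 times. Assumptions: the files are ordinary comma-separated CSVs in one directory, they fit in memory together, and `analyse_pair` is a hypothetical stand-in for your algorithm.

```python
from itertools import combinations
from pathlib import Path

import pandas as pd


def analyse_pair(df_a, df_b):
    """Hypothetical placeholder for the real algorithm: here it just counts rows."""
    return len(df_a) + len(df_b)


def run_all_pairs(directory="."):
    # Sort the paths so the pairing order is reproducible.
    paths = sorted(Path(directory).glob("*.csv"))
    # Read every file once up front, keyed by file name (dicts keep insertion order).
    frames = {path.name: pd.read_csv(path) for path in paths}
    results = {}
    # Iterating a dict yields its keys, so combinations() pairs the file names.
    for name_a, name_b in combinations(frames, 2):
        results[(name_a, name_b)] = analyse_pair(frames[name_a], frames[name_b])
    return results
```

If the files are too large to hold in memory at once, the trade-off reverses and reading inside the loop (or caching only the outer file) becomes the safer choice.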

Answered By - Jinter

This answer was collected from Stack Overflow and tested by the PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
