Issue
I have a directory of CSV files (25 of them), each holding one dataframe, and I am trying to iterate through them in pairs. I have the sorting algorithm implemented, but I cannot figure out how to loop through the files so that every pair is visited.
I have tried the below code:
# This works
import os

a_list = []
path = os.getcwd()
for filename in os.listdir(path):
    if filename.endswith('.csv'):
        a_list.append(filename)
# This does not work
list_of_used_dfs = []
for element_1, element_2 in enumerate(a_list):
    # If name of elements is same, cycle to next element
    if element_1 == element_2:
        element_2 = a_list[element_2 + 1]
    # if element_1 or element_2 are in list_of_used_dfs, cycle to next non-matching pairs
    # I don't believe this is the correct approach
    if element_1, element_2 in list_of_used_dfs:
        element_1 = element_1 + 1
        element_2 = element_2 + 1
    else:
        list_of_used_dfs.append(element_1)
        list_of_used_dfs.append(element_2)
    # dataframes to be used for analysis
    df1 = pd.read_csv(element_1)
    df2 = pd.read_csv(element_2)
    *** algorithm ***
Theoretically, the directory looks like this:
df1.csv
df2.csv
df3.csv
df4.csv
df5.csv
df6.csv
The first step produces a list containing the names of the CSVs, in the form of
['df1.csv','df2.csv','df3.csv','df4.csv','df5.csv','df6.csv']
which I already have through
a_list = []
path = os.getcwd()
for filename in os.listdir(path):
    if filename.endswith('.csv'):
        a_list.append(filename)
From there, the desired outcome is to cycle through the dataframes two at a time, visiting every pair once:
FOR df in a_list:
    df_a = df1.csv
    df_b = df2.csv
    # Run code
LOOP
    df_a = df1.csv
    df_b = df3.csv
    # Run code
LOOP
    df_a = df1.csv
    df_b = df4.csv
    # Run code
...
LOOP
    df_a = df2.csv
    df_b = df3.csv
    # Run code
I am unsure if the approach is correct, but I've been scratching my head with this for a few hours, so any help or advice is appreciated.
Here is a copy of a dataframe that can be used for each dataframe element
type  timestamp    open    high    low     close   base_volume  quote_volume  num_orders
pair  1.59896E+12  6.08    15.97   6.08    10.68   1973917.073  2.30E+07      33610
pair  1.59896E+12  10.679  11.49   10.1    10.906  756307.741   8.24E+06      13534
pair  1.59897E+12  10.905  10.918  9.463   9.934   801510.45    8.17E+06      15155
pair  1.59897E+12  9.917   10.87   9.35    10.848  784286.785   7.83E+06      13810
pair  1.59897E+12  10.848  10.985  9.88    10.22   709953.023   7.33E+06      12660
pair  1.59898E+12  10.22   10.888  9.805   10.129  497659.567   5.09E+06      10409
pair  1.59898E+12  10.121  10.917  10.051  10.451  392647.768   4.10E+06      8496
pair  1.59898E+12  10.46   10.67   10.364  10.366  185948.208   1.95E+06      4352
pair  1.59899E+12  10.38   10.415  10.037  10.166  246411.785   2.53E+06      3919
pair  1.59899E+12  10.168  10.769  9.95    10.5    389719.541   4.01E+06      7963
Solution
Unless I misunderstood the problem, you want to generate every unique pair of your dataframes. I used itertools.combinations, as suggested by @Jon Clements:
from itertools import combinations

df_paths = ['path_to_df1', 'path_to_df2', 'path_to_df3', 'path_to_df4']
# combinations yields each unordered pair exactly once, in input order
unique_pairs = list(combinations(df_paths, 2))
for path_a, path_b in unique_pairs:
    print(path_a, path_b)
    # Do your code here
Output:
path_to_df1 path_to_df2
path_to_df1 path_to_df3
path_to_df1 path_to_df4
path_to_df2 path_to_df3
path_to_df2 path_to_df4
path_to_df3 path_to_df4
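To connect this back to the directory listing in the question, here is a minimal sketch that builds the pairs directly from the CSV filenames in a folder. The function name csv_pairs is hypothetical (it is not from the question), and sorting the filenames is my own assumption, added so the pair order is reproducible.

```python
import os
from itertools import combinations


def csv_pairs(directory):
    """Return every unique pair of CSV filenames found in `directory`."""
    # Sort so the pair order is reproducible; os.listdir order is arbitrary.
    csv_files = sorted(
        name for name in os.listdir(directory) if name.endswith('.csv')
    )
    # combinations never pairs a file with itself and never repeats a pair.
    return list(combinations(csv_files, 2))


if __name__ == '__main__':
    for file_a, file_b in csv_pairs(os.getcwd()):
        print(file_a, file_b)
        # Load the two files here (e.g. with pd.read_csv) and run the analysis.
```

With 25 files this gives 25 * 24 / 2 = 300 pairs. Each file appears in 24 of them, so if reading the CSVs is slow, it may be worth loading each file once into a dict keyed by filename and looking the dataframes up inside the loop.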
Answered By - Jinter