Issue
Update: I tried the code by Valdi_Bo with some modifications:
df = df.dropna()
print(df.head(5))
a = df.to_numpy()
row, col = np.where(np.greater(a, 0.8) & np.less(a, 1.0))
selected_corr_df = pd.DataFrame([ (df.index[r][0], df.index[r][1], df.columns[c], df.iloc[r, c])
for r, c in np.c_[row, col] ], columns = ['datetime','code', 'col', 'val'])
so that datetime is included. However, the result is:
              datetime     code       col       val
0  2023-01-01 20:00:00  BTCUSDT   BTCUSDT  1.000000
1  2023-01-01 20:00:00  BTCUSDT   ETHUSDT  0.839907
2  2023-01-01 20:00:00  ETHUSDT   BTCUSDT  0.839907
3  2023-01-01 20:00:00  BNBUSDT   BNBUSDT  1.000000
4  2023-01-01 20:00:00  XRPUSDT  DOGEUSDT  0.804122
I don't know why correlations of 1.0 are still included. I tried printing row and col:
print(row,col)
[ 0 0 1 ... 65307 65314 65317] [0 1 0 ... 7 4 7]
print(df.head(5).iloc[:,:5])
                               BTCUSDT   ETHUSDT   BNBUSDT   XRPUSDT   SOLUSDT
datetime            code
2023-01-01 20:00:00 BTCUSDT   1.000000  0.839907  0.628913  0.459101  0.685730
                    ETHUSDT   0.839907  1.000000  0.654618  0.401911  0.572358
                    BNBUSDT   0.628913  0.654618  1.000000  0.494672  0.561099
                    XRPUSDT   0.459101  0.401911  0.494672  1.000000  0.525257
                    SOLUSDT   0.685730  0.572358  0.561099  0.525257  1.000000
It looks like something goes wrong in the np.where call.
I have the following DataFrame, corr_df:
                               BTCUSDT   ETHUSDT   BNBUSDT
datetime            code
2023-09-30 22:00:00 BTCUSDT   1.000000  0.744847  0.758208
                    ETHUSDT   0.744847  1.000000  0.788360
                    BNBUSDT   0.758208  0.788360  1.000000
                    XRPUSDT   0.175308  0.165487  0.330017
                    SOLUSDT   0.392990  0.433611  0.573683
                    ADAUSDT   0.326387  0.465547  0.555164
                    DOGEUSDT  0.584677  0.572698  0.658798
                    TRXUSDT   0.396398  0.150638  0.330389
                    MATICUSDT 0.640225  0.603454  0.663140
                    DOTUSDT   0.741107  0.758502  0.844142
2023-09-30 23:00:00 BTCUSDT   1.000000  0.739415  0.775631
                    ETHUSDT   0.739415  1.000000  0.798653
                    BNBUSDT   0.775631  0.798653  1.000000
                    XRPUSDT   0.185117  0.172995  0.334012
                    SOLUSDT   0.407861  0.444603  0.579565
                    ADAUSDT   0.324440  0.462648  0.543445
                    DOGEUSDT  0.593054  0.577852  0.667520
                    TRXUSDT   0.414259  0.164620  0.344475
                    MATICUSDT 0.655499  0.613226  0.670288
                    DOTUSDT   0.744608  0.759279  0.846514
I want to get the pair names that meet the condition, with datetime as the index.
I tried to select the data with np.where. The condition is:
np.where(corr_dict[corr_len]> 0.8) & (corr_dict[corr_len] != 1)
This outputs two arrays, which I stored in two variables:
row, col = np.where(corr_dict[corr_len]> 0.8) & (corr_dict[corr_len] != 1)
However, I got stuck when I tried to create a new DataFrame named selected_corr_df. I tried to use the column name as the 'pair1' column and the second index level (code) as the 'pair2' column:
selected_corr_df = pd.DataFrame(index=corr_dict[corr_len].index, columns=['pair1', 'pair2'])
But this only produces errors.
Solution
To create your source DataFrame I used the following code:
import io
import numpy as np
import pandas as pd

txt = '''\
datetime            code       BTCUSDT   ETHUSDT   BNBUSDT
2023-09-30 22:00:00 BTCUSDT    1.000000  0.744847  0.758208
2023-09-30 22:00:00 ETHUSDT    0.744847  1.000000  0.788360
2023-09-30 22:00:00 BNBUSDT    0.758208  0.788360  1.000000
2023-09-30 22:00:00 XRPUSDT    0.175308  0.165487  0.330017
2023-09-30 22:00:00 SOLUSDT    0.392990  0.433611  0.573683
2023-09-30 22:00:00 ADAUSDT    0.326387  0.465547  0.555164
2023-09-30 22:00:00 DOGEUSDT   0.584677  0.572698  0.658798
2023-09-30 22:00:00 TRXUSDT    0.396398  0.150638  0.330389
2023-09-30 22:00:00 MATICUSDT  0.640225  0.603454  0.663140
2023-09-30 22:00:00 DOTUSDT    0.741107  0.758502  0.844142
2023-09-30 23:00:00 BTCUSDT    1.000000  0.739415  0.775631
2023-09-30 23:00:00 ETHUSDT    0.739415  1.000000  0.798653
2023-09-30 23:00:00 BNBUSDT    0.775631  0.798653  1.000000
2023-09-30 23:00:00 XRPUSDT    0.185117  0.172995  0.334012
2023-09-30 23:00:00 SOLUSDT    0.407861  0.444603  0.579565
2023-09-30 23:00:00 ADAUSDT    0.324440  0.462648  0.543445
2023-09-30 23:00:00 DOGEUSDT   0.593054  0.577852  0.667520
2023-09-30 23:00:00 TRXUSDT    0.414259  0.164620  0.344475
2023-09-30 23:00:00 MATICUSDT  0.655499  0.613226  0.670288
2023-09-30 23:00:00 DOTUSDT    0.744608  0.759279  0.846514
'''
df = pd.read_fwf(io.StringIO(txt), widths=[20, 11, 10, 10, 10])
df.datetime = pd.to_datetime(df.datetime)
df.set_index(['datetime', 'code'], inplace=True)
Then I created a NumPy array:
a = df.to_numpy()
To get row / column numbers of elements of interest, I ran:
row, col = np.where(np.greater(a, 0.8) & np.less(a, 1.0))
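To illustrate what this call returns, here is a minimal sketch on a small made-up array (the values and the name pairs are illustrative, not taken from the data above). The whole boolean mask is built first and then passed to np.where as a single argument, which yields one array of row positions and one array of column positions:

```python
import numpy as np

# Made-up 2x3 array standing in for a slice of the correlation data.
a = np.array([[1.0, 0.85, 0.3],
              [0.85, 1.0, 0.9]])

# Combine both conditions into one boolean mask before calling np.where.
mask = np.greater(a, 0.8) & np.less(a, 1.0)

# With a single 2-D boolean argument, np.where returns (row_idx, col_idx).
row, col = np.where(mask)

pairs = list(zip(row.tolist(), col.tolist()))
print(pairs)  # [(0, 1), (1, 0), (1, 2)]
```

Each (row, col) pair is a position in the array, so it can be mapped back to an index label and a column label of the DataFrame.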
And finally, to get the result with three columns (code, col and val), I ran:
selected_corr_df = pd.DataFrame([ (df.index[r][1], df.columns[c], df.iloc[r, c])
for r, c in np.c_[row, col] ], columns = ['code', 'col', 'val'])
The result is:
      code      col       val
0  DOTUSDT  BNBUSDT  0.844142
1  DOTUSDT  BNBUSDT  0.846514
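Regarding the update in the question, where rows displayed as 1.000000 still pass the np.less(a, 1.0) test: one plausible explanation is that a computed self-correlation is stored as a float slightly below 1.0 and is only rounded to 1.000000 when printed. This is an assumption, since the unrounded values are not shown. Under that assumption, a possible workaround is to exclude self-pairs by comparing the code level with the column name instead of testing the value against 1.0. The sketch below uses hypothetical toy data (symbols AAA, BBB, CCC) and hypothetical names (toy, naive, not_self, selected), and also keeps the datetime level in the result:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one timestamp, three symbols. One diagonal entry is
# set to the largest float below 1.0 to mimic floating-point rounding.
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(["2023-01-01 20:00:00"]), ["AAA", "BBB", "CCC"]],
    names=["datetime", "code"],
)
toy = pd.DataFrame(
    [[1.0, 0.85, 0.30],
     [0.85, np.nextafter(1.0, 0.0), 0.90],
     [0.30, 0.90, 1.0]],
    index=idx, columns=["AAA", "BBB", "CCC"],
)

a = toy.to_numpy()
codes = toy.index.get_level_values("code").to_numpy()
cols = toy.columns.to_numpy()

# The strict "< 1.0" test lets the almost-1.0 diagonal entry through.
naive = (a > 0.8) & (a < 1.0)

# Excluding self-pairs by name does not depend on the exact float value.
not_self = codes[:, None] != cols[None, :]
mask = (a > 0.8) & not_self

row, col = np.where(mask)
selected = pd.DataFrame({
    "datetime": toy.index.get_level_values("datetime")[row],
    "code": codes[row],
    "col": cols[col],
    "val": a[row, col],
})
print(selected)
```

If the matrices are symmetric, each pair still appears twice (AAA/BBB and BBB/AAA). Whether to keep both directions depends on how the result is used.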
Answered By - Valdi_Bo