Issue
I have a corporate bond dataframe containing multiple types of bonds, with two columns for their yields and years to maturity. When I plot yield against years to maturity, I can clearly see at least three, possibly four, yield curves. I would like to fit at least three curves to this data and then split the dataframe into smaller chunks based on each point's distance to the curves. Here's a simple scatter plot:
I tried the RANSAC approach described in "Iteratively fitting polynomial curve". Here is my attempt:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import RANSACRegressor
from sklearn.metrics import mean_squared_error

y_ax = df_clean.YTW
x_ax = df_clean.YTM

class PolynomialRegression(object):
    def __init__(self, degree=3, coeffs=None):
        self.degree = degree
        self.coeffs = coeffs

    def fit(self, X, y):
        self.coeffs = np.polyfit(X.ravel(), y, self.degree)

    def get_params(self, deep=False):
        return {'coeffs': self.coeffs}

    def set_params(self, coeffs=None, random_state=None):
        self.coeffs = coeffs

    def predict(self, X):
        poly_eqn = np.poly1d(self.coeffs)
        y_hat = poly_eqn(X.ravel())
        return y_hat

    def score(self, X, y):
        return mean_squared_error(y, self.predict(X))

poly_degree = 3
ransac = RANSACRegressor(PolynomialRegression(degree=poly_degree),
                         residual_threshold=2 * np.std(y_ax),
                         random_state=0)
ransac.fit(np.expand_dims(x_ax, axis=1), y_ax)
inlier_mask = ransac.inlier_mask_

# x_vals / y_vals were not defined in the snippet; assumed here to be the plotted columns
x_vals = x_ax.to_numpy()
y_vals = y_ax.to_numpy()
y_hat = ransac.predict(np.expand_dims(x_vals, axis=1))
order = np.argsort(x_vals)  # sort by x so the fitted curve is drawn as one clean line
plt.plot(x_vals, y_vals, 'bx', label='input samples')
plt.plot(x_vals[inlier_mask], y_vals[inlier_mask], 'go', label='inliers (2*STD)')
plt.plot(x_vals[order], y_hat[order], 'r-', label='estimated curve')
In short:
- Is there a way to fit 3-4 separate curves to these points?
- How can I split the dataframe based on these curves?
The entire data is here (Only YTW & YTM are plotted here): Corp Bonds Data
Solution
I did some exploring of your data and this is what I came up with.
First, I noticed you had a lot of different IDs and issuers. I used pandas' groupby function to separate your dataframe into groups based on these two columns. I didn't get anything very interesting with ID, but I did with issuer.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from scipy.optimize import curve_fit

df = pd.read_excel('./CorpBonds_Clean.xlsx')
groups = df.groupby('issuer')
fig, ax = plt.subplots()
for group in groups:
    # group is a (name, sub-dataframe) tuple; sort by YTM so the line is drawn left to right
    subdf = group[1].sort_values('YTM')
    ax.plot(subdf['YTM'], subdf['YTW'], marker='o')
This is what I got:
Notice how each curve appears to have its own color, suggesting this separation is related to the exponentials you described. If you move the fig, ax = plt.subplots() line inside the loop, you'll see each group separately. I did that and saw many groups with very few points, so I settled on a simple heuristic to filter them: keep only groups with more than 5 points.
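As a side note, the size heuristic can also be applied before any plotting. The sketch below uses a small made-up dataframe (the issuer names and values are placeholders, not the real bond data) and pandas' groupby().filter() to keep only issuers with more than 5 rows:

```python
import pandas as pd

# Synthetic stand-in for the bond data; only the column names mirror the real file.
df_demo = pd.DataFrame({
    'issuer': ['A'] * 7 + ['B'] * 2 + ['C'] * 6,
    'YTM': list(range(7)) + [1, 2] + list(range(6)),
    'YTW': [3.0 + 0.1 * i for i in range(15)],
})

MIN_POINTS = 5  # same threshold as the heuristic: keep groups with more than 5 rows

# filter() keeps all rows of every group for which the function returns True
df_large = df_demo.groupby('issuer').filter(lambda g: len(g) > MIN_POINTS)

kept_issuers = sorted(df_large['issuer'].unique())
print(kept_issuers)
```

Here issuer B has only two rows, so it is dropped and the remaining dataframe contains issuers A and C only.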
Now it's time to fit. I used this exponential, which I got from here
def exp_plateau(x, ym, y0, k):
    return ym - (ym - y0) * np.exp(-k * x)
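To make the three parameters concrete: y0 is the value at x = 0, ym is the plateau the curve approaches for large x, and k sets how quickly it gets there. The sketch below checks this on a noise-free synthetic curve with made-up parameter values, and passes a data-driven starting guess p0 to curve_fit. The p0 choice is an assumption of this sketch, not something the fits in this answer used:

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_plateau(x, ym, y0, k):
    return ym - (ym - y0) * np.exp(-k * x)

# Noise-free synthetic curve with known, made-up parameters
true_params = (6.0, 3.5, 0.2)   # ym, y0, k
x = np.linspace(0.5, 30, 40)
y = exp_plateau(x, *true_params)

# Starting guess taken from the data: plateau near the largest y,
# intercept near the smallest y, and a modest rate
p0 = [y.max(), y.min(), 0.1]
popt, pcov = curve_fit(exp_plateau, x, y, p0=p0)

value_at_zero = exp_plateau(0.0, *true_params)   # equals y0
value_far_out = exp_plateau(1e3, *true_params)   # approaches ym
print(popt, value_at_zero, value_far_out)
```

Without p0, curve_fit starts every parameter at 1, which may be far from a sensible yield curve, so a starting guess like this one might help with groups that fail to converge.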
And I fitted all the subgroups. Here's the result:
Here's the code that generated it:
groups = df.groupby('issuer')
fig, ax = plt.subplots()
param_names = ['ym', 'y0', 'k']
for group in groups:
    subdf = group[1].sort_values('YTM')
    if len(subdf) > 5:
        try:
            popt, pcov = curve_fit(exp_plateau, subdf['YTM'], subdf['YTW'])
        except RuntimeError:
            print(f"{group[0]} couldn't be fit. Skipping")
            continue
        # solid line: data; dashed line in the same color: fitted curve
        l = ax.plot(subdf['YTM'], subdf['YTW'])
        ax.plot(subdf['YTM'], exp_plateau(subdf['YTM'], *popt), color=l[0].get_color(), ls='--')
        print(f'"{group[0]}"', *[f'{param_name}: {i:.2f}+/-{j:.2f}' for param_name, i, j in zip(param_names, popt, np.sqrt(np.diag(pcov)))])
        #ax.set_title(group[0])
ax.set_xlabel('YTM')
ax.set_ylabel('YTW')
And the output with the fit parameters and their estimated errors:
"407 INTER INC CPN STRIP" ym: 5.66+/-0.09 y0: 3.80+/-0.05 k: 0.29+/-0.03
"AGT LTD COUPON STRIP" ym: 6.42+/-1.66 y0: 3.09+/-0.13 k: 0.23+/-0.17
"BANK OF NOVA SCOTIA" ym: 5.47+/-0.24 y0: 3.36+/-0.08 k: 0.43+/-0.10
"BCE COUPON STRIP" ym: 8.04+/-0.47 y0: 3.95+/-0.03 k: 0.05+/-0.01
"BRCOL GENERIC STRIP" ym: 4.30+/-0.01 y0: -0.12+/-0.68 k: 0.29+/-0.02
"CANADIAN IMP BK COMM HK" ym: 3.37+/-0.17 y0: 2.27+/-0.53 k: 5.88+/-5.41
"CANADIAN TIRE CPN STRIP" ym: 7.30+/-0.42 y0: 3.81+/-0.04 k: 0.08+/-0.02
"GREAT-WEST LIFECO CPN ST" ym: 55.02+/-918.22 y0: 3.79+/-0.18 k: 0.00+/-0.06
"GREATER TORONTO CPN STRP" ym: 6.31+/-0.44 y0: 3.56+/-0.04 k: 0.08+/-0.02
"HYDRO ONE STRIP" ym: 5.48+/-0.16 y0: 3.10+/-0.08 k: 0.19+/-0.03
"LEVIS QUE COUPON STRIP" ym: 3.88+/-0.08 y0: 2.81+/-0.03 k: 0.32+/-0.05
LOBLAW COS CPN STRIP couldn't be fit. Skipping
"NEW BRUN GENERIC CPN STP" ym: 4.32+/-0.01 y0: 2.59+/-0.07 k: 0.27+/-0.02
"SAGUENAY CPN STRIP" ym: 3.89+/-0.06 y0: 2.78+/-0.04 k: 0.33+/-0.05
"SUN LIFE FIN SPN STRIP" ym: 6.47+/-0.20 y0: 3.98+/-0.06 k: 0.11+/-0.02
"TELUS CORP COUPON STRIP" ym: 5.99+/-0.04 y0: 3.29+/-0.07 k: 0.22+/-0.01
TORONTO DOMINION STRIP couldn't be fit. Skipping
"TRANS-CANADA CPN STRIP" ym: 6.53+/-0.39 y0: 3.85+/-0.06 k: 0.13+/-0.03
"TRANSALTA CORP CPN STRIP" ym: 8.91+/-1.91 y0: 4.95+/-0.07 k: 0.07+/-0.05
"WINNIPEG COUPON STRIP" ym: 4.70+/-0.03 y0: -1142.59+/-259776205.48 k: 1.06+/-33065.14
Answered By - K.Cl
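The question also asked how to split the dataframe by distance to a curve, which the per-issuer grouping above sidesteps. One possible approach, sketched here on synthetic data, is to evaluate every fitted curve at each bond's YTM and assign the bond to the curve with the smallest vertical distance. The three parameter sets and the curve names below are hypothetical placeholders; in practice they would come from fits like the ones above:

```python
import numpy as np
import pandas as pd

def exp_plateau(x, ym, y0, k):
    return ym - (ym - y0) * np.exp(-k * x)

# Hypothetical fitted parameters (ym, y0, k) for three curves
curves = {
    'low':  (4.0, 2.5, 0.3),
    'mid':  (5.5, 3.3, 0.2),
    'high': (7.5, 4.0, 0.1),
}

# Synthetic bonds: each point is drawn from one of the curves plus small noise
rng = np.random.default_rng(0)
ytm = rng.uniform(1, 25, size=60)
source = rng.choice(list(curves), size=60)
ytw = np.array([exp_plateau(x, *curves[s]) for x, s in zip(ytm, source)])
ytw += rng.normal(0, 0.02, size=60)
bonds = pd.DataFrame({'YTM': ytm, 'YTW': ytw})

# Vertical distance from every bond to every curve (one column per curve)
dist = pd.DataFrame({
    name: np.abs(bonds['YTW'] - exp_plateau(bonds['YTM'], *params))
    for name, params in curves.items()
})

# Label each bond with its nearest curve and keep the distance for later filtering
bonds['curve'] = dist.idxmin(axis=1)
bonds['distance'] = dist.min(axis=1)

# One sub-dataframe per curve
chunks = {name: sub for name, sub in bonds.groupby('curve')}
```

A cutoff on the distance column could then be used to set aside bonds that sit far from every curve, although a suitable threshold depends on the data.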