Issue
This is ML code and I am a beginner. X is the feature matrix and y holds the class labels.
print(X.shape)
X.dtypes
output:
Age int64
Sex int64
chest pain type int64
Trestbps int64
chol int64
fbs int64
restecg int64
thalach int64
exang int64
oldpeak float64
slope int64
ca object
thal object
dtype: object
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.feature_selection import SelectKBest, f_classif
# Using ANOVA to create the new dataset with only the best three selected features
X_new_anova = SelectKBest(f_classif, k=3).fit_transform(X, y)  # <-- this line raises the error
X_new_anova = pd.DataFrame(X_new_anova, columns = ["Age", "Trestbps","chol"])
print("The dataset with best three selected features after using ANOVA:")
print(X_new_anova.head())
kmeans_anova = KMeans(n_clusters = 3).fit(X_new_anova)
labels_anova = kmeans_anova.labels_
#Counting the number of the labels in each cluster and saving the data into clustering_classes
clustering_classes_anova = {
0: [0,0,0,0,0],
1: [0,0,0,0,0],
2: [0,0,0,0,0]
}
for i in range(len(y)):
    clustering_classes_anova[labels_anova[i]][y[i]] += 1
# Finding the most frequent label in each cluster and computing the purity score
purity_score_anova = (max(clustering_classes_anova[0])+max(clustering_classes_anova[1])+max(clustering_classes_anova[2]))/len(y)
print(f"Purity score of the new data after using ANOVA {round(purity_score_anova*100, 2)}%")
This is the error I got:
#Using ANOVA to create the new dataset with only best three selected features
----> 4 X_new_anova = SelectKBest(f_classif, k=3).fit_transform(X,y)
5 X_new_anova = pd.DataFrame(X_new_anova, columns = ["Age", "Trestbps","chol"])
6 print("The dataset with best three selected features after using ANOVA:")
ValueError: could not convert string to float: '?'
I don't know what the "?" means. Could you please tell me how to avoid this error?
Solution
The '?' in the message is a literal string somewhere in your data file that cannot be converted to a float. My guess is that whoever built the dataset put a ? wherever a value was missing. Your X.dtypes output shows ca and thal as object rather than a numeric type, so the ? is most likely in one of those two columns. Check your data file to find where it occurs.
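To find where the '?' entries are, something like the following could help. It is only a sketch: the small DataFrame is made-up stand-in data (the asker's real X is not shown), with ca and thal stored as strings to mimic the object columns in the question.

```python
import pandas as pd

# Hypothetical stand-in for X; the values are invented for illustration.
X = pd.DataFrame({
    "Age": [63, 67, 41, 56],
    "ca": ["0.0", "3.0", "?", "0.0"],
    "thal": ["6.0", "?", "3.0", "3.0"],
})

# Boolean mask that is True wherever a cell is exactly the string "?".
# Casting to str first keeps the comparison safe for numeric columns.
mask = X.astype(str).eq("?")

# How many "?" entries each column contains.
question_marks_per_column = mask.sum()

# Index labels of the rows that contain at least one "?".
rows_with_question_mark = X.index[mask.any(axis=1)].tolist()

print(question_marks_per_column)
print(rows_with_question_mark)
```

The row labels it reports are the ones you would pass to drop.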
You can delete a row using
df = df.drop(labels=3, axis=0)
'''
Here df is your DataFrame and 3 is a placeholder for the
index label of whichever row holds the '?'. If row 40
holds it, use labels=40.
'''
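If there are many such rows, dropping them one label at a time gets tedious. One possible alternative, sketched below on made-up data, is to convert the object columns with pd.to_numeric(..., errors="coerce"), which turns '?' into NaN, and then drop every row that has a NaN. The column names ca and thal come from the question; the values, the y series and the names X_clean and y_clean are assumptions for illustration. Note that errors="coerce" turns any non-numeric string into NaN, not only '?'.

```python
import pandas as pd

# Hypothetical stand-ins for X and y; the values are invented.
X = pd.DataFrame({
    "Age": [63, 67, 41, 56, 62],
    "ca": ["0.0", "3.0", "?", "0.0", "2.0"],
    "thal": ["6.0", "?", "3.0", "3.0", "7.0"],
})
y = pd.Series([0, 2, 0, 1, 3], name="target")

X_clean = X.copy()

# Convert the object columns to numbers; anything that is not a number
# (including "?") becomes NaN instead of raising an error.
for col in ["ca", "thal"]:
    X_clean[col] = pd.to_numeric(X_clean[col], errors="coerce")

# Keep only rows with no missing values, and filter y with the same mask
# so features and labels stay aligned.
keep = X_clean.notna().all(axis=1)
X_clean = X_clean[keep].reset_index(drop=True)
y_clean = y[keep].reset_index(drop=True)

print(X_clean.dtypes)
print(X_clean.shape, y_clean.shape)
```

X_clean and y_clean can then be passed to SelectKBest(f_classif, k=3).fit_transform in place of X and y. Resetting the index matters for the loop in the question, since y[i] assumes the labels run from 0 to len(y) - 1.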
Answered By - Sierra Walker