Issue
I'm trying to create a 3D model of lithology using x, y, z coordinates and lithology classifications. The closest I have come is linear interpolation; however, I want to classify, not interpolate, unknown values based on whether they lie on a line between two known points. Is there a way to do this in Python? I've tried a logistic regression model, but the dataset is relatively sparse and each site is different, so the model would not generalize well.
My data is in latitude and longitude with NAVD88 surface elevation. I'm aware of the projection issues and may project my data later to make it more spatially accurate, but for now I'm trying to see if there's a relatively easy way to do some kind of classification between two known x, y, z points in Python without having to write my own linear expression algorithm.
Thanks in advance!
This is my method to interpolate right now:
import numpy as np
from scipy.interpolate import griddata
# query is my dataset queried from my database
# Dimensions
latitude_vals = query['Y']
longitude_vals = query['X']
z_vals = query['lith_elev_start']
lith_vals = query['SoilID']
#numpy array conversion
x = np.asarray(longitude_vals, dtype=np.float64)
y = np.asarray(latitude_vals, dtype=np.float64)
z = np.asarray(z_vals, dtype=np.float64)
# Map text values to numerical codes
lith_mapping = {lith: code for code, lith in enumerate(lith_vals.unique())} #encode lithology classifications to integers
w = np.array([lith_mapping[lith] for lith in lith_vals], dtype=np.float64) #np array of lith classification values
# Create a grid of coordinates for the output NetCDF
output_x = np.linspace(min(x), max(x), 100)
output_y = np.linspace(min(y), max(y), 100)
output_z = np.linspace(min(z), max(z), 100)
# Use meshgrid to create 3D arrays of coordinates
output_x, output_y, output_z = np.meshgrid(output_x, output_y, output_z, indexing='ij')
# Interpolate on the grid
grid_points = np.array([output_x.flatten(), output_y.flatten(), output_z.flatten()]).T
output_w_linear = griddata((x, y, z), w, grid_points, method='linear', fill_value=-9999)
output_w_linear = output_w_linear.reshape(output_x.shape)
Solution
If I understand your problem correctly:
- You have three-dimensional coordinates mapped to classes (lithology);
- You want to predict the class at new coordinates.
The scipy library can interpolate multidimensional functions, but its interpolators are not classifiers: they treat the integer class codes as ordinary numbers, and a fractional class code has no meaning in this scenario.
On the other hand, sklearn has many built-in tools for exactly this kind of problem...
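To make the point about fractional classes concrete, here is a minimal sketch with made-up points and class codes (not the asker's data). It queries one location inside a tetrahedron of four known points, once with `method="linear"` and once with `method="nearest"`:

```python
import numpy as np
from scipy.interpolate import griddata

# Four made-up known points (corners of a tetrahedron) with integer class codes.
points = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
codes = np.array([0.0, 1.0, 1.0, 1.0])

# A query location strictly inside the tetrahedron.
query = np.array([[0.25, 0.25, 0.25]])

# Linear interpolation blends the codes of the surrounding points,
# so the result is a weighted average (about 0.75 here), not a class.
linear = griddata(points, codes, query, method="linear")

# Nearest-neighbour lookup copies the code of the closest known point,
# so the result is always one of the original codes (0 here).
nearest = griddata(points, codes, query, method="nearest")

print(linear, nearest)
```

The nearest-neighbour result is a valid class code, which is the same idea the classifier below builds on, with more than one neighbour and a way to measure accuracy.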
MCVE
Let's create a 3D dataset with three different classes:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
s = np.linspace(0, 1, 10)
X, Y, Z = np.meshgrid(s, s, s)
T = np.zeros_like(X)
T[X+Y+Z>2] = 1
T[X+Y-Z>1] = 2
data = np.array([X.ravel(), Y.ravel(), Z.ravel()]).T
target = T.ravel()
Now we can train a classifier and check on test data that it performs as expected.
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2, stratify=target, random_state=123)
model = KNeighborsClassifier(n_neighbors=10, weights="distance")
model.fit(X_train, y_train)
We chose KNeighborsClassifier because it seems to best suit your requirement, but there are many other classifiers to choose from.
Once the model is fitted, we can predict the classes of the locations in the test fold:
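As one possible way to weigh those choices, the sketch below compares the same k-nearest-neighbours model against a random forest using shuffled, stratified 5-fold cross-validation on the toy dataset. The random forest and its settings are illustrative assumptions, not a recommendation from the answer:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Rebuild the toy dataset so this block runs on its own.
s = np.linspace(0, 1, 10)
X, Y, Z = np.meshgrid(s, s, s)
T = np.zeros_like(X)
T[X + Y + Z > 2] = 1
T[X + Y - Z > 1] = 2
data = np.array([X.ravel(), Y.ravel(), Z.ravel()]).T
target = T.ravel()

# Candidate models; the random forest is only an example alternative.
candidates = {
    "knn": KNeighborsClassifier(n_neighbors=10, weights="distance"),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=123),
}

# Shuffle the folds because the grid points are stored in spatial order.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)

scores = {}
for name, clf in candidates.items():
    scores[name] = cross_val_score(clf, data, target, cv=cv).mean()
    print(name, round(scores[name], 3))
```

On sparse real data the ranking may differ from the toy case, so it is worth rerunning the comparison on the actual boreholes.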
y_pred = model.predict(X_test)
And measure the accuracy of this prediction:
model.score(X_test, y_test) # 0.975
confusion_matrix(y_test, y_pred)
# array([[142,   0,   0],
#        [  2,  23,   0],
#        [  3,   0,  30]])
Which is fairly decent for this toy example.
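To connect this back to the question, here is a hedged sketch of how the same classifier might fill a regular 3D grid like the one built for `griddata`. The borehole values are randomly generated stand-ins, the "sand"/"clay" labels are hypothetical, and standardizing the features is an assumption made because degrees and elevation units differ greatly in scale:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in for the queried table: lon, lat, elevation, lithology.
n = 200
x = rng.uniform(-90.0, -89.9, n)   # longitude (made up)
y = rng.uniform(30.0, 30.1, n)     # latitude (made up)
z = rng.uniform(-50.0, 0.0, n)     # elevation (made up)
lith = np.where(z > -20.0, "sand", "clay")  # made-up labels

# scikit-learn classifiers accept string labels directly,
# so no integer encoding is needed.
features = np.column_stack([x, y, z])
model = make_pipeline(
    StandardScaler(),  # assumption: put all three axes on a comparable scale
    KNeighborsClassifier(n_neighbors=5, weights="distance"),
)
model.fit(features, lith)

# Build the output grid the same way as in the question (coarser here).
gx, gy, gz = np.meshgrid(
    np.linspace(x.min(), x.max(), 20),
    np.linspace(y.min(), y.max(), 20),
    np.linspace(z.min(), z.max(), 20),
    indexing="ij",
)
grid_points = np.column_stack([gx.ravel(), gy.ravel(), gz.ravel()])

# Every grid cell receives one of the known lithology labels.
grid_lith = model.predict(grid_points).reshape(gx.shape)
print(grid_lith.shape, np.unique(grid_lith))
```

Unlike `griddata` with `fill_value`, the classifier also labels cells far from any borehole, so masking cells beyond some distance from the data may be worth considering.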
Answered By - jlandercy