Issue
I have two sets of gridded (NetCDF) data with dimensions: Time, S-N, W-E and would like to do a t-test of paired values in time across the entire grid with Python. The computation should return decisions across the entire grid that can then be processed into a spy plot aerial overlay. I am able to process this in MATLAB, after reading the variable, comprising two main steps (as a hint to what is expected in Python):
**[h,p] = ttest2(permute(grid_1,[3 2 1]),permute(griid_2,[3 2 1]))** % [3 2 1] ===> [Time S-N W-E]
**spy(squeeze(h),'k+',1)** % h is 1/0 decision.
Perhaps this would entail array transforms but my knowledge of Python is limited.
A snippet of my gridded data is as follows:
<xarray.DataArray 'WSPD' (Time: 120, south_north: 105, west_east: 120)>
array([[[3.0042849, 3.6635756, 3.5766048, ..., 2.7890186, 3.5537026,
2.4510043],
Coordinates:
XLONG (south_north, west_east) float32 -125.7 -125.7 ... -122.3 -122.2
XLAT (south_north, west_east) float32 55.7 55.7 55.7 ... 56.99 56.99
XTIME (Time) float32 3.003e+04 7.282e+04 ... 1.417e+06 3.742e+06
Time (Time) datetime64[ns] 2010-08-01T08:00:00 ... 2020-07-01T08:00:00
Solution
You can perform paired t-test using scipy.stats.ttest_rel
. It takes in two numpy
arrays (to which xarray
DataArrays
natively converts) and compares them across an axis. Default is axis=0
.
Example
Let's create some sample data
import xarray as xr
from scipy.stats import ttest_rel
import numpy as np
grid_1 = xr.tutorial.open_dataset('air_temperature').load().to_array().squeeze()
grid_2 = grid_1 + np.random.randn(*grid_1.shape) # Normally perturbed grid_1
>>> grid_1
<xarray.DataArray (time: 2920, lat: 25, lon: 53)>
Coordinates:
* lat (lat) float32 75.0 72.5 70.0 67.5 65.0 ... 22.5 20.0 17.5 15.0
* lon (lon) float32 200.0 202.5 205.0 207.5 ... 322.5 325.0 327.5 330.0
* time (time) datetime64[ns] 2013-01-01 ... 2014-12-31T18:00:00
variable <U3 'air'
Since we want to compare across time
, we need to specify the correct axis. Here time
is first, so we do not need to do anything. Otherwise we'd have to either grid_1.transpose('time', 'lat', 'lon')
or call ttest_rel
with the appropriate axis
argument.
Now we can simply perform the paired t-test.
statres, pval = ttest_rel(grid_1, grid_2)
Plotting
Following the details added to the question, here is how to plot a spy plot with the appropriate ticks.
import matplotlib.pyplot as plt
spy1 = np.where(pval < 0.05, 1, 0)
# Plot spy plot
ax = plt.gca()
ax.spy(spy1, markersize=1, color='black') # 'black' markers
# Prepare ticks
x_ticks = np.arange(0, grid_1.coords['lon'].shape[0], 10)
x_tick_labels = grid_1.coords['lon'].isel(lon=x_ticks).values
y_ticks = np.arange(0, grid_1.coords['lat'].shape[0], 10)
y_tick_labels = grid_1.coords['lat'].isel(lat=y_ticks).values
# Set ticks
ax.set_xticks(x_ticks)
ax.set_xticklabels(x_tick_labels)
ax.set_yticks(y_ticks)
ax.set_yticklabels(y_tick_labels)
Answered By - Dahn
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.