Wednesday, October 20, 2021

[FIXED] Using Mann Kendall in python with a lot of data

October 20, 2021 arrays, jupyter-notebook, numpy, python, statistics No comments

Issue

I have a set of 46 years worth of rainfall data. It's in the form of 46 numpy arrays each with a shape of 145, 192, so each year is a different array of maximum rainfall data at each lat and lon coordinate in the given model.

I need to create a global map of tau values by doing an M-K test (Mann-Kendall) for each coordinate over the 46 years.

I'm still learning python, so I've been having trouble finding a way to go through all the data in a simple way that doesn't involve me making 27840 new arrays for each coordinate.

So far I've looked into how to use scipy.stats.kendalltau and using the definition from here: https://github.com/mps9506/Mann-Kendall-Trend

EDIT:

To clarify and add a little more detail, I need to perform a test on for each coordinate and not just each file individually. For example, for the first M-K test, I would want my x=46 and I would want y=data1[0,0],data2[0,0],data3[0,0]...data46[0,0]. Then to repeat this process for every single coordinate in each array. In total the M-K test would be done 27840 times and leave me with 27840 tau values that I can then plot on a global map.

EDIT 2:

I'm now running into a different problem. Going off of the suggested code, I have the following: for i in range(145): for j in range(192): out[i,j] = mk_test(yrmax[:,i,j],alpha=0.05) print out

I used numpy.stack to stack all 46 arrays into a single array (yrmax) with shape: (46L, 145L, 192L) I've tested it out and it calculates p and tau correctly if I change the code from out[i,j] to just out. However, doing this messes up the for loop so it only takes the results from the last coordinate in stead of all of them. And if I leave the code as it is above, I get the error: TypeError: list indices must be integers, not tuple

My first guess was that it has to do with mk_test and how the information is supposed to be returned in the definition. So I've tried altering the code from the link above to change how the data is returned, but I keep getting errors relating back to tuples. So now I'm not sure where it's going wrong and how to fix it.

EDIT 3:

One more clarification I thought I should add. I've already modified the definition in the link so it returns only the two number values I want for creating maps, p and z.

Solution

Thanks to the answers provided and some work I was able to work out a solution that I'll provide here for anyone else that needs to use the Mann-Kendall test for data analysis.

The first thing I needed to do was flatten the original array I had into a 1D array. I know there is probably an easier way to go about doing this, but I ultimately used the following code based on code Grr suggested using.

`x = 46
out1 = np.empty(x)
out = np.empty((0))
for i in range(146):
    for j in range(193):
        out1 = yrmax[:,i,j]
        out = np.append(out, out1, axis=0) `

Then I reshaped the resulting array (out) as follows:

out2 = np.reshape(out,(27840,46))

I did this so my data would be in a format compatible with scipy.stats.kendalltau 27840 is the total number of values I have at every coordinate that will be on my map (i.e. it's just 145*192) and the 46 is the number of years the data spans.

I then used the following loop I modified from Grr's code to find Kendall-tau and it's respective p-value at each latitude and longitude over the 46 year period.

`x = range(46)
 y = np.zeros((0))
for j in range(27840):
    b = sc.stats.kendalltau(x,out2[j,:])
    y = np.append(y, b, axis=0)`

Finally, I reshaped the data one for time as shown:newdata = np.reshape(y,(145,192,2)) so the final array is in a suitable format to be used to create a global map of both tau and p-values.

Thanks everyone for the assistance!

Answered By - Alex Morrison

This Answer collected from stackoverflow and tested by PythonFixing community admins, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Wednesday, October 20, 2021

[FIXED] Using Mann Kendall in python with a lot of data

Issue

Solution

0 comments:

Post a Comment

Popular Posts

Labels