Issue
Python example
In Numpy there is lexsort to sort one array within another:
Given multiple sorting keys, which can be interpreted as columns in a spreadsheet, lexsort returns an array of integer indices that describes the sort order by multiple columns.
So taking the following example:
import numpy as np
a = np.array([1,1,1,2,2,2])
b = np.array([10,8,11,4,8,0])
sorted_idx = np.lexsort((b,a))
print(b[sorted_idx])
# [ 8 10 11 0 4 8]
So this sorts b within a as we can see like:
1 1 1 2 2 2
8 10 11 0 4 8
I can't find anything similar in Julia so I wonder how this can be achieved? In my case two columns is sufficient
Julia
So lets take the same data to figure that out:
a = Vector([1,1,1,2,2,2])
b = Vector([10,8,11,4,8,0])
Little benchmark
using StructArrays
using DataFrames
using BenchmarkTools
a = rand(100000)
b = rand(100000)
function f1(a, b)
return sortperm(StructArray((a, b)))
end
function f2(a, b)
return sortperm(DataFrame(a=a,b=b, copycols=false))
end
function f3(a, b)
return sortperm(collect(zip(a, b)))
end
@btime f1(a,b)
@btime f2(a,b)
@btime f3(a,b)
Giving:
6.075 ms (8 allocations: 781.50 KiB)
13.808 ms (8291 allocations: 5.93 MiB)
15.892 ms (7 allocations: 2.29 MiB)
So StructArray
is almost twice as fast as the other two and uses less memory
Solution
Similar to the DataFrames solution but a bit more lightweight in terms of dependencies, a nice solution is to use the StructArrays package, which lets you treat a pair of arrays as if it were an array of tuples, without making a copy of the data, which you can then sort (tuples sort lexicographically):
using StructArrays
i = sortperm(StructArray((a, b)))
Instead of finding the permutation array i
and doing b[i]
, you can also do:
sort!(StructArray((a, b)))
which sorts both a
and b
in-place lexicographically by (a[j], b[j])
.
Answered By - SGJ
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.