Issue
What, if any, is the difference between a NumPy "structured array", a "record array" and a "recarray"?
The NumPy docs imply that the first two are the same: if they are, which is the preferred term for this object?
The same documentation says (at the bottom of the page): You can find some more information on recarrays and structured arrays (including the difference between the two) here. Is there a simple explanation of this difference?
Solution
Records/recarrays are implemented in
https://github.com/numpy/numpy/blob/master/numpy/core/records.py
Some relevant quotes from this file
Record Arrays
Record arrays expose the fields of structured arrays as properties.
The recarray is almost identical to a standard array (which supports named fields already). The biggest difference is that it can use attribute-lookup to find the fields and it is constructed using a record.
recarray is a subclass of ndarray (in the same way that matrix and masked arrays are). But note that its constructor is different from np.array. It is more like np.empty(size, dtype).
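A minimal sketch of that constructor difference, assuming a made-up two-field dtype (the field names x and y are mine, not from the NumPy source). np.recarray takes a shape and a dtype and leaves the contents uninitialized; to build a record array from existing data I would reach for np.rec.array instead:

```python
import numpy as np

# Hypothetical dtype, used only for illustration.
dt = np.dtype([("x", np.int32), ("y", np.float64)])

# Like np.empty(shape, dtype): allocates, but does not fill with data.
rec = np.recarray((3,), dtype=dt)
print(type(rec), rec.shape, rec.dtype.names)

# Building from existing data goes through np.rec.array, not np.recarray.
filled = np.rec.array([(1, 2.0), (3, 4.0)], dtype=dt)
print(filled.x)  # field access by attribute
```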
class recarray(ndarray):
    """Construct an ndarray that allows field access using attributes.
    This constructor can be compared to ``empty``: it creates a new record
    array but does not fill it with data.
The key function for implementing the unique field-as-attribute behavior is __getattribute__ (__getitem__ implements indexing):
def __getattribute__(self, attr):
    # See if ndarray has this attr, and return it if so. (note that this
    # means a field with the same name as an ndarray attr cannot be
    # accessed by attribute).
    try:
        return object.__getattribute__(self, attr)
    except AttributeError:  # attr must be a fieldname
        pass
    # look for a field with this name
    fielddict = ndarray.__getattribute__(self, 'dtype').fields
    try:
        res = fielddict[attr][:2]
    except (TypeError, KeyError):
        raise AttributeError("recarray has no attribute %s" % attr)
    obj = self.getfield(*res)
    # At this point obj will always be a recarray, since (see
    # PyArray_GetField) the type of obj is inherited. Next, if obj.dtype is
    # non-structured, convert it to an ndarray. If obj is structured leave
    # it as a recarray, but make sure to convert to the same dtype.type (eg
    # to preserve numpy.record type if present), since nested structured
    # fields do not inherit type.
    if obj.dtype.fields:
        return obj.view(dtype=(self.dtype.type, obj.dtype.fields))
    else:
        return obj.view(ndarray)
It first tries to get a regular attribute: things like .shape, .strides, .data, as well as all the methods (.sum, .reshape, etc.). Failing that, it looks up the name in the dtype field names. So it is really just a structured array with some redefined access methods.
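To make that lookup order concrete, here is a small sketch. The field names are hypothetical, and I deliberately named one of them "shape" so that it collides with an ndarray attribute:

```python
import numpy as np

# Hypothetical field names; "shape" intentionally shadows an ndarray attribute.
dt = np.dtype([("mass", np.float64), ("shape", np.int64)])
plain = np.array([(1.5, 7), (2.5, 8)], dtype=dt)
rec = plain.view(np.recarray)

print(hasattr(plain, "mass"))  # a plain structured array has no such attribute
print(rec.mass)                # the recarray finds the field by attribute lookup
print(rec.shape)               # the regular ndarray attribute wins over the field
print(rec["shape"])            # the field itself is still reachable by indexing
```

So attribute access is a convenience on top of the usual name indexing, and indexing remains the only way to reach a field whose name clashes with an existing attribute or method.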
As best I can tell, record array and recarray are the same.
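A quick check that seems consistent with this, assuming np.rec.array as the way to build a "record array" (the field names are again made up): the object that comes back is an instance of np.recarray, and its elements are np.record scalars.

```python
import numpy as np

# Hypothetical field names, for illustration only.
r = np.rec.array([(1, 2.0), (3, 4.0)],
                 dtype=[("x", np.int64), ("y", np.float64)])

print(type(r))     # the "record array" is a numpy.recarray
print(type(r[0]))  # a single element is a numpy.record
print(r[0].x, r.y) # attribute access works on both the element and the array
```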
Another file shows something of the history
https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py
Collection of utilities to manipulate structured arrays. Most of these functions were initially implemented by John Hunter for matplotlib. They have been rewritten and extended for convenience.
Many of the functions in this file end with:
if asrecarray:
    output = output.view(recarray)
The fact that you can return an array as a recarray view shows how 'thin' this layer is.
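A rough sketch of how thin the layer is, using a hypothetical two-field array. The view does not copy anything, so a write through the recarray shows up in the original structured array, and the same data can be viewed back as a plain ndarray:

```python
import numpy as np

# Hypothetical fields "a" and "b", for illustration only.
plain = np.zeros(3, dtype=[("a", np.int64), ("b", np.float64)])
rec = plain.view(np.recarray)   # same buffer, different class

rec.a[:] = [10, 20, 30]         # write through the recarray view
print(plain["a"])               # the original array sees the change
print(np.shares_memory(plain, rec))

back = rec.view(np.ndarray)     # and back to a plain ndarray
print(type(back))
```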
numpy has a long history, and merges several independent projects. My impression is that recarray is an older idea, and structured arrays are the current implementation, built on a generalized dtype. recarrays seem to be kept more for convenience and backward compatibility than for any new development. But I'd have to study the github file history, and any recent issues/pull requests, to be sure.
Answered By - hpaulj