Issue
I want to create a dataclass from a dict, using not only the values of the dict but also its keys, which should be automatically recognized as field names for the dataclass.
The input is
d = {'a': 3, 'b': 7}
Now I want to do something like this:
import dataclasses
# Hocus pocus
X = dataclasses.dataclass_from_dict(name='X', the_dict=d)
print(X) # <class '__main__.X'>
z = X(a=3, b=99)
print(z) # X(a=3, b=99)
The important point here is that the dataclass and its fields are created automatically from the keys of the dictionary, so there is no need to know the structure or the keys of the dict in advance.
What I tried so far
I tried dataclasses.make_dataclass(), but the result (AUTO) is different from a dataclass created the usual way (MANUAL).
>>> import dataclasses
>>> from dataclasses import dataclass
>>> d = {'a': 3, 'b': 7}
>>> AUTO = dataclasses.make_dataclass('AUTO', [(key, type(d[key])) for key in d])
>>> @dataclass
... class MANUAL:
...     a: int
...     b: int
...
>>> AUTO
<class 'types.AUTO'>
>>> MANUAL
<class '__main__.MANUAL'>
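The only visible difference in the output above is the module part of the class name. A class built by make_dataclass() is created dynamically inside the standard library rather than by a class statement in your own module, which appears to be why it reports types as its module here (newer Python versions may behave differently). Below is a minimal sketch of the helper the question asks for. The name dataclass_from_dict and its module parameter are hypothetical and not part of the dataclasses module, and the sketch assumes every key is a valid Python identifier.

```python
import dataclasses


def dataclass_from_dict(name: str, the_dict: dict, module: str = "__main__") -> type:
    """Hypothetical helper: build a dataclass whose fields mirror the dict's keys."""
    # Field names come from the keys, field types from the current values
    fields = [(key, type(value)) for key, value in the_dict.items()]
    cls = dataclasses.make_dataclass(name, fields)
    # Assumption: overriding __module__ is enough to make the class display
    # like a hand-written one; it does not register the class in that module
    cls.__module__ = module
    return cls


d = {'a': 3, 'b': 7}
X = dataclass_from_dict('X', d)
z = X(a=3, b=99)
print(X)  # <class '__main__.X'>
print(z)  # X(a=3, b=99)
```

Nested dicts are stored as plain dicts here. Converting them recursively would need one generated class per nesting level.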
Solution
In this scenario, the benefits of type hinting and auto-completion would largely be lost, so I would personally suggest going with a custom-built DotDict approach, as outlined below.
I was curious, so I timed this against the dataclasses.make_dataclass approach. If you are interested, I have also included the complete test code I used for benchmarking.
Update (6/22): I've come up with a library for this and published it on PyPI: dotwiz. Check it out. It should be just as fast as the approach below, with a few noticeable improvements.
import dataclasses
from timeit import timeit


class DotDict(dict):
    __getattr__ = dict.__getitem__
    __delattr__ = dict.__delitem__

    def __repr__(self):
        fields = [f'{k}={v!r}' for k, v in self.items()]
        return f'{self.__class__.__name__}({", ".join(fields)})'


def make_dot_dict(input_dict: dict) -> DotDict:
    """
    Helper method to generate and return a `DotDict` (dot-access dict) from a
    Python `dict` object.
    """
    return DotDict(
        (
            k,
            make_dot_dict(v) if isinstance(v, dict)
            else [make_dot_dict(e) if isinstance(e, dict) else e
                  for e in v] if isinstance(v, list)
            else v
        ) for k, v in input_dict.items()
    )


def main():
    d = {'a': 3, 'b': 1, 'c': {'aa': 33, 'bb': [{'x': 77}]}}

    X = dataclasses.make_dataclass('X', d)

    n = 10_000

    globals().update(locals())

    time_to_make_dataclass = timeit("dataclasses.make_dataclass('X', d)", number=n, globals=globals())
    time_to_instantiate_dataclass = timeit("X(**d)", number=n, globals=globals())
    time_to_instantiate_dot_dict = timeit("make_dot_dict(d)", number=n, globals=globals())

    print(f'dataclasses.make_dataclass: {time_to_make_dataclass:.3f}')
    print(f'instantiate dataclass (X): {time_to_instantiate_dataclass:.3f}')
    print(f'instantiate dotdict (DotDict): {time_to_instantiate_dot_dict:.3f}')
    print()

    create_instance_perc = time_to_instantiate_dot_dict / time_to_instantiate_dataclass
    total_time_perc = (time_to_make_dataclass + time_to_instantiate_dataclass) / time_to_instantiate_dot_dict

    print(f'It is {create_instance_perc:.0f}x faster to create a dataclass instance')
    print(f'It is {total_time_perc:.0f}x faster (overall) to create a DotDict instance')

    # create new `DotDict` and check we can use dot-access as well as dict-access
    dd = make_dot_dict(d)
    assert dd.b == 1
    assert dd.c.aa == 33
    assert dd['c']['aa'] == 33
    assert dd.c.bb[0].x == 77

    # create new dataclass `X` instance
    x = X(**d)

    # assert result is same between both DotDict and dataclass approach
    assert dd == x.__dict__


if __name__ == '__main__':
    main()
I received the following results on my Mac (M1 chip):
dataclasses.make_dataclass: 1.342
instantiate dataclass (X): 0.002
instantiate dotdict (DotDict): 0.013
It is 6x faster to create a dataclass instance
It is 100x faster (overall) to create a DotDict instance
As expected, I found the DotDict approach to perform much better overall in the general case. This is mainly because it doesn't need to dynamically generate a new class, or scan through the dict object to generate the dataclass fields and their types.
That said, once the class has been created, I was surprised to find that instantiating the dataclass is faster than building a DotDict: roughly 6x in the run above.
Answered By - rv.kvetch