Monday, June 20, 2022

[FIXED] Python Add Dataclass to Set


Issue

I have a dataclass defined like so:

from typing import List
from dataclasses import dataclass, field

@dataclass
class Speaker:
    id: int
    name: str
    statements: List[str] = field(default_factory=list)

    def __eq__(self, other):
        return self.id == other.id and self.name == other.name

    def __hash__(self):
        return hash((self.id, self.name))

I also have a list of (id, name, statement) items that I want to combine. The same id may appear any number of times. For each item, I want to append its statement to the matching speaker in a set of speakers.

This is what I have so far:

test = [(1, 'john', 'foo'),(1, 'john', 'bar'),(2, 'jane', 'near'),(2, 'george', 'far')]

speakers = set()
for i in test:
    id, name, statement = i
    Speaker(id, name)
    
    # This line needs to change
    speakers.add(Speaker(id, name, [statement]))

print(speakers)

Current output

{Speaker(id=1, name='john', statements=['foo']), Speaker(id=2, name='jane', statements=['near']), Speaker(id=2, name='george', statements=['far'])}

What I want

{Speaker(id=1, name='john', statements=['foo', 'bar']), Speaker(id=2, name='jane', statements=['near']), Speaker(id=2, name='george', statements=['far'])}

Please let me know if you have any suggestions. The number of fields is subject to change (I may add a title, etc.), so converting to a dict probably won't work.

Edit: Added an extra field called name to clarify the situation.

Solution

Instead of a set, I think a dict makes more sense in this case. Lookup is O(1) on average, and it avoids having to use next to scan the set for the element equal to a given Speaker. Example below.

from pprint import pprint
from typing import List
from dataclasses import dataclass, field


@dataclass
class Speaker:
    id: int
    name: str
    statements: List[str] = field(default_factory=list)

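    def __eq__(self, other):
        # Compare only the identifying fields; defer to Python for other types.
        if not isinstance(other, Speaker):
            return NotImplemented
        return self.id == other.id and self.name == other.name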

    def __hash__(self):
        return hash((self.id, self.name))


test = [(1, 'john', 'foo'), (1, 'john', 'bar'), (2, 'jane', 'near'), (2, 'george', 'far')]
speakers = {}

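for id, name, statement in test:
    key = Speaker(id, name)
    # setdefault returns the Speaker already stored under an equal key,
    # or stores and returns this new one if none exists yet.
    speaker = speakers.setdefault(key, key)
    speaker.statements.append(statement)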

print(speakers)
print()
pprint(list(speakers.values()))

Out:

{Speaker(id=1, name='john', statements=['foo', 'bar']): Speaker(id=1, name='john', statements=['foo', 'bar']), Speaker(id=2, name='jane', statements=['near']): Speaker(id=2, name='jane', statements=['near']), Speaker(id=2, name='george', statements=['far']): Speaker(id=2, name='george', statements=['far'])}

[Speaker(id=1, name='john', statements=['foo', 'bar']),
 Speaker(id=2, name='jane', statements=['near']),
 Speaker(id=2, name='george', statements=['far'])]
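
For comparison, here is a rough sketch of the set-based lookup with next that the answer tries to avoid. It is an illustration only, not code from the original answer: the variable names speaker_id, candidate and existing are made up for this example. Each item triggers a linear scan of the set, which is why the dict version is preferable.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Speaker:
    id: int
    name: str
    statements: List[str] = field(default_factory=list)

    def __eq__(self, other):
        if not isinstance(other, Speaker):
            return NotImplemented
        return self.id == other.id and self.name == other.name

    def __hash__(self):
        return hash((self.id, self.name))


test = [(1, 'john', 'foo'), (1, 'john', 'bar'), (2, 'jane', 'near'), (2, 'george', 'far')]
speakers = set()

for speaker_id, name, statement in test:
    candidate = Speaker(speaker_id, name)
    # Linear scan: a set can tell us *whether* an equal element exists,
    # but it cannot hand back the stored element, so we search for it.
    existing = next((s for s in speakers if s == candidate), None)
    if existing is None:
        speakers.add(candidate)
        existing = candidate
    existing.statements.append(statement)

print(speakers)
```

Mutating statements on an element already in the set is safe here only because statements is not part of __hash__ or __eq__.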

Edit: since the dataclass is hashable, I think it makes sense in the dict-based solution to store the Speaker object itself (uniquely identified by id, name, and any other fields used in __hash__) as both the key and the value. This is a little roundabout, so you could instead use a tuple of the values you want to hash, i.e. (id, name), as the key, which should also work. Either way, this should be more efficient than scanning a set, since dict.setdefault is O(1) on average, to the best of my knowledge.
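
The tuple-key variant mentioned above could be sketched as follows. This is a minimal illustration under a few assumptions: the loop variable is renamed to speaker_id (a hypothetical name, chosen to avoid shadowing the built-in id), and the custom __eq__ and __hash__ are dropped because the dict is keyed on the tuple rather than on the Speaker.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Speaker:
    id: int
    name: str
    statements: List[str] = field(default_factory=list)


test = [(1, 'john', 'foo'), (1, 'john', 'bar'), (2, 'jane', 'near'), (2, 'george', 'far')]
speakers = {}

for speaker_id, name, statement in test:
    # Key on the identifying fields only; create the Speaker on first sight.
    key = (speaker_id, name)
    if key not in speakers:
        speakers[key] = Speaker(speaker_id, name)
    speakers[key].statements.append(statement)

result = list(speakers.values())
print(result)
```

If more identifying fields are added later (a title, for example), they would need to be added to the key tuple as well.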

Answered By - rv.kvetch

This answer was collected from Stack Overflow and tested by PythonFixing community admins. It is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
