Issue
I have a dataclass defined like so:
from typing import List
from dataclasses import dataclass, field

@dataclass
class Speaker:
    id: int
    name: str
    statements: List[str] = field(default_factory=list)

    def __eq__(self, other):
        return self.id == other.id and self.name == other.name

    def __hash__(self):
        return hash((self.id, self.name))
I also have a list of (id, name, statement) items that I want to combine. The same id may appear any number of times. For each item, I want to append its statement to the matching speaker in the set of speakers.
This is what I have so far:
test = [(1, 'john', 'foo'), (1, 'john', 'bar'), (2, 'jane', 'near'), (2, 'george', 'far')]
speakers = set()
for i in test:
    id, name, statement = i
    Speaker(id, name)
    # This line needs to change
    speakers.add(Speaker(id, name, [statement]))
print(speakers)
Current output
{Speaker(id=1, name='john', statements=['foo']), Speaker(id=2, name='jane', statements=['near']), Speaker(id=2, name='george', statements=['far'])}
What I want
{Speaker(id=1, name='john', statements=['foo', 'bar']), Speaker(id=2, name='jane', statements=['near']), Speaker(id=2, name='george', statements=['far'])}
Please let me know if you have any suggestions. The number of fields is subject to change (I may add title, etc.), so converting to a dict probably won't work.
Edit: Added an extra field called name to clarify the situation.
Solution
Instead of using a set, I think it makes more sense to use a dict in this case. A dict lookup is O(1) on average, and this way we avoid calling next to search the set for the element that matches a given Speaker's hashed value. Example below.
from pprint import pprint
from typing import List
from dataclasses import dataclass, field

@dataclass
class Speaker:
    id: int
    name: str
    statements: List[str] = field(default_factory=list)

    def __eq__(self, other):
        return self.id == other.id and self.name == other.name

    def __hash__(self):
        return hash((self.id, self.name))

test = [(1, 'john', 'foo'), (1, 'john', 'bar'), (2, 'jane', 'near'), (2, 'george', 'far')]
speakers = {}
for id, name, statement in test:
    key = Speaker(id, name)
    # Returns the Speaker already stored under an equal key, or stores and returns this one
    speaker = speakers.setdefault(key, key)
    speaker.statements.append(statement)
print(speakers)
print()
pprint(list(speakers.values()))
Out:
{Speaker(id=1, name='john', statements=['foo', 'bar']): Speaker(id=1, name='john', statements=['foo', 'bar']), Speaker(id=2, name='jane', statements=['near']): Speaker(id=2, name='jane', statements=['near']), Speaker(id=2, name='george', statements=['far']): Speaker(id=2, name='george', statements=['far'])}
[Speaker(id=1, name='john', statements=['foo', 'bar']),
 Speaker(id=2, name='jane', statements=['near']),
 Speaker(id=2, name='george', statements=['far'])]
Edit: as the dataclass itself is hashable, I think it makes sense here to store the Speaker object itself (uniquely identified by id, name, and any other fields used in __hash__) as both the key and the value. This is a little roundabout, so if you prefer you could instead use a tuple of the values you want to hash, i.e. (id, name), as the key, which should also work. In either case this should still be efficient, since dict.setdefault is, to the best of my knowledge, O(1) on average.
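The tuple-key variant mentioned above could look roughly like the sketch below. It is only an illustration under a few assumptions: the helper name group_statements is hypothetical, the loop variable is renamed to id_ to avoid shadowing the built-in id, and the custom __eq__ and __hash__ are left out because the Speaker object is no longer used as a key.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Speaker:
    id: int
    name: str
    statements: List[str] = field(default_factory=list)


def group_statements(rows):
    """Hypothetical helper: build one Speaker per (id, name) pair."""
    speakers: Dict[Tuple[int, str], Speaker] = {}
    for id_, name, statement in rows:
        # setdefault returns the Speaker already stored under this key,
        # or stores the new Speaker and returns it.
        speaker = speakers.setdefault((id_, name), Speaker(id_, name))
        speaker.statements.append(statement)
    # dicts preserve insertion order, so speakers come out in first-seen order
    return list(speakers.values())


test = [(1, 'john', 'foo'), (1, 'john', 'bar'), (2, 'jane', 'near'), (2, 'george', 'far')]
result = group_statements(test)
print(result)
```

One trade-off of this version is that a new Speaker is constructed on every iteration even when the key already exists. If more identifying fields are added later (title, etc.), the key tuple has to be extended to match.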
Answered By - rv.kvetch