Issue
I have a list of dictionaries in this form (example): [{name: aa, year: 2022}, {name: aa, year: 2021}, {name: bb, year: 2016}, {name: cc, year: 2015}]. I need to merge the items that share the same name into a single item whose year is a list of all the years for that name (every year can be in a list; for my purposes this doesn't matter). So the example list of dictionaries would look like this: [{name: aa, year: [2022, 2021]}, {name: bb, year: [2016]}, {name: cc, year: [2015]}]. My current code looks like this:
def read_csv_file(self, path):
    book_list = []
    with open(path) as f:
        read_dict = csv.DictReader(f)
        for i in read_dict:
            book_list.append(i)
    bestsellers = []
    for i in list_of_books:
        seen_books = []
        years_list = []
        if i["Name"] not in seen_books:
            years_list.append(i["Year"])
            seen_books.append(i)
        else:
            years_list.append(i["Year"])
        if i['Genre'] == 'Non Fiction':
            bestsellers.append(FictionBook(i["Name"], i["Author"], float(i["User Rating"]), int(i["Reviews"]), float(i["Price"]), years_list, i["Genre"]))
        else:
            bestsellers.append(NonFictionBook(i["Name"], i["Author"], float(i["User Rating"]), int(i["Reviews"]), float(i["Price"]), years_list, i["Genre"]))
    for i in bestseller:
        print(i.title)
Ultimately my code needs to extract data from a CSV file and then create instances of the class FictionBook or NonFictionBook depending on the genre. I think I have the CSV reading and the book creation finished; I just need to filter the near-duplicate dictionaries and merge their years into lists, if that makes sense. If anything is unclear, please let me know so I can explain further.
Solution
Use dict.setdefault() to create a list if the key has not yet been seen:
lod = [{'name': 'aa', 'year': 2022}, {'name': 'aa', 'year': 2021}, {'name': 'bb', 'year': 2016}, {'name': 'cc', 'year': 2015}]
result = {}
for d in lod:
    result.setdefault(d['name'], []).append(d['year'])
>>> result
{'aa': [2022, 2021], 'bb': [2016], 'cc': [2015]}
Then rebuild the list of dictionaries from that result:
>>> [{'name': n, 'year': v} for n,v in result.items()]
[{'name': 'aa', 'year': [2022, 2021]}, {'name': 'bb', 'year': [2016]}, {'name': 'cc', 'year': [2015]}]
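Applied to the CSV rows in the question, the same setdefault() pattern could look roughly like the sketch below. The sample rows and their values are made up for illustration; only the column names "Name", "Author", "Genre" and "Year" come from the question. It assumes that rows sharing a name agree on every other field, so the first row seen supplies those fields.

```python
# Rows as csv.DictReader would yield them (all values are strings).
# These sample values are invented for illustration.
rows = [
    {"Name": "aa", "Author": "X", "Genre": "Fiction", "Year": "2022"},
    {"Name": "aa", "Author": "X", "Genre": "Fiction", "Year": "2021"},
    {"Name": "bb", "Author": "Y", "Genre": "Non Fiction", "Year": "2016"},
]

merged = {}
for row in rows:
    # First time a name is seen: store a copy of the row with an empty year list.
    # Later rows with the same name get the stored copy back unchanged.
    book = merged.setdefault(row["Name"], {**row, "Year": []})
    book["Year"].append(int(row["Year"]))

books = list(merged.values())
print(books)
```

Each element of books then carries one list of years, so a single FictionBook or NonFictionBook can be built per title instead of one per row.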
From comments:
Great answer, thanks. How would I go about implementing this in my system if I have more than two key/value pairs per dictionary? For example {name: aa, singer: bb, album: gg, year: 2022}
I would approach what you are describing differently. It appears you are creating a database of books, albums and authors. Use a class to describe each piece of data that you want to catalog.
Consider this simple entry for a piece of art, book, etc:
class Entry:
    def __init__(self, n, name=None, author=None, singer=None, title=None, year=None):
        self.num = n
        self.title = title
        self.singer = singer
        self.name = name
        self.year = year
        self.author = author
        # etc

    def __repr__(self):  # allows each item to be printed
        return f'({self.num}, {self.year}, {self.author})'
Now create some dummy entries:
import random

entries = [Entry(i,
                 author=random.choice(['Bob', 'Carol', 'Ted', 'Alice', 'Lisa']),
                 year=random.randint(1700, 2022))
           for i in range(3_000_000)]
Creating 3,000,000 entries (a bit more than 1% of the Library of Congress book catalog) takes about 5 seconds.
You could query it like so:
# book for 1799 with an author with 'a' in the name?
[e for e in entries if e.year==1799 and 'a' in e.author.lower() ]
That query took about 1.4 secs on my computer.
It would be far faster with a better data structure than a flat list of objects (whether those objects are dicts or instances of the class shown here).
A candidate would be a form of a tree but it all depends on what you are looking to query from this data. The Dewey Decimal System is a particular form of a tree.
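As one simple illustration of a "better data structure" (a plain dict index, not the tree mentioned above), the entries could be grouped by year once so that a query for a single year no longer scans the whole list. This is only a sketch: the namedtuple below is a simplified stand-in for the Entry class, and the name by_year and the smaller entry count are assumptions made for the example.

```python
import random
from collections import defaultdict, namedtuple

# Simplified stand-in for the Entry class (hypothetical, fewer fields).
Entry = namedtuple("Entry", "num author year")

entries = [
    Entry(i,
          random.choice(["Bob", "Carol", "Ted", "Alice", "Lisa"]),
          random.randint(1700, 2022))
    for i in range(100_000)
]

# Build the index once: year -> list of entries for that year.
by_year = defaultdict(list)
for e in entries:
    by_year[e.year].append(e)

# The query now only looks at the entries for 1799.
hits = [e for e in by_year.get(1799, []) if "a" in e.author.lower()]
print(len(hits))
```

The index costs one pass over the data and some extra memory, and it only helps queries that filter on year; a query on a different field would need its own index or a different structure.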
Answered By - dawg