Issue
I'm trying to parse data from a website using BeautifulSoup in Python. I have managed to pull the data and now want to save it to a JSON file, but with the code I wrote it is saved as follows:
json file
[
    {
        "collocation": "\nabove average",
        "meaning": "more than average, esp. in amount, age, height, weight etc. "
    },
    {
        "collocation": "\nabsolutely necessary",
        "meaning": "totally or completely necessary"
    },
    {
        "collocation": "\nabuse drugs",
        "meaning": "to use drugs in a way that's harmful to yourself or others"
    },
    {
        "collocation": "\nabuse of power",
        "meaning": "the harmful or unethical use of power"
    },
    {
        "collocation": "\naccept (a) defeat",
        "meaning": "to accept the fact that you didn't win a game, match, contest, election, etc."
    },
my code:
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
import pandas as pd
import json
url = "https://www.englishclub.com/ref/Collocations/"
mylist = [
"A",
"B",
"C",
"D",
"E",
"F",
"G",
"H",
"I",
"J",
"K",
"L",
"M",
"N",
"O",
"P",
"Q",
"R",
"S",
"T",
"U",
"V",
"W"
]
list = []
for i in range(23):
    result = requests.get(url+mylist[i]+"/", headers=headers)
    doc = BeautifulSoup(result.text, "html.parser")
    collocations = doc.find_all(class_="linklisting")
    for tag in collocations:
        case = {
            "collocation": tag.a.string,
            "meaning": tag.div.string
        }
        list.append(case)
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(list, f, ensure_ascii=False, indent=4)
However, I want a separate list for each letter, for example one list for A and another for B, so that I can easily find the collocations that start with a given letter and use them. How can I do that? Also, as you can see in the JSON file, there is always a \n at the beginning of each collocation. How can I remove it?
Solution
import requests
from bs4 import BeautifulSoup
import pandas as pd
import json
url = "https://www.englishclub.com/ref/Collocations/"
mylist = [
"A",
"B",
"C",
"D",
"E",
"F",
"G",
"H",
"I",
"J",
"K",
"L",
"M",
"N",
"O",
"P",
"Q",
"R",
"S",
"T",
"U",
"V",
"W"
]
# A dictionary keyed by letter suits your needs better than a flat list.
# It is named "data" here so that it does not shadow the built-in list type.
data = {}
# Only the first 4 letters are fetched, for quick testing; loop over mylist to fetch them all.
for letter in mylist[:4]:
    data[letter] = []  # start an empty list for this letter's collocations
    result = requests.get(url + letter + "/")
    doc = BeautifulSoup(result.text, "html.parser")
    collocations = doc.find_all(class_="linklisting")
    for tag in collocations:
        case = {
            "collocation": tag.a.string.replace("\n", ""),  # remove the leading \n
            "meaning": tag.div.string
        }
        data[letter].append(case)  # add the collocation to its letter's list
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)
I have commented the parts I changed. The dictionary now holds one list for every letter, so later on you can fetch a letter's collocations by key without worrying about indexes.
This is the output:
{
    "A": [
        {
            "collocation": "above average",
            "meaning": "more than average, esp. in amount, age, height, weight etc. "
        },
        {
            "collocation": "absolutely necessary",
            "meaning": "totally or completely necessary"
        }
    ],
    "B": [
        {
            "collocation": "back pay",
            "meaning": "money a worker earned in the past but hasn't been paid yet "
        },
        {
            "collocation": "back road",
            "meaning": "a small country road "
        },
        {
            "collocation": "back street",
            "meaning": "a street in a town or city that's away from major roads or central areas"
        }
    ],
    "C": [
        {
            "collocation": "call a meeting",
            "meaning": "to order or invite people to hold a meeting"
        },
        {
            "collocation": "call a name",
            "meaning": "to say somebody's name loudly"
        },
        {
            "collocation": "call a strike",
            "meaning": "to decide that workers will protest by not going to work "
        }
    ],
    "D": [
        {
            "collocation": "daily life",
            "meaning": "life as experienced from day to day"
        },
        {
            "collocation": "dead ahead",
            "meaning": "straight ahead"
        },
        {
            "collocation": "dead body",
            "meaning": "corpse, or the body of someone who's died"
        }
    ]
}
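If you already have a data.json in the old flat format and would rather not scrape the site again, the same grouping can be done on the saved records. This is only a rough sketch and not part of the code above: group_by_letter is a made-up helper name, the sample records are copied from the question, and it assumes you want to key each entry by the uppercase first letter of the cleaned collocation. It also uses strip() instead of replace("\n", ""), which removes any surrounding whitespace, and applies it to the meaning as well (an extra step you may not want).

```python
import json
from collections import defaultdict

# Sample records in the flat shape shown in the question (leading "\n" included).
flat = [
    {"collocation": "\nabove average",
     "meaning": "more than average, esp. in amount, age, height, weight etc. "},
    {"collocation": "\nabuse of power",
     "meaning": "the harmful or unethical use of power"},
    {"collocation": "\nback pay",
     "meaning": "money a worker earned in the past but hasn't been paid yet "},
]


def group_by_letter(records):
    """Hypothetical helper: group records under the uppercase first letter."""
    grouped = defaultdict(list)
    for record in records:
        # strip() drops the leading "\n" and any other surrounding whitespace.
        text = record["collocation"].strip()
        if not text:
            continue  # skip empty entries, which have no first letter
        grouped[text[0].upper()].append(
            {"collocation": text, "meaning": record["meaning"].strip()}
        )
    return dict(grouped)


grouped = group_by_letter(flat)

# Round-trip through JSON text to mimic writing data.json and reading it back.
restored = json.loads(json.dumps(grouped, ensure_ascii=False, indent=4))

print(restored["A"][0]["collocation"])
print([item["collocation"] for item in restored["B"]])
```

After loading the file back with json.load, a lookup such as restored["A"] gives you every collocation for that letter directly.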
Answered By - Mustafa KÜÇÜKDEMİRCİ