Issue
I have a list of JSON data (each item is an HTML snippet) and want to extract all the ids from the "user-id" attribute. How can I do this using regex or BeautifulSoup?
my_lst_JSON = ['<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="115389" user-type="8" data-viewing-self="false" <!--section --></div></div></div>',
'<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="1109" user-type="9" data-viewing-self="false" <!--section --></div></div></div>',
'<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="2890" user-type="7" data-viewing-self="false" <!--section --></div></div></div>',
'<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="22567" user-type="8" data-viewing-self="false" <!--section --></div></div></div>',
'<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="33872" user-type="1" data-viewing-self="false" <!--section --></div></div></div>']
Expected outputs:
['115389', '1109', '2890', '22567', '33872']
I tried the approach suggested by @zx485 with the code below:
ids_lst = []
for data in my_lst_JSON:
    soup = BeautifulSoup(data, 'xml')
    ids_lst.append([item.text for item in soup.findAll('div/@user-id')])
It returns a blank list...
Any thoughts are appreciated!
Thank you.
Solution
This should work:
from bs4 import BeautifulSoup as bs
my_lst_JSON = ['<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="115389" user-type="8" data-viewing-self="false" <!--section --></div></div></div>',
'<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="1109" user-type="9" data-viewing-self="false" <!--section --></div></div></div>',
'<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="2890" user-type="7" data-viewing-self="false" <!--section --></div></div></div>',
'<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="22567" user-type="8" data-viewing-self="false" <!--section --></div></div></div>',
'<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="33872" user-type="1" data-viewing-self="false" <!--section --></div></div></div>']
# Name the parser explicitly so bs4 does not have to guess one (and warn about it)
user_ids = [bs(item, "html.parser").find("div")["user-id"] for item in my_lst_JSON]
print(user_ids)
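But I like @Boomshakalaka's solution better; it's more elegant. Note that it returns the first run of digits in each string, which is the user-id here only because no earlier attribute contains a digit: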
import re
my_lst_JSON = ['<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="115389" user-type="8" data-viewing-self="false" <!--section --></div></div></div>',
'<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="1109" user-type="9" data-viewing-self="false" <!--section --></div></div></div>',
'<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="2890" user-type="7" data-viewing-self="false" <!--section --></div></div></div>',
'<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="22567" user-type="8" data-viewing-self="false" <!--section --></div></div></div>',
'<div id="a-exse-tile" class="j-exse profile-tile" data-exp-enabled="true" user-id="33872" user-type="1" data-viewing-self="false" <!--section --></div></div></div>']
# Raw string avoids the invalid-escape warning for \d; "|$" yields '' when a string has no digits
user_ids = [re.findall(r'\d+|$', item)[0] for item in my_lst_JSON]
print(user_ids)
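If an earlier attribute might contain digits (for example id="tile-42"), the first-number pattern above would return the wrong value. A minimal sketch of a stricter regex that anchors on the user-id attribute itself follows. The names html_snippets, USER_ID_RE and extract_user_id, and the sample strings, are made up for illustration and are not part of the original answers:

```python
import re

# Hypothetical sample data, similar in shape to my_lst_JSON above.
html_snippets = [
    '<div id="a-exse-tile" class="j-exse profile-tile" user-id="115389" user-type="8"></div>',
    '<div id="tile-42" class="j-exse profile-tile" user-id="1109" user-type="9"></div>',
    '<div id="a-exse-tile" class="j-exse profile-tile" user-type="7"></div>',  # no user-id
]

# Capture the digits inside user-id="...". The lookbehind (?<![\w-]) rejects
# longer attribute names that merely end in "user-id", such as data-target-user-id.
USER_ID_RE = re.compile(r'(?<![\w-])user-id="(\d+)"')

def extract_user_id(snippet):
    """Return the user-id value as a string, or None if the attribute is absent."""
    match = USER_ID_RE.search(snippet)
    return match.group(1) if match else None

user_ids = [extract_user_id(s) for s in html_snippets]
print(user_ids)  # ['115389', '1109', None]
```

Returning None for a snippet without the attribute is just one possible choice; filtering those entries out would work equally well if only the ids are needed.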
Answered By - justgoodin