Issue
I am working on a small Python function to scrape data from ClinicalTrials.gov. From each study record, I want to scrape the conditions that the study is targeting. For example, for this study record I want the following:
conditions = ['Rhinoconjunctivitis', 'Rhinitis', 'Conjunctivitis', 'Allergy']
However, the number of conditions varies from one study record to the next. I have written the following script, which gets the data:
import requests
from bs4 import BeautifulSoup

page = requests.get('https://clinicaltrials.gov/ct2/show/study/NCT00550550')
soup = BeautifulSoup(page.text, 'html.parser')
studyDesign = soup.find_all(headers='studyInfoColData')  # not used below
# Every <span> in the first element with class "data_table"
condition = soup.find(attrs={'class':'data_table'}).find_all('span')
for each in condition:
    print(each.text.encode('utf-8').strip())
This prints every span in the table, not just the conditions, like so:
b'Condition or disease'
b'Intervention/treatment'
b'Phase'
b'Rhinoconjunctivitis'
b'Rhinitis'
b'Conjunctivitis'
b'Allergy'
b'Drug: Placebo'
b'Biological: SCH 697243'
b'Drug: Loratadine Syrup 1 mg/mL Rescue Treatment'
b'Drug: Loratadine 10 mg Rescue Treatment'
b'Drug: Olopatadine 0.1% Rescue Treatment'
b'Drug: Mometasone furoate 50 mcg Rescue Treatment'
b'Drug: Albuterol 108 mcg Rescue Treatment'
b'Drug: Fluticasone 44 mcg Rescue Treatment'
b'Drug: Prednisone 5 mg Rescue Treatment'
b'Phase 3'
How can I get only the conditions, without the intervention/treatment info?
Solution
You can just use the first table with class data_table and extract the span elements from its first td:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://clinicaltrials.gov/ct2/show/study/NCT00550550')
soup = BeautifulSoup(page.text, 'html.parser')
studyDesign = soup.find("table", {"class" : "data_table"}).find('td')
conditions = [ t.text.strip() for t in studyDesign.find_all('span') ]
print(conditions)
which gives:
[u'Rhinoconjunctivitis', u'Rhinitis', u'Conjunctivitis', u'Allergy']
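Since the question asks for a small function, the same lookup can be wrapped up as one. This is only a sketch: the function name extract_conditions and the SAMPLE_HTML snippet are made up for illustration, and the snippet merely assumes the layout implied by the output above (a header row, then a row whose first td holds the condition spans). The live page may be structured differently, so the function returns an empty list when the table or cell is missing instead of raising.

```python
from bs4 import BeautifulSoup

# Hypothetical, cut-down markup that mimics the layout described above;
# it is not copied from the real page.
SAMPLE_HTML = """
<table class="data_table">
  <tr>
    <th><span>Condition or disease</span></th>
    <th><span>Intervention/treatment</span></th>
    <th><span>Phase</span></th>
  </tr>
  <tr>
    <td><span>Rhinoconjunctivitis</span> <span>Rhinitis</span></td>
    <td><span>Drug: Placebo</span></td>
    <td><span>Phase 3</span></td>
  </tr>
</table>
"""


def extract_conditions(html):
    """Return the span texts from the first <td> of the first data_table."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', class_='data_table')
    if table is None:
        return []  # no such table in this document
    first_cell = table.find('td')
    if first_cell is None:
        return []  # table has no data cells
    return [span.get_text(strip=True) for span in first_cell.find_all('span')]


print(extract_conditions(SAMPLE_HTML))  # ['Rhinoconjunctivitis', 'Rhinitis']
```

For a live page, pass page.text from requests.get(...) to the function in place of SAMPLE_HTML.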
Answered By - Bertrand Martel