Issue
I would like to extract focal mechanism information from the GCMT catalog (https://www.globalcmt.org/). Eventually I plan to automate this in Python, so that I can pull earthquake information directly into Python for plotting/analysis instead of going through the GCMT webpage.
Here's the code I have so far with an example URL:
import requests
from bs4 import BeautifulSoup
import pandas as pd
URL = "https://www.globalcmt.org/cgi-bin/globalcmt-cgi-bin/CMT5/form?itype=ymd&yr=1976&mo=1&day=1&oyr=1976&omo=1&oday=1&jyr=1976&jday=1&ojyr=1976&ojday=1&otype=nd&nday=365&lmw=0&umw=10&lms=0&ums=10&lmb=0&umb=10&llat=-90&ulat=90&llon=-180&ulon=180&lhd=0&uhd=1000&lts=-9999&uts=9999&lpe1=0&upe1=90&lpe2=0&upe2=90&list=6"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html5lib")
text = soup.body.get_text(separator= '\n', strip=True)
print(text)
Global CMT Catalog
Search criteria:
Start date: 1976/1/1 End date: 1976/12/30
-90 <=lat<= 90 -180 <=lon<= 180
0 <=depth<= 1000 -9999 <=time shift<= 9999
0 <=mb<= 10 0<=Ms<= 10 0<=Mw<= 10
0 <=tension plunge<= 90 0 <=null plunge<= 90
Results
Output in
GMT
psmeca (GMT v>3.3) format
Columns: lon lat depth mrr mtt mpp mrt mrp mtp iexp name
-176.96 -29.25 48 7.68 0.09 -7.77 1.39 4.52 -3.26 26 X Y 010176A
-75.14 -13.42 85 -1.78 -0.59 2.37 -1.28 1.97 -2.90 24 X Y 010576A
159.50 51.45 15 1.10 -0.30 -0.80 1.05 1.24 -0.56 25 X Y 010676A
...
I'm still new to Python/web scraping, but I would like to extract the data that follows the column header (Columns: lon lat depth mrr mtt mpp mrt mrp mtp iexp name), excluding the footer (End of events found with given criteria.) and everything after it.
The output would contain column information: lon lat depth mrr mtt mpp mrt mrp mtp iexp name
Then the data (e.g.): -176.96 -29.25 48 7.68 0.09 -7.77 1.39 4.52 -3.26 26 X Y 010176A
Solution
You could create a list of dicts from header and values:
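data = []
# The column names sit in the text node just before the second <pre>; drop the leading "Columns:"
header = soup.select_one('pre:nth-of-type(2)').find_previous(text=True).split()[1:]
# Each row has two extra tokens ("X Y") before the name, so add two column names
header[10:10] = ['x','y']
for l in soup.select_one('pre:nth-of-type(2)').text.splitlines():
    d = l.split()
    # Alternatives (explained below): merge "X Y name" into one value, or drop "X Y"
    #d[10:13] = [' '.join([str(x) for x in d[10:13]])]
    #del d[10:12]
    data.append(dict(zip(header,d)))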
The tricky part, in my opinion, is handling the last elements of each row so that they do not mismatch the headers.
Assuming "X Y ..." belong together:
d[10:13] = [' '.join([str(x) for x in d[10:13]])]
or if they are not needed simply delete them:
del d[10:12]
or adjust the headers instead:
header[10:10] = ['x','y']
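To make the mismatch and the three options concrete, here is a minimal sketch that runs on one sample line copied from the catalog output above, without any network access. The variable names (naive, merged, dropped, extended) are only illustrative.

```python
# Header and one data line copied from the catalog output shown above.
header = "lon lat depth mrr mtt mpp mrt mrp mtp iexp name".split()
line = "-176.96 -29.25 48 7.68 0.09 -7.77 1.39 4.52 -3.26 26 X Y 010176A"

# Unhandled case: zip() stops at the shorter sequence (11 names vs. 13 tokens),
# so "name" silently receives "X" and the real event name is lost.
naive = dict(zip(header, line.split()))

# Option 1: merge "X Y 010176A" into a single name value.
d = line.split()
d[10:13] = [" ".join(d[10:13])]
merged = dict(zip(header, d))

# Option 2: drop the "X" and "Y" tokens.
d = line.split()
del d[10:12]
dropped = dict(zip(header, d))

# Option 3: keep every token and insert two extra column names before "name".
extended_header = header.copy()
extended_header[10:10] = ["x", "y"]
extended = dict(zip(extended_header, line.split()))

print(naive["name"])     # X
print(merged["name"])    # X Y 010176A
print(dropped["name"])   # 010176A
print(extended["name"])  # 010176A
```

Which option fits best depends on whether the X and Y tokens are needed later, for example when writing the rows back out in the same format.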
Example
import requests
from bs4 import BeautifulSoup
import pandas as pd
URL = "https://www.globalcmt.org/cgi-bin/globalcmt-cgi-bin/CMT5/form?itype=ymd&yr=1976&mo=1&day=1&oyr=1976&omo=1&oday=1&jyr=1976&jday=1&ojyr=1976&ojday=1&otype=nd&nday=365&lmw=0&umw=10&lms=0&ums=10&lmb=0&umb=10&llat=-90&ulat=90&llon=-180&ulon=180&lhd=0&uhd=1000&lts=-9999&uts=9999&lpe1=0&upe1=90&lpe2=0&upe2=90&list=6"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html5lib")
data = []
header = soup.select_one('pre:nth-of-type(2)').find_previous(text=True).split()[1:]
header[10:10] = ['x','y']
for l in soup.select_one('pre:nth-of-type(2)').text.splitlines():
    d = l.split()
    #d[10:13] = [' '.join([str(x) for x in d[10:13]])]
    #del d[10:12]
    data.append(dict(zip(header,d)))
pd.DataFrame(data)
Output
 | lon | lat | depth | mrr | mtt | mpp | mrt | mrp | mtp | iexp | x | y | name |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | -176.96 | -29.25 | 48 | 7.68 | 0.09 | -7.77 | 1.39 | 4.52 | -3.26 | 26 | X | Y | 010176A |
1 | -75.14 | -13.42 | 85 | -1.78 | -0.59 | 2.37 | -1.28 | 1.97 | -2.9 | 24 | X | Y | 010576A |
2 | 159.5 | 51.45 | 15 | 1.1 | -0.3 | -0.8 | 1.05 | 1.24 | -0.56 | 25 | X | Y | 010676A |
3 | 167.81 | -15.97 | 174 | -1.7 | 2.29 | -0.59 | -2.33 | -1.23 | 2.01 | 25 | X | Y | 010976A |
4 | -16.29 | 66.33 | 15 | -0.51 | -2.86 | 3.37 | 0.05 | -0.78 | -0.86 | 25 | X | Y | 011376A |
5 | -177.04 | -29.69 | 47 | 4.78 | -0.49 | -4.3 | 0.83 | 3.62 | -1.32 | 27 | X | Y | 011476A |
6 | -176.75 | -28.72 | 18 | 2.56 | 0.18 | -2.74 | 3.58 | 6.77 | -1.23 | 27 | X | Y | 011476B |
7 | -176.62 | -28.61 | 15 | 2.34 | 0.24 | -2.58 | 0.62 | 3.71 | -0.68 | 25 | X | Y | 011476C |
8 | -176.63 | -30.25 | 15 | 1.44 | 0.06 | -1.5 | 0.3 | 1.18 | -0.46 | 25 | X | Y | 011576A |
...
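All values in the rows above are still strings. For plotting or analysis they probably need to be numeric, and blank lines or the footer should be skipped. The following is a rough sketch of one way to do that on the raw lines, using two sample rows from the output above. The helper names parse_line and scaled_tensor are hypothetical, and the scaling assumes that iexp is a base-10 exponent applied to the six moment tensor components, as in the GMT psmeca convention; check that against the GCMT/GMT documentation before relying on it.

```python
HEADER = ["lon", "lat", "depth", "mrr", "mtt", "mpp", "mrt", "mrp", "mtp",
          "iexp", "x", "y", "name"]
TENSOR = ("mrr", "mtt", "mpp", "mrt", "mrp", "mtp")
NUMERIC = ("lon", "lat", "depth") + TENSOR


def parse_line(line):
    """Return a dict with numeric values, or None if the line is not a data row."""
    tokens = line.split()
    if len(tokens) != len(HEADER):
        return None  # blank line, footer, or anything else unexpected
    row = dict(zip(HEADER, tokens))
    try:
        for key in NUMERIC:
            row[key] = float(row[key])
        row["iexp"] = int(row["iexp"])
    except ValueError:
        return None  # right token count, but not numeric where expected
    return row


def scaled_tensor(row):
    """Apply the exponent to the tensor components (assumed meaning of iexp)."""
    factor = 10.0 ** row["iexp"]
    return {key: row[key] * factor for key in TENSOR}


# Two sample rows from the output above, plus a blank line and the footer.
sample = """-176.96 -29.25 48 7.68 0.09 -7.77 1.39 4.52 -3.26 26 X Y 010176A
-75.14 -13.42 85 -1.78 -0.59 2.37 -1.28 1.97 -2.90 24 X Y 010576A

End of events found with given criteria.
"""

records = [row for row in map(parse_line, sample.splitlines()) if row is not None]
print(len(records))
print(records[0]["name"], scaled_tensor(records[0])["mrr"])
```

With the scraped page, the same filter could be applied to soup.select_one('pre:nth-of-type(2)').text.splitlines(), and the resulting list passed to pd.DataFrame as before.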
Answered By - HedgeHog