Issue
import re
import requests
from bs4 import BeautifulSoup

webpage = 'https://xgis.maaamet.ee/xgis2/page/app/ristipuud'

response = requests.get(webpage)
soup = BeautifulSoup(response.text, 'html.parser')
img_tags = soup.find_all('img')
urls = [img['src'] for img in img_tags]
for url in urls:
    filename = re.search(r'/([\w_-]+[.](jpg|gif|tif|png))$', url)
    if not filename:
        print("didn't match with the url: {}".format(url))
        continue
    with open(filename.group(1), 'wb') as f:
        if 'http' not in url:
            url = '{}{}'.format(webpage, url)
        response = requests.get(url)
        f.write(response.content)
# code for Lithuania
import time
import requests
from bs4 import BeautifulSoup
import os

def download_url(url, save_path, chunk_size=128):
    r = requests.get(url, stream=True)
    with open(save_path, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=chunk_size):
            fd.write(chunk)

def get_file_name(url):
    tokens = url.split("/")
    file_name = tokens[-1].split("?")[0]
    return file_name

# Start timer
start_time = time.time()
print("Start time: ", start_time)

# Create image directory
image_directory = 'images'
isExist = os.path.exists(image_directory)
if not isExist:
    os.makedirs(image_directory)

template = "https://www.geoportal.lt/map/webapp/rest/mapgateway/6100e156c755e15f6e46a8820824d8c595d30ae51?f=json"
response = requests.get(template)
if response.status_code == 200:
    soup = BeautifulSoup(response.content, "html.parser")
    link = soup.find("a")
    if link is not None:
        url = 'https://www.geoportal.lt/' + link['href']
        file_name = get_file_name(url)
        print(file_name)
        # Save zip file
        download_url(url, './' + image_directory + '/' + file_name)

# End timer
end_time = time.time()
# Calculate elapsed time
elapsed_time = end_time - start_time
print("Elapsed time: ", elapsed_time)
Link: https://www.geoportal.lt/map/index.jsp?lang=en
I want to download satellite images from this website (link: https://xgis.maaamet.ee/xgis2/page/app/ristipuud). There are about 6000 satellite images in TIF format, and I want to get 500 of them for my research. I have to repeat the same process frequently, so I want to do it by scraping, but I am having a problem: when I run this code, it doesn't show any error, but it also doesn't download any data. The images on the website are divided into tiles, and each tile can be downloaded separately by searching for its tile number on this page: https://geoportaal.maaamet.ee/eng/Maps-and-Data/Orthophotos/Download-Orthophotos-p662.html . RGB orthophotos come in a zip file in .tif format. There are multiple versions of each image depending on the year, and I want to get the latest one. Could you please help me identify the mistakes in my code or share your experience? I am a novice in coding and trying to learn more.
Solution
This code can download the zipped map files.
import time
import requests
from bs4 import BeautifulSoup
import os

def download_url(url, save_path, chunk_size=128):
    # Stream the response to disk in small chunks
    r = requests.get(url, stream=True)
    with open(save_path, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=chunk_size):
            fd.write(chunk)

def get_file_name(url):
    # The zip file name is carried in the 'f=' query parameter
    tokens = url.split("&")
    for token in tokens:
        if token[:2] == 'f=':
            return token[2:]
    return ''

# Start timer
start_time = time.time()
print("Start time: ", start_time)

# Create image directory
image_directory = 'images'
isExist = os.path.exists(image_directory)
if not isExist:
    os.makedirs(image_directory)

# Get zip URL and file name
start_sheet = 44744
end_sheet = 44844  # change to 74331 for the full range; only a range of 100 is tested here
total_download = 0
for index in range(start_sheet, end_sheet):
    template = "https://geoportaal.maaamet.ee/index.php?lang_id=2&plugin_act=otsing&page_id=662&&kaardiruut={sheet_number:n}&andmetyyp=ortofoto_eesti_rgb"
    webpage = template.format(sheet_number=index)
    response = requests.get(webpage)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "html.parser")
        link = soup.find("a")
        if link is not None:
            url = 'https://geoportaal.maaamet.ee/' + link['href']
            file_name = get_file_name(url)
            print(file_name)
            # Save zip file
            download_url(url, './' + image_directory + '/' + file_name)
            total_download = total_download + 1

# End timer
end_time = time.time()
# Calculate elapsed time
elapsed_time = end_time - start_time
print("Elapsed time: ", elapsed_time)
print("Total Download zip files: ", total_download)
Result after the run finishes
After unzipping a file, you can see the GeoTIFF file.
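The get_file_name helper in the solution splits the link on '&' by hand. A minimal alternative sketch with the standard library's urllib.parse is shown here. Only the 'f' parameter name comes from the code above; the example link is hypothetical and made up for illustration, so the real links on the page may look different.

```python
from urllib.parse import urlparse, parse_qs

def get_file_name(url):
    """Return the value of the 'f' query parameter, or '' if it is missing."""
    query = parse_qs(urlparse(url).query)
    values = query.get('f', [])
    return values[0] if values else ''

# Hypothetical link, invented for this example only
example = ('https://geoportaal.maaamet.ee/index.php'
           '?lang_id=2&plugin_act=otsing&f=44744_example.zip&page_id=662')
print(get_file_name(example))
```

Parsing the query string this way also works when 'f' happens to be the first parameter after the '?', a case the manual split would miss.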
Main Idea
As you pointed out, this page
https://geoportaal.maaamet.ee/eng/Maps-and-Data/Orthophotos/Download-Orthophotos-p662.html
indicates the sheet number range:
Map sheet numbers of 1:10000 scale are between 44744 to 74331.
In Chrome (or Firefox), press the F12 key to open Dev Tools.
The 'Network' tab shows each HTTPS call, and its 'Headers' tab shows the request URL.
With this screen open, search for a sheet number (44744) by pressing the search button, and you can see the request URL.
This is the template URL.
https://geoportaal.maaamet.ee/index.php?lang_id=2&plugin_act=otsing&page_id=662&&kaardiruut=44744&andmetyyp=ortofoto_eesti_rgb&_=1686945341505
The kaardiruut parameter is the key for switching the sheet number.
kaardiruut=44744
To download other areas, the program increments this number.
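As a small sketch of that idea, the search URL can be built from its query parameters instead of a hand-written string. The parameter names and values are taken from the template URL; the helper name build_search_url is my own, and I am assuming the server does not need the empty '&&' or the trailing '_=' cache-busting value.

```python
from urllib.parse import urlencode

BASE_URL = 'https://geoportaal.maaamet.ee/index.php'

def build_search_url(sheet_number):
    """Build the search URL for one map sheet (kaardiruut)."""
    params = {
        'lang_id': 2,
        'plugin_act': 'otsing',
        'page_id': 662,
        'kaardiruut': sheet_number,   # the only value that changes per sheet
        'andmetyyp': 'ortofoto_eesti_rgb',
    }
    return BASE_URL + '?' + urlencode(params)

# Print the URLs for the first three sheets of the range
for sheet in range(44744, 44747):
    print(build_search_url(sheet))
```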
Update for Lithuania with Orthophoto 2021-2023
The Lithuania map does not support zip downloads; it supports direct download of map images.
This map server is a good example of a tile map. For background on tile coordinates, see
https://www.maptiler.com/google-maps-coordinates-tile-bounds-projection/#10/24.70/56.21
https://www.geoportal.lt/map
URL format
"https://www.geoportal.lt/map/webapp/rest/mapgateway/{year_id:s}/tile/{scale:n}/{y:n}/{x:n}"
year_id examples
Ortophoto 2021-2023 is '6100e156c755e15f6e46a8820824d8c595d30ae50'
Ortophoto 2018-2020 is '8ddf422a20f8a22fd7c116ef7d6a46eec4126d521'
scale = 8 # (1 : 10 000)
x (longitude) range for 2021-2023
start_x = min number 8263
end_x = max number 8510
y (latitude) range for 2021-2023
start_y = min number 5524
end_y = max number 5839
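To get a feeling for how many requests the full range means, here is a minimal sketch that only generates the tile URLs and counts them, without downloading anything. It assumes both ranges are inclusive and that every tile in that rectangle exists, which I have not verified; the function name tile_urls is just for this example.

```python
TEMPLATE = ('https://www.geoportal.lt/map/webapp/rest/mapgateway/'
            '{year_id}/tile/{scale}/{y}/{x}')

def tile_urls(year_id, scale, x_range, y_range):
    """Yield one URL per tile; both ranges are inclusive (min, max) pairs."""
    for x in range(x_range[0], x_range[1] + 1):
        for y in range(y_range[0], y_range[1] + 1):
            yield TEMPLATE.format(year_id=year_id, scale=scale, y=y, x=x)

YEAR_ID = '6100e156c755e15f6e46a8820824d8c595d30ae50'  # Ortophoto 2021-2023
urls = list(tile_urls(YEAR_ID, 8, (8263, 8510), (5524, 5839)))
print(len(urls))   # number of tiles in the full rectangle
print(urls[0])     # first URL, note that y comes before x in the path
```

With these numbers the rectangle is 248 by 316 tiles, so 78368 URLs in total. It is worth testing with a small range first.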
Demo code
import time
import os
import imghdr  # note: deprecated since Python 3.11 and removed in Python 3.13
import requests

def download_url(image_url, save_path):
    # copy from Chrome's Network Tab/Headers/Request Headers/Cookie
    cookies = {'JSESSIONID_MWEB': '26F90E44851C5CC9CD41E7A1AE056C54;'}
    response = requests.get(url=image_url, cookies=cookies)
    if response.status_code == 200:
        # detect the image type from the content; fall back to 'bin' if unknown
        extension = imghdr.what(file=None, h=response.content) or 'bin'
        print(save_path + '.' + extension)
        with open(save_path + '.' + extension, 'wb') as handler:
            handler.write(response.content)
        return True
    return False

def get_file_name(url):
    # last path segment, which is the x number
    file_name = url.rsplit('/', 1)[1]
    return file_name

def get_directory_name(url):
    # the URL ends with .../tile/{scale}/{y}/{x}, so this returns '{scale}/{y}'
    y = url.rsplit('/', 2)[1]
    scale = url.rsplit('/', 3)[1]
    return scale + '/' + y

def create_directory_name(directory_name):
    isExist = os.path.exists(directory_name)
    if not isExist:
        os.makedirs(directory_name)

start_x = 8330  # min number 8263
end_x = 8334    # max number 8510
start_y = 5635  # min number 5524
end_y = 5640    # max number 5839
total_download = 0
year_id = '6100e156c755e15f6e46a8820824d8c595d30ae50'  # Ortophoto 2021-2023
scale = 8  # (1 : 10 000)

# Start timer
start_time = time.time()
print("Start time: ", start_time)

for x_number in range(start_x, end_x):
    for y_number in range(start_y, end_y):
        # ~/{year_id}/tile/{scale}/{y}/{x}
        template = "https://www.geoportal.lt/map/webapp/rest/mapgateway/{year_id:s}/tile/{scale:n}/{y:n}/{x:n}"
        url = template.format(year_id=year_id, scale=scale, y=y_number, x=x_number)
        directory = get_directory_name(url)
        create_directory_name('./' + directory + '/')
        success = download_url(url, './' + directory + '/' + get_file_name(url))
        if success:
            total_download = total_download + 1

# End timer
end_time = time.time()
# Calculate elapsed time
elapsed_time = end_time - start_time
print("Elapsed time: ", elapsed_time, " Secs")
print("Total Download tile files: ", total_download)
I got the URL and cookies from Dev Tools.
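Because the tile path puts y before x, it is easy to mix the two up when naming folders. The following is only a sketch of how the three numbers could be pulled out of a tile URL in one place; parse_tile_url and tile_save_path are hypothetical helper names, not part of the demo code, and the folder layout (scale/y/x) simply mirrors what the demo code produces.

```python
from urllib.parse import urlparse

def parse_tile_url(url):
    """Split a tile URL ending in .../tile/{scale}/{y}/{x} into integers."""
    parts = urlparse(url).path.rstrip('/').split('/')
    scale, y, x = (int(p) for p in parts[-3:])
    return scale, y, x

def tile_save_path(url, root='.'):
    """Build the save path '{root}/{scale}/{y}/{x}' without a file extension."""
    scale, y, x = parse_tile_url(url)
    return '{}/{}/{}/{}'.format(root, scale, y, x)

example = ('https://www.geoportal.lt/map/webapp/rest/mapgateway/'
           '6100e156c755e15f6e46a8820824d8c595d30ae50/tile/8/5635/8330')
print(parse_tile_url(example))
print(tile_save_path(example))
```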
Update distance, geo location and image resolution
You can see the grid over the tiles, and the cursor location at meter resolution in the bottom-left area of the map. (I found a defect: X and Y are switched.) The unit is meters.
https://www.geoportal.lt/map/index.jsp?lang=en
I measured the red rectangle by hovering the mouse over its corners and capturing the displayed x, y values. Then I pasted each point and calculated the distances. (Again, X and Y need to be switched; that is the bug.)
X distance = 3,909 m (yellow color)
Y distance = 4,637 m (green color)
Back to our program, to figure out the resolution of a tile.
Every tile is a 256 * 256 pixel image, which is a small file.
I re-ran my program to get that area:
start_x = 8416 # min number 8263
end_x = 8424 # max number 8510
start_y = 5604 # min number 5524
end_y = 5611 # max number 5839
Got this result
Next, how much real-world distance does one pixel cover? The width marked by the red cross is 1092 meters (I measured it by hovering the mouse and reading the tool in the bottom-left area): delta X = 582123 - 581031 = 1092 m
total pixels = 56 + 256 + 106 = 418 pixels
meters per pixel = 1092 m / 418 px ≈ 2.61 m/pixel
So my estimate is roughly 2.5 m/pixel. (guessing)
1 tile size = 256 pixels * 256 pixels = 640 m * 640 m
If you combine many tiles, for example about 78 tiles * 78 tiles, you get roughly 20000 * 20000 pixels (like Estonia's GeoTIFF). That will be a high-resolution image.
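The same arithmetic as a short sketch, so it can be repeated with other measurements. The input numbers are the ones measured by mouse hover; the 2.5 m/pixel figure is my rounded guess, not an official value of the map service.

```python
import math

TILE_PIXELS = 256  # each tile is 256 * 256 pixels

def meters_per_pixel(distance_m, pixels):
    """Ground distance covered by one pixel."""
    return distance_m / pixels

# Measured width (delta X) divided by the pixel count across the same span
measured = meters_per_pixel(582123 - 581031, 56 + 256 + 106)
print(round(measured, 2))

# Using the rounded guess of 2.5 m/pixel
assumed = 2.5
tile_size_m = TILE_PIXELS * assumed               # ground width of one tile
tiles_per_side = math.ceil(20000 / TILE_PIXELS)   # tiles needed for 20000 px
print(tile_size_m, tiles_per_side)
```

20000 / 256 is 78.125, so 78 tiles per side is slightly short of 20000 pixels and 79 covers it fully.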
I hope my guess matches the real size. Good luck! I have no more time to spend on this question, so please investigate the other areas yourself.
Answered By - Bench Vue