Issue
I'm using ProxyBroker, which successfully writes its results to a .txt file. However, the output is in this format, which I can't use directly:
<Proxy US 0.18s [HTTP: High] 148.76.97.250:80>
<Proxy US 0.19s [HTTP: High] 47.88.62.42:80>
<Proxy US 0.43s [HTTP: High, HTTPS] 107.173.153.197:7777>
<Proxy US 0.35s [HTTP: High] 47.56.110.204:8989>
<Proxy US 0.42s [HTTP: High] 216.137.184.253:80>
<Proxy US 0.32s [HTTPS] 20.111.54.16:80>
<Proxy US 0.33s [HTTPS] 20.206.106.192:80>
<Proxy US 0.37s [HTTPS] 20.210.113.32:80>
<Proxy US 0.57s [HTTP: High] 4.175.121.88:80>
<Proxy US 0.40s [HTTPS] 20.205.61.143:80>
<Proxy US 0.78s [HTTP: High] 104.45.128.122:80>
<Proxy US 1.01s [HTTP: High] 162.223.91.11:80>
<Proxy US 0.75s [HTTP: High, HTTPS] 8.209.114.72:3129>
<Proxy US 0.85s [HTTPS] 20.24.43.214:80>
<Proxy US 1.60s [HTTP: High, HTTPS] 8.219.97.248:80>
<Proxy US 2.17s [HTTPS] 64.189.106.6:3129>
<Proxy US 2.52s [HTTPS] 209.97.188.59:3128>
<Proxy US 0.42s [HTTPS] 107.172.157.246:3128>
<Proxy US 0.69s [HTTPS] 12.218.209.130:53281>
<Proxy US 3.00s [HTTP: High] 74.208.177.198:80>
<Proxy US 3.66s [HTTP: High] 162.223.89.84:80>
<Proxy US 2.18s [HTTP: High] 162.240.76.92:80>
<Proxy US 3.62s [HTTP: High] 157.245.97.60:80>
<Proxy US 3.49s [HTTP: High] 104.211.29.96:80>
<Proxy US 2.68s [HTTP: High] 138.68.225.200:80>
<Proxy US 1.79s [HTTPS] 47.91.65.23:3128>
How do I strip the trailing ">" and all of the text leading up to the IP address (e.g. "<Proxy US 1.79s [HTTPS] "), while still keeping track of whether each proxy supports HTTP or HTTPS (if that matters)?
Thank you all a million!!!
I really appreciate the community's help in figuring out this problem /\ <3
Solution
You can use the regular expressions module (re) to extract the IP addresses. Here is a quick example that works; you can modify it as needed.
import re
text = """<Proxy US 0.18s [HTTP: High] 148.76.97.250:80>
<Proxy US 0.19s [HTTP: High] 47.88.62.42:80>
<Proxy US 0.43s [HTTP: High, HTTPS] 107.173.153.197:7777>
<Proxy US 0.35s [HTTP: High] 47.56.110.204:8989>
<Proxy US 0.42s [HTTP: High] 216.137.184.253:80>
<Proxy US 0.32s [HTTPS] 20.111.54.16:80>
<Proxy US 0.33s [HTTPS] 20.206.106.192:80>
<Proxy US 0.37s [HTTPS] 20.210.113.32:80>
<Proxy US 0.57s [HTTP: High] 4.175.121.88:80>
<Proxy US 0.40s [HTTPS] 20.205.61.143:80>
<Proxy US 0.78s [HTTP: High] 104.45.128.122:80>
<Proxy US 1.01s [HTTP: High] 162.223.91.11:80>
<Proxy US 0.75s [HTTP: High, HTTPS] 8.209.114.72:3129>
<Proxy US 0.85s [HTTPS] 20.24.43.214:80>
<Proxy US 1.60s [HTTP: High, HTTPS] 8.219.97.248:80>
<Proxy US 2.17s [HTTPS] 64.189.106.6:3129>
<Proxy US 2.52s [HTTPS] 209.97.188.59:3128>
<Proxy US 0.42s [HTTPS] 107.172.157.246:3128>
<Proxy US 0.69s [HTTPS] 12.218.209.130:53281>
<Proxy US 3.00s [HTTP: High] 74.208.177.198:80>
<Proxy US 3.66s [HTTP: High] 162.223.89.84:80>
<Proxy US 2.18s [HTTP: High] 162.240.76.92:80>
<Proxy US 3.62s [HTTP: High] 157.245.97.60:80>
<Proxy US 3.49s [HTTP: High] 104.211.29.96:80>
<Proxy US 2.68s [HTTP: High] 138.68.225.200:80>
<Proxy US 1.79s [HTTPS] 47.91.65.23:3128>"""
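# Match four groups of 1-3 digits separated by dots (an IPv4 address).
# Note: this does not check that each octet is <= 255, and the port is not captured.
pattern = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
# findall returns every non-overlapping match as a list of strings
ips = pattern.findall(text)
print(ips)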
This prints the following list:
['148.76.97.250', '47.88.62.42', '107.173.153.197', '47.56.110.204', '216.137.184.253', '20.111.54.16', '20.206.106.192', '20.210.113.32', '4.175.121.88', '20.205.61.143', '104.45.128.122', '162.223.91.11', '8.209.114.72', '20.24.43.214', '8.219.97.248', '64.189.106.6', '209.97.188.59', '107.172.157.246', '12.218.209.130', '74.208.177.198', '162.223.89.84', '162.240.76.92', '157.245.97.60', '104.211.29.96', '138.68.225.200', '47.91.65.23']
If you also want the port number after the colon, you would need to modify the pattern slightly, but I leave that as an exercise for the OP.
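One possible way to do that exercise, and to keep the HTTP/HTTPS information the question asks about, is to capture the bracketed protocol list, the IP and the port with named groups. This is a minimal sketch that assumes every line follows the format shown in the question (`<Proxy CC 0.00s [PROTOCOLS] IP:PORT>`); the names `LINE_RE` and `parse_proxies` are hypothetical, not part of ProxyBroker.

```python
import re

# Assumed line format, based on the sample output in the question:
# <Proxy US 0.43s [HTTP: High, HTTPS] 107.173.153.197:7777>
LINE_RE = re.compile(
    r'\[(?P<types>[^\]]*)\]\s+'             # protocol list inside the brackets
    r'(?P<host>\d{1,3}(?:\.\d{1,3}){3}):'   # IPv4 address
    r'(?P<port>\d+)>'                       # port number before the closing '>'
)


def parse_proxies(lines):
    """Return a list of (address, protocols) tuples,
    e.g. ('107.173.153.197:7777', ['HTTP', 'HTTPS'])."""
    results = []
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # skip lines that don't fit the assumed format
        # "HTTP: High, HTTPS" -> ["HTTP", "HTTPS"] (drops the anonymity level)
        protocols = [part.split(':')[0].strip() for part in match['types'].split(',')]
        address = f"{match['host']}:{match['port']}"
        results.append((address, protocols))
    return results


sample = """<Proxy US 0.18s [HTTP: High] 148.76.97.250:80>
<Proxy US 0.43s [HTTP: High, HTTPS] 107.173.153.197:7777>
<Proxy US 1.79s [HTTPS] 47.91.65.23:3128>"""

for address, protocols in parse_proxies(sample.splitlines()):
    print(address, protocols)
# 148.76.97.250:80 ['HTTP']
# 107.173.153.197:7777 ['HTTP', 'HTTPS']
# 47.91.65.23:3128 ['HTTPS']
```

To read ProxyBroker's output file instead of a string, pass the open file object directly, e.g. `with open("proxies.txt") as f: proxies = parse_proxies(f)` (the filename here is only an example). If you only need the plain `ip:port` strings, take the first element of each tuple.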
Answered By - D.L