Empty proxy list #74
Looks like the existing proxy providers (PremProxy, FreeProxy, SslProxy) changed some HTML attributes and we can no longer parse them, hence the ProxyListException: list is empty. This will require some parser rework.
See HTTP_Request_Randomizer/http_request_randomizer/requests/parsers/FreeProxyParser.py, line 27, at commit 5c41348.
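As a hedged sketch of the failure mode (not the project's actual parser code), a scraper keyed to a hard-coded attribute matches nothing once the provider renames it, which surfaces downstream as an empty proxy list:

```python
import re

def extract_ips(html, css_class):
    # Rows are matched only by the expected class attribute; if the
    # provider renames that class, nothing matches and the caller
    # receives an empty list -- hence "list is empty".
    pattern = r'<td class="{}">([\d.]+)</td>'.format(re.escape(css_class))
    return re.findall(pattern, html)

page = '<td class="ip-addr">10.0.0.1</td>'
print(extract_ips(page, "ip-addr"))   # ['10.0.0.1']
print(extract_ips(page, "proxy-ip"))  # []: attribute renamed upstream
```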
For those who are interested, removing the … This is because FreeProxy hasn't changed anything and …
@d3banjan it didn't work for me. Has anyone managed to resolve the "list is empty" error?
Making … works for FreeProxy.
Had the same problem; @metegenez's solution worked for me.
@metegenez's solution works for FreeProxy and SslProxy. For PremProxy, it looks like they set up CORS and user-agent checks; you can bypass them by spoofing the Origin and Referer headers and leveraging a random user agent.

In UnPacker.py make the following changes:

```diff
+ def __init__(self, js_file_url, headers=None):
      logger.info("JS UnPacker init path: {}".format(js_file_url))
+     r = requests.get(js_file_url, headers=headers)
      encrypted = r.text.strip()
      encrypted = "(" + encrypted.split("}(")[1][:-1]
      unpacked = eval(
          "self.unpack" + encrypted
      )  # string of the js code in unpacked form
      matches = re.findall(r".*?\('\.([a-zA-Z0-9]{1,6})'\).*?\((\d+)\)", unpacked)
      self.ports = dict((key, port) for key, port in matches)
      logger.debug("portmap: " + str(self.ports))
```

In PremProxyParse.py make the following changes:

```diff
  import logging
  import requests

  from bs4 import BeautifulSoup

  from http_request_randomizer.requests.parsers.js.UnPacker import JsUnPacker
  from http_request_randomizer.requests.parsers.UrlParser import UrlParser
  from http_request_randomizer.requests.proxy.ProxyObject import ProxyObject, AnonymityLevel, Protocol
+ from http_request_randomizer.requests.useragent.userAgent import UserAgentManager
```

```diff
  def __init__(self, id, web_url, timeout=None):
      self.base_url = web_url
      web_url += "/list/"
      # Ports decoded by the JS unpacker
      self.js_unpacker = None
+     self.useragent = UserAgentManager()
+     self.headers = {
+         "User-Agent": self.useragent.get_random_user_agent(),
+         "Origin": self.base_url,
+         "Referer": self.base_url
+     }
      UrlParser.__init__(self, id=id, web_url=web_url, timeout=timeout)
```

```diff
  def parse_proxyList(self):
      curr_proxy_list = []
      try:
          # Parse all proxy pages -> format: /list/{num}.htm
          # Get the pageRange from the 'pagination' table
          page_set = self.get_pagination_set()
          logger.debug("Pages: {}".format(page_set))
          # One JS unpacker per provider (not per page)
          self.js_unpacker = self.init_js_unpacker()
          for page in page_set:
+             response = requests.get("{0}{1}".format(self.get_url(), page), timeout=self.timeout, headers=self.headers)
              if not response.ok:
```

```diff
  def get_pagination_set(self):
+     response = requests.get(self.get_url(), timeout=self.timeout, headers=self.headers)
      page_set = set()
      # Could not parse pagination page - Let user know
      if not response.ok:
          logger.warning("Proxy Provider url failed: {}".format(self.get_url()))
          return page_set
```

```diff
  def init_js_unpacker(self):
+     response = requests.get(self.get_url(), timeout=self.timeout, headers=self.headers)
      # Could not parse provider page - Let user know
      if not response.ok:
          logger.warning("Proxy Provider url failed: {}".format(self.get_url()))
          return None
      content = response.content
      soup = BeautifulSoup(content, "html.parser")
      # js file contains the values for the ports
      for script in soup.findAll('script'):
          if '/js/' in script.get('src'):
              jsUrl = self.base_url + script.get('src')
+             return JsUnPacker(jsUrl, headers=self.headers)
      return None
```
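The port-map regex used by UnPacker can be exercised standalone. The sample `unpacked` string below is an assumed shape of the unpacked JS, for illustration only:

```python
import re

# Assumed shape of the unpacked JS: each statement writes a port into an
# element selected by a short CSS class name.
unpacked = "$('.x1f2').html(8080);$('.a9').html(3128)"

# Same pattern as UnPacker: capture (class name, port) pairs.
matches = re.findall(r".*?\('\.([a-zA-Z0-9]{1,6})'\).*?\((\d+)\)", unpacked)
ports = dict(matches)
print(ports)  # {'x1f2': '8080', 'a9': '3128'}
```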
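A minimal, standalone sketch of the header-spoofing idea above. The agent pool and helper name here are illustrative, not part of the library (`UserAgentManager` plays this role in HTTP_Request_Randomizer):

```python
import random

# Hypothetical user-agent pool for the sketch.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]

def build_spoofed_headers(base_url, agents=USER_AGENTS):
    # Pretend the request originates from the provider's own site,
    # so the Origin/Referer checks and the user-agent check all pass.
    return {
        "User-Agent": random.choice(agents),
        "Origin": base_url,
        "Referer": base_url,
    }

headers = build_spoofed_headers("https://premproxy.com")
print(sorted(headers))  # ['Origin', 'Referer', 'User-Agent']
```

These headers would then be passed to every request against the provider, as in the diffs above.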
Proof: https://replit.com/@d3banjan/KhakiStarchyCustomer#main.py
The following code fails … with the following error message: …