
Empty proxy list #74

Open
d3banjan opened this issue Aug 18, 2021 · 6 comments

Comments

@d3banjan

Proof: https://replit.com/@d3banjan/KhakiStarchyCustomer#main.py

The following code fails:

from random import choice
from http_request_randomizer.requests.proxy.requestProxy import RequestProxy

proxies = RequestProxy().get_proxy_list()

PROXY = choice(proxies).get_address()

print(PROXY, type(PROXY))

... with the following error message:

2021-08-18 10:14:57,091 http_request_randomizer.requests.useragent.userAgent INFO     Using local file for user agents: /opt/virtualenvs/python3/lib/python3.8/site-packages/http_request_randomizer/requests/proxy/../data/user_agents.txt
2021-08-18 10:14:57,093 root   DEBUG    === Initialized Proxy Parsers ===
2021-08-18 10:14:57,093 root   DEBUG         FreeProxy parser of 'http://free-proxy-list.net' with required bandwidth: '150' KBs
2021-08-18 10:14:57,093 root   DEBUG         PremProxy parser of 'https://premproxy.com/list/' with required bandwidth: '150' KBs
2021-08-18 10:14:57,093 root   DEBUG         SslProxy parser of 'https://www.sslproxies.org' with required bandwidth: '150' KBs
2021-08-18 10:14:57,093 root   DEBUG    =================================
2021-08-18 10:14:57,525 http_request_randomizer.requests.parsers.FreeProxyParser ERROR    Provider FreeProxy failed with Attribute error: 'NoneType' object has no attribute 'find'
2021-08-18 10:14:57,526 root   DEBUG    Added 0 proxies from FreeProxy
2021-08-18 10:14:58,051 http_request_randomizer.requests.parsers.PremProxyParser WARNING  Proxy Provider url failed: https://premproxy.com/list/
2021-08-18 10:14:58,051 http_request_randomizer.requests.parsers.PremProxyParser DEBUG    Pages: set()
2021-08-18 10:14:58,465 http_request_randomizer.requests.parsers.PremProxyParser WARNING  Proxy Provider url failed: https://premproxy.com/list/
2021-08-18 10:14:58,466 root   DEBUG    Added 0 proxies from PremProxy
2021-08-18 10:14:58,792 http_request_randomizer.requests.parsers.SslProxyParser ERROR    Provider SslProxy failed with Attribute error: 'NoneType' object has no attribute 'find'
2021-08-18 10:14:58,792 root   DEBUG    Added 0 proxies from SslProxy
2021-08-18 10:14:58,792 root   DEBUG    Total proxies = 0
2021-08-18 10:14:58,792 root   DEBUG    Filtered proxies = 0
Traceback (most recent call last):
  File "main.py", line 4, in <module>
    proxies = RequestProxy().get_proxy_list()
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/http_request_randomizer/requests/proxy/requestProxy.py", line 69, in __init__
    self.current_proxy = self.randomize_proxy()
  File "/opt/virtualenvs/python3/lib/python3.8/site-packages/http_request_randomizer/requests/proxy/requestProxy.py", line 86, in randomize_proxy
    raise ProxyListException("list is empty")
http_request_randomizer.requests.errors.ProxyListException.ProxyListException: list is empty
@pgaref
Owner

pgaref commented Aug 18, 2021

Looks like the existing proxy providers (PremProxy, FreeProxy, SslProxy) changed some HTML attributes and we can no longer parse them -- hence the ProxyListException: list is empty

This will require some parser rework.

@d3banjan
Author

table = soup.find("table", attrs={"id": "proxylisttable"})

For those who are interested, removing the attrs filter works as a short-term workaround.

This works because FreeProxy hasn't changed the page layout otherwise, so find simply returns the first table, which is still the proxy list.
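A minimal sketch of the workaround above. The HTML here is a hypothetical, simplified stand-in for the free-proxy-list.net markup (assumption: the table lost its id attribute but is still the first table on the page):

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified version of the provider's markup: the proxy
# table no longer carries id="proxylisttable".
html = """
<html><body>
  <table class="table table-striped table-bordered">
    <tr><td>1.2.3.4</td><td>8080</td></tr>
  </table>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Old lookup: returns None because the id no longer exists.
by_id = soup.find("table", attrs={"id": "proxylisttable"})
print(by_id)  # None

# Workaround: drop the attrs filter; find() returns the first table.
first_table = soup.find("table")
row = first_table.find("td").get_text()
print(row)  # 1.2.3.4
```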

@guerradesouza

@d3banjan that did not work for me. Has anyone managed to resolve the "list is empty" error?

@metegenez

Changing

table = soup.find("table", attrs={"id": "proxylisttable"})

to

table = soup.find("table", attrs={"class": "table table-striped table-bordered"})

works for FreeProxy.
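A hedged sketch of the class-based fix above, against a hypothetical fragment of the updated markup (assumption: the table kept these Bootstrap classes). Note that BeautifulSoup treats class as a multi-valued attribute, so matching on the full space-separated string requires the classes to appear in the same order:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment of the updated free-proxy-list.net markup.
html = """
<table class="table table-striped table-bordered">
  <thead><tr><th>IP</th><th>Port</th></tr></thead>
  <tbody><tr><td>1.2.3.4</td><td>8080</td></tr></tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Matching on the exact class string works when it equals the attribute
# value verbatim (bs4 also accepts single class names or lists here).
table = soup.find("table", attrs={"class": "table table-striped table-bordered"})
cells = [td.get_text() for td in table.find_all("td")]
print(cells)  # ['1.2.3.4', '8080']
```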

@akashgoyal119

Had the same problem; @metegenez's solution worked for me.

@JasonBristol
Contributor

JasonBristol commented Dec 1, 2021

@metegenez's solution works for FreeProxy and SslProxy.

For PremProxy, it looks like they set up CORS and user-agent checks; you can bypass them by spoofing the Origin and Referer headers and using a random user agent.
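The header spoofing can be sketched standalone. The user-agent strings below are illustrative stand-ins for the library's bundled list (the real code draws from UserAgentManager), and spoofed_headers is a hypothetical helper name:

```python
import random

# Illustrative stand-ins for the library's bundled user-agent list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/94.0",
]

def spoofed_headers(base_url):
    # Origin and Referer are set to the provider's own base URL so the
    # request appears to originate from the site itself.
    return {
        "User-Agent": random.choice(USER_AGENTS),
        "Origin": base_url,
        "Referer": base_url,
    }

headers = spoofed_headers("https://premproxy.com")
print(sorted(headers))  # ['Origin', 'Referer', 'User-Agent']
```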

In UnPacker.py make the following changes

+ def __init__(self, js_file_url, headers=None):
        logger.info("JS UnPacker init path: {}".format(js_file_url))
+       r = requests.get(js_file_url, headers=headers)
        encrypted = r.text.strip()
        encrypted = "(" + encrypted.split("}(")[1][:-1]
        unpacked = eval(
            "self.unpack" + encrypted
        )  # string of the js code in unpacked form
        matches = re.findall(r".*?\('\.([a-zA-Z0-9]{1,6})'\).*?\((\d+)\)", unpacked)
        self.ports = dict((key, port) for key, port in matches)
        logger.debug("portmap: " + str(self.ports))

In PremProxyParse.py make the following changes

import logging

import requests
from bs4 import BeautifulSoup

from http_request_randomizer.requests.parsers.js.UnPacker import JsUnPacker
from http_request_randomizer.requests.parsers.UrlParser import UrlParser
from http_request_randomizer.requests.proxy.ProxyObject import ProxyObject, AnonymityLevel, Protocol
+ from http_request_randomizer.requests.useragent.userAgent import UserAgentManager

def __init__(self, id, web_url, timeout=None):
        self.base_url = web_url
        web_url += "/list/"
        # Ports decoded by the JS unpacker
        self.js_unpacker = None
+     self.useragent = UserAgentManager()
+     self.headers = {
+         "User-Agent": self.useragent.get_random_user_agent(),
+         "Origin": self.base_url,
+         "Referer": self.base_url
+     }
        UrlParser.__init__(self, id=id, web_url=web_url, timeout=timeout)
def parse_proxyList(self):
        curr_proxy_list = []
        try:
            # Parse all proxy pages -> format: /list/{num}.htm
            # Get the pageRange from the 'pagination' table
            page_set = self.get_pagination_set()
            logger.debug("Pages: {}".format(page_set))
            # One JS unpacker per provider (not per page)
            self.js_unpacker = self.init_js_unpacker()

            for page in page_set:
+             response = requests.get("{0}{1}".format(self.get_url(), page), timeout=self.timeout, headers=self.headers)
               if not response.ok:
def get_pagination_set(self):
+      response = requests.get(self.get_url(), timeout=self.timeout, headers=self.headers)
        page_set = set()
        # Could not parse pagination page - Let user know
        if not response.ok:
            logger.warning("Proxy Provider url failed: {}".format(self.get_url()))
            return page_set
def init_js_unpacker(self):
+      response = requests.get(self.get_url(), timeout=self.timeout, headers=self.headers)
        # Could not parse provider page - Let user know
        if not response.ok:
            logger.warning("Proxy Provider url failed: {}".format(self.get_url()))
            return None
        content = response.content
        soup = BeautifulSoup(content, "html.parser")

        # js file contains the values for the ports
        for script in soup.findAll('script'):
            if '/js/' in script.get('src'):
                jsUrl = self.base_url + script.get('src')
+               return JsUnPacker(jsUrl, headers=self.headers)
        return None

2021-12-01 03:06:24,255 root   DEBUG    === Initialized Proxy Parsers ===
2021-12-01 03:06:24,256 root   DEBUG    FreeProxy parser of 'http://free-proxy-list.net' with required bandwidth: '150' KBs
2021-12-01 03:06:24,256 root   DEBUG    PremProxy parser of 'https://premproxy.com/list/' with required bandwidth: '150' KBs
2021-12-01 03:06:24,257 root   DEBUG    SslProxy parser of 'https://www.sslproxies.org' with required bandwidth: '150' KBs
2021-12-01 03:06:24,257 root   DEBUG    =================================
2021-12-01 03:06:24,548 root   DEBUG    Added 299 proxies from FreeProxy
2021-12-01 03:06:29,969 vendor.http_request_randomizer.requests.parsers.PremProxyParser DEBUG    Pages: {'', '09.htm', '02.htm', '11.htm', '07.htm', '08.htm', '10.htm', '03.htm', '04.htm', '06.htm', '05.htm'}
2021-12-01 03:06:35,722 vendor.http_request_randomizer.requests.parsers.js.UnPacker INFO     JS UnPacker init path: https://premproxy.com/js/e5f85.js
2021-12-01 03:06:57,201 vendor.http_request_randomizer.requests.parsers.js.UnPacker DEBUG    portmap: {'ref78': '8080', 'r1d7a': '26691', 'rf4c5': '8000', 'r2cc7': '8088', 'rdd07': '80', 'r9d12': '8085', 'rd6c3': '22800', 'r8f5c': '3128', 'r0e76': '8118', 'r2d80': '53281', 'r331b': '21231', 'r65df': '35709', 'r83db': '8081', 'rcf74': '61047', 'r02fd': '30950', 'r149e': '57797', 'r00bd': '808', 'rbf12': '999', 'r7a2c': '33630', 'ra7fd': '9292', 'rec6b': '45006', 'r243e': '55443', 'rb9b8': '60731', 'r945a': '37475', 'rc0dc': '36984', 'reb86': '63141', 'r07aa': '21776', 'rb92e': '9300', 'r5b2a': '10248', 'rfc6b': '3228', 're041': '8888', 'r132e': '9991', 'rae2e': '9797', 'rcd20': '9080', 'rcb08': '9090', 'r93e2': '32191', 'rdc5c': '40387', 'r0136': '41258', 'r9450': '53959', 'r3ce8': '9999', 'r2fd8': '46611', 'r4c7b': '6969', 'r49f7': '3127', 'r8e32': '1976', 'r5f53': '1981', 'r5ff7': '38525', 'r3db9': '59394', 'r96f1': '42119', 'r377e': '443', 'r2676': '5566', 're051': '8008', 'r4126': '53410', 'r0d46': '8380', 'r09c5': '8197', 'r3e49': '40014', 'r32b7': '83', 'rf98a': '44047', 'rf392': '40390', 'r37ea': '8083', 'r0b6d': '2021', 'rad3d': '8181', 'r5204': '53128', 'r29f8': '6565', 'r4736': '54190', 'r8784': '31475', 'r123b': '8060', 'r9bfb': '8010', 'r9cc8': '12345', 'r5754': '5678', 'rc0a8': '45944', 'rc09f': '10040', 'r0099': '31409', 'rd8ea': '9091', 're258': '8090', 'r8b1a': '60792', 'rc949': '56145', 'rf203': '56644', 'ref86': '47615', 'ra341': '41917', 'r42f1': '46669', 'r63be': '54018', 'r8f71': '38970', 'rb4ac': '52271', 'r8f6e': '43631', 'r58e5': '8020', 'rd542': '54555', 'r5403': '23500', 'rbe81': '39272', 'r2c90': '53805', 'r7f29': '42033', 'r01a0': '10101', 'r4eeb': '47548', 'rf12d': '58136', 're30c': '34808', 'r8d68': '10000', 'r6b7c': '39617', 'ra500': '6000', 'r361b': '49044', 'r7406': '1337', 'rf53a': '52342', 'r615f': '57367', 'r2414': '50330', 'r15c0': '8082', 'r2920': '50782', 'rd979': '1256', 'r50ce': '59083', 'r352a': '37444', 'r8c95': '56975', 
'r09fc': '61711', 'r61ce': '32161', 'radb3': '8998', 'r7dbb': '60808', 'r3a72': '45381', 'rdbee': '49285', 'rd78c': '10001', 'r858e': '3161', 'rbc7a': '43403', 'rc941': '32018', 'rbda4': '44380', 'r789e': '60000', 'radd4': '43496'}
2021-12-01 03:07:57,774 root   DEBUG    Added 521 proxies from PremProxy
2021-12-01 03:07:57,932 root   DEBUG    Added 99 proxies from SslProxy
2021-12-01 03:07:57,933 root   DEBUG    Total proxies = 919
2021-12-01 03:07:57,933 root   DEBUG    Filtered proxies = 919
2021-12-01 03:07:57,934 root   DEBUG    Initialization took: 93.67911338806152 sec
2021-12-01 03:07:57,934 root   DEBUG    Size: 919
