Scrapy is a Python web scraping framework. This guide shows how to configure it to use Stat Proxies via a custom downloader middleware.
Middleware setup
Create a proxy middleware in your Scrapy project:
```python
# myproject/middlewares.py
import base64


class StatProxiesMiddleware:
    def __init__(self, proxy_url, proxy_user, proxy_pass):
        self.proxy_url = proxy_url
        self.proxy_auth = base64.b64encode(
            f"{proxy_user}:{proxy_pass}".encode()
        ).decode()

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            proxy_url=crawler.settings.get('STAT_PROXY_URL'),
            proxy_user=crawler.settings.get('STAT_PROXY_USER'),
            proxy_pass=crawler.settings.get('STAT_PROXY_PASS'),
        )

    def process_request(self, request, spider):
        request.meta['proxy'] = self.proxy_url
        request.headers['Proxy-Authorization'] = f'Basic {self.proxy_auth}'
```
Add your proxy credentials and enable the middleware in settings.py:
```python
# settings.py
STAT_PROXY_URL = 'http://192.168.1.1:3128'
STAT_PROXY_USER = 'myuser'
STAT_PROXY_PASS = 'mypass'

DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.StatProxiesMiddleware': 350,
}
```
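The middleware's core job is building the `Proxy-Authorization` header from your credentials. If you want to sanity-check the Base64 encoding on its own, outside Scrapy, a minimal standalone sketch (using the example credentials above; the helper name is ours, not part of any API) looks like this:

```python
import base64


def basic_auth_header(user: str, password: str) -> str:
    """Build the value for a Proxy-Authorization header (Basic scheme)."""
    # Basic auth is just base64("user:password") with a "Basic " prefix.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


print(basic_auth_header("myuser", "mypass"))
# → Basic bXl1c2VyOm15cGFzcw==
```

Whatever string this produces for your credentials is exactly what the middleware sends on every request.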
Rotating through multiple proxies
To distribute requests across multiple proxies, modify the middleware:
```python
# myproject/middlewares.py
import base64
import random


class RotatingProxiesMiddleware:
    def __init__(self, proxies):
        self.proxies = proxies

    @classmethod
    def from_crawler(cls, crawler):
        proxy_list = crawler.settings.getlist('STAT_PROXY_LIST')
        return cls(proxy_list)

    def process_request(self, request, spider):
        # Pick a different proxy for each request.
        proxy = random.choice(self.proxies)
        # Format: host:port:user:pass (maxsplit=3 so a ':' in the
        # password doesn't break the unpacking)
        host, port, user, password = proxy.split(':', 3)
        request.meta['proxy'] = f'http://{host}:{port}'
        auth = base64.b64encode(f"{user}:{password}".encode()).decode()
        request.headers['Proxy-Authorization'] = f'Basic {auth}'
```
```python
# settings.py
STAT_PROXY_LIST = [
    '192.168.1.1:3128:user1:pass1',
    '192.168.1.2:3128:user2:pass2',
    '192.168.1.3:3128:user3:pass3',
]

DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.RotatingProxiesMiddleware': 350,
}
```
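The `host:port:user:pass` convention is easy to get wrong if a password ever contains a colon. The parsing step can be pulled out into a small helper (a sketch; the function name is ours) so you can unit-test it against your own proxy list before running a crawl:

```python
def parse_proxy_entry(entry: str):
    """Split a 'host:port:user:pass' entry into a proxy URL and credentials."""
    # maxsplit=3: only the first three colons split, so any ':'
    # inside the password survives intact.
    host, port, user, password = entry.split(':', 3)
    return f'http://{host}:{port}', (user, password)


print(parse_proxy_entry('192.168.1.1:3128:user1:pass1'))
# → ('http://192.168.1.1:3128', ('user1', 'pass1'))
```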
Verifying it works
Add a quick spider to confirm your proxy is active:
```python
import scrapy


class IPCheckSpider(scrapy.Spider):
    name = 'ipcheck'
    start_urls = ['https://httpbin.org/ip']

    def parse(self, response):
        # response.json() requires Scrapy 2.2+
        self.logger.info(f'Proxy IP: {response.json()["origin"]}')
```
Run it:

```shell
scrapy crawl ipcheck
```
The logged IP should be your proxy IP, not your real one.
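The check the spider performs boils down to parsing httpbin's JSON body and comparing the reported origin against your own address. A deterministic sketch of that logic, using a canned response body and hypothetical TEST-NET addresses in place of real IPs:

```python
import json

# Example httpbin.org/ip response body; 203.0.113.42 stands in for
# the proxy's exit IP (a hypothetical TEST-NET address).
body = '{"origin": "203.0.113.42"}'
my_real_ip = '198.51.100.7'  # hypothetical local IP for illustration

origin = json.loads(body)['origin']

# If the proxy is active, httpbin reports the proxy's IP, not yours.
assert origin != my_real_ip
print(f'Proxy IP: {origin}')
```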
Make sure to replace the example credentials with your actual proxy details from the Stat Proxies dashboard.