Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

--rewrite-host-header flag for reverse proxy #1492

Merged
merged 7 commits into from
Oct 15, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .github/workflows/test-library.yml
Original file line number Diff line number Diff line change
Expand Up @@ -610,6 +610,7 @@ jobs:
--enable-web-server
--enable-reverse-proxy
--plugin proxy.plugin.ReverseProxyPlugin
--rewrite-host-header
&&
./tests/integration/test_integration.sh 8899
Expand Down
39 changes: 36 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -1075,7 +1075,7 @@ following `Nginx` config:

```console
location /get {
proxy_pass http://httpbin.org/get
proxy_pass http://httpbin.org/get;
}
```

Expand All @@ -1094,6 +1094,36 @@ Verify using `curl -v localhost:8899/get`:
}
```

#### Rewrite Host Header

With above example, you may sometimes see:

```console
>
* Empty reply from server
* Closing connection
curl: (52) Empty reply from server
```

This is happenening because our default reverse proxy plugin `ReverseProxyPlugin` is configured
with a `http` and a `https` upstream server. And, by default `ReverseProxyPlugin` preserves the
original host header. While this works with `https` upstreams, this doesn't work reliably with
`http` upstreams. To work around this problem use the `--rewrite-host-header` flags.

Example:


```console
❯ proxy --enable-reverse-proxy \
--plugins proxy.plugin.ReverseProxyPlugin \
--rewrite-host-header
```

This will ensure that `Host` header field is set as `httpbin.org` and works with both `http` and
`https` upstreams.

> NOTE: Whether to use `--rewrite-host-header` or not depends upon your use-case.

## Plugin Ordering

When using multiple plugins, depending upon plugin functionality,
Expand Down Expand Up @@ -2613,7 +2643,7 @@ usage: -m [-h] [--tunnel-hostname TUNNEL_HOSTNAME] [--tunnel-port TUNNEL_PORT]
[--proxy-pool PROXY_POOL] [--enable-web-server]
[--enable-static-server] [--static-server-dir STATIC_SERVER_DIR]
[--min-compression-length MIN_COMPRESSION_LENGTH]
[--enable-reverse-proxy] [--enable-metrics]
[--enable-reverse-proxy] [--rewrite-host-header] [--enable-metrics]
[--metrics-path METRICS_PATH] [--pac-file PAC_FILE]
[--pac-file-url-path PAC_FILE_URL_PATH]
[--cloudflare-dns-mode CLOUDFLARE_DNS_MODE]
Expand All @@ -2622,7 +2652,7 @@ usage: -m [-h] [--tunnel-hostname TUNNEL_HOSTNAME] [--tunnel-port TUNNEL_PORT]
[--filtered-client-ips FILTERED_CLIENT_IPS]
[--filtered-url-regex-config FILTERED_URL_REGEX_CONFIG]

proxy.py v2.4.6.dev25+g2754b928.d20240812
proxy.py v2.4.8.dev8+gc703edac.d20241013

options:
-h, --help show this help message and exit
Expand Down Expand Up @@ -2791,6 +2821,9 @@ options:
response that will be compressed (gzipped).
--enable-reverse-proxy
Default: False. Whether to enable reverse proxy core.
--rewrite-host-header
Default: False. If used, reverse proxy server will
rewrite Host header field before sending to upstream.
--enable-metrics Default: False. Enables metrics.
--metrics-path METRICS_PATH
Default: /metrics. Web server path to serve proxy.py
Expand Down
1 change: 1 addition & 0 deletions proxy/common/constants.py
Original file line number Diff line number Diff line change
Expand Up @@ -165,6 +165,7 @@ def _env_threadless_compliant() -> bool:
if sys.version_info >= (3, 10)
else (ssl.OP_NO_SSLv2 | ssl.OP_NO_SSLv3 | ssl.OP_NO_TLSv1 | ssl.OP_NO_TLSv1_1)
)
DEFAULT_ENABLE_REWRITE_HOST = False

DEFAULT_DEVTOOLS_DOC_URL = 'http://proxy'
DEFAULT_DEVTOOLS_FRAME_ID = secrets.token_hex(8)
Expand Down
6 changes: 4 additions & 2 deletions proxy/common/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,6 @@
from types import TracebackType
from typing import Any, Dict, List, Type, Tuple, Callable, Optional

import _ssl # noqa: WPS436

from .types import HostPort
from .constants import (
CRLF, COLON, HTTP_1_1, IS_WINDOWS, WHITESPACE, DEFAULT_TIMEOUT,
Expand All @@ -42,6 +40,9 @@

def cert_der_to_dict(der: Optional[bytes]) -> Dict[str, Any]:
"""Parse a DER formatted certificate to a python dict"""
# pylint: disable=import-outside-toplevel
import _ssl # noqa: WPS436

if not der:
return {}
with tempfile.NamedTemporaryFile(delete=False) as cert_file:
Expand Down Expand Up @@ -322,6 +323,7 @@ def set_open_file_limit(soft_limit: int) -> None:
if IS_WINDOWS: # pragma: no cover
return

# pylint: disable=possibly-used-before-assignment
curr_soft_limit, curr_hard_limit = resource.getrlimit(
resource.RLIMIT_NOFILE,
)
Expand Down
6 changes: 3 additions & 3 deletions proxy/core/connection/connection.py
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,6 @@ def connection(self) -> TcpOrTlsSocket:

def send(self, data: Union[memoryview, bytes]) -> int:
"""Users must handle BrokenPipeError exceptions"""
# logger.info(data.tobytes())
return self.connection.send(data)

def recv(
Expand All @@ -67,7 +66,7 @@ def recv(
return memoryview(data)

def close(self) -> bool:
if not self.closed:
if not self.closed and self.connection:
self.connection.close()
self.closed = True
return self.closed
Expand Down Expand Up @@ -97,8 +96,9 @@ def flush(self, max_send_size: Optional[int] = None) -> int:
self._num_buffer -= 1
else:
self.buffer[0] = mv[sent:]
del mv
logger.debug('flushed %d bytes to %s' % (sent, self.tag))
# logger.info(mv[:sent].tobytes())
del mv
return sent

def is_reusable(self) -> bool:
Expand Down
28 changes: 22 additions & 6 deletions proxy/http/parser/parser.py
Original file line number Diff line number Diff line change
Expand Up @@ -283,7 +283,12 @@ def parse(
self.state = httpParserStates.COMPLETE
self.buffer = None if raw == b'' else raw

def build(self, disable_headers: Optional[List[bytes]] = None, for_proxy: bool = False) -> bytes:
def build(
self,
disable_headers: Optional[List[bytes]] = None,
for_proxy: bool = False,
host: Optional[bytes] = None,
) -> bytes:
"""Rebuild the request object."""
assert self.method and self.version and self.type == httpParserTypes.REQUEST_PARSER
if disable_headers is None:
Expand All @@ -301,11 +306,22 @@ def build(self, disable_headers: Optional[List[bytes]] = None, for_proxy: bool =
path
) if not self._is_https_tunnel else (self.host + COLON + str(self.port).encode())
return build_http_request(
self.method, path, self.version,
headers={} if not self.headers else {
self.headers[k][0]: self.headers[k][1] for k in self.headers if
k.lower() not in disable_headers
},
self.method,
path,
self.version,
headers=(
{}
if not self.headers
else {
self.headers[k][0]: (
self.headers[k][1]
if host is None or self.headers[k][0].lower() != b'host'
else host
)
for k in self.headers
if k.lower() not in disable_headers
}
),
body=body,
no_ua=True,
)
Expand Down
29 changes: 21 additions & 8 deletions proxy/http/server/reverse.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,10 +17,10 @@
from proxy.core.base import TcpUpstreamConnectionHandler
from proxy.http.parser import HttpParser
from proxy.http.server import HttpWebServerBasePlugin
from proxy.common.utils import text_
from proxy.common.utils import text_, bytes_
from proxy.http.exception import HttpProtocolException
from proxy.common.constants import (
HTTPS_PROTO, DEFAULT_HTTP_PORT, DEFAULT_HTTPS_PORT,
COLON, HTTP_PROTO, HTTPS_PROTO, DEFAULT_HTTP_PORT, DEFAULT_HTTPS_PORT,
DEFAULT_REVERSE_PROXY_ACCESS_LOG_FORMAT,
)
from ...common.types import Readables, Writables, Descriptors
Expand Down Expand Up @@ -111,23 +111,36 @@ def handle_request(self, request: HttpParser) -> None:
assert self.choice and self.choice.hostname
port = (
self.choice.port or DEFAULT_HTTP_PORT
if self.choice.scheme == b'http'
else DEFAULT_HTTPS_PORT
if self.choice.scheme == HTTP_PROTO
else self.choice.port or DEFAULT_HTTPS_PORT
)
self.initialize_upstream(text_(self.choice.hostname), port)
assert self.upstream
try:
self.upstream.connect()
if self.choice.scheme == HTTPS_PROTO:
self.upstream.wrap(
text_(
self.choice.hostname,
),
text_(self.choice.hostname),
as_non_blocking=True,
ca_file=self.flags.ca_file,
)
request.path = self.choice.remainder
self.upstream.queue(memoryview(request.build()))
self.upstream.queue(
memoryview(
request.build(
host=(
self.choice.hostname
+ (
COLON + bytes_(self.choice.port)
if self.choice.port is not None
else b''
)
if self.flags.rewrite_host_header
else None
),
),
),
)
except ConnectionRefusedError:
raise HttpProtocolException( # pragma: no cover
'Connection refused by upstream server {0}:{1}'.format(
Expand Down
15 changes: 13 additions & 2 deletions proxy/http/server/web.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,8 +29,9 @@
from ...common.utils import text_, build_websocket_handshake_response
from ...common.constants import (
DEFAULT_ENABLE_WEB_SERVER, DEFAULT_STATIC_SERVER_DIR,
DEFAULT_ENABLE_REVERSE_PROXY, DEFAULT_ENABLE_STATIC_SERVER,
DEFAULT_WEB_ACCESS_LOG_FORMAT, DEFAULT_MIN_COMPRESSION_LENGTH,
DEFAULT_ENABLE_REWRITE_HOST, DEFAULT_ENABLE_REVERSE_PROXY,
DEFAULT_ENABLE_STATIC_SERVER, DEFAULT_WEB_ACCESS_LOG_FORMAT,
DEFAULT_MIN_COMPRESSION_LENGTH,
)


Expand Down Expand Up @@ -78,6 +79,16 @@
help='Default: False. Whether to enable reverse proxy core.',
)

flags.add_argument(
'--rewrite-host-header',
action='store_true',
default=DEFAULT_ENABLE_REWRITE_HOST,
help='Default: '
+ str(DEFAULT_ENABLE_REWRITE_HOST)
+ '. '
+ 'If used, reverse proxy server will rewrite Host header field before sending to upstream.',
)


class HttpWebServerPlugin(HttpProtocolHandlerPlugin):
"""HttpProtocolHandler plugin which handles incoming requests to local web server."""
Expand Down
32 changes: 22 additions & 10 deletions tests/integration/test_integration.py
Original file line number Diff line number Diff line change
Expand Up @@ -181,18 +181,30 @@ def proxy_py_subprocess(request: Any) -> Generator[int, None, None]:
ca_cert_dir = TEMP_DIR / ('certificates-%s' % run_id)
os.makedirs(ca_cert_dir, exist_ok=True)
proxy_cmd = (
sys.executable, '-m', 'proxy',
'--hostname', '127.0.0.1',
'--port', '0',
'--port-file', str(port_file),
sys.executable,
'-m',
'proxy',
'--hostname',
'127.0.0.1',
'--port',
'0',
'--port-file',
str(port_file),
'--enable-web-server',
'--plugin', 'proxy.plugin.WebServerPlugin',
'--plugin',
'proxy.plugin.WebServerPlugin',
'--enable-reverse-proxy',
'--plugin', 'proxy.plugin.ReverseProxyPlugin',
'--num-acceptors', '3',
'--num-workers', '3',
'--ca-cert-dir', str(ca_cert_dir),
'--log-level', 'd',
'--plugin',
'proxy.plugin.ReverseProxyPlugin',
'--rewrite-host-header',
'--num-acceptors',
'3',
'--num-workers',
'3',
'--ca-cert-dir',
str(ca_cert_dir),
'--log-level',
'd',
) + tuple(request.param.split())
proxy_proc = Popen(proxy_cmd, stderr=subprocess.STDOUT)
# Needed because port file might not be available immediately
Expand Down
15 changes: 10 additions & 5 deletions tests/integration/test_integration.sh
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,8 @@
# For github action, we simply bank upon GitHub
# to clean up any background process including
# proxy.py
#
set -x

PROXY_PY_PORT=$1
if [[ -z "$PROXY_PY_PORT" ]]; then
Expand Down Expand Up @@ -164,8 +166,14 @@ cat downloaded2.whl | $SHASUM -c downloaded2.hash
VERIFIED5=$?
rm downloaded2.whl downloaded2.hash

# Without --rewrite-host-header we will receive localhost:<port> as host header back in response
# read -r -d '' REVERSE_PROXY_RESPONSE << EOM
# "localhost:$PROXY_PY_PORT"
# EOM

# With --rewrite-host-header we will receive httpbingo.org as host header back in response
read -r -d '' REVERSE_PROXY_RESPONSE << EOM
"localhost:$PROXY_PY_PORT"
"httpbingo.org"
EOM

echo "[Test Reverse Proxy Plugin]"
Expand All @@ -174,8 +182,5 @@ RESPONSE=$($CMD 2> /dev/null)
verify_contains "$RESPONSE" "$REVERSE_PROXY_RESPONSE"
VERIFIED6=$?

# FIXME: VERIFIED6 NOT ASSERTED BECAUSE WE STARTED GETTING EMPTY RESPONSE FROM UPSTREAM
# AFTER CHANGE FROM HTTPBIN TO HTTPBINGO. This test works and passes perfectly when
# run from a local system
EXIT_CODE=$(( $VERIFIED1 || $VERIFIED2 || $VERIFIED3 || $VERIFIED4 || $VERIFIED5 ))
EXIT_CODE=$(( $VERIFIED1 || $VERIFIED2 || $VERIFIED3 || $VERIFIED4 || $VERIFIED5 || $VERIFIED6 ))
exit $EXIT_CODE
Loading