google / inverting-proxy

Reverse proxy that inverts the direction of traffic

License: Apache License 2.0

Makefile 0.57% Go 98.22% Dockerfile 1.21%

inverting-proxy's Introduction

Inverting Proxy and Agent

This repository defines a reverse proxy that inverts the direction of traffic between the proxy and the backend servers.

That design makes configuring and hosting backends much simpler than what is required for traditional reverse proxies:

The communication channel between the proxy and backends can be secured using the SSL certificate of the proxy. This removes the need to set up SSL certificates for each backend.
The backends do not need to be exposed to incoming requests from the proxy. In particular, the backends can run in different networks without needing to expose them to incoming traffic from the internet.

Disclaimer

This is not an official Google product

Background

A reverse proxy is a server that forwards requests to one or more backend servers.

Traditionally, this has been done by configuring the reverse proxy with the address of each backend, and then having it send a request to a backend for each request it receives from a client.

That meant each backend had to be able to receive incoming requests from the proxy. This implied one of two choices.

Place the backend servers on a private network shared with the proxy.
Place the backend servers on the public internet.

The first choice requires the proxy and backends to be controlled by the same hosting provider, while the second requires the backends to handle all of the same overhead required for any internet hosting (SSL termination, protection against DDoS attacks, etc).

This project aims to limit the overhead of hosting a public server to just the proxy while still allowing anyone to run a backend.

The details of how this is accomplished are outlined below.

Design

This project defines two components:

Something that we are calling the "Inverting Proxy"
An agent that runs alongside a backend and forwards requests to it

The inverting proxy receives incoming requests and forwards those requests to the appropriate backend, via the agent.

Since neither the backend nor the agent will be accessible from the public internet, requests are not directly forwarded to the backend. Instead, the direction of that traffic is inverted. The agent sends a request to the inverting proxy, the proxy waits for incoming client requests, and then the proxy responds to the request from the agent with those client requests in the response body.

The agent then forwards those requests to the backend, takes the responses that it gets back, and sends them as a request to the inverting proxy.

Finally, the inverting proxy extracts the backend's response from the agent's request and sends it as the response to the original client request.

An example request flow looks like this:

+--------+         +-------+         +-------+         +---------+
| Client |         | Proxy | <-(1)-- | Agent |         | Backend |
|        | --(2)-> |       |         |       |         |         |
|        |         |       | --(3)-> |       |         |         |
|        |         |       |         |       | --(4)-> |         |
|        |         |       |         |       | <-(5)-- |         |
|        |         |       | <-(6)-- |       |         |         |
|        | <-(7)-- |       |         |       |         |         |
|        |         |       | --(8)-> |       |         |         |
+--------+         +-------+         +-------+         +---------+

The important thing to note is that the request from the client (#2) matches the request to the backend (#4) and the response from the backend (#5) matches the response to the client (#7).
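The numbered flow above can be sketched as a small in-process program. This is only an illustration under stated assumptions, not the repository's implementation: the /agent/pull and /agent/respond paths, the X-Request-ID header, and the names proxy, parked, agentOnce and roundTrip are all hypothetical, and the agent handles a single request instead of polling in a loop.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"sync"
)

// parked is a client request waiting at the proxy for the backend's response.
type parked struct {
	id, path string
	reply    chan string
}

// proxy queues client requests and hands them to the agent when it asks.
type proxy struct {
	mu      sync.Mutex
	nextID  int
	queue   chan *parked
	waiting map[string]*parked
}

func (p *proxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	switch r.URL.Path {
	case "/agent/pull": // (1) and (3): answer the agent's request with a client request
		req := <-p.queue
		w.Header().Set("X-Request-ID", req.id)
		fmt.Fprint(w, req.path)
	case "/agent/respond": // (6): the agent posts the backend's response
		body, _ := io.ReadAll(r.Body)
		id := r.URL.Query().Get("id")
		p.mu.Lock()
		req := p.waiting[id]
		delete(p.waiting, id)
		p.mu.Unlock()
		if req == nil {
			http.NotFound(w, r)
			return
		}
		req.reply <- string(body)
	default: // (2) and (7): park the client request until the agent replies
		p.mu.Lock()
		p.nextID++
		req := &parked{id: strconv.Itoa(p.nextID), path: r.URL.Path, reply: make(chan string, 1)}
		p.waiting[req.id] = req
		p.mu.Unlock()
		p.queue <- req
		fmt.Fprint(w, <-req.reply)
	}
}

// agentOnce does one round: pull a client request, forward it, post the result back.
func agentOnce(proxyURL, backendURL string) error {
	pulled, err := http.Get(proxyURL + "/agent/pull")
	if err != nil {
		return err
	}
	path, _ := io.ReadAll(pulled.Body)
	pulled.Body.Close()
	fromBackend, err := http.Get(backendURL + string(path)) // (4) and (5)
	if err != nil {
		return err
	}
	defer fromBackend.Body.Close()
	id := pulled.Header.Get("X-Request-ID")
	ack, err := http.Post(proxyURL+"/agent/respond?id="+id, "text/plain", fromBackend.Body)
	if err != nil {
		return err
	}
	return ack.Body.Close()
}

// roundTrip starts a backend, a proxy, and one agent pass, then acts as the client.
func roundTrip(path string) (string, error) {
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "backend saw %s", r.URL.Path)
	}))
	defer backend.Close()
	front := httptest.NewServer(&proxy{queue: make(chan *parked, 16), waiting: map[string]*parked{}})
	defer front.Close()
	agentErr := make(chan error, 1)
	go func() { agentErr <- agentOnce(front.URL, backend.URL) }()

	resp, err := http.Get(front.URL + path) // (2)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	return string(body), <-agentErr
}

func main() {
	fmt.Println(roundTrip("/hello"))
}
```

Only the agent opens connections to the proxy; the proxy never dials the agent or the backend. A real agent would also carry the request method, headers and body, and would handle the case where the backend is unreachable, which this sketch leaves out.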

Components

There are two components required for the inverting proxy to work:

The proxy itself, which must be accessible from the public internet.
The forwarding agent, which must be able to forward requests to the backend.

Inverting Proxy

The inverting proxy serves the same role as a reverse proxy, but additionally inverts the direction of traffic to the agent.

There are two different versions of the inverting proxy: one that runs in App Engine, and one that runs as a stand-alone binary. The App Engine version requires incoming requests be authenticated via the Users API.

Both versions of the proxy are written in Go. The source code for the App Engine version is under the 'app' subdirectory, and the source code for the stand-alone version is under the 'server' subdirectory.

Forwarding Agent

The agent receives the inverted traffic from the proxy and inverts it again before forwarding it to the backend. This ensures that the requests from the client match the requests to the backend and that the responses from the backend match the responses to the client.

The forwarding agent is a standalone binary written in Go, and its source code is under the 'agent' subdirectory.

Usage

Prerequisites

First, create a Google Cloud Platform project that will host your proxy, and ensure that you have the Google Cloud SDK installed on your machine.

Setup

Save the ID of your project in the environment variable PROJECT_ID, and then run the command

make deploy

This will deploy 3 App Engine services to your project (one for the proxy, one for agents to contact, and one that implements an API for managing proxy endpoints).

Registering Backends

Before you can connect a backend server to your proxy, you must use the proxy's admin API to create a record of that backend. This is just a simple HTTP REST API with create, list, and delete operations. There is no client tool provided for the admin API, but you can access it using curl.

To list the current backends:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    https://api-dot-${PROJECT_ID}.appspot.com/api/backends

To create a backend:

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -d "${BACKEND_RECORD}" \
    https://api-dot-${PROJECT_ID}.appspot.com/api/backends

... where "${BACKEND_RECORD}" is a JSON object with the following fields:

id: An arbitrary name for the backend
endUser: The email address of the end user connecting to that backend. The email address must be for a Google account. Alternatively, you can use the special string "allUsers" to make your server public.
backendUser: The email address of the account running the agent. This should be the account listed as "active" when you run gcloud auth list on the machine where the agent runs.
pathPrefixes: This specifies the list of URL paths served by the backend. To match all requests, use ["/"].
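As an illustration, the record can be built and serialized from Go. The struct and function names here are assumptions made for this sketch (only the JSON field names come from the list above), and the values are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BackendRecord mirrors the JSON fields listed above. The Go type name is
// hypothetical and not taken from the repository.
type BackendRecord struct {
	ID           string   `json:"id"`
	EndUser      string   `json:"endUser"`
	BackendUser  string   `json:"backendUser"`
	PathPrefixes []string `json:"pathPrefixes"`
}

// encodeRecord returns the JSON text to pass as ${BACKEND_RECORD}.
func encodeRecord(r BackendRecord) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	record, err := encodeRecord(BackendRecord{
		ID:           "my-backend",
		EndUser:      "allUsers", // makes the backend public
		BackendUser:  "agent@example.com",
		PathPrefixes: []string{"/"}, // match all requests
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(record)
}
```

The printed string can be exported as BACKEND_RECORD before running the create command.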

Finally, to delete a backend:

curl -X DELETE \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    https://api-dot-${PROJECT_ID}.appspot.com/api/backends/${BACKEND_ID}

Running Backends

Once you have created the record for your backend using the API, you can run the agent alongside your server with the following command:

docker run --net=host --rm -it \
    -v "${HOME}/.config:/root/.config" \
    --env="PORT=${PORT}" \
    --env="BACKEND=${BACKEND_ID}" \
    --env="PROXY=https://agent-dot-${PROJECT_ID}.appspot.com/" \
    gcr.io/inverting-proxy/agent

You can then access your backend by visiting https://${PROJECT_ID}.appspot.com

Dockerfile

There is a Dockerfile in the agent directory that can be used to build a Docker image for the agent. That Docker image contains the agent binary along with the following third-party packages installed via apt-get:

curl
git
ca-certificates

Limitations

Currently, the App Engine version of the inverting proxy only supports HTTP requests. In particular, websockets are not supported, so you have to use an adapter like socket.io if you want to use the inverting proxy with a service that requires websockets.

Additionally, the proxy agent will not work in combination with IAP, as it does not currently support using OIDC tokens to authenticate against the proxy.

inverting-proxy's Issues

make test is failing

Cmd: make test

Error in output
vet: agent/utils/utils_test.go:180:7: undeclared name: bytes
make: *** [Makefile:35: vet] Error 2

After adding an import for bytes in utils_test.go, one test is failing

--- FAIL: TestReadRequestWithRetries (0.00s)
    utils_test.go:172: Error status while reading "request" from the proxy
2022/07/21 11:57:21 suppressing panic for copyResponse error in test; copy error: io: read/write on closed pipe
FAIL
FAIL	github.com/google/inverting-proxy/agent/utils	0.059s
FAIL
make: *** [Makefile:9: test] Error 1

I believe this is a recent regression from the changes in #98.

`Invalid IAP credentials: Unable to parse JWT` from readme `curl` cmds

Running readme curl (e.g. list backends) commands returns:

Invalid IAP credentials: Unable to parse JWT

The issue seems to be the token returned by gcloud auth print-access-token. When I retrieve a token using the instructions from https://cloud.google.com/iap/docs/authentication-howto, it works fine. Am I doing something wrong, or is the readme outdated?

Cache cookies agent-side.

Hello,

we need to run third-party software that uses cookies in Vertex AI Workbench. It turns out that the Google-managed proxies listed here are stripping Set-Cookie headers in some way. I also checked that the inverting-proxy image is used as the proxy-agent on the VM provisioned by Vertex.

I have two questions about this:

Are there any known solutions or workarounds that would allow us to pass cookies through the proxy?
If not, what approach would you recommend? My current solution is the following:

Add another lru.Cache that stores cookies per URL and another pointer. In my case I'm using a parametrized path in headers like DATALAB_TUNNEL_TOKEN, which is unique per user.
Restore the cache in a similar way to what sessionResponseWriter does.

All of the above is derived only from looking at the attempt-register-vm-on-proxy.sh script; if anything is not true, please correct me.

Would be happy to contribute with a bit of guidance!

The first service (module) you upload to a new application must be the 'default' service

I get the following error when running make deploy

INVALID_ARGUMENT: The first service (module) you upload to a new application must be the 'default' service (module). Please upload a version of the 'default' service (module) before uploading a version for the 'agent' service (module)

It looks like the deploy command needs to deploy the services in a specific order, which is not currently handled:

gcloud app deploy --project "${PROJECT_ID}" --version v1 ./app/*.yaml

agent: Add support for wrapping/consolidating http-only cookies from backends

Since the proxy can forward to arbitrary backend servers, those servers can set arbitrary cookies.

For proxy administrators who do not wish to allow arbitrary cookies to be set by the backend server, it might be useful to intercept such cookies and replace them by one controlled by the administrator.

My thinking for how this could work is for the agent to maintain an LRU cache mapping opaque session IDs to cookie jars.

Then, when an end-user request comes in, the agent looks up the session ID from that request's cookies (if it has one), and then adds any applicable cookies from the session's cookie jar to the request before forwarding it to the backend.

If the request did not have a session cookie, then the agent would add a Set-Cookie header on the response to add one.

Finally, any Set-Cookie response headers from the backend would be removed from the response before forwarding it to the end-user, and instead such cookies would be set in the cookie jar maintained by the agent for that session.

This approach would only work for HTTP-only cookies, as the intercepted cookies would not be accessible to client-side Javascript. As such, we should also include a flag to control whether we also intercept other cookies (effectively making them HTTP-only) or let them through.

The added session ID cookie should be marked as Secure, HTTP-only, and have a flag-controlled expiry.
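A minimal sketch of the proposed session cache follows. It is an illustration of the idea only, not the agent's code: the names sessionCache, captureCookies and restoreCookies are hypothetical, the cache is not safe for concurrent use, and it ignores cookie path, domain and expiry matching, which a real implementation would need in order to decide which cookies are "applicable".

```go
package main

import (
	"container/list"
	"fmt"
	"net/http"
)

// session holds the backend cookies captured for one opaque session ID.
type session struct {
	id      string
	cookies map[string]*http.Cookie // keyed by cookie name
}

// sessionCache is a small LRU cache from session IDs to cookie jars.
type sessionCache struct {
	capacity int
	order    *list.List               // front = most recently used
	entries  map[string]*list.Element // session ID -> element holding *session
}

func newSessionCache(capacity int) *sessionCache {
	return &sessionCache{capacity: capacity, order: list.New(), entries: map[string]*list.Element{}}
}

// get returns the session for id, creating it and evicting the least
// recently used session when the cache is over capacity.
func (c *sessionCache) get(id string) *session {
	if el, ok := c.entries[id]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*session)
	}
	s := &session{id: id, cookies: map[string]*http.Cookie{}}
	c.entries[id] = c.order.PushFront(s)
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.entries, oldest.Value.(*session).id)
	}
	return s
}

// captureCookies moves Set-Cookie headers from a backend response into the
// session, so the end user never sees them.
func captureCookies(s *session, resp *http.Response) {
	for _, ck := range resp.Cookies() {
		s.cookies[ck.Name] = ck
	}
	resp.Header.Del("Set-Cookie")
}

// restoreCookies adds the stored cookies to a request bound for the backend.
func restoreCookies(s *session, req *http.Request) {
	for _, ck := range s.cookies {
		req.AddCookie(&http.Cookie{Name: ck.Name, Value: ck.Value})
	}
}

// demo captures one backend cookie and replays it on the next request.
func demo() string {
	cache := newSessionCache(2)
	resp := &http.Response{Header: http.Header{}}
	resp.Header.Add("Set-Cookie", "token=abc; HttpOnly")
	captureCookies(cache.get("session-1"), resp)

	req, _ := http.NewRequest("GET", "http://backend.invalid/", nil)
	restoreCookies(cache.get("session-1"), req)
	return req.Header.Get("Cookie") + "|" + resp.Header.Get("Set-Cookie")
}

func main() {
	fmt.Println(demo())
}
```

The part before the "|" is the Cookie header the backend would receive, and the empty part after it shows that the Set-Cookie header was removed from the response going back to the end user.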

[feature] reduce proxy-agent image size

https://github.com/google/inverting-proxy/blob/master/agent/Dockerfile

Problem

The proxy agent image contains a full Go development environment, so

the image is over 1 GB in size
it can carry unnecessary vulnerabilities.

Proposal

We can optimize it by building the Go binary and putting it in a distroless base image. This can reduce the image size to under 100 MB.

TODOs:

use go modules to manage dependencies
refactor the dockerfile, use distroless as base image

Add support for Go Modules

https://github.com/golang/go/wiki/Modules
https://blog.golang.org/using-go-modules
https://blog.golang.org/migrating-to-go-modules

Need Clarification: Can an agent talk to multiple backends?

Hi @ojarjur ,
I was looking at this project and stumbled upon a doubt. In the https://github.com/google/inverting-proxy#running-backends it is mentioned that

you can run the agent alongside your server with the following command...

This suggests that there is a 1:1 mapping between agent and backend(server). Is this assumption correct or can there be an agent talking to multiple backends?
It would be really helpful if you could clarify this doubt.

Thanks

Set a timeout in the HTTP client used by the agent.

On this line, we should add something like client.Timeout = 60 * time.Second
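A minimal sketch of that suggestion, assuming the client is built in one place; the newProxyClient helper and the proxyTimeout constant are hypothetical names. Note that http.Client.Timeout covers the whole exchange, including reading the response body, so if the agent keeps a request open while waiting for work, the value has to be longer than that wait.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// proxyTimeout is the value suggested in the issue above.
const proxyTimeout = 60 * time.Second

// newProxyClient returns a client whose requests fail instead of hanging
// forever when the proxy stops responding.
func newProxyClient() *http.Client {
	return &http.Client{Timeout: proxyTimeout}
}

func main() {
	fmt.Println(newProxyClient().Timeout)
}
```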

API: Validate Backend Attributes During Creation

Posting to /api/backends with an empty json object leads to the successful creation of a backend record.

curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -d "{}" \
    https://api-dot-$PROJECT_ID.appspot.com/api/backends

Retrieving the list of backends results in a list of backend objects with only the lastUsed key defined.

$ curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    https://api-dot-$PROJECT_ID.appspot.com/api/backends

[{"lastUsed":"1970-01-01T00:00:00Z"},{"lastUsed":"1970-01-01T00:00:00Z"}]%

The id, endUser, backendUser, pathPrefixes keys should be validated for presence before storing into the db. This will prevent indelible backend records from being created because of the absence of an id.
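The presence check could look roughly like the following. This is a sketch under assumptions: the backendRecord type and the validateBackend function are hypothetical names, and only the four JSON keys come from the issue text.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// backendRecord holds the four keys the issue asks to validate.
type backendRecord struct {
	ID           string   `json:"id"`
	EndUser      string   `json:"endUser"`
	BackendUser  string   `json:"backendUser"`
	PathPrefixes []string `json:"pathPrefixes"`
}

// validateBackend rejects a request body that is not valid JSON or that
// leaves any required key empty.
func validateBackend(data []byte) error {
	var rec backendRecord
	if err := json.Unmarshal(data, &rec); err != nil {
		return fmt.Errorf("invalid backend JSON: %w", err)
	}
	var missing []string
	if rec.ID == "" {
		missing = append(missing, "id")
	}
	if rec.EndUser == "" {
		missing = append(missing, "endUser")
	}
	if rec.BackendUser == "" {
		missing = append(missing, "backendUser")
	}
	if len(rec.PathPrefixes) == 0 {
		missing = append(missing, "pathPrefixes")
	}
	if len(missing) > 0 {
		return fmt.Errorf("missing required fields: %s", strings.Join(missing, ", "))
	}
	return nil
}

func main() {
	// The empty object from the curl example above is rejected.
	fmt.Println(validateBackend([]byte(`{}`)))
}
```

A create handler would call this before writing to the datastore and return a 400 response when it reports an error.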

Gracefully shutdown the agent

Add a new optional feature (enabled via a flag) for gracefully shutting down the agent.
During the graceful shutdown window, the agent should stop pulling new requests and drain the ones it has already pulled or is still processing. In other words, the agent should act in a "lame duck" mode once it receives a shutdown signal from the OS.
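One possible shape for this, sketched with a channel standing in for "pulling requests from the proxy". The names runUntilShutdown and demo are hypothetical, and the five-second grace period is an arbitrary placeholder for the flag-controlled window.

```go
package main

import (
	"context"
	"fmt"
	"os/signal"
	"sync"
	"sync/atomic"
	"syscall"
	"time"
)

// runUntilShutdown starts each pulled request in its own goroutine until ctx
// is cancelled (or the source closes), then waits up to grace for in-flight
// requests to finish. It reports whether everything drained in time.
func runUntilShutdown(ctx context.Context, requests <-chan func(), grace time.Duration) bool {
	var inFlight sync.WaitGroup
pulling:
	for {
		select {
		case <-ctx.Done(): // lame-duck mode: stop pulling new requests
			break pulling
		case handle, ok := <-requests:
			if !ok {
				break pulling
			}
			inFlight.Add(1)
			go func() {
				defer inFlight.Done()
				handle()
			}()
		}
	}
	drained := make(chan struct{})
	go func() {
		inFlight.Wait()
		close(drained)
	}()
	select {
	case <-drained:
		return true
	case <-time.After(grace):
		return false
	}
}

// demo feeds three fake requests and then closes the source. Sending
// SIGINT or SIGTERM to the process would end the pulling loop early instead.
func demo() (int32, bool) {
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer stop()

	var handled int32
	requests := make(chan func())
	go func() {
		defer close(requests)
		for i := 0; i < 3; i++ {
			select {
			case requests <- func() { atomic.AddInt32(&handled, 1) }:
			case <-ctx.Done():
				return
			}
		}
	}()
	ok := runUntilShutdown(ctx, requests, 5*time.Second)
	return atomic.LoadInt32(&handled), ok
}

func main() {
	fmt.Println(demo())
}
```

When the grace period runs out before the in-flight requests finish, the function returns false so the caller can log that some requests were abandoned.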

agent: Normalize URLs

We should normalize all URLs before sending requests.

Currently, an extra empty path segment in the URL can cause problems if the request method is rewritten.
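A sketch of one way to normalize, assuming that collapsing empty path segments is the goal; normalizeURL is a hypothetical helper, not a function from the agent. Be aware that path.Clean also resolves "." and ".." segments, and that clearing RawPath discards any distinction between an encoded "%2F" and a literal "/", so this is only suitable where those differences do not matter.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// normalizeURL collapses repeated slashes in the path while keeping the
// scheme, host, query, and any trailing slash.
func normalizeURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	keepSlash := strings.HasSuffix(u.Path, "/")
	cleaned := path.Clean("/" + u.Path)
	if keepSlash && cleaned != "/" {
		cleaned += "/" // path.Clean drops the trailing slash
	}
	u.Path, u.RawPath = cleaned, ""
	return u.String(), nil
}

func main() {
	fmt.Println(normalizeURL("https://proxy.example.com//api//backends/?x=1"))
}
```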

Repeating banner in Safari and Firefox

Opening links from the iframe adds a new banner for each request.

Make the agent support connecting to backends over SSL

Right now, the agent assumes that it connects to the backend server over HTTP (without SSL). That is normally the case since the agent is usually run on the same machine as the backend, but there is no reason to force that to be the case.

Ideally, we would allow the user to specify a full URL for the backend when starting the agent, and if that URL includes a scheme then we would honor it.
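The scheme handling could be sketched as follows. The resolveBackend helper is hypothetical, and checking for "://" rather than relying on url.Parse alone is a deliberate choice: url.Parse treats a bare "localhost:8080" as having the scheme "localhost".

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// resolveBackend honors an explicit scheme in the backend address and falls
// back to plain HTTP, which is the current behavior, when none is given.
func resolveBackend(addr string) (string, error) {
	if !strings.Contains(addr, "://") {
		addr = "http://" + addr
	}
	u, err := url.Parse(addr)
	if err != nil {
		return "", err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", fmt.Errorf("unsupported backend scheme %q", u.Scheme)
	}
	return u.String(), nil
}

func main() {
	fmt.Println(resolveBackend("localhost:8080"))
	fmt.Println(resolveBackend("https://localhost:8443"))
}
```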

This would require changing (at least) the code here that hard-codes the scheme as "http"

Request headers for websocket requests are getting stripped

Currently, the websocket shim does not pass any request headers through for the initial websocket connection. It should pass them through so that the backend web server can use them.

App deployment fails - compute/metadata requires [email protected]: missing go.sum entry

Get the below error when deploying default app
gcloud app deploy --project "${PROJECT_ID}" --version v1 ./app/app.yaml

Beginning deployment of service [default]...
╔════════════════════════════════════════════════════════════╗
╠═ Uploading 0 files to Google Cloud Storage                ═╣
╚════════════════════════════════════════════════════════════╝
File upload done.
Updating service [default]...failed.
ERROR: (gcloud.app.deploy) Error Response: [9] Cloud build <build-id> status: FAILURE
go: cloud.google.com/go/compute/[email protected] requires
        cloud.google.com/go/[email protected]: missing go.sum entry; to add it:
        go mod download cloud.google.com/go/compute

Fix golint errors

I apparently forgot to lint this code before committing it.

We need to go through and fix all of the issues reported by running golint ./...

Example of websocket setup (e.g. Jupyter Lab)?

👋 please correct me if I'm wrong, the "AI Notebooks" use inverting-proxy to expose Jupyter Lab which uses websockets, I'm curious if you have an example of config/setup for a backend that uses websockets, e.g. Jupyter Lab? Or could you please shed some more light on how "AI Notebooks" use inverting-proxy please?

Btw, I have seen the comment in the README:

Currently, the App Engine version of the inverting proxy only supports HTTP requests. In particular, websockets are not supported, so you have to use an adapter like socket.io if you want to use the inverting proxy with a service that requires websockets.

agent: Error out if an agent request to the proxy results in a redirect that changes the request method

Currently, if the agent is created with a proxy URL that results in redirects, then the requests to list and read pending requests will work, but any attempts to post back the response will fail.

This is because the response is sent to the proxy with a POST request, but the redirect will cause the request method to change from POST to GET.

This is especially problematic because it is silently done. The agent thinks it is sending a POST but the proxy thinks it is getting a GET, and neither side logs anything about this change.

Ideally, we would not change the method on a redirect, but at the very least we should detect if the method has changed and report an error if that happens.

It looks like we can implement the detect-and-error approach by defining a non-nil CheckRedirect field in our http.Client struct. We can define one that compares the method of the new request against the method of the old request(s), and return an error if they do not match.

We would need to define that field on the client used for sending requests to the proxy, which is defined here
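The detect-and-error approach can be sketched like this. The function names are hypothetical and the test server exists only to trigger a redirect; the ten-redirect cap is included because setting CheckRedirect replaces the client's default policy, which stops after ten consecutive requests.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// rejectMethodChange is a CheckRedirect hook that fails when following a
// redirect would change the request method (for example POST to GET).
func rejectMethodChange(req *http.Request, via []*http.Request) error {
	if len(via) >= 10 {
		return fmt.Errorf("stopped after %d redirects", len(via))
	}
	if orig := via[0].Method; req.Method != orig {
		return fmt.Errorf("redirect would change method from %s to %s", orig, req.Method)
	}
	return nil
}

// postThroughRedirect posts to a local server that answers with a 302 and
// returns the error reported by the client.
func postThroughRedirect() error {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/old" {
			http.Redirect(w, r, "/new", http.StatusFound)
			return
		}
		fmt.Fprint(w, r.Method)
	}))
	defer srv.Close()

	client := &http.Client{CheckRedirect: rejectMethodChange}
	resp, err := client.Post(srv.URL+"/old", "text/plain", strings.NewReader("backend response"))
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

func main() {
	fmt.Println(postThroughRedirect())
}
```

With the hook in place the POST fails loudly instead of silently arriving at the proxy as a GET.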

Is Google App Engine required?

The Usage section states that GAE is a prerequisite, but IIUC it is just an example usage with GAE. It would be nice if we could be more explicit, and provide instructions on running the inverting proxy in non-GAE environments.

`make deploy` fails to properly deploy working app

👋 I'm very excited about inverting-proxy, thanks for open sourcing it. I would appreciate some deployment guidance. I had some trouble deploying to GAE, as mentioned in #91. Even though fixing that allowed me to deploy to GAE (using make deploy), the apps are not actually deployed properly. In the logs I can see:

/bin/sh: 1: exec: main: Permission denied

I unfortunately don't have a lot of experience with GAE or Go, and it's a bit hard to debug GAE without access to the VMs. After some reading and trial and error, I could work around the issue by adding a main akin to https://cloud.google.com/appengine/docs/standard/go/reference/services/bundled/latest#google_golang_org_appengine_v2_Main

Any help would be greatly appreciated.

Env:

~ gcloud version
Google Cloud SDK 344.0.0
alpha 2021.06.04
app-engine-go 1.9.72
app-engine-python 1.9.98
beta 2021.06.04
bq 2.0.69
core 2021.06.04
gsutil 4.62

~ go version
go version go1.13.9 linux/amd64

Binary websocket messages come as text

I ran into a problem trying to get certain IPywidgets to work under the proxy on Google Cloud. For instance, the following notebook renders fine on a custom VM with an nginx proxy on GCE, but not on an AI Platform notebook with the pre-configured inverting proxy: https://github.com/seidlr/voila-interactive-football-pitch/blob/master/Interactive-Football-Pitch.ipynb.

With the inverting proxy, the browser treats binary WebSocket or poll messages as ordinary text messages, which results in a JSON parsing error, with or without the shim.