Wednesday, February 20, 2013

EZproxy Wish List: HTTP Compression Support

While looking at ways to make our EZproxy servers more efficient, I re-discovered something that I already knew, but had been ignoring:

EZproxy strips the Accept-Encoding header from requests, so it requests uncompressed content from the upstream servers and sends uncompressed content to the downstream clients.

One might think that simply adding

HTTPHeader Accept-Encoding

to the proxy configuration would be enough to handle this, and it does fix part of the problem.  This allows the browser's Accept-Encoding header to be passed through to the upstream server, but it is not a complete solution (and can break in certain corner cases):

Client => EZproxy

GET / HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip,deflate,sdch

EZproxy => Server

GET / HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip,deflate,sdch

Server => EZproxy

HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 6202

EZproxy => Client

HTTP/1.1 200 OK
Content-Encoding: none

When EZproxy receives the reply from the upstream server, it decompresses the content so that it can rewrite the content as necessary to keep users from breaking out of the proxy.  The missing step is that EZproxy does not then re-compress the content before sending it back to the user's browser.

Just how big of a deal is this?  Well, on just that one request, the uncompressed content was 26.5 KiB vs. 6 KiB compressed, so the proxy sent about 4.4 times as much data to the client as it needed to (and, without the HTTPHeader line, would pull that much from the server as well).  For fun, ask your IT department what they would do with roughly three quarters of that bandwidth back...
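To make that arithmetic concrete, here is a quick sketch using the two sizes from the exchange above: the 6202-byte Content-Length of the gzip reply and the 26.5 KiB uncompressed body. The variable names are only illustrative.

```python
# Sizes taken from the example exchange in this post.
uncompressed_bytes = 26.5 * 1024   # 26.5 KiB sent on to the client
compressed_bytes = 6202            # Content-Length of the gzip reply

# How many times larger the uncompressed transfer is.
ratio = uncompressed_bytes / compressed_bytes

# Fraction of the transfer that gzip would have saved.
savings = 1 - compressed_bytes / uncompressed_bytes

print(f"{ratio:.1f}x larger uncompressed, {savings:.0%} saved with gzip")
```

That works out to roughly 4.4 times the data, or a bit over three quarters of the transfer saved on this one request.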

So why not just add the HTTPHeader line globally, and at least benefit from the Server => EZproxy compression?  Well, some vendors have tried to be smart and dynamically compress or minify JavaScript on the fly, depending on the client browser's capabilities.  In the cited example, the minify handling was broken, and served out corrupted JavaScript files.

It is not a stretch to think that there may be other issues lurking out there when the server is told that the client can handle something that it will not be given.  Look closely at that Accept-Encoding line from Chrome.  Notice "sdch"?  Yeah, I had to look it up too:  Shared Dictionary Compression over HTTP.  There are a few posts that give an overview of what SDCH is about, but in short, it's a technique for sending a delta between a web page that you have and the web page that the server is getting ready to send.  Think of it like a diff function for HTTP content.

Now, what if the upstream Server supports SDCH and sends back a reply that EZproxy has no idea how to cope with properly?  You're going to get sporadic reports of problems, and it may take a while to narrow down that it's isolated to Chrome users, and maybe even longer to figure out it's SDCH at play.

That's just one example of how blindly passing through Accept-Encoding can go wrong, so I'm not opposed to EZproxy manipulating that header.  All of the mainstream browsers handle gzip encoding, and it's easy enough to support.

There is no good reason that I can think of that EZproxy could not simply filter the Accept-Encoding header to only contain gzip (and maybe even deflate), then decompress the server reply on the fly, apply any content changes to keep the users on the proxy, re-compress the content, and send it on to the client.  Once upon a time, someone might have piped up "CPU Cycles!", but I think that argument is pretty much dead these days thanks to Moore's Law.
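As a rough illustration of that flow, here is a minimal Python sketch. It is not how EZproxy is implemented. The function names, the `ezproxy.example.edu` suffix, and the one-line link rewriting are all assumptions made up for the example. It only shows the order of the steps: filter the header, decompress, rewrite, re-compress.

```python
import gzip

# Codings this hypothetical proxy is willing to decode and re-encode itself.
SUPPORTED_CODINGS = ("gzip",)


def _refused(params):
    """Return True when the client marked a coding with q=0 (not acceptable)."""
    params = params.replace(" ", "").lower()
    if not params.startswith("q="):
        return False
    try:
        return float(params[2:]) == 0.0
    except ValueError:
        return False


def filter_accept_encoding(header_value, supported=SUPPORTED_CODINGS):
    """Keep only the codings the proxy understands, dropping e.g. sdch."""
    kept = []
    for item in header_value.split(","):
        coding, _, params = item.partition(";")
        coding = coding.strip().lower()
        if coding in supported and not _refused(params):
            kept.append(coding)
    return kept


def rewrite_links(html, proxy_suffix):
    """Simplified stand-in for the proxy's URL rewriting (an assumption)."""
    return html.replace("www.example.com", "www.example.com." + proxy_suffix)


def relay_response(body, content_encoding, client_accept_encoding,
                   proxy_suffix="ezproxy.example.edu"):
    """Decompress, rewrite, and re-compress one upstream response body."""
    if content_encoding == "gzip":
        body = gzip.decompress(body)
    rewritten = rewrite_links(body.decode("utf-8"), proxy_suffix).encode("utf-8")

    headers = {}
    # Only re-compress when the client actually offered gzip.
    if "gzip" in filter_accept_encoding(client_accept_encoding):
        rewritten = gzip.compress(rewritten)
        headers["Content-Encoding"] = "gzip"
    headers["Content-Length"] = str(len(rewritten))
    return headers, rewritten
```

With Chrome's header from earlier, `", ".join(filter_accept_encoding("gzip,deflate,sdch"))` gives the value that would be sent upstream, with sdch removed so the server never replies with something the proxy cannot decode.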

With compression support, an 80% decrease in the size of non-graphics content (HTML, JavaScript, CSS, JSON, XML, etc.) is not an unreasonable expectation.  Add in caching support to handle the graphics, and EZproxy could be significantly more bandwidth friendly.

2 comments:

  1. Newer versions of EZproxy have Option AllowSendGZip, which enables gzip between EZproxy => Client.

    1. That is correct. "Option AllowSendGZip" was silently added in the version 6 line sometime prior to October 2015, and received an update to correct some issues with the support when 6.1.10 was released in December 2015.

      One wishlist item down, 7 or so to go.
