Wednesday, February 6, 2013

EZproxy wish list: Embrace the platform

One of EZproxy's greatest strengths is its simplicity.

One of EZproxy's greatest weaknesses is its simplicity.

How can that be?  Put simply, by trying to abstract out the underlying platform, EZproxy has to be more than it needs to be.

EZproxy is distributed as a single binary, statically linked, with a self-extracting function to unpack a directory structure, a startup script that is a simple wrapper for the downloaded binary, and even a sample configuration that demonstrates how to get the server up and running within a few minutes of the download finishing.

That sounds great, right?  A one-stop shop.  Download, install, run, done.

By taking this stance, EZproxy does not embrace the platform it runs on, so it neither benefits from system services nor reaps the rewards of following standard conventions. So what's missing?

Log rotation, for one.  Most (all?) Linux systems use logrotate to handle rotation, compression, and retention of log files.  Installed software drops a file into /etc/logrotate.d, and everything else is taken care of automatically.  The closest you can come to this in EZproxy is the LogFile directive with the -strftime option.  This lets you create a separate log file per time period, but it does not address compression, rotation, or retention.
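To make the logrotate idea concrete, here is a rough sketch of a shell function that writes the kind of drop-in file a packaged EZproxy could ship.  The log path, the 30-file retention, and the function name are my own assumptions, not EZproxy defaults.  I picked copytruncate so that the daemon does not need to be signalled to reopen its log; whether that is the best fit for EZproxy is also an assumption.

```shell
#!/bin/sh
# Sketch: generate a logrotate drop-in for EZproxy.
# The destination, log glob, and retention count are assumptions.
write_ezproxy_logrotate() {
    dest="$1"        # e.g. /etc/logrotate.d/ezproxy
    logglob="$2"     # e.g. /var/log/ezproxy/*.log
    # Unquoted heredoc: $logglob is substituted, but no pathname
    # expansion happens inside a heredoc, so the glob is written as-is.
    cat > "$dest" <<EOF
$logglob {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    copytruncate
}
EOF
}
```

Calling it as write_ezproxy_logrotate /etc/logrotate.d/ezproxy '/var/log/ezproxy/*.log' would hand rotation, compression, and retention over to the system.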

Another issue is hidden among the files extracted by EZproxy: the mimetypes file.  This is used mainly for EZproxy's handling of mime types when it is serving files locally from one of the directories under docs.  Why is this an issue?  Because it is a very minimal file that is installed by EZproxy:
text/html                      html htm shtml
text/css                       css
image/gif                      gif
image/jpeg                     jpeg jpg jpe
image/png                      png
image/bmp                      bmp
image/tiff                     tiff tif
application/pdf                pdf
application/x-javascript       js
application/msword             doc
application/vnd.ms-powerpoint  ppt
application/vnd.ms-excel       xls
application/vnd.openxmlformats docx pptx xlsx
application/octet-stream       bin exe
application/zip                zip
audio/mpeg                     mp3
That list probably covers 80% of the files that you are likely to serve, but for the other 20%, you're going to wind up with a MIME type like text/plain, which can cause issues when serving binary files.  I've even seen issues with HTML files that were saved as UTF-8 or UTF-16 because of the editor they were created with; this is more of an issue with the editor, but it still leads to unexpected results.
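If you want to know whether a given extension falls into that other 20%, a small lookup helper is enough.  This is only a sketch: the function name is made up, and skipping "#" comment lines is a habit borrowed from /etc/mime.types rather than something I know EZproxy's file supports.

```shell
#!/bin/sh
# Sketch: print the MIME type a mimetypes-style file lists for an extension.
# Format assumed: type in column 1, extensions in the remaining columns.
# Prints nothing when the extension is not listed.
mime_for_ext() {
    awk -v ext="$2" '
        /^[[:space:]]*#/ { next }    # skip comment lines (assumption)
        { for (i = 2; i <= NF; i++) if ($i == ext) { print $1; exit } }
    ' "$1"
}
```

Running mime_for_ext mimetypes svg against the stock file prints nothing, which tells you that extension is not covered.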

Next there is the startup/shutdown script, which is just a loose wrapper around the EZproxy binary itself.  The issue here is that the return values are not compliant with the Linux Standard Base (LSB).  This means that other tools written to expect certain behavior from the script will not get the values they are expecting, and thus will not be able to manage EZproxy the way other applications can be managed.  The most notable current problem is that the "status" command does not return the expected values for the various states that EZproxy could be in.  Word is this behavior will be fixed in one of the 6.x releases.
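For reference, the LSB defines the exit codes of the "status" action as 0 (running), 1 (dead but PID file exists), 3 (not running), and 4 (status unknown).  Here is a minimal sketch of a check that returns them, assuming EZproxy wrote a PID file at a path you pass in.  The function name and the PID file itself are hypothetical; this is not how the shipped script works.

```shell
#!/bin/sh
# Sketch: LSB-style status check driven by a PID file (hypothetical path).
# Returns 0 running, 1 dead but PID file exists, 3 not running, 4 unknown.
ezproxy_status() {
    pidfile="$1"
    if [ ! -f "$pidfile" ]; then
        return 3                     # no PID file: not running
    fi
    pid=$(cat "$pidfile" 2>/dev/null)
    case "$pid" in
        ''|*[!0-9]*) return 4 ;;     # empty or non-numeric: unknown
    esac
    # kill -0 sends no signal; it only tests that the process exists.
    # It also fails for processes we lack permission to signal.
    if kill -0 "$pid" 2>/dev/null; then
        return 0
    fi
    return 1                         # PID file exists, process is gone
}
```

A monitoring tool or cluster manager that speaks LSB can act on those values without knowing anything about EZproxy.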

And then there is filesystem layout in general.

Let's start with PID file and lock file handling.  In general, a Unix daemon process creates a file containing its process ID (PID), which can later be used to check the health of the process and to send it signals (HUP to reload, TERM to shut down, etc.).  PID files have a designated place on the system (/var/run).  The IPC file (ezproxy.ipc) should be under /var/run as well.  Similarly, lock files (ezproxy.lck) belong in /var/lock.  Why?  Because systems clean up /var/run/* and /var/lock/* when they reboot.  This would solve the issue of EZproxy not being able to start cleanly after a server crash because the lock and IPC files are left on disk.  Work with the system, not against it.
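Until the files live somewhere the system cleans up, a startup script can do the cleanup itself.  The sketch below removes leftover state files only when the PID recorded in a PID file is no longer alive.  The PID file is hypothetical, the function name is mine, and the idea that deleting ezproxy.lck and ezproxy.ipc is safe once the process is gone is an assumption drawn from the crash scenario described above.

```shell
#!/bin/sh
# Sketch: remove stale state files before starting the daemon.
# Usage: clean_stale_state PIDFILE [FILE...]
# Returns 1 and touches nothing if the recorded PID is still alive.
clean_stale_state() {
    pidfile="$1"; shift
    if [ -f "$pidfile" ]; then
        pid=$(cat "$pidfile" 2>/dev/null)
        case "$pid" in
            ''|*[!0-9]*) ;;          # unreadable PID: treat as stale
            *) if kill -0 "$pid" 2>/dev/null; then
                   return 1          # process is running: leave files alone
               fi ;;
        esac
    fi
    rm -f "$pidfile" "$@"
}
```

Something like clean_stale_state ezproxy.pid ezproxy.lck ezproxy.ipc before the start command would cover the reboot-after-crash case.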

Then there is log file location (/var/log), configuration file location (/etc), SSL certificate handling (/etc/pki/tls), and the docs directory (/var).

So at the end, what might things look like?  Something along these lines:

/etc/ezproxy/config.txt
/etc/ezproxy/user.txt
/etc/ezproxy/ezproxy.key
/etc/pki/tls/private/ezproxy.key
/etc/pki/tls/certs/ezproxy.crt
/usr/sbin/ezproxy
/usr/share/ezproxy-<version>/license.txt
/var/log/ezproxy/audit/<auditfiles>
/var/log/ezproxy/<logfiles>
/var/log/ezproxy/<messagefiles>
/var/run/ezproxy/ezproxy.pid
/var/run/ezproxy/ezproxy.ipc
/var/lock/ezproxy.lck
/var/ezproxy/<html files>
/var/ezproxy/docs/limited
/var/ezproxy/docs/loggedin
/var/ezproxy/docs/public

And there you have a daemon that behaves like just about every other piece of software on the system.  This layout lends itself to easy packaging and a fairly straightforward SELinux policy, and it does not violate the principle of least surprise.
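As a taste of how little a package's install step would need to do, here is a sketch that creates the directory skeleton from the layout above under a staging root.  The function name and the staging-root argument are assumptions for illustration; ownership and permissions are left out on purpose.

```shell
#!/bin/sh
# Sketch: create the proposed directory skeleton under a staging root.
# Usage: make_ezproxy_layout /path/to/buildroot
make_ezproxy_layout() {
    root="$1"
    # Refuse to run without a staging root so the live system is untouched.
    [ -n "$root" ] || return 2
    for d in etc/ezproxy etc/pki/tls/private etc/pki/tls/certs usr/sbin \
             var/log/ezproxy/audit var/run/ezproxy var/lock \
             var/ezproxy/docs/limited var/ezproxy/docs/loggedin \
             var/ezproxy/docs/public; do
        mkdir -p "$root/$d" || return 1
    done
}
```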

This structure also opens up the doors for future enhancements.  Consider:

/etc/ezproxy/<virtualhost>/config.txt
/etc/ezproxy/<virtualhost>/user.txt
/etc/ezproxy/<virtualhost>/ezproxy.key

In this alternative layout, a command line option (I was going to suggest "-c", but that is already used for network connectivity checking, so maybe "-C" instead) could be used something like this in a real startup script (adapted from vsftpd):


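        # Start one EZproxy instance per /etc/ezproxy/<virtualhost>/config.txt.
        # "-C" is the proposed option; it does not exist in EZproxy today.
        CONFS=`ls /etc/ezproxy/*/config.txt 2>/dev/null`
        [ -z "$CONFS" ] && exit 6
        for i in $CONFS; do
            # The site name is the directory that holds config.txt.
            site=`basename "$(dirname "$i")"`
            echo -n $"Starting $prog for $site: "
            daemon /usr/sbin/ezproxy -C "$i"
            RETVAL=$?
            [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$prog
            echo
        done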

This structure would allow a single server to run multiple instances of EZproxy, with a unique configuration file per instance, something that you cannot do today without multiple installations of EZproxy itself.  Going down that route today is not an effortless path: you will need to write a custom startup/shutdown script that can handle starting N independent instances of EZproxy, and you will have to maintain N copies of EZproxy.  The proposed structure would let a single EZproxy binary serve multiple sites, so there would be less systems management overhead as well.

Embrace the platform, reap the benefits.
