Tuesday, March 12, 2013

RHEL 6.4 SELinux user mapping

Normally, Red Hat's release notes do a decent job of explaining what a new feature or change brings with each RHEL minor release, but this time the notes did not do the full scope of the change justice:
SSSD Fully Supported Features
A number of features introduced in Red Hat Enterprise Linux 6.3 are now fully supported in Red Hat Enterprise Linux 6.4. Specifically:
  • support for central management of SSH keys,
  • SELinux user mapping,
  • and support for automount map caching.
The key line here was "SELinux user mapping".  This interacted with IPA to produce some undesirable results, since the IPA installation serving these systems did not have any SELinux user maps defined beyond the default role.

Prior to the upgrade, mappings were not enforced, so the SELinux context was unconfined:
$ id -Z
unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
Post upgrade, the IPA default SELinux user was enforced.  This was not obvious at first, though.  The symptoms were that things that normally worked, didn't: sudo, su, dmesg, looking at log files.  Standard sysadmin diagnostic stuff suddenly stopped working completely.

Trying to dig into this led to very strange results, like not being able to run ls -l /bin/su directly and getting "?" for the su record when running ls -l /bin.  The same kind of strange behavior happened with /var, /root, and a few other key areas.  It behaved a lot like filesystem damage at first, but fsck came back clean.  Logging in as root on the system console did work, though, so that was my first clue.

Because this behavior was so bizarre, I tried a setenforce 0 to rule out SELinux, and all of a sudden things started working for non-root users logged in remotely.  I am used to beating SELinux into submission in a daemon context, but this was the first time I'd run into it as a user.
$ id -Z
guest_u:guest_r:guest_t:s0-s0:c0.c1023
Up until now, there were no SELinux User Maps defined in IPA, so the default mapping to the guest role was being used.  Thus, the first step was to define some non-guest roles to use.

There was not a lot of information about this on the IPA wiki, but I did find a decent writeup about configuring SELinux on Gentoo's site.  This gave me the key pieces I needed to set up the user_u:user_r:user_t and staff_u:staff_r:staff_t mappings for mere mortals and sysadmins, respectively.
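For anyone else walking this path, the maps can be created with the ipa CLI.  A rough sketch, not my exact setup: the rule names are made up, ipausers and admins are just the stock IPA groups (substitute your own), and the MLS ranges follow the usual targeted-policy defaults:

# hypothetical rule names; map ordinary users to user_u, sysadmins to staff_u
ipa selinuxusermap-add mere_mortals --selinuxuser='user_u:s0'
ipa selinuxusermap-add-user mere_mortals --groups=ipausers
ipa selinuxusermap-add sysadmin_staff --selinuxuser='staff_u:s0-s0:c0.c1023'
ipa selinuxusermap-add-user sysadmin_staff --groups=admins
# attach hosts with selinuxusermap-add-host (or a host category) as appropriate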

Amazing what happens when you have permissions to actually run things!

The next step required was to cleanly handle the staff_t to sysadm_t transition for sudo without requiring a separate newrole command to be run.  Thankfully, sudo is SELinux aware, so adding "role=sysadm_r" to the Sudo Rule in IPA that allows sysadmin commands took care of that part.
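From the CLI, that amounts to a one-liner against whatever the sudo rule is called (the rule name here is hypothetical):

ipa sudorule-add-option 'sysadmin_commands' --sudooption='role=sysadm_r'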

Monday, February 25, 2013

I go, you go, we all go for SPNEGO

While working through a web SSO Kerberos authentication issue (SPNEGO), I tried testing Safari and Chrome as well as Firefox to make sure that what I was running into was not a bug in Firefox.

The experience left a lot to be desired.

To be fair, I have been working with FreeIPA, so Firefox was already mostly configured for SPNEGO, since it already had network.negotiate-auth.delegation-uris and network.negotiate-auth.trusted-uris set for my domain.  That is about the only trick to getting Firefox to work with SPNEGO, and when I went to use it against another server in the same REALM, it appeared to be properly sending the correct authentication negotiation headers to the server.
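For anyone starting from scratch, those two about:config preferences really are the whole trick; for a hypothetical example.edu domain they would look something like:

network.negotiate-auth.trusted-uris      .example.edu
network.negotiate-auth.delegation-uris   .example.edu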

Safari has no such settings, since it relies on the Kerberos setup at the OS level.  I have used it with Apache's mod_auth_kerb module on other servers in the same REALM, so I know it basically just works.  For some reason, though, the server was not sending back a 401 authentication challenge, so Safari just may not be supported by this application.  Que sera sera.

On to Chrome.

Oh my!  Chrome requires command line arguments to enable SPNEGO support.  There are no preferences in the UI that you can set.  There is no .plist or .ini or any other kind of file you can edit to cleanly enable it in a persistent manner.  You have to type in this abomination of a command line in a terminal window to run Chrome on a Mac with SPNEGO support:

open '/Applications/Google Chrome.app' --args --auth-server-whitelist="<server>" --auth-negotiate-delegate-whitelist="<server>" --auth-schemes="digest,ntlm,negotiate" https://<server>/


I don't object to using a terminal window; in fact I spend most of my time working in one.  But one would think that Google could come up with a more graceful way to handle that.  And that's not the only time I've had to resort to that for Chrome -- certain developer options require command line switches to enable as well, but I can forgive them -- a little -- in that case.

(This also implies that it will be a cold day in the Valley before Android tablets will have reasonable SPNEGO support.  You can't exactly pass command line options to browsers on tablets without jumping through hoops.  After I get the desktop browsers sorted out, I'll have to see just how bad the situation is on the tablet front.)

Moral of this story: of the three major browsers for the Mac, Firefox has the most mature and least troublesome Kerberos/SPNEGO support of them all.

Friday, February 22, 2013

Crontab and percent signs

It's funny how long you can work with a piece of software, and never run into certain features.  Even the most basic software is not immune to this.

I was recently setting up a cron job to do some processing on the previous day's log file:
/path/to/command --logfile=/path/to/logfile-$(date +'%Y%m%d' -d 'yesterday').log
For the non-UNIX-literate readers, the $() construct says to run the command inside the parentheses and use its output.  In this case, I wanted yesterday's date formatted as YYYYMMDD.

It tested and worked just fine from the command line, but when I created a cron job for it, I found this in my inbox the next day:
/bin/sh: -c: line 0: unexpected EOF while looking for matching ``'
/bin/sh: -c: line 1: syntax error: unexpected end of file
The first thing I thought was that I had missed a "'" character somewhere, but I hadn't.  How odd.

What does cron have to say for itself?
Feb 21 02:00:01 server CROND[17834]: (root) CMD (/path/to/command --logfile=/path/to/logfile-$(date +')
Hmm.  Truncated at the first "%" sign; now why would that happen?  Well, according to the manual page for crontab, the "%" character has special meaning:
Percent-signs (%) in the command, unless escaped with backslash (\), will be changed into newline characters, and all data after the first % will be sent to the command as standard input.
I shudder to think how many years I've been using cron, and have managed to side-step this particular feature.  I guess I've always put date functions like that into scripts, and had cron call the script, so I never had to escape the "%" in the actual crontab before.
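For what it's worth, that script-wrapper habit looks something like this (the paths are made up), and it neatly side-steps the issue because cron never sees the "%" characters at all:

#!/bin/sh
# hypothetical wrapper called from cron; the % signs live here, not in the crontab
LOGFILE="/path/to/logfile-$(date +'%Y%m%d' -d 'yesterday').log"
exec /path/to/command --logfile="${LOGFILE}"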

So now the cron command looks like this:
/path/to/command --logfile=/path/to/logfile-$(date +'\%Y\%m\%d' -d 'yesterday').log
and problem solved.  I had to chuckle to myself, though, because that feature has been around for at least 20 years, and somehow this is the first time I've run into it.

Thursday, February 21, 2013

EZproxy + Squid: Bolting on a caching layer

In an earlier wish list post for native caching support in EZproxy, I stated that the user could easily save 10-20% of their requests to vendor databases if EZproxy natively supported web caching.

I was wrong.

The actual number is closer to double that estimate.

I recently set up a Squid cache confederation upstream from EZproxy, did some testing against Gale and ProQuest databases, and found that the real-world number is between 30-40% savings from adding a caching layer.

This re-validates that the HTTP caching studies done in the late 90s still appear to hold true today:
Journal of the Brazilian Computer Society
"Performance Analysis of WWW Cache Proxy Hierarchies" (Print version ISSN 0104-6500)
J. Braz. Comp. Soc. vol. 5 n. 2, Campinas, Nov. 1998
http://dx.doi.org/10.1590/S0104-65001998000300003

"A Performance Study of the Squid Proxy on HTTP/1.0"
Alex Rousskov / National Laboratory for Applied Network Research
Valery Soloviev / Inktomi Corporation

"Enhancement and Validation of Squid's Cache Replacement Policy"
John Dilley, Martin Arlitt, Stéphane Perret
Internet Systems and Applications Laboratory, HP Laboratories Palo Alto
It was very interesting that, in my limited testing, my results were largely in line with those studies from over a decade ago:
  • 30-40% cache hit rates with a Squid memory-only cache configuration
  • 5-10% improvement in cache hit ratio by just adding one peer cache
This, despite all of the technology changes that have become commonplace thanks to Web 2.0, none of which existed back when these studies were originally done.

I opted to not configure disk-based storage for the cache for this test, but I may re-visit that at some point in the future, given that Rousskov and Soloviev were reporting nearly 70% hit ratios in their study.

Disk-based storage for the cache deserves a look, but my initial expectation is that in an academic library search setting, one is unlikely to achieve a hit ratio greater than 40%, simply due to the nature of the web sites being used.  Some of the things that will prevent a higher ratio include:
  • Search term auto completion using AJAX calls
  • The search results themselves
  • Search filtering and refinement
In a general purpose library setting, a proxy may be able to achieve higher ratios as patrons go to the same sets of web sites for news, job postings, social networks, etc.  In an academic setting, though, with patrons executing individual searches, I am not convinced that achieving the higher cache hit ratios is a reasonable expectation.

The working set of cached objects between Gale and ProQuest was approximately 90MB, so it fit well within the 256MB memory cache size Squid uses by default.  With that workload, the only thing a disk cache could be expected to do is re-populate the in-memory copy when the server is restarted.  The cache will be quickly primed after only a few requests anyway, so it's not the same situation as a busy cache that may have gigabytes of data stored on disk.
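For the curious, the Squid side of this test was not much more than a memory-only cache with a sibling peer.  A rough sketch of the relevant squid.conf lines, not my exact configuration (the peer hostname is a placeholder, and this assumes Squid 3.x, where omitting cache_dir leaves you with a memory-only cache):

# memory-only cache with one ICP sibling
cache_mem 256 MB
cache_peer peer-cache.example.edu sibling 3128 3130 proxy-only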

Another interesting behavior I observed: even though the working set could be fully held in either cache's memory, over time one of the peer caches would hold a subset of objects until they expired, and then the other cache would pick up the baton, refresh the objects, and serve the newly refreshed copies to the cache cluster.  Wash, rinse, repeat, and you start to see a pendulum pattern as the fresh content moves between the cache peers, with ICP requests fulfilling requests from the peer before doing the long haul to the origin server.

Even a 30-40% cache hit rate is nothing to downplay, though.  That is a significant bandwidth (and to a certain extent time) savings, and given that EZproxy does not support HTTP compression, this may be the best that can be hoped for in the short term.

Wednesday, February 20, 2013

EZproxy Wish List: HTTP Compression Support

While looking at ways to make our EZproxy servers more efficient, I re-discovered something that I already knew, but had been ignoring:

EZproxy strips out the Accept-Encoding header from requests, and requests uncompressed content from the upstream servers and sends uncompressed content to the downstream clients.

One might think that simply adding

HTTPHeader Accept-Encoding

to the proxy configuration would be enough to handle this, and it does fix part of the problem.  It allows the browser's Accept-Encoding header to be passed through to the upstream server, but it is not a complete solution (and can break in certain corner cases):

Client => EZproxy

GET / HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip,deflate,sdch

EZproxy => Server

GET / HTTP/1.1
Host: www.example.com
Accept-Encoding: gzip,deflate,sdch

Server => EZproxy

HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 6202

EZproxy => Client


HTTP/1.1 200 OK
Content-Encoding: none

When EZproxy receives the reply from the upstream server, it decompresses the content so that it can rewrite the content as necessary to keep users from breaking out of the proxy.  The missing step is that EZproxy does not then re-compress the content before sending it back to the user's browser.

Just how big of a deal is this?  Well, on just that one request, the uncompressed content was 26.5KiB vs. 6KiB, so the proxy transferred 4.4 times as much data from the server and to the client.  For fun, ask your IT department what they would do with ~75% more bandwidth...
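A quick way to see the difference for yourself is to compare the raw byte counts with and without gzip on the wire (the URL is a placeholder; without --compressed, curl passes the gzip body through untouched, so wc -c shows the actual transfer size):

curl -s http://www.example.com/ | wc -c
curl -s -H 'Accept-Encoding: gzip' http://www.example.com/ | wc -c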

So why not just add the HTTPHeader line globally, and at least benefit from the Server => EZproxy compression?  Well, some vendors have tried to be smart and dynamically compress or minify JavaScript on the fly, depending on the client browser's capabilities.  In the cited example, the minify handling was broken, and served out corrupted JavaScript files.

It is not a stretch to think that there may be other issues lurking out there when the server is told that the client can handle something that it will not be given.  Look closely at that Accept-Encoding line from Chrome.  Notice "sdch"?  Yeah, I had to look it up too:  Shared Dictionary Compression over HTTP.  There are a few posts that give an overview of what SDCH is about, but in short, it's a technique for sending a delta between a web page that you have and the web page that the server is getting ready to send.  Think of it like a diff function for HTTP content.

Now, what if the upstream Server supports SDCH and sends back a reply that EZproxy has no idea how to cope with properly?  You're going to get sporadic reports of problems, and it may take a while to narrow down that it's isolated to Chrome users, and maybe even longer to figure out it's SDCH at play.

That's just one example of how blindly passing through Accept-Encoding can go wrong, so I'm not opposed to EZproxy manipulating that header.  All of the mainstream browsers handle gzip encoding, and it's easy enough to support.

There is no good reason I can think of that EZproxy could not simply filter the Accept-Encoding header to only contain gzip (and maybe deflate), then decompress the server reply on the fly, apply any content changes needed to keep the users on the proxy, re-compress the content, and send it on to the client.  Once upon a time, someone might have piped up "CPU cycles!", but that argument is pretty much dead these days thanks to Moore's Law.

With compression support, seeing a decrease in non-graphics content (HTML, JavaScript, CSS, JSON, XML, etc) of 80% is not an unreasonable expectation.  Add in caching support to handle the graphics, and EZproxy could be significantly more bandwidth friendly.

Tuesday, February 19, 2013

Using IPA with EZproxy: Where the Wildcards Aren't

Fedora's FreeIPA project is an interesting piece of software that has pulled together several pieces of open source software and given it a point-and-click interface as something of an answer to Active Directory.

The major pieces are the BIND DNS server from ISC, the 389 Directory Server (which started its life as the Netscape Directory Server so many moons ago), the DogTag Certificate Server (also of Netscape lineage), and MIT Kerberos.  Each of those packages can be daunting to set up on its own, but the IPA project has done an admirable job of integrating them and making their setup and use simple.

I have been tracking the evolution of IPA for some time now and have finally decided to take the plunge.  So far things have been fairly smooth, with one exception:

IPA does not currently support wildcard DNS.

For a lot of people this would not matter, but when combined with EZproxy in a proxy-by-hostname configuration, it becomes a major problem, since wildcard support is key to its function.

The root of the problem is that the software that ties the DNS server to the LDAP storage engine -- the creatively named bind-dyndb-ldap -- does not support wildcard DNS entries yet.  Like others, when the UI did not allow me to create the wildcard entries, I opened up my favorite LDAP editor (Apache Directory Studio) and created an entry manually.  Alas, it was not a simple UI issue, but rather non-support in the back-end software.  There are bugs entered for both IPA (3148) and bind-dyndb-ldap (95) to track the issue.

Until that is addressed, sites that adopt IPA and use EZproxy need a work-around for this issue.  All both of us.

Now, in a traditional BIND setup, you could of course use wildcard entries directly, or you could just point an NS record at the EZproxy server as documented by OCLC on the DNS configuration page, enabling the DNS functionality built into EZproxy:

ezproxy.example.edu IN A 192.0.2.1
ezproxy.example.edu IN NS ezproxy.example.edu.

Unfortunately, adding that NS record does not seem to work in IPA.  I have not yet taken the time to peel back the layers of the onion to figure out exactly why and where it fails, but adding an NS record for the EZproxy host entry in IPA caused the lookups to fail completely.

I tried a few other approaches to get this to work in IPA -- setting up a dummy zone for the proxy server and changing the zone forwarder settings, putting the host and the service names in different zones -- to no avail.  Some approaches looked more promising than others, but none ultimately worked.

Clearly I was not going to be able to address this within the 2.2.0 release of IPA, and would need to go outside the system until wildcards are natively supported.

My first instinct was to just setup a traditional BIND zone file for each proxy server.  This certainly worked, but required both a named.conf entry, as well as a separate file for each proxy server "zone".  I wanted a solution that would involve less configuration litter to clean up later.
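For the record, that traditional approach boils down to a named.conf stanza plus a tiny zone file per proxy server, something along these lines (names, serial, and address are placeholders, not my production data):

zone "ezproxy.example.edu" IN {
  type master;
  file "ezproxy.example.edu.zone";
};

$TTL 300
@  IN SOA ns1.example.edu. hostmaster.example.edu. ( 2013021901 3600 900 604800 300 )
   IN NS ns1.example.edu.
   IN A  192.0.2.1
*  IN A  192.0.2.1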

What I finally settled on was setting up simple static-stub zones in BIND with forwarders set to EZproxy:

zone "ezproxy.example.edu" IN {
  type static-stub;
  server-names { "ezproxy.example.edu"; };
  forwarders { 192.0.2.1; };
};

It feels a little dirty having to resort to that, and I'm reminded of the scene from Star Trek: First Contact where Dr. Crusher mutters "I swore I'd never use one of these things" as she activates the Emergency Medical Hologram to create a diversion as she escapes the Borg, but it does work and will buy time until the bind-dyndb-ldap developers can figure out how they want to support wildcard DNS entries.

Monday, February 18, 2013

Communication is a lost art

I recently reported an issue to one of my vendors regarding one of their web sites that they use as a vanity entry point to their service platform.

The initial report was:
When sending users to <website> via HTML form with a GET method, WebKit browsers (Chrome, Safari, and some Android browsers) append a "?" character (see WebKit bug 30103 (https://bugs.webkit.org/show_bug.cgi?id=30103) and Chrome bugs 108690, 121380 (http://code.google.com/p/chromium/issues/detail?id=108690, http://code.google.com/p/chromium/issues/detail?id=121380)).
This causes the browser to access "http://<website>/?" which redirects to http://<vendor website>/?" Note the trailing "?" on the <vendor> URL. The "?" is preserved, and appended to the password field, rending the URL invalid, and presenting the user with a login screen. 
Could you please update the redirect handling on <website> to not preserve the "?" character that WebKit is sending? Side note: any value sent after <website> is also preserved, triggering the same behavior. E.g.:<website>/foo redirects to <vendor website/foo ; while one can take the stance "don't do that", it would be a better user experience to not preserve any path data if it is going to cause errors like this.
I reported the issue, the cause of the issue, the symptoms, the URLs involved, and a resolution path.
Could you provide the screenshots of the issue you reported until  the "?" is preserved, and appended to the password field, rending the URL invalid, and presenting the user with a login screen. This will help in forwarding this issue to the concerned department for further investigation. 
Screenshots?

Really?

To address an issue with how a vanity entry web site mishandles any extra data passed in the URL you want screenshots?

Really?!?

OK, fine, I'll take some screen captures of exactly what I stated and send them.


The website URL

The vendor URL
See?  Start at http://<website>/? and you get redirected to http://<vendorsite>/login..../?

The "?" abides.
Thank you for your email and also for the screenshots. I have forwarded this issue to the appropriate department for further investigation. I will contact you as soon as there is any information regarding this.
I have passed the gauntlet; there is hope that this issue will be fixed!
I received an update from the concerned department regarding the issue with  sending users to <website> via HTML form with a GET method, WebKit browsers (Chrome, Safari, and some Android browsers) append a "?" character. The concerned department has requested for the exact sequence of steps to duplicate this issue. Could you provide the same.
Umm.  HTML form....GET method...this is not looking good.
Load this basic HTML form in a Chrome browser:
<form method="get" action="<website>" >
<input type="submit"/>
</form>
Click the submit button.  Chrome will append a "?" character to the URL, and the <vendor> login error page will be generated. 
I can appreciate the need for a good test case, but this one seemed pretty straightforward...
Thank you for your email and for the additional information too. I have forwarded this to the appropriate department for further review. I will keep you updated as I receive any information in this direction.
My enthusiasm has been diminished, but we'll see if that's the missing part the vendor needed to resolve this.
I received the following update from the concerned department regarding " WebKit browsers (Chrome, Safari, and some Android browsers) appending a "?" character. The update is that 'this appears to be an issue with webkit itself and will have to be fixed by google or apple in the webkit engine. Unfortunately we do not think there is anything we can do on the <vendor> side since this is not specific to <vendor service>. This would happen with any URL.
Sigh. I am not asking them to fix WebKit; I am asking them to fix the way their vanity entry website handles redirecting users into their service platform website to address a very specific browser issue.

I can't help but think of old school burlesque/vaudeville comedy routines (Who's on First, etc.), and the memorable scene from Pulp Fiction between Jules Winnfield (Samuel L. Jackson) and Brett (Frank Whaley).

Apparently the "concerned department" does not grasp the concept of the Robustness Principle.  Funny thing is, the other vanity entry web sites for this vendor work just fine, it's only this one entry point that is broken.

Get the popcorn, kids! This one could drag out for a while.

Friday, February 15, 2013

Collector's Cards: rpm2cpio

Once upon a time, at a job far, far away, we used to refer to bugs as "collector's cards".  Here's an example of why...

It all started innocently enough.  I wanted to crack open a RPM to inspect the contents without actually installing the RPM on a system.  The way I normally do this is using rpm2cpio:
rpm2cpio <RPM> | cpio -id
This takes the RPM payload -- which is in CPIO format -- and dumps it to standard output for the cpio utility to extract.  Then you can go spelunking through the extracted files to see whatever you might be looking for.  (This is also a great rabbit to have in your sysadmin hat for recovering from any number of systems failure scenarios, BTW.)

This simple command normally works great.  That is, until I tried it on a CentOS 6 RPM on the CentOS 5 system that still manages our internal mirrored content.

When I tried it this time, I consistently got:
cpio: premature end of archive
It didn't matter if I was working on the streamed output (thinking a read error may have caused a failure that was silently eaten by the act of streaming the output into a pipe) or on a file that I piped the output to.  The rpm2cpio extraction seemed to run fine; it's just that what was supposed to be cpio content was not decipherable:

$ file output.cpio
output.cpio: data

Taking advantage of a bit of knowledge of the mechanisms behind RPM's payload handling, I deduced that the archive was compressed by something that rpm2cpio was not handling correctly, so I tried the usual suspects -- gzip, bzip2, uncompress, zip -- with no success.  The file was not identifiable by file, either, but had this in its header:

$ od -a output.cpio | head -1
0000000 } 7 z X Z nul nul nl a { ff ! stx nul ! soh
Hmmm..  "7z" "XZ".  I'd heard of the 7-zip compression algorithm, and I remembered something about "xz" compression being more rsync friendly, and talk of RPM adopting that compression format to make Fedora content more efficient to mirror.

That got me on the right path, and sure enough, there is a bug (602423) in Red Hat's bugzilla on this very issue, along with a pointer to the unxz command that I had not had a need to use before:
$ cat output.cpio | unxz > output
$ file output
output: ASCII cpio archive (SVR4 with no CRC)
Ahh, there we are, finally the output I was after.
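Putting it all together, the work-around for cracking open a newer RPM on the older box becomes a one-liner (assuming, as observed here, that the old rpm2cpio passes the xz payload through untouched):

rpm2cpio <RPM> | unxz | cpio -id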

So there are multiple failures at play here:
  1. The file command does not understand how to identify the data compressed with the xz format.
  2. The rpm2cpio command only understands how to handle gzip and bzip2 compressed content.
Both of these are understandable for newly developed code; the ecosystem needs time to catch up with new development work.  The file lag is even more understandable since it is a separate package altogether, and the database that it works from needs to be updated.

The reason this is a "collector's card": this issue was first reported in the middle of 2010.  It is now 2013, almost 2 1/2 years later.  Support for xz-compressed payloads in RPM was added during the Fedora 12 release cycle, which is what served as the basis for RHEL 6.  You're honestly telling me that at no point in the past 2 1/2 years could Red Hat have released an updated version of RPM for RHEL 5 that understood xz-compressed payloads?

Here is my prediction of how this bug is going to play out:

This bug will be ignored until RHEL 5 reaches the end of one of its Production cycles that dictates that no further updates will be shipped at that stage of the product's lifecycle.  

If this is deemed an "enhancement" rather than a "bug fix", then that milestone has already passed on Jan. 8 2013.  I highly doubt this will be classified as an "Urgent Priority Bug Fix" worthy of an errata, so the window has likely already closed.

Why does this rub me the wrong way?  Mainly because this has become the modus operandi for how far too many RHEL bugs are "resolved":  Let them fester in bugzilla for a few years, until the time window for dealing with them has passed, and then close them as "too late to fix it now".

"But you can't really expect Red Hat to ship support for new features on old systems!" you say.  

This is an interesting point to address.  Red Hat did change RPM mid-release several years ago, during RHL 6 (no "E") when they updated from RPM 3 to RPM 4.  This created all kinds of challenges when building software during the second half of that product's lifetime.  You had to update any newly installed systems to the RPM 4 binaries before you could install any custom-built software from your own repositories.  I think even released errata had this issue -- you had to update to RPM 4 before you could fully update the system.  Nasty stuff!

This is not quite the same situation, as it is an update that accommodates a new payload compression format, rather than a new RPM header structure.  But is it really that unreasonable to ask that RHEL N-1 be able to understand RHEL N's RPM package format?  Or that support tools like rpm2cpio be able to do the same, if for no other reason than it makes mirror management easier and keeps system recovery options open?

Thursday, February 14, 2013

SELinux policy for EZproxy

My "wish list" item for EZproxy to adopt support for SELinux seems to have generated a bit of general interest.  It seems that I am not the only one who distrusts big binary blobs of software, and wants to contain them as much as possible.

So, without further ado, here is the policy that I have developed for EZproxy on RHEL6/CentOS6.

ezproxy.te:

policy_module(ezproxy,1.0.20)

########################################
#
# Declarations
#

type ezproxy_t;
type ezproxy_exec_t;

init_daemon_domain(ezproxy_t, ezproxy_exec_t)

type ezproxy_script_exec_t;
init_script_file(ezproxy_script_exec_t)

type ezproxy_rw_t;
files_type(ezproxy_rw_t)

gen_require(`
        type initrc_exec_t;
        type sysctl_kernel_t;
        type proc_t;
        type fs_t;
        type tmpfs_t;
        type usr_t;
        type port_t;
        type dns_port_t;
        type public_content_t;
        type public_content_rw_t;
')

########################################
#
# ezproxy local policy
#
allow ezproxy_t self:capability { dac_read_search dac_override chown ipc_owner kill sys_resource setgid setuid };
allow ezproxy_t self:fifo_file rw_file_perms;
allow ezproxy_t self:unix_stream_socket create_stream_socket_perms;
allow ezproxy_t self:shm { create unix_read read setattr getattr associate unix_write write destroy };

# Init script handling
init_domtrans(ezproxy_t)
domain_use_interactive_fds(ezproxy_t)

allow ezproxy_t sysctl_kernel_t:dir { search read };
allow ezproxy_t sysctl_kernel_t:file read;

allow ezproxy_t self:process { setrlimit execmem };
allow ezproxy_t fs_t:filesystem getattr;
allow ezproxy_t tmpfs_t:file { read write };

allow ezproxy_t usr_t:file { read getattr open };
allow ezproxy_t proc_t:file { read open };

allow ezproxy_t initrc_t:process { signull sigkill };
allow ezproxy_t self:process { signull sigkill };
allow ezproxy_t initrc_t:shm { unix_read unix_write };

files_read_etc_files(ezproxy_t)

libs_use_ld_so(ezproxy_t)
libs_use_shared_libs(ezproxy_t)

miscfiles_read_localization(ezproxy_t)

allow ezproxy_t ezproxy_exec_t:file execute_no_trans;

allow ezproxy_t ezproxy_rw_t:file { manage_file_perms read write };
allow ezproxy_t ezproxy_rw_t:dir { search create_dir_perms read write add_name remove_name open };

# Allow the document directory to be a symlink into the ftp directory
allow ezproxy_t ezproxy_rw_t:lnk_file read;

allow ezproxy_t public_content_rw_t:lnk_file read;
allow ezproxy_t public_content_rw_t:dir { search read open };
allow ezproxy_t public_content_rw_t:file { getattr read open };

allow ezproxy_t public_content_t:lnk_file read;
allow ezproxy_t public_content_t:dir { search read };
allow ezproxy_t public_content_t:file { getattr read open };




allow ezproxy_t dns_port_t:udp_socket name_bind;
allow ezproxy_t dns_port_t:tcp_socket name_bind;

sysnet_dns_name_resolve(ezproxy_t)
corenet_all_recvfrom_unlabeled(ezproxy_t)

allow ezproxy_t self:tcp_socket create_stream_socket_perms;
allow ezproxy_t port_t:tcp_socket name_bind;
corenet_tcp_sendrecv_all_if(ezproxy_t)
corenet_tcp_sendrecv_all_nodes(ezproxy_t)
corenet_tcp_sendrecv_all_ports(ezproxy_t)
corenet_tcp_bind_all_nodes(ezproxy_t)
corenet_tcp_connect_all_ports(ezproxy_t)
corenet_tcp_bind_http_port(ezproxy_t)

auth_use_nsswitch(ezproxy_t)

dev_read_rand(ezproxy_t)
dev_read_urand(ezproxy_t)


ezproxy.fc:
/opt/ezproxy/ezproxy    --      gen_context(system_u:object_r:ezproxy_exec_t,s0)
/etc/init.d/ezproxy     --      gen_context(system_u:object_r:ezproxy_script_exec_t,s0)
/opt/ezproxy(/.*)?              gen_context(system_u:object_r:ezproxy_rw_t,s0)
/opt/ezproxy/docs(/.*)?         gen_context(system_u:object_r:public_content_t,s0)
Save those, make sure you have the SELinux development environment installed, and you should be able to just run "make -f /usr/share/selinux/devel/Makefile" to generate the ezproxy.pp file.
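From there, loading the module and re-labeling the EZproxy files is the standard routine (run as root; this assumes the default /opt/ezproxy install location used in the .fc file above):

semodule -i ezproxy.pp
restorecon -Rv /opt/ezproxy /etc/init.d/ezproxy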

I have only used this in a proxy-by-hostname configuration, and since I have not really tested proxy-by-port, there may be some gremlins in port-based setups.

Wednesday, February 13, 2013

High availability in IPv6: Harder than it should be

For almost every case, there is more than one way to solve a problem.  The specific nut I was trying to crack this time was how to share a network address between 2 nodes that are going to be set up for active/passive load balancing services.

In Linux, there are several options to choose from.  You can use the built-in piranha, keepalived, heartbeat, vrrpd, or ucarp (and there may be more options).

My needs are modest.  I just need a floating IP address and some relatively easy-to-configure software that I can write a puppet module for.  I don't need monitoring, health checks, or any other features.  Bonus points if the software does not have an abundance of package requirements, as I like to keep the system footprint slim.

I've worked with piranha before, but it's essentially abandonware at this point.  Red Hat developed it to a point, then shut down the project and put the code in the wild.  No notable improvements have been made since the RHEL 2.1 timeframe.  Don't get me wrong, it does work (or did the last time I used it) and it comes with a passable management interface.  But like I said, I plan on using puppet to manage the configuration, so the PHP-based management interface is actually a minus for my needs.  I could have been convinced to use it if it had evolved a conf.d-style file layout rather than a monolithic configuration file.  So that ruled out piranha, as I'm just not inspired enough to generate a puppet configuration for this project from scratch using storeconfigs to communicate cross-node data.

Next up was keepalived.  Good software, lots of features, but too many for this project, and it suffered from the same monolithic configuration file as piranha.  I looked at a few existing puppet modules for keepalived which may have worked, but I quickly wound up in module dependency hell, and none of the existing modules lived up to the implicit promise of "puppet module install" and run.  Besides, keepalived does a lot more than I need for this project, so it seemed overkill.

Next in line: heartbeat.  I've not worked with this software before, but it is something that I'm going to keep in mind for other projects.  I liked what I saw about process handling as part of the managed resources, but again, far too sophisticated for my current needs.

Down to vrrpd and ucarp.  Vrrpd is the implementation of the VRRP protocol from Cisco.  Unfortunately, it has not been updated in a long time.  So long in fact that RFC 5798 (VRRP v3 for IPv4 and IPv6) has been published since its last update 4 years ago.  I need IPv6 support, so sorry vrrpd, you were a non-starter for me.

That just leaves ucarp, OpenBSD's answer to VRRP in case Cisco decides to enforce their VRRP patents.  I've always liked OpenBSD, so I decided to kick the tires on it.

The good news: ucarp is in EPEL, installs as a single stand-alone binary, does not pull in half of the package universe for support, and each shared IP is configured in a separate configuration file.  There goes most of my requirements right there.

The bad news: ucarp does not support IPv6.  At least not the portable version of it.  I found some references to FreeBSD users talking about some issues with IPv6 support in ucarp, so someone is doing some work on IPv6 support, but it's not made it back into the main release line yet.

So now the quandary: do I roll up my sleeves, pull down the FreeBSD source, and try to massage the IPv6 support into a workable state, or do I find another solution?

While I enjoy some good low-level programming, for now I found another option.  Ucarp calls shell scripts when it brings a VIP up and down.  The scripts that ship with the EPEL package are simple wrappers around the /sbin/ip command to add/delete the addresses from the interface.

In short order, I was able to massage these scripts to synthesize IPv6 support:
#!/bin/sh
exec 2>/dev/null
PREFIX="fc00::"
/sbin/ip address add "${2}"/32 dev "${1}"
V6ADDR=$( echo "${PREFIX}${2}" | sed -e 's/\./:/g' )
/sbin/ip address add "${V6ADDR}"/128 dev "${1}"
I take advantage of the fact that a dotted quad IPv4 address can be translated into a valid IPv6 address by simply substituting "." with ":" and appending the morphed IPv4 address to an IPv6 prefix.  It's a handy trick, IMNSHO.
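To make the trick concrete (using the documentation address range rather than a real address):

$ echo "fc00::$(echo 192.0.2.1 | sed -e 's/\./:/g')"
fc00::192:0:2:1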

Voilà!  Instant IPv6 support.  Not the prettiest thing you'll ever see, but it took far less time than rewriting part of the network code inside ucarp.

There's one last piece to this puzzle, though, and even had I used one of the other solutions, I would have likely run into this part as well.  The final piece has to do with IPv6's Neighbor Discovery Protocol.

NDP is IPv6's answer to ARP in IPv4.  In IPv4 you could send out a gratuitous arp packet to preemptively announce to the network that an IP address has moved, which effectively flushes the MAC address from the ARP cache of routers, switches and same-network hosts.

I experimented with the "valid lifetime" and "preferred lifetime" options on the address, thinking that there may be a DHCP-esqe release option built into NDP, but it does not appear to be handled that way.  It's too bad, really, because it seems like it would have been a perfect use of those values -- change the address to advertise that it is good for 1 second to the network, let it expire and the network would re-discover the address on the new host.

The quick-fix, blunt-force answer is to drop the NDP expiration timer on the network equipment, but studying RFC 5798 Section 8.2, it seems there is a more elegant and IPv6-friendly way of doing this without having to muck with the network's NDP expiration timers.  It looks like a Neighbor Advertisement message with the override flag set should do the trick.  Fortunately, there is a security research tool out there, the SI6 Networks IPv6 Toolkit, and the na6 command from that software appears to be just what I need to generate one of those packets.

Quite a bit of trial-and-error later, and the vip-up script now looks like this:

#!/bin/sh
exec 2>/dev/null
[ -f /etc/sysconfig/network-scripts/ifcfg-${1} ] && . /etc/sysconfig/network-scripts/ifcfg-${1}
PREFIX="fc00::"
/sbin/ip address add "${2}"/32 dev "${1}"
V6ADDR=$( echo "${PREFIX}${2}" | sed -e 's/\./:/g' )
/sbin/ip address add "${V6ADDR}"/128 dev "${1}"
IP6ROUTER="<router IPv6 address>"
ROUTER_MAC="<router mac address>"
/usr/sbin/na6 -i "${1}" -s "${V6ADDR}"/128 -d "${IP6ROUTER}" -t "${IP6ROUTER}" -S "${HWADDR}" -D "${ROUTER_MAC}" -o -c -v
/bin/ping6 -c 10 "${IP6ROUTER}" &
There's a lot more moving parts in that script now.  I found it interesting that the Neighbor Advertisement message with the override flag alone was insufficient to kick the entry out of the Neighbor table on the router; I had to send a few ICMPv6 packets as well before the network equipment would pick up the address on the new master host.  I would have expected the "Hey! I'm here" override packet to have been adequate.

I'm not convinced of the absolute stability of this solution yet, as I was seeing a few Duplicate Address Detection failures before I backgrounded the ping6 command because of the time that the script was taking to execute and how ucarp was handling the extended script execution time.  I may yet search the FreeBSD sources for the state of their IPv6 support and see how it looks, but for now ucarp has Frankenstein support for IPv6.

Tuesday, February 12, 2013

EZproxy wish list: Review default values

EZproxy has several tuning parameters that affect how many users the proxy can reasonably support.  Given how much the state of website development has changed over the years, reviewing the default values for these limits is probably long overdue.

Let's take a look at each of these limits, and see how they hold up:

MaxConcurrentTransfers (MC, default 200) determines how many HTTP transfers can be in progress concurrently. Most web browsers are configured to attempt four simultaneous HTTP transfers so that they can load web pages and graphics at the same time. The default value of 200 allows for 50 people to concurrently downloading 4 files each without reaching this limit.
These days, modern browsers support 6-8 simultaneous connections, so the default of 200 now only allows 25-33 people concurrent access.  Doubling this default value should be reasonable, or maybe even tripling it.  The real constraint here is the open file descriptor limit, which varies by system but should be at least 1024.  Each connection is going to burn 2 file descriptors (one from the client to the proxy, one from the proxy to the server), so it should be safe to set this as high as 500 without having to do any system tuning to increase the number of file descriptors available to EZproxy.  If OCLC embraced the platform, a file could be installed into /etc/security/limits.d to increase the number of file descriptors available to EZproxy, and the default limit could be even higher.
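Something along these lines would do the trick; this is a sketch of my own, not an OCLC-provided file, and it assumes EZproxy runs as a dedicated "ezproxy" user:

# /etc/security/limits.d/ezproxy.conf -- hypothetical drop-in
ezproxy  soft  nofile  4096
ezproxy  hard  nofile  4096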
MaxLifetime (ML, default 120) determines how long in minutes an EZproxy session should remain valid after the last time it is accessed. The default of 120 determines that a session remains valid until 2 hours after the last time the user accesses a database through EZproxy. MaxLifetime is the only setting that is position dependent in config.txt/ezproxy.cfg. In normal use, it should appear before the first TITLE line.
This is an interesting one, and one that each site is going to have to determine for itself.  The only caution I would give here is to remember that the current behavior of EZproxy (up to 5.6.3GA) is to reset the session timers when EZproxy is restarted.  I know some places put in an automatic restart of EZproxy nightly to pick up any configuration changes made during the day, so those sites will want to ensure that MaxLifetime never exceeds the time between server restarts.
MaxSessions (MS, default 500) determines the maximum number of EZproxy sessions that can exist concurrently. A user's session is created during login and ends after MaxLifetime minutes of inactivity (default 2 hours) have occurred or when the user accesses a URL like http://ezproxy.yourlib.org:2048/logout to explicitly logout.
MaxSessions has a few possible impacts on the server that I can see:

  1. Since sessions are stored on disk, not having a limit exposes EZproxy to a denial of service attack by filling up the drive.
  2. The flip side is that if this is set too low, a resource exhaustion denial of service attack can be launched against the proxy to consume all sessions available, thus locking everyone out of the proxy -- including an admin user!
  3. Since sessions are stored in a flat file on disk, having an unusually large number may lead to performance degradation as the server has to sequentially scan a larger and larger file to find the session.  This could be remedied by putting the session into a hash file, or by using an embedded database like SQLite for session storage.
As a default, 500 is probably a reasonable value, but if the number was chosen as a function of the number of people that MaxConcurrentTransfers allows to be downloading concurrently, it may need to be adjusted as well.

MaxVirtualHosts (MV, default 200) determines the maximum number of virtual web servers that EZproxy can create. A virtual web server represents a single host name/port combination. For example, if EZproxy assigns port 2050 to www.somedb.com, 2051 to www.somedb.com:180, and 2052 to www.otherdb.com, these three ports represent three virtual hosts. In normal use, increase this parameter by no more than 50 - 100 each time, as it provides a safe guard against configuration errors in config.txt/ezproxy.cfg that might lead to the creation of excessive, unneeded virtual hosts.
This is probably the single most common limit that EZproxy admins run into.  Having worked with other proxy systems, I suspect this has more to do with an internal implementation detail of EZproxy, specifically its support of proxy-by-port vs. proxy-by-name.  Other than that, I'm not sure I can convince myself of why this limit exists at all.  Either way, though, I would suggest that bumping this limit up an order of magnitude would not do any harm these days.  Many moons ago, when EZproxy was being run on machines with only 32MB of RAM, this may have been significant, but I'm not sure you can find a Linux server distribution that supports running with less than 256MB of RAM these days.
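Pulling those suggestions together, the top of a tuned config.txt might look something like this; the values are illustrative only, not OCLC recommendations, and every site will want to pick its own numbers:

# MaxLifetime must appear before the first TITLE line
MaxConcurrentTransfers 500
MaxLifetime 120
MaxSessions 1000
MaxVirtualHosts 2000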

Monday, February 11, 2013

EZproxy wish list: Management API

EZproxy includes a management web site that allows for normal day-to-day operational tasks to be carried out with just a few clicks.  Need to setup SSL? No problem!  Need to review access logs?  They're right there.  Test network connectivity?  Click, click, done.  You can even test out new authentication settings prior to adding them to the configuration file.

There are a few areas where the interface falls short, though.

One area is session handling.  We were working with session lifetimes, trying to find a good balance between how often patrons have to re-authenticate and how long their proxy sessions last.  That was when we discovered a bug where EZproxy resets the session timers for all sessions every time it is restarted.  Since we were changing the session lifetime values, we were restarting the proxy server more frequently than normal, and soon wound up with thousands of sessions that were considered active.

The admin interface to EZproxy would have you go to server status, click on each session, and click Terminate Session.  That's OK for a handful of sessions, but when you have hundreds of incoming sessions, trying to clean out stale sessions to keep the service up is like trying to drink from a fire hose.  I eventually wound up writing a data-scraping EZproxy management application to work around this.  It's not the most elegant solution, but it meets my needs to be able to script interaction with the proxy server.  Hopefully others find it useful as well.

It would have been easier to write that application, though, if EZproxy had an API that I could have tied into, rather than having to data-scrape the status page to load all of the sessions to enable terminating them one-by-one or en masse.  I originally did not want to do data-scraping, but the structure of the ezproxy.hst file is not documented, and is considered internal to EZproxy, thus subject to changing between versions.  The cleanest way to implement the tool was to go through the web interface and treat it as a management API.

Recently, I added the ability to execute Host Management commands to clear stale/unused entries in the proxy tables.  To add that support, I had to run strings against the EZproxy binary to find the embedded HTML page so that I could support all possible operations, not just the ones that were visible on my proxy server at the time I was developing the support.

Again, not an optimal approach, but a functional one, and now about the only thing that the management application does not support is SSL certificates.  I don't plan on adding that, but I can suggest certwatch as a utility that can be useful for keeping track of the SSL expiration dates.

Taking a step back, what I feel like I'm really asking for here is a little more of the Unix Philosophy be incorporated into EZproxy.  Break up the functionality a little bit, embrace modularity, separate the engine from the interface, decouple the act of management from the mechanism by which the management takes place, and open up the interface so that other tools can interoperate with the server.

Friday, February 8, 2013

EZProxy wish list: Better scripting support

EZproxy has a built-in expression language that is used in various capacities within config.txt.  A common mechanism is to use expressions during login to set variables that can be used in other parts of the login sequence, or later on in database definitions.  I have even seen references to these variables being usable in the various EZproxy HTML files to do things like add a link in web pages for administrators to navigate to the /admin URL easily.

It is a powerful feature, no doubt, but one that is shrouded in mystery and mystique.

Then there are the conditions and actions settings, which can be used in the user.txt file.  A common technique is to use these to override access for specific accounts retrieved from a central account repository.

And while we're talking about scripting, let's not forget about SPUEdit as well.  I'm not sure that I totally understand the purpose of this directive, but I think it's intended to provide a way to accommodate legacy links without having to fill your database stanzas with cruft for old Starting Point URLs (hence the SPU in SPUEdit).

Here's today's challenge, though:  Find a good Howto, Tutorial, or Cookbook reference for EZproxy's expression language.  

Go ahead, I'll wait...

Couldn't find one either, eh?

This is something that OCLC needs to address in addition to finishing out the EZproxy reference manual.  Producing a Cookbook document showing various goals, and how to achieve them using EZproxy's expression language would go a long way towards documenting the expression language, and perhaps open our eyes to ways to harness the power of this facility to accomplish things that we cannot today.

Earlier in this series, I put forth the idea of adopting Lua for scripting within EZproxy for services that need to perform an authentication handshake to generate a token value for patrons to use to connect to the service.

Why not go all the way, and adopt Lua as the internal scripting language for all of EZproxy?  This would give EZproxy the benefit of a mature scripting language with third party library support.

Want to authenticate against a SQL database?  You're out of luck today if you are not running Windows and can use ODBC.  But with LuaSQL, you could tie into the most popular databases on the market from either platform.  Want to go all Web 2.0 and adopt OAuth, OpenID, or some other authentication framework?  I'm pretty sure you can find Lua support for it.

This is why opening up EZproxy would be a good thing for OCLC.  They would not have to go all the way to open sourcing EZproxy (which I think would be a good thing for them to do); even if all they did was expose APIs and interfaces and adopt a standard scripting language that could be extended, it would be a fantastic step in the right direction.

Thursday, February 7, 2013

Raising the bar: Now there is no lawn

Recently I found an issue in augeas that I fixed, so I wanted to be a good netizen and report the problem along with the patch.

I've never needed to contribute to a Fedora Hosted project before, so I do not possess a Fedora Account, which is required for their bug reporting system.

OK, no big deal, their account signup is not too onerous: email, name, password, security Q&A and a math captcha....  A dozen tries later, I finally abandon the idea that I will ever get past the image captcha.  The validator must be using the new math, because last time I checked 22+53=75.

They have an audio captcha option, too, so let's try that....  I'm guessing it's using the Ogg Vorbis codec, because I couldn't get it to play.

Alright, let's try asking on the IRC channel if the account system is having problems....  Except it's been so long since I've used freenode that I have no idea what my password might have been, so I can't join any channel to ask.

Mailing list! There's a developer mailing list!  I'll just drop them a quick note with the patch.
You are not allowed to post to this mailing list, and your message has been automatically rejected.  If you think that your messages are being rejected in error, contact the mailing list owner.
I really don't feel like joining the list just to post a single patch, because the chances that I'll ever find a bug worth reporting again are pretty slim.

By this point, I've already spent more time just trying to report the bug and the fix than it took to find and fix the bug in the first place.

UNCLE!

Finally I looked through the mailing list and just dropped a message with the patch to one of the maintainers, and he committed the fix -- thanks David!

Once upon a time, the Internet used to be an open and collaborative environment, but these days, it's just wall after wall to keep the pests off the lawn.

Wednesday, February 6, 2013

EZproxy wish list: Embrace the platform

One of EZproxy's greatest strengths is its simplicity.

One of EZproxy's greatest weaknesses is its simplicity.

How can that be?  Put simply, by trying to abstract out the underlying platform, EZproxy has to be more than it needs to be.

EZproxy is distributed as a single binary, statically linked, with a self-extracting function to unpack a directory structure, a startup script that is a simple wrapper for the downloaded binary, and even a sample configuration that demonstrates how to get the server up and running within a few minutes of the download finishing.

That sounds great, right?  A one-stop shop.  Download, install, run, done.

By taking this stance, EZproxy does not embrace the platform it is being run on, does not take advantage of system services, and does not reap the benefits of following standard conventions.  So what's missing?

Log rotation, for one.  On Linux systems, most (all?) distributions use logrotate to handle rotating, compressing, and retaining log files.  Installed software drops a file into /etc/logrotate.d, and everything else is taken care of automatically.  The closest you can come to this with EZproxy is to use the LogFile directive with the -strftime option.  That will let you create a separate log file per time period, but it does not address compression, rotation, or retention.
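If EZproxy followed convention, rotation could be handed off to the system with a drop-in file along these lines.  This is a sketch of my own: the path assumes the default /opt/ezproxy install with a single ezproxy.log, and copytruncate is there because there is no obvious way to tell EZproxy to reopen its log file:

# /etc/logrotate.d/ezproxy -- hypothetical drop-in
/opt/ezproxy/ezproxy.log {
    weekly
    rotate 8
    compress
    missingok
    notifempty
    copytruncate
}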

Another issue is hidden among the files extracted by EZproxy: the mimetypes file.  This is used mainly for EZproxy's handling of mime types when it is serving files locally from one of the directories under docs.  Why is this an issue?  Because it is a very minimal file that is installed by EZproxy:
text/html                      html htm shtml
text/css                       css
image/gif                      gif
image/jpeg                     jpeg jpg jpe
image/png                      png
image/bmp                      bmp
image/tiff                     tiff tif
application/pdf                pdf
application/x-javascript       js
application/msword             doc
application/vnd.ms-powerpoint  ppt
application/vnd.ms-excel       xls
application/vnd.openxmlformats docx pptx xlsx
application/octet-stream       bin exe
application/zip                zip
audio/mpeg                     mp3
That list covers probably 80% of the files that you are likely to serve, but for the other 20%, you're going to wind up with a mime type like text/plain, which can cause issues when serving binary files.  I've even seen issues with HTML files that were created as UTF-8 or UTF-16 files because of the editor they were created with; this is more of an issue with the editor, but still leads to unexpected results.

Next there is the startup/shutdown script, which is just a loose wrapper around the EZproxy binary itself.  The issue here is that the return values are not compliant with the Linux Standard Base (LSB).  This means that other tools written to expect certain behavior from the script will not get the values they are expecting, and thus will not be able to manage EZproxy the way other applications can be managed.  The most notable current example is the "status" command not returning the expected values for the various states that EZproxy could be in.  Word is this behavior will be fixed in one of the 6.x releases.

And then there is filesystem layout in general.

Let's start with PID file and lock file handling.  In general, a Unix daemon process creates a file containing its Process ID (PID) that can later be used for checking the health of the process and sending signals to it (HUP to reload, TERM to shut down, etc.), and these files have a designated place on the system (/var/run).  The IPC file (ezproxy.ipc) should be under /var/run as well.  Similarly, lock files (ezproxy.lck) belong in /var/lock.  Why?  Because systems clean up /var/run/* and /var/lock/* when they reboot.  This would solve the issue of EZproxy not being able to start cleanly after a server crash because the lock and ipc files are left on disk.  Work with the system, not against it.

Then there is log file location (/var/log), configuration file location (/etc), SSL certificate handling (/etc/pki/tls), and the docs directory (/var).

So at the end, what might things look like?  Something along these lines:

/etc/ezproxy/config.txt
/etc/ezproxy/user.txt
/etc/ezproxy/ezproxy.key
/etc/pki/tls/private/ezproxy.key
/etc/pki/tls/certs/ezproxy.crt
/usr/sbin/ezproxy
/usr/share/ezproxy-<version>/license.txt
/var/log/ezproxy/audit/<auditfiles>
/var/log/ezproxy/<logfiles>
/var/log/ezproxy/<messagefiles>
/var/run/ezproxy/ezproxy.pid
/var/run/ezproxy/ezproxy.ipc
/var/lock/ezproxy.lck
/var/ezproxy/<html files>
/var/ezproxy/docs/limited
/var/ezproxy/docs/loggedin
/var/ezproxy/docs/public

And there you have a daemon that behaves like just about every other piece of software on the system.  This layout lends itself to easy packaging and a fairly straightforward SELinux policy, and it does not violate the principle of least surprise.

This structure also opens up the doors for future enhancements.  Consider:

/etc/ezproxy/<virtualhost>/config.txt
/etc/ezproxy/<virtualhost>/user.txt
/etc/ezproxy/<virtualhost>/ezproxy.key

In this alternative layout, a command line option (I was going to suggest "-c", but that is already used for network connectivity checking, so maybe "-C" instead) could be used in a real startup script, something like this (adapted from the vsftpd init script):


        # daemon() comes from /etc/rc.d/init.d/functions, $prog would be set to
        # "ezproxy" earlier in the script, and "-C" is the hypothetical
        # per-instance config option suggested above
        CONFS=$(ls /etc/ezproxy/*/config.txt 2>/dev/null)
        [ -z "$CONFS" ] && exit 6
        for i in $CONFS; do
            # Name each instance after its directory under /etc/ezproxy
            site=$(basename $(dirname $i))
            echo -n $"Starting $prog for $site: "
            daemon /usr/sbin/ezproxy -C $i
            RETVAL=$?
            [ $RETVAL -eq 0 ] && touch /var/lock/subsys/$prog-$site
            echo
        done

This structure would allow a single server to run multiple instances of EZproxy, each with its own configuration file, something you cannot do today without multiple installations of EZproxy itself.  Going down that route now is not an effortless path: you have to write a custom startup/shutdown script that can start N independent instances, and you have to maintain N copies of EZproxy.  The proposed structure would let a single EZproxy binary serve multiple sites, so there would be less systems management overhead as well.

Embrace the platform, reap the benefits.

Tuesday, February 5, 2013

EZproxy wish list: IPv6 support

Of all the wish list items I have mentioned, this is one that OCLC has committed to, so it is technically more "anxiously awaited" than "wished for", but this way we can cross an item off the list when OCLC ships the 6.0 version of EZproxy.

OCLC has stated that the 6.0 release (slated for March 2013) will include support for IPv6 and will build upon that support in subsequent updates.  Exactly what we can expect in the initial version is not crystal clear, but hopefully listening on IPv6 and being able to send requests to databases that are still on IPv4 networks will be part of the supported functionality of that first release.

Last year, OCLC hosted a virtual users group meeting on IPv6 where they outlined their IPv6 plans.  In that meeting, it sounded like the first build that supports IPv6 may only support IPv6-to-IPv6 connections.  What I'm waiting to see is whether EZproxy will be able to answer IPv6 connections but connect users to IPv4 services, as that will be the most common scenario in the short term: IPv6 clients connecting to databases that are still running on IPv4 addresses.

Short term IPv6 slide from the user's group presentation
The diagram above makes it seem that 6to4 handling may not be among the features supported in the first iteration.  The discussion around this slide was that the "ADC" (load balancers to us old-timers) would handle the network protocol translation, which implies that vendors will be responsible for putting up IPv6-to-IPv4 devices to accept IPv6 connections.

That could just be a shortcoming of the amount of space available when making the graphic, but they do not show a dual-stack EZproxy server as part of the setup, but rather what appears to be a separate EZproxy instance for IPv4 and another for IPv6.  I asked about this in the Q&A portion of the presentation, but it was missed in the initial flurry of questions.

Perhaps the following graphic shows the desired dual-stack functionality better:

What this hopefully expresses is that clients can connect to the proxy over either protocol (IPv4 or IPv6) and reach web servers over either protocol, regardless of which network the request originated on.

From a programming point of view, once the connection is accepted on an IPv6 socket, there should be no problem copying the TCP payload to an outbound IPv4 socket for the connection to the vendor, so I am really expecting the first release to be able to handle this.
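
As a rough illustration of how mundane that plumbing is, a generic relay such as socat can already do the translation in one line (hypothetical port and vendor host; this is a demonstration, not an EZproxy feature):

# Accept connections on an IPv6 socket, forward each one out over IPv4
socat TCP6-LISTEN:8080,fork,reuseaddr TCP4:vendor.example.com:80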

The good news is there is not much time left to wait before the 6.0 beta should be available.  I for one am hoping to be a happy camper, as we already see native IPv6 traffic, and EZproxy is the last piece of our infrastructure that is not IPv6 ready.  If the initial release does not support IPv6 to IPv4 connections, I may have to get creative on the proxy's connection to the backbone network instead...

Monday, February 4, 2013

EZproxy wish list: Structured configuration file

A common point of confusion among EZproxy users arises when they start using position-sensitive directives in the EZproxy configuration file.

Some rules, like the SSL Cipher directives, must be positioned before other directives (LoginPortSSL in this case).

Others turn certain features on and have a complementary directive that turns them off (Option DomainCookieOnly / Option Cookie; AutoLoginIP / ExcludeIP / IncludeIP; etc.).

Getting to know all of these directives, their relationships, when to use them, and in some cases when not to use them makes for a steep learning curve for new EZproxy users.

The solution to several of these issues would be to adopt a new structured configuration file format, one similar to Apache HTTPD.

Let's take a look at what a before and after might look like for a few stanzas:

Option DomainCookieOnly
Title -hide EBSCO LinkSource
URL http://linksource.ebsco.com
HJ linksource.ebsco.com
Option Cookie

Here's what this might look like if the Apache HTTPD style were adopted:

<Database "EBSCO LinkSource">
    HideTitle
    URL http://linksource.ebsco.com
    HJ linksource.ebsco.com
    Option DomainCookieOnly
</Database>

It's not a dramatic change, but now it is clear that DomainCookieOnly is limited in scope to the EBSCO LinkSource database.  It also eliminates the possibility that the "Option Cookie" line is forgotten and impacts a different database stanza.

Here's another idea:

URL http://alexanderstreet.com/
Domain alexanderstreet.com
Host ahiv.alexanderstreet.com
...
Host womv.alexanderstreet.com

Almost a screen full of Host directives completes that stanza!

Why not enable making it simpler, like so:

<Database "Alexander Street Press">
    Domain alexanderstreet.com
    Host *.alexanderstreet.com
</Database>

Those seem pretty straightforward, and easy to follow.  Now let's look at some core configuration.

Before:

Interface Any
LoginPort 80
Option ForceHTTPSLogin
Option DisableSSL56bit
Option DisableSSLv2
LoginPortSSL 443
IncludeFile databases/vendorA
IncludeFile databases/vendorB
IncludeFile databases/vendorC

As mentioned above, there is an ordering that needs to be observed for those directives to work and do what they are intended to do.  If you have a group of Option directives in your configuration file that happens to appear after the LoginPortSSL, and you blindly add the DisableSSL settings there, you will not actually disable the weak SSL cipher support.
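
For example, a fragment like this (hypothetical, just compressing the point above) looks reasonable at a glance but leaves the weak ciphers enabled, because the Option lines appear after the SSL port is already configured:

Interface Any
LoginPort 80
LoginPortSSL 443
# Too late -- per the ordering rule above, these no longer affect the SSL listener
Option DisableSSL56bit
Option DisableSSLv2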

What if that were configured more like Apache VirtualHosts:

<VirtualHost [IP Address #1]:80>
    IncludeFile databases/vendorA
    <Authentication ReferringURL>
        URL http://lib.example.edu/
    </Authentication>
</VirtualHost>

<VirtualHost [IP Address #2]:80>
    IncludeFile databases/vendorB
    <Authentication Ticket>
        TimeValid 10
        MD5 somekey
        Expired Deny expired.html
    </Authentication>
</VirtualHost>

<VirtualHost [IP Address #1]:443>
    ForceHTTPSLogin
    DisableSSL56bit
    DisableSSLv2
    IncludeFile databases/vendorC
    <Authentication File>
        URL file:///path/to/users.txt
    </Authentication>
</VirtualHost>

Here we have 3 proxy instances on 2 different IP addresses, one of them set up for SSL.  The first does referring-URL authentication for Vendor A, the second does ticket-based authentication for Vendor B, and the third uses SSL for Vendor C, restricted to a subset of users.

An approach like this would make the multi-site proxy scenario easier to set up.  Think of the nightmare scenario: multiple campus locations, each teaching different programs, each with different user authentication capabilities, and each with a different level of sophistication in the web site their patrons will be using.

Setting up something as complex as that example in EZproxy today with its flat configuration would require the use of Groups, multiple Interface directives, very careful ordering of directives, and extensive testing.  I could probably generate a working equivalent, but it would take a bit of time to work it all out and make sure that it was functioning as designed.

With a structured configuration format like the strawman proposal above, it is obvious which rules apply to which database without having to scour the file for context, and with the right hierarchy of directives, the amount of time spent designing and testing such a setup could be reduced significantly.

Friday, February 1, 2013

EZproxy wish list: modular interface for authentication and services

Between the built-in authentication support and the catch-all CGI mechanism, EZproxy does a decent job of handling a diverse set of authentication scenarios.  What is implemented is the 80-90% use case, which -- as a developer myself -- I can appreciate.  Once you step outside the realm of what is implemented, though, about the only option left is to use the CGI interface as glue logic into other systems.  That can be made to work for many things, but not everything.

For example, RADIUS is supported, but I do not know of any way to flag an account in RADIUS as an administrator account for EZproxy.  Nor is RADIUS accounting support implemented (not that many people would use accounting in a typical setting, but in a hosted environment it could open up interesting options for usage-based billing).

One thing that I would really like to have, though, is better Kerberos support.  There is already some level of Kerberos built into EZproxy for Active Directory support, but I cannot set up a SPNEGO-authenticated connection leveraging my existing non-Microsoft Kerberos servers, because Kerberos cannot be configured independently of Active Directory.

What would be nice is if OCLC would adopt a modular interface for EZproxy, so that those of us with both the skill set and the proverbial itch could write extensions to EZproxy that close the feature gap on the 10-20% case that is left out.

If OCLC does go down the modularization route, let's not forget about service handling.  Look at current services that require a magic token to connect, like Books24x7, Netlibrary, eBrary, etc.  These services perform a handshake with the vendor to retrieve a token value, and then redirect the user's browser into the service using the token as the authentication key.

There is no reason that EZproxy could not use a scripting language like Lua (which is the scripting language behind a surprising number of commercial programs and games) to perform these handshakes, making service handling as flexible as the database stanzas are now.  With the right design, the scripting interface might even be able to make some of the database stanzas simpler.

Take the recent phenomenon of .NET applications that use the _VIEWSTATE variable to maintain application state, as described in this MSDN article.  It is just screaming for better EZproxy support.  Creative use of the Find/Replace directives can populate the username and password fields and then inject JavaScript into the page to auto-submit the form for a seamless login, but each of these stanzas has to be hand-crafted today.
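
Here is a rough sketch of the kind of hand-crafted stanza I mean, with a made-up hostname and made-up form field names (the Find/Replace pairs rewrite the HTML as it passes through the proxy):

Title Hypothetical .NET Resource
URL http://app.example.com/login.aspx
HJ app.example.com
Find name="txtUser" value=""
Replace name="txtUser" value="libraryproxyaccount"
Find name="txtPass" value=""
Replace name="txtPass" value="notmyrealpassword"
Find </body>
Replace <script type="text/javascript">document.forms[0].submit();</script></body>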

Wouldn't it be nice if EZproxy used Lua as a macro language, and the EZproxy user community could write a function to handle this case more gracefully and share that code the same way we share database stanzas today?

Thursday, January 31, 2013

EZproxy wish list: Don't write to the license file

EZproxy is a licensed product, and has a configuration file, ezproxy.key, that holds the license key.

In most other products, only administrators touch the license key file; if the software itself writes to it, it does so only upon human interaction, that is, when the license is entered or updated.

EZproxy takes a different approach.  It reads the file, performs a validation of the key (which appears to be local, rather than a remote call to OCLC servers), writes the file back out with a new timestamp value, re-reads the license file, and continues the startup processing.

This happens each and every time the server is started or restarted as part of its initialization.

Here's where the problem comes in.  Before this change (which happened 3 or so years ago; I forget exactly which version introduced this "feature"), I used to be able to make this file read-only and not writable by the ezproxy RunAS user.  (You are using RunAS, right?)  After this change, the file has to be writable, and writable by the RunAS user at that.
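
For reference, this is all I used to do, and all I would like to be able to do again (hypothetical install path; with current versions this breaks startup, because EZproxy insists on rewriting the file):

# Owned by root, readable by everyone, writable by no one
chown root:root /usr/local/ezproxy/ezproxy.key
chmod 444 /usr/local/ezproxy/ezproxy.key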

Sorry, but this is a BROKEN DESIGN.

I'm sure there are other pieces of software that behave this badly, but I am hard pressed to name any.  Perhaps it's the advanced repression techniques kicking in.

Look at any other piece of software, and the model is pervasive: the administrative user owns the files, and the non-privileged user just reads the configuration and runs the software.

Why do I feel so strongly about this?
  1. This leads to service outages.
  2. This negates part of the benefits of RunAS.
  3. This can introduce unintended consequences.
Let's explore each of these:

1) Service Outages.

Be honest: EZproxy is such a low-maintenance piece of software that it is very easy to set it up and forget about it until there is a problem.  You can automate your way out of many of those problems, but the truth is that the squeaky wheel gets the grease, and EZproxy generally doesn't squeak.

One of the scenarios leading to a service outage is a disk full situation due to log files.  Even with filtering, rotation, and compression, given enough time, disks will fill up, especially on a busy proxy server.  Even with disk space monitoring, you may not appreciate the seriousness of the alert until it's too late, or you might *gasp* be on vacation when it happens.

In a normal scenario, when the disk fills up, EZproxy will happily keep running.   You just lose your ability to record log data.   Not optimal, but not catastrophic either.

That is, until you restart the software.

What happens?  EZproxy reads the license file, validates the license, writes the license file ... oops, no disk space ... *BOOM* bye-bye proxy:
open("ezproxy.key", O_RDONLY) = 5
...
open("ezproxy.key", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 5
See that O_TRUNC flag in the open() function call?

       O_TRUNC
              If the file already exists and is a regular file and the open mode allows writing
              (i.e., is O_RDWR or O_WRONLY) it will be truncated to length 0.  If the file is a
              FIFO or terminal device file, the O_TRUNC flag is ignored.  Otherwise the effect of
              O_TRUNC is unspecified.

The file is truncated as a result, and because there is no disk space left to re-write it, you no longer have a valid license file.  When the subsequent re-read of the license occurs, the file is empty, and the server is now unlicensed and will not start.

Reason #1 why having the software re-write its license file on the fly is a BAD IDEA.

2) Weakening the security model of RunAS

Running software as a non-administrative user is a very good thing.  Running it as a unique user, separate from any other system tasks, is a very good thing.  Partitioning run-time processing from configure-time processing is a very good thing.

Except that by writing to the license file, EZproxy breaks the partitioning of run-time from configure-time.  Look at the model that most other software uses:

  1. The root (administrative) user starts the daemon process.
  2. The process opens any ports that require root permissions.
  3. The process opens/reads any files that require root permissions.  (Some software will chroot() to an empty directory at this stage to raise the "you must be this tall" bar for compromising the system.)
  4. The process drops root permissions and runs as (RunAS, get it?) a non-administrative user.

By making the license key file writable by the RunAS user, a security weakness is introduced: an attacker who finds a way into the RunAS user account can set up a denial-of-service attack via the license file: delete it, corrupt it, fill up the disk space that holds it (and there are several nasty ways to do that under the radar), and so on.


Reason #2 why having the software re-write its license file on the fly is a BAD IDEA.

This also sets up the next issue....

3) Introducing unintended consequences

There are probably as many different ways to manage an EZproxy server as there are EZproxy servers.

Some of these may involve giving out access to the RunAS user for various reasons.  Your site might have administrators install the software, then hand it over to an electronic services librarian who configures and maintains it.  Or manages user authentication files.  Or updates the database definitions.  Or maintains the files in the public/loggedin/limited directories.  The point is that it is not hard to imagine a scenario where users share access to the RunAS account, or are put into the same group as EZproxy and have write access to the license file, either intentionally or by oversight.

Combine this with the fact that the license file has to be writable by the RunAS user, and the overall system is made less secure.  On the innocent side, users make mistakes and accidents happen.  Ever do a "rm -rf . /*"?  You'll (hopefully) only do that once, and learn a painful enough lesson that you won't ever do it again.

On the nefarious side, ever have a staff member leave under less than optimal circumstances?  One simple change to the license file, and your proxy is now a ticking logic bomb.

Either way, an action that is normally benign -- a proxy software or server restart -- will now turn into a major problem.  How long will it take you to figure out what the problem is, find the license code, and fix it?  Murphy says this will happen after support hours, right before an extended holiday, when all of your backup tapes were stored "on top of the new cabinet" (which turns out to be a transformer), and the only person who knows the license code is on a pilgrimage to Motuo County.


Reason #3 why having the software re-write its license file on the fly is a BAD IDEA.

In short, all of these real and potential problems are introduced just so the server can log this message:
2013-01-31 09:20:15 Thank you for your purchase of this licensed copy of EZproxy.  EZproxy was last able to validate the license on 2013-01-31 09:20:15.
Is that feel-good message really worth it?  Can we please drop the useless timestamp in that message, go back to just validating the license, and leave the license file alone?