Tuesday, February 12, 2013

EZproxy wish list: Review default values

EZproxy has several tuning parameters that affect how many users the proxy can reasonably support.  Given how much the state of website development has changed over the years, reviewing the default values for these limits is probably long overdue.

Let's take a look at each of these limits, and see how they hold up:

MaxConcurrentTransfers (MC, default 200) determines how many HTTP transfers can be in progress concurrently. Most web browsers are configured to attempt four simultaneous HTTP transfers so that they can load web pages and graphics at the same time. The default value of 200 allows 50 people to concurrently download 4 files each without reaching this limit.
These days, modern browsers open 6-8 simultaneous connections per host, so the default of 200 now allows only 25-33 people to download concurrently. Doubling this default value seems reasonable, or maybe even tripling it. The real constraint here is the open file descriptor limit, which varies by system but should be at least 1024. Each transfer uses 2 file descriptors (one for the client-to-proxy connection and one for the proxy-to-server connection). So it should be safe to set this as high as 500 without any system tuning to increase the number of file descriptors available to EZproxy. If OCLC were to embrace the platform, a file could be installed into /etc/security/limits.d to raise the number of file descriptors available to EZproxy, and the default could be higher still.
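The arithmetic above can be sketched as a small shell script. It assumes exactly two descriptors per transfer, as described. It also sets aside a hypothetical RESERVED count of 24 descriptors for EZproxy's own files and listening sockets. That number is simply the headroom that reproduces the figure of 500 for a 1024 limit; it is not an OCLC number.

```shell
#!/bin/bash
# Back-of-the-envelope capacity math for MaxConcurrentTransfers (MC).
# Assumptions: every transfer holds two file descriptors (client->proxy and
# proxy->server), and RESERVED descriptors are left for EZproxy's own files
# and sockets. 24 is a hypothetical value chosen to match "500 for 1024".
RESERVED=24

# Largest MC that fits within a given open-file limit.
max_transfers() {
    echo $(( ($1 - RESERVED) / 2 ))
}

# Users who can download at once, given MC and connections per browser.
concurrent_users() {
    echo $(( $1 / $2 ))
}

# ulimit -n may print "unlimited", so only do the math on a plain number.
limit=$(ulimit -n)
case $limit in
    ''|*[!0-9]*) echo "open file limit: $limit (not a number, skipping MC estimate)" ;;
    *) echo "open file limit: $limit, largest MC without tuning: $(max_transfers "$limit")" ;;
esac

for per_browser in 4 6 8; do
    echo "MC 200 at $per_browser connections per browser: $(concurrent_users 200 "$per_browser") users"
done
```

With a 1024 limit, this gives 500 for MC. At 6 and 8 connections per browser, it gives 33 and 25 users respectively.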
MaxLifetime (ML, default 120) determines how long, in minutes, an EZproxy session remains valid after it was last accessed. With the default of 120, a session remains valid until 2 hours after the user last accesses a database through EZproxy. MaxLifetime is the only setting that is position-dependent in config.txt/ezproxy.cfg; in normal use, it should appear before the first TITLE line.
This is an interesting one, and each site is going to have to decide on a value for itself. My only caution is this: the current behavior of EZproxy (up to 5.6.3GA) is to reset the session timers when EZproxy is restarted. I know some sites schedule an automatic nightly restart of EZproxy to pick up configuration changes made during the day. Those sites will want to ensure that MaxLifetime never exceeds the time between server restarts.
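For sites with a scheduled restart, a quick sanity check along these lines may help. This is a sketch, not an OCLC tool. The config path /usr/local/ezproxy/config.txt and the 1440-minute (nightly) restart interval are assumptions. The script reports the last MaxLifetime (or ML) value that appears before the first Title (or T) line, falling back to the default of 120.

```shell
#!/bin/bash
# Sketch: compare the MaxLifetime in effect for the first Title line with the
# interval between scheduled restarts. The config path and the 1440-minute
# restart interval used at the bottom are assumptions; adjust for your site.

# Print the last MaxLifetime/ML value seen before the first Title/T line.
# Directive names are compared in lower case. If none appears before the
# first Title line, print the default of 120.
effective_max_lifetime() {
    awk '
        { d = tolower($1) }
        d == "title" || d == "t" { exit }
        d == "maxlifetime" || d == "ml" { ml = $2 }
        END { print (ml == "" ? 120 : ml) }
    ' "$1"
}

# Return 1 (and warn) if MaxLifetime exceeds the restart interval in minutes.
check_lifetime() {
    local config=$1 restart_minutes=$2 ml
    ml=$(effective_max_lifetime "$config")
    if [ "$ml" -gt "$restart_minutes" ]; then
        echo "WARNING: MaxLifetime $ml exceeds restart interval of $restart_minutes minutes"
        return 1
    fi
    echo "OK: MaxLifetime $ml fits within restart interval of $restart_minutes minutes"
}

config=${1:-/usr/local/ezproxy/config.txt}
[ -f "$config" ] && check_lifetime "$config" 1440
```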
MaxSessions (MS, default 500) determines the maximum number of EZproxy sessions that can exist concurrently. A user's session is created at login. It ends after MaxLifetime minutes of inactivity (default 2 hours), or when the user accesses a URL like http://ezproxy.yourlib.org:2048/logout to log out explicitly.
MaxSessions has a few possible impacts on the server that I can see:

  1. Since sessions are stored on disk, not having a limit exposes EZproxy to a denial of service attack by filling up the drive.
  2. The flip side is that if this is set too low, a resource exhaustion denial of service attack can be launched against the proxy to consume all sessions available, thus locking everyone out of the proxy -- including an admin user!
  3. Since sessions are stored in a flat file on disk, having an unusually large number may lead to performance degradation, because the server has to sequentially scan a larger and larger file to find a given session. This could be remedied by storing sessions in a hash file, or in an embedded database like SQLite.
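To illustrate the difference between a sequential scan and a hashed lookup, here is a sketch using awk, whose arrays are hash tables. The one-session-per-line "session_id user" format is hypothetical. EZproxy's real session storage may look nothing like it.

```shell
#!/bin/bash
# Illustration only: the "session_id user" line format is hypothetical and is
# not EZproxy's actual session storage format.

# Sequential scan: each lookup reads the file from the top until it finds
# the ID, so the cost of a lookup grows with the number of stored sessions.
lookup_scan() {
    awk -v id="$2" '$1 == id { print $2; found = 1; exit } END { exit !found }' "$1"
}

# Hashed lookup: read the session file once into an awk array (a hash table),
# then resolve each ID read from stdin in constant expected time.
# Unknown IDs print "-".
lookup_hashed() {
    awk -v file="$1" '
        BEGIN {
            while ((getline line < file) > 0)
                if (split(line, f) >= 2) user[f[1]] = f[2]
        }
        { print (($1 in user) ? user[$1] : "-") }
    '
}
```

For N lookups, the scan reads the file up to N times, while the hashed version reads it once. That is the same trade-off a hash file or SQLite index would make on disk.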
As a default, 500 is probably a reasonable value. However, if the number was chosen as a function of how many people MaxConcurrentTransfers allows to download concurrently, it may need to be adjusted as well.
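Suppose MaxSessions was in fact sized relative to MaxConcurrentTransfers (an assumption; OCLC doesn't say). Then the defaults work out to 500 sessions for 50 concurrent downloaders, or 10 sessions per downloader. A sketch that keeps that hypothetical ratio when MC changes:

```shell
#!/bin/bash
# Hypothetical scaling rule: assume the default MaxSessions (500) was chosen
# relative to the 50 users the default MaxConcurrentTransfers (200) supports
# at 4 connections each. That is 10 sessions per concurrently downloading
# user. This ratio is an assumption, not documented EZproxy behavior.
DEFAULT_MS=500
DEFAULT_MC=200
DEFAULT_PER_BROWSER=4

scaled_max_sessions() {
    local mc=$1 per_browser=$2
    local ratio=$(( DEFAULT_MS / (DEFAULT_MC / DEFAULT_PER_BROWSER) ))
    echo $(( (mc / per_browser) * ratio ))
}

# MC raised to 500, browsers opening 6 connections: 83 users, 830 sessions.
scaled_max_sessions 500 6
```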

MaxVirtualHosts (MV, default 200) determines the maximum number of virtual web servers that EZproxy can create. A virtual web server represents a single host name/port combination. For example, suppose EZproxy assigns port 2050 to www.somedb.com, 2051 to www.somedb.com:180, and 2052 to www.otherdb.com; these three ports represent three virtual hosts. In normal use, increase this parameter by no more than 50-100 at a time. The limit provides a safeguard against configuration errors in config.txt/ezproxy.cfg that might lead to the creation of excessive, unneeded virtual hosts.
This is probably the single most common limit that EZproxy admins run into. Having worked with other proxy systems, I suspect that it has more to do with an internal implementation detail of EZproxy, specifically its support for proxy-by-port vs. proxy-by-name. Beyond that, I can't convince myself of why this limit exists at all. Either way, I would suggest that raising this limit by an order of magnitude would do no harm these days. Many moons ago, when EZproxy ran on machines with only 32MB of RAM, this may have been significant. I doubt, though, that you can find a supported Linux server distribution that runs with less than 256MB of RAM these days.
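Whatever the default ends up being, it helps to know how close a server is to its limit. The sketch below assumes, as the script in the comment does, that each virtual host appears on a line beginning with "H" in ezproxy.hst. It reads MaxVirtualHosts (or MV) from config.txt, defaulting to 200. The /usr/local/ezproxy paths and the 90% warning threshold are arbitrary choices for illustration.

```shell
#!/bin/bash
# Sketch: report how many virtual hosts are in use relative to MaxVirtualHosts.
# Assumes virtual hosts are the lines beginning with "H" in ezproxy.hst.

# Read MaxVirtualHosts (or MV) from the config file; default to 200.
max_virtual_hosts() {
    awk '
        { d = tolower($1) }
        d == "maxvirtualhosts" || d == "mv" { mv = $2 }
        END { print (mv == "" ? 200 : mv) }
    ' "$1"
}

# Count virtual host lines in the host file.
host_count() {
    grep -c '^H' "$1"
}

# Print usage; return 1 once usage reaches the threshold percentage.
check_hosts() {
    local config=$1 hst=$2 threshold=${3:-90} used limit pct
    used=$(host_count "$hst" 2>/dev/null)
    used=${used:-0}
    limit=$(max_virtual_hosts "$config")
    pct=$(( used * 100 / limit ))
    echo "$used of $limit virtual hosts in use (${pct}%)"
    [ "$pct" -lt "$threshold" ]
}

ez=/usr/local/ezproxy
if [ -f "$ez/config.txt" ] && [ -f "$ez/ezproxy.hst" ]; then
    check_hosts "$ez/config.txt" "$ez/ezproxy.hst" || echo "Consider raising MaxVirtualHosts."
fi
```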

1 comment:

  1. Recommend running a script like this as a cron job:

    #!/bin/bash
    # Restart EZproxy with a fresh ezproxy.hst once the host file
    # holds 500 or more virtual hosts.

    cd /usr/local/ezproxy || exit 1

    # If ezproxy.hst exists, count its virtual host ("H") lines.
    if [ -f ./ezproxy.hst ]
    then
        num=$(grep -c '^H' ezproxy.hst)
        if [ "$num" -lt 500 ]
        then
            # Fewer than 500 hosts: nothing to do.
            exit 0
        fi
        # 500 or more hosts: stop EZproxy and remove the host file.
        ./ezproxy stop
        rm -f ezproxy.hst
    fi

    # bash does not set $HOST (and cron sets very little), so ask hostname.
    host=$(hostname)
    if ./ezproxy restart
    then
        echo "ezproxy.hst was deleted and ezproxy restarted." | mail -s \
            "ezproxy successfully reset on $host" helpme@myschool.edu
    else
        echo "encountered problem restarting ezproxy" | mail -s \
            "failed to restart ezproxy on $host" helpme@myschool.edu
    fi
