Monday, September 19, 2016

Space based data centers?

For those of you who were in the IT space about a decade ago, you may remember Sun Microsystems started promoting the "data center in a box" -- a standard shipping container converted into a mobile server room that could be deployed to major events, provide disaster relief support, and so on.  But taking a bunch of these, slapping a nose cone around them, and sending them into space is not what this post is about.

A recent article in the Winnipeg Free Press about the modern space race between billionaires Elon Musk, Jeff Bezos, Paul Allen, and Richard Branson reminded me about SpaceX's plans to develop a global satellite internet service, and got me thinking about where their initiative could lead.  But first, a little background:

While traditional communications satellites orbit geostationary at an altitude of 22,236 miles, the current plans for SpaceX's constellation call for an altitude of only 750 miles -- not quite twice as high as today's Iridium constellation.

Using a lower orbit will require more satellites to provide the same coverage, but it solves one of the fundamental issues with satellite internet service -- latency.  A one-way trip from the Earth's surface to a geostationary satellite takes approximately 120ms.  Then the signal has to be re-transmitted down to a ground station, adding another 120ms.  Only then can the request be sent to the servers, incurring what most of us consider "normal" latency, and the response has to make 2 more trips -- one up and another down -- to return to the user, adding another 120ms each way.  Not accounting for any terrestrial delays, this means that every single packet going through traditional satellite internet systems today takes about half a second (4 x 120ms) simply making the trip to the geostationary satellite and back 4 times.

How satellite internet works today (right hand side shows the data path)


This means that every web site, even those using AJAX calls for user interface responsiveness, will instantly feel "slow" on existing satellite services.  Research by Jakob Nielsen has shown that for a system to feel like it is responding instantaneously, it needs to update within 0.1 seconds, and for a user's train of thought not to be interrupted, it needs to update within 1.0 second.  With 0.5 seconds of guaranteed latency, "instantaneous" is already out the window with traditional satellite internet, and "normal" internet transmission and processing delays make meeting the 1.0 second deadline a challenge for anything but the most basic of actions.

Choosing a significantly lower orbit changes this.  Instead of 120ms, it only takes signals about 4ms to reach a satellite 750 miles up.  The 4-way trip only adds 16ms to the transmission time, which is on par with terrestrial network performance.  This means that latency should give way to bandwidth as the primary concern for users of this system.
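The delays above fall straight out of the speed of light.  Here is a quick sketch of the arithmetic, assuming idealized straight-up paths (slant range and processing time would add a bit more):

```python
# Signal delay as a function of orbital altitude, using the
# speed of light in vacuum. Straight-overhead paths assumed.

C_MILES_PER_SEC = 186_282  # speed of light, miles per second

def one_way_ms(altitude_miles: float) -> float:
    """Milliseconds for a signal to travel straight up to a satellite."""
    return altitude_miles / C_MILES_PER_SEC * 1000

geo = one_way_ms(22_236)   # geostationary altitude
leo = one_way_ms(750)      # proposed SpaceX constellation altitude

# Four legs: user -> satellite -> ground station -> satellite -> user
print(f"GEO one-way: {geo:.0f} ms, four legs: {4 * geo:.0f} ms")
print(f"LEO one-way: {leo:.1f} ms, four legs: {4 * leo:.0f} ms")
```

The four-leg totals -- roughly 480ms for geostationary versus 16ms for the 750-mile orbit -- are the whole story of why the lower orbit matters.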

So far, so good, right?  Now, assuming you have all 4,000 satellites circling the Earth at 750 miles and providing internet service, what else could you do with them?

Having been operating in the library world for several years now, and using caching proxies in a commercial setting for many years before that, I am very familiar with the challenges posed by limited bandwidth.  Just ask any IT department what one of their biggest problems is, and bandwidth is going to be on the short list.  To combat this, caching proxy servers can be used, and in fact are already deployed in various ways by existing internet service providers, especially the satellite ISPs.  Some provide customer equipment that includes a proxy server; others deploy proxy servers at the ground stations.  So far, however, I have been unable to find an instance where a satellite operator deployed a proxy server on the satellite itself.  This is where things start to get interesting.

One of the cost containment measures that SpaceX employs on the Falcon 9 is to prefer off-the-shelf components over specialized hardware that costs several multiples more.  CubeSats today use a similar approach, leveraging off-the-shelf components crammed into a 10 cm cube (thus the name) of useful volume.  [While I'm not sure that SpaceX could quite cram enough into a CubeSat to build their network, the thought of a single Falcon 9 deploying a swarm of 4,000 CubeSats at once amuses me, and it would certainly set multiple records.]

Given SpaceX's existing experience with Falcon 9 hardware, as well as the data from CubeSat experiments, it is possible that SpaceX may eschew traditional radiation hardened CPUs, and test off-the-shelf components for their satellite constellation, adding radiation shielding and designing redundancy into the circuits to mitigate the radiation effects instead.  This could mean anything from ARM processors to x86 based designs, but for sake of imagination, let's assume that an x86 design was chosen, and that the processor selected had all of the virtualization bells and whistles enabled, allowing for satellite control operations to be cleanly segmented away from satellite service operations.  What might that mean for their platform?

Well, for one, SpaceX could deploy caching proxy servers on the satellites themselves.  For static assets (graphics, javascript, css, etc.) this would save the satellite to ground station leg, avoid the internet service times, and reduce the latency to about 8ms for the trip up to the satellite, servicing by the local proxy, and return to the ground.  If the satellites were also mesh networked, they could operate as a proxy cluster, sharing assets and query their local peer group for content as well.

The concept of a local peer group in a meshed satellite constellation is very interesting to me.  Without doing all of the detailed math, allow me to do some hand waving to move the discussion along.  A sphere with a 4,700 mile radius (average Earth radius of 3,950 miles + 750 miles of orbital altitude) has a surface area of about 277,591,000 square miles.  Assume that the satellites are evenly distributed across that sphere (here's where the hand waving is) and each satellite will cover approximately 70,000 square miles of that surface.  Applying basic geometry, each satellite would then be on the order of 280 miles from its nearest neighbors, allowing each satellite to reasonably communicate with five or six of its peers without adding noticeable latency to the system, assuming they fly in a formation similar to a geodesic dome configuration.
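The hand waving above can be checked with a few lines of arithmetic.  This sketch assumes a perfectly uniform spread with roughly hexagonal packing, so the spacing is only a ballpark figure -- real constellations fly in discrete orbital planes:

```python
import math

# Back-of-envelope spacing for an evenly distributed constellation.

EARTH_RADIUS_MI = 3_950   # average Earth radius (author's figure)
ALTITUDE_MI = 750
NUM_SATS = 4_000

orbital_radius = EARTH_RADIUS_MI + ALTITUDE_MI        # 4,700 miles
surface_area = 4 * math.pi * orbital_radius ** 2      # ~277.6M sq mi
area_per_sat = surface_area / NUM_SATS                # ~69,400 sq mi

# Hexagonal packing: each satellite "owns" a hex cell of area
# (sqrt(3)/2) * d^2, where d is the distance to its nearest neighbors.
spacing = math.sqrt(area_per_sat / (math.sqrt(3) / 2))

print(f"surface area: {surface_area:,.0f} sq mi")
print(f"area per satellite: {area_per_sat:,.0f} sq mi")
print(f"neighbor spacing: {spacing:.0f} mi")
```

At roughly 280 miles between neighbors, a satellite-to-satellite hop costs well under 2ms at the speed of light, which is why peer communication adds so little latency.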

Why is this significant?  It means that in addition to the proxy server running on each individual satellite, each proxy could query its peers for assets as well, with the satellite-to-satellite communication still being faster than communication with the ground station.  Having worked with clustered proxy configurations before, I can say this serves to amplify the effective capacity of the cache cluster.  Depending on the nature of the cached requests, and the exact configuration of the satellite constellation, it might make sense to define the local cache cluster as not just the immediate peer satellites, but also include their peers, further amplifying the overall benefits of the peering relationships.
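As a rough illustration of the lookup order such a cluster might use -- local cache first, then peer satellites, then the ground station -- here is a toy sketch.  All class and method names here are hypothetical; this is not any real proxy's API, just the shape of the algorithm:

```python
from typing import Optional

class SatelliteCache:
    """Toy on-satellite cache with peer fallback (illustrative only)."""

    def __init__(self, name: str):
        self.name = name
        self.store: dict[str, bytes] = {}
        self.peers: list["SatelliteCache"] = []

    def get_local(self, url: str) -> Optional[bytes]:
        return self.store.get(url)

    def get(self, url: str) -> bytes:
        # 1. Local cache: ~8ms round trip from the ground, up and back.
        hit = self.get_local(url)
        if hit is not None:
            return hit
        # 2. Peer satellites: an inter-satellite hop adds only ~1-2ms.
        for peer in self.peers:
            hit = peer.get_local(url)
            if hit is not None:
                self.store[url] = hit  # keep a local copy for next time
                return hit
        # 3. Ground station: the full four-leg trip, as with any miss.
        body = self.fetch_from_ground(url)
        self.store[url] = body
        return body

    def fetch_from_ground(self, url: str) -> bytes:
        return b"<fetched from origin>"  # placeholder for the real fetch
```

A quick usage example: if satellite B has already cached an asset, a request arriving at satellite A is served from B's cache without ever touching the ground station, and A keeps a copy for subsequent requests.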

"OK, that's great,", you may be thinking, "but what does this have to do with space-based servers?"  Well, remember how I asserted that SpaceX may be able to use off-the-shelf hardware for their satellites, and then laid out one application as a specific example of how an internet service (caching proxy) that could take advantage of running directly on the platform?

What about the remaining cores that are sitting idle on the satellite CPUs?

Recent Intel Xeon Phi processors have 72 cores (yes, I realize that product is targeted at the HPC market, but even traditional virtualization-targeted CPUs have 24-32 cores these days, so the point still stands).  If this were the processor of choice for the satellites, control operations could have a dedicated core, proxy services could take a second, leaving 70 cores twiddling their thumbs.  On 4,000 satellites.  With reasonable latency not only to the ground, but between each other.
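The spare-capacity arithmetic, assuming the hypothetical 72-core part with one core pinned for satellite control and one for the caching proxy:

```python
# Fleet-wide spare core count under the assumptions above.

CORES_PER_SAT = 72    # e.g. a Xeon Phi-class part
RESERVED = 2          # one core for control, one for the proxy
NUM_SATS = 4_000

spare_per_sat = CORES_PER_SAT - RESERVED
fleet_spare = spare_per_sat * NUM_SATS

print(f"{spare_per_sat} spare cores per satellite, "
      f"{fleet_spare:,} across the constellation")
# 70 spare cores per satellite, 280,000 across the constellation
```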

What would you do with over a quarter of a million CPU cores sitting idle on a low-latency space-based network?  If I were SpaceX, I would look at renting them out.  "The Cloud" is widely used today to describe hosted servers on the internet, but this would be a true cloud platform, one circling the Earth like an electron around an atom.  And there is no reason to assume that one application has to be statically mapped to one core, either.  Applications could be deployed as Docker containers instead of fully virtualized servers, raising the effective capacity of the entire swarm.

Traditional CDN providers would seem to be a natural fit for this platform, but what would major internet services do with access to a platform like this?  It would not be large enough to displace their terrestrial operations, but with a small collection of smart edge nodes to boost their services, what functionality would that open?

Add migration capabilities to the platform, and a single application could move between satellites as they orbit, maintaining coverage over a specific terrestrial geography 24x7.
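A toy model of that handoff logic, assuming a single ring of evenly spaced satellites and ignoring Earth's rotation and orbital inclination (this only illustrates when a container would need to migrate, not real orbital mechanics):

```python
# Toy migration model: a container "follows" a fixed ground longitude
# by hopping to whichever satellite is currently closest overhead.

NUM_SATS = 16                # satellites in this one orbital ring
ORBIT_PERIOD_MIN = 110       # roughly right for a 750-mile orbit

def nearest_satellite(t_min: float, target_deg: float) -> int:
    """Index of the satellite closest to the target longitude at time t."""
    best, best_dist = 0, 360.0
    for i in range(NUM_SATS):
        # Satellite i's longitude: evenly offset, drifting with the orbit.
        pos = (i * 360 / NUM_SATS + t_min / ORBIT_PERIOD_MIN * 360) % 360
        dist = min(abs(pos - target_deg), 360 - abs(pos - target_deg))
        if dist < best_dist:
            best, best_dist = i, dist
    return best

# Walk time forward one minute at a time and record each handoff.
target = 97.0  # hypothetical ground longitude to keep covered
host = nearest_satellite(0, target)
handoffs = []
for t in range(ORBIT_PERIOD_MIN):
    new_host = nearest_satellite(t, target)
    if new_host != host:
        handoffs.append((t, host, new_host))
        host = new_host

print(f"{len(handoffs)} migrations in one orbit")
```

With 16 satellites in the ring, the application migrates 16 times per orbit -- once each time a trailing satellite becomes the closest one -- so the migration machinery would need to be cheap and routine, not an exceptional event.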

SpaceX could also choose to expand the scope of the network a bit by putting a few extra sensors on the satellites and selling time to scientists for research.  Add a couple of cameras for real-time Earth imaging, and they could open up not only Earth observation science, but also real-time image feeds from space for various commercial applications.  Could the platform also function as an alternative to TDRSS for other science missions?

Take this same orbital cloud, put it on a Falcon Heavy, and deploy it to Mars with the same (or better) capabilities to establish a global network there before any human sets foot on the planet, and you can start exploration, colonization, and research on Mars with full planetary monitoring, voice and electronic communications, file storage and sharing, and other IoT conveniences available from day one.

Add larger transmission relay nodes at the Lagrange points, and you could interface each planetary orbital cloud with a high-powered transmitter to enable high-bandwidth store-and-forward communications at the interplanetary scale.  Mail from bob@domain.com.earth to alice@domain.org.mars, anyone?