The NSF Cloud Workshop

On December 11th and 12th the NSF hosted a workshop on experimental support for cloud computing. The focus of the event was to acquaint the community with two new and exciting cloud research testbeds: the Chameleon cloud and the CloudLab project. These two very interesting projects grow out of two different communities of researchers, but they both seek to satisfy some important shared goals. Chameleon is an evolution of the FutureGrid project and is led by Kate Keahey of the Computation Institute of the University of Chicago and Argonne and Dan Stanzione of the Texas Advanced Computing Center. Additional participants include Joe Mambretti from Northwestern, D.K. Panda from Ohio State, Paul Rad from UT San Antonio and Warren Smith from UT Austin. CloudLab is the result of a collaboration of the University of Utah, Clemson University, the University of Wisconsin Madison and the University of Massachusetts Amherst, led by Robert Ricci, Aditya Akella, KC Wang, Chip Elliott, Mike Zink and Glenn Ricart. CloudLab is really an evolution of aspects of the long-running and successful NSF GENI project.

With one project coming from the tradition of high performance computing and the other from the world of advanced computer networking, I was expecting to see a fairly divergent set of plans. Instead what we saw was a pair of well-thought-out infrastructure designs that can support the systems research community very nicely.

The two facilities are different but they are also complementary. CloudLab emphasizes a more heterogeneous set of resources. For example, it includes a subcluster based on low-power ARM processors and even FPGA support on another subcluster. Chameleon is a distributed but largely homogeneous system, with part of it housed at Argonne and the rest at TACC. Both projects make extensive use of software defined networks and Internet2 to provide high-bandwidth networking between sites, and both provide everything from bare-metal access to OpenStack-based software stacks.
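
To give a feel for what an OpenStack-based testbed looks like to a user, here is a minimal sketch of provisioning a node with the openstacksdk Python library. The cloud profile name "chameleon" and the image and flavor names are illustrative assumptions, not the testbeds' actual catalogs, and real bare-metal use on these systems typically also involves a resource reservation step not shown here.

```python
# Minimal sketch: launching an instance on an OpenStack-based testbed
# with openstacksdk. The cloud name "chameleon" and the image/flavor
# names below are hypothetical placeholders.
import openstack

# Read credentials for a cloud named "chameleon" from clouds.yaml.
conn = openstack.connect(cloud="chameleon")

# Look up an image and a flavor by name (names are assumptions).
image = conn.compute.find_image("CC-CentOS7")
flavor = conn.compute.find_flavor("baremetal")

# Launch the server and block until it reaches the ACTIVE state.
server = conn.compute.create_server(
    name="systems-research-node",
    image_id=image.id,
    flavor_id=flavor.id,
)
server = conn.compute.wait_for_server(server)
print("Provisioned:", server.name, server.status)
```

The point of the sketch is that the same familiar cloud API sits on top of both facilities, while the researcher retains bare-metal control underneath it.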

An important point was made about the role of these two facilities in the nation's cyberinfrastructure. These facilities are not designed to replace access to public clouds like those from Amazon, Google and Microsoft, nor are they intended to serve the same HPC workloads as the NSF supercomputers. They are designed to provide a platform for the basic systems research that cannot be carried out on those existing resources. For example, consider research that optimizes the cloud software stack, or applications that explore dynamically moving computational loads across geographic regions or offloading computationally intensive tasks to dedicated HPC resources. Can we transparently migrate large machine learning tasks from the Chameleon cloud to the TACC supercomputer? How can we optimize the storage system so that large data collections can be prefetched and streamed to computational analytics as they are needed? How does one optimize virtualization to support new analytics algorithms that can utilize FPGAs or GPGPUs?
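
To make the storage question concrete, here is a minimal sketch of the prefetch-and-stream pattern: a background thread pulls data chunks ahead of the analytics loop so that compute never stalls on storage. The functions fetch_chunk() and analyze() are hypothetical placeholders for a remote read and an analytics kernel.

```python
# Minimal sketch of prefetch-and-stream: a background thread keeps a
# small buffer of chunks filled while the analytics loop consumes them.
import queue
import threading

PREFETCH_DEPTH = 4                  # chunks buffered ahead of the consumer
buffer = queue.Queue(maxsize=PREFETCH_DEPTH)

def fetch_chunk(i):
    # Placeholder for a remote read (e.g., an object-store GET).
    return b"x" * (1024 * 1024)

def analyze(chunk):
    # Placeholder for the analytics kernel.
    return len(chunk)

def prefetcher(num_chunks):
    for i in range(num_chunks):
        buffer.put(fetch_chunk(i))  # blocks when the buffer is full
    buffer.put(None)                # sentinel: no more data

threading.Thread(target=prefetcher, args=(100,), daemon=True).start()

total = 0
while (chunk := buffer.get()) is not None:
    total += analyze(chunk)         # overlaps with the next fetch
print("bytes processed:", total)
```

Research on these testbeds could measure exactly this kind of overlap at scale, with control over the network and storage layers that a public cloud does not expose.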

The workshop itself allowed the community to present ideas for applications of the infrastructure. One presentation, from Amit Majumdar, described the Science Gateway project from the XSEDE NSF supercomputing collaboration. The Science Gateways provide web interfaces to application domain tools; together they support about a third of the XSEDE scientific users. For example, CIPRES is a science gateway for phylogenetics research and NSG is a gateway for computational neuroscience. CIPRES provides access to several high-performance parallel codes for inference of large phylogenetic trees through a web interface. This is a perfect example of how a cloud infrastructure and a supercomputing facility can work together. Clouds were created to support thousands of concurrent users, and gateway web interfaces are a perfect use case: the gateway can support hundreds of “small” (32-core) parallel runs submitted by users while bigger computations are submitted to the supercomputing queue, as sketched below. NSG provides similar services to the neuroscience community. These and other gateways also provide tools for scientific communities to share data and collaborate. Again, this is a perfect cloud application. The only real concern I have is that much of what can be done with science gateways can also be done on existing public clouds. The challenge is to devise a research plan that makes use of the unique capabilities of Chameleon and CloudLab.
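
Here is a minimal sketch of the routing logic a gateway of this kind might use, sending small runs to cloud resources and large ones to the batch queue. The 32-core threshold comes from the description above; the function names are hypothetical.

```python
# Minimal sketch of gateway job routing: small parallel runs go to
# cloud instances, big ones to the supercomputer's batch queue.
SMALL_RUN_CORES = 32   # threshold for a "small" run, per the text

def submit_to_cloud(job):
    # Placeholder: launch the run on cloud instances.
    print(f"cloud: launching {job['cores']}-core run for {job['user']}")

def submit_to_hpc_queue(job):
    # Placeholder: submit the job to the supercomputer's scheduler.
    print(f"HPC queue: {job['cores']}-core job for {job['user']}")

def route(job):
    if job["cores"] <= SMALL_RUN_CORES:
        submit_to_cloud(job)        # many concurrent small runs
    else:
        submit_to_hpc_queue(job)    # scheduled on the supercomputer

for job in [{"user": "alice", "cores": 32},
            {"user": "bob", "cores": 512}]:
    route(job)
```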

Another topic that was discussed was real-time management of data from cars and other sensor-driven systems. It is conceivable that we could redesign our entire transportation system if we could collect the data streaming from autonomous vehicles and send it to intelligent management systems that “instantly” calculate route and velocity updates for each vehicle. This is not possible with current public clouds because of latency and other performance issues, which makes it a perfect topic to be supported by Chameleon and CloudLab.
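
To illustrate why latency is the crux here, consider a minimal sketch of the per-vehicle processing loop with a hard response deadline. The message format, the plan() computation and the 50 ms budget are all illustrative assumptions, not numbers from the workshop.

```python
# Minimal sketch of the latency budget problem: ingest a position
# report and return a route/velocity update within a hard deadline.
import time

DEADLINE_MS = 50.0   # hypothetical end-to-end budget per update

def plan(report):
    # Placeholder for the route and velocity computation.
    return {"vehicle": report["vehicle"], "speed_mps": 25.0}

def handle(report):
    start = time.perf_counter()
    update = plan(report)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if elapsed_ms > DEADLINE_MS:
        # On a distant public cloud, network round-trip time alone can
        # consume this budget; a research testbed lets one measure and
        # attack each component of the delay separately.
        print(f"deadline miss: {elapsed_ms:.1f} ms")
    return update

print(handle({"vehicle": "car-42", "position": (47.6, -122.3)}))
```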