Deploying the IPython Notebook to Brisk’s Spark

The IPython notebook is one of the most fun interactive scientific data analytics tools out there. Elastacloud now offers access to Spark services on Microsoft's Azure cloud through their new Brisk service. It is likely that they will add the IPython notebook to future releases, but if you can't wait, here is a step-by-step way to do it yourself. First, log into the head node of your Brisk Spark cluster. Then:

1. Make sure to install ipython with the notebook. If you have python installed, then the easiest way to do this is

$pip install "ipython[notebook]"

2. You will want to add matplotlib.

$sudo apt-get install python-matplotlib

3. Create a pyspark profile with

$ipython profile create pyspark

4. Next, create a certificates directory in your home directory and, from inside it, run

$openssl req -x509 -nodes -days 365 -newkey rsa:1024 -keyout mycert.pem -out mycert.pem
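The notebook config in step 5 expects the certificate at /home/briskuser/certificates/mycert.pem, so the key should be generated inside that directory. A sketch of the full step (the -subj flag is an optional addition that skips openssl's interactive questions; the CN value is a placeholder you should replace with your cluster's hostname):

```shell
# Create the directory the notebook config (step 5) points at,
# then generate the self-signed certificate inside it.
mkdir -p ~/certificates
cd ~/certificates
# -subj avoids the interactive prompts; the CN is a placeholder hostname.
openssl req -x509 -nodes -days 365 -newkey rsa:1024 \
    -keyout mycert.pem -out mycert.pem \
    -subj "/CN=yourbriskcluster.cloudapp.net"
```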

5. Edit the .ipython/profile_pyspark/ipython_notebook_config.py file with this content

----------------------------------------

# Configuration file for ipython-notebook.
c = get_config()
c.NotebookApp.pylab = 'inline'
c.NotebookApp.certfile = u'/home/briskuser/certificates/mycert.pem'
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8888
PWDFILE = '/home/briskuser/.ipython/profile_pyspark/nbpasswd.txt'
c.NotebookApp.password = open(PWDFILE).read().strip()

----------------------------------------

6. In the profile_pyspark directory, add a file named 00-pyspark-setup.py to the startup subdirectory, with this content

----------------------------------------

import os
import sys

spark_home = os.environ.get('SPARK_HOME', None)
if not spark_home:
    raise ValueError('SPARK_HOME environment variable is not set')
sys.path.insert(0, os.path.join(spark_home, 'python'))
sys.path.insert(0, os.path.join(spark_home, 'python/lib/py4j-0.8.1-src.zip'))
execfile(os.path.join(spark_home, 'python/pyspark/shell.py'))

----------------------------------------
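Before launching, you can sanity-check the paths the startup script depends on (the py4j version in the zip name varies between Spark releases, so it is worth confirming 0.8.1 is what your cluster ships). A quick check, assuming SPARK_HOME is already exported as in step 8:

```shell
# Report whether the files 00-pyspark-setup.py needs actually exist
# under SPARK_HOME (paths taken from the startup script above).
for p in "$SPARK_HOME/python" \
         "$SPARK_HOME/python/lib/py4j-0.8.1-src.zip" \
         "$SPARK_HOME/python/pyspark/shell.py"; do
    if [ -e "$p" ]; then echo "ok: $p"; else echo "missing: $p"; fi
done
```

If any line prints "missing", adjust the startup script (most often the py4j zip name) before starting the notebook.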

7. Set up a password for the notebook

$python -c 'from IPython.lib import passwd; print passwd()' > ~/.ipython/profile_pyspark/nbpasswd.txt

8. Set up the SPARK_HOME variable

$export SPARK_HOME='/usr/spark'
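Note that export only lasts for the current shell session. If you want SPARK_HOME set every time you log into the head node, a common approach (not Brisk-specific) is to append it to your ~/.bashrc:

```shell
# Make the setting survive logouts; /usr/spark is the path from step 8.
echo "export SPARK_HOME='/usr/spark'" >> ~/.bashrc
# Confirm it was added:
grep SPARK_HOME ~/.bashrc
```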

9. Make sure you expose HTTPS as an endpoint with private port 8888. Use the Azure portal to do this.

10. Run it!

$ipython notebook --profile=pyspark

11. Go to https://<yourbriskcluster>.cloudapp.net

and you are on!