apache spark notebook #30

  • By bigdata-dev
  • Latest version (#30)

Description

The IPython Notebook is an interactive computational environment, in which
you can combine code execution, rich text, mathematics, plots, and rich media.
IPython Notebook and Spark’s Python API are a powerful combination for data
science.

Overview

IPython Notebook is a web-based notebook that enables interactive data
analytics for Spark. The developers of Apache Spark have given thoughtful
consideration to Python as a language of choice for data analysis. They have
developed the PySpark API for working with RDDs in Python, and further support
using the powerful IPython shell instead of the built-in Python REPL.
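
For example, here is a minimal PySpark sketch of the kind of RDD work you
might run from a notebook cell. It assumes sc is the SparkContext that the
notebook kernel initializes for you; the data is illustrative:

# Count the even numbers in a distributed range.
rdd = sc.parallelize(range(1000))           # distribute a local range of ints
evens = rdd.filter(lambda x: x % 2 == 0)    # keep only even values
print(evens.count())                        # expect 500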

The developers of IPython have invested considerable effort in building the
IPython Notebook, a system inspired by Mathematica that allows you to create
"executable documents." IPython Notebooks can integrate formatted text
(Markdown), executable code (Python), mathematical formulae (LaTeX), and
graphics/visualizations (matplotlib) into a single document that captures the
flow of an exploration and can be exported as a formatted report or an
executable script.
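
As a sketch of a single cell that mixes computation with an inline plot
(assuming matplotlib is installed on the notebook unit and the %matplotlib
inline magic has been run; the data is made up):

import matplotlib.pyplot as plt

xs = range(10)
ys = [x ** 2 for x in xs]       # a simple quadratic series
plt.plot(xs, ys)
plt.xlabel("x")
plt.ylabel("x squared")
plt.show()                      # rendered inline in the notebook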

Usage

This is a subordinate charm that requires the apache-spark interface. This
means that you will need to deploy a base Apache Spark cluster to use
IPython Notebook. An easy way to deploy the recommended environment is to use
the apache-hadoop-spark-notebook bundle. This will deploy the Apache Hadoop
platform with an Apache Spark + IPython Notebook unit that communicates with
the cluster by relating to the apache-hadoop-plugin subordinate charm:

juju-quickstart apache-hadoop-spark-notebook

Alternatively, you may manually deploy the recommended environment as follows:

juju deploy apache-hadoop-hdfs-master hdfs-master
juju deploy apache-hadoop-yarn-master yarn-master
juju deploy apache-hadoop-compute-slave compute-slave
juju deploy apache-hadoop-plugin plugin
juju deploy apache-spark spark
juju deploy apache-spark-notebook notebook

juju add-relation yarn-master hdfs-master
juju add-relation compute-slave yarn-master
juju add-relation compute-slave hdfs-master
juju add-relation plugin yarn-master
juju add-relation plugin hdfs-master
juju add-relation spark plugin
juju add-relation notebook spark

Once deployment is complete, expose the notebook service:

juju expose notebook

You may now access the web interface at
http://{spark_unit_ip_address}:8880. The IP address can be found by running:

juju status spark | grep public-address

Testing the deployment

From the IPython Notebook web interface, click the "New Notebook" button.
In a notebook cell, type "sc." followed by the "Tab" key. The Spark API
completion menu should appear, verifying that the notebook can communicate
with the Spark unit.
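
For a fuller check, run a small job against the cluster. A minimal sketch,
assuming sc is the SparkContext pre-initialized by the notebook kernel:

# Sum the integers 1..100 across the cluster.
rdd = sc.parallelize(range(1, 101))
total = rdd.reduce(lambda a, b: a + b)
print(total)  # expect 5050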

Contact Information

Help