apache spark common #5

  • By bigdata-dev
  • Latest version (#5)
  • trusty

Description

Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster
on disk. Spark powers a stack of high-level tools including Spark SQL, MLlib for
machine learning, GraphX, and Spark Streaming. You can combine these libraries
seamlessly in the same application.


Overview

Spark Standalone 1.3.x cluster

Apache Spark™ is a fast, general-purpose engine for large-scale data processing.
Key features:
Speed:
Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk.
Spark has an advanced DAG execution engine that supports acyclic data flow and in-memory computing.
Ease of Use:
Write applications quickly in Java, Scala or Python.
Spark offers over 80 high-level operators that make it easy to build parallel apps.
And you can use it interactively from the Scala and Python shells.
General Purpose Engine:
Combine SQL, streaming, and complex analytics.
Spark powers a stack of high-level tools including Spark SQL, MLlib for
machine learning, GraphX, and Spark Streaming. You can combine these libraries
seamlessly in the same application.

Usage

Testing the installation

Smoke tests after deployment

Spark administrators reach the Spark console over SSH from the master node.

1) SSH to the Spark master:
juju ssh spark-master/0
2) Use spark-submit to run an application, for example the bundled SparkPi job:
spark-submit --class org.apache.spark.examples.SparkPi /usr/lib/spark/lib/spark-examples*.jar 10
The job output should include an estimate of pi of roughly 3.14.
Alternatively, run demo.sh from /home/ubuntu.

3) Spark’s shell provides a simple way to learn the API, as well as a powerful
tool to analyze data interactively. It is available in either Scala or Python.
Start it by running one of the following in the Spark directory:
$ spark-shell   <== interactive Scala shell
$ pyspark       <== interactive Python shell
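
To help interpret the smoke test result, here is a minimal sketch of the Monte Carlo technique that the SparkPi example uses to estimate pi. It is plain single-machine Python, not Spark code and not the charm's demo.sh; the function name estimate_pi and its parameters are illustrative assumptions.

```python
import random


def estimate_pi(num_samples, seed=None):
    """Estimate pi by sampling random points in the square [-1, 1] x [-1, 1].

    The fraction of points that land inside the unit circle approaches
    pi / 4, so multiplying that fraction by 4 approximates pi.
    """
    rng = random.Random(seed)  # a private generator keeps runs reproducible
    inside = 0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print("Pi is roughly %f" % estimate_pi(100000, seed=42))
```

Because the estimate is statistical, the digits after 3.14 vary from run to run; more samples give a tighter estimate. In Spark the same sampling is split across the cluster and the counts are summed.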

Contact Information

bigdata-dev@canonical.com

Configuration

namenode_port
(int) Port number of the NameNode.
Default: 8020
resources_mirror
(string) URL from which to fetch resources (e.g., Spark binaries) instead of Launchpad.
namenode_hostname
(string) Hostname or IP address of the NameNode.
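
As a rough illustration of how the two NameNode options relate, the sketch below combines a hostname and port into the conventional hdfs://host:port address form. How the charm itself consumes these options is not shown on this page, so the helper namenode_uri and the example hostname are hypothetical.

```python
def namenode_uri(hostname, port=8020):
    """Build an HDFS address from namenode_hostname and namenode_port.

    Hypothetical helper for illustration; 8020 mirrors the default
    listed for namenode_port above.
    """
    if not hostname:
        raise ValueError("namenode_hostname must not be empty")
    port = int(port)  # accept "8020" as well as 8020
    if not 0 < port < 65536:
        raise ValueError("namenode_port must be between 1 and 65535")
    return "hdfs://%s:%d" % (hostname, port)


if __name__ == "__main__":
    # namenode.example.com is a placeholder, not a real deployment value
    print(namenode_uri("namenode.example.com"))
```

Validating both values before use makes a missing hostname or mistyped port fail early, rather than surfacing later as a connection error.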