

HBase Overview

HBase is the Hadoop database. Think of it as a distributed scalable Big Data store.

Use HBase when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware.

HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.

HBase provides:

  • Linear and modular scalability.
  • Strictly consistent reads and writes.
  • Automatic and configurable sharding of tables.
  • Automatic failover support between RegionServers.
  • Convenient base classes for backing Hadoop MapReduce jobs with HBase tables.
  • Easy to use Java API for client access.
  • Block cache and Bloom Filters for real-time queries.
  • Query predicate push down via server side Filters.
  • Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and binary data encoding options.
  • Extensible jruby-based (JIRB) shell.
  • Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX.

See the homepage for more information.

This charm provides the HBase master and regionserver roles, which form part of an overall HBase deployment.


An HBase deployment consists of an HBase master service and one or more HBase RegionServer services::

    juju deploy hbase hbase-master
    juju deploy hbase hbase-regioncluster-01

To function correctly, the HBase master and regionserver services require a relation to ZooKeeper. Use the zookeeper charm to create a functional ZooKeeper quorum and then relate it to this charm::

    juju deploy zookeeper hbase-zookeeper
    juju add-unit -n 2 hbase-zookeeper
    juju add-relation hbase-master hbase-zookeeper
    juju add-relation hbase-regioncluster-01 hbase-zookeeper

Remember that ZooKeeper quorums should have an odd number of members, starting from 3. A single unit will work, but it provides no resilience.
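
ZooKeeper only keeps serving while a strict majority of the ensemble is up, which is why odd sizes are recommended: a 4-unit ensemble survives no more failures than a 3-unit one. The short shell sketch below computes how many units an ensemble of a given size can lose. The zk_tolerated_failures helper is hypothetical and is not shipped with this charm.

```shell
#!/bin/sh
# Hypothetical helper, not part of the charm: number of ZooKeeper units
# that can fail while a strict majority (floor(n/2) + 1) is still up.
zk_tolerated_failures() {
    n=$1
    majority=$(( n / 2 + 1 ))
    echo $(( n - majority ))
}

# Print the tolerance for a few ensemble sizes.
for size in 1 3 4 5; do
    echo "$size units: tolerates $(zk_tolerated_failures "$size") failure(s)"
done
```

The zookeeper deployment above (one unit from juju deploy plus two from juju add-unit -n 2) therefore tolerates the loss of a single unit.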

The HBase services also require an HDFS namenode, which is provided by the hadoop charm.

HBase requires that append mode is enabled in HDFS; this can be set by providing a config.yaml file::

    hdfs-namenode:
        hbase: True
    hdfs-datacluster-01:
        hbase: True

It's really important to ensure that both the master (namenode) and the slave (datanode) services have the same configuration in this deployment scenario::

    juju deploy --config config.yaml hadoop hdfs-namenode
    juju deploy --config config.yaml hadoop hdfs-datacluster-01
    juju add-relation hdfs-namenode:namenode hdfs-datacluster-01:datanode

The hadoop services can also support mapreduce - please see the hadoop charm for more details.

The namenode can then be related to the hbase deployment::

    juju add-relation hdfs-namenode:namenode hbase-master:namenode
    juju add-relation hdfs-namenode:namenode hbase-regioncluster-01:namenode

Once the HBase services have been related to both ZooKeeper and HDFS, they can be related to each other::

    juju add-relation hbase-master:master hbase-regioncluster-01:regionserver

At this point the role of each service is fixed and CANNOT be changed, ever.

It's also possible to run more than one HBase master service unit::

    juju add-unit hbase-master

The masters will coordinate through zookeeper to establish control of the cluster and will re-coordinate if one of the master service units disappears.

You can also add additional regionservers::

    juju add-unit -n 2 hbase-regioncluster-01

The charm also supports the thrift, avro and rest gateways. Any HBase service can act as a gateway when another service is related to it::

    juju add-relation hush:thrift hbase-regioncluster-01:thrift

Or you can deploy a separate gateway service::

    juju deploy hbase hbase-thrift
    juju add-relation hbase-thrift hbase-zookeeper
    juju add-relation hush:thrift hbase-thrift:thrift

The thrift, avro and rest gateways are all stateless, so they can be placed behind haproxy::

    juju deploy haproxy rest-gateway
    juju add-relation rest-gateway hbase-regioncluster-01:rest

Rolling Restarts

Restarting an HBase deployment is potentially disruptive, so the charm will NOT automatically restart HBase when the following events occur:

  • Zookeeper service units joining or departing relations.
  • Upgrading the charm or changing the configuration.

However, the charm will update configuration files and automatically set up SSH key authentication between nodes within a service deployment and from the master service to regionserver services.

A rolling restart script is provided by the charm which will restart your HBase deployment in a controlled fashion::

    juju ssh hbase-master/0 hbase-rolling-restart

If any inconsistencies are found in HBase, the restart will not happen. The script can also restart only the regionservers::

    juju ssh hbase-master/0 hbase-rolling-restart --rs-only

or just masters::

    juju ssh hbase-master/0 hbase-rolling-restart --master-only

This script must be run from an HBase master.


(string) Location and packages to install HBase from:

  • dev: Install using the hbase packages from ppa:hadoop-ubuntu/dev.
  • testing: Install using the hbase packages from ppa:hadoop-ubuntu/testing.
  • stable: Install using the hbase packages from ppa:hadoop-ubuntu/stable.

The packages provided in the hadoop-ubuntu team PPAs are based directly on upstream HBase releases but are not fully built from source.

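
To illustrate how these values relate to the PPAs, the following shell sketch maps each accepted value to its PPA name and rejects anything else. The hbase_ppa helper is hypothetical and is not the charm's own code.

```shell
#!/bin/sh
# Hypothetical helper, not part of the charm: map the install option value
# (dev, testing or stable) to the hadoop-ubuntu team PPA it refers to.
hbase_ppa() {
    case "$1" in
        dev|testing|stable)
            echo "ppa:hadoop-ubuntu/$1"
            ;;
        *)
            echo "unknown install source: $1" >&2
            return 1
            ;;
    esac
}
```

For example, sudo add-apt-repository "$(hbase_ppa stable)" would add the stable PPA on an Ubuntu machine, which can be handy for inspecting the packages by hand.
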
(int) The maximum heap size in MB to allocate for daemon processes within the service units managed by this charm.

The recommended configurations vary based on role:

  • Master: 4GB.
  • RegionServer: 12GB, or the majority of available memory.

The above recommendations are taken from HBase: The Definitive Guide by Lars George.

Obviously you need to ensure that the servers supporting each service unit have sufficient memory to accommodate this setting; it should be no more than 75% of the total memory in the system, excluding swap.

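
The 75% guideline is simple arithmetic. The following shell sketch computes the largest heap value it allows and checks a proposed setting against it. The helper names are hypothetical and not part of this charm, and total_mem_mb assumes a Linux /proc/meminfo.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the charm. All sizes are in MB.

# Upper bound for the heap setting: 75% of physical memory (swap excluded).
max_heap_mb() {
    echo $(( $1 * 75 / 100 ))
}

# Succeeds if a heap of $1 MB fits within the 75% bound for $2 MB of memory.
heap_fits() {
    [ "$1" -le "$(max_heap_mb "$2")" ]
}

# Physical memory in MB on Linux; /proc/meminfo reports MemTotal in kB.
total_mem_mb() {
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}
```

Running max_heap_mb "$(total_mem_mb)" on a unit prints the ceiling for that machine. By this rule the 12GB (12288 MB) RegionServer recommendation needs at least 16GB (16384 MB) of RAM.
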
(boolean) To install Apache Pig on all service units alongside Hadoop, set this configuration to 'True'.

Apache Pig is a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.