Description
HBase is an open source, non-relational, distributed database modeled after
Google's BigTable.
Learn more at http://hbase.apache.org.
Overview
HBase is an open source, non-relational, distributed database modeled after
Google's BigTable and written in Java. It is developed as part of Apache
Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop
Distributed Filesystem), providing BigTable-like capabilities for Hadoop.
Features
- Linear and modular scalability.
- Strictly consistent reads and writes.
- Automatic and configurable sharding of tables.
- Automatic failover support between RegionServers.
- Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase
tables.
- Easy to use Java API for client access.
- Block cache and Bloom Filters for real-time queries.
- Query predicate push down via server side Filters.
- Thrift gateway and a REST-ful Web service that supports XML, Protobuf, and
binary data encoding options.
- Extensible jruby-based (JIRB) shell.
- Support for exporting metrics via the Hadoop metrics subsystem to files or
Ganglia; or via JMX.
When Would I Use Apache HBase?
Use Apache HBase™ when you need random, realtime read/write access to your
Big Data. This project's goal is the hosting of very large tables -- billions
of rows X millions of columns -- atop clusters of commodity hardware.
Apache HBase is an open-source, distributed, versioned, non-relational
database modeled after Google's Bigtable: A Distributed Storage System
for Structured Data by Chang et al. Just as Bigtable leverages the distributed
data storage provided by the Google File System, Apache HBase provides
Bigtable-like capabilities on top of Hadoop and HDFS.
How Our Apache HBase Solution Works
Apache HBase scales linearly by requiring all tables to have a primary key.
The key space is divided into sequential blocks that are then allotted to a
region. RegionServers own one or more regions, so the load is spread uniformly
across the cluster. If the keys within a region are frequently accessed, Apache
HBase can further subdivide the region by splitting it automatically, so that
manual data sharding is not necessary.
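The range-based sharding described above can be illustrated with a small, self-contained sketch. It is a toy model, not HBase code: the `Region` and `Table` classes are hypothetical, and the split trigger is a plain row-count threshold chosen for illustration rather than HBase's actual split policy. It only shows how a sorted key space can be cut into contiguous ranges that subdivide as they grow.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A contiguous slice of the sorted key space: [start_key, end_key)."""
    start_key: str          # inclusive; "" means "from the beginning"
    end_key: str            # exclusive; "" means "to the end"
    rows: dict = field(default_factory=dict)

    def contains(self, key: str) -> bool:
        return key >= self.start_key and (self.end_key == "" or key < self.end_key)


class Table:
    """Toy table that splits a region once it holds too many rows."""

    def __init__(self, max_rows_per_region: int = 4):
        # max_rows_per_region is a hypothetical threshold, not an HBase setting.
        self.max_rows = max_rows_per_region
        self.regions = [Region("", "")]   # one region covers the whole key space

    def put(self, key: str, value: str) -> None:
        region = next(r for r in self.regions if r.contains(key))
        region.rows[key] = value
        if len(region.rows) > self.max_rows:
            self._split(region)

    def _split(self, region: Region) -> None:
        keys = sorted(region.rows)
        mid = keys[len(keys) // 2]        # split point: the median row key
        left = Region(region.start_key, mid,
                      {k: v for k, v in region.rows.items() if k < mid})
        right = Region(mid, region.end_key,
                       {k: v for k, v in region.rows.items() if k >= mid})
        i = self.regions.index(region)
        self.regions[i:i + 1] = [left, right]   # replace the parent in place


table = Table(max_rows_per_region=4)
for n in range(10):
    table.put(f"row{n:02d}", f"value{n}")

boundaries = [(r.start_key, r.end_key) for r in table.regions]
print(boundaries)
```

Because each split replaces one range with two adjacent ranges, the regions always cover the whole key space without gaps or overlap, which is why no manual sharding scheme is needed on the client side.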
Apache ZooKeeper and HMaster servers make information about the cluster
topology available to clients. Clients connect to these nodes using Juju
relations and download a list of RegionServers, the regions contained within
those RegionServers and the key ranges hosted by the regions. Clients know
exactly where any piece of data is in HDFS and can contact the RegionServer
directly without any need for a central coordinator.
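As a rough illustration of why no central coordinator is needed, the sketch below shows how a client holding a cached list of region start keys could find the owning RegionServer with a binary search. The region map, the server names and the `locate` function are all hypothetical; this is not the HBase client implementation, only the lookup idea.

```python
import bisect

# Hypothetical region map, as a client might cache it after fetching the
# cluster topology: (start_key, region_server), sorted by start key.
# Each region runs up to the next region's start key.
REGION_MAP = [
    ("",  "regionserver-0"),
    ("g", "regionserver-1"),
    ("n", "regionserver-2"),
    ("t", "regionserver-0"),
]
START_KEYS = [start for start, _ in REGION_MAP]


def locate(row_key: str) -> str:
    """Return the server hosting the region whose range contains row_key."""
    # bisect_right finds the first start key greater than row_key;
    # the region just before that position owns the key.
    index = bisect.bisect_right(START_KEYS, row_key) - 1
    return REGION_MAP[index][1]


for key in ("apple", "n", "zebra"):
    print(key, "->", locate(key))
```

The lookup is purely local once the map is cached, so each read or write can go straight to the RegionServer that holds the row.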
Usage
This charm leverages our pluggable Hadoop model with the hadoop-plugin
interface and the Apache ZooKeeper charm. This means that you will need to
deploy a base Apache Hadoop cluster and an Apache ZooKeeper quorum to run HBase.
The suggested deployment method is to use the
apache-hadoop-hbase
bundle. This will deploy the Apache Hadoop/ZooKeeper platform with a single
Apache HBase HMaster and two scalable Apache HBase RegionServer units that
communicate with the cluster by relating to the
apache-hadoop-plugin subordinate charm:
juju-quickstart u/bigdata-dev/apache-hadoop-hbase
Alternatively, you may manually deploy the recommended environment as follows:
juju deploy apache-hadoop-hdfs-master hdfs-master
juju deploy apache-hadoop-yarn-master yarn-master
juju deploy apache-hadoop-compute-slave compute-slave
juju deploy apache-hadoop-plugin plugin
juju deploy -n 3 apache-zookeeper zookeeper
juju deploy apache-hbase hbase-master
juju deploy apache-hbase hbase-regionserver
juju add-relation yarn-master hdfs-master
juju add-relation compute-slave yarn-master
juju add-relation compute-slave hdfs-master
juju add-relation plugin yarn-master
juju add-relation plugin hdfs-master
juju add-relation hbase-master plugin
juju add-relation hbase-regionserver plugin
juju add-relation hbase-master:master hbase-regionserver:regionserver
juju add-relation zookeeper hbase-master
juju add-relation zookeeper hbase-regionserver
Once deployment is complete, you can manually load and run the HBase shell or
access the web interface at http://{hbase_master_ip}:60010
- Apache HBase shell
The Apache HBase Shell is (J)Ruby's IRB with some HBase-specific commands
added. Anything you can do in IRB, you should be able to do in the HBase Shell.
Type help and press Return to see a listing of shell commands and options. To
run the HBase shell, do as follows:
juju ssh hbase-master/0
./bin/hbase shell
Testing the deployment
Smoke test HBase
SSH to the HBase unit and run the smoke test as follows:
juju ssh hbase-master/0
~/hbase_test.sh test_table
Verify Job History
Verify the Job History server shows the previous test results by visiting
http://{hbase_master_ip}:60010
Contact Information
Help
Configuration
- resources_mirror
- (string) URL from which to fetch resources (e.g., Hadoop binaries) instead of Launchpad.