Bigdata Dev Apache Hadoop Hdfs Checkpoint
- By Juju Big Data Development
- Big Data
| Channel | Revision | Published | Runs on |
| --- | --- | --- | --- |
| latest/stable | 15 | 18 Mar 2021 | |
| latest/edge | 15 | 18 Mar 2021 | |
juju deploy bigdata-dev-apache-hadoop-hdfs-checkpoint
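The options below can be set at deploy time or on a running application with `juju config`. A minimal sketch (assuming the application name matches the charm name; the option values shown are illustrative):

```shell
# Deploy the charm, then tune config options on the running application.
juju deploy bigdata-dev-apache-hadoop-hdfs-checkpoint
juju config bigdata-dev-apache-hadoop-hdfs-checkpoint \
    dfs_replication=3 \
    dfs_namenode_handler_count=20
```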
-
dfs_block_size | int
Default: 134217728
The default block size for new files, in bytes (134217728 = 128MB). Increase this in larger deployments for better performance with large data sets.
-
dfs_datanode_max_xcievers | int
Default: 4096
The number of files that a datanode will serve at any one time. A Hadoop HDFS datanode has an upper bound on the number of files it will serve at any one time. This defaults to 256 (which is low) in Hadoop 1.x; this charm increases it to 4096.
-
dfs_heartbeat_interval | int
Default: 3
The interval, in seconds, between datanode heartbeats.
-
dfs_namenode_handler_count | int
Default: 10
The number of server threads for the namenode. Increase this in larger deployments to ensure the namenode can cope with the number of datanodes that it has to deal with.
-
dfs_namenode_heartbeat_recheck_interval | int
Default: 300000
Determines the datanode heartbeat recheck interval, in milliseconds. It is used to calculate the final datanode-dead timeout for the namenode. With the defaults, the timeout is 2 * (dfs.namenode.heartbeat.recheck-interval = 5 * 60 * 1000 ms) + 10 * 1000 * (dfs.heartbeat.interval = 3 s) = 630000 ms, i.e. 10.5 minutes.
-
dfs_replication | int
Default: 3
Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at file creation time.
-
hadoop_dir_base | string
Default: /usr/local/hadoop/data
The directory under which all other Hadoop data is stored. Use this to take advantage of extra storage that might be available. You can change this in a running deployment, but all existing data in HDFS will become inaccessible; you can of course switch it back if you do this by mistake.
-
hdfs_log_dir | string
Default: /var/log/hadoop/hdfs
Directory for storing the HDFS logs.
-
io_file_buffer_size | int
Default: 4096
The size of the buffer used in sequence files. It should probably be a multiple of the hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations.
-
mapred_child_java_opts | string
Default: -Xmx200m
Java opts for the task tracker child processes. If present, the symbol @taskid@ is interpolated with the current TaskID; any other occurrences of '@' are left unchanged. For example, to enable verbose GC logging to a file named for the taskid in /tmp and to set the heap maximum to one gigabyte, pass a value of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc. The configuration variable mapred.child.ulimit can be used to control the maximum virtual memory of the child processes.
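The GC-logging example above could be applied through this charm's config option, roughly as follows (a sketch; single quotes keep the shell from touching the @taskid@ placeholder):

```shell
# Illustrative: 1 GB heap plus verbose GC logging to a per-task file.
# @taskid@ is interpolated by Hadoop at runtime, not by the shell.
juju config bigdata-dev-apache-hadoop-hdfs-checkpoint \
    mapred_child_java_opts='-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc'
```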
-
mapred_job_tracker_handler_count | int
Default: 10
The number of server threads for the JobTracker. This should be roughly 4% of the number of tasktracker nodes.
-
mapreduce_framework_name | string
Default: yarn
Execution framework, set to Hadoop YARN. **DO NOT CHANGE**
-
mapreduce_reduce_shuffle_parallelcopies | int
Default: 5
The default number of parallel transfers run by reduce during the copy (shuffle) phase.
-
mapreduce_task_io_sort_factor | int
Default: 10
The number of streams to merge at once while sorting files. This determines the number of open file handles.
-
mapreduce_task_io_sort_mb | int
Default: 100
The memory limit, in megabytes, to use while sorting data; higher values improve efficiency.
-
resources_mirror | string
URL from which to fetch resources (e.g., Hadoop binaries) instead of Launchpad.
-
tasktracker_http_threads | int
Default: 40
The number of worker threads for the HTTP server. This is used for map output fetching.
-
yarn_local_dir | string
Default: /grid/hadoop/yarn/local
Space-separated list of directories where YARN will store temporary data.
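A space-separated value like this has to be quoted so the shell passes it as a single argument; a sketch with hypothetical multi-disk mount points:

```shell
# Illustrative: spread YARN temporary data across two disks.
# /data1 and /data2 are assumed mount points, not charm defaults.
juju config bigdata-dev-apache-hadoop-hdfs-checkpoint \
    yarn_local_dir='/data1/yarn/local /data2/yarn/local'
```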
-
yarn_local_log_dir | string
Default: /grid/hadoop/yarn/local
Space-separated list of directories where YARN will store container log data.
-
yarn_log_dir | string
Default: /var/log/hadoop/yarn
Directory for storing the YARN logs.
-
yarn_nodemanager_aux-services | string
Default: mapreduce_shuffle
Shuffle service that needs to be set for Map Reduce applications.
-
yarn_nodemanager_aux-services_mapreduce_shuffle_class | string
Default: org.apache.hadoop.mapred.ShuffleHandler
The class that implements the shuffle service for Map Reduce applications.