apache hadoop hdfs checkpoint #15

  • By bigdata-dev
  • Latest version (#15)
  • trusty
  • Stable
  • Edge

Description

This charm has been superseded by apache-hadoop-hdfs-secondary. No future updates will be made to this charm. Please replace any references to this charm with apache-hadoop-hdfs-secondary.
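
For deployments still running this charm, a minimal migration sketch using the Juju 1.x CLI (service names and series are illustrative; relation handling and data migration are not covered):

    # Deploy the successor charm from the charm store (trusty series):
    juju deploy cs:trusty/apache-hadoop-hdfs-secondary
    # Once references and relations have been moved over, remove the old service:
    juju destroy-service apache-hadoop-hdfs-checkpoint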


Configuration
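
Each option below is listed with its type, a description, and its default value. Any of them can be changed on a deployed service; a minimal sketch using the Juju 1.x CLI (the service name is illustrative; Juju 2.x+ uses 'juju config' instead):

    # Show the current configuration of the service:
    juju get apache-hadoop-hdfs-checkpoint
    # Set a single option on the running service:
    juju set apache-hadoop-hdfs-checkpoint dfs_replication=3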

dfs_datanode_max_xcievers
(int) The number of files that a datanode will serve at any one time. A Hadoop HDFS datanode has an upper bound on the number of files it will serve concurrently. This defaults to 256 (which is low) in Hadoop 1.x; this charm raises it to 4096.
Default: 4096
resources_mirror
(string) URL from which to fetch resources (e.g., Hadoop binaries) instead of Launchpad.
hdfs_log_dir
(string) Directory for storing the HDFS logs.
Default: /var/log/hadoop/hdfs
mapred_job_tracker_handler_count
(int) The number of server threads for the JobTracker. This should be roughly 4% of the number of tasktracker nodes.
Default: 10
mapreduce_framework_name
(string) Execution framework, set to Hadoop YARN. ** DO NOT CHANGE **
Default: yarn
yarn_local_dir
(string) Space-separated list of directories where YARN will store temporary data.
Default: /grid/hadoop/yarn/local
tasktracker_http_threads
(int) The number of worker threads for the HTTP server, used for map output fetching.
Default: 40
mapreduce_reduce_shuffle_parallelcopies
(int) The default number of parallel transfers run by reduce during the copy (shuffle) phase.
Default: 5
hadoop_dir_base
(string) The directory under which all other Hadoop data is stored. Use this to take advantage of extra storage that might be available. You can change this in a running deployment, but all existing data in HDFS will become inaccessible; you can, of course, switch it back if you do this by mistake.
Default: /usr/local/hadoop/data
mapred_child_java_opts
(string) Java opts for the task tracker child processes. The following symbol, if present, will be interpolated: @taskid@ is replaced by the current TaskID. Any other occurrences of '@' will go unchanged. For example, to enable verbose GC logging to a file named for the taskid in /tmp and to set the heap maximum to 1GB, pass a value of '-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc'. The configuration variable mapred.child.ulimit can be used to control the maximum virtual memory of the child processes.
Default: -Xmx200m
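
As a concrete sketch of the GC-logging example above (Juju 1.x syntax; the service name is illustrative):

    # Raise the task heap maximum to 1GB and log GC activity per task;
    # single quotes keep @taskid@ from being touched by the shell:
    juju set apache-hadoop-hdfs-checkpoint \
        mapred_child_java_opts='-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc'
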
dfs_replication
(int) Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at file creation time.
Default: 3
yarn_local_log_dir
(string) Space-separated list of directories where YARN will store container log data.
Default: /grid/hadoop/yarn/local
yarn_nodemanager_aux-services_mapreduce_shuffle_class
(string) Implementation class of the shuffle service that needs to be set for MapReduce applications.
Default: org.apache.hadoop.mapred.ShuffleHandler
mapreduce_task_io_sort_factor
(int) The number of streams merged at once while sorting files; this determines the number of open file handles.
Default: 10
dfs_block_size
(int) The default block size for new files, in bytes (the default below is 128MB). Increase this in larger deployments for better performance on large data sets.
Default: 134217728
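
The value is in bytes, so the default works out to 128MB. A sketch of the arithmetic and of raising it (the 256MB figure is only an example):

    # 128 * 1024 * 1024 = 134217728 bytes (the default above)
    echo $(( 128 * 1024 * 1024 ))
    # Example: raise the block size to 256MB for a large deployment:
    juju set apache-hadoop-hdfs-checkpoint dfs_block_size=$(( 256 * 1024 * 1024 ))
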
yarn_nodemanager_aux-services
(string) Shuffle service that needs to be set for MapReduce applications.
Default: mapreduce_shuffle
mapreduce_task_io_sort_mb
(int) Memory limit, in megabytes, used while sorting data; higher values improve efficiency.
Default: 100
dfs_namenode_handler_count
(int) The number of server threads for the namenode. Increase this in larger deployments to ensure the namenode can cope with the number of datanodes that it has to deal with.
Default: 10
io_file_buffer_size
(int) The size of the buffer for use in sequence files. It should probably be a multiple of the hardware page size (4096 on Intel x86), as it determines how much data is buffered during read and write operations.
Default: 4096
yarn_log_dir
(string) Directory for storing the YARN logs.
Default: /var/log/hadoop/yarn
dfs_namenode_heartbeat_recheck_interval
(int) Determines the datanode recheck heartbeat interval, in milliseconds. It is used to calculate the final timeout value for the namenode: timeout = 2 * dfs.namenode.heartbeat.recheck-interval + 10 * 1000 * dfs.heartbeat.interval. With the defaults (recheck-interval = 5*60*1000 ms and heartbeat.interval = 3 s), this gives 10 minutes 30 seconds.
Default: 300000
dfs_heartbeat_interval
(int) Determines datanode heartbeat interval in seconds.
Default: 3
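
The two heartbeat options above feed the namenode timeout formula given under dfs_namenode_heartbeat_recheck_interval; with the charm defaults the result can be checked in the shell:

    # timeout = 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s
    recheck_ms=300000   # dfs_namenode_heartbeat_recheck_interval default
    heartbeat_s=3       # dfs_heartbeat_interval default
    echo $(( 2 * recheck_ms + 10 * 1000 * heartbeat_s ))   # 630000 ms = 10 min 30 s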