Bigdata Dev Apache Hadoop Hdfs Checkpoint
- By Juju Big Data Development
- Big Data
| Channel | Revision | Published | Runs on |
| --- | --- | --- | --- |
| latest/stable | 15 | 18 Mar 2021 | |
| latest/edge | 15 | 18 Mar 2021 | |
juju deploy bigdata-dev-apache-hadoop-hdfs-checkpoint
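The options below can be set at deploy time or on a running application with `juju config`. A minimal sketch (assuming the application name matches the charm name; the option values shown are illustrative):

```shell
# Deploy the charm, then tune config options on the running application.
juju deploy bigdata-dev-apache-hadoop-hdfs-checkpoint
juju config bigdata-dev-apache-hadoop-hdfs-checkpoint \
    dfs_replication=3 \
    dfs_namenode_handler_count=20
```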
-
dfs_block_size | int
Default: 134217728
The default block size for new files, in bytes (134217728 = 128MB). Increase this in larger deployments for better performance with large data sets.
-
dfs_datanode_max_xcievers | int
Default: 4096
The number of files that a datanode will serve at any one time. A Hadoop HDFS datanode has an upper bound on the number of files it will serve at any one time. This defaults to 256 (which is low) in Hadoop 1.x; this charm increases it to 4096.
-
dfs_heartbeat_interval | int
Default: 3
The interval, in seconds, between datanode heartbeats.
-
dfs_namenode_handler_count | int
Default: 10
The number of server threads for the namenode. Increase this in larger deployments to ensure the namenode can cope with the number of datanodes that it has to deal with.
-
dfs_namenode_heartbeat_recheck_interval | int
Default: 300000
Determines the datanode heartbeat recheck interval, in milliseconds. It is used to calculate the final datanode-dead timeout for the namenode. With the defaults, the timeout is 2 * (dfs.namenode.heartbeat.recheck-interval = 5 * 60 * 1000 ms) + 10 * 1000 * (dfs.heartbeat.interval = 3 s) = 630000 ms, i.e. 10.5 minutes.
-
dfs_replication | int
Default: 3
Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at file creation time.
-
hadoop_dir_base | string
Default: /usr/local/hadoop/data
The directory under which all other Hadoop data is stored. Use this to take advantage of extra storage that might be available. You can change this in a running deployment, but all existing data in HDFS will become inaccessible; you can of course switch it back if you do this by mistake.
-
hdfs_log_dir | string
Default: /var/log/hadoop/hdfs
Directory for storing the HDFS logs.
-
io_file_buffer_size | int
Default: 4096
The size of the buffer used in sequence files. It should probably be a multiple of the hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations.
-
mapred_child_java_opts | string
Default: -Xmx200m
Java opts for the task tracker child processes. If present, the symbol @taskid@ is interpolated with the current TaskID; any other occurrences of '@' are left unchanged. For example, to enable verbose GC logging to a file named for the taskid in /tmp and to set the heap maximum to one gigabyte, pass a value of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc. The configuration variable mapred.child.ulimit can be used to control the maximum virtual memory of the child processes.
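The GC-logging example above could be applied through this charm's config option, roughly as follows (a sketch; single quotes keep the shell from touching the @taskid@ placeholder):

```shell
# Illustrative: 1 GB heap plus verbose GC logging to a per-task file.
# @taskid@ is interpolated by Hadoop at runtime, not by the shell.
juju config bigdata-dev-apache-hadoop-hdfs-checkpoint \
    mapred_child_java_opts='-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc'
```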
-
mapred_job_tracker_handler_count | int
Default: 10
The number of server threads for the JobTracker. This should be roughly 4% of the number of tasktracker nodes.
-
mapreduce_framework_name | string
Default: yarn
Execution framework, set to Hadoop YARN. **DO NOT CHANGE**
-
mapreduce_reduce_shuffle_parallelcopies | int
Default: 5
The default number of parallel transfers run by reduce during the copy (shuffle) phase.
-
mapreduce_task_io_sort_factor | int
Default: 10
The number of streams to merge at once while sorting files. This determines the number of open file handles.
-
mapreduce_task_io_sort_mb | int
Default: 100
The memory limit, in megabytes, to use while sorting data; higher values improve efficiency.
-
resources_mirror | string
URL from which to fetch resources (e.g., Hadoop binaries) instead of Launchpad.
-
tasktracker_http_threads | int
Default: 40
The number of worker threads for the HTTP server. This is used for map output fetching.
-
yarn_local_dir | string
Default: /grid/hadoop/yarn/local
Space-separated list of directories where YARN will store temporary data.
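A space-separated value like this has to be quoted so the shell passes it as a single argument; a sketch with hypothetical multi-disk mount points:

```shell
# Illustrative: spread YARN temporary data across two disks.
# /data1 and /data2 are assumed mount points, not charm defaults.
juju config bigdata-dev-apache-hadoop-hdfs-checkpoint \
    yarn_local_dir='/data1/yarn/local /data2/yarn/local'
```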
-
yarn_local_log_dir | string
Default: /grid/hadoop/yarn/local
Space-separated list of directories where YARN will store container log data.
-
yarn_log_dir | string
Default: /var/log/hadoop/yarn
Directory for storing the YARN logs.
-
yarn_nodemanager_aux-services | string
Default: mapreduce_shuffle
Shuffle service that needs to be set for Map Reduce applications.
-
yarn_nodemanager_aux-services_mapreduce_shuffle_class | string
Default: org.apache.hadoop.mapred.ShuffleHandler
The class that implements the shuffle service for Map Reduce applications.