Bigdata Dev Apache Hadoop Hdfs Checkpoint

Channel        Revision  Published    Runs on
latest/stable  15        18 Mar 2021  Ubuntu 14.04
latest/edge    15        18 Mar 2021  Ubuntu 14.04
juju deploy bigdata-dev-apache-hadoop-hdfs-checkpoint
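Once the charm is deployed, the options listed below can be inspected or changed at runtime with `juju config` (a sketch; the exact subcommand depends on your Juju version — older releases used `juju get`/`juju set`, and the key/value shown is only an example):

```shell
# View the current configuration of the deployed application
juju config bigdata-dev-apache-hadoop-hdfs-checkpoint

# Change a single option
juju config bigdata-dev-apache-hadoop-hdfs-checkpoint dfs_replication=2
```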

Configuration options

  • dfs_block_size | int

    Default: 134217728

The default block size for new files, in bytes (134217728 bytes = 128 MB). Increase this in larger deployments for better performance on large data sets.
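For example, a deployment working mostly with very large files might double the block size to 256 MB (a sketch; note the value must be given in bytes, and the 268435456 figure is an illustration, not a recommendation):

```shell
# 256 MB = 268435456 bytes
juju config bigdata-dev-apache-hadoop-hdfs-checkpoint dfs_block_size=268435456
```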

  • dfs_datanode_max_xcievers | int

    Default: 4096

The maximum number of files a datanode will serve at any one time. A Hadoop HDFS datanode has an upper bound on this number, which defaults to 256 in Hadoop 1.x (a low value); this charm raises it to 4096.

  • dfs_heartbeat_interval | int

    Default: 3

Determines the datanode heartbeat interval in seconds.

  • dfs_namenode_handler_count | int

    Default: 10

    The number of server threads for the namenode. Increase this in larger deployments to ensure the namenode can cope with the number of datanodes that it has to deal with.

  • dfs_namenode_heartbeat_recheck_interval | int

    Default: 300000

Determines the datanode recheck heartbeat interval in milliseconds. It is used to calculate the final timeout value for the namenode, as follows: timeout = 2 * (dfs.namenode.heartbeat.recheck-interval = 5 * 60 * 1000) + 10 * 1000 * (dfs.heartbeat.interval = 3), which with these defaults gives 630000 ms, i.e. 10 minutes 30 seconds.
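The timeout arithmetic above can be checked directly (assuming the defaults shown: a 5-minute recheck interval and a 3-second heartbeat):

```shell
recheck_ms=300000      # dfs.namenode.heartbeat.recheck-interval (5 * 60 * 1000)
heartbeat_s=3          # dfs.heartbeat.interval, in seconds
timeout_ms=$(( 2 * recheck_ms + 10 * 1000 * heartbeat_s ))
echo "${timeout_ms}"   # 630000 ms = 10 minutes 30 seconds
```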

  • dfs_replication | int

    Default: 3

    Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at file creation time.
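Per-file replication can be set when a file is written, or adjusted afterwards, using the standard HDFS shell (a sketch; the paths and the replication factor of 2 are illustrative):

```shell
# Write a file with replication factor 2 instead of the cluster default
hdfs dfs -D dfs.replication=2 -put localfile.txt /data/localfile.txt

# Change the replication factor of an existing file (-w waits for completion)
hdfs dfs -setrep -w 2 /data/localfile.txt
```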

  • hadoop_dir_base | string

    Default: /usr/local/hadoop/data

The directory under which all other Hadoop data is stored. Use this to take advantage of extra storage that might be available. You can change this in a running deployment, but all existing data in HDFS will become inaccessible; you can of course switch it back if you do this by mistake.

  • hdfs_log_dir | string

    Default: /var/log/hadoop/hdfs

    Directory for storing the HDFS logs.

  • io_file_buffer_size | int

    Default: 4096

The size of the buffer used in sequence files. It should probably be a multiple of the hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations.

  • mapred_child_java_opts | string

    Default: -Xmx200m

Java opts for the task tracker child processes. If present, the symbol @taskid@ is interpolated with the current TaskID; any other occurrences of '@' are left unchanged. For example, to enable verbose GC logging to a file named after the task ID in /tmp and to set the maximum heap to one gigabyte, pass a value of: -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc. The configuration variable mapred.child.ulimit can be used to control the maximum virtual memory of the child processes.
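Applying the GC-logging example from the description would look like this (a sketch; single quotes keep the whole option string, including the @taskid@ placeholder, intact):

```shell
juju config bigdata-dev-apache-hadoop-hdfs-checkpoint \
  mapred_child_java_opts='-Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc'
```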

  • mapred_job_tracker_handler_count | int

    Default: 10

    The number of server threads for the JobTracker. This should be roughly 4% of the number of tasktracker nodes.

  • mapreduce_framework_name | string

    Default: yarn

Execution framework, set to Hadoop YARN. ** DO NOT CHANGE **

  • mapreduce_reduce_shuffle_parallelcopies | int

    Default: 5

    The default number of parallel transfers run by reduce during the copy(shuffle) phase.

  • mapreduce_task_io_sort_factor | int

    Default: 10

The number of streams merged at once while sorting files; this determines the number of open file handles.

  • mapreduce_task_io_sort_mb | int

    Default: 100

The memory limit, in megabytes, used while sorting data; higher values improve sort efficiency.

  • resources_mirror | string

    URL from which to fetch resources (e.g., Hadoop binaries) instead of Launchpad.

  • tasktracker_http_threads | int

    Default: 40

The number of worker threads for the HTTP server. This is used for map output fetching.

  • yarn_local_dir | string

    Default: /grid/hadoop/yarn/local

    Space separated list of directories where YARN will store temporary data.
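Because the value is a space-separated list, multiple disks can be used by quoting the whole string (a sketch; the mount points are illustrative):

```shell
juju config bigdata-dev-apache-hadoop-hdfs-checkpoint \
  yarn_local_dir='/mnt/disk1/yarn/local /mnt/disk2/yarn/local'
```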

  • yarn_local_log_dir | string

    Default: /grid/hadoop/yarn/local

    Space separated list of directories where YARN will store container log data.

  • yarn_log_dir | string

    Default: /var/log/hadoop/yarn

    Directory for storing the YARN logs.

  • yarn_nodemanager_aux-services | string

    Default: mapreduce_shuffle

    Shuffle service that needs to be set for Map Reduce applications.

  • yarn_nodemanager_aux-services_mapreduce_shuffle_class | string

    Default: org.apache.hadoop.mapred.ShuffleHandler

Implementation class for the MapReduce shuffle auxiliary service.