Hadoop monitoring

Deprecation notice

This extension documentation is now deprecated and will no longer be updated. We recommend using the new Hadoop extension for improved functionality and support.

Hadoop monitoring in Dynatrace provides a high-level overview of the main Hadoop components within your cluster. Enhanced insights are available for HDFS and MapReduce. Hadoop-specific metrics are presented alongside all infrastructure measurements, providing you with in-depth Hadoop performance analysis of both current and historical data.

Prerequisites

  • Dynatrace OneAgent version 1.103+
  • For full Hadoop visibility, OneAgent must be installed on all machines running the following Hadoop processes: NameNode, ResourceManager, NodeManager, DataNode, and MRAppMaster
  • Linux OS
  • Hadoop version 2.4.1+
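
The version requirements above can be checked mechanically. The following is a minimal sketch, not part of Dynatrace or Hadoop: the function names `parse_version`, `meets_minimum`, and `check_prerequisites` are hypothetical, and it assumes you already have the version strings at hand (for example, from your own host inventory). Versions are compared numerically, component by component, because a plain string comparison would wrongly rank 1.99 above 1.103.

```python
import re

MIN_ONEAGENT = "1.103"   # minimum OneAgent version from the prerequisites
MIN_HADOOP = "2.4.1"     # minimum Hadoop version from the prerequisites


def parse_version(text):
    """Return the leading dotted-numeric part of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(actual, minimum):
    """Compare versions component by component, padding the shorter one with zeros."""
    a, m = parse_version(actual), parse_version(minimum)
    width = max(len(a), len(m))
    a += (0,) * (width - len(a))
    m += (0,) * (width - len(m))
    return a >= m


def check_prerequisites(oneagent_version, hadoop_version, os_name):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not meets_minimum(oneagent_version, MIN_ONEAGENT):
        problems.append(f"OneAgent {oneagent_version} is older than {MIN_ONEAGENT}")
    if not meets_minimum(hadoop_version, MIN_HADOOP):
        problems.append(f"Hadoop {hadoop_version} is older than {MIN_HADOOP}")
    if os_name.strip().lower() != "linux":
        problems.append(f"unsupported OS: {os_name}")
    return problems


# Example call with made-up values for a single host.
print(check_prerequisites("1.99", "2.7.3", "Linux"))
```

Returning a list of problems rather than a single boolean makes it easier to report every unmet requirement for a host at once.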

Enabling Hadoop monitoring globally

  1. Go to Settings.
  2. Select Monitoring > Monitored technologies.
  3. On the Supported technologies tab, find the Hadoop entry.
  4. Turn on the Hadoop switch.

With Hadoop monitoring enabled globally, Dynatrace automatically collects Hadoop metrics whenever a new host running Hadoop is detected in your environment.

Analyzing your Hadoop components

  1. Go to Technologies & Processes or Technologies & Processes Classic (latest Dynatrace).
  2. Select the Hadoop tile on the Technology overview page.
  3. Select an individual Hadoop component in the Process group table to view metrics and a timeline chart specific to that component.

Enhanced insights for HDFS

Viewing NameNode metrics

  1. In the Process group table, select a NameNode process group.
  2. Select Process group details.
  3. On the Process group details page, select the Technology-specific metrics tab to view relevant cluster charts and metrics. Hadoop NameNode pages provide details about your HDFS capacity, usage, blocks, cache, files, and DataNode health.
  4. Further down the page, you’ll find a number of cluster-specific charts.

Viewing DataNode metrics

  1. In the Process group table, select a DataNode process group.
  2. Select Process group details.
  3. On the Process group details page, select the Technology-specific metrics tab and select the DataNode.
  4. Select the Hadoop HDFS metrics tab.

Enhanced insights for MapReduce

Viewing ResourceManager metrics

  1. Expand the Details section of the ResourceManager process group.
  2. Select Process group details.
  3. On the Process group details page, select the Technology-specific metrics tab to view relevant cluster charts and metrics. Hadoop ResourceManager metrics pages provide information about your nodes, applications, memory, cores, and containers.
  4. Further down the page, you’ll find a number of ResourceManager-specific charts.

Viewing MRAppMaster metrics

  1. Expand the Details section of an MRAppMaster process group.
  2. Select Process group details.
  3. On the Process group details page, select the Technology-specific metrics tab and select the MRAppMaster process.
  4. Select the Hadoop MapReduce tab.

Viewing NodeManager metrics

  1. Expand the Details section of the NodeManager process group.
  2. Select Process group details.
  3. On the Process group details page, select the Technology-specific metrics tab and select a NodeManager process.
  4. Select the Hadoop MapReduce tab.

NameNode metrics

| Metric | Description |
| --- | --- |
| Total | Raw capacity of DataNodes in bytes. |
| Used | Used capacity across all DataNodes in bytes. |
| Remaining | Remaining capacity in bytes. |
| Total load | The number of connections. |
| Total | The number of allocated blocks in the system. |
| Pending deletion | The number of blocks pending deletion. |
| Files total | Total number of files. |
| Pending replication | The number of blocks pending replication. |
| Under replicated | The number of under-replicated blocks. |
| Scheduled replication | The number of blocks scheduled for replication. |
| Live | The number of live DataNodes. |
| Dead | The number of dead DataNodes. |
| Decommission Live | The number of decommissioning live DataNodes. |
| Decommission Dead | The number of decommissioning dead DataNodes. |
| Usage – Volume failures total | Total volume failures. |
| Estimated capacity lost total | Estimated capacity lost in bytes. |
| Decommission Decommissioning | The number of decommissioning DataNodes. |
| Stale | The number of stale DataNodes. |
| Blocks missing and corrupt – Missing | The number of missing blocks. |
| Capacity | Cache capacity in bytes. |
| Used | Cache used in bytes. |
| Blocks missing and corrupt – Corrupt | The number of corrupt blocks. |
| Capacity in bytes – Used, non-DFS | Non-DFS capacity used, in bytes. |
| Appended | The number of files appended. |
| Created | The number of files and directories created by create or mkdir operations. |
| Deleted | The number of files and directories deleted by delete or rename operations. |
| Renamed | The number of rename operations. |
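
Several of the NameNode metrics above are easier to interpret as ratios than as raw values. The following is a minimal sketch of that arithmetic, assuming you have already read the values off the charts or exported them; the function names `percent` and `summarize_namenode` and the keys of the returned dictionary are hypothetical and do not correspond to any Dynatrace or Hadoop API.

```python
def percent(part, whole):
    """Return part as a percentage of whole, or None when whole is not positive."""
    if whole <= 0:
        return None
    return 100.0 * part / whole


def summarize_namenode(total_bytes, used_bytes, remaining_bytes,
                       blocks_total, under_replicated, missing, corrupt,
                       live_nodes, dead_nodes):
    """Derive a few ratios from the raw NameNode values listed in the table."""
    return {
        "used_pct": percent(used_bytes, total_bytes),
        "remaining_pct": percent(remaining_bytes, total_bytes),
        "under_replicated_pct": percent(under_replicated, blocks_total),
        "dead_node_pct": percent(dead_nodes, live_nodes + dead_nodes),
        # Any missing or corrupt block is flagged, whatever the cluster size.
        "data_at_risk": missing > 0 or corrupt > 0,
    }


# Example call with made-up numbers.
summary = summarize_namenode(
    total_bytes=1000, used_bytes=250, remaining_bytes=600,
    blocks_total=200, under_replicated=10, missing=0, corrupt=0,
    live_nodes=9, dead_nodes=1,
)
print(summary)
```

Used and remaining percentages need not add up to 100, because the non-DFS share of the disks is reported separately.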

DataNode metrics

| Metric | Description |
| --- | --- |
| Live | The number of live DataNodes. |
| Dead | The number of dead DataNodes. |
| Decommission Live | The number of decommissioning live DataNodes. |
| Decommission Dead | The number of decommissioning dead DataNodes. |
| Decommission Decommissioning | The number of decommissioning DataNodes. |
| Stale | The number of stale DataNodes. |
| Capacity | Cache capacity in bytes. |
| Used | Cache used in bytes. |
| Capacity | Disk capacity in bytes. |
| DfsUsed | Disk usage in bytes. |
| Cached | The number of blocks cached. |
| Failed to cache | The number of blocks that failed to cache. |
| Failed to uncache | The number of blocks that failed to be removed from the cache. |
| Number of failed volumes | The number of volume failures that have occurred. |
| Capacity in bytes – Remaining | Remaining disk space in bytes. |
| Blocks | The number of blocks read from DataNode. |
| Removed | The number of blocks removed. |
| Replicated | The number of blocks replicated. |
| Verified | The number of blocks verified. |
| Blocks | The number of blocks written to DataNode. |
| Bytes | The number of bytes read from DataNode. |
| Bytes | The number of bytes written to DataNode. |
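
Block and byte counts such as those above are often more useful as rates. The following is a minimal sketch under a stated assumption: that you have two samples of a cumulative counter taken a known number of seconds apart. This page does not say whether Dynatrace charts cumulative totals or per-interval values, so treat the approach as illustrative. The function names `per_second_rate` and `disk_used_pct` are hypothetical.

```python
def per_second_rate(previous, current, interval_seconds):
    """Average per-second rate between two samples of a cumulative counter.

    Returns None when the interval is not positive or the counter went
    backwards (for example, after a process restart reset it to zero).
    """
    if interval_seconds <= 0 or current < previous:
        return None
    return (current - previous) / interval_seconds


def disk_used_pct(capacity_bytes, dfs_used_bytes):
    """DFS usage as a percentage of disk capacity, or None without capacity."""
    if capacity_bytes <= 0:
        return None
    return 100.0 * dfs_used_bytes / capacity_bytes


# Example: bytes read grew from 1000 to 4000 over a 60-second interval.
print(per_second_rate(1000, 4000, 60))
print(disk_used_pct(capacity_bytes=2000, dfs_used_bytes=500))
```

Returning None on a counter reset avoids reporting a large negative rate for the interval in which a DataNode restarted.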

ResourceManager metrics

| Metric | Description |
| --- | --- |
| Active | Number of active NodeManagers. |
| Decommissioned | Number of decommissioned NodeManagers. |
| Lost | Number of lost NodeManagers (no heartbeats received). |
| Rebooted | Number of rebooted NodeManagers. |
| Unhealthy | Number of unhealthy NodeManagers. |
| Allocated | Number of allocated containers. |
| Allocated | Allocated memory in bytes. |
| Allocated | Allocated CPU in virtual cores. |
| Completed | Number of successfully completed applications. |
| Failed | Number of failed applications. |
| Killed | Number of killed applications. |
| Pending | Number of pending applications. |
| Running | Number of running applications. |
| Submitted | Number of submitted applications. |
| Available | Amount of available memory in bytes. |
| Available | Available CPU in virtual cores. |
| Pending | Amount of pending memory resource requests in bytes that are not yet fulfilled by the scheduler. |
| Pending | Pending CPU allocation requests in virtual cores that are not yet fulfilled by the scheduler. |
| Reserved | Amount of reserved memory in bytes. |
| Reserved | Reserved CPU in virtual cores. |
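
The allocated, available, and pending values above can be combined into a rough picture of cluster pressure. The following is a minimal sketch, assuming that allocated plus available approximates the total capacity the scheduler can hand out (reserved resources are ignored here). The function names `utilization_pct` and `cluster_pressure` are hypothetical and not part of Dynatrace or YARN.

```python
def utilization_pct(allocated, available):
    """Allocated share of (allocated + available), or None if both are zero."""
    total = allocated + available
    if total <= 0:
        return None
    return 100.0 * allocated / total


def cluster_pressure(allocated_mem, available_mem, pending_mem,
                     allocated_vcores, available_vcores, pending_vcores):
    """Summarize memory and virtual-core utilization plus any unmet requests."""
    return {
        "memory_pct": utilization_pct(allocated_mem, available_mem),
        "vcores_pct": utilization_pct(allocated_vcores, available_vcores),
        # Pending requests mean the scheduler could not satisfy all demand.
        "backlogged": pending_mem > 0 or pending_vcores > 0,
    }


GIB = 1024 ** 3  # memory values in the table are given in bytes

# Example call with made-up numbers.
print(cluster_pressure(6 * GIB, 2 * GIB, 0, 12, 4, 3))
```

High utilization together with a non-empty backlog suggests the cluster is short on capacity, whereas a backlog at low utilization more likely points to scheduler or queue configuration.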