Hadoop monitoring
This extension documentation is now deprecated and will no longer be updated. We recommend using the new Hadoop extension for improved functionality and support.
Hadoop monitoring in Dynatrace provides a high-level overview of the main Hadoop components within your cluster. Enhanced insights are available for HDFS and MapReduce. Hadoop-specific metrics are presented alongside all infrastructure measurements, providing you with in-depth Hadoop performance analysis of both current and historical data.
Prerequisites
- Dynatrace OneAgent version 1.103+
- For full Hadoop visibility, OneAgent must be installed on all machines running the following Hadoop processes: NameNode, ResourceManager, NodeManager, DataNode, and MRAppMaster
- Linux OS
- Hadoop version 2.4.1+
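The version requirements above can be checked in a script before you roll out OneAgent. The following is a minimal sketch and not part of Dynatrace or Hadoop: the helper names `parse_version` and `meets_minimum` are hypothetical, and the code assumes version strings made of dot-separated integers, optionally followed by a suffix that is ignored.

```python
import re

# Minimum versions taken from the prerequisites list above.
MIN_ONEAGENT = "1.103"
MIN_HADOOP = "2.4.1"


def parse_version(version: str) -> tuple:
    """Turn '2.7.3' (or '2.7.3-SNAPSHOT') into (2, 7, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: str) -> bool:
    """Compare versions numerically, padding the shorter one with zeros."""
    have, need = parse_version(version), parse_version(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need
```

For example, `meets_minimum("2.7.3", MIN_HADOOP)` is true, while `meets_minimum("1.99.5", MIN_ONEAGENT)` is false, because the components are compared as numbers rather than as text.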
Enabling Hadoop monitoring globally
- Go to Settings.
- Select Monitoring > Monitored technologies.
- On the Supported technologies tab, find the Hadoop entry.
- Turn on the Hadoop switch.
With Hadoop monitoring enabled globally, Dynatrace automatically collects Hadoop metrics whenever a new host running Hadoop is detected in your environment.
Analyzing your Hadoop components
- Go to Technologies & Processes or Technologies & Processes Classic (latest Dynatrace).
- Select the Hadoop tile on the Technology overview page.
- Select an individual Hadoop component in the Process group table to view metrics and a timeline chart specific to that component.
Enhanced insights for HDFS
Viewing NameNode metrics
- In the Process group table, select a NameNode process group.
- Select Process group details.
- On the Process group details page, select the Technology-specific metrics tab to view relevant cluster charts and metrics. Hadoop NameNode pages provide details about your HDFS capacity, usage, blocks, cache, files, and data-node health.
- Further down the page, you’ll find a number of cluster-specific charts.
Viewing DataNode metrics
- In the Process group table, select a DataNode process group.
- Select Process group details.
- On the Process group details page, select the Technology-specific metrics tab and select the DataNode.
- Select the Hadoop HDFS metrics tab.
Enhanced insights for MapReduce
Viewing ResourceManager metrics
- Expand the Details section of the ResourceManager process group.
- Select Process group details.
- On the Process group details page, select the Technology-specific metrics tab to view relevant cluster charts and metrics. Hadoop ResourceManager metrics pages provide information about your nodes, applications, memory, cores, and containers.
- Further down the page, you’ll find a number of ResourceManager-specific charts.
Viewing MRAppMaster metrics
- Expand the Details section of an MRAppMaster process group.
- Select Process group details.
- On the Process group details page, select the Technology-specific metrics tab and select the MRAppMaster process.
- Select the Hadoop MapReduce tab.
Viewing NodeManager metrics
- Expand the Details section of the NodeManager process group.
- Select Process group details.
- On the Process group details page, select the Technology-specific metrics tab and select a NodeManager process.
- Select the Hadoop MapReduce tab.
NameNode metrics
- Total: Raw capacity of DataNodes in bytes.
- Used: Used capacity across all DataNodes in bytes.
- Remaining: Remaining capacity in bytes.
- Total load: The number of connections.
- Total: The number of allocated blocks in the system.
- Pending deletion: The number of blocks pending deletion.
- Files total: Total number of files.
- Pending replication: The number of blocks pending replication.
- Under replicated: The number of under-replicated blocks.
- Scheduled replication: The number of blocks scheduled for replication.
- Live: The number of live DataNodes.
- Dead: The number of dead DataNodes.
- Decommission Live: The number of decommissioned DataNodes that are live.
- Decommission Dead: The number of decommissioned DataNodes that are dead.
- Usage – Volume failures total: Total volume failures.
- Estimated capacity lost total: Estimated capacity lost in bytes.
- Decommission Decommissioning: The number of DataNodes currently being decommissioned.
- Stale: The number of stale DataNodes.
- Blocks missing and corrupt – Missing: The number of missing blocks.
- Capacity: Cache capacity in bytes.
- Used: Cache used in bytes.
- Blocks missing and corrupt – Corrupt: The number of corrupt blocks.
- Capacity in bytes – Used, non-DFS: Capacity used, non-DFS in bytes.
- Appended: The number of files appended.
- Created: The number of files and directories created by create or mkdir operations.
- Deleted: The number of files and directories deleted by delete or rename operations.
- Renamed: The number of rename operations.
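The NameNode capacity and block counters are raw values, so it often helps to turn them into ratios when you compare clusters of different sizes. The following is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, and the inputs are assumed to be the Total, Used, and Remaining capacity values (in bytes) and the block counts described above, copied from the charts by hand or by your own tooling.

```python
def hdfs_capacity_summary(total_bytes: int, used_bytes: int, remaining_bytes: int) -> dict:
    """Express the Used and Remaining capacity as percentages of Total."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return {
        "used_pct": 100.0 * used_bytes / total_bytes,
        "remaining_pct": 100.0 * remaining_bytes / total_bytes,
    }


def under_replicated_pct(under_replicated: int, blocks_total: int) -> float:
    """Share of allocated blocks that are currently under-replicated."""
    if blocks_total == 0:
        return 0.0  # an empty file system has nothing to replicate
    return 100.0 * under_replicated / blocks_total
```

The two percentages returned by `hdfs_capacity_summary` do not have to add up to 100; any gap is likely space on the DataNode disks that is used outside of HDFS, which the non-DFS capacity metric reports separately.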
DataNode metrics
- Live: The number of live DataNodes.
- Dead: The number of dead DataNodes.
- Decommission Live: The number of decommissioned DataNodes that are live.
- Decommission Dead: The number of decommissioned DataNodes that are dead.
- Decommission Decommissioning: The number of DataNodes currently being decommissioned.
- Stale: The number of stale DataNodes.
- Capacity: Cache capacity in bytes.
- Used: Cache used in bytes.
- Capacity: Disk capacity in bytes.
- DfsUsed: Disk usage in bytes.
- Cached: The number of blocks cached.
- Failed to cache: The number of blocks that failed to cache.
- Failed to uncache: The number of blocks that failed to be removed from the cache.
- Number of failed volumes: The number of volume failures that have occurred.
- Capacity in bytes – Remaining: The remaining disk space in bytes.
- Blocks: The number of blocks read from the DataNode.
- Removed: The number of blocks removed.
- Replicated: The number of blocks replicated.
- Verified: The number of blocks verified.
- Blocks: The number of blocks written to the DataNode.
- Bytes: The number of bytes read from the DataNode.
- Bytes: The number of bytes written to the DataNode.
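Per-DataNode values can be combined in the same way to spot a node that is filling up or struggling to cache blocks. This is an illustrative sketch only: the `DataNodeSample` class and its field names are hypothetical, and it assumes that the Cached and Failed to cache counts cover the same time interval.

```python
from dataclasses import dataclass


@dataclass
class DataNodeSample:
    """One reading of selected DataNode metrics; field names are illustrative."""
    capacity_bytes: int          # "Capacity" (disk capacity in bytes)
    dfs_used_bytes: int          # "DfsUsed" (disk usage in bytes)
    blocks_cached: int           # "Cached"
    blocks_failed_to_cache: int  # "Failed to cache"

    def disk_used_pct(self) -> float:
        """Disk usage as a percentage of disk capacity."""
        if self.capacity_bytes <= 0:
            raise ValueError("capacity_bytes must be positive")
        return 100.0 * self.dfs_used_bytes / self.capacity_bytes

    def cache_failure_pct(self) -> float:
        """Failed cache attempts as a percentage of all cache attempts."""
        attempts = self.blocks_cached + self.blocks_failed_to_cache
        if attempts == 0:
            return 0.0  # no cache activity in this interval
        return 100.0 * self.blocks_failed_to_cache / attempts
```

A sample such as `DataNodeSample(2000, 500, 90, 10)` yields a disk usage of 25.0 percent and a cache failure rate of 10.0 percent.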
ResourceManager metrics
- Active: Number of active NodeManagers.
- Decommissioned: Number of decommissioned NodeManagers.
- Lost: Number of lost NodeManagers (no heartbeats received).
- Rebooted: Number of rebooted NodeManagers.
- Unhealthy: Number of unhealthy NodeManagers.
- Allocated: Number of allocated containers.
- Allocated: Allocated memory in bytes.
- Allocated: Number of allocated CPU virtual cores.
- Completed: Number of successfully completed applications.
- Failed: Number of failed applications.
- Killed: Number of killed applications.
- Pending: Number of pending applications.
- Running: Number of running applications.
- Submitted: Number of submitted applications.
- Available: Amount of available memory in bytes.
- Available: Number of available CPU virtual cores.
- Pending: Amount of memory, in bytes, requested but not yet fulfilled by the scheduler.
- Pending: Number of CPU virtual cores requested but not yet fulfilled by the scheduler.
- Reserved: Amount of reserved memory in bytes.
- Reserved: Number of reserved CPU virtual cores.
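The Allocated and Available values can be combined into a rough utilization figure for memory or virtual cores. The sketch below is an approximation and not how Dynatrace computes its charts: the helper names are hypothetical, and it assumes that the cluster total can be estimated as allocated plus available, leaving reserved resources out of the calculation.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def utilization_pct(allocated: float, available: float) -> float:
    """Allocated share of (allocated + available), as a percentage."""
    total = allocated + available
    if total == 0:
        return 0.0  # no resources registered with the ResourceManager
    return 100.0 * allocated / total


def bytes_to_gib(value_bytes: float) -> float:
    """Convert a byte-based memory metric to GiB for easier reading."""
    return value_bytes / GIB
```

The same function works for both resources: pass the two memory values in bytes, or the two virtual core counts. With 6 GiB allocated and 2 GiB available, `utilization_pct(6 * GIB, 2 * GIB)` gives 75.0.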