Hadoop monitoring
This extension documentation is now deprecated and will no longer be updated. We recommend using the new Hadoop extension for improved functionality and support.
Hadoop monitoring in Dynatrace provides a high-level overview of the main Hadoop components within your cluster. Enhanced insights are available for HDFS and MapReduce. Hadoop-specific metrics are presented alongside all infrastructure measurements, providing you with in-depth Hadoop performance analysis of both current and historical data.
Prerequisites
Dynatrace OneAgent version 1.103+
For full Hadoop visibility, OneAgent must be installed on all machines running the following Hadoop processes: NameNode, ResourceManager, NodeManager, DataNode, and MRAppMaster.
Linux OS
Hadoop version 2.4.1+
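
The version and OS requirements above can be expressed as a small pre-flight check. This is a minimal sketch, not a Dynatrace tool: the function names and the way version strings are supplied are assumptions made for illustration, and only the thresholds (OneAgent 1.103, Hadoop 2.4.1, Linux) come from the list above.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.4.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Minimum versions taken from the prerequisites list.
MIN_ONEAGENT = parse_version("1.103")
MIN_HADOOP = parse_version("2.4.1")


def meets_prerequisites(oneagent_version, hadoop_version, os_name):
    """Return True when the host satisfies all three prerequisites.

    The arguments are hypothetical inputs; how you obtain them
    (inventory tool, manual lookup) is outside this sketch.
    """
    return (
        parse_version(oneagent_version) >= MIN_ONEAGENT
        and parse_version(hadoop_version) >= MIN_HADOOP
        and os_name.lower() == "linux"
    )
```

Tuples compare element by element, so `1.99` correctly sorts below `1.103`, which a plain string comparison would get wrong.
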
Enabling Hadoop monitoring globally
- In the Dynatrace menu, go to Settings.
- Select Monitoring > Monitored technologies.
- On the Supported technologies tab, find the Hadoop entry.
- Turn on the Hadoop switch.
With Hadoop monitoring enabled globally, Dynatrace automatically collects Hadoop metrics whenever a new host running Hadoop is detected in your environment.
Analyzing your Hadoop components
- In the Dynatrace menu, go to Technologies.
- Select the Hadoop tile on the Technology overview page.
- Select an individual Hadoop component in the Process group table to view metrics and a timeline chart specific to that component.
Enhanced insights for HDFS
Viewing NameNode metrics
- In the Process group table, select a NameNode process group.
- Select Process group details.
- On the Process group details page, select the Technology-specific metrics tab to view relevant cluster charts and metrics. Hadoop NameNode pages provide details about your HDFS capacity, usage, blocks, cache, files, and data-node health.
Further down the page, you’ll find a number of cluster-specific charts.
Viewing DataNode metrics
- In the Process group table, select a DataNode process group.
- Select Process group details.
- On the Process group details page, select the Technology-specific metrics tab and select the DataNode.
- Select the Hadoop HDFS metrics tab.
Enhanced insights for MapReduce
Viewing ResourceManager metrics
- Expand the Details section of the ResourceManager process group.
- Select Process group details.
- On the Process group details page, select the Technology-specific metrics tab to view relevant cluster charts and metrics. Hadoop ResourceManager metrics pages provide information about your nodes, applications, memory, cores, and containers.
Further down the page, you’ll find a number of ResourceManager-specific charts.
Viewing MRAppMaster metrics
- Expand the Details section of an MRAppMaster process group.
- Select Process group details.
- On the Process group details page, select the Technology-specific metrics tab and select the MRAppMaster process.
- Select the Hadoop MapReduce tab.
Viewing NodeManager metrics
- Expand the Details section of the NodeManager process group.
- Select Process group details.
- On the Process group details page, select the Technology-specific metrics tab and select a NodeManager process.
- Select the Hadoop MapReduce tab.
NameNode metrics
Metric | Description |
---|---|
Total | Raw capacity of DataNodes in bytes. |
Used | Used capacity across all DataNodes in bytes. |
Remaining | Remaining capacity in bytes. |
Total load | The number of connections. |
Total | The number of allocated blocks in the system. |
Pending deletion | The number of blocks pending deletion. |
Files total | Total number of files. |
Pending replication | The number of blocks waiting to be replicated. |
Under replicated | The number of under-replicated blocks. |
Scheduled replication | The number of blocks scheduled for replication. |
Live | The number of live DataNodes. |
Dead | The number of dead DataNodes. |
Decommission Live | The number of decommissioning live DataNodes. |
Decommission Dead | The number of decommissioning dead DataNodes. |
Usage – Volume failures total | Total volume failures. |
Estimated capacity lost total | Estimated capacity lost in bytes. |
Decommission Decommissioning | The number of decommissioning DataNodes. |
Stale | The number of stale DataNodes. |
Blocks missing and corrupt – Missing | The number of missing blocks. |
Capacity | Cache capacity in bytes. |
Used | Cache used in bytes. |
Blocks missing and corrupt – Corrupt | The number of corrupt blocks. |
Capacity in bytes – Used, non-DFS | Capacity used, non-DFS in bytes. |
Appended | The number of files appended. |
Created | The number of files and directories created by create or mkdir operations. |
Deleted | The number of files and directories deleted by delete or rename operations. |
Renamed | The number of rename operations. |
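
To show how the NameNode readings in this table can be combined, the following minimal sketch derives a used-capacity percentage and a list of warning conditions. The dictionary keys are hypothetical names chosen for this example, not Dynatrace metric identifiers, and the choice of which readings count as warnings is an assumption.

```python
def summarize_namenode(metrics):
    """Derive headline figures from a dict of NameNode readings.

    Keys are hypothetical labels for the table rows above
    (Total, Used, Dead, Missing, Corrupt, Under replicated).
    """
    total = metrics["capacity_total_bytes"]
    used = metrics["capacity_used_bytes"]
    # Guard against an empty cluster reporting zero capacity.
    used_pct = 100.0 * used / total if total else 0.0

    warnings = []
    if metrics.get("datanodes_dead", 0) > 0:
        warnings.append("dead DataNodes")
    if metrics.get("blocks_missing", 0) > 0:
        warnings.append("missing blocks")
    if metrics.get("blocks_corrupt", 0) > 0:
        warnings.append("corrupt blocks")
    if metrics.get("blocks_under_replicated", 0) > 0:
        warnings.append("under-replicated blocks")

    return {"used_pct": round(used_pct, 1), "warnings": warnings}
```

For example, a cluster with 1000 bytes of raw capacity, 250 bytes used, one dead DataNode, and three under-replicated blocks would report 25.0% used with two warnings.
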
DataNode metrics
Metric | Description |
---|---|
Live | The number of live DataNodes. |
Dead | The number of dead DataNodes. |
Decommission Live | The number of decommissioning live DataNodes. |
Decommission Dead | The number of decommissioning dead DataNodes. |
Decommission Decommissioning | The number of decommissioning DataNodes. |
Stale | The number of stale DataNodes. |
Capacity | Cache capacity in bytes. |
Used | Cache used in bytes. |
Capacity | Disk capacity in bytes. |
DfsUsed | Disk usage in bytes. |
Cached | The number of blocks cached. |
Failed to cache | The number of blocks that failed to cache. |
Failed to uncache | The number of blocks that failed to be removed from cache. |
Number of failed volumes | The number of volume failures that have occurred. |
Capacity in bytes – Remaining | Remaining disk space in bytes. |
Blocks | The number of blocks read from DataNode. |
Removed | The number of blocks removed. |
Replicated | The number of blocks replicated. |
Verified | The number of blocks verified. |
Blocks | The number of blocks written to DataNode. |
Bytes | The number of bytes read from DataNode. |
Bytes | The number of bytes written to DataNode. |
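
If you export the block and byte readings above and they turn out to be cumulative counters (the table does not say whether they are, so treat this as an assumption), two samples can be turned into a per-second rate. The function name and parameters below are hypothetical.

```python
def per_second_rate(previous, current, interval_seconds):
    """Convert two cumulative counter samples into a per-second rate.

    Returns None when the counter went backwards, which usually
    means the process restarted between the two samples.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    if current < previous:
        return None
    return (current - previous) / interval_seconds
```

Returning `None` on a counter reset, rather than a negative rate, keeps a DataNode restart from showing up as a misleading dip in throughput.
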
ResourceManager metrics
Metric | Description |
---|---|
Active | Number of active NodeManagers. |
Decommissioned | Number of decommissioned NodeManagers. |
Lost | Number of lost NodeManagers (no heartbeats received). |
Rebooted | Number of rebooted NodeManagers. |
Unhealthy | Number of unhealthy NodeManagers. |
Allocated | Number of allocated containers. |
Allocated | Allocated memory in bytes. |
Allocated | Allocated CPU in virtual cores. |
Completed | Number of successfully completed applications. |
Failed | Number of failed applications. |
Killed | Number of killed applications. |
Pending | Number of pending applications. |
Running | Number of running applications. |
Submitted | Number of submitted applications. |
Available | Amount of available memory in bytes. |
Available | Available CPU in virtual cores. |
Pending | Amount of pending memory resource requests in bytes that are not yet fulfilled by the scheduler. |
Pending | Pending CPU allocation requests in virtual cores that are not yet fulfilled by the scheduler. |
Reserved | Amount of reserved memory in bytes. |
Reserved | Reserved CPU in virtual cores. |
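
The Allocated and Available readings for memory or virtual cores can be combined into a utilization figure. The sketch below assumes that cluster capacity equals allocated plus available, which ignores reserved resources and may not match how your cluster accounts for them. The function name is hypothetical.

```python
def utilization_pct(allocated, available):
    """Return allocated resources as a percentage of allocated + available.

    Works for either memory in bytes or CPU in virtual cores,
    as long as both arguments use the same unit.
    """
    total = allocated + available
    if total == 0:
        # No capacity reported, for example when no NodeManagers are active.
        return 0.0
    return round(100.0 * allocated / total, 1)
```

A sustained high utilization together with a growing number of pending applications suggests the scheduler cannot satisfy incoming requests with the current capacity.
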