Enhanced insights for Hadoop HDFS and YARN services
Get started
Overview
Hadoop monitoring in Dynatrace provides a high-level overview of the main Hadoop components within your cluster. These enhanced insights provide additional metrics directly from the HDFS and YARN services.
Activate this extension in your Dynatrace environment from the in-product Hub, then select the OneAgents on which to enable it.
Use cases
The extension enables insights into the overall health of Hadoop HDFS and YARN services.
Compatibility information
Hadoop version 2.4.1+
Linux OS
For full Hadoop visibility, OneAgent must be installed on all machines running the following Hadoop processes: NameNode, ResourceManager, NodeManager, DataNode, and MRAppMaster.
Details
This extension provides extra Hadoop metrics through the use of JMX queries.
HDFS - Hadoop Distributed File System
Improved visibility into the health of your HDFS NameNodes and DataNodes
YARN - Yet Another Resource Negotiator
Improved visibility into the health of your YARN NodeManagers, ResourceManagers and MRAppMaster services
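The JMX attributes behind these metrics can also be inspected by hand. Hadoop daemons serve their MBeans as JSON through the `/jmx` HTTP servlet; the sketch below parses a payload of that shape and picks out a few attributes. The sample payload, its values, and the helper name `find_bean` are illustrative assumptions, and this is not necessarily how the extension itself collects data.

```python
import json

# Hypothetical sample in the shape returned by a Hadoop daemon's /jmx servlet.
# The numbers are made up for illustration.
SAMPLE = """
{"beans": [
  {"name": "Hadoop:service=NameNode,name=FSNamesystem",
   "CapacityTotal": 1000000, "CapacityUsed": 250000, "CorruptBlocks": 0}
]}
"""


def find_bean(payload: str, bean_name: str) -> dict:
    """Return the attributes of the MBean called bean_name, or {} if absent."""
    for bean in json.loads(payload).get("beans", []):
        if bean.get("name") == bean_name:
            return bean
    return {}


fsn = find_bean(SAMPLE, "Hadoop:service=NameNode,name=FSNamesystem")
print(fsn["CapacityTotal"], fsn["CapacityUsed"], fsn["CorruptBlocks"])
```

Against a live cluster, the same function could be fed the body of an HTTP response from the daemon's `/jmx` endpoint instead of the inline sample.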
Feature sets
When activating your extension using monitoring configuration, you can limit monitoring to one of the feature sets. To work properly, the extension has to collect at least one metric after activation.
In highly segmented networks, feature sets can reflect the segments of your environment. Then, when you create a monitoring configuration, you can select a feature set and a corresponding ActiveGate group that can connect to this particular segment.
All metrics that aren't categorized into any feature set are considered to be the default and are always reported.
A metric inherits the feature set of a subgroup, which in turn inherits the feature set of a group. Also, the feature set defined on the metric level overrides the feature set defined on the subgroup level, which in turn overrides the feature set defined on the group level.
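The inheritance and override rules above amount to "the most specific level that defines a feature set wins". A minimal sketch of that resolution follows; the function name, the `"default"` label, and the example feature-set names are hypothetical and only illustrate the rule, not the extension's actual implementation.

```python
from typing import Optional

DEFAULT = "default"  # label used here for metrics without any feature set


def effective_feature_set(group: Optional[str],
                          subgroup: Optional[str],
                          metric: Optional[str]) -> str:
    """Most specific level wins: metric, then subgroup, then group."""
    for level in (metric, subgroup, group):
        if level is not None:
            return level
    return DEFAULT


# Inherited from the group because nothing more specific is set.
print(effective_feature_set("HDFS", None, None))
# The metric-level value overrides subgroup and group.
print(effective_feature_set("HDFS", "NameNode", "Cache"))
# Nothing set anywhere: the metric falls into the default set.
print(effective_feature_set(None, None, None))
```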
YARN-ResourceManager-ClusterMetrics
| Metric name | Metric key | Description |
| --- | --- | --- |
| NumActiveNMs | hadoop.yarn.resourcemanager.NumActiveNMs | Current number of active NodeManagers |
| NumDecommissioningNMs | hadoop.yarn.resourcemanager.NumDecommissionedNMs | Current number of NodeManagers being decommissioned |
| NumLostNMs | hadoop.yarn.resourcemanager.NumLostNMs | Current number of NodeManagers lost because they stopped sending heartbeats |
| NumRebootedNMs | hadoop.yarn.resourcemanager.NumRebootedNMs | Current number of rebooted NodeManagers |
| NumUnhealthyNMs | hadoop.yarn.resourcemanager.NumUnhealthyNMs | Current number of unhealthy NodeManagers |
HDFS-NameNode-NameNodeInfo
| Metric name | Metric key | Description |
| --- | --- | --- |
| NumberOfMissingBlocks | hadoop.hdfs.namenode.NumberOfMissingBlocks | - |
| CacheCapacity | hadoop.hdfs.namenode.CacheCapacity | The total cache capacity of all DataNodes |
| CacheUsed | hadoop.hdfs.namenode.CacheUsed | The total cache used by all DataNodes |
YARN-ResourceManager-QueueMetrics
| Metric name | Metric key | Description |
| --- | --- | --- |
| AllocatedContainers | hadoop.yarn.resourcemanager.AllocatedContainers | Current number of allocated containers |
| AllocatedMB | hadoop.yarn.resourcemanager.AllocatedMB | Current allocated memory in MB |
| AllocatedVCores | hadoop.yarn.resourcemanager.AllocatedVCores | Current allocated CPU in virtual cores |
| AppsCompleted | hadoop.yarn.resourcemanager.AppsCompleted.count | Total number of completed applications |
| AppsFailed | hadoop.yarn.resourcemanager.AppsFailed.count | Total number of failed applications |
| AppsKilled | hadoop.yarn.resourcemanager.AppsKilled.count | Total number of killed applications |
| AppsPending | hadoop.yarn.resourcemanager.AppsPending.count | Current number of applications that have not yet been assigned any containers |
| AppsRunning | hadoop.yarn.resourcemanager.AppsRunning.count | Current number of running applications |
| AppsSubmitted | hadoop.yarn.resourcemanager.AppsSubmitted.count | Total number of submitted applications |
| AvailableMB | hadoop.yarn.resourcemanager.AvailableMB | Current available memory in MB |
| AvailableVCores | hadoop.yarn.resourcemanager.AvailableVCores | Current available CPU in virtual cores |
| PendingMB | hadoop.yarn.resourcemanager.PendingMB | Current memory requests in MB that are pending to be fulfilled by the scheduler |
| PendingVCores | hadoop.yarn.resourcemanager.PendingVCores | Current CPU requests in virtual cores that are pending to be fulfilled by the scheduler |
| ReservedMB | hadoop.yarn.resourcemanager.ReservedMB | Current reserved memory in MB |
| ReservedVCores | hadoop.yarn.resourcemanager.ReservedVCores | Current reserved CPU in virtual cores |
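The queue metrics are often easier to read as a ratio than as raw megabytes. The sketch below derives a memory utilization figure from AllocatedMB and AvailableMB. It assumes that allocated plus available memory approximates the queue's total (reserved memory is ignored), and the function name and sample values are hypothetical.

```python
def memory_utilization(allocated_mb: float, available_mb: float) -> float:
    """Fraction of memory in use, treating allocated + available as the total."""
    total = allocated_mb + available_mb
    if total <= 0:
        # No capacity reported yet; avoid dividing by zero.
        return 0.0
    return allocated_mb / total


# Made-up sample values: 6144 MB allocated, 2048 MB still available.
print(f"{memory_utilization(6144, 2048):.0%}")  # 75%
```

The same calculation applies to AllocatedVCores and AvailableVCores for CPU.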
HDFS-NameNode-NameNodeActivity
| Metric name | Metric key | Description |
| --- | --- | --- |
| FilesAppended | hadoop.hdfs.namenode.FilesAppended.count | Total number of files appended |
| FilesCreated | hadoop.hdfs.namenode.FilesCreated.count | Total number of files and directories created by create or mkdir operations |
| FilesDeleted | hadoop.hdfs.namenode.FilesDeleted.count | Total number of files and directories deleted by delete or rename operations |
| FilesRenamed | hadoop.hdfs.namenode.FilesRenamed.count | Total number of rename operations (NOT number of files/dirs renamed) |
HDFS-DataNode-FSDatasetState
| Metric name | Metric key | Description |
| --- | --- | --- |
| DataNode CacheCapacity | hadoop.hdfs.datanode.CacheCapacity | The cache capacity of the DataNode |
| DataNode CacheUsed | hadoop.hdfs.datanode.CacheUsed | The cache used by the DataNode |
| DataNode Capacity | hadoop.hdfs.datanode.Capacity | Current raw capacity of the DataNodes in bytes |
| DataNode DfsUsed | hadoop.hdfs.datanode.DfsUsed | The storage space that has been used up by HDFS |
| DataNode NumBlocksCached | hadoop.hdfs.datanode.NumBlocksCached | The number of blocks cached on the DataNode |
| DataNode NumBlocksFailedToCache | hadoop.hdfs.datanode.NumBlocksFailedToCache | The number of blocks that failed to cache on the DataNode |
| DataNode NumBlocksFailedToUncache | hadoop.hdfs.datanode.NumBlocksFailedToUncache | The number of blocks that failed to be removed from the cache |
| DataNode NumFailedVolumes | hadoop.hdfs.datanode.NumFailedVolumes | Number of failed volumes |
| DataNode Remaining | hadoop.hdfs.datanode.Remaining | The remaining DataNode disk space, in percent |
YARN-MRAppMaster-MRAppMetrics
| Metric name | Metric key | Description |
| --- | --- | --- |
| JobsCompleted | hadoop.yarn.mrappmaster.JobsCompleted | Number of completed jobs |
| JobsFailed | hadoop.yarn.mrappmaster.JobsFailed | Number of failed jobs |
| JobsKilled | hadoop.yarn.mrappmaster.JobsKilled | Number of killed jobs |
| JobsPreparing | hadoop.yarn.mrappmaster.JobsPreparing | Number of preparing jobs |
| JobsRunning | hadoop.yarn.mrappmaster.JobsRunning | Number of running jobs |
| MapsCompleted | hadoop.yarn.mrappmaster.MapsCompleted | Number of maps completed |
| MapsFailed | hadoop.yarn.mrappmaster.MapsFailed | Number of maps failed |
| MapsKilled | hadoop.yarn.mrappmaster.MapsKilled | Number of maps killed |
| MapsRunning | hadoop.yarn.mrappmaster.MapsRunning | Number of maps running |
| MapsWaiting | hadoop.yarn.mrappmaster.MapsWaiting | Number of maps waiting |
| ReducesCompleted | hadoop.yarn.mrappmaster.ReducesCompleted | Number of completed reduces |
| ReducesFailed | hadoop.yarn.mrappmaster.ReducesFailed | Number of failed reduces |
| ReducesKilled | hadoop.yarn.mrappmaster.ReducesKilled | Number of killed reduces |
| ReducesRunning | hadoop.yarn.mrappmaster.ReducesRunning | Number of running reduces |
| ReducesWaiting | hadoop.yarn.mrappmaster.ReducesWaiting | Number of waiting reduces |
HDFS-DataNode-*
| Metric name | Metric key | Description |
| --- | --- | --- |
| DataNode BlocksRead | hadoop.hdfs.datanode.BlocksRead.count | Total number of blocks read from DataNode |
| DataNode BlocksRemoved | hadoop.hdfs.datanode.BlocksRemoved.count | Total number of blocks removed from DataNode |
| DataNode BlocksReplicated | hadoop.hdfs.datanode.BlocksReplicated.count | Total number of blocks replicated |
| DataNode BlocksVerified | hadoop.hdfs.datanode.BlocksVerified.count | Total number of blocks verified |
| DataNode BlocksWritten | hadoop.hdfs.datanode.BlocksWritten.count | Total number of blocks written to DataNode |
| DataNode BytesRead | hadoop.hdfs.datanode.BytesRead.count | Total number of bytes read from DataNode |
| DataNode BytesWritten | hadoop.hdfs.datanode.BytesWritten.count | Total number of bytes written to DataNode |
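Totals such as BytesRead only ever grow until the process restarts, so they are usually more useful as per-interval increases. The following sketch turns a series of cumulative samples into deltas and treats any drop as a restart. It assumes you already have the cumulative values sampled at regular intervals; the function name and sample numbers are hypothetical.

```python
from typing import List


def counter_deltas(samples: List[int]) -> List[int]:
    """Per-interval increases of a cumulative counter.

    A drop between two samples is treated as a process restart, so the
    new value itself is taken as the increase for that interval.
    """
    deltas = []
    for prev, cur in zip(samples, samples[1:]):
        deltas.append(cur - prev if cur >= prev else cur)
    return deltas


# Made-up cumulative readings; the counter resets between 150 and 20.
print(counter_deltas([100, 150, 150, 20, 60]))  # [50, 0, 20, 40]
```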
HDFS-NameNode-FSNamesystemState
| Metric name | Metric key | Description |
| --- | --- | --- |
| FilesTotal | hadoop.hdfs.namenode.FilesTotal | Current number of files and directories |
| PendingReplicationBlocks | hadoop.hdfs.namenode.PendingReplicationBlocks | Current number of blocks pending to be replicated |
| UnderReplicatedBlocks | hadoop.hdfs.namenode.UnderReplicatedBlocks | Current number of under-replicated blocks |
| ScheduledReplicationBlocks | hadoop.hdfs.namenode.ScheduledReplicationBlocks | Current number of blocks scheduled for replication |
| NumLiveDataNodes | hadoop.hdfs.namenode.NumLiveDataNodes | Number of DataNodes that are currently live |
| NumDeadDataNodes | hadoop.hdfs.namenode.NumDeadDataNodes | Number of DataNodes that are currently dead |
| NumDecomLiveDataNodes | hadoop.hdfs.namenode.NumDecomLiveDataNodes | Number of DataNodes that have been decommissioned and are now live |
| NumDecomDeadDataNodes | hadoop.hdfs.namenode.NumDecomDeadDataNodes | Number of DataNodes that have been decommissioned and are now dead |
| VolumeFailuresTotal | hadoop.hdfs.namenode.VolumeFailuresTotal | Total number of volume failures across all DataNodes |
| EstimatedCapacityLostTotal | hadoop.hdfs.namenode.EstimatedCapacityLostTotal | An estimate of the total capacity lost due to volume failures |
| NumDecommissioningDataNodes | hadoop.hdfs.namenode.NumDecommissioningDataNodes | Number of DataNodes in decommissioning state |
| NumStaleDataNodes | hadoop.hdfs.namenode.NumStaleDataNodes | Number of DataNodes marked stale due to delayed heartbeat |
HDFS-NameNode-FSNamesystem
| Metric name | Metric key | Description |
| --- | --- | --- |
| CapacityTotal | hadoop.hdfs.namenode.CapacityTotal | Current raw capacity of DataNodes in bytes |
| CapacityUsed | hadoop.hdfs.namenode.CapacityUsed | Current used capacity across all DataNodes in bytes |
| CapacityRemaining | hadoop.hdfs.namenode.CapacityRemaining | Current remaining capacity in bytes |
| TotalLoad | hadoop.hdfs.namenode.TotalLoad | Current number of connections |
| BlocksTotal | hadoop.hdfs.namenode.BlocksTotal | Current number of allocated blocks in the system |
| PendingDeletionBlocks | hadoop.hdfs.namenode.PendingDeletionBlocks | Current number of blocks pending deletion |
| CorruptBlocks | hadoop.hdfs.namenode.CorruptBlocks | Current number of blocks with corrupt replicas |
| CapacityUsedNonDFS | hadoop.hdfs.namenode.CapacityUsedNonDFS | Current space used by DataNodes for non-DFS purposes in bytes |
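Because the capacity metrics are all reported in bytes, expressing each one as a share of CapacityTotal makes them easier to compare across clusters. A minimal sketch under that assumption follows; the function name, the dictionary keys, and the sample byte counts are hypothetical.

```python
def capacity_breakdown(total: int, used: int,
                       used_non_dfs: int, remaining: int) -> dict:
    """Express each capacity figure as a percentage of the raw total."""
    if total <= 0:
        raise ValueError("total capacity must be positive")
    parts = (
        ("dfs_used", used),              # CapacityUsed
        ("non_dfs_used", used_non_dfs),  # CapacityUsedNonDFS
        ("remaining", remaining),        # CapacityRemaining
    )
    return {name: round(100.0 * value / total, 1) for name, value in parts}


# Made-up byte counts chosen to give round percentages.
print(capacity_breakdown(total=1000, used=600, used_non_dfs=100, remaining=300))
```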