OneAgent extension lifecycle

Extensions 1.0 end of life

OneAgent and ActiveGate version 1.299 are the last versions that support the OneAgent and ActiveGate Extensions 1.0 framework. You can continue using Extensions 1.0 if you stay on OneAgent or ActiveGate version 1.299, but note that this means running an unsupported Python version (3.8). We strongly recommend migrating your extensions to the latest Extensions 2.0 framework.

For more information, see General guidance and how to migrate.

While OneAgent automates all extension activity (for example, deciding whether a particular extension should run, restarting extensions that generate errors, and gathering required configuration details), there are situations where you might need a deeper understanding of OneAgent's internal mechanisms: for example, if a newly developed extension doesn't start, an outdated version of an extension is running, or you face other unexpected behavior.

Here are the stages of the extension lifecycle:

Loading
Activation
Running
Closing

Extension loading

The extension loading process takes place when OneAgent starts. This process evaluates whether or not the extension is compatible with the current version of OneAgent and other extensions.

When the loading process is over, extensions wait until certain conditions are met before they are activated.

OneAgent uses three locations to load extensions:

plugin_deployment directory - located in the root of your OneAgent installation
extensions downloaded from Dynatrace Cluster Node by OneAgent
extensions distributed with the OneAgent installer

Each extension is checked for compatibility with the extensions that are already loaded. For example, two extensions may require conflicting versions of the same library. In that case, the extension that causes the conflict isn't loaded. Which extension causes the conflict is determined by the loading order: first the order of the locations listed above, then the lexicographical order of extension names within each location.
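The conflict rule can be sketched as a toy model. This isn't OneAgent's loader: `ExtensionPackage`, the `requires` mapping of library names to versions, and the location labels other than `plugin_deployment` are hypothetical, and a "conflict" is simplified to "a different exact version of a library that an earlier extension already requires".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ExtensionPackage:
    """Hypothetical extension descriptor: name plus required library versions."""
    name: str
    requires: Dict[str, str] = field(default_factory=dict)  # library -> version


# Order of locations as listed in the text; labels after the first are hypothetical.
LOCATION_ORDER = ["plugin_deployment", "downloaded_from_cluster", "installer"]


def load_extensions(by_location: Dict[str, List[ExtensionPackage]]) -> Tuple[List[str], List[str]]:
    """Return (loaded, rejected) extension names, following the documented order."""
    pinned: Dict[str, str] = {}  # library -> version required by an earlier extension
    loaded: List[str] = []
    rejected: List[str] = []
    for location in LOCATION_ORDER:
        # Within a location, extensions are processed in lexicographical name order.
        for ext in sorted(by_location.get(location, []), key=lambda e: e.name):
            conflict = any(lib in pinned and pinned[lib] != ver for lib, ver in ext.requires.items())
            if conflict:
                rejected.append(ext.name)  # the later extension causes the conflict
                continue
            for lib, ver in ext.requires.items():
                pinned.setdefault(lib, ver)
            loaded.append(ext.name)
    return loaded, rejected


packages = {
    "plugin_deployment": [ExtensionPackage("custom.b", {"requests": "2.20"})],
    "downloaded_from_cluster": [
        ExtensionPackage("custom.a", {"requests": "2.31"}),
        ExtensionPackage("custom.c", {"requests": "2.20"}),
    ],
}
loaded, rejected = load_extensions(packages)
print(loaded)    # ['custom.b', 'custom.c']
print(rejected)  # ['custom.a']
```

Note that `custom.a` is rejected even though its name sorts first, because location order takes precedence over name order.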

For details of possible extension incompatibility and ways to cope with this, see Limitations.

The last two locations are used internally only, and the files located there shouldn't be modified manually. You can, however, use the plugin_deployment folder in any way you want.

Extension activation

In most cases, an extension starts when OneAgent detects the process the extension is meant to monitor.

Once an extension is loaded, OneAgent decides whether to activate it. Activating an extension triggers an attempt to retrieve its configuration, which is stored on the server. OneAgent also confirms that the extension is enabled. Once the configuration is available, the extension is run.

The most important factor in extension activation is the process snapshot. This data structure contains information about the important processes recognized on your system. If a process in the snapshot matches the information contained in plugin.json, the extension is activated. In most cases, this means detecting a process of a given type. Consequently, if the matching process disappears from the process snapshot, the extension is no longer active.

The second factor that determines the extension activation is the plugin.json file. In this file, you can declare which process types are to trigger your extension activation.

Three types of activation are currently available:

Run a single extension instance when a triggering process is detected
Run as many extension instances as there are detected monitored process group instances
Always run the extension
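To make the three activation types concrete, here is a toy Python model of how many extension instances each one yields for a given snapshot. It isn't OneAgent's implementation: `SnapshotEntry` is a simplified, hypothetical stand-in for the ProcessSnapshotEntry structure shown in the examples later in this section, and treating the `Technologies` property as a comma-separated list is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SnapshotEntry:
    """Simplified, hypothetical stand-in for one process group instance."""
    group_instance_id: int
    group_name: str
    properties: Dict[str, str] = field(default_factory=dict)


def technologies(entry: SnapshotEntry) -> Set[str]:
    # Assumption: "Technologies" may hold several comma-separated values.
    raw = entry.properties.get("Technologies", "")
    return {t.strip() for t in raw.split(",") if t.strip()}


def instances_to_create(plugin_json: dict, entries: List[SnapshotEntry]) -> List[str]:
    """Toy model: one label per extension instance that would be started."""
    if plugin_json.get("entity") == "HOST":
        return ["host"]  # always run once a snapshot has been received
    wanted = set(plugin_json.get("technologies", []))
    matching = [e for e in entries if technologies(e) & wanted]
    if not matching:
        return []  # no triggering process: the extension stays inactive
    if plugin_json.get("source", {}).get("activation") == "Singleton":
        return ["singleton"]  # one instance, however many matches
    return [str(e.group_instance_id) for e in matching]  # one per process group instance


mssql = [
    SnapshotEntry(9687064182437432279, "MSSQL10_50.NAMED_ID", {"Technologies": "MSSQL"}),
    SnapshotEntry(10160066155805379574, "MSSQL10.SQLEXPRESS", {"Technologies": "MSSQL"}),
]
per_pgi = instances_to_create({"entity": "PROCESS_GROUP_INSTANCE", "technologies": ["MSSQL"]}, mssql)
singleton = instances_to_create(
    {"entity": "PROCESS_GROUP_INSTANCE", "technologies": ["MSSQL"], "source": {"activation": "Singleton"}},
    mssql,
)
print(len(per_pgi), len(singleton))  # 2 1
```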

So much for the description. Now let's take a look at possible activation types:

Activate a single extension instance for all monitored processes

In the most common case (as presented in our OneAgent extensions hands-on), it's best to create one instance of an extension when a process of a specified type is detected. This requires the following in plugin.json:

{
  "entity": "PROCESS_GROUP_INSTANCE",
  "technologies": [ "PYTHON" ],
  "source": {
    "activation": "Singleton"
  }
}

With the above snippet included in plugin.json, any Python process group instance present in the snapshot activates the extension. For example, the following snapshot would be sufficient:

ProcessSnapshot(
    host_id=11711730974707348096,
    entries=[
        ProcessSnapshotEntry(
            group_id=9849894537414073908,
            node_id=0,
            group_instance_id=11914897446187082808,
            group_name='plugin_sdk.demo_app',
            processes=[
                ProcessInfo(
                    pid=1541,
                    process_name='python3.5',
                    properties={
                        'CmdLine': '-m plugin_sdk.demo_app',
                        'WorkDir': '/home/demo',
                        'ListeningPorts': '8090'
                    })
            ],
            properties={'Technologies': 'PYTHON'}
        ),
        ProcessSnapshotEntry(
            group_id=483552688914919364,
            node_id=0,
            group_instance_id=11834758190185815364,
            group_name='puppet',
            processes=[
                ProcessInfo(
                    pid=1257,
                    process_name='puppet',
                    properties={'CmdLine': '/usr/bin/puppet agent', 'WorkDir': '/'})
            ],
            properties={'Technologies': 'RUBY'}
        )
    ],
    containers=[]
)

This snapshot contains two process groups: one of type Python, which runs plugin_sdk.demo_app, and one of type Ruby, which runs puppet.

Only one instance of the extension is created, regardless of how many Python processes are running. This means that if the process type is a common one (like Python, Java, or Ruby), your extension needs to check whether the snapshot contains the process you want to monitor.
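Because any Python process activates a singleton instance, the extension itself has to look for its target. A minimal sketch of that check, assuming a plain list of hypothetical `SnapshotEntry` records built from the snapshot above plus a third, made-up Python process group that shows why the check matters; the real SDK hands the snapshot to your extension in its own form, and `find_monitored_group()` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SnapshotEntry:
    """Simplified, hypothetical stand-in for ProcessSnapshotEntry."""
    group_instance_id: int
    group_name: str
    properties: Dict[str, str] = field(default_factory=dict)


def find_monitored_group(entries: List[SnapshotEntry], technology: str, group_name: str) -> Optional[int]:
    """Return the group_instance_id of the process to monitor, or None if it's absent."""
    for entry in entries:
        if entry.properties.get("Technologies") == technology and entry.group_name == group_name:
            return entry.group_instance_id
    return None


snapshot = [
    SnapshotEntry(11914897446187082808, "plugin_sdk.demo_app", {"Technologies": "PYTHON"}),
    SnapshotEntry(11834758190185815364, "puppet", {"Technologies": "RUBY"}),
    SnapshotEntry(1, "some_other_python_tool", {"Technologies": "PYTHON"}),  # hypothetical
]
target = find_monitored_group(snapshot, "PYTHON", "plugin_sdk.demo_app")
if target is None:
    print("plugin_sdk.demo_app is not running; nothing to collect this cycle")
else:
    print(f"collecting data for process group instance {target}")
```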

Activate an extension for each process group of a given type

In some circumstances, it's best to create one extension instance per process group instance detected on the host. With this approach, you don't need to search the process snapshot to make sure the process you want to monitor is running. On the downside, if your extension requires configuration and multiple process group instances of the monitored process are running, you can't use Dynatrace Server to provide the configuration, because it would provide the same configuration to each extension instance.

A good example of an extension that works fine with the "per process group instance" approach is the MSSQL extension. This extension requires no additional configuration, and its plugin.json file only declares:

{
  "entity": "PROCESS_GROUP_INSTANCE",
  "technologies": [ "MSSQL" ]
}

So, given this example process snapshot:

ProcessSnapshot(
    host_id=16649240629743570171,
    entries=[
        ProcessSnapshotEntry(
            group_id=4337044249244370985,
            node_id=0,
            group_instance_id=9687064182437432279,
            group_name='MSSQL10_50.NAMED_ID',
            processes=[
                ProcessInfo(
                    pid=26988,
                    process_name='sqlservr.exe',
                    properties={'CmdLine': '-sNAMED_INSTANCE_01', 'WorkDir': 'C:\\Windows\\system32\\'})
            ],
            properties={'Technologies': 'MSSQL', 'mssql_instance_name': 'NAMED_INSTANCE_01'}
        ),
        ProcessSnapshotEntry(
            group_id=12107707763631947228,
            node_id=0,
            group_instance_id=10160066155805379574,
            group_name='MSSQL10.SQLEXPRESS',
            processes=[
                ProcessInfo(
                    pid=36632,
                    process_name='sqlservr.exe',
                    properties={'CmdLine': '-sSQLEXPRESS', 'WorkDir': 'C:\\Windows\\system32\\'})
            ],
            properties={'Technologies': 'MSSQL', 'mssql_instance_name': 'SQLEXPRESS'}
        )
    ],
    containers=[]
)

OneAgent would create two instances of such an extension, because the snapshot contains two process group instances of technology type MSSQL (here, they correspond to the MSSQL instances NAMED_INSTANCE_01 and SQLEXPRESS).

In summary, if your extension monitors a process of an uncommon type and you don't need Dynatrace Server to provide additional configuration, activating one extension instance per process group instance is a viable approach.

Activate only if a process name matches a pattern

In cases where activation per process type is too broad an approach, you can specify additional criteria that determine when an extension should be run. Modify your plugin.json file to contain a section like the following:

{
  "entity": "PROCESS_GROUP_INSTANCE",
  "technologies": [ "PYTHON" ],
  "source": {
    "activation_name_pattern": "^plugin_sdk.demo_app$"
  }
}

In this case, the extension is activated only if a detected process has a name that matches the specified pattern. Matching is done with Python's re.search function. The two activation modes differ as follows: without "activation": "Singleton" (as in the example above), an extension instance is created for each process group instance that meets the matching criteria, whereas with "activation": "Singleton" in the source section, at most one instance of the extension is created.
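A quick way to check a pattern is to run `re.search` on candidate names yourself. The `instance_count()` function below is a hypothetical toy model: it assumes, based on the examples in this section, that the pattern is tested against the process group name (such as `plugin_sdk.demo_app`), and it models the Singleton case as creating at most one instance. The name `plugin_sdk.demo_app_worker` is made up for illustration.

```python
import re
from typing import List


def instance_count(pattern: str, group_names: List[str], singleton: bool) -> int:
    """Toy model: how many instances an activation_name_pattern would yield."""
    matches = [name for name in group_names if re.search(pattern, name)]
    if singleton:
        return min(len(matches), 1)  # at most one instance
    return len(matches)  # one instance per matching process group instance


names = ["plugin_sdk.demo_app", "plugin_sdk.demo_app_worker", "puppet"]

anchored = instance_count(r"^plugin_sdk.demo_app$", names, singleton=False)  # 1
loose = instance_count(r"plugin_sdk", names, singleton=False)  # 2: re.search matches anywhere
loose_singleton = instance_count(r"plugin_sdk", names, singleton=True)  # 1

# An unescaped "." matches any character, so this lookalike name also matches:
print(bool(re.search(r"^plugin_sdk.demo_app$", "plugin_sdkXdemo_app")))  # True
print(bool(re.search("^" + re.escape("plugin_sdk.demo_app") + "$", "plugin_sdkXdemo_app")))  # False
```

Because `re.search` looks for a match anywhere in the string, anchoring the pattern with `^` and `$` (and escaping literal dots) is what keeps it from activating on similarly named processes.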

Keep extension active continuously

The last option for activation is to have the extension continuously active. This is achieved with the following configuration:

{
  "entity": "HOST"
}

In this case, OneAgent activates the extension as soon as it receives a process snapshot. However, this activation type isn't recommended, for a few reasons:

Your extension needs to do all the work required to check if it has any data to gather.
Each measurement you specify in the plugin.json file must also declare the entity type it's associated with. Otherwise, the measurement is associated with the host and won't be visible on the server.
Each measurement you gather in Python code needs to have an entity ID extracted from the process snapshot.
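For the last two points, a host-activated extension has to pick its own targets out of the snapshot and tag every value with an entity ID. The sketch below is plain Python, not the SDK's reporting API: `SnapshotEntry`, `Measurement`, `collect()`, the `connections` metric key, and the `read_metric` callback are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SnapshotEntry:
    """Simplified, hypothetical stand-in for one snapshot entry."""
    group_instance_id: int
    group_name: str
    technologies: str


@dataclass
class Measurement:
    key: str
    value: float
    # None would mean "attach to the host", which the text warns against.
    entity_id: Optional[int]


def collect(entries: List[SnapshotEntry], read_metric: Callable[[str], float]) -> List[Measurement]:
    """Host-activated extension: find its own targets and tag each value with a PGI id."""
    results = []
    for entry in entries:
        if entry.technologies != "MSSQL":
            continue  # nothing to gather for this process group instance
        value = read_metric(entry.group_name)
        results.append(Measurement("connections", value, entry.group_instance_id))
    return results


entries = [
    SnapshotEntry(9687064182437432279, "MSSQL10_50.NAMED_ID", "MSSQL"),
    SnapshotEntry(10160066155805379574, "MSSQL10.SQLEXPRESS", "MSSQL"),
    SnapshotEntry(11834758190185815364, "puppet", "RUBY"),
]
measurements = collect(entries, read_metric=lambda name: 0.0)  # hypothetical reader
for m in measurements:
    print(m.key, m.value, m.entity_id)
```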

Activation tips

If your extension isn't activated:

Check the agent logs to see whether your extension is activated. For more information, see the troubleshooting guide.
Make sure the process you want to monitor is running.
Make sure the process you want to monitor is relevant (confirm that it's listed on the corresponding Host page in the UI).
Use the demo_oneagent_plugin_snapshot extension from the Extension SDK examples to get information about all discovered processes, entity IDs, and process names that can be used for activation. Deploy the extension to the production machine where the technology you want to monitor is running, and the process snapshot information is written to the extension agent log file. The extension activation information is also displayed in the UI, in the Properties section of each process, as Main technology. The Python extension activation technologies value can be used as the technologies property in plugin.json, and the Python extension activation_name_pattern value can be used as activation_name_pattern in the source section of plugin.json.

Running extensions

Once active extensions have received their required configurations, they're ready to do their work. In the simplest case, the extension's query method is run at a 1-minute interval. Nonetheless, there are a few rules to be aware of:

OneAgent tries to limit the number of extension instances. That is, as long as an extension is working properly (no exceptions are thrown from its methods) and its configuration hasn't changed, all calls happen on the same instance object. This is useful if you want to maintain some state in your extension, which you can easily set up by overriding the initialize() method.

When the extension configuration changes (for example, if you modified the credentials used to connect to a monitored system), the previous extension instance is discarded and the query method is called on a new extension instance. Before this happens, the close() method of the old instance is called, so if you've overridden close() in your extension, you can clean up there.

The extension instance will also be replaced if your extension throws exceptions.

If the exception comes from ruxit.api.exceptions, it's classified as a recoverable error. The extension is quickly restarted a couple of times, after which OneAgent attempts to run it every hour.
If the exception isn't recognized and the extension fails to execute correctly a limited number of times, the extension is no longer scheduled.
Successful extension execution resets any crash count associated with it.
The current crash limit is set to 20.
While the extension query method executes on a given extension instance, it won't be scheduled again until it completes. If your extension gathers data for longer than a minute (for example, 2 minutes), it will only be executed every 2 minutes. If the query method hangs indefinitely, OneAgent won't schedule another round of data gathering for that extension (though other extensions are unaffected).
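The restart and scheduling rules above can be sketched as a small toy model. `CrashPolicy`, `RecoverableError`, `start_times()`, and the mode names are hypothetical; `CRASH_LIMIT` comes from the text, but `QUICK_RESTARTS = 2` is an assumption (the text only says "a couple of times"), and so is aligning each run to the first one-minute tick after the previous run finishes.

```python
import math
from dataclasses import dataclass
from typing import List

CRASH_LIMIT = 20    # from the text
QUICK_RESTARTS = 2  # assumption: the text only says "a couple of times"


class RecoverableError(Exception):
    """Hypothetical stand-in for the exceptions in ruxit.api.exceptions."""


@dataclass
class CrashPolicy:
    crashes: int = 0
    mode: str = "normal"  # "normal", "hourly", or "unscheduled"

    def on_success(self) -> None:
        self.crashes = 0  # successful execution resets the crash count
        self.mode = "normal"

    def on_error(self, error: Exception) -> None:
        self.crashes += 1
        if isinstance(error, RecoverableError):
            if self.crashes > QUICK_RESTARTS:
                self.mode = "hourly"  # quick restarts used up: retry every hour
        elif self.crashes >= CRASH_LIMIT:
            self.mode = "unscheduled"  # unrecognized errors reached the limit


def start_times(durations: List[int], interval: int = 60) -> List[int]:
    """Start times (seconds) of successive query() runs; runs never overlap."""
    starts, t = [], 0
    for duration in durations:
        starts.append(t)
        finished = t + duration
        # Next run: the first interval tick after this start that isn't before the finish.
        t = max(t + interval, math.ceil(finished / interval) * interval)
    return starts


policy = CrashPolicy()
for _ in range(CRASH_LIMIT):
    policy.on_error(ValueError("unrecognized"))
print(policy.mode)  # unscheduled

print(start_times([120, 120, 120]))  # [0, 120, 240]: a 2-minute query runs every 2 minutes
```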

Note that creating a new instance of your extension:

Doesn't affect the ResultsBuilder associated with your extension. This remains the same until your extension is deactivated (for example, when the monitored process is no longer detected).
Doesn't reload the Python modules used by your extension. This can only be achieved by restarting OneAgent.
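The instance-reuse rules and the note about modules can be illustrated together. In this toy model, `DemoExtension` and `ToyHost` are hypothetical (only the `initialize()`, `query()`, and `close()` names come from the text): one instance is reused while the configuration is unchanged and replaced, after `close()`, when it changes. Instance attributes start over on replacement, while module-level state carries on, because OneAgent doesn't reload your extension's modules until it restarts.

```python
# Module-level state lives as long as the module; since extension modules
# aren't reloaded until OneAgent restarts, it survives instance replacement.
_totals = {"runs": 0}


class DemoExtension:
    """Hypothetical extension using the lifecycle method names from the text."""

    def __init__(self, config):
        self.config = config
        self.closed = False

    def initialize(self):
        self.runs = 0  # per-instance state, reset for every new instance

    def query(self):
        self.runs += 1
        _totals["runs"] += 1
        return self.runs, _totals["runs"]

    def close(self):
        self.closed = True  # release connections, files, threads, etc. here


class ToyHost:
    """Toy model: reuse one instance while the config is unchanged, replace it otherwise."""

    def __init__(self, factory):
        self.factory = factory
        self.instance = None

    def tick(self, config):
        if self.instance is not None and self.instance.config != config:
            self.instance.close()  # the old instance gets a chance to clean up
            self.instance = None
        if self.instance is None:
            self.instance = self.factory(config)
            self.instance.initialize()
        return self.instance.query()


host = ToyHost(DemoExtension)
r1 = host.tick({"user": "a"})
first = host.instance
r2 = host.tick({"user": "a"})  # same config: same instance
r3 = host.tick({"user": "b"})  # config changed: first.close(), then a new instance
print(r1, r2, r3)  # (1, 1) (2, 2) (1, 3)
```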

Extension closing

An extension is closed when its life comes to an end. At the end of the road, the close() method is called, so if your extension acquired some resources, it can release them. The most common reasons for calling this method are:

the extension instance is being replaced with a new one due to an error or configuration change
the extension is being deactivated (monitored processes are no longer detected on the system)
OneAgent is shutting down and closes all extensions as well.
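Because close() can be reached for any of these reasons, it helps to make it release everything the instance holds and tolerate being called when nothing is held (for example, if initialize() never ran). A minimal sketch, assuming a hypothetical `DbExtension` whose resource is an in-memory `sqlite3` connection standing in for whatever your extension actually holds:

```python
import sqlite3


class DbExtension:
    """Hypothetical extension; sqlite3 stands in for a real monitored-system connection."""

    def initialize(self):
        self.conn = sqlite3.connect(":memory:")  # acquire the resource once per instance

    def query(self):
        return self.conn.execute("SELECT 1").fetchone()[0]

    def close(self):
        # Called on replacement, deactivation, or OneAgent shutdown.
        # Safe to call twice or before initialize(): only closes what exists.
        conn, self.conn = getattr(self, "conn", None), None
        if conn is not None:
            conn.close()


ext = DbExtension()
ext.initialize()
value = ext.query()
ext.close()
ext.close()  # second call is a no-op
```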