TensorFlow Keras Observability
TensorFlow, together with its user-friendly Keras interface, is a state-of-the-art framework for training and running deep learning models. As a general-purpose machine learning framework, TensorFlow lets data scientists build, train, and run a wide variety of AI models.
TensorFlow also ships with TensorBoard, a convenient debugging server that lets data scientists collect and visualize relevant training information such as logs, events, and metrics in a web dashboard.
The screenshot below shows how TensorBoard visualizes the progress of model training:
While TensorBoard is a great tool for locally debugging your AI model, it is not designed for long-term observability of a model running in production.
Because TensorBoard's data collection is built on TensorFlow's flexible callback interface, it is easy to send observability information about your running AI model directly to Dynatrace.
All that is needed is a dedicated TensorFlow callback implementation that collects the data and forwards it to a Dynatrace monitoring environment.
Dynatrace TensorFlow callback receiver
A TensorFlow callback receiver implementation receives important information updates during the training and evaluation phases of a model.
The following implementation of a Dynatrace TensorFlow callback receiver forwards metric data during training and evaluation of a model:
```python
import requests
import tensorflow as tf
from tensorflow import keras

# Custom TensorFlow Keras callback receiver that sends the logged metrics
# to a Dynatrace monitoring environment.
# Read more about writing your own callback receiver here:
# https://www.tensorflow.org/guide/keras/custom_callback
class DynatraceKerasCallback(keras.callbacks.Callback):

    # Constructor that takes a metric prefix, the name of the current model,
    # the Dynatrace metric ingest API endpoint
    # (e.g.: https://your.live.dynatrace.com/api/v2/metrics/ingest)
    # and a Dynatrace API token (with the metric ingest scope enabled)
    def __init__(self, metricprefix='tensorflow', modelname='', url='', apitoken=''):
        self.metricprefix = metricprefix
        self.modelname = modelname
        self.url = url
        self.apitoken = apitoken
        self.batch = ''

    # Append one metric line in the Dynatrace metric ingest line protocol
    def send_metric(self, name, value, tags):
        tags_str = ''
        for tag_key in tags:
            tags_str = tags_str + ',{key}={value}'.format(key=tag_key, value=tags[tag_key])
        line = '{prefix}.{name}{tags} {value}\n'.format(prefix=self.metricprefix, name=name, tags=tags_str, value=value)
        self.batch = self.batch + line

    # Send the batched metric lines to the Dynatrace metric ingest endpoint
    def flush(self):
        requests.post(self.url, headers={'Content-Type': 'text/plain', 'Authorization': 'Api-Token ' + self.apitoken}, data=self.batch)
        self.batch = ''

    def on_train_end(self, logs=None):
        for m in list(logs.keys()):
            self.send_metric(m, logs[m], {'model': self.modelname, 'stage': 'train'})
        self.flush()

    def on_epoch_end(self, epoch, logs=None):
        for m in list(logs.keys()):
            self.send_metric(m, logs[m], {'model': self.modelname, 'stage': 'train'})
        self.flush()

    def on_test_end(self, logs=None):
        for m in list(logs.keys()):
            self.send_metric(m, logs[m], {'model': self.modelname, 'stage': 'test'})
        self.flush()

    def on_predict_end(self, logs=None):
        for m in list(logs.keys()):
            self.send_metric(m, logs[m], {'model': self.modelname, 'stage': 'predict'})
        self.flush()
```
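The send_metric helper builds lines in the Dynatrace metric ingest line protocol: the prefixed metric key, comma-separated dimensions, a space, and the value. A minimal standalone sketch of that formatting logic (the helper name and sample values are illustrative):

```python
def format_metric_line(prefix, name, tags, value):
    # Join dimensions as ,key=value pairs, mirroring send_metric above
    tags_str = ''.join(',{}={}'.format(k, v) for k, v in tags.items())
    return '{}.{}{} {}\n'.format(prefix, name, tags_str, value)

line = format_metric_line('tensorflow', 'accuracy',
                          {'model': 'mnist-classifier', 'stage': 'train'}, 0.97)
print(line)
# tensorflow.accuracy,model=mnist-classifier,stage=train 0.97
```

Each call appends one such line to the batch, and flush posts the whole batch in a single request rather than one HTTP call per metric.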
Example Keras model training and evaluation
In the following example, a Keras sequential model is trained on the well-known MNIST sample data set.
A Dynatrace TensorFlow callback receiver is registered so that the accuracy and loss metrics are automatically forwarded to the configured monitoring environment.
In a production deployment of the model (represented by the evaluation loop below), the same Dynatrace callback receiver continuously delivers observability data about the running model.
```python
import tensorflow as tf
print("TensorFlow version:", tf.__version__)

import time

# Load the Dynatrace callback receiver
from dynatrace import DynatraceKerasCallback

# Load a sample data set
mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Define a model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)
])

# Define a loss function
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Compile the model
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])

# Define the Dynatrace callback
dt_callback = DynatraceKerasCallback(metricprefix='tensorflow', modelname='mnist-classifier', url='https://<YOUR_ENV>.live.dynatrace.com/api/v2/metrics/ingest', apitoken='<YOUR_TOKEN>')

# Train the model
model.fit(x_train, y_train, epochs=5, callbacks=[dt_callback])

# Use the model in production
while True:
    model.evaluate(x_test, y_test, verbose=2, callbacks=[dt_callback])
    time.sleep(60)
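At each hook, Keras passes the callback a logs dict containing the metrics configured in model.compile, here loss and accuracy. A rough illustration of what on_epoch_end receives during the fit call above and which metric lines result (the metric values are made up for the example):

```python
# Assumed shape of the logs dict Keras hands to on_epoch_end for a model
# compiled with metrics=['accuracy']; the values here are illustrative.
epoch_logs = {'loss': 0.2953, 'accuracy': 0.9142}

# The callback iterates over the entries and emits one metric line per metric,
# tagged with the model name and the stage.
lines = ['tensorflow.{},model=mnist-classifier,stage=train {}'.format(name, value)
         for name, value in epoch_logs.items()]
for line in lines:
    print(line)
```

During the evaluation loop, on_test_end receives an analogous dict and the same metrics are emitted with stage=test, so training and production values stay separable in Dynatrace.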
Visualize TensorFlow model metrics in Dynatrace
Once the Dynatrace TensorFlow callback receiver is registered within your own AI model, all collected metrics are forwarded to your monitoring environment.
The screenshot below shows the Dynatrace data explorer visualizing the accuracy metric collected through the TensorFlow callback receiver:
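Assuming the metric prefix and dimensions used in the example above, a metric selector along these lines could chart the accuracy for the model, split by stage, in the data explorer's advanced mode (sketch only; adjust the metric key and dimension names to what your callback actually ingested):

```
tensorflow.accuracy:filter(eq("model","mnist-classifier")):splitBy("stage")
```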