Managed Online Endpoints in Azure Machine Learning


Before we introduce Managed Online Endpoints in Azure Machine Learning, let’s revisit deployment in Azure Machine Learning. For real-time model deployment in Azure Machine Learning, one needs to perform the following steps:

  1. Create Scoring Script
  2. Define the Environment Config
  3. Create an Inference Config
  4. Create the Deployment Config
  5. Deploy the model

For more details, refer to Section 1.4 (Model Serving) in our article. One key challenge with this approach is provisioning and maintaining deployment targets such as Azure Kubernetes Service or Azure ML compute clusters. Fortunately, Microsoft has introduced Managed Endpoints in Azure Machine Learning, which spare us the hassle of maintaining the deployment targets and environments.

As in the aforementioned article, we will create a real-time (online) endpoint and test it. First, there are some prerequisites:

  • An Azure Machine Learning workspace
  • Azure CLI (v2), installed locally

Step 0: Register a Model to Azure ML Workspace

Before deployment, a model has to be registered to the Azure ML workspace. Refer to the aforementioned article, titled Azure Databricks and Azure Machine Learning make a great pair, for detailed steps. Here is an example of the same:

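from azureml.core.model import Model

model_name = 'california-housing-prices'
model_description = 'Model to predict housing prices in California.'
model_tags = {"Type": "GradientBoostingRegressor",
              "Run ID": aml_run.id,  # assumption: ID of the training run (value missing in the original)
              "Metrics": aml_run.get_metrics()}

registered_model = Model.register(model_path=model_file_path,  # Path to the saved model file
                                  model_name=model_name,
                                  tags=model_tags,
                                  description=model_description,
                                  workspace=ws)  # ws: your Azure ML Workspace object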

Step 1: Install Azure CLI ml extension

Apart from Azure CLI (v2), you need to install the new ‘ml’ extension. It enables you to train and deploy models from the command line, with features that speed up data science while tracking the model lifecycle. Follow the official documentation for step-by-step instructions. Once the extension is added, open the command prompt and run az -v. Make sure that the ml extension version is the latest (2.2.3 as of now).

Step 2: Create the Online Endpoint Configuration (YAML).

There are two steps for creating a Managed Online Endpoint:

  1. Create an Endpoint
  2. Create the Deployment

The following YAML structure creates the endpoint:

name: california-housing-service
auth_mode: key

Step 3: Create the Online Deployment Configuration (YAML).

The next step is creating the Deployment. Below is an example of the Deployment YAML Schema:

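name: default
endpoint_name: california-housing-service
model: azureml:california-housing-prices:1
code_configuration:
  code: ./
  scoring_script: score.py  # assumed file name; use the name of your scoring script
environment: azureml:california-housing-env:1
instance_type: Standard_F2s_v2
instance_count: 1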
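
The model and environment values in the YAML use the azureml:<name>:<version> form, which points at assets already registered in the workspace. As a minimal sketch of how such a reference breaks down (parse_asset_reference is a hypothetical helper for illustration, not part of any Azure library, and it only handles the name:version form):

```python
def parse_asset_reference(reference):
    """Split 'azureml:<name>:<version>' into (name, version).

    Hypothetical helper for illustration; not an Azure ML API.
    """
    prefix, sep, rest = reference.partition(":")
    if prefix != "azureml" or not sep:
        raise ValueError(f"expected 'azureml:<name>:<version>', got {reference!r}")
    # The version is whatever follows the last colon.
    name, sep, version = rest.rpartition(":")
    if not sep or not name or not version:
        raise ValueError(f"missing name or version in {reference!r}")
    return name, version


print(parse_asset_reference("azureml:california-housing-prices:1"))
# ('california-housing-prices', '1')
```

A check like this can catch a mistyped reference before the deployment command is run.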

There are two aspects to note here: the scoring script and the environment. A scoring script has two functions, init() and run(). The init() function loads the registered model once, when the deployment starts, while run() executes the scoring logic for each request. The environment describes the scoring dependencies, such as the required libraries.

The scoring script has the following structure:

import os
import json
import numpy as np
import pandas as pd
import sklearn
import joblib
from azureml.core.model import Model

columns = ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup','Latitude','Longitude']

def init():
    global model
    model_filename = 'california-housing.pkl'
    model_path = os.path.join(os.environ['AZUREML_MODEL_DIR'], model_filename)
    model = joblib.load(model_path)

def run(input_json):
    # Parse the request and build a DataFrame with the expected feature columns
    inputs = json.loads(input_json)
    data_df = pd.DataFrame(np.array(inputs['data']).reshape(-1, len(columns)), columns = columns)
    # Make prediction
    predictions = model.predict(data_df)
    # You can return any data type as long as it is JSON-serializable
    return {'predictions': predictions.tolist()}
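
Before deploying, it can help to exercise the init()/run() contract locally. The following is a minimal sketch under stated assumptions: StubModel is a hypothetical stand-in for the pickled regressor (it just averages each row), and plain lists replace pandas so that the snippet runs with the standard library only.

```python
import json

COLUMNS = ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms',
           'Population', 'AveOccup', 'Latitude', 'Longitude']


class StubModel:
    """Hypothetical stand-in for the real model: predicts the mean of each row."""

    def predict(self, rows):
        return [sum(row) / len(row) for row in rows]


model = None


def init():
    # Called once at start-up; the real script loads the pickled model here.
    global model
    model = StubModel()


def run(input_json):
    # Called per request with the raw JSON body.
    rows = json.loads(input_json)['data']
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} features, got {len(row)}")
    return {'predictions': model.predict(rows)}


init()
response = run(json.dumps({'data': [[1.0] * 8, [2.0] * 8]}))
print(response)
# {'predictions': [1.0, 2.0]}
```

The same two calls, init() followed by run(), can be pointed at the real scoring script once the model file is available locally.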

As far as the environment is concerned, you can use either a pre-built one or a custom one. In Azure Machine Learning, you can define custom environments and register them in the AML workspace. In this example, we use a custom environment. Run the following script in a Jupyter notebook on an Azure Machine Learning compute instance to create the environment:

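from azureml.core import Workspace
from azureml.core import Environment

ws = Workspace.from_config()  # picks up the workspace config on a compute instance
my_env_name = 'california-housing-env'  # name referenced in the deployment YAML

# Clone a curated environment as the starting point
myenv = Environment.get(workspace=ws, name='AzureML-Minimal').clone(my_env_name)

# Assumption: add the libraries the scoring script imports (versions left unpinned)
for package in ['numpy', 'pandas', 'scikit-learn', 'joblib']:
    myenv.python.conda_dependencies.add_pip_package(package)

print("Review the deployment environment.")
print(myenv)

# Register the environment
myenv.register(workspace=ws)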

Step 4: Create the Online Endpoint.

Now, we have all the pieces in place. The working folder holds the endpoint and deployment configurations (endpoint.yml and deployment.yml) along with the scoring script.

Open a Windows command prompt and change the directory to that folder. Use the following command to log in to the appropriate Azure tenant:

az login

Further, set the subscription using the following command:

az account set --subscription <name or id>

Now, create the Online Endpoint using the following command:

az ml online-endpoint create -f endpoint.yml --resource-group <your-resource-group> --workspace-name <your-azureml-workspace>

Step 5: Create the Online Endpoint Deployment.

The previous step creates an endpoint, which is only a shell. To back it with compute, you create a deployment using the configuration defined in deployment.yml. Note that an endpoint can have multiple deployments. Here is the command to create the online deployment:

az ml online-deployment create -f deployment.yml --resource-group <your-resource-group> --workspace-name <your-azureml-workspace>

Once the deployment is completed, the endpoint shows up in the workspace. Note that it exposes a secure HTTPS endpoint.

Step 6: Test the Endpoint

Once the deployment is complete, we can test the endpoint in two steps:

  • Create a request JSON
  • Use Az CLI to invoke the endpoint and get back results

Here is the sample JSON data:

{"data": [[8.1, 41, 4.04, 1.2, 900.0, 3.560606, 37.50, -127.00], [1.5603, 25, 5.045455, 1.133333, 845.0, 2.560606, 39.48, -121.09]]}
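
The scoring script reads the rows from a "data" key and reshapes them to eight columns, so the request file has to follow that shape. As a rough sketch, the file can be generated and checked from Python. Here write_request_file is a hypothetical helper, and the file goes to a temporary folder only to keep the example free of side effects; in practice, write it to your working folder.

```python
import json
import os
import tempfile

N_FEATURES = 8  # the scoring script expects eight feature columns per row


def write_request_file(rows, path):
    """Validate the rows and write them as {"data": rows} to path."""
    for i, row in enumerate(rows):
        if len(row) != N_FEATURES:
            raise ValueError(f"row {i} has {len(row)} values, expected {N_FEATURES}")
    with open(path, "w") as f:
        json.dump({"data": rows}, f)
    return path


rows = [
    [8.1, 41, 4.04, 1.2, 900.0, 3.560606, 37.50, -127.00],
    [1.5603, 25, 5.045455, 1.133333, 845.0, 2.560606, 39.48, -121.09],
]
request_path = write_request_file(
    rows, os.path.join(tempfile.mkdtemp(), "Sample_Request.json")
)
print(request_path)
```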

Next is the Az CLI command to invoke the endpoint:

az ml online-endpoint invoke --name california-housing-service --deployment default --resource-group <your-resource-group> --workspace-name <your-azureml-workspace> --request-file Sample_Request.json

And, here are the results:

"{\"predictions\": [4.731896241931953, 0.6704102705036317]}"

However, calling the endpoint through CLI commands is not practical for applications. Hence, we will use a Python script; a template for it can be found in the endpoint's Consume tab. Before that, however, the endpoint has to be updated so that the deployment receives 100% of the traffic. Here is the CLI command for the same:

az ml online-endpoint update --name california-housing-service --resource-group <your-resource-group> --workspace-name <your-azureml-workspace> --traffic "default=100"
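
Since an endpoint can have several deployments, the --traffic value may list more than one name=percent pair. The sketch below builds that string and checks the split first. It assumes pairs are separated by spaces, which is how I understand the CLI to accept them, and it insists on a total of 100. The helper format_traffic and the deployment names blue and green are hypothetical.

```python
def format_traffic(split):
    """Build a --traffic value such as 'default=100' from {deployment: percent}.

    Hypothetical helper; assumes space-separated name=percent pairs.
    """
    for name, percent in split.items():
        if not 0 <= percent <= 100:
            raise ValueError(f"{name}: percentage must be between 0 and 100")
    total = sum(split.values())
    if total != 100:
        raise ValueError(f"traffic must add up to 100, got {total}")
    return " ".join(f"{name}={percent}" for name, percent in split.items())


print(format_traffic({"default": 100}))
# default=100
print(format_traffic({"blue": 90, "green": 10}))
# blue=90 green=10
```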

Finally, here is the Python script to infer from Managed Online Endpoints in Azure Machine Learning:

import urllib.request
import urllib.error
import json
import os
import ssl

def allowSelfSignedHttps(allowed):
    # bypass the server certificate verification on client side
    if allowed and not os.environ.get('PYTHONHTTPSVERIFY', '') and getattr(ssl, '_create_unverified_context', None):
        ssl._create_default_https_context = ssl._create_unverified_context

allowSelfSignedHttps(True) # this line is needed if you use self-signed certificate in your scoring service.

X = {
    "data": [[8.1, 41, 4.04, 1.2, 900.0, 3.560606, 37.50, -127.00], [1.5603, 25, 5.045455, 1.133333, 845.0, 2.560606, 39.48, -121.09]]
}

body = str.encode(json.dumps(X))

url = '<your-managed-online-endpoint>'
api_key = '<your-online-endpoint-key>' # Replace this with the API key for the web service
headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key)}

req = urllib.request.Request(url, body, headers)

try:
    response = urllib.request.urlopen(req)

    result = response.read()
    print(result)
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))

    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(error.read().decode("utf8", 'ignore'))

Here are the results:

b'{"predictions": [4.731896241931953, 0.6704102705036317]}'
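
Both outputs shown in this article need a little decoding before the numbers can be used: the Python script prints raw bytes, while the CLI prints a JSON string that itself contains JSON. A minimal sketch that handles both cases follows; parse_predictions is a hypothetical helper, not part of the generated template.

```python
import json


def parse_predictions(raw):
    """Return the list of predictions from a bytes or str response."""
    payload = json.loads(raw)
    # The CLI wraps the JSON document in a JSON string, so decode once more.
    if isinstance(payload, str):
        payload = json.loads(payload)
    return payload["predictions"]


http_body = b'{"predictions": [4.731896241931953, 0.6704102705036317]}'
cli_output = '"{\\"predictions\\": [4.731896241931953, 0.6704102705036317]}"'

print(parse_predictions(http_body))
# [4.731896241931953, 0.6704102705036317]
print(parse_predictions(cli_output) == parse_predictions(http_body))
# True
```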


