Hosting a Machine Learning Model in ASP.NET Core 3.0

The romantic days of machine learning being the science of a few geeks are over. To be as effective and ubiquitous in the enterprise as top managers claim they want it to be, machine learning must move into a more integrated and agile environment and, above all, be effectively hosted in line-of-business applications. In this article, I’ll try to explain why this particular point is problematic now that most solutions, including shallow learning solutions, are primarily coded in Python. The essence of the article can be summarized as follows: a tighter integration between machine learning solutions and host application environments is, at the very minimum, worth exploring. This means looking beyond Python: machine learning is now available (and fast-growing) right in the .NET platform, natively for existing .NET Framework applications and newer .NET Core applications. After a short introduction on the motivation for machine learning in .NET, I’ll discuss how to load an existing machine learning model into an ASP.NET Core 3.0 application and the practical issues you may face in production.

Why Python When Doing Machine Learning?

Python and C++ are the most commonly used languages in machine learning, but the actual reasons for their popularity are probably hard to investigate. At first sight, one may think it’s a matter of performance. The argument is valid for C++, but not for Python, which is an interpreted language whose standard interpreter (CPython) serializes threads through a global interpreter lock, so real parallelism generally requires writing that part of the code in C! The reason for the popularity of Python in machine learning is a different one: pure development convenience. When it comes to implementing machine learning, Python is at the forefront because of its impressive ecosystem of dedicated tools and libraries.

Once the same ecosystem grows in other programming platforms, Python itself could be replaced with languages that are easier to use than C++ and, compared to Python, more powerful and better integrated with popular enterprise software platforms. The obvious candidates are C# (and F#) and Java (and Scala and Kotlin).

Most models today are built with Python libraries (e.g., scikit-learn), and most neural networks are built with PyTorch or TensorFlow. Note, though, that TensorFlow is not an exclusive part of the Python ecosystem: it offers native bindings for other languages, including C, C++ and Java, and can be reached from C# via dedicated proxies (e.g., TensorFlow.NET).

How would you use a trained model in production?

A model is only a binary file that must be loaded into some client application to be consumed. The model is not made of directly executable code but contains the analytical description of a computation graph, in much the same way an IQueryable tree contains the graph of the data query to execute. The machine learning model is inert if not embedded in some client application. There are three main ways to consume a Python model from outside a Python environment, for example from a .NET or Java application:

Hosting the model in a remote (web) service
Hosting the model in-process and consuming it via bindings (if available) for the language in which the client application is written
Loading the model via the universal ONNX format

Needless to say, all known ways to host a trained model for consumption work, but none is perfect.

Hosting Core Issues

The most common way to expose a trained model is via a web wrapper. Both scikit-learn and TensorFlow have their favorite web infrastructures to expose the model via a dedicated web API. TensorFlow has the TensorFlow Serving infrastructure, and scikit-learn models are typically wrapped in the Flask framework, which quickly loads the model and exposes a URL for clients to call over HTTP.
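From the .NET side, consuming such a web wrapper is an ordinary HTTP call. The following is only a minimal sketch under stated assumptions: the endpoint URL, the "features" request field and the "prediction" response field are all hypothetical and depend entirely on how the remote service was written.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public class RemoteModelClient
{
    // Hypothetical endpoint of a model exposed by a web wrapper (e.g., a Flask app)
    private static readonly Uri Endpoint = new Uri("http://localhost:5000/predict");

    // A single HttpClient instance is reused across calls
    private static readonly HttpClient Http = new HttpClient();

    public async Task<float> PredictAsync(float[] features)
    {
        // Assumed request contract: { "features": [ ... ] }
        var payload = JsonSerializer.Serialize(new { features });
        using var content = new StringContent(payload, Encoding.UTF8, "application/json");

        // Every prediction pays for a network round trip
        using var response = await Http.PostAsync(Endpoint, content);
        response.EnsureSuccessStatusCode();

        // Assumed response contract: { "prediction": <number> }
        var json = await response.Content.ReadAsStringAsync();
        using var doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty("prediction").GetSingle();
    }
}
```

Every call to PredictAsync crosses a process (and usually a machine) boundary, so the client depends on the remote service being reachable and responsive.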

It should be noted, though, that an HTTP service has an impact on the overall application architecture because it forces the application to rely on an external dependency, thus adding latency to the system. The challenge is having a trained model that can be integrated within the same process as the application or the microservice that needs to invoke the model. This is what the second option refers to: importing the trained model as a serialized file and interacting with it natively. This is only possible if the founding infrastructure (the library the model is built on) provides bindings for the language in which the client application is written. For example, a TensorFlow model can be natively used in C, C++ or Python and even in .NET through the intermediation of the TensorFlow.NET library. The same, however, can’t be said for models built with scikit-learn. It’s worth noting here that with TensorFlow you build deep learning neural networks, whereas with scikit-learn you mostly work out models based on simpler, shallow learning algorithms whose computation graph is sometimes as simple as calculating a mathematical function.

The third option refers to exporting the trained model in the universal ONNX format, which enables multiple application environments, in multiple languages, to load the model. All the client application needs to do is incorporate a wrapper for consuming ONNX binaries; the rest is straightforward. Note, though, that ONNX is a sort of common denominator of many libraries, and an ONNX importer is still not so common. Furthermore, at the moment, ONNX lacks support for certain areas of each original framework. Even so, ONNX is a promising step towards standardizing serialized models.

Towards Machine Learning in .NET

More and more applications in need of consuming machine learning models are written in the context of enterprise solutions based on Java or .NET. Therefore, a native machine learning framework in either of those platforms is (or will soon be) a strict necessity. Generally available since the spring of 2019, ML.NET is a free, cross-platform and open-source framework designed to build and train learning models and host them within .NET Core and .NET Framework applications as well as .NET Standard libraries. You can learn more about ML.NET at https://dotnet.microsoft.com/apps/machinelearning-ai/ml-dotnet.

Currently available as version 1.4, ML.NET aims at democratizing machine learning for developers by simplifying it to a level that developers can comfortably work with. ML.NET doesn’t specifically target data scientists, meaning there might be some data science approaches (and algorithms) that ML.NET doesn’t cover yet. On the other side of the scale, though, ML.NET offers the ability to host models in-process in .NET applications and to develop them without gaining or buying Python skills. Compared to the pillars of the Python ecosystem, ML.NET can be seen primarily as the counterpart of the scikit-learn library. The framework, however, also includes some basic facilities for data preparation and analysis of the kind you find in Python helper libraries such as Pandas or NumPy.

To start with the ML.NET framework, you need to install the ML.NET package and start creating models with plain C# using any editor on Windows, Mac or Linux. A specific extension, the Model Builder, is available for Visual Studio 2019. Providing a primer on ML.NET, or even machine learning as a whole, is beyond the purposes of this article. (It would anyway be good to hear from you on how such articles would be received!) In the rest of the article, we’ll assume you have an existing model trained with an ML.NET-based algorithm and proceed to see what it takes to host it effectively in an ASP.NET Core application.

Hosting a Model in ASP.NET

The output of ML.NET training is a ZIP file that contains, serialized in some proprietary way, the computation graph of the model. If you’re curious about the format, have a look at the source code of ML.NET. The code below shows how to load a previously trained model in a C# class invoked via an HTTP request.

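// MLContext is the entry point to all ML.NET operations
var mlContext = new MLContext();

// Load the trained model (a ZIP file) along with its input schema
var model = mlContext
    .Model
    .Load(modelFilePath, out var inputSchema);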

As the first thing, the code creates a required instance of the MLContext class, the root object of the ML.NET library. Next, a model is located by file path and loaded in memory. At this point, the MLContext instance is ready to use the model to perform the computation stored in the bits of the model. In the ML.NET lingo, this requires getting hold of a prediction engine.

// Create prediction engine for the trained model
var predEngine = mlContext
    .Model
    .CreatePredictionEngine<InputData, OutputData>(model);

In the code snippet above, InputData and OutputData refer to two .NET types respectively describing the data being passed to the model and the data being returned by the model at the end of the hardcoded computation. The model describes a calculation to take place, and InputData and OutputData describe what the model receives and what the model returns. Those two types are used in two distinct phases: training and production. Training is the phase that generates the model and is commonly organized around an ML.NET-based console application. The console application refers to those types via a shared class library. The prediction engine returned by the call to CreatePredictionEngine is the actual instance that runs the computation stored in the model and returns the actual response.

var output = predEngine.Predict(input);

As an example, imagine you developed a model to predict the cost of a taxi ride. In this case, the input variable will be a C# object containing the parameters that will guide the model towards a prediction. The output variable is the C# description of the prediction—possibly just a numeric value denoting the expected amount of money to pay for the ride.
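To make the taxi example a little more concrete, here is a hedged sketch of what the two types might look like. The property names are purely illustrative assumptions; in a real project they must match the columns used when the model was trained. The ColumnName attribute maps the prediction to the Score column, which is where ML.NET regression trainers place their output by default.

```csharp
using Microsoft.ML.Data;

// Hypothetical input type: the features of a single taxi ride
public class InputData
{
    public float TripDistance { get; set; }
    public float PassengerCount { get; set; }
    public string PaymentType { get; set; }
}

// Hypothetical output type: the predicted fare
public class OutputData
{
    // Regression models in ML.NET write the predicted value to the "Score" column
    [ColumnName("Score")]
    public float FareAmount { get; set; }
}
```

With types like these, the value returned by Predict would expose the expected fare through the FareAmount property.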

Scalability Concerns

It all seems easy, doesn’t it? Well, there are a few aspects to consider. First and foremost, the code presented works well, but it is not optimized for multithreaded host scenarios, such as web applications, web APIs or even Azure Functions. The code presented is acceptable if used within a desktop or mobile application. Specifically, the code above has two main issues. One is that the model is loaded on every HTTP request that causes it to execute. This is a plain performance problem that becomes obvious when the model is large. At the very minimum, the trained model should be coded as a singleton and shared across the application. Technically, a model in ML.NET is an instance of the ITransformer type, which is known to be thread-safe, so sharing it as a singleton is acceptable. In ASP.NET Core, the easiest approach is loading it at startup and sharing it via dependency injection. A global variable, however, works fine too.
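As a rough sketch of the singleton approach, the model could be loaded once at startup and registered with the dependency injection container. The ModelRegistration class and the AddTrainedModel method are hypothetical names introduced here only for illustration.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.ML;

public static class ModelRegistration
{
    // Hypothetical helper, meant to be called from ConfigureServices
    public static IServiceCollection AddTrainedModel(
        this IServiceCollection services, string modelFilePath)
    {
        // Load the model exactly once, at application startup
        var mlContext = new MLContext();
        ITransformer model = mlContext.Model.Load(modelFilePath, out _);

        // Share the same instances with every component that asks for them
        services.AddSingleton(mlContext);
        services.AddSingleton(model);
        return services;
    }
}
```

Under these assumptions, calling services.AddTrainedModel("MyModel.zip") in ConfigureServices would let any class receive the shared ITransformer through its constructor. Note that this takes care of the model only, not of the prediction engine.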

The other problem is more serious and relates to the PredictionEngine type. As mentioned, the type wraps up the trained model and invokes it. Getting an instance of the type is time-consuming, so it is not recommended to create a fresh instance every time a request comes in. Unfortunately, though, this type is also not thread-safe, meaning that the singleton workaround discussed for the model can’t be applied. A more sophisticated solution, such as object pooling, is recommended instead. The good news is that you don’t have to work out an object pool yourself. The ML.NET team has an ASP.NET Core integration package in the works that provides a prediction engine pool out of the box, perfectly integrated with the dependency injection layer of ASP.NET Core. Here’s the code you need to have, based on the current preview package.

public void ConfigureServices(IServiceCollection services)
{
    // Other startup.cs code here
    // ...

    // Register a pool of prediction engines for the model in MyModel.zip
    services.AddPredictionEnginePool<InputData, OutputData>()
        .FromFile(modelName: "MyModel",
                  filePath: "MyModel.zip");
}

The startup.cs file adds a call to AddPredictionEnginePool specifying the name and the path to the trained model. Any controller that needs the services of the prediction engine has only to inject it.

The controller takes the following form:

public class SomeController : Controller
{
    private readonly SomePredictionService _service;

    public SomeController(
        PredictionEnginePool<InputData, OutputData> engine)
    {
        _service = new SomePredictionService(engine);
    }

    // More code here
}

My favorite approach is injecting the external dependency into the controller and passing it from the controller to an intermediate worker service that does the job. An instance of a prediction engine is picked up from the pool and served to the ongoing request. The instance will then be returned to the pool when the request completes. The worker service (the class SomePredictionService) receives and uses the reference to the ML.NET prediction engine in a way that is safe across threads and therefore doesn’t limit the scalability of the host.

var output = engine
    .GetPredictionEngine("MyModel")
    .Predict(input);

The prediction engine pool is contained in the Microsoft.Extensions.ML NuGet package.
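The article doesn’t show the body of SomePredictionService, so the following is only one possible sketch of it, not the author’s implementation. It assumes the Predict extension method that released versions of Microsoft.Extensions.ML provide on the pool, which borrows an engine, runs the prediction and hands the engine back in a single call. The InputData and OutputData members are placeholders.

```csharp
using Microsoft.Extensions.ML;
using Microsoft.ML.Data;

// Placeholder types; real ones must match the schema of the trained model
public class InputData
{
    public float TripDistance { get; set; }
}

public class OutputData
{
    [ColumnName("Score")]
    public float FareAmount { get; set; }
}

// Hypothetical worker service wrapping the prediction engine pool
public class SomePredictionService
{
    private readonly PredictionEnginePool<InputData, OutputData> _pool;

    public SomePredictionService(
        PredictionEnginePool<InputData, OutputData> pool)
    {
        _pool = pool;
    }

    public OutputData Predict(InputData input)
    {
        // The model name must match the one passed to FromFile at startup
        return _pool.Predict(modelName: "MyModel", example: input);
    }
}
```

Keeping the pool behind a worker service means the controller never touches ML.NET types directly, which makes it easier to swap the hosting strategy later.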

Summary

As machine learning transitions towards some sort of general availability and ubiquity, it becomes crucial to integrate it into the development cycle. This means agility between data science and development teams, and also technical solutions that let the model be hosted in-process when this makes technical and business sense. This article discussed the problems you face when hosting a trained model in an ASP.NET Core application.

About the author

Dino Esposito
