
ONNX Runtime C# API

ONNX Runtime provides a C# .NET binding for running inference on ONNX models on any platform that supports .NET Standard.


Supported Versions

.NET Standard 1.1


| Artifact | Description | Supported Platforms |
|---|---|---|
| Microsoft.ML.OnnxRuntime | CPU (Release) | Windows, Linux, Mac, X64, X86 (Windows-only), ARM64 (Windows-only)…more details: compatibility |
| Microsoft.ML.OnnxRuntime.Gpu | GPU - CUDA (Release) | Windows, Linux, Mac, X64…more details: compatibility |
| Microsoft.ML.OnnxRuntime.DirectML | GPU - DirectML (Release) | Windows 10 1709+ |
| ort-nightly | CPU, GPU (Dev) | Same as Release versions |

API Reference

C# API Reference

Reuse input/output tensor buffers

In some scenarios, you may want to reuse input/output tensors. This often happens when you want to chain two models (i.e., feed one model’s output as input to another), or when you want to speed up repeated inference runs.

Chaining: Feed model A’s output(s) as input(s) to model B

using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

InferenceSession session1, session2;  // let's say 2 sessions are initialized

Tensor<float> t1;  // let's say data is fed into the Tensor objects
var inputs1 = new List<NamedOnnxValue>()
              {
                  NamedOnnxValue.CreateFromTensor<float>("name1", t1)
              };
// session1 inference
using (var outputs1 = session1.Run(inputs1))
{
    // get intermediate value
    var input2 = outputs1.First();
    // rename the ONNX value so it matches the input name expected by session2
    input2.Name = "name2";

    // create input list for session2
    var inputs2 = new List<NamedOnnxValue>() { input2 };

    // session2 inference; outputs1 must stay undisposed until this call completes
    using (var results = session2.Run(inputs2))
    {
        // manipulate the results
    }
}

Multiple inference runs with fixed sized input(s) and output(s)

If the model has fixed-size numeric tensor inputs and outputs, you can use FixedBufferOnnxValue to speed up inference. With FixedBufferOnnxValue, the container objects are allocated and disposed only once across multiple InferenceSession.Run() calls. This avoids per-call overhead, which can be a noticeable share of the total running time for smaller models.

For an example, see TestReusingFixedBufferOnnxValueNonStringTypeMultiInferences().
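The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the referenced test itself: the model path, the tensor names "input" and "output", and the shapes [1, 4] and [1, 2] are all hypothetical and must be replaced with the values your model actually uses.

```csharp
using System;
using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

public static class FixedBufferExample
{
    // Assumption: the model has one float input named "input" with shape [1, 4]
    // and one float output named "output" with shape [1, 2].
    public static void RunManyTimes(string modelPath, int iterations)
    {
        // Allocate the input and output tensors once, up front.
        var inputTensor = new DenseTensor<float>(new[] { 1, 4 });
        var outputTensor = new DenseTensor<float>(new[] { 1, 2 });

        using (var session = new InferenceSession(modelPath))
        using (var inputValue = FixedBufferOnnxValue.CreateFromTensor(inputTensor))
        using (var outputValue = FixedBufferOnnxValue.CreateFromTensor(outputTensor))
        {
            var inputNames = new[] { "input" };
            var outputNames = new[] { "output" };
            var inputValues = new[] { inputValue };
            var outputValues = new[] { outputValue };

            for (int i = 0; i < iterations; i++)
            {
                // Overwrite the input data in place; no new containers are created.
                inputTensor.Buffer.Span.Fill(i);

                // This overload writes results into the preallocated output buffer.
                session.Run(inputNames, inputValues, outputNames, outputValues);

                // Read the result directly from the reused output tensor.
                Console.WriteLine(outputTensor.Buffer.Span[0]);
            }
        }
    }
}
```

Because the output buffer is reused, each Run() call overwrites the previous result; copy the data out first if you need to keep it across iterations.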

Running on GPU (Optional)

If using the GPU package, simply use the appropriate SessionOptions when creating an InferenceSession.

int gpuDeviceId = 0; // The GPU device ID to execute on
var session = new InferenceSession("model.onnx", SessionOptions.MakeSessionOptionWithCudaProvider(gpuDeviceId));
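
Both SessionOptions and InferenceSession are disposable, so a longer-lived program would typically wrap them in using blocks. The following is a minimal sketch under that assumption; the class and method names are hypothetical, and it requires the GPU package plus a working CUDA installation.

```csharp
using System;
using Microsoft.ML.OnnxRuntime;

public static class GpuSessionExample
{
    // Hypothetical helper: opens a model on the given CUDA device and lists its inputs.
    public static void PrintInputs(string modelPath, int gpuDeviceId)
    {
        using (var options = SessionOptions.MakeSessionOptionWithCudaProvider(gpuDeviceId))
        using (var session = new InferenceSession(modelPath, options))
        {
            // InputMetadata maps each input name to its element type and dimensions.
            foreach (var input in session.InputMetadata)
            {
                var dims = string.Join(", ", input.Value.Dimensions);
                Console.WriteLine($"{input.Key}: {input.Value.ElementType} [{dims}]");
            }
        }
    }
}
```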


See Tutorials: Basics - C#