Integrating YOLOv8 Segmentation with C# and C++ DLL Using CUDA and Libtorch: A Step-by-Step…

Woonggon Kim · Apr 3, 2024

Are you looking to harness the precision and efficiency of YOLOv8 Segmentation in your C# projects? This comprehensive guide will walk you through the process of integrating YOLOv8 Segmentation within a C# environment, leveraging the power of C++ Dynamic Link Libraries (DLLs) for enhanced performance. By utilizing the Libtorch version that supports CUDA, you’ll be able to tackle complex object detection and segmentation tasks with ease.

Why YOLOv8 Segmentation Matters

YOLOv8 Segmentation is a state-of-the-art deep learning model that excels in object detection and segmentation tasks. Its ability to accurately identify and outline objects in images and videos makes it a valuable tool for a wide range of applications, from autonomous vehicles to medical imaging. By mastering YOLOv8 Segmentation, you’ll be equipped with the skills to tackle related challenges like landmark detection, keypoint recognition, and classification.

Bridging the Gap Between C# and C++

One of the key aspects of this guide is the seamless integration of C# and C++ code. By meticulously defining and manipulating variables and methods that bridge the managed and unmanaged code realms, you’ll ensure smooth operation and data exchange between C# and C++. This integration opens up a world of possibilities, allowing you to embed Deep Learning Models into various platforms that support C++ or C# code, such as Unreal Engine and Unity.

Unleashing the Potential of C++ DLLs, Libtorch, and CUDA

By harnessing the power of C++ DLLs, Libtorch, and CUDA, you’ll elevate your YOLOv8 Segmentation implementation to new heights. C++ DLLs provide unmatched performance and versatility, enabling you to develop highly optimized code that seamlessly integrates with your C# projects.

Libtorch, the C++ PyTorch library, offers a robust and user-friendly interface for working with deep learning models. With Libtorch, you can effortlessly load, manipulate, and execute YOLOv8 Segmentation models, leveraging the library’s efficient tensor operations and memory management capabilities.

CUDA, NVIDIA’s parallel computing platform and programming model, allows you to unleash the power of GPU acceleration. By combining CUDA with Libtorch, you can dramatically accelerate the inference process of YOLOv8 Segmentation, enabling real-time processing of images and videos. This fusion of technologies unlocks a vast array of possibilities for building high-performance computer vision applications.

This guide covers every aspect of implementing YOLOv8 Segmentation in C++ and C#, from setting up your development environment to performing inference on images and videos. You’ll learn how to:

  1. Convert the YOLOv8 Segmentation weights to TorchScript for use with LibTorch
  2. Configure Visual Studio 2022 C++ for DLL creation with LibTorch, CUDA, and OpenCV
  3. Write the YOLOv8 C++ DLL code
  4. Configure the C# environment
  5. Load the DLL in C# and define the interop functions
  6. Perform inference in C#
  7. Summary

Each step is accompanied by detailed explanations, code snippets, and best practices to ensure a smooth implementation process.

You can refer to the source code at:

https://github.com/kimwoonggon/Cpp_Libtorch_DLL_YoloV8Segmentation_CSharpProject

The YOLOv8 segmentation model is converted to TorchScript. The resulting files are yolov8s-seg.torchscript and yolov8s-seg.onnx.


# Install the Ultralytics package quietly without showing output.
!pip install ultralytics -q

# Import the YOLO class from the ultralytics package.
from ultralytics import YOLO

# Load a pretrained model by specifying the model file.
model = YOLO("yolov8s-seg.pt")

# Export the model to TorchScript format for optimized inference
# and compatibility.
model.export(format="torchscript")

# Additionally, export the model to ONNX format to facilitate
# inspecting input and output structures using Netron.
model.export(format="onnx")

You can inspect the structure of the network using yolov8s-seg.onnx at https://netron.app/.

The input corresponds to the network's image input size and has a shape of [1, 3, 640, 640]. There are two outputs, with shapes of [1, 116, 8400] and [1, 32, 160, 160], respectively.

Looking at the meaning of 116 in the shape [1, 116, 8400]: indices 0 to 83 relate to detection, and indices 84 to 115 relate to segmentation.

In detail, the first four values (indices 0 to 3) are the detected object's center x, center y, width, and height, and the next 80 (indices 4 to 83) are the class probabilities. The remaining 32 (indices 84 to 115) are mask coefficients; combining them with the second output, [1, 32, 160, 160], produces the final segmentation mask. This will be explored further in the code to come.
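To make that decomposition concrete, here is a minimal LibTorch C++ sketch (the same library used for the DLL later, but not part of the tutorial code). It uses random stand-in tensors with the same shapes to show how one prediction splits into box, class scores, and mask coefficients, and how the coefficients combine with the prototype output:

#include <torch/torch.h>
#include <iostream>

int main() {
  // Random stand-ins for the two network outputs, used only to show the shapes.
  torch::Tensor det = torch::rand({1, 116, 8400});      // detections + mask coefficients
  torch::Tensor proto = torch::rand({1, 32, 160, 160}); // mask prototypes

  torch::Tensor preds = det.transpose(1, 2);            // [1, 8400, 116]
  torch::Tensor one = preds[0][0];                      // a single prediction: 116 values

  torch::Tensor box_cxcywh = one.slice(0, 0, 4);        // center x, center y, width, height
  torch::Tensor cls_scores = one.slice(0, 4, 84);       // 80 class scores
  torch::Tensor mask_coef = one.slice(0, 84, 116);      // 32 mask coefficients

  // Final mask: sigmoid(coefficients x prototypes), reshaped to 160 x 160.
  torch::Tensor mask = torch::matmul(mask_coef.unsqueeze(0),
                                     proto.view({1, 32, -1}).squeeze(0))
                           .view({160, 160})
                           .sigmoid();

  std::cout << box_cxcywh.sizes() << " " << cls_scores.sizes() << " "
            << mask.sizes() << std::endl;
  return 0;
}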

Before we begin this tutorial, let me first introduce the experimental environment used:

OS : Windows 10

CUDA : 11.8 (https://developer.nvidia.com/cuda-11-8-0-download-archive)

CUDNN : 8.9.6.50 (https://developer.nvidia.com/rdp/cudnn-archive)

Libtorch : CUDA 11.8 Release version (https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-2.2.2%2Bcu118.zip)

OpenCV C++ : 4.9.0 windows (https://opencv.org/releases/)

C++ 17, x64, Release

C# : .Net Framework 4.7.2

OpenCvSharp4 by shimat (NuGet package in Visual Studio 2022)

Cuda 11.8 Installation

Please download and install Cuda 11.8. Make sure that Nsight NVTX is checked during the installation.

CUDNN Installation

After downloading CUDNN, copy the contents of CUDNN’s bin to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin.

Copy the contents of CUDNN’s Include to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include.

Copy the contents of CUDNN’s lib\x64 to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\lib\x64.

Libtorch Installation

Download the Release version of libtorch for Cuda 11.8 and copy it into the C:\libtorch folder.
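Before wiring everything into the DLL, it can help to confirm that this LibTorch build actually sees your GPU. The snippet below is a small, optional check (not part of the tutorial project); build it as a console app linked against the libtorch you just unpacked, using the same include and library settings described in the project-setup sections below.

#include <torch/torch.h>
#include <iostream>

int main() {
  // Should print "CUDA available: 1" and a non-zero device count on a working CUDA 11.8 setup.
  std::cout << "CUDA available: " << torch::cuda::is_available() << std::endl;
  std::cout << "CUDA devices:   " << torch::cuda::device_count() << std::endl;
  return 0;
}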

OpenCV Installation

After downloading OpenCV, please install it in c:\opencv.

Run C:\opencv\build\x64\vc16\bin\opencv_version_win32.exe to check if OpenCV is working correctly.

Environment Variables Setting

Although these settings are automatically configured upon installation, for this tutorial, please ensure the following:

In Windows System Properties -> Environment Variables, check if the User Variables Path includes the contents as shown below.

Also, in System Variables, confirm that CUDA_PATH is set to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8.

Creating a Visual Studio 2022 DLL Project

Run Visual Studio 2022 and select C++ Dynamic-Link Library (DLL). I have named the project YoloV8DLLProject.

Settings for C++

The settings can be tricky, so I will explain them step-by-step:

In the project properties -> Configuration Properties -> General, set the C++ Language Standard to C++ 17 Standard.

In Configuration Properties -> C/C++ -> Language, change Conformance Mode to No.

In Configuration Properties -> C/C++ -> General, set SDL checks to No.

Add the following directories to Configuration Properties -> C/C++ -> General -> Additional Include Directories.

  • C:\opencv\build\include
  • C:\libtorch\include
  • C:\libtorch\include\torch\csrc\api\include
  • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include
  • C:\Program Files\NVIDIA Corporation\NvToolsExt\include

Add the following libraries to Configuration Properties -> Linker -> Input -> Additional Dependencies.

  • C:\opencv\build\x64\vc16\lib\*.lib
  • C:\libtorch\lib\*.lib
  • C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\lib\*.lib
  • C:\Program Files\NVIDIA Corporation\NvToolsExt\lib\x64\nvToolsExt64_1.lib

In Configuration Properties -> Linker -> Command Line, add /INCLUDE:"?warp_size@cuda@at@@YAHXZ" to Additional Options. This forces the linker to pull in LibTorch's CUDA support so that torch::cuda::is_available() returns true at runtime.

Set up a Post-Build Event for smooth execution after the build.

  • xcopy "C:\opencv\build\x64\vc16\bin\opencv_world490.dll" "$(SolutionDir)$(Platform)\dll\" /c /y
  • xcopy "C:\libtorch\lib\*.dll" "$(SolutionDir)$(Platform)\dll\" /c /y
  • xcopy "C:\libtorch\lib\*.lib" "$(SolutionDir)$(Platform)\dll\" /c /y
  • xcopy "C:\Program Files\NVIDIA Corporation\NvToolsExt\bin\x64\nvToolsExt64_1.dll" "$(SolutionDir)$(Platform)\dll\" /c /y

To accelerate the compilation process, I use the pch.h file.

//pch.h

#ifndef PCH_H
#define PCH_H
// You don't need "framework.h"
//#include "framework.h"
#include <iostream>
#include <algorithm>
#include <time.h>
#include <torch/script.h>
#include <torch/torch.h>
#include <ATen/ATen.h>
#include <opencv2/opencv.hpp>

#endif

Let’s start updating dllmain.cpp

When you start the DLL Project, you will see DllMain, but we will not use this code. Therefore, you can delete the DllMain function.

// dllmain.cpp

BOOL APIENTRY DllMain( HMODULE hModule,
DWORD ul_reason_for_call,
LPVOID lpReserved
)
{
switch (ul_reason_for_call)
{
case DLL_PROCESS_ATTACH:
case DLL_THREAD_ATTACH:
case DLL_THREAD_DETACH:
case DLL_PROCESS_DETACH:
break;
}
return TRUE;
}

Add the necessary header files for the task.


// dllmain.cpp
#include <time.h>
#include <algorithm>
#include <chrono>
#include <cstring>
#include <filesystem>
#include <iostream>
#include <memory>
#include <stdexcept>
#include <typeinfo>
#include "pch.h"

namespace fs = std::filesystem;

To avoid name-mangling issues, we will wrap all of the exported code in extern "C".

// Wrap all the code to deal with name-mangling problems.
extern "C" {

}
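For illustration only (this function is not part of the project), an exported function inside the block follows this pattern: extern "C" prevents C++ name mangling, and __declspec(dllexport) makes the symbol visible to the C# P/Invoke declarations shown later.

extern "C" {
  // Placeholder export used only to illustrate the pattern; the real
  // functions (LoadModel, PerformImagePathInference, ...) are defined below.
  __declspec(dllexport) int GetAnswer() {
    return 42;
  }
}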

Before writing functions required by YOLOv8, we will add the necessary variables for the task.

// The Yolov8 segmentation model loaded from a file, used for inference.
torch::jit::Module network;
// The type of device on which the network will run (e.g., CPU, CUDA).
torch::DeviceType device_type;

// The actual dimensions of the network input.
int real_net_width;
int real_net_height;
// Original image dimensions before resizing for the network.
int _org_height;
int _org_width;

// Thresholds for filtering out predictions with low confidence.
float _score_thresh = 0.5f; // Minimum score to consider a detection valid.
float _iou_thresh = 0.5f; // IOU threshold for non-maximum suppression.
float _seg_thresh = 0.5f; // Threshold for segmentation mask confidence.

// A vector to hold detection results (bounding boxes, scores, etc.).
std::vector<torch::Tensor> dets_vec;

// Name or identifier for the model
std::string _modelName;

// Tensors holding the predicted locations and segmentation predictions.
torch::Tensor predLoc; // Predicted bounding box locations.
torch::Tensor seg_pred; // Predicted segmentation masks.

// An OpenCV matrix to hold the aggregated segmentation map.
cv::Mat total_seg_map;

// Struct to represent a detected object.
struct YoloObject {
float left; // Left coordinate of the bounding box.
float top; // Top coordinate of the bounding box.
float right; // Right coordinate of the bounding box.
float bottom; // Bottom coordinate of the bounding box.
float score; // Confidence score of the class detection.
float classID; // ID of the detected class.
uchar* seg_map; // Pointer to the segmentation map for this object.
};

// A predefined set of 80 colors (one per COCO class) for visualizing detection and segmentation results; the 40-color palette below is listed twice so that colors[classID] is valid for all 80 class IDs.
std::vector<cv::Vec3b> colors = {
cv::Vec3b(0, 0, 255), // Red
cv::Vec3b(0, 255, 0), // Green
cv::Vec3b(255, 0, 0), // Blue
cv::Vec3b(255, 255, 0), // Cyan
cv::Vec3b(255, 0, 255), // Magenta
cv::Vec3b(0, 255, 255), // Yellow
cv::Vec3b(128, 0, 0), // Dark Red
cv::Vec3b(0, 128, 0), // Dark Green
cv::Vec3b(0, 0, 128), // Dark Blue
cv::Vec3b(128, 128, 0), // Olive
cv::Vec3b(128, 0, 128), // Purple
cv::Vec3b(0, 128, 128), // Teal
cv::Vec3b(192, 192, 192), // Silver
cv::Vec3b(128, 128, 128), // Gray
cv::Vec3b(64, 0, 0), // Maroon
cv::Vec3b(0, 64, 0), // Dark green
cv::Vec3b(0, 0, 64), // Navy
cv::Vec3b(64, 64, 0), // Dark Olive
cv::Vec3b(64, 0, 64), // Indigo
cv::Vec3b(0, 64, 64), // Dark Cyan
cv::Vec3b(192, 192, 0), // Mustard
cv::Vec3b(192, 0, 192), // Pink
cv::Vec3b(0, 192, 192), // Sky Blue
cv::Vec3b(64, 192, 0), // Lime Green
cv::Vec3b(192, 64, 0), // Orange
cv::Vec3b(0, 192, 64), // Sea Green
cv::Vec3b(64, 0, 192), // Royal Blue
cv::Vec3b(192, 0, 64), // Deep Pink
cv::Vec3b(0, 64, 192), // Cerulean
cv::Vec3b(64, 192, 192), // Turquoise
cv::Vec3b(192, 64, 192), // Orchid
cv::Vec3b(192, 192, 64), // Sand
cv::Vec3b(128, 64, 64), // Rosy Brown
cv::Vec3b(64, 128, 64), // Pale Green
cv::Vec3b(64, 64, 128), // Slate Blue
cv::Vec3b(128, 128, 64), // Khaki
cv::Vec3b(128, 64, 128), // Plum
cv::Vec3b(64, 128, 128), // Cadet Blue
cv::Vec3b(140, 70, 20), // Saddle Brown
cv::Vec3b(0, 140, 140), // Dark Turquoise
cv::Vec3b(0, 0, 255), // Red
cv::Vec3b(0, 255, 0), // Green
cv::Vec3b(255, 0, 0), // Blue
cv::Vec3b(255, 255, 0), // Cyan
cv::Vec3b(255, 0, 255), // Magenta
cv::Vec3b(0, 255, 255), // Yellow
cv::Vec3b(128, 0, 0), // Dark Red
cv::Vec3b(0, 128, 0), // Dark Green
cv::Vec3b(0, 0, 128), // Dark Blue
cv::Vec3b(128, 128, 0), // Olive
cv::Vec3b(128, 0, 128), // Purple
cv::Vec3b(0, 128, 128), // Teal
cv::Vec3b(192, 192, 192), // Silver
cv::Vec3b(128, 128, 128), // Gray
cv::Vec3b(64, 0, 0), // Maroon
cv::Vec3b(0, 64, 0), // Dark green
cv::Vec3b(0, 0, 64), // Navy
cv::Vec3b(64, 64, 0), // Dark Olive
cv::Vec3b(64, 0, 64), // Indigo
cv::Vec3b(0, 64, 64), // Dark Cyan
cv::Vec3b(192, 192, 0), // Mustard
cv::Vec3b(192, 0, 192), // Pink
cv::Vec3b(0, 192, 192), // Sky Blue
cv::Vec3b(64, 192, 0), // Lime Green
cv::Vec3b(192, 64, 0), // Orange
cv::Vec3b(0, 192, 64), // Sea Green
cv::Vec3b(64, 0, 192), // Royal Blue
cv::Vec3b(192, 0, 64), // Deep Pink
cv::Vec3b(0, 64, 192), // Cerulean
cv::Vec3b(64, 192, 192), // Turquoise
cv::Vec3b(192, 64, 192), // Orchid
cv::Vec3b(192, 192, 64), // Sand
cv::Vec3b(128, 64, 64), // Rosy Brown
cv::Vec3b(64, 128, 64), // Pale Green
cv::Vec3b(64, 64, 128), // Slate Blue
cv::Vec3b(128, 128, 64), // Khaki
cv::Vec3b(128, 64, 128), // Plum
cv::Vec3b(64, 128, 128), // Cadet Blue
cv::Vec3b(140, 70, 20), // Saddle Brown
cv::Vec3b(0, 140, 140), // Dark Turquoise
};

The SetDevice function defines whether to use the CPU or a CUDA GPU, based on deviceNum.

// Exports the function for external use in DLL format.
__declspec(dllexport) void SetDevice(int deviceNum) {
// Set the device for model computations. 0 for CPU, any other number for GPU if available.
if (deviceNum == 0) {
device_type = torch::kCPU; // Use CPU for computations.
} else {
device_type = torch::kCUDA; // Use GPU (CUDA) for computations if available.
}
}

We will define the LoadModel function. It takes the path to the torchscript as modelPath and decides whether to use the CPU or GPU based on deviceNum.

// Exports the function for external use. Loads the model and sets the computation device.
__declspec(dllexport) int LoadModel(char* modelPath, int deviceNum) {
int return_val = 1; // Return value indicating success (1) or failure (-1).
try {
// Check the model name to ensure it's supported (e.g., yolov8).
if (strstr(modelPath, "yolov8")) {
_modelName = "yolov8"; // Set the model name if it matches.
} else {
return_val = -1; // If the model name doesn't match, return an error.
return return_val;
}
// Set the computation device based on the input parameter.
if (deviceNum == 0) {
device_type = torch::kCPU; // Use CPU for computations.
} else {
// If GPU is requested, check if CUDA is available.
if (torch::cuda::is_available()) {
device_type = torch::kCUDA; // Use GPU (CUDA) for computations if available.
} else {
device_type = torch::kCPU; // Fallback to CPU if CUDA is not available.
}
}
// Load the model from the specified path onto the selected device.
network = torch::jit::load(modelPath, device_type);
network.eval(); // Set the network to evaluation mode (disables dropout, etc.).
std::cout << "device type : " << device_type << std::endl; // Debug: print the selected device type.
} catch (const c10::Error& e) {
std::cout << "Model reading failed .. " << std::endl; // Handle errors in model loading.
return -1;
}
return return_val; // Return success status.
}

We will define the SetThreshold function. It sets the score threshold that determines whether a detection is kept, the IoU threshold used as the criterion for non-maximum suppression (NMS), and the segmentation threshold applied to the mask area.

// Exports the function to set global thresholds for detection and segmentation.
__declspec(dllexport) void SetThreshold(float score_thresh,
float iou_thresh,
float seg_thresh) {
_score_thresh = score_thresh; // Minimum confidence score to consider a detection valid.
_iou_thresh = iou_thresh; // IOU threshold for non-maximum suppression (filtering overlapping boxes).
_seg_thresh = seg_thresh; // Threshold for segmentation mask confidence.
}

Before performing inference, we define the non_max_suppression function. Its input tensor has the shape [1, 8400, 116]. Through non_max_suppression, only the meaningful predictions among the 8400 candidates are retained.

// Applies Non-Maximum Suppression to filter overlapping detections.
void non_max_suppression(torch::Tensor preds, // Model predictions [1,8400,116]
float score_thresh = 0.5, // Default score threshold.
float iou_thresh = 0.5) { // Default IOU threshold for NMS.
dets_vec.clear(); // Clear previous detections.
auto device = preds.device(); // Get the device of the predictions tensor.
for (size_t i = 0; i < preds.sizes()[0]; ++i) {
torch::Tensor pred = preds.select(0, i).to(device); // Process each prediction.
// If using a YOLOv8 model, filter detections based on score threshold.
if (_modelName == "yolov8") {
torch::Tensor scores =
std::get<0>(torch::max(pred.slice(1, 4, 84), 1)); // Get scores.
// Filter out low-score predictions.
pred = torch::index_select(
pred, 0, torch::nonzero(scores > score_thresh).select(1, 0));
} else {
throw std::runtime_error("Model name is not valid");
}
if (pred.sizes()[0] == 0) // Skip if no predictions left after filtering.
continue;
// Convert bounding box format from center x, center y, width, height (cx, cy, w, h) to top-left and bottom-right corners (x1, y1, x2, y2).
pred.select(1, 0) = pred.select(1, 0) - pred.select(1, 2) / 2; // Calculate x1
pred.select(1, 1) = pred.select(1, 1) - pred.select(1, 3) / 2; // Calculate y1
pred.select(1, 2) = pred.select(1, 0) + pred.select(1, 2); // Calculate x2
pred.select(1, 3) = pred.select(1, 1) + pred.select(1, 3); // Calculate y2

// Identify the maximum confidence score for each prediction and its corresponding class.
auto max_tuple = torch::max(pred.slice(1, 4, 84), 1);
pred.select(1, 4) = std::get<0>(max_tuple); // Set max confidence score
predLoc = std::get<1>(max_tuple).to(pred.device()); // Store class id

torch::Tensor dets;
// Combine bounding box coordinates with confidence scores and class ids into a single tensor.
dets = torch::cat({pred.slice(1, 0, 5), pred.slice(1, 84, 116)}, 1);

// Prepare tensors to keep track of indices of detections to retain.
torch::Tensor keep = torch::empty({dets.sizes()[0]}, dets.options());
torch::Tensor areas = (dets.select(1, 3) - dets.select(1, 1)) *
(dets.select(1, 2) - dets.select(1, 0));

// Sort detections by confidence score in descending order.
auto indexes_tuple = torch::sort(dets.select(1, 4), 0,
1); // dim = 0, descending = true

torch::Tensor v = std::get<0>(indexes_tuple);
torch::Tensor indexes = std::get<1>(indexes_tuple);

int count = 0; // Counter for detections to keep.

// Loop over detections and apply non-maximum suppression.
while (indexes.sizes()[0] > 0) {
// Always keep the detection with the highest current score.
keep[count++] = (indexes[0].item().toInt());
// Compute the pairwise overlap between the highest scoring detection and all others.
// Preallocate tensors to hold the computed overlaps.
torch::Tensor lefts =
torch::empty(indexes.sizes()[0] - 1, indexes.options());
torch::Tensor tops =
torch::empty(indexes.sizes()[0] - 1, indexes.options());
torch::Tensor rights =
torch::empty(indexes.sizes()[0] - 1, indexes.options());
torch::Tensor bottoms =
torch::empty(indexes.sizes()[0] - 1, indexes.options());
torch::Tensor widths =
torch::empty(indexes.sizes()[0] - 1, indexes.options());
torch::Tensor heights =
torch::empty(indexes.sizes()[0] - 1, indexes.options());

// Loop over each detection remaining after the one with the highest score.
for (size_t i = 0; i < indexes.sizes()[0] - 1; ++i) {
// Compute the coordinates of the intersection rectangle.
lefts[i] = std::max(dets[indexes[0]][0].item().toFloat(),
dets[indexes[i + 1]][0].item().toFloat());
tops[i] = std::max(dets[indexes[0]][1].item().toFloat(),
dets[indexes[i + 1]][1].item().toFloat());
rights[i] = std::min(dets[indexes[0]][2].item().toFloat(),
dets[indexes[i + 1]][2].item().toFloat());
bottoms[i] = std::min(dets[indexes[0]][3].item().toFloat(),
dets[indexes[i + 1]][3].item().toFloat());
widths[i] = std::max(
float(0), rights[i].item().toFloat() - lefts[i].item().toFloat());
heights[i] = std::max(
float(0), bottoms[i].item().toFloat() - tops[i].item().toFloat());
}

// Compute the intersection over union (IoU) for each pair.
torch::Tensor overlaps =
widths * heights;
torch::Tensor ious =
overlaps / (areas.select(0, indexes[0].item().toInt()) +
torch::index_select(
areas, 0, indexes.slice(0, 1, indexes.sizes()[0])) -
overlaps);
// Filter out detections with IoU above the threshold, as they overlap too much with the highest scoring box.
torch::Tensor keep_idx = torch::nonzero(ious <= iou_thresh).select(1, 0) + 1;
indexes = torch::index_select(indexes, 0, keep_idx);
}

// Convert the 'keep' tensor to 64-bit integer type. This is necessary for indexing operations that follow.
keep = keep.toType(torch::kInt64);

// Select the detections that have been marked for keeping.
dets_vec.emplace_back(std::move(
torch::index_select(dets, 0, keep.slice(0, 0, count)).to(torch::kCPU)));

// Similarly, select the locations (predLoc) corresponding to the kept detections.
predLoc = torch::index_select(predLoc, 0, keep.slice(0, 0, count))
.to(torch::kCPU);
}
}

You've been waiting for a long time. Now we define the PerformImagePathInference function, which performs inference on an image file. Its arguments are imgPath, the path to the image, and net_height and net_width, the network's input size. In this tutorial, we use 640 for both net_height and net_width.

This function takes the path of an image, resizes it to 640x640, and then performs inference. The output dimensions are [1,8400,116], [1,32,160,160]. Subsequently, non-max suppression is also performed.

// Exports the function for DLL use; performs inference on a single image file.
__declspec(dllexport) int PerformImagePathInference(char* imgPath,
int net_height,
int net_width) {
real_net_height = net_height; // Set the global variable for network input height.
real_net_width = net_width; // Set the global variable for network input width.

// Read the input image from the provided path.
cv::Mat input_img = cv::imread(imgPath);
if (input_img.empty()) { // Check if the image was successfully read.
std::cout << "Could not read the image: " << imgPath << std::endl;
return -1; // Return error if image read fails.
}

// Resize the image to match the network input dimensions.
cv::resize(input_img, input_img, cv::Size(net_width, net_height));

// Convert the color space from BGR to RGB, which is expected by most models.
cv::cvtColor(input_img, input_img, cv::COLOR_BGR2RGB);

// Normalize the image by converting its pixel values to float and scaling down by 255.
input_img.convertTo(input_img, CV_32FC3, 1.0f / 255.0f);

// Convert the OpenCV image to a Torch tensor.
torch::Tensor imgTensor =
torch::from_blob(input_img.data, {net_height, net_width, 3})
.to(device_type);

// Permute dimensions to match the model's expected input [C, H, W] format
imgTensor = imgTensor.permute({2, 0, 1}).contiguous();
// Add a batch dimension.
imgTensor = imgTensor.unsqueeze(0);

// Prepare the tensor for model input.
std::vector<torch::jit::IValue> inputs;
imgTensor = imgTensor.to(device_type); // Ensure the tensor is on the selected device.

inputs.push_back(std::move(imgTensor));
try {
// Enable inference mode for efficiency.
torch::InferenceMode guard(true);
// Forward pass: run the model with the input tensor.
torch::jit::IValue output = network.forward(inputs);
// Extract predictions.
auto preds = output.toTuple()->elements()[0].toTensor();

// Model-specific adjustments (e.g., YOLOv8 requires transposing).
if (_modelName == "yolov8") {
preds = preds.transpose(1, 2).contiguous();
}
// Extract segmentation predictions if present.
seg_pred = output.toTuple()->elements()[1].toTensor();

// Apply non-maximum suppression to filter out overlapping detections.
non_max_suppression(preds, _score_thresh, _iou_thresh);

// Return the number of detections (0 if nothing passed the thresholds).
if (dets_vec.size() == 0) {
return 0;
}
int return_size = dets_vec[0].sizes()[0];
return return_size;

} catch (const c10::Error& e) {
std::cerr << e.what() << std::endl;
return -1; // Return error on exception.
}
}

Additionally, we define the PerformFrameInference function, which, unlike PerformImagePathInference, performs inference directly on video frames or camera footage. Instead of a char* imgPath, it takes uchar* inputData, so the address of an image frame held in C# can be passed directly to PerformFrameInference.

// Performs inference on image data provided in memory, useful for video or webcam streams.
__declspec(dllexport) int PerformFrameInference(uchar* inputData,
int net_height,
int net_width) {
real_net_height = net_height; // Global network input height.
real_net_width = net_width; // Global network input width.

// Create an OpenCV Mat from the raw input data.
cv::Mat input_img2 = cv::Mat(net_height, net_width, CV_8UC3, inputData);
// Convert BGR to RGB.
cv::cvtColor(input_img2, input_img2, cv::COLOR_BGR2RGB);
// Normalize the image by converting its pixel values to float and scaling down by 255.
input_img2.convertTo(input_img2, CV_32FC3, 1.0f / 255.0f);
// Convert the OpenCV Mat to a Torch tensor.
torch::Tensor imgTensor =
torch::from_blob(input_img2.data, {net_height, net_width, 3})
.to(device_type);
// Adjust tensor dimensions to match model's input [C, H, W] format.
imgTensor = imgTensor.permute({2, 0, 1}).contiguous();
// Add a batch dimension.
imgTensor = imgTensor.unsqueeze(0);
imgTensor = imgTensor.to(device_type); // Ensure the tensor is on the selected device.

// Prepare for model input.
std::vector<torch::jit::IValue> inputs;
inputs.emplace_back(std::move(imgTensor));

try {
// Enable inference mode.
torch::InferenceMode guard(true);
// Forward pass with the provided data.
torch::jit::IValue output = network.forward(inputs);

// Process the model's output (similar to the previous function).
auto preds = output.toTuple()->elements()[0].toTensor();
if (_modelName == "yolov8") {
preds = preds.transpose(1, 2).contiguous();
}
seg_pred = output.toTuple()->elements()[1].toTensor();
non_max_suppression(preds, _score_thresh, _iou_thresh);

// Check if there are any detections.
if (dets_vec.size() == 0) {
return 0; // No detections found.
} else {
// Return the number of detections.
torch::Tensor det = dets_vec[0];
int size = det.sizes()[0];
return size; // Return the number of detections.
}

} catch (const c10::Error& e) {
std::cerr << e.what() << std::endl;
return -1; // Return error on exception.
}
}

Next, we define the PopulateYoloObjectsArray function to organize the bounding boxes and segmentation results inferred by the YOLOv8 segmentation network.

The YoloObject* objects parameter is a C# array that is filled in C++ and then used back in C#. org_height and org_width are the original image's height and width; because the network input used in this tutorial is 640x640, they are needed to scale the detections back to the original image size.

In the PopulateYoloObjectsArray function, there are several potential bottlenecks when many objects are detected, most notably the per-pixel colorization loop. We will skip optimizing them since this is a tutorial (one possible improvement is sketched after the function below). If you have any suggestions for improving these bottlenecks, I would appreciate it if you could leave a comment.

// Exports the function for DLL use, intended to organize detected objects and segmentation results.
__declspec(dllexport) void PopulateYoloObjectsArray(YoloObject* objects,
int org_height,
int org_width) {
// Early return if no detections were made.
if (dets_vec.size() == 0) {
return;
}
// Access the first tensor in the detections vector.
torch::Tensor det = dets_vec[0];
// Get the number of detections.
int size = det.sizes()[0];

// Initialize an empty segmentation map with the original image dimensions.
total_seg_map = cv::Mat(org_height, org_width, CV_8UC3, cv::Scalar(0, 0, 0));

// Iterate over each detection.
for (int i = 0; i < size; i++) {
// Scale bounding box coordinates from the network size to the original image size.
float left = det[i][0].item().toFloat() * org_width / real_net_width;
left = std::max(0.0f, left); // Ensure left is within image bounds.
float top = det[i][1].item().toFloat() * org_height / real_net_height;
top = std::max(top, 0.0f); // Ensure top is within image bounds.
float right = det[i][2].item().toFloat() * org_width / real_net_width;
right = std::min(right, (float)(org_width - 1)); // Ensure right does not exceed image width.
float bottom = det[i][3].item().toFloat() * org_height / real_net_height;
bottom = std::min(bottom, (float)(org_height - 1)); // Ensure bottom does not exceed image height.
float score = det[i][4].item().toFloat(); // Get the detection confidence score.

// Assign detection properties to the objects array.
objects[i].left = left;
objects[i].top = top;
objects[i].right = right;
objects[i].bottom = bottom;
objects[i].score = score;

int classID; // Variable to store class ID
torch::Tensor seg_rois; // Tensor to hold segmentation regions of interest.

// Check if the model is yolov8.
if (_modelName == "yolov8") {
classID = predLoc[i].item().toInt(); // Extract class ID.
seg_rois = det[i].slice(0, 5, det[i].sizes()[0]); // Extract segmentation ROI.
objects[i].classID = classID;
} else {
throw std::runtime_error("Model name is not valid");
}

// Prepare segmentation mask.
seg_rois = seg_rois.view({1, 32});
seg_pred = seg_pred.to(torch::kCPU);
seg_pred = seg_pred.view({1, 32, -1});
auto final_seg = torch::matmul(seg_rois, seg_pred).view({1, 160, 160});
final_seg = final_seg.sigmoid(); // Apply sigmoid to get mask probabilities.
final_seg =
((final_seg > _seg_thresh) * 255).clamp(0, 255).to(torch::kCPU).to(torch::kU8);
// Convert probabilities to binary mask.
cv::Mat seg_map(160, 160, CV_8UC1, final_seg.data_ptr()); // Wrap the 160x160 binary mask in an OpenCV Mat (resized to the original image size below).
cv::Mat seg_map2;
cv::resize(seg_map, seg_map2, cv::Size(org_width, org_height),
cv::INTER_NEAREST);
cv::Mat seg_map_color;
cv::cvtColor(seg_map2, seg_map_color, cv::COLOR_GRAY2BGR); // Convert grayscale to BGR.

// Colorize the segmentation map.
for (int y = 0; y < seg_map_color.rows; y++) {
for (int x = 0; x < seg_map_color.cols; x++) {
if (seg_map_color.at<cv::Vec3b>(y, x)[0] > 0) {
seg_map_color.at<cv::Vec3b>(y, x) = colors[classID]; // Apply class-specific color.
} else
seg_map_color.at<cv::Vec3b>(y, x) = cv::Vec3b(0, 0, 0); // Set background to black.
}
}
// Combine current object's segmentation with the total segmentation map.
cv::bitwise_or(total_seg_map, seg_map_color, total_seg_map);

// Optional: Save segmentation map for debugging.
/*std::string path =
"seg_map_" + std::to_string(i) + ".png";
cv::imwrite(path, total_seg_map);*/
}
// Segmap is shared among all objects
for (int i = 0; i < size; i++) {
objects[i].seg_map = total_seg_map.data;
}
}
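As a possible answer to the colorization bottleneck mentioned above, the nested per-pixel loop can usually be replaced by a masked cv::Mat::setTo call. This is only a sketch, not part of the original tutorial code; it assumes the same local variables used inside PopulateYoloObjectsArray (seg_map2 as the resized CV_8UC1 binary mask, classID, colors, org_height, org_width, total_seg_map).

// Sketch of a faster colorization, replacing the cvtColor + nested y/x loop above.
cv::Mat seg_map_color(org_height, org_width, CV_8UC3, cv::Scalar(0, 0, 0));
cv::Scalar color(colors[classID][0], colors[classID][1], colors[classID][2]);
seg_map_color.setTo(color, seg_map2); // color only where the mask is non-zero
cv::bitwise_or(total_seg_map, seg_map_color, total_seg_map);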

Finally, we’ll define the FreeAllocatedMemory function to reclaim resources.

// Exports the function for DLL, designed to free up resources used during inference.
__declspec(dllexport) void FreeAllocatedMemory() {
dets_vec.clear(); // Clear the detections vector to free up memory.

// If the model was loaded onto a CUDA (GPU) device, move it back to CPU to free GPU resources.
if (device_type == torch::kCUDA) {
network.to(torch::kCPU);
}

// Reset the network variable, effectively releasing the loaded model from memory.
network = torch::jit::Module();

// Release the OpenCV matrix holding the segmentation map, freeing its memory.
total_seg_map.release();
}

Now, everything is set. You can proceed to build.

The build results will be as follows:

A dll file corresponding to YoloV8DLLProject will be created in the YoloV8DLLProject\YoloV8DLLProject\x64\Release folder.

Additionally, the DLLs required at runtime (from CUDA, Libtorch, and OpenCV) will be copied to YoloV8DLLProject\YoloV8DLLProject\x64\dll by the Post-Build Event.

All files in both directories will be copied to the C# project folder.

First, create a .Net Framework project. In this tutorial, we are using .Net Framework 4.7.2. The project name is set to YoloCSharpInference.

Then, install OpenCvSharp4 (by shimat) from Tools -> NuGet Package Manager -> Manage NuGet Packages for Solution.

Next, download OpenCvSharp-4.9.0-20240105.zip from https://github.com/shimat/opencvsharp/releases, and copy the OpenCvSharpExtern.dll file from NativeLib/win/x64 to the current C# project folder.

Also, copy all files from YoloV8DLLProject\YoloV8DLLProject\x64\dll and all generated files from YoloV8DLLProject\YoloV8DLLProject\x64\Release to the current C# project folder.

Then, check Allow Unsafe Code for pointer operations. Select Release in Configuration, and specify the Platform as x64.

If x64 is not available, open Build -> Configuration Manager and create an x64 platform via <New…>.

Then verify these settings under the project's Properties -> Build.

Lastly, also copy yolov8s-seg.torchscript to the current C# project folder.

Then, add all the copied files to the C# project by using Add -> Existing Item.

After building, for smooth program execution, select all the copied files, right-click, and set Copy to Output Directory to Copy Always. This completes the setup for using the DLL in C#.

Now, we will define the DLL load methods and C# methods for using the YOLOv8 Segmentation model, created as a C++ DLL, in C#.

First, import the necessary modules.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.IO;
using OpenCvSharp;
using OpenCvSharp.Internal;
using static System.Net.Mime.MediaTypeNames;

Then, define the Namespace and Class. For convenience in this tutorial, all methods will be defined in the class containing the Main method.

The namespace is YoloCSharpInference, and the class name is Program.

namespace YoloCSharpInference
{
internal class Program
{
// The entry point of the application.
static void Main(string[] args)
{
// Currently empty. This is where the program starts execution.
}
}
}

We define various variables in C# for using YOLOv8 Segmentation. It's important to note that the YoloObject structure is declared with [StructLayout(LayoutKind.Sequential)] to guarantee an explicit memory layout for interoperation with unmanaged code, which preserves binary compatibility and avoids memory corruption.

// Path to the model file.
public static string modelPath = "yolov8s-seg.torchscript";

// Dimensions of the input expected by the network.
public static int net_height = 640;
public static int net_width = 640;

// Dimensions of the original image.
private static int orgHeight;
private static int orgWidth;

// Enum to represent the computation device type.
enum DeviceType
{
CPU = 0,
CUDA = 1
}

// Device number to be used for inference.
private static int deviceNum = (int)DeviceType.CUDA;

// Thresholds for object detection and segmentation.
private static float score_thresh = 0.4f;
private static float iou_thresh = 0.5f;
private static float seg_thresh = 0.5f;

// Number of objects detected.
private static int numObjects;

// Array to hold detected objects.
private static YoloObject[] YoloObjectArray;

// Name of the DLL containing the inference code.
private const string dll = "YoloV8DLLProject";

// Reasons of using [StructLayout(LayoutKind.Sequential)]
// 1. Defines Explicit Memory Layout
// 2. Interop with Unmanaged Code
// 3. Binary Compatibility
// 4. Avoid Memory Corruption
[StructLayout(LayoutKind.Sequential)]
public struct YoloObject
{
// Bounding box coordinates, score, class ID, and pointer to the segmentation map.
public float left;
public float top;
public float right;
public float bottom;
public float score;
public float classID;
public IntPtr seg_map;

// Constructor for the YoloObject struct.
public YoloObject(float left, float top, float right, float bottom, float score, float classID, IntPtr seg_map)
{
this.left = left;
this.top = top;
this.right = right;
this.bottom = bottom;
this.score = score;
this.classID = classID;
this.seg_map = seg_map;
}
};
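If you want an extra safeguard that the managed and unmanaged layouts agree, a small compile-time check can be added on the C++ side. This is optional and not part of the original code; on x64, six 4-byte floats plus one 8-byte pointer pad out to 32 bytes.

// Optional compile-time check in dllmain.cpp: if the C++ YoloObject layout
// ever changes, this fails the build and reminds you to update the C# struct.
static_assert(sizeof(YoloObject) == 32,
              "YoloObject layout changed; update the C# YoloObject struct to match");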

We define methods for loading the DLL file created in C++.

// Import SetDevice function from the DLL to set the computation device (CPU or GPU) for inference.
[DllImport(dll, CallingConvention = CallingConvention.Cdecl)]
private static extern void SetDevice(int deviceNum);

// Import SetThreshold to adjust detection and segmentation sensitivity.
[DllImport(dll, CallingConvention = CallingConvention.Cdecl)]
private static extern void SetThreshold(float score_thresh, float iou_thresh, float seg_thresh);

// Use LoadModel to load the model for object detection.
[DllImport(dll, CallingConvention = CallingConvention.Cdecl)]
private static extern int LoadModel(string modelPath, int deviceNum);

// PerformImagePathInference runs detection on a single image file.
[DllImport(dll, CallingConvention = CallingConvention.Cdecl)]
private static extern int PerformImagePathInference(string inputData, int net_height, int net_width);

// PerformFrameInference handles detection on video or webcam streams.
[DllImport(dll, CallingConvention = CallingConvention.Cdecl)]
private static extern int PerformFrameInference(IntPtr inputData, int net_height, int net_width);

// PopulateYoloObjectsArray copies detection results into a C# array.
[DllImport(dll, CallingConvention = CallingConvention.Cdecl)]
private static extern void PopulateYoloObjectsArray(IntPtr objects, int org_height, int org_width);

// FreeResources clears memory used during detection to optimize performance.
[DllImport(dll, CallingConvention = CallingConvention.Cdecl)]
private static extern void FreeAllocatedMemory();

// AllocConsole opens a new console window for debugging output.
[DllImport("kernel32.dll", SetLastError = true)]
private static extern bool AllocConsole();

We now write a wrapper method around the PerformFrameInference function exported by the C++ DLL.

On the C++ side, PerformFrameInference expects a uchar*; in C#, we pass that address as (IntPtr)image.DataPointer.

// Calls the PerformFrameInference function 
// passing the current frame's data pointer and
// dimensions, and returns the number of detected objects.
static unsafe int ReturnFramePerformance(Mat image, int net_height, int net_width)
{
numObjects = PerformFrameInference((IntPtr)image.DataPointer, net_height, net_width);
return numObjects;
}

We define the RunPopulateYoloObjectsArray function.

After YOLO inference is performed, the resulting bounding box information and segmentation map data are stored in an array of YoloObject. We use the fixed keyword to pin the array so its memory is not moved by the garbage collector while the native code writes to it.

// Allocates an array for detected YOLO objects and populates it
// by calling PopulateObjectsArray from the DLL.
// The fixed keyword is used to pin the YoloObjectArray in memory,
// providing an unchanging pointer to PopulateObjectsArray.
static unsafe void RunPopulateYoloObjectsArray(int numProPosals, int org_height, int org_width)
{
// Initialize the array with the number of detections.
YoloObjectArray = new YoloObject[numProPosals];

// Pin the YoloObjectArray in memory.
fixed (YoloObject* o = YoloObjectArray)
{
// Populate the array with detection data.
PopulateYoloObjectsArray((IntPtr)o, org_height, org_width);
}
}

We define the tryFrameInference function to perform segmentation on frames from a video or webcam. If videoPath points to an existing file, frames are read from the video and passed through YOLOv8; if videoPath is passed as "", frames are captured from the webcam instead.

// A function to perform frame-by-frame inference on video data, 
// showing the results in a window.
static void tryFrameInference(string videoPath)
{
VideoCapture capture; // OpenCV video capture object.

// Initialize the VideoCapture object with a file path or a webcam index.
if (!string.IsNullOrEmpty(videoPath) && System.IO.File.Exists(videoPath))
{
capture = new VideoCapture(videoPath); // Load video from path.
}
else
{
capture = new VideoCapture(0); // Default to the first webcam.
}
if (!capture.IsOpened()) // Check if the video source was successfully opened.
{
Console.WriteLine("Failed to open video source.");
return;
}

// Initial setup for performing inference, including setting the inference device,
// loading the model, and configuring detection thresholds.
SetDevice(deviceNum);
Console.WriteLine("Device Set _ " + " Device info : " + deviceNum);

// Attempt to load the model for inference. If unsuccessful, exit the function.
int loadModelVal = LoadModel(modelPath, deviceNum);
if (loadModelVal == -1)
{
Console.WriteLine("Model not loaded");
return;
}
Console.WriteLine("Model Loaded ? : " + loadModelVal);
// Set the thresholds for detection confidence,
// IoU (Intersection over Union), and segmentation mask confidence.
SetThreshold(0.3f, 0.3f, 0.3f);
Console.WriteLine("Threshold setting fishined .. ");

{
if (!capture.IsOpened())
{
Console.WriteLine("Camera not found");
return;
}
// Begin capturing and processing video frames.
// Create a window for displaying the video frames.
var window = new Window("capture");
// Initialize a Mat object to hold individual frames from the video.
var frame = new Mat();
// Initialize a Mat object for the processed frame.
var img = new Mat();
// Initialize a Mat object for holding segmentation maps.
var segRegion = new Mat();
// Stopwatch for measuring frame processing time.
Stopwatch stopwatch = new Stopwatch();
while (true)
{
// Read the next frame from the video capture device.
capture.Read(frame);
// Restart the stopwatch
stopwatch.Restart();
// Check if the captured frame is empty (end of video or error).
if (frame.Empty())
{
Console.WriteLine("Blank frame grabbed");
break; // Exit the loop if a blank frame is encountered.
}
// Resize the captured frame to match the input size expected by the network model.
Cv2.Resize(frame, img, new Size(net_width, net_height));
// Store the original dimensions of the frame for later use.
orgWidth = frame.Width;
orgHeight = frame.Height;
// Perform inference on the resized frame and obtain the number of detected objects.
numObjects = ReturnFramePerformance(img, net_height, net_width);
// Check the inference result and skip processing if no objects were detected or if an error occurred.
if (numObjects == 0)
{
// Skip the rest of the loop iteration and process the next frame.
continue;
}
else if (numObjects == -1)
{
Console.WriteLine("Error in inference");
continue;
}

// Populate the YoloObjectArray with detection data for further processing.
RunPopulateYoloObjectsArray(numObjects, orgHeight, orgWidth);

// Blend detected objects' segmentation maps onto the original frame for visualization.
double alpha = 0.8;
double beta = 0.2;
double gamma = 0.0;
Rect boxRect;
var obj2 = YoloObjectArray[0];
segRegion = new Mat(frame.Rows, frame.Cols, MatType.CV_8UC3, (IntPtr)obj2.seg_map);

// Iterate over each detected object to draw bounding boxes and blend segmentation maps.
for (int i = 0; i < YoloObjectArray.Length; i++)
{
var obj = YoloObjectArray[i];
// Draw a rectangle around the detected object.
Cv2.Rectangle(frame, new Point((int)(obj.left), (int)(obj.top)), new Point((int)(obj.right), (int)(obj.bottom)), Scalar.Red, 2);
// Define the region of the frame corresponding to the current object's bounding box.
boxRect = new Rect((int)(obj.left), (int)(obj.top), (int)(obj.right - obj.left), (int)(obj.bottom - obj.top));
//if (boxRect.Left < 0 || boxRect.Top < 0 || boxRect.Right > frame.Width || boxRect.Bottom > frame.Height)
//{
// Console.WriteLine("BoxRect is out of image bounds");
// return;
//}
//if (boxRect.Width <= 0 || boxRect.Height <= 0)
//{
// Console.WriteLine("BoxRect has invalid dimensions");
// return;
//}
// Blend the segmentation map for the detected object with the corresponding region of the frame.
Cv2.AddWeighted(frame[boxRect], alpha, segRegion[boxRect], beta, gamma, frame[boxRect]);

}
// Measure and display the frame processing time (FPS).
stopwatch.Stop(); // Stop the stopwatch.
// Calculate frames per second (FPS).
double fps = 1000.0 / stopwatch.ElapsedMilliseconds;
// Display the FPS on the frame.
Cv2.PutText(frame, $"FPS: {fps:0.0}", new Point(10, 30), HersheyFonts.HersheySimplex, 1, Scalar.Green, 2);
window.ShowImage(frame); // Show the processed frame in the window.
// Exit the loop if a key is pressed.
int key = Cv2.WaitKey(1);
if (key >= 0)
{

// Clean up and release resources after exiting the loop.
frame.Dispose();
segRegion.Dispose();
img.Dispose();
window.Dispose();
window.Close();
break;
}
}
}
capture.Dispose();
FreeAllocatedMemory(); // Free resources allocated by the DLL.
Console.WriteLine("Resources Freed");
}

And we define tryImageInference to perform inference on a single image.

static void tryImageInference()
{
// Initial setup for performing inference,
// including setting the inference device,
// loading the model, and configuring detection thresholds.
SetDevice(deviceNum);
Console.WriteLine("Device Set _ " + " Device info : " + deviceNum);

// Attempt to load the model for inference. If unsuccessful, exit the function.
int loadModelVal = LoadModel(modelPath, deviceNum);
if (loadModelVal == -1)
{
Console.WriteLine("Model not loaded");
return;
}
Console.WriteLine("Model Loaded ? : " + loadModelVal);
string img_path = "image.jpg";
// Read an Image
Mat image = Cv2.ImRead(img_path);
if (image.Empty())
{
Console.WriteLine("Image not found"); // Image reading occurs an error, then Exit.
return;
}
// Store the original dimensions of the frame for later use.
orgWidth = image.Width;
orgHeight = image.Height;

// Set the thresholds for detection confidence,
// IoU (Intersection over Union), and segmentation mask confidence.
SetThreshold(0.3f, 0.3f, 0.3f);
Console.WriteLine("Threshold setting fishined .. ");
// Perform inference on the image
numObjects = PerformImagePathInference(img_path, net_height, net_width);

// Check the inference result and skip processing if no objects were detected or if an error occurred.
if (numObjects == 0)
{
// Skip the rest of the loop iteration
Console.WriteLine("No objects detected");
return;
}
else if (numObjects == -1)
{
Console.WriteLine("Error in inference");
return;
}
Console.WriteLine("PerformInference Implemented..");
Console.WriteLine("numObjects : " + numObjects);

// Populate the YoloObjectArray with detection data for further processing.
RunPopulateYoloObjectsArray(numObjects, orgHeight, orgWidth);
Console.WriteLine("RunPopulateObjectsArray Implemented..");

Console.WriteLine($"Num Objects : {numObjects}");

// Blend detected objects' segmentation maps onto the original frame for visualization.
double alpha = 0.8;
double beta = 0.2;
double gamma = 0.0;
Rect boxRect;
var obj2 = YoloObjectArray[0];
// get seg_map's memory
IntPtr ptr = (IntPtr)obj2.seg_map;
int rows = image.Rows;
int cols = image.Cols;
int type = MatType.CV_8UC3;
var segRegion = new Mat(rows, cols, type, ptr);

// Iterate over each detected object to draw bounding boxes and blend segmentation maps.
for (int i = 0; i < YoloObjectArray.Length; i++)
{
var obj = YoloObjectArray[i];
// Draw a rectangle around the detected object.
Cv2.Rectangle(image, new Point((int)(obj.left), (int)(obj.top)), new Point((int)(obj.right), (int)(obj.bottom)), Scalar.Red, 2);
// Define the region of the frame corresponding to the current object's bounding box.
boxRect = new Rect((int)(obj.left), (int)(obj.top), (int)(obj.right - obj.left), (int)(obj.bottom - obj.top));

// Check if the bounding box is within the bounds of the original image.
if (boxRect.Left < 0 || boxRect.Top < 0 || boxRect.Right > image.Width || boxRect.Bottom > image.Height)
{
// Exit the function if the bounding box is out of bounds, preventing errors.
Console.WriteLine("BoxRect is out of image bounds");
return;
}
// Validate the dimensions of the bounding box to ensure they are positive.
if (boxRect.Width <= 0 || boxRect.Height <= 0)
{
// Exit the function if the bounding box has invalid dimensions.
Console.WriteLine("BoxRect has invalid dimensions");
return;
}
// Print details of the bounding box and the image for debugging or informational purposes.
Console.WriteLine(boxRect.Left + " " + boxRect.Right + " " + boxRect.TopLeft + " " + " BoxRect : " + boxRect.Size.Height + " " + boxRect.Size.Width + " Image " + image.Height + " " + image.Width + " segRegion : " + segRegion.Height + " " + segRegion.Width);

// Blend the segmentation map for the detected object with the corresponding region of the frame.
Cv2.AddWeighted(image[boxRect], alpha, segRegion[boxRect], beta, gamma, image[boxRect]);

}
// Resize the processed image to a quarter of its original size for display.
// This is hard-coded and may be adjusted depending on display or performance requirements.
var imsize = new Size(image.Width / 4, image.Height / 4); // hard coding
Cv2.Resize(image, image, imsize);

// Display the processed image in a window named "image".
Cv2.ImShow("image", image);

// Wait indefinitely for a key press before proceeding.
// This allows users to view the processed image.
int keyPressed = Cv2.WaitKey(0); // This will wait indefinitely for a key press

// Destroy all OpenCV windows created during the execution to free resources.
Cv2.DestroyAllWindows();

// Dispose of the Mat objects holding the original and segmentation images to free memory.
image.Dispose();
segRegion.Dispose();

// Call a function to free any additional resources used during processing,
// such as loaded models or temporary data.
FreeAllocatedMemory();
Console.WriteLine("Resources Freed");
}

Lastly, in the Main method, we define code to perform inference on a single image, code to perform inference on video frames, and code to perform inference using a webcam.

// Entry point of the console application.
static void Main(string[] args)
{
// Begin inference on a single image.
Console.WriteLine("Single Image Inference");
tryImageInference(); // Processes a single image through the model.

// Start processing a video file for inference frame by frame.
Console.WriteLine("Video Frame Inference");
tryFrameInference("video.mp4"); // Applies model to each frame of "video.mp4".

// Switch to real-time inference using webcam footage.
// A Webcam should be prepared beforehand.
Console.WriteLine("Webcam Inference");
// Uses webcam stream for model inference, interpreting "" to select default camera.
tryFrameInference("");
}

The results of the video inference are as follows.

Integrating YOLOv8 Segmentation into your C# projects using C++ DLLs is a game-changer for developers looking to leverage the power of deep learning in their applications. By following this comprehensive guide, you’ll be well-equipped to tackle complex computer vision challenges and create innovative solutions that push the boundaries of what’s possible with YOLOv8 Segmentation.

Don’t miss out on the opportunity to enhance your C# projects with the precision and efficiency of YOLOv8 Segmentation. Start your journey today and unlock the full potential of this powerful deep learning model!

Keywords: YOLOv8 Segmentation, C#, C++, DLL, Libtorch, CUDA, OpenCV, Deep Learning, Object Detection, Image Segmentation, Computer Vision, Unreal Engine, Unity

You can refer to the source code at:

https://github.com/kimwoonggon/Cpp_Libtorch_DLL_YoloV8Segmentation_CSharpProject

Related Links
https://www.intel.com/content/www/us/en/developer/articles/technical/developing-openvino-object-detection-unity-setup.html
https://www.intel.com/content/www/us/en/developer/articles/training/in-game-style-transfer-leveraging-unity-part-3.html
