Class Documentation and C++ Reference

This section provides a breakdown of the C++ classes and what each of their functions provides. It is partially generated from, and augmented with, the Doxygen autodoc content. You can also go directly to the raw Doxygen docs.

Below is a diagram that shows the relationship between Vulkan Kompute objects and Vulkan resources, which primarily concerns ownership of CPU and/or GPU memory.

../_images/kompute-vulkan-architecture.jpg

Manager

The Kompute Manager provides a high level interface to simplify interaction with underlying kp::Sequences of kp::Operations.

../_images/kompute-vulkan-architecture-manager.jpg
class kp::Manager

Base orchestrator which creates and manages device and child components

Public Functions

Manager()

Default constructor, which creates the base resources and selects physical device 0.

Manager(uint32_t physicalDeviceIndex, const std::vector<uint32_t> &familyQueueIndices = {})

Similar to base constructor but allows the user to provide the device they would like to create the resources on.

Parameters
  • physicalDeviceIndex: The index of the physical device to use

  • familyQueueIndices: (Optional) List of queue indices to add for explicit allocation

Manager(std::shared_ptr<vk::Instance> instance, std::shared_ptr<vk::PhysicalDevice> physicalDevice, std::shared_ptr<vk::Device> device, uint32_t physicalDeviceIndex)

Manager constructor which allows your own Vulkan application to integrate with Vulkan Kompute by providing its existing Vulkan components.

Parameters
  • instance: Vulkan instance on which to base this application

  • physicalDevice: Vulkan physical device to use for application

  • device: Vulkan logical device to use for all base resources

  • physicalDeviceIndex: Index for vulkan physical device used

~Manager()

Manager destructor, which ensures all owned resources are destroyed unless it has been explicitly stated that they should not be destroyed or freed.

std::shared_ptr<Sequence> sequence(std::string sequenceName = KP_DEFAULT_SESSION, uint32_t queueIndex = 0)

Get or create a managed Sequence that will be contained by this manager. If the named sequence does not currently exist, it is created and initialised.

Return

Shared pointer to the manager owned sequence resource

Parameters
  • sequenceName: The name for the named sequence to be retrieved or created

  • queueIndex: The queue to use from the available queues
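
The get-or-create behaviour described above can be illustrated with a minimal sketch. It does not use Kompute itself: FakeSequence, FakeManager, managedCount and the "DEFAULT" name are hypothetical stand-ins, and ignoring queueIndex for an existing sequence is an assumption of the sketch rather than documented behaviour.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for kp::Sequence; it only remembers its queue index.
struct FakeSequence {
    std::uint32_t queueIndex;
};

// Hypothetical stand-in for the part of kp::Manager that owns named sequences.
class FakeManager {
public:
    // Returns the sequence registered under `name`, creating it on first use.
    // "DEFAULT" is a placeholder, not the real value of KP_DEFAULT_SESSION.
    std::shared_ptr<FakeSequence> sequence(const std::string& name = "DEFAULT",
                                           std::uint32_t queueIndex = 0) {
        auto found = sequences_.find(name);
        if (found != sequences_.end()) {
            // Assumption: an existing sequence keeps its original queue index.
            return found->second;
        }
        auto created = std::make_shared<FakeSequence>(FakeSequence{queueIndex});
        sequences_.emplace(name, created);
        return created;
    }

    // Number of sequences currently owned by this manager model.
    std::size_t managedCount() const { return sequences_.size(); }

private:
    std::unordered_map<std::string, std::shared_ptr<FakeSequence>> sequences_;
};
```

In this model, asking twice for the same name returns the same shared pointer, so callers can look a sequence up by name instead of keeping their own handle.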

template<typename T, typename ...TArgs>
void evalOp(std::vector<std::shared_ptr<Tensor>> tensors, std::string sequenceName, TArgs&&... params)

Function that evaluates an operation against a named sequence.

Parameters
  • tensors: The tensors to be used in the operation recorded

  • sequenceName: The name of the sequence to be retrieved or created

  • params: Template parameters that will be used to initialise Operation to allow for extensible configurations on initialisation

template<typename T, typename ...TArgs>
void evalOpDefault(std::vector<std::shared_ptr<Tensor>> tensors, TArgs&&... params)

Function that evaluates an operation against a newly created sequence.

Parameters
  • tensors: The tensors to be used in the operation recorded

  • params: Template parameters that will be used to initialise Operation to allow for extensible configurations on initialisation

template<typename T, typename ...TArgs>
void evalOpAsync(std::vector<std::shared_ptr<Tensor>> tensors, std::string sequenceName, TArgs&&... params)

Function that evaluates an operation against a named sequence asynchronously.

Parameters
  • tensors: The tensors to be used in the operation recorded

  • sequenceName: The name of the sequence to be retrieved or created

  • params: Template parameters that will be used to initialise Operation to allow for extensible configurations on initialisation

template<typename T, typename ...TArgs>
void evalOpAsyncDefault(std::vector<std::shared_ptr<Tensor>> tensors, TArgs&&... params)

Function that evaluates an operation against the default sequence asynchronously.

Parameters
  • tensors: The tensors to be used in the operation recorded

  • params: Template parameters that will be used to initialise Operation to allow for extensible configurations on initialisation

void evalOpAwait(std::string sequenceName, uint64_t waitFor = UINT64_MAX)

Function that waits for the named sequence to finish.

Parameters
  • sequenceName: The name of the sequence to wait for termination

  • waitFor: The amount of time to wait before timing out

void evalOpAwaitDefault(uint64_t waitFor = UINT64_MAX)

Function that waits for the default sequence to finish.

Parameters
  • waitFor: The amount of time to wait before timing out

std::shared_ptr<Tensor> tensor(const std::vector<float> &data, Tensor::TensorTypes tensorType = Tensor::TensorTypes::eDevice, bool syncDataToGPU = true)

Function that simplifies the common workflow of tensor creation and initialization. It takes the constructor parameters for a Tensor, uses them to create a new Tensor and then initialises it. The tensor memory is then managed and owned by the manager.

Return

Initialized Tensor with memory synced to the GPU device

Parameters
  • data: The data to initialize the tensor with

  • tensorType: The type of tensor to initialize

  • syncDataToGPU: Whether to sync the data to GPU memory

void rebuild(std::vector<std::shared_ptr<kp::Tensor>> tensors, bool syncDataToGPU = true)

Function that simplifies the common workflow of tensor initialisation. It takes the tensors provided and (re)initialises them. The tensor memory is then managed and owned by the manager.

Parameters
  • tensors: Array of tensors to rebuild

  • syncDataToGPU: Whether to sync the data to GPU memory

void rebuild(std::shared_ptr<kp::Tensor> tensor, bool syncDataToGPU = true)

Function that simplifies the common workflow of tensor initialisation. It takes the tensor provided and (re)initialises it. The tensor memory is then managed and owned by the manager.

Parameters
  • tensor: Single tensor to rebuild

  • syncDataToGPU: Whether to sync the data to GPU memory

void destroy(std::shared_ptr<kp::Tensor> tensor)

Destroy owned Vulkan GPU resources and free GPU memory for single tensor.

Parameters
  • tensor: Single tensor to destroy

void destroy(std::vector<std::shared_ptr<kp::Tensor>> tensors)

Destroy owned Vulkan GPU resources and free GPU memory for vector of tensors.

Parameters
  • tensors: Vector of tensors to destroy

void destroy(std::vector<std::shared_ptr<kp::Sequence>> sequences)

Destroy owned Vulkan GPU resources and free GPU memory for a vector of sequences. Destroying by sequence name is more efficient and hence recommended over destroying by object.

Parameters
  • sequences: Vector of shared pointers to the sequences to destroy

void destroy(std::shared_ptr<kp::Sequence> sequence)

Destroy owned Vulkan GPU resources and free GPU memory for a single sequence. Destroying by sequence name is more efficient and hence recommended over destroying by object.

Parameters
  • sequence: Single sequence to destroy

void destroy(const std::string &sequenceName)

Destroy owned Vulkan GPU resources and free GPU memory for sequence by name.

Parameters
  • sequenceName: Single name of named sequence to destroy

void destroy(const std::vector<std::string> &sequenceNames)

Destroy owned Vulkan GPU resources and free GPU memory for sequences using vector of named sequence names.

Parameters
  • sequenceNames: Vector of sequence names to destroy
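
One plausible reason destroying by name is cheaper than destroying by object is that a manager keyed by name can remove an entry with a single lookup, whereas finding an object means scanning the stored values. The sketch below illustrates that difference with standard containers only. Seq, SeqMap, destroyByName and destroyByObject are hypothetical names, and the use of a name-keyed map is an assumption, not a statement about how kp::Manager is implemented.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

struct Seq {}; // hypothetical placeholder for a sequence object

using SeqMap = std::unordered_map<std::string, std::shared_ptr<Seq>>;

// Destroy by name: a single keyed lookup.
inline bool destroyByName(SeqMap& managed, const std::string& name) {
    return managed.erase(name) > 0;
}

// Destroy by object: entries are inspected one by one until the pointer matches.
inline bool destroyByObject(SeqMap& managed, const std::shared_ptr<Seq>& target) {
    for (auto it = managed.begin(); it != managed.end(); ++it) {
        if (it->second == target) {
            managed.erase(it);
            return true;
        }
    }
    return false;
}
```

Both helpers report whether anything was removed, so a caller can detect a name or object that the map does not own.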

Sequence

The Kompute Sequence consists of batches of kp::Operations, which are executed on a respective GPU queue. The execution of sequences can be synchronous or asynchronous, and it can be coordinated through its respective vk::Fence.

../_images/kompute-vulkan-architecture-sequence.jpg
class kp::Sequence

Container of operations that can be sent to GPU as batch

Public Functions

Sequence()

Base constructor for Sequence. Should not be used unless explicitly intended.

Sequence(std::shared_ptr<vk::PhysicalDevice> physicalDevice, std::shared_ptr<vk::Device> device, std::shared_ptr<vk::Queue> computeQueue, uint32_t queueIndex)

Main constructor for sequence which requires core vulkan components to generate all dependent resources.

Parameters
  • physicalDevice: Vulkan physical device

  • device: Vulkan logical device

  • computeQueue: Vulkan compute queue

  • queueIndex: Vulkan compute queue index in device

~Sequence()

Destructor for sequence, which is responsible for cleaning up all owned operations.

void init()

Initialises sequence including the creation of the command pool and the command buffer.

bool begin()

Begins recording commands to be submitted into the command buffer.

Return

Boolean stating whether execution was successful.

bool end()

Ends the recording and stops recording commands when the record command is sent.

Return

Boolean stating whether execution was successful.

bool eval()

Eval sends all the recorded and stored operations in the vector of operations to the GPU as a submit job with a barrier.

Return

Boolean stating whether execution was successful.

bool evalAsync()

Eval Async sends all the recorded and stored operations in the vector of operations to the GPU as a submit job with a barrier. evalAwait() must be called afterwards to ensure the sequence is terminated correctly.

Return

Boolean stating whether execution was successful.

bool evalAwait(uint64_t waitFor = UINT64_MAX)

Eval Await waits for the fence to finish processing and then once it finishes, it runs the postEval of all operations.

Return

Boolean stating whether execution was successful.

Parameters
  • waitFor: Number of milliseconds to wait before timing out.

bool isRecording()

Returns true if the sequence is currently recording.

Return

Boolean stating if recording is ongoing.

bool isRunning()

Returns true if the sequence is currently running - mostly used for async workloads.

Return

Boolean stating if currently running.

bool isInit()

Returns true if the sequence has been successfully initialised.

Return

Boolean stating if sequence has been initialised.

void freeMemoryDestroyGPUResources()

Destroys and frees the GPU resources which include the buffer and memory and sets the sequence as init=False.

template<typename T, typename ...TArgs>
bool record(std::vector<std::shared_ptr<Tensor>> tensors, TArgs&&... params)

Record function for operation to be added to the GPU queue in batch. This template requires classes to be derived from the OpBase class. This function also requires the Sequence to be recording, otherwise it will not be able to add the operation.

Parameters
  • tensors: Vector of tensors to use for the operation

  • params: Template parameters that are used to initialise the operation, which allows for extensible configurations on initialisation.
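
The rule that record only succeeds while the sequence is recording can be modelled as a small state machine. The sketch below is a standard-library model, not Kompute code: RecordingModel is a hypothetical class, operations are plain callables rather than OpBase subclasses, and clearing the batch on begin is an assumption of the sketch.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for kp::Sequence that tracks only the recording state.
class RecordingModel {
public:
    bool begin() {
        if (recording_) return false; // already recording
        ops_.clear();                 // assumption: a new recording starts empty
        recording_ = true;
        return true;
    }

    // Mirrors the documented rule: recording must be active to add an operation.
    bool record(std::function<void()> op) {
        if (!recording_) return false;
        ops_.push_back(std::move(op));
        return true;
    }

    bool end() {
        if (!recording_) return false;
        recording_ = false;
        return true;
    }

    // Runs the recorded batch in order; refuses to run while still recording.
    bool eval() {
        if (recording_) return false;
        for (auto& op : ops_) op();
        return true;
    }

    bool isRecording() const { return recording_; }

private:
    std::vector<std::function<void()>> ops_;
    bool recording_ = false;
};
```

A typical use of the model is begin, one or more record calls, end, then eval. Calling record before begin returns false and leaves the batch unchanged.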

Tensor

The kp::Tensor is the atomic unit in Kompute, and it is used primarily for handling Host and GPU Device data.

../_images/kompute-vulkan-architecture-tensor.jpg
class kp::Tensor

Structured data used in GPU operations.

Tensors are the base building block in Kompute to perform operations across GPUs. Each tensor would have a respective Vulkan memory and buffer, which would be used to store their respective data. The tensors can be used for GPU data storage or transfer.

Public Types

enum TensorTypes

Type for tensors created. Device tensors allow memory to be transferred from staging buffers. Host (staging) tensors use host-visible memory. Storage tensors are device-visible but are not set up to transfer or receive data (only for shader storage).

Values:

enumerator eDevice = 0

Type is device memory, source and destination.

enumerator eHost = 1

Type is host memory, source and destination.

enumerator eStorage = 2

Type is Device memory (only)

Public Functions

Tensor()

Base constructor, should not be used unless explicitly intended.

Tensor(const std::vector<float> &data, TensorTypes tensorType = TensorTypes::eDevice)

Default constructor with data provided, which is used to create the respective Vulkan buffer and memory.

Parameters
  • data: Non-zero-sized vector of data that will be used by the tensor

  • tensorType: Type for the tensor which is of type TensorTypes

~Tensor()

Destructor which is in charge of freeing vulkan resources unless they have been provided externally.

void init(std::shared_ptr<vk::PhysicalDevice> physicalDevice, std::shared_ptr<vk::Device> device)

Initialiser which calls the initialisation for all the respective tensors and creates the respective staging tensors. The staging tensors are only created for tensors of type TensorTypes::eDevice, as otherwise there is no need to copy from host memory.
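
The relationship between tensor type and staging memory can be summarised in a few lines. This is a minimal sketch that restates the descriptions in this section: TensorKind, needsStagingBuffer and canSyncWithHost are hypothetical names, not part of the Kompute API.

```cpp
// Hypothetical mirror of kp::Tensor::TensorTypes, using the enumerator values
// listed in this section (eDevice = 0, eHost = 1, eStorage = 2).
enum class TensorKind { Device = 0, Host = 1, Storage = 2 };

// Per the init() description, only device tensors get a staging buffer.
constexpr bool needsStagingBuffer(TensorKind kind) {
    return kind == TensorKind::Device;
}

// Storage tensors are described as not set up to transfer or receive data.
constexpr bool canSyncWithHost(TensorKind kind) {
    return kind != TensorKind::Storage;
}
```

Because both helpers are constexpr, they can also be checked at compile time with static_assert.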

void freeMemoryDestroyGPUResources()

Destroys and frees the GPU resources which include the buffer and memory.

std::vector<float> &data()

Returns the vector of data currently contained by the Tensor. It is important to ensure that there is no out-of-sync data with the GPU memory.

Return

Reference to vector of elements representing the data in the tensor.

float &operator[](int index)

Overrides the subscript operator to expose the underlying data’s subscript operator which in this case would be its underlying vector’s.

Return

Returns the element in the position requested.

Parameters
  • index: The index where the element will be returned from.

uint32_t size()

Returns the size/magnitude of the Tensor, which will be the total number of elements across all dimensions

Return

Unsigned integer representing the total number of elements

std::array<uint32_t, KP_MAX_DIM_SIZE> shape()

Returns the shape of the tensor, which includes the number of dimensions and the size per dimension.

Return

Array containing the sizes for each dimension. Zero means respective dimension is not active.
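
The link between shape() and size() can be sketched as follows, assuming that active dimensions occupy the leading entries of the array and that the element count is the product of the active extents. kMaxDims is an assumed value chosen for illustration and may not match KP_MAX_DIM_SIZE. activeDims and elementCount are hypothetical helpers.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumed value for illustration only; the real KP_MAX_DIM_SIZE may differ.
constexpr std::size_t kMaxDims = 4;

// Counts the leading non-zero entries, treating zero as "dimension not active".
inline std::uint32_t activeDims(const std::array<std::uint32_t, kMaxDims>& shape) {
    std::uint32_t count = 0;
    for (std::uint32_t extent : shape) {
        if (extent == 0) break;
        ++count;
    }
    return count;
}

// Multiplies the active extents to get the total number of elements.
inline std::uint32_t elementCount(const std::array<std::uint32_t, kMaxDims>& shape) {
    const std::uint32_t dims = activeDims(shape);
    if (dims == 0) return 0;
    std::uint32_t total = 1;
    for (std::uint32_t i = 0; i < dims; ++i) total *= shape[i];
    return total;
}
```

Under these assumptions a shape of {2, 3, 0, 0} has two active dimensions and six elements.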

TensorTypes tensorType()

Retrieve the tensor type of the Tensor

Return

Tensor type of tensor

bool isInit()

Returns true if the tensor initialisation function has been carried out successfully, which means that the buffer and memory have been provisioned.

void setData(const std::vector<float> &data)

Sets / resets the vector data of the tensor. This function does not perform any copies into GPU memory and is only performed on the host.

void recordCopyFrom(std::shared_ptr<vk::CommandBuffer> commandBuffer, std::shared_ptr<Tensor> copyFromTensor, bool createBarrier)

Records a copy from the memory of the tensor provided to the current tensor. This is intended to pass memory into a processing step, to perform a staging buffer transfer, or to gather output (among others).

Parameters
  • commandBuffer: Vulkan Command Buffer to record the commands into

  • copyFromTensor: Tensor to copy the data from

  • createBarrier: Whether to create a barrier that ensures the data is copied before further operations. Default is true.

void recordCopyFromStagingToDevice(std::shared_ptr<vk::CommandBuffer> commandBuffer, bool createBarrier)

Records a copy from the internal staging memory to the device memory using an optional barrier to wait for the operation. This function would only be relevant for kp::Tensors of type eDevice.

Parameters
  • commandBuffer: Vulkan Command Buffer to record the commands into

  • createBarrier: Whether to create a barrier that ensures the data is copied before further operations. Default is true.

void recordCopyFromDeviceToStaging(std::shared_ptr<vk::CommandBuffer> commandBuffer, bool createBarrier)

Records a copy from the internal device memory to the staging memory using an optional barrier to wait for the operation. This function would only be relevant for kp::Tensors of type eDevice.

Parameters
  • commandBuffer: Vulkan Command Buffer to record the commands into

  • createBarrier: Whether to create a barrier that ensures the data is copied before further operations. Default is true.

void recordBufferMemoryBarrier(std::shared_ptr<vk::CommandBuffer> commandBuffer, vk::AccessFlagBits srcAccessMask, vk::AccessFlagBits dstAccessMask, vk::PipelineStageFlagBits srcStageMask, vk::PipelineStageFlagBits dstStageMask)

Records the buffer memory barrier into the command buffer which ensures that relevant data transfers are carried out correctly.

Parameters
  • commandBuffer: Vulkan Command Buffer to record the commands into

  • srcAccessMask: Access flags for source access mask

  • dstAccessMask: Access flags for destination access mask

  • srcStageMask: Pipeline stage flags for source stage mask

  • dstStageMask: Pipeline stage flags for destination stage mask

vk::DescriptorBufferInfo constructDescriptorBufferInfo()

Constructs a vulkan descriptor buffer info which can be used to specify and reference the underlying buffer component of the tensor without exposing it.

Return

Descriptor buffer info with own buffer

void mapDataFromHostMemory()

Maps data from the Host Visible GPU memory into the data vector. It requires the Tensor to be of staging type for it to work.

void mapDataIntoHostMemory()

Maps data from the data vector into the Host Visible GPU memory. It requires the tensor to be of staging type for it to work.

Algorithm

The kp::Algorithm consists primarily of the components required for shader code execution, including the relevant vk::DescriptorSet related resources as well as vk::Pipeline and all the relevant Vulkan resources as outlined in the architectural diagram.

../_images/kompute-vulkan-architecture-algorithm.jpg
class kp::Algorithm

Abstraction for compute shaders that are run on top of tensors grouped via ParameterGroups (which group descriptorsets)

Public Functions

Algorithm()

Base constructor for Algorithm. Should not be used unless explicitly intended.

Algorithm(std::shared_ptr<vk::Device> device, std::shared_ptr<vk::CommandBuffer> commandBuffer, const Constants &specializationConstants = {})

Default constructor for Algorithm

Parameters
  • device: The Vulkan device to use for creating resources

  • commandBuffer: The Vulkan command buffer to bind the pipeline and shaders

  • specializationConstants: The specialization parameters to pass to the function

void init(const std::vector<uint32_t> &shaderFileData, std::vector<std::shared_ptr<Tensor>> tensorParams)

Initialiser for the shader data provided to the algorithm as well as the tensor parameters that will be used in the shader.

Parameters
  • shaderFileData: The bytes in SPIR-V format of the shader

  • tensorParams: The Tensors to be used in the Algorithm / shader for processing

~Algorithm()

Destructor for Algorithm, which is responsible for freeing and destroying the respective pipelines and owned parameter groups.

void recordDispatch(uint32_t x = 1, uint32_t y = 1, uint32_t z = 1)

Records the dispatch function with the provided template parameters or alternatively using the size of the tensor by default.

Parameters
  • x: Layout X dispatch value

  • y: Layout Y dispatch value

  • z: Layout Z dispatch value
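
When choosing dispatch values, a common calculation is how many workgroups are needed to cover a given number of elements. The sketch below shows that arithmetic under the assumption that each workgroup processes localSize elements along one axis, where localSize stands for the local size declared in the shader. groupsFor and totalInvocations are hypothetical helpers, not Kompute functions.

```cpp
#include <cstdint>

// Number of workgroups needed along one axis so that every element is covered
// when each workgroup processes `localSize` elements.
inline std::uint32_t groupsFor(std::uint32_t elementCount, std::uint32_t localSize) {
    if (localSize == 0) return 0; // invalid local size; nothing sensible to dispatch
    // Ceiling division written to avoid overflow near the top of the range.
    return elementCount / localSize + (elementCount % localSize != 0 ? 1u : 0u);
}

// Total shader invocations launched by dispatching x * y * z workgroups.
inline std::uint64_t totalInvocations(std::uint32_t x, std::uint32_t y, std::uint32_t z,
                                      std::uint32_t localSize) {
    return static_cast<std::uint64_t>(x) * y * z * localSize;
}
```

For 1000 elements and a local size of 64, groupsFor returns 16, which launches 1024 invocations. The 24 surplus invocations are why shaders written this way usually guard against indices beyond the data length.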

OpBase

The kp::OpBase provides a top level class for an operation in Kompute, which is the step that is executed on a GPU submission. The Kompute operations can consist of one or more kp::Tensor.

../_images/kompute-vulkan-architecture-operations.jpg
class kp::OpBase

Base Operation which provides the high level interface that Kompute operations implement in order to perform a set of actions in the GPU.

Operations can perform actions on tensors, and optionally can also own an Algorithm with respective parameters. kp::Operations with kp::Algorithms would inherit from kp::OpAlgoBase.

Subclassed by kp::OpAlgoBase, kp::OpTensorCopy, kp::OpTensorSyncDevice, kp::OpTensorSyncLocal

Public Functions

OpBase()

Base constructor, should not be used unless explicitly intended.

OpBase(std::shared_ptr<vk::PhysicalDevice> physicalDevice, std::shared_ptr<vk::Device> device, std::shared_ptr<vk::CommandBuffer> commandBuffer, std::vector<std::shared_ptr<Tensor>> &tensors)

Default constructor with parameters that provides the bare minimum requirements for the operations to be able to create and manage their sub-components.

Parameters
  • physicalDevice: Vulkan physical device used to find device queues

  • device: Vulkan logical device for passing to Algorithm

  • commandBuffer: Vulkan Command Buffer to record commands into

  • tensors: Tensors that are to be used in this operation

~OpBase()

Default destructor for OpBase class. This OpBase destructor class should always be called to destroy and free owned resources unless it is intended to destroy the resources in the parent class.

void init() = 0

The init function is responsible for setting up all the resources and should be called after the Operation has been created.

void record() = 0

The record function is intended to only send a record command or run commands that are expected to record operations that are to be submitted as a batch into the GPU.

void preEval() = 0

Pre eval is called before the Sequence has called eval and submitted the commands to the GPU for processing, and can be used to perform any per-eval setup steps required as the computation iteration begins. It’s worth noting that there are situations where eval can be called multiple times, so the resources that are created should be idempotent in case it’s called multiple times in a row.

void postEval() = 0

Post eval is called after the Sequence has called eval and submitted the commands to the GPU for processing, and can be used to perform any tear-down steps required as the computation iteration finishes. It’s worth noting that there are situations where eval can be called multiple times, so the resources that are destroyed should not require a re-init unless explicitly provided by the user.
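
The order in which the four hooks run can be made concrete with a small model. This sketch does not derive from kp::OpBase: OpModel, LoggingOp and runLifecycle are hypothetical names, and the driver assumes that init and record run once while preEval and postEval wrap every evaluation, as the descriptions above suggest.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for kp::OpBase with the four documented hooks.
class OpModel {
public:
    virtual ~OpModel() = default;
    virtual void init() = 0;
    virtual void record() = 0;
    virtual void preEval() = 0;
    virtual void postEval() = 0;
};

// Example operation that logs each hook so the call order is visible.
class LoggingOp : public OpModel {
public:
    explicit LoggingOp(std::vector<std::string>& log) : log_(log) {}
    void init() override { log_.push_back("init"); }
    void record() override { log_.push_back("record"); }
    void preEval() override { log_.push_back("preEval"); }
    void postEval() override { log_.push_back("postEval"); }

private:
    std::vector<std::string>& log_;
};

// Drives one operation: set up and record once, then wrap every evaluation
// in preEval / postEval.
inline void runLifecycle(OpModel& op, int evalCount) {
    op.init();
    op.record();
    for (int i = 0; i < evalCount; ++i) {
        op.preEval();
        // The recorded commands would be submitted to the GPU at this point.
        op.postEval();
    }
}
```

Running the model with two evaluations shows why the hooks should tolerate repetition: preEval and postEval each run twice, while init and record run only once.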

OpAlgoBase

The kp::OpAlgoBase extends the kp::OpBase class and provides the base for shader-based operations. Besides consisting of one or more kp::Tensor as per the kp::OpBase, it also contains a unique kp::Algorithm.

../_images/kompute-vulkan-architecture-opmult.jpg
class kp::OpAlgoBase : public kp::OpBase

Operation that provides a general abstraction that simplifies the use of algorithm and parameter components which can be used with shaders. By default it enables the user to provide a dynamic number of tensors which are then passed as inputs.

Subclassed by kp::OpAlgoLhsRhsOut, kp::OpMult

Public Functions

OpAlgoBase()

Base constructor, should not be used unless explicitly intended.

OpAlgoBase(std::shared_ptr<vk::PhysicalDevice> physicalDevice, std::shared_ptr<vk::Device> device, std::shared_ptr<vk::CommandBuffer> commandBuffer, std::vector<std::shared_ptr<Tensor>> &tensors, const Workgroup &komputeWorkgroup = {}, const Constants &specializationConstants = {})

Default constructor with parameters that provides the bare minimum requirements for the operations to be able to create and manage their sub-components.

Parameters
  • physicalDevice: Vulkan physical device used to find device queues

  • device: Vulkan logical device for passing to Algorithm

  • commandBuffer: Vulkan Command Buffer to record commands into

  • tensors: Tensors that are to be used in this operation

  • komputeWorkgroup: Optional parameter to specify the layout for processing

OpAlgoBase(std::shared_ptr<vk::PhysicalDevice> physicalDevice, std::shared_ptr<vk::Device> device, std::shared_ptr<vk::CommandBuffer> commandBuffer, std::vector<std::shared_ptr<Tensor>> &tensors, std::string shaderFilePath, const Workgroup &komputeWorkgroup = {}, const Constants &specializationConstants = {})

Constructor that enables a file to be passed to the operation with the contents of the shader. This can be either in raw format or in compiled SPIR-V binary format.

Parameters
  • physicalDevice: Vulkan physical device used to find device queues

  • device: Vulkan logical device for passing to Algorithm

  • commandBuffer: Vulkan Command Buffer to record commands into

  • tensors: Tensors that are to be used in this operation

  • shaderFilePath: Parameter to specify the shader to load (either in spirv or raw format)

  • komputeWorkgroup: Optional parameter to specify the layout for processing

OpAlgoBase(std::shared_ptr<vk::PhysicalDevice> physicalDevice, std::shared_ptr<vk::Device> device, std::shared_ptr<vk::CommandBuffer> commandBuffer, std::vector<std::shared_ptr<Tensor>> &tensors, const std::vector<uint32_t> &shaderDataRaw, const Workgroup &komputeWorkgroup = {}, const Constants &specializationConstants = {})

Constructor that enables raw shader data to be passed to the main operation which can be either in raw shader glsl code or in compiled SPIR-V binary.

Parameters
  • physicalDevice: Vulkan physical device used to find device queues

  • device: Vulkan logical device for passing to Algorithm

  • commandBuffer: Vulkan Command Buffer to record commands into

  • tensors: Tensors that are to be used in this operation

  • shaderDataRaw: Optional parameter to specify the shader data either in binary or raw form

  • komputeWorkgroup: Optional parameter to specify the layout for processing

~OpAlgoBase() override

Default destructor, which is in charge of destroying the algorithm components but does not destroy the underlying tensors

void init() override

The init function is responsible for the initialisation of the algorithm component based on the parameters specified, and allows for extensibility on the options provided. Further dependent classes can perform more specific checks such as ensuring tensors provided are initialised, etc.

void record() override

This records the commands that are to be sent to the GPU. This includes the barriers that ensure the memory has been copied before going in and out of the shader, as well as the dispatch operation that sends the shader processing to the GPU. This function also records the GPU memory copy of the output data into the staging buffer so it can be read by the host.

void preEval() override

Does not perform any preEval commands.

void postEval() override

Executes after the recorded commands are submitted, and performs a copy of the GPU Device memory into the staging buffer so the output data can be retrieved.

OpMult

The kp::OpMult operation is a sample implementation of the kp::OpAlgoBase class. This class shows how it is possible to create a custom kp::OpAlgoBase that can be compiled as part of the binary. The kp::OpMult operation uses the shader-to-cpp-header-file script to convert the shader into C++ header files.

../_images/kompute-vulkan-architecture-opmult.jpg
class kp::OpMult : public kp::OpAlgoBase

Operation that performs multiplication on two tensors and outputs the result to a third tensor.

Public Functions

OpMult()

Base constructor, should not be used unless explicitly intended.

OpMult(std::shared_ptr<vk::PhysicalDevice> physicalDevice, std::shared_ptr<vk::Device> device, std::shared_ptr<vk::CommandBuffer> commandBuffer, std::vector<std::shared_ptr<Tensor>> tensors, const Workgroup &komputeWorkgroup = {})

Default constructor with parameters that provides the bare minimum requirements for the operations to be able to create and manage their sub-components.

Parameters
  • physicalDevice: Vulkan physical device used to find device queues

  • device: Vulkan logical device for passing to Algorithm

  • commandBuffer: Vulkan Command Buffer to record commands into

  • tensors: Tensors that are to be used in this operation

  • komputeWorkgroup: Optional parameter to specify the layout for processing

~OpMult() override

Default destructor, which is in charge of destroying the algorithm components but does not destroy the underlying tensors

OpTensorCopy

The kp::OpTensorCopy is a tensor only operation that copies the GPU memory buffer data from one kp::Tensor to one or more subsequent tensors.

class kp::OpTensorCopy : public kp::OpBase

Operation that copies the data from the first tensor to the rest of the tensors provided, using a record command for all the vectors. This operation does not own/manage the memory of the tensors passed to it. The operation must only receive tensors of type

Public Functions

OpTensorCopy(std::shared_ptr<vk::PhysicalDevice> physicalDevice, std::shared_ptr<vk::Device> device, std::shared_ptr<vk::CommandBuffer> commandBuffer, std::vector<std::shared_ptr<Tensor>> tensors)

Default constructor with parameters that provides the core vulkan resources and the tensors that will be used in the operation.

Parameters
  • physicalDevice: Vulkan physical device used to find device queues

  • device: Vulkan logical device for passing to Algorithm

  • commandBuffer: Vulkan Command Buffer to record commands into

  • tensors: Tensors that will be used in the operation.

~OpTensorCopy() override

Default destructor. This class does not manage memory so it won’t be expecting the parent to perform a release.

void init() override

Performs basic checks such as ensuring there are at least two tensors provided, that they are initialised and that they are not of type TensorTypes::eStorage.

void record() override

Records the copy commands from the first tensor into all the other tensors provided. Also optionally records a barrier.

void preEval() override

Does not perform any preEval commands.

void postEval() override

Copies the local vectors for all the tensors to sync the data with the GPU.
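
The host-side effect of the copy, with the first tensor's data ending up in every other tensor, can be sketched with plain vectors. copyFirstToRest is a hypothetical helper. It mirrors the documented check for at least two tensors, while the requirement that all tensors have equal size is an assumption of the sketch.

```cpp
#include <cstddef>
#include <vector>

// Copies the first vector into every later one. Returns false when fewer than
// two tensors are given or when sizes differ (the size check is an assumption).
inline bool copyFirstToRest(std::vector<std::vector<float>>& tensors) {
    if (tensors.size() < 2) return false;
    for (std::size_t i = 1; i < tensors.size(); ++i) {
        if (tensors[i].size() != tensors[0].size()) return false;
    }
    for (std::size_t i = 1; i < tensors.size(); ++i) {
        tensors[i] = tensors[0];
    }
    return true;
}
```

The validation pass runs before any copy, so a failed call leaves every destination untouched.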

OpTensorSyncLocal

The kp::OpTensorSyncLocal is a tensor only operation that maps the data from the GPU device memory into the local host vector.

class kp::OpTensorSyncLocal : public kp::OpBase

Operation that syncs the tensor’s local memory by mapping device data into the local CPU memory. For TensorTypes::eDevice it uses a record operation to copy the data from device memory to staging memory, which means that the copy is done in sync with GPU commands. For TensorTypes::eHost it only maps the data from host memory into the local vector, which happens during postEval after the recorded commands have been submitted. Tensors of type TensorTypes::eStorage cannot be used with this operation.

Public Functions

OpTensorSyncLocal(std::shared_ptr<vk::PhysicalDevice> physicalDevice, std::shared_ptr<vk::Device> device, std::shared_ptr<vk::CommandBuffer> commandBuffer, std::vector<std::shared_ptr<Tensor>> tensors)

Default constructor with parameters that provides the core vulkan resources and the tensors that will be used in the operation. The tensors provided cannot be of type TensorTypes::eStorage.

Parameters
  • physicalDevice: Vulkan physical device used to find device queues

  • device: Vulkan logical device for passing to Algorithm

  • commandBuffer: Vulkan Command Buffer to record commands into

  • tensors: Tensors that will be used in the operation.

~OpTensorSyncLocal() override

Default destructor. This class does not manage memory so it won’t be expecting the parent to perform a release.

void init() override

Performs basic checks such as ensuring that there is at least one tensor provided with min memory of 1 element.

void record() override

For device tensors, it records the copy command for the tensor to copy the data from its device to staging memory.

void preEval() override

Does not perform any preEval commands.

void postEval() override

For host tensors it performs the map command from the host memory into local memory.

OpTensorSyncDevice

The kp::OpTensorSyncDevice is a tensor only operation that maps the data from the local host vector into the GPU device memory.

class kp::OpTensorSyncDevice : public kp::OpBase

Operation that syncs the tensor’s device memory by mapping local data into the device memory. For TensorTypes::eDevice it uses a record operation for the memory to be synced into GPU memory, which means that the operation is done in sync with GPU commands. For TensorTypes::eHost it only maps the data into host memory, which happens during preEval before the recorded commands are dispatched. Tensors of type TensorTypes::eStorage cannot be used with this operation.

Public Functions

OpTensorSyncDevice(std::shared_ptr<vk::PhysicalDevice> physicalDevice, std::shared_ptr<vk::Device> device, std::shared_ptr<vk::CommandBuffer> commandBuffer, std::vector<std::shared_ptr<Tensor>> tensors)

Default constructor with parameters that provides the core Vulkan resources and the tensors that will be used in the operation. The tensors provided cannot be of type TensorTypes::eStorage.

Parameters
  • physicalDevice: Vulkan physical device used to find device queues

  • device: Vulkan logical device for passing to Algorithm

  • commandBuffer: Vulkan Command Buffer to record commands into

  • tensors: Tensors that will be used in the operation.

~OpTensorSyncDevice() override

Default destructor. This class does not manage memory so it won’t be expecting the parent to perform a release.

void init() override

Performs basic checks such as ensuring that there is at least one tensor provided with min memory of 1 element.

void record() override

For device tensors, it records the copy command for the tensor to copy the data from its staging to device memory.

void preEval() override

Does not perform any preEval commands.

void postEval() override

Does not perform any postEval commands.

Shader

The kp::Shader class contains a set of utilities to compile and process shaders.

class kp::Shader

Shader utility class with functions to compile and process GLSL files.

Public Static Functions

std::vector<uint32_t> compile_sources(const std::vector<std::string> &sources, const std::vector<std::string> &files = {}, const std::string &entryPoint = "main", std::vector<std::pair<std::string, std::string>> definitions = {})

Compile multiple sources with optional filenames. Currently this function uses the glslang C++ interface, which is not thread safe, so this function should not be called from multiple threads concurrently. If you have an online shader processing multithreading use case that can’t use offline compilation, please open an issue.

Return

The compiled SPIR-V binary in unsigned int32 format

Parameters
  • sources: A list of raw glsl shaders in string format

  • files: A list of file names respective to each of the sources

  • entryPoint: The function name to use as entry point

  • definitions: List of pairs containing key value definitions

std::vector<uint32_t> compile_source(const std::string &source, const std::string &entryPoint = "main", std::vector<std::pair<std::string, std::string>> definitions = {})

Compile a single glslang source from a string value. Currently this function uses the glslang C++ interface, which is not thread safe, so this function should not be called from multiple threads concurrently. If you have an online shader processing multithreading use case that can’t use offline compilation, please open an issue.

Return

The compiled SPIR-V binary in unsigned int32 format

Parameters
  • source: An individual raw glsl shader in string format

  • entryPoint: The function name to use as entry point

  • definitions: List of pairs containing key value definitions
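
Since both functions return the compiled SPIR-V as 32-bit words, a cheap sanity check on the result is to look at the module header. The sketch below relies only on the SPIR-V format itself: the first word of a module is the magic number 0x07230203 and the header is five words long. looksLikeSpirv is a hypothetical helper, not part of kp::Shader, and it does not validate the module beyond the header.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kSpirvMagic = 0x07230203u; // first word of a SPIR-V module
constexpr std::size_t kSpirvHeaderWords = 5;       // magic, version, generator, bound, schema

// Returns true when the buffer is long enough to hold a SPIR-V header and
// starts with the SPIR-V magic number in host byte order.
inline bool looksLikeSpirv(const std::vector<std::uint32_t>& words) {
    return words.size() >= kSpirvHeaderWords && words[0] == kSpirvMagic;
}
```

A check like this can catch an empty result or raw GLSL text passed where compiled words were expected, before the data reaches an operation.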