Variable Types for Tensors and Constants

By default, the initial interfaces you interact with primarily use float values, which is enough to get through the basic conceptual examples. However, as real-world applications are developed, more specialized types may be required for kp::Tensor, as well as for SpecializationConstants and PushConstants.

Before diving into the practical classes and interfaces that can be used to take advantage of the variable type support of Kompute, we want to provide some high-level intuition on what each of these components is.

Variable Tensor Types

For tensors, Kompute provides under the hood an interface for seamless interaction with multiple underlying data types. This is done through the templated class kp::TensorT&lt;type&gt; and its parent class kp::Tensor; as a developer, however, you will primarily interact with the top-level kp::Tensor class, as this is what is exposed through the high-level kp::Manager class.

The kp::Tensor class provides an integrated experience that allows users to seamlessly retrieve the underlying data through the data() and vector() functions. This is achieved by leveraging C++ templates and by limiting the types that can be used, namely:

  • float

  • uint32

  • int32

  • double

  • bool

Any other data type will result in an error, and for the time being Kompute will focus primarily on providing support for these types.

The tests under TestTensor.cpp and test_tensor_types.py provide an overview of how users can take advantage of these features using std::vector for C++ and numpy array for Python.

C++ Tensor Types Usage

Below you can see how to define tensors of different types in C++.

    {
        std::vector<float> vec{ 0, 1, 2 };
        std::shared_ptr<kp::TensorT<float>> tensor = mgr.tensor(vec);
        EXPECT_EQ(tensor->dataType(), kp::Tensor::TensorDataTypes::eFloat);
    }

    {
        std::vector<int32_t> vec{ 0, 1, 2 };
        std::shared_ptr<kp::TensorT<int32_t>> tensor = mgr.tensorT(vec);
        EXPECT_EQ(tensor->dataType(), kp::Tensor::TensorDataTypes::eInt);
    }

    {
        std::vector<uint32_t> vec{ 0, 1, 2 };
        std::shared_ptr<kp::TensorT<uint32_t>> tensor = mgr.tensorT(vec);
        EXPECT_EQ(tensor->dataType(), kp::Tensor::TensorDataTypes::eUnsignedInt);
    }

    {
        std::vector<double> vec{ 0, 1, 2 };
        std::shared_ptr<kp::TensorT<double>> tensor = mgr.tensorT(vec);
        EXPECT_EQ(tensor->dataType(), kp::Tensor::TensorDataTypes::eDouble);
    }

Python Tensor Types Usage

Below is the equivalent usage in Python, where tensors are created from numpy arrays (see test_tensor_types.py for the full set of type tests).

    spirv = compile_source(shader)

    arr_in_a = np.array([123., 153., 231.], dtype=np.float32)
    arr_in_b = np.array([9482, 1208, 1238], dtype=np.float32)
    arr_out = np.array([0, 0, 0], dtype=np.float32)

    mgr = kp.Manager()

    tensor_in_a = mgr.tensor(arr_in_a)
    tensor_in_b = mgr.tensor(arr_in_b)
    tensor_out = mgr.tensor(arr_out)

    params = [tensor_in_a, tensor_in_b, tensor_out]

    (mgr.sequence()
        .record(kp.OpTensorSyncDevice(params))
        .record(kp.OpAlgoDispatch(mgr.algorithm(params, spirv)))
        .record(kp.OpTensorSyncLocal([tensor_out]))
        .eval())

    assert np.all(tensor_out.data() == arr_in_a * arr_in_b)

Variable Push Constants

Push constants are a relatively inexpensive way to provide dynamic data to a GPU algorithm (shader) between dispatches, while further CPU compute is performed. Although push constants are a more efficient way to provide data, they are also limited, as push constant memory is small (the Vulkan specification only guarantees 128 bytes).

Push constants in Kompute are flexible: in C++ it is possible to pass user-defined structs, while in Python they are limited to numpy arrays whose elements all share the same type.

C++ Push Consts Types Usage

As mentioned above, the test under TestPushConstants.cpp shows how user-defined structs can carry multiple elements of different types, which is not possible for specialization constants or tensors.

These are defined in the algorithm function of the kp::Manager, and once a push constant is set, all subsequent push constants provided have to consist of the same types and total size.

More specifically, it is possible to pass a single custom-struct element, or alternatively multiple scalar values as part of the vector, and to access them as outlined in the rest of the tests.

TEST(TestPushConstants, TestConstantsMixedTypes)
{
    {
        std::string shader(R"(
          #version 450
          layout(push_constant) uniform PushConstants {
            float x;
            uint y;
            int z;
          } pcs;
          layout (local_size_x = 1) in;
          layout(set = 0, binding = 0) buffer a { float pa[]; };
          void main() {
              pa[0] += pcs.x;
              pa[1] += pcs.y - 2147483000;
              pa[2] += pcs.z;
          })");

        struct TestConsts{
            float x;
            uint32_t y;
            int32_t z;
        };

        std::vector<uint32_t> spirv = compileSource(shader);

        std::shared_ptr<kp::Sequence> sq = nullptr;

        {
            kp::Manager mgr;

            std::shared_ptr<kp::TensorT<float>> tensor =
              mgr.tensorT<float>({ 0, 0, 0 });

            std::shared_ptr<kp::Algorithm> algo = mgr.algorithm<float, TestConsts>(
              { tensor }, spirv, kp::Workgroup({ 1 }), {}, {{ 0, 0, 0 }});

            sq = mgr.sequence()->eval<kp::OpTensorSyncDevice>({ tensor });

            // We need to run this in sequence to avoid race condition
            // We can't use atomicAdd as swiftshader doesn't support it for
            // float
            sq->eval<kp::OpAlgoDispatch>(algo, std::vector<TestConsts>{{ 15.32, 2147483650, 10 }});
            sq->eval<kp::OpAlgoDispatch>(algo, std::vector<TestConsts>{{ 30.32, 2147483650, -3 }});
            sq->eval<kp::OpTensorSyncLocal>({ tensor });

            EXPECT_EQ(tensor->vector(), std::vector<float>({ 45.64, 1300, 7 }));
        }
    }
}

Python Push Consts Types Usage

In Python, push constants are limited to a single list of elements of the same type. These are provided by passing a numpy array to the algorithm function or to the kp.OpAlgoDispatch operation.
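To make that concrete, here is a minimal sketch (not taken from the Kompute test suite): it takes the shader-compilation helper as a parameter, since `compile_source` is a test utility rather than part of the bindings, and it assumes the algorithm call accepts push constants as its last argument, mirroring the C++ signature `(params, spirv, workgroup, spec_consts, push_consts)`. Treat these signatures as assumptions and refer to the Kompute Python tests for the authoritative usage.

```python
import numpy as np

try:
    import kp  # Kompute Python bindings
    _HAVE_KOMPUTE = True
except ImportError:
    _HAVE_KOMPUTE = False

# A shader that adds each push constant element to the output buffer.
PUSH_CONST_SHADER = """
#version 450
layout(push_constant) uniform PushConstants { float x; float y; float z; } pcs;
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
void main() {
    pa[0] += pcs.x;
    pa[1] += pcs.y;
    pa[2] += pcs.z;
}
"""

def push_consts_example(compile_source):
    """Single dispatch with push constants; returns the resulting data,
    or None when Kompute or a Vulkan device is unavailable."""
    if not _HAVE_KOMPUTE:
        return None
    try:
        spirv = compile_source(PUSH_CONST_SHADER)
        mgr = kp.Manager()
        tensor = mgr.tensor(np.zeros(3, dtype=np.float32))
        # Push constants as a numpy array of same-typed elements; the
        # argument ordering below is assumed from the C++ signature.
        push_consts = np.array([0.1, 0.2, 0.3], dtype=np.float32)
        algo = mgr.algorithm([tensor], spirv, (1, 1, 1), [], push_consts)
        (mgr.sequence()
            .record(kp.OpTensorSyncDevice([tensor]))
            .record(kp.OpAlgoDispatch(algo))
            .record(kp.OpTensorSyncLocal([tensor]))
            .eval())
        return tensor.data()
    except Exception:
        return None  # degrade gracefully where no Vulkan device exists
```

As with the C++ version, subsequent dispatches could pass a fresh numpy array of the same dtype and length to `kp.OpAlgoDispatch` to update the values between evaluations.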

Separately, the numpy array returned by data() shares ownership of the tensor's underlying memory, as the following test shows: the data remains valid after the tensor handle is deleted, and is only invalidated once the manager is destroyed.

def test_tensor_numpy_ownership():

    arr_in = np.array([1, 2, 3])

    m = kp.Manager()

    t = m.tensor(arr_in)

    # This should increment refcount for tensor sharedptr
    td = t.data()

    assert td.base.is_init() == True
    assert np.all(td == arr_in)

    del t

    assert td.base.is_init() == True
    assert np.all(td == arr_in)

    m.destroy()

    assert td.base.is_init() == False

Variable Specialization Constants

Specialization constants are analogous to push constants, but they are not dynamic: they can only be set on initialization or rebuild of a kp::Algorithm, and cannot be changed unless a rebuild is carried out.

The usage of specialization constants is very similar to push constants, with the following limitations:

  • They are defined using the constant_id layout qualifier in the GLSL shader

  • Spec constants do not support complex types (i.e. user-defined structs)

  • Kompute supports an array of elements of the same type for specialization constants

C++ Spec Consts Types Usage

The specialization constant example below shows how the constants can be defined as a std::vector.

TEST(TestSpecializationConstants, TestConstantsInt)
{
    {
        std::string shader(R"(
          #version 450
          layout (constant_id = 0) const int cOne = 1;
          layout (constant_id = 1) const int cTwo = 1;
          layout (local_size_x = 1) in;
          layout(set = 0, binding = 0) buffer a { int pa[]; };
          layout(set = 0, binding = 1) buffer b { int pb[]; };
          void main() {
              uint index = gl_GlobalInvocationID.x;
              pa[index] = cOne;
              pb[index] = cTwo;
          })");

        std::vector<uint32_t> spirv = compileSource(shader);

        std::shared_ptr<kp::Sequence> sq = nullptr;

        {
            kp::Manager mgr;

            std::shared_ptr<kp::TensorT<int32_t>> tensorA =
              mgr.tensorT<int32_t>({ 0, 0, 0 });
            std::shared_ptr<kp::TensorT<int32_t>> tensorB =
              mgr.tensorT<int32_t>({ 0, 0, 0 });

            std::vector<std::shared_ptr<kp::Tensor>> params = { tensorA,
                                                                tensorB };

            std::vector<int32_t> spec({ -1, -2 });

            std::shared_ptr<kp::Algorithm> algo =
              mgr.algorithm(params, spirv, {}, spec, {});

            sq = mgr.sequence()
                   ->record<kp::OpTensorSyncDevice>(params)
                   ->record<kp::OpAlgoDispatch>(algo)
                   ->record<kp::OpTensorSyncLocal>(params)
                   ->eval();

            EXPECT_EQ(tensorA->vector(), std::vector<int32_t>({ -1, -1, -1 }));
            EXPECT_EQ(tensorB->vector(), std::vector<int32_t>({ -2, -2, -2 }));
        }
    }
}
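By analogy, specialization constants in Python are also provided as a single array of same-typed elements at algorithm-build time. The following sketch mirrors the C++ test above; as before, the shader-compilation helper is passed in as a parameter, and the algorithm argument ordering `(params, spirv, workgroup, spec_consts, push_consts)` is an assumption borrowed from the C++ signature rather than a documented guarantee, so consult the Kompute Python test suite for the authoritative form.

```python
import numpy as np

try:
    import kp  # Kompute Python bindings
    _HAVE_KOMPUTE = True
except ImportError:
    _HAVE_KOMPUTE = False

# A shader that fills the buffer with a value baked in at build time
# via a specialization constant.
SPEC_CONST_SHADER = """
#version 450
layout (constant_id = 0) const float cOne = 1.0;
layout (local_size_x = 1) in;
layout(set = 0, binding = 0) buffer a { float pa[]; };
void main() {
    pa[gl_GlobalInvocationID.x] = cOne;
}
"""

def spec_consts_example(compile_source):
    """Overrides the constant_id = 0 default with -1.0; returns the data,
    or None when Kompute or a Vulkan device is unavailable."""
    if not _HAVE_KOMPUTE:
        return None
    try:
        spirv = compile_source(SPEC_CONST_SHADER)
        mgr = kp.Manager()
        tensor = mgr.tensor(np.zeros(3, dtype=np.float32))
        # Spec constants: one array of same-typed elements, fixed unless
        # the algorithm is rebuilt (assumed argument ordering below).
        spec_consts = np.array([-1.0], dtype=np.float32)
        algo = mgr.algorithm([tensor], spirv, (3, 1, 1), spec_consts, [])
        (mgr.sequence()
            .record(kp.OpTensorSyncDevice([tensor]))
            .record(kp.OpAlgoDispatch(algo))
            .record(kp.OpTensorSyncLocal([tensor]))
            .eval())
        return tensor.data()
    except Exception:
        return None  # degrade gracefully where no Vulkan device exists
```

Changing the specialization value afterwards would require rebuilding the algorithm, in line with the constraints listed above.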