Thrust is an extremely powerful library for a wide variety of cuda-accelerated algorithms. However, Thrust is designed to work with vectors rather than pitched matrices. The following tutorial discusses how to wrap a cv::cuda::GpuMat into Thrust iterators that can be used with Thrust algorithms.
This tutorial will show you how to wrap a GpuMat in Thrust-compatible iterators, fill it with random numbers, sort a matrix channel in place, and run Thrust algorithms on a cv::cuda::Stream.
The following code produces such an iterator for a GpuMat:
/*
@Brief GpuMatBeginItr returns a thrust compatible iterator to the beginning of a GPU mat's memory.
@Param mat is the input matrix
@Param channel is the channel of the matrix that the iterator is accessing. If set to -1, the iterator will access every element in sequential order
*/
template<typename T>
thrust::permutation_iterator<thrust::device_ptr<T>, thrust::transform_iterator<step_functor<T>, thrust::counting_iterator<int>>> GpuMatBeginItr(cv::cuda::GpuMat mat, int channel = 0)
{
    if (channel == -1)
    {
        mat = mat.reshape(1);
        channel = 0;
    }
    CV_Assert(mat.depth() == cv::DataType<T>::depth);
    CV_Assert(channel < mat.channels());
    return thrust::make_permutation_iterator(thrust::device_pointer_cast(mat.ptr<T>(0) + channel),
        thrust::make_transform_iterator(thrust::make_counting_iterator(0), step_functor<T>(mat.cols, mat.step / sizeof(T), mat.channels())));
}
/*
@Brief GpuMatEndItr returns a thrust compatible iterator to the end of a GPU mat's memory.
@Param mat is the input matrix
@Param channel is the channel of the matrix that the iterator is accessing. If set to -1, the iterator will access every element in sequential order
*/
template<typename T>
thrust::permutation_iterator<thrust::device_ptr<T>, thrust::transform_iterator<step_functor<T>, thrust::counting_iterator<int>>> GpuMatEndItr(cv::cuda::GpuMat mat, int channel = 0)
{
    if (channel == -1)
    {
        mat = mat.reshape(1);
        channel = 0;
    }
    CV_Assert(mat.depth() == cv::DataType<T>::depth);
    CV_Assert(channel < mat.channels());
    return thrust::make_permutation_iterator(thrust::device_pointer_cast(mat.ptr<T>(0) + channel),
        thrust::make_transform_iterator(thrust::make_counting_iterator(mat.rows*mat.cols), step_functor<T>(mat.cols, mat.step / sizeof(T), mat.channels())));
}
Our goal is an iterator that starts at the beginning of the matrix and increments correctly so that it visits consecutive matrix elements. This is trivial for a continuous row, but what about a column of a pitched matrix? To do this the iterator needs to be aware of the matrix dimensions and step. This information is embedded in the step_functor.
template<typename T> struct step_functor : public thrust::unary_function<int, int>
{
    int columns;
    int step;
    int channels;
    __host__ __device__ step_functor(int columns_, int step_, int channels_ = 1) : columns(columns_), step(step_), channels(channels_) { };
    __host__ step_functor(cv::cuda::GpuMat& mat)
    {
        CV_Assert(mat.depth() == cv::DataType<T>::depth);
        columns = mat.cols;
        step = mat.step / sizeof(T);
        channels = mat.channels();
    }
    __host__ __device__
    int operator()(int x) const
    {
        int row = x / columns;
        int idx = (row * step) + (x % columns)*channels;
        return idx;
    }
};
The step functor takes in an index value and returns the appropriate offset from the beginning of the matrix. The counting iterator simply increments over the range of pixel elements. Combined into the transform_iterator we have an iterator that counts from 0 to M*N and increments correctly to account for the pitched memory layout of the GpuMat. Unfortunately this does not include any memory location information; for that we need a thrust::device_ptr. By combining a device pointer with the transform_iterator we can point Thrust at the first element of our matrix and have it step over the elements accordingly.
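As a concrete illustration (the numbers below are made up and not part of the tutorial code), consider a 2-channel matrix with 4 columns whose pitched row stride is 12 elements. The functor maps a linear element index to the channel-0 offset of that element inside the pitched buffer:
// Minimal sketch of the index mapping performed by step_functor.
// cols = 4, step = 12 elements, channels = 2 are hypothetical values.
step_functor<int> f(4, 12, 2);
// Element x = 5 sits at row 5 / 4 = 1, column 5 % 4 = 1, so its
// channel-0 offset is 1 * 12 + 1 * 2 = 14.
int offset = f(5); // returns 14
// GpuMatBeginItr then adds the requested channel to the base pointer,
// so the permutation iterator lands on the desired channel of that element.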
Now that we have some nice functions for making Thrust iterators, let's use them to do some things OpenCV can't do. Unfortunately, at the time of this writing, OpenCV doesn't have any GPU random number generation. Thankfully Thrust does, and interop between the two is now trivial. Example taken from http://stackoverflow.com/questions/12614164/generating-a-random-number-vector-between-0-and-1-0-using-thrust
First we need to write a functor that will produce our random values.
struct prg
{
    float a, b;
    __host__ __device__
    prg(float _a = 0.f, float _b = 1.f) : a(_a), b(_b) {};
    __host__ __device__
    float operator()(const unsigned int n) const
    {
        thrust::default_random_engine rng;
        thrust::uniform_real_distribution<float> dist(a, b);
        rng.discard(n);
        return dist(rng);
    }
};
This takes in an integer value and outputs a value between a and b. Now we will use a thrust transform to fill our matrix with values between -1 and 1.
{
    cv::cuda::GpuMat d_value(1, 100, CV_32F);
    auto valueBegin = GpuMatBeginItr<float>(d_value);
    auto valueEnd = GpuMatEndItr<float>(d_value);
    thrust::transform(thrust::make_counting_iterator(0), thrust::make_counting_iterator(d_value.cols), valueBegin, prg(-1, 1));
    cv::Mat h_value(d_value);
}
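As a quick sanity check (not part of the original example; it would have to live inside the same scope as h_value), the downloaded values can be inspected on the host, for instance to confirm they fall inside the requested range:
// Hypothetical follow-up: verify the generated values on the CPU.
double minVal = 0.0, maxVal = 0.0;
cv::minMaxLoc(h_value, &minVal, &maxVal);
std::cout << "generated range: [" << minVal << ", " << maxVal << "]" << std::endl;
// With prg(-1, 1) every value should lie inside [-1, 1].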
Next we will fill the matrix elements with random values and an index. Afterwards we will sort the random numbers and the indices together.
{
    cv::cuda::GpuMat d_data(1, 100, CV_32SC2);
    // Thrust compatible begin and end iterators to channel 1 of this matrix
    auto keyBegin = GpuMatBeginItr<int>(d_data, 1);
    auto keyEnd = GpuMatEndItr<int>(d_data, 1);
    // Thrust compatible begin and end iterators to channel 0 of this matrix
    auto idxBegin = GpuMatBeginItr<int>(d_data, 0);
    auto idxEnd = GpuMatEndItr<int>(d_data, 0);
    // Fill the index channel with a sequence of numbers from 0 to 100
    thrust::sequence(idxBegin, idxEnd);
    // Fill the key channel with random numbers between 0 and 10. A counting iterator is used here to give an integer value for each location as an input to prg::operator()
    thrust::transform(thrust::make_counting_iterator(0), thrust::make_counting_iterator(d_data.cols), keyBegin, prg(0, 10));
    // Sort the key channel and index channel such that the keys and indices stay together
    thrust::sort_by_key(keyBegin, keyEnd, idxBegin);
    cv::Mat h_idx(d_data);
}
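After the download each element of h_idx is a 2-channel int, with the original index in channel 0 and the sorted key in channel 1. A minimal sketch of walking the result on the host (hypothetical, and it would have to sit inside the same scope as h_idx):
// Print each sorted key together with the position it originally came from.
for (int i = 0; i < h_idx.cols; ++i)
{
    cv::Vec2i v = h_idx.at<cv::Vec2i>(0, i);
    std::cout << "key " << v[1] << " came from index " << v[0] << std::endl;
}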
In this example we will see how to use cv::cuda::Stream. Unfortunately this particular example uses functions that must return results to the CPU, so it isn't the optimal use of streams.
{
    cv::cuda::GpuMat d_value(1, 100, CV_32F);
    auto valueBegin = GpuMatBeginItr<float>(d_value);
    auto valueEnd = GpuMatEndItr<float>(d_value);
    cv::cuda::Stream stream;
    //! [random_gen_stream]
    // Same as the random generation code from before except now the transformation is being performed on a stream
    thrust::transform(thrust::system::cuda::par.on(cv::cuda::StreamAccessor::getStream(stream)), thrust::make_counting_iterator(0), thrust::make_counting_iterator(d_value.cols), valueBegin, prg(-1, 1));
    //! [random_gen_stream]
    // Count the number of values we are going to copy
    int count = thrust::count_if(thrust::system::cuda::par.on(cv::cuda::StreamAccessor::getStream(stream)), valueBegin, valueEnd, pred_greater<float>(0.0));
    // Allocate a destination for copied values
    cv::cuda::GpuMat d_valueGreater(1, count, CV_32F);
    // Copy values that satisfy the predicate.
    thrust::copy_if(thrust::system::cuda::par.on(cv::cuda::StreamAccessor::getStream(stream)), valueBegin, valueEnd, GpuMatBeginItr<float>(d_valueGreater), pred_greater<float>(0.0));
    cv::Mat h_greater(d_valueGreater);
}
First we will populate a GpuMat with randomly generated data between -1 and 1 on a stream.
// Same as the random generation code from before except now the transformation is being performed on a stream
thrust::transform(thrust::system::cuda::par.on(cv::cuda::StreamAccessor::getStream(stream)), thrust::make_counting_iterator(0), thrust::make_counting_iterator(d_value.cols), valueBegin, prg(-1, 1));
Note the use of thrust::system::cuda::par.on(...); this creates an execution policy for executing Thrust code on a stream. There is a bug in the version of Thrust distributed with the CUDA toolkit that, as of version 7.5, has not been fixed. This bug causes code to not execute on streams. The bug can however be fixed by using the newest version of Thrust from the git repository (http://github.com/thrust/thrust.git). Next we will determine how many values are greater than 0 by using thrust::count_if with the following predicate:
template<typename T> struct pred_greater
{
    T value;
    __host__ __device__ pred_greater(T value_) : value(value_) {}
    __host__ __device__ bool operator()(const T& val) const
    {
        return val > value;
    }
};
We will use those results to create an output buffer for storing the copied values; we then use copy_if with the same predicate to fill that buffer. Lastly we download the values into a CPU mat for viewing.
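A small hypothetical check (again inside the same scope as h_greater) that every copied value does satisfy the predicate:
// Every value copied with pred_greater<float>(0.0) should be greater than 0.
bool allPositive = true;
for (int i = 0; i < h_greater.cols; ++i)
    allPositive = allPositive && (h_greater.at<float>(0, i) > 0.0f);
std::cout << "all copied values > 0: " << std::boolalpha << allPositive << std::endl;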