# Computational modeling for parallel grid-based recursive Bayesian estimation: parallel computation using graphics processing unit

- Xianqiao Tong^{1} (Email author)
- Tomonari Furukawa^{1}
- Hugh Durrant-Whyte^{2}

**1**:15

https://doi.org/10.1186/2195-5468-1-15

© Tong et al.; licensee Springer. 2013

**Received: **29 August 2013

**Accepted: **11 November 2013

**Published: **16 December 2013

## Abstract

This paper presents performance modeling of real-time grid-based recursive Bayesian estimation (RBE), particularly its parallel computation using a graphics processing unit (GPU). The proposed modeling formulates the data transmission between the central processing unit (CPU) and the GPU as well as the floating point operations to be carried out on the CPU and the GPU for one iteration of the real-time grid-based RBE. Given the specifications of the computer hardware, the proposed modeling can thus estimate the total time cost of performing the grid-based RBE in a real-time environment. A new prediction formulation, which adopts separable convolution, is proposed to further accelerate the real-time grid-based RBE. The performance of the proposed modeling was investigated, and parametric studies have first demonstrated its validity in various conditions by showing that the average error of estimation in computational performance stays below 6% to 7%. Utilizing the prediction with separable convolution, the grid-based RBE has also been found to perform within 1 ms, although the size of the problem was relatively large.

### Keywords

RBE, Bayesian, GPU, Real-time, Grid-base, Parallel

## Introduction

Recursive Bayesian estimation (RBE) estimates the belief of a dynamically moving target by updating the belief both in time and in observation [1]. The RBE consists of two fundamental processes: prediction and correction. The prediction process updates the belief through the motion model of the target, whereas the correction process updates the belief through the current observation. If the target is observable, the accuracy of the RBE can be maintained by the correction process using valid observations. When the target is not observable, the accuracy of the RBE relies heavily on the prediction process, and the error accumulates because no valid observation is available for the correction process. For accurate estimation, the RBE has to be performed fast enough to follow the motion of the target with a well-defined target motion model, which requires good synchronization between the discrete representation of the model and the RBE. As a result, recent years have seen many real-time enhanced RBE techniques that help improve the speed of the RBE.

One such technique is the modified ensemble Kalman filter (EnKF). The EnKF allows non-Gaussian estimation by minimizing a cost function defined by a non-Gaussian observation error with a pre-conditioned conjugate gradient method [2]. The Langevin-Markov chain Monte Carlo (MCMC) method, which represents the non-Gaussian belief by sampling it using a Markov chain and the Langevin equation, is another non-Gaussian RBE technique [3]. Another sampling method is the interactive particle filter (IPF), which is able to flexibly mitigate the belief space complexity [4]. An ensemble Kalman-particle predictor-corrector filter is a hybrid method that combines the advantages of the EnKF and the IPF and is able to deal effectively with high-dimensional non-Gaussian problems [5]. A tree-based estimator approximates the posterior belief distribution at multiple resolutions to be effective for high-dimensional problems [6], whereas the maximum likelihood state estimation method can also achieve non-Gaussian RBE by using a finite Gaussian mixture model [7].

The grid-based RBE technique is able to maintain good accuracy for the belief since the entire target space is spatially discretized [8]. The accuracy is obtained by a fine discretization of the target space, which at the same time leads to inefficient computation. Furukawa et al. [9, 10] refined the grid-based RBE by developing a more general element-based RBE. The generalized element can accurately represent an arbitrary target space with only a small number of elements compared with the grid-based RBE, which reduces the computation of the RBE. Lavis et al. proposed an enhanced grid-based RBE that allows the update of not only the belief but also the target space [11]. Because of the dynamic adjustment of the target space, the computation of the RBE is further reduced. Further, the parallel grid-based RBE has been proposed; it significantly accelerated the computation of the RBE and made its real-time implementation possible by utilizing the GPU’s strong parallel computational capability [12]. Although these efforts successfully reduce the computation of the RBE, its accuracy is not well maintained when the prediction process dominates the RBE during the no-observation period. The time cost of one iteration of the RBE becomes critical for overcoming this issue because the RBE can maintain its accuracy during the no-observation period only if this time cost matches the time increment of the discrete target motion model.

This paper presents a performance modeling for the parallel grid-based RBE, particularly the parallel computation using the GPU, and it is able to determine the time cost of one iteration of the RBE. The proposed modeling formulates the total amount of data transmission between the CPU and the GPU and the total number of floating point operations to be carried out in each CPU and GPU necessary for one iteration of the parallel grid-based RBE. Given the specifications of the computer hardware, it is thus possible to estimate the time cost for one iteration of the parallel grid-based RBE. In order to perform the parallel grid-based RBE at maximum speed, the proposed modeling also reformulates and implements the prediction process with separable convolution.

The paper is organized as follows. The following section reviews the recursive Bayesian estimation as well as the parallel grid-based RBE. The section on computational performance modeling presents the proposed reformulation of the prediction process for the parallel grid-based RBE and its computational performance modeling. The section on experimental validation demonstrates the validity and efficacy of the proposed modeling through numerical examples, and the conclusion and future work are summarized in the final section.

## Parallel grid-based RBE

### Problem statement

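The motion of an object, *o*, is deterministically given by the following equation:

$$\dot{\mathbf{x}}^{o} = \mathbf{f}^{o}\left(\mathbf{x}^{o}, \mathbf{u}^{o}, \mathbf{w}^{o}, t\right) \tag{1}$$

where $\mathbf{x}^{o}$ represents the state of the object, $\mathbf{u}^{o}$ represents the object control input, $\mathbf{w}^{o}$ represents the system noise, which includes environmental influences on the target, and *t* represents the time. In general, the state of the object describes its two-dimensional location but may also include other variables such as velocity. Let the time interval between the consecutive time steps be defined as *Δ* *t*. By integrating Equation (1), the state of the object at the time step *k* is given by

$$\mathbf{x}_{k}^{o} = \mathbf{x}_{k-1}^{o} + \int_{t_{k-1}}^{t_{k-1}+\Delta t} \mathbf{f}^{o}\left(\mathbf{x}^{o}, \mathbf{u}^{o}, \mathbf{w}^{o}, t\right)\,dt \tag{2}$$

where *t*_{k−1} is the time which corresponds to the time step *k*−1.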

### Recursive Bayesian estimation

#### Prediction

Let the time interval *Δ* *t* between the consecutive time steps be divided into *n* subintervals. The state of the object at the time step *k* is then given by Equation (3).

Let the sequence of observations from the time step 1 to the time step *k*−1 be defined as ${}^{s}\tilde{\mathbf{z}}_{1:k-1} \equiv \{{}^{s}\tilde{\mathbf{z}}_{i} \mid \forall i \in \{1,\dots,k-1\}\}$. Notice here that $\tilde{(\cdot)}$ represents an instance of variable (·). The prediction process computes the belief of the current state $p\left(\mathbf{x}_{k}^{o} \mid {}^{s}\tilde{\mathbf{z}}_{1:k-1}\right)$ from the belief in the previous time step $p\left(\mathbf{x}_{k-1}^{o} \mid {}^{s}\tilde{\mathbf{z}}_{1:k-1}\right)$. The prediction is carried out by the Chapman-Kolmogorov equation and given by

$$p\left(\mathbf{x}_{k}^{o} \mid {}^{s}\tilde{\mathbf{z}}_{1:k-1}\right) = \int p\left(\mathbf{x}_{k}^{o} \mid \mathbf{x}_{k-1}^{o}\right)\, p\left(\mathbf{x}_{k-1}^{o} \mid {}^{s}\tilde{\mathbf{z}}_{1:k-1}\right)\, d\mathbf{x}_{k-1}^{o} \tag{4}$$

where $p\left(\mathbf{x}_{k}^{o} \mid \mathbf{x}_{k-1}^{o}\right)$ is the probabilistic representation of the object motion model defined in Equation (3), which maps the probability of transition from the previous state $\mathbf{x}_{k-1}^{o}$ to the current state $\mathbf{x}_{k}^{o}$. The prediction process at *k*=1 is carried out by letting $p\left(\mathbf{x}_{k-1}^{o} \mid {}^{s}\tilde{\mathbf{z}}_{1:k-1}\right) = p\left(\tilde{\mathbf{x}}_{0}^{o}\right)$, where $p\left(\tilde{\mathbf{x}}_{0}^{o}\right)$ is defined as a prior belief of the object in terms of the probability density function. Equation (4) indicates that the performance of the prediction process relies on the object motion model $p\left(\mathbf{x}_{k}^{o} \mid \mathbf{x}_{k-1}^{o}\right)$. Because the object motion model is usually non-Gaussian, the belief could eventually become heavily non-Gaussian when only the prediction process is applied in the RBE.

#### Correction

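The observation ${}^{s}\mathbf{z}_{k}$ at the time step *k* is given by Equation (6), where $\mathbf{v}_{k}^{s}$ represents the observation noise at the time step *k*, and $\varnothing$ represents an empty element, indicating that the observation contained no information on the object or that the target is unobservable when it is not within the observable region.

The correction process computes the belief $p\left(\mathbf{x}_{k}^{o} \mid {}^{s}\tilde{\mathbf{z}}_{1:k}\right)$ from the predicted belief and the current observation by Bayes' rule:

$$p\left(\mathbf{x}_{k}^{o} \mid {}^{s}\tilde{\mathbf{z}}_{1:k}\right) = \frac{l\left(\mathbf{x}_{k}^{o} \mid {}^{s}\tilde{\mathbf{z}}_{k}\right)\, p\left(\mathbf{x}_{k}^{o} \mid {}^{s}\tilde{\mathbf{z}}_{1:k-1}\right)}{\int l\left(\mathbf{x}_{k}^{o} \mid {}^{s}\tilde{\mathbf{z}}_{k}\right)\, p\left(\mathbf{x}_{k}^{o} \mid {}^{s}\tilde{\mathbf{z}}_{1:k-1}\right)\, d\mathbf{x}_{k}^{o}}$$

where $l\left(\mathbf{x}_{k}^{o} \mid {}^{s}\tilde{\mathbf{z}}_{k}\right)$ is the observation likelihood, the probabilistic representation of the observation model defined in Equation (6). When the object is within the observable region, a positive observation is obtained and the observation likelihood is a probability density function given the current observation of the object. When the object is out of the observable region, the negative observation is defined with respect to the probability of detection (PoD) as the observation likelihood. Because the observation likelihood of the negative observation is non-Gaussian, the object belief would immediately become heavily non-Gaussian when a negative observation occurs in the RBE.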

### Parallel grid-based RBE

#### Representation of target space and belief

Let a point in the two-dimensional space be denoted by **m** = [*x*, *y*], and let the target space $\mathcal{X}^{t}$ be bounded by a rectangular space $\mathcal{X}^{r}$ that spans $[x_{min}^{t}, x_{max}^{t}]$ and $[y_{min}^{t}, y_{max}^{t}]$ in the two directions. The grid space is further introduced by discretizing the rectangular space by *n*_{x} and *n*_{y} grid cells in the two directions, respectively. The dimensions of a grid cell are defined as $\Delta x^{r} = (x_{max}^{t} - x_{min}^{t})/n_{x}$ and $\Delta y^{r} = (y_{max}^{t} - y_{min}^{t})/n_{y}$. This results in introducing the center of each grid cell as

$$\bar{\mathbf{x}}_{i_x,i_y}^{r} = \left[x_{min}^{t} + \left(i_x - \tfrac{1}{2}\right)\Delta x^{r},\; y_{min}^{t} + \left(i_y - \tfrac{1}{2}\right)\Delta y^{r}\right]$$

∀*i*_{x}∈{1,…,*n*_{x}} and ∀*i*_{y}∈{1,…,*n*_{y}}. Each grid cell $\mathcal{X}_{i_x,i_y}^{r}$ is defined as the rectangle of dimensions $\Delta x^{r} \times \Delta y^{r}$ centered at $\bar{\mathbf{x}}_{i_x,i_y}^{r}$.

Note that $\bigcup_{i_x=1}^{n_x}\bigcup_{i_y=1}^{n_y}\mathcal{X}_{i_x,i_y}^{r} = \mathcal{X}^{r}$ and $\bigcap_{i_x=1}^{n_x}\bigcap_{i_y=1}^{n_y}\mathcal{X}_{i_x,i_y}^{r} = \varnothing$. Finally, the grid cells that represent the target space are selected by taking a grid cell when its center is located in the target space, $\mathcal{X}_{i_x,i_y}^{r} \subset \mathcal{X}^{t}$ if $\bar{\mathbf{x}}_{i_x,i_y}^{r} \in \mathcal{X}^{t}$. The approximate target space derived by the processes described above is $\mathcal{X}^{t} \approx \{\mathcal{X}_{1}^{r}, \mathcal{X}_{2}^{r}, \dots, \mathcal{X}_{n_g}^{r}\}$, where *n*_{g} is the number of grid cells approximating the target space.

The belief is usually represented by a probability density function over the target space. Similar to the discretization of the target space, the belief could also be represented discretely by grid cells. The position of each grid cell can be described in the two-dimensional integer space as [*i*_{x}, *i*_{y}], where *i*_{x}∈{1,…,*n*_{x}} and *i*_{y}∈{1,…,*n*_{y}}. With the integer representation, the belief at the grid cell [*i*_{x}, *i*_{y}] can be represented as $p^{i_x,i_y}(\cdot)$.
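The discretization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `grid_cell_centers` and its argument names are hypothetical, and the cell center is taken to be the midpoint of each cell, which follows from the cell dimensions but is not quoted from the paper.

```python
def grid_cell_centers(x_min, x_max, y_min, y_max, n_x, n_y):
    """Return cell dimensions and the center coordinates along each axis.

    Assumption: cell i (1-based) spans [min + (i - 1) * d, min + i * d],
    so its center lies at min + (i - 1/2) * d.
    """
    dx = (x_max - x_min) / n_x  # cell dimension in the x direction
    dy = (y_max - y_min) / n_y  # cell dimension in the y direction
    xs = [x_min + (i - 0.5) * dx for i in range(1, n_x + 1)]
    ys = [y_min + (j - 0.5) * dy for j in range(1, n_y + 1)]
    return dx, dy, xs, ys


if __name__ == "__main__":
    dx, dy, xs, ys = grid_cell_centers(0.0, 10.0, 0.0, 4.0, 5, 2)
    print(dx, dy)  # cell dimensions
    print(xs, ys)  # center coordinates per axis
```

The center of the grid cell [*i*_{x}, *i*_{y}] is then the pair `(xs[i_x - 1], ys[i_y - 1])`, and a cell would be kept only if that center lies inside the target space.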

#### Prediction

Given the belief of the previous state at each grid cell [*i*_{x}, *i*_{y}] and the target motion model $p^{I_x,I_y}\left(\mathbf{x}_{k}^{t} \mid \mathbf{x}_{k-1}^{t}\right)$ constructed in the matrix of size *I*_{x}×*I*_{y} as a convolution kernel, the predicted belief of the current state can be numerically computed by the discrete two-dimensional convolution of Equation (12).

The parallelization of the prediction process is straightforward. Since the prediction at each grid cell, given by Equation (12), can be performed independently, the parallelization of the prediction corresponds to the parallelization of the equation and achieves a parallel efficiency of 100% in an ideal environment. However, this equation also shows that the computation for the prediction process is largely dominated by the size of the convolution kernel. For real-time performance, it is important to use a convolution kernel of an appropriate size, large enough to capture the motion of the target yet small enough for fast computation.
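The grid-wise prediction can be illustrated with a minimal pure-Python sketch of a discrete two-dimensional convolution. It is not the authors' GPU implementation: the function name `predict_full`, the centered kernel indexing, and the zero padding at the border are assumptions made for illustration, and the kernel entries are assumed to sum to one.

```python
def predict_full(belief, kernel):
    """Predict the belief by a direct 2D convolution with the motion kernel.

    belief: n_x by n_y list of lists; kernel: I_x by I_y list of lists.
    Assumptions: the kernel is centered on each cell and cells outside
    the grid contribute zero.
    """
    n_x, n_y = len(belief), len(belief[0])
    k_x, k_y = len(kernel), len(kernel[0])
    c_x, c_y = k_x // 2, k_y // 2
    out = [[0.0] * n_y for _ in range(n_x)]
    for ix in range(n_x):          # every (ix, iy) is independent,
        for iy in range(n_y):      # which is what makes the step parallel
            acc = 0.0
            for a in range(k_x):
                for b in range(k_y):
                    jx = ix - (a - c_x)
                    jy = iy - (b - c_y)
                    if 0 <= jx < n_x and 0 <= jy < n_y:
                        # one multiplication and one summation per kernel entry
                        acc += kernel[a][b] * belief[jx][jy]
            out[ix][iy] = acc
    return out


if __name__ == "__main__":
    belief = [[0.0] * 5 for _ in range(5)]
    belief[2][2] = 1.0                           # all mass in the central cell
    kernel = [[1.0 / 9.0] * 3 for _ in range(3)]  # uniform 3 by 3 motion kernel
    for row in predict_full(belief, kernel):
        print(row)
```

The two inner loops make the cost per grid cell proportional to the number of kernel entries, which is why the kernel size dominates the prediction time.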

#### Correction

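Given the predicted belief and the observation likelihood at each grid cell [*i*_{x}, *i*_{y}], the corrected belief is computed by

$$p^{i_x,i_y}\left(\mathbf{x}_{k}^{t} \mid {}^{s}\tilde{\mathbf{z}}_{1:k}\right) = \frac{q^{i_x,i_y}\left(\mathbf{x}_{k}^{t} \mid {}^{s}\tilde{\mathbf{z}}_{1:k}\right)}{A_{\mathrm{c}}\sum_{\alpha=1}^{n_g} q^{\alpha}\left(\mathbf{x}_{k}^{t} \mid {}^{s}\tilde{\mathbf{z}}_{1:k}\right)}$$

where *A*_{c} is the area of a grid cell, and $q^{i_x,i_y}\left(\mathbf{x}_{k}^{t} \mid {}^{s}\tilde{\mathbf{z}}_{1:k}\right)$ is the product of the predicted belief and the observation likelihood at the grid cell. The correction process breaks down into the following three steps:

1. Calculate $q^{i_x,i_y}\left(\mathbf{x}_{k}^{t} \mid {}^{s}\tilde{\mathbf{z}}_{1:k}\right)$ by multiplying the predicted belief $p^{i_x,i_y}\left(\mathbf{x}_{k}^{t} \mid {}^{s}\tilde{\mathbf{z}}_{1:k-1}\right)$ with the observation likelihood $l^{i_x,i_y}\left(\mathbf{x}_{k}^{t} \mid {}^{s}\tilde{\mathbf{z}}_{k}\right)$;
2. Sum $\sum_{\alpha=1}^{n_g} q^{\alpha}\left(\mathbf{x}_{k}^{t} \mid {}^{s}\tilde{\mathbf{z}}_{1:k}\right)$ and multiply the sum by *A*_{c};
3. Calculate $p^{i_x,i_y}\left(\mathbf{x}_{k}^{t} \mid {}^{s}\tilde{\mathbf{z}}_{1:k}\right)$ by dividing $q^{i_x,i_y}\left(\mathbf{x}_{k}^{t} \mid {}^{s}\tilde{\mathbf{z}}_{1:k}\right)$ by $A_{\mathrm{c}}\sum_{\alpha=1}^{n_g} q^{\alpha}\left(\mathbf{x}_{k}^{t} \mid {}^{s}\tilde{\mathbf{z}}_{1:k}\right)$.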

The breakdown indicates that steps 1 and 3 are grid-wise sub-processes, which can be conducted independently. Therefore, for the correction process, steps 1 and 3 can be computed in parallel, whereas step 2 is not parallelizable.
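The three correction steps can be written out as a short sequential Python sketch. The function name `correct` and the error handling are illustrative assumptions; in a parallel implementation, steps 1 and 3 would each run as one operation per grid cell.

```python
def correct(predicted, likelihood, cell_area):
    """Apply the three-step grid-based correction to a predicted belief.

    predicted and likelihood are grids (lists of lists) of equal shape;
    cell_area is the area of one grid cell.
    """
    # Step 1: grid-wise product of predicted belief and likelihood.
    q = [[p * l for p, l in zip(p_row, l_row)]
         for p_row, l_row in zip(predicted, likelihood)]
    # Step 2: sum over all grid cells, multiplied by the cell area.
    norm = cell_area * sum(sum(row) for row in q)
    if norm == 0.0:
        raise ValueError("likelihood and predicted belief do not overlap")
    # Step 3: grid-wise division by the normalizing value.
    return [[value / norm for value in row] for row in q]


if __name__ == "__main__":
    predicted = [[0.25, 0.25], [0.25, 0.25]]
    likelihood = [[1.0, 0.0], [1.0, 2.0]]
    print(correct(predicted, likelihood, cell_area=0.5))
```

After the division, the corrected belief integrates to one over the grid, that is, the cell area times the sum of all entries equals one.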

### Target state evaluation

To evaluate the target state accurately, an appropriate choice of the time interval *Δ* *t* is necessary. Given a specific computer hardware configuration, each iteration of the parallel grid-based RBE requires a certain amount of time *Δ* *t*_{c} to perform the computation, including both the prediction and correction processes. In order to achieve an accurate evaluation of the target state, the time interval *Δ* *t* needs to be chosen such that it matches *Δ* *t*_{c}. As shown in Figure 1, only when *Δ* *t* is identical to *Δ* *t*_{c} can the evaluated target states match the real target states. When *Δ* *t* is smaller or larger than *Δ* *t*_{c}, the evaluation of the target states fails and eventually leads to large accumulated errors. *Δ* *t*_{c} is determined not only by the parallel grid-based RBE itself but also by its computational performance on the specific computer hardware configuration.
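The accumulation of error under a mismatched time interval can be illustrated with a deliberately simple sketch. It assumes a hypothetical one-dimensional target moving at constant speed with no noise, which is far simpler than the motion models considered in the paper; the function name and arguments are illustrative only.

```python
def accumulated_position_error(speed, dt_model, dt_compute, iterations):
    """Gap between the modeled and the real position after some iterations.

    Each iteration advances the motion model by dt_model, while the real
    target keeps moving for the dt_compute that the iteration actually took.
    """
    predicted = 0.0
    real = 0.0
    for _ in range(iterations):
        predicted += speed * dt_model   # displacement assumed by the model
        real += speed * dt_compute      # displacement that really occurred
    return abs(predicted - real)


if __name__ == "__main__":
    # Matched intervals: no drift.
    print(accumulated_position_error(0.1, 0.032, 0.032, 1000))
    # Mismatched intervals: the gap grows linearly with the iteration count.
    print(accumulated_position_error(0.1, 0.02, 0.032, 1000))
```

In this simplified setting the gap grows in proportion to the number of iterations and to the difference between the two intervals, which is the accumulation the section describes.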

## Computational performance modeling

### Acceleration of prediction process

A convolution kernel of size *I*_{x}×*I*_{y} can be separated into two vector kernels by means of separable convolution: a column kernel of length *I*_{x} and a row kernel of length *I*_{y}. Therefore, the target motion model matrix is separated as in Equation (15), and the number of kernel elements reduces from *I*_{x}*I*_{y} to *I*_{x}+*I*_{y}. Substituting Equation (15) into Equation (11), the predicted belief of the current state can be computed by two successive one-dimensional convolutions, one with each vector kernel.

For each grid cell, the number of floating point operations for the convolution with the column kernel is 2*I*_{x} since one multiplication and one summation are each necessary *I*_{x} times, whereas the number of floating point operations for Equation (18), the convolution with the row kernel, is 2*I*_{y} by the same observation. Having a total of *n*_{g} grid cells, the total number of floating point operations for the prediction process is thus given by

$$2 n_{g} \left(I_{x} + I_{y}\right) \tag{19}$$

This is considerably smaller than that of the original formulation, which is derived as 2*n*_{g}*I*_{x}*I*_{y} via Equation (12), since *I*_{x}+*I*_{y}≪*I*_{x}*I*_{y} for an appropriate prediction process.
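The saving can be sketched in Python with two one-dimensional passes. This is an illustration rather than the paper's implementation: it assumes the motion kernel is the outer product of a column kernel and a row kernel, uses zero padding at the border, and all function names are hypothetical.

```python
def convolve_axis(grid, vector_kernel, axis):
    """One-dimensional convolution of a grid along axis 0 (x) or axis 1 (y)."""
    n_x, n_y = len(grid), len(grid[0])
    center = len(vector_kernel) // 2
    out = [[0.0] * n_y for _ in range(n_x)]
    for ix in range(n_x):
        for iy in range(n_y):
            acc = 0.0
            for a, weight in enumerate(vector_kernel):
                jx = ix - (a - center) if axis == 0 else ix
                jy = iy - (a - center) if axis == 1 else iy
                if 0 <= jx < n_x and 0 <= jy < n_y:
                    acc += weight * grid[jx][jy]
            out[ix][iy] = acc
    return out


def predict_separable(belief, column_kernel, row_kernel):
    """Prediction as a column pass followed by a row pass."""
    return convolve_axis(convolve_axis(belief, column_kernel, 0), row_kernel, 1)


def flops_full(n_g, i_x, i_y):
    return 2 * n_g * i_x * i_y        # one multiply and one add per kernel entry


def flops_separable(n_g, i_x, i_y):
    return 2 * n_g * (i_x + i_y)      # two one-dimensional passes


if __name__ == "__main__":
    belief = [[0.0] * 5 for _ in range(5)]
    belief[2][2] = 1.0
    out = predict_separable(belief, [0.25, 0.5, 0.25], [0.25, 0.5, 0.25])
    print(out[2][2], out[2][1], out[1][1])
    print(flops_full(10**6, 32, 32) // flops_separable(10**6, 32, 32))
```

For a 32×32 kernel the ratio of the two operation counts is 16, independent of the number of grid cells.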

### Parallel computation using GPU

### Modeling of computational performance

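The time cost of one iteration of the accelerated parallel grid-based RBE, *Δ* *t*_{c}, is modeled as

$$\Delta t_{\mathrm{c}} = \Delta t_{\mathrm{trans}} + \Delta t_{\mathrm{G}} + \Delta t_{\mathrm{C}}$$

where *Δ* *t*_{trans} represents the data transmission time cost between the CPU’s memory and the GPU’s global memory as well as that between the local and the global memory inside the GPU, *Δ* *t*_{G} represents the time cost of the parallel computation performed on the GPU, and *Δ* *t*_{C} represents the time cost of the computation performed on the CPU.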

#### Data transmission

In order to estimate the data transmission time cost *Δ* *t*_{trans} for one iteration of the accelerated parallel grid-based RBE, the data transmitted among the CPU’s memory, GPU’s global memory, and GPU’s local memory need to be evaluated in both the prediction and correction processes. Let the amount of data transmitted in the unit of bytes be defined as the product *P* *N*, where *P* is the precision of the numerical representation, and *N* is defined as the number of data transmitted. Since the precision is usually constant, the amount of data transmitted can be derived in terms of the number of data transmitted.

The numbers of data of the belief and the target motion model for the prediction process are *n*_{g} and *I*_{x}+*I*_{y}, respectively. The same numbers of data, *n*_{g} and *I*_{x}+*I*_{y}, are transmitted to the GPU’s local memory to perform parallel calculation. In the correction process, the number of data of the likelihood to be transmitted from the CPU’s memory to the GPU’s local memory through the GPU’s global memory is *n*_{g}, whereas the number of data of the result $q\left(\mathbf{x}_{k}^{t} \mid {}^{s}\tilde{\mathbf{z}}_{1:k}\right)$ to be transmitted from the GPU’s local memory to the CPU’s memory through the GPU’s global memory is similarly *n*_{g}. The number of data of the sum, $A_{\mathrm{c}}\sum_{\alpha=1}^{n_g} q^{\alpha}\left(\mathbf{x}_{k}^{t} \mid {}^{s}\tilde{\mathbf{z}}_{1:k}\right)$, to be then transmitted to the GPU’s local memory to perform parallel divisions is 1, and finally, the number of data to be transmitted back to the CPU’s memory for the next RBE is *n*_{g}.

The data transmission time cost *Δ* *t*_{trans} for one iteration of the accelerated parallel grid-based RBE is given by

$$\Delta t_{\mathrm{trans}} = P\left(\frac{N_{\mathrm{CG}}}{B_{\mathrm{CG}}} + \frac{N_{\mathrm{GC}}}{B_{\mathrm{GC}}} + \frac{N_{\mathrm{GG}}}{B_{\mathrm{GG}}}\right)$$

where *P* is the precision of the numerical representation in bytes, *N*_{CG} and *B*_{CG} are the total number of data transmitted and the copy bandwidth in bytes per second from the CPU’s memory to the GPU’s global memory, respectively, *N*_{GC} and *B*_{GC} are those from the GPU’s global memory to the CPU’s memory, respectively, and *N*_{GG} and *B*_{GG} represent those between the GPU’s global memory and the GPU’s local memory. Because the copy bandwidth from the GPU’s global memory to the GPU’s local memory is the same as that in the opposite direction, the data transmitted inside the GPU in both directions are counted together in *N*_{GG}.

It is to be noted here that the copy bandwidths are inherent to a specific computer hardware configuration and can be determined experimentally.
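A rough Python sketch of the transmission model is given below. Several points are assumptions rather than statements from the paper: the counts are tallied from one reading of the preceding paragraphs, the count inside the GPU is taken as the sum of the two CPU-GPU counts, and the transmission time is taken as bytes divided by bandwidth, summed over the three paths. The function names and the bandwidth figures in the usage example are hypothetical.

```python
def transfer_counts(n_g, i_x, i_y):
    """Tally the number of values moved in one RBE iteration."""
    n_cg = n_g + (i_x + i_y)  # prediction: belief and separated motion kernels
    n_cg += n_g               # correction: observation likelihood
    n_cg += 1                 # correction: normalizing sum for the divisions
    n_gc = n_g                # correction: unnormalized result q
    n_gc += n_g               # corrected belief returned for the next iteration
    n_gg = n_cg + n_gc        # assumption: each value crosses global/local once
    return n_cg, n_gc, n_gg


def transmission_time(counts, bytes_per_value, b_cg, b_gc, b_gg):
    """Transmission time in seconds; bandwidths are in bytes per second."""
    n_cg, n_gc, n_gg = counts
    return bytes_per_value * (n_cg / b_cg + n_gc / b_gc + n_gg / b_gg)


if __name__ == "__main__":
    counts = transfer_counts(n_g=1_000_000, i_x=32, i_y=32)
    # Hypothetical bandwidths; real values would be measured on the hardware.
    print(counts)
    print(transmission_time(counts, 4, b_cg=3.0e9, b_gc=3.0e9, b_gg=2.0e10))
```

Because the bandwidths are hardware specific, such a function is only useful once they have been measured for the configuration at hand.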

#### Floating point operations

In order to estimate the GPU computation time cost *Δ* *t*_{G} and the CPU computation time cost *Δ* *t*_{C} for one iteration of the accelerated parallel grid-based RBE, the number of floating point operations performed on both the CPU and the GPU needs to be evaluated. The number of floating point operations performed on the GPU for the prediction process is 2*n*_{g}(*I*_{x}+*I*_{y}), as Equation (19) indicates. The number of floating point operations performed on the GPU for the correction process is 2*n*_{g} in total since *n*_{g} parallel multiplications and *n*_{g} parallel divisions are performed for steps 1 and 3 of the correction process, respectively. Meanwhile, the number of floating point operations performed on the CPU is *n*_{g}, owing to the *n*_{g} summations in step 2 of the correction process. As a consequence, the total numbers of floating point operations performed on the GPU and the CPU for one iteration of the accelerated parallel grid-based RBE are given, respectively, by

$$N_{\mathrm{G}} = 2 n_{g}\left(I_{x} + I_{y}\right) + 2 n_{g} = 2 n_{g}\left(I_{x} + I_{y} + 1\right), \qquad N_{\mathrm{C}} = n_{g}$$

The GPU computation time cost is the number of operations divided by the computational rate,

$$\Delta t_{\mathrm{G}} = \frac{N_{\mathrm{G}}}{V_{\mathrm{G}}}$$

where *N*_{G} is the number of floating point operations performed on the GPU, and *V*_{G} is the computational rate of the GPU in FLOPS. Substituting the GPU operation count into this expression, the GPU computation time cost is given by

$$\Delta t_{\mathrm{G}} = \frac{2 n_{g}\left(I_{x} + I_{y} + 1\right)}{V_{\mathrm{G}}}$$

Similarly, the CPU computation time cost is

$$\Delta t_{\mathrm{C}} = \frac{N_{\mathrm{C}}}{V_{\mathrm{C}}}$$

where *N*_{C} represents the number of floating point operations performed on the CPU, and *V*_{C} is the computational rate of the CPU in FLOPS. In the same way, by substituting the CPU operation count, the CPU computation time cost is given by

$$\Delta t_{\mathrm{C}} = \frac{n_{g}}{V_{\mathrm{C}}}$$

It is to be noted here that the computational rates, *V*_{G} and *V*_{C}, are also inherent to a specific CPU and GPU configuration and can be determined experimentally.
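The operation counts and the resulting time estimate can be collected in a short Python sketch. The function names are hypothetical, the computational rates in the usage example are placeholders rather than measured values, and the total is assumed to be the plain sum of the transmission, GPU, and CPU components.

```python
def flop_counts(n_g, i_x, i_y):
    """Floating point operations per iteration on the GPU and the CPU."""
    n_gpu = 2 * n_g * (i_x + i_y)  # prediction with separable convolution
    n_gpu += 2 * n_g               # correction steps 1 and 3
    n_cpu = n_g                    # correction step 2 (summation)
    return n_gpu, n_cpu


def iteration_time(n_g, i_x, i_y, v_gpu, v_cpu, dt_trans):
    """Return (total, GPU, CPU) time cost in seconds for one iteration.

    v_gpu and v_cpu are computational rates in FLOPS; dt_trans is the
    transmission time in seconds, estimated separately.
    """
    n_gpu, n_cpu = flop_counts(n_g, i_x, i_y)
    dt_gpu = n_gpu / v_gpu
    dt_cpu = n_cpu / v_cpu
    return dt_trans + dt_gpu + dt_cpu, dt_gpu, dt_cpu


if __name__ == "__main__":
    print(flop_counts(1_000_000, 32, 32))
    # Placeholder rates and transmission time, for illustration only.
    print(iteration_time(1_000_000, 32, 32, v_gpu=1.0e11, v_cpu=1.0e9, dt_trans=1.0e-3))
```

With measured rates and bandwidths, the returned total is the quantity that the time interval of the motion model would need to match.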

## Experimental validation

**Table 1 Test computer system specifications**

| Setup | Processor | Memory (GB) | GPU |
|---|---|---|---|
| 1 | Intel Dual-Core, 2.70 GHz | 4.0 | Nvidia GeForce GT220 |
| 2 | Intel Dual-Core, 2.40 GHz | 4.0 | Nvidia GeForce GT320M |
| 3 | Intel Dual-Core, 2.40 GHz | 4.0 | Nvidia GeForce GS8400 |

### Improvement in prediction process

### Validation

This set of tests was aimed at validating the proposed modeling of computational performance by estimating the total iteration time cost *Δ* *t* of the parallel grid-based RBE using GPU and comparing it with the actual iteration time cost experimentally measured in three different computer setups. Each component, *Δ* *t*_{trans}, *Δ* *t*_{G}, and *Δ* *t*_{C}, is also compared with its measured value. All time cost results are averages over 10,000 iterations. The convolution kernel size *I*_{x}+*I*_{y} and the grid space size *n*_{g} are the two major factors in the proposed modeling. Two tests were thus conducted, one varying the convolution kernel size and the other varying the grid space size.

#### Test 1

Test 1 was performed by fixing the grid space size of the parallel grid-based RBE to 1,000×1,000 and varying the convolution kernel size *I*_{x}=*I*_{y}=*i* from 1 to 200. A convolution kernel size over 200 was not explored since it is unlikely that the target motion model requires such a large convolution kernel. A square convolution kernel was used because varying the size independently in the *x* and *y* directions adds little insight, and the square kernel additionally allows the results to be visualized in two-dimensional space.

**Table 2 Quantitative results for test 1**

| Relative error | Time cost | Setup 1 | Setup 2 | Setup 3 |
|---|---|---|---|---|
| Average | *Δ* *t*_{trans} | 1.159 ms | 1.165 ms | 1.305 ms |
| | *Δ* *t*_{G} | 0.216 ms | 0.462 ms | 0.856 ms |
| | *Δ* *t*_{C} | 0.402 ms | 0.446 ms | 0.382 ms |
| | *Δ* *t* | 1.777 ms (5.88%) | 2.073 ms (6.55%) | 2.543 ms (6.05%) |
| Maximum | *Δ* *t*_{trans} | 2.351 ms | 2.254 ms | 2.670 ms |
| | *Δ* *t*_{G} | 0.716 ms | 1.464 ms | 3.259 ms |
| | *Δ* *t*_{C} | 0.779 ms | 0.857 ms | 0.818 ms |
| | *Δ* *t* | 3.228 ms (10.63%) | 4.149 ms (11.24%) | 6.081 ms (11.45%) |

#### Test 2

Test 2 was performed by fixing the convolution kernel size of the parallel grid-based RBE to 16×16 or 32×32 and varying the grid space size *n*_{x}=*n*_{y}=*n* from 100 to 1,000. These convolution kernel sizes often represent the target motion model with sufficient accuracy, and the grid space size *n*=1,000, which creates 1,000,000 grid cells, also provides good accuracy in many practical problems. Similarly to test 1, the square grid size enables two-dimensional visualization of results.

**Table 3 Quantitative results for test 2**

| Total time cost | Setup 1 | Setup 2 | Setup 3 |
|---|---|---|---|
| Average relative error | 0.513 ms (5.59%) | 0.530 ms (5.68%) | 0.617 ms (5.90%) |
| Maximum relative error | 2.140 ms (10.08%) | 2.491 ms (10.64%) | 2.835 ms (10.26%) |

### Simulated target searching task

In the target motion model, *v*^{t} and *γ*^{t} are the velocity and direction of the target motion, respectively, each subject to a Gaussian noise, and *Δ* *t* is the time increment. The prior belief on the target is also Gaussian. The autonomous sensor platforms are assumed to move on a horizontal plane, and $\alpha^{s_i}$ is a coefficient governing the rate of turn of the sensor platform *s*_{i}. The probability of detection $P_{\mathrm{d}}\left(\mathbf{x}_{k}^{t} \mid \mathbf{x}_{k}^{s_i}\right)$ is given by a Gaussian distribution, whereas the likelihood $l\left(\mathbf{x}_{k}^{t} \mid {}^{s_i}\tilde{\mathbf{z}}_{k}^{t}, \tilde{\mathbf{x}}_{k}^{s_i}\right)$ when the target is detected is given by a Gaussian distribution with variances proportional to the distance between the sensor platform *s*_{i} and the target. Table 4 shows the major parameters of this simulated target searching task. The convolution kernel constructed by the target motion model is represented by a 32×32 matrix, and the grid space size is set as 1,000×1,000. The computer specifications followed setup 3 in Table 1. With the proposed approach, the time increment *Δ* *t* was chosen as 0.032 s, the time cost of one iteration of the RBE estimated by the proposed modeling. For the case without the proposed approach, the time increment *Δ* *t* was arbitrarily chosen as 0.02 s for comparison.

**Table 4 Major parameters of the target searching task**

| | Parameter | Value |
|---|---|---|
| Sensor platform | Velocity $v_{k}^{s_i}$ | 0.05 |
| | Turn coef. $\alpha^{s_i}$ | 0.8 |
| | PoD var. | [0.2, 0.2] |
| Target | Velocity $v_{k}^{t}$ | N(0.1, 0.02) |
| | Direction $\gamma_{k}^{t}$ | N(0 rad, 0.7 rad) |
| | Prior $[x_{0}^{t}, y_{0}^{t}]$ | N([20, 25], diag{200, 200}) |

**Table 5 Quantitative results for simulated search and rescue task**

| Total time | Sensor platform 1 | Sensor platform 2 | Sensor platform 3 | Sensor platform 4 |
|---|---|---|---|---|
| Average relative error | 0.618 ms (5.78%) | 0.633 ms (6.21%) | 0.626 ms (5.82%) | 0.618 ms (5.93%) |
| Maximum relative error | 2.856 ms (9.89%) | 2.823 ms (9.56%) | 2.892 ms (9.25%) | 2.854 ms (9.68%) |

## Conclusion and future work

The performance modeling for the real-time grid-based RBE, especially parallel computation using GPU, has been proposed to identify the best resolution of the RBE with given computer hardware. The modeling allows the estimation of time costs necessary within CPU and GPU and that of data transmission between CPU and GPU for the real-time grid-based RBE. In order to speed up the RBE, the prediction has been additionally reformulated with the separable convolution.

The proposed modeling was experimentally investigated by varying its major parameters. The result of the first test with varying convolution kernel size shows that the average error of the estimation by the proposed modeling stays below 7% regardless of the convolution kernel size and that a high-performance GPU is necessary if the convolution kernel size is large. In the second test with varying grid space size, it is found that the proposed modeling estimates within the average error of 6%, irrespective of the grid space size, and that a high-quality memory is necessary if fast RBE is required for large grid space. Utilizing prediction with separable convolution, the RBE has also been found to perform within 1 ms, although the size of the problem was relatively large.

The current study is a first step toward achieving high-fidelity RBE in a real-time environment. Future work will utilize the best resolution of the RBE identified by the proposed modeling and investigate its efficacy.

## References

1. Tarantola A: *Inverse Problem Theory and Methods for Model Parameter Estimation*. Philadelphia: Society for Industrial and Applied Mathematics; 2005.
2. Harlim J, Hunt BR: **A non-Gaussian ensemble filter for assimilating infrequent noisy observations.** *Tellus A* 2007, **59:** 225–237. 10.1111/j.1600-0870.2007.00225.x
3. Apte A, Hairer M, Stuart AM, Voss J: **Sampling the posterior: an approach to non-Gaussian data assimilation.** *Physica D* 2007, **230:** 50–64. 10.1016/j.physd.2006.06.009
4. Doshi P, Gmytrasiewicz PJ: **Monte Carlo sampling methods for approximating interactive POMDPs.** *J. Artif. Intell. Res* 2009, **34:** 297–337.
5. Mandel J, Beezley JD: **An ensemble Kalman-particle predictor-corrector filter for non-Gaussian data assimilation.** *Comput. Sci. ICCS* 2009, **2009:** 470–478.
6. Stenger B, Thayananthan A, Torr PHS, Cipolla R: **Filtering using a tree-based estimator.** *IEEE Int. Conf. Comput. Vis* 2003, **2:** 1063–1070.
7. Huang D, Leung H: **Maximum likelihood state estimation of semi-Markovian switching system in non-Gaussian measurement noise.** *IEEE Trans. Aerosp. Electron. Syst* 2010, **46:** 133–146.
8. Bergman N: *Recursive Bayesian estimation navigation and tracking applications*. PhD Dissertation, Linkopings University; 1999.
9. Furukawa T, Durrant-Whyte HF, Lavis B: *The element-based method: theory and its application to Bayesian search and tracking*. San Diego: Paper presented at the IEEE/RSJ international conference on intelligent robots and systems; 29 Oct–2 Nov 2007.
10. Lavis B, Furukawa T: *HyPE: Hybrid particle-element approach for recursive Bayesian searching and tracking. Proceedings of Robotic: Science and Systems IV.* Zurich: MIT Press; 2008.
11. Lavis B, Furukawa T, Durrant-Whyte HF: **Dynamic space reconfiguration for Bayesian search and tracking with moving targets.** *Auto. Robots* 2008, **24**(4):387–399. 10.1007/s10514-007-9081-4
12. Furukawa T, Lavis B, Durrant-Whyte HF: *Parallel grid-based recursive Bayesian estimation using GPU for real-time autonomous navigation. Paper presented at the IEEE international conference on robotics and automation.* Anchorage, AK, USA; 3–7 May 2010.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License(http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.