Planning Base Poses and Object Grasp Choices for Table-Clearing Tasks Using Dynamic Programming

Sune Lundø Sørensen¹, Lakshadeep Naik², Peter Khiem Duc Tinh Nguyen², Aljaz Kramberger², Leon Bodenhagen², Mikkel Baun Kjærgaard¹ and Norbert Krüger²,³

¹SDU Software Engineering, The Mærsk McKinney Møller Institute, University of Southern Denmark, Campusvej 55, Odense, Denmark
²SDU Robotics, The Mærsk McKinney Møller Institute, University of Southern Denmark, Campusvej 55, Odense, Denmark
³Danish Institute for Advanced Studies (DIAS), Odense M, Denmark
Keywords: Base Pose Planning, Mobile Manipulation, Dynamic Programming, World Modeling, Perceptual Anchoring.
Abstract: Given a setup with external cameras and a mobile manipulator with an eye-in-hand camera, we address the problem of computing a sequence of base poses and grasp choices that allows for clearing objects from a table while minimizing the overall execution time. The first step in our approach is to construct a world model, generated by an anchoring process using information from the external cameras. Next, a planning module, based on the contents of the world model, creates a plan of base poses and suitable grasp choices that keeps the execution time minimal. Comparing our approach to two baseline methods shows that the average execution cost of plans computed by our approach is 40% lower than that of the naive baseline and 33% lower than that of the heuristic-based baseline. Furthermore, we integrate our approach in a demonstrator that undertakes the full complexity of the problem.
1 INTRODUCTION
Grasping arbitrary objects from tables (see Figure 1) in unconstrained environments is a challenging task, which arises in, e.g., the hospitality industry, where tables need to be cleared in a restaurant or a university canteen (Khatib, 1999). Solving the task requires addressing a number of difficult sub-problems: (1) planning an optimal path for the robot platform in terms of performance and execution time, (2) visual pose estimation based on multiple cameras, potentially combined with visual exploration, (3) pre-grasp and grasp planning, and (4) informed decisions on grasp alternatives and grasp execution despite the expected uncertainties in pose estimation. On top of these challenges, the sub-components must be integrated into a stable system. Because of these complexities, such systems are not in use in real-world unconstrained environments today.
In many situations, external cameras installed in the rooms where the robot is operating can be exploited (see Figure 1(a)), since these are nowadays available in many buildings. Given such a setup with external cameras and a camera on the robot arm, we address the first problem (1) of planning the optimal series of actions for a mobile manipulator to clear a table, minimizing navigation and manipulation time. Each action consists of navigating to a base pose, grasping an object, and storing it on the robot itself.

Our approach consists of three steps. First, a symbolic world model of the scene is constructed using the information from the external cameras. The world model is built and maintained by an anchoring process which associates the information from the external cameras. The world model also contains the pre-defined grasp poses and the inverse reachability map, which are needed to plan the base poses and grasp choices. Next, we compute the grasping and navigation costs. Finally, the contents of the world model and the costs are used to solve the planning problem using dynamic programming with memoization. Compared to previous works, we do not assume the order in which the objects should be grasped to be known beforehand. To show the efficiency of our method, we also implement two baseline methods and compare the execution costs.

Figure 1: (a): The robot is tasked with clearing the objects $o_1, o_2, \ldots, o_N$ from the table. The external cameras, cam$_0$ and cam$_1$, in the upper left and right corners provide an overview of the scene. (b): A world model is built using the information from the external cameras. (c): Using the world model's contents, we propose a solution for computing a sequence of actions for clearing the objects, minimizing execution time. In this case, the plan contains three base poses (smaller rectangles with arrows) for picking the four objects (numbered dots inside the bigger rectangle, which is the table). (d): The robot executes the plan and uses an eye-in-hand camera to accurately estimate the objects' poses before grasping.
The remainder of this paper is structured as follows: Section 2 discusses related work, Section 3 formally presents the problem we address, Section 4 presents our approach, and Section 5 presents the results of evaluating the approach and comparing it against two baseline methods. In Section 6 we demonstrate our approach in a table-clearing application addressing problems (1)-(4). Lastly, Section 7 discusses the results and concludes.
2 STATE OF THE ART
This work involves several different disciplines. In this section we describe the related work, starting with world modeling (Section 2.1), followed by explicit base pose planning (Section 2.2), and finally sequence planning (Section 2.3).
2.1 World Modeling
World modeling, and in particular perceptual anchoring, using information from external, static and robot cameras has been addressed in several contexts. Daoutis et al. (Daoutis et al., 2012) describe a framework for cooperative perceptual anchoring in a setup of stationary agents (e.g., external cameras) and a mobile agent (e.g., a mobile manipulator). (Wong et al., 2015) use clustering to associate data from multiple viewpoints. They use several variations of a Dirichlet process mixture model and incorporate information about the different views. In this work we use the Jonker-Volgenant algorithm (Crouse, 2016) and a similarity measure based on Euclidean distance and class.
2.2 Explicit Base Pose Planning
The selection of a base pose for grasping an object depends on the existence of a valid inverse kinematics solution attaining the desired grasp pose. Typically, this is accomplished by utilizing Inverse Reachability Maps (IRMs) (Makhal and Goins, 2018; Vahrenkamp et al., 2013). However, employing an IRM for planning an appropriate base pose is computationally demanding, primarily due to the inherent complexity of the 6D search space. Hence, recent works (Jauhri et al., 2022) have introduced learning-based approaches to predict base poses for grasping.
Some alternative approaches involve planning ma-
nipulator configurations in conjunction with the base
pose (Reister et al., 2022; Vafadar et al., 2018). Ad-
ditionally, certain methodologies also consider uncer-
tainty in both the robot’s localization and the object
pose during base pose planning (Meng et al., 2021;
Stulp et al., 2012).
In this work, we plan the optimal base pose for grasping an object while simultaneously considering the cost of navigating to the selected base pose from the robot's prior base pose, as well as the future base poses that must be visited to successfully complete the task.
2.3 Sequence Planning
The problem of planning base poses for a mobile ma-
nipulator and selecting the optimal execution order
of tasks (in this work, grasping) is a highly complex
one. The main reason is that there is often a large
number of possibilities to consider. Previous works
have tried to address this problem in limited setups. For example, in (Reister et al., 2022) the grasping sequence is assumed to be known a priori and only local optimization is performed, i.e., minimizing the time to navigate to the next object and grasp it. Xu et al. (Xu et al., 2020) address the problem in the context of picking parts from trays for assembly. They select base poses from the intersections of the sets of base poses from where the objects can be grasped, to minimize the movement of the mobile base, but do not consider manipulation time. Further, to compensate for localization uncertainty, they ignore intersections smaller than a certain threshold. The final plan is then found by computing the shortest path between all of the base poses. (Harada et al., 2015) have also tried to minimize the number of base poses without considering the combined time cost of navigation and manipulation.
In this work, we present a method that provides both the base poses and the optimal sequence in which these poses should be visited. Notably, our approach does not make the assumptions inherent in prior works, such as that the sequence in which objects should be grasped is known, or that a minimal number of base poses inevitably leads to an optimal solution.
3 PROBLEM DEFINITION
In this work we use three types of poses: the object pose is the 6D pose of an object; the grasp pose of an object is the pose the gripper should be in to grasp the object; finally, a base pose refers to the SE(2) pose of the mobile base of the robot. We assume that we have up to M external stationary cameras, as shown in Figure 1(a). Further, we assume that there are N instances of different objects with poses $o_1, o_2, \ldots, o_N$ on the table that the robot needs to grasp, and that each object has a grasp pose $g_1, g_2, \ldots, g_N$. It is assumed that all external and robot cameras are RGB-D cameras providing RGB and depth information, and that there are pipelines in place to detect objects and estimate their poses from this information.
The problem can then be formulated as: Given N objects with poses $o_1, o_2, \ldots, o_N$ on the table, first find for each object the set $A_n$ of possible actions, where each action $\{b_k, g_n\}$ contains a base pose $b_k$ and an object grasp pose $g_n$, as shown in Equation (1).

$$A_1 = \{\{b_k, g_1\}\}_{k=1}^{K_1}, \quad A_2 = \{\{b_k, g_2\}\}_{k=1}^{K_2}, \quad \ldots, \quad A_N = \{\{b_k, g_N\}\}_{k=1}^{K_N} \tag{1}$$
Then, taking the union of all $A_1, A_2, \ldots, A_N$ yields $A$, as shown in Equation (2), which contains all possible actions.

$$A = A_1 \cup A_2 \cup \ldots \cup A_N \tag{2}$$
Then, from the power set $\mathcal{P}(A)$, which contains all subsets of $A$ including $\emptyset$ and $A$ itself, we can construct a set $D$, as shown in Equation (3), by keeping only the valid sets in $\mathcal{P}(A)$. A valid set $a$ of actions includes grasping all objects ($|a \cap A| = N$) and each object only once ($|a \cap A_n| = 1 \ \forall n$). A valid set $a$ may contain the same base pose across multiple actions.

$$D = \left\{ s \in \mathcal{P}(A) \,\middle|\, |s \cap A| = N \text{ and } |s \cap A_n| = 1 \ \forall n \right\} \tag{3}$$
D contains all valid sets of actions satisfying the aforementioned criteria, which means that they can be executed in one sequence. Each set of actions in D can be executed in N! different sequences. Therefore, we take all possible sequences of each element of D, which yields the set S of all valid sequences in which actions can be executed.
Equation (4) shows an example of a sequence $s$ where four objects with grasp poses $g_1, g_2, g_3, g_4$ are grasped from three base poses $b_5, b_{17}, b_{42}$.

$$s = (\{b_{17}, g_2\}, \{b_{17}, g_4\}, \{b_5, g_3\}, \{b_{42}, g_1\}) \tag{4}$$
For a sequence $s \in S$ we define $s(i)$ to be the $i$th action in the sequence, $g(s(i))$ to be the grasp pose of the object to be grasped in that action, and $b(s(i))$ to be the base pose from where the object should be grasped.
Finally, the problem is to find the sequence $s^*$ which minimizes the overall task execution time. Equation (5) formally states the problem, where $C_{seq}(s)$ is the total execution cost of a sequence $s$, defined in Section 4.3.1.

$$s^* = \operatorname*{argmin}_{s \in S} C_{seq}(s) \tag{5}$$
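To make the size of the search space concrete, the following is a minimal brute-force sketch of Equation (5), assuming toy action sets and illustrative cost tables (all names and values are ours, not from the paper's experiments): it enumerates one action per object (the elements of D) in every order (the elements of S) and keeps the cheapest sequence.

```python
from itertools import permutations, product

def c_action(prev_base, action, c_nav, c_grasp):
    """Equation (6): navigation cost to the action's base pose plus grasping cost."""
    base, grasp = action
    return c_nav[(prev_base, base)] + c_grasp[(base, grasp)]

def brute_force_plan(action_sets, b0, c_nav, c_grasp):
    """Exhaustive solution of Equation (5): argmin over all valid sequences in S."""
    best_seq, best_cost = None, float("inf")
    for choice in product(*action_sets):          # one action per object: an element of D
        for seq in permutations(choice):          # every ordering: an element of S
            cost, base = 0.0, b0
            for action in seq:
                cost += c_action(base, action, c_nav, c_grasp)
                base = action[0]                  # the robot now stands at this base pose
            if cost < best_cost:
                best_seq, best_cost = seq, cost
    return best_seq, best_cost
```

This enumeration grows as $\prod_n K_n \cdot N!$ and is only tractable for very small N, which motivates the dynamic programming formulation in Section 4.3.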
4 PROPOSED APPROACH
Figure 2: The components of our approach.
Our approach has three steps, as shown in Figure 2. First, we construct a world model from the data from the external cameras, to have a stable representation of the objects in the scene, and augment it with predefined grasp poses and inverse reachability maps. Next, we compute the grasping and navigation costs for use in the planning algorithm. Finally, the planning algorithm uses the information from the world model and the costs to determine the time-optimal action sequence of base poses and grasp choices. We assume that the following information is obtained beforehand:
- The dimensions of the table and its pose
- The 6D transformations between the cameras
In the following sections, we will describe the
steps of the approach in more detail, starting with the
world model.
4.1 The World Model
The world model (WM) has two important functions: (1) it provides a stable symbolic representation of the objects and the table, and (2) it contains predefined grasp poses $g_1, g_2, \ldots, g_S$, one for each object class $c_1, c_2, \ldots, c_S$, and an inverse reachability map. This enables it to provide all the necessary information for solving the planning problem. Each object is represented by an anchor $\alpha_n = \{o_n, c_s\}$, where $o_n$ is the estimated 6D pose in the world frame and $c_s$ is the object's class. The world model receives object detections and position estimates from the object detector for each of the different camera observations. These are then processed by a data association (DA) process to determine which measurements originate from the same object. The DA process uses the Jonker-Volgenant algorithm (Crouse, 2016) as described in (Sørensen and Kjærgaard, 2023).
The associations are then fed to the multi-view 6D object pose estimation process, which fuses the pose estimates of the associated objects to obtain a full 6D pose $o_n$ for each object. Finally, each anchor candidate will either initiate the creation of a new anchor or be associated with an existing one, depending on the state of the world model, using an anchoring process. The anchoring process starts by comparing each candidate to each anchor in the WM using the same similarity measure as the DA process (Sørensen and Kjærgaard, 2023). Next, it also uses the Jonker-Volgenant algorithm (Crouse, 2016) to compute the assignments of new observations to anchors. New observations which are not associated by the algorithm will instantiate new anchors.
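As an illustration of one association step, the sketch below pairs anchors with new detections via SciPy's linear_sum_assignment, a modified Jonker-Volgenant solver following (Crouse, 2016). The concrete similarity measure, the class-mismatch penalty, and the gating threshold are our own illustrative assumptions, not the exact values from (Sørensen and Kjærgaard, 2023).

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # modified Jonker-Volgenant solver

def associate(anchors, detections, class_penalty=10.0, gate=0.5):
    """Assign detections to existing anchors; anchors and detections are
    lists of (position_xyz, class_id). Penalty and gate are illustrative."""
    cost = np.zeros((len(anchors), len(detections)))
    for i, (pos_a, cls_a) in enumerate(anchors):
        for j, (pos_d, cls_d) in enumerate(detections):
            cost[i, j] = np.linalg.norm(np.asarray(pos_a) - np.asarray(pos_d))
            if cls_a != cls_d:                # similarity based on distance and class
                cost[i, j] += class_penalty
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
    matches = [(i, j) for i, j in zip(rows, cols) if cost[i, j] < gate]
    unmatched = [j for j in range(len(detections))
                 if j not in {j for _, j in matches}]
    return matches, unmatched                 # unmatched detections -> new anchors
```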
4.1.1 Predefined Grasps and Inverse Reachability Maps
As mentioned in Section 4.1, the WM contains a predefined grasp pose $g_s$ for each object class $c_s$. The grasp pose is defined in the object frame.

In addition to the predefined grasps, the WM also contains an Inverse Reachability Map (IRM). Each robot manipulator has a unique IRM (Makhal and Goins, 2018; Vahrenkamp et al., 2013) that contains suitable robot base poses for reaching a grasp pose $g_s$. The map is continuous, but in order to make the planning problem feasible, we discretize it in translation using step size λ and in orientation using step size φ. Thus, for a grasp pose $g_s$ for $o_n$ in the robot base frame, it contains a set of 6D base poses (Makhal and Goins, 2018) such that from each base pose there exists an inverse kinematics solution to the grasp pose. Figure 3 shows a subset of the base poses in the Inverse Reachability Map of the UR5e robot. The red cube in the middle of the figure represents the object $o_n$ with grasp pose $g_s$. $b_0, b_1, \ldots, b_J$ are the base poses from where the UR5e can grasp the object.
Figure 3: Inverse Reachability Map for the UR5e robot as it is stored in the world model. Note that only a subset of the base poses is shown here.

Using the grasp pose $g_s$ and pose $o_n$ of each object, we can compute an IRM for each object. Each of these is then filtered, yielding an IRM per object which contains only the robot base poses where the mobile base is on the ground and not in collision with the table. From these, the WM can now provide the sets $A_1 \cup A_2 \cup \ldots \cup A_N = A$ of possible actions for each of the N objects.
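A minimal sketch of this filtering step is shown below; the pose representation ((x, y, z, yaw) tuples, with the table footprint as an axis-aligned rectangle) and the tolerance are illustrative assumptions standing in for the actual geometric checks.

```python
def build_action_set(irm_bases_world, grasp_pose, table_rect, z_tol=0.02):
    """Filter the object's IRM candidates (already expressed in the world
    frame) and pair each survivor with the grasp pose, yielding the action
    set A_n = {{b_k, g_n}}. Poses are (x, y, z, yaw); table_rect is
    (xmin, xmax, ymin, ymax). Representation and tolerance are illustrative."""
    xmin, xmax, ymin, ymax = table_rect
    actions = []
    for x, y, z, yaw in irm_bases_world:
        on_ground = abs(z) < z_tol                        # base must rest on the floor
        under_table = xmin <= x <= xmax and ymin <= y <= ymax
        if on_ground and not under_table:                 # crude table-collision check
            actions.append(((x, y, yaw), grasp_pose))     # SE(2) base pose + grasp
    return actions
```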
4.2 Cost Computation
The next step in our approach is to compute the navigation and grasping costs. This involves approximating the cost $C_{grasp}(g(s(i)))$ of grasping the object with grasp pose $g(s(i))$ from base pose $b(s(i))$, and the cost $C_{nav}(b(s(i-1)), b(s(i)))$ of navigating from $b(s(i-1))$ to $b(s(i))$. To begin with, the time cost $C_{grasp}(g(s(i)))$ of grasping the object from each valid base pose in $A$ is computed. Figure 4 illustrates the possible base poses for grasping the objects at different locations on the table.

Once the grasping costs are computed for all the objects in the scene, we compute the navigation cost $C_{nav}(b(s(i-1)), b(s(i)))$. The navigation cost estimates the time required for navigating between the two base poses. Therefore, for all the K unique base poses across $A$, the navigation costs need to be computed for the $K \times K$ different combinations.

Thus, if the robot is at base pose $b(s(i-1))$, the total cost of performing the next action $s(i)$ is given by:

$$C_{action}(s(i-1), s(i)) = C_{nav}(b(s(i-1)), b(s(i))) + C_{grasp}(g(s(i))) \tag{6}$$
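Because $C_{nav}$ is needed for all $K \times K$ pairs of unique base poses and $C_{grasp}$ for every valid action, both can be cached before planning. A small sketch, where nav_time and grasp_time are hypothetical stand-ins for the simulator-based cost estimators (cf. Section 5):

```python
def precompute_costs(actions, nav_time, grasp_time):
    """Cache C_grasp for every action and C_nav for all K x K combinations
    of unique base poses. nav_time(b1, b2) and grasp_time(b, g) are
    hypothetical stand-ins for the simulator-based cost estimators."""
    bases = {b for b, _ in actions}                       # the K unique base poses
    c_nav = {(b1, b2): nav_time(b1, b2) for b1 in bases for b2 in bases}
    c_grasp = {(b, g): grasp_time(b, g) for b, g in actions}
    return c_nav, c_grasp
```

Navigation costs from the robot's initial base pose $b_0$ to each base pose can be cached in the same way.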
4.3 Computing the Sequence
The core challenge now is to compute the base poses and the object grasping sequence. We formulate this as an optimization problem, where the objective is to minimize navigation and manipulation time, and solve it using dynamic programming with memoization. The algorithm has the following inputs, described previously in Sections 4.1 and 4.2:
- The set S of all possible valid action sequences.
- The estimated action costs $C_{action}(s(i-1), s(i))$.
These are used in a dynamic programming algorithm with memoization (DP) (Held and Karp, 1962). Figure 1(c) illustrates the solution to the scene shown in Figure 1(a). The algorithm is described next.
4.3.1 The Optimization Algorithm
As stated in the problem definition in Section 3, the algorithm has to compute an ordered sequence $s^*$ of N object actions. The sequence should minimize the total cost $C_{seq}(s^*)$ of all actions in $s^*$:

$$C_{seq}(s) = \sum_{i=1}^{|s|} C_{action}(s(i-1), s(i)) \tag{7}$$

where $C_{action}(s(i-1), s(i))$ consists of manipulation and navigation costs as described in Section 4.2. For $i = 1$ we define

$$C_{action}(s(0), s(1)) = C_{nav}(b_0, b(s(1))) + C_{grasp}(g(s(1))) \tag{8}$$

where $b_0$ is the robot's initial base pose. The grasping cost $C_{grasp}(g(s(i)))$ is set to a high value if $g(s(i))$ is not reachable from $b(s(i))$. To solve the minimization problem, we define a function $C_{DP}(s(i))$ representing the cost of executing all actions in a sequence $s$ up to and including $s(i)$:

$$C_{DP}(s(i)) = \begin{cases} C_{action}(s(i-1), s(i)) & \text{if } i = 1 \\ \text{mincost}(s(i)) + C_{grasp}(g(s(i))) & \text{if } i > 1 \end{cases} \tag{9}$$

which depends on a function mincost(s(i)), see Equation (10), that finds, over the possible previous actions $s(i-1)$, the minimum combined cost, i.e., the minimum time to grasp the previous object from the previous base pose and to navigate from there to the current base pose:

$$\text{mincost}(s(i)) = \min_{s(i-1)} \left[ C_{nav}(b(s(i-1)), b(s(i))) + C_{DP}(s(i-1)) \right] \tag{10}$$
Figure 4: Manipulation costs computed for different objects in the scene.
Figure 5: Base poses (green arrows) from where at least
one object (pink arrows) could be grasped. Navigation cost
is computed between these base poses.
Note that mincost(s(i)) depends on the cost of the previous actions $C_{DP}(s(i-1))$, and thus the problem is solved recursively in a top-down fashion. The solution is calculated using dynamic programming with memoization. There are two phases. First, the cost $C_{seq}(s)$ is computed recursively using $C_{DP}(s(i))$ for all unique actions (valid combinations of a grasp and base pose) at the end of a sequence in S (which contains all valid sequences, see Section 3). Thus, for all sequences ending with a particular action, we now have the one with the minimum cost. The computational and memory requirements are significantly reduced by using memoization, i.e., storing and reusing the cost of partial sequences that appear in multiple $s \in S$. This is possible because of the nature of the problem: for a given action $s(i)$, the cost of the remaining actions depends only on the previous actions taken (i.e., which objects have already been grasped and from which base pose(s)). Finally, the optimal sequence $s^*$ is simply the one with the lowest cost $C_{seq}(s)$.
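To make the procedure concrete, below is a compact sketch of the recursion in the spirit of (Held and Karp, 1962). The memoized state is the set of objects still to be grasped together with the robot's current base pose, which is exactly what makes partial sequences reusable across orderings; the sketch runs forward from the start pose, which is equivalent to the top-down formulation above. All names and the data layout are illustrative, not the authors' implementation.

```python
from functools import lru_cache

def plan(action_sets, b0, c_nav, c_grasp):
    """Dynamic programming with memoization. action_sets is a tuple with
    one tuple of (base, grasp) actions per object (the sets A_n); b0 is
    the initial base pose. Returns (optimal cost, optimal action sequence)."""

    @lru_cache(maxsize=None)
    def best(remaining, base):
        # remaining: frozenset of indices of objects still on the table.
        if not remaining:
            return 0.0, ()
        best_cost, best_seq = float("inf"), ()
        for n in remaining:                         # next object to grasp
            for b, g in action_sets[n]:             # each action {b_k, g_n}
                step = c_nav[(base, b)] + c_grasp[(b, g)]   # Equation (6)
                rest_cost, rest_seq = best(remaining - {n}, b)
                if step + rest_cost < best_cost:
                    best_cost = step + rest_cost
                    best_seq = ((b, g),) + rest_seq
        return best_cost, best_seq

    return best(frozenset(range(len(action_sets))), b0)
```

Memoization caches one entry per (remaining objects, base pose) pair, so states shared by many orderings are evaluated once instead of once per sequence.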
5 RESULTS
In this section we evaluate the execution cost of the proposed approach and compare it against two baseline methods: one heuristic-based and one more naive. We generate 25 scenes with five randomly placed objects and a random starting pose for the robot and apply the three methods. The next sections describe the details of the evaluation.
Prerequisites: Experiments were conducted on five different objects from the YCB benchmark for manipulation (Calli et al., 2015). Top grasp poses were selected for grasping all the objects. Inverse Reachability Maps were computed with a resolution of λ in translation and φ in rotation. Object pose estimates were obtained using two RealSense D455 RGB-D cameras (https://www.intelrealsense.com/depth-camera-d455) and the method of (Naik et al., 2022). The costs were computed using the NVIDIA Isaac simulator (https://developer.nvidia.com/isaac-sim). For the manipulation cost, the grasp execution time was computed using the Lula Trajectory generator. The navigation cost between two base poses was computed using a heuristic based on the distance between the two bases while considering the geometry of the robot and the scene.
Figure 6: Box plots of the execution cost of the dynamic programming and the heuristic-based solutions, using λ = 0.25 m discretization in translation and φ = 45° in orientation. The median is shown with a horizontal line in the boxes and the means with green triangles.

Baselines: We compared our method against two baselines. The first baseline is a heuristic-based method. It selects the base pose with the lowest navigation cost from the robot's current pose and grasps all possible objects from there. It then continues to select the next base pose from there from which new objects can be grasped. The other baseline is simpler: it first selects the nearest base pose along the table's edge and then in turn selects base poses clockwise around the table, checking every q cm whether any objects can be grasped.
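For reference, a sketch of the greedy loop behind the heuristic-based baseline (reachable_from maps each candidate base pose to the set of object indices graspable from it; all names are illustrative):

```python
def heuristic_baseline(n_objects, base_candidates, b0, c_nav, reachable_from):
    """Greedy baseline: repeatedly move to the base pose with the lowest
    navigation cost from which at least one remaining object can be
    grasped, and grasp everything reachable there. Illustrative sketch."""
    remaining, base, plan = set(range(n_objects)), b0, []
    while remaining:
        candidates = [b for b in base_candidates
                      if reachable_from[b] & remaining]
        if not candidates:
            break                                  # some objects are unreachable
        nxt = min(candidates, key=lambda b: c_nav[(base, b)])
        for n in sorted(reachable_from[nxt] & remaining):
            plan.append((nxt, n))                  # grasp object n from base nxt
            remaining.discard(n)
        base = nxt
    return plan
```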
Experimental Setup: The scenes used for evaluation are generated by sampling a random pose for each of five different objects (a mug, a bleach bottle, a mustard bottle, a bowl, and a cracker box) on the table's surface. A random collision-free starting base pose for the robot is also sampled for each scene.
5.1 Quantitative Results
Figure 6 shows box plots of the execution cost for the proposed method and the two baselines. The colored boxes extend from the 1st to the 3rd quartile, with the line showing the median and the green triangles showing the mean execution cost. It can be seen that the execution cost of our approach is 40% lower than that of the naive baseline and 33% lower than that of the heuristic-based baseline.
Ablation Study: Cost Computation Time
The most time-consuming part of our approach is the cost computation. Figure 7 shows box plots of the time used for computing the costs, the execution cost, and the percentage of grasped objects for different translation discretization step sizes λ. It shows that even with a rather low resolution of λ = 0.25 m, the cost computation time is only around 18 seconds, while the execution time and the percentage of objects grasped are comparable to those for smaller values of λ.

Figure 7: Computation time for costs with different levels of discretization in translation. The discretization of the orientation is kept constant at 45°. The percentages in red indicate how many of the 125 objects across the 25 scenes were reachable.
6 APPLICATION
In addition to the evaluation in Section 5, we have also demonstrated our approach in a lab environment covering the full complexity of the problem at hand. The setup consists of two RGB-D cameras mounted on the walls and a mobile manipulator with a gripper and an RGB-D camera, as shown in Figure 1(a). First, for each camera, the objects on the table are detected using Detectron2 (Wu et al., 2019) and their positions estimated. This information is then sent to the world model, which performs the data association. The pose estimation pipeline then uses the associations to estimate the 6D poses of the objects, which are then returned to the world model, see Figure 1(b). We then run the planning algorithm as described in Section 4 to obtain the table-clearing plan, see Figure 1(c). The robot then executes the plan by driving to the first base pose in the plan and starting a visual exploration to achieve a good view of the object to grasp and a pose estimate with low uncertainty. When the uncertainty is below a pre-defined threshold, the robot attempts to grasp the object, see Figure 1(d). This is repeated for all base poses and objects in the plan. A video of the demonstration can be viewed on YouTube (https://youtu.be/NcxZ-c6hcvQ).
7 CONCLUSION
In this paper, we have presented a novel approach to planning a table-clearing task for a mobile manipulator in a setup with external cameras and a robot camera. We first model the scene in a world model using the information from the external cameras. Based on the world model, we use dynamic programming to plan a sequence of base poses and grasp choices that minimizes the overall execution time. Evaluating the approach on 25 different scenes and comparing it to two baseline methods shows that our approach computes plans with a 33% lower execution cost than the heuristic-based baseline and a 40% lower cost than the naive approach. Limitations of our approach, which should be addressed in future work, include the cost computation time and the incorporation of the objects' geometry into the planning, to avoid collisions between objects when removing them from the table.
ACKNOWLEDGMENTS
This work is funded by the Innovation Fund Denmark
through the FacilityCobot project. The authors would
also like to thank the I4.0 lab of The University of
Southern Denmark for lending us the robot used in
this work.
REFERENCES
Calli, B., Walsman, A., Singh, A., Srinivasa, S., Abbeel, P., and Dollar, A. M. (2015). Benchmarking in manipulation research: The YCB object and model set and benchmarking protocols. arXiv preprint arXiv:1502.03143.
Crouse, D. F. (2016). On implementing 2D rectangu-
lar assignment algorithms. IEEE Transactions on
Aerospace and Electronic Systems, 52(4):1679–1696.
Daoutis, M., Coradeschi, S., and Loutfi, A. (2012). Coop-
erative Knowledge Based Perceptual Anchoring. In-
ternational Journal on Artificial Intelligence Tools,
21(03):1250012.
Harada, K., Tsuji, T., Kikuchi, K., Nagata, K., Onda, H.,
and Kawai, Y. (2015). Base position planning for
dual-arm mobile manipulators performing a sequence
of pick-and-place tasks. In 2015 IEEE-RAS 15th
International Conference on Humanoid Robots (Hu-
manoids), pages 194–201. IEEE.
Held, M. and Karp, R. M. (1962). A Dynamic Programming Approach to Sequencing Problems. Journal of the Society for Industrial and Applied Mathematics, 10(1):196–210.
Jauhri, S., Peters, J., and Chalvatzaki, G. (2022). Robot Learning of Mobile Manipulation With Reachability Behavior Priors. IEEE RA-L, 7(3):8399–8406.
Khatib, O. (1999). Mobile manipulation: The robotic assis-
tant. Robotics and Autonomous Systems, 26(2-3):175–
183.
Makhal, A. and Goins, A. K. (2018). Reuleaux: Robot base
placement by reachability analysis. In 2018 Second
IEEE International Conference on Robotic Comput-
ing (IRC), pages 137–142. IEEE.
Meng, Y., Chen, Y., and Lou, Y. (2021). Uncertainty aware
mobile manipulator platform pose planning based on
capability map. In 2021 IEEE International Confer-
ence on Real-time Computing and Robotics (RCAR),
pages 123–128. IEEE.
Naik, L., Iversen, T. M., Kramberger, A., Wilm, J., and
Krueger, N. (2022). Multi-view object pose distribu-
tion tracking for pre-grasp planning on mobile robots.
In 2022 International Conference on Robotics and Au-
tomation (ICRA), pages 1554–1561.
Reister, F., Grotz, M., and Asfour, T. (2022). Combining
navigation and manipulation costs for time-efficient
robot placement in mobile manipulation tasks. IEEE
RA-L, 7(4):9913–9920.
Stulp, F., Fedrizzi, A., Mösenlechner, L., and Beetz, M. (2012). Learning and reasoning with action-related places for robust mobile manipulation. Journal of Artificial Intelligence Research, 43:1–42.
Sørensen, S. L. and Kjærgaard, M. B. (2023). Quantifying
the Accuracy of Collaborative IoT and Robot Sensing
in Indoor Settings of Rigid Objects. Under review,
page 8.
Vafadar, S., Olabi, A., and Panahi, M. S. (2018). Optimal
motion planning of mobile manipulators with mini-
mum number of platform movements. In IEEE Int.
Conf. on Industrial Technology.
Vahrenkamp, N., Asfour, T., and Dillmann, R. (2013).
Robot placement based on reachability inversion. In
IEEE ICRA.
Wong, L. L., Kaelbling, L. P., and Lozano-Pérez, T. (2015). Data association for semantic world modeling from partial views. The International Journal of Robotics Research, 34(7):1064–1082.
Wu, Y., Kirillov, A., Massa, F., Lo, W.-Y., and Gir-
shick, R. (2019). Detectron2. https://github.com/
facebookresearch/detectron2.
Xu, J., Harada, K., Wan, W., Ueshiba, T., and Domae, Y.
(2020). Planning an efficient and robust base sequence
for a mobile manipulator performing multiple pick-
and-place tasks. In 2020 IEEE International Con-
ference on Robotics and Automation (ICRA), pages
11018–11024.