How Do Robots Learn Through Exploration?
A conversation with Professor Nathan Michael, Chief Technology Officer.
How does learning appear within the context of coordinated exploration?
Within the context of exploration, learning emerges as the system operates. The more it operates, the more experience it acquires about how its actions, given the appearance and nature of the environment, affect the amount and quality of information it is able to gather. And so the more the system engages, the better idea it has of how future actions will improve its performance.
This is true for both single-robot and multi-robot exploration. In the multi-robot case, it’s not just a question of how the robots work together instantaneously in order to acquire information, but of who should do what, and what resources should be allocated to specific exploration tasks. This is where the problem of resource allocation and task assignment emerges. The system must determine which robot should do what based on the current conditions and the characteristics of each robot.
There are a lot of complexities and challenges associated with this. One of those challenges is communication. When groups of robots work together, each robot will have potentially learned different things in different ways. This presents a challenge when they share what they have learned. They need to be able to come together and transfer that learning between each other. The information shared must be made consistent so that, as the robots work together as a larger group, they can rely upon what each has learned in order to improve their performance collectively. This idea that the robots learn independently as they engage in the environment, and then must share their different observations to arrive at a common framework, is actually a pretty hard problem.
Is there a law of diminishing returns factored into the decision of what the optimal number of robots is to search an environment?
Yes. In determining the optimal number of agents to explore an environment, the system is weighing the anticipated amount of information that will be acquired against the amount of time or energy that is expended. The system’s underlying algorithms are taking into account what the system is doing, and they are asking the following question: “Assume there are n robots. What happens if 1 additional robot is added? What additional value is there in exploration with n+1 robots? How much value would n+2 robots add?” And so forth. The system is constantly asking that question, but it’s thinking about it not just instantaneously but over some finite time horizon, some time window, and based on the current state of the system. If it sees that there is some gain to be made by introducing a new robot, then a new robot deploys. If the benefit is limited, then a robot may simply land while all the other robots continue. The consequence is that there is really not that much loss in terms of speed or ability to explore the environment, and that robot has now saved energy.
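The n versus n+1 question above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system described in the interview: the saturating information model, the function names, and the per-robot cost are all hypothetical, and both quantities are assumed to be measured over the same finite time window.

```python
import math


def expected_information(n_robots: int, scale: float = 10.0, rate: float = 0.5) -> float:
    """Hypothetical model: information gathered over the time window
    saturates as more robots are added (a concave curve)."""
    return scale * (1.0 - math.exp(-rate * n_robots))


def marginal_gain(n_robots: int) -> float:
    """Extra information expected from going from n to n+1 robots."""
    return expected_information(n_robots + 1) - expected_information(n_robots)


def choose_team_size(max_robots: int, cost_per_robot: float) -> int:
    """Keep adding robots while the next robot's expected gain over the
    window exceeds its (assumed) time or energy cost over the same window."""
    n = 0
    while n < max_robots and marginal_gain(n) > cost_per_robot:
        n += 1
    return n


if __name__ == "__main__":
    for n in range(5):
        print(f"gain from robot {n + 1}: {marginal_gain(n):.2f}")
    print("robots deployed:", choose_team_size(max_robots=10, cost_per_robot=1.0))
```

Under this assumed model each additional robot contributes less than the one before it, so deployment stops once the next robot's gain falls below its cost. Re-running the same check as conditions change is one way a robot could end up being told to land.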
How is the decision made for one robot to land as opposed to another robot within the system?
This is the question being asked by the resource allocation and task assignment problem. The idea is that the system is evaluating its resources, both those being used now and those that may be used in the future over some anticipated amount of time. It is evaluating how those resources will be used, what value they bring, and the amount of information it expects to acquire compared to the amount of time or energy expended. If there is a diminishing return, it will manifest within this resource allocation problem because it’s cast as an optimization.
This problem is being run over and over again by the system. Anytime there is a team of robots operating, there will be some form of operation occurring in the background that is asking about the optimal use of these robots at each instant in time. As a consequence, each robot will be tasked to engage in particular types of activities based upon its individual characteristics and what it brings in terms of value. This may be based upon factors such as the robot’s sensing capabilities, perception abilities, computation power, battery life or proximity to the area of the environment that needs to be explored.
Connecting this back to the concept of reinforcement learning, the system is running this optimization in order to figure out which agent should do what at each instant in time in order to maximize some notion of reward.
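As a rough illustration of task assignment cast as an optimization, the sketch below scores every robot-region pairing and keeps the best overall assignment. Everything in it is an assumption made for the example: the Robot and Region fields, the reward formula (sensing quality times expected information, minus travel energy), and the brute-force search, which is only practical for very small teams and requires at least as many regions as robots.

```python
import math
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Robot:
    name: str
    position: tuple[float, float]
    sensing_quality: float  # hypothetical score between 0 and 1
    battery: float          # remaining energy, arbitrary units


@dataclass(frozen=True)
class Region:
    name: str
    position: tuple[float, float]
    expected_information: float


def reward(robot: Robot, region: Region, energy_per_meter: float = 1.0) -> float:
    """Assumed reward: information the robot can extract minus travel energy.
    A region the robot cannot reach on its battery is ruled out."""
    energy = math.dist(robot.position, region.position) * energy_per_meter
    if energy > robot.battery:
        return float("-inf")
    return robot.sensing_quality * region.expected_information - energy


def assign(robots: list[Robot], regions: list[Region]) -> tuple[dict[str, str], float]:
    """Try every one-to-one pairing and return the highest-reward assignment."""
    best_total, best = float("-inf"), {}
    for chosen in permutations(regions, len(robots)):
        total = sum(reward(r, g) for r, g in zip(robots, chosen))
        if total > best_total:
            best_total = total
            best = {r.name: g.name for r, g in zip(robots, chosen)}
    return best, best_total


robots = [
    Robot("A", (0.0, 0.0), sensing_quality=0.9, battery=20.0),
    Robot("B", (10.0, 0.0), sensing_quality=0.5, battery=20.0),
]
regions = [
    Region("near", (1.0, 0.0), expected_information=10.0),
    Region("far", (9.0, 0.0), expected_information=10.0),
]
assignment, total = assign(robots, regions)
print(assignment, total)
```

In this toy setup each robot is sent to the region closest to it, because travel energy outweighs the difference in sensing quality. A real system would presumably re-solve a problem like this repeatedly as robots move and batteries drain, and would use a scalable solver rather than enumeration.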