Researchers commonly use reinforcement learning, a trial-and-error process in which an agent is rewarded for actions that bring it closer to a goal, to train AI agents on new tasks, such as opening a kitchen cabinet.
Traditionally, a reward function, an incentive mechanism designed by experts to score the agent's progress, is used to guide the agent's exploration. But this function must be updated incrementally as the agent learns, which can be time-consuming and challenging, particularly for complex tasks.
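To make the reward-function idea concrete, here is a minimal, hypothetical sketch in Python of reward-driven trial and error; the one-dimensional world, the `reward` function, and the greedy update rule are illustrative assumptions, not the setup used in any particular study.

```python
import random

def reward(state, goal):
    # Hand-designed reward: scores a state by how close it is to the goal.
    # In a real robotics task, an expert would craft and keep tuning this.
    return -abs(goal - state)

def train(goal=10, episodes=200):
    # Trial and error in a toy 1-D world: the agent nudges its position
    # and keeps any move the reward function scores at least as well.
    state = 0
    for _ in range(episodes):
        candidate = state + random.choice([-1, 1])
        if reward(candidate, goal) >= reward(state, goal):
            state = candidate
    return state

print(train())  # ends at (or very near) the goal position 10
```

Every tweak to `reward` changes what the agent considers progress, which is why hand-maintaining such a function for complex tasks becomes a bottleneck.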
Researchers from MIT, Harvard University, and the University of Washington have developed a novel reinforcement learning approach that doesn't rely on an expertly designed reward function. Instead, it leverages feedback from inexperienced users to direct the agent's learning process.
Although feedback from non-expert users is often noisy and full of errors, this method enables the agent to learn faster than other approaches that rely on such input.
The technique also allows feedback to be collected simultaneously from novice users around the world, so the agent can be trained through many diverse perspectives.
By crowdsourcing the design of the reward function and incorporating non-expert feedback, the researchers aim to scale up robot learning, says Pulkit Agrawal of MIT.
This approach could revolutionize how robots learn to perform tasks autonomously, guided by a diverse group of non-experts rather than predefined reward functions.
The new reinforcement learning strategy, named HuGE (Human Guided Exploration), divides the learning process into two parts: a goal selector that is continually updated with crowdsourced feedback about which states look closer to the goal, and an autonomous exploration phase in which the agent pursues those selected goals on its own.
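As a rough illustration of that two-part structure, the sketch below simulates noisy crowd comparisons steering goal selection while the agent explores autonomously. The toy environment, the `human_feedback` noise model, and the selection rule are assumptions made for clarity, not the actual HuGE algorithm, which trains a learned goal selector in far richer environments.

```python
import random

def human_feedback(state_a, state_b, goal, error_rate=0.2):
    # Simulated non-expert comparison: "which state looks closer to the goal?"
    # The answer is noisy: with probability error_rate it is flipped.
    correct = abs(goal - state_a) < abs(goal - state_b)
    return correct if random.random() > error_rate else not correct

def train_huge_style(goal=10, rounds=50):
    visited = [0]  # states the agent has reached so far
    for _ in range(rounds):
        # Part 1 -- crowd-guided goal selection: compare two visited
        # states and adopt the (noisily judged) winner as the next subgoal.
        a, b = random.choice(visited), random.choice(visited)
        subgoal = a if human_feedback(a, b, goal) else b
        # Part 2 -- autonomous exploration: the agent explores around the
        # subgoal on its own, with no reward function involved.
        for _ in range(5):
            visited.append(subgoal + random.randint(-2, 2))
    # Return the visited state closest to the true goal.
    return min(visited, key=lambda s: abs(goal - s))

print(train_huge_style())  # typically reaches the goal position 10
```

Because the crowd's answers only bias where the agent looks next, occasional wrong answers slow learning down rather than derail it, which is consistent with the article's claim that the method tolerates noisy non-expert feedback.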
HuGE has shown promising results in training agents on a variety of tasks, both in simulation and in real-world experiments, outperforming traditional methods in both learning speed and accuracy.
The researchers emphasize the importance of aligning AI agents with human values, and they plan to enhance HuGE further so it can incorporate other forms of communication and train multiple agents simultaneously.