Researchers from MIT, Harvard University, and the University of Washington have developed an innovative reinforcement learning approach that eliminates the need for an expertly designed reward function. Instead, the method uses crowdsourced feedback from many nonexpert users to guide an AI agent as it learns to complete a task, such as opening a kitchen cabinet.
Challenges in Traditional Reinforcement Learning
Traditionally, reinforcement learning involves a trial-and-error process where an AI agent is rewarded for actions that bring it closer to a goal. However, designing an effective reward function often requires considerable time and effort from human experts, especially for complex tasks involving many steps.
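To see why reward design takes so much expert effort, consider what a hand-crafted reward for a cabinet-opening task might look like. The sketch below is purely illustrative and not from the study: the environment, state fields, and weights are hypothetical, and every term and coefficient is something an expert would have to choose and tune.

```python
# Illustrative sketch only: a hand-designed, shaped reward for a hypothetical
# "open the cabinet" task. Each term and weight must be picked and tuned by hand.

import numpy as np

def shaped_reward(state):
    """Dense reward combining several hand-tuned terms."""
    dist_to_handle = np.linalg.norm(state["gripper_pos"] - state["handle_pos"])
    reach_bonus = -1.0 * dist_to_handle                       # get close to the handle
    grasp_bonus = 0.5 * float(state["grasping"])              # reward a successful grasp
    open_bonus = 2.0 * state["door_angle"]                    # reward opening the door
    success_bonus = 10.0 * float(state["door_angle"] > 1.2)   # ~70 degrees counts as open
    return reach_bonus + grasp_bonus + open_bonus + success_bonus

example_state = {
    "gripper_pos": np.array([0.40, 0.10, 0.30]),
    "handle_pos": np.array([0.42, 0.12, 0.31]),
    "grasping": True,
    "door_angle": 0.3,  # radians
}
print(shaped_reward(example_state))
```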
The New Approach: Leveraging Crowdsourced Feedback
The new approach leverages feedback from nonexpert users, overcoming a limitation of earlier methods, which tend to struggle with the noisy, error-prone data that crowdsourced users provide. Despite those inaccuracies, the technique enables the AI agent to learn more quickly.
Asynchronous Feedback for Global Input
The method also allows for asynchronous feedback, enabling contributions from nonexpert users around the world. This feature broadens the scope of input and facilitates diverse learning opportunities for the AI agent. Pulkit Agrawal, an assistant professor in the MIT Department of Electrical Engineering and Computer Science (EECS) and leader of the Improbable AI Lab at MIT CSAIL, emphasizes the scalability of robot learning through this crowdsourced approach.
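One way to picture asynchronous feedback is as a shared queue that remote users can drop comparison labels into at any time, while training continues regardless of when those labels arrive. The Python sketch below simulates this with background threads standing in for remote nonexpert users; the queue design and names are assumptions made for illustration, not the researchers' actual infrastructure.

```python
# Minimal sketch of asynchronous crowdsourced feedback (illustrative, not HuGE code).
import queue
import random
import threading
import time

# Shared queue of (user_id, state_a, state_b, preferred) comparison labels.
feedback_queue: queue.Queue = queue.Queue()

def simulated_user(user_id: int) -> None:
    """Stand-in for a remote nonexpert who answers on their own schedule."""
    for _ in range(3):
        time.sleep(random.random())                       # arbitrary per-answer delay
        state_a, state_b = random.sample(range(100), 2)   # two candidate states to compare
        preferred = random.choice([state_a, state_b])     # user's (possibly noisy) pick
        feedback_queue.put((user_id, state_a, state_b, preferred))

# Launch a few "users" in the background; the learner never blocks on any one of them.
users = [threading.Thread(target=simulated_user, args=(i,)) for i in range(3)]
for u in users:
    u.start()

# The learner periodically drains whatever labels happen to have arrived.
collected = []
while any(u.is_alive() for u in users) or not feedback_queue.empty():
    try:
        collected.append(feedback_queue.get(timeout=0.1))
    except queue.Empty:
        pass  # no new feedback yet; real training would simply keep exploring

print(f"collected {len(collected)} asynchronous comparison labels")
```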
Guided Exploration Instead of Direct Instruction
Marcel Torne ’23, a research assistant in the Improbable AI Lab and lead author of the study, explains that the new method guides the agent’s exploration rather than dictating its exact actions. This makes the approach effective even when the human supervision is somewhat inaccurate and noisy.
Decoupling Process and HuGE Method
The researchers decoupled the learning process into two parts, each directed by its own algorithm, in a method they call HuGE (Human Guided Exploration). A goal selector algorithm, continually updated with crowdsourced human feedback, steers the agent toward promising states to explore from, while the agent itself learns how to act in a self-supervised manner from the experience it gathers. The human feedback is therefore never used directly as a reward function; it only guides where the agent explores.
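A minimal sketch of this decoupling is shown below, assuming a toy 2-D point environment and a simple tabular goal selector; the names, update rules, and exploration routine are illustrative simplifications rather than the HuGE implementation. Comparison feedback only adjusts which visited states look promising to explore from, while the agent's own exploration generates the data it would actually learn from.

```python
# Illustrative sketch of the two decoupled pieces (not the researchers' code).
import numpy as np

rng = np.random.default_rng(0)
visited_states = [np.zeros(2)]   # states the agent has reached so far
scores = {}                      # goal-selector scores, learned only from feedback

def update_goal_selector(state_a, state_b, preferred):
    """Part 1: a crowdsourced comparison nudges scores up or down. Noisy labels
    only bias which regions get explored; they never act as the reward itself."""
    a, b = tuple(state_a), tuple(state_b)
    scores.setdefault(a, 0.0)
    scores.setdefault(b, 0.0)
    winner, loser = (a, b) if preferred == 0 else (b, a)
    scores[winner] += 0.1
    scores[loser] -= 0.1

def propose_goal():
    """Pick a promising visited state to explore around, guided by the scores."""
    return max(visited_states, key=lambda s: scores.get(tuple(s), 0.0))

def explore(goal, steps=10):
    """Part 2: the agent explores on its own (here, a noisy walk toward the goal);
    its policy would be trained self-supervised on the states it actually reaches."""
    state = visited_states[-1].copy()
    for _ in range(steps):
        state = state + 0.1 * np.sign(goal - state) + 0.05 * rng.normal(size=2)
        visited_states.append(state.copy())

# One iteration: undirected exploration, then a (simulated) nonexpert compares two
# states, the goal selector updates, and the agent explores toward the proposed goal.
explore(rng.uniform(-1, 1, size=2))                  # initial undirected exploration
target = np.array([1.0, 1.0])                        # hypothetical task goal
s_a, s_b = visited_states[0], visited_states[-1]
preferred = 0 if np.linalg.norm(s_a - target) < np.linalg.norm(s_b - target) else 1
update_goal_selector(s_a, s_b, preferred)
explore(propose_goal())
print(f"visited {len(visited_states)} states; frontier at {visited_states[-1]}")
```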
Simulated and Real-World Applications
The researchers tested HuGE in both simulated and real-world tasks. It proved effective at learning tasks that require long sequences of actions and at training robotic arms to complete specific activities. Feedback crowdsourced from nonexperts yielded better performance than synthetic, machine-generated data, suggesting the method can scale.
Future Developments and Applications
Looking ahead, the research team aims to refine the HuGE method to include learning from natural language and physical interactions with robots. They also plan to apply this method in teaching multiple agents simultaneously. A related paper presented at the Conference on Robot Learning detailed an enhancement to HuGE, allowing AI agents to autonomously reset the environment for continuous learning.
Aligning AI with Human Values
The research emphasizes the importance of ensuring that AI agents remain aligned with human values, a critical concern as AI and machine learning methods develop. By letting agents learn from broad, nonexpert input rather than expert-crafted rewards, the approach could substantially change how AI agents are taught to act in varied environments.