Multi-Armed Bandit Problem and Its Solutions

In probability theory and decision-making under uncertainty, the multi-armed bandit problem describes a setting in which a limited set of resources must be allocated among competing choices to maximize the expected gain. It is a classic reinforcement learning problem that embodies the exploration vs. exploitation dilemma. Imagine we are facing a row of slot machines (also called one-armed bandits). We must make a series of decisions: which arms to play, how many times to play each arm, the order in which to play them, and whether to stick with the current arm or switch to another one....

March 22, 2024 · 7 min

Key Concepts In (Deep) Reinforcement Learning

Reinforcement Learning (RL) revolves around the interactions between an agent and its environment. The environment represents the world where the agent lives and acts. At each step, the agent observes some information about the environment, makes a decision, and affects the environment through its action. The agent also receives rewards from the environment, which indicate how well it is doing. The agent’s ultimate goal is to maximize the cumulative reward it receives, called the return....

March 12, 2024 · 6 min