Reinforcement Learning Guide

SLIM is, in principle, a game with different agents optimising their profits. We support this by encapsulating the simulator state within an RL environment. However, since Gym’s API was not designed for multi-agent reinforcement learning (MARL) scenarios, we adopted PettingZoo.

AEC and PettingZoo

Agent Environment Cycle (AEC) is the PettingZoo model of MARL in which agents act one at a time, and each agent acts exactly once before the cycle is over. The order in which they act is irrelevant.

Note

Currently, we adopt a simple AEC scheme which does not allow for parallel agent execution.

Each agent performs two steps, in the given order:

samples from the observation space
performs an action
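
The cycle can be illustrated with a minimal, self-contained sketch. It is not SLIM's implementation: ToyAECEnv, run_cycle, the per-farm lice counter and the two-action encoding are all hypothetical names and rules chosen for illustration. The sketch only shows the shape of the loop, in which each agent observes and then acts, and the shared state changes once every agent has acted.

```python
from typing import Callable, Dict, List


class ToyAECEnv:
    """Hypothetical stand-in for an AEC environment (not SLIM's SimulatorPZEnv)."""

    def __init__(self, agents: List[str]):
        self.agents = agents
        # Toy per-farm state: a single lice counter per agent (assumption).
        self.lice: Dict[str, int] = {agent: 10 for agent in agents}
        # Actions collected during the current cycle, applied all at once.
        self.pending: Dict[str, int] = {}

    def observe(self, agent: str) -> int:
        # An agent only sees its own farm.
        return self.lice[agent]

    def step(self, agent: str, action: int) -> None:
        self.pending[agent] = action
        # Defer the update until every agent has acted in this cycle.
        if len(self.pending) == len(self.agents):
            self._update()

    def _update(self) -> None:
        for agent, action in self.pending.items():
            if action == 1:      # toy rule: treating halves the lice count
                self.lice[agent] //= 2
            else:                # toy rule: doing nothing lets lice grow
                self.lice[agent] += 2
        self.pending.clear()


def run_cycle(env: ToyAECEnv, policy: Callable[[str, int], int]) -> None:
    """Run one full cycle: each agent observes, then acts, exactly once."""
    for agent in env.agents:
        observation = env.observe(agent)
        env.step(agent, policy(agent, observation))


if __name__ == "__main__":
    env = ToyAECEnv(["farm_0", "farm_1"])
    run_cycle(env, lambda agent, lice: 1 if lice >= 10 else 0)
    print(env.lice)
```

Because the update is applied only when the last agent has stepped, the order in which agents act within a cycle does not change the outcome in this sketch.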

The farms’ spaces are updated only after all agents have performed an action. Each agent can only reason about its own space, and has access to only a limited subset of what the simulator models. In particular, the simulator exposes the following to an agent:

current lice aggregation;
fish population;
which treatments are being used;
how many treatments can still be used within the year;
whether the organisation has asked to treat.
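
One way to picture the information listed above is as a small record per farm. The sketch below is an assumption-laden illustration: the class name FarmObservation, its field names and their types are hypothetical and do not come from SLIM, whose actual observation space may be laid out differently.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FarmObservation:
    """Hypothetical per-farm observation; names and types are assumptions."""

    lice_aggregation: float               # current lice aggregation
    fish_population: int                  # fish population
    treatments_in_use: Tuple[bool, ...]   # one flag per available treatment
    treatments_left_this_year: int        # treatments still usable this year
    asked_to_treat: bool                  # has the organisation asked to treat?


def may_treat(obs: FarmObservation) -> bool:
    """Illustrative helper: a farm can treat only if it has treatments left."""
    return obs.treatments_left_this_year > 0


if __name__ == "__main__":
    obs = FarmObservation(
        lice_aggregation=0.4,
        fish_population=40_000,
        treatments_in_use=(False, True),
        treatments_left_this_year=3,
        asked_to_treat=True,
    )
    print(may_treat(obs))
```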

The action space consists of \(T+2\) actions, where \(T\) is the number of available treatments. The two extra options are fallowing and inaction.
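
As a sketch of how such a discrete action space could be decoded, the function below maps an integer in the range 0 to \(T+1\) to a label. The ordering (treatments first, then fallowing, then inaction) is an assumption made for illustration, as are the function name describe_action and the placeholder treatment names; the ordering used by SLIM may differ.

```python
from typing import List


def describe_action(action: int, treatments: List[str]) -> str:
    """Decode an action index, assuming the order: treatments, fallow, no-op."""
    t = len(treatments)
    if not 0 <= action < t + 2:
        raise ValueError(f"action must be in [0, {t + 1}], got {action}")
    if action < t:
        return f"treat:{treatments[action]}"
    return "fallow" if action == t else "no-op"


if __name__ == "__main__":
    # Placeholder treatment names, not SLIM's actual treatment list.
    available = ["treatment_a", "treatment_b"]
    for index in range(len(available) + 2):
        print(index, describe_action(index, available))
```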

The main logic is implemented in slim.simulation.simulator.SimulatorPZEnv.

Policies

A number of policies are defined in slim.simulation.simulator. These are:

No treatment policy (slim.simulation.simulator.UntreatedPolicy)
Bernoullian policy, i.e. each farm will randomly cooperate and randomly apply a treatment of choice (slim.simulation.simulator.BernoullianPolicy)
Mosaic policy, i.e. each farm will apply a different treatment whenever requested to (slim.simulation.simulator.MosaicPolicy)

Additionally, any policy from the stable-baselines package should be supported, although none has been tested yet.
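
To make the three built-in behaviours concrete, here is a rough sketch written as plain functions. It does not reproduce the classes in slim.simulation.simulator: the function names, the cooperation probability p, the farm_index argument and the action encoding (treatments at indices 0 to \(T-1\), fallowing at \(T\), inaction at \(T+1\)) are all assumptions made for illustration.

```python
import random
from typing import Optional


def untreated_policy(asked_to_treat: bool, n_treatments: int) -> int:
    """Never treat: always return the assumed no-op index T + 1."""
    return n_treatments + 1


def bernoulli_policy(
    asked_to_treat: bool,
    n_treatments: int,
    p: float = 0.5,
    rng: Optional[random.Random] = None,
) -> int:
    """When asked, cooperate with probability p and pick a random treatment."""
    if rng is None:
        rng = random.Random()
    if asked_to_treat and rng.random() < p:
        return rng.randrange(n_treatments)
    return n_treatments + 1


def mosaic_policy(asked_to_treat: bool, n_treatments: int, farm_index: int) -> int:
    """When asked, farm i applies treatment i mod T, so farms differ."""
    if not asked_to_treat:
        return n_treatments + 1
    return farm_index % n_treatments


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducibility
    print(bernoulli_policy(True, 3, p=0.5, rng=rng))
    print(mosaic_policy(True, 3, farm_index=4))
```

In this sketch every policy returns an action index, so any of them could be plugged into a loop that asks each farm for one action per cycle.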

The main policy prediction loop is performed inside slim.simulation.simulator.Simulator.

To select a policy, set the treatment_strategy option in the configuration.

For example, from the command line or from Python:

slim run \
    output_folder/Loch_Fyne \
    config_data/Fyne \
    --treatment-strategy=bernoulli

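from slim.simulation.config import Config
from slim.simulation.simulator import Simulator

# The same option can be set from Python as a keyword argument to Config.
cfg = Config(
    "config_data/config.json",
    "config_data/Fyne",
    name="Fyne_foobar",
    treatment_strategy="bernoulli")
# Build the simulator from the configuration and run it.
sim = Simulator("output", cfg)
sim.run_model()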

See Environment-specific configuration schema for details.