mlagents.trainers.trainer.on_policy_trainer

OnPolicyTrainer Objects

class OnPolicyTrainer(RLTrainer)

The OnPolicyTrainer is the base class for on-policy trainers such as the PPOTrainer, an implementation of the PPO algorithm.

__init__

 | __init__(behavior_name: str, reward_buff_cap: int, trainer_settings: TrainerSettings, training: bool, load: bool, seed: int, artifact_path: str)

Responsible for collecting experiences and training an on-policy model.

Arguments:

  • behavior_name: The name of the behavior associated with trainer config
  • reward_buff_cap: Max reward history to track in the reward buffer
  • trainer_settings: The parameters for the trainer.
  • training: Whether the trainer is set for training.
  • load: Whether the model should be loaded.
  • seed: The seed the model will be initialized with
  • artifact_path: The directory within which to store artifacts from this trainer.
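For illustration, below is a minimal sketch of constructing a concrete on-policy trainer with these arguments. It assumes the PPOTrainer subclass shares this constructor signature; the settings and paths are placeholder values, and a real run would load TrainerSettings from a YAML config.

from mlagents.trainers.ppo.trainer import PPOTrainer
from mlagents.trainers.settings import TrainerSettings

trainer = PPOTrainer(
    behavior_name="3DBall",
    reward_buff_cap=100,
    trainer_settings=TrainerSettings(),  # default settings; normally loaded from YAML
    training=True,
    load=False,
    seed=0,
    artifact_path="results/my_run/3DBall",
)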

add_policy

 | add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None

Adds policy to trainer.

Arguments:

  • parsed_behavior_id: Behavior identifiers that the policy should belong to.
  • policy: Policy to associate with name_behavior_id.

mlagents.trainers.trainer.off_policy_trainer

OffPolicyTrainer Objects

class OffPolicyTrainer(RLTrainer)

The OffPolicyTrainer is the base class for off-policy trainers such as the SACTrainer, an implementation of the SAC algorithm, with support for discrete actions and recurrent networks.

__init__

 | __init__(behavior_name: str, reward_buff_cap: int, trainer_settings: TrainerSettings, training: bool, load: bool, seed: int, artifact_path: str)

Responsible for collecting experiences and training an off-policy model.

Arguments:

  • behavior_name: The name of the behavior associated with trainer config
  • reward_buff_cap: Max reward history to track in the reward buffer
  • trainer_settings: The parameters for the trainer.
  • training: Whether the trainer is set for training.
  • load: Whether the model should be loaded.
  • seed: The seed the model will be initialized with
  • artifact_path: The directory within which to store artifacts from this trainer.

save_model

 | save_model() -> None

Saves the final training model. Overrides the default implementation to also save the replay buffer.

save_replay_buffer

 | save_replay_buffer() -> None

Save the training buffer's update buffer to a pickle file.

load_replay_buffer

 | load_replay_buffer() -> None

Loads the last saved replay buffer from a file.
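The save/load pair above persists the off-policy update buffer between runs so training can resume with its collected experience. A minimal sketch of that pattern, assuming a pickle-based format and an illustrative file name (the library's actual serialization details may differ):

import os
import pickle

def save_replay_buffer(update_buffer, artifact_path):
    # Serialize the update buffer so a later resumed run keeps its experience.
    with open(os.path.join(artifact_path, "last_replay_buffer.exp"), "wb") as f:
        pickle.dump(update_buffer, f)

def load_replay_buffer(artifact_path):
    # Restore the previously saved update buffer.
    with open(os.path.join(artifact_path, "last_replay_buffer.exp"), "rb") as f:
        return pickle.load(f)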

add_policy

 | add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None

Adds policy to trainer.

mlagents.trainers.trainer.rl_trainer

RLTrainer Objects

class RLTrainer(Trainer)

This class is the base class for trainers that use Reward Signals.

end_episode

 | end_episode() -> None

A signal that the episode has ended. The buffer must be reset. Only called when the academy resets.

create_optimizer

 | @abc.abstractmethod
 | create_optimizer() -> TorchOptimizer

Creates an Optimizer object

save_model

 | save_model() -> None

Saves the policy associated with this trainer.

advance

 | advance() -> None

Steps the trainer, taking in trajectories and updates if ready. Will block and wait briefly if there are no trajectories.

mlagents.trainers.trainer.trainer

Trainer Objects

class Trainer(abc.ABC)

This class is the base class for trainers in mlagents.trainers.

__init__

 | __init__(brain_name: str, trainer_settings: TrainerSettings, training: bool, load: bool, artifact_path: str, reward_buff_cap: int = 1)

Responsible for collecting experiences and training a neural network model.

Arguments:

  • brain_name: The name of the brain (behavior) to be trained.
  • trainer_settings: The parameters for the trainer.
  • training: Whether the trainer is set for training.
  • load: Whether the model should be loaded.
  • artifact_path: The directory within which to store artifacts from this trainer.
  • reward_buff_cap: Max reward history to track in the reward buffer.

stats_reporter

 | @property
 | stats_reporter()

Returns the stats reporter associated with this Trainer.

parameters

 | @property
 | parameters() -> TrainerSettings

Returns the trainer parameters of the trainer.

get_max_steps

 | @property
 | get_max_steps() -> int

Returns the maximum number of steps. Used to determine when the trainer should be stopped.

Returns:

The maximum number of steps of the trainer

get_step

 | @property
 | get_step() -> int

Returns the number of steps the trainer has performed

Returns:

the step count of the trainer

threaded

 | @property
 | threaded() -> bool

Whether or not to run the trainer in a thread. True allows the trainer to update the policy while the environment is taking steps. Set to False to enforce strict on-policy updates (i.e. don't update the policy when taking steps.)

should_still_train

 | @property
 | should_still_train() -> bool

Returns whether or not the trainer should train. A Trainer could stop training if it wasn't training to begin with, or if max_steps is reached.
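Illustrative only: how a controller might drive a non-threaded trainer using should_still_train together with advance and save_model. The trainer and env_manager arguments are hypothetical stand-ins for the real TrainerController wiring.

def run_training(trainer, env_manager):
    while trainer.should_still_train:
        env_manager.advance()  # step the environment; trajectories land in the queues
        trainer.advance()      # consume trajectories and update the policy if ready
    trainer.save_model()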

reward_buffer

 | @property
 | reward_buffer() -> Deque[float]

Returns the reward buffer. The reward buffer contains the cumulative rewards of the most recent episodes completed by agents using this trainer.

Returns:

the reward buffer.

save_model

 | @abc.abstractmethod
 | save_model() -> None

Saves model file(s) for the policy or policies associated with this trainer.

end_episode

 | @abc.abstractmethod
 | end_episode()

A signal that the episode has ended. The buffer must be reset. Only called when the academy resets.

create_policy

 | @abc.abstractmethod
 | create_policy(parsed_behavior_id: BehaviorIdentifiers, behavior_spec: BehaviorSpec) -> Policy

Creates a Policy object

add_policy

 | @abc.abstractmethod
 | add_policy(parsed_behavior_id: BehaviorIdentifiers, policy: Policy) -> None

Adds policy to trainer.

get_policy

 | get_policy(name_behavior_id: str) -> Policy

Gets policy associated with name_behavior_id

Arguments:

  • name_behavior_id: Fully qualified behavior name

Returns:

Policy associated with name_behavior_id

advance

 | @abc.abstractmethod
 | advance() -> None

Advances the trainer. Typically, this means grabbing trajectories from all subscribed trajectory queues (self.trajectory_queues), updating a policy using the steps in them, and, if needed, pushing a new policy onto the right policy queues (self.policy_queues).

publish_policy_queue

 | publish_policy_queue(policy_queue: AgentManagerQueue[Policy]) -> None

Adds a policy queue to the list of queues to publish to when this Trainer makes a policy update

Arguments:

  • policy_queue: Policy queue to publish to.

subscribe_trajectory_queue

 | subscribe_trajectory_queue(trajectory_queue: AgentManagerQueue[Trajectory]) -> None

Adds a trajectory queue to the list of queues for the trainer to ingest Trajectories from.

Arguments:

  • trajectory_queue: Trajectory queue to read from.
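A minimal sketch of wiring a trainer's queues before training starts, tying publish_policy_queue and subscribe_trajectory_queue together. The AgentManagerQueue import path and constructor are assumptions based on the mlagents.trainers package layout.

from mlagents.trainers.agent_processor import AgentManagerQueue

def wire_queues(trainer, behavior_id: str):
    policy_queue = AgentManagerQueue(behavior_id)      # trainer publishes updated policies here
    trajectory_queue = AgentManagerQueue(behavior_id)  # trainer ingests trajectories from here
    trainer.publish_policy_queue(policy_queue)
    trainer.subscribe_trajectory_queue(trajectory_queue)
    return policy_queue, trajectory_queue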

mlagents.trainers.settings

deep_update_dict

deep_update_dict(d: Dict, update_d: Mapping) -> None

Similar to dict.update(), but works for nested dicts of dicts as well.
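A sketch of the deep-update behavior described above (not the library's exact code): values from update_d overwrite values in d, recursing into nested dicts instead of replacing them wholesale.

def deep_update_dict(d, update_d):
    for key, value in update_d.items():
        if key in d and isinstance(d[key], dict) and isinstance(value, dict):
            deep_update_dict(d[key], value)  # merge nested dicts in place
        else:
            d[key] = value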

RewardSignalSettings Objects

@attr.s(auto_attribs=True)
class RewardSignalSettings()

structure

 | @staticmethod
 | structure(d: Mapping, t: type) -> Any

Helper method to structure a Dict of RewardSignalSettings classes. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure(). This is needed to handle the special Enum selection of RewardSignalSettings classes.
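A hedged example of how such a hook is typically registered and used with cattr. ML-Agents registers this hook itself when the settings module is imported, so the explicit registration here is purely illustrative, and the config keys are placeholder values.

from typing import Dict
import cattr
from mlagents.trainers.settings import RewardSignalSettings, RewardSignalType

cattr.register_structure_hook(
    Dict[RewardSignalType, RewardSignalSettings], RewardSignalSettings.structure
)
reward_signals = cattr.structure(
    {"extrinsic": {"gamma": 0.99, "strength": 1.0}},
    Dict[RewardSignalType, RewardSignalSettings],
)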

ParameterRandomizationSettings Objects

@attr.s(auto_attribs=True)
class ParameterRandomizationSettings(abc.ABC)

__str__

 | __str__() -> str

Helper method to output sampler stats to console.

structure

 | @staticmethod
 | structure(d: Union[Mapping, float], t: type) -> "ParameterRandomizationSettings"

Helper method to structure a ParameterRandomizationSettings class. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure(). This is needed to handle the special Enum selection of ParameterRandomizationSettings classes.

unstructure

 | @staticmethod
 | unstructure(d: "ParameterRandomizationSettings") -> Mapping

Helper method to unstructure a ParameterRandomizationSettings class. Meant to be registered with cattr.register_unstructure_hook() and called with cattr.unstructure().

apply

 | @abc.abstractmethod
 | apply(key: str, env_channel: EnvironmentParametersChannel) -> None

Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the appropriate sampler type's set method.

Arguments:

  • key: environment parameter to be sampled
  • env_channel: The EnvironmentParametersChannel to communicate sampler settings to environment

ConstantSettings Objects

@attr.s(auto_attribs=True)
class ConstantSettings(ParameterRandomizationSettings)

__str__

 | __str__() -> str

Helper method to output sampler stats to console.

apply

 | apply(key: str, env_channel: EnvironmentParametersChannel) -> None

Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the constant sampler type's set method.

Arguments:

  • key: environment parameter to be sampled
  • env_channel: The EnvironmentParametersChannel to communicate sampler settings to environment

UniformSettings Objects

@attr.s(auto_attribs=True)
class UniformSettings(ParameterRandomizationSettings)

__str__

 | __str__() -> str

Helper method to output sampler stats to console.

apply

 | apply(key: str, env_channel: EnvironmentParametersChannel) -> None

Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the uniform sampler type's set method.

Arguments:

  • key: environment parameter to be sampled
  • env_channel: The EnvironmentParametersChannel to communicate sampler settings to environment
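A hedged usage sketch: sending a uniform sampler for an environment parameter named "gravity" over the side channel. The min_value/max_value field names are assumptions for illustration.

from mlagents_envs.side_channel.environment_parameters_channel import (
    EnvironmentParametersChannel,
)
from mlagents.trainers.settings import UniformSettings

channel = EnvironmentParametersChannel()
sampler = UniformSettings(min_value=8.0, max_value=12.0)
sampler.apply("gravity", channel)  # forwards the bounds via the uniform sampler set method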

GaussianSettings Objects

@attr.s(auto_attribs=True)
class GaussianSettings(ParameterRandomizationSettings)

__str__

 | __str__() -> str

Helper method to output sampler stats to console.

apply

 | apply(key: str, env_channel: EnvironmentParametersChannel) -> None

Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the Gaussian sampler type's set method.

Arguments:

  • key: environment parameter to be sampled
  • env_channel: The EnvironmentParametersChannel to communicate sampler settings to environment

MultiRangeUniformSettings Objects

@attr.s(auto_attribs=True)
class MultiRangeUniformSettings(ParameterRandomizationSettings)

__str__

 | __str__() -> str

Helper method to output sampler stats to console.

apply

 | apply(key: str, env_channel: EnvironmentParametersChannel) -> None

Helper method to send sampler settings over the EnvironmentParametersChannel. Calls the multi-range uniform sampler type's set method.

Arguments:

  • key: environment parameter to be sampled
  • env_channel: The EnvironmentParametersChannel to communicate sampler settings to environment

CompletionCriteriaSettings Objects

@attr.s(auto_attribs=True)
class CompletionCriteriaSettings()

CompletionCriteriaSettings contains the information needed to figure out if the next lesson must start.

need_increment

 | need_increment(progress: float, reward_buffer: List[float], smoothing: float) -> Tuple[bool, float]

Given measures, this method returns a boolean indicating if the lesson needs to change now, and a float corresponding to the new smoothed value.
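An illustrative (not verbatim) sketch of the kind of check need_increment performs: smooth the recent mean reward, or fall back to raw progress, and compare it against the lesson's completion threshold. The threshold and smoothing factor here are arbitrary placeholder values.

from statistics import mean
from typing import List, Tuple

def need_increment_sketch(
    progress: float,
    reward_buffer: List[float],
    smoothing: float,
    threshold: float = 0.8,
    smoothing_factor: float = 0.25,
) -> Tuple[bool, float]:
    if not reward_buffer:
        # No completed episodes yet: fall back to raw training progress.
        return progress >= threshold, smoothing
    # Exponentially smooth the recent mean reward, then compare to the threshold.
    smoothed = smoothing_factor * smoothing + (1 - smoothing_factor) * mean(reward_buffer)
    return smoothed >= threshold, smoothed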

Lesson Objects

@attr.s(auto_attribs=True)
class Lesson()

Gathers the data of one lesson for one environment parameter, including its name, the condition that must be fulfilled for the lesson to be completed, and a sampler for the environment parameter. If completion_criteria is None, this is the last lesson in the curriculum.

EnvironmentParameterSettings Objects

@attr.s(auto_attribs=True)
class EnvironmentParameterSettings()

EnvironmentParameterSettings is an ordered list of lessons for one environment parameter.

structure

 | @staticmethod
 | structure(d: Mapping, t: type) -> Dict[str, "EnvironmentParameterSettings"]

Helper method to structure a Dict of EnvironmentParameterSettings class. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure().

TrainerSettings Objects

@attr.s(auto_attribs=True)
class TrainerSettings(ExportableSettings)

structure

 | @staticmethod
 | structure(d: Mapping, t: type) -> Any

Helper method to structure a TrainerSettings class. Meant to be registered with cattr.register_structure_hook() and called with cattr.structure().
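A hedged example of structuring a plain config dict into TrainerSettings with cattr, assuming the hook above is registered when the settings module is imported. The keys follow the standard trainer configuration format, and the values are placeholders.

import cattr
from mlagents.trainers.settings import TrainerSettings

config = {
    "trainer_type": "ppo",
    "max_steps": 500000,
    "hyperparameters": {"batch_size": 1024, "learning_rate": 3.0e-4},
}
trainer_settings = cattr.structure(config, TrainerSettings)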

CheckpointSettings Objects

@attr.s(auto_attribs=True)
class CheckpointSettings()

prioritize_resume_init

 | prioritize_resume_init() -> None

Prioritizes explicit command-line resume/init options over conflicting YAML options. If both resume and init are set in the same place, resume takes precedence.

RunOptions Objects

@attr.s(auto_attribs=True)
class RunOptions(ExportableSettings)

from_argparse

 | @staticmethod
 | from_argparse(args: argparse.Namespace) -> "RunOptions"

Takes an argparse.Namespace as specified in parse_command_line, loads input configuration files from file paths, and converts to a RunOptions instance.

Arguments:

  • args: collection of command-line parameters passed to mlagents-learn

Returns:

A RunOptions instance representing the passed-in arguments, with trainer config, curriculum, and sampler configs loaded from files.
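A hedged sketch of the flow described above: a parsed argparse.Namespace is converted into a RunOptions instance. The build_parser factory is hypothetical; in practice the mlagents-learn entry point builds the parser and performs this conversion internally.

from mlagents.trainers.settings import RunOptions

def options_from_cli(argv, build_parser):
    # build_parser() stands in for ML-Agents' own argparse parser construction.
    args = build_parser().parse_args(argv)  # e.g. ["config/ppo/3DBall.yaml", "--run-id", "my_run"]
    return RunOptions.from_argparse(args)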