UnityGymException Objects

class UnityGymException(error.Error)

Any error related to the gym wrapper of ml-agents.

UnityToGymWrapper Objects

class UnityToGymWrapper(gym.Env)

Provides Gym wrapper for Unity Learning Environments.


 | __init__(unity_env: BaseEnv, uint8_visual: bool = False, flatten_branched: bool = False, allow_multiple_obs: bool = False, action_space_seed: Optional[int] = None)

Environment initialization


  • unity_env: The Unity BaseEnv to be wrapped in the gym. Will be closed when the UnityToGymWrapper closes.
  • uint8_visual: Return visual observations as uint8 (0-255) matrices instead of float (0.0-1.0).
  • flatten_branched: If True, turn branched discrete action spaces into a Discrete space rather than MultiDiscrete.
  • allow_multiple_obs: If True, return a list of np.ndarrays as observations with the first elements containing the visual observations and the last element containing the array of vector observations. If False, returns a single np.ndarray containing either only a single visual observation or the array of vector observations.
  • action_space_seed: If non-None, will be used to set the random seed on created gym.Space instances.
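
As a rough usage sketch, the constructor arguments above might be combined as follows. The import paths (`mlagents_envs.environment.UnityEnvironment` and `gym_unity.envs.UnityToGymWrapper`) assume the `gym_unity` package layout and may differ between ML-Agents releases. The helper name `make_gym_env` and the build path in the trailing comment are hypothetical.

```python
def make_gym_env(build_path, seed=None):
    """Wrap a Unity build in a gym-style environment (illustrative helper)."""
    # Imported lazily so this sketch can be loaded without ML-Agents installed.
    # Assumption: these module paths match the installed ML-Agents release.
    from mlagents_envs.environment import UnityEnvironment
    from gym_unity.envs import UnityToGymWrapper

    unity_env = UnityEnvironment(file_name=build_path)
    return UnityToGymWrapper(
        unity_env,
        uint8_visual=True,         # visual observations as uint8 (0-255)
        flatten_branched=True,     # Discrete instead of MultiDiscrete
        allow_multiple_obs=False,  # a single np.ndarray per observation
        action_space_seed=seed,    # seeds the created gym.Space instances
    )


# Example call (the path is a placeholder for your own Unity build):
# env = make_gym_env("./envs/GridWorld", seed=0)
```

Because the wrapper closes the wrapped `unity_env` when it is closed, the caller only needs to keep a reference to the returned wrapper.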


 | reset() -> Union[List[np.ndarray], np.ndarray]

Resets the state of the environment and returns an initial observation.

Returns:

  • observation (object/list): the initial observation of the environment.


 | step(action: List[Any]) -> GymStepResult

Run one timestep of the environment's dynamics. When end of episode is reached, you are responsible for calling reset() to reset this environment's state. Accepts an action and returns a tuple (observation, reward, done, info).


  • action (object/list): the action to apply, as chosen by the agent.

Returns:

  • observation (object/list): the agent's observation of the current environment.
  • reward (float/list): the amount of reward returned after the previous action.
  • done (boolean/list): whether the episode has ended.
  • info (dict): auxiliary diagnostic information.
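
The reset/step contract described above can be sketched as a plain episode loop. `CoinFlipEnv` is a hypothetical stand-in that only mimics the `reset()` and `step()` shapes, so the loop can run without a Unity build. `run_episode` is likewise an illustrative helper, not part of the wrapper.

```python
class CoinFlipEnv:
    """Hypothetical stand-in exposing the same reset()/step() shape."""

    def __init__(self, episode_length=5):
        self.episode_length = episode_length
        self.steps = 0

    def reset(self):
        # Start a new episode and return the initial observation.
        self.steps = 0
        return [0.0]

    def step(self, action):
        # Reward 1.0 for action 1, otherwise 0.0; end after a fixed length.
        self.steps += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.steps >= self.episode_length
        return [float(self.steps)], reward, done, {}


def run_episode(env, policy):
    """Run one episode and return the total reward."""
    observation = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward


total = run_episode(CoinFlipEnv(), policy=lambda obs: 1)
print(total)  # 5.0: five steps, each rewarded with 1.0
```

With a real wrapped environment the same loop should apply. Calling `run_episode` again starts a fresh episode because it begins with `reset()`.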


 | render(mode="rgb_array")

Return the latest visual observations. Note that it will not render a new frame of the environment.


 | close() -> None

Closes the environment, including the wrapped Unity BaseEnv. Override _close in your subclass to perform any necessary cleanup. Environments will automatically close() themselves when garbage collected or when the program exits.


 | seed(seed: Any = None) -> None

Sets the seed for this env's random number generator(s). Currently not implemented.

ActionFlattener Objects

class ActionFlattener()

Flattens branched discrete action spaces into single-branch discrete action spaces.


 | __init__(branched_action_space)

Initialize the flattener.


  • branched_action_space: A List containing the sizes of each branch of the action space, e.g. [2,3,3] for three branches with size 2, 3, and 3 respectively.


 | lookup_action(action)

Convert a scalar discrete action into a unique set of branched actions.


  • action: A scalar value representing one of the discrete actions.


Returns: a list containing the branched actions, one entry per branch.
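
To make the flattening concrete, here is a minimal sketch of one way such a lookup could be built. `SimpleActionFlattener` is an illustrative re-implementation rather than the library's class. The ordering of combinations (last branch varying fastest) is an assumption of this sketch and may not match the real `ActionFlattener`.

```python
import itertools


class SimpleActionFlattener:
    """Illustrative re-implementation; not the library's ActionFlattener."""

    def __init__(self, branched_action_space):
        # Enumerate every combination of branch values; the last branch
        # varies fastest, following itertools.product ordering.
        ranges = [range(size) for size in branched_action_space]
        self.action_lookup = {
            scalar: list(combo)
            for scalar, combo in enumerate(itertools.product(*ranges))
        }

    def lookup_action(self, action):
        """Map a scalar discrete action to its list of branched actions."""
        return self.action_lookup[action]


flattener = SimpleActionFlattener([2, 3, 3])
print(len(flattener.action_lookup))  # 18 combinations (2 * 3 * 3)
print(flattener.lookup_action(0))    # [0, 0, 0]
print(flattener.lookup_action(4))    # [0, 1, 1]
print(flattener.lookup_action(17))   # [1, 2, 2]
```

The number of flattened actions is the product of the branch sizes, so the single Discrete space grows quickly as branches are added.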