Interacting with the Simulation
This how-to guide shows you which functions you have to define to specify how the step function interacts with the SOFA simulation.
We will build this environment around a very basic scene description that contains a controllable rigid object as its agent and an uncontrollable rigid object as the target.
import Sofa.Core

from sofa_env.sofa_templates.rigid import ControllableRigidObject, RigidObject, RIGID_PLUGIN_LIST
from sofa_env.sofa_templates.scene_header import add_scene_header, SCENE_HEADER_PLUGIN_LIST

PLUGIN_LIST = RIGID_PLUGIN_LIST + SCENE_HEADER_PLUGIN_LIST


def createScene(root_node: Sofa.Core.Node) -> dict:
    add_scene_header(root_node=root_node, plugin_list=PLUGIN_LIST)

    controllable_object = ControllableRigidObject(
        parent_node=root_node,
        name="object",
        pose=(50, 50, 0, 0, 0, 0, 1),
    )

    target = RigidObject(
        parent_node=root_node,
        name="target",
        pose=(50, 50, 0, 0, 0, 0, 1),
    )

    return {
        "root_node": root_node,
        "controllable_object": controllable_object,
        "target": target,
    }
Step
observation, reward, done, info = env.step(action)
is the main function to interact with the environment.
It applies the action to the sofa simulation, retrieves the next observation, calculates the reward, determines if the episode is done (environment is in a terminal state), and returns additional information as a dictionary.
def step(self, action: Any) -> Tuple[Union[np.ndarray, dict], float, bool, dict]:
    rgb_observation = super().step(action)
    observation = self._get_observation(rgb_observation)
    reward = self._get_reward()
    done = self._get_done()
    info = self._get_info()
    return observation, reward, done, info
To specify the environment’s behavior in a granular way, we define the following functions.
_do_action
_get_observation
_get_reward
_get_done
_get_info
_do_action
Applying an action to the SOFA simulation means
changing physical values of the simulation, such as specifying an object's new position, changing a spring's stiffness, or attaching objects to each other, and then
triggering SOFA's animation loop for one or more steps.
Translating an agent's action (defined by the environment's action space) into changes in the SOFA simulation is done in _do_action.
This function has to be specified by you, since the environment class makes no assumptions about how you want to interact with the actual SOFA simulation and how the simulation graph is structured.
We will write code that takes the action, represented by a numpy array, and applies a change to the sphere’s position in simulation.
def _do_action(self, action: np.ndarray) -> None:
    scaled_action = action * self.time_step * self.maximum_velocity
    old_pose = self.scene_creation_result["controllable_object"].get_pose()
    # Poses in SOFA are defined as [Cartesian position, quaternion for orientation].
    # Appending zeros leaves the orientation unchanged.
    new_pose = old_pose + np.append(scaled_action, np.zeros(4))
    self.scene_creation_result["controllable_object"].set_pose(new_pose)
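The pose arithmetic above can be sketched with plain NumPy. The values of time_step and maximum_velocity here are illustrative parameters, not taken from the scene above:

```python
import numpy as np

# Illustrative parameters (assumptions, not values from the environment)
time_step = 0.1          # simulation time step in seconds
maximum_velocity = 20.0  # maximum translational speed

# XYZ action as it would come from the agent
action = np.array([1.0, 0.0, -0.5])
scaled_action = action * time_step * maximum_velocity  # displacement for this step

# Pose is [x, y, z, qx, qy, qz, qw]; appending zeros keeps the orientation
old_pose = np.array([50.0, 50.0, 0.0, 0.0, 0.0, 0.0, 1.0])
new_pose = old_pose + np.append(scaled_action, np.zeros(4))

print(new_pose)  # [52. 50. -1.  0.  0.  0.  1.]
```

Note that only the first three entries of the pose change; the identity quaternion (0, 0, 0, 1) is left untouched.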
_get_observation
If you are not using pixel observations, you also have to specify how observations are determined from the current simulation state. If you want to observe the Cartesian position of an object in the scene, for example, you have to define how these values are retrieved, optionally transformed, and passed back from the environment, because, again, the environment does not know how you define your simulation graph, how the elements are named, or what sort of values you want to read. The observation could be anything from the Cartesian position of a rigid object to the point-wise stress of a deformable FEM mesh.
def _get_observation(self, rgb_observation: Union[np.ndarray, None]) -> np.ndarray:
    if self._observation_type == ObservationType.RGB:
        observation = rgb_observation
    else:
        # Sampling the observation space yields an array with the correct shape
        # and dtype, which is then overwritten with the actual state values.
        observation = self.observation_space.sample()
        observation[:] = self.scene_creation_result["controllable_object"].get_pose()[:3]
    return observation
_get_reward
In _get_reward, you specify the reward function of your environment.
You are free to define this function in any way you like, following any of the common conventions for reward functions:
for example, a reward based on the current state r(s), on the action and the state it led to r(s, a), or on the previous state (make sure to save the relevant values), the action, and the following state r(s, a, s').
This function should return a single float.
For our example, we assume that the goal of this environment is to move the sphere to the target position.
The reward function for that goal could look something like this:
def _get_reward(self) -> float:
    reward_features = {}
    current_position = self.scene_creation_result["controllable_object"].get_pose()[:3]
    target_position = self.scene_creation_result["target"].get_pose()[:3]
    distance_to_target = np.linalg.norm(current_position - target_position)
    reward_features["distance_to_target"] = -distance_to_target
    reward_features["time_step_cost"] = -1.0
    # Bonus of 10.0 when the object is within the success threshold of the target
    reward_features["successful_task"] = 10.0 * float(distance_to_target <= self._distance_to_target_threshold)
    self.reward_features = reward_features.copy()
    reward = 0.0
    for value in reward_features.values():
        reward += value
    return reward
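The composition of the reward from its individual features can be checked in isolation with plain NumPy. The positions and the success threshold below are illustrative assumptions, not values from a running simulation:

```python
import numpy as np

# Illustrative values (assumptions for this sketch)
current_position = np.array([50.0, 50.0, 0.0])
target_position = np.array([53.0, 46.0, 0.0])
distance_to_target_threshold = 5.0

reward_features = {}
distance_to_target = np.linalg.norm(current_position - target_position)  # 5.0
reward_features["distance_to_target"] = -distance_to_target
reward_features["time_step_cost"] = -1.0
# 10.0 bonus because the distance is exactly at the threshold
reward_features["successful_task"] = 10.0 * float(distance_to_target <= distance_to_target_threshold)

reward = sum(reward_features.values())
print(reward)  # -5.0 - 1.0 + 10.0 = 4.0
```

Keeping the features in a dictionary (rather than summing directly) makes it easy to log each component separately and to pass them along in _get_info.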
_get_done
In _get_done, you check whether your environment is in a terminal state and return a boolean value.
def _get_done(self) -> bool:
    return self.reward_features["successful_task"] > 0
_get_info
_get_info returns a dictionary with additional information about the environment's state, the episode, debugging information, or anything else you want to pass to the learning algorithm.
def _get_info(self) -> dict:
    return self.reward_features
Reset
In env.reset(), you define how you want to reset the SOFA simulation as well as the environment.
SOFA's own reset function resets the state of the simulation components in the simulation graph to the state that was defined on scene creation.
Any additional behavior, like choosing new positions for objects, cleaning up values of the previous episode, and setting a new goal, is defined by you.
The reset function should be the first thing you call after instantiating the environment, since the first call initializes the SOFA simulation.
def reset(self) -> np.ndarray:
    super().reset()

    # Sample new positions for object and target
    object_position = self.rng.uniform([-100.0] * 3, [100.0] * 3)
    target_position = self.rng.uniform([-100.0] * 3, [100.0] * 3)

    # Set the new positions, keeping the identity quaternion for orientation
    self.scene_creation_result["controllable_object"].set_pose(np.append(object_position, np.array([0, 0, 0, 1])))
    self.scene_creation_result["target"].set_pose(np.append(target_position, np.array([0, 0, 0, 1])))

    return self._get_observation(rgb_observation=self._maybe_update_rgb_buffer())
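The position sampling in reset can be sketched standalone with a seeded NumPy generator. This assumes self.rng is a numpy Generator, and the seed and workspace bounds here are illustrative:

```python
import numpy as np

# The environment holds a seeded generator as self.rng; here we create one directly
rng = np.random.default_rng(seed=42)

# Sample new XYZ positions within the workspace, as in reset() above
object_position = rng.uniform([-100.0] * 3, [100.0] * 3)
target_position = rng.uniform([-100.0] * 3, [100.0] * 3)

# Build the full pose by appending the identity quaternion
object_pose = np.append(object_position, np.array([0.0, 0.0, 0.0, 1.0]))

print(object_pose.shape)  # (7,)
```

Seeding the generator makes episode initialization reproducible, which is useful for debugging and for evaluating agents on a fixed set of start configurations.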