After following the installation guide, `circus` is ready for use. See `test/example.py` for some code examples and the API Documentation for more details.
```python
import gym
import numpy as np

env = gym.make('circus:sym-gpdk180-geom-v0')
obs = env.reset()
a = np.random.randn(env.action_space.shape[0])  # random action
o, r, d, i = env.step(a)
```
Or test it with an off-the-shelf agent implementation:
```python
import gym
from stable_baselines3 import DDPG

env  = gym.make('circus:sym-gpdk180-geom-v1')
ddpg = DDPG('MlpPolicy', env, verbose=1)
ddpg.learn(total_timesteps=10000, log_interval=1)
```
The `circus.make` constructor accepts several additional optional arguments, shown here with their default values:
```python
import circus

# sym GoalEnv in gpdk180 with electric design space
env = circus.make( 'circus:sym-gpdk180-elec-v0'
                 , n_envs            = 1       # Number of environments
                 , num_steps         = 50      # Maximum number of steps per episode
                 , seed              = 666     # Random seed for all environments
                 , obs_filter        = 'perf'  # 'perf' | 'all' | [str]
                 , goal_filter       = None    # List of goal parameters ⊂ obs_filter
                 , goal_preds        = None    # Goal predicates
                 # reward_fn defaults to binary_reward (v0) or dummy_reward (v1)
                 , scale_observation = True    # Scale observations ∈ [-1.0; 1.0]
                 , auto_reset        = False   # Automatically reset when done
                 , )
```
- `n_envs`: Number of parallel environments; should not be more than `nproc`.
- `num_steps`: Number of steps per episode before the goal should be reached; otherwise the terminal flag will be set.
- `seed`: Random seed for all environments.
- `obs_filter`: Either the string `'all'` for everything ACE has to offer, the string `'perf'` for only the performance parameters, or a list of strings with the desired keys of the performance dict obtained from ACE.
- `goal_filter`: The same as `obs_filter = 'perf'` by default; should be a subset of `obs_filter`.
- `goal_preds`: A list of operators, which will be called to determine whether the goal was reached: `performance <operator> goal` (see the sketch after this list).
- `reward_fn`: A custom reward function can be passed to the environment. By default it is `binary_reward` for GoalEnvs (`v0`) and `dummy_reward` for non-GoalEnvs (`v1`). See the following subsection for further details.
- `scale_observation`: Whether the observations should be scaled to be roughly ∈ [-1.0; 1.0]. This is based on an estimation and is therefore not 100% reliable.
- `auto_reset`: Automatically reset the environment when done.
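As an illustration of the filter and predicate arguments, here is a hedged sketch. The performance keys `'a_0'` and `'ugbw'` are only placeholders for whatever keys ACE reports for the chosen circuit, and `operator.ge` stands in for any comparison predicate:

```python
import operator
import circus

# Hypothetical example: observe only two (assumed) performance keys and
# require both to be greater than or equal to the desired goal values.
env = circus.make( 'circus:sym-gpdk180-elec-v0'
                 , obs_filter  = ['a_0', 'ugbw']             # assumed ACE performance keys
                 , goal_filter = ['a_0', 'ugbw']             # goals ⊂ obs_filter
                 , goal_preds  = [operator.ge, operator.ge]  # performance >= goal
                 , )
```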
A custom reward function should be of the following form:
```python
import numpy as np

def reward(observation: dict[str, np.ndarray]) -> np.ndarray:
    r = np.random.randn(1)  # placeholder: replace with an actual reward calculation
    return r
```
Where `observation` is a dictionary with at least one field named `'observation'` for VecEnvs, and the fields `'observation'`, `'achieved_goal'` and `'desired_goal'` for GoalEnvs.
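As a concrete sketch, a goal-based reward for a `v0` environment could be built from the fields described above. The function name, the tolerance, and the 0/-1 encoding below are illustrative assumptions and do not reproduce the exact behaviour of the built-in `binary_reward`:

```python
import numpy as np

def goal_reward(observation: dict[str, np.ndarray]) -> np.ndarray:
    # Sparse / binary style sketch: 0.0 where every goal entry is within
    # tolerance of the desired value, -1.0 otherwise.
    achieved = observation['achieved_goal']
    desired  = observation['desired_goal']
    reached  = np.all(np.abs(achieved - desired) <= 1e-3, axis=-1)
    return np.where(reached, 0.0, -1.0)
```

Such a function could then be passed to `circus.make` via the `reward_fn` argument.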
`circus` IDs have a specific structure: `circus:<id>-<pdk>-<space>-v<variant>`. The `id` field is linked to the serafin id, as shown in the availability matrix and the Table of Contents.
The same goes for the `pdk` field; it is linked to the backends. The `space` field denotes the action space of the agent:
| Space  | Description                           |
|--------|---------------------------------------|
| `elec` | Action space in the electric domain.  |
| `geom` | Action space in the geometric domain. |
The variants `v#` define the shape of the observation space:
| Variant | Description              |
|---------|--------------------------|
| `v0`    | GoalEnv with Dict space. |
| `v1`    | VecEnv with Box space.   |
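To make the difference between the two variants tangible, a brief sketch along the lines of the earlier examples might look as follows. The printed keys follow from the Dict space described above, while the exact array shape of the `v1` observation depends on `n_envs` and the observation filter:

```python
import gym

# v0: GoalEnv, observations are dictionaries with goal fields
goal_env = gym.make('circus:sym-gpdk180-geom-v0')
goal_obs = goal_env.reset()
print(sorted(goal_obs.keys()))  # ['achieved_goal', 'desired_goal', 'observation']

# v1: VecEnv, observations come from a Box space
vec_env = gym.make('circus:sym-gpdk180-geom-v1')
vec_obs = vec_env.reset()
print(vec_obs.shape)            # shape depends on n_envs and obs_filter
```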