
An Introduction to gym

Author: 博士伦2014 | Published 2018-11-29 20:24

1. Components

OpenAI Gym consists of two parts:

1. The gym open-source library: a collection of test problems. When you test a reinforcement-learning algorithm, the test problem is the environment; for example, when an agent plays a game, the environment is the game scene the agent observes. All of these environments expose a common interface, so users can design general-purpose algorithms against them.
2. The OpenAI Gym service: a site (for example, for the game CartPole-v0: https://gym.openai.com/envs/CartPole-v0) and an API that let users compare the performance of their trained agents.

2. Interface

The core interface of gym is Env, which serves as the unified environment interface. Env provides the following core methods (a minimal usage sketch follows the list):

• reset(self): resets the environment to its initial state and returns the initial observation.
• step(self, action): the physics engine; advances the environment by one time step and returns observation, reward, done, info.
• render(self, mode='human', close=False): the rendering engine; redraws one frame of the environment. The default mode is generally human-friendly, for example popping up a window.
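
A minimal usage sketch of this interface, using the CartPole-v0 environment mentioned above and the four-tuple step() API described in this article (the agent here simply samples random actions):

    import gym

    env = gym.make('CartPole-v0')
    observation = env.reset()              # reset() returns the initial observation
    for _ in range(200):
        env.render()                       # draw one frame (pops up a window by default)
        action = env.action_space.sample() # a random action from the action space
        observation, reward, done, info = env.step(action)
        if done:                           # episode over: pole fell or time limit reached
            observation = env.reset()
    env.close()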

3. Registering your own simulator

1. The goal is to register your own environment in gym's registry. Suppose you have defined your environment in the following structure:
    myenv/
        __init__.py
        myenv.py
    
2. myenv.py contains the class for our own environment. In __init__.py, add the following code (optional register() arguments are sketched after the complete example below):
    from gym.envs.registration import register
    register(
        id='MyEnv-v0',
        entry_point='myenv.myenv:MyEnv',  # the first myenv is the package (folder) name, the second is the module (file) name, and MyEnv is the class inside that file
    )
    
3. To use your own environment:
    import gym
    import myenv  # be sure to import your own environment; this step is easy to overlook
    env = gym.make('MyEnv-v0')
    
4. Make the myenv directory importable: install it on your PYTHONPATH, or launch Python from its parent directory. A complete example follows.
Directory structure:
    myenv/
        __init__.py
        my_hotter_colder.py
    -------------------
    __init__.py file:
    -------------------
    from gym.envs.registration import register
    register(
        id='MyHotterColder-v0',
        entry_point='myenv.my_hotter_colder:MyHotterColder',
    )
    -------------------
    my_hotter_colder.py file:
    -------------------
    import gym
    from gym import spaces
    from gym.utils import seeding
    import numpy as np
    
    class MyHotterColder(gym.Env):
        """Hotter Colder
        The goal of Hotter Colder is to guess as close as possible to a randomly selected number
    
        After each step the agent receives an observation of:
        0 - No guess yet submitted (only after reset)
        1 - Guess is lower than the target
        2 - Guess is equal to the target
        3 - Guess is higher than the target
    
        The reward is calculated as:
        ((min(action, self.number) + self.bounds) / (max(action, self.number) + self.bounds)) ** 2
    
        Ideally an agent will be able to recognise the 'scent' of a higher reward and
        increase the rate at which it guesses in that direction until the reward reaches
        its maximum
        """
        def __init__(self):
            self.range = 1000  # the randomly selected target number lies in [-range, +range]
            self.bounds = 2000  # Action space bounds
    
            self.action_space = spaces.Box(low=np.array([-self.bounds]), high=np.array([self.bounds]))
            self.observation_space = spaces.Discrete(4)
    
            self.number = 0
            self.guess_count = 0
            self.guess_max = 200
            self.observation = 0
    
            self.seed()
            self.reset()
    
        def seed(self, seed=None):
            self.np_random, seed = seeding.np_random(seed)
            return [seed]
    
        def step(self, action):
            assert self.action_space.contains(action)
    
            if action < self.number:
                self.observation = 1
    
            elif action == self.number:
                self.observation = 2
    
            elif action > self.number:
                self.observation = 3
    
            reward = ((min(action, self.number) + self.bounds) / (max(action, self.number) + self.bounds)) ** 2
    
            self.guess_count += 1
            done = self.guess_count >= self.guess_max
    
            return self.observation, reward[0], done, {"number": self.number, "guesses": self.guess_count}
    
        def reset(self):
            self.number = self.np_random.uniform(-self.range, self.range)
            self.guess_count = 0
            self.observation = 0
            return self.observation
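
Once the package is importable, the environment can be exercised end to end. With bounds = 2000, a guess of 100 against a target of 500 yields a reward of ((100 + 2000) / (500 + 2000)) ** 2 ≈ 0.71, while an exact guess yields 1.0, so the reward rises as guesses get closer. The driver below is a hypothetical sketch (not part of the original post) that uses the 1/2/3 observation to bisect toward the target; it assumes the myenv package above is on the path and the same pre-0.26 gym API:

    import gym
    import numpy as np
    import myenv  # runs the register() call in myenv/__init__.py

    env = gym.make('MyHotterColder-v0')
    obs = env.reset()
    low, high = -1000.0, 1000.0              # the target is drawn from [-range, +range]
    for _ in range(20):
        guess = np.array([(low + high) / 2.0])
        obs, reward, done, info = env.step(guess)
        if obs == 1:                         # guess was lower than the target
            low = guess[0]
        elif obs == 3:                       # guess was higher than the target
            high = guess[0]
        else:                                # obs == 2: exact hit
            break
    print(info, reward)                      # info carries the target number and guess count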
    
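For completeness, register() also accepts optional metadata beyond id and entry_point; max_episode_steps wraps the environment in a time limit, and reward_threshold records the score at which the task counts as solved. The registration below is purely illustrative (the id and values are made up for this sketch):

    from gym.envs.registration import register

    register(
        id='MyHotterColder-v1',               # the id must end with a version suffix like -v0, -v1, ...
        entry_point='myenv.my_hotter_colder:MyHotterColder',
        max_episode_steps=200,                # gym.make() will wrap the env in a TimeLimit of 200 steps
        reward_threshold=0.95,                # optional: reward considered "solved" for this task
    )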

References:

    1. https://github.com/openai/gym/issues/626
    2. https://github.com/openai/gym/tree/master/gym/envs#how-to-create-new-environments-for-gym
3. https://github.com/openai/gym/blob/522c2c532293399920743265d9bc761ed18eadb3/gym/envs/__init__.py
