<img src="https://d3i71xaburhd42.cloudfront.net/b5375995ab8d679a581ffcc2f2e8d3777d60324b/5-Figure5-1.png" alt="Figure 5: Left: Rewards over RL training. The reward is computed as the AP of sampled architectures on the proxy task. Right: The ratio of unique sampled architectures to the total number of sampled architectures. As the controller converges, it samples more identical architectures." data-selenium-selector="figure-image">