PowerPoint Presentation

(1)

AWS DeepRacer

Amazon Web Services Japan K.K.

*** Solutions Architect

****

(2)

S U M M I T

Self-introduction

Makoto Shimura (志村誠)

Solutions Architect

• Responsible for data analytics and machine learning services
• Favorite services
  • Amazon Athena
  • AWS Glue
  • And Amazon SageMaker

(3)

(4)


https://aws.amazon.com/jp/campaigns/manga

(5)

Agenda

• AWS DeepRacer overview
  • Reinforcement learning
  • Simulator
• AWS DeepRacer architecture in detail
• DeepRacer League
• How to use the AWS DeepRacer console

This document describes the service as of May 30, 2019. For the latest information, see the official AWS website (http://aws.amazon.com).

(6)

(7)

AWS DeepRacer

A service that puts reinforcement learning in the hands of every developer

(8)

What is AWS DeepRacer?

1/18-scale autonomous car

Simulator for training and evaluation

Worldwide racing league

(9)

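What it takes to drive DeepRacer

• If the right driving action could be registered for every possible view from the car's camera, the car could drive the course
• In practice there are countless possible views, so registering them all is impractical

Figure labels: Straight; ….; Left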

(10)

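Introducing reinforcement learning

• A model that decides the action from the camera image is built through training
• The agent tries various actions (driving) in the environment (the course) and learns to reach the goal

Diagram labels: agent; action; environment; goal; model; state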

(11)

(12)

Where reinforcement learning fits

Reinforcement learning

Supervised learning

Unsupervised learning

(13)

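Machine learning overview

Supervised learning: every training example needs a corresponding label

Unsupervised learning: training data does not need labels

Reinforcement learning: learns from a sequence of actions in a specific environment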

(14)

Reinforcement learning in the real world

For good behavior

(15)

Reinforcement learning terms

Agent

Environment

State

(16)

The reward function

In reinforcement learning, for a particular action

(17)

S

G = 2

Goal

Agent

(18)

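Incentivizing centerline driving

Reward per cell: 0.1 off the centerline, 2 on the centerline, 2 at the goal (G). S marks the start.

Value per cell with a discount of 0.9 per step:

 8.6    9.5    8.5    7.5    6.3    5.0    3.5    1.9
 S     10.4    9.4    8.2    6.9    5.4    3.8    G = 2
 8.6    9.5    8.5    7.5    6.3    5.0    3.5    1.9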
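The grid values can be reproduced with a short calculation. The sketch below assumes that each cell's value is its own reward plus 0.9 times the value of the cell the car moves to next, and that the best move from an off-center cell is onto the neighboring centerline cell. The function and constant names are illustrative and do not come from the slides.

```python
GAMMA = 0.9              # discount per step, as stated on the slide
CENTER_REWARD = 2.0      # reward for a centerline cell
OFF_CENTER_REWARD = 0.1  # reward for a cell off the centerline
GOAL_VALUE = 2.0         # value of the goal cell G


def centerline_values(n_cells, gamma=GAMMA):
    """Values of the centerline cells, ordered from the start side to the goal."""
    values = []
    value = GOAL_VALUE
    for _ in range(n_cells):
        # Work backward from the goal: own reward plus discounted next value.
        value = CENTER_REWARD + gamma * value
        values.append(value)
    return values[::-1]


def off_center_value(next_value, gamma=GAMMA):
    """Value of an off-center cell whose next move lands on a cell worth next_value."""
    return OFF_CENTER_REWARD + gamma * next_value


center = centerline_values(6)
off_center = [off_center_value(v) for v in center + [GOAL_VALUE]]

print([round(v, 1) for v in center])      # [10.4, 9.4, 8.2, 6.9, 5.4, 3.8]
print([round(v, 1) for v in off_center])  # [9.5, 8.5, 7.5, 6.3, 5.0, 3.5, 1.9]
```

The leftmost off-center value, 8.6, follows from the same rule applied to its right-hand neighbor: 0.1 + 0.9 × 9.5 ≈ 8.6.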

(19)

How learning happens

Value function (value fn)

(20)

RL algorithms: Vanilla policy gradient

* Image source: landscape image is CC0 1.0 public domain

Figure labels: J(θ); New weights 0.4 ± 𝛿; New weights 0.3 ± 𝛿

(21)

AWS DeepRacer neural network architecture

(22)

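Amazon SageMaker Reinforcement Learning

• Reinforcement learning on SageMaker, integrated with simulation environments for games and robots
• Supports Coach and Ray RLlib as reinforcement learning toolkits
• Simulators such as AWS RoboMaker can be used through the open-source OpenAI Gym interface; distributed training and parallel simulation are supported

Diagram labels: Redis; act according to the policy; observations and rewards; agent (learns the policy); Container for Agent (×2); Container for environment (×2); OpenAI Gym, simulator…; environment simulator: AWS RoboMaker; reinforcement learning toolkits: Coach, RLlib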

(23)

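Reinforcement learning vs. other algorithms for DeepRacer

Supervised learning (behavioral cloning)
• A skilled driver drives a physical car fitted with a camera
• The camera images and the driver's actions are recorded and used to train a model
Result: given a state (image), the model decides the driving action

Reinforcement learning
• A virtual agent acts repeatedly in a simulated environment and accumulates experience (input image, action, next state, reward)
• The model is trained on that experience, and the trained model is then used to collect more experience
Result: given a state (image), the model decides the driving action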

(24)

(25)

AWS DeepRacer simulation architecture

Diagram labels: AWS Cloud; AWS DeepRacer; NAT gateway; VPC; models; simulation video; metrics

(26)


(27)

Configuring the action space

• Defined as combinations of speed and steering
• The granularity can be configured for finer control
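As a rough illustration of how such an action space could be enumerated, the sketch below builds every speed and steering combination from a maximum value and a granularity for each. The function name, parameter names, and dictionary keys are hypothetical and are not taken from the DeepRacer console; the even spacing is also an assumption.

```python
from itertools import product


def build_action_space(max_steering_deg, steering_levels, max_speed, speed_levels):
    """Enumerate discrete actions as steering/speed combinations (illustrative)."""
    if steering_levels == 1:
        steering = [0.0]
    else:
        # Evenly spaced angles from -max to +max, including both ends.
        step = 2 * max_steering_deg / (steering_levels - 1)
        steering = [-max_steering_deg + i * step for i in range(steering_levels)]
    # Evenly spaced speeds up to and including the maximum.
    speeds = [max_speed * (i + 1) / speed_levels for i in range(speed_levels)]
    return [
        {"steering_angle": angle, "speed": speed}
        for angle, speed in product(steering, speeds)
    ]


actions = build_action_space(30.0, 5, 3.0, 2)
print(len(actions))  # 10 actions: 5 steering angles x 2 speeds
```

A finer granularity gives the model more precise control but enlarges the set of actions it has to learn to choose between.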

(28)


(29)

Track components

Track center (centerline)

Track wall

Track surface (aka on-track)

Field (aka off-track)

Track boundary line

(30)

Coordinate system and waypoints

Outer boundary waypoints

Track center waypoints

Inner boundary waypoints

X

Y

Track width

Car direction

(31)

(32)


(33)

AWS DeepRacer specifications

CAR: 1/18th scale 4WD with monster truck chassis
CPU: Intel Atom Processor
MEMORY: 4 GB RAM
STORAGE: 32 GB (expandable)
WI-FI: 802.11ac
CAMERA: 4 MP camera with MJPEG
DRIVE BATTERY: 1000 mAh lithium polymer
COMPUTE BATTERY: 13600 mAh USB-C
SENSORS: Integrated accelerometer and gyroscope
PORTS: 4x USB-A, 1x USB-C, 1x Micro-USB, 1x HDMI
SOFTWARE: Ubuntu OS 16.04.3 LTS, Intel OpenVINO toolkit, ROS Kinetic

(34)

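AWS DeepRacer software architecture

Diagram labels: Stored file; ROS nodes; Video; M-JPEG; web server; video; optimized model; media engine; camera; model; model optimization; inference engine; inference results; navigation node; autonomous driving; manual driving; web server; publisher; control node; servo & motor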

(35)

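Domain transfer between simulation and the real world

Challenges in moving from simulation to the real world
• The model is trained on simulated images, but the physical car sees real-world images
• Simulating the real environment perfectly is also difficult

Strategies
• Bring the controlled environment closer to the real world
• Add random elements to the environment
• Modularize and abstract the model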

(36)

(37)

AWS DeepRacer League

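The world's first global autonomous racing league

Virtual Circuit
• Access the AWS DeepRacer service
• Train a model
• Submit your model to the races held on the Virtual Circuit

Summit Circuit
• Cars and tracks are provided at AWS Summits
• Bring your own model, or train one in a workshop
• Join the race, put your name on the leaderboard, and make history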

(38)


(39)

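Want to learn more about reinforcement learning?

• Knowledge of reinforcement learning is essential for ranking high on the leaderboard
• Training content on reinforcement learning and AWS DeepRacer is available
• The content is free, takes 90 minutes, and consists of six self-paced parts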

https://www.aws.training/learningobject/wbc?id=32143

(40)


AWS DeepRacer: training and certification

(41)

(42)


https://github.com/aws-samples/aws-deepracer-workshops/blob/master/Workshops/2019-AWSSummits-AWSDeepRacerService/Lab1/Readme-Japanese.md

http://bit.ly/deepracer-wsjp

AWS DeepRacer workshop labs

(43)


(44)


Get hands on with AWS DeepRacer & compete in

the AWS DeepRacer League

DeClercq Wentzel

Senior Product Manager

Amazon Web Services

(45)

Agenda

• AWS DeepRacer origin
• RL for the Sunday driver
• Virtual simulator
• Rubber meets the road

(46)

(47)

How can we put machine learning in the hands of all developers? Literally.

(48)

AWS DeepRacer: An exciting way for developers to get hands-on experience with machine learning

1/18 scale autonomous race car

Global Racing League

Virtual simulator, to train

(49)

AWS DeepRacer League, race for prizes and glory

The world’s first global, autonomous racing league

www.deepracerleague.com

(50)


AWS DeepRacer problem formulation

(51)

(52)

Reinforcement learning in the broader AI context

Reinforcement Learning

Supervised Learning

Unsupervised Learning

(53)

Machine learning overview

(54)

Reinforcement learning in the real world

Reward positive behavior

Don’t reward

(55)

Reinforcement learning terms

AGENT

ENVIRONMENT

STATE

ACTION

EPISODE

REWARD

(56)

The reward function

The reward function incentivizes particular behaviors and is at the core of reinforcement

(57)

The reward function in a race grid

S

G = 2

GOAL

AGENT

(58)

Incentivizing centerline behavior

Reward per cell: 0.1 off the centerline, 2 on the centerline, 2 at the goal (G). S marks the start.

Value per cell with a discount of 0.9 per step:

 8.6    9.5    8.5    7.5    6.3    5.0    3.5    1.9
 S     10.4    9.4    8.2    6.9    5.4    3.8    G = 2
 8.6    9.5    8.5    7.5    6.3    5.0    3.5    1.9

(59)

How does learning happen?

VALUE FUNCTION

(60)

RL algorithms: Vanilla policy gradient

• Data is only used once
• High variance of rewards
• Magnitude of update could be too large

* Image source: landscape image is CC0 1.0 public domain

Figure labels: J(θ); New weights 0.4 ± 𝛿; New weights 0.3 ± 𝛿
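To make the update rule concrete, here is a minimal vanilla policy gradient (REINFORCE-style) sketch on a toy two-action problem, written with NumPy. It is not the training code used by AWS DeepRacer; the toy rewards, learning rate, step count, and function names are all assumptions chosen for illustration. Each sampled reward is used for exactly one weight update and then discarded, and the step size is the raw reward times the learning rate, which is where the variance and update-magnitude concerns listed above come from.

```python
import numpy as np


def softmax(logits):
    """Convert policy weights (logits) into action probabilities."""
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()


def train_bandit(mean_rewards, steps=2000, lr=0.1, seed=0):
    """Vanilla policy gradient on a toy problem with one state and noisy rewards."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(mean_rewards))  # policy weights, one logit per action
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choice(len(theta), p=probs)
        reward = mean_rewards[action] + rng.normal(0.0, 0.1)
        # Gradient of log pi(action) with respect to the logits:
        # one_hot(action) - probs.
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0
        # Ascend the expected reward J(theta); the sample is then thrown away.
        theta += lr * reward * grad_log_pi
    return softmax(theta)


probs = train_bandit(np.array([0.2, 1.0]))
print(probs)  # the higher-reward action ends up with most of the probability
```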

(61)

AWS DeepRacer Neural Network Architecture

Output - action

Input - state (image)

(62)

RL vs. other approaches for robotic racing

METHOD: Supervised learning
HOW IT WORKS: An expert driver controls a real-world car that has a camera. The camera images are saved as inputs and the corresponding driving actions (speed and steering angle) as outputs, and a model is trained on them.
RESULT: Provide a state (image) to the model and receive a driving action.

METHOD: Reinforcement learning
HOW IT WORKS: A virtual agent repeatedly interacts with a simulated environment and logs experience (image, action, new state, reward). The experience is used to train a model, and the new model is used to get more experience.
RESULT: Provide a state (image) to the model and receive a driving action.
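The experience-logging loop in the reinforcement learning row can be sketched as follows. The toy one-dimensional track, the `Experience` tuple, and the `collect_episode` helper are hypothetical stand-ins for the simulator and agent, with integer positions in place of camera images; only the shape of the loop (act, observe, log, repeat) reflects the description above.

```python
from typing import Callable, List, NamedTuple, Tuple


class Experience(NamedTuple):
    state: int       # stand-in for the camera image
    action: int
    reward: float
    next_state: int


class ToyTrack:
    """Hypothetical 1-D track: positions 0..length, with the goal at `length`."""

    def __init__(self, length: int = 5) -> None:
        self.length = length
        self.position = 0

    def reset(self) -> int:
        self.position = 0
        return self.position

    def step(self, action: int) -> Tuple[int, float, bool]:
        # Action 1 moves one cell forward; action 0 stays in place.
        self.position = min(self.length, self.position + action)
        done = self.position == self.length
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def collect_episode(env: ToyTrack, policy: Callable[[int], int],
                    max_steps: int = 100) -> List[Experience]:
    """Run one episode and log (state, action, reward, next_state) tuples."""
    experiences = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        experiences.append(Experience(state, action, reward, next_state))
        state = next_state
        if done:
            break
    return experiences


episode = collect_episode(ToyTrack(), policy=lambda state: 1)
print(len(episode), episode[-1])
```

In a full training setup the logged tuples would be passed to a learning algorithm, and the updated policy would replace the fixed `lambda` for the next round of episodes.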

(63)

(64)


Lab 0 – AWS DeepRacer service resource creation

OBJECTIVE

Setup your account resources to get you to the races!

TIME 5 min.

1. Find the lab content here:

https://github.com/aws-samples/aws-deepracer-workshops/

2. Navigate to:

(65)

Diagram labels: AWS Cloud; AWS DeepRacer; NAT gateway; VPC; Models; Simulation video; Metrics

(66)


(67)

(68)


Track components

TRACK CENTER

TRACK WALL

TRACK SURFACE aka ON-TRACK

FIELD aka OFF-TRACK
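These track components map naturally onto a reward function. The sketch below rewards staying on the track surface and near the track center. It assumes the reward function receives a dictionary with the keys `track_width`, `distance_from_center`, and `all_wheels_on_track`; the slides do not list the input parameters, so treat the key names, the thresholds, and the reward values as assumptions.

```python
def reward_function(params):
    """Reward staying on the track surface and close to the track center."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]
    all_wheels_on_track = params["all_wheels_on_track"]

    if not all_wheels_on_track:
        return 1e-3  # on the field (off-track): near-zero reward

    # Bands at 10%, 25% and 50% of the track width (illustrative thresholds).
    if distance_from_center <= 0.1 * track_width:
        return 1.0
    if distance_from_center <= 0.25 * track_width:
        return 0.5
    if distance_from_center <= 0.5 * track_width:
        return 0.1
    return 1e-3


print(reward_function({
    "track_width": 1.0,
    "distance_from_center": 0.05,
    "all_wheels_on_track": True,
}))  # 1.0
```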

(69)

Coordinate system and track waypoints

OUTER BOUNDARY WAYPOINTS

TRACK CENTER WAYPOINTS

INNER BOUNDARY WAYPOINTS

Y

TRACK WIDTH

CAR DIRECTION
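Waypoints and the car direction can be combined to check whether the car is pointing along the track. The sketch below assumes the reward function input exposes `waypoints` (a list of (x, y) points along the track center), `closest_waypoints` (the indices of the waypoints behind and ahead of the car), and `heading` (car direction in degrees). These key names, the 10-degree threshold, and the penalty factor are assumptions not stated in the slides.

```python
import math


def reward_function(params):
    """Penalize the car when its heading deviates from the track direction."""
    waypoints = params["waypoints"]
    closest_waypoints = params["closest_waypoints"]
    heading = params["heading"]

    prev_point = waypoints[closest_waypoints[0]]
    next_point = waypoints[closest_waypoints[1]]

    # Direction of the track segment between the two closest waypoints, in degrees.
    track_direction = math.degrees(
        math.atan2(next_point[1] - prev_point[1], next_point[0] - prev_point[0])
    )

    # Smallest absolute angle between the track direction and the car heading.
    direction_diff = abs(track_direction - heading)
    if direction_diff > 180:
        direction_diff = 360 - direction_diff

    reward = 1.0
    if direction_diff > 10.0:  # illustrative threshold
        reward *= 0.5
    return reward


print(reward_function({
    "waypoints": [(0.0, 0.0), (1.0, 0.0)],
    "closest_waypoints": [0, 1],
    "heading": 0.0,
}))  # 1.0
```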

(70)


(71)

(72)

(73)

AWS DeepRacer League, race for prizes and glory

(74)


(75)

Lab 1 – AWS DeepRacer service

OBJECTIVE

Build your first AWS DeepRacer RL model

TIME 50 min.

1. Find the lab content here:

https://github.com/aws-samples/aws-deepracer-workshops/

(76)


AWS DeepRacer: Driven by reinforcement learning

Want to learn more?

Learn how to build a reinforcement learning model and find tips and tricks about how to tune those models to climb the League leaderboard in a digital training course for reinforcement learning and AWS DeepRacer.

This 90-minute course is available at no cost, has 6 self-guided chapters, and will
