Raspberry Pi - Deep Learning

Welcome to Mashykom WebSite

Raspberry Pi 4 Model B で物体検出

　AIを搭載したロボットを自作してみようと思い立って、Raspberry Piをロボットのコンピュータボードとして活用することにしました。OSとして Raspberry Pi Legacy(Devian Buster) また Debian Bullseye がインストールされていることを前提とします。RaspberryPiの基本的なセットアップは完了しているとします。ssh接続及びVNCViewerを用いたデスクトップの表示も可能になっていると想定します。pythonのパッケージ等は Raspberry Pi OS のインストールと同時に同梱されるバージョンをそのまま使用しています。バージョンはPython 3.9.2 です。Devian Buster では、Python 3.7.3 です。

　Raspberry Pi をPCからリモートで操作できる環境を想定します。VNCViewer を用いてRaspberry Pi のデスクトップ画面を表示させることができることを想定します。そうでなければ、Raspberry Pi 本体を直接操作することを前提とします。

　2019年6月24日にRaspberry Pi 4 が発売開始されたので、この Raspberry Pi 4 (Model B) で説明します。Raspberry Pi 3(Model B) を使用しても同じ結果になります。Raspberry Pi 4 では、HDMI が micro HDMI に、２個のUSB2ポートがUSB3にアップグレードされ、メモリが2-8ギガに拡大されています。基本的な特徴はRaspberry Pi 3 (Model B) から変化していません。

　このページでは、ラズパイを用いたコンピュータビジョンの可能性を追求します。そのために、ディープラーニングの基本である畳み込みニューラルネットワークをラズパイに実装します。最も簡単なニューラルネットワークを実装したライブラリはscikit-learnの機能を取り込んだsklearnというpython用のパッケージです。

　次に、畳み込みニューラルネットワークの代表的なライブラリであるTensorflowの Lite バージョンの活用を試みます。A TensorFlow Lite モデルは、FlatBuffers と呼ばれる専用の効率的なポータブルフォーマット（ファイル拡張子「.tflite」で識別されます）で表されます。このフォーマットは、TensorFlow のプロトコルバッファモデルフォーマットに比べて、サイズの縮小や推論速度の向上などの利点があります。

　さらに、Jupyter Notebookのインストール手順を説明したのち、Pytorchをインストールします。Pytorchを用いた簡単な物体検出の例を取り上げます。

　畳み込みニューラルネットワークの学習時間を短縮するためにはGPUが必須となっているので、（ラズパイに搭載されているCPUだと何日も必要となってしまうので）学習済みのモデルを使用します。GoogleのCoral Edge Accelerator を利用した機械学習については、Coral Edge TPU + RaspberryPiでObject Detectionに説明があります。

Last updated 2022.4.6

初めてのDeep Learning : scikit-learnを活用する

　ここからは、ディープラーニング（深層学習）のプログラムをラズパイに実際に組み込むことになります。最も取り扱いやすい scikti-learn から始めます。

　PCのターミナルからラズパイへssh接続をしてください。scikit-learnをインストールするために、


$ sudo apt-get update
$ sudo apt-get install python3-sklearn

と入力する。この結果、Scikit-learnの実行に必要な NumPy、SciPy、Matplotlib などのモジュールも同時にインストールされます。Pythonの各種モジュール、例えば、NumPy、SciPy、Matplotlib、sklearnなどは, /usr/lib/python3/dist-packagesディレクトリにインストールされます。

　sklearnのパッケージの内容は以下の通りです。

pi@raspberrypi:/usr/lib/python3/dist-packages/sklearn $ tree

　sklearnパッケージはscikit-learnの機能をまとめたもので、その内容全体を表示するとあまりに長くなるので、主なモジュールだけをリストアップすると、

cluster
datasets
	loads_iris
	loads_digits
decomposition
	PCA
feature_extraction
feature_selection
gaussian_process
linear_model
	Perception
metrics
mixer
model_selection
neural_network
	MLPClassifier
svm
	SVC
tree
utils

　となっています。機械学習で最初に登場するアヤメの分類と手書き文字の識別はsklearnでも簡単に解決できます。data-setsに配置されているloads_irisはアヤメの分類問題で使用されているデータを読みこむためのモジュールです。また、data-setsに配置されているloads_digitsは手書き文字の識別問題で使用される手書きデータを読み込みます。linear_modelには、単純なパーセプトロンのモデルが配置されており、neural_networkには、畳み込みニューラルネットワークの簡単なモデル(MLPClassifier)が含まれています。

　Scikit-Learnを用いた、Iris データの分類問題を取り上げます。Raspi のプログラミング・メニューから Thonny Python IDE を起動して、以下のコード実行して下さい。sklearn_1.pyとして保存します。


from sklearn import datasets
import numpy as np
from sklearn.neural_network import MLPClassifier

iris = datasets.load_iris()
X = iris.data[:, [2, 3]]
y = iris.target

mlp = MLPClassifier(hidden_layer_sizes=(100, ), max_iter=100000, tol=0.0001, random_state=1)
#clf = MLPClassifier(hidden_layer_sizes=(100, ), max_iter=10000, tol=0.00001, random_state=None)

mlp.fit(X, y)

y_pred = mlp.predict(X)

from sklearn.metrics import accuracy_score

print('Class labels:', np.unique(y))
print('Misclassified samples: %d' % (y != y_pred).sum())
print('Accuracy: %.2f' % accuracy_score(y, y_pred))

結果はこうなります。

Class labels: [0 1 2]
Misclassified samples: 7
Accuracy: 0.95

　Class labels はアヤメの種類です。以下の写真がそうです。

　パーセプトロン及び畳み込みニューラルネットワークの解説は、以下のDeep Learning入門ページを参照ください。

TensorFlow Lite - Runtime の使用

　Tensorflow Lite のインストールなどでは、背後で実行されているビルドで64bit C++コンパイラーが使用されます。ここで、OSのバージョンおよびC++コンパラーのバージョンを調べておきましょう。OSのバージョンチェックは以下のコマンドを打ちます。


$ uname -a 
$ cat /etc/os-release
$ gcc -v

　aarch64-linux-gnu と表示されて、OSがbullseyeのときはDebian GNU/Linux 11 (bullseye)となっているはずです。TensorFlow Lite の C ++ API をビルドするために必要な64bit C++コンパイラー aarch64-linux-gnu となっているはずです。また、OSがbusterのときはRaspbian GNU/Linux 10 (buster)、arm-linux-gnueabihfとなっています。

　PythonでTensorFlow Liteを使用すると、 Raspberry PiやEdgeTPUを備えたCoralデバイスなど、Linuxベースの組み込みデバイスに最適です。PythonでTensorFlow Liteモデルの実行をすばやく開始するには、すべてのTensorFlowパッケージではなく、TensorFlow Liteインタープリターのみを使用します。この簡略化されたPythonパッケージをtflite_runtimeと言います。

　tflite_runtimeパッケージは、完全なtensorflowパッケージの数分の1のサイズであり、TensorFlow Lite（主にInterpreter Pythonクラス）で推論を実行するために必要な最小限のコードが含まれています。公式サイトの説明によれば、TensorFlow Liteランタイムパッケージをインストールするには、次のコマンドを実行します。


$ sudo apt install curl 
$ echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
$ curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
$ sudo apt update
$ sudo apt install python3-tflite-runtime

　このコマンドにより Python 3.9 に対応した tflite_runtime のパッケージが /usr/lib/python3/dist-packages にインストールされます。

　この例を用いた tflite_runtime の実行を試みます。使用する画像や MobileNet v1 モデルをインストールします。


$ mkdir tmp
$ curl https://raw.githubusercontent.com/tensorflow/tensorflow/master/tensorflow/lite/examples/label_image/testdata/grace_hopper.bmp > ~/tmp/grace_hopper.bmp

$ curl https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_2018_02_22/mobilenet_v1_1.0_224.tgz | tar xzv -C ~/tmp

$ curl https://storage.googleapis.com/download.tensorflow.org/models/mobilenet_v1_1.0_224_frozen.tgz  | tar xzv -C ~/tmp  mobilenet_v1_1.0_224/labels.txt

$ mv ~/tmp/mobilenet_v1_1.0_224/labels.txt ~/tmp/

　部分的にエラーが出ますが、無視します。Python スクリプト label_image.py を/home/にコピーします。tensorflow モジュールから Interpreter をインポートする代わりに、tflite_runtime からインポートする必要があります。このスクリプトは Tensorflow 向けに書かれているので、以下の２箇所に修正が必要です。


import tensorflow as tf 　⇨　 import tflite_runtime.interpreter as tflite

interpreter = tf.lite.Interpreter(model_path=args.model_file) 　⇨　 interpreter = tflite.Interpreter(model_path=args.model_file)

　この修正後、以下のコマンドを実行します。


$ python3 label_image.py   --model_file ~/tmp/mobilenet_v1_1.0_224.tflite   --label_file ~/tmp/labels.txt   --image ~/tmp/grace_hopper.bmp


0.919720: 653:military uniform
0.017762: 907:Windsor tie
0.007507: 668:mortarboard
0.005419: 466:bulletproof vest
0.003828: 458:bow tie, bow-tie, bowtie
time: 159.685ms

tflite - Runtimeでリアルタイム Object Detection

　OpenCV のインストールが済んでいない時は、以下のコマンドを打って cv2 をインストールして下さい。


$ sudo apt install python3-opencv

　Tensorflow Lite の公式ページに沿って説明します。TensorFlow のGitHubからRasspberry Pi 向けのモジュールを含む Tensorflow Lite パッケージをダウンロードします。


$ git clone https://github.com/tensorflow/examples --depth 1

$ cd examples/lite/examples/object_detection/raspberry_pi

$ sh setup.sh

---- Successfully installed absl-py-1.0.0 argparse-1.4.0 flatbuffers-1.12 numpy-1.22.1 opencv-python-4.5.3.56 pybind11-2.9.0 tflite-runtime-2.7.0 tflite-support-0.3.1 （中略） -e Downloaded files are in ./

と表示されて、インストールが終わります。warning がいくつか出ますが、無視してください。tflite_runtime をはじめとする関連するファイルはすべてユーザ直下のディレクトリ「.local/lib/python3.9」にインストールされます。なお、
「/home/pi/examples/lite/examples/object_detection/raspberry_pi」
には以下のファイルがインストールされます。


README.md                          object_detector.py       test_data
detect.py                          object_detector_test.py  utils.py
efficientdet_lite0.tflite          requirements.txt
efficientdet_lite0_edgetpu.tflite  setup.sh

ここで使用される学習済みモデルが efficientdet_lite0.tflite になります。これは EfficientNet をTensorflow Lite で使用できるように変換したモデルです。次に、Raspberry Pi に Webcamera を接続して下さい。Raspberry Piのデスクトップにあるターミナルから実行して下さい。「OSError: PortAudio library not found」というエラーが出るときは、audio関連のモジュールをインストールします。

$ sudo apt install libportaudio2
$ cd ~/examples/lite/examples/object_detection/raspberry_pi
$ python3 detect.py --model efficientdet_lite0.tflite --cameraId 0

　と入力して、Web camera を用いた物体検出のコードを実行します。

以下のスクリプトが detect.py の内容です。

# Copyright 2021 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Main script to run the object detection routine."""
import argparse
import sys
import time

import cv2
from object_detector import ObjectDetector
from object_detector import ObjectDetectorOptions
import utils

def run(model: str, camera_id: int, width: int, height: int, num_threads: int,
        enable_edgetpu: bool) -> None:
  """Continuously run inference on images acquired from the camera.

Args:
    model: Name of the TFLite object detection model.
    camera_id: The camera id to be passed to OpenCV.
    width: The width of the frame captured from the camera.
    height: The height of the frame captured from the camera.
    num_threads: The number of CPU threads to run the model.
    enable_edgetpu: True/False whether the model is a EdgeTPU model.
  """

# Variables to calculate FPS
  counter, fps = 0, 0
  start_time = time.time()

# Start capturing video input from the camera
  cap = cv2.VideoCapture(camera_id)
  cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
  cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)

# Visualization parameters
  row_size = 20  # pixels
  left_margin = 24  # pixels
  text_color = (0, 0, 255)  # red
  font_size = 1
  font_thickness = 1
  fps_avg_frame_count = 10

# Initialize the object detection model
  options = ObjectDetectorOptions(
      num_threads=num_threads,
      score_threshold=0.3,
      max_results=3,
      enable_edgetpu=enable_edgetpu)
  detector = ObjectDetector(model_path=model, options=options)

# Continuously capture images from the camera and run inference
  while cap.isOpened():
    success, image = cap.read()
    if not success:
      sys.exit(
          'ERROR: Unable to read from webcam. Please verify your webcam settings.'
      )

counter += 1
    image = cv2.flip(image, 1)

# Run object detection estimation using the model.
    detections = detector.detect(image)

# Draw keypoints and edges on input image
    image = utils.visualize(image, detections)

# Calculate the FPS
    if counter % fps_avg_frame_count == 0:
      end_time = time.time()
      fps = fps_avg_frame_count / (end_time - start_time)
      start_time = time.time()

# Show the FPS
    fps_text = 'FPS = {:.1f}'.format(fps)
    text_location = (left_margin, row_size)
    cv2.putText(image, fps_text, text_location, cv2.FONT_HERSHEY_PLAIN,
                font_size, text_color, font_thickness)

# Stop the program if the ESC key is pressed.
    if cv2.waitKey(1) == 27:
      break
    cv2.imshow('object_detector', image)

cap.release()
  cv2.destroyAllWindows()

def main():
  parser = argparse.ArgumentParser(
      formatter_class=argparse.ArgumentDefaultsHelpFormatter)
  parser.add_argument(
      '--model',
      help='Path of the object detection model.',
      required=False,
      default='efficientdet_lite0.tflite')
  parser.add_argument(
      '--cameraId', help='Id of camera.', required=False, type=int, default=0)
  parser.add_argument(
      '--frameWidth',
      help='Width of frame to capture from camera.',
      required=False,
      type=int,
      default=640)
  parser.add_argument(
      '--frameHeight',
      help='Height of frame to capture from camera.',
      required=False,
      type=int,
      default=480)
  parser.add_argument(
      '--numThreads',
      help='Number of CPU threads to run the model.',
      required=False,
      type=int,
      default=4)
  parser.add_argument(
      '--enableEdgeTPU',
      help='Whether to run the model on EdgeTPU.',
      action='store_true',
      required=False,
      default=False)
  args = parser.parse_args()

run(args.model, int(args.cameraId), args.frameWidth, args.frameHeight,
      int(args.numThreads), bool(args.enableEdgeTPU))

if __name__ == '__main__':
  main()

　detect.py を実行すると、デスクトップ画面に結果が表示されます。デスクトップにポップアップする検出映像では、検出速度はfps=6.6前後です。Raspberry Pi 3 ではfps=3前後です。

　Raspberry Pi のVNC Viewerのデスクトップのターミナルから実行しても、このコマンドは正常に作動して、Raspberry Pi のデスクトップに Web camera の映像が表示されて、物体検出がされるはずです。ssh接続のターミナルからではエラーが出ます。

　画像識別を行いたいときは、


$ cd ~/examples/lite/examples/image_classification/raspberry_pi

# Run this script to install the required dependencies and download the TFLite models.
$ sh setup.sh

　と関係するモジュール等をインストールします。このディレクトリに保存されたファイルは以下の通りです。

README.md                  efficientnet_lite0_edgetpu.tflite  setup.sh
__pycache__                image_classifier.py                test_data
classify.py                image_classifier_test.py
efficientnet_lite0.tflite  requirements.txt

その後、


$ python3 classify.py \
  --model efficientnet_lite0.tflite \
  --maxResults 5 --cameraId 0

　と実行します。Webcamera の映像がデスクトップ画面に表示されて、そのポップアップ画面の右上に識別された画像名称と確率が明示されます。maxResults 5 は識別された物体数の上限を設定します。使用されるモデルは学習済みのefficientnet_lite0.tflite です。

　ディレクトリ ~/examples/lite/examples/ の下には、
image_classification
image_segmentation
object_detection
pose_estimation
などのフォルダーが配置され、各フォルダーには android 、 ios 、raspberry_pi に対応するpyスクリプト、tflite形式の学習済みモデル、及び、テスト用画像がダウンロードできます。これらを用いたTensorflow Lite Runtime の実行が可能です。

　次に、image_segmentationの実行方法を見てみましょう。


$ cd ~/examples/lite/examples/image_segmentation/raspberry_pi
$ sh setup.sh

　deeplabv3.tflite および deeplabv3_edgetpu.tflite などがダウンロードされます。deeplabv3.tfliteを用いて、セグメンテーションを実行しましょう。


$ python3 segment.py --cameraId 0

　デスクトップにポップアップする検出映像にセグメンテーションが行われています。処理スピードは非常に遅いです。GPUが必須です。

オプション`--model`のパラメータを指定してTensorFlowLiteのモデルを設定できます。使用するモデルのデフォルト値は`deeplabv3.tflite`です。
TensorFlowHubの画像セグメンテーションモデル（メタデータ付）がサポートされています。
オプション`--displayMode`のパラメータを指定して、`overlay`、`side-by-side`オプションで、セグメンテーション結果の表示方法を指定します。デフォルト値は`overlay`です。
webcamの指定は、--cameraId 0 とします。

　以下のように使用します。


$ python3 segment.py --model deeplabv3.tflite --displayMode side-by-side --cameraId 0

　displayModeのoverlayでは、リアルタイム映像の上にセグメンテーションの色を被せて表示する方法、side-by-sideとは映像の横にセグメンテーションした図形をポップアップ表示させる方法です。

overlayモードでの表示

　続編で、GPU を内蔵したCoral Edge TPU をUSB 接続して利用し、リアルタイムの物体検出をしたいと思います。物体検出の処理スピードは数十倍に加速化すると思われます。coralを用いたリアルタイムの物体検出をみて下さい。

Jupyter Notebook のインストール

　Jupyter Notebookをインストールする。


$ pip install -U pip setuptools
$ pip install jupyter lab notebook

　インストールでエラーが出たときは、以下のモジュールをインストールしてから、再実行します。


$ sudo apt install libbz2-dev libsqlite3-dev libffi-dev python3-dev

　設定ファイルの作成とパスワードの設定を行います。


$ jupyter notebook --generate-config

　OSが Buster のときエラーが出ますので、以下のモジュールをダウングレードしてから、再実行します。


$ pip install markupsafe==2.0.1

　設定ファイルは、~/.jupyter/jupyter_notebook_config.pyに作成されます。最近、設定ファイル「~/.jupyter/jupyter_notebook_config.py」が自動で作成されないようなので、以下のコマンドで作成します。


$ jupyter notebook --generate-config

　このファイルはコメント#のみが多数記述されていますが、これに以下のコードを追加します。該当する各コメント行を検索してから修正をするのは煩雑なので、もとのコメントコードはそのままの状態で、以下の4行を分かりやすい箇所に追加するだけにします。


c.NotebookApp.allow_remote_access = True
c.NotebookApp.base_url = '/raspi/'
c.NotebookApp.open_browser = False
c.NotebookApp.ip = '0.0.0.0'

　base_url ='/raspi/'とすることで、http://[Raspberry PiのIPアドレス]:8888/raspi/でアクセスできるようになります。次に、パスワードを設定したいときは、以下のコマンドを打つと、パスワードの設定が要求されます。


$ jupyter notebook password

　Jupyter Notebookの自動起動をするために、エディターで /etc/systemd/system/jupyter.service を作成し、下記のようにします。{USER}と{GROUP}については、当該のユーザー名に適宜置き換えてください。


[Unit]
Description = Jupyter Notebook
[Service]
Type=simple
PIDFile=/var/run/jupyter-notebook.pid
ExecStart=/bin/bash -c ". /home/{USER}/.local/bin/jupyter-notebook --notebook-dir=/home/{USER}/my-notebooks"
WorkingDirectory=/home/{USER}/my-notebooks
User={USER}
Group={GROUP}
Restart=always
[Install]
WantedBy = multi-user.target

　自動起動のオプションを以下のコマンドで設定します。


$ sudo systemctl start jupyter.service
$ sudo systemctl enable jupyter.service

　最後に、パスを通します。以下の内容を~/.bashrcに記載してください。


# Jupyterのコマンドが入っているディレクトリにパスを通す
PATH="$PATH:~/.local/bin"

　reboot すれば、Jupyter Notebook が起動できます。jupyter notebook を起動して、ブラウザから http://[Raspberry PiのIPアドレス]:8888/raspi/ とアクセスすれば使用できます。

Pytorch のインストール

　REAL TIME INFERENCE ON RASPBERRY PI 4に従って、PyTorchをインストールします。


$ pip install torch torchvision torchaudio
$ pip install opencv-contrib-python
$ pip install -U opencv-python
$ pip install numpy --upgrade

　Pytorchのインストールとバージョンを確認します。

  
$ python 
>>> import torch
>>> torch.__version__
'1.11.0'
>>> torch.rand(5,3)
tensor([[0.7472, 0.6565, 0.1537],
[0.6427, 0.8880, 0.3322],
[0.9593, 0.1942, 0.4365],
[0.1949, 0.4621, 0.0187],
[0.6703, 0.5259, 0.9941]])

>>> import torchvision as tv
>>> tv.__version__
'0.12.0'

　以上で、Pytorch及びTorchvisionはインストールできました。これらのパッケージは./.local/lib/python3.9/site-packages/に配置されます。Pytorchの学習済みモデルを用いた物体識別や物体検出などが実行できます。Raspberry Pi はGPUを内蔵してないので、モデルの学習を実行することは無理です。

　Pytorchで簡単な物体識別を実行してみましょう。ここで使用しているスクリプトは私のGithubにある Pytorch_AlexNet.ipynb というノートブックです。


$ git clone https://github.com/mashyko/Image-Classifier.git

　とダウンロードして利用して下さい。

　または、jupyter notebookを起動して、以下のスクリプトを順に各セルにコピペして下さい。


# パッケージのimport
import numpy as np
import json
from PIL import Image
import matplotlib.pyplot as plt
%matplotlib inline

import torch
import torchvision
from torchvision import models, transforms

# PyTorchのバージョン確認
print("PyTorch Version: ",torch.__version__)
print("Torchvision Version: ",torchvision.__version__)

# 学習済みのAlexNetモデルをロード

net = models.alexnet(pretrained=True)# 学習済みのパラメータを使用
net.eval() # 推論モードに設定

print(net)

　学習済みモデルモデルをダウンロードしたので、次に、画像の前処理をします。以下のスクリプトをセルに貼り付けて、runして下さい。


入力画像の前処理クラスを作成
# 入力画像の前処理のクラス
class BaseTransform():
    """
    画像のサイズをリサイズし、色を標準化する。
    resize : int リサイズ先の画像の大きさ。
    mean : (R, G, B) 各色チャネルの平均値。
    std : (R, G, B) 各色チャネルの標準偏差。
    """

def __init__(self, resize, mean, std):
    self.base_transform = transforms.Compose([
      transforms.Resize(resize), # 短い辺の長さがresizeの大きさになる
      transforms.CenterCrop(resize), # 画像中央をresize × resizeで切り取り
      transforms.ToTensor(), # Torchテンソルに変換
      transforms.Normalize(mean, std) # 色情報の標準化
    ])

def __call__(self, img):
    return self.base_transform(img)

resize = 224

　画像データを読み込みます。


#画像データを読み込みます。 私の GitHub Repo に必要な画像がありますので、ダウンロードします。
!git clone https://github.com/mashyko/Image-Classifier
%cd Image-Classifier
!ls images/

# 1. 画像読み込み

image_file_path = './images/cat.jpg'
img = Image.open(image_file_path) # [高さ][幅][色RGB]

# 2. 元の画像の表示
plt.imshow(img)
plt.show()

# 3. 画像の前処理と処理済み画像の表示
resize = 224
mean = (0.485, 0.456, 0.406)
std = (0.229, 0.224, 0.225)
transform = BaseTransform(resize, mean, std)
img_transformed = transform(img) # torch.Size([3, 224, 224])

# (色、高さ、幅)を (高さ、幅、色)に変換し、0-1に値を制限して表示
img_transformed = img_transformed.numpy().transpose((1, 2, 0))
img_transformed = np.clip(img_transformed, 0, 1)
plt.imshow(img_transformed)
plt.show()

　画像にある物体の推論を実行します。


出力結果からラベルを予測する後処理クラスを作成
# ILSVRCのラベル情報をロードし辞意書型変数を生成します
class_index = json.load(open('./data/imagenet_class_index.json', 'r'))
#class_index

# 出力結果からラベルを予測する後処理クラス
class Predictor():

    def __init__(self, class_index):
      self.class_index = class_index

    def predict_max(self, out):
      """
      確率最大のラベル名を取得する。
      out : torch.Size([1, 1000]) Netからの出力。
      """
      maxid = np.argmax(out.detach().numpy())
      predicted_label_name = self.class_index[str(maxid)][1]

      return predicted_label_name

# Predictorのインスタンスを生成します
predictor = Predictor(class_index)

# 前処理の後、バッチサイズの次元を追加する
transform = BaseTransform(resize, mean, std) # 前処理クラス作成
img_transformed = transform(img) # torch.Size([3, 224, 224])
inputs = img_transformed.unsqueeze_(0) # torch.Size([1, 3, 224, 224])

# モデルに入力し、モデル出力をラベルに変換する
out = net(inputs) # torch.Size([1, 1000])
result = predictor.predict_max(out)

# 予測結果を出力する
print("入力画像の予測結果：", result)

　以上の全セルを実行すると、入力画像の予測結果は Egyptian_cat になるはずです。手持ちの画像を dataディレクトリに配置して画像の分類を実行してみて下さい。

　最後に、リアルタイム推論をPytorchの公式サイトTutorialsに掲載されているREAL TIME INFERENCE ON RASPBERRY PI 4を参考にして実行しましょう。

　Raspberry Pi 4 にwebcamまたはRaspberry Pi cameraを接続して下さい。

　以下のPython スクリプトをinfer.py という名前で作成・保存して下さい。


import time
import torch
import numpy as np
from torchvision import models, transforms

import cv2
from PIL import Image

torch.backends.quantized.engine = 'qnnpack'

cap = cv2.VideoCapture(1, cv2.CAP_V4L2)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 224)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 224)
cap.set(cv2.CAP_PROP_FPS, 36)

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

net = models.quantization.mobilenet_v2(pretrained=True, quantize=True)
# jit model to take it from ~20fps to ~30fps
net = torch.jit.script(net)

started = time.time()
last_logged = time.time()
frame_count = 0

# open the imagenet1000_clsid_to_labels.txt
with open("imagenet1000_clsid_to_labels.txt") as f:
    classes = eval(f.read())

with torch.no_grad():
    while True:
        # read frame
        ret, image = cap.read()
        if not ret:
            raise RuntimeError("failed to read frame")

        # convert opencv output from BGR to RGB
        image = image[:, :, [2, 1, 0]]
        permuted = image

        # preprocess
        input_tensor = preprocess(image)

        # create a mini-batch as expected by the model
        input_batch = input_tensor.unsqueeze(0)

        # run model
        output = net(input_batch)
        # do something with output ...
        top = list(enumerate(output[0].softmax(dim=0)))
        top.sort(key=lambda x: x[1], reverse=True)
        for idx, val in top[:10]:
        print(f"{val.item()*100:.2f}% {classes[idx]}")

        # log model performance
        frame_count += 1
        now = time.time()
        if now - last_logged > 1:
            print(f"{frame_count / (now-last_logged)} fps")
            last_logged = now
            frame_count = 0

　このスクリプトを実行するためには、物体の分類ラベルのテキストファイルが必要です。ここでは、分類分けに、imagenet 向けの「imagenet1000_clsid_to_labels.txt」を使用します。このファイルはこのimagenet class labelsなので、これをダウンロードして保存して下さい。

　ここまで来たら、実際に上記のスクリプトをターミナルから実行します。


$ python infer.py

/home/pi/.local/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python
extension:
warn(f"Failed to load image Python extension: {e}")
/home/pi/.local/lib/python3.9/site-packages/torch/ao/quantization/utils.py:210: UserWarning: must run observer before
calling calculate_qparams. Returning default values.
warnings.warn(
[W TensorImpl.h:1460] Warning: Named tensors and all their associated APIs are an experimental feature and subject to
change. Please do not use them for anything important until they are released as stable. (function operator())
--
中略
--
44.96% desk
24.37% library
6.14% bookcase
5.27% photocopier
3.88% desktop computer
2.10% mouse, computer mouse
2.10% printer
1.33% laptop, laptop computer
1.33% monitor
0.53% lab coat, laboratory coat
8.838474571032808 fps
--

　上記の通り、ターミナルに識別の確率が大きい順に物体名が表示されます。検出速度は 8~9 fpsです。Tutorialsでの説明では、30 fps程度になるとされていますが、そこまでの速度は実現できません。

　使用したコードの説明を簡単にします。aarch64 で最適なパフォーマンスを得るには、量子化（Quantized）および融合された（Fused）モデルが必要です。量子化（Quantized）とは、標準のfloat32演算よりもはるかにパフォーマンスの高いint8を使用して計算を行うことを意味します。融合（Fused）とは、連続する操作が融合されて、可能な場合はよりパフォーマンスの高いバージョンになることを意味します。

　さらに、pytorchの aarch64 バージョンはqnnpack engineを必要とします。そのために、
torch.backends.quantized.engine = 'qnnpack'
を使用しています。この例では、MobileNetV2の "prequantized and fused version"を使用しています。なので、
from torchvision import models
net = models.quantization.mobilenet_v2(pretrained=True, quantize=True)
というコードが登場します。

　その次に、モデルをjitして、Pythonのオーバーヘッドを減らし、あらゆる操作を融合させます。
net = torch.jit.script(net)
このコードはモデル（nn.Module）をTorchScriptでコンパイルして、ScriptModuleとして返す機能を果たします。ScriptModuleについてはPytorchの公式ページを参照ください。

　利用可能なINT8 quantized モデルは以下の通りです。

inception_v3 = models.quantization.inception_v3()
mobilenet_v2 = models.quantization.mobilenet_v2()
mobilenet_v3_large = models.quantization.mobilenet_v3_large()
resnet18 = models.quantization.resnet18()
resnet50 = models.quantization.resnet50()
resnext101_32x8d = models.quantization.resnext101_32x8d()
shufflenet_v2_x0_5 = models.quantization.shufflenet_v2_x0_5()
shufflenet_v2_x1_0 = models.quantization.shufflenet_v2_x1_0()
shufflenet_v2_x1_5 = models.quantization.shufflenet_v2_x1_5()
shufflenet_v2_x2_0 = models.quantization.shufflenet_v2_x2_0()

　これらのモデルを利用するときは、上の例のように


import torchvision.models as models
model = models.quantization.inception_v3(pretrained=True, quantize=True)
model.eval()

　とします。興味のある人は好みのモデルで実証して下さい。詳しくは、pytorch.org/vision/stable/models.html#quantized-modelsのページを参照ください。

Pytorch-YOLOv5による推論

　次に、Yolo(You Only Look Once)v5 バージョンのPytorch実装を取り上げます。このパッケージを利用するためには、Python>=3.8 および PyTorch>=1.7 がインストールされていることが必要です。ultralytics/yolov5を参照してください。そのGoogle Colab はYOLOv5_tutorials_roboflow.ipynbにあります。以下のように git clone します


$ git clone https://github.com/ultralytics/yolov5
$ cd yolov5
$ pip install -r requirements.txt

　これで準備はできました。通常は、Jupyter notebook を起動して、tutorial.ipynb を使用して実行します。

　ここでは、Python スクリプトを用いた物体検出を行ってみましょう。以下のように入力して下さい。


$ python detect.py --source data/images --weights yolov5s.pt --conf 0.25

　これは data/images ディレクトリにある２種類の画像からの物体検出の実行例です。学習済みモデルのyolov5s.ptがダウンロードされます。結果は以下のように、 runs/detect/exp 内に保存されます。bus.jpg 、 zidane.jpg です。

result:
/home/pi/.local/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python
extension:
warn(f"Failed to load image Python extension: {e}")
Downloading https://ultralytics.com/assets/Arial.ttf to /home/pi/.config/Ultralytics/Arial.ttf...
detect: weights=['yolov5s.pt'], source=data/images, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25,
iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False,
classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp,
exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5 🚀 v6.1-105-gd257c75 torch 1.11.0 CPU

Downloading https://github.com/ultralytics/yolov5/releases/download/v6.1/yolov5s.pt to yolov5s.pt...
100%|██████████████████████████████████████| 14.1M/14.1M [00:08<00:00, 1.78MB/s] Fusing layers... YOLOv5s summary: 213
layers, 7225885 parameters, 0 gradients 
image 1/2 /home/pi/notebooks/yolov5/data/images/bus.jpg: 640x480 4 persons, 1
  bus, Done. (0.872s) 
image 2/2 /home/pi/notebooks/yolov5/data/images/zidane.jpg: 384x640 2 persons, 2 ties, Done.
  (0.688s) Speed: 6.8ms pre-process, 779.9ms inference, 7.9ms NMS per image at shape (1, 3, 640, 640) 
Results saved to runs/detect/exp

Zidane.jpg

　異なる学習済みモデルを、例えば、 yolov5mを使用するときは、以下のようにします。


$ python detect.py --source data/images --weights yolov5m.pt --conf 0.25

　jupyter notebookを使用することを取り上げます。リモートPC で、jupyter notebookを起動して、ブラウザから http://[Raspberry PiのIPアドレス]:8888/raspi/ とアクセスします。notebooks/yolov5 を開きます。新規ノートブックを開き、以下のようにセルを記述します。最初のセルに


#!git clone https://github.com/ultralytics/yolov5 # clone repo
%cd yolov5

import torch
from IPython.display import Image, clear_output # to display images

clear_output()
print('Setup complete. Using torch %s %s' % (torch.__version__, torch.cuda.get_device_properties(0) if
torch.cuda.is_available() else 'CPU'))

　このセルを実行後、２番目のセルに以下の内容を作成します


!python3 detect.py --weights yolov5s.pt --img 640 --conf 0.25 --source data/images/
Image(filename='runs/detect/exp/zidane.jpg', width=600)

このセルを実行すると、上記の画像 zidane.jpg が表示されます。

　全部のテスト画像を表示するには


#display inference on ALL test images

import glob
from IPython.display import Image, display
for imageName in glob.glob('./runs/detect/exp/*.jpg'): #assuming jpg
   display(Image(filename=imageName))
   print("\n")

　というセルを作成して、実行します。zidane.jpg と bus.jpg の物体検出後の画像が表示されます。

　次に、webcamera のライブ映像から物体検出を実行しましょう。webcamera をRaspberry Piに接続して、リモートPCでNVC Viewerを起動します。Raspberry Piのデスクトップ画面のターミナルを開いて、以下のように打って下さい。


$ python detect.py --source 0

----- result:
/home/pi/.local/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python
extension:
warn(f"Failed to load image Python extension: {e}")
detect: weights=yolov5s.pt, source=0, data=data/coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45,
max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None,
agnostic_nms=False, augment=False, visualize=False, update=False, project=runs/detect, name=exp, exist_ok=False,
line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5 🚀 v6.1-105-gd257c75 torch 1.11.0 CPU

Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
qt.qpa.xcb: QXcbConnection: XCB error: 148 (Unknown), sequence: 186, resource id: 0, major code: 140 (Unknown), minor
code: 20
1/1: 0... Success (inf frames 640x480 at 30.00 FPS)

0: 480x640 1 bottle, 1 cup, 1 mouse, 9 books, 1 toothbrush, Done. (1.013s)
0: 480x640 1 bottle, 1 cup, 1 mouse, 10 books, Done. (1.001s)
0: 480x640 1 bottle, 1 cup, 1 mouse, 10 books, 2 toothbrushs, Done. (0.941s)
0: 480x640 1 person, 1 bottle, 1 cup, 10 books, 2 toothbrushs, Done. (0.977s)
0: 480x640 1 bottle, 1 cup, 1 mouse, 10 books, 2 toothbrushs, Done. (0.965s)
0: 480x640 2 persons, 1 bottle, 1 cup, 10 books, 2 toothbrushs, Done. (0.927s)
--

　コマンドは非常に簡単です。デスクトップ画面に物体検出のなされた映像がポップアップします。検出速度は非常に速いです。

--------
このページのトップへ
Coral Edge TPU を用いてObject Detectionのページに
トップページに戻る