Lecture 15: Let's Try Automated Machine Learning (AutoML)!
What you will learn in this lecture
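In this final lecture, you will use the free version of Fujitsu AutoML to try building a machine learning model automatically. The goal is to see what AutoML looks like in practice: it is a technology for automating the machine learning we studied in Lecture 14.
For an overview of Fujitsu AutoML, see this link.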
Usage
First, install the Fujitsu AutoML library. It can be installed with pip install.
As in the previous lecture, we use the diamonds dataset.
pip install fujitsu-automl
Requirement already satisfied: fujitsu-automl in /usr/local/lib/python3.10/dist-packages (3.1.1)
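To compare your environment with the one used here, you can read the installed versions from package metadata using only the standard library. The log above shows fujitsu-automl 3.1.1. The helper installed_versions below is our own illustration and is not part of Fujitsu AutoML. It returns None for any package that is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None  # not installed in this environment
    return result


# Distribution names taken from the pip log above
print(installed_versions(["fujitsu-automl", "sapientml", "scikit-learn"]))
```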
import pandas as pd
from sapientml import SapientML
from sapientml.util.logging import setup_logger
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import seaborn as sns
import os
# Paste the account information published on ToyoNet-ACE here (valid until 2024/07/17).
train_data = sns.load_dataset('diamonds')
train_data
| | carat | cut | color | clarity | depth | table | price | x | y | z |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 326 | 3.95 | 3.98 | 2.43 |
1 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 326 | 3.89 | 3.84 | 2.31 |
2 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 327 | 4.05 | 4.07 | 2.31 |
3 | 0.29 | Premium | I | VS2 | 62.4 | 58.0 | 334 | 4.20 | 4.23 | 2.63 |
4 | 0.31 | Good | J | SI2 | 63.3 | 58.0 | 335 | 4.34 | 4.35 | 2.75 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
53935 | 0.72 | Ideal | D | SI1 | 60.8 | 57.0 | 2757 | 5.75 | 5.76 | 3.50 |
53936 | 0.72 | Good | D | SI1 | 63.1 | 55.0 | 2757 | 5.69 | 5.75 | 3.61 |
53937 | 0.70 | Very Good | D | SI1 | 62.8 | 60.0 | 2757 | 5.66 | 5.68 | 3.56 |
53938 | 0.86 | Premium | H | SI2 | 61.0 | 58.0 | 2757 | 6.15 | 6.12 | 3.74 |
53939 | 0.75 | Ideal | D | SI2 | 62.2 | 55.0 | 2757 | 5.83 | 5.87 | 3.64 |
53940 rows × 10 columns
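With Fujitsu AutoML, you only need to specify a dataset and a target variable. The tool then automatically selects a type of machine learning model and the preprocessing suited to predicting that target, assembles a program, and runs training and inference.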
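The sketch below is a deliberately naive stand-in for what an AutoML system automates. It tries a few hand-picked candidate pipelines, scores each one with cross-validation, and keeps the best. This is not how Fujitsu AutoML works internally: the log of cls.fit in this lecture shows pipelines being generated through Web API calls and then executed. The sketch only illustrates the idea of selecting a model by its validation score. The iris dataset bundled with scikit-learn, the candidate list, and the macro F1 score are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def naive_model_search(X, y, candidates, cv=3, scoring="f1_macro"):
    """Score every candidate with cross-validation and return (best name, all scores)."""
    scores = {}
    for name, estimator in candidates.items():
        scores[name] = cross_val_score(estimator, X, y, cv=cv, scoring=scoring).mean()
    best_name = max(scores, key=scores.get)
    return best_name, scores


X, y = load_iris(return_X_y=True)

# Hand-picked candidates: each combines (optional) preprocessing with a model type
candidates = {
    "scaled_logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

best, scores = naive_model_search(X, y, candidates)
print(best, scores)
```

A real AutoML system explores a much larger space of preprocessing steps and models, and often tunes hyperparameters as well. The loop of generating candidates, evaluating them, and keeping the best is the part this sketch tries to convey.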
The following code cell builds a model that predicts the cut column of the diamonds data.
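# Hold out part of the data (25% by default) as test data
train_data, test_data = train_test_split(train_data)
# Keep the true labels for evaluation, then remove the target column from the test data
y_true = test_data["cut"].reset_index(drop=True)
test_data.drop(["cut"], axis=1, inplace=True)
# Specify the target column and use Fujitsu AutoML as the model type
cls = SapientML(["cut"], model_type="fujitsu-automl")
setup_logger().handlers.clear()
# Generate, run, and evaluate candidate pipelines, then build the final model
cls.fit(train_data)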
INFO:sapientml:Loading dataset...
WARNING:sapientml:Metric is not specified. Use 'f1' by default.
INFO:sapientml:Generating pipelines...
INFO:sapientml:Generating meta features...
INFO:sapientml:Calling WebAPIs for generating pipelines...
INFO:sapientml:Authenticating app via OAuth 2.0 Client Credentials Flow
INFO:sapientml:Experiment Id: f9afa7fb-2f4d-4bed-9c71-60faa1c1e8a7
INFO:sapientml:Executing generated pipelines...
INFO:sapientml:Running script (1/3)...
INFO:sapientml:Running script (2/3)...
INFO:sapientml:Running script (3/3)...
INFO:sapientml:Evaluating execution results of generated pipelines...
INFO:sapientml:Building model by generated pipeline...
INFO:sapientml:Done.
<sapientml.main.SapientML at 0x78b0591f7f70>
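The split in the cell above relies on the defaults of train_test_split. By default, 25% of the rows go to the test set. Because no random_state is passed, the split changes from run to run, and so does the accuracy printed below. The sketch below uses a hypothetical eight-row frame in place of the full diamonds data. It shows the default split sizes, uses a fixed random_state for reproducibility, and uses a non-mutating drop(columns=...) as an alternative to inplace=True. The toy_ prefixes keep it from overwriting the notebook's own variables.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame standing in for the diamonds data (8 rows)
toy_df = pd.DataFrame({
    "carat": [0.23, 0.21, 0.23, 0.29, 0.31, 0.72, 0.70, 0.86],
    "cut": ["Ideal", "Premium", "Good", "Premium", "Good", "Ideal", "Very Good", "Premium"],
})

# test_size defaults to 0.25; random_state fixes the shuffle so the split is reproducible
toy_train, toy_test = train_test_split(toy_df, random_state=0)

# Keep the held-out labels, then build test features without modifying toy_test in place
toy_y_true = toy_test["cut"].reset_index(drop=True)
toy_X_test = toy_test.drop(columns=["cut"])

print(len(toy_train), len(toy_test))  # 6 2
```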
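To make predictions with the built model, run the following code cell.
# Predict the cut of the held-out rows and compare the predictions with the true labels
y_pred = cls.predict(test_data)
print("Accuracy:", accuracy_score(y_true, y_pred))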
INFO:sapientml:Predicting by built model...
Accuracy: 0.757285873192436
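The log from cls.fit warned that no metric was specified and that 'f1' was used by default, while the cell above reports accuracy. The two metrics can disagree noticeably when the classes are imbalanced. The labels below are made up for illustration. A degenerate model that always predicts the majority class scores 0.75 accuracy but only 3/7 (about 0.43) macro F1.

```python
from sklearn.metrics import accuracy_score, f1_score

# Made-up labels: three "Ideal" rows and one "Good" row
toy_true = ["Ideal", "Ideal", "Ideal", "Good"]
# A degenerate model that always predicts the majority class
toy_pred = ["Ideal", "Ideal", "Ideal", "Ideal"]

acc = accuracy_score(toy_true, toy_pred)
# Macro F1 weights every class equally; "Good" is never predicted, so its F1 is 0.
# zero_division=0 sets the undefined precision for "Good" to 0 instead of warning.
macro_f1 = f1_score(toy_true, toy_pred, average="macro", zero_division=0)

print(f"accuracy={acc:.3f}, macro F1={macro_f1:.3f}")  # accuracy=0.750, macro F1=0.429
```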
The programs generated to build the model and make predictions can be inspected as follows.
train_script = cls.model.files["final_train.py"].decode("utf-8")
print(train_script)
# *** GENERATED PIPELINE ***
# LOAD DATA
import pandas as pd
train_dataset = pd.read_pickle("./training.pkl")
import pickle
# PREPROCESSING-1
import numpy as np
NUMERIC_COLS_TO_SCALE = ['carat', 'depth', 'table', 'price']
train_dataset[NUMERIC_COLS_TO_SCALE] = np.log1p(train_dataset[NUMERIC_COLS_TO_SCALE])
# DETACH TARGET
TARGET_COLUMNS = ['cut']
feature_train = train_dataset.drop(TARGET_COLUMNS, axis=1)
target_train = train_dataset[TARGET_COLUMNS].copy()
# PREPROCESSING-2
from sklearn.preprocessing import OneHotEncoder
CATEGORICAL_COLS = ['color', 'clarity']
onehot_encoder = OneHotEncoder(handle_unknown='ignore', sparse_output=False)
train_encoded = pd.DataFrame(onehot_encoder.fit_transform(feature_train[CATEGORICAL_COLS]), columns=onehot_encoder.get_feature_names_out(), index=feature_train.index)
feature_train = pd.concat([feature_train, train_encoded ], axis=1)
feature_train.drop(CATEGORICAL_COLS, axis=1, inplace=True)
with open('oneHotEncoder.pkl', 'wb') as f:
    pickle.dump(onehot_encoder, f)
# MODEL
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
random_state_model = 42
model = GradientBoostingClassifier(random_state=random_state_model, )
model.fit(feature_train, target_train.values.ravel())
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
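If you know scikit-learn, the steps of the generated training script can be restated as a single Pipeline. Those steps are log1p on four numeric columns, one-hot encoding of color and clarity, and then a GradientBoostingClassifier. A Pipeline keeps the fitted preprocessing and the model together instead of pickling them separately. This is our own restatement, not code produced by Fujitsu AutoML. It is fitted only on the ten rows shown in the table above, purely to show that it runs. The output column order of ColumnTransformer also differs from that of the generated script.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, OneHotEncoder

# Column lists copied from the generated training script
NUMERIC_COLS_TO_SCALE = ["carat", "depth", "table", "price"]
CATEGORICAL_COLS = ["color", "clarity"]
TARGET = "cut"

# The ten rows shown in the table above, used only as toy data
columns = ["carat", "cut", "color", "clarity", "depth", "table", "price", "x", "y", "z"]
rows = [
    (0.23, "Ideal", "E", "SI2", 61.5, 55.0, 326, 3.95, 3.98, 2.43),
    (0.21, "Premium", "E", "SI1", 59.8, 61.0, 326, 3.89, 3.84, 2.31),
    (0.23, "Good", "E", "VS1", 56.9, 65.0, 327, 4.05, 4.07, 2.31),
    (0.29, "Premium", "I", "VS2", 62.4, 58.0, 334, 4.20, 4.23, 2.63),
    (0.31, "Good", "J", "SI2", 63.3, 58.0, 335, 4.34, 4.35, 2.75),
    (0.72, "Ideal", "D", "SI1", 60.8, 57.0, 2757, 5.75, 5.76, 3.50),
    (0.72, "Good", "D", "SI1", 63.1, 55.0, 2757, 5.69, 5.75, 3.61),
    (0.70, "Very Good", "D", "SI1", 62.8, 60.0, 2757, 5.66, 5.68, 3.56),
    (0.86, "Premium", "H", "SI2", 61.0, 58.0, 2757, 6.15, 6.12, 3.74),
    (0.75, "Ideal", "D", "SI2", 62.2, 55.0, 2757, 5.83, 5.87, 3.64),
]
train = pd.DataFrame(rows, columns=columns)

preprocess = ColumnTransformer(
    transformers=[
        # PREPROCESSING-1: log1p on the columns listed in the generated script
        ("log1p", FunctionTransformer(np.log1p), NUMERIC_COLS_TO_SCALE),
        # PREPROCESSING-2: one-hot encoding that tolerates unseen categories
        ("onehot", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL_COLS),
    ],
    remainder="passthrough",  # x, y, z are used unchanged, as in the generated script
)
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", GradientBoostingClassifier(random_state=42)),
])

features = train.drop(columns=[TARGET])
pipeline.fit(features, train[TARGET])
print(pipeline.predict(features.head(3)))
```

With this design, a single pickle of pipeline would replace the separate oneHotEncoder.pkl and model.pkl files, and the same log1p step would automatically be applied at prediction time.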
predict_script = cls.model.files["final_predict.py"].decode("utf-8")
print(predict_script)
# *** GENERATED PIPELINE ***
# LOAD DATA
import pandas as pd
test_dataset = pd.read_pickle("./test.pkl")
import pickle
# PREPROCESSING-1
import numpy as np
NUMERIC_COLS_TO_SCALE = ['carat', 'depth', 'table', 'price']
NUMERIC_COLS_TO_SCALE_FOR_TEST = list(set(test_dataset.columns) & set(NUMERIC_COLS_TO_SCALE))
test_dataset[NUMERIC_COLS_TO_SCALE_FOR_TEST] = np.log1p(test_dataset[NUMERIC_COLS_TO_SCALE_FOR_TEST])
# DETACH TARGET
TARGET_COLUMNS = ['cut']
if set(TARGET_COLUMNS).issubset(test_dataset.columns.tolist()):
    feature_test = test_dataset.drop(TARGET_COLUMNS, axis=1)
    target_test = test_dataset[TARGET_COLUMNS].copy()
else:
    feature_test = test_dataset
# PREPROCESSING-2
with open('oneHotEncoder.pkl', 'rb') as f:
    onehot_encoder = pickle.load(f)
CATEGORICAL_COLS = ['color', 'clarity']
test_encoded = pd.DataFrame(onehot_encoder.transform(feature_test[CATEGORICAL_COLS]), columns=onehot_encoder.get_feature_names_out(), index=feature_test.index)
feature_test = pd.concat([feature_test, test_encoded ], axis=1)
feature_test.drop(CATEGORICAL_COLS, axis=1, inplace=True)
# MODEL
import numpy as np
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)
y_pred = model.predict(feature_test)
#EVALUATION
if set(TARGET_COLUMNS).issubset(test_dataset.columns.tolist()):
    from sklearn import metrics
    f1 = metrics.f1_score(target_test, y_pred, average='macro')
    print('RESULT: F1 Score: ' + str(f1))
# OUTPUT PREDICTION
prediction = pd.DataFrame(y_pred, columns=TARGET_COLUMNS, index=feature_test.index)
prediction.to_csv("./prediction_result.csv")
Closing remarks
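In Programming Practicum Lectures 3 and 4, we have studied the basics of programming in Python and fundamental algorithms and data structures. As applications, we also covered data science, machine learning, and AutoML. Over the years, an enormous number of convenient libraries and tools has accumulated. Together with the recent rise of generative AI and no-code/low-code development, they have made it possible to build sophisticated software without a person necessarily writing all of the code. Even so, a solid grounding in programming remains essential for using these tools correctly. We hope that students of the Faculty of Business Administration will also take what they learned in this course as a starting point and keep honing their programming skills.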
Exercises
Assignment
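Using Fujitsu AutoML, choose a suitable dataset (CSV file) available on the Internet and an appropriate target variable, and build a machine learning model. Then examine the programs generated for training and prediction.
The following code cell uses, as an example, a dataset that can be downloaded directly from a URL. Try loading new data, either by specifying a different URL here or by placing a CSV file on Google Drive. If you use Google Drive, please attach the CSV file to your report when you submit it.
Depending on the dataset and target variable you choose, errors caused by bugs in Fujitsu AutoML may occur. If your choice of dataset and target variable is reasonable and the error still cannot be resolved, you may submit your work with the error as it is. Bug reports are greatly appreciated by developers.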
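Before passing your own data to Fujitsu AutoML, it can help to check that the target column actually exists and to decide what to do with rows whose target is missing. This makes it easier to tell simple data problems apart from errors in the tool itself. The helper load_dataset_with_target below is hypothetical and is not part of Fujitsu AutoML or SapientML. It drops rows without a label, which is only one reasonable choice among several. The Google Drive lines in the comment assume Google Colab, and the file path and column name there are placeholders.

```python
import io

import pandas as pd


def load_dataset_with_target(source, target):
    """Read a CSV (URL, local path, or file-like object) and check the target column.

    Hypothetical helper for illustration; not part of Fujitsu AutoML.
    """
    df = pd.read_csv(source)
    if target not in df.columns:
        raise ValueError(f"target column {target!r} not found; available columns: {list(df.columns)}")
    if df[target].isna().any():
        # Rows without a label cannot be used for supervised training, so drop them
        df = df.dropna(subset=[target]).reset_index(drop=True)
    return df


# In Google Colab, a CSV on Google Drive can be read after mounting the drive:
#   from google.colab import drive
#   drive.mount("/content/drive")
#   df = load_dataset_with_target("/content/drive/MyDrive/my_data.csv", "my_target")  # placeholder path and column

# Small in-memory example: the last row has no label and is dropped
sample_df = load_dataset_with_target(io.StringIO("survived,pclass,age\n1,1,29\n0,3,\n,2,30\n"), "survived")
print(sample_df)
```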
# Example dataset (Titanic) that can be downloaded directly from a URL; replace it with your own data
train_data = pd.read_csv("https://github.com/sapientml/sapientml/files/12481088/titanic.csv")
train_data
# Split the data and use "survived" as the target variable
train_data, test_data = train_test_split(train_data)
y_true = test_data["survived"].reset_index(drop=True)
test_data.drop(["survived"], axis=1, inplace=True)
cls = SapientML(["survived"], model_type="fujitsu-automl")
setup_logger().handlers.clear()
cls.fit(train_data)
# Inspect the generated training and prediction programs
train_script = cls.model.files["final_train.py"].decode("utf-8")
print(train_script)
predict_script = cls.model.files["final_predict.py"].decode("utf-8")
print(predict_script)
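To attach the generated programs to your report as files, one option is to write them to disk. The helper save_generated_scripts below is hypothetical. It assumes that cls.model.files maps file names to bytes, which is what the .decode("utf-8") calls above suggest, and it keeps only the entries whose names end in .py.

```python
from pathlib import Path


def save_generated_scripts(files, out_dir):
    """Write each {file name: bytes} entry ending in .py to out_dir; return the written paths.

    Hypothetical helper for illustration; not part of Fujitsu AutoML.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in files.items():
        if not name.endswith(".py"):
            continue  # skip non-script entries such as pickled models
        path = out / name
        path.write_bytes(content)
        written.append(path)
    return written


# Usage in the notebook (assumes cls.model.files maps names to bytes):
#   save_generated_scripts(cls.model.files, "generated_scripts")
```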