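Lecture 15: Hands-on with Automated Machine Learning (AutoML)
What you will learn in this lecture
In this final lecture, you will use an open-source tool called SapientML to build a machine learning model automatically. The goal is to see what AutoML, a technology that automates the machine learning studied in Lecture 14, looks like in practice.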
Usage
First, install the SapientML library. This can be done with pip install.
As in the previous lectures, we use the diamonds dataset.
pip install -U sapientml sapientml-core==0.6.2
Collecting sapientml
Downloading sapientml-0.4.15-py3-none-any.whl.metadata (10 kB)
Collecting sapientml-core==0.6.2
Downloading sapientml_core-0.6.2-py3-none-any.whl.metadata (1.5 kB)
Collecting catboost>=1.2.3 (from sapientml-core==0.6.2)
Downloading catboost-1.2.7-cp311-cp311-manylinux2014_x86_64.whl.metadata (1.2 kB)
Collecting fasttext-wheel<0.10.0,>=0.9.2 (from sapientml-core==0.6.2)
Downloading fasttext_wheel-0.9.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (16 kB)
Collecting imbalanced-learn<0.13,>=0.11 (from sapientml-core==0.6.2)
Downloading imbalanced_learn-0.12.4-py3-none-any.whl.metadata (8.3 kB)
Collecting ipadic<2.0.0,>=1.0.0 (from sapientml-core==0.6.2)
Downloading ipadic-1.0.0.tar.gz (13.4 MB)
Preparing metadata (setup.py) ... done
Collecting ipykernel<7.0.0,>=6.25.1 (from sapientml-core==0.6.2)
Downloading ipykernel-6.29.5-py3-none-any.whl.metadata (6.3 kB)
Collecting japanize-matplotlib<2.0.0,>=1.1.3 (from sapientml-core==0.6.2)
Downloading japanize-matplotlib-1.1.3.tar.gz (4.1 MB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: jinja2<4.0.0,>=3.1.2 in /usr/local/lib/python3.11/dist-packages (from sapientml-core==0.6.2) (3.1.5)
Collecting libcst<2.0.0,>=1.0.1 (from sapientml-core==0.6.2)
Downloading libcst-1.6.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (17 kB)
Requirement already satisfied: lightgbm<5.0.0,>=4.0.0 in /usr/local/lib/python3.11/dist-packages (from sapientml-core==0.6.2) (4.5.0)
Collecting mecab-python3<2.0.0,>=1.0.6 (from sapientml-core==0.6.2)
Downloading mecab_python3-1.0.10-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.2 kB)
Requirement already satisfied: nbconvert<8.0.0,>=7.7.4 in /usr/local/lib/python3.11/dist-packages (from sapientml-core==0.6.2) (7.16.5)
Requirement already satisfied: nbformat<6.0.0,>=5.9.2 in /usr/local/lib/python3.11/dist-packages (from sapientml-core==0.6.2) (5.10.4)
Requirement already satisfied: nltk<4.0.0,>=3.8.1 in /usr/local/lib/python3.11/dist-packages (from sapientml-core==0.6.2) (3.9.1)
Requirement already satisfied: numba<0.61.0,>=0.57.1 in /usr/local/lib/python3.11/dist-packages (from sapientml-core==0.6.2) (0.60.0)
Collecting optuna<4.0.0,>=3.2.0 (from sapientml-core==0.6.2)
Downloading optuna-3.6.1-py3-none-any.whl.metadata (17 kB)
Requirement already satisfied: requests<3.0.0,>=2.31.0 in /usr/local/lib/python3.11/dist-packages (from sapientml-core==0.6.2) (2.32.3)
Collecting scikit-learn==1.3.2 (from sapientml-core==0.6.2)
Downloading scikit_learn-1.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Requirement already satisfied: scipy<2.0.0,>=1.11.1 in /usr/local/lib/python3.11/dist-packages (from sapientml-core==0.6.2) (1.13.1)
Requirement already satisfied: seaborn<0.14.0,>=0.12.2 in /usr/local/lib/python3.11/dist-packages (from sapientml-core==0.6.2) (0.13.2)
Collecting shap<0.46,>=0.43 (from sapientml-core==0.6.2)
Downloading shap-0.45.1-cp311-cp311-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (24 kB)
Requirement already satisfied: tqdm<5.0.0,>=4.66.1 in /usr/local/lib/python3.11/dist-packages (from sapientml-core==0.6.2) (4.67.1)
Requirement already satisfied: xgboost<3.0.0,>=1.7.6 in /usr/local/lib/python3.11/dist-packages (from sapientml-core==0.6.2) (2.1.3)
Requirement already satisfied: numpy<2.0,>=1.17.3 in /usr/local/lib/python3.11/dist-packages (from scikit-learn==1.3.2->sapientml-core==0.6.2) (1.26.4)
Requirement already satisfied: joblib>=1.1.1 in /usr/local/lib/python3.11/dist-packages (from scikit-learn==1.3.2->sapientml-core==0.6.2) (1.4.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in /usr/local/lib/python3.11/dist-packages (from scikit-learn==1.3.2->sapientml-core==0.6.2) (3.5.0)
Requirement already satisfied: pandas<3.0.0,>=2.0.3 in /usr/local/lib/python3.11/dist-packages (from sapientml) (2.2.2)
Requirement already satisfied: pydantic<3.0.0,>=2.1.1 in /usr/local/lib/python3.11/dist-packages (from sapientml) (2.10.5)
Requirement already satisfied: graphviz in /usr/local/lib/python3.11/dist-packages (from catboost>=1.2.3->sapientml-core==0.6.2) (0.20.3)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.11/dist-packages (from catboost>=1.2.3->sapientml-core==0.6.2) (3.10.0)
Requirement already satisfied: plotly in /usr/local/lib/python3.11/dist-packages (from catboost>=1.2.3->sapientml-core==0.6.2) (5.24.1)
Requirement already satisfied: six in /usr/local/lib/python3.11/dist-packages (from catboost>=1.2.3->sapientml-core==0.6.2) (1.17.0)
Collecting pybind11>=2.2 (from fasttext-wheel<0.10.0,>=0.9.2->sapientml-core==0.6.2)
Downloading pybind11-2.13.6-py3-none-any.whl.metadata (9.5 kB)
Requirement already satisfied: setuptools>=0.7.0 in /usr/local/lib/python3.11/dist-packages (from fasttext-wheel<0.10.0,>=0.9.2->sapientml-core==0.6.2) (75.1.0)
Collecting comm>=0.1.1 (from ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2)
Downloading comm-0.2.2-py3-none-any.whl.metadata (3.7 kB)
Requirement already satisfied: debugpy>=1.6.5 in /usr/local/lib/python3.11/dist-packages (from ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (1.8.0)
Requirement already satisfied: ipython>=7.23.1 in /usr/local/lib/python3.11/dist-packages (from ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (7.34.0)
Requirement already satisfied: jupyter-client>=6.1.12 in /usr/local/lib/python3.11/dist-packages (from ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (6.1.12)
Requirement already satisfied: jupyter-core!=5.0.*,>=4.12 in /usr/local/lib/python3.11/dist-packages (from ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (5.7.2)
Requirement already satisfied: matplotlib-inline>=0.1 in /usr/local/lib/python3.11/dist-packages (from ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (0.1.7)
Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.11/dist-packages (from ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (1.6.0)
Requirement already satisfied: packaging in /usr/local/lib/python3.11/dist-packages (from ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (24.2)
Requirement already satisfied: psutil in /usr/local/lib/python3.11/dist-packages (from ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (5.9.5)
Requirement already satisfied: pyzmq>=24 in /usr/local/lib/python3.11/dist-packages (from ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (24.0.1)
Requirement already satisfied: tornado>=6.1 in /usr/local/lib/python3.11/dist-packages (from ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (6.3.3)
Requirement already satisfied: traitlets>=5.4.0 in /usr/local/lib/python3.11/dist-packages (from ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (5.7.1)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.11/dist-packages (from jinja2<4.0.0,>=3.1.2->sapientml-core==0.6.2) (3.0.2)
Requirement already satisfied: pyyaml>=5.2 in /usr/local/lib/python3.11/dist-packages (from libcst<2.0.0,>=1.0.1->sapientml-core==0.6.2) (6.0.2)
Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.11/dist-packages (from nbconvert<8.0.0,>=7.7.4->sapientml-core==0.6.2) (4.12.3)
Requirement already satisfied: bleach!=5.0.0 in /usr/local/lib/python3.11/dist-packages (from bleach[css]!=5.0.0->nbconvert<8.0.0,>=7.7.4->sapientml-core==0.6.2) (6.2.0)
Requirement already satisfied: defusedxml in /usr/local/lib/python3.11/dist-packages (from nbconvert<8.0.0,>=7.7.4->sapientml-core==0.6.2) (0.7.1)
Requirement already satisfied: jupyterlab-pygments in /usr/local/lib/python3.11/dist-packages (from nbconvert<8.0.0,>=7.7.4->sapientml-core==0.6.2) (0.3.0)
Requirement already satisfied: mistune<4,>=2.0.3 in /usr/local/lib/python3.11/dist-packages (from nbconvert<8.0.0,>=7.7.4->sapientml-core==0.6.2) (3.1.0)
Requirement already satisfied: nbclient>=0.5.0 in /usr/local/lib/python3.11/dist-packages (from nbconvert<8.0.0,>=7.7.4->sapientml-core==0.6.2) (0.10.2)
Requirement already satisfied: pandocfilters>=1.4.1 in /usr/local/lib/python3.11/dist-packages (from nbconvert<8.0.0,>=7.7.4->sapientml-core==0.6.2) (1.5.1)
Requirement already satisfied: pygments>=2.4.1 in /usr/local/lib/python3.11/dist-packages (from nbconvert<8.0.0,>=7.7.4->sapientml-core==0.6.2) (2.18.0)
Requirement already satisfied: fastjsonschema>=2.15 in /usr/local/lib/python3.11/dist-packages (from nbformat<6.0.0,>=5.9.2->sapientml-core==0.6.2) (2.21.1)
Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.11/dist-packages (from nbformat<6.0.0,>=5.9.2->sapientml-core==0.6.2) (4.23.0)
Requirement already satisfied: click in /usr/local/lib/python3.11/dist-packages (from nltk<4.0.0,>=3.8.1->sapientml-core==0.6.2) (8.1.8)
Requirement already satisfied: regex>=2021.8.3 in /usr/local/lib/python3.11/dist-packages (from nltk<4.0.0,>=3.8.1->sapientml-core==0.6.2) (2024.11.6)
Requirement already satisfied: llvmlite<0.44,>=0.43.0dev0 in /usr/local/lib/python3.11/dist-packages (from numba<0.61.0,>=0.57.1->sapientml-core==0.6.2) (0.43.0)
Collecting alembic>=1.5.0 (from optuna<4.0.0,>=3.2.0->sapientml-core==0.6.2)
Downloading alembic-1.14.1-py3-none-any.whl.metadata (7.4 kB)
Collecting colorlog (from optuna<4.0.0,>=3.2.0->sapientml-core==0.6.2)
Downloading colorlog-6.9.0-py3-none-any.whl.metadata (10 kB)
Requirement already satisfied: sqlalchemy>=1.3.0 in /usr/local/lib/python3.11/dist-packages (from optuna<4.0.0,>=3.2.0->sapientml-core==0.6.2) (2.0.37)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.11/dist-packages (from pandas<3.0.0,>=2.0.3->sapientml) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.11/dist-packages (from pandas<3.0.0,>=2.0.3->sapientml) (2024.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.11/dist-packages (from pandas<3.0.0,>=2.0.3->sapientml) (2024.2)
Requirement already satisfied: annotated-types>=0.6.0 in /usr/local/lib/python3.11/dist-packages (from pydantic<3.0.0,>=2.1.1->sapientml) (0.7.0)
Requirement already satisfied: pydantic-core==2.27.2 in /usr/local/lib/python3.11/dist-packages (from pydantic<3.0.0,>=2.1.1->sapientml) (2.27.2)
Requirement already satisfied: typing-extensions>=4.12.2 in /usr/local/lib/python3.11/dist-packages (from pydantic<3.0.0,>=2.1.1->sapientml) (4.12.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.11/dist-packages (from requests<3.0.0,>=2.31.0->sapientml-core==0.6.2) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.11/dist-packages (from requests<3.0.0,>=2.31.0->sapientml-core==0.6.2) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.11/dist-packages (from requests<3.0.0,>=2.31.0->sapientml-core==0.6.2) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.11/dist-packages (from requests<3.0.0,>=2.31.0->sapientml-core==0.6.2) (2024.12.14)
Requirement already satisfied: slicer==0.0.8 in /usr/local/lib/python3.11/dist-packages (from shap<0.46,>=0.43->sapientml-core==0.6.2) (0.0.8)
Requirement already satisfied: cloudpickle in /usr/local/lib/python3.11/dist-packages (from shap<0.46,>=0.43->sapientml-core==0.6.2) (3.1.0)
Requirement already satisfied: nvidia-nccl-cu12 in /usr/local/lib/python3.11/dist-packages (from xgboost<3.0.0,>=1.7.6->sapientml-core==0.6.2) (2.21.5)
Collecting Mako (from alembic>=1.5.0->optuna<4.0.0,>=3.2.0->sapientml-core==0.6.2)
Downloading Mako-1.3.8-py3-none-any.whl.metadata (2.9 kB)
Requirement already satisfied: webencodings in /usr/local/lib/python3.11/dist-packages (from bleach!=5.0.0->bleach[css]!=5.0.0->nbconvert<8.0.0,>=7.7.4->sapientml-core==0.6.2) (0.5.1)
Requirement already satisfied: tinycss2<1.5,>=1.1.0 in /usr/local/lib/python3.11/dist-packages (from bleach[css]!=5.0.0->nbconvert<8.0.0,>=7.7.4->sapientml-core==0.6.2) (1.4.0)
Collecting jedi>=0.16 (from ipython>=7.23.1->ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2)
Downloading jedi-0.19.2-py2.py3-none-any.whl.metadata (22 kB)
Requirement already satisfied: decorator in /usr/local/lib/python3.11/dist-packages (from ipython>=7.23.1->ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (4.4.2)
Requirement already satisfied: pickleshare in /usr/local/lib/python3.11/dist-packages (from ipython>=7.23.1->ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (0.7.5)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /usr/local/lib/python3.11/dist-packages (from ipython>=7.23.1->ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (3.0.48)
Requirement already satisfied: backcall in /usr/local/lib/python3.11/dist-packages (from ipython>=7.23.1->ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (0.2.0)
Requirement already satisfied: pexpect>4.3 in /usr/local/lib/python3.11/dist-packages (from ipython>=7.23.1->ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (4.9.0)
Requirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.11/dist-packages (from jsonschema>=2.6->nbformat<6.0.0,>=5.9.2->sapientml-core==0.6.2) (24.3.0)
Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.11/dist-packages (from jsonschema>=2.6->nbformat<6.0.0,>=5.9.2->sapientml-core==0.6.2) (2024.10.1)
Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.11/dist-packages (from jsonschema>=2.6->nbformat<6.0.0,>=5.9.2->sapientml-core==0.6.2) (0.35.1)
Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.11/dist-packages (from jsonschema>=2.6->nbformat<6.0.0,>=5.9.2->sapientml-core==0.6.2) (0.22.3)
Requirement already satisfied: platformdirs>=2.5 in /usr/local/lib/python3.11/dist-packages (from jupyter-core!=5.0.*,>=4.12->ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (4.3.6)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib->catboost>=1.2.3->sapientml-core==0.6.2) (1.3.1)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.11/dist-packages (from matplotlib->catboost>=1.2.3->sapientml-core==0.6.2) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.11/dist-packages (from matplotlib->catboost>=1.2.3->sapientml-core==0.6.2) (4.55.3)
Requirement already satisfied: kiwisolver>=1.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib->catboost>=1.2.3->sapientml-core==0.6.2) (1.4.8)
Requirement already satisfied: pillow>=8 in /usr/local/lib/python3.11/dist-packages (from matplotlib->catboost>=1.2.3->sapientml-core==0.6.2) (11.1.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.11/dist-packages (from matplotlib->catboost>=1.2.3->sapientml-core==0.6.2) (3.2.1)
Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.11/dist-packages (from sqlalchemy>=1.3.0->optuna<4.0.0,>=3.2.0->sapientml-core==0.6.2) (3.1.1)
Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.11/dist-packages (from beautifulsoup4->nbconvert<8.0.0,>=7.7.4->sapientml-core==0.6.2) (2.6)
Requirement already satisfied: tenacity>=6.2.0 in /usr/local/lib/python3.11/dist-packages (from plotly->catboost>=1.2.3->sapientml-core==0.6.2) (9.0.0)
Requirement already satisfied: parso<0.9.0,>=0.8.4 in /usr/local/lib/python3.11/dist-packages (from jedi>=0.16->ipython>=7.23.1->ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (0.8.4)
Requirement already satisfied: ptyprocess>=0.5 in /usr/local/lib/python3.11/dist-packages (from pexpect>4.3->ipython>=7.23.1->ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (0.7.0)
Requirement already satisfied: wcwidth in /usr/local/lib/python3.11/dist-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython>=7.23.1->ipykernel<7.0.0,>=6.25.1->sapientml-core==0.6.2) (0.2.13)
Downloading sapientml_core-0.6.2-py3-none-any.whl (230 kB)
Downloading scikit_learn-1.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (10.9 MB)
Downloading sapientml-0.4.15-py3-none-any.whl (29 kB)
Downloading catboost-1.2.7-cp311-cp311-manylinux2014_x86_64.whl (98.7 MB)
Downloading fasttext_wheel-0.9.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.4 MB)
Downloading imbalanced_learn-0.12.4-py3-none-any.whl (258 kB)
Downloading ipykernel-6.29.5-py3-none-any.whl (117 kB)
Downloading libcst-1.6.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB)
Downloading mecab_python3-1.0.10-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (588 kB)
Downloading optuna-3.6.1-py3-none-any.whl (380 kB)
Downloading shap-0.45.1-cp311-cp311-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (540 kB)
Downloading alembic-1.14.1-py3-none-any.whl (233 kB)
Downloading comm-0.2.2-py3-none-any.whl (7.2 kB)
Downloading pybind11-2.13.6-py3-none-any.whl (243 kB)
Downloading colorlog-6.9.0-py3-none-any.whl (11 kB)
Downloading jedi-0.19.2-py2.py3-none-any.whl (1.6 MB)
Downloading Mako-1.3.8-py3-none-any.whl (78 kB)
Building wheels for collected packages: ipadic, japanize-matplotlib
Building wheel for ipadic (setup.py) ... done
Created wheel for ipadic: filename=ipadic-1.0.0-py3-none-any.whl size=13556704 sha256=b9904d03b6041aab0f32410ac1d1245bc32cb99304fe631cde5de70cdd46560a
Stored in directory: /root/.cache/pip/wheels/44/56/37/f543963822b85260c9f948df8fac8c20169c80dc71b24dc407
Building wheel for japanize-matplotlib (setup.py) ... done
Created wheel for japanize-matplotlib: filename=japanize_matplotlib-1.1.3-py3-none-any.whl size=4120257 sha256=1b4c7b9f2b899283cb9c9acc6473dd43f7b654a06e7b87cc1aae5662a75e110f
Stored in directory: /root/.cache/pip/wheels/da/a1/71/b8faeb93276fed10edffcca20746f1ef6f8d9e071eee8425fc
Successfully built ipadic japanize-matplotlib
Installing collected packages: mecab-python3, ipadic, pybind11, Mako, libcst, jedi, comm, colorlog, scikit-learn, fasttext-wheel, alembic, shap, optuna, japanize-matplotlib, ipykernel, imbalanced-learn, catboost, sapientml-core, sapientml
Attempting uninstall: scikit-learn
Found existing installation: scikit-learn 1.6.0
Uninstalling scikit-learn-1.6.0:
Successfully uninstalled scikit-learn-1.6.0
Attempting uninstall: shap
Found existing installation: shap 0.46.0
Uninstalling shap-0.46.0:
Successfully uninstalled shap-0.46.0
Attempting uninstall: ipykernel
Found existing installation: ipykernel 5.5.6
Uninstalling ipykernel-5.5.6:
Successfully uninstalled ipykernel-5.5.6
Attempting uninstall: imbalanced-learn
Found existing installation: imbalanced-learn 0.13.0
Uninstalling imbalanced-learn-0.13.0:
Successfully uninstalled imbalanced-learn-0.13.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
google-colab 1.0.0 requires ipykernel==5.5.6, but you have ipykernel 6.29.5 which is incompatible.
Successfully installed Mako-1.3.8 alembic-1.14.1 catboost-1.2.7 colorlog-6.9.0 comm-0.2.2 fasttext-wheel-0.9.2 imbalanced-learn-0.12.4 ipadic-1.0.0 ipykernel-6.29.5 japanize-matplotlib-1.1.3 jedi-0.19.2 libcst-1.6.0 mecab-python3-1.0.10 optuna-3.6.1 pybind11-2.13.6 sapientml-0.4.15 sapientml-core-0.6.2 scikit-learn-1.3.2 shap-0.45.1
import pandas as pd
from sapientml import SapientML
from sapientml.util.logging import setup_logger
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import seaborn as sns
import os
train_data = sns.load_dataset('diamonds')
train_data
| | carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 326 | 3.95 | 3.98 | 2.43 |
| 1 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 326 | 3.89 | 3.84 | 2.31 |
| 2 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 327 | 4.05 | 4.07 | 2.31 |
| 3 | 0.29 | Premium | I | VS2 | 62.4 | 58.0 | 334 | 4.20 | 4.23 | 2.63 |
| 4 | 0.31 | Good | J | SI2 | 63.3 | 58.0 | 335 | 4.34 | 4.35 | 2.75 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 53935 | 0.72 | Ideal | D | SI1 | 60.8 | 57.0 | 2757 | 5.75 | 5.76 | 3.50 |
| 53936 | 0.72 | Good | D | SI1 | 63.1 | 55.0 | 2757 | 5.69 | 5.75 | 3.61 |
| 53937 | 0.70 | Very Good | D | SI1 | 62.8 | 60.0 | 2757 | 5.66 | 5.68 | 3.56 |
| 53938 | 0.86 | Premium | H | SI2 | 61.0 | 58.0 | 2757 | 6.15 | 6.12 | 3.74 |
| 53939 | 0.75 | Ideal | D | SI2 | 62.2 | 55.0 | 2757 | 5.83 | 5.87 | 3.64 |
53940 rows × 10 columns
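Given a dataset and a target variable, SapientML automatically selects a type of machine learning model and preprocessing steps suited to predicting that target, assembles them into a program, and runs training and inference.
The following code cell builds a model that predicts the cut column of the diamonds data.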
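# Hold out part of the data for evaluation (train_test_split keeps 25% as test data by default)
train_data, test_data = train_test_split(train_data)
# Keep the true labels aside, then remove the target column from the test features
y_true = test_data["cut"].reset_index(drop=True)
test_data = test_data.drop(columns=["cut"])
# Tell SapientML which column is the target variable
cls = SapientML(["cut"])
setup_logger().handlers.clear()
# Generate candidate pipelines, evaluate them, and build the model
cls.fit(train_data)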
INFO:sapientml:Loading dataset...
WARNING:sapientml:Metric is not specified. Use 'f1' by default.
INFO:sapientml:Generating pipelines...
INFO:sapientml:Generating meta features...
INFO:sapientml:Executing generated pipelines...
INFO:sapientml:Running script (1/3)...
INFO:sapientml:Running script (2/3)...
INFO:sapientml:Running script (3/3)...
INFO:sapientml:Evaluating execution results of generated pipelines...
INFO:sapientml:Building model by generated pipeline...
INFO:sapientml:Done.
<sapientml.main.SapientML at 0x7f173d0a68d0>
To make predictions with the built model, run the following code cell.
y_pred = cls.predict(test_data)
print("Accuracy:", accuracy_score(y_true, y_pred))
INFO:sapientml:Predicting by built model...
Accuracy: 0.7624026696329255
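The log above shows SapientML falling back to the f1 metric, while the cell reports accuracy. The two can diverge when classes are imbalanced. The following is a minimal pure-Python sketch of both metrics, written by hand for illustration rather than taken from scikit-learn; the labels in toy_true and toy_pred are made up and are not results from the diamonds model.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels: "Ideal" dominates and "Fair" is rare.
toy_true = ["Ideal"] * 8 + ["Fair"] * 2
# A model that always predicts the majority class.
toy_pred = ["Ideal"] * 10

toy_accuracy = accuracy(toy_true, toy_pred)
toy_macro_f1 = macro_f1(toy_true, toy_pred)
print("accuracy:", toy_accuracy)
print("macro F1:", round(toy_macro_f1, 3))
```

In this toy case accuracy looks respectable because the majority class is always right, while macro F1 is pulled down by the class that is never predicted. This is one reason to look at more than one metric when judging a generated model.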
The programs generated for building the model and making predictions can be inspected as follows.
train_script = cls.model.files["final_train.py"].decode("utf-8")
print(train_script)
# *** GENERATED PIPELINE ***
# LOAD DATA
import pandas as pd
train_dataset = pd.read_pickle("./training.pkl")
import pickle
# PREPROCESSING-1
import numpy as np
NUMERIC_COLS_TO_SCALE = ['carat', 'depth', 'table', 'price']
train_dataset[NUMERIC_COLS_TO_SCALE] = np.log1p(train_dataset[NUMERIC_COLS_TO_SCALE])
# DETACH TARGET
TARGET_COLUMNS = ['cut']
feature_train = train_dataset.drop(TARGET_COLUMNS, axis=1)
target_train = train_dataset[TARGET_COLUMNS].copy()
# PREPROCESSING-2
from sklearn.preprocessing import OneHotEncoder
CATEGORICAL_COLS = ['clarity', 'color']
onehot_encoder = OneHotEncoder(handle_unknown='ignore', sparse_output=False)
train_encoded = pd.DataFrame(onehot_encoder.fit_transform(feature_train[CATEGORICAL_COLS]), columns=onehot_encoder.get_feature_names_out(), index=feature_train.index)
feature_train = pd.concat([feature_train, train_encoded ], axis=1)
feature_train.drop(CATEGORICAL_COLS, axis=1, inplace=True)
with open('oneHotEncoder.pkl', 'wb') as f:
    pickle.dump(onehot_encoder, f)
# MODEL
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
random_state_model = 42
model = GradientBoostingClassifier(random_state=random_state_model, )
model.fit(feature_train, target_train.values.ravel())
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)
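The generated training script applies np.log1p to the numeric columns and one-hot encodes the categorical columns with handle_unknown='ignore'. The sketch below mimics both steps in plain Python to show what they compute. It is an illustration on assumed toy values, not the code that SapientML or scikit-learn runs, and the helper names fit_categories and one_hot are hypothetical.

```python
import math

# log1p(x) = log(1 + x): compresses large values and is defined at x = 0.
prices = [0, 326, 2757, 18823]  # hypothetical price values
scaled = [math.log1p(p) for p in prices]


def fit_categories(values):
    """Remember the categories seen during training, in sorted order."""
    return sorted(set(values))


def one_hot(value, categories):
    """Return one 0/1 column per known category.

    A value that was not seen during training yields all zeros,
    which mirrors the effect of handle_unknown='ignore'.
    """
    return [1.0 if value == c else 0.0 for c in categories]


train_colors = ["E", "I", "J", "E"]  # hypothetical training column
categories = fit_categories(train_colors)

encoded_known = one_hot("I", categories)
encoded_unknown = one_hot("Z", categories)  # "Z" never appeared in training

print("scaled prices:", [round(s, 3) for s in scaled])
print("categories:", categories)
print("I ->", encoded_known)
print("Z ->", encoded_unknown)
```

Encoding an unseen category as all zeros lets prediction proceed on new data instead of raising an error, at the cost of giving the model no information about that category.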
predict_script = cls.model.files["final_predict.py"].decode("utf-8")
print(predict_script)
# *** GENERATED PIPELINE ***
# LOAD DATA
import pandas as pd
test_dataset = pd.read_pickle("./test.pkl")
import pickle
# PREPROCESSING-1
import numpy as np
NUMERIC_COLS_TO_SCALE = ['carat', 'depth', 'table', 'price']
NUMERIC_COLS_TO_SCALE_FOR_TEST = list(set(test_dataset.columns) & set(NUMERIC_COLS_TO_SCALE))
test_dataset[NUMERIC_COLS_TO_SCALE_FOR_TEST] = np.log1p(test_dataset[NUMERIC_COLS_TO_SCALE_FOR_TEST])
# DETACH TARGET
TARGET_COLUMNS = ['cut']
if set(TARGET_COLUMNS).issubset(test_dataset.columns.tolist()):
    feature_test = test_dataset.drop(TARGET_COLUMNS, axis=1)
    target_test = test_dataset[TARGET_COLUMNS].copy()
else:
    feature_test = test_dataset
# PREPROCESSING-2
with open('oneHotEncoder.pkl', 'rb') as f:
    onehot_encoder = pickle.load(f)
CATEGORICAL_COLS = ['clarity', 'color']
test_encoded = pd.DataFrame(onehot_encoder.transform(feature_test[CATEGORICAL_COLS]), columns=onehot_encoder.get_feature_names_out(), index=feature_test.index)
feature_test = pd.concat([feature_test, test_encoded ], axis=1)
feature_test.drop(CATEGORICAL_COLS, axis=1, inplace=True)
# MODEL
import numpy as np
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)
y_pred = model.predict(feature_test)
#EVALUATION
if set(TARGET_COLUMNS).issubset(test_dataset.columns.tolist()):
    from sklearn import metrics
    f1 = metrics.f1_score(target_test, y_pred, average='macro')
    print('RESULT: F1 Score: ' + str(f1))
# OUTPUT PREDICTION
prediction = pd.DataFrame(y_pred, columns=TARGET_COLUMNS, index=feature_test.index)
prediction.to_csv("./prediction_result.csv")
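The two generated scripts share fitted objects through pickle files: the training script writes oneHotEncoder.pkl and model.pkl, and the prediction script reads them back. The following minimal sketch shows that hand-off with a stand-in dictionary instead of a real encoder. The file name encoder_state.pkl and the contents of fitted_state are assumptions made for illustration.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted object such as an encoder or a model (hypothetical contents).
fitted_state = {"categories": ["E", "I", "J"], "random_state": 42}

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "encoder_state.pkl")

    # "Training" side: serialize the fitted object to disk.
    with open(path, "wb") as f:
        pickle.dump(fitted_state, f)

    # "Prediction" side: load it back, typically in a separate script or process.
    with open(path, "rb") as f:
        restored_state = pickle.load(f)

print("restored:", restored_state)
print("equal to original:", restored_state == fitted_state)
```

Because the prediction side reuses exactly what was fitted on the training data, the test data is transformed in the same way as the training data. Note that pickle files should only be loaded from sources you trust.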
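Closing remarks
In Programming Practice Lectures 3 and 4, we have covered the basics of programming in Python, fundamental algorithms and data structures, and, as applications, data science, machine learning, and AutoML. On top of the enormous number of useful libraries and tools built up over the years, the recent rise of generative AI and no-code/low-code development has made it possible to build sophisticated software without a person writing all of the code. Even so, a solid grounding in programming is still important for using these tools correctly. I hope that students in the Faculty of Business Administration will take what they learned in this course as a starting point and continue to hone their programming skills.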
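Exercises
Assignment
Using SapientML, choose a suitable dataset available on the internet (a CSV file) and a suitable target variable, build a machine learning model, and inspect the programs generated for training and prediction.
The code cell below uses, as an example, a dataset that can be downloaded directly from a URL. Try it with new data, either by specifying a different URL here or by placing a CSV file on Google Drive and loading it. If you use the Google Drive approach, please attach the CSV file to your report when you submit it.
Note that, depending on the dataset and target variable you choose, errors caused by bugs in SapientML may occur. If your choice of dataset and target variable is reasonable and the error still cannot be resolved, you may submit your work with the error as is. Bug reports are very welcome to developers.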
train_data = pd.read_csv("https://github.com/sapientml/sapientml/files/12481088/titanic.csv")
train_data
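# Hold out part of the data and separate the target column from the test features
train_data, test_data = train_test_split(train_data)
y_true = test_data["survived"].reset_index(drop=True)
test_data = test_data.drop(columns=["survived"])
# Build a model that predicts the survived column
cls = SapientML(["survived"])
setup_logger().handlers.clear()
cls.fit(train_data)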
train_script = cls.model.files["final_train.py"].decode("utf-8")
print(train_script)
predict_script = cls.model.files["final_predict.py"].decode("utf-8")
print(predict_script)
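Before handing your own CSV file to SapientML, it may help to confirm that the target column exists and to look at class balance and missing values. The following is a minimal standard-library sketch of such a check. The CSV text, column names, and values are made up for illustration, and in the notebook the same checks could be done on the pandas DataFrame instead.

```python
import csv
import io
from collections import Counter

# Hypothetical CSV content standing in for a downloaded file.
csv_text = """survived,pclass,age
1,1,29
0,3,
0,3,22
1,2,35
"""

target = "survived"  # the column you intend to predict

rows = list(csv.DictReader(io.StringIO(csv_text)))
columns = list(rows[0].keys())

# 1. The target column must exist in the file.
has_target = target in columns

# 2. Count how often each target value occurs (very rare classes are hard to learn).
class_counts = Counter(row[target] for row in rows)

# 3. Count empty cells per column to see how much data is missing.
missing_counts = {col: sum(row[col] == "" for row in rows) for col in columns}

print("columns:", columns)
print("target present:", has_target)
print("class counts:", dict(class_counts))
print("missing values:", missing_counts)
```

If the target column is missing, or one class has only a handful of rows, choosing a different target variable or dataset is usually easier than debugging the resulting error.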