{"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"name":"python","version":"3.7.12","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"code","source":"# This Python 3 environment comes with many helpful analytics libraries installed\n# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python\n# For example, here are several helpful packages to load\n\nimport numpy as np # linear algebra\nimport pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\n\n# Input data files are available in the read-only \"../input/\" directory\n# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory\n\nimport os\nfor dirname, _, filenames in os.walk('/kaggle/input'):\n    for filename in filenames:\n        print(os.path.join(dirname, filename))\n\n# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using \"Save & Run All\" \n# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session","metadata":{"_uuid":"8f2839f25d086af736a60e9eeb907d3b93b6e0e5","_cell_guid":"b1076dfc-b9ad-4769-8c92-a6c4dae69d19","execution":{"iopub.status.busy":"2022-11-25T13:49:41.215612Z","iopub.execute_input":"2022-11-25T13:49:41.216011Z","iopub.status.idle":"2022-11-25T13:49:41.221723Z","shell.execute_reply.started":"2022-11-25T13:49:41.215977Z","shell.execute_reply":"2022-11-25T13:49:41.220827Z"},"_kg_hide-input":true,"trusted":true},"execution_count":2,"outputs":[]},{"cell_type":"markdown","source":"## Scikit-learn with Transformers\n\nIn this notebook, I will show how you can use scikit-learn estimators with model weights from [🤗 
Transformers](https://huggingface.co/docs/transformers/main/en/index) thanks to [whatlies](https://github.com/koaning/whatlies). We will later push our model with a model card using [skops](https://skops.readthedocs.org/) to Hugging Face Hub.","metadata":{}},{"cell_type":"markdown","source":"# Installing whatlies, datasets, scikit-learn and gradio","metadata":{}},{"cell_type":"code","source":"!pip install datasets\n!pip install gradio\n!pip install whatlies[transformers]\n!pip install scikit-learn\n!pip install skops","metadata":{"_kg_hide-output":true,"execution":{"iopub.status.busy":"2022-11-25T13:49:50.291974Z","iopub.execute_input":"2022-11-25T13:49:50.292348Z","iopub.status.idle":"2022-11-25T13:50:42.641689Z","shell.execute_reply.started":"2022-11-25T13:49:50.292320Z","shell.execute_reply":"2022-11-25T13:50:42.640626Z"},"trusted":true},"execution_count":3,"outputs":[{"name":"stdout","text":"Requirement already satisfied: datasets in /opt/conda/lib/python3.7/site-packages (2.1.0)\nRequirement already satisfied: requests>=2.19.0 in /opt/conda/lib/python3.7/site-packages (from datasets) (2.27.1)\nRequirement already satisfied: multiprocess in /opt/conda/lib/python3.7/site-packages (from datasets) (0.70.13)\nRequirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from datasets) (4.11.4)\nRequirement already satisfied: pandas in /opt/conda/lib/python3.7/site-packages (from datasets) (1.3.5)\nRequirement already satisfied: xxhash in /opt/conda/lib/python3.7/site-packages (from datasets) (3.0.0)\nRequirement already satisfied: aiohttp in /opt/conda/lib/python3.7/site-packages (from datasets) (3.8.1)\nRequirement already satisfied: responses<0.19 in /opt/conda/lib/python3.7/site-packages (from datasets) (0.18.0)\nRequirement already satisfied: numpy>=1.17 in /opt/conda/lib/python3.7/site-packages (from datasets) (1.21.6)\nRequirement already satisfied: packaging in /opt/conda/lib/python3.7/site-packages (from datasets) 
(21.3)\nRequirement already satisfied: pyarrow>=5.0.0 in /opt/conda/lib/python3.7/site-packages (from datasets) (5.0.0)\nRequirement already satisfied: fsspec[http]>=2021.05.0 in /opt/conda/lib/python3.7/site-packages (from datasets) (2022.5.0)\nRequirement already satisfied: tqdm>=4.62.1 in /opt/conda/lib/python3.7/site-packages (from datasets) (4.64.0)\nRequirement already satisfied: dill in /opt/conda/lib/python3.7/site-packages (from datasets) (0.3.5.1)\nRequirement already satisfied: huggingface-hub<1.0.0,>=0.1.0 in /opt/conda/lib/python3.7/site-packages (from datasets) (0.11.0)\nRequirement already satisfied: filelock in /opt/conda/lib/python3.7/site-packages (from huggingface-hub<1.0.0,>=0.1.0->datasets) (3.6.0)\nRequirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.7/site-packages (from huggingface-hub<1.0.0,>=0.1.0->datasets) (4.2.0)\nRequirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.7/site-packages (from huggingface-hub<1.0.0,>=0.1.0->datasets) (6.0)\nRequirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging->datasets) (3.0.9)\nRequirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests>=2.19.0->datasets) (2022.5.18.1)\nRequirement already satisfied: charset-normalizer~=2.0.0 in /opt/conda/lib/python3.7/site-packages (from requests>=2.19.0->datasets) (2.0.12)\nRequirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests>=2.19.0->datasets) (1.26.9)\nRequirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests>=2.19.0->datasets) (3.3)\nRequirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (4.0.2)\nRequirement already satisfied: yarl<2.0,>=1.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (1.7.2)\nRequirement already satisfied: 
multidict<7.0,>=4.5 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (6.0.2)\nRequirement already satisfied: aiosignal>=1.1.2 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (1.2.0)\nRequirement already satisfied: asynctest==0.13.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (0.13.0)\nRequirement already satisfied: frozenlist>=1.1.1 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (1.3.0)\nRequirement already satisfied: attrs>=17.3.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (21.4.0)\nRequirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->datasets) (3.8.0)\nRequirement already satisfied: python-dateutil>=2.7.3 in /opt/conda/lib/python3.7/site-packages (from pandas->datasets) (2.8.2)\nRequirement already satisfied: pytz>=2017.3 in /opt/conda/lib/python3.7/site-packages (from pandas->datasets) (2022.1)\nRequirement already satisfied: six>=1.5 in /opt/conda/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas->datasets) (1.16.0)\n\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n\u001b[0mRequirement already satisfied: gradio in /opt/conda/lib/python3.7/site-packages (3.11.0)\nRequirement already satisfied: requests in /opt/conda/lib/python3.7/site-packages (from gradio) (2.27.1)\nRequirement already satisfied: websockets>=10.0 in /opt/conda/lib/python3.7/site-packages (from gradio) (10.3)\nRequirement already satisfied: jinja2 in /opt/conda/lib/python3.7/site-packages (from gradio) (3.1.2)\nRequirement already satisfied: pycryptodome in /opt/conda/lib/python3.7/site-packages (from gradio) (3.15.0)\nRequirement already satisfied: aiohttp in /opt/conda/lib/python3.7/site-packages (from gradio) (3.8.1)\nRequirement already satisfied: fsspec in /opt/conda/lib/python3.7/site-packages (from gradio) (2022.5.0)\nRequirement already satisfied: pydub in /opt/conda/lib/python3.7/site-packages (from gradio) (0.25.1)\nRequirement already satisfied: uvicorn in /opt/conda/lib/python3.7/site-packages (from gradio) (0.17.6)\nRequirement already satisfied: pyyaml in /opt/conda/lib/python3.7/site-packages (from gradio) (6.0)\nRequirement already satisfied: httpx in /opt/conda/lib/python3.7/site-packages (from gradio) (0.23.1)\nRequirement already satisfied: fastapi in /opt/conda/lib/python3.7/site-packages (from gradio) (0.78.0)\nRequirement already satisfied: matplotlib in /opt/conda/lib/python3.7/site-packages (from gradio) (3.5.2)\nRequirement already satisfied: ffmpy in /opt/conda/lib/python3.7/site-packages (from gradio) (0.3.0)\nRequirement already satisfied: markdown-it-py[linkify,plugins] in /opt/conda/lib/python3.7/site-packages (from gradio) (2.1.0)\nRequirement already satisfied: pandas in /opt/conda/lib/python3.7/site-packages (from gradio) (1.3.5)\nRequirement already satisfied: pydantic in /opt/conda/lib/python3.7/site-packages (from gradio) (1.8.2)\nRequirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from 
gradio) (1.21.6)\nRequirement already satisfied: orjson in /opt/conda/lib/python3.7/site-packages (from gradio) (3.6.8)\nRequirement already satisfied: python-multipart in /opt/conda/lib/python3.7/site-packages (from gradio) (0.0.5)\nRequirement already satisfied: pillow in /opt/conda/lib/python3.7/site-packages (from gradio) (9.1.0)\nRequirement already satisfied: h11<0.13,>=0.11 in /opt/conda/lib/python3.7/site-packages (from gradio) (0.12.0)\nRequirement already satisfied: paramiko in /opt/conda/lib/python3.7/site-packages (from gradio) (2.12.0)\nRequirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (4.0.2)\nRequirement already satisfied: multidict<7.0,>=4.5 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (6.0.2)\nRequirement already satisfied: frozenlist>=1.1.1 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (1.3.0)\nRequirement already satisfied: charset-normalizer<3.0,>=2.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (2.0.12)\nRequirement already satisfied: attrs>=17.3.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (21.4.0)\nRequirement already satisfied: asynctest==0.13.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (0.13.0)\nRequirement already satisfied: yarl<2.0,>=1.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (1.7.2)\nRequirement already satisfied: aiosignal>=1.1.2 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (1.2.0)\nRequirement already satisfied: typing-extensions>=3.7.4 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (4.2.0)\nRequirement already satisfied: starlette==0.19.1 in /opt/conda/lib/python3.7/site-packages (from fastapi->gradio) (0.19.1)\nRequirement already satisfied: anyio<5,>=3.4.0 in /opt/conda/lib/python3.7/site-packages (from starlette==0.19.1->fastapi->gradio) (3.6.1)\nRequirement already satisfied: 
httpcore<0.17.0,>=0.15.0 in /opt/conda/lib/python3.7/site-packages (from httpx->gradio) (0.15.0)\nRequirement already satisfied: sniffio in /opt/conda/lib/python3.7/site-packages (from httpx->gradio) (1.2.0)\nRequirement already satisfied: rfc3986[idna2008]<2,>=1.3 in /opt/conda/lib/python3.7/site-packages (from httpx->gradio) (1.5.0)\nRequirement already satisfied: certifi in /opt/conda/lib/python3.7/site-packages (from httpx->gradio) (2022.5.18.1)\nRequirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.7/site-packages (from jinja2->gradio) (2.0.1)\nRequirement already satisfied: mdurl~=0.1 in /opt/conda/lib/python3.7/site-packages (from markdown-it-py[linkify,plugins]->gradio) (0.1.0)\nRequirement already satisfied: linkify-it-py~=1.0 in /opt/conda/lib/python3.7/site-packages (from markdown-it-py[linkify,plugins]->gradio) (1.0.3)\nRequirement already satisfied: mdit-py-plugins in /opt/conda/lib/python3.7/site-packages (from markdown-it-py[linkify,plugins]->gradio) (0.3.0)\nRequirement already satisfied: pyparsing>=2.2.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib->gradio) (3.0.9)\nRequirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.7/site-packages (from matplotlib->gradio) (2.8.2)\nRequirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.7/site-packages (from matplotlib->gradio) (0.11.0)\nRequirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib->gradio) (1.4.2)\nRequirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.7/site-packages (from matplotlib->gradio) (4.33.3)\nRequirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.7/site-packages (from matplotlib->gradio) (21.3)\nRequirement already satisfied: pytz>=2017.3 in /opt/conda/lib/python3.7/site-packages (from pandas->gradio) (2022.1)\nRequirement already satisfied: pynacl>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from paramiko->gradio) 
(1.5.0)\nRequirement already satisfied: cryptography>=2.5 in /opt/conda/lib/python3.7/site-packages (from paramiko->gradio) (36.0.2)\nRequirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from paramiko->gradio) (1.16.0)\nRequirement already satisfied: bcrypt>=3.1.3 in /opt/conda/lib/python3.7/site-packages (from paramiko->gradio) (4.0.1)\nRequirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests->gradio) (3.3)\nRequirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests->gradio) (1.26.9)\nRequirement already satisfied: asgiref>=3.4.0 in /opt/conda/lib/python3.7/site-packages (from uvicorn->gradio) (3.5.2)\nRequirement already satisfied: click>=7.0 in /opt/conda/lib/python3.7/site-packages (from uvicorn->gradio) (8.0.4)\nRequirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from click>=7.0->uvicorn->gradio) (4.11.4)\nRequirement already satisfied: cffi>=1.12 in /opt/conda/lib/python3.7/site-packages (from cryptography>=2.5->paramiko->gradio) (1.15.0)\nRequirement already satisfied: uc-micro-py in /opt/conda/lib/python3.7/site-packages (from linkify-it-py~=1.0->markdown-it-py[linkify,plugins]->gradio) (1.0.1)\nRequirement already satisfied: pycparser in /opt/conda/lib/python3.7/site-packages (from cffi>=1.12->cryptography>=2.5->paramiko->gradio) (2.21)\nRequirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->click>=7.0->uvicorn->gradio) (3.8.0)\n\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n\u001b[0mRequirement already satisfied: whatlies[transformers] in /opt/conda/lib/python3.7/site-packages (0.7.0)\nRequirement already satisfied: altair>=4.2.0 in /opt/conda/lib/python3.7/site-packages (from whatlies[transformers]) (4.2.0)\nRequirement already satisfied: matplotlib>=3.5.0 in /opt/conda/lib/python3.7/site-packages (from whatlies[transformers]) (3.5.2)\nRequirement already satisfied: scikit-learn>=1.0.0 in /opt/conda/lib/python3.7/site-packages (from whatlies[transformers]) (1.0.2)\nRequirement already satisfied: gensim~=3.8.3 in /opt/conda/lib/python3.7/site-packages (from whatlies[transformers]) (3.8.3)\nRequirement already satisfied: bpemb>=0.3.0 in /opt/conda/lib/python3.7/site-packages (from whatlies[transformers]) (0.3.4)\nRequirement already satisfied: transformers>=4.19.0 in /opt/conda/lib/python3.7/site-packages (from whatlies[transformers]) (4.24.0)\nRequirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from altair>=4.2.0->whatlies[transformers]) (1.21.6)\nRequirement already satisfied: jsonschema>=3.0 in /opt/conda/lib/python3.7/site-packages (from altair>=4.2.0->whatlies[transformers]) (4.5.1)\nRequirement already satisfied: pandas>=0.18 in /opt/conda/lib/python3.7/site-packages (from altair>=4.2.0->whatlies[transformers]) (1.3.5)\nRequirement already satisfied: jinja2 in /opt/conda/lib/python3.7/site-packages (from altair>=4.2.0->whatlies[transformers]) (3.1.2)\nRequirement already satisfied: toolz in /opt/conda/lib/python3.7/site-packages (from altair>=4.2.0->whatlies[transformers]) (0.11.2)\nRequirement already satisfied: entrypoints in /opt/conda/lib/python3.7/site-packages (from altair>=4.2.0->whatlies[transformers]) (0.4)\nRequirement already satisfied: tqdm in /opt/conda/lib/python3.7/site-packages (from bpemb>=0.3.0->whatlies[transformers]) (4.64.0)\nRequirement already satisfied: sentencepiece in 
/opt/conda/lib/python3.7/site-packages (from bpemb>=0.3.0->whatlies[transformers]) (0.1.96)\nRequirement already satisfied: requests in /opt/conda/lib/python3.7/site-packages (from bpemb>=0.3.0->whatlies[transformers]) (2.27.1)\nRequirement already satisfied: smart-open>=1.8.1 in /opt/conda/lib/python3.7/site-packages (from gensim~=3.8.3->whatlies[transformers]) (5.2.1)\nRequirement already satisfied: scipy>=0.18.1 in /opt/conda/lib/python3.7/site-packages (from gensim~=3.8.3->whatlies[transformers]) (1.7.3)\nRequirement already satisfied: six>=1.5.0 in /opt/conda/lib/python3.7/site-packages (from gensim~=3.8.3->whatlies[transformers]) (1.16.0)\nRequirement already satisfied: pillow>=6.2.0 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.5.0->whatlies[transformers]) (9.1.0)\nRequirement already satisfied: pyparsing>=2.2.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.5.0->whatlies[transformers]) (3.0.9)\nRequirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.5.0->whatlies[transformers]) (0.11.0)\nRequirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.5.0->whatlies[transformers]) (2.8.2)\nRequirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.5.0->whatlies[transformers]) (4.33.3)\nRequirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.5.0->whatlies[transformers]) (1.4.2)\nRequirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.5.0->whatlies[transformers]) (21.3)\nRequirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=1.0.0->whatlies[transformers]) (3.1.0)\nRequirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=1.0.0->whatlies[transformers]) 
(1.1.0)\nRequirement already satisfied: filelock in /opt/conda/lib/python3.7/site-packages (from transformers>=4.19.0->whatlies[transformers]) (3.6.0)\nRequirement already satisfied: huggingface-hub<1.0,>=0.10.0 in /opt/conda/lib/python3.7/site-packages (from transformers>=4.19.0->whatlies[transformers]) (0.11.0)\nRequirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.7/site-packages (from transformers>=4.19.0->whatlies[transformers]) (6.0)\nRequirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.7/site-packages (from transformers>=4.19.0->whatlies[transformers]) (2021.11.10)\nRequirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from transformers>=4.19.0->whatlies[transformers]) (4.11.4)\nRequirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /opt/conda/lib/python3.7/site-packages (from transformers>=4.19.0->whatlies[transformers]) (0.12.1)\nRequirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.7/site-packages (from huggingface-hub<1.0,>=0.10.0->transformers>=4.19.0->whatlies[transformers]) (4.2.0)\nRequirement already satisfied: importlib-resources>=1.4.0 in /opt/conda/lib/python3.7/site-packages (from jsonschema>=3.0->altair>=4.2.0->whatlies[transformers]) (5.7.1)\nRequirement already satisfied: attrs>=17.4.0 in /opt/conda/lib/python3.7/site-packages (from jsonschema>=3.0->altair>=4.2.0->whatlies[transformers]) (21.4.0)\nRequirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /opt/conda/lib/python3.7/site-packages (from jsonschema>=3.0->altair>=4.2.0->whatlies[transformers]) (0.18.1)\nRequirement already satisfied: pytz>=2017.3 in /opt/conda/lib/python3.7/site-packages (from pandas>=0.18->altair>=4.2.0->whatlies[transformers]) (2022.1)\nRequirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->transformers>=4.19.0->whatlies[transformers]) (3.8.0)\nRequirement already 
satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.7/site-packages (from jinja2->altair>=4.2.0->whatlies[transformers]) (2.0.1)\nRequirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests->bpemb>=0.3.0->whatlies[transformers]) (1.26.9)\nRequirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests->bpemb>=0.3.0->whatlies[transformers]) (3.3)\nRequirement already satisfied: charset-normalizer~=2.0.0 in /opt/conda/lib/python3.7/site-packages (from requests->bpemb>=0.3.0->whatlies[transformers]) (2.0.12)\nRequirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests->bpemb>=0.3.0->whatlies[transformers]) (2022.5.18.1)\n\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n\u001b[0mRequirement already satisfied: scikit-learn in /opt/conda/lib/python3.7/site-packages (1.0.2)\nRequirement already satisfied: numpy>=1.14.6 in /opt/conda/lib/python3.7/site-packages (from scikit-learn) (1.21.6)\nRequirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn) (3.1.0)\nRequirement already satisfied: scipy>=1.1.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn) (1.7.3)\nRequirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.7/site-packages (from scikit-learn) (1.1.0)\n\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n\u001b[0mRequirement already satisfied: skops in /opt/conda/lib/python3.7/site-packages (0.2)\nRequirement already satisfied: scikit-learn>=0.24 in /opt/conda/lib/python3.7/site-packages (from skops) (1.0.2)\nRequirement already satisfied: tabulate>=0.8.8 in /opt/conda/lib/python3.7/site-packages (from skops) (0.8.9)\nRequirement already satisfied: modelcards>=0.1.6 in /opt/conda/lib/python3.7/site-packages (from skops) (0.1.6)\nRequirement already satisfied: huggingface-hub>=0.9.0rc3 in /opt/conda/lib/python3.7/site-packages (from skops) (0.11.0)\nRequirement already satisfied: typing-extensions>=3.7 in /opt/conda/lib/python3.7/site-packages (from skops) (4.2.0)\nRequirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.9.0rc3->skops) (4.11.4)\nRequirement already satisfied: tqdm in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.9.0rc3->skops) (4.64.0)\nRequirement already satisfied: requests in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.9.0rc3->skops) (2.27.1)\nRequirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.9.0rc3->skops) (6.0)\nRequirement already satisfied: packaging>=20.9 in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.9.0rc3->skops) (21.3)\nRequirement already satisfied: filelock in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.9.0rc3->skops) (3.6.0)\nRequirement already satisfied: Jinja2 in /opt/conda/lib/python3.7/site-packages (from modelcards>=0.1.6->skops) (3.1.2)\nRequirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.24->skops) (3.1.0)\nRequirement already satisfied: numpy>=1.14.6 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.24->skops) (1.21.6)\nRequirement already 
satisfied: joblib>=0.11 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.24->skops) (1.1.0)\nRequirement already satisfied: scipy>=1.1.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.24->skops) (1.7.3)\nRequirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging>=20.9->huggingface-hub>=0.9.0rc3->skops) (3.0.9)\nRequirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->huggingface-hub>=0.9.0rc3->skops) (3.8.0)\nRequirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.7/site-packages (from Jinja2->modelcards>=0.1.6->skops) (2.0.1)\nRequirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests->huggingface-hub>=0.9.0rc3->skops) (3.3)\nRequirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests->huggingface-hub>=0.9.0rc3->skops) (1.26.9)\nRequirement already satisfied: charset-normalizer~=2.0.0 in /opt/conda/lib/python3.7/site-packages (from requests->huggingface-hub>=0.9.0rc3->skops) (2.0.12)\nRequirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests->huggingface-hub>=0.9.0rc3->skops) (2022.5.18.1)\n\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n\u001b[0m","output_type":"stream"}]},{"cell_type":"code","source":"import datasets\nimport sklearn\nimport gradio as gr\nimport whatlies\nfrom whatlies.language import HFTransformersLanguage\nfrom transformers import pipeline\nfrom sklearn.pipeline import Pipeline # yeah it's a bit confusing! 
😅\nfrom sklearn.linear_model import LogisticRegression","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:50:46.200063Z","iopub.execute_input":"2022-11-25T13:50:46.200452Z","iopub.status.idle":"2022-11-25T13:50:53.433011Z","shell.execute_reply.started":"2022-11-25T13:50:46.200418Z","shell.execute_reply":"2022-11-25T13:50:53.432126Z"},"trusted":true},"execution_count":4,"outputs":[]},{"cell_type":"markdown","source":"## Load and preprocess the dataset\nWe'll drop NaN values, keep only reviews shorter than 1024 characters (both for simplicity and to stay within the transformer's input length limit), and convert the columns to lists (as whatlies accepts lists).","metadata":{}},{"cell_type":"code","source":"train_set, test_set = datasets.load_dataset('imdb', split=['train[0:1000]+train[24000:25000]', 'test[0:1000]+test[24000:25000]'])","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:51:23.321696Z","iopub.execute_input":"2022-11-25T13:51:23.322412Z","iopub.status.idle":"2022-11-25T13:51:25.550318Z","shell.execute_reply.started":"2022-11-25T13:51:23.322374Z","shell.execute_reply":"2022-11-25T13:51:25.549509Z"},"trusted":true},"execution_count":5,"outputs":[{"output_type":"display_data","data":{"text/plain":" 0%| | 0/2 [00:00<?, ?it/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"12771f9ca2974ea885c9f1490a6ddfc3"}},"metadata":{}}]},{"cell_type":"code","source":"df_train = pd.DataFrame(train_set)\ndf_test = pd.DataFrame(test_set)\ndf_train.dropna(inplace=True)\ndf_test.dropna(inplace=True)\ndf_train = df_train[df_train['text'].apply(lambda x: len(x) < 1024)]\ndf_test = df_test[df_test['text'].apply(lambda x: len(x) < 
1024)]","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:51:30.973618Z","iopub.execute_input":"2022-11-25T13:51:30.974022Z","iopub.status.idle":"2022-11-25T13:51:31.223343Z","shell.execute_reply.started":"2022-11-25T13:51:30.973990Z","shell.execute_reply":"2022-11-25T13:51:31.222539Z"},"trusted":true},"execution_count":6,"outputs":[]},{"cell_type":"code","source":"X_train = df_train[\"text\"].tolist()\ny_train = df_train[\"label\"].tolist()\nX_test = df_test[\"text\"].tolist()\ny_test = df_test[\"label\"].tolist()","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:51:33.322281Z","iopub.execute_input":"2022-11-25T13:51:33.322649Z","iopub.status.idle":"2022-11-25T13:51:33.328371Z","shell.execute_reply.started":"2022-11-25T13:51:33.322619Z","shell.execute_reply":"2022-11-25T13:51:33.327284Z"},"trusted":true},"execution_count":7,"outputs":[]},{"cell_type":"markdown","source":"# Set up the classifier","metadata":{}},{"cell_type":"markdown","source":"We'll use BART weights (`facebook/bart-base`) to embed the text, then fit a logistic regression on the embeddings.","metadata":{}},{"cell_type":"code","source":"pipe = Pipeline([\n    (\"embedding\", HFTransformersLanguage(\"facebook/bart-base\")),\n    (\"model\", LogisticRegression())\n])","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:51:43.068329Z","iopub.execute_input":"2022-11-25T13:51:43.068705Z","iopub.status.idle":"2022-11-25T13:51:47.292129Z","shell.execute_reply.started":"2022-11-25T13:51:43.068671Z","shell.execute_reply":"2022-11-25T13:51:47.291213Z"},"trusted":true},"execution_count":8,"outputs":[]},{"cell_type":"markdown","source":"## Visualizing the pipeline and its hyperparameters","metadata":{}},{"cell_type":"code","source":"from sklearn import 
set_config\nset_config(display=\"diagram\")\npipe","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:51:53.359407Z","iopub.execute_input":"2022-11-25T13:51:53.359791Z","iopub.status.idle":"2022-11-25T13:51:53.375506Z","shell.execute_reply.started":"2022-11-25T13:51:53.359759Z","shell.execute_reply":"2022-11-25T13:51:53.374429Z"},"trusted":true},"execution_count":9,"outputs":[{"execution_count":9,"output_type":"execute_result","data":{"text/plain":"Pipeline(steps=[('embedding',\n HFTransformersLanguage(model_name_or_path='facebook/bart-base')),\n ('model', LogisticRegression())])","text/html":"<style>#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 {color: black;background-color: white;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 pre{padding: 0;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-toggleable {background-color: white;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: 
\"▾\";}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-estimator:hover {background-color: #d4ebff;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-item {z-index: 1;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-parallel::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-parallel-item {display: flex;flex-direction: column;position: relative;background-color: 
white;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-parallel-item:only-child::after {width: 0;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;position: relative;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-label label {font-family: monospace;font-weight: bold;background-color: white;display: inline-block;line-height: 1.2em;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-label-container {position: relative;z-index: 2;text-align: center;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-3435eef2-8d0c-47b2-b96f-5928f72386c9\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>Pipeline(steps=[('embedding',\n HFTransformersLanguage(model_name_or_path='facebook/bart-base')),\n ('model', LogisticRegression())])</pre><b>Please rerun this cell to show the HTML repr or trust the notebook.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"e3a181e0-5c64-46c2-98d2-595e7b5e7695\" type=\"checkbox\" ><label for=\"e3a181e0-5c64-46c2-98d2-595e7b5e7695\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">Pipeline</label><div class=\"sk-toggleable__content\"><pre>Pipeline(steps=[('embedding',\n HFTransformersLanguage(model_name_or_path='facebook/bart-base')),\n ('model', LogisticRegression())])</pre></div></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"cbf2e096-f325-4734-8f6c-c12e2018f277\" type=\"checkbox\" ><label for=\"cbf2e096-f325-4734-8f6c-c12e2018f277\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">HFTransformersLanguage</label><div class=\"sk-toggleable__content\"><pre>HFTransformersLanguage(model_name_or_path='facebook/bart-base')</pre></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"b5fac2c8-808d-451a-bdf0-cee2baf8ffdb\" type=\"checkbox\" ><label for=\"b5fac2c8-808d-451a-bdf0-cee2baf8ffdb\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">LogisticRegression</label><div 
class=\"sk-toggleable__content\"><pre>LogisticRegression()</pre></div></div></div></div></div></div></div>"},"metadata":{}}]},{"cell_type":"code","source":"pipe.get_params()","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:52:01.683209Z","iopub.execute_input":"2022-11-25T13:52:01.683939Z","iopub.status.idle":"2022-11-25T13:52:01.691876Z","shell.execute_reply.started":"2022-11-25T13:52:01.683894Z","shell.execute_reply":"2022-11-25T13:52:01.691118Z"},"trusted":true},"execution_count":10,"outputs":[{"execution_count":10,"output_type":"execute_result","data":{"text/plain":"{'memory': None,\n 'steps': [('embedding',\n HFTransformersLanguage(model_name_or_path='facebook/bart-base')),\n ('model', LogisticRegression())],\n 'verbose': False,\n 'embedding': HFTransformersLanguage(model_name_or_path='facebook/bart-base'),\n 'model': LogisticRegression(),\n 'embedding__model_name_or_path': 'facebook/bart-base',\n 'model__C': 1.0,\n 'model__class_weight': None,\n 'model__dual': False,\n 'model__fit_intercept': True,\n 'model__intercept_scaling': 1,\n 'model__l1_ratio': None,\n 'model__max_iter': 100,\n 'model__multi_class': 'auto',\n 'model__n_jobs': None,\n 'model__penalty': 'l2',\n 'model__random_state': None,\n 'model__solver': 'lbfgs',\n 'model__tol': 0.0001,\n 'model__verbose': 0,\n 'model__warm_start': False}"},"metadata":{}}]},{"cell_type":"code","source":"pipe.fit(X_train, y_train)","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:52:04.625316Z","iopub.execute_input":"2022-11-25T13:52:04.625708Z","iopub.status.idle":"2022-11-25T13:58:33.755087Z","shell.execute_reply.started":"2022-11-25T13:52:04.625674Z","shell.execute_reply":"2022-11-25T13:58:33.753890Z"},"trusted":true},"execution_count":11,"outputs":[{"name":"stderr","text":"/opt/conda/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py:818: ConvergenceWarning: lbfgs failed to converge (status=1):\nSTOP: TOTAL NO. 
of ITERATIONS REACHED LIMIT.\n\nIncrease the number of iterations (max_iter) or scale the data as shown in:\n https://scikit-learn.org/stable/modules/preprocessing.html\nPlease also refer to the documentation for alternative solver options:\n https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG,\n","output_type":"stream"},{"execution_count":11,"output_type":"execute_result","data":{"text/plain":"Pipeline(steps=[('embedding',\n HFTransformersLanguage(model_name_or_path='facebook/bart-base')),\n ('model', LogisticRegression())])","text/html":"<style>#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 {color: black;background-color: white;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 pre{padding: 0;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-toggleable {background-color: white;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: 
\"▾\";}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-estimator:hover {background-color: #d4ebff;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-item {z-index: 1;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-parallel::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-parallel-item {display: flex;flex-direction: column;position: relative;background-color: 
white;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-parallel-item:only-child::after {width: 0;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;position: relative;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-label label {font-family: monospace;font-weight: bold;background-color: white;display: inline-block;line-height: 1.2em;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-label-container {position: relative;z-index: 2;text-align: center;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>Pipeline(steps=[('embedding',\n HFTransformersLanguage(model_name_or_path='facebook/bart-base')),\n ('model', LogisticRegression())])</pre><b>Please rerun this cell to show the HTML repr or trust the notebook.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"f2b1c083-5a76-4e26-ad92-84f7ee9b6276\" type=\"checkbox\" ><label for=\"f2b1c083-5a76-4e26-ad92-84f7ee9b6276\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">Pipeline</label><div class=\"sk-toggleable__content\"><pre>Pipeline(steps=[('embedding',\n HFTransformersLanguage(model_name_or_path='facebook/bart-base')),\n ('model', LogisticRegression())])</pre></div></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"2356825d-ffb5-404b-a00f-65f966ff8060\" type=\"checkbox\" ><label for=\"2356825d-ffb5-404b-a00f-65f966ff8060\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">HFTransformersLanguage</label><div class=\"sk-toggleable__content\"><pre>HFTransformersLanguage(model_name_or_path='facebook/bart-base')</pre></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"2bbcc021-cb91-4421-9401-a793252da046\" type=\"checkbox\" ><label for=\"2bbcc021-cb91-4421-9401-a793252da046\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">LogisticRegression</label><div 
class=\"sk-toggleable__content\"><pre>LogisticRegression()</pre></div></div></div></div></div></div></div>"},"metadata":{}}]},{"cell_type":"code","source":"y_pred = pipe.predict(X_test)","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:58:33.756912Z","iopub.execute_input":"2022-11-25T13:58:33.757390Z","iopub.status.idle":"2022-11-25T14:05:10.082300Z","shell.execute_reply.started":"2022-11-25T13:58:33.757353Z","shell.execute_reply":"2022-11-25T14:05:10.081151Z"},"trusted":true},"execution_count":12,"outputs":[]},{"cell_type":"markdown","source":"## Evaluation\nNot bad :') The classification report below shows an accuracy and macro-averaged F1 of 0.87 on the 1,072 test examples.","metadata":{}},{"cell_type":"code","source":"from sklearn.metrics import classification_report\nprint(classification_report(y_test, y_pred))","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:07:15.856326Z","iopub.execute_input":"2022-11-25T14:07:15.856720Z","iopub.status.idle":"2022-11-25T14:07:15.869250Z","shell.execute_reply.started":"2022-11-25T14:07:15.856690Z","shell.execute_reply":"2022-11-25T14:07:15.868082Z"},"trusted":true},"execution_count":13,"outputs":[{"name":"stdout","text":"              precision    recall  f1-score   support\n\n           0       0.85      0.89      0.87       522\n           1       0.89      0.85      0.87       550\n\n    accuracy                           0.87      1072\n   macro avg       0.87      0.87      0.87      1072\nweighted avg       0.87      0.87      0.87      1072\n\n","output_type":"stream"}]},{"cell_type":"markdown","source":"## Initialize a repository, create a model card and push it to 🤗Hub","metadata":{}},{"cell_type":"code","source":"import os\n\n# create a directory for the repo (no error if it already exists)\nlocal_repo = \"./local_repo_skops\"\nos.makedirs(local_repo, exist_ok=True)","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:08:28.980484Z","iopub.execute_input":"2022-11-25T14:08:28.980855Z","iopub.status.idle":"2022-11-25T14:08:28.985187Z","shell.execute_reply.started":"2022-11-25T14:08:28.980824Z","shell.execute_reply":"2022-11-25T14:08:28.984383Z"},"trusted":true},"execution_count":19,"outputs":[]},{"cell_type":"code","source":"# serialize the fitted pipeline to disk\nimport joblib\njoblib.dump(pipe, 
\"./pipeline.pkl\")","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:08:37.593665Z","iopub.execute_input":"2022-11-25T14:08:37.594070Z","iopub.status.idle":"2022-11-25T14:08:38.863226Z","shell.execute_reply.started":"2022-11-25T14:08:37.594036Z","shell.execute_reply":"2022-11-25T14:08:38.862348Z"},"trusted":true},"execution_count":20,"outputs":[{"execution_count":20,"output_type":"execute_result","data":{"text/plain":"['./pipeline.pkl']"},"metadata":{}}]},{"cell_type":"code","source":"import sklearn\nfrom skops import card, hub_utils\n\n# initialize the repository: this copies the model into `dst`\n# and creates a config file for reproducibility\nhub_utils.init(\n    model=\"pipeline.pkl\",\n    requirements=[f\"scikit-learn={sklearn.__version__}\"],\n    dst=local_repo,\n    task=\"text-classification\",\n    data=X_test,\n)","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:08:46.500745Z","iopub.execute_input":"2022-11-25T14:08:46.501129Z","iopub.status.idle":"2022-11-25T14:08:47.217510Z","shell.execute_reply.started":"2022-11-25T14:08:46.501096Z","shell.execute_reply":"2022-11-25T14:08:47.216632Z"},"trusted":true},"execution_count":21,"outputs":[]},{"cell_type":"markdown","source":"We will now create the model card. 
Passing the result of `card.metadata_from_config` as `metadata` fills in the metadata section automatically.","metadata":{}},{"cell_type":"code","source":"model_card = card.Card(pipe, metadata=card.metadata_from_config(local_repo))","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:08:55.795433Z","iopub.execute_input":"2022-11-25T14:08:55.795826Z","iopub.status.idle":"2022-11-25T14:08:55.808770Z","shell.execute_reply.started":"2022-11-25T14:08:55.795794Z","shell.execute_reply":"2022-11-25T14:08:55.807726Z"},"trusted":true},"execution_count":22,"outputs":[]},{"cell_type":"markdown","source":"Let's add some information to our model card!","metadata":{}},{"cell_type":"code","source":"model_card.add(model_description=\"This is a logistic regression model trained with BART embeddings on imdb dataset.\")\nmodel_card.add(limitations=\"This model is trained for educational purposes.\")","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:08:58.653624Z","iopub.execute_input":"2022-11-25T14:08:58.654003Z","iopub.status.idle":"2022-11-25T14:08:58.661810Z","shell.execute_reply.started":"2022-11-25T14:08:58.653966Z","shell.execute_reply":"2022-11-25T14:08:58.661018Z"},"trusted":true},"execution_count":23,"outputs":[{"execution_count":23,"output_type":"execute_result","data":{"text/plain":"Card(\n model=Pipeline(steps=[('embedding',...), ('model', LogisticRegression())]),\n metadata.library_name=sklearn,\n metadata.tags=['sklearn', 'skops', 'text-classification'],\n model_description='This is a logist...h GPT-2 embeddings on imdb dataset.',\n limitations='This model is trained for educational purposes.',\n)"},"metadata":{}}]},{"cell_type":"markdown","source":"We will evaluate the model and add the evaluation results to our card.","metadata":{}},{"cell_type":"code","source":"# use f1 to evaluate our model\nfrom sklearn.metrics import f1_score\n\nf1 = f1_score(y_test, y_pred, average=\"macro\")\nmodel_card.add_metrics(**{\"f1_score\": f1})\n# add 
explanation\nmodel_card.add(eval_method=\"The model is evaluated on test data using F1-score with macro avg.\")","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:09:09.793325Z","iopub.execute_input":"2022-11-25T14:09:09.793689Z","iopub.status.idle":"2022-11-25T14:09:09.807023Z","shell.execute_reply.started":"2022-11-25T14:09:09.793657Z","shell.execute_reply":"2022-11-25T14:09:09.805600Z"},"trusted":true},"execution_count":24,"outputs":[{"execution_count":24,"output_type":"execute_result","data":{"text/plain":"Card(\n model=Pipeline(steps=[('embedding',...), ('model', LogisticRegression())]),\n metadata.library_name=sklearn,\n metadata.tags=['sklearn', 'skops', 'text-classification'],\n model_description='This is a logist...h GPT-2 embeddings on imdb dataset.',\n limitations='This model is trained for educational purposes.',\n eval_method='The model is evaluated...data using F1-score with macro avg.',\n)"},"metadata":{}}]},{"cell_type":"markdown","source":"We can conveniently add plots using the `add_plot` method.","metadata":{}},{"cell_type":"code","source":"from pathlib import Path\n\nfrom sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay\n\ncm = confusion_matrix(y_test, y_pred, labels=pipe.classes_)\ndisp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=pipe.classes_)\ndisp.plot()\n\n# save the figure inside the repo folder, then pass its file name to add_plot\ndisp.figure_.savefig(Path(local_repo) / \"confusion_matrix.png\")\nmodel_card.add_plot(**{\"Confusion matrix\": \"confusion_matrix.png\"})","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:09:34.312647Z","iopub.execute_input":"2022-11-25T14:09:34.313037Z","iopub.status.idle":"2022-11-25T14:09:34.605611Z","shell.execute_reply.started":"2022-11-25T14:09:34.313003Z","shell.execute_reply":"2022-11-25T14:09:34.604821Z"},"trusted":true},"execution_count":25,"outputs":[{"execution_count":25,"output_type":"execute_result","data":{"text/plain":"Card(\n model=Pipeline(steps=[('embedding',...), 
('model', LogisticRegression())]),\n metadata.library_name=sklearn,\n metadata.tags=['sklearn', 'skops', 'text-classification'],\n model_description='This is a logist...h GPT-2 embeddings on imdb dataset.',\n limitations='This model is trained for educational purposes.',\n eval_method='The model is evaluated...data using F1-score with macro avg.',\n Confusion matrix='confusion_matrix.png',\n)"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<Figure size 432x288 with 2 Axes>","image/png":"\n"},"metadata":{"needs_background":"light"}}]},{"cell_type":"markdown","source":"Save the card as the repository's `README.md`.","metadata":{}},{"cell_type":"code","source":"model_card.save(Path(local_repo) / \"README.md\")","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:09:48.779576Z","iopub.execute_input":"2022-11-25T14:09:48.779956Z","iopub.status.idle":"2022-11-25T14:09:48.793803Z","shell.execute_reply.started":"2022-11-25T14:09:48.779912Z","shell.execute_reply":"2022-11-25T14:09:48.792994Z"},"trusted":true},"execution_count":26,"outputs":[]},{"cell_type":"markdown","source":"The repository folder should now contain the model card, the model, the configuration file and the plot.","metadata":{}},{"cell_type":"code","source":"os.listdir(local_repo)","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:09:56.677361Z","iopub.execute_input":"2022-11-25T14:09:56.677730Z","iopub.status.idle":"2022-11-25T14:09:56.684725Z","shell.execute_reply.started":"2022-11-25T14:09:56.677698Z","shell.execute_reply":"2022-11-25T14:09:56.683305Z"},"trusted":true},"execution_count":27,"outputs":[{"execution_count":27,"output_type":"execute_result","data":{"text/plain":"['README.md', 'pipeline.pkl', 'config.json', 'confusion_matrix.png']"},"metadata":{}}]},{"cell_type":"markdown","source":"Let's push our model to 🤗Hub! 
\nFirst we need to authenticate; then we can push the model to the Hub.","metadata":{}},{"cell_type":"code","source":"from huggingface_hub import notebook_login\nnotebook_login()","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:10:00.105490Z","iopub.execute_input":"2022-11-25T14:10:00.105854Z","iopub.status.idle":"2022-11-25T14:10:00.149877Z","shell.execute_reply.started":"2022-11-25T14:10:00.105824Z","shell.execute_reply":"2022-11-25T14:10:00.149006Z"},"trusted":true},"execution_count":28,"outputs":[{"output_type":"display_data","data":{"text/plain":"VBox(children=(HTML(value='<center> <img\\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"ba7d4a84329f48f1a1fc4e867c757613"}},"metadata":{}}]},{"cell_type":"code","source":"# if the remote repository doesn't exist yet, `create_remote=True` creates it\nhub_utils.push(\n    repo_id=\"scikit-learn/transformers-imdb\",\n    source=local_repo,\n    create_remote=True,\n)","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:10:24.377634Z","iopub.execute_input":"2022-11-25T14:10:24.378340Z","iopub.status.idle":"2022-11-25T14:11:10.963107Z","shell.execute_reply.started":"2022-11-25T14:10:24.378288Z","shell.execute_reply":"2022-11-25T14:11:10.962194Z"},"trusted":true},"execution_count":29,"outputs":[]},{"cell_type":"markdown","source":"You can find the repository [here](https://huggingface.co/scikit-learn/transformers-imdb).\n\nSome useful links if you're interested:\n- [Skops docs](https://skops.readthedocs.io/en/stable/)\n- [Skops GitHub](https://github.com/skops-dev/skops)\n- [Whatlies GitHub](https://github.com/koaning/whatlies)","metadata":{}}]} |