{"metadata":{"kernelspec":{"language":"python","display_name":"Python 3","name":"python3"},"language_info":{"name":"python","version":"3.7.12","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"}},"nbformat_minor":4,"nbformat":4,"cells":[{"cell_type":"code","source":"# This Python 3 environment comes with many helpful analytics libraries installed\n# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python\n# For example, here's several helpful packages to load\n\nimport numpy as np # linear algebra\nimport pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)\n\n# Input data files are available in the read-only \"../input/\" directory\n# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory\n\nimport os\nfor dirname, _, filenames in os.walk('/kaggle/input'):\n for filename in filenames:\n print(os.path.join(dirname, filename))\n\n# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using \"Save & Run All\" \n# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session","metadata":{"_uuid":"8f2839f25d086af736a60e9eeb907d3b93b6e0e5","_cell_guid":"b1076dfc-b9ad-4769-8c92-a6c4dae69d19","execution":{"iopub.status.busy":"2022-11-25T13:49:41.215612Z","iopub.execute_input":"2022-11-25T13:49:41.216011Z","iopub.status.idle":"2022-11-25T13:49:41.221723Z","shell.execute_reply.started":"2022-11-25T13:49:41.215977Z","shell.execute_reply":"2022-11-25T13:49:41.220827Z"},"_kg_hide-input":true,"trusted":true},"execution_count":2,"outputs":[]},{"cell_type":"markdown","source":"## Scikit-learn with Transformers\n\nIn this notebook, I will show how you can use scikit-learn estimators with model weights from [🤗 
Transformers](https://huggingface.co/docs/transformers/main/en/index) thanks to [whatlies](https://github.com/koaning/whatlies). Later, we will push the model, together with a model card, to the Hugging Face Hub using [skops](https://skops.readthedocs.org/).","metadata":{}},{"cell_type":"markdown","source":"# Installing whatlies, datasets, scikit-learn, gradio and skops","metadata":{}},{"cell_type":"code","source":"!pip install datasets\n!pip install gradio\n!pip install whatlies[transformers]\n!pip install scikit-learn\n!pip install skops","metadata":{"_kg_hide-output":true,"execution":{"iopub.status.busy":"2022-11-25T13:49:50.291974Z","iopub.execute_input":"2022-11-25T13:49:50.292348Z","iopub.status.idle":"2022-11-25T13:50:42.641689Z","shell.execute_reply.started":"2022-11-25T13:49:50.292320Z","shell.execute_reply":"2022-11-25T13:50:42.640626Z"},"trusted":true},"execution_count":3,"outputs":[{"name":"stdout","text":"Requirement already satisfied: datasets in /opt/conda/lib/python3.7/site-packages (2.1.0)\nRequirement already satisfied: requests>=2.19.0 in /opt/conda/lib/python3.7/site-packages (from datasets) (2.27.1)\nRequirement already satisfied: multiprocess in /opt/conda/lib/python3.7/site-packages (from datasets) (0.70.13)\nRequirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from datasets) (4.11.4)\nRequirement already satisfied: pandas in /opt/conda/lib/python3.7/site-packages (from datasets) (1.3.5)\nRequirement already satisfied: xxhash in /opt/conda/lib/python3.7/site-packages (from datasets) (3.0.0)\nRequirement already satisfied: aiohttp in /opt/conda/lib/python3.7/site-packages (from datasets) (3.8.1)\nRequirement already satisfied: responses<0.19 in /opt/conda/lib/python3.7/site-packages (from datasets) (0.18.0)\nRequirement already satisfied: numpy>=1.17 in /opt/conda/lib/python3.7/site-packages (from datasets) (1.21.6)\nRequirement already satisfied: packaging in /opt/conda/lib/python3.7/site-packages (from datasets) 
(21.3)\nRequirement already satisfied: pyarrow>=5.0.0 in /opt/conda/lib/python3.7/site-packages (from datasets) (5.0.0)\nRequirement already satisfied: fsspec[http]>=2021.05.0 in /opt/conda/lib/python3.7/site-packages (from datasets) (2022.5.0)\nRequirement already satisfied: tqdm>=4.62.1 in /opt/conda/lib/python3.7/site-packages (from datasets) (4.64.0)\nRequirement already satisfied: dill in /opt/conda/lib/python3.7/site-packages (from datasets) (0.3.5.1)\nRequirement already satisfied: huggingface-hub<1.0.0,>=0.1.0 in /opt/conda/lib/python3.7/site-packages (from datasets) (0.11.0)\nRequirement already satisfied: filelock in /opt/conda/lib/python3.7/site-packages (from huggingface-hub<1.0.0,>=0.1.0->datasets) (3.6.0)\nRequirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.7/site-packages (from huggingface-hub<1.0.0,>=0.1.0->datasets) (4.2.0)\nRequirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.7/site-packages (from huggingface-hub<1.0.0,>=0.1.0->datasets) (6.0)\nRequirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging->datasets) (3.0.9)\nRequirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests>=2.19.0->datasets) (2022.5.18.1)\nRequirement already satisfied: charset-normalizer~=2.0.0 in /opt/conda/lib/python3.7/site-packages (from requests>=2.19.0->datasets) (2.0.12)\nRequirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests>=2.19.0->datasets) (1.26.9)\nRequirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests>=2.19.0->datasets) (3.3)\nRequirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (4.0.2)\nRequirement already satisfied: yarl<2.0,>=1.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (1.7.2)\nRequirement already satisfied: 
multidict<7.0,>=4.5 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (6.0.2)\nRequirement already satisfied: aiosignal>=1.1.2 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (1.2.0)\nRequirement already satisfied: asynctest==0.13.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (0.13.0)\nRequirement already satisfied: frozenlist>=1.1.1 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (1.3.0)\nRequirement already satisfied: attrs>=17.3.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->datasets) (21.4.0)\nRequirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->datasets) (3.8.0)\nRequirement already satisfied: python-dateutil>=2.7.3 in /opt/conda/lib/python3.7/site-packages (from pandas->datasets) (2.8.2)\nRequirement already satisfied: pytz>=2017.3 in /opt/conda/lib/python3.7/site-packages (from pandas->datasets) (2022.1)\nRequirement already satisfied: six>=1.5 in /opt/conda/lib/python3.7/site-packages (from python-dateutil>=2.7.3->pandas->datasets) (1.16.0)\n\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n\u001b[0mRequirement already satisfied: gradio in /opt/conda/lib/python3.7/site-packages (3.11.0)\nRequirement already satisfied: requests in /opt/conda/lib/python3.7/site-packages (from gradio) (2.27.1)\nRequirement already satisfied: websockets>=10.0 in /opt/conda/lib/python3.7/site-packages (from gradio) (10.3)\nRequirement already satisfied: jinja2 in /opt/conda/lib/python3.7/site-packages (from gradio) (3.1.2)\nRequirement already satisfied: pycryptodome in /opt/conda/lib/python3.7/site-packages (from gradio) (3.15.0)\nRequirement already satisfied: aiohttp in /opt/conda/lib/python3.7/site-packages (from gradio) (3.8.1)\nRequirement already satisfied: fsspec in /opt/conda/lib/python3.7/site-packages (from gradio) (2022.5.0)\nRequirement already satisfied: pydub in /opt/conda/lib/python3.7/site-packages (from gradio) (0.25.1)\nRequirement already satisfied: uvicorn in /opt/conda/lib/python3.7/site-packages (from gradio) (0.17.6)\nRequirement already satisfied: pyyaml in /opt/conda/lib/python3.7/site-packages (from gradio) (6.0)\nRequirement already satisfied: httpx in /opt/conda/lib/python3.7/site-packages (from gradio) (0.23.1)\nRequirement already satisfied: fastapi in /opt/conda/lib/python3.7/site-packages (from gradio) (0.78.0)\nRequirement already satisfied: matplotlib in /opt/conda/lib/python3.7/site-packages (from gradio) (3.5.2)\nRequirement already satisfied: ffmpy in /opt/conda/lib/python3.7/site-packages (from gradio) (0.3.0)\nRequirement already satisfied: markdown-it-py[linkify,plugins] in /opt/conda/lib/python3.7/site-packages (from gradio) (2.1.0)\nRequirement already satisfied: pandas in /opt/conda/lib/python3.7/site-packages (from gradio) (1.3.5)\nRequirement already satisfied: pydantic in /opt/conda/lib/python3.7/site-packages (from gradio) (1.8.2)\nRequirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from 
gradio) (1.21.6)\nRequirement already satisfied: orjson in /opt/conda/lib/python3.7/site-packages (from gradio) (3.6.8)\nRequirement already satisfied: python-multipart in /opt/conda/lib/python3.7/site-packages (from gradio) (0.0.5)\nRequirement already satisfied: pillow in /opt/conda/lib/python3.7/site-packages (from gradio) (9.1.0)\nRequirement already satisfied: h11<0.13,>=0.11 in /opt/conda/lib/python3.7/site-packages (from gradio) (0.12.0)\nRequirement already satisfied: paramiko in /opt/conda/lib/python3.7/site-packages (from gradio) (2.12.0)\nRequirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (4.0.2)\nRequirement already satisfied: multidict<7.0,>=4.5 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (6.0.2)\nRequirement already satisfied: frozenlist>=1.1.1 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (1.3.0)\nRequirement already satisfied: charset-normalizer<3.0,>=2.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (2.0.12)\nRequirement already satisfied: attrs>=17.3.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (21.4.0)\nRequirement already satisfied: asynctest==0.13.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (0.13.0)\nRequirement already satisfied: yarl<2.0,>=1.0 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (1.7.2)\nRequirement already satisfied: aiosignal>=1.1.2 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (1.2.0)\nRequirement already satisfied: typing-extensions>=3.7.4 in /opt/conda/lib/python3.7/site-packages (from aiohttp->gradio) (4.2.0)\nRequirement already satisfied: starlette==0.19.1 in /opt/conda/lib/python3.7/site-packages (from fastapi->gradio) (0.19.1)\nRequirement already satisfied: anyio<5,>=3.4.0 in /opt/conda/lib/python3.7/site-packages (from starlette==0.19.1->fastapi->gradio) (3.6.1)\nRequirement already satisfied: 
httpcore<0.17.0,>=0.15.0 in /opt/conda/lib/python3.7/site-packages (from httpx->gradio) (0.15.0)\nRequirement already satisfied: sniffio in /opt/conda/lib/python3.7/site-packages (from httpx->gradio) (1.2.0)\nRequirement already satisfied: rfc3986[idna2008]<2,>=1.3 in /opt/conda/lib/python3.7/site-packages (from httpx->gradio) (1.5.0)\nRequirement already satisfied: certifi in /opt/conda/lib/python3.7/site-packages (from httpx->gradio) (2022.5.18.1)\nRequirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.7/site-packages (from jinja2->gradio) (2.0.1)\nRequirement already satisfied: mdurl~=0.1 in /opt/conda/lib/python3.7/site-packages (from markdown-it-py[linkify,plugins]->gradio) (0.1.0)\nRequirement already satisfied: linkify-it-py~=1.0 in /opt/conda/lib/python3.7/site-packages (from markdown-it-py[linkify,plugins]->gradio) (1.0.3)\nRequirement already satisfied: mdit-py-plugins in /opt/conda/lib/python3.7/site-packages (from markdown-it-py[linkify,plugins]->gradio) (0.3.0)\nRequirement already satisfied: pyparsing>=2.2.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib->gradio) (3.0.9)\nRequirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.7/site-packages (from matplotlib->gradio) (2.8.2)\nRequirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.7/site-packages (from matplotlib->gradio) (0.11.0)\nRequirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib->gradio) (1.4.2)\nRequirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.7/site-packages (from matplotlib->gradio) (4.33.3)\nRequirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.7/site-packages (from matplotlib->gradio) (21.3)\nRequirement already satisfied: pytz>=2017.3 in /opt/conda/lib/python3.7/site-packages (from pandas->gradio) (2022.1)\nRequirement already satisfied: pynacl>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from paramiko->gradio) 
(1.5.0)\nRequirement already satisfied: cryptography>=2.5 in /opt/conda/lib/python3.7/site-packages (from paramiko->gradio) (36.0.2)\nRequirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from paramiko->gradio) (1.16.0)\nRequirement already satisfied: bcrypt>=3.1.3 in /opt/conda/lib/python3.7/site-packages (from paramiko->gradio) (4.0.1)\nRequirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests->gradio) (3.3)\nRequirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests->gradio) (1.26.9)\nRequirement already satisfied: asgiref>=3.4.0 in /opt/conda/lib/python3.7/site-packages (from uvicorn->gradio) (3.5.2)\nRequirement already satisfied: click>=7.0 in /opt/conda/lib/python3.7/site-packages (from uvicorn->gradio) (8.0.4)\nRequirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from click>=7.0->uvicorn->gradio) (4.11.4)\nRequirement already satisfied: cffi>=1.12 in /opt/conda/lib/python3.7/site-packages (from cryptography>=2.5->paramiko->gradio) (1.15.0)\nRequirement already satisfied: uc-micro-py in /opt/conda/lib/python3.7/site-packages (from linkify-it-py~=1.0->markdown-it-py[linkify,plugins]->gradio) (1.0.1)\nRequirement already satisfied: pycparser in /opt/conda/lib/python3.7/site-packages (from cffi>=1.12->cryptography>=2.5->paramiko->gradio) (2.21)\nRequirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->click>=7.0->uvicorn->gradio) (3.8.0)\n\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n\u001b[0mRequirement already satisfied: whatlies[transformers] in /opt/conda/lib/python3.7/site-packages (0.7.0)\nRequirement already satisfied: altair>=4.2.0 in /opt/conda/lib/python3.7/site-packages (from whatlies[transformers]) (4.2.0)\nRequirement already satisfied: matplotlib>=3.5.0 in /opt/conda/lib/python3.7/site-packages (from whatlies[transformers]) (3.5.2)\nRequirement already satisfied: scikit-learn>=1.0.0 in /opt/conda/lib/python3.7/site-packages (from whatlies[transformers]) (1.0.2)\nRequirement already satisfied: gensim~=3.8.3 in /opt/conda/lib/python3.7/site-packages (from whatlies[transformers]) (3.8.3)\nRequirement already satisfied: bpemb>=0.3.0 in /opt/conda/lib/python3.7/site-packages (from whatlies[transformers]) (0.3.4)\nRequirement already satisfied: transformers>=4.19.0 in /opt/conda/lib/python3.7/site-packages (from whatlies[transformers]) (4.24.0)\nRequirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from altair>=4.2.0->whatlies[transformers]) (1.21.6)\nRequirement already satisfied: jsonschema>=3.0 in /opt/conda/lib/python3.7/site-packages (from altair>=4.2.0->whatlies[transformers]) (4.5.1)\nRequirement already satisfied: pandas>=0.18 in /opt/conda/lib/python3.7/site-packages (from altair>=4.2.0->whatlies[transformers]) (1.3.5)\nRequirement already satisfied: jinja2 in /opt/conda/lib/python3.7/site-packages (from altair>=4.2.0->whatlies[transformers]) (3.1.2)\nRequirement already satisfied: toolz in /opt/conda/lib/python3.7/site-packages (from altair>=4.2.0->whatlies[transformers]) (0.11.2)\nRequirement already satisfied: entrypoints in /opt/conda/lib/python3.7/site-packages (from altair>=4.2.0->whatlies[transformers]) (0.4)\nRequirement already satisfied: tqdm in /opt/conda/lib/python3.7/site-packages (from bpemb>=0.3.0->whatlies[transformers]) (4.64.0)\nRequirement already satisfied: sentencepiece in 
/opt/conda/lib/python3.7/site-packages (from bpemb>=0.3.0->whatlies[transformers]) (0.1.96)\nRequirement already satisfied: requests in /opt/conda/lib/python3.7/site-packages (from bpemb>=0.3.0->whatlies[transformers]) (2.27.1)\nRequirement already satisfied: smart-open>=1.8.1 in /opt/conda/lib/python3.7/site-packages (from gensim~=3.8.3->whatlies[transformers]) (5.2.1)\nRequirement already satisfied: scipy>=0.18.1 in /opt/conda/lib/python3.7/site-packages (from gensim~=3.8.3->whatlies[transformers]) (1.7.3)\nRequirement already satisfied: six>=1.5.0 in /opt/conda/lib/python3.7/site-packages (from gensim~=3.8.3->whatlies[transformers]) (1.16.0)\nRequirement already satisfied: pillow>=6.2.0 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.5.0->whatlies[transformers]) (9.1.0)\nRequirement already satisfied: pyparsing>=2.2.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.5.0->whatlies[transformers]) (3.0.9)\nRequirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.5.0->whatlies[transformers]) (0.11.0)\nRequirement already satisfied: python-dateutil>=2.7 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.5.0->whatlies[transformers]) (2.8.2)\nRequirement already satisfied: fonttools>=4.22.0 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.5.0->whatlies[transformers]) (4.33.3)\nRequirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.5.0->whatlies[transformers]) (1.4.2)\nRequirement already satisfied: packaging>=20.0 in /opt/conda/lib/python3.7/site-packages (from matplotlib>=3.5.0->whatlies[transformers]) (21.3)\nRequirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=1.0.0->whatlies[transformers]) (3.1.0)\nRequirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=1.0.0->whatlies[transformers]) 
(1.1.0)\nRequirement already satisfied: filelock in /opt/conda/lib/python3.7/site-packages (from transformers>=4.19.0->whatlies[transformers]) (3.6.0)\nRequirement already satisfied: huggingface-hub<1.0,>=0.10.0 in /opt/conda/lib/python3.7/site-packages (from transformers>=4.19.0->whatlies[transformers]) (0.11.0)\nRequirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.7/site-packages (from transformers>=4.19.0->whatlies[transformers]) (6.0)\nRequirement already satisfied: regex!=2019.12.17 in /opt/conda/lib/python3.7/site-packages (from transformers>=4.19.0->whatlies[transformers]) (2021.11.10)\nRequirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from transformers>=4.19.0->whatlies[transformers]) (4.11.4)\nRequirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /opt/conda/lib/python3.7/site-packages (from transformers>=4.19.0->whatlies[transformers]) (0.12.1)\nRequirement already satisfied: typing-extensions>=3.7.4.3 in /opt/conda/lib/python3.7/site-packages (from huggingface-hub<1.0,>=0.10.0->transformers>=4.19.0->whatlies[transformers]) (4.2.0)\nRequirement already satisfied: importlib-resources>=1.4.0 in /opt/conda/lib/python3.7/site-packages (from jsonschema>=3.0->altair>=4.2.0->whatlies[transformers]) (5.7.1)\nRequirement already satisfied: attrs>=17.4.0 in /opt/conda/lib/python3.7/site-packages (from jsonschema>=3.0->altair>=4.2.0->whatlies[transformers]) (21.4.0)\nRequirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /opt/conda/lib/python3.7/site-packages (from jsonschema>=3.0->altair>=4.2.0->whatlies[transformers]) (0.18.1)\nRequirement already satisfied: pytz>=2017.3 in /opt/conda/lib/python3.7/site-packages (from pandas>=0.18->altair>=4.2.0->whatlies[transformers]) (2022.1)\nRequirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->transformers>=4.19.0->whatlies[transformers]) (3.8.0)\nRequirement already 
satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.7/site-packages (from jinja2->altair>=4.2.0->whatlies[transformers]) (2.0.1)\nRequirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests->bpemb>=0.3.0->whatlies[transformers]) (1.26.9)\nRequirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests->bpemb>=0.3.0->whatlies[transformers]) (3.3)\nRequirement already satisfied: charset-normalizer~=2.0.0 in /opt/conda/lib/python3.7/site-packages (from requests->bpemb>=0.3.0->whatlies[transformers]) (2.0.12)\nRequirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests->bpemb>=0.3.0->whatlies[transformers]) (2022.5.18.1)\n\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n\u001b[0mRequirement already satisfied: scikit-learn in /opt/conda/lib/python3.7/site-packages (1.0.2)\nRequirement already satisfied: numpy>=1.14.6 in /opt/conda/lib/python3.7/site-packages (from scikit-learn) (1.21.6)\nRequirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn) (3.1.0)\nRequirement already satisfied: scipy>=1.1.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn) (1.7.3)\nRequirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.7/site-packages (from scikit-learn) (1.1.0)\n\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n\u001b[0mRequirement already satisfied: skops in /opt/conda/lib/python3.7/site-packages (0.2)\nRequirement already satisfied: scikit-learn>=0.24 in /opt/conda/lib/python3.7/site-packages (from skops) (1.0.2)\nRequirement already satisfied: tabulate>=0.8.8 in /opt/conda/lib/python3.7/site-packages (from skops) (0.8.9)\nRequirement already satisfied: modelcards>=0.1.6 in /opt/conda/lib/python3.7/site-packages (from skops) (0.1.6)\nRequirement already satisfied: huggingface-hub>=0.9.0rc3 in /opt/conda/lib/python3.7/site-packages (from skops) (0.11.0)\nRequirement already satisfied: typing-extensions>=3.7 in /opt/conda/lib/python3.7/site-packages (from skops) (4.2.0)\nRequirement already satisfied: importlib-metadata in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.9.0rc3->skops) (4.11.4)\nRequirement already satisfied: tqdm in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.9.0rc3->skops) (4.64.0)\nRequirement already satisfied: requests in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.9.0rc3->skops) (2.27.1)\nRequirement already satisfied: pyyaml>=5.1 in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.9.0rc3->skops) (6.0)\nRequirement already satisfied: packaging>=20.9 in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.9.0rc3->skops) (21.3)\nRequirement already satisfied: filelock in /opt/conda/lib/python3.7/site-packages (from huggingface-hub>=0.9.0rc3->skops) (3.6.0)\nRequirement already satisfied: Jinja2 in /opt/conda/lib/python3.7/site-packages (from modelcards>=0.1.6->skops) (3.1.2)\nRequirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.24->skops) (3.1.0)\nRequirement already satisfied: numpy>=1.14.6 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.24->skops) (1.21.6)\nRequirement already 
satisfied: joblib>=0.11 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.24->skops) (1.1.0)\nRequirement already satisfied: scipy>=1.1.0 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.24->skops) (1.7.3)\nRequirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging>=20.9->huggingface-hub>=0.9.0rc3->skops) (3.0.9)\nRequirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata->huggingface-hub>=0.9.0rc3->skops) (3.8.0)\nRequirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.7/site-packages (from Jinja2->modelcards>=0.1.6->skops) (2.0.1)\nRequirement already satisfied: idna<4,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests->huggingface-hub>=0.9.0rc3->skops) (3.3)\nRequirement already satisfied: urllib3<1.27,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests->huggingface-hub>=0.9.0rc3->skops) (1.26.9)\nRequirement already satisfied: charset-normalizer~=2.0.0 in /opt/conda/lib/python3.7/site-packages (from requests->huggingface-hub>=0.9.0rc3->skops) (2.0.12)\nRequirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests->huggingface-hub>=0.9.0rc3->skops) (2022.5.18.1)\n\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n\u001b[0m","output_type":"stream"}]},{"cell_type":"code","source":"import datasets\nimport sklearn\nimport gradio as gr\nimport whatlies\nfrom whatlies.language import HFTransformersLanguage\nfrom transformers import pipeline\nfrom sklearn.pipeline import Pipeline # yeah it's a bit confusing! 
😅\nfrom sklearn.linear_model import LogisticRegression","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:50:46.200063Z","iopub.execute_input":"2022-11-25T13:50:46.200452Z","iopub.status.idle":"2022-11-25T13:50:53.433011Z","shell.execute_reply.started":"2022-11-25T13:50:46.200418Z","shell.execute_reply":"2022-11-25T13:50:53.432126Z"},"trusted":true},"execution_count":4,"outputs":[]},{"cell_type":"markdown","source":"## Load and preprocess the dataset\nWe'll drop NaN values, keep only reviews shorter than 1024 characters (both for simplicity and to stay within the model's input length limit), and convert the columns to lists (as whatlies accepts lists).","metadata":{}},{"cell_type":"code","source":"train_set, test_set = datasets.load_dataset('imdb', split=['train[0:1000]+train[24000:25000]', 'test[0:1000]+test[24000:25000]'])","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:51:23.321696Z","iopub.execute_input":"2022-11-25T13:51:23.322412Z","iopub.status.idle":"2022-11-25T13:51:25.550318Z","shell.execute_reply.started":"2022-11-25T13:51:23.322374Z","shell.execute_reply":"2022-11-25T13:51:25.549509Z"},"trusted":true},"execution_count":5,"outputs":[{"output_type":"display_data","data":{"text/plain":" 0%| | 0/2 [00:00<?, ?it/s]","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"12771f9ca2974ea885c9f1490a6ddfc3"}},"metadata":{}}]},{"cell_type":"code","source":"df_train = pd.DataFrame(train_set)\ndf_test = pd.DataFrame(test_set)\ndf_train.dropna(inplace=True)\ndf_test.dropna(inplace=True)\ndf_train = df_train[df_train['text'].apply(lambda x: len(x) < 1024)]\ndf_test = df_test[df_test['text'].apply(lambda x: len(x) < 
1024)]","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:51:30.973618Z","iopub.execute_input":"2022-11-25T13:51:30.974022Z","iopub.status.idle":"2022-11-25T13:51:31.223343Z","shell.execute_reply.started":"2022-11-25T13:51:30.973990Z","shell.execute_reply":"2022-11-25T13:51:31.222539Z"},"trusted":true},"execution_count":6,"outputs":[]},{"cell_type":"code","source":"X_train = df_train[\"text\"].tolist()\ny_train = df_train[\"label\"].tolist()\nX_test = df_test[\"text\"].tolist()\ny_test = df_test[\"label\"].tolist()","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:51:33.322281Z","iopub.execute_input":"2022-11-25T13:51:33.322649Z","iopub.status.idle":"2022-11-25T13:51:33.328371Z","shell.execute_reply.started":"2022-11-25T13:51:33.322619Z","shell.execute_reply":"2022-11-25T13:51:33.327284Z"},"trusted":true},"execution_count":7,"outputs":[]},{"cell_type":"markdown","source":"# Set up the classifier","metadata":{}},{"cell_type":"markdown","source":"We'll use `facebook/bart-base` weights for the embeddings, with a logistic regression classifier on top.","metadata":{}},{"cell_type":"code","source":"pipe = Pipeline([\n (\"embedding\", HFTransformersLanguage(\"facebook/bart-base\")),\n (\"model\", LogisticRegression())\n])","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:51:43.068329Z","iopub.execute_input":"2022-11-25T13:51:43.068705Z","iopub.status.idle":"2022-11-25T13:51:47.292129Z","shell.execute_reply.started":"2022-11-25T13:51:43.068671Z","shell.execute_reply":"2022-11-25T13:51:47.291213Z"},"trusted":true},"execution_count":8,"outputs":[]},{"cell_type":"markdown","source":"## Visualize the pipeline and see the hyperparameters","metadata":{}},{"cell_type":"code","source":"from sklearn import 
set_config\nset_config(display=\"diagram\")\npipe","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:51:53.359407Z","iopub.execute_input":"2022-11-25T13:51:53.359791Z","iopub.status.idle":"2022-11-25T13:51:53.375506Z","shell.execute_reply.started":"2022-11-25T13:51:53.359759Z","shell.execute_reply":"2022-11-25T13:51:53.374429Z"},"trusted":true},"execution_count":9,"outputs":[{"execution_count":9,"output_type":"execute_result","data":{"text/plain":"Pipeline(steps=[('embedding',\n HFTransformersLanguage(model_name_or_path='facebook/bart-base')),\n ('model', LogisticRegression())])","text/html":"<style>#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 {color: black;background-color: white;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 pre{padding: 0;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-toggleable {background-color: white;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: 
\"▾\";}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-estimator:hover {background-color: #d4ebff;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-item {z-index: 1;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-parallel::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-parallel-item {display: flex;flex-direction: column;position: relative;background-color: 
white;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-parallel-item:only-child::after {width: 0;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;position: relative;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-label label {font-family: monospace;font-weight: bold;background-color: white;display: inline-block;line-height: 1.2em;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-label-container {position: relative;z-index: 2;text-align: center;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-3435eef2-8d0c-47b2-b96f-5928f72386c9 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-3435eef2-8d0c-47b2-b96f-5928f72386c9\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>Pipeline(steps=[('embedding',\n HFTransformersLanguage(model_name_or_path='facebook/bart-base')),\n ('model', LogisticRegression())])</pre><b>Please rerun this cell to show the HTML repr or trust the notebook.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"e3a181e0-5c64-46c2-98d2-595e7b5e7695\" type=\"checkbox\" ><label for=\"e3a181e0-5c64-46c2-98d2-595e7b5e7695\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">Pipeline</label><div class=\"sk-toggleable__content\"><pre>Pipeline(steps=[('embedding',\n HFTransformersLanguage(model_name_or_path='facebook/bart-base')),\n ('model', LogisticRegression())])</pre></div></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"cbf2e096-f325-4734-8f6c-c12e2018f277\" type=\"checkbox\" ><label for=\"cbf2e096-f325-4734-8f6c-c12e2018f277\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">HFTransformersLanguage</label><div class=\"sk-toggleable__content\"><pre>HFTransformersLanguage(model_name_or_path='facebook/bart-base')</pre></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"b5fac2c8-808d-451a-bdf0-cee2baf8ffdb\" type=\"checkbox\" ><label for=\"b5fac2c8-808d-451a-bdf0-cee2baf8ffdb\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">LogisticRegression</label><div 
class=\"sk-toggleable__content\"><pre>LogisticRegression()</pre></div></div></div></div></div></div></div>"},"metadata":{}}]},{"cell_type":"code","source":"pipe.get_params()","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:52:01.683209Z","iopub.execute_input":"2022-11-25T13:52:01.683939Z","iopub.status.idle":"2022-11-25T13:52:01.691876Z","shell.execute_reply.started":"2022-11-25T13:52:01.683894Z","shell.execute_reply":"2022-11-25T13:52:01.691118Z"},"trusted":true},"execution_count":10,"outputs":[{"execution_count":10,"output_type":"execute_result","data":{"text/plain":"{'memory': None,\n 'steps': [('embedding',\n HFTransformersLanguage(model_name_or_path='facebook/bart-base')),\n ('model', LogisticRegression())],\n 'verbose': False,\n 'embedding': HFTransformersLanguage(model_name_or_path='facebook/bart-base'),\n 'model': LogisticRegression(),\n 'embedding__model_name_or_path': 'facebook/bart-base',\n 'model__C': 1.0,\n 'model__class_weight': None,\n 'model__dual': False,\n 'model__fit_intercept': True,\n 'model__intercept_scaling': 1,\n 'model__l1_ratio': None,\n 'model__max_iter': 100,\n 'model__multi_class': 'auto',\n 'model__n_jobs': None,\n 'model__penalty': 'l2',\n 'model__random_state': None,\n 'model__solver': 'lbfgs',\n 'model__tol': 0.0001,\n 'model__verbose': 0,\n 'model__warm_start': False}"},"metadata":{}}]},{"cell_type":"code","source":"pipe.fit(X_train, y_train)","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:52:04.625316Z","iopub.execute_input":"2022-11-25T13:52:04.625708Z","iopub.status.idle":"2022-11-25T13:58:33.755087Z","shell.execute_reply.started":"2022-11-25T13:52:04.625674Z","shell.execute_reply":"2022-11-25T13:58:33.753890Z"},"trusted":true},"execution_count":11,"outputs":[{"name":"stderr","text":"/opt/conda/lib/python3.7/site-packages/sklearn/linear_model/_logistic.py:818: ConvergenceWarning: lbfgs failed to converge (status=1):\nSTOP: TOTAL NO. 
of ITERATIONS REACHED LIMIT.\n\nIncrease the number of iterations (max_iter) or scale the data as shown in:\n https://scikit-learn.org/stable/modules/preprocessing.html\nPlease also refer to the documentation for alternative solver options:\n https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG,\n","output_type":"stream"},{"execution_count":11,"output_type":"execute_result","data":{"text/plain":"Pipeline(steps=[('embedding',\n HFTransformersLanguage(model_name_or_path='facebook/bart-base')),\n ('model', LogisticRegression())])","text/html":"<style>#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 {color: black;background-color: white;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 pre{padding: 0;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-toggleable {background-color: white;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: 
\"▾\";}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-estimator:hover {background-color: #d4ebff;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-item {z-index: 1;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-parallel::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 2em;bottom: 0;left: 50%;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-parallel-item {display: flex;flex-direction: column;position: relative;background-color: 
white;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-parallel-item:only-child::after {width: 0;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;position: relative;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-label label {font-family: monospace;font-weight: bold;background-color: white;display: inline-block;line-height: 1.2em;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-label-container {position: relative;z-index: 2;text-align: center;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. 
See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-ba0aa1b0-53be-4a8f-bf3c-1c8181800f85\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>Pipeline(steps=[('embedding',\n HFTransformersLanguage(model_name_or_path='facebook/bart-base')),\n ('model', LogisticRegression())])</pre><b>Please rerun this cell to show the HTML repr or trust the notebook.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item sk-dashed-wrapped\"><div class=\"sk-label-container\"><div class=\"sk-label sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"f2b1c083-5a76-4e26-ad92-84f7ee9b6276\" type=\"checkbox\" ><label for=\"f2b1c083-5a76-4e26-ad92-84f7ee9b6276\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">Pipeline</label><div class=\"sk-toggleable__content\"><pre>Pipeline(steps=[('embedding',\n HFTransformersLanguage(model_name_or_path='facebook/bart-base')),\n ('model', LogisticRegression())])</pre></div></div></div><div class=\"sk-serial\"><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"2356825d-ffb5-404b-a00f-65f966ff8060\" type=\"checkbox\" ><label for=\"2356825d-ffb5-404b-a00f-65f966ff8060\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">HFTransformersLanguage</label><div class=\"sk-toggleable__content\"><pre>HFTransformersLanguage(model_name_or_path='facebook/bart-base')</pre></div></div></div><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"2bbcc021-cb91-4421-9401-a793252da046\" type=\"checkbox\" ><label for=\"2bbcc021-cb91-4421-9401-a793252da046\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">LogisticRegression</label><div 
class=\"sk-toggleable__content\"><pre>LogisticRegression()</pre></div></div></div></div></div></div></div>"},"metadata":{}}]},{"cell_type":"code","source":"y_pred = pipe.predict(X_test)","metadata":{"execution":{"iopub.status.busy":"2022-11-25T13:58:33.756912Z","iopub.execute_input":"2022-11-25T13:58:33.757390Z","iopub.status.idle":"2022-11-25T14:05:10.082300Z","shell.execute_reply.started":"2022-11-25T13:58:33.757353Z","shell.execute_reply":"2022-11-25T14:05:10.081151Z"},"trusted":true},"execution_count":12,"outputs":[]},{"cell_type":"markdown","source":"## Evaluation\nNot bad :')","metadata":{}},{"cell_type":"code","source":"from sklearn.metrics import classification_report\nprint(classification_report(y_test, y_pred))","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:07:15.856326Z","iopub.execute_input":"2022-11-25T14:07:15.856720Z","iopub.status.idle":"2022-11-25T14:07:15.869250Z","shell.execute_reply.started":"2022-11-25T14:07:15.856690Z","shell.execute_reply":"2022-11-25T14:07:15.868082Z"},"trusted":true},"execution_count":13,"outputs":[{"name":"stdout","text":" precision recall f1-score support\n\n 0 0.85 0.89 0.87 522\n 1 0.89 0.85 0.87 550\n\n accuracy 0.87 1072\n macro avg 0.87 0.87 0.87 1072\nweighted avg 0.87 0.87 0.87 1072\n\n","output_type":"stream"}]},{"cell_type":"markdown","source":"## Now we will initialize a repository, create a model card and push it to 🤗Hub","metadata":{}},{"cell_type":"code","source":"import os\n# create a directory for the repo\nlocal_repo = \"./local_repo_skops\"\nos.mkdir(local_repo) ","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:08:28.980484Z","iopub.execute_input":"2022-11-25T14:08:28.980855Z","iopub.status.idle":"2022-11-25T14:08:28.985187Z","shell.execute_reply.started":"2022-11-25T14:08:28.980824Z","shell.execute_reply":"2022-11-25T14:08:28.984383Z"},"trusted":true},"execution_count":19,"outputs":[]},{"cell_type":"code","source":"# save the model\nimport joblib\njoblib.dump(pipe, 
\"./pipeline.pkl\")","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:08:37.593665Z","iopub.execute_input":"2022-11-25T14:08:37.594070Z","iopub.status.idle":"2022-11-25T14:08:38.863226Z","shell.execute_reply.started":"2022-11-25T14:08:37.594036Z","shell.execute_reply":"2022-11-25T14:08:38.862348Z"},"trusted":true},"execution_count":20,"outputs":[{"execution_count":20,"output_type":"execute_result","data":{"text/plain":"['./pipeline.pkl']"},"metadata":{}}]},{"cell_type":"code","source":"from skops import card, hub_utils\n\n# initialize the repository\n# this will create a config file for reproducibility\nhub_utils.init(\n model=\"pipeline.pkl\",\n requirements=[f\"scikit-learn={sklearn.__version__}\"],\n dst=local_repo,\n task=\"text-classification\",\n data=X_test,\n)","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:08:46.500745Z","iopub.execute_input":"2022-11-25T14:08:46.501129Z","iopub.status.idle":"2022-11-25T14:08:47.217510Z","shell.execute_reply.started":"2022-11-25T14:08:46.501096Z","shell.execute_reply":"2022-11-25T14:08:47.216632Z"},"trusted":true},"execution_count":21,"outputs":[]},{"cell_type":"markdown","source":"We will now create the model card. 
Passing the output of `card.metadata_from_config` as `metadata` fills in the metadata section automatically.","metadata":{}},{"cell_type":"code","source":"model_card = card.Card(pipe, metadata=card.metadata_from_config(local_repo))","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:08:55.795433Z","iopub.execute_input":"2022-11-25T14:08:55.795826Z","iopub.status.idle":"2022-11-25T14:08:55.808770Z","shell.execute_reply.started":"2022-11-25T14:08:55.795794Z","shell.execute_reply":"2022-11-25T14:08:55.807726Z"},"trusted":true},"execution_count":22,"outputs":[]},{"cell_type":"markdown","source":"Let's add some information to our model card!","metadata":{}},{"cell_type":"code","source":"model_card.add(model_description=\"This is a logistic regression model trained with BART (facebook/bart-base) embeddings on the IMDb dataset.\")\nmodel_card.add(limitations=\"This model is trained for educational purposes.\")","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:08:58.653624Z","iopub.execute_input":"2022-11-25T14:08:58.654003Z","iopub.status.idle":"2022-11-25T14:08:58.661810Z","shell.execute_reply.started":"2022-11-25T14:08:58.653966Z","shell.execute_reply":"2022-11-25T14:08:58.661018Z"},"trusted":true},"execution_count":23,"outputs":[{"execution_count":23,"output_type":"execute_result","data":{"text/plain":"Card(\n model=Pipeline(steps=[('embedding',...), ('model', LogisticRegression())]),\n metadata.library_name=sklearn,\n metadata.tags=['sklearn', 'skops', 'text-classification'],\n model_description='This is a logist...h GPT-2 embeddings on imdb dataset.',\n limitations='This model is trained for educational purposes.',\n)"},"metadata":{}}]},{"cell_type":"markdown","source":"We will evaluate the model and add evaluation results to our card.","metadata":{}},{"cell_type":"code","source":"# use f1 to evaluate our model\nfrom sklearn.metrics import f1_score\n\nf1 = f1_score(y_test, y_pred, average=\"macro\")\nmodel_card.add_metrics(**{\"f1_score\": f1})\n# add 
explanation\nmodel_card.add(eval_method=\"The model is evaluated on test data using F1-score with macro avg.\")","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:09:09.793325Z","iopub.execute_input":"2022-11-25T14:09:09.793689Z","iopub.status.idle":"2022-11-25T14:09:09.807023Z","shell.execute_reply.started":"2022-11-25T14:09:09.793657Z","shell.execute_reply":"2022-11-25T14:09:09.805600Z"},"trusted":true},"execution_count":24,"outputs":[{"execution_count":24,"output_type":"execute_result","data":{"text/plain":"Card(\n model=Pipeline(steps=[('embedding',...), ('model', LogisticRegression())]),\n metadata.library_name=sklearn,\n metadata.tags=['sklearn', 'skops', 'text-classification'],\n model_description='This is a logist...h GPT-2 embeddings on imdb dataset.',\n limitations='This model is trained for educational purposes.',\n eval_method='The model is evaluated...data using F1-score with macro avg.',\n)"},"metadata":{}}]},{"cell_type":"markdown","source":"We can conveniently add plots using the `add_plot` method.","metadata":{}},{"cell_type":"code","source":"from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay\nfrom pathlib import Path\ncm = confusion_matrix(y_test, y_pred, labels=pipe.classes_)\ndisp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=pipe.classes_)\ndisp.plot()\n\n# save the figure inside the local repo, then pass its file name to add_plot\ndisp.figure_.savefig(Path(local_repo) / \"confusion_matrix.png\")\nmodel_card.add_plot(**{\"Confusion matrix\": \"confusion_matrix.png\"})","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:09:34.312647Z","iopub.execute_input":"2022-11-25T14:09:34.313037Z","iopub.status.idle":"2022-11-25T14:09:34.605611Z","shell.execute_reply.started":"2022-11-25T14:09:34.313003Z","shell.execute_reply":"2022-11-25T14:09:34.604821Z"},"trusted":true},"execution_count":25,"outputs":[{"execution_count":25,"output_type":"execute_result","data":{"text/plain":"Card(\n model=Pipeline(steps=[('embedding',...), 
('model', LogisticRegression())]),\n metadata.library_name=sklearn,\n metadata.tags=['sklearn', 'skops', 'text-classification'],\n model_description='This is a logist...h GPT-2 embeddings on imdb dataset.',\n limitations='This model is trained for educational purposes.',\n eval_method='The model is evaluated...data using F1-score with macro avg.',\n Confusion matrix='confusion_matrix.png',\n)"},"metadata":{}},{"output_type":"display_data","data":{"text/plain":"<Figure size 432x288 with 2 Axes>","image/png":"iVBORw0KGgoAAAANSUhEUgAAATgAAAEGCAYAAADxD4m3AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8qNh9FAAAACXBIWXMAAAsTAAALEwEAmpwYAAAdGklEQVR4nO3de7xVVb338c93o1zkLqAh4C3xGPkkqY/XMrVOAlnaeenJS0czy8xLpqdT2dOT5Tmeo4+W1rH0wcsRrDQ1SzJETfPgDRW8oGAmiQoCclMUkdvev/PHHBuWyF57TVhrr70m3/frNV+sOeZcc469kZ9jzDHm+CkiMDMroqZ6V8DMrFYc4MyssBzgzKywHODMrLAc4MyssLaqdwVKDdy2S+w0rFNVydrx4vSe9a6C5bCSd1gdq7Q51zjisJ6xZGlzRedOm77q7ogYtTn32xydKprsNGwrHpk0pN7VsByOHLJPvatgOTwW9232NRYvbeaxu4dWdO7Wg/82cLNvuBk6VYAzs0YQNEdLvStREQc4M8slgBYa4wUBBzgzy60Ft+DMrICCYE2DdFE9TcTMcgmgmahoq4SkLpKeknRn2r9B0mxJT6dtZCqXpJ9JmiVpuqS927u2W3BmlluVn8GdAzwP9Ckp+5eIuG2D80YDw9O2P3BV+rNNbsGZWS4BNEdUtLVH0lDgM8C1Fdz6KGB8ZKYA/SQNLvcFBzgzy62lwg0YKGlqyXbaBpe6Avj2+tPXuSh1Qy+X1C2VDQHmlJwzN5W1yV1UM8slcjxfAxZHxL4bOyDpSGBhREyTdGjJofOBBUBXYCzwHeDCTamrA5yZ5RIBa6rzCO5g4HOSxgDdgT6SfhkRX0zHV0n6L+Bbaf81YFjJ94emsja5i2pmOYnmCrdyIuL8iBgaETsDxwH3R8QXW5+rSRJwNPBc+soE4KQ0mnoAsCwi5pe7h1twZpZLAC21fZHhV5IGAQKeBk5P5ROBMcAsYAVwSnsXcoAzs9zaa53lFREPAA+kz4e3cU4AZ+a5rgOcmeWSTfStboCrFQc4M8slgDXRGI/vHeDMLJdANDfI+KQDnJnl1hLuoppZAfkZnJkVmGj2MzgzK6JsRV8HODMroAixOrrUuxoVcYAzs9xa/AzOzIooG2RwF9XMCsmDDGZWUB5kMLNCa/ZEXzMrokCsicYIHY1RSzPrNDzIYGaFFchdVDMrLg8ymFkhRdAw00Qao5Zm1mlkgwxdKtoqIamLpKck3Zn2d5H0mKRZkn4jqWsq75b2Z6XjO7d3bQc4M8utmaaKtgqdAzxfsn8JcHlE7Aa8AZyayk8F3kjll6fzynKAM7NcAtESlW3tkTQU+AxwbdoXcDhwWzplHFnqQICj0j7p+CfT+W3yMzgzyy1H62ygpKkl+2MjYmzJ/hXAt4HeaX8A8GZErE37c4Eh6fMQYA5ARKyVtCydv7itmzvAmVkuWV7UigPc4ojYd2MHJB0JLIyIaZI
OrU7t3ssBzsxyaj9rfYUOBj4naQzQHegD/BToJ2mr1IobCryWzn8NGAbMlbQV0BdYUu4GfgZnZrlkaQM3fxQ1Is6PiKERsTNwHHB/RJwI/Bk4Jp12MnBH+jwh7ZOO35+SQbfJLTgzyyVCebqom+I7wM2S/g14CrgulV8H3ChpFrCULCiW5QBnZrlVe6JvRDwAPJA+vwTst5FzVgLH5rmuA5yZ5ZKtB+d3Uc2skLyir5kVVDZNxC04Myug1ndRG4EDnJnl5uWSzKyQsuWS3EU1s4LyMzgzK6RsNRF3Uc2sgLJXtRzgtijNzXDu6A8x4AOruWD834iAGy/ZgYfu7E9Tl2DMSYv43KmLmHJ3X3556Q5I0GWr4Ks/msOH93un3tXfoo17bCbvLu9CSws0rxVnj96dXUe8y9kXz6VHzxZen9uVS87ckRXLG2PksPbcggNA0iiy1QG6ANdGxMW1vF89Tbh2O4YNX8mKt7O/+D/dMoBF87py9eQZNDXBm4uzX/VeH3ub/T/9PBLMntmDS07flasnz6hn1Q349rEf5K2l6/85fPOyOVxz4Q48O6UXnz5uCcd8fSHjLx1cxxp2Lo3yJkPNwrCkLsDPgdHACOB4SSNqdb96Wjxva564ry+fPn79unsTxw/i+HPn05R+w/0GZuv39ejZQusapCtXNIHKLoZgdTJ011U8O6UnAE9N7s3HPrOszjXqPFpHUSvZ6q2WLbj9gFnpxVkk3Uy25PDMGt6zLsZeMIwvf/81Vixf//+LBS9348EJ/Xl0Uj/6DljLaRfOYciuqwB45K5+jP+PIby5ZCsuGDerXtW2ViH+/aaXIOCPNw7grl8N4JW/dufAUW/x6KS+fPzIZQzaYU29a9mpNEoXtZa1XLe8cFK69PA6kk6TNFXS1EVLmmtYndp4/N6+9Bu4ht0+suI95WtWi627tXDFXX/hiBMW89N/3mndsYNGv8nVk2fw/ev+xi8v3aGjq2wbOO/o3TjriN35Pyfuwue+tJg991/OT84bxmdPXsyVk/5Kj17NrF1d/9ZIZ1HNnAy1VvdBhrQ++1iAffbq1nD9tZlTe/LYPf2Yen9fVq9q4t23u3DZ2TszcPAaDhrzJgAHjn6TK87b+X3f3fOA5Sx4tRvLlnah77aNF9yLYsmCrQFYtmRrHp7Ulz0+uoLbrt6O7x3/QQCG7LqK/T/5Vj2r2KkEsNYtuHXLC7cqXXq4ML50/jzGTXuW6x97jm//4iU+cvBbfOs/X+aAUW8y/ZEsj8azj/ZiyK4rAZg3uxuta5DOerYHa1aLPv0d3OqlW49mevRsXvd5n0+8zct/6U7fAVmXVApOOOd17rxxQD2r2em0RFNFW73VsgX3BDBc0i5kge044IQa3q9TOebMBVx21i7ccc32dN+mmbMvfQWARyb24/7bBtBlq6Br9xa+c9VLlE98ZrXUf9BaLrjuZSCbtvPn3/Vn6gN9OPrURXz2S9mg0cN39eWem7etYy07mU7S/ayE2lnSfPMuniWTuIJsmsj1EXFRufP32atbPDLpfY/prBM7csg+9a6C5fBY3MdbsXSzolP/PbaLw68/pv0TgdsPvmpaW1m1OkJNn8FFxERgYi3vYWYdrxotOEndgclAN7JYdFtEXCDpBuATQOvcnC9FxNMpyfNPgTHAilT+ZLl71H2QwcwaSxUXvFwFHB4RyyVtDTwk6a507F8i4rYNzh8NDE/b/sBV6c82OcCZWS6BWNuy+QMIKeXf8rS7ddrKPTM7ChifvjdFUj9JgyNifltfqP8wh5k1nBZU0QYMbJ3nmrbTSq8jqYukp4GFwL0R8Vg6dJGk6ZIul9QtlVU0t7aUW3Bmlk/k6qIuLjfIEBHNwEhJ/YDfSdoTOB9YAHQlmyP7HeDCTamqW3BmlkvrM7hqvskQEW+SZbQfFRHzI7MK+C/W50jNPbfWAc7McqtGgJM0KLXckNQD+HvgL5IGpzIBRwPPpa9MAE5S5gBgWbnnb+Auqpn
lFIjmKgwyAIOBcWnloSbgloi4U9L9kgYBAp4GTk/nTySbIjKLbJrIKe3dwAHOzHKrxnpwETEd+OhGyg9v4/wAzsxzDwc4M8sl8g0y1JUDnJnlFg5wZlZMjfOyvQOcmeXmFpyZFVIENLc4wJlZQTVKVi0HODPLJXAX1cwKy4MMZlZgNVwIvKoc4MwsN3dRzayQslHUxlinwwHOzHJzF9XMCstdVDMrpEAOcGZWXA3SQ3WAM7OcAsKvaplZUTVKF7UxxnrNrFOJqGwrR1J3SY9LekbSDEk/SuW7SHpM0ixJv5HUNZV3S/uz0vGd26tnmy04Sf9Jma52RHyjvYubWfFU8V3UtjLbnwdcHhE3S7oaOJUsi/2pwBsRsZuk44BLgC+Uu0G5LurUavwEZlYwAVQhwJXJbH84cEIqHwf8kCzAHZU+A9wGXClJ6Tob1WaAi4hxpfuStomIFbl/CjMrnBwTfQdKKm0sjY2Isa07KaPWNGA34OfA34A3I2JtOqU0e/26zPYRsVbSMmAAsLitm7c7yCDpQOA6oBewo6S9gK9FxBmV/XxmVizKM4qaK7M9sMfm12+9SgYZrgCOAJakCj0DHFLNSphZg4kKt0ovtz6z/YFAP0mtja/S7PXrMtun431JcaktFY2iRsScDYqaK6q1mRVPZIMMlWzltJHZ/nmyQHdMOu1k4I70eULaJx2/v9zzN6hsHtwcSQcBkUY6zkmVMLMtVXVeZWgrs/1M4GZJ/wY8RfaIjPTnjZJmAUuB49q7QSUB7nTgp2QP+OYBd5Mzu7SZFU1NM9u/BOy3kfKVwLF57tFugIuIxcCJeS5qZgXXUu8KVKbdZ3CSdpX0B0mLJC2UdIekXTuicmbWCbXOg6tkq7NKBhl+DdxC1l/eAbgVuKmWlTKzzq0ar2p1hEoC3DYRcWNErE3bL4Huta6YmXViVZ4mUivl3kXdNn28S9J3gZvJqvwFYGIH1M3MOqtO0P2sRLlBhmlkAa31J/laybEAzq9Vpcysc1MnaJ1Voty7qLt0ZEXMrEGEoEgLXkraExhBybO3iBhfq0qZWSfX6C24VpIuAA4lC3ATgdHAQ4ADnNmWqkECXCWjqMcAnwQWRMQpwF5kL7ma2Zaq0UdRS7wbES2S1krqAywkvdFvZlugKi142REqCXBT0xv/15CNrC4HHq1lpcysc2v4UdRWJQtbXi1pEtAnvSRrZluqRg9wkvYudywinqxNlcyssytCC+7HZY61Joaoqlkz+3LUyFHVvqzV0N3z7q13FSyH/Y6oUlqVRn8GFxGHdWRFzKxBdJIR0ko4s72Z5ecAZ2ZFpaIseGlm9j5VmOgraZikP0uaKWmGpHNS+Q8lvSbp6bSNKfnO+ZJmSXpB0hHtVbOSV7VEtmT5rhFxoaQdgQ9ExOPtfdfMikdRtVHUtcA/R8STknoD0yS1jlpdHhGXvee+0giyRDMfJlt890+Sdk+5VTeqkhbcL8hyFR6f9t8my0BtZluqKixZHhHzW6ebRcTbZNn6hpT5ylHAzRGxKiJmA7PYSHKaUpUEuP0j4kxgZarIG0DXCr5nZkVVeRd1oKSpJdtpG7ucpJ3JMmw9lorOkjRd0vWS+qeyIUBpjua5lA+IFQW4NSlvYaSKDKJhcuqYWS20dlPb24DFEbFvyTb2fdeSegG/Bb4ZEW8BVwEfBEYC8yk/J7esSgLcz4DfAdtJuohsqaR/39QbmlmDi2wUtZKtPSmZ/G+BX0XE7QAR8XpENEdEC9k78K3d0Nd470IfQ1NZmyp5F/VXkqaRLZkk4OiIcGZ7sy1ZFQYZ0gDmdcDzEfGTkvLBETE/7X4eeC59ngD8WtJPyAYZhgNlBzsrGUXdEVgB/KG0LCJezfGzmFmRVGcU9WDgn4BnJT2dyr4HHC9pZLrLy6R8MBExQ9ItwEyyEdgzy42gQmUTff/I+uQz3YFdgBfIhmrNbAtUjWkiEfEQ65NalWoza19EXARcVOk9Kumi/q/S/bTKyBl
tnG5m1mnkflUrTcrbvxaVMbMGUZR3USWdV7LbBOwNzKtZjcysc4vGeRe1khZc75LPa8meyf22NtUxs4ZQhBZcmuDbOyK+1UH1MbNOThRgRV9JW0XEWkkHd2SFzKwBNHqAI5tAtzfwtKQJwK3AO60HW2cdm9kWpnqridRcJc/gugNLyHIwtM6HC8ABzmxLVYBBhu3SCOpzrA9srRokfptZLRShBdcF6MXGZxo3yI9nZjXRIBGgXICbHxEXdlhNzKwxFCSrVmMkPjSzDleELuonO6wWZtZYGj3ARcTSjqyImTWOIr2qZWa2XkGewZmZvY9onAf0DnBmll+DtOCc2d7McsuRVavta7Sd2X5bSfdKejH92T+VS9LPUmb76Wnx3bIc4Mwsv8rzopbTmtl+BHAAcGbKXv9d4L6IGA7cl/YBRpMlmhkOnEaWXrAsBzgzy6dKaQPLZLY/ChiXThsHHJ0+HwWMj8wUoJ+kweXu4QBnZvlVpwW3zgaZ7bcvSRu4ANg+fc6d2d6DDGaWW443GQZKmlqyP3bD7PYbZrbP0qVmIiKkTX9vwgHOzPKrPOQsjoh92zq4scz2wOutyZ9TF3RhKs+d2d5dVDPLrUqjqBvNbE+Wwf7k9Plk4I6S8pPSaOoBwLKSruxGuQVnZvkE1Vrwsq3M9hcDt0g6FXgF+Md0bCIwBpgFrABOae8GDnBmlku1ks6UyWwPG1nsIyICODPPPRzgzCy/BnmTwQHOzHJTNEaEc4Azs3y8moiZFVkRVvQ1M9soL3hpZsXlFpyZFVLBMtubmb2XA5yZFVG1Jvp2BAc4M8tNLY0R4RzgzCwfz4Pbch39xVc44h/mEQEvv9iLy38wgjO+9wLDR7yFBK+9sg0/+b8jWPmuf/X11twMZ4/anQGD1/Cv42cTATdc8gEevLMfTU1w5EmLOfori7n1F4O4//Zt131nzovd+c2zz9Gnf3Odf4L62eKniUi6HjgSWBgRe9bqPp3JgO1W8rkT5nD65w9k9aounP//pvOJUa8z9tLdefed7Ff91W/9lc8eP5dbr9+5vpU1fn/tIIYNX8WK5dmqYff8ZlsWzevKtZP/QlMTvLk4+zs79oxFHHvGIgCm3NOH268ZtEUHN6BhWnC1XA/uBmBUDa/fKXXpEnTt1kJTlxa69WhhyaJu64IbBF27NdMgr/EV2qJ5W/P4fX0YfcKSdWV3jh/AiecuoCn9q+g3cO37vvfn3/fn0KPf6KhqdlrVWA+uI9SsBRcRk9M661uMJQu7c/u4nRh390OsXtnEk48O4KlHBwBw7oUz2PdjS3j1pZ5c++Pd61xTu/qCIXzl+/NYsbzLurL5r3Tjvyf055G7+tJ3wFrO+Ne5DNl19brjK1eIqQ/05syL5tajyp1HQKP8X7ruK/pKOk3SVElTV7e8W+/qbJZevddwwGGLOGXMwXzx7z9O9x7NHPaZbMHRy3/wYf7pUx9nzks9OeSI1+tc0y3blHv70G/gWoZ/5L3/va1ZJbp2a+HKSX9l9IlL+PF5O27wvb58eN933D2lOlm1OkLdA1xEjI2IfSNi365NPepdnc0y8oClLHitB2+90ZXmtU08fN8gPrTXsnXHW1rE5Enbc/CnFpa5itXazCd6MuWePpy03wj+4+s78cxDvbnkrB0ZOHgNHxuT/X0dPHoZs59/73+P/31HP3dPWT8PrhG6qHUPcEWyaEF39vjIMrp1bwaCkfu/wZzZ2zB42Ip0RrD/oYuYM3ubelZzi/fl783nV9NmMv7xmZx/1Svs9bG3+c6Vr3LQqGU883AvAKY/2ouhu65a95133mpi+pReHDTqrXpVu/OIqHyrM89VqKIXnu3LQ/dux89ufozmZvHSX3pz121DufiaaWzTay0IZr/Qmysv2qPeVbWN+MJZC7nkrB25/ZpB9OjZwjcve3XdsYfv6sc+h7xN9206Qb+rE6hW62xjsy0k/RD4KrAonfa9iJiYjp0PnAo0A9+IiLvL17NGUVb
STcChwEDgdeCCiLiu3Hf6br1dHLjtMTWpj9XGxGfurXcVLIf9jpjD1GdWtpUHoSK9+w2Njx5yTkXnPviHb09rJ23gIcBysoz1pQFueURctsG5I4CbgP2AHYA/AbtHRJsPRWs5inp8ra5tZvVVrRZcztkWRwE3R8QqYLakWWTB7tG2vuBncGaWTwDNUdmWMtuXbKdVeJezJE2XdL2k/qlsCDCn5Jy5qaxNDnBmlluOUdTFrbMk0ja2gstfBXwQGAnMB368qfX0IIOZ5VfDEdKIWDdRVNI1wJ1p9zVgWMmpQ1NZm9yCM7PcajkPTtLgkt3PA8+lzxOA4yR1k7QLMBx4vNy13IIzs3yquFxS6WwLSXOBC4BDJY1Md3kZ+BpARMyQdAswE1gLnFluBBUc4MwsJwFqrk6Ea2O2RZvTySLiIuCiSq/vAGdmuTmzvZkVk1f0NbPi6hzvmVbCAc7McusMK4VUwgHOzPJzC87MCimqN4paaw5wZpZfY8Q3Bzgzy8/TRMysuBzgzKyQAmiQhY0d4MwsFxHuoppZgbU0RhPOAc7M8nEX1cyKzF1UMysuBzgzKya/bG9mRdWaVasBOMCZWW6N8gzOSWfMLL+IyrZ2pLynCyU9V1K2raR7Jb2Y/uyfyiXpZ5JmpZype7d3fQc4M8sngJaobGvfDcCoDcq+C9wXEcOB+9I+wGiyTFrDgdPI8qeW5QBnZjlV2HqroAUXEZOBpRsUHwWMS5/HAUeXlI+PzBSg3wYpBt/Hz+DMLL/Kn8ENlDS1ZH9sBdntt4+I+enzAmD79HkIMKfkvLmpbD5tcIAzs3wCaK74VYbFEbHvJt8qIqRNXyDdXVQzyykgWirbNs3rrV3P9OfCVP4aMKzkvKGprE0OcGaWX5WewbVhAnBy+nwycEdJ+UlpNPUAYFlJV3aj3EU1s3xaR1GrQNJNwKFkz+rmAhcAFwO3SDoVeAX4x3T6RGAMMAtYAZzS3vUd4MwsvypN9I2I49s49MmNnBvAmXmu7wBnZvk1yJsMDnBmlk8ENDfXuxYVcYAzs/zcgjOzwnKAM7Niqvg907pzgDOzfAJi0yfxdigHODPLr/JXterKAc7M8olw2kAzKzAPMphZUYVbcGZWTM6qZWZFVcWX7WvNAc7Mcgkg/KqWmRVSxOYsZtmhHODMLLdwF9XMCqtBWnCKTjQaImkR2QqeRTMQWFzvSlguRf072ykiBm3OBSRNIvv9VGJxRGyY97TDdKoAV1SSpm5OZiHreP47KwYnnTGzwnKAM7PCcoDrGO1l8rbOx39nBeBncGZWWG7BmVlhOcCZWWE5wNWQpFGSXpA0S9J3610fa5+k6yUtlPRcvetim88BrkYkdQF+DowGRgDHSxpR31pZBW4A6jYx1arLAa529gNmRcRLEbEauBk4qs51snZExGRgab3rYdXhAFc7Q4A5JftzU5mZdRAHODMrLAe42nkNGFayPzSVmVkHcYCrnSeA4ZJ2kdQVOA6YUOc6mW1RHOBqJCLWAmcBdwPPA7dExIz61sraI+km4FHg7yTNlXRqvetkm86vaplZYbkFZ2aF5QBnZoXlAGdmheUAZ2aF5QBnZoXlANdAJDVLelrSc5JulbTNZlzrBknHpM/XllsIQNKhkg7ahHu8LOl92ZfaKt/gnOU57/VDSd/KW0crNge4xvJuRIyMiD2B1cDppQclbVKe24j4SkTMLHPKoUDuAGdWbw5wjetBYLfUunpQ0gRgpqQuki6V9ISk6ZK+BqDMlWl9uj8B27VeSNIDkvZNn0dJelLSM5Luk7QzWSA9N7UePy5pkKTfpns8Ieng9N0Bku6RNEPStYDa+yEk/V7StPSd0zY4dnkqv0/SoFT2QUmT0ncelLRHVX6bVkjObN+AUkttNDApFe0N7BkRs1OQWBYR/1tSN+BhSfcAHwX+jmxtuu2BmcD1G1x3EHANcEi61rYRsVTS1cDyiLgsnfdr4PKIeEjSjmRva3wIuAB4KCIulPQ
ZoJK3AL6c7tEDeELSbyNiCdATmBoR50r6Qbr2WWTJYE6PiBcl7Q/8Ajh8E36NtgVwgGssPSQ9nT4/CFxH1nV8PCJmp/JPAx9pfb4G9AWGA4cAN0VEMzBP0v0buf4BwOTWa0VEW+uifQoYIa1roPWR1Cvd4x/Sd/8o6Y0KfqZvSPp8+jws1XUJ0AL8JpX/Erg93eMg4NaSe3er4B62hXKAayzvRsTI0oL0D/2d0iLg7Ii4e4PzxlSxHk3AARGxciN1qZikQ8mC5YERsULSA0D3Nk6PdN83N/wdmLXFz+CK527g65K2BpC0u6SewGTgC+kZ3WDgsI18dwpwiKRd0ne3TeVvA71LzrsHOLt1R9LI9HEycEIqGw30b6eufYE3UnDbg6wF2aoJaG2FnkDW9X0LmC3p2HQPSdqrnXvYFswBrniuJXu+9mRKnPL/yVrqvwNeTMfGk62Y8R4RsQg4jaw7+Azru4h/AD7fOsgAfAPYNw1izGT9aO6PyALkDLKu6qvt1HUSsJWk54GLyQJsq3eA/dLPcDhwYSo/ETg11W8GXgbeyvBqImZWWG7BmVlhOcCZWWE5wJlZYTnAmVlhOcCZWWE5wJlZYTnAmVlh/Q9xaeEk3q2UCgAAAABJRU5ErkJggg==\n"},"metadata":{"needs_background":"light"}}]},{"cell_type":"markdown","source":"Save our card.","metadata":{}},{"cell_type":"code","source":"model_card.save(Path(local_repo) / \"README.md\")","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:09:48.779576Z","iopub.execute_input":"2022-11-25T14:09:48.779956Z","iopub.status.idle":"2022-11-25T14:09:48.793803Z","shell.execute_reply.started":"2022-11-25T14:09:48.779912Z","shell.execute_reply":"2022-11-25T14:09:48.792994Z"},"trusted":true},"execution_count":26,"outputs":[]},{"cell_type":"markdown","source":"We now should have our model card, configuration file, our model and plot.","metadata":{}},{"cell_type":"code","source":"os.listdir(local_repo)","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:09:56.677361Z","iopub.execute_input":"2022-11-25T14:09:56.677730Z","iopub.status.idle":"2022-11-25T14:09:56.684725Z","shell.execute_reply.started":"2022-11-25T14:09:56.677698Z","shell.execute_reply":"2022-11-25T14:09:56.683305Z"},"trusted":true},"execution_count":27,"outputs":[{"execution_count":27,"output_type":"execute_result","data":{"text/plain":"['README.md', 'pipeline.pkl', 'config.json', 'confusion_matrix.png']"},"metadata":{}}]},{"cell_type":"markdown","source":"Let's push our model to 🤗Hub! 
\nFirst we need to authenticate, and then we can push the model to the Hub.","metadata":{}},{"cell_type":"code","source":"from huggingface_hub import notebook_login\nnotebook_login()","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:10:00.105490Z","iopub.execute_input":"2022-11-25T14:10:00.105854Z","iopub.status.idle":"2022-11-25T14:10:00.149877Z","shell.execute_reply.started":"2022-11-25T14:10:00.105824Z","shell.execute_reply":"2022-11-25T14:10:00.149006Z"},"trusted":true},"execution_count":28,"outputs":[{"output_type":"display_data","data":{"text/plain":"VBox(children=(HTML(value='<center> <img\\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…","application/vnd.jupyter.widget-view+json":{"version_major":2,"version_minor":0,"model_id":"ba7d4a84329f48f1a1fc4e867c757613"}},"metadata":{}}]},{"cell_type":"code","source":"# if the repository doesn't exist yet, `create_remote=True` creates it\nhub_utils.push(\n    repo_id=\"scikit-learn/transformers-imdb\",\n    source=local_repo,\n    create_remote=True,\n)","metadata":{"execution":{"iopub.status.busy":"2022-11-25T14:10:24.377634Z","iopub.execute_input":"2022-11-25T14:10:24.378340Z","iopub.status.idle":"2022-11-25T14:11:10.963107Z","shell.execute_reply.started":"2022-11-25T14:10:24.378288Z","shell.execute_reply":"2022-11-25T14:11:10.962194Z"},"trusted":true},"execution_count":29,"outputs":[]},{"cell_type":"markdown","source":"You can find the repository [here](https://huggingface.co/scikit-learn/transformers-imdb).\n\nSome useful links if you're interested:\n- [Skops docs](https://skops.readthedocs.io/en/stable/)\n- [Skops GitHub](https://github.com/skops-dev/skops)\n- [Whatlies GitHub](https://github.com/koaning/whatlies)","metadata":{}}]} |