Inference error
When trying to deploy a dedicated Inference Endpoint for a token-classification task, we get the error below.
ValueError: The checkpoint you are trying to load has model type gemma2 but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
Application startup failed. Exiting.
The full traceback is as follows:
```
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1128, in from_pretrained
config_class = CONFIG_MAPPING[config_dict["model_type"]]
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 825, in __getitem__
raise KeyError(key)
KeyError: 'gemma2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 732, in lifespan
async with self.lifespan_context(app) as maybe_state:
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 608, in __aenter__
await self._router.startup()
File "/usr/local/lib/python3.10/dist-packages/starlette/routing.py", line 709, in startup
await handler()
File "/app/webservice_starlette.py", line 60, in some_startup_task
inference_handler = get_inference_handler_either_custom_or_default_handler(HF_MODEL_DIR, task=HF_TASK)
File "/app/huggingface_inference_toolkit/handler.py", line 54, in get_inference_handler_either_custom_or_default_handler
return HuggingFaceHandler(model_dir=model_dir, task=task)
File "/app/huggingface_inference_toolkit/handler.py", line 18, in __init__
self.pipeline = get_pipeline(
File "/app/huggingface_inference_toolkit/utils.py", line 276, in get_pipeline
hf_******** = pipeline(
File "/usr/local/lib/python3.10/dist-packages/transformers/pipelines/__init__.py", line 815, in pipeline
config = AutoConfig.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/configuration_auto.py", line 1130, in from_pretrained
raise ValueError(
ValueError: The checkpoint you are trying to load has model type `gemma2` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
```
Try updating the Transformers version: `transformers==4.42.3`, then wait for the Inference Endpoint to update.
Same error. Any news, @gsasikiran?
@kargaranamir I tried again today to deploy to Inference Endpoints and got a new error, regarding GPU CUDA compatibility with Flash Attention and sharding.
{"timestamp":"2024-07-15T07:13:41.525975Z","level":"INFO","fields":{"message":"Using default cuda graphs [1, 2, 4, 8, 16, 32]"},"target":"text_generation_launcher"}
{"timestamp":"2024-07-15T07:13:41.525984Z","level":"INFO","fields":{"message":"Sharding model on 4 processes"},"target":"text_generation_launcher"}
{"timestamp":"2024-07-15T07:13:41.526054Z","level":"INFO","fields":{"message":"Starting download process."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2024-07-15T07:13:45.044455Z","level":"INFO","fields":{"message":"Files are already present on the host. Skipping download.\n"},"target":"text_generation_launcher"}
{"timestamp":"2024-07-15T07:13:45.729310Z","level":"INFO","fields":{"message":"Successfully downloaded weights."},"target":"text_generation_launcher","span":{"name":"download"},"spans":[{"name":"download"}]}
{"timestamp":"2024-07-15T07:13:45.729466Z","level":"INFO","fields":{"message":"Starting shard"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
{"timestamp":"2024-07-15T07:13:45.729479Z","level":"INFO","fields":{"message":"Starting shard"},"target":"text_generation_launcher","span":{"rank":1,"name":"shard-manager"},"spans":[{"rank":1,"name":"shard-manager"}]}
{"timestamp":"2024-07-15T07:13:45.729522Z","level":"INFO","fields":{"message":"Starting shard"},"target":"text_generation_launcher","span":{"rank":2,"name":"shard-manager"},"spans":[{"rank":2,"name":"shard-manager"}]}
{"timestamp":"2024-07-15T07:13:45.729773Z","level":"INFO","fields":{"message":"Starting shard"},"target":"text_generation_launcher","span":{"rank":3,"name":"shard-manager"},"spans":[{"rank":3,"name":"shard-manager"}]}
{"timestamp":"2024-07-15T07:13:49.732782Z","level":"WARN","fields":{"message":"Unable to use Flash Attention V2: GPU with CUDA capability 7 5 is not supported for Flash Attention V2\n"},"target":"text_generation_launcher"}
(the same warning is logged once per shard, 4 times in total)
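For context on the warning above: Flash Attention V2 requires an Ampere-class GPU or newer (CUDA compute capability 8.0+), while capability 7.5 is Turing (e.g. a T4). A minimal sketch of the check, with the helper name mine (not from TGI) and capabilities hard-coded so it runs without a GPU (in practice you would pass `torch.cuda.get_device_capability()`):

```python
def supports_flash_attn_v2(capability: tuple) -> bool:
    """Flash Attention V2 needs compute capability >= 8.0 (Ampere or newer)."""
    return capability >= (8, 0)

# Turing (T4, capability 7.5) is rejected; Ampere (A100, 8.0) is accepted.
print(supports_flash_attn_v2((7, 5)))  # -> False, matching the WARN above
print(supports_flash_attn_v2((8, 0)))  # -> True
```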
{"timestamp":"2024-07-15T07:13:50.184999Z","level":"ERROR","fields":{"message":"Error when initializing model\nTraceback (most recent call last):\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n File \"/opt/conda/lib/python3.10/site-packages/typer/main.py\", line 311, in __call__\n return get_command(self)(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 1157, in __call__\n return self.main(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/typer/core.py\", line 778, in main\n return _main(\n File \"/opt/conda/lib/python3.10/site-packages/typer/core.py\", line 216, in _main\n rv = self.invoke(ctx)\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 1688, in invoke\n return _process_result(sub_ctx.command.invoke(sub_ctx))\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 1434, in invoke\n return ctx.invoke(self.callback, **ctx.params)\n File \"/opt/conda/lib/python3.10/site-packages/click/core.py\", line 783, in invoke\n return __callback(*args, **kwargs)\n File \"/opt/conda/lib/python3.10/site-packages/typer/main.py\", line 683, in wrapper\n return callback(**use_params) # type: ignore\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 90, in serve\n server.serve(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py\", line 253, in serve\n asyncio.run(\n File \"/opt/conda/lib/python3.10/asyncio/runners.py\", line 44, in run\n return loop.run_until_complete(main)\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 636, in run_until_complete\n self.run_forever()\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 603, in run_forever\n self._run_once()\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 1909, in _run_once\n handle._run()\n File \"/opt/conda/lib/python3.10/asyncio/events.py\", line 80, in _run\n self._context.run(self._callback, *self._args)\n> File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py\", line 217, in serve_inner\n model = get_model(\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py\", line 610, in get_model\n raise NotImplementedError(\"sharded is not supported for AutoModel\")\nNotImplementedError: sharded is not supported for AutoModel\n"},"target":"text_generation_launcher"}
(the same initialization error is logged by each of the 4 shard processes)
{"timestamp":"2024-07-15T07:13:51.034332Z","level":"ERROR","fields":{"message":"Shard complete standard error output:\n\nTraceback (most recent call last):\n\n File \"/opt/conda/bin/text-generation-server\", line 8, in <module>\n sys.exit(app())\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py\", line 90, in serve\n server.serve(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py\", line 253, in serve\n asyncio.run(\n\n File \"/opt/conda/lib/python3.10/asyncio/runners.py\", line 44, in run\n return loop.run_until_complete(main)\n\n File \"/opt/conda/lib/python3.10/asyncio/base_events.py\", line 649, in run_until_complete\n return future.result()\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py\", line 217, in serve_inner\n model = get_model(\n\n File \"/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py\", line 610, in get_model\n raise NotImplementedError(\"sharded is not supported for AutoModel\")\n\nNotImplementedError: sharded is not supported for AutoModel\n"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
(ranks 1 and 2 log the same shard error)
{"timestamp":"2024-07-15T07:13:51.132849Z","level":"ERROR","fields":{"message":"Shard 0 failed to start"},"target":"text_generation_launcher"}
{"timestamp":"2024-07-15T07:13:51.132865Z","level":"INFO","fields":{"message":"Shutting down shards"},"target":"text_generation_launcher"}
{"timestamp":"2024-07-15T07:13:51.134410Z","level":"INFO","fields":{"message":"Terminating shard"},"target":"text_generation_launcher","span":{"rank":3,"name":"shard-manager"},"spans":[{"rank":3,"name":"shard-manager"}]}
{"timestamp":"2024-07-15T07:13:51.134428Z","level":"INFO","fields":{"message":"Waiting for shard to gracefully shutdown"},"target":"text_generation_launcher","span":{"rank":3,"name":"shard-manager"},"spans":[{"rank":3,"name":"shard-manager"}]}
{"timestamp":"2024-07-15T07:13:51.234558Z","level":"INFO","fields":{"message":"shard terminated"},"target":"text_generation_launcher","span":{"rank":3,"name":"shard-manager"},"spans":[{"rank":3,"name":"shard-manager"}]}
Error: ShardCannotStart
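The `sharded is not supported for AutoModel` error suggests TGI could not use its optimized (flash) modelling path for this architecture on that hardware and fell back to a plain `AutoModel`, which cannot be tensor-parallel sharded across the 4 GPUs. If you control the launcher arguments, one workaround is to disable sharding; a sketch (the model id is an example, not taken from this thread):

```shell
# Run TGI on a single shard so the AutoModel fallback does not need
# tensor parallelism (the NUM_SHARD env var is equivalent).
text-generation-launcher \
  --model-id google/gemma-2-9b \
  --num-shard 1
```

Note that this trades away tensor parallelism, so the whole model must fit on one GPU.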
> Try updating the Transformers version: `transformers==4.42.3`
@saireddy How can I update it? Should I clone and then deploy?
Hi @gsasikiran, you can use `!pip install -U transformers`, or install a specific version with `!pip install transformers==4.42.3`.
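After upgrading, it is worth verifying that the installed version is new enough; to my knowledge the `gemma2` model type was added in Transformers 4.42.0. A small version-comparison sketch (pure Python, so it runs even without Transformers installed; in a live environment you would pass `transformers.__version__` instead of the hard-coded strings):

```python
def parse_version(v: str) -> tuple:
    """Turn a dotted version string like '4.42.3' into a comparable tuple."""
    return tuple(int(part) for part in v.split("."))

# Assumed minimum: Gemma 2 support landed in Transformers 4.42.0.
GEMMA2_MIN = parse_version("4.42.0")

def supports_gemma2(installed: str) -> bool:
    return parse_version(installed) >= GEMMA2_MIN

print(supports_gemma2("4.41.2"))  # an older release -> False
print(supports_gemma2("4.42.3"))  # after the upgrade -> True
```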
ValueError: Trying to set a tensor of shape torch.Size([4096, 3584]) in "weight" (which has shape torch.Size([7340032, 1])), this look incorrect.
Not working with text-generation-webui either, even with everything updated.
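For what it's worth, the numbers in that shape mismatch are consistent with a quantized checkpoint being loaded by an unquantized code path: the flattened `[7340032, 1]` buffer in the checkpoint holds exactly half as many elements as the expected `[4096, 3584]` weight, which is what you would get with two 4-bit values packed per byte. This is only an inference from the shapes, not a confirmed diagnosis:

```python
expected = 4096 * 3584   # elements in the expected "weight" tensor
stored = 7_340_032 * 1   # elements in the tensor found in the checkpoint

# Two 4-bit weights packed per stored byte would account for the factor of 2.
assert expected == 2 * stored
print(expected, stored)  # -> 14680064 7340032
```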