Potential problematic behavior of truncation and/or padding?
#5
by Alchan - opened
Hi, I'm trying to load your model with the Hugging Face tokenizers and transformers libraries and run some experiments with it. My tokenization task covers texts whose lengths range from very short sentences up to 8k, so I don't want any truncation or padding.
I noticed that tokenizer.json in this repo contains additional truncation and padding configurations. Is this intentional? If so, how can I turn this logic off?
"truncation": {
"direction": "Right",
"max_length": 512,
"strategy": "LongestFirst",
"stride": 0
},
"padding": {
"strategy": {
"Fixed": 512
},
"direction": "Right",
"pad_to_multiple_of": null,
"pad_id": 128001,
"pad_type_id": 0,
"pad_token": "<|end_of_text|>"
}
This is not intentional; we simply copy the tokenizer from the unquantized model. It looks like the unquantized model updated these files after we ran the quantization.
To turn off this logic, please feel free to copy tokenizer.json from the unquantized model.
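Alternatively, you can strip these settings locally with the tokenizers API and re-save the file (a rough sketch, not tested against this exact repo; the local path is just an example):

from tokenizers import Tokenizer

# Load the local copy of this repo's tokenizer.json (path is an example).
tok = Tokenizer.from_file("tokenizer.json")

# Drop the baked-in 512-token truncation and fixed-length padding.
tok.no_truncation()
tok.no_padding()

# Re-save; the written tokenizer.json should no longer carry the
# "truncation"/"padding" blocks.
tok.save("tokenizer.json")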
ekurtic changed discussion status to closed