hezarai/t5-base-fa

Text Generation

t5-base-fa/preprocessor/tokenizer_config.yaml

arxyzan

Hezar: Upload model and config

2b2d98f about 1 year ago

647 Bytes

	name: sentencepiece_unigram_tokenizer
	config_type: preprocessor
	pretrained_path: t5-base-fa
	max_length: 512
	truncation_strategy: longest_first
	truncation_direction: right
	stride: 0
	padding_strategy: longest
	padding_direction: right
	pad_to_multiple_of: 0
	pad_token_id: 0
	pad_token: <pad>
	pad_token_type_id: 0
	unk_token: <unk>
	special_tokens:
	- <s>
	- <pad>
	- </s>
	- <unk>
	- <mask>
	- <|endoftext|>
	- <|startoftext|>
	- <nl>
	- <hs>
	- <sep>
	- <cls>
	continuing_subword_prefix: ''
	replacement: _
	add_prefix_space: true
	end_of_word_suffix: ''
	fuse_unk: false
	vocab_size: 32103
	min_frequency: 2
	limit_alphabet: 1000
	initial_alphabet: []
	show_progress: true
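
The truncation and padding keys above (`max_length`, `truncation_direction: right`, `padding_strategy: longest`, `padding_direction: right`, `pad_token_id: 0`) can be illustrated with a minimal pure-Python sketch. It is not Hezar's implementation: `truncate_and_pad` is a hypothetical helper, and it assumes these keys carry their conventional meaning (cut excess ids from the end, then pad every sequence on the right to the longest one in the batch). `truncation_strategy: longest_first` only matters for sequence pairs and is not modeled here.

```python
from typing import Dict, List

# Values copied from tokenizer_config.yaml above.
MAX_LENGTH = 512
PAD_TOKEN_ID = 0


def truncate_and_pad(
    batch: List[List[int]],
    max_length: int = MAX_LENGTH,
    pad_token_id: int = PAD_TOKEN_ID,
) -> Dict[str, List[List[int]]]:
    """Hypothetical helper: right-truncate, then right-pad to the longest sequence."""
    # truncation_direction: right -> keep the first max_length ids, drop the tail
    truncated = [ids[:max_length] for ids in batch]

    # padding_strategy: longest -> the target length is the longest sequence in the batch
    target = max((len(ids) for ids in truncated), default=0)

    input_ids, attention_mask = [], []
    for ids in truncated:
        n_pad = target - len(ids)
        # padding_direction: right -> pad ids are appended after the real ids
        input_ids.append(ids + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding
        attention_mask.append([1] * len(ids) + [0] * n_pad)

    return {"input_ids": input_ids, "attention_mask": attention_mask}


# A tiny max_length is used here only to make the truncation visible.
example = truncate_and_pad([[5, 6, 7], [8]], max_length=2)
print(example)
```

With the configured `max_length` of 512, only sequences longer than 512 ids would be cut. Shorter batches are padded only up to their own longest member, not to 512.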