DeBERTa-v2
Overview
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It is based on Google's BERT model released in 2018 and Facebook's RoBERTa model released in 2019.
It builds on RoBERTa with disentangled attention and enhanced mask decoder training, using half of the data used in RoBERTa.
The abstract from the paper is the following:
Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks. In this paper we propose a new model architecture DeBERTa (Decoding-enhanced BERT with disentangled attention) that improves the BERT and RoBERTa models using two novel techniques. The first is the disentangled attention mechanism, where each word is represented using two vectors that encode its content and position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions. Second, an enhanced mask decoder is used to replace the output softmax layer to predict the masked tokens for model pre-training. We show that these two techniques significantly improve the efficiency of model pre-training and the performance of downstream tasks. Compared to RoBERTa-Large, a DeBERTa model trained on half of the training data performs consistently better on a wide range of NLP tasks, achieving improvements on MNLI by +0.9% (90.2% vs. 91.1%), on SQuAD v2.0 by +2.3% (88.4% vs. 90.7%) and RACE by +3.6% (83.2% vs. 86.8%). The DeBERTa code and pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.
The following information is visible directly on the original implementation repository. DeBERTa v2 is the second version of the DeBERTa model. It includes the 1.5B model used for the SuperGLUE single-model submission, which achieved 89.9 against the human baseline of 89.8. You can find more details about this submission in the authors' blog.
New in v2:
- Vocabulary In v2 the tokenizer is changed to use a new vocabulary of size 128K built from the training data. Instead of a GPT2-based tokenizer, the tokenizer is now a sentencepiece-based tokenizer.
- nGiE(nGram Induced Input Encoding) The DeBERTa-v2 model uses an additional convolution layer aside from the first transformer layer to better learn the local dependency of input tokens.
- Sharing position projection matrix with content projection matrix in attention layer Based on previous experiments, this can save parameters without affecting the performance.
- Apply bucket to encode relative positions The DeBERTa-v2 model uses log buckets to encode relative positions, similar to T5 (see the sketch after this list).
- 900M model & 1.5B model Two additional model sizes are available: 900M and 1.5B, which significantly improve the performance of downstream tasks.
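The log-bucket idea mentioned above can be illustrated with a short sketch: offsets smaller than half the bucket size keep their exact value, while larger offsets are compressed into logarithmically spaced buckets. This is a minimal, assumption-laden sketch of the concept (the function name and defaults are illustrative), not the library's exact implementation.

>>> import numpy as np

>>> def log_bucket_position(relative_pos, bucket_size=256, max_position=512):
...     # Sketch only: small offsets keep their exact value, larger ones are
...     # squashed into log-spaced buckets up to max_position.
...     mid = bucket_size // 2
...     sign = np.sign(relative_pos)
...     abs_pos = np.where(np.abs(relative_pos) < mid, mid - 1, np.abs(relative_pos))
...     log_pos = np.ceil(np.log(abs_pos / mid) / np.log((max_position - 1) / mid) * (mid - 1)) + mid
...     return np.where(np.abs(relative_pos) <= mid, relative_pos, (log_pos * sign).astype(int))

>>> log_bucket_position(np.array([1, 100, 300, -300]))  # small offsets unchanged, long ones bucketed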
This model was contributed by DeBERTa. The TF 2.0 implementation of this model was contributed by kamalkraj. The original code can be found here.
Resources
DebertaV2Config
class transformers.DebertaV2Config
< source >( vocab_size = 128100 hidden_size = 1536 num_hidden_layers = 24 num_attention_heads = 24 intermediate_size = 6144 hidden_act = 'gelu' hidden_dropout_prob = 0.1 attention_probs_dropout_prob = 0.1 max_position_embeddings = 512 type_vocab_size = 0 initializer_range = 0.02 layer_norm_eps = 1e-07 relative_attention = False max_relative_positions = -1 pad_token_id = 0 position_biased_input = True pos_att_type = None pooler_dropout = 0 pooler_hidden_act = 'gelu' **kwargs )
Parameters
- vocab_size (
int
, optional, defaults to 128100) — Vocabulary size of the DeBERTa-v2 model. Defines the number of different tokens that can be represented by theinputs_ids
passed when calling DebertaV2Model. - hidden_size (
int
, optional, defaults to 1536) — Dimensionality of the encoder layers and the pooler layer. - num_hidden_layers (
int
, optional, defaults to 24) — Number of hidden layers in the Transformer encoder. - num_attention_heads (
int
, optional, defaults to 24) — Number of attention heads for each attention layer in the Transformer encoder. - intermediate_size (
int
, optional, defaults to 6144) — Dimensionality of the “intermediate” (often named feed-forward) layer in the Transformer encoder. - hidden_act (
str
orCallable
, optional, defaults to"gelu"
) — The non-linear activation function (function or string) in the encoder and pooler. If string,"gelu"
,"relu"
,"silu"
,"tanh"
,"gelu_fast"
,"mish"
,"linear"
,"sigmoid"
and"gelu_new"
are supported. - hidden_dropout_prob (
float
, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler. - attention_probs_dropout_prob (
float
, optional, defaults to 0.1) — The dropout ratio for the attention probabilities. - max_position_embeddings (
int
, optional, defaults to 512) — The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048). - type_vocab_size (
int
, optional, defaults to 0) — The vocabulary size of thetoken_type_ids
passed when calling DebertaModel or TFDebertaModel. - initializer_range (
float
, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices. - layer_norm_eps (
float
, optional, defaults to 1e-7) — The epsilon used by the layer normalization layers. - relative_attention (
bool
, optional, defaults to False
) — Whether to use relative position encoding. - max_relative_positions (
int
, optional, defaults to -1) — The range of relative positions[-max_position_embeddings, max_position_embeddings]
. Use the same value asmax_position_embeddings
. - pad_token_id (
int
, optional, defaults to 0) — The value used to pad input_ids. - position_biased_input (
bool
, optional, defaults toTrue
) — Whether to add absolute position embedding to content embedding. - pos_att_type (
List[str]
, optional) — The type of relative position attention, it can be a combination of["p2c", "c2p"]
, e.g.["p2c"]
,["p2c", "c2p"]
.
This is the configuration class to store the configuration of a DebertaV2Model. It is used to instantiate a DeBERTa-v2 model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the DeBERTa microsoft/deberta-v2-xlarge architecture.
Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.
Example:
>>> from transformers import DebertaV2Config, DebertaV2Model
>>> # Initializing a DeBERTa-v2 microsoft/deberta-v2-xlarge style configuration
>>> configuration = DebertaV2Config()
>>> # Initializing a model (with random weights) from the microsoft/deberta-v2-xlarge style configuration
>>> model = DebertaV2Model(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
DebertaV2Tokenizer
class transformers.DebertaV2Tokenizer
< source >( vocab_file do_lower_case = False split_by_punct = False bos_token = '[CLS]' eos_token = '[SEP]' unk_token = '[UNK]' sep_token = '[SEP]' pad_token = '[PAD]' cls_token = '[CLS]' mask_token = '[MASK]' sp_model_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None **kwargs )
Parameters
- vocab_file (
str
) — SentencePiece file (generally has a .spm extension) that contains the vocabulary necessary to instantiate a tokenizer. - do_lower_case (
bool
, optional, defaults toFalse
) — Whether or not to lowercase the input when tokenizing. - bos_token (
string
, optional, defaults to"[CLS]"
) — The beginning of sequence token that was used during pre-training. Can be used as a sequence classifier token. When building a sequence using special tokens, this is not the token that is used for the beginning of sequence. The token used is the cls_token
. - eos_token (
string
, optional, defaults to"[SEP]"
) — The end of sequence token. When building a sequence using special tokens, this is not the token that is used for the end of sequence. The token used is thesep_token
. - unk_token (
str
, optional, defaults to"[UNK]"
) — The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead. - sep_token (
str
, optional, defaults to"[SEP]"
) — The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or for a text and a question for question answering. It is also used as the last token of a sequence built with special tokens. - pad_token (
str
, optional, defaults to"[PAD]"
) — The token used for padding, for example when batching sequences of different lengths. - cls_token (
str
, optional, defaults to"[CLS]"
) — The classifier token which is used when doing sequence classification (classification of the whole sequence instead of per-token classification). It is the first token of the sequence when built with special tokens. - mask_token (
str
, optional, defaults to"[MASK]"
) — The token used for masking values. This is the token used when training this model with masked language modeling. This is the token which the model will try to predict. - sp_model_kwargs (
dict
, optional) — Will be passed to theSentencePieceProcessor.__init__()
method. The Python wrapper for SentencePiece can be used, among other things, to set:-
enable_sampling
: Enable subword regularization. -
nbest_size
: Sampling parameters for unigram. Invalid for BPE-Dropout.nbest_size = {0,1}
: No sampling is performed.nbest_size > 1
: samples from the nbest_size results.nbest_size < 0
: assuming that nbest_size is infinite and samples from all hypotheses (lattice) using the forward-filtering-and-backward-sampling algorithm.
-
alpha
: Smoothing parameter for unigram sampling, and dropout probability of merge operations for BPE-dropout.
Constructs a DeBERTa-v2 tokenizer. Based on SentencePiece.
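As a rough illustration of the sp_model_kwargs argument documented above, subword-regularization sampling could be switched on when loading the tokenizer. This is a sketch that assumes a sentencepiece version whose SentencePieceProcessor constructor accepts these sampling arguments; the values shown are purely illustrative.

>>> from transformers import DebertaV2Tokenizer

>>> # Hypothetical settings: sample among subword segmentations instead of always
>>> # using the most likely one (sometimes used as a data-augmentation trick).
>>> tokenizer = DebertaV2Tokenizer.from_pretrained(
...     "microsoft/deberta-v2-xlarge",
...     sp_model_kwargs={"enable_sampling": True, "nbest_size": -1, "alpha": 0.1},
... )
>>> tokenizer.tokenize("unaffable")  # pieces may differ between calls when sampling is enabled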
build_inputs_with_special_tokens
< source >( token_ids_0 token_ids_1 = None ) → List[int]
Parameters
- token_ids_0 (
List[int]
) — List of IDs to which the special tokens will be added. - token_ids_1 (
List[int]
, optional) — Optional second list of IDs for sequence pairs.
Returns
List[int]
List of input IDs with the appropriate special tokens.
Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and adding special tokens. A DeBERTa sequence has the following format:
- single sequence: [CLS] X [SEP]
- pair of sequences: [CLS] A [SEP] B [SEP]
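For example, assuming the microsoft/deberta-v2-xlarge checkpoint used elsewhere on this page, the method wraps already-converted token IDs with these special tokens:

>>> from transformers import DebertaV2Tokenizer

>>> tokenizer = DebertaV2Tokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
>>> ids_a = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("Hello world"))
>>> ids_b = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("How are you?"))
>>> single = tokenizer.build_inputs_with_special_tokens(ids_a)  # [CLS] A [SEP]
>>> pair = tokenizer.build_inputs_with_special_tokens(ids_a, ids_b)  # [CLS] A [SEP] B [SEP]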
get_special_tokens_mask
< source >( token_ids_0 token_ids_1 = None already_has_special_tokens = False ) → List[int]
Parameters
- token_ids_0 (
List[int]
) — List of IDs. - token_ids_1 (
List[int]
, optional) — Optional second list of IDs for sequence pairs. - already_has_special_tokens (
bool
, optional, defaults toFalse
) — Whether or not the token list is already formatted with special tokens for the model.
Returns
List[int]
A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
Retrieves sequence ids from a token list that has no special tokens added. This method is called when adding
special tokens using the tokenizer prepare_for_model
or encode_plus
methods.
create_token_type_ids_from_sequences
< source >( token_ids_0 token_ids_1 = None ) → List[int]
Parameters
- token_ids_0 (
List[int]
) — List of IDs. - token_ids_1 (
List[int]
, optional) — Optional second list of IDs for sequence pairs.
Returns
List[int]
List of token type IDs according to the given sequence(s).
Create a mask from the two sequences passed to be used in a sequence-pair classification task. A DeBERTa
sequence pair mask has the following format:
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence | second sequence |
If token_ids_1
is None
, this method only returns the first portion of the mask (0s).
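A quick check of the mask layout, again assuming the microsoft/deberta-v2-xlarge checkpoint (the exact length depends on how the example strings are tokenized):

>>> from transformers import DebertaV2Tokenizer

>>> tokenizer = DebertaV2Tokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
>>> ids_a = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("Hello world"))
>>> ids_b = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("How are you?"))
>>> tokenizer.create_token_type_ids_from_sequences(ids_a, ids_b)
>>> # the leading 0s cover "[CLS] A [SEP]", the trailing 1s cover "B [SEP]"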
DebertaV2TokenizerFast
class transformers.DebertaV2TokenizerFast
< source >( vocab_file = None tokenizer_file = None do_lower_case = False split_by_punct = False bos_token = '[CLS]' eos_token = '[SEP]' unk_token = '[UNK]' sep_token = '[SEP]' pad_token = '[PAD]' cls_token = '[CLS]' mask_token = '[MASK]' **kwargs )
Parameters
- vocab_file (
str
) — SentencePiece file (generally has a .spm extension) that contains the vocabulary necessary to instantiate a tokenizer. - do_lower_case (
bool
, optional, defaults toFalse
) — Whether or not to lowercase the input when tokenizing. - bos_token (
string
, optional, defaults to"[CLS]"
) — The beginning of sequence token that was used during pre-training. Can be used as a sequence classifier token. When building a sequence using special tokens, this is not the token that is used for the beginning of sequence. The token used is the cls_token
. - eos_token (
string
, optional, defaults to"[SEP]"
) — The end of sequence token. When building a sequence using special tokens, this is not the token that is used for the end of sequence. The token used is thesep_token
. - unk_token (
str
, optional, defaults to"[UNK]"
) — The unknown token. A token that is not in the vocabulary cannot be converted to an ID and is set to be this token instead. - sep_token (
str
, optional, defaults to"[SEP]"
) — The separator token, which is used when building a sequence from multiple sequences, e.g. two sequences for sequence classification or for a text and a question for question answering. It is also used as the last token of a sequence built with special tokens. - pad_token (
str
, optional, defaults to"[PAD]"
) — The token used for padding, for example when batching sequences of different lengths. - cls_token (
str
, optional, defaults to"[CLS]"
) — The classifier token which is used when doing sequence classification (classification of the whole sequence instead of per-token classification). It is the first token of the sequence when built with special tokens. - mask_token (
str
, optional, defaults to"[MASK]"
) — The token used for masking values. This is the token used when training this model with masked language modeling. This is the token which the model will try to predict. - sp_model_kwargs (
dict
, optional) — Will be passed to theSentencePieceProcessor.__init__()
method. The Python wrapper for SentencePiece can be used, among other things, to set:-
enable_sampling
: Enable subword regularization. -
nbest_size
: Sampling parameters for unigram. Invalid for BPE-Dropout.nbest_size = {0,1}
: No sampling is performed.nbest_size > 1
: samples from the nbest_size results.nbest_size < 0
: assuming that nbest_size is infinite and samples from all hypotheses (lattice) using the forward-filtering-and-backward-sampling algorithm.
-
alpha
: Smoothing parameter for unigram sampling, and dropout probability of merge operations for BPE-dropout.
Constructs a DeBERTa-v2 fast tokenizer. Based on SentencePiece.
build_inputs_with_special_tokens
< source >( token_ids_0 token_ids_1 = None ) → List[int]
Parameters
- token_ids_0 (
List[int]
) — List of IDs to which the special tokens will be added. - token_ids_1 (
List[int]
, optional) — Optional second list of IDs for sequence pairs.
Returns
List[int]
List of input IDs with the appropriate special tokens.
Build model inputs from a sequence or a pair of sequence for sequence classification tasks by concatenating and adding special tokens. A DeBERTa sequence has the following format:
- single sequence: [CLS] X [SEP]
- pair of sequences: [CLS] A [SEP] B [SEP]
create_token_type_ids_from_sequences
< source >( token_ids_0 token_ids_1 = None ) → List[int]
Parameters
- token_ids_0 (
List[int]
) — List of IDs. - token_ids_1 (
List[int]
, optional) — Optional second list of IDs for sequence pairs.
Returns
List[int]
List of token type IDs according to the given sequence(s).
Create a mask from the two sequences passed to be used in a sequence-pair classification task. A DeBERTa
sequence pair mask has the following format:
0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
| first sequence | second sequence |
If token_ids_1
is None
, this method only returns the first portion of the mask (0s).
DebertaV2Model
class transformers.DebertaV2Model
< source >( config )
Parameters
- config (DebertaV2Config) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
The bare DeBERTa Model transformer outputting raw hidden-states without any specific head on top. The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It’s built on top of BERT/RoBERTa with two improvements, i.e. disentangled attention and an enhanced mask decoder. With those two improvements, it outperforms BERT/RoBERTa on a majority of tasks with 80GB of pretraining data.
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.
forward
< source >( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.BaseModelOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
- attention_mask (
torch.FloatTensor
of shape(batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked.
- token_type_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
- position_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range[0, config.max_position_embeddings - 1]
. - inputs_embeds (
torch.FloatTensor
of shape(batch_size, sequence_length, hidden_size)
, optional) — Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. - output_attentions (
bool
, optional) — Whether or not to return the attentions tensors of all attention layers. Seeattentions
under returned tensors for more detail. - output_hidden_states (
bool
, optional) — Whether or not to return the hidden states of all layers. Seehidden_states
under returned tensors for more detail. - return_dict (
bool
, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
Returns
transformers.modeling_outputs.BaseModelOutput or tuple(torch.FloatTensor)
A transformers.modeling_outputs.BaseModelOutput or a tuple of
torch.FloatTensor
(if return_dict=False
is passed or when config.return_dict=False
) comprising various
elements depending on the configuration (DebertaV2Config) and inputs.
-
last_hidden_state (
torch.FloatTensor
of shape(batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model. -
hidden_states (
tuple(torch.FloatTensor)
, optional, returned whenoutput_hidden_states=True
is passed or whenconfig.output_hidden_states=True
) — Tuple oftorch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape(batch_size, sequence_length, hidden_size)
.Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
-
attentions (
tuple(torch.FloatTensor)
, optional, returned whenoutput_attentions=True
is passed or whenconfig.output_attentions=True
) — Tuple oftorch.FloatTensor
(one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length)
.Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The DebertaV2Model forward method overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, DebertaV2Model
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
>>> model = DebertaV2Model.from_pretrained("microsoft/deberta-v2-xlarge")
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> outputs = model(**inputs)
>>> last_hidden_states = outputs.last_hidden_state
DebertaV2PreTrainedModel
class transformers.DebertaV2PreTrainedModel
< source >( config: PretrainedConfig *inputs **kwargs )
An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.
Define the computation performed at every call.
Should be overridden by all subclasses.
Although the recipe for forward pass needs to be defined within
this function, one should call the Module
instance afterwards
instead of this since the former takes care of running the
registered hooks while the latter silently ignores them.
DebertaV2ForMaskedLM
class transformers.DebertaV2ForMaskedLM
< source >( config )
Parameters
- config (DebertaV2Config) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
DeBERTa Model with a language modeling
head on top.
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled
Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It’s built
on top of BERT/RoBERTa with two improvements, i.e. disentangled attention and an enhanced mask decoder. With those two
improvements, it outperforms BERT/RoBERTa on a majority of tasks with 80GB of pretraining data.
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.
forward
< source >( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None labels: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.MaskedLMOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
- attention_mask (
torch.FloatTensor
of shape(batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked.
- token_type_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
- position_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range[0, config.max_position_embeddings - 1]
. - inputs_embeds (
torch.FloatTensor
of shape(batch_size, sequence_length, hidden_size)
, optional) — Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. - output_attentions (
bool
, optional) — Whether or not to return the attentions tensors of all attention layers. Seeattentions
under returned tensors for more detail. - output_hidden_states (
bool
, optional) — Whether or not to return the hidden states of all layers. Seehidden_states
under returned tensors for more detail. - return_dict (
bool
, optional) — Whether or not to return a ModelOutput instead of a plain tuple. - labels (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional) — Labels for computing the masked language modeling loss. Indices should be in[-100, 0, ..., config.vocab_size]
(seeinput_ids
docstring) Tokens with indices set to-100
are ignored (masked), the loss is only computed for the tokens with labels in[0, ..., config.vocab_size]
Returns
transformers.modeling_outputs.MaskedLMOutput or tuple(torch.FloatTensor)
A transformers.modeling_outputs.MaskedLMOutput or a tuple of
torch.FloatTensor
(if return_dict=False
is passed or when config.return_dict=False
) comprising various
elements depending on the configuration (DebertaV2Config) and inputs.
-
loss (
torch.FloatTensor
of shape(1,)
, optional, returned whenlabels
is provided) — Masked language modeling (MLM) loss. -
logits (
torch.FloatTensor
of shape(batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). -
hidden_states (
tuple(torch.FloatTensor)
, optional, returned whenoutput_hidden_states=True
is passed or whenconfig.output_hidden_states=True
) — Tuple oftorch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape(batch_size, sequence_length, hidden_size)
.Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
-
attentions (
tuple(torch.FloatTensor)
, optional, returned whenoutput_attentions=True
is passed or whenconfig.output_attentions=True
) — Tuple oftorch.FloatTensor
(one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length)
.Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The DebertaV2ForMaskedLM forward method overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, DebertaV2ForMaskedLM
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
>>> model = DebertaV2ForMaskedLM.from_pretrained("microsoft/deberta-v2-xlarge")
>>> inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
>>> with torch.no_grad():
... logits = model(**inputs).logits
>>> # retrieve index of [MASK]
>>> mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]
>>> predicted_token_id = logits[0, mask_token_index].argmax(axis=-1)
>>> labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
>>> # mask labels of non-[MASK] tokens
>>> labels = torch.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)
>>> outputs = model(**inputs, labels=labels)
DebertaV2ForSequenceClassification
class transformers.DebertaV2ForSequenceClassification
< source >( config )
Parameters
- config (DebertaV2Config) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
DeBERTa Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It’s built on top of BERT/RoBERTa with two improvements, i.e. disentangled attention and an enhanced mask decoder. With those two improvements, it outperforms BERT/RoBERTa on a majority of tasks with 80GB of pretraining data.
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.
forward
< source >( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None labels: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.SequenceClassifierOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
- attention_mask (
torch.FloatTensor
of shape(batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked.
- token_type_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
- position_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range[0, config.max_position_embeddings - 1]
. - inputs_embeds (
torch.FloatTensor
of shape(batch_size, sequence_length, hidden_size)
, optional) — Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. - output_attentions (
bool
, optional) — Whether or not to return the attentions tensors of all attention layers. Seeattentions
under returned tensors for more detail. - output_hidden_states (
bool
, optional) — Whether or not to return the hidden states of all layers. Seehidden_states
under returned tensors for more detail. - return_dict (
bool
, optional) — Whether or not to return a ModelOutput instead of a plain tuple. - labels (
torch.LongTensor
of shape(batch_size,)
, optional) — Labels for computing the sequence classification/regression loss. Indices should be in[0, ..., config.num_labels - 1]
. Ifconfig.num_labels == 1
a regression loss is computed (Mean-Square loss), Ifconfig.num_labels > 1
a classification loss is computed (Cross-Entropy).
Returns
transformers.modeling_outputs.SequenceClassifierOutput or tuple(torch.FloatTensor)
A transformers.modeling_outputs.SequenceClassifierOutput or a tuple of
torch.FloatTensor
(if return_dict=False
is passed or when config.return_dict=False
) comprising various
elements depending on the configuration (DebertaV2Config) and inputs.
-
loss (
torch.FloatTensor
of shape(1,)
, optional, returned whenlabels
is provided) — Classification (or regression if config.num_labels==1) loss. -
logits (
torch.FloatTensor
of shape(batch_size, config.num_labels)
) — Classification (or regression if config.num_labels==1) scores (before SoftMax). -
hidden_states (
tuple(torch.FloatTensor)
, optional, returned whenoutput_hidden_states=True
is passed or whenconfig.output_hidden_states=True
) — Tuple oftorch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape(batch_size, sequence_length, hidden_size)
.Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
-
attentions (
tuple(torch.FloatTensor)
, optional, returned whenoutput_attentions=True
is passed or whenconfig.output_attentions=True
) — Tuple oftorch.FloatTensor
(one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length)
.Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The DebertaV2ForSequenceClassification forward method overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example of single-label classification:
>>> import torch
>>> from transformers import AutoTokenizer, DebertaV2ForSequenceClassification
>>> tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
>>> model = DebertaV2ForSequenceClassification.from_pretrained("microsoft/deberta-v2-xlarge")
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> with torch.no_grad():
... logits = model(**inputs).logits
>>> predicted_class_id = logits.argmax().item()
>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = DebertaV2ForSequenceClassification.from_pretrained("microsoft/deberta-v2-xlarge", num_labels=num_labels)
>>> labels = torch.tensor([1])
>>> loss = model(**inputs, labels=labels).loss
Example of multi-label classification:
>>> import torch
>>> from transformers import AutoTokenizer, DebertaV2ForSequenceClassification
>>> tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
>>> model = DebertaV2ForSequenceClassification.from_pretrained("microsoft/deberta-v2-xlarge", problem_type="multi_label_classification")
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> with torch.no_grad():
... logits = model(**inputs).logits
>>> predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze(dim=0) > 0.5]
>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = DebertaV2ForSequenceClassification.from_pretrained(
... "microsoft/deberta-v2-xlarge", num_labels=num_labels, problem_type="multi_label_classification"
... )
>>> labels = torch.sum(
... torch.nn.functional.one_hot(predicted_class_ids[None, :].clone(), num_classes=num_labels), dim=1
... ).to(torch.float)
>>> loss = model(**inputs, labels=labels).loss
DebertaV2ForTokenClassification
class transformers.DebertaV2ForTokenClassification
< source >( config )
Parameters
- config (DebertaV2Config) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
DeBERTa Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It’s built on top of BERT/RoBERTa with two improvements, i.e. disentangled attention and an enhanced mask decoder. With those two improvements, it outperforms BERT/RoBERTa on a majority of tasks with 80GB of pretraining data.
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.
forward
< source >( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None labels: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.TokenClassifierOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
- attention_mask (
torch.FloatTensor
of shape(batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked.
- token_type_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
- position_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range[0, config.max_position_embeddings - 1]
. - inputs_embeds (
torch.FloatTensor
of shape(batch_size, sequence_length, hidden_size)
, optional) — Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. - output_attentions (
bool
, optional) — Whether or not to return the attentions tensors of all attention layers. Seeattentions
under returned tensors for more detail. - output_hidden_states (
bool
, optional) — Whether or not to return the hidden states of all layers. Seehidden_states
under returned tensors for more detail. - return_dict (
bool
, optional) — Whether or not to return a ModelOutput instead of a plain tuple. - labels (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional) — Labels for computing the token classification loss. Indices should be in[0, ..., config.num_labels - 1]
.
Returns
transformers.modeling_outputs.TokenClassifierOutput or tuple(torch.FloatTensor)
A transformers.modeling_outputs.TokenClassifierOutput or a tuple of
torch.FloatTensor
(if return_dict=False
is passed or when config.return_dict=False
) comprising various
elements depending on the configuration (DebertaV2Config) and inputs.
-
loss (
torch.FloatTensor
of shape(1,)
, optional, returned whenlabels
is provided) — Classification loss. -
logits (
torch.FloatTensor
of shape(batch_size, sequence_length, config.num_labels)
) — Classification scores (before SoftMax). -
hidden_states (
tuple(torch.FloatTensor)
, optional, returned whenoutput_hidden_states=True
is passed or whenconfig.output_hidden_states=True
) — Tuple oftorch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape(batch_size, sequence_length, hidden_size)
.Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
-
attentions (
tuple(torch.FloatTensor)
, optional, returned whenoutput_attentions=True
is passed or whenconfig.output_attentions=True
) — Tuple oftorch.FloatTensor
(one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length)
.Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The DebertaV2ForTokenClassification forward method overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, DebertaV2ForTokenClassification
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
>>> model = DebertaV2ForTokenClassification.from_pretrained("microsoft/deberta-v2-xlarge")
>>> inputs = tokenizer(
... "HuggingFace is a company based in Paris and New York", add_special_tokens=False, return_tensors="pt"
... )
>>> with torch.no_grad():
... logits = model(**inputs).logits
>>> predicted_token_class_ids = logits.argmax(-1)
>>> # Note that tokens are classified rather then input words which means that
>>> # there might be more predicted token classes than words.
>>> # Multiple token classes might account for the same word
>>> predicted_tokens_classes = [model.config.id2label[t.item()] for t in predicted_token_class_ids[0]]
>>> labels = predicted_token_class_ids
>>> loss = model(**inputs, labels=labels).loss
DebertaV2ForQuestionAnswering
class transformers.DebertaV2ForQuestionAnswering
< source >( config )
Parameters
- config (DebertaV2Config) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
DeBERTa Model with a span classification head on top for extractive question-answering tasks like SQuAD (linear
layers on top of the hidden-states output to compute span start logits
and span end logits
).
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It’s built on top of BERT/RoBERTa with two improvements, i.e. disentangled attention and an enhanced mask decoder. With those two improvements, it outperforms BERT/RoBERTa on a majority of tasks with 80GB of pretraining data.
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.
forward
< source >( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None start_positions: typing.Optional[torch.Tensor] = None end_positions: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.QuestionAnsweringModelOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
- attention_mask (
torch.FloatTensor
of shape(batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked.
- token_type_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
- position_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range[0, config.max_position_embeddings - 1]
. - inputs_embeds (
torch.FloatTensor
of shape(batch_size, sequence_length, hidden_size)
, optional) — Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. - output_attentions (
bool
, optional) — Whether or not to return the attentions tensors of all attention layers. Seeattentions
under returned tensors for more detail. - output_hidden_states (
bool
, optional) — Whether or not to return the hidden states of all layers. Seehidden_states
under returned tensors for more detail. - return_dict (
bool
, optional) — Whether or not to return a ModelOutput instead of a plain tuple. - start_positions (
torch.LongTensor
of shape(batch_size,)
, optional) — Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length
). Position outside of the sequence are not taken into account for computing the loss. - end_positions (
torch.LongTensor
of shape(batch_size,)
, optional) — Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length
). Position outside of the sequence are not taken into account for computing the loss.
Returns
transformers.modeling_outputs.QuestionAnsweringModelOutput or tuple(torch.FloatTensor)
A transformers.modeling_outputs.QuestionAnsweringModelOutput or a tuple of
torch.FloatTensor
(if return_dict=False
is passed or when config.return_dict=False
) comprising various
elements depending on the configuration (DebertaV2Config) and inputs.
-
loss (
torch.FloatTensor
of shape(1,)
, optional, returned whenlabels
is provided) — Total span extraction loss is the sum of a Cross-Entropy for the start and end positions. -
start_logits (
torch.FloatTensor
of shape(batch_size, sequence_length)
) — Span-start scores (before SoftMax). -
end_logits (
torch.FloatTensor
of shape(batch_size, sequence_length)
) — Span-end scores (before SoftMax). -
hidden_states (
tuple(torch.FloatTensor)
, optional, returned whenoutput_hidden_states=True
is passed or whenconfig.output_hidden_states=True
) — Tuple oftorch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape(batch_size, sequence_length, hidden_size)
.Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
-
attentions (
tuple(torch.FloatTensor)
, optional, returned whenoutput_attentions=True
is passed or whenconfig.output_attentions=True
) — Tuple oftorch.FloatTensor
(one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length)
.Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The DebertaV2ForQuestionAnswering forward method overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, DebertaV2ForQuestionAnswering
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
>>> model = DebertaV2ForQuestionAnswering.from_pretrained("microsoft/deberta-v2-xlarge")
>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> inputs = tokenizer(question, text, return_tensors="pt")
>>> with torch.no_grad():
... outputs = model(**inputs)
>>> answer_start_index = outputs.start_logits.argmax()
>>> answer_end_index = outputs.end_logits.argmax()
>>> predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
>>> # target is "nice puppet"
>>> target_start_index = torch.tensor([2])
>>> target_end_index = torch.tensor([9])
>>> outputs = model(**inputs, start_positions=target_start_index, end_positions=target_end_index)
>>> loss = outputs.loss
DebertaV2ForMultipleChoice
class transformers.DebertaV2ForMultipleChoice
< source >( config )
Parameters
- config (DebertaV2Config) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
DeBERTa Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It’s built on top of BERT/RoBERTa with two improvements, i.e. disentangled attention and an enhanced mask decoder. With those two improvements, it outperforms BERT/RoBERTa on a majority of tasks with 80GB of pretraining data.
This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.
forward
< source >( input_ids: typing.Optional[torch.Tensor] = None attention_mask: typing.Optional[torch.Tensor] = None token_type_ids: typing.Optional[torch.Tensor] = None position_ids: typing.Optional[torch.Tensor] = None inputs_embeds: typing.Optional[torch.Tensor] = None labels: typing.Optional[torch.Tensor] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.modeling_outputs.MultipleChoiceModelOutput or tuple(torch.FloatTensor)
Parameters
- input_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
- attention_mask (
torch.FloatTensor
of shape(batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked.
- token_type_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
- position_ids (
torch.LongTensor
of shape(batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range[0, config.max_position_embeddings - 1]
. - inputs_embeds (
torch.FloatTensor
of shape(batch_size, sequence_length, hidden_size)
, optional) — Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. - output_attentions (
bool
, optional) — Whether or not to return the attentions tensors of all attention layers. Seeattentions
under returned tensors for more detail. - output_hidden_states (
bool
, optional) — Whether or not to return the hidden states of all layers. Seehidden_states
under returned tensors for more detail. - return_dict (
bool
, optional) — Whether or not to return a ModelOutput instead of a plain tuple. - labels (
torch.LongTensor
of shape(batch_size,)
, optional) — Labels for computing the multiple choice classification loss. Indices should be in[0, ..., num_choices-1]
wherenum_choices
is the size of the second dimension of the input tensors. (Seeinput_ids
above)
Returns
transformers.modeling_outputs.MultipleChoiceModelOutput or tuple(torch.FloatTensor)
A transformers.modeling_outputs.MultipleChoiceModelOutput or a tuple of
torch.FloatTensor
(if return_dict=False
is passed or when config.return_dict=False
) comprising various
elements depending on the configuration (DebertaV2Config) and inputs.
-
loss (
torch.FloatTensor
of shape (1,), optional, returned whenlabels
is provided) — Classification loss. -
logits (
torch.FloatTensor
of shape(batch_size, num_choices)
) — num_choices is the second dimension of the input tensors. (see input_ids above).Classification scores (before SoftMax).
-
hidden_states (
tuple(torch.FloatTensor)
, optional, returned whenoutput_hidden_states=True
is passed or whenconfig.output_hidden_states=True
) — Tuple oftorch.FloatTensor
(one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape(batch_size, sequence_length, hidden_size)
.Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
-
attentions (
tuple(torch.FloatTensor)
, optional, returned whenoutput_attentions=True
is passed or whenconfig.output_attentions=True
) — Tuple oftorch.FloatTensor
(one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length)
.Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The DebertaV2ForMultipleChoice forward method overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, DebertaV2ForMultipleChoice
>>> import torch
>>> tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
>>> model = DebertaV2ForMultipleChoice.from_pretrained("microsoft/deberta-v2-xlarge")
>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> choice0 = "It is eaten with a fork and a knife."
>>> choice1 = "It is eaten while held in the hand."
>>> labels = torch.tensor(0).unsqueeze(0) # choice0 is correct (according to Wikipedia ;)), batch size 1
>>> encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors="pt", padding=True)
>>> outputs = model(**{k: v.unsqueeze(0) for k, v in encoding.items()}, labels=labels) # batch size is 1
>>> # the linear classifier still needs to be trained
>>> loss = outputs.loss
>>> logits = outputs.logits
TFDebertaV2Model
class transformers.TFDebertaV2Model
< source >( config: DebertaV2Config *inputs **kwargs )
Parameters
- config (DebertaV2Config) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
The bare DeBERTa Model transformer outputting raw hidden-states without any specific head on top. The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It’s built on top of BERT/RoBERTa with two improvements, i.e. disentangled attention and an enhanced mask decoder. With those two improvements, it outperforms BERT/RoBERTa on a majority of tasks with 80GB of pretraining data.
This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to general usage and behavior.
TensorFlow models and layers in transformers
accept two formats as input:
- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.
The reason the second format is supported is that Keras methods prefer this format when passing inputs to models
and layers. Because of this support, when using methods like model.fit()
things should “just work” for you - just
pass your inputs and labels in any format that model.fit()
supports! If, however, you want to use the second
format outside of Keras methods like fit()
and predict()
, such as when creating your own layers or models with
the Keras Functional
API, there are three possibilities you can use to gather all the input Tensors in the first
positional argument:
- a single Tensor with
input_ids
only and nothing else:model(input_ids)
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring:
model([input_ids, attention_mask])
ormodel([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring:
model({"input_ids": input_ids, "token_type_ids": token_type_ids})
Note that when creating models and layers with subclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!
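A minimal sketch of the keyword-argument format and the dictionary format, using the kamalkraj/deberta-v2-xlarge checkpoint referenced in the example further below (any DeBERTa-v2 TF checkpoint would work the same way):

>>> import tensorflow as tf
>>> from transformers import AutoTokenizer, TFDebertaV2Model

>>> tokenizer = AutoTokenizer.from_pretrained("kamalkraj/deberta-v2-xlarge")
>>> model = TFDebertaV2Model.from_pretrained("kamalkraj/deberta-v2-xlarge")
>>> enc = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> # 1) all inputs as keyword arguments, like a PyTorch model
>>> out_kwargs = model(input_ids=enc["input_ids"], attention_mask=enc["attention_mask"])
>>> # 2) all inputs packed into a dict passed as the first positional argument
>>> out_dict = model({"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]})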
call
< source >( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFBaseModelOutput or tuple(tf.Tensor)
Parameters
- input_ids (
np.ndarray
,tf.Tensor
,List[tf.Tensor]
`Dict[str, tf.Tensor]
orDict[str, np.ndarray]
and each example must have the shape(batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
- attention_mask (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked.
- token_type_ids (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
- position_ids (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range[0, config.max_position_embeddings - 1]
. - inputs_embeds (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length, hidden_size)
, optional) — Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. - output_attentions (
bool
, optional) — Whether or not to return the attentions tensors of all attention layers. Seeattentions
under returned tensors for more detail. - output_hidden_states (
bool
, optional) — Whether or not to return the hidden states of all layers. Seehidden_states
under returned tensors for more detail. - return_dict (
bool
, optional) — Whether or not to return a `ModelOutput` instead of a plain tuple.
Returns
transformers.modeling_tf_outputs.TFBaseModelOutput or tuple(tf.Tensor)
A transformers.modeling_tf_outputs.TFBaseModelOutput or a tuple of tf.Tensor
(if
return_dict=False
is passed or when config.return_dict=False
) comprising various elements depending on the
configuration (DebertaV2Config) and inputs.
-
last_hidden_state (
tf.Tensor
of shape(batch_size, sequence_length, hidden_size)
) — Sequence of hidden-states at the output of the last layer of the model. -
hidden_states (
tuple(tf.FloatTensor)
, optional, returned whenoutput_hidden_states=True
is passed or whenconfig.output_hidden_states=True
) — Tuple oftf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape(batch_size, sequence_length, hidden_size)
.Hidden-states of the model at the output of each layer plus the initial embedding outputs.
-
attentions (
tuple(tf.Tensor)
, optional, returned whenoutput_attentions=True
is passed or whenconfig.output_attentions=True
) — Tuple oftf.Tensor
(one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length)
.Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The TFDebertaV2Model forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, TFDebertaV2Model
>>> import tensorflow as tf
>>> tokenizer = AutoTokenizer.from_pretrained("kamalkraj/deberta-v2-xlarge")
>>> model = TFDebertaV2Model.from_pretrained("kamalkraj/deberta-v2-xlarge")
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> outputs = model(inputs)
>>> last_hidden_states = outputs.last_hidden_state
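If the intermediate activations described in the return section are needed, they can be requested at call time; a short sketch continuing the example above:
>>> outputs = model(inputs, output_hidden_states=True, output_attentions=True)
>>> all_hidden_states = outputs.hidden_states  # embeddings output + one entry per layer
>>> all_attentions = outputs.attentions  # one (batch, heads, seq_len, seq_len) tensor per layer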
TFDebertaV2PreTrainedModel
An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained models.
call
< source >( inputs training = None mask = None )
Parameters
- inputs — Input tensor, or dict/list/tuple of input tensors.
- training — Boolean or boolean scalar tensor, indicating whether to
run the
Network
in training mode or inference mode. - mask — A mask or list of masks. A mask can be either a boolean tensor or None (no mask). For more details, check the guide here.
Calls the model on new inputs and returns the outputs as tensors. In this case `call()` just reapplies all ops in the graph to the new inputs (i.e. it builds a new computational graph from the provided inputs).
Note: This method should not be called directly. It is only meant to be overridden when subclassing `tf.keras.Model`. To call a model on an input, always use the `__call__()` method, i.e. `model(inputs)`, which relies on the underlying `call()` method.
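In practice that means invoking the loaded model object directly; a minimal sketch:
>>> outputs = model(inputs)  # goes through __call__(), which runs the pre- and post-processing steps
>>> # model.call(inputs) would skip those steps and should not be used directly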
TFDebertaV2ForMaskedLM
class transformers.TFDebertaV2ForMaskedLM
< source >( config: DebertaV2Config *inputs **kwargs )
Parameters
- config (DebertaV2Config) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
DeBERTa Model with a language modeling head on top.
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It's built on top of BERT/RoBERTa with two improvements, i.e. disentangled attention and an enhanced mask decoder. With those two improvements, it outperforms BERT/RoBERTa on a majority of tasks with 80GB of pretraining data.
This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.
TensorFlow models and layers in transformers
accept two formats as input:
- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.
The reason the second format is supported is that Keras methods prefer this format when passing inputs to models
and layers. Because of this support, when using methods like model.fit()
things should “just work” for you - just
pass your inputs and labels in any format that model.fit()
supports! If, however, you want to use the second
format outside of Keras methods like fit()
and predict()
, such as when creating your own layers or models with
the Keras Functional
API, there are three possibilities you can use to gather all the input Tensors in the first
positional argument:
- a single Tensor with
input_ids
only and nothing else:model(input_ids)
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring:
model([input_ids, attention_mask])
ormodel([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring:
model({"input_ids": input_ids, "token_type_ids": token_type_ids})
Note that when creating models and layers with subclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!
call
< source >( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFMaskedLMOutput or tuple(tf.Tensor)
Parameters
- input_ids (
np.ndarray
,tf.Tensor
,List[tf.Tensor]
`Dict[str, tf.Tensor]
orDict[str, np.ndarray]
and each example must have the shape(batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
- attention_mask (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked.
- token_type_ids (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
- position_ids (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range[0, config.max_position_embeddings - 1]
. - inputs_embeds (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length, hidden_size)
, optional) — Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. - output_attentions (
bool
, optional) — Whether or not to return the attentions tensors of all attention layers. Seeattentions
under returned tensors for more detail. - output_hidden_states (
bool
, optional) — Whether or not to return the hidden states of all layers. Seehidden_states
under returned tensors for more detail. - return_dict (
bool
, optional) — Whether or not to return a `ModelOutput` instead of a plain tuple. - labels (
tf.Tensor
ornp.ndarray
of shape(batch_size, sequence_length)
, optional) — Labels for computing the masked language modeling loss. Indices should be in[-100, 0, ..., config.vocab_size]
(seeinput_ids
docstring) Tokens with indices set to-100
are ignored (masked), the loss is only computed for the tokens with labels in[0, ..., config.vocab_size]
Returns
transformers.modeling_tf_outputs.TFMaskedLMOutput or tuple(tf.Tensor)
A transformers.modeling_tf_outputs.TFMaskedLMOutput or a tuple of tf.Tensor
(if
return_dict=False
is passed or when config.return_dict=False
) comprising various elements depending on the
configuration (DebertaV2Config) and inputs.
-
loss (
tf.Tensor
of shape(n,)
, optional, where n is the number of non-masked labels, returned whenlabels
is provided) — Masked language modeling (MLM) loss. -
logits (
tf.Tensor
of shape(batch_size, sequence_length, config.vocab_size)
) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax). -
hidden_states (
tuple(tf.Tensor)
, optional, returned whenoutput_hidden_states=True
is passed or whenconfig.output_hidden_states=True
) — Tuple oftf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape(batch_size, sequence_length, hidden_size)
.Hidden-states of the model at the output of each layer plus the initial embedding outputs.
-
attentions (
tuple(tf.Tensor)
, optional, returned whenoutput_attentions=True
is passed or whenconfig.output_attentions=True
) — Tuple oftf.Tensor
(one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length)
.Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The TFDebertaV2ForMaskedLM forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, TFDebertaV2ForMaskedLM
>>> import tensorflow as tf
>>> tokenizer = AutoTokenizer.from_pretrained("kamalkraj/deberta-v2-xlarge")
>>> model = TFDebertaV2ForMaskedLM.from_pretrained("kamalkraj/deberta-v2-xlarge")
>>> inputs = tokenizer("The capital of France is [MASK].", return_tensors="tf")
>>> logits = model(**inputs).logits
>>> # retrieve index of [MASK]
>>> mask_token_index = tf.where((inputs.input_ids == tokenizer.mask_token_id)[0])
>>> selected_logits = tf.gather_nd(logits[0], indices=mask_token_index)
>>> predicted_token_id = tf.math.argmax(selected_logits, axis=-1)
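To decode the prediction and compute the MLM loss, labels with non-masked positions set to -100 can be supplied; a short sketch continuing the example (it assumes the target sentence tokenizes to the same length as the masked one):
>>> predicted_token = tokenizer.decode(predicted_token_id)
>>> labels = tokenizer("The capital of France is Paris.", return_tensors="tf")["input_ids"]
>>> # keep only the label at the [MASK] position; positions set to -100 are ignored by the loss
>>> labels = tf.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)
>>> loss = model(**inputs, labels=labels).loss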
TFDebertaV2ForSequenceClassification
class transformers.TFDebertaV2ForSequenceClassification
< source >( config: DebertaV2Config *inputs **kwargs )
Parameters
- config (DebertaV2Config) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
DeBERTa Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It's built on top of BERT/RoBERTa with two improvements, i.e. disentangled attention and an enhanced mask decoder. With those two improvements, it outperforms BERT/RoBERTa on a majority of tasks with 80GB of pretraining data.
This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.
TensorFlow models and layers in transformers
accept two formats as input:
- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.
The reason the second format is supported is that Keras methods prefer this format when passing inputs to models
and layers. Because of this support, when using methods like model.fit()
things should “just work” for you - just
pass your inputs and labels in any format that model.fit()
supports! If, however, you want to use the second
format outside of Keras methods like fit()
and predict()
, such as when creating your own layers or models with
the Keras Functional
API, there are three possibilities you can use to gather all the input Tensors in the first
positional argument:
- a single Tensor with
input_ids
only and nothing else:model(input_ids)
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring:
model([input_ids, attention_mask])
ormodel([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring:
model({"input_ids": input_ids, "token_type_ids": token_type_ids})
Note that when creating models and layers with subclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!
call
< source >( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFSequenceClassifierOutput or tuple(tf.Tensor)
Parameters
- input_ids (
np.ndarray
,tf.Tensor
,List[tf.Tensor]
`Dict[str, tf.Tensor]
orDict[str, np.ndarray]
and each example must have the shape(batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
- attention_mask (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked.
- token_type_ids (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
- position_ids (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range[0, config.max_position_embeddings - 1]
. - inputs_embeds (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length, hidden_size)
, optional) — Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. - output_attentions (
bool
, optional) — Whether or not to return the attentions tensors of all attention layers. Seeattentions
under returned tensors for more detail. - output_hidden_states (
bool
, optional) — Whether or not to return the hidden states of all layers. Seehidden_states
under returned tensors for more detail. - return_dict (
bool
, optional) — Whether or not to return a `ModelOutput` instead of a plain tuple. - labels (
tf.Tensor
ornp.ndarray
of shape(batch_size,)
, optional) — Labels for computing the sequence classification/regression loss. Indices should be in[0, ..., config.num_labels - 1]
. Ifconfig.num_labels == 1
a regression loss is computed (Mean-Square loss), Ifconfig.num_labels > 1
a classification loss is computed (Cross-Entropy).
Returns
transformers.modeling_tf_outputs.TFSequenceClassifierOutput or tuple(tf.Tensor)
A transformers.modeling_tf_outputs.TFSequenceClassifierOutput or a tuple of tf.Tensor
(if
return_dict=False
is passed or when config.return_dict=False
) comprising various elements depending on the
configuration (DebertaV2Config) and inputs.
-
loss (
tf.Tensor
of shape(batch_size, )
, optional, returned whenlabels
is provided) — Classification (or regression if config.num_labels==1) loss. -
logits (
tf.Tensor
of shape(batch_size, config.num_labels)
) — Classification (or regression if config.num_labels==1) scores (before SoftMax). -
hidden_states (
tuple(tf.Tensor)
, optional, returned whenoutput_hidden_states=True
is passed or whenconfig.output_hidden_states=True
) — Tuple oftf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape(batch_size, sequence_length, hidden_size)
.Hidden-states of the model at the output of each layer plus the initial embedding outputs.
-
attentions (
tuple(tf.Tensor)
, optional, returned whenoutput_attentions=True
is passed or whenconfig.output_attentions=True
) — Tuple oftf.Tensor
(one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length)
.Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The TFDebertaV2ForSequenceClassification forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, TFDebertaV2ForSequenceClassification
>>> import tensorflow as tf
>>> tokenizer = AutoTokenizer.from_pretrained("kamalkraj/deberta-v2-xlarge")
>>> model = TFDebertaV2ForSequenceClassification.from_pretrained("kamalkraj/deberta-v2-xlarge")
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="tf")
>>> logits = model(**inputs).logits
>>> predicted_class_id = int(tf.math.argmax(logits, axis=-1)[0])
>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = TFDebertaV2ForSequenceClassification.from_pretrained("kamalkraj/deberta-v2-xlarge", num_labels=num_labels)
>>> labels = tf.constant(1)
>>> loss = model(**inputs, labels=labels).loss
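The predicted id can be mapped back to a label string through the configuration; a short sketch:
>>> predicted_label = model.config.id2label[predicted_class_id]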
TFDebertaV2ForTokenClassification
class transformers.TFDebertaV2ForTokenClassification
< source >( config: DebertaV2Config *inputs **kwargs )
Parameters
- config (DebertaV2Config) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
DeBERTa Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It's built on top of BERT/RoBERTa with two improvements, i.e. disentangled attention and an enhanced mask decoder. With those two improvements, it outperforms BERT/RoBERTa on a majority of tasks with 80GB of pretraining data.
This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.
TensorFlow models and layers in transformers
accept two formats as input:
- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.
The reason the second format is supported is that Keras methods prefer this format when passing inputs to models
and layers. Because of this support, when using methods like model.fit()
things should “just work” for you - just
pass your inputs and labels in any format that model.fit()
supports! If, however, you want to use the second
format outside of Keras methods like fit()
and predict()
, such as when creating your own layers or models with
the Keras Functional
API, there are three possibilities you can use to gather all the input Tensors in the first
positional argument:
- a single Tensor with
input_ids
only and nothing else:model(input_ids)
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring:
model([input_ids, attention_mask])
ormodel([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring:
model({"input_ids": input_ids, "token_type_ids": token_type_ids})
Note that when creating models and layers with subclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!
call
< source >( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFTokenClassifierOutput or tuple(tf.Tensor)
Parameters
- input_ids (
np.ndarray
,tf.Tensor
,List[tf.Tensor]
`Dict[str, tf.Tensor]
orDict[str, np.ndarray]
and each example must have the shape(batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
- attention_mask (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked.
- token_type_ids (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
- position_ids (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range[0, config.max_position_embeddings - 1]
. - inputs_embeds (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length, hidden_size)
, optional) — Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. - output_attentions (
bool
, optional) — Whether or not to return the attentions tensors of all attention layers. Seeattentions
under returned tensors for more detail. - output_hidden_states (
bool
, optional) — Whether or not to return the hidden states of all layers. Seehidden_states
under returned tensors for more detail. - return_dict (
bool
, optional) — Whether or not to return a `ModelOutput` instead of a plain tuple. - labels (
tf.Tensor
ornp.ndarray
of shape(batch_size, sequence_length)
, optional) — Labels for computing the token classification loss. Indices should be in[0, ..., config.num_labels - 1]
.
Returns
transformers.modeling_tf_outputs.TFTokenClassifierOutput or tuple(tf.Tensor)
A transformers.modeling_tf_outputs.TFTokenClassifierOutput or a tuple of tf.Tensor
(if
return_dict=False
is passed or when config.return_dict=False
) comprising various elements depending on the
configuration (DebertaV2Config) and inputs.
-
loss (
tf.Tensor
of shape(n,)
, optional, where n is the number of unmasked labels, returned whenlabels
is provided) — Classification loss. -
logits (
tf.Tensor
of shape(batch_size, sequence_length, config.num_labels)
) — Classification scores (before SoftMax). -
hidden_states (
tuple(tf.Tensor)
, optional, returned whenoutput_hidden_states=True
is passed or whenconfig.output_hidden_states=True
) — Tuple oftf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape(batch_size, sequence_length, hidden_size)
.Hidden-states of the model at the output of each layer plus the initial embedding outputs.
-
attentions (
tuple(tf.Tensor)
, optional, returned whenoutput_attentions=True
is passed or whenconfig.output_attentions=True
) — Tuple oftf.Tensor
(one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length)
.Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The TFDebertaV2ForTokenClassification forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, TFDebertaV2ForTokenClassification
>>> import tensorflow as tf
>>> tokenizer = AutoTokenizer.from_pretrained("kamalkraj/deberta-v2-xlarge")
>>> model = TFDebertaV2ForTokenClassification.from_pretrained("kamalkraj/deberta-v2-xlarge")
>>> inputs = tokenizer(
... "HuggingFace is a company based in Paris and New York", add_special_tokens=False, return_tensors="tf"
... )
>>> logits = model(**inputs).logits
>>> predicted_token_class_ids = tf.math.argmax(logits, axis=-1)
>>> # Note that tokens are classified rather than input words, which means that
>>> # there might be more predicted token classes than words.
>>> # Multiple token classes might account for the same word
>>> predicted_tokens_classes = [model.config.id2label[t] for t in predicted_token_class_ids[0].numpy().tolist()]
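For fine-tuning, per-token label ids of shape (batch_size, sequence_length) are passed via labels; here the model's own predictions are reused purely as a shape illustration:
>>> labels = predicted_token_class_ids  # values in [0, ..., config.num_labels - 1]
>>> loss = tf.math.reduce_mean(model(**inputs, labels=labels).loss)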
TFDebertaV2ForQuestionAnswering
class transformers.TFDebertaV2ForQuestionAnswering
< source >( config: DebertaV2Config *inputs **kwargs )
Parameters
- config (DebertaV2Config) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
DeBERTa Model with a span classification head on top for extractive question-answering tasks like SQuAD (linear layers on top of the hidden-states output to compute span start logits and span end logits).
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It's built on top of BERT/RoBERTa with two improvements, i.e. disentangled attention and an enhanced mask decoder. With those two improvements, it outperforms BERT/RoBERTa on a majority of tasks with 80GB of pretraining data.
This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.
TensorFlow models and layers in transformers
accept two formats as input:
- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.
The reason the second format is supported is that Keras methods prefer this format when passing inputs to models
and layers. Because of this support, when using methods like model.fit()
things should “just work” for you - just
pass your inputs and labels in any format that model.fit()
supports! If, however, you want to use the second
format outside of Keras methods like fit()
and predict()
, such as when creating your own layers or models with
the Keras Functional
API, there are three possibilities you can use to gather all the input Tensors in the first
positional argument:
- a single Tensor with
input_ids
only and nothing else:model(input_ids)
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring:
model([input_ids, attention_mask])
ormodel([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring:
model({"input_ids": input_ids, "token_type_ids": token_type_ids})
Note that when creating models and layers with subclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!
call
< source >( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None start_positions: np.ndarray | tf.Tensor | None = None end_positions: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput or tuple(tf.Tensor)
Parameters
- input_ids (
np.ndarray
,tf.Tensor
,List[tf.Tensor]
`Dict[str, tf.Tensor]
orDict[str, np.ndarray]
and each example must have the shape(batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary.Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
- attention_mask (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked.
- token_type_ids (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
- position_ids (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range[0, config.max_position_embeddings - 1]
. - inputs_embeds (
np.ndarray
ortf.Tensor
of shape(batch_size, sequence_length, hidden_size)
, optional) — Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. - output_attentions (
bool
, optional) — Whether or not to return the attentions tensors of all attention layers. Seeattentions
under returned tensors for more detail. - output_hidden_states (
bool
, optional) — Whether or not to return the hidden states of all layers. Seehidden_states
under returned tensors for more detail. - return_dict (
bool
, optional) — Whether or not to return a `ModelOutput` instead of a plain tuple. - start_positions (
tf.Tensor
ornp.ndarray
of shape(batch_size,)
, optional) — Labels for position (index) of the start of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length
). Positions outside of the sequence are not taken into account for computing the loss. - end_positions (
tf.Tensor
ornp.ndarray
of shape(batch_size,)
, optional) — Labels for position (index) of the end of the labelled span for computing the token classification loss. Positions are clamped to the length of the sequence (sequence_length
). Positions outside of the sequence are not taken into account for computing the loss.
Returns
transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput or tuple(tf.Tensor)
A transformers.modeling_tf_outputs.TFQuestionAnsweringModelOutput or a tuple of tf.Tensor
(if
return_dict=False
is passed or when config.return_dict=False
) comprising various elements depending on the
configuration (DebertaV2Config) and inputs.
-
loss (
tf.Tensor
of shape(batch_size, )
, optional, returned whenstart_positions
andend_positions
are provided) — Total span extraction loss is the sum of a Cross-Entropy for the start and end positions. -
start_logits (
tf.Tensor
of shape(batch_size, sequence_length)
) — Span-start scores (before SoftMax). -
end_logits (
tf.Tensor
of shape(batch_size, sequence_length)
) — Span-end scores (before SoftMax). -
hidden_states (
tuple(tf.Tensor)
, optional, returned whenoutput_hidden_states=True
is passed or whenconfig.output_hidden_states=True
) — Tuple oftf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape(batch_size, sequence_length, hidden_size)
.Hidden-states of the model at the output of each layer plus the initial embedding outputs.
-
attentions (
tuple(tf.Tensor)
, optional, returned whenoutput_attentions=True
is passed or whenconfig.output_attentions=True
) — Tuple oftf.Tensor
(one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length)
.Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The TFDebertaV2ForQuestionAnswering forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, TFDebertaV2ForQuestionAnswering
>>> import tensorflow as tf
>>> tokenizer = AutoTokenizer.from_pretrained("kamalkraj/deberta-v2-xlarge")
>>> model = TFDebertaV2ForQuestionAnswering.from_pretrained("kamalkraj/deberta-v2-xlarge")
>>> question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
>>> inputs = tokenizer(question, text, return_tensors="tf")
>>> outputs = model(**inputs)
>>> answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
>>> answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])
>>> predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
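The selected tokens can then be decoded back to text, and a training loss can be obtained by passing gold start/end token positions; the positions below simply reuse the predicted indices as placeholders:
>>> answer = tokenizer.decode(predict_answer_tokens)
>>> start_positions = tf.constant([answer_start_index])  # placeholder gold start position
>>> end_positions = tf.constant([answer_end_index])  # placeholder gold end position
>>> loss = model(**inputs, start_positions=start_positions, end_positions=end_positions).loss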
TFDebertaV2ForMultipleChoice
class transformers.TFDebertaV2ForMultipleChoice
< source >( config: DebertaV2Config *inputs **kwargs )
Parameters
- config (DebertaV2Config) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
DeBERTa Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.
The DeBERTa model was proposed in DeBERTa: Decoding-enhanced BERT with Disentangled Attention by Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen. It's built on top of BERT/RoBERTa with two improvements, i.e. disentangled attention and an enhanced mask decoder. With those two improvements, it outperforms BERT/RoBERTa on a majority of tasks with 80GB of pretraining data.
This model is also a keras.Model subclass. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matters related to general usage and behavior.
TensorFlow models and layers in transformers
accept two formats as input:
- having all inputs as keyword arguments (like PyTorch models), or
- having all inputs as a list, tuple or dict in the first positional argument.
The reason the second format is supported is that Keras methods prefer this format when passing inputs to models
and layers. Because of this support, when using methods like model.fit()
things should “just work” for you - just
pass your inputs and labels in any format that model.fit()
supports! If, however, you want to use the second
format outside of Keras methods like fit()
and predict()
, such as when creating your own layers or models with
the Keras Functional
API, there are three possibilities you can use to gather all the input Tensors in the first
positional argument:
- a single Tensor with
input_ids
only and nothing else:model(input_ids)
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring:
model([input_ids, attention_mask])
ormodel([input_ids, attention_mask, token_type_ids])
- a dictionary with one or several input Tensors associated to the input names given in the docstring:
model({"input_ids": input_ids, "token_type_ids": token_type_ids})
Note that when creating models and layers with subclassing then you don’t need to worry about any of this, as you can just pass inputs like you would to any other Python function!
call
< source >( input_ids: TFModelInputType | None = None attention_mask: np.ndarray | tf.Tensor | None = None token_type_ids: np.ndarray | tf.Tensor | None = None position_ids: np.ndarray | tf.Tensor | None = None inputs_embeds: np.ndarray | tf.Tensor | None = None output_attentions: Optional[bool] = None output_hidden_states: Optional[bool] = None return_dict: Optional[bool] = None labels: np.ndarray | tf.Tensor | None = None training: Optional[bool] = False ) → transformers.modeling_tf_outputs.TFMultipleChoiceModelOutput or tuple(tf.Tensor)
Parameters
- input_ids (
np.ndarray
,tf.Tensor
,List[tf.Tensor]
`Dict[str, tf.Tensor]
orDict[str, np.ndarray]
and each example must have the shape(batch_size, num_choices, sequence_length)
) — Indices of input sequence tokens in the vocabulary.Indices can be obtained using AutoTokenizer. See PreTrainedTokenizer.encode() and PreTrainedTokenizer.call() for details.
- attention_mask (
np.ndarray
ortf.Tensor
of shape(batch_size, num_choices, sequence_length)
, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked.
- token_type_ids (
np.ndarray
ortf.Tensor
of shape(batch_size, num_choices, sequence_length)
, optional) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 0 corresponds to a sentence A token,
- 1 corresponds to a sentence B token.
- position_ids (
np.ndarray
ortf.Tensor
of shape(batch_size, num_choices, sequence_length)
, optional) — Indices of positions of each input sequence tokens in the position embeddings. Selected in the range[0, config.max_position_embeddings - 1]
. - inputs_embeds (
np.ndarray
ortf.Tensor
of shape(batch_size, num_choices, sequence_length, hidden_size)
, optional) — Optionally, instead of passinginput_ids
you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix. - output_attentions (
bool
, optional) — Whether or not to return the attentions tensors of all attention layers. Seeattentions
under returned tensors for more detail. - output_hidden_states (
bool
, optional) — Whether or not to return the hidden states of all layers. Seehidden_states
under returned tensors for more detail. - return_dict (
bool
, optional) — Whether or not to return a `ModelOutput` instead of a plain tuple. - labels (
tf.Tensor
ornp.ndarray
of shape(batch_size,)
, optional) — Labels for computing the multiple choice classification loss. Indices should be in[0, ..., num_choices]
wherenum_choices
is the size of the second dimension of the input tensors. (Seeinput_ids
above)
Returns
transformers.modeling_tf_outputs.TFMultipleChoiceModelOutput or tuple(tf.Tensor)
A transformers.modeling_tf_outputs.TFMultipleChoiceModelOutput or a tuple of tf.Tensor
(if
return_dict=False
is passed or when config.return_dict=False
) comprising various elements depending on the
configuration (DebertaV2Config) and inputs.
-
loss (
tf.Tensor
of shape (batch_size, ), optional, returned whenlabels
is provided) — Classification loss. -
logits (
tf.Tensor
of shape(batch_size, num_choices)
) — num_choices is the second dimension of the input tensors. (see input_ids above).Classification scores (before SoftMax).
-
hidden_states (
tuple(tf.Tensor)
, optional, returned whenoutput_hidden_states=True
is passed or whenconfig.output_hidden_states=True
) — Tuple oftf.Tensor
(one for the output of the embeddings + one for the output of each layer) of shape(batch_size, sequence_length, hidden_size)
.Hidden-states of the model at the output of each layer plus the initial embedding outputs.
-
attentions (
tuple(tf.Tensor)
, optional, returned whenoutput_attentions=True
is passed or whenconfig.output_attentions=True
) — Tuple oftf.Tensor
(one for each layer) of shape(batch_size, num_heads, sequence_length, sequence_length)
.Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The TFDebertaV2ForMultipleChoice forward method, overrides the __call__
special method.
Although the recipe for forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this since the former takes care of running the pre and post processing steps while
the latter silently ignores them.
Example:
>>> from transformers import AutoTokenizer, TFDebertaV2ForMultipleChoice
>>> import tensorflow as tf
>>> tokenizer = AutoTokenizer.from_pretrained("kamalkraj/deberta-v2-xlarge")
>>> model = TFDebertaV2ForMultipleChoice.from_pretrained("kamalkraj/deberta-v2-xlarge")
>>> prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
>>> choice0 = "It is eaten with a fork and a knife."
>>> choice1 = "It is eaten while held in the hand."
>>> encoding = tokenizer([prompt, prompt], [choice0, choice1], return_tensors="tf", padding=True)
>>> inputs = {k: tf.expand_dims(v, 0) for k, v in encoding.items()}
>>> outputs = model(inputs) # batch size is 1
>>> # the linear classifier still needs to be trained
>>> logits = outputs.logits
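As with the PyTorch version, the highest-scoring choice is the argmax over the choice dimension; a minimal sketch:
>>> # logits has shape (batch_size, num_choices)
>>> predicted_choice = int(tf.math.argmax(logits, axis=-1)[0])  # 0 -> choice0, 1 -> choice1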