Trajectory Transformer
This model is in maintenance mode only, and we do not accept any new PRs changing its code.
If you run into any issues running this model, please reinstall the last version that supported it: v4.30.0. You can do so by running the following command: `pip install -U transformers==4.30.0`.
Overview
The Trajectory Transformer model was proposed in Offline Reinforcement Learning as One Big Sequence Modeling Problem by Michael Janner, Qiyang Li and Sergey Levine.
The abstract from the paper is the following:
Reinforcement learning (RL) is typically concerned with estimating stationary policies or single-step models, leveraging the Markov property to factorize problems in time. However, we can also view RL as a generic sequence modeling problem, with the goal being to produce a sequence of actions that leads to a sequence of high rewards. Viewed in this way, it is tempting to consider whether high-capacity sequence prediction models that work well in other domains, such as natural language processing, can also provide effective solutions to the RL problem. To this end, we explore how RL can be tackled with the tools of sequence modeling, using a Transformer architecture to model distributions over trajectories and repurposing beam search as a planning algorithm. Framing RL as a sequence modeling problem simplifies a range of design decisions, allowing us to dispense with many of the components common in offline RL algorithms. We demonstrate the flexibility of this approach across long-horizon dynamics prediction, imitation learning, goal-conditioned RL, and offline RL. Further, we show that this approach can be combined with existing model-free algorithms to yield a state-of-the-art planner in sparse-reward, long-horizon tasks.
This model was contributed by CarlCochet. The original code can be found here.
Usage tips
This Transformer is used for deep reinforcement learning. To use it, you need to build sequences from the actions, states and rewards of all previous timesteps. The model treats all of these elements together as one big sequence (the trajectory).
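Concretely, each timestep's observation, action and reward are discretized into token ids and concatenated into one flat token sequence. The snippet below is a minimal sketch of that layout for a single timestep; the uniform binning and the `to_tokens` helper are illustrative assumptions rather than the exact preprocessing used for the released checkpoints.
>>> import numpy as np
>>> import torch
>>> observation_dim, action_dim = 17, 6
>>> def to_tokens(values, low=-1.0, high=1.0, num_bins=100):
...     """Illustrative uniform binning of continuous values into token ids in [0, num_bins - 1]."""
...     values = np.clip(values, low, high)
...     return ((values - low) / (high - low) * (num_bins - 1)).astype(np.int64)
>>> # One timestep: observation tokens, then action tokens, then the (scalar) reward token
>>> observation = np.random.uniform(-1, 1, observation_dim)
>>> action = np.random.uniform(-1, 1, action_dim)
>>> reward = np.array([0.5])
>>> timestep_tokens = np.concatenate([to_tokens(observation), to_tokens(action), to_tokens(reward)])
>>> trajectories = torch.tensor(timestep_tokens, dtype=torch.long).unsqueeze(0)  # shape: (1, 24)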
TrajectoryTransformerConfig
class transformers.TrajectoryTransformerConfig
< source >( vocab_size = 100 action_weight = 5 reward_weight = 1 value_weight = 1 block_size = 249 action_dim = 6 observation_dim = 17 transition_dim = 25 n_layer = 4 n_head = 4 n_embd = 128 embd_pdrop = 0.1 attn_pdrop = 0.1 resid_pdrop = 0.1 learning_rate = 0.0006 max_position_embeddings = 512 initializer_range = 0.02 layer_norm_eps = 1e-12 kaiming_initializer_range = 1 use_cache = True pad_token_id = 1 bos_token_id = 50256 eos_token_id = 50256 **kwargs )
Parameters
- vocab_size (`int`, optional, defaults to 100) — Vocabulary size of the TrajectoryTransformer model. Defines the number of different tokens that can be represented by the `trajectories` passed when calling TrajectoryTransformerModel.
- action_weight (`int`, optional, defaults to 5) — Weight of the action in the loss function.
- reward_weight (`int`, optional, defaults to 1) — Weight of the reward in the loss function.
- value_weight (`int`, optional, defaults to 1) — Weight of the value in the loss function.
- block_size (`int`, optional, defaults to 249) — Size of the blocks in the trajectory transformer.
- action_dim (`int`, optional, defaults to 6) — Dimension of the action space.
- observation_dim (`int`, optional, defaults to 17) — Dimension of the observation space.
- transition_dim (`int`, optional, defaults to 25) — Dimension of the transition space.
- n_layer (`int`, optional, defaults to 4) — Number of hidden layers in the Transformer encoder.
- n_head (`int`, optional, defaults to 4) — Number of attention heads for each attention layer in the Transformer encoder.
- n_embd (`int`, optional, defaults to 128) — Dimensionality of the embeddings and hidden states.
- resid_pdrop (`float`, optional, defaults to 0.1) — The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
- embd_pdrop (`float`, optional, defaults to 0.1) — The dropout ratio for the embeddings.
- attn_pdrop (`float`, optional, defaults to 0.1) — The dropout ratio for the attention.
- hidden_act (`str` or `function`, optional, defaults to `"gelu"`) — The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`, `"relu"`, `"selu"` and `"gelu_new"` are supported.
- max_position_embeddings (`int`, optional, defaults to 512) — The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
- initializer_range (`float`, optional, defaults to 0.02) — The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
- layer_norm_eps (`float`, optional, defaults to 1e-12) — The epsilon used by the layer normalization layers.
- kaiming_initializer_range (`float`, optional, defaults to 1) — A coefficient scaling the negative slope of the kaiming initializer rectifier for EinLinear layers.
- use_cache (`bool`, optional, defaults to `True`) — Whether or not the model should return the last key/values attentions (not used by all models). Only relevant if `config.is_decoder=True`.
This is the configuration class to store the configuration of a TrajectoryTransformerModel. It is used to instantiate a TrajectoryTransformer model according to the specified arguments, defining the model architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of the TrajectoryTransformer CarlCochet/trajectory-transformer-halfcheetah-medium-v2 architecture.
Configuration objects inherit from PretrainedConfig and can be used to control the model outputs. Read the documentation from PretrainedConfig for more information.
Example:
>>> from transformers import TrajectoryTransformerConfig, TrajectoryTransformerModel
>>> # Initializing a TrajectoryTransformer CarlCochet/trajectory-transformer-halfcheetah-medium-v2 style configuration
>>> configuration = TrajectoryTransformerConfig()
>>> # Initializing a model (with random weights) from the CarlCochet/trajectory-transformer-halfcheetah-medium-v2 style configuration
>>> model = TrajectoryTransformerModel(configuration)
>>> # Accessing the model configuration
>>> configuration = model.config
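The defaults above correspond to the halfcheetah-medium-v2 setting (17-dimensional observations, 6-dimensional actions). For an environment with different state and action sizes, the relevant dimensions can be overridden when building the configuration; the values below are purely illustrative and do not correspond to any released checkpoint.
>>> from transformers import TrajectoryTransformerConfig, TrajectoryTransformerModel
>>> # Hypothetical environment: 11-dimensional observations, 3-dimensional actions
>>> custom_configuration = TrajectoryTransformerConfig(
...     observation_dim=11,
...     action_dim=3,
...     transition_dim=16,  # observation + action + reward + value
... )
>>> custom_model = TrajectoryTransformerModel(custom_configuration)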
TrajectoryTransformerModel
class transformers.TrajectoryTransformerModel
< source >( config )
Parameters
- config (TrajectoryTransformerConfig) — Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out the from_pretrained() method to load the model weights.
The bare TrajectoryTransformer Model transformer outputting raw hidden-states without any specific head on top. This model is a PyTorch `torch.nn.Module` sub-class. Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage and behavior.
The full GPT language model, with a context size of `block_size`.
forward
< source >( trajectories: typing.Optional[torch.LongTensor] = None past_key_values: typing.Optional[typing.Tuple[typing.Tuple[torch.Tensor]]] = None targets: typing.Optional[torch.FloatTensor] = None attention_mask: typing.Optional[torch.FloatTensor] = None use_cache: typing.Optional[bool] = None output_attentions: typing.Optional[bool] = None output_hidden_states: typing.Optional[bool] = None return_dict: typing.Optional[bool] = None ) → transformers.models.deprecated.trajectory_transformer.modeling_trajectory_transformer.TrajectoryTransformerOutput or tuple(torch.FloatTensor)
Parameters
- trajectories (`torch.LongTensor` of shape `(batch_size, sequence_length)`) — Batch of trajectories, where a trajectory is a sequence of states, actions and rewards.
- past_key_values (`Tuple[Tuple[torch.Tensor]]` of length `config.n_layers`, optional) — Contains precomputed hidden-states (key and values in the attention blocks) as computed by the model (see `past_key_values` output below). Can be used to speed up sequential decoding. The `input_ids` which have their past given to this model should not be passed as `input_ids` as they have already been computed.
- targets (`torch.LongTensor` of shape `(batch_size, sequence_length)`, optional) — Desired targets used to compute the loss.
- attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)`, optional) — Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
  - 1 for tokens that are not masked,
  - 0 for tokens that are masked.
- use_cache (`bool`, optional) — If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see `past_key_values`).
- output_attentions (`bool`, optional) — Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned tensors for more detail.
- output_hidden_states (`bool`, optional) — Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for more detail.
- return_dict (`bool`, optional) — Whether or not to return a ModelOutput instead of a plain tuple.
Returns
`transformers.models.deprecated.trajectory_transformer.modeling_trajectory_transformer.TrajectoryTransformerOutput` or `tuple(torch.FloatTensor)`
A `transformers.models.deprecated.trajectory_transformer.modeling_trajectory_transformer.TrajectoryTransformerOutput` or a tuple of `torch.FloatTensor` (if `return_dict=False` is passed or when `config.return_dict=False`) comprising various elements depending on the configuration (TrajectoryTransformerConfig) and inputs.
- loss (`torch.FloatTensor` of shape `(1,)`, optional, returned when `targets` is provided) — Language modeling loss.
- logits (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.vocab_size)`) — Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
- past_key_values (`Tuple[Tuple[torch.Tensor]]`, optional, returned when `use_cache=True` is passed or when `config.use_cache=True`) — Tuple of length `config.n_layers`, containing tuples of tensors of shape `(batch_size, num_heads, sequence_length, embed_size_per_head)`. Contains pre-computed hidden-states (key and values in the attention blocks) that can be used (see `past_key_values` input) to speed up sequential decoding.
- hidden_states (`tuple(torch.FloatTensor)`, optional, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`) — Tuple of `torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer plus the initial embedding outputs.
- attentions (`tuple(torch.FloatTensor)`, optional, returned when `output_attentions=True` is passed or when `config.output_attentions=True`) — Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length, sequence_length)`. Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
The TrajectoryTransformerModel forward method overrides the `__call__` special method.
Although the recipe for the forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
Examples:
>>> from transformers import TrajectoryTransformerModel
>>> import numpy as np
>>> import torch
>>> device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
>>> model = TrajectoryTransformerModel.from_pretrained(
...     "CarlCochet/trajectory-transformer-halfcheetah-medium-v2"
... )
>>> model.to(device)
>>> model.eval()
>>> observations_dim, action_dim, batch_size = 17, 6, 256
>>> seq_length = observations_dim + action_dim + 1
>>> trajectories = torch.LongTensor([np.random.permutation(seq_length) for _ in range(batch_size)]).to(
...     device
... )
>>> targets = torch.LongTensor([np.random.permutation(seq_length) for _ in range(batch_size)]).to(device)
>>> outputs = model(
... trajectories,
... targets=targets,
... use_cache=True,
... output_attentions=True,
... output_hidden_states=True,
... return_dict=True,
... )
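The returned object exposes the fields documented above. Continuing from the example, the sketch below reads the logits and feeds `past_key_values` back in for a follow-up call, relying on the documented caching behavior; the random `next_tokens` are only a placeholder for real trajectory tokens.
>>> logits = outputs.logits  # (batch_size, sequence_length, config.vocab_size)
>>> cached = outputs.past_key_values
>>> # Pass only the new tokens together with the cache to avoid re-encoding the full prefix
>>> next_tokens = torch.randint(0, model.config.vocab_size, (batch_size, 1)).to(device)
>>> with torch.no_grad():
...     next_outputs = model(next_tokens, past_key_values=cached, use_cache=True, return_dict=True)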