I assume you used trlx to train this model? Could you share the training specifics or the implementation? Thanks!