Transformers documentation

Agents & Tools

Transformers Agent is an experimental API and may change at any time. Because the API and the underlying models are updated frequently, the results an agent returns can vary.

To learn more about agents and tools, be sure to read the introductory guide. This page contains the API documentation for the underlying classes.

Agents

We provide two types of agents, both based on the main Agent class:

CodeAgent acts in one shot: it generates code to solve the task, then executes it right away.
ReactAgent acts step by step, where each step consists of one thought, then one tool call and its execution. It comes in two classes:
- ReactJsonAgent writes its tool calls in JSON.
- ReactCodeAgent writes its tool calls in Python code.

Agent

class transformers.Agent

( tools: Union llm_engine: Callable = <transformers.agents.llm_engine.HfApiEngine object at 0x7fb5a241b940> system_prompt = 'You are an expert assistant who can solve any task using code blobs. You will be given a task to solve as best you can.\nTo do so, you have been given access to a list of tools: these tools are basically Python functions which you can call with code.\nTo solve the task, you must plan forward to proceed in a series of steps, in a cycle of \'Thought:\', \'Code:\', and \'Observation:\' sequences.\n\nAt each step, in the \'Thought:\' sequence, you should first explain your reasoning towards solving the task and the tools that you want to use.\nThen in the \'Code:\' sequence, you should write the code in simple Python. The code sequence must end with \'<end_action>\' sequence.\nDuring each intermediate step, you can use \'print()\' to save whatever important information you will then need.\nThese print outputs will then appear in the \'Observation:\' field, which will be available as input for the next step.\nIn the end you have to return a final answer using the `final_answer` tool.\n\nHere are a few examples using notional tools:\n---\nTask: "Generate an image of the oldest person in this document."\n\nThought: I will proceed step by step and use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.\nCode:\n```py\nanswer = document_qa(document=document, question="Who is the oldest person mentioned?")\nprint(answer)\n```<end_action>\nObservation: "The oldest person in the document is John Doe, a 55 year old lumberjack living in Newfoundland."\n\nThought: I will now generate an image showcasing the oldest person.\nCode:\n```py\nimage = image_generator("A portrait of John Doe, a 55-year-old man living in Canada.")\nfinal_answer(image)\n```<end_action>\n\n---\nTask: "What is the result of the following operation: 5 + 3 + 1294.678?"\n\nThought: I will use 
python code to compute the result of the operation and then return the final answer using the `final_answer` tool\nCode:\n```py\nresult = 5 + 3 + 1294.678\nfinal_answer(result)\n```<end_action>\n\n---\nTask: "Which city has the highest population: Guangzhou or Shanghai?"\n\nThought: I need to get the populations for both cities and compare them: I will use the tool `search` to get the population of both cities.\nCode:\n```py\npopulation_guangzhou = search("Guangzhou population")\nprint("Population Guangzhou:", population_guangzhou)\npopulation_shanghai = search("Shanghai population")\nprint("Population Shanghai:", population_shanghai)\n```<end_action>\nObservation:\nPopulation Guangzhou: [\'Guangzhou has a population of 15 million inhabitants as of 2021.\']\nPopulation Shanghai: \'26 million (2019)\'\n\nThought: Now I know that Shanghai has the highest population.\nCode:\n```py\nfinal_answer("Shanghai")\n```<end_action>\n\n---\nTask: "What is the current age of the pope, raised to the power 0.36?"\n\nThought: I will use the tool `wiki` to get the age of the pope, then raise it to the power 0.36.\nCode:\n```py\npope_age = wiki(query="current pope age")\nprint("Pope age:", pope_age)\n```<end_action>\nObservation:\nPope age: "The pope Francis is currently 85 years old."\n\nThought: I know that the pope is 85 years old. Let\'s compute the result using python code.\nCode:\n```py\npope_current_age = 85 ** 0.36\nfinal_answer(pope_current_age)\n```<end_action>\n\nAbove example were using notional tools that might not exist for you. On top of performing computations in the Python code snippets that you create, you have acces to those tools (and no other tool):\n\n<<tool_descriptions>>\n\n<<managed_agents_descriptions>>\n\nHere are the rules you should always follow to solve your task:\n1. Always provide a \'Thought:\' sequence, and a \'Code:\n```py\' sequence ending with \'```<end_action>\' sequence, else you will fail.\n2. Use only variables that you have defined!\n3. 
Always use the right arguments for the tools. DO NOT pass the arguments as a dict as in \'answer = wiki({\'query\': "What is the place where James Bond lives?"})\', but use the arguments directly as in \'answer = wiki(query="What is the place where James Bond lives?")\'.\n4. Take care to not chain too many sequential tool calls in the same code block, especially when the output format is unpredictable. For instance, a call to search has an unpredictable return format, so do not have another tool call that depends on its output in the same block: rather output results with print() to use them in the next block.\n5. Call a tool only when needed, and never re-do a tool call that you previously did with the exact same parameters.\n6. Don\'t name any new variable with the same name as a tool: for instance don\'t name a variable \'final_answer\'.\n7. Never create any notional variables in our code, as having these in your logs might derail you from the true variables.\n8. You can use imports in your code, but only from the following list of modules: <<authorized_imports>>\n9. The state persists between code executions: so if in one step you\'ve created variables or imported modules, these will all persist.\n10. Don\'t give up! You\'re in charge of solving the task, not providing directions to solve it.\n\nNow Begin! If you solve the task correctly, you will receive a reward of $1,000,000.\n' tool_description_template = None additional_args = {} max_iterations: int = 6 tool_parser = <function parse_json_tool_call at 0x7fb5a2234ee0> add_base_tools: bool = False verbose: int = 0 grammar: Dict = None managed_agents: List = None )

execute_tool_call

( tool_name: str arguments: Dict )

Parameters

tool_name (str) — Name of the Tool to execute (should be one from self.toolbox).
arguments (Dict[str, str]) — Arguments passed to the Tool.

Executes the tool with the provided arguments and returns the result. If an argument refers to a state variable, this method replaces it with the actual value from the state.
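The state substitution described above can be pictured with a small stand-alone sketch. This is not the library's implementation: the function name `execute_tool_call_sketch`, the plain-dict toolbox, and the rule "a string argument that matches a state key is replaced by the stored value" are assumptions made for illustration.

```python
from typing import Any, Callable, Dict


def execute_tool_call_sketch(
    toolbox: Dict[str, Callable[..., Any]],
    state: Dict[str, Any],
    tool_name: str,
    arguments: Dict[str, Any],
) -> Any:
    """Look up a tool by name, resolve state references, and call it."""
    if tool_name not in toolbox:
        raise KeyError(f"Unknown tool: {tool_name}")
    resolved = {}
    for key, value in arguments.items():
        # Assumption: a string argument that names a state variable
        # is swapped for the value stored under that name.
        if isinstance(value, str) and value in state:
            resolved[key] = state[value]
        else:
            resolved[key] = value
    return toolbox[tool_name](**resolved)


# Hypothetical toolbox and state, for illustration only.
toolbox = {"word_count": lambda text: len(text.split())}
state = {"document": "agents call tools step by step"}

count = execute_tool_call_sketch(toolbox, state, "word_count", {"text": "document"})
print(count)
```

Here the argument `"document"` is resolved to the stored text before the tool runs, so the tool counts the words of the document rather than of the variable name.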

extract_action

( llm_output: str split_token: str )

Parameters

llm_output (str) — Output of the LLM
split_token (str) — Separator for the action. Should match the example in the system prompt.

Parses the action from the LLM output.
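As a rough sketch of what splitting on `split_token` might look like, the function below separates a rationale from an action. The name `extract_action_sketch`, the returned tuple, and the error type are assumptions; the real method may behave differently.

```python
from typing import Tuple


def extract_action_sketch(llm_output: str, split_token: str) -> Tuple[str, str]:
    """Split model output into (rationale, action) around `split_token`."""
    parts = llm_output.split(split_token)
    if len(parts) < 2:
        raise ValueError(f"Separator {split_token!r} not found in the LLM output.")
    # Keep the last two pieces so that an earlier stray separator
    # does not shift the action.
    rationale, action = parts[-2], parts[-1]
    return rationale.strip(), action.strip()


# Hypothetical model output, for illustration only.
output = 'Thought: I should search.\nAction:\n{"action": "search", "action_input": "Shanghai population"}'
rationale, action = extract_action_sketch(output, "Action:")
print(rationale)
print(action)
```

The separator passed here (`"Action:"`) mirrors the format shown in the JSON agent's system prompt, which is why `split_token` should match the prompt's examples.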

run

( **kwargs )

To be implemented in the child class

write_inner_memory_from_logs

( summary_mode: Optional = False )

Reads past llm_outputs, actions, and observations or errors from the logs into a series of messages that can be used as input to the LLM.
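To make the idea of turning logs into messages concrete, here is a minimal sketch. The log layout (dicts with `llm_output`, `observation`, and `error` keys), the role names, and the function name `logs_to_messages` are all hypothetical; the library's own log and message formats are not shown on this page.

```python
from typing import Dict, List


def logs_to_messages(logs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Flatten step logs into chat messages (hypothetical log layout)."""
    messages = []
    for step in logs:
        if "llm_output" in step:
            messages.append({"role": "assistant", "content": step["llm_output"]})
        if "error" in step:
            # An error takes the place of the observation for that step.
            messages.append({"role": "user", "content": f"Error: {step['error']}"})
        elif "observation" in step:
            messages.append({"role": "user", "content": f"Observation: {step['observation']}"})
    return messages


logs = [
    {"llm_output": "Thought: search first.", "observation": "26 million (2019)"},
    {"llm_output": "Thought: try a calculation.", "error": "name 'x' is not defined"},
]
messages = logs_to_messages(logs)
for message in messages:
    print(message["role"], "|", message["content"])
```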

CodeAgent

class transformers.CodeAgent

( tools: List llm_engine: Callable = <transformers.agents.llm_engine.HfApiEngine object at 0x7fb5a241bbb0> system_prompt: str = 'You will be given a task to solve, your job is to come up with a series of simple commands in Python that will perform the task.\nTo help you, I will give you access to a set of tools that you can use. Each tool is a Python function and has a description explaining the task it performs, the inputs it expects and the outputs it returns.\nYou should first explain which tool you will use to perform the task and for what reason, then write the code in Python.\nEach instruction in Python should be a simple assignment. You can print intermediate results if it makes sense to do so.\nIn the end, use tool \'final_answer\' to return your answer, its argument will be what gets returned.\nYou can use imports in your code, but only from the following list of modules: <<authorized_imports>>\nBe sure to provide a \'Code:\' token, else the run will fail.\n\nTools:\n<<tool_descriptions>>\n\nExamples:\n---\nTask: "Answer the question in the variable `question` about the image stored in the variable `image`. 
The question is in French."\n\nThought: I will use the following tools: `translator` to translate the question into English and then `image_qa` to answer the question on the input image.\nCode:\n```py\ntranslated_question = translator(question=question, src_lang="French", tgt_lang="English")\nprint(f"The translated question is {translated_question}.")\nanswer = image_qa(image=image, question=translated_question)\nfinal_answer(f"The answer is {answer}")\n```<end_action>\n\n---\nTask: "Identify the oldest person in the `document` and create an image showcasing the result."\n\nThought: I will use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.\nCode:\n```py\nanswer = document_qa(document, question="What is the oldest person?")\nprint(f"The answer is {answer}.")\nimage = image_generator(answer)\nfinal_answer(image)\n```<end_action>\n\n---\nTask: "Generate an image using the text given in the variable `caption`."\n\nThought: I will use the following tool: `image_generator` to generate an image.\nCode:\n```py\nimage = image_generator(prompt=caption)\nfinal_answer(image)\n```<end_action>\n\n---\nTask: "Summarize the text given in the variable `text` and read it out loud."\n\nThought: I will use the following tools: `summarizer` to create a summary of the input text, then `text_reader` to read it out loud.\nCode:\n```py\nsummarized_text = summarizer(text)\nprint(f"Summary: {summarized_text}")\naudio_summary = text_reader(summarized_text)\nfinal_answer(audio_summary)\n```<end_action>\n\n---\nTask: "Answer the question in the variable `question` about the text in the variable `text`. 
Use the answer to generate an image."\n\nThought: I will use the following tools: `text_qa` to create the answer, then `image_generator` to generate an image according to the answer.\nCode:\n```py\nanswer = text_qa(text=text, question=question)\nprint(f"The answer is {answer}.")\nimage = image_generator(answer)\nfinal_answer(image)\n```<end_action>\n\n---\nTask: "Caption the following `image`."\n\nThought: I will use the following tool: `image_captioner` to generate a caption for the image.\nCode:\n```py\ncaption = image_captioner(image)\nfinal_answer(caption)\n```<end_action>\n\n---\nAbove example were using tools that might not exist for you. You only have acces to those Tools:\n<<tool_names>>\n\nRemember to make sure that variables you use are all defined.\nBe sure to provide a \'Code:\n```\' sequence before the code and \'```<end_action>\' after, else you will get an error.\nDO NOT pass the arguments as a dict as in \'answer = ask_search_agent({\'query\': "What is the place where James Bond lives?"})\', but use the arguments directly as in \'answer = ask_search_agent(query="What is the place where James Bond lives?")\'.\n\nNow Begin! If you solve the task correctly, you will receive a reward of $1,000,000.\n' tool_description_template: str = '\n- {{ tool.name }}: {{ tool.description }}\n Takes inputs: {{tool.inputs}}\n Returns an output of type: {{tool.output_type}}\n' grammar: Dict = None additional_authorized_imports: Optional = None **kwargs )

A class for an agent that solves the given task using a single block of code. It plans all its actions, then executes them all in one shot.
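The one-shot pattern can be sketched without the library: generate one program, run it once, and collect whatever is passed to `final_answer`. Everything below (`run_one_shot`, `fake_llm`, the `double` tool) is a hypothetical stand-in, and the bare `exec` call is for illustration only; it is not how untrusted model output should be run.

```python
from typing import Any, Callable, Dict


def run_one_shot(task: str, llm: Callable[[str], str], tools: Dict[str, Callable]) -> Any:
    """Ask the model for one program, then execute it a single time."""
    result: Dict[str, Any] = {}

    def final_answer(value: Any) -> None:
        # Stand-in for the `final_answer` tool named in the system prompt.
        result["value"] = value

    code = llm(task)  # one generation, no feedback loop
    namespace = {**tools, "final_answer": final_answer}
    exec(code, namespace)  # illustration only: do not exec untrusted code like this
    return result.get("value")


def fake_llm(task: str) -> str:
    # Hypothetical model that always answers with the same program.
    return "doubled = double(21)\nfinal_answer(doubled)"


answer = run_one_shot("Double 21.", fake_llm, {"double": lambda x: 2 * x})
print(answer)
```

Because there is no observation step, the generated program has to be right the first time; that is the trade-off against the step-by-step agents.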

parse_code_blob

( result: str )

Override this method if you want to change the way the code is cleaned in the run method.
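For orientation, a code-cleaning step of this kind could be as simple as pulling the first fenced block out of the model output. The sketch below is an assumption about what such a step might do, not the library's parser; `parse_code_blob_sketch` and `FENCE` are names introduced here.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable
CODE_PATTERN = re.compile(FENCE + r"(?:py|python)?\n(.*?)\n" + FENCE, re.DOTALL)


def parse_code_blob_sketch(result: str) -> str:
    """Return the contents of the first fenced code block in `result`."""
    match = CODE_PATTERN.search(result)
    if match is None:
        raise ValueError("No fenced code block found in the model output.")
    return match.group(1).strip()


# Hypothetical model output following the 'Code:' format from the system prompt.
llm_output = (
    "Thought: compute the power.\nCode:\n"
    + FENCE + "py\nresult = 2 ** 3\nfinal_answer(result)\n" + FENCE
    + "<end_action>"
)
code = parse_code_blob_sketch(llm_output)
print(code)
```

A subclass that overrides the real method could apply a different extraction rule, for example accepting unfenced code.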

run

( task: str return_generated_code: bool = False **kwargs )

Parameters

task (str) — The task to perform
return_generated_code (bool, optional, defaults to False) — Whether to return the generated code instead of running it
kwargs (additional keyword arguments, optional) — Any keyword argument to send to the agent when evaluating the code.

Runs the agent for the given task.

Example:

from transformers.agents import CodeAgent

agent = CodeAgent(tools=[])
agent.run("What is the result of 2 power 3.7384?")

React agents

class transformers.ReactAgent

( tools: List llm_engine: Callable = <transformers.agents.llm_engine.HfApiEngine object at 0x7fb5a241bd30> system_prompt: str = 'You are an expert assistant who can solve any task using code blobs. You will be given a task to solve as best you can.\nTo do so, you have been given access to a list of tools: these tools are basically Python functions which you can call with code.\nTo solve the task, you must plan forward to proceed in a series of steps, in a cycle of \'Thought:\', \'Code:\', and \'Observation:\' sequences.\n\nAt each step, in the \'Thought:\' sequence, you should first explain your reasoning towards solving the task and the tools that you want to use.\nThen in the \'Code:\' sequence, you should write the code in simple Python. The code sequence must end with \'<end_action>\' sequence.\nDuring each intermediate step, you can use \'print()\' to save whatever important information you will then need.\nThese print outputs will then appear in the \'Observation:\' field, which will be available as input for the next step.\nIn the end you have to return a final answer using the `final_answer` tool.\n\nHere are a few examples using notional tools:\n---\nTask: "Generate an image of the oldest person in this document."\n\nThought: I will proceed step by step and use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.\nCode:\n```py\nanswer = document_qa(document=document, question="Who is the oldest person mentioned?")\nprint(answer)\n```<end_action>\nObservation: "The oldest person in the document is John Doe, a 55 year old lumberjack living in Newfoundland."\n\nThought: I will now generate an image showcasing the oldest person.\nCode:\n```py\nimage = image_generator("A portrait of John Doe, a 55-year-old man living in Canada.")\nfinal_answer(image)\n```<end_action>\n\n---\nTask: "What is the result of the following operation: 5 + 3 + 1294.678?"\n\nThought: I will use 
python code to compute the result of the operation and then return the final answer using the `final_answer` tool\nCode:\n```py\nresult = 5 + 3 + 1294.678\nfinal_answer(result)\n```<end_action>\n\n---\nTask: "Which city has the highest population: Guangzhou or Shanghai?"\n\nThought: I need to get the populations for both cities and compare them: I will use the tool `search` to get the population of both cities.\nCode:\n```py\npopulation_guangzhou = search("Guangzhou population")\nprint("Population Guangzhou:", population_guangzhou)\npopulation_shanghai = search("Shanghai population")\nprint("Population Shanghai:", population_shanghai)\n```<end_action>\nObservation:\nPopulation Guangzhou: [\'Guangzhou has a population of 15 million inhabitants as of 2021.\']\nPopulation Shanghai: \'26 million (2019)\'\n\nThought: Now I know that Shanghai has the highest population.\nCode:\n```py\nfinal_answer("Shanghai")\n```<end_action>\n\n---\nTask: "What is the current age of the pope, raised to the power 0.36?"\n\nThought: I will use the tool `wiki` to get the age of the pope, then raise it to the power 0.36.\nCode:\n```py\npope_age = wiki(query="current pope age")\nprint("Pope age:", pope_age)\n```<end_action>\nObservation:\nPope age: "The pope Francis is currently 85 years old."\n\nThought: I know that the pope is 85 years old. Let\'s compute the result using python code.\nCode:\n```py\npope_current_age = 85 ** 0.36\nfinal_answer(pope_current_age)\n```<end_action>\n\nAbove example were using notional tools that might not exist for you. On top of performing computations in the Python code snippets that you create, you have acces to those tools (and no other tool):\n\n<<tool_descriptions>>\n\n<<managed_agents_descriptions>>\n\nHere are the rules you should always follow to solve your task:\n1. Always provide a \'Thought:\' sequence, and a \'Code:\n```py\' sequence ending with \'```<end_action>\' sequence, else you will fail.\n2. Use only variables that you have defined!\n3. 
Always use the right arguments for the tools. DO NOT pass the arguments as a dict as in \'answer = wiki({\'query\': "What is the place where James Bond lives?"})\', but use the arguments directly as in \'answer = wiki(query="What is the place where James Bond lives?")\'.\n4. Take care to not chain too many sequential tool calls in the same code block, especially when the output format is unpredictable. For instance, a call to search has an unpredictable return format, so do not have another tool call that depends on its output in the same block: rather output results with print() to use them in the next block.\n5. Call a tool only when needed, and never re-do a tool call that you previously did with the exact same parameters.\n6. Don\'t name any new variable with the same name as a tool: for instance don\'t name a variable \'final_answer\'.\n7. Never create any notional variables in our code, as having these in your logs might derail you from the true variables.\n8. You can use imports in your code, but only from the following list of modules: <<authorized_imports>>\n9. The state persists between code executions: so if in one step you\'ve created variables or imported modules, these will all persist.\n10. Don\'t give up! You\'re in charge of solving the task, not providing directions to solve it.\n\nNow Begin! If you solve the task correctly, you will receive a reward of $1,000,000.\n' tool_description_template: str = '\n- {{ tool.name }}: {{ tool.description }}\n Takes inputs: {{tool.inputs}}\n Returns an output of type: {{tool.output_type}}\n' grammar: Dict = None plan_type: Literal = 'default' planning_interval: Optional = None **kwargs )

This agent solves the given task step by step, using the ReAct framework: while the objective is not reached, the agent performs a cycle of thinking and acting. The action is parsed from the LLM output: it consists of calls to tools from the toolbox, with arguments chosen by the LLM engine.
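The think/act/observe cycle can be sketched as a plain loop. This is a simplified illustration under stated assumptions: the `Policy` callable stands in for the LLM engine, `react_loop` and `scripted_policy` are hypothetical names, and the default of 6 iterations simply echoes the `max_iterations` default shown in the Agent signature.

```python
from typing import Any, Callable, Dict, List, Tuple

# A policy maps the observations so far to (thought, tool_name, tool_arguments).
Policy = Callable[[List[str]], Tuple[str, str, Dict[str, Any]]]


def react_loop(policy: Policy, tools: Dict[str, Callable[..., Any]], max_iterations: int = 6) -> Any:
    """Alternate thinking and acting until `final_answer` is chosen."""
    observations: List[str] = []
    for _ in range(max_iterations):
        thought, tool_name, arguments = policy(observations)  # think
        if tool_name == "final_answer":
            return arguments["answer"]
        result = tools[tool_name](**arguments)  # act
        observations.append(str(result))  # observe
    return None  # iteration budget exhausted without a final answer


def scripted_policy(observations: List[str]) -> Tuple[str, str, Dict[str, Any]]:
    # Hypothetical stand-in for the LLM engine.
    if not observations:
        return "I need the sum first.", "add", {"a": 5, "b": 3}
    return "I have the sum.", "final_answer", {"answer": observations[-1]}


answer = react_loop(scripted_policy, {"add": lambda a, b: a + b})
print(answer)
```

Each observation is fed back into the next decision, which is what lets a step-by-step agent recover from an unexpected tool result.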

direct_run

( task: str )

Runs the agent in direct mode, returning outputs only at the end. It should be launched only from the run method.

planning_step

( task is_first_step: bool = False iteration: int = None )

Parameters

task (str) — The task to perform
is_first_step (bool) — If this step is not the first one, the plan should be an update over a previous plan.
iteration (int) — The number of the current step, used as an indication for the LLM.

Used periodically by the agent to plan the next steps to reach the objective.
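One plausible reading of "periodically" is that a planning step runs every `planning_interval` iterations, with the first one creating the plan and later ones updating it. The sketch below encodes that reading; the modulo rule and the function name `planning_schedule` are assumptions, not confirmed behavior.

```python
from typing import List, Optional, Tuple


def planning_schedule(max_iterations: int, planning_interval: Optional[int]) -> List[Tuple[int, bool]]:
    """List (iteration, is_first_step) pairs at which a planning step would run."""
    if planning_interval is None:
        return []  # planning disabled
    return [
        (iteration, iteration == 0)
        for iteration in range(max_iterations)
        if iteration % planning_interval == 0
    ]


schedule = planning_schedule(max_iterations=6, planning_interval=2)
print(schedule)
```

Under this assumption, only the entry with `is_first_step` set to True builds a plan from scratch; the others revise the previous plan.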

provide_final_answer

( task )

This method provides a final answer to the task, based on the logs of the agent’s interactions.

run

( task: str stream: bool = False reset: bool = True **kwargs )

Parameters

task (str) — The task to perform

Runs the agent for the given task.

Example:

from transformers.agents import ReactCodeAgent
agent = ReactCodeAgent(tools=[])
agent.run("What is the result of 2 power 3.7384?")

stream_run

( task: str )

Runs the agent in streaming mode, yielding steps as they are executed. It should be launched only from the run method.
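The difference between the streaming and direct modes comes down to a generator versus a plain return value. The sketch below uses hypothetical names (`stream_steps`, `direct`) and pre-computed step strings in place of real agent steps.

```python
from typing import Iterator, List


def stream_steps(step_outputs: List[str]) -> Iterator[str]:
    """Yield each step as soon as it is available (streaming mode)."""
    for output in step_outputs:
        yield output


def direct(step_outputs: List[str]) -> str:
    """Consume every step and return only the last one (direct mode)."""
    final = ""
    for final in stream_steps(step_outputs):
        pass
    return final


steps = ["Thought: search.", "Observation: 26 million (2019)", "Shanghai"]
streamed = list(stream_steps(steps))  # the caller sees every intermediate step
last = direct(steps)  # the caller sees only the final output
print(streamed)
print(last)
```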

class transformers.ReactJsonAgent

( tools: List llm_engine: Callable = <transformers.agents.llm_engine.HfApiEngine object at 0x7fb5a241bee0> system_prompt: str = 'You are an expert assistant who can solve any task using JSON tool calls. You will be given a task to solve as best you can.\nTo do so, you have been given access to the following tools: <<tool_names>>\nThe way you use the tools is by specifying a json blob, ending with \'<end_action>\'.\nSpecifically, this json should have an `action` key (name of the tool to use) and an `action_input` key (input to the tool).\n\nThe $ACTION_JSON_BLOB should only contain a SINGLE action, do NOT return a list of multiple actions. It should be formatted in json. Do not try to escape special characters. Here is the template of a valid $ACTION_JSON_BLOB:\n{\n "action": $TOOL_NAME,\n "action_input": $INPUT\n}<end_action>\n\nMake sure to have the $INPUT as a dictionary in the right format for the tool you are using, and do not put variable names as input if you can find the right values.\n\nYou should ALWAYS use the following format:\n\nThought: you should always think about one action to take. Then use the action as follows:\nAction:\n$ACTION_JSON_BLOB\nObservation: the result of the action\n... (this Thought/Action/Observation can repeat N times, you should take several steps when needed. The $ACTION_JSON_BLOB must only use a SINGLE action at a time.)\n\nYou can use the result of the previous action as input for the next action.\nThe observation will always be a string: it can represent a file, like "image_1.jpg".\nThen you can use it as input for the next action. You can do it for instance as follows:\n\nObservation: "image_1.jpg"\n\nThought: I need to transform the image that I received in the previous observation to make it green.\nAction:\n{\n "action": "image_transformer",\n "action_input": {"image": "image_1.jpg"}\n}<end_action>\n\nTo provide the final answer to the task, use an action blob with "action": "final_answer" tool. 
It is the only way to complete the task, else you will be stuck on a loop. So your final output should look like this:\nAction:\n{\n "action": "final_answer",\n "action_input": {"answer": "insert your final answer here"}\n}<end_action>\n\n\nHere are a few examples using notional tools:\n---\nTask: "Generate an image of the oldest person in this document."\n\nThought: I will proceed step by step and use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.\nAction:\n{\n "action": "document_qa",\n "action_input": {"document": "document.pdf", "question": "Who is the oldest person mentioned?"}\n}<end_action>\nObservation: "The oldest person in the document is John Doe, a 55 year old lumberjack living in Newfoundland."\n\n\nThought: I will now generate an image showcasing the oldest person.\nAction:\n{\n "action": "image_generator",\n "action_input": {"prompt": "A portrait of John Doe, a 55-year-old man living in Canada."}\n}<end_action>\nObservation: "image.png"\n\nThought: I will now return the generated image.\nAction:\n{\n "action": "final_answer",\n "action_input": "image.png"\n}<end_action>\n\n---\nTask: "What is the result of the following operation: 5 + 3 + 1294.678?"\n\nThought: I will use python code evaluator to compute the result of the operation and then return the final answer using the `final_answer` tool\nAction:\n{\n "action": "python_interpreter",\n "action_input": {"code": "5 + 3 + 1294.678"}\n}<end_action>\nObservation: 1302.678\n\nThought: Now that I know the result, I will now return it.\nAction:\n{\n "action": "final_answer",\n "action_input": "1302.678"\n}<end_action>\n\n---\nTask: "Which city has the highest population , Guangzhou or Shanghai?"\n\nThought: I need to get the populations for both cities and compare them: I will use the tool `search` to get the population of both cities.\nAction:\n{\n "action": "search",\n "action_input": "Population 
Guangzhou"\n}<end_action>\nObservation: [\'Guangzhou has a population of 15 million inhabitants as of 2021.\']\n\n\nThought: Now let\'s get the population of Shanghai using the tool \'search\'.\nAction:\n{\n "action": "search",\n "action_input": "Population Shanghai"\n}\nObservation: \'26 million (2019)\'\n\nThought: Now I know that Shanghai has a larger population. Let\'s return the result.\nAction:\n{\n "action": "final_answer",\n "action_input": "Shanghai"\n}<end_action>\n\n\nAbove example were using notional tools that might not exist for you. You only have acces to those tools:\n<<tool_descriptions>>\n\nHere are the rules you should always follow to solve your task:\n1. ALWAYS provide a \'Thought:\' sequence, and an \'Action:\' sequence that ends with <end_action>, else you will fail.\n2. Always use the right arguments for the tools. Never use variable names in the \'action_input\' field, use the value instead.\n3. Call a tool only when needed: do not call the search agent if you do not need information, try to solve the task yourself.\n4. Never re-do a tool call that you previously did with the exact same parameters.\n\nNow Begin! If you solve the task correctly, you will receive a reward of $1,000,000.\n' tool_description_template: str = '\n- {{ tool.name }}: {{ tool.description }}\n Takes inputs: {{tool.inputs}}\n Returns an output of type: {{tool.output_type}}\n' grammar: Dict = None planning_interval: Optional = None **kwargs )

This agent solves the given task step by step, using the ReAct framework: while the objective is not reached, the agent performs a cycle of thinking and acting. The tool calls are formulated by the LLM in JSON format, then parsed and executed.
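A JSON action blob of the shape described in the system prompt (an `action` key and an `action_input` key, terminated by `<end_action>`) can be parsed with the standard `json` module. The helper below is a sketch with a hypothetical name, `parse_json_action`; the default `tool_parser` in the library may handle more edge cases.

```python
import json
from typing import Any, Tuple


def parse_json_action(blob: str) -> Tuple[str, Any]:
    """Return (tool_name, tool_input) from a JSON action blob."""
    blob = blob.replace("<end_action>", "").strip()
    data = json.loads(blob)
    if "action" not in data or "action_input" not in data:
        raise ValueError("The blob needs both an 'action' and an 'action_input' key.")
    return data["action"], data["action_input"]


# Blob taken from the example in the system prompt above.
blob = '{\n "action": "image_transformer",\n "action_input": {"image": "image_1.jpg"}\n}<end_action>'
tool_name, tool_input = parse_json_action(blob)
print(tool_name, tool_input)
```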

step

( )

Performs one step in the ReAct framework: the agent thinks, acts, and observes the result. Errors are raised here; they are caught and logged in the run() method.

class transformers.ReactCodeAgent

( tools: List llm_engine: Callable = <transformers.agents.llm_engine.HfApiEngine object at 0x7fb5a22400a0> system_prompt: str = 'You are an expert assistant who can solve any task using code blobs. You will be given a task to solve as best you can.\nTo do so, you have been given access to a list of tools: these tools are basically Python functions which you can call with code.\nTo solve the task, you must plan forward to proceed in a series of steps, in a cycle of \'Thought:\', \'Code:\', and \'Observation:\' sequences.\n\nAt each step, in the \'Thought:\' sequence, you should first explain your reasoning towards solving the task and the tools that you want to use.\nThen in the \'Code:\' sequence, you should write the code in simple Python. The code sequence must end with \'<end_action>\' sequence.\nDuring each intermediate step, you can use \'print()\' to save whatever important information you will then need.\nThese print outputs will then appear in the \'Observation:\' field, which will be available as input for the next step.\nIn the end you have to return a final answer using the `final_answer` tool.\n\nHere are a few examples using notional tools:\n---\nTask: "Generate an image of the oldest person in this document."\n\nThought: I will proceed step by step and use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.\nCode:\n```py\nanswer = document_qa(document=document, question="Who is the oldest person mentioned?")\nprint(answer)\n```<end_action>\nObservation: "The oldest person in the document is John Doe, a 55 year old lumberjack living in Newfoundland."\n\nThought: I will now generate an image showcasing the oldest person.\nCode:\n```py\nimage = image_generator("A portrait of John Doe, a 55-year-old man living in Canada.")\nfinal_answer(image)\n```<end_action>\n\n---\nTask: "What is the result of the following operation: 5 + 3 + 1294.678?"\n\nThought: I will use 
python code to compute the result of the operation and then return the final answer using the `final_answer` tool\nCode:\n```py\nresult = 5 + 3 + 1294.678\nfinal_answer(result)\n```<end_action>\n\n---\nTask: "Which city has the highest population: Guangzhou or Shanghai?"\n\nThought: I need to get the populations for both cities and compare them: I will use the tool `search` to get the population of both cities.\nCode:\n```py\npopulation_guangzhou = search("Guangzhou population")\nprint("Population Guangzhou:", population_guangzhou)\npopulation_shanghai = search("Shanghai population")\nprint("Population Shanghai:", population_shanghai)\n```<end_action>\nObservation:\nPopulation Guangzhou: [\'Guangzhou has a population of 15 million inhabitants as of 2021.\']\nPopulation Shanghai: \'26 million (2019)\'\n\nThought: Now I know that Shanghai has the highest population.\nCode:\n```py\nfinal_answer("Shanghai")\n```<end_action>\n\n---\nTask: "What is the current age of the pope, raised to the power 0.36?"\n\nThought: I will use the tool `wiki` to get the age of the pope, then raise it to the power 0.36.\nCode:\n```py\npope_age = wiki(query="current pope age")\nprint("Pope age:", pope_age)\n```<end_action>\nObservation:\nPope age: "The pope Francis is currently 85 years old."\n\nThought: I know that the pope is 85 years old. Let\'s compute the result using python code.\nCode:\n```py\npope_current_age = 85 ** 0.36\nfinal_answer(pope_current_age)\n```<end_action>\n\nAbove example were using notional tools that might not exist for you. On top of performing computations in the Python code snippets that you create, you have acces to those tools (and no other tool):\n\n<<tool_descriptions>>\n\n<<managed_agents_descriptions>>\n\nHere are the rules you should always follow to solve your task:\n1. Always provide a \'Thought:\' sequence, and a \'Code:\n```py\' sequence ending with \'```<end_action>\' sequence, else you will fail.\n2. Use only variables that you have defined!\n3. 
Always use the right arguments for the tools. DO NOT pass the arguments as a dict as in \'answer = wiki({\'query\': "What is the place where James Bond lives?"})\', but use the arguments directly as in \'answer = wiki(query="What is the place where James Bond lives?")\'.\n4. Take care to not chain too many sequential tool calls in the same code block, especially when the output format is unpredictable. For instance, a call to search has an unpredictable return format, so do not have another tool call that depends on its output in the same block: rather output results with print() to use them in the next block.\n5. Call a tool only when needed, and never re-do a tool call that you previously did with the exact same parameters.\n6. Don\'t name any new variable with the same name as a tool: for instance don\'t name a variable \'final_answer\'.\n7. Never create any notional variables in our code, as having these in your logs might derail you from the true variables.\n8. You can use imports in your code, but only from the following list of modules: <<authorized_imports>>\n9. The state persists between code executions: so if in one step you\'ve created variables or imported modules, these will all persist.\n10. Don\'t give up! You\'re in charge of solving the task, not providing directions to solve it.\n\nNow Begin! If you solve the task correctly, you will receive a reward of $1,000,000.\n' tool_description_template: str = '\n- {{ tool.name }}: {{ tool.description }}\n Takes inputs: {{tool.inputs}}\n Returns an output of type: {{tool.output_type}}\n' grammar: Dict = None additional_authorized_imports: Optional = None planning_interval: Optional = None **kwargs )

This agent solves the given task step by step, using the ReAct framework: while the objective is not reached, the agent performs a cycle of thinking and acting. The tool calls are formulated by the LLM in code format, then parsed and executed.

step

( )

Performs one step in the ReAct framework: the agent thinks, acts, and observes the result. Errors are raised here; they are caught and logged in the run() method.

Tools

load_tool

transformers.load_tool

( task_or_repo_id model_repo_id = None token = None **kwargs )

Parameters

task_or_repo_id (str) — The task for which to load the tool or a repo ID of a tool on the Hub. Tasks implemented in Transformers are:
- "document_question_answering"
- "image_question_answering"
- "speech_to_text"
- "text_to_speech"
- "translation"
model_repo_id (str, optional) — Use this argument to use a different model than the default one for the tool you selected.
token (str, optional) — The token to identify you on hf.co. If unset, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
kwargs (additional keyword arguments, optional) — Additional keyword arguments that will be split in two: all arguments relevant to the Hub (such as cache_dir, revision, subfolder) will be used when downloading the files for your tool, and the others will be passed along to its init.

Main function to quickly load a tool, be it on the Hub or in the Transformers library.

Loading a tool means that you’ll download the tool and execute it locally. ALWAYS inspect the tool you’re downloading before loading it within your runtime, as you would do when installing a package using pip/npm/apt.
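The kwargs split described in the parameter list above can be sketched as follows. This is a minimal illustration, not the library's implementation: the HUB_KWARG_NAMES set and the split_kwargs helper are hypothetical, and the set contains only the three names the documentation gives as examples.

```python
# Assumption: only the example names from the documentation are treated as
# Hub-related here; the real list used by the library may be longer.
HUB_KWARG_NAMES = {"cache_dir", "revision", "subfolder"}


def split_kwargs(kwargs):
    """Split kwargs into (hub_kwargs, init_kwargs)."""
    hub_kwargs = {k: v for k, v in kwargs.items() if k in HUB_KWARG_NAMES}
    init_kwargs = {k: v for k, v in kwargs.items() if k not in HUB_KWARG_NAMES}
    return hub_kwargs, init_kwargs


hub_kwargs, init_kwargs = split_kwargs(
    {"revision": "main", "cache_dir": "/tmp/tools", "device": "cpu"}
)
```

Here revision and cache_dir would be used for the download, while device would be forwarded to the tool's init.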

Tool

class transformers.Tool

( *args **kwargs )

A base class for the functions used by the agent. Subclass this and implement the __call__ method as well as the following class attributes:

description (str) — A short description of what your tool does, the inputs it expects and the output(s) it will return. For instance ‘This is a tool that downloads a file from a url. It takes the url as input, and returns the text contained in the file’.
name (str) — A performative name that will be used for your tool in the prompt to the agent. For instance "text-classifier" or "image_generator".
inputs (Dict[str, Dict[str, Union[str, type]]]) — The dict of modalities expected for the inputs. Each entry has a type key and a description key. This is used by launch_gradio_demo or to make a nice space from your tool, and can also be used in the generated description for your tool.
output_type (type) — The type of the tool output. This is used by launch_gradio_demo or to make a nice space from your tool, and also can be used in the generated description for your tool.

You can also override the method setup() if your tool has an expensive operation to perform before being usable (such as loading a model). setup() will be called the first time you use your tool, but not at instantiation.
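The class attributes and the deferred setup() described above can be illustrated with a stand-alone sketch. It deliberately does not import Transformers: LazyTool is a hypothetical stand-in base class, and the _run hook and is_initialized flag are assumptions of this sketch, since this page does not show how the real Tool triggers setup() internally.

```python
class LazyTool:
    """Hypothetical stand-in for a tool base class with deferred setup."""

    def __init__(self):
        self.is_initialized = False

    def setup(self):
        # Override for expensive work such as loading a model.
        self.is_initialized = True

    def __call__(self, *args, **kwargs):
        # setup() runs on first use, not at instantiation.
        if not self.is_initialized:
            self.setup()
        return self._run(*args, **kwargs)

    def _run(self, *args, **kwargs):
        raise NotImplementedError


class UpperCaseTool(LazyTool):
    name = "upper_case"
    description = (
        "This is a tool that upper-cases a text. It takes the text as input, "
        "and returns the upper-cased text."
    )
    inputs = {"text": {"type": str, "description": "The text to upper-case."}}
    output_type = str

    def setup(self):
        # Count calls so the laziness is observable.
        self.setup_calls = getattr(self, "setup_calls", 0) + 1
        self.is_initialized = True

    def _run(self, text):
        return text.upper()


tool = UpperCaseTool()
initialized_before = tool.is_initialized
first = tool("hello")
second = tool("again")
```

After the two calls, setup() has run exactly once, which is the behavior the paragraph above describes.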

from_gradio

( gradio_tool )

Creates a Tool from a gradio tool.

from_hub

( repo_id: str model_repo_id: Optional = None token: Optional = None **kwargs )

Parameters

repo_id (str) — The name of the repo on the Hub where your tool is defined.
model_repo_id (str, optional) — If your tool uses a model and you want to use a different model than the default, you can pass a second repo ID or an endpoint url to this argument.
token (str, optional) — The token to identify you on hf.co. If unset, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
kwargs (additional keyword arguments, optional) — Additional keyword arguments that will be split in two: all arguments relevant to the Hub (such as cache_dir, revision, subfolder) will be used when downloading the files for your tool, and the others will be passed along to its init.

Loads a tool defined on the Hub.

Loading a tool from the Hub means that you’ll download the tool and execute it locally. ALWAYS inspect the tool you’re downloading before loading it within your runtime, as you would do when installing a package using pip/npm/apt.

from_langchain

( langchain_tool )

Creates a Tool from a langchain tool.

push_to_hub

( repo_id: str commit_message: str = 'Upload tool' private: Optional = None token: Union = None create_pr: bool = False )

Parameters

repo_id (str) — The name of the repository you want to push your tool to. It should contain your organization name when pushing to a given organization.
commit_message (str, optional, defaults to "Upload tool") — Message to commit while pushing.
private (bool, optional) — Whether or not the repository created should be private.
token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If unset, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
create_pr (bool, optional, defaults to False) — Whether or not to create a PR with the uploaded files or directly commit.

Upload the tool to the Hub.

For this method to work properly, your tool must have been defined in a separate module (not __main__).

For instance:

from my_tool_module import MyTool
my_tool = MyTool()
my_tool.push_to_hub("my-username/my-space")

save

( output_dir )

Parameters

output_dir (str) — The folder in which you want to save your tool.

Saves the relevant code files for your tool so it can be pushed to the Hub. This will copy the code of your tool in output_dir as well as autogenerate:

a config file named tool_config.json
an app.py file so that your tool can be converted to a space
a requirements.txt containing the names of the modules used by your tool (as detected when inspecting its code)

You should only use this method to save tools that are defined in a separate module (not __main__).

setup

( )

Override this method for any operation that is expensive and needs to be executed before you start using your tool, such as loading a big model.

Toolbox

class transformers.Toolbox

( tools: List add_base_tools: bool = False )

Parameters

tools (List[Tool]) — The list of tools to instantiate the toolbox with.
add_base_tools (bool, optional, defaults to False) — Whether to add the tools available within transformers to the toolbox.

The toolbox contains all tools that the agent can perform operations with, as well as a few methods to manage them.

add_tool

( tool: Tool )

Parameters

tool (Tool) — The tool to add to the toolbox.

Adds a tool to the toolbox.

clear_toolbox

( )

Clears the toolbox.

remove_tool

( tool_name: str )

Parameters

tool_name (str) — The name of the tool to remove from the toolbox.

Removes a tool from the toolbox.

show_tool_descriptions

( tool_description_template: str = None )

Parameters

tool_description_template (str, optional) — The template to use to describe the tools. If not provided, the default template will be used.

Returns the description of all tools in the toolbox.

update_tool

( tool: Tool )

Parameters

tool (Tool) — The tool to update in the toolbox.

Updates a tool in the toolbox according to its name.

PipelineTool

class transformers.PipelineTool

( model = None pre_processor = None post_processor = None device = None device_map = None model_kwargs = None token = None **hub_kwargs )

Parameters

model (str or PreTrainedModel, optional) — The name of the checkpoint to use for the model, or the instantiated model. If unset, will default to the value of the class attribute default_checkpoint.
pre_processor (str or Any, optional) — The name of the checkpoint to use for the pre-processor, or the instantiated pre-processor (can be a tokenizer, an image processor, a feature extractor or a processor). Will default to the value of model if unset.
post_processor (str or Any, optional) — The name of the checkpoint to use for the post-processor, or the instantiated post-processor (can be a tokenizer, an image processor, a feature extractor or a processor). Will default to the pre_processor if unset.
device (int, str or torch.device, optional) — The device on which to execute the model. Will default to any accelerator available (GPU, MPS etc…), the CPU otherwise.
device_map (str or dict, optional) — If passed along, will be used to instantiate the model.
model_kwargs (dict, optional) — Any keyword argument to send to the model instantiation.
token (str, optional) — The token to use as HTTP bearer authorization for remote files. If unset, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
hub_kwargs (additional keyword arguments, optional) — Any additional keyword argument to send to the methods that will load the data from the Hub.

A Tool tailored towards Transformer models. On top of the class attributes of the base class Tool, you will need to specify:

model_class (type) — The class to use to load the model in this tool.
default_checkpoint (str) — The default checkpoint that should be used when the user doesn’t specify one.
pre_processor_class (type, optional, defaults to AutoProcessor) — The class to use to load the pre-processor
post_processor_class (type, optional, defaults to AutoProcessor) — The class to use to load the post-processor (when different from the pre-processor).

decode

( outputs )

Uses the post_processor to decode the model output.

encode

( raw_inputs )

Uses the pre_processor to prepare the inputs for the model.

forward

( inputs )

Sends the inputs through the model.

setup

( )

Instantiates the pre_processor, model and post_processor if necessary.
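How encode, forward and decode fit together can be sketched with toy components. All names here (ToyTokenizer, toy_model, ToyPipelineTool) are hypothetical and no Transformers model is loaded; chaining the three methods inside __call__ is an assumption of this sketch, as this page documents the methods individually.

```python
class ToyTokenizer:
    """Hypothetical processor: maps characters to code points and back."""

    def encode(self, text):
        return [ord(ch) for ch in text]

    def decode(self, ids):
        return "".join(chr(i) for i in ids)


def toy_model(ids):
    """Hypothetical 'model': reverses the sequence of ids."""
    return list(reversed(ids))


class ToyPipelineTool:
    def __init__(self, model, pre_processor, post_processor=None):
        self.model = model
        self.pre_processor = pre_processor
        # Mirrors the documented default: fall back to the pre-processor.
        self.post_processor = (
            post_processor if post_processor is not None else pre_processor
        )

    def encode(self, raw_inputs):
        # Prepare the inputs for the model.
        return self.pre_processor.encode(raw_inputs)

    def forward(self, inputs):
        # Send the inputs through the model.
        return self.model(inputs)

    def decode(self, outputs):
        # Turn the model output back into a user-facing value.
        return self.post_processor.decode(outputs)

    def __call__(self, raw_inputs):
        return self.decode(self.forward(self.encode(raw_inputs)))


reverser = ToyPipelineTool(model=toy_model, pre_processor=ToyTokenizer())
reversed_text = reverser("agent")
```

Because no post_processor is passed, the same ToyTokenizer instance handles both encoding and decoding.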

launch_gradio_demo

transformers.launch_gradio_demo

( tool_class: Tool )

Parameters

tool_class (type) — The class of the tool for which to launch the demo.

Launches a gradio demo for a tool. The corresponding tool class needs to properly implement the class attributes inputs and output_type.

ToolCollection

class transformers.ToolCollection

( collection_slug: str token: Optional = None )

Parameters

collection_slug (str) — The collection slug referencing the collection.
token (str, optional) — The authentication token if the collection is private.

Tool collections let you load all Spaces from a collection so that they can be added to the agent’s toolbox.

Note: Only Spaces will be fetched, so feel free to add models and datasets to your collection if you’d like the collection to showcase them.

Example:

>>> from transformers import ToolCollection, ReactCodeAgent

>>> image_tool_collection = ToolCollection(collection_slug="huggingface-tools/diffusion-tools-6630bb19a942c2306a2cdb6f")
>>> agent = ReactCodeAgent(tools=[*image_tool_collection.tools], add_base_tools=True)

>>> agent.run("Please draw me a picture of rivers and lakes.")

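Engines

You are free to create and use your own engines with the agent framework. These engines must meet the following specification:

Follow the messages format (List[Dict[str, str]]) for the input and return a string.
Stop generating output before the sequences passed in the stop_sequences argument.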
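The two requirements above can be met by a plain Python callable. The following is a minimal sketch, assuming a hypothetical echo_engine that fakes generation by echoing the last user message; a real engine would call an LLM at that point.

```python
from typing import Dict, List, Optional


def echo_engine(
    messages: List[Dict[str, str]],
    stop_sequences: Optional[List[str]] = None,
) -> str:
    """Hypothetical engine: 'generates' by echoing the last user message."""
    last_user = next(
        m["content"] for m in reversed(messages) if m["role"] == "user"
    )
    output = f"You said: {last_user}"
    # Cut the text before any stop sequence so that none appears in the output.
    for stop in stop_sequences or []:
        index = output.find(stop)
        if index != -1:
            output = output[:index]
    return output


messages = [{"role": "user", "content": "Hello, how are you?"}]
full_reply = echo_engine(messages)
cut_reply = echo_engine(messages, stop_sequences=["how"])
```

The second call returns only the text that precedes "how", which is the truncation behavior an agent relies on when it passes stop_sequences.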

HfApiEngine

For convenience, we have added HfApiEngine, which implements the points above and uses inference endpoints to run the large language model.

>>> from transformers import HfApiEngine

>>> messages = [
...   {"role": "user", "content": "Hello, how are you?"},
...   {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
...   {"role": "user", "content": "No need to help, take it easy."},
... ]

>>> HfApiEngine()(messages, stop_sequences=["conversation"])

"That's very kind of you to say! It's always nice to have a relaxed "

class transformers.HfApiEngine

( model: str = 'meta-llama/Meta-Llama-3.1-8B-Instruct' )

This engine leverages Hugging Face’s Inference API service, either serverless or with a dedicated endpoint.

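Agent Types

Agents can handle any type of object passed between tools; tools are fully multimodal, so they can accept and return text, image, audio, video, and other types. To increase compatibility between tools, and to render these return values correctly in ipython (jupyter, colab, ipython notebooks, …), we implement wrapper classes around these types.

The wrapped objects should continue to behave as they did initially; a text object should still behave as a string, and an image object should still behave as a PIL.Image.

These types have three specific purposes:

Calling to_raw on the type should return the underlying object.
Calling to_string on the type should return the object as a string: this can be the string itself in the case of an AgentText, but in other cases it is the path to the serialized version of the object.
Displaying it in an ipython kernel should display the object correctly.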
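The to_raw and to_string contract can be sketched without any Transformers import. WrappedText and WrappedBytes below are hypothetical classes written for illustration only; they are not AgentText or AgentAudio, and the temporary-file serialization is an assumption of this sketch. The ipython display purpose is not covered.

```python
import os
import tempfile


class WrappedText(str):
    """Hypothetical text wrapper that still behaves as a plain string."""

    def to_raw(self):
        return str(self)

    def to_string(self):
        return str(self)


class WrappedBytes:
    """Hypothetical wrapper for a non-text object (raw bytes here)."""

    def __init__(self, value: bytes):
        self._raw = value
        self._path = None

    def to_raw(self):
        # Return the underlying object unchanged.
        return self._raw

    def to_string(self):
        # Serialize on first request and return the path to the file.
        if self._path is None:
            fd, self._path = tempfile.mkstemp(suffix=".bin")
            with os.fdopen(fd, "wb") as handle:
                handle.write(self._raw)
        return self._path


text = WrappedText("hello")
blob = WrappedBytes(b"fake-audio-bytes")
blob_path = blob.to_string()
```

The text wrapper can be used anywhere a str is expected, while the bytes wrapper hands out a file path as its string form.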

AgentText

class transformers.agents.agent_types.AgentText

( value )

Text type returned by the agent. Behaves as a string.

AgentImage

class transformers.agents.agent_types.AgentImage

( value )

Image type returned by the agent. Behaves as a PIL.Image.

save

( output_bytes format **params )

Parameters

output_bytes (bytes) — The output bytes to save the image to.
format (str) — The format to use for the output image. The format is the same as in PIL.Image.save.
**params — Additional parameters to pass to PIL.Image.save.

Saves the image to a file.

to_raw

( )

Returns the “raw” version of that object. In the case of an AgentImage, it is a PIL.Image.

to_string

( )

Returns the stringified version of that object. In the case of an AgentImage, it is a path to the serialized version of the image.

AgentAudio

class transformers.agents.agent_types.AgentAudio

( value samplerate = 16000 )

Audio type returned by the agent.

to_raw

( )

Returns the “raw” version of that object. It is a torch.Tensor object.

to_string

( )

Returns the stringified version of that object. In the case of an AgentAudio, it is a path to the serialized version of the audio.
