Abstract
Language models have shown effectiveness in a variety of software applications, particularly in tasks related to workflow automation. These models possess the crucial ability to call functions, which is essential in creating AI agents. Despite the high performance of large-scale language models in cloud environments, they are often associated with concerns over privacy and cost. Current on-device models for function calling face issues with latency and accuracy. Our research presents a new method that empowers an on-device model with 2 billion parameters to surpass the performance of GPT-4 in both accuracy and latency, while decreasing the context length by 95%. Compared to Llama-7B with a RAG-based function-calling mechanism, our method improves latency by 35-fold. This method reduces latency to levels deemed suitable for deployment across a variety of edge devices in production environments, aligning with the performance requirements of real-world applications.
Community
Very interesting!
The single-symbol function name is a neat little trick; it feels so obvious in hindsight.
It would be very interesting to see just how much information we can encapsulate in a single symbol... and then re-use those symbols to increase throughput.
A kind of meta-language that represents a topology of layers of abstraction, on top of layers of abstraction.
Each time a new concept is learned, it gets a symbol, and that symbol is then used to further train the model. This new alphabet would effectively represent a map of the knowledge in the model.
Kind of like database normalisation for embeddings.
🤯
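For illustration, here is a minimal sketch of what that single-symbol mapping could look like at decoding time. The token names and the function registry below are hypothetical placeholders chosen for the example, not identifiers from the paper's released artifacts: the point is only that the model emits one dedicated token (plus arguments) instead of a verbose function signature.

```python
# Sketch of the single-symbol idea: each callable function is represented
# by one dedicated token, so the model only has to emit that token and its
# arguments. Token names and function names are hypothetical placeholders.
FUNCTIONAL_TOKENS = {
    "<fn_0>": "take_a_photo",
    "<fn_1>": "get_trending_news",
    "<fn_2>": "send_email",
}

def decode_function_call(generated: str) -> str | None:
    """Return the function name a generated functional token stands for, if any."""
    for token, function_name in FUNCTIONAL_TOKENS.items():
        if token in generated:
            return function_name
    return None

# Example: the model emits "<fn_1>('technology', 5)" and we recover the API name.
print(decode_function_call("<fn_1>('technology', 5)"))  # -> get_trending_news
```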
I think the idea of using special tokens makes a lot of sense. I think we underappreciate the power and expressiveness of token-space in LLMs.
If you look at techniques like LLaVA, registers for ViTs ("Vision Transformers Need Registers"), and prompt fine-tuning, all of them effectively hack the expressiveness of token-space. With long context, the opportunity to use token-space is even larger. If you look at models like BERT, almost 30% of the model is embedding weight, yet unlike most layers, adding just a single new token can be extremely efficient and extremely powerful. I think as a research community there is a lot of exciting stuff on the horizon here.
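As a concrete illustration of how cheap extending token-space is in practice, here is a short sketch using the Hugging Face transformers API. The base checkpoint and token names are assumptions made for the example, not the paper's exact training setup.

```python
# Sketch: add one dedicated special token per function, then grow the
# embedding matrix so each new token gets its own trainable vector.
# The checkpoint and token names are assumptions, not the paper's setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "google/gemma-2b"  # any small causal LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

new_tokens = ["<fn_0>", "<fn_1>", "<fn_2>", "<fn_end>"]
num_added = tokenizer.add_special_tokens(
    {"additional_special_tokens": new_tokens}
)

# Only the new embedding rows (plus subsequent fine-tuning) carry the new
# "alphabet"; the rest of the network is untouched at this point.
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} special tokens; vocab size is now {len(tokenizer)}")
```

Fine-tuning then only has to teach the model when to emit each symbol, which is what keeps the prompt short at inference time.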
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following (2024)
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (2024)
- AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls (2024)
- SwissNYF: Tool Grounded LLM Agents for Black Box Setting (2024)
- A Survey of using Large Language Models for Generating Infrastructure as Code (2024)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space.
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot recommend
Will the dataset for this be released in full as open source?
Octopus v2: Revolutionizing On-Device AI for Super Agents!
Links:
Subscribe: https://www.youtube.com/@Arxflix
Twitter: https://x.com/arxflix
LMNT (Partner): https://lmnt.com/