
Kaito Sugimoto

kaisugi

https://kaisugi.me

kaisugi

AI & ML interests

Japanese LLMs

Recent Activity

liked a Space about 24 hours ago

llm-jp/open-japanese-llm-leaderboard

replied to AkimfromParis's post 1 day ago

reacted to AkimfromParis's post with 👍 1 day ago

Organizations

Posts 5

Post


🚀 Llama-3-ELYZA-JP-8B

ELYZA, Inc. has developed two large language models (LLMs) for Japanese based on Meta's "Llama 3" series: "Llama-3-ELYZA-JP-70B" with 70 billion parameters and "Llama-3-ELYZA-JP-8B" with 8 billion parameters. Both models received additional pre-training and post-training to significantly improve their Japanese language capabilities.

Key Points:

Performance:
- Llama-3-ELYZA-JP-70B outperforms global models such as GPT-4, Claude 3 Sonnet, and Gemini 1.5 Flash on Japanese-language benchmarks.
- Llama-3-ELYZA-JP-8B performs on par with models such as GPT-3.5 Turbo and Claude 3 Haiku despite having fewer parameters.

Availability:
- The 8B model is available on the Hugging Face Hub and can be used for both research and commercial purposes under the Llama 3 Community License.

Methodology:
- ELYZA improved the Japanese performance of the Llama 3 models through additional training on high-quality Japanese corpora and instruction tuning with proprietary datasets.

Benchmarks:
- Evaluations using ELYZA Tasks 100 and Japanese MT-Bench showed significant improvements in Japanese language generation.

Inference Speed:
- To offset the slower inference that comes with larger model sizes, ELYZA implemented speculative decoding, which made inference with the 70B model up to 1.6 times faster.
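The post does not describe ELYZA's implementation. As a rough illustration of the general idea, here is a minimal sketch of the greedy variant of speculative decoding. It assumes two toy stand-in functions, `target_model` and `draft_model`; both are hypothetical and are not real LLMs.

The sketch works in three steps:
- A cheap draft model proposes `k` tokens.
- The large model checks them.
- The longest agreeing prefix is kept, plus one token from the large model.

The output is identical to greedy decoding with the large model alone. The speed-up comes from verifying several tokens per large-model pass. Sampling-based variants replace the exact-match check with a rejection-sampling acceptance rule, which this sketch omits.

```python
from typing import Callable, List, Tuple

# A "model" here is a toy greedy next-token function: context -> next token id.
# Both models below are hypothetical stand-ins for a small draft LLM and a large target LLM.
NextToken = Callable[[List[int]], int]


def target_model(ctx: List[int]) -> int:
    """Hypothetical expensive model: a deterministic toy rule."""
    return (sum(ctx) * 7 + len(ctx)) % 10


def draft_model(ctx: List[int]) -> int:
    """Hypothetical cheap model: agrees with the target except at every 4th position."""
    guess = target_model(ctx)
    return (guess + 1) % 10 if len(ctx) % 4 == 3 else guess


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain autoregressive greedy decoding: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding.

    Returns the generated sequence and the number of verification rounds.
    In a real system each round is a single batched forward pass of the target model.
    """
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < n_new:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target model checks each proposed position (one parallel pass in practice).
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                # 3. First mismatch: keep the target's own token and discard the rest.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # All k drafts accepted: the same pass also yields one bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    # Trim any tokens generated past the requested length.
    return out[: len(prompt) + n_new], rounds


if __name__ == "__main__":
    prompt = [1, 2, 3]
    spec, rounds = speculative_decode(target_model, draft_model, prompt, n_new=20)
    assert spec == greedy_decode(target_model, prompt, n_new=20)
    print(f"{rounds} target-model passes instead of 20")
```

Every accepted token is exactly the target model's greedy choice for its prefix, so quality is unchanged. How much time is saved depends on how often the draft model agrees with the target.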

Overall, ELYZA reports state-of-the-art performance on Japanese language tasks, with speculative decoding used to improve inference efficiency.

Model URLs:
- elyza/Llama-3-ELYZA-JP-8B
- elyza/Llama-3-ELYZA-JP-8B-AWQ
- elyza/Llama-3-ELYZA-JP-8B-GGUF

Blog post (in Japanese):
https://note.com/elyza/n/n360b6084fdbd

Post


🚀 KARAKURI LM 8x7B Instruct v0.1

KARAKURI Inc. has publicly released "KARAKURI LM 8x7B Instruct v0.1", the first LLM developed in Japan to support function calling and Retrieval-Augmented Generation (RAG). Acting as an AI agent, the model can autonomously handle tasks across various applications, which KARAKURI says significantly reduces implementation costs compared to traditional models.

Model Features:
- Autonomously chooses the optimal documents and databases for each task.
- Can be applied extensively in customer support: automating responses and processes, analyzing Voice of Customer (VoC) data, and predicting optimal outreach timing.
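The post does not specify KARAKURI LM's tool-call format. The following is a minimal sketch of what "function calling" means on the application side. It assumes the model emits a JSON object with hypothetical `name` and `arguments` fields; the application parses that object and routes it to one of its own functions. The tools `search_faq` and `get_order_status`, and their data, are invented for illustration.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical tools an application might expose to the model (toy data only).
def search_faq(query: str) -> str:
    faq = {"refund": "Refund requests are accepted within 30 days."}
    for key, answer in faq.items():
        if key in query.lower():
            return answer
    return "No matching FAQ entry."


def get_order_status(order_id: str) -> str:
    orders = {"A123": "shipped"}
    return orders.get(order_id, "unknown")


# Registry mapping tool names (as the model would emit them) to Python callables.
TOOLS: Dict[str, Callable[..., str]] = {
    "search_faq": search_faq,
    "get_order_status": get_order_status,
}


def dispatch(model_output: str) -> str:
    """Parse an assumed JSON tool call from the model and run the named tool."""
    call: Dict[str, Any] = json.loads(model_output)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


if __name__ == "__main__":
    print(dispatch('{"name": "get_order_status", "arguments": {"order_id": "A123"}}'))
```

In a full agent loop, the tool's return value would be fed back to the model as context for its next response. With RAG, a retrieval tool like `search_faq` supplies the documents the model grounds its answer on.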

Model URL:
karakuri-ai/karakuri-lm-8x7b-instruct-v0.1

Detailed press release (in Japanese):
https://karakuri.ai/seminar/news/karakuri-lm-8x7b-instruct-v0-1/

datasets

None public yet

Collections 1

PLaMo-100B: A Ground-Up Language Model Designed for Japanese Proficiency

LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

Tagengo: A Multilingual Chat Dataset

Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities

spaces 3

NLP2024 Title Search

Academic Paraphraser

NLP2023 Title Search

models 9

kaisugi/anlp_embedding_model

kaisugi/scitoricsbert_meanpooling

kaisugi/scitoricsbert

kaisugi/BERTRanker_CiRec_RefSeer_global

kaisugi/BERTRanker_CiRec_RefSeer

kaisugi/BERTRanker_CiRec_ACL600_global

kaisugi/BERTRanker_CiRec_ACL600

kaisugi/BERTRanker_CiRec_ACL200_global

kaisugi/BERTRanker_CiRec_ACL200
