
Access is gated: the repository is publicly accessible, but you must log in and accept the conditions to access its files and content.

GPT-J-3.48B-Kazakh


Kazakh Language GPT-J-3.48B

General-purpose Kazakh Language Model

Architecture: GPTJForCausalLM
Tokenizer: retrained GPT2Tokenizer (Vocabulary size: 50,400, Model Max Length: 2048)

Overview

This model is a 3.48B-parameter Kazakh language model based on the GPT-J architecture, designed for general-purpose language modeling. It has been trained on a diverse set of Kazakh texts and is intended to support natural language processing applications in Kazakh.

Usage Example

The model can be used with the Hugging Face Transformers library:

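from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# The repository is gated: accept its conditions and authenticate first (e.g. `huggingface-cli login`)
# Load the weights in FP16, the precision they are stored in; drop torch_dtype to load in float32 (e.g. on CPU)
model = AutoModelForCausalLM.from_pretrained("nur-dev/gpt-j-3.4B-kaz", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("nur-dev/gpt-j-3.4B-kaz")
model.eval()  # inference mode: disables dropout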
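Because the tokenizer's model max length is 2048 tokens, the prompt and the generated continuation together should stay within that window. The helper below is a minimal sketch of that bookkeeping; `MODEL_MAX_LENGTH` and `max_new_tokens_for` are hypothetical names, not part of the model repository or of Transformers.

```python
# Hypothetical constant: taken from the "Model Max Length: 2048" tokenizer setting in this card
MODEL_MAX_LENGTH = 2048


def max_new_tokens_for(prompt_len: int, requested: int, max_len: int = MODEL_MAX_LENGTH) -> int:
    """Hypothetical helper: cap generation so prompt + new tokens fit in the context window."""
    if prompt_len < 0 or requested < 0:
        raise ValueError("lengths must be non-negative")
    if prompt_len >= max_len:
        raise ValueError(f"prompt uses {prompt_len} tokens; the limit is {max_len}")
    # Never ask for more tokens than the remaining room in the window
    return min(requested, max_len - prompt_len)


if __name__ == "__main__":
    print(max_new_tokens_for(2000, 100))  # only 48 tokens of room are left
```

With the model and tokenizer loaded as above, the result could be passed to `generate`, for example `inputs = tokenizer(prompt, return_tensors="pt")` followed by `model.generate(**inputs, max_new_tokens=max_new_tokens_for(inputs["input_ids"].shape[1], 100))`, where `prompt` is your Kazakh input text and the 100-token request is illustrative.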

Training Details

The model is being trained with DeepSpeed ZeRO Stage 2, with the optimizer offloaded to the CPU and pinned memory enabled. The ZeRO configuration also enables allgather partitions (bucket size 200M), overlapped communication, reduce-scatter, an automatically sized reduce bucket, and contiguous gradients.

Hardware: 4 NVIDIA A100 GPUs (40GB each)
Training Steps: Approximately 180,000 (ongoing)
Epochs: 1 (ongoing)
Batch Size: 2 per device (for both training and evaluation)
Gradient Accumulation Steps: 4
Learning Rate: 5e-5
Weight Decay: 0.05
Learning Rate Scheduler: Cosine with Restarts
Warmup Steps: 15,000
Checkpointing Steps: Every 10,000 steps
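The ZeRO Stage 2 settings described here map onto DeepSpeed's JSON configuration. The dictionary below is a hedged reconstruction, assuming the standard DeepSpeed key names and the Hugging Face Trainer integration (which is what resolves the `"auto"` value); the card does not publish its actual config file, so precision, optimizer type, and scheduler are deliberately left out. It also makes the effective batch size per optimizer step explicit: 2 sequences per GPU × 4 accumulation steps × 4 GPUs = 32.

```python
import json

# Values from the model card; the config layout itself is an assumption
MICRO_BATCH_PER_GPU = 2  # "Batch Size: 2 per device"
GRAD_ACCUM_STEPS = 4     # "Gradient Accumulation Steps: 4"
NUM_GPUS = 4             # 4x NVIDIA A100 (40GB)

# Sequences contributing to each optimizer step: 2 * 4 * 4 = 32
GLOBAL_BATCH = MICRO_BATCH_PER_GPU * GRAD_ACCUM_STEPS * NUM_GPUS

ds_config = {
    "train_micro_batch_size_per_gpu": MICRO_BATCH_PER_GPU,
    "gradient_accumulation_steps": GRAD_ACCUM_STEPS,
    # DeepSpeed expects train_batch_size == micro batch * accumulation steps * number of GPUs
    "train_batch_size": GLOBAL_BATCH,
    "zero_optimization": {
        "stage": 2,
        # Optimizer offloaded to CPU with pinned host memory
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "allgather_partitions": True,
        "allgather_bucket_size": 200_000_000,  # "200M"
        "overlap_comm": True,
        "reduce_scatter": True,
        # "auto" is filled in by the Hugging Face Trainer integration; plain DeepSpeed needs a number
        "reduce_bucket_size": "auto",
        "contiguous_gradients": True,
    },
}

if __name__ == "__main__":
    print(json.dumps(ds_config, indent=2))
```

If the roughly 180,000 reported training steps are optimizer steps (an assumption), that corresponds to about 180,000 × 32 = 5.76 million sequences processed so far.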

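The "Cosine with Restarts" scheduler with 15,000 warmup steps can be sketched as a function of the step number. This is a pure-Python reimplementation of the formula behind Transformers' `get_cosine_with_hard_restarts_schedule_with_warmup`, assuming that is the scheduler behind the card's setting; the total step count (`TOTAL_STEPS`) and number of cycles (`NUM_CYCLES`) are not stated in the card and are placeholder assumptions, and `lr_at` is a hypothetical name.

```python
import math

BASE_LR = 5e-5          # "Learning Rate: 5e-5"
WARMUP_STEPS = 15_000   # "Warmup Steps: 15,000"
TOTAL_STEPS = 180_000   # assumption: training is ongoing, so the real horizon is unknown
NUM_CYCLES = 1          # assumption: the Transformers default; the card does not give a restart count


def lr_at(step: int, base_lr: float = BASE_LR, warmup: int = WARMUP_STEPS,
          total: int = TOTAL_STEPS, num_cycles: int = NUM_CYCLES) -> float:
    """Hypothetical helper: linear warmup, then cosine decay with hard restarts."""
    if step < warmup:
        # Linear ramp from 0 up to base_lr
        return base_lr * step / max(1, warmup)
    # Fraction of the post-warmup budget already used
    progress = (step - warmup) / max(1, total - warmup)
    if progress >= 1.0:
        return 0.0
    # Each cycle restarts the cosine curve at its peak
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0)))


if __name__ == "__main__":
    for s in (0, 7_500, 15_000, 97_500, 180_000):
        print(s, lr_at(s))
```

With `NUM_CYCLES = 1` the schedule has no actual restart and reduces to a single cosine decay after warmup; jumps back to the peak learning rate appear only when more than one cycle is configured, as `lr_at(97_500, num_cycles=2)` returning the full 5e-5 shows.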
Model Authors

Name: Kadyrbek Nurgali

Format: Safetensors
Model size: 3.48B params
Tensor type: FP16