--- license: cc-by-nc-sa-4.0 language: - en library_name: transformers tags: - text-to-sql - text2sql - nlp2sql - nlp-to-sql - SQL --- # Model Card for text2sql LLM instruction finetuned for Text-to-SQL task. ## Model Details ### Model Description - **Developed by:** [dataeaze systems pvt ltd](https://www.dataeaze.io/) - **Funded by :** [dataeaze systems pvt ltd](https://www.dataeaze.io/) - **Shared by :** [dataeaze systems pvt ltd](https://www.dataeaze.io/) - **Model type:** LlamaForCausalLM - **Language(s) (NLP):** English - **License:** [cc-by-nc-sa-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en) Model is made available under non-commercial use for research purposes only. For commercial usage please connect at contactus@dataeaze.io - **Finetuned from model :** [CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf) ## Uses ### Direct Use Model can be used a tool to convert queries in expressed in natural language (English) to SQL statements ### Downstream Use The model could be used as the initial stage in a data analytics / business intelligence application pipeline. ### Out-of-Scope Use Model has been fine tuned on a specific task of converting English language statements to SQL queries. Any use beyond this is not guaranteed to be accurate. ## Bias, Risks, and Limitations - **Bias:** Trained for English language only. - **Risk:** Guardrails are reliant on the base models CodeLlama (Llama2). Finetuning could impact this behaviour. - **Limitations:** Intended to be a small model optimised for inference. Does not provide SoTA results on accuracy. ## How to Get Started with the Model Use the code below to get started with the model. ``` import torch from transformers import AutoModelForCausalLM, AutoTokenizer model = AutoModelForCausalLM.from_pretrained( "dataeaze/dataeaze-text2sql-codellama_7b_instruct-clinton_text_to_sql_v1", torch_dtype=torch.bfloat16, device_map='auto' ) tokenizer = AutoTokenizer.from_pretrained("dataeaze/dataeaze-text2sql-codellama_7b_instruct-clinton_text_to_sql_v1") # print("model device :", model.device) tokenizer.pad_token = tokenizer.eos_token model.eval() prompt = """ Below are sql tables schemas paired with instruction that describes a task. Using valid SQLite, write a response that appropriately completes the request for the provided tables. ### Instruction: How many transactions were made by a customer in a specific month? ### Database: RewardsProgramDB61 ### Input: CREATE SCHEMA RewardsProgram; CREATE TABLE Customer ( CustomerID INT NOT NULL AUTO_INCREMENT, FirstName VARCHAR(50) NOT NULL, LastName VARCHAR(50) NOT NULL, Email VARCHAR(100) UNIQUE NOT NULL, Phone VARCHAR(20) UNIQUE, DateOfBirth DATE, PRIMARY KEY (CustomerID) ); CREATE TABLE Membership ( MembershipID INT NOT NULL AUTO_INCREMENT, MembershipType VARCHAR(50) NOT NULL, DiscountPercentage DECIMAL(5, 2) NOT NULL, ValidFrom DATETIME, ValidTo DATETIME, CustomerID INT NOT NULL, PRIMARY KEY (MembershipID), FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID) ); CREATE TABLE Transaction ( TransactionID INT NOT NULL AUTO_INCREMENT, TransactionDate TIMESTAMP, TotalAmount DECIMAL(10, 2) NOT NULL, CustomerID INT NOT NULL, PRIMARY KEY (TransactionID), FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID) ); CREATE TABLE TransactionDetail ( TransactionDetailID INT NOT NULL AUTO_INCREMENT, TransactionID INT NOT NULL, ProductID INT NOT NULL, Quantity INT NOT NULL, UnitPrice DECIMAL(10, 2) NOT NULL, PRIMARY KEY (TransactionDetailID), FOREIGN KEY (TransactionID) REFERENCES Transaction(TransactionID), FOREIGN KEY (ProductID) REFERENCES Product(ProductID) ); CREATE TABLE Product ( ProductID INT NOT NULL AUTO_INCREMENT, ProductName VARCHAR(100) NOT NULL, UnitPrice DECIMAL(10, 2) NOT NULL, AvailableQuantity INT NOT NULL, CreatedDate DATETIME, PRIMARY KEY (ProductID) ); ALTER TABLE Membership ADD CONSTRAINT FK_Membership_Customer FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID); ALTER TABLE TransactionDetail ADD CONSTRAINT FK_TransactionDetail_Transaction FOREIGN KEY (TransactionID) REFERENCES Transaction(TransactionID); ALTER TABLE TransactionDetail ADD CONSTRAINT FK_TransactionDetail_Product FOREIGN KEY (ProductID) REFERENCES Product(ProductID);" """ input_ids = tokenizer(prompt, padding=True, return_tensors='pt') outputs = model.generate( input_ids=input_ids['input_ids'].to(model.device), attention_mask=input_ids['attention_mask'].to(model.device), max_new_tokens=3072, ) generated_query = tokenizer.decode(outputs[0], skip_special_tokens=True) print(generated_query) ``` ## Evaluation ### Testing Data & Metrics #### Testing Data [SPIDER dataset Test Set](https://yale-lily.github.io/spider) #### Metrics SQL queries are matched against the correct answer, with two types of evaluation * Execution with Values * Exact Set Match without Values ### Results ``` model-index: - name: dataeaze/dataeaze-text2sql-codellama_7b_instruct-dzsql results: - task: type: text-to-sql dataset: name: SPIDER 1.0 type: text-to-sql metrics: - name: Execution with Values type: Execution with Values value: 64.3 - name: Exact Set Match without Values type: Exact Set Match without Values value: 29.6 source: name: Spider 1.0 - Leaderboard url: https://yale-lily.github.io/spider ``` ## Model Card Authors * Suyash Chougule * Chittaranjan Rathod * Sourabh Daptardar ## Model Card Contact "dataeaze systems"