|
--- |
|
license: cc-by-nc-sa-4.0 |
|
language: |
|
- en |
|
library_name: transformers |
|
tags: |
|
- text-to-sql |
|
- text2sql |
|
- nlp2sql |
|
- nlp-to-sql |
|
- SQL |
|
--- |
|
# Model Card for text2sql |
|
|
|
<!-- Provide a quick summary of what the model is/does. --> |
|
|
|
LLM instruction finetuned for Text-to-SQL task. |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
- **Developed by:** [dataeaze systems pvt ltd](https://www.dataeaze.io/) |
|
- **Funded by :** [dataeaze systems pvt ltd](https://www.dataeaze.io/) |
|
- **Shared by :** [dataeaze systems pvt ltd](https://www.dataeaze.io/) |
|
- **Model type:** LlamaForCausalLM |
|
- **Language(s) (NLP):** English |
|
- **License:** [cc-by-nc-sa-4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en) Model is made available under non-commercial use for research purposes only. For commercial usage please connect at [email protected] |
|
- **Finetuned from model :** [CodeLlama-7b-Instruct-hf](https://huggingface.co/codellama/CodeLlama-7b-Instruct-hf) |
|
|
|
|
|
## Uses |
|
|
|
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> |
|
|
|
### Direct Use |
|
|
|
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. --> |
|
Model can be used a tool to convert queries in expressed in natural language (English) to SQL statements |
|
|
|
|
|
### Downstream Use |
|
|
|
<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app --> |
|
The model could be used as the initial stage in a data analytics / business intelligence application pipeline. |
|
|
|
|
|
### Out-of-Scope Use |
|
|
|
<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. --> |
|
|
|
Model has been fine tuned on a specific task of converting English language statements to SQL queries. |
|
Any use beyond this is not guaranteed to be accurate. |
|
|
|
## Bias, Risks, and Limitations |
|
|
|
<!-- This section is meant to convey both technical and sociotechnical limitations. --> |
|
|
|
- **Bias:** Trained for English language only. |
|
- **Risk:** Guardrails are reliant on the base models CodeLlama (Llama2). Finetuning could impact this behaviour. |
|
- **Limitations:** Intended to be a small model optimised for inference. Does not provide SoTA results on accuracy. |
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
Use the code below to get started with the model. |
|
|
|
``` |
|
import torch |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
|
model = AutoModelForCausalLM.from_pretrained( |
|
"dataeaze/dataeaze-text2sql-codellama_7b_instruct-clinton_text_to_sql_v1", |
|
torch_dtype=torch.bfloat16, |
|
device_map='auto' |
|
) |
|
|
|
tokenizer = AutoTokenizer.from_pretrained("dataeaze/dataeaze-text2sql-codellama_7b_instruct-clinton_text_to_sql_v1") |
|
# print("model device :", model.device) |
|
tokenizer.pad_token = tokenizer.eos_token |
|
model.eval() |
|
|
|
prompt = """ Below are sql tables schemas paired with instruction that describes a task. |
|
Using valid SQLite, write a response that appropriately completes the request for the provided tables. |
|
### Instruction: How many transactions were made by a customer in a specific month? |
|
### Database: RewardsProgramDB61 |
|
### Input: |
|
CREATE SCHEMA RewardsProgram; |
|
|
|
CREATE TABLE Customer ( |
|
CustomerID INT NOT NULL AUTO_INCREMENT, |
|
FirstName VARCHAR(50) NOT NULL, |
|
LastName VARCHAR(50) NOT NULL, |
|
Email VARCHAR(100) UNIQUE NOT NULL, |
|
Phone VARCHAR(20) UNIQUE, |
|
DateOfBirth DATE, |
|
PRIMARY KEY (CustomerID) |
|
); |
|
|
|
CREATE TABLE Membership ( |
|
MembershipID INT NOT NULL AUTO_INCREMENT, |
|
MembershipType VARCHAR(50) NOT NULL, |
|
DiscountPercentage DECIMAL(5, 2) NOT NULL, |
|
ValidFrom DATETIME, |
|
ValidTo DATETIME, |
|
CustomerID INT NOT NULL, |
|
PRIMARY KEY (MembershipID), |
|
FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID) |
|
); |
|
|
|
CREATE TABLE Transaction ( |
|
TransactionID INT NOT NULL AUTO_INCREMENT, |
|
TransactionDate TIMESTAMP, |
|
TotalAmount DECIMAL(10, 2) NOT NULL, |
|
CustomerID INT NOT NULL, |
|
PRIMARY KEY (TransactionID), |
|
FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID) |
|
); |
|
|
|
CREATE TABLE TransactionDetail ( |
|
TransactionDetailID INT NOT NULL AUTO_INCREMENT, |
|
TransactionID INT NOT NULL, |
|
ProductID INT NOT NULL, |
|
Quantity INT NOT NULL, |
|
UnitPrice DECIMAL(10, 2) NOT NULL, |
|
PRIMARY KEY (TransactionDetailID), |
|
FOREIGN KEY (TransactionID) REFERENCES Transaction(TransactionID), |
|
FOREIGN KEY (ProductID) REFERENCES Product(ProductID) |
|
); |
|
|
|
CREATE TABLE Product ( |
|
ProductID INT NOT NULL AUTO_INCREMENT, |
|
ProductName VARCHAR(100) NOT NULL, |
|
UnitPrice DECIMAL(10, 2) NOT NULL, |
|
AvailableQuantity INT NOT NULL, |
|
CreatedDate DATETIME, |
|
PRIMARY KEY (ProductID) |
|
); |
|
|
|
ALTER TABLE Membership ADD CONSTRAINT FK_Membership_Customer FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID); |
|
|
|
ALTER TABLE TransactionDetail ADD CONSTRAINT FK_TransactionDetail_Transaction FOREIGN KEY (TransactionID) REFERENCES Transaction(TransactionID); |
|
|
|
ALTER TABLE TransactionDetail ADD CONSTRAINT FK_TransactionDetail_Product FOREIGN KEY (ProductID) REFERENCES Product(ProductID);" |
|
""" |
|
|
|
input_ids = tokenizer(prompt, padding=True, return_tensors='pt') |
|
outputs = model.generate( |
|
input_ids=input_ids['input_ids'].to(model.device), |
|
attention_mask=input_ids['attention_mask'].to(model.device), |
|
max_new_tokens=3072, |
|
) |
|
|
|
generated_query = tokenizer.decode(outputs[0], skip_special_tokens=True) |
|
print(generated_query) |
|
|
|
|
|
``` |
|
|
|
|
|
## Evaluation |
|
|
|
<!-- This section describes the evaluation protocols and provides the results. --> |
|
|
|
### Testing Data & Metrics |
|
|
|
#### Testing Data |
|
|
|
<!-- This should link to a Dataset Card if possible. --> |
|
|
|
[SPIDER dataset Test Set](https://yale-lily.github.io/spider) |
|
|
|
|
|
#### Metrics |
|
|
|
<!-- These are the evaluation metrics being used, ideally with a description of why. --> |
|
|
|
SQL queries are matched against the correct answer, with two types of evaluation |
|
* Execution with Values |
|
* Exact Set Match without Values |
|
|
|
### Results |
|
|
|
``` |
|
model-index: |
|
- name: dataeaze/dataeaze-text2sql-codellama_7b_instruct-dzsql |
|
results: |
|
- task: |
|
type: text-to-sql |
|
dataset: |
|
name: SPIDER 1.0 |
|
type: text-to-sql |
|
metrics: |
|
- name: Execution with Values |
|
type: Execution with Values |
|
value: 64.3 |
|
- name: Exact Set Match without Values |
|
type: Exact Set Match without Values |
|
value: 29.6 |
|
source: |
|
name: Spider 1.0 - Leaderboard |
|
url: https://yale-lily.github.io/spider |
|
``` |
|
|
|
|
|
## Model Card Authors |
|
|
|
* Suyash Chougule |
|
* Chittaranjan Rathod |
|
* Sourabh Daptardar |
|
|
|
## Model Card Contact |
|
|
|
"dataeaze systems" <[email protected]> |