Relational structures such as schema linking and schema encoding have been validated as a key component in qualitatively translating natural language into SQL queries. However, introducing these structural relations comes at a price: they often result in a specialized model structure, which largely prohibits using large pretrained models in text-to-SQL. To address this problem, we propose RASAT: a Transformer seq2seq architecture augmented with relation-aware self-attention that can leverage a variety of relational structures while effectively inheriting the pretrained parameters from the T5 model. Our model can incorporate almost all types of existing relations in the literature, and in addition, we propose introducing co-reference relations for the multi-turn scenario. Experimental results on three widely used text-to-SQL datasets, covering both single-turn and multi-turn scenarios, show that RASAT achieves state-of-the-art results across all three benchmarks (75.5% EX on Spider, 52.6% IEX on SParC, and 37.4% IEX on CoSQL).
- Graph-structure-based methods
- Seq2seq methods that simply fine-tune a pretrained language model (PLM)

Can we combine the two approaches?
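RASAT's answer is to inject the relational structures directly into the pretrained Transformer via relation-aware self-attention (Shaw et al., 2018), the same mechanism RAT-SQL builds on. Below is a minimal single-head NumPy sketch of that mechanism; the shapes, parameter names, and toy relation types are ours for illustration, not RASAT's actual implementation.

```python
import numpy as np

def relation_aware_self_attention(x, rel_ids, Wq, Wk, Wv, rel_k, rel_v):
    """Single-head relation-aware self-attention (Shaw et al., 2018).
    x:       (n, d)  input token representations
    rel_ids: (n, n)  integer id of the relation between tokens i and j
                     (e.g., "column belongs to table", "exact word match")
    rel_k, rel_v: (num_relations, d)  learned relation embeddings that
                     bias the keys and values for each (i, j) pair
    """
    n, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    rk, rv = rel_k[rel_ids], rel_v[rel_ids]          # both (n, n, d)
    # score(i, j) = q_i . (k_j + r^K_ij) / sqrt(d)
    scores = np.einsum("id,ijd->ij", q, k[None, :, :] + rk) / np.sqrt(d)
    alpha = np.exp(scores - scores.max(axis=-1, keepdims=True))
    alpha /= alpha.sum(axis=-1, keepdims=True)       # row-wise softmax
    # output_i = sum_j alpha_ij * (v_j + r^V_ij)
    return np.einsum("ij,ijd->id", alpha, v[None, :, :] + rv)

# Toy usage: 4 tokens, 3 relation types, hidden size 8.
rng = np.random.default_rng(0)
n, d, R = 4, 8, 3
x = rng.normal(size=(n, d))
rel_ids = rng.integers(0, R, size=(n, n))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
rel_k, rel_v = rng.normal(size=(R, d)), rng.normal(size=(R, d))
print(relation_aware_self_attention(x, rel_ids, Wq, Wk, Wv, rel_k, rel_v).shape)  # (4, 8)
```

In RASAT, the relation embeddings encode schema-linking, schema-encoding, and (in multi-turn settings) co-reference relations, while the remaining weights are inherited from T5.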
DS-1000 contains 1,000 problems originating from 451 unique StackOverflow problems. To defend against potential memorization, more than half of the DS-1000 problems are modified from the original StackOverflow problems; they include 152 surface perturbations, 235 semantic perturbations, and 162 difficult rewrites.
Below are more DS-1000 examples. For each example, the model needs to fill in code at the "[insert]" slot in the prompt; the code is then executed to pass the multi-criteria automatic evaluation, which includes both test cases and surface-form constraints. A reference solution is also provided.
A NumPy example problem involving randomness, which requires specialist knowledge to test.
We equipped each question collected from StackOverflow with a prompt, test cases, and an evaluation function, and call these problems Origin. To prevent models from simply recalling solutions seen during pretraining, we perturbed the questions in two ways: surface perturbations and semantic perturbations. To make DS-1000 more challenging, we additionally introduced Difficult Rewrites.
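To make the distinction concrete, consider the one-hot example shown further down this page: a surface perturbation might rephrase the question or change the prompt format while keeping the same reference solution, whereas a semantic perturbation changes what must be computed, e.g., asking for a reversed encoding where the leftmost column corresponds to the largest value. A hypothetical reference solution for such a reversed variant (our illustration, not an actual DS-1000 problem) could look like this:

```python
import numpy as np

a = np.array([1, 0, 3])
# Reversed one-hot: column 0 now corresponds to a.max(), the last column to 0.
b = np.zeros((a.size, a.max() + 1), dtype=int)
b[np.arange(a.size), a.max() - a] = 1
print(b)
# [[0 0 1 0]
#  [0 0 0 1]
#  [1 0 0 0]]
```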
We also provide an Insertion-style prompt and a Completion-style prompt for each question (including the perturbed ones).
Here is the performance of Codex-davinci-002 on DS-1000.
Here are more prompts; you can copy them and run the models in the OpenAI Playground.
Insertion:
Problem:
Let's say I have a 1d numpy positive integer array like this:
a = array([1,0,3])
I would like to encode this as a 2D one-hot array (for natural numbers)
b = array([[0,1,0,0], [1,0,0,0], [0,0,0,1]])
The leftmost element corresponds to 0 in `a` (NO MATTER whether 0 appears in `a` or not), and the rightmost vice versa.
Is there a quick way to do this only using numpy? Quicker than just looping over a to set elements of b, that is.
A:
<code>
import numpy as np
a = np.array([1, 0, 3])
</code>
BEGIN SOLUTION
<code>
[insert]
</code>
END SOLUTION
<code>
print(b)
</code>
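For reference, a standard vectorized NumPy idiom fills the [insert] slot above; we show it here with its surrounding context so it runs standalone (this is the well-known StackOverflow answer pattern; the official DS-1000 reference solution may differ in detail):

```python
import numpy as np

a = np.array([1, 0, 3])
# One-hot encode: row i gets a 1 in column a[i].
# Width a.max() + 1 ensures every value in 0..max(a) has a column,
# whether or not it occurs in `a`.
b = np.zeros((a.size, a.max() + 1), dtype=int)
b[np.arange(a.size), a] = 1
print(b)
# [[0 1 0 0]
#  [1 0 0 0]
#  [0 0 0 1]]
```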
Table 4 compares DS-1000 to other datasets. Notably, the average number of words per problem in DS-1000 is much larger than in other data-science-related datasets (e.g., DSP and CoNaLa).
More importantly, the problems in DS-1000 represent more diverse and naturalistic intents and context formats than can be found in any other dataset.
We note that, unlike generic Python code-generation benchmarks (MBPP and HumanEval), data science code-generation benchmarks have fewer test cases, since annotators must define program inputs with complex objects such as square matrices, classifiers, or dataframes, rather than simple primitives such as floats or lists.
Nevertheless, even a few test cases suffice for DS-1000: only 1.8% of the Codex-002-predicted solutions accepted by our evaluation are incorrect.
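For concreteness, here is a minimal, hypothetical sketch of what such a multi-criteria check (test cases plus a surface-form constraint) could look like; the function names and the AST-based loop ban are our illustration, not the actual DS-1000 evaluation harness:

```python
import ast

def surface_form_ok(code: str, banned=("For", "While")) -> bool:
    """Surface-form constraint: e.g., a problem may demand a vectorized
    answer and forbid explicit Python loops."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    return not any(type(node).__name__ in banned for node in ast.walk(tree))

def passes(context: str, candidate: str, test: str) -> bool:
    """Run the candidate inside the problem context, then execute one test
    (assumed to raise AssertionError on failure)."""
    try:
        exec(context + "\n" + candidate + "\n" + test, {})
    except Exception:
        return False
    return True

def evaluate(context: str, candidate: str, tests: list) -> bool:
    """Accept only if every test case and the surface-form check pass."""
    return surface_form_ok(candidate) and all(
        passes(context, candidate, t) for t in tests
    )

# Toy usage with the one-hot example from above:
ctx = "import numpy as np\na = np.array([1, 0, 3])"
cand = "b = np.zeros((a.size, a.max() + 1), dtype=int)\nb[np.arange(a.size), a] = 1"
print(evaluate(ctx, cand, ["assert b.tolist() == [[0,1,0,0],[1,0,0,0],[0,0,0,1]]"]))  # True
```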
| Rank | Date | Model | Style | Details | Score |
|---|---|---|---|---|---|
| 1 | Nov 08, 2022 | codex-davinci-002 (OpenAI, Chen et al. '21) | Insertion | t=0.2, top-p=0.95, len=1024 | 43.3 |
| 2 | Nov 08, 2022 | codex-davinci-002 (OpenAI, Chen et al. '21) | Completion | t=0.2, top-p=0.95, len=1024 | 39.2 |
| 3 | Nov 08, 2022 | codex-davinci-001 (OpenAI, Chen et al. '21) | Completion | t=0.2, top-p=0.95, len=1024 | 20.2 |
| 4 | Nov 08, 2022 | codex-cushman-001 (OpenAI, Chen et al. '21) | Completion | t=0.2, top-p=0.95, len=1024 | 18.1 |
| 5 | Nov 08, 2022 | CodeGen-6B (Salesforce Research, Nijkamp et al. '22) | Completion | t=0.2, top-p=0.95, len=1024 | 8.4 |
| 6 | Nov 08, 2022 | InCoder-6B (Facebook AI, Fried et al. '22) | Insertion | t=0.2, top-p=0.95, len=1024 | 7.5 |
| 7 | Nov 08, 2022 | InCoder-6B (Facebook AI, Fried et al. '22) | Completion | t=0.2, top-p=0.95, len=1024 | 7.4 |
We thank Noah A. Smith, Tianbao Xie, and Shuyang Jiang for their helpful feedback on this work.
@article{Lai2022DS1000,
title={DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation},
author={Yuhang Lai and Chengxi Li and Yiming Wang and Tianyi Zhang and Ruiqi Zhong and Luke Zettlemoyer and Scott Wen-tau Yih and Daniel Fried and Sida Wang and Tao Yu},
journal={ArXiv},
year={2022},
volume={abs/2211.11501}
}