Areas of interest
- core engine performance
- self-driving DBs
- large language models
LLMs: Natural Language Interfaces for Databases
NL to SQL
LLM → candidate programs → executable candidates with semantic re-ranking → interleaved candidates
Filtering and Interleaving
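The "executable candidates" step can be illustrated with a minimal sketch: run each candidate query against the database and discard the ones that fail. This is only an assumption about how such a filter might look, not the pipeline's actual implementation. SQLite stands in for the target database, and the `players` table, its rows, the candidate queries, and the helper name `filter_executable` are all invented for illustration.

```python
import sqlite3

def filter_executable(candidates, conn):
    """Keep only the candidate SQL strings that the database can execute.

    Hypothetical helper: a real system would also sandbox the queries
    (read-only connection, timeouts) before running model-generated SQL.
    """
    runnable = []
    for sql in candidates:
        try:
            conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # syntax error, unknown table or column, etc.
        runnable.append(sql)
    return runnable

# Toy in-memory database (invented schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT, team TEXT, goals INTEGER)")
conn.execute("INSERT INTO players VALUES ('Ada', 'Red', 3), ('Bo', 'Blue', 5)")

# Candidate programs as an LLM might return them (invented examples).
candidates = [
    "SELECT name FROM players WHERE goals > 4",
    "SELECT nme FROM players WHERE goals > 4",   # misspelled column
    "SELECT name FROM player",                   # missing table
    "SELECT name FROM players ORDER BY goals DESC LIMIT 1",
]

executable = filter_executable(candidates, conn)
print(executable)
```

Only the first and last candidates survive. Executability says nothing about whether a query answers the question, which is why the pipeline follows this step with semantic re-ranking.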
NL2SQL: progress so far
- best of breed:
  - competitive with state-of-the-art custom-trained models
- GSL solution:
  - generic model (Codex)
  - prompting techniques
  - top-3 interleaving techniques
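The notes do not define "interleaving" precisely. One plausible reading is a round-robin merge of ranked candidate lists (for example, one list per prompt variant) that skips duplicates and stops after three queries. The sketch below assumes that reading. The function name `interleave_top_k` and the sample queries are hypothetical.

```python
from itertools import zip_longest

def interleave_top_k(ranked_lists, k=3):
    """Round-robin over ranked candidate lists, skipping duplicates,
    until k distinct candidates have been collected."""
    if k <= 0:
        return []
    merged = []
    seen = set()
    # zip_longest yields the rank-1 entries of every list, then rank-2, ...
    for tier in zip_longest(*ranked_lists):
        for sql in tier:
            if sql is None or sql in seen:
                continue  # exhausted list or duplicate candidate
            seen.add(sql)
            merged.append(sql)
            if len(merged) == k:
                return merged
    return merged

# Ranked candidates from two hypothetical prompt variants.
prompt_a = ["SELECT a FROM t", "SELECT b FROM t"]
prompt_b = ["SELECT a FROM t", "SELECT c FROM t", "SELECT d FROM t"]

top3 = interleave_top_k([prompt_a, prompt_b], k=3)
print(top3)
```

Both lists agree on their top candidate, so it appears once, and the remaining two slots go to the rank-2 candidates of each list. Taking turns across lists keeps the final top 3 diverse instead of letting one prompt variant fill every slot.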
Large Schemata
Ambiguity
KaggleDBQA dataset: 41.1% of the questions are ambiguous
- ambiguous mapping to the DB schema
- ambiguous mapping to the DB values
- ambiguous NL
Agreement Rate
- Human annotators: 62%
- GPT-3: 44%
- GPT-4: 65%
Responsible AI
- what context is the model allowed to use?
- bias and racism