Generative AI research at GSL (1)

Areas of interest

  • core engine performance
  • self-driving DBs
  • large language models

LLMs: Natural Language Interfaces for DBs

NL to SQL

LLM → candidate programs → executable candidates with semantic re-ranking → interleaved candidates

Filtering and Interleaving
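
The slides do not spell out how filtering and interleaving work, so the following is only a minimal sketch under stated assumptions: "filtering" is read as dropping candidates that fail to execute against the database, and "interleaving" as a round-robin merge across groups of candidates that return the same result, so that the top-k list covers distinct interpretations. The function names, the toy `city` table, and the use of SQLite are all hypothetical and are not taken from the GSL system.

```python
import sqlite3
from collections import OrderedDict
from itertools import zip_longest


def execute_or_none(conn, sql):
    """Return the result rows as a hashable tuple, or None if the query fails."""
    try:
        return tuple(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None


def filter_and_interleave(conn, candidates, k=3):
    """Assumed reading of the slide: keep executable candidates, group them
    by execution result, then take one candidate per group in turn."""
    groups = OrderedDict()  # execution result -> candidates, in model rank order
    for sql in candidates:
        result = execute_or_none(conn, sql)
        if result is not None:  # filtering step: drop non-executable SQL
            groups.setdefault(result, []).append(sql)
    # Interleaving step: round-robin across the result groups.
    interleaved = [
        sql
        for layer in zip_longest(*groups.values())
        for sql in layer
        if sql is not None
    ]
    return interleaved[:k]


# Toy database and candidate list, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE city (name TEXT, population INTEGER);"
    "INSERT INTO city VALUES ('Oslo', 700000), ('Bergen', 290000);"
)
candidates = [
    "SELECT name FROM city ORDER BY population DESC LIMIT 1",
    "SELECT name FROM city WHERE population = (SELECT MAX(population) FROM city)",
    "SELECT nme FROM city",  # not executable: unknown column
    "SELECT name FROM city ORDER BY name LIMIT 1",
]
top = filter_and_interleave(conn, candidates)
for sql in top:
    print(sql)
```

In this toy run the third candidate is filtered out because it does not execute, and the fourth candidate moves up to second place because it returns a different result from the first two. Grouping by execution result is just one cheap stand-in for semantic re-ranking; the actual ranking signal used in the talk is not stated.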

NL2SQL: progress so far

  • best of breed:
    • competitive with state-of-the-art custom trained models
  • GSL solution:
    • generic model (Codex)
    • prompting techniques
    • top-3 interleaving techniques

Large Schemata

Ambiguity

KaggleDBQA dataset: 41.1% of the questions are ambiguous
  1. ambiguous mapping to the DB schema
  2. ambiguous mapping to the DB values
  3. ambiguous NL

Agreement Rate
  • Human annotators: 62%
  • GPT-3: 44%
  • GPT-4: 65%

Responsible AI

  • What context is the model allowed to use?
  • Bias and racism