Lecture 2: Modern SQL

1971, the first relational query language called SQUARE was created. 1972 SEQUEL (Structured English Query Language) was created. SQL was added to ANSI standard in 1986. Current standard is SQL:2023.

  • SQL:2023 - property graph queries, multi-dim arrays
  • SQL:2016 - JSON, polymorphic tables etc.

Relational languages:

  • Data Manipulation Langauge (DML)
  • Data Definition Language (DDL)
  • Data Control Language (DCL)

Important: (with duplicates) not sets (no duplicates).

We should try to do everything on the database, in one big query.

Example database:

  • student table: sid, login, gpa etc.
  • enrolled table: sid, cid (course id), grade
  • course table: cid, name

Aggregates:

  • AVG(col). e.g. SELECT AVG(s.gpa) FROM student as s
  • MIN(col)
  • MAX(col)
  • COUNT(col). e.g. SELECT COUNT(LOGIN) as cnt FROM student WHERE login LIKE '%@cs. Equivalently, COUNT(1). Count number of rows where their login matches the pattern.

Groupby: Get average gpa by course id.

SELECT AVG(s.gpa), e.cid
    FROM enrolled as e JOIN student AS s
    ON e.sid = s.sid
GROUP BY e.cid

String operations.

  • LIKE is used for string matching. % matches any substring, including empty strings. _ matches any one character.
  • SQL-92 defines string functions.

Window functions. Perform a sliding calculation across a set of tuples that are related. Like an aggregation but tuples are not grouped into a single output tuple.

SELECT ... FUNC-NAME(...) OVER (...)
    FROM TABLE_NAME

e.g. Get row number per course id.

SELECT cid, sid
    ROW_NUMBER() OVER (PARTITION BY cid)
    FROM enrolled
ORDER BY cid

Nested queries. Invoke a query inside of another query to compose more complex computations. These are often difficult for the DBMS to optimize. e.g. This one below is a join written as a nested query.

outer query ->    SELECT name FROM student WHERE
                    sid IN (SELECT sid FROM enrolled) <- inner query

e.g. Get the names of students in '15-445':

SELECT name FROM student
    WHERE sid IN (
        SELECT sid FROM enrolled
        WHERE cid = '15-445'
    )

Lateral joins. LATERAL operator allows us to reference tables preceding the expression. e.g. below, the second expression can reference t1.

SELECT * FROM (SELECT 1 AS X) AS t1,
    LATERAL (SELECT t1.x+1 AS y) AS t2;

Common Table Expressions. Provides a way to write auxiliary statements for use in a larger query, i.e. a temp table just for one query.

WITH cteName (col1, col2) as (
    SELECT 1, 2
)
SELECT col1 + col2 FROM cteName

Demonstration of CTEs: Find student record with the highest id that is enrolled in at least one course. We use the maxId in the temporary table below.

WITH cteCourse (maxId) AS (
    SELECT MAX(sid) FROM enrolled
)
SELECT name FROM student, cteSource
    WHERE student.sid = cteSource.maxId

We can also use recursion with CTE. Print the sequence of numbers from 1 to 10.

WITH RECURSIVE cteSource (counter) AS (
    (SELECT 1)
    UNION
    (SELECT counter + 1 FROM cteSource
    WHERE counter < 10)
)
SELECT * FROM cteSource