# 12 SQL I: Single Table Queries

The Structured-Query-Language (SQL) is the predominant language used in database systems. It is tailored to the Relational data representation model.

SQL is a declarative language, we don’t write a procedure to compute a relation, we declare what the relation we want to compute looks like. The actual execution is determined and optimized by the database engine. However, there are clear mappings between parts of SQL queries and the operations we have defined so far as implemented in the tidyverse.

The basic construct in SQL is the so-called SFW construct: select-from-where which specifies:

• select: which attributes you want the answer to have
• from: which relation (table) you want the answer to be computed from
• where: what conditions you want to be satisfied by the rows (tuples) of the answer

E.g.: movies produced by disney in 1990: note the rename

select m.title, m.year
from movie m
where m.studioname = 'disney' and m.year = 1990

The select clause can contain expressions (this is paralleled by the mutate operation we saw previously)

• select title || ' (' || to_char(year) || ')' as titleyear
• select 2014 - year

The where clause support a large number of different predicates and combinations thereof (this is parallel to the filter operation)

• year between 1990 and 1995
• title like 'star wars%' title like 'star wars _'

We can include ordering, e.g., find distinct movies sorted by title

select distinct title
from movie
where studioname = 'disney' and year = 1990
order by title;

## 12.1 Group-by and summarize

SQL has an idiom for grouping and summarizing (conditioning as we called it before). Remember this is a very important concept that shows up in many data processing platforms

• What it does: Partition the tuples by the group attributes (year in this case), and do something (compute avg in this case) for each group
• Number of resulting tuples == Number of groups

E.g., compute the average movie length by year

select name, avg(length)
from movie
group by year

## 12.2 Subqueries

You can nest queries as an expression in an SFW query. We refer to these “subqueries” as “nested subquery”:

E.g., find movie with the maximum length

select title, year
from movie
where movie.length = (select max(length) from movie);

E.g., find movies with at least 5 stars: an example of a correlated subquery

select *
from movies m
where 5 >= (select count(*)
from starsIn si
where si.title = m.title and si.year = m.year);

The nested subquery counts the number of actors for that movie.

E.g., rank movies by their length.

select title, year, (select count(*)
from movies m2
where m1.length <= m2.length) as rank
from movies m1;

Key insight: A movie is ranked 5th if there are exactly 4 movies with longer length. Most database systems support some sort of a rank keyword for doing this. Notice that the above query doesn’t work in presence of ties etc.