{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SQL Examples with Baseball\n",
    "\n",
    "Here we show SQL constructs using the Lahman baseball dataset (downloaded from https://github.com/jknecht/baseball-archive-sqlite). We also show how to use a SQL database inside a Jupyter notebook.\n",
    "\n",
    "First, we create a connection to the database. In this case, we are using a `SQLite` database. A good system to prototype database designs. To make the most of a database system, one would use some of the more powerful products: Oracle, Microsoft SQLServer, MySQL (MariaDB), PostgreSQL or other. In all cases, the way to create a connection to the server from Rmarkdown is the same."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import sqlite3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "con = sqlite3.connect(r'data/lahman2016.sqlite')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Select-From-Where\n",
    "\n",
    "First, we write a query to get batting statistics for Washington Nationals in 2010: note the table *rename*. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>playerID</th>\n",
       "      <th>yearID</th>\n",
       "      <th>H</th>\n",
       "      <th>AB</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>atilalu01</td>\n",
       "      <td>2010</td>\n",
       "      <td>1</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>balesco01</td>\n",
       "      <td>2010</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>batismi01</td>\n",
       "      <td>2010</td>\n",
       "      <td>1</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>bergmja01</td>\n",
       "      <td>2010</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>bernaro01</td>\n",
       "      <td>2010</td>\n",
       "      <td>102</td>\n",
       "      <td>414</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>bisenjo01</td>\n",
       "      <td>2010</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>brunebr01</td>\n",
       "      <td>2010</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>burkeja02</td>\n",
       "      <td>2010</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>burnese01</td>\n",
       "      <td>2010</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>cappsma01</td>\n",
       "      <td>2010</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    playerID  yearID    H   AB\n",
       "0  atilalu01    2010    1   25\n",
       "1  balesco01    2010    0    0\n",
       "2  batismi01    2010    1    8\n",
       "3  bergmja01    2010    0    0\n",
       "4  bernaro01    2010  102  414\n",
       "5  bisenjo01    2010    0    0\n",
       "6  brunebr01    2010    0    0\n",
       "7  burkeja02    2010    0    0\n",
       "8  burnese01    2010    0    0\n",
       "9  cappsma01    2010    0    1"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "SELECT b.playerId, b.yearId, b.H, b.AB\n",
    "FROM BATTING AS b\n",
    "WHERE teamID = 'WAS' AND yearID = 2010\n",
    "\"\"\"\n",
    "df = pd.read_sql_query(query, con)\n",
    "df.head(10)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Expressions\n",
    "\n",
    "The **select** clause can contain expressions (this is paralleled by the `assign` operation we saw previously)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>playerID</th>\n",
       "      <th>yearID</th>\n",
       "      <th>AB</th>\n",
       "      <th>BP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>aardsda01</td>\n",
       "      <td>2004</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>aardsda01</td>\n",
       "      <td>2006</td>\n",
       "      <td>2</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>aardsda01</td>\n",
       "      <td>2007</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>aardsda01</td>\n",
       "      <td>2008</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>aardsda01</td>\n",
       "      <td>2009</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    playerID  yearID  AB   BP\n",
       "0  aardsda01    2004   0  NaN\n",
       "1  aardsda01    2006   2  0.0\n",
       "2  aardsda01    2007   0  NaN\n",
       "3  aardsda01    2008   1  0.0\n",
       "4  aardsda01    2009   0  NaN"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "SELECT b.playerId, b.yearId, b.AB, 1.0 * b.H / b.AB AS BP\n",
    "FROM BATTING AS b\n",
    "\"\"\"\n",
    "df = pd.read_sql_query(query, con)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### WHERE predicates\n",
    "\n",
    "The **where** clause support a large number of different predicates and combinations thereof (this is parallel to the `query` operation)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>playerID</th>\n",
       "      <th>yearID</th>\n",
       "      <th>teamID</th>\n",
       "      <th>AB</th>\n",
       "      <th>BP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>abreubo01</td>\n",
       "      <td>2006</td>\n",
       "      <td>NYA</td>\n",
       "      <td>209</td>\n",
       "      <td>0.330144</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>abreubo01</td>\n",
       "      <td>2007</td>\n",
       "      <td>NYA</td>\n",
       "      <td>605</td>\n",
       "      <td>0.282645</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>abreubo01</td>\n",
       "      <td>2008</td>\n",
       "      <td>NYA</td>\n",
       "      <td>609</td>\n",
       "      <td>0.295567</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>aceveal01</td>\n",
       "      <td>2009</td>\n",
       "      <td>NYA</td>\n",
       "      <td>2</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>agbaybe01</td>\n",
       "      <td>2001</td>\n",
       "      <td>NYN</td>\n",
       "      <td>296</td>\n",
       "      <td>0.277027</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    playerID  yearID teamID   AB        BP\n",
       "0  abreubo01    2006    NYA  209  0.330144\n",
       "1  abreubo01    2007    NYA  605  0.282645\n",
       "2  abreubo01    2008    NYA  609  0.295567\n",
       "3  aceveal01    2009    NYA    2  0.000000\n",
       "4  agbaybe01    2001    NYN  296  0.277027"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "SELECT b.playerId, b.yearID, b. teamId, b.AB, 1.0 * b.H / b.AB AS BP\n",
    "FROM BATTING AS b\n",
    "WHERE b.AB > 0 AND\n",
    "  b.yearID > 2000 AND\n",
    "  b.yearID < 2010 AND \n",
    "  b.teamID LIKE 'NY%'\n",
    "\"\"\"\n",
    "df = pd.read_sql_query(query, con)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ORDERING\n",
    "\n",
    "We can include ordering (parallel to `sort_values`)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>playerID</th>\n",
       "      <th>yearID</th>\n",
       "      <th>teamID</th>\n",
       "      <th>AB</th>\n",
       "      <th>BP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>soriaal01</td>\n",
       "      <td>2002</td>\n",
       "      <td>NYA</td>\n",
       "      <td>696</td>\n",
       "      <td>0.300287</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>reyesjo01</td>\n",
       "      <td>2005</td>\n",
       "      <td>NYN</td>\n",
       "      <td>696</td>\n",
       "      <td>0.272989</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>reyesjo01</td>\n",
       "      <td>2008</td>\n",
       "      <td>NYN</td>\n",
       "      <td>688</td>\n",
       "      <td>0.296512</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>soriaal01</td>\n",
       "      <td>2003</td>\n",
       "      <td>NYA</td>\n",
       "      <td>682</td>\n",
       "      <td>0.290323</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>reyesjo01</td>\n",
       "      <td>2007</td>\n",
       "      <td>NYN</td>\n",
       "      <td>681</td>\n",
       "      <td>0.280470</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    playerID  yearID teamID   AB        BP\n",
       "0  soriaal01    2002    NYA  696  0.300287\n",
       "1  reyesjo01    2005    NYN  696  0.272989\n",
       "2  reyesjo01    2008    NYN  688  0.296512\n",
       "3  soriaal01    2003    NYA  682  0.290323\n",
       "4  reyesjo01    2007    NYN  681  0.280470"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "SELECT b.playerId, b.yearID, b. teamId, b.AB, 1.0 * b.H / b.AB AS BP\n",
    "FROM BATTING AS b\n",
    "WHERE b.AB > 0 AND\n",
    "  b.yearID > 2000 AND\n",
    "  b.yearID < 2010 AND \n",
    "  b.teamID LIKE 'NY%'\n",
    "ORDER BY b.AB DESC, BP DESC\n",
    "\"\"\"\n",
    "df = pd.read_sql_query(query, con)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Group_by and Summarize\n",
    "\n",
    "- What it does: Partition the tuples by the group attributes (*teamId* and *yearId* in this case), and do something (*compute avg* in this case) for each group\n",
    "- Number of resulting tuples == Number of groups"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>teamID</th>\n",
       "      <th>yearID</th>\n",
       "      <th>AVE_BP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>TBA</td>\n",
       "      <td>2007</td>\n",
       "      <td>0.289207</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>MIN</td>\n",
       "      <td>2008</td>\n",
       "      <td>0.287575</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>SEA</td>\n",
       "      <td>2007</td>\n",
       "      <td>0.284548</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>SLN</td>\n",
       "      <td>2003</td>\n",
       "      <td>0.283298</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>MIN</td>\n",
       "      <td>2001</td>\n",
       "      <td>0.279198</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  teamID  yearID    AVE_BP\n",
       "0    TBA    2007  0.289207\n",
       "1    MIN    2008  0.287575\n",
       "2    SEA    2007  0.284548\n",
       "3    SLN    2003  0.283298\n",
       "4    MIN    2001  0.279198"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "SELECT b.teamId, b.yearId, avg(1.0 * b.H / b.AB) AS AVE_BP\n",
    "FROM BATTING AS b\n",
    "WHERE b.AB > 0 AND\n",
    "  b.yearID > 2000 AND\n",
    "  b.yearID < 2010\n",
    "GROUP BY b.teamId, b.yearId\n",
    "ORDER BY AVE_BP DESC\n",
    "\"\"\"\n",
    "df = pd.read_sql_query(query, con)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, this is how we would do this using \n",
    "the `pandas` operations we learned about previously."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "Batting\n",
    "    .assign(BP = 1.0 * Batting.H / Batting.AB)\n",
    "    .query('AB > 0 & ...')\n",
    "    .groupby(['teamId', 'yearId'])\n",
    "    .agg({'AB': ['mean']})\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Subqueries\n",
    "\n",
    "Sometimes it's easier to nest queries like the one above into query and subquery"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>teamId</th>\n",
       "      <th>yearId</th>\n",
       "      <th>AVG_BP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>TBA</td>\n",
       "      <td>2007</td>\n",
       "      <td>0.289207</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>MIN</td>\n",
       "      <td>2008</td>\n",
       "      <td>0.287575</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>SEA</td>\n",
       "      <td>2007</td>\n",
       "      <td>0.284548</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>SLN</td>\n",
       "      <td>2003</td>\n",
       "      <td>0.283298</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>MIN</td>\n",
       "      <td>2001</td>\n",
       "      <td>0.279198</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  teamId  yearId    AVG_BP\n",
       "0    TBA    2007  0.289207\n",
       "1    MIN    2008  0.287575\n",
       "2    SEA    2007  0.284548\n",
       "3    SLN    2003  0.283298\n",
       "4    MIN    2001  0.279198"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "SELECT teamID, yearID, avg(BP) AS AVG_BP\n",
    "FROM (SELECT b.teamId, b.yearId, 1.0 * b.H / b.AB AS BP\n",
    "      FROM BATTING AS b\n",
    "      WHERE b.AB > 0 AND\n",
    "        b.yearID > 2000 AND\n",
    "        b.yearID < 2010)\n",
    "GROUP BY teamID, yearID\n",
    "ORDER BY AVG_BP DESC;\n",
    "\"\"\"\n",
    "\n",
    "df = pd.read_sql_query(query, con)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Joins\n",
    "\n",
    "List all players from California, playing in 2015"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>playerID</th>\n",
       "      <th>teamID</th>\n",
       "      <th>birthState</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>alexasc01</td>\n",
       "      <td>KCA</td>\n",
       "      <td>CA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>anderbr05</td>\n",
       "      <td>OAK</td>\n",
       "      <td>CA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>anderco01</td>\n",
       "      <td>CLE</td>\n",
       "      <td>CA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>andrima01</td>\n",
       "      <td>TBA</td>\n",
       "      <td>CA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>arenano01</td>\n",
       "      <td>COL</td>\n",
       "      <td>CA</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    playerID teamID birthState\n",
       "0  alexasc01    KCA         CA\n",
       "1  anderbr05    OAK         CA\n",
       "2  anderco01    CLE         CA\n",
       "3  andrima01    TBA         CA\n",
       "4  arenano01    COL         CA"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "query = \"\"\"\n",
    "SELECT b.playerId, b.teamId, m.birthState\n",
    "FROM Batting as b JOIN master as m on b.playerId == m.playerId\n",
    "WHERE yearId = \"2015\" and m.birthState = \"CA\"\n",
    "\"\"\" \n",
    "\n",
    "df = pd.read_sql_query(query, con)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we close the connection to the database:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "con.close()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}