{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "05195b87-9d49-4afd-9a01-6185e7391918",
   "metadata": {},
   "source": [
    "# Getting Started\n",
    "Welcome to the quickstart guide for OpenPoliceData (OPD)! Here, you should find all you need to learn the basics of OPD.\n",
    "\n",
    "* **New to Python?**: Check out the free [first python notebook](https://firstpythonnotebook.org/) course and/or the VS Code Python [Quick Start Guide](https://code.visualstudio.com/docs/python/python-quick-start) and [Tutorial](https://code.visualstudio.com/docs/python/python-tutorial).\n",
    "\n",
    "* **Questions or Comments?**: If you questions or comments about anything related to installing or using OPD, please reach out on our [discussion board](https://github.com/openpolicedata/openpolicedata/discussions).\n",
    "\n",
    "\n",
    "## Installation\n",
    "Install OPD with pip from [PyPI](https://pypi.org/project/openpolicedata/)\n",
    "\n",
    "```bash\n",
    "pip install openpolicedata\n",
    "```\n",
    "For installation in a Jupyter Notebook, replace `pip` with `%pip`. \n",
    "\n",
    "See [here](installation.rst) for advanced installation including how to install [GeoPandas](https://geopandas.org/en/stable/index.html) alongside OPD to enable geospatial analysis of data loaded by OPD.\n",
    "\n",
    "## Import\n",
    "To use OPD, you must always start by importing it into your Python code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "6eb80fff",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "# This cell should have \"nbsphinx\": \"hidden\" in its metadata and not be included in the documentation!\n",
    "import sys\n",
    "sys.path.append(\"../../../\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "ac629e9e-439d-4197-a88c-f467c7cfb7a9",
   "metadata": {},
   "outputs": [],
   "source": [
    "import openpolicedata as opd"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "9b736829-ac81-4629-bf72-f83f79561e91",
   "metadata": {},
   "source": [
    "We recommend shortening openpolicedata to `opd` to make your code more readable. \n",
    "\n",
    "## The Basics\n",
    "OPD provides access to over 300 police datasets with just 2 simple lines of code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "ebfaf449",
   "metadata": {},
   "outputs": [
   ],
   "source": [
    "# Load traffic stops data from Lousiville for the year 2022.\n",
    "src = opd.Source(\"Louisville\")\n",
    "tbl = src.load(table_type=\"TRAFFIC STOPS\", year=2022)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "78547a7e",
   "metadata": {},
   "source": [
    "The table attribute contains the loaded data as a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) so it can be analyzed with [pandas' simple and powerful capabilities](https://pandas.pydata.org/docs/user_guide/10min.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "6051c91f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>TYPE_OF_STOP</th>\n",
       "      <th>CITATION_CONTROL_NUMBER</th>\n",
       "      <th>ACTIVITY_RESULTS</th>\n",
       "      <th>OFFICER_GENDER</th>\n",
       "      <th>OFFICER_RACE</th>\n",
       "      <th>OFFICER_AGE_RANGE</th>\n",
       "      <th>ACTIVITY_DATE</th>\n",
       "      <th>ACTIVITY_TIME</th>\n",
       "      <th>ACTIVITY_LOCATION</th>\n",
       "      <th>ACTIVITY_DIVISION</th>\n",
       "      <th>ACTIVITY_BEAT</th>\n",
       "      <th>DRIVER_GENDER</th>\n",
       "      <th>DRIVER_RACE</th>\n",
       "      <th>DRIVER_AGE_RANGE</th>\n",
       "      <th>NUMBER_OF_PASSENGERS</th>\n",
       "      <th>WAS_VEHCILE_SEARCHED</th>\n",
       "      <th>REASON_FOR_SEARCH</th>\n",
       "      <th>ObjectId</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>COMPLAINT/CRIMINAL VIOLATION</td>\n",
       "      <td>DU03293</td>\n",
       "      <td>CITATION ISSUED</td>\n",
       "      <td>M</td>\n",
       "      <td>WHITE</td>\n",
       "      <td>21 - 30</td>\n",
       "      <td>01/02/2022</td>\n",
       "      <td>21:44</td>\n",
       "      <td>M ST                                          ...</td>\n",
       "      <td>4TH DIVISION</td>\n",
       "      <td>BEAT 4</td>\n",
       "      <td>M</td>\n",
       "      <td>WHITE</td>\n",
       "      <td>26 - 30</td>\n",
       "      <td>2</td>\n",
       "      <td>YES</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>COMPLAINT/CRIMINAL VIOLATION</td>\n",
       "      <td>DV75866</td>\n",
       "      <td>CITATION ISSUED</td>\n",
       "      <td>M</td>\n",
       "      <td>WHITE</td>\n",
       "      <td>51 - 60</td>\n",
       "      <td>07/21/2022</td>\n",
       "      <td>02:00</td>\n",
       "      <td>KEEGAN WAY                                    ...</td>\n",
       "      <td>7TH DIVISION</td>\n",
       "      <td>BEAT 1</td>\n",
       "      <td>M</td>\n",
       "      <td>HISPANIC</td>\n",
       "      <td>16 - 19</td>\n",
       "      <td>1</td>\n",
       "      <td>YES</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>COMPLAINT/CRIMINAL VIOLATION</td>\n",
       "      <td>DV87754</td>\n",
       "      <td>CITATION ISSUED</td>\n",
       "      <td>M</td>\n",
       "      <td>WHITE</td>\n",
       "      <td>51 - 60</td>\n",
       "      <td>07/21/2022</td>\n",
       "      <td>02:00</td>\n",
       "      <td>KEEGAN WAY                                    ...</td>\n",
       "      <td>7TH DIVISION</td>\n",
       "      <td>BEAT 1</td>\n",
       "      <td>M</td>\n",
       "      <td>HISPANIC</td>\n",
       "      <td>16 - 19</td>\n",
       "      <td>1</td>\n",
       "      <td>NO</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>COMPLAINT/CRIMINAL VIOLATION</td>\n",
       "      <td>DW19051</td>\n",
       "      <td>CITATION ISSUED</td>\n",
       "      <td>M</td>\n",
       "      <td>WHITE</td>\n",
       "      <td>21 - 30</td>\n",
       "      <td>01/25/2022</td>\n",
       "      <td>11:23</td>\n",
       "      <td>4500 BLOCK  SOUTHERN PKWY</td>\n",
       "      <td>4TH DIVISION</td>\n",
       "      <td>BEAT 6</td>\n",
       "      <td>M</td>\n",
       "      <td>WHITE</td>\n",
       "      <td>20 - 25</td>\n",
       "      <td>0</td>\n",
       "      <td>YES</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>COMPLAINT/CRIMINAL VIOLATION</td>\n",
       "      <td>DX65321</td>\n",
       "      <td>CITATION ISSUED</td>\n",
       "      <td>M</td>\n",
       "      <td>WHITE</td>\n",
       "      <td>31 - 40</td>\n",
       "      <td>01/13/2022</td>\n",
       "      <td>05:30</td>\n",
       "      <td>PRESTON HWY @ OUTER LOOP                      ...</td>\n",
       "      <td>7TH DIVISION</td>\n",
       "      <td>BEAT 6</td>\n",
       "      <td>M</td>\n",
       "      <td>WHITE</td>\n",
       "      <td>51 - 60</td>\n",
       "      <td>1</td>\n",
       "      <td>YES</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   TYPE_OF_STOP CITATION_CONTROL_NUMBER ACTIVITY_RESULTS  \\\n",
       "0  COMPLAINT/CRIMINAL VIOLATION            DU03293       CITATION ISSUED   \n",
       "1  COMPLAINT/CRIMINAL VIOLATION            DV75866       CITATION ISSUED   \n",
       "2  COMPLAINT/CRIMINAL VIOLATION            DV87754       CITATION ISSUED   \n",
       "3  COMPLAINT/CRIMINAL VIOLATION            DW19051       CITATION ISSUED   \n",
       "4  COMPLAINT/CRIMINAL VIOLATION            DX65321       CITATION ISSUED   \n",
       "\n",
       "  OFFICER_GENDER OFFICER_RACE OFFICER_AGE_RANGE ACTIVITY_DATE ACTIVITY_TIME  \\\n",
       "0              M        WHITE           21 - 30    01/02/2022         21:44   \n",
       "1              M        WHITE           51 - 60    07/21/2022         02:00   \n",
       "2              M        WHITE           51 - 60    07/21/2022         02:00   \n",
       "3              M        WHITE           21 - 30    01/25/2022         11:23   \n",
       "4              M        WHITE           31 - 40    01/13/2022         05:30   \n",
       "\n",
       "                                   ACTIVITY_LOCATION ACTIVITY_DIVISION  \\\n",
       "0  M ST                                          ...      4TH DIVISION   \n",
       "1  KEEGAN WAY                                    ...      7TH DIVISION   \n",
       "2  KEEGAN WAY                                    ...      7TH DIVISION   \n",
       "3                          4500 BLOCK  SOUTHERN PKWY      4TH DIVISION   \n",
       "4  PRESTON HWY @ OUTER LOOP                      ...      7TH DIVISION   \n",
       "\n",
       "  ACTIVITY_BEAT DRIVER_GENDER DRIVER_RACE DRIVER_AGE_RANGE  \\\n",
       "0        BEAT 4             M       WHITE          26 - 30   \n",
       "1        BEAT 1             M    HISPANIC          16 - 19   \n",
       "2        BEAT 1             M    HISPANIC          16 - 19   \n",
       "3        BEAT 6             M       WHITE          20 - 25   \n",
       "4        BEAT 6             M       WHITE          51 - 60   \n",
       "\n",
       "   NUMBER_OF_PASSENGERS WAS_VEHCILE_SEARCHED  REASON_FOR_SEARCH  ObjectId  \n",
       "0                     2                  YES                  0         1  \n",
       "1                     1                  YES                  4         2  \n",
       "2                     1                   NO                  0         3  \n",
       "3                     0                  YES                  4         4  \n",
       "4                     1                  YES                  3         5  "
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# View the 1st 5 rows with pandas' head function\n",
    "tbl.table.head()"
   ]
  },
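  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "5d2e7a41",
   "metadata": {},
   "source": [
    "Since `tbl.table` is an ordinary DataFrame, standard pandas operations apply to it. The sketch below is an illustration, not part of OPD: so that it runs without downloading anything, it builds a small hypothetical DataFrame by hand from the `DRIVER_RACE` and `WAS_VEHCILE_SEARCHED` values in the five rows shown above (the column name is spelled as it is in the Louisville data). With real data, use `tbl.table` in place of `stops`.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical stand-in for tbl.table, copied from the five rows displayed above\n",
    "stops = pd.DataFrame({\n",
    "    \"DRIVER_RACE\": [\"WHITE\", \"HISPANIC\", \"HISPANIC\", \"WHITE\", \"WHITE\"],\n",
    "    \"WAS_VEHCILE_SEARCHED\": [\"YES\", \"YES\", \"NO\", \"YES\", \"YES\"],  # spelling matches the source column\n",
    "})\n",
    "\n",
    "# Number of stops for each driver race\n",
    "stops_by_race = stops[\"DRIVER_RACE\"].value_counts()\n",
    "\n",
    "# Fraction of stops in which the vehicle was searched, by driver race\n",
    "search_rate = (\n",
    "    stops[\"WAS_VEHCILE_SEARCHED\"].eq(\"YES\")\n",
    "    .groupby(stops[\"DRIVER_RACE\"])\n",
    "    .mean()\n",
    ")\n",
    "\n",
    "print(stops_by_race)\n",
    "print(search_rate)\n",
    "```\n",
    "\n",
    "Five rows are far too few to draw conclusions; run on the full `tbl.table`, the same calls summarize every stop loaded above."
   ]
  },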
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "262b0614",
   "metadata": {},
   "source": [
    "## Finding Datasets\n",
    "OPD provides the `datasets` module for querying what datasets are available in OPD. To get all available datasets, query the source table with no inputs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "c39721a9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>SourceName</th>\n",
       "      <th>Agency</th>\n",
       "      <th>AgencyFull</th>\n",
       "      <th>TableType</th>\n",
       "      <th>coverage_start</th>\n",
       "      <th>coverage_end</th>\n",
       "      <th>last_coverage_check</th>\n",
       "      <th>Description</th>\n",
       "      <th>source_url</th>\n",
       "      <th>readme</th>\n",
       "      <th>URL</th>\n",
       "      <th>Year</th>\n",
       "      <th>DataType</th>\n",
       "      <th>date_field</th>\n",
       "      <th>dataset_id</th>\n",
       "      <th>agency_field</th>\n",
       "      <th>min_version</th>\n",
       "      <th>query</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Arizona</td>\n",
       "      <td>Chandler</td>\n",
       "      <td>Chandler</td>\n",
       "      <td>Chandler Police Department</td>\n",
       "      <td>ARRESTS</td>\n",
       "      <td>2018-01-01</td>\n",
       "      <td>2024-01-27</td>\n",
       "      <td>01/28/2024</td>\n",
       "      <td>Arrest reports completed by a Chandler Police ...</td>\n",
       "      <td>https://data.chandlerpd.com/catalog/arrest-boo...</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>https://data.chandlerpd.com/catalog/arrest-boo...</td>\n",
       "      <td>MULTIPLE</td>\n",
       "      <td>CSV</td>\n",
       "      <td>arrest_date_time</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>0.2</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Arizona</td>\n",
       "      <td>Chandler</td>\n",
       "      <td>Chandler</td>\n",
       "      <td>Chandler Police Department</td>\n",
       "      <td>CALLS FOR SERVICE</td>\n",
       "      <td>2018-01-01</td>\n",
       "      <td>2024-01-27</td>\n",
       "      <td>01/28/2024</td>\n",
       "      <td>This dataset contains details for all of the c...</td>\n",
       "      <td>https://data.chandlerpd.com/catalog/calls-for-...</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>https://data.chandlerpd.com/catalog/calls-for-...</td>\n",
       "      <td>MULTIPLE</td>\n",
       "      <td>CSV</td>\n",
       "      <td>call_received_date_time</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Arizona</td>\n",
       "      <td>Chandler</td>\n",
       "      <td>Chandler</td>\n",
       "      <td>Chandler Police Department</td>\n",
       "      <td>INCIDENTS</td>\n",
       "      <td>2018-01-01</td>\n",
       "      <td>2024-01-21</td>\n",
       "      <td>01/28/2024</td>\n",
       "      <td>This dataset contains details for all of the g...</td>\n",
       "      <td>https://data.chandlerpd.com/catalog/general-of...</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>https://data.chandlerpd.com/catalog/general-of...</td>\n",
       "      <td>MULTIPLE</td>\n",
       "      <td>CSV</td>\n",
       "      <td>report_event_date</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>0.4.1</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Arizona</td>\n",
       "      <td>Gilbert</td>\n",
       "      <td>Gilbert</td>\n",
       "      <td>Gilbert Police Department</td>\n",
       "      <td>CALLS FOR SERVICE</td>\n",
       "      <td>2006-11-15</td>\n",
       "      <td>2024-01-27</td>\n",
       "      <td>01/28/2024</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>https://data.gilbertaz.gov/maps/2dcb4c20c9a444...</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>https://maps.gilbertaz.gov/arcgis/rest/service...</td>\n",
       "      <td>MULTIPLE</td>\n",
       "      <td>ArcGIS</td>\n",
       "      <td>EventDate</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Arizona</td>\n",
       "      <td>Gilbert</td>\n",
       "      <td>Gilbert</td>\n",
       "      <td>Gilbert Police Department</td>\n",
       "      <td>EMPLOYEE</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaT</td>\n",
       "      <td>07/06/2023</td>\n",
       "      <td>A data set of all employees that have previous...</td>\n",
       "      <td>https://data.gilbertaz.gov/datasets/TOG::gilbe...</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>https://services1.arcgis.com/JLuzSHjNrLL4Okwb/...</td>\n",
       "      <td>NONE</td>\n",
       "      <td>ArcGIS</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     State SourceName    Agency                  AgencyFull  \\\n",
       "0  Arizona   Chandler  Chandler  Chandler Police Department   \n",
       "1  Arizona   Chandler  Chandler  Chandler Police Department   \n",
       "2  Arizona   Chandler  Chandler  Chandler Police Department   \n",
       "3  Arizona    Gilbert   Gilbert   Gilbert Police Department   \n",
       "4  Arizona    Gilbert   Gilbert   Gilbert Police Department   \n",
       "\n",
       "           TableType coverage_start coverage_end last_coverage_check  \\\n",
       "0            ARRESTS     2018-01-01   2024-01-27          01/28/2024   \n",
       "1  CALLS FOR SERVICE     2018-01-01   2024-01-27          01/28/2024   \n",
       "2          INCIDENTS     2018-01-01   2024-01-21          01/28/2024   \n",
       "3  CALLS FOR SERVICE     2006-11-15   2024-01-27          01/28/2024   \n",
       "4           EMPLOYEE            NaT          NaT          07/06/2023   \n",
       "\n",
       "                                         Description  \\\n",
       "0  Arrest reports completed by a Chandler Police ...   \n",
       "1  This dataset contains details for all of the c...   \n",
       "2  This dataset contains details for all of the g...   \n",
       "3                                               <NA>   \n",
       "4  A data set of all employees that have previous...   \n",
       "\n",
       "                                          source_url readme  \\\n",
       "0  https://data.chandlerpd.com/catalog/arrest-boo...   <NA>   \n",
       "1  https://data.chandlerpd.com/catalog/calls-for-...   <NA>   \n",
       "2  https://data.chandlerpd.com/catalog/general-of...   <NA>   \n",
       "3  https://data.gilbertaz.gov/maps/2dcb4c20c9a444...   <NA>   \n",
       "4  https://data.gilbertaz.gov/datasets/TOG::gilbe...   <NA>   \n",
       "\n",
       "                                                 URL      Year DataType  \\\n",
       "0  https://data.chandlerpd.com/catalog/arrest-boo...  MULTIPLE      CSV   \n",
       "1  https://data.chandlerpd.com/catalog/calls-for-...  MULTIPLE      CSV   \n",
       "2  https://data.chandlerpd.com/catalog/general-of...  MULTIPLE      CSV   \n",
       "3  https://maps.gilbertaz.gov/arcgis/rest/service...  MULTIPLE   ArcGIS   \n",
       "4  https://services1.arcgis.com/JLuzSHjNrLL4Okwb/...      NONE   ArcGIS   \n",
       "\n",
       "                date_field dataset_id agency_field min_version query  \n",
       "0         arrest_date_time       <NA>         <NA>         0.2   NaN  \n",
       "1  call_received_date_time       <NA>         <NA>        <NA>   NaN  \n",
       "2        report_event_date       <NA>         <NA>       0.4.1   NaN  \n",
       "3                EventDate       <NA>         <NA>        <NA>   NaN  \n",
       "4                     <NA>       <NA>         <NA>        <NA>   NaN  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "all_datasets = opd.datasets.query()\n",
    "all_datasets.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b682fcda",
   "metadata": {},
   "source": [
    "The source table provides the information needed to create sources and load data as well as background information. It is a DataFrame that can be filtered with [pandas filtering operations](https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html#min-tut-03-subset). Key information includes:\n",
    "\n",
    " * **State**: Optionally used when creating a `Source` to distinguish ambiguous sources (i.e. same city name in different states)\n",
    " * **SourceName**: Original source of the data (typically a shortened name for a police department). Used when creating a `Source`.\n",
    " * **Agency**: Shortened agency / police department name. Typically the same as SourceName. However, it may be `MULTIPLE` if a datasets contains data for multiple agencies.\n",
    " * **TableType**: Type of data (TRAFFIC STOPS, USE OF FORCE, etc.). Used when loading data.\n",
    " * **coverage_start**: Start date of data contained in dataset. Combined with coverage_end, this determines the years available for this datasets when loading data. NOTE: Often, agencies store their data in different datasets for different years so one table type may be spread across multiple datasets corresponding to each year of data.\n",
    "  * **coverage_end**: Most recently checked date for data contained in dataset. Combined with coverage_start, this determines the years available for this datasets when loading data. If the data has been updated by the dataset owner since the date in `last_coverage_check`, more recent years may be available. NOTE: Often, agencies store their data in different datasets for different years so one table type may be spread across multiple datasets corresponding to each year of data.\n",
    "  * **source_url**: Homepage for dataset\n",
    "  * **readme**: Direct URL for data dictionary containing definitions of columns, etc. If empty, the `source_url` may also contain a data dictionary.\n",
    "\n",
    "With its optional inputs, `query` can be used to filter for desired data. Here is a very specific query using all optional inputs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "e67f90d8-7008-4878-b3bb-99ead5653fa2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>SourceName</th>\n",
       "      <th>Agency</th>\n",
       "      <th>AgencyFull</th>\n",
       "      <th>TableType</th>\n",
       "      <th>coverage_start</th>\n",
       "      <th>coverage_end</th>\n",
       "      <th>last_coverage_check</th>\n",
       "      <th>Description</th>\n",
       "      <th>source_url</th>\n",
       "      <th>readme</th>\n",
       "      <th>URL</th>\n",
       "      <th>Year</th>\n",
       "      <th>DataType</th>\n",
       "      <th>date_field</th>\n",
       "      <th>dataset_id</th>\n",
       "      <th>agency_field</th>\n",
       "      <th>min_version</th>\n",
       "      <th>query</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>88</th>\n",
       "      <td>California</td>\n",
       "      <td>Menlo Park</td>\n",
       "      <td>Menlo Park</td>\n",
       "      <td>Menlo Park Police Department</td>\n",
       "      <td>CALLS FOR SERVICE</td>\n",
       "      <td>2018-01-01</td>\n",
       "      <td>2018-12-31</td>\n",
       "      <td>07/06/2023</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>https://data.menlopark.gov/datasets/4036c27030...</td>\n",
       "      <td>https://data.menlopark.org/datasets/4036c27030...</td>\n",
       "      <td>https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/...</td>\n",
       "      <td>2018</td>\n",
       "      <td>ArcGIS</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>89</th>\n",
       "      <td>California</td>\n",
       "      <td>Menlo Park</td>\n",
       "      <td>Menlo Park</td>\n",
       "      <td>Menlo Park Police Department</td>\n",
       "      <td>CALLS FOR SERVICE</td>\n",
       "      <td>2019-01-01</td>\n",
       "      <td>2019-12-31</td>\n",
       "      <td>07/06/2023</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>https://data.menlopark.gov/datasets/e88877f5d9...</td>\n",
       "      <td>https://data.menlopark.org/datasets/e88877f5d9...</td>\n",
       "      <td>https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/...</td>\n",
       "      <td>2019</td>\n",
       "      <td>ArcGIS</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>90</th>\n",
       "      <td>California</td>\n",
       "      <td>Menlo Park</td>\n",
       "      <td>Menlo Park</td>\n",
       "      <td>Menlo Park Police Department</td>\n",
       "      <td>CALLS FOR SERVICE</td>\n",
       "      <td>2020-01-01</td>\n",
       "      <td>2020-12-31</td>\n",
       "      <td>07/06/2023</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>https://data.menlopark.gov/datasets/510eb69337...</td>\n",
       "      <td>https://data.menlopark.org/datasets/510eb69337...</td>\n",
       "      <td>https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/...</td>\n",
       "      <td>2020</td>\n",
       "      <td>ArcGIS</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>91</th>\n",
       "      <td>California</td>\n",
       "      <td>Menlo Park</td>\n",
       "      <td>Menlo Park</td>\n",
       "      <td>Menlo Park Police Department</td>\n",
       "      <td>CALLS FOR SERVICE</td>\n",
       "      <td>2021-01-01</td>\n",
       "      <td>2021-12-31</td>\n",
       "      <td>07/06/2023</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>https://data.menlopark.gov/datasets/4c04a71c71...</td>\n",
       "      <td>https://data.menlopark.org/datasets/4c04a71c71...</td>\n",
       "      <td>https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/...</td>\n",
       "      <td>2021</td>\n",
       "      <td>ArcGIS</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         State  SourceName      Agency                    AgencyFull  \\\n",
       "88  California  Menlo Park  Menlo Park  Menlo Park Police Department   \n",
       "89  California  Menlo Park  Menlo Park  Menlo Park Police Department   \n",
       "90  California  Menlo Park  Menlo Park  Menlo Park Police Department   \n",
       "91  California  Menlo Park  Menlo Park  Menlo Park Police Department   \n",
       "\n",
       "            TableType coverage_start coverage_end last_coverage_check  \\\n",
       "88  CALLS FOR SERVICE     2018-01-01   2018-12-31          07/06/2023   \n",
       "89  CALLS FOR SERVICE     2019-01-01   2019-12-31          07/06/2023   \n",
       "90  CALLS FOR SERVICE     2020-01-01   2020-12-31          07/06/2023   \n",
       "91  CALLS FOR SERVICE     2021-01-01   2021-12-31          07/06/2023   \n",
       "\n",
       "   Description                                         source_url  \\\n",
       "88        <NA>  https://data.menlopark.gov/datasets/4036c27030...   \n",
       "89        <NA>  https://data.menlopark.gov/datasets/e88877f5d9...   \n",
       "90        <NA>  https://data.menlopark.gov/datasets/510eb69337...   \n",
       "91        <NA>  https://data.menlopark.gov/datasets/4c04a71c71...   \n",
       "\n",
       "                                               readme  \\\n",
       "88  https://data.menlopark.org/datasets/4036c27030...   \n",
       "89  https://data.menlopark.org/datasets/e88877f5d9...   \n",
       "90  https://data.menlopark.org/datasets/510eb69337...   \n",
       "91  https://data.menlopark.org/datasets/4c04a71c71...   \n",
       "\n",
       "                                                  URL  Year DataType  \\\n",
       "88  https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/...  2018   ArcGIS   \n",
       "89  https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/...  2019   ArcGIS   \n",
       "90  https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/...  2020   ArcGIS   \n",
       "91  https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/...  2021   ArcGIS   \n",
       "\n",
       "   date_field dataset_id agency_field min_version query  \n",
       "88       <NA>       <NA>         <NA>        <NA>   NaN  \n",
       "89       <NA>       <NA>         <NA>        <NA>   NaN  \n",
       "90       <NA>       <NA>         <NA>        <NA>   NaN  \n",
       "91       <NA>       <NA>         <NA>        <NA>   NaN  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ds = opd.datasets.query(source_name=\"Menlo Park\", state=\"California\", agency=\"Menlo Park\", table_type=\"CALLS FOR SERVICE\")\n",
    "ds"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "d9d7023e-40ac-4e55-81d4-9f1efb9ec2db",
   "metadata": {},
   "source": [
    "`get_table_types` finds available table types in OPD. Here, we use optional `contains` input to only get the table types containing the word \"STOPS\":"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "4d0a3fbd-be70-4717-8705-3339434a1c1f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['PEDESTRIAN STOPS',\n",
       " 'STOPS',\n",
       " 'TRAFFIC STOPS',\n",
       " 'TRAFFIC STOPS INCIDENTS',\n",
       " 'TRAFFIC STOPS SUBJECTS']"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "table_types = opd.datasets.get_table_types(contains=\"STOPS\")\n",
    "table_types"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c9075d6b-eb04-4760-8de9-1ff4f289dbb9",
   "metadata": {},
   "source": [
    "## Loading Data\n",
    "The `Source` class is used to explore datasets and load data. First, create a source, which can be used to view all of its datasets. Let's create a source for Columbia, South Carolina. The state must be specified because OPD has datasets from cities named Columbia in more than one state:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "7f8680d3-ee9a-413f-935c-1eb50a4c8c50",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>SourceName</th>\n",
       "      <th>Agency</th>\n",
       "      <th>AgencyFull</th>\n",
       "      <th>TableType</th>\n",
       "      <th>coverage_start</th>\n",
       "      <th>coverage_end</th>\n",
       "      <th>last_coverage_check</th>\n",
       "      <th>Description</th>\n",
       "      <th>source_url</th>\n",
       "      <th>readme</th>\n",
       "      <th>URL</th>\n",
       "      <th>Year</th>\n",
       "      <th>DataType</th>\n",
       "      <th>date_field</th>\n",
       "      <th>dataset_id</th>\n",
       "      <th>agency_field</th>\n",
       "      <th>min_version</th>\n",
       "      <th>query</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>891</th>\n",
       "      <td>South Carolina</td>\n",
       "      <td>Columbia</td>\n",
       "      <td>Columbia</td>\n",
       "      <td>Columbia Police Department</td>\n",
       "      <td>ARRESTS</td>\n",
       "      <td>2016-01-01</td>\n",
       "      <td>2022-12-31</td>\n",
       "      <td>07/07/2023</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>https://coc-colacitygis.opendata.arcgis.com/da...</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>https://services1.arcgis.com/Mnt8FoJcogKtoVBs/...</td>\n",
       "      <td>MULTIPLE</td>\n",
       "      <td>ArcGIS</td>\n",
       "      <td>Arrest_Date</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>0.2</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>892</th>\n",
       "      <td>South Carolina</td>\n",
       "      <td>Columbia</td>\n",
       "      <td>Columbia</td>\n",
       "      <td>Columbia Police Department</td>\n",
       "      <td>FIELD CONTACTS</td>\n",
       "      <td>2016-01-01</td>\n",
       "      <td>2022-12-31</td>\n",
       "      <td>07/07/2023</td>\n",
       "      <td>Field Interview is a collection of data result...</td>\n",
       "      <td>https://coc-colacitygis.opendata.arcgis.com/da...</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>https://services1.arcgis.com/Mnt8FoJcogKtoVBs/...</td>\n",
       "      <td>MULTIPLE</td>\n",
       "      <td>ArcGIS</td>\n",
       "      <td>TOC</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>&lt;NA&gt;</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              State SourceName    Agency                  AgencyFull  \\\n",
       "891  South Carolina   Columbia  Columbia  Columbia Police Department   \n",
       "892  South Carolina   Columbia  Columbia  Columbia Police Department   \n",
       "\n",
       "          TableType coverage_start coverage_end last_coverage_check  \\\n",
       "891         ARRESTS     2016-01-01   2022-12-31          07/07/2023   \n",
       "892  FIELD CONTACTS     2016-01-01   2022-12-31          07/07/2023   \n",
       "\n",
       "                                           Description  \\\n",
       "891                                               <NA>   \n",
       "892  Field Interview is a collection of data result...   \n",
       "\n",
       "                                            source_url readme  \\\n",
       "891  https://coc-colacitygis.opendata.arcgis.com/da...   <NA>   \n",
       "892  https://coc-colacitygis.opendata.arcgis.com/da...   <NA>   \n",
       "\n",
       "                                                   URL      Year DataType  \\\n",
       "891  https://services1.arcgis.com/Mnt8FoJcogKtoVBs/...  MULTIPLE   ArcGIS   \n",
       "892  https://services1.arcgis.com/Mnt8FoJcogKtoVBs/...  MULTIPLE   ArcGIS   \n",
       "\n",
       "      date_field dataset_id agency_field min_version query  \n",
       "891  Arrest_Date       <NA>         <NA>         0.2   NaN  \n",
       "892          TOC       <NA>         <NA>        <NA>   NaN  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "src = opd.Source(\"Columbia\", state=\"South Carolina\")\n",
    "src.datasets"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "cbf410b5",
   "metadata": {},
   "source": [
    "To get a list of available table types:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "b8dc350c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['ARRESTS', 'FIELD CONTACTS']"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "src.get_tables_types()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "79d8047b",
   "metadata": {},
   "source": [
    "You can get the number of records for a dataset using `get_count`. Let's get the number of records in the year 2022 for the FIELD CONTACTS dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "edafbd84",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4764"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "src.get_count(\"FIELD CONTACTS\", 2022)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "8c07a6c1",
   "metadata": {},
   "source": [
    "You can find which years are available for a given table type:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "086ba8f4",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "src.get_years(table_type=\"FIELD CONTACTS\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b2d809bb-c661-439a-a378-81dc18fccfe5",
   "metadata": {},
   "source": [
    "Now, let's load in some field contacts data for 2022."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "c86f7b24",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "state: South Carolina,\n",
       "source_name: Columbia,\n",
       "agency: Columbia,\n",
       "table_type: FIELD CONTACTS,\n",
       "year: 2022,\n",
       "description: Field Interview is a collection of data resulting from citizen contact related to suspicious activity.,\n",
       "url: https://services1.arcgis.com/Mnt8FoJcogKtoVBs/arcgis/rest/services/FieldInterview/FeatureServer/0,\n",
       "date_field: TOC,\n",
       "source_url: https://coc-colacitygis.opendata.arcgis.com/datasets/ColaCityGIS::field-interview-1-1-2016-3-31-2022/about,\n",
       "urls: {'source_url': 'https://coc-colacitygis.opendata.arcgis.com/datasets/ColaCityGIS::field-interview-1-1-2016-3-31-2022/about', 'readme': None, 'data': 'https://services1.arcgis.com/Mnt8FoJcogKtoVBs/arcgis/rest/services/FieldInterview/FeatureServer/0'}"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tbl = src.load(\"FIELD CONTACTS\", 2022)\n",
    "tbl"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "e1ae5a67",
   "metadata": {},
   "source": [
    "The loaded data is stored as a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) in the `table` attribute.\n",
    "\n",
    "> **NOTE**: Known date fields are automatically converted to pandas datetime format (or, in rare cases, pandas Period). To keep the original date format, set `date_format=False` when calling `load`. With `date_format=False`, the loaded data will be *exactly* the same as the raw source data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "5122367e",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>OBJECTID</th>\n",
       "      <th>Case_Num</th>\n",
       "      <th>TOC</th>\n",
       "      <th>Address</th>\n",
       "      <th>City</th>\n",
       "      <th>Zip</th>\n",
       "      <th>State</th>\n",
       "      <th>Age</th>\n",
       "      <th>Race</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Contact_Type</th>\n",
       "      <th>Year</th>\n",
       "      <th>geolocation</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>25351</td>\n",
       "      <td>220000108</td>\n",
       "      <td>2022-01-01 21:47:00</td>\n",
       "      <td>12XX  Main St</td>\n",
       "      <td></td>\n",
       "      <td>29201</td>\n",
       "      <td></td>\n",
       "      <td>32</td>\n",
       "      <td>W</td>\n",
       "      <td>M</td>\n",
       "      <td>Field Interview</td>\n",
       "      <td>2022.0</td>\n",
       "      <td>{'x': 1989801.7762467265, 'y': 788862.9678477645}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>44038</td>\n",
       "      <td>220000108</td>\n",
       "      <td>2022-01-01 21:47:00</td>\n",
       "      <td>12XX  Main St</td>\n",
       "      <td></td>\n",
       "      <td>29201</td>\n",
       "      <td></td>\n",
       "      <td>32</td>\n",
       "      <td>W</td>\n",
       "      <td>M</td>\n",
       "      <td>Field Interview</td>\n",
       "      <td>2022.0</td>\n",
       "      <td>{'x': 1989801.7762467265, 'y': 788862.9678477645}</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   OBJECTID   Case_Num                 TOC         Address City    Zip State  \\\n",
       "0     25351  220000108 2022-01-01 21:47:00  12XX  Main St        29201         \n",
       "1     44038  220000108 2022-01-01 21:47:00  12XX  Main St        29201         \n",
       "\n",
       "  Age Race Sex     Contact_Type    Year  \\\n",
       "0  32    W   M  Field Interview  2022.0   \n",
       "1  32    W   M  Field Interview  2022.0   \n",
       "\n",
       "                                         geolocation  \n",
       "0  {'x': 1989801.7762467265, 'y': 788862.9678477645}  \n",
       "1  {'x': 1989801.7762467265, 'y': 788862.9678477645}  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tbl.table.head(2)"
   ]
  },
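  {
   "cell_type": "markdown",
   "id": "example-datetime-access",
   "metadata": {},
   "source": [
    "Because `TOC` has been converted to a pandas datetime column, pandas datetime tools such as the `.dt` accessor and date comparisons can be used on it directly. The sketch below illustrates this on a small Series of made-up date strings converted with `pd.to_datetime`; it only shows the kind of conversion described in the note above and is not OPD's internal code.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up date strings standing in for a raw date column (not real OPD data)\n",
    "raw = pd.Series(['2022-01-01 21:47:00', '2022-03-15 08:05:00', '2022-11-30 13:20:00'])\n",
    "\n",
    "# Convert the strings to pandas datetime values\n",
    "dates = pd.to_datetime(raw)\n",
    "\n",
    "# Datetime values support component access and date comparisons\n",
    "months = dates.dt.month\n",
    "first_quarter = dates[dates < pd.Timestamp('2022-04-01')]\n",
    "\n",
    "print(months.tolist())  # [1, 3, 11]\n",
    "print(len(first_quarter))  # 2\n",
    "```\n",
    "\n",
    "For the table loaded above, the equivalent would be something like `tbl.table['TOC'].dt.month`, assuming `TOC` was converted as the output suggests."
   ]
  },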
  {
   "cell_type": "markdown",
   "id": "762799a5-5b99-46ca-8b68-401dfdf10100",
   "metadata": {},
   "source": [
    "To request multiple years of data, set the `year` parameter to a list of the form `[start_year, stop_year]`. Ranges like this can only be used with multi-year datasets.\n",
    "\n",
    "In `src.datasets`, a multi-year dataset has a \"Year\" value of \"MULTIPLE\", and its \"coverage_start\" and \"coverage_end\" columns give the range of dates that the dataset covers. For these datasets, the `year` parameter can also be set to \"MULTIPLE\" to request the entire dataset. For more information on year/date filtering, see the [Year Filtering Guide](./year_filtering.ipynb)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "9bacf779-2fda-4aa1-a50b-bef057b6772b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Year\n",
       "2021.0    4786\n",
       "2022.0     984\n",
       "2023.0    2831\n",
       "2024.0     383\n",
       "dtype: int64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "multiyear_tbl = src.load(\"FIELD CONTACTS\", year=[2021, 2024])\n",
    "multiyear_tbl.table.groupby(\"Year\").size()"
   ]
  },
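  {
   "cell_type": "markdown",
   "id": "example-year-subset",
   "metadata": {},
   "source": [
    "As the counts above show, both endpoints of `[2021, 2024]` are included. Once a multi-year range is loaded, a narrower window can also be selected with ordinary pandas filtering instead of reloading. The sketch below uses a small made-up table whose `TOC` and `Year` column names mirror the Columbia data; the values are invented.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up records loosely mirroring the Columbia FIELD CONTACTS columns\n",
    "df = pd.DataFrame({\n",
    "    'TOC': pd.to_datetime(['2021-06-01 10:00', '2022-02-14 18:30',\n",
    "                           '2023-07-04 09:15', '2024-01-20 22:45']),\n",
    "    'Year': [2021.0, 2022.0, 2023.0, 2024.0],\n",
    "})\n",
    "\n",
    "# Keep records from 2022 through 2023 (both years included)\n",
    "subset = df[df['TOC'].dt.year.between(2022, 2023)]\n",
    "\n",
    "# Count records per year, as in the groupby cell above\n",
    "counts = subset.groupby('Year').size()\n",
    "print(counts)\n",
    "```"
   ]
  },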
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c109c211",
   "metadata": {},
   "source": [
    "Data can be saved locally as CSV files. This allows you to:\n",
    "\n",
    " * Open the data using the software of your choice\n",
    " * Re-open the data in OPD from a local copy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "de4fd694",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>OBJECTID</th>\n",
       "      <th>Case_Num</th>\n",
       "      <th>TOC</th>\n",
       "      <th>Address</th>\n",
       "      <th>City</th>\n",
       "      <th>Zip</th>\n",
       "      <th>State</th>\n",
       "      <th>Age</th>\n",
       "      <th>Race</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Contact_Type</th>\n",
       "      <th>Year</th>\n",
       "      <th>geolocation</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>25351</td>\n",
       "      <td>220000108.0</td>\n",
       "      <td>2022-01-01 21:47:00</td>\n",
       "      <td>12XX  Main St</td>\n",
       "      <td>NaN</td>\n",
       "      <td>29201</td>\n",
       "      <td>NaN</td>\n",
       "      <td>32.0</td>\n",
       "      <td>W</td>\n",
       "      <td>M</td>\n",
       "      <td>Field Interview</td>\n",
       "      <td>2022.0</td>\n",
       "      <td>{'x': 1989801.7762467265, 'y': 788862.9678477645}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>44038</td>\n",
       "      <td>220000108.0</td>\n",
       "      <td>2022-01-01 21:47:00</td>\n",
       "      <td>12XX  Main St</td>\n",
       "      <td>NaN</td>\n",
       "      <td>29201</td>\n",
       "      <td>NaN</td>\n",
       "      <td>32.0</td>\n",
       "      <td>W</td>\n",
       "      <td>M</td>\n",
       "      <td>Field Interview</td>\n",
       "      <td>2022.0</td>\n",
       "      <td>{'x': 1989801.7762467265, 'y': 788862.9678477645}</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   OBJECTID     Case_Num                 TOC         Address City    Zip  \\\n",
       "0     25351  220000108.0 2022-01-01 21:47:00  12XX  Main St   NaN  29201   \n",
       "1     44038  220000108.0 2022-01-01 21:47:00  12XX  Main St   NaN  29201   \n",
       "\n",
       "  State   Age Race Sex     Contact_Type    Year  \\\n",
       "0   NaN  32.0    W   M  Field Interview  2022.0   \n",
       "1   NaN  32.0    W   M  Field Interview  2022.0   \n",
       "\n",
       "                                         geolocation  \n",
       "0  {'x': 1989801.7762467265, 'y': 788862.9678477645}  \n",
       "1  {'x': 1989801.7762467265, 'y': 788862.9678477645}  "
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
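    "# Save the table to a CSV file (by default, in the current working directory)\n",
    "tbl.to_csv()\n",
    "# Reload the same table from the local CSV file\n",
    "new_src = opd.Source(\"Columbia\", state=\"South Carolina\")\n",
    "new_tbl = new_src.load_from_csv(2022, table_type=\"FIELD CONTACTS\")\n",
    "new_tbl.table.head(2)"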
   ]
  },
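  {
   "cell_type": "markdown",
   "id": "example-csv-dtypes",
   "metadata": {},
   "source": [
    "Comparing the reloaded table with the original shows that a CSV round trip does not preserve every column's type: the blank `City` and `State` values come back as `NaN`, and `Case_Num` and `Age` come back as floats. This is general pandas CSV behavior rather than something specific to OPD: empty fields are read as `NaN`, and a numeric column containing any `NaN` is read as floating point (presumably some `Case_Num` and `Age` values are missing elsewhere in the table). A minimal pandas-only sketch with a made-up table:\n",
    "\n",
    "```python\n",
    "import io\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up table: an integer column with one missing value and a blank text column\n",
    "df = pd.DataFrame({\n",
    "    'Case_Num': pd.array([220000108, None], dtype='Int64'),\n",
    "    'City': ['', ''],\n",
    "})\n",
    "\n",
    "# Write to an in-memory CSV instead of a file, then read it back\n",
    "buffer = io.StringIO()\n",
    "df.to_csv(buffer, index=False)\n",
    "buffer.seek(0)\n",
    "reloaded = pd.read_csv(buffer)\n",
    "\n",
    "print(reloaded['Case_Num'].dtype)  # float64\n",
    "print(reloaded['City'].isna().all())  # True\n",
    "```"
   ]
  },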
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "1fd16da3",
   "metadata": {
    "nbsphinx": "hidden"
   },
   "outputs": [],
   "source": [
    "# This cell should have \"nbsphinx\": \"hidden\" in its metadata and should not be included in the documentation.\n",
    "# Cleanup\n",
    "import os\n",
    "os.remove(\"South_Carolina_Columbia_FIELD_CONTACTS_2022.csv\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "8cd627a7",
   "metadata": {},
   "source": [
    "Some datasets contain data for every agency in a state. In this case, you may want to know which agencies are available. Here, the optional `partial_name` input restricts the list to agencies whose names contain the word \"Arlington\":"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "dcd649e2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Arlington County Police Department', \"Arlington County Sheriff's Office\"]"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "src = opd.Source(\"Virginia\")\n",
    "agencies = src.get_agencies(table_type=\"STOPS\", partial_name=\"Arlington\")\n",
    "agencies"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "8c8d7e82",
   "metadata": {},
   "source": [
    "We may also want to load data from only a specific agency:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "9de443dc",
   "metadata": {},
   "outputs": [],
   "source": [
    "tbl = src.load(table_type=\"STOPS\", year=2022, agency=\"Arlington County Police Department\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "51aea38c",
   "metadata": {},
   "source": [
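    "To request data for a range of years from a multi-year dataset, set `year` to `[start_year, stop_year]`, as in the Columbia FIELD CONTACTS example above."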
   ]
  },
  {
   "cell_type": "markdown",
   "id": "361ec854",
   "metadata": {},
   "source": [
    "## Data Standardization\n",
    "One challenge in analyzing police data is that different agencies use different column names for the same data and different codes and terms for the values in those columns. Particularly when working with multiple datasets, it is valuable to standardize the data so that you know in advance what key columns will be called and what values they will contain.\n",
    "\n",
    "To provide more consistent column names and data, OpenPoliceData can automatically standardize column names and the data in those columns. Columns that OpenPoliceData can standardize include:\n",
    "\n",
    "* Date\n",
    "* Time\n",
    "* Gender\n",
    "* Age\n",
    "* Race\n",
    "* Ethnicity \n",
    "\n",
    "In addition, OpenPoliceData will combine separate date and time columns into a single datetime column, and separate race and ethnicity columns into a single combined race/ethnicity column.\n",
    "\n",
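    "A common convention for a combined race/ethnicity value is to label a subject Hispanic/Latino whenever the ethnicity says so, regardless of race, and to use the race value otherwise. The hypothetical `combine_race_ethnicity` helper below sketches that convention in plain Python using standardized labels that appear later in this guide. It is an illustration only, not OPD's implementation, and OPD's handling of unknown or missing values may differ.\n",
    "\n",
    "```python\n",
    "def combine_race_ethnicity(race, ethnicity):\n",
    "    # Hypothetical helper: Hispanic/Latino ethnicity takes precedence over race\n",
    "    if ethnicity == 'HISPANIC/LATINO':\n",
    "        return 'HISPANIC/LATINO'\n",
    "    # Otherwise keep the race value unchanged\n",
    "    return race\n",
    "\n",
    "\n",
    "# Made-up (race, ethnicity) pairs\n",
    "records = [\n",
    "    ('WHITE', 'NON-HISPANIC/NON-LATINO'),\n",
    "    ('INDIGENOUS', 'HISPANIC/LATINO'),\n",
    "    ('ASIAN/PACIFIC ISLANDER', 'NON-HISPANIC/NON-LATINO'),\n",
    "]\n",
    "combined = [combine_race_ethnicity(r, e) for r, e in records]\n",
    "print(combined)  # ['WHITE', 'HISPANIC/LATINO', 'ASIAN/PACIFIC ISLANDER']\n",
    "```\n",
    "\n",
    "In this sketch, non-Hispanic subjects keep their own race label, so no race category is collapsed into a catch-all value.\n",
    "\n",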
    "Let's examine what columns are in the Phoenix Use of Force dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "6ec227a3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['INC_IA_NO', 'INC_IR_NO', 'EMP_BADGE_NO', 'CIT_NUMBER', 'INC_DATE',\n",
       "       'INC_YEAR', 'INC_HOUR', 'INC_DAY_WEEK', 'INC_LOC_COUNTY',\n",
       "       'HUNDRED_BLOCK', 'INC_CITY', 'INC_STATE', 'INC_ZIPCODE', 'INC_PRECINCT',\n",
       "       'CIT_INJURY_YN', 'CIT_GENDER', 'CIT_AGE', 'SUBJ_AGE_GROUP', 'CIT_RACE',\n",
       "       'CIT_ETHNICITY', 'SIMPLE_SUBJ_RE_GRP', 'SIMPLE_EMPL_RE_GRP', 'EMPL_SEX',\n",
       "       'CIT_RESIST_AGG_ACTIV_AGGRESSN', 'CIT_RESIST_ACTIVE_AGGRESSN',\n",
       "       'CIT_RESIST_ACTIVE_RESISTANCE', 'CIT_RESIST_PASSIVE_RESISTANCE',\n",
       "       'CIT_RESIST_PSYCH_INTIMIDATION', 'CIT_RESIST_VRBL_NONCOMPLIANCE',\n",
       "       'CIT_RESIST_NONE'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "src = opd.Source(\"Phoenix\")\n",
    "tbl = src.load(table_type=\"USE OF FORCE\", year=2022, pbar=False)  # pbar=False hides the progress bar\n",
    "# Only show the first 30 columns because the table has many columns\n",
    "tbl.table.columns[:30]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ddb81d8",
   "metadata": {},
   "source": [
    "The data has several columns related to subject demographics:\n",
    "- 'CIT_GENDER'\n",
    "- 'CIT_AGE'\n",
    "- 'SUBJ_AGE_GROUP'\n",
    "- 'CIT_RACE'\n",
    "- 'CIT_ETHNICITY'\n",
    "- 'SIMPLE_SUBJ_RE_GRP'\n",
    "\n",
    "These labels are not shared by datasets from other agencies (i.e., they cannot be predicted in advance). The labels are also somewhat hard to decipher because they are not always clear and do not follow a consistent naming convention. The data uses 2 different abbreviations for the subject (CIT and SUBJ), and the *RE* in 'SIMPLE_SUBJ_RE_GRP' stands for race/ethnicity, so there are 3 columns related to race and ethnicity.\n",
    "\n",
    "Similarly, the officer demographics columns use the same *RE* abbreviation, and the user must know that *EMPL* is short for employee, which here means the officer:\n",
    "- 'SIMPLE_EMPL_RE_GRP'\n",
    "- 'EMPL_SEX'\n",
    "\n",
    "OPD's data standardization will automatically identify columns and rename them to standard column names (while optionally keeping the original columns as RAW_{original name}). This enables the user to know in advance what the column names will be.\n",
    "\n",
    "Now let's examine what's in the subject race and ethnicity columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "fbe2683a",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The unique values in the race column (CIT_RACE) are ['White' 'Black' 'American Indian / Alaskan Native'\n",
      " 'Asian / Pacific Islander' 'Unknown' 'AmIndian']\n",
      "The unique values in the ethnicity column (CIT_ETHNICITY) are ['Hispanic' 'Non-Hispanic' 'Unknown']\n",
      "The unique values in the race ethnicity column (SIMPLE_SUBJ_RE_GRP) are ['Hispanic' 'Black or African American' 'White' 'Other']\n"
     ]
    }
   ],
   "source": [
    "print(f\"The unique values in the race column (CIT_RACE) are {tbl.table['CIT_RACE'].unique()}\")\n",
    "print(f\"The unique values in the ethnicity column (CIT_ETHNICITY) are {tbl.table['CIT_ETHNICITY'].unique()}\")\n",
    "print(f\"The unique values in the race ethnicity column (SIMPLE_SUBJ_RE_GRP) are {tbl.table['SIMPLE_SUBJ_RE_GRP'].unique()}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b56aec4",
   "metadata": {},
   "source": [
    "A few items to note:\n",
    "- Naming conventions are not consistent: Indigenous subjects are labeled both 'American Indian / Alaskan Native' and 'AmIndian', and Black subjects are labeled 'Black' in the race column but 'Black or African American' in the race/ethnicity column\n",
    "- 'Asian / Pacific Islander' and 'American Indian / Alaskan Native' appear in the race column but not in the race/ethnicity column. This does not seem correct unless ALL Asian/Pacific Islander and Indigenous subjects were Hispanic/Latino (since Hispanic/Latino is typically used for Latinos of all races in race/ethnicity columns), which seems unlikely.\n",
    "\n",
    "Let's look at some cases where the race is 'Asian / Pacific Islander' or 'American Indian / Alaskan Native':"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "879bd111",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>CIT_RACE</th>\n",
       "      <th>CIT_ETHNICITY</th>\n",
       "      <th>SIMPLE_SUBJ_RE_GRP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>American Indian / Alaskan Native</td>\n",
       "      <td>Non-Hispanic</td>\n",
       "      <td>Other</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>American Indian / Alaskan Native</td>\n",
       "      <td>Non-Hispanic</td>\n",
       "      <td>Other</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Asian / Pacific Islander</td>\n",
       "      <td>Non-Hispanic</td>\n",
       "      <td>Other</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>American Indian / Alaskan Native</td>\n",
       "      <td>Hispanic</td>\n",
       "      <td>Hispanic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Asian / Pacific Islander</td>\n",
       "      <td>Non-Hispanic</td>\n",
       "      <td>Other</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                            CIT_RACE CIT_ETHNICITY SIMPLE_SUBJ_RE_GRP\n",
       "19  American Indian / Alaskan Native  Non-Hispanic              Other\n",
       "20  American Indian / Alaskan Native  Non-Hispanic              Other\n",
       "24          Asian / Pacific Islander  Non-Hispanic              Other\n",
       "25  American Indian / Alaskan Native      Hispanic           Hispanic\n",
       "27          Asian / Pacific Islander  Non-Hispanic              Other"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "i = tbl.table['CIT_RACE'].isin(['American Indian / Alaskan Native', 'Asian / Pacific Islander'])\n",
    "tbl.table[['CIT_RACE', 'CIT_ETHNICITY', 'SIMPLE_SUBJ_RE_GRP']][i].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3fe7b74",
   "metadata": {},
   "source": [
    "In the rows above, non-Hispanic subjects labeled as 'Asian / Pacific Islander' or 'American Indian / Alaskan Native' in the race column are relabeled as 'Other' in the race/ethnicity column. Thus, although a combined race/ethnicity column is often preferred, the way that 'SIMPLE_SUBJ_RE_GRP' was generated actually removes key information.\n",
    "\n",
    "OPD's standardization lets users analyze data more quickly by automatically identifying columns, renaming them to standard column names, and standardizing the data in those columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "e7a596c5",
   "metadata": {},
   "outputs": [],
   "source": [
    "tbl.standardize()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ceb75dde",
   "metadata": {},
   "source": [
    "Let's look at what the standardization did using `get_transform_map`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "39563a1b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "orig_column_name: INC_DATE,\n",
      "new_column_name: DATE,\n",
      "data_maps: None\n",
      "\n",
      "orig_column_name: CIT_RACE,\n",
      "new_column_name: SUBJECT_RACE,\n",
      "data_maps: {'White': 'WHITE', 'Black': 'BLACK', 'American Indian / Alaskan Native': 'INDIGENOUS', 'Asian / Pacific Islander': 'ASIAN/PACIFIC ISLANDER', 'Unknown': 'UNKNOWN', 'AmIndian': 'INDIGENOUS'}\n",
      "\n",
      "orig_column_name: SIMPLE_EMPL_RE_GRP,\n",
      "new_column_name: OFFICER_RACE/ETHNICITY,\n",
      "data_maps: {'White': 'WHITE', 'Hispanic': 'HISPANIC/LATINO', 'Other': 'OTHER', 'Black or African American': 'BLACK', None: 'UNSPECIFIED'}\n",
      "\n",
      "orig_column_name: CIT_ETHNICITY,\n",
      "new_column_name: SUBJECT_ETHNICITY,\n",
      "data_maps: {'Non-Hispanic': 'NON-HISPANIC/NON-LATINO', 'Hispanic': 'HISPANIC/LATINO', 'Unknown': 'UNKNOWN'}\n",
      "\n",
      "orig_column_name: ['SUBJECT_RACE', 'SUBJECT_ETHNICITY'],\n",
      "new_column_name: SUBJECT_RACE/ETHNICITY,\n",
      "data_maps: None\n",
      "\n",
      "orig_column_name: SUBJECT_RACE/ETHNICITY,\n",
      "new_column_name: SUBJECT_RE_GROUP,\n",
      "data_maps: None\n",
      "\n",
      "orig_column_name: OFFICER_RACE/ETHNICITY,\n",
      "new_column_name: OFFICER_RE_GROUP,\n",
      "data_maps: None\n",
      "\n",
      "orig_column_name: CIT_INJURY_YN,\n",
      "new_column_name: SUBJECT_INJURY,\n",
      "data_maps: {'Yes': 'INJURED', 'No': 'NO INJURY'}\n",
      "\n",
      "orig_column_name: CIT_AGE,\n",
      "new_column_name: SUBJECT_AGE,\n",
      "data_maps: None\n",
      "\n",
      "orig_column_name: SUBJ_AGE_GROUP,\n",
      "new_column_name: SUBJECT_AGE_RANGE,\n",
      "data_maps: {'30s': '30-39', '20s': '20-29', '40s': '40-49', '<20': '0-20', '50s': '50-59', '60s': '60-69', 'Not Available': 'Not Available', '70s': '70-79', '90s': '90-99', '80s': '80-89'}\n",
      "\n",
      "orig_column_name: CIT_GENDER,\n",
      "new_column_name: SUBJECT_GENDER,\n",
      "data_maps: {'Male': 'MALE', 'Female': 'FEMALE'}\n",
      "\n",
      "orig_column_name: EMPL_SEX,\n",
      "new_column_name: OFFICER_GENDER,\n",
      "data_maps: {'Male': 'MALE', 'Female': 'FEMALE', None: 'UNSPECIFIED'}\n",
      "\n",
      "orig_column_name: INC_ZIPCODE,\n",
      "new_column_name: ZIP_CODE,\n",
      "data_maps: None\n",
      "\n"
     ]
    }
   ],
   "source": [
    "std_map = tbl.get_transform_map(minimize=True)\n",
    "for t in std_map:\n",
    "    print(f\"{t}\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6e0e9b6f",
   "metadata": {},
   "source": [
    "`get_transform_map` shows the changes made by standardization, including:\n",
    "- Identifying all demographic columns\n",
    "- Identifying the more informative CIT_RACE as the race column instead of SIMPLE_SUBJ_RE_GRP\n",
    "- Converting the identified race (CIT_RACE) and ethnicity (CIT_ETHNICITY) columns to SUBJECT_RACE and SUBJECT_ETHNICITY, respectively, and then combining SUBJECT_RACE and SUBJECT_ETHNICITY into SUBJECT_RACE/ETHNICITY\n",
    "- Copying SUBJECT_RACE/ETHNICITY to another column called SUBJECT_RE_GROUP. RE_GROUP columns such as SUBJECT_RE_GROUP and OFFICER_RE_GROUP are added for users who want a single column that holds race/ethnicity when it is available and race otherwise: the RE_GROUP column is a copy of the RACE/ETHNICITY column if one was generated, or of the RACE column if a RACE column was found but no RACE/ETHNICITY column was generated.\n",
    "- Identifying EMPL as indicating officer demographics\n",
    "- Identifying the cryptically named SIMPLE_EMPL_RE_GRP as an OFFICER_RACE/ETHNICITY column\n",
    "- Standardizing values of race, ethnicity, gender, age, age group, and injury status to values that are consistent across all OPD-loaded tables\n",
    "\n",
    "In the `data_maps` field, `get_transform_map` also includes dictionaries that map each original value (key) to the resulting standardized value (value).\n",
    "\n",
    "Printing the columns shows that the standardized columns are at the front, while the original columns are prefixed with RAW_ and moved to the end of the list."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "c11c0796",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The first 12 columns after standardization are: Index(['DATE', 'SUBJECT_RACE', 'OFFICER_RACE/ETHNICITY', 'SUBJECT_ETHNICITY',\n",
      "       'SUBJECT_RACE/ETHNICITY', 'SUBJECT_RE_GROUP', 'OFFICER_RE_GROUP',\n",
      "       'SUBJECT_INJURY', 'SUBJECT_AGE', 'SUBJECT_AGE_RANGE', 'SUBJECT_GENDER',\n",
      "       'OFFICER_GENDER'],\n",
      "      dtype='object')\n",
      "The last 9 columns after standardization are: Index(['RAW_INC_ZIPCODE', 'RAW_CIT_INJURY_YN', 'RAW_CIT_GENDER', 'RAW_CIT_AGE',\n",
      "       'RAW_SUBJ_AGE_GROUP', 'RAW_CIT_RACE', 'RAW_CIT_ETHNICITY',\n",
      "       'RAW_SIMPLE_EMPL_RE_GRP', 'RAW_EMPL_SEX'],\n",
      "      dtype='object')\n"
     ]
    }
   ],
   "source": [
    "print(f\"The first 12 columns after standardization are: {tbl.table.columns[:12]}\")\n",
    "print(f\"The last 9 columns after standardization are: {tbl.table.columns[-9:]}\")"
   ]
  },
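  {
   "cell_type": "markdown",
   "id": "4c8e2b7a",
   "metadata": {},
   "source": [
    "The column layout shown above, with standardized columns first and original columns renamed with a `RAW_` prefix at the end, can be mimicked with plain pandas. This is a hypothetical sketch on made-up toy data, not OPD's own code.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy table with one original column and its standardized counterpart (made-up data)\n",
    "df = pd.DataFrame({\n",
    "    'CIT_GENDER': ['Male', 'Female'],\n",
    "    'SUBJECT_GENDER': ['MALE', 'FEMALE'],\n",
    "})\n",
    "\n",
    "# Keep the original column but mark it as raw with a RAW_ prefix\n",
    "df = df.rename(columns={'CIT_GENDER': 'RAW_CIT_GENDER'})\n",
    "\n",
    "# Reorder: standardized columns first, RAW_ columns last\n",
    "raw_cols = [c for c in df.columns if c.startswith('RAW_')]\n",
    "std_cols = [c for c in df.columns if not c.startswith('RAW_')]\n",
    "df = df[std_cols + raw_cols]\n",
    "print(list(df.columns))  # ['SUBJECT_GENDER', 'RAW_CIT_GENDER']\n",
    "```\n",
    "\n",
    "Because `tbl.table` is a pandas DataFrame, the same `startswith('RAW_')` filter can be applied to `tbl.table.columns` to separate the original columns from the standardized ones."
   ]
  },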
  {
   "cell_type": "markdown",
   "id": "9176e7a7",
   "metadata": {},
   "source": [
    "Finally, we can view the values that the new SUBJECT_RE_GROUP column contains:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "53dd1ced",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['HISPANIC/LATINO', 'BLACK', 'WHITE', 'INDIGENOUS',\n",
       "       'ASIAN/PACIFIC ISLANDER', 'UNKNOWN'], dtype=object)"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tbl.table[\"SUBJECT_RE_GROUP\"].unique()\n",
    "# NOTE: We also provide a columns enum so that the user does not have to remember the standardized column names. This would produce the same thing:\n",
    "# tbl.table[opd.defs.columns.SUBJECT_RE_GROUP].unique()"
   ]
  },
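  {
   "cell_type": "markdown",
   "id": "3f7a9c21",
   "metadata": {},
   "source": [
    "The value standardization and RE_GROUP rule described earlier can be sketched with plain pandas. This is a hypothetical illustration on made-up toy data, not OPD's internal implementation: the dictionaries only mimic the `data_maps` printed above, and the helper `make_re_group` is an assumption introduced for this example.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy raw data (made up for illustration; not from a real OPD table)\n",
    "raw = pd.DataFrame({\n",
    "    'CIT_INJURY_YN': ['Yes', 'No', 'No'],\n",
    "    'CIT_RACE': ['White', 'Black', 'AmIndian'],\n",
    "})\n",
    "\n",
    "# data_maps-style dictionaries: original value (key) -> standardized value (value)\n",
    "injury_map = {'Yes': 'INJURED', 'No': 'NO INJURY'}\n",
    "race_map = {'White': 'WHITE', 'Black': 'BLACK', 'AmIndian': 'INDIGENOUS'}\n",
    "\n",
    "# Series.map looks up each raw value in the dictionary (unmapped values become NaN)\n",
    "std = pd.DataFrame({\n",
    "    'SUBJECT_INJURY': raw['CIT_INJURY_YN'].map(injury_map),\n",
    "    'SUBJECT_RACE': raw['CIT_RACE'].map(race_map),\n",
    "})\n",
    "\n",
    "def make_re_group(df, prefix):\n",
    "    # Hypothetical helper: copy PREFIX_RACE/ETHNICITY if present, else PREFIX_RACE\n",
    "    for col in (f'{prefix}_RACE/ETHNICITY', f'{prefix}_RACE'):\n",
    "        if col in df.columns:\n",
    "            df[f'{prefix}_RE_GROUP'] = df[col]\n",
    "            break\n",
    "    return df\n",
    "\n",
    "# No SUBJECT_RACE/ETHNICITY column exists here, so SUBJECT_RACE is copied\n",
    "std = make_re_group(std, 'SUBJECT')\n",
    "print(std['SUBJECT_RE_GROUP'].tolist())  # ['WHITE', 'BLACK', 'INDIGENOUS']\n",
    "```"
   ]
  },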
  {
   "cell_type": "markdown",
   "id": "b7e16e61",
   "metadata": {},
   "source": [
    "## Other Topics\n",
    "\n",
    "- [Data Standardization Guide](../examples/opd-examples/standardization.ipynb) (including methods for customizing the standardization process)\n",
    "- [Related Tables](related_table.ipynb)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}