{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Loading Datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook shows an example of how to load a dataset.\n",
"It assumes you have already found the dataset using the techniques shown in `finding_datasets.ipynb`.\n",
"The basic steps it demonstrates are:\n",
"1. Find available datasets with `opd.datasets.query`.\n",
"2. Create a data source using `opd.Source` and information from the previous step.\n",
"3. Find the data types available from the source with `get_tables_types`, and the years available for a data type with `get_years`.\n",
"4. Load the data type for a given year using `load`."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import openpolicedata as opd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" State SourceName Agency \\\n",
"479 Maryland Maryland MULTIPLE \n",
"485 Maryland Montgomery County Montgomery County \n",
"\n",
" AgencyFull TableType coverage_start \\\n",
"479 NaN TRAFFIC STOPS 2007-01-01 \n",
"485 Montgomery County Police Department TRAFFIC STOPS 2012-06-07 \n",
"\n",
" coverage_end last_coverage_check \\\n",
"479 2014-03-31 01/10/2024 \n",
"485 2024-05-09 05/10/2024 \n",
"\n",
" Description \\\n",
"479 Standardized stop data from the Stanford Open ... \n",
"485 This dataset contains traffic violation inform... \n",
"\n",
" source_url \\\n",
"479 https://openpolicing.stanford.edu/data/ \n",
"485 https://data.montgomerycountymd.gov/Public-Saf... \n",
"\n",
" readme \\\n",
"479 https://github.com/stanford-policylab/opp/blob... \n",
"485 <NA> \n",
"\n",
" URL Year DataType \\\n",
"479 https://stacks.stanford.edu/file/druid:yg821jf... MULTIPLE CSV \n",
"485 data.montgomerycountymd.gov MULTIPLE Socrata \n",
"\n",
" date_field dataset_id agency_field min_version query \n",
"479 date <NA> department_name <NA> NaN \n",
"485 date_of_stop 4mse-ku6q <NA> <NA> NaN "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# We will load Montgomery County, Maryland traffic stop data. First, show our dataset options.\n",
"df = opd.datasets.query(table_type='TRAFFIC STOPS', state=\"Maryland\")\n",
"df.head()"
]
},
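{
"cell_type": "markdown",
"metadata": {},
"source": [
"The query result is a pandas DataFrame, so ordinary pandas filtering can narrow it down before you create a source. The next cell is a small sketch that assumes `df` holds the result of the query above and uses only column names visible in that output.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: keep only the Montgomery County rows and the columns needed to choose a source\n",
"# (assumes df is the DataFrame returned by opd.datasets.query above)\n",
"montgomery = df[df[\"SourceName\"] == \"Montgomery County\"]\n",
"montgomery[[\"State\", \"SourceName\", \"TableType\", \"coverage_start\", \"coverage_end\"]]"
]
},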
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" State SourceName Agency \\\n",
"480 Maryland Montgomery County Montgomery County \n",
"481 Maryland Montgomery County Montgomery County \n",
"482 Maryland Montgomery County Montgomery County \n",
"483 Maryland Montgomery County Montgomery County \n",
"484 Maryland Montgomery County Montgomery County \n",
"\n",
" AgencyFull TableType \\\n",
"480 Montgomery County Police Department COMPLAINTS \n",
"481 Montgomery County Police Department CRASHES - INCIDENTS \n",
"482 Montgomery County Police Department CRASHES - NONMOTORIST \n",
"483 Montgomery County Police Department CRASHES - SUBJECTS \n",
"484 Montgomery County Police Department INCIDENTS \n",
"\n",
" coverage_start coverage_end last_coverage_check \\\n",
"480 2013-10-24 2024-05-06 05/10/2024 \n",
"481 2015-12-20 2024-01-03 05/10/2024 \n",
"482 2015-03-23 2023-12-31 05/10/2024 \n",
"483 2015-06-30 2024-01-03 05/10/2024 \n",
"484 2017-04-02 2024-05-10 05/10/2024 \n",
"\n",
" Description \\\n",
"480 This dataset contains allegations brought to t... \n",
"481 general information about each collision and d... \n",
"482 information on non-motorists (pedestrians and ... \n",
"483 information on motor vehicle operators (driver... \n",
"484 list of Police Dispatched Incidents records \n",
"\n",
" source_url readme \\\n",
"480 https://data.montgomerycountymd.gov/Public-Saf... <NA> \n",
"481 https://data.montgomerycountymd.gov/Public-Saf... <NA> \n",
"482 https://data.montgomerycountymd.gov/Public-Saf... <NA> \n",
"483 https://data.montgomerycountymd.gov/Public-Saf... <NA> \n",
"484 https://data.montgomerycountymd.gov/Public-Saf... <NA> \n",
"\n",
" URL Year DataType date_field \\\n",
"480 data.montgomerycountymd.gov MULTIPLE Socrata created_dt \n",
"481 data.montgomerycountymd.gov MULTIPLE Socrata crash_date_time \n",
"482 data.montgomerycountymd.gov MULTIPLE Socrata crash_date_time \n",
"483 data.montgomerycountymd.gov MULTIPLE Socrata crash_date_time \n",
"484 data.montgomerycountymd.gov MULTIPLE Socrata start_time \n",
"\n",
" dataset_id agency_field min_version query \n",
"480 usip-62e2 <NA> <NA> NaN \n",
"481 bhju-22kf <NA> 0.4 NaN \n",
"482 n7fk-dce5 <NA> 0.5 NaN \n",
"483 mmzv-x632 <NA> 0.4 NaN \n",
"484 98cc-bc7d <NA> <NA> NaN "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# To access the data, create a source using a source name (the SourceName column above, usually a police department or jurisdiction name). The optional state input resolves ambiguities between sources with the same name.\n",
"# Based on the Maryland results above, we use \"Montgomery County\" as the source_name.\n",
"\n",
"src = opd.Source(source_name=\"Montgomery County\", state=\"Maryland\")\n",
"src.datasets.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"['COMPLAINTS', 'CRASHES - INCIDENTS', 'CRASHES - NONMOTORIST', 'CRASHES - SUBJECTS', 'INCIDENTS', 'TRAFFIC STOPS']\n"
]
}
],
"source": [
"# Find out what types of data are available from this source\n",
"types = src.get_tables_types()\n",
"\n",
"print(types)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024]\n"
]
}
],
"source": [
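"# Find out what years are available for a table type. Here types[0] is 'COMPLAINTS';\n",
"# pass table_type='TRAFFIC STOPS' to check the traffic stops table instead.\n",
"# If you do not have an API key (app token) set up, you may see the message: \"WARNING:root:Requests made without an app_token will be subject to strict throttling limits.\" This is normal.\n",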
"years = src.get_years(table_type=types[0])\n",
"print(years)"
]
},
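{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell above checks `types[0]`, which for this source is `COMPLAINTS`, not the traffic stops table loaded next. A minimal sketch for confirming that the year you want exists in the table you plan to load, reusing only the `get_years` call shown above:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: confirm that 2021 is available in the traffic stops table before loading it\n",
"stop_years = src.get_years(table_type=\"TRAFFIC STOPS\")\n",
"print(2021 in stop_years)"
]
},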
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Load traffic stop data for 2021\n",
"t = src.load(year=2021, table_type='TRAFFIC STOPS')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
" geometry seq_id \\\n",
"0 POINT (-77.13047 39.01268) f08d0293-6ade-4802-84c1-4b7b1a707245 \n",
"1 POINT (-77.13047 39.01268) f08d0293-6ade-4802-84c1-4b7b1a707245 \n",
"2 POINT (-77.13047 39.01268) f08d0293-6ade-4802-84c1-4b7b1a707245 \n",
"3 POINT (-77.13047 39.01268) f08d0293-6ade-4802-84c1-4b7b1a707245 \n",
"4 POINT (-77.13047 39.01268) f08d0293-6ade-4802-84c1-4b7b1a707245 \n",
"\n",
" date_of_stop time_of_stop agency subagency \\\n",
"0 2021-01-01 03:12:00 MCP 2nd District, Bethesda \n",
"1 2021-01-01 03:12:00 MCP 2nd District, Bethesda \n",
"2 2021-01-01 03:12:00 MCP 2nd District, Bethesda \n",
"3 2021-01-01 03:12:00 MCP 2nd District, Bethesda \n",
"4 2021-01-01 03:12:00 MCP 2nd District, Bethesda \n",
"\n",
" description location \\\n",
"0 RECKLESS DRIVING VEHICLE IN WANTON AND WILLFUL... IFO 9609 SINGLETON DR \n",
"1 FAILURE OF VEH. DRIVER IN ACCIDENT TO LOCATE A... IFO 9609 SINGLETON DR \n",
"2 NEGLIGENT DRIVING VEHICLE IN CARELESS AND IMPR... IFO 9609 SINGLETON DR \n",
"3 FAILURE OF VEH. DRIVER TO STOP AFTER UNATTENDE... IFO 9609 SINGLETON DR \n",
"4 FAILURE OF VEH. DRIVER INVOLVED IN ACCIDENT TO... IFO 9609 SINGLETON DR \n",
"\n",
" latitude longitude ... driver_state dl_state arrest_type \\\n",
"0 39.0126813333333 -77.130466 ... MD MD A - Marked Patrol \n",
"1 39.0126813333333 -77.130466 ... MD MD A - Marked Patrol \n",
"2 39.0126813333333 -77.130466 ... MD MD A - Marked Patrol \n",
"3 39.0126813333333 -77.130466 ... MD MD A - Marked Patrol \n",
"4 39.0126813333333 -77.130466 ... MD MD A - Marked Patrol \n",
"\n",
" search_conducted search_outcome search_reason_for_stop search_disposition \\\n",
"0 NaN NaN NaN NaN \n",
"1 NaN NaN NaN NaN \n",
"2 NaN NaN NaN NaN \n",
"3 NaN NaN NaN NaN \n",
"4 NaN NaN NaN NaN \n",
"\n",
" search_reason search_type search_arrest_reason \n",
"0 NaN NaN NaN \n",
"1 NaN NaN NaN \n",
"2 NaN NaN NaN \n",
"3 NaN NaN NaN \n",
"4 NaN NaN NaN \n",
"\n",
"[5 rows x 43 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# The loaded table is stored in the table attribute as a pandas DataFrame (https://pandas.pydata.org/docs/user_guide/10min.html#min)\n",
"# Show the first 5 rows of the table\n",
"t.table.head(n=5)\n",
"# Now you are ready to analyze the data in t.table."
]
}
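,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `t.table` is a pandas DataFrame, standard pandas methods work on it directly. In the preview above, several rows share one `seq_id` but have different `description` values, which suggests that each row is a single violation rather than a single stop; this is an inference from the sample, not documented behavior. The sketch below assumes the `seq_id` and `subagency` columns shown in the preview. Other sources use different column names, so check `t.table.columns` first.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: basic inspection of the loaded table (column names taken from the preview above)\n",
"print(t.table.shape)  # (number of rows, number of columns)\n",
"# Number of distinct seq_id values; may be closer to a count of stops than len(t.table)\n",
"print(t.table[\"seq_id\"].nunique())\n",
"# Rows per police district\n",
"t.table[\"subagency\"].value_counts()"
]
}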
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.12 ('opd')",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "a73158d29711b2da05ac73de25b71e5d8cae591f14917bba77a9573b5c85a0ce"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}