{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Year Filtering Guide\n",
"Year filtering is necessary in multiple methods (load, load_iter, load_from_csv, get_count, get_agencies) of the Source class that is used to interact with datasets from a particular agency. In most cases, the basic year filtering described in the [Getting Started Guide](./index.ipynb) is all that is needed. However, this guide is provided to document how to handle the rare more complex cases.\n",
"\n",
"Year filtering serves 2 purposes:\n",
"\n",
"* Identifying which single dataset to request data or information from\n",
"* Filtering a dataset for only a specified year or for a date/year range\n",
"\n",
"> **NOTE**: The combination of inputs including `year` for the load, load_iter, load_from_csv, get_count, and get_agencies methods must specify a single dataset. OPD requires this because the structure and contents of different datasets can differ even between what appear to be the same data from the same source but for 2 different years. After loading data with OPD, you can [concatenate](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html) datasets manually using pandas.\n",
"\n",
"This guide describes how to handle the following cases:\n",
"\n",
"* [Request single year of single year dataset](#request-single-year-of-single-year-dataset)\n",
"* [Request single year of multi-year dataset](#request-single-year-of-multi-year-dataset)\n",
"* [Request all of a multi-year dataset](#request-all-of-a-multi-year-dataset)\n",
"* [Request year range of a multi-year dataset](#request-year-range-of-multi-year-dataset)\n",
"* [Request date range of a multi-year dataset](#request-date-range-of-multi-year-dataset)\n",
"* [Filtering of Excel and CSV Files with load_iter and get_count](#filtering-of-excel-and-csv-files-with-load_iter-and-get_count)\n",
"\n",
"Most datasets associated with a single source and table type are either contained in a single multi-year dataset or in multiple single year datasets. To better demonstrate year filtering concepts, this guide will mostly use Use of Force datasets from Northampton and Asheville because Northampton consists of a multi-year dataset AND a single year dataset and Asheville consists of 2 multi-year datasets."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"nbsphinx": "hidden"
},
"outputs": [],
"source": [
"# This cell should have \"nbsphinx\": \"hidden\" in its metadata and not be included in the documentation!\n",
"import sys\n",
"sys.path.append(\"../../..\")\n",
"\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" State | \n",
" SourceName | \n",
" Agency | \n",
" AgencyFull | \n",
" TableType | \n",
" coverage_start | \n",
" coverage_end | \n",
" last_coverage_check | \n",
" Description | \n",
" source_url | \n",
" readme | \n",
" URL | \n",
" Year | \n",
" DataType | \n",
" date_field | \n",
" dataset_id | \n",
" agency_field | \n",
" min_version | \n",
" query | \n",
"
\n",
" \n",
" \n",
" \n",
" | 502 | \n",
" Massachusetts | \n",
" Northampton | \n",
" Northampton | \n",
" Northampton Police Department | \n",
" USE OF FORCE | \n",
" 2014-01-01 | \n",
" 2020-12-31 | \n",
" 8/13/2023 | \n",
" <NA> | \n",
" https://northamptonpd.com/open-data-portal.html | \n",
" <NA> | \n",
" https://northamptonpd.com/images/ODP%20Spreads... | \n",
" MULTIPLE | \n",
" Excel | \n",
" <NA> | \n",
" <NA> | \n",
" <NA> | \n",
" 0.3.1 | \n",
" NaN | \n",
"
\n",
" \n",
" | 503 | \n",
" Massachusetts | \n",
" Northampton | \n",
" Northampton | \n",
" Northampton Police Department | \n",
" USE OF FORCE | \n",
" 2021-01-01 | \n",
" 2021-12-31 | \n",
" 8/13/2023 | \n",
" <NA> | \n",
" https://northamptonpd.com/open-data-portal.html | \n",
" <NA> | \n",
" https://northamptonpd.com/images/ODP%20Spreads... | \n",
" 2021 | \n",
" Excel | \n",
" <NA> | \n",
" <NA> | \n",
" <NA> | \n",
" 0.3.1 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" State SourceName Agency AgencyFull \\\n",
"502 Massachusetts Northampton Northampton Northampton Police Department \n",
"503 Massachusetts Northampton Northampton Northampton Police Department \n",
"\n",
" TableType coverage_start coverage_end last_coverage_check Description \\\n",
"502 USE OF FORCE 2014-01-01 2020-12-31 8/13/2023 \n",
"503 USE OF FORCE 2021-01-01 2021-12-31 8/13/2023 \n",
"\n",
" source_url readme \\\n",
"502 https://northamptonpd.com/open-data-portal.html \n",
"503 https://northamptonpd.com/open-data-portal.html \n",
"\n",
" URL Year DataType \\\n",
"502 https://northamptonpd.com/images/ODP%20Spreads... MULTIPLE Excel \n",
"503 https://northamptonpd.com/images/ODP%20Spreads... 2021 Excel \n",
"\n",
" date_field dataset_id agency_field min_version query \n",
"502 0.3.1 NaN \n",
"503 0.3.1 NaN "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import openpolicedata as opd\n",
"northampton = opd.Source('Northampton')\n",
"northampton.datasets[northampton.datasets['TableType']=='USE OF FORCE']"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" State | \n",
" SourceName | \n",
" Agency | \n",
" AgencyFull | \n",
" TableType | \n",
" coverage_start | \n",
" coverage_end | \n",
" last_coverage_check | \n",
" Description | \n",
" source_url | \n",
" readme | \n",
" URL | \n",
" Year | \n",
" DataType | \n",
" date_field | \n",
" dataset_id | \n",
" agency_field | \n",
" min_version | \n",
" query | \n",
"
\n",
" \n",
" \n",
" \n",
" | 711 | \n",
" North Carolina | \n",
" Asheville | \n",
" Asheville | \n",
" Asheville Police Department | \n",
" USE OF FORCE | \n",
" 2018-04-12 | \n",
" 2020-12-26 | \n",
" 05/10/2024 | \n",
" APD Use of Force Data View from 2018 through 2... | \n",
" https://data-avl.opendata.arcgis.com/datasets/... | \n",
" https://docs.google.com/document/d/1sScS5Jez1w... | \n",
" https://services.arcgis.com/aJ16ENn1AaqdFlqx/a... | \n",
" MULTIPLE | \n",
" ArcGIS | \n",
" date_occurred | \n",
" <NA> | \n",
" <NA> | \n",
" <NA> | \n",
" NaN | \n",
"
\n",
" \n",
" | 712 | \n",
" North Carolina | \n",
" Asheville | \n",
" Asheville | \n",
" Asheville Police Department | \n",
" USE OF FORCE | \n",
" 2020-12-16 | \n",
" 2024-03-29 | \n",
" 05/10/2024 | \n",
" APD Use of Force data starting January 2021 | \n",
" https://data-avl.opendata.arcgis.com/datasets/... | \n",
" https://docs.google.com/document/d/1sScS5Jez1w... | \n",
" https://services.arcgis.com/aJ16ENn1AaqdFlqx/a... | \n",
" MULTIPLE | \n",
" ArcGIS | \n",
" occurred_date | \n",
" <NA> | \n",
" <NA> | \n",
" <NA> | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" State SourceName Agency AgencyFull \\\n",
"711 North Carolina Asheville Asheville Asheville Police Department \n",
"712 North Carolina Asheville Asheville Asheville Police Department \n",
"\n",
" TableType coverage_start coverage_end last_coverage_check \\\n",
"711 USE OF FORCE 2018-04-12 2020-12-26 05/10/2024 \n",
"712 USE OF FORCE 2020-12-16 2024-03-29 05/10/2024 \n",
"\n",
" Description \\\n",
"711 APD Use of Force Data View from 2018 through 2... \n",
"712 APD Use of Force data starting January 2021 \n",
"\n",
" source_url \\\n",
"711 https://data-avl.opendata.arcgis.com/datasets/... \n",
"712 https://data-avl.opendata.arcgis.com/datasets/... \n",
"\n",
" readme \\\n",
"711 https://docs.google.com/document/d/1sScS5Jez1w... \n",
"712 https://docs.google.com/document/d/1sScS5Jez1w... \n",
"\n",
" URL Year DataType \\\n",
"711 https://services.arcgis.com/aJ16ENn1AaqdFlqx/a... MULTIPLE ArcGIS \n",
"712 https://services.arcgis.com/aJ16ENn1AaqdFlqx/a... MULTIPLE ArcGIS \n",
"\n",
" date_field dataset_id agency_field min_version query \n",
"711 date_occurred NaN \n",
"712 occurred_date NaN "
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"asheville = opd.Source(\"Asheville\")\n",
"uof_datasets = asheville.datasets[asheville.datasets['TableType']=='USE OF FORCE']\n",
"uof_datasets"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Request single year of single year dataset\n",
"Requesting a single year of a single year dataset simply requires inputting the value in the `Year` column of the datasets table as the `year` input:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"state: Massachusetts,\n",
"source_name: Northampton,\n",
"agency: Northampton,\n",
"table_type: USE OF FORCE,\n",
"year: 2021,\n",
"description: ,\n",
"url: https://northamptonpd.com/images/ODP%20Spreadsheets/2021/Use%20of%20Force.xlsx,\n",
"source_url: https://northamptonpd.com/open-data-portal.html,\n",
"urls: {'source_url': 'https://northamptonpd.com/open-data-portal.html', 'readme': None, 'data': 'https://northamptonpd.com/images/ODP%20Spreadsheets/2021/Use%20of%20Force.xlsx'}"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table = northampton.load('USE OF FORCE', 2021)\n",
"table"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Request single year of multi-year dataset\n",
"Requesting a single year of a multi-year dataset requires inputting the desired year as the `year` input. If the desired year corresponds to a single-year dataset, use the `url_contains` input to indicate that the multi-year dataset is desired."
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"state: Massachusetts,\n",
"source_name: Northampton,\n",
"agency: Northampton,\n",
"table_type: USE OF FORCE,\n",
"year: MULTIPLE,\n",
"description: ,\n",
"url: https://northamptonpd.com/images/ODP%20Spreadsheets/NPD_Use_of_Force_2014-2020_incident_level_data.xlsx,\n",
"source_url: https://northamptonpd.com/open-data-portal.html,\n",
"urls: {'source_url': 'https://northamptonpd.com/open-data-portal.html', 'readme': None, 'data': 'https://northamptonpd.com/images/ODP%20Spreadsheets/NPD_Use_of_Force_2014-2020_incident_level_data.xlsx'}"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table = northampton.load('USE OF FORCE', 2020)\n",
"table"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are no overlapping years between the multi-year (2014-2020) and single year (2021) datasets for Northampton. However, to demonstrate what would be required if the multi-year dataset also contained 2021 data and the user wanted to request 2021 data from that dataset, let's enter a substring of the URL (from the `URL` column of the datasets table) of the multi-year dataset (from the datasets table above) to request 2021 data from the multi-year dataset (the returned data will have 0 rows).\n",
"\n",
"> NOTE: Overlap between single year and multi-year datasets does not currently exist in OPD's datasets as of April 2024."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"state: Massachusetts,\n",
"source_name: Northampton,\n",
"agency: Northampton,\n",
"table_type: USE OF FORCE,\n",
"year: MULTIPLE,\n",
"description: ,\n",
"url: https://northamptonpd.com/images/ODP%20Spreadsheets/NPD_Use_of_Force_2014-2020_incident_level_data.xlsx,\n",
"source_url: https://northamptonpd.com/open-data-portal.html,\n",
"urls: {'source_url': 'https://northamptonpd.com/open-data-portal.html', 'readme': None, 'data': 'https://northamptonpd.com/images/ODP%20Spreadsheets/NPD_Use_of_Force_2014-2020_incident_level_data.xlsx'}"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table = northampton.load('USE OF FORCE', 2021, url_contains='2014-2020')\n",
"table"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Request all of a multi-year dataset\n",
"Requesting the entirety of multi-year dataset requires inputting \"MULTIPLE\" as the `year` input. If there are multiple multi-year datasets, use the `url_contains` input to indicate the desired multi-year dataset."
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"state: Massachusetts,\n",
"source_name: Northampton,\n",
"agency: Northampton,\n",
"table_type: USE OF FORCE,\n",
"year: MULTIPLE,\n",
"description: ,\n",
"url: https://northamptonpd.com/images/ODP%20Spreadsheets/NPD_Use_of_Force_2014-2020_incident_level_data.xlsx,\n",
"source_url: https://northamptonpd.com/open-data-portal.html,\n",
"urls: {'source_url': 'https://northamptonpd.com/open-data-portal.html', 'readme': None, 'data': 'https://northamptonpd.com/images/ODP%20Spreadsheets/NPD_Use_of_Force_2014-2020_incident_level_data.xlsx'}"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table = northampton.load('USE OF FORCE', \"MULTIPLE\")\n",
"table"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The Asheville dataset has multiple multi-year datasets for use of force. Use `url_contains` to distinguish between them (by inputting all or portion of the URL in the `URL` column of the datasets table)."
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"state: North Carolina,\n",
"source_name: Asheville,\n",
"agency: Asheville,\n",
"table_type: USE OF FORCE,\n",
"year: MULTIPLE,\n",
"description: APD Use of Force Data View from 2018 through 2020.,\n",
"url: https://services.arcgis.com/aJ16ENn1AaqdFlqx/arcgis/rest/services/APDUseOfForce/FeatureServer/0,\n",
"date_field: date_occurred,\n",
"source_url: https://data-avl.opendata.arcgis.com/datasets/7bc4d78e6bc64f2c9ca02fb2486aad0f_0/about,\n",
"readme: https://docs.google.com/document/d/1sScS5Jez1w8I-DUb6bagsj8efad7tDtIkbebicTXUOM/edit?usp=sharing,\n",
"urls: {'source_url': 'https://data-avl.opendata.arcgis.com/datasets/7bc4d78e6bc64f2c9ca02fb2486aad0f_0/about', 'readme': 'https://docs.google.com/document/d/1sScS5Jez1w8I-DUb6bagsj8efad7tDtIkbebicTXUOM/edit?usp=sharing', 'data': 'https://services.arcgis.com/aJ16ENn1AaqdFlqx/arcgis/rest/services/APDUseOfForce/FeatureServer/0'}"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table = asheville.load('USE OF FORCE', \"MULTIPLE\", url_contains=uof_datasets.iloc[0]['URL'])\n",
"table"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Request year range of multi-year dataset\n",
"You can request a range of years for multi-year datasets. Below, we will review the different possible cases starting with the most common.\n",
"\n",
"First, let's request a year range for when there is a single multi-year dataset. Enter a list with the start and stop year as the `year`."
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"state: Massachusetts,\n",
"source_name: Northampton,\n",
"agency: Northampton,\n",
"table_type: USE OF FORCE,\n",
"year: MULTIPLE,\n",
"description: ,\n",
"url: https://northamptonpd.com/images/ODP%20Spreadsheets/NPD_Use_of_Force_2014-2020_incident_level_data.xlsx,\n",
"source_url: https://northamptonpd.com/open-data-portal.html,\n",
"urls: {'source_url': 'https://northamptonpd.com/open-data-portal.html', 'readme': None, 'data': 'https://northamptonpd.com/images/ODP%20Spreadsheets/NPD_Use_of_Force_2014-2020_incident_level_data.xlsx'}"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table = northampton.load('USE OF FORCE', [2018,2020])\n",
"table"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A similar request can be made if there are multiple multi-year datasets. If the requested year range is all within a single dataset, you can just input a year range."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"state: North Carolina,\n",
"source_name: Asheville,\n",
"agency: Asheville,\n",
"table_type: USE OF FORCE,\n",
"year: [2018, 2019],\n",
"description: APD Use of Force Data View from 2018 through 2020.,\n",
"url: https://services.arcgis.com/aJ16ENn1AaqdFlqx/arcgis/rest/services/APDUseOfForce/FeatureServer/0,\n",
"date_field: date_occurred,\n",
"source_url: https://data-avl.opendata.arcgis.com/datasets/7bc4d78e6bc64f2c9ca02fb2486aad0f_0/about,\n",
"readme: https://docs.google.com/document/d/1sScS5Jez1w8I-DUb6bagsj8efad7tDtIkbebicTXUOM/edit?usp=sharing,\n",
"urls: {'source_url': 'https://data-avl.opendata.arcgis.com/datasets/7bc4d78e6bc64f2c9ca02fb2486aad0f_0/about', 'readme': 'https://docs.google.com/document/d/1sScS5Jez1w8I-DUb6bagsj8efad7tDtIkbebicTXUOM/edit?usp=sharing', 'data': 'https://services.arcgis.com/aJ16ENn1AaqdFlqx/arcgis/rest/services/APDUseOfForce/FeatureServer/0'}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table = asheville.load('USE OF FORCE', [2018,2019])\n",
"table"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `url_contains` input is required if multi-year datasets overlap AND data is requested in the overlapping region. "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"state: North Carolina,\n",
"source_name: Asheville,\n",
"agency: Asheville,\n",
"table_type: USE OF FORCE,\n",
"year: [2019, 2020],\n",
"description: APD Use of Force Data View from 2018 through 2020.,\n",
"url: https://services.arcgis.com/aJ16ENn1AaqdFlqx/arcgis/rest/services/APDUseOfForce/FeatureServer/0,\n",
"date_field: date_occurred,\n",
"source_url: https://data-avl.opendata.arcgis.com/datasets/7bc4d78e6bc64f2c9ca02fb2486aad0f_0/about,\n",
"readme: https://docs.google.com/document/d/1sScS5Jez1w8I-DUb6bagsj8efad7tDtIkbebicTXUOM/edit?usp=sharing,\n",
"urls: {'source_url': 'https://data-avl.opendata.arcgis.com/datasets/7bc4d78e6bc64f2c9ca02fb2486aad0f_0/about', 'readme': 'https://docs.google.com/document/d/1sScS5Jez1w8I-DUb6bagsj8efad7tDtIkbebicTXUOM/edit?usp=sharing', 'data': 'https://services.arcgis.com/aJ16ENn1AaqdFlqx/arcgis/rest/services/APDUseOfForce/FeatureServer/0'}"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table = asheville.load('USE OF FORCE', [2019,2020], url_contains=uof_datasets.iloc[0]['URL'])\n",
"table"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Request date range of multi-year dataset\n",
"You can request a range of dates for multi-year datasets. Requesting a date range is very similar to requesting a year range so it is recommended that the [filtering by year range](#request-year-range-of-multi-year-dataset) section be read first. \n",
"\n",
"To request data for a date range, enter a list with the start and stop date in YYYY-MM-DD format as the `year` input."
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Earliest date in table is 2020-09-13 00:00:00\n",
"Latest date in table is 2020-10-10 00:00:00\n"
]
}
],
"source": [
"table = asheville.load('USE OF FORCE', ['2020-09-13', '2020-10-10'])\n",
"print(f\"Earliest date in table is {table.table['date_occurred'].min()}\")\n",
"print(f\"Latest date in table is {table.table['date_occurred'].max()}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Filtering of Excel and CSV Files with load_iter and get_count\n",
"By default, OPD discourages you (by throwing an error) from using year/date filtering when calling `load_iter` and `get_count` for file-based datasets (`DataType` is CSV or Excel). The reason for this is that these files cannot be filtered by year/date without reading in the whole file. In most cases, it is likely more efficent to:\n",
"\n",
"* `load_iter` replacement for CSV and Excel datasets: use `load` instead\n",
"* `get_count` replacement for CSV and Excel datasets: use `load` to load the entire dataset and use pandas operations to filter the data and find the number of rows\n",
"\n",
"However, the `load_iter` and `get_count` can be used for Excel and CSV files by setting `force=True`.\n",
"\n",
"The Chicago STOPS data is a CSV file:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" State | \n",
" SourceName | \n",
" TableType | \n",
" coverage_start | \n",
" coverage_end | \n",
" DataType | \n",
" Year | \n",
"
\n",
" \n",
" \n",
" \n",
" | 300 | \n",
" Illinois | \n",
" Chicago | \n",
" STOPS | \n",
" 2012-01-01 | \n",
" 2020-05-16 | \n",
" CSV | \n",
" MULTIPLE | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" State SourceName TableType coverage_start coverage_end DataType \\\n",
"300 Illinois Chicago STOPS 2012-01-01 2020-05-16 CSV \n",
"\n",
" Year \n",
"300 MULTIPLE "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"chicago = opd.Source('Chicago')\n",
"chicago.datasets.loc[chicago.datasets['TableType']=='STOPS', ['State', 'SourceName', 'TableType', 'coverage_start','coverage_end','DataType', 'Year']]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Using `get_count` for Excel and CSV files:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of rows from 2019 to 2020 is 762553\n"
]
}
],
"source": [
"year_range = [2019,2020]\n",
"count = chicago.get_count('STOPS', year_range, force=True)\n",
"print(f'Number of rows from {year_range[0]} to {year_range[1]} is {count}')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If the data will also be loaded, this is a more efficient method for getting the count for Excel and CSV files:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" \r"
]
},
{
"data": {
"text/plain": [
"Index(['raw_row_number', 'date', 'time', 'location', 'lat', 'lng',\n",
" 'geocode_source', 'beat', 'district', 'subject_age', 'subject_race',\n",
" 'subject_sex', 'officer_id_hash', 'officer_age', 'officer_race',\n",
" 'officer_sex', 'officer_years_of_service', 'unit', 'type', 'violation',\n",
" 'arrest_made', 'citation_issued', 'outcome', 'contraband_found',\n",
" 'contraband_drugs', 'contraband_weapons', 'contraband_alcohol',\n",
" 'contraband_other', 'search_conducted', 'search_person',\n",
" 'search_vehicle', 'vehicle_make', 'vehicle_model', 'raw_race',\n",
" 'raw_driver_race'],\n",
" dtype='object')"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"table = chicago.load('STOPS', 'MULTIPLE')\n",
"table.table.columns"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The date column for this dataset is called `date`."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of rows from 2019 to 2020 is 762553\n"
]
}
],
"source": [
"df_filtered = table.table[ (table.table['date'].dt.year>=year_range[0]) & (table.table['date'].dt.year<=year_range[1])]\n",
"\n",
"print(f'Number of rows from {year_range[0]} to {year_range[1]} is {len(df_filtered)}')"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "opd-py3.12",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}