{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "05195b87-9d49-4afd-9a01-6185e7391918", "metadata": {}, "source": [ "# Getting Started\n", "Welcome to the quickstart guide for OpenPoliceData (OPD)! Here, you should find all you need to learn the basics of OPD.\n", "\n", "* **New to Python?**: Check out the free [first python notebook](https://firstpythonnotebook.org/) course\n", "\n", "* **Questions or Comments?**: If you questions or comments about anything related to installing or using OPD, please reach out on our [discussion board](https://github.com/openpolicedata/openpolicedata/discussions).\n", "\n", "\n", "## Installation\n", "Install OPD with pip from [PyPI](https://pypi.org/project/openpolicedata/)\n", "\n", "```bash\n", "pip install openpolicedata\n", "```\n", "For installation in a Jupyter Notebook, replace `pip` with `%pip`. \n", "\n", "See [here](installation.rst) for advanced installation including how to install [GeoPandas] alongside OPD to enable geospatial analysis of data loaded by OPD.\n", "\n", "## Import\n", "To use OPD, you must always start by importing it into your Python code:" ] }, { "cell_type": "code", "execution_count": 5, "id": "ac629e9e-439d-4197-a88c-f467c7cfb7a9", "metadata": {}, "outputs": [], "source": [ "import openpolicedata as opd" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9b736829-ac81-4629-bf72-f83f79561e91", "metadata": {}, "source": [ "We recommend shortening openpolicedata to `opd` to make your code more readable. \n", "\n", "## The Basics\n", "OPD provides access to over 300 police datasets with just 2 simple lines of code:" ] }, { "cell_type": "code", "execution_count": 13, "id": "ebfaf449", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " \r" ] } ], "source": [ "# Load traffic stops data from Lousiville for the year 2022.\n", "src = opd.Source(\"Louisville\")\n", "tbl = src.load_from_url(2022, table_type=\"TRAFFIC STOPS\") " ] }, { "attachments": {}, "cell_type": "markdown", "id": "78547a7e", "metadata": {}, "source": [ "The table attribute contains the loaded data as a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) so it can be analyzed with [pandas' simple and powerful capabilities](https://pandas.pydata.org/docs/user_guide/10min.html)." ] }, { "cell_type": "code", "execution_count": 12, "id": "6051c91f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TYPE_OF_STOPCITATION_CONTROL_NUMBERACTIVITY_RESULTSOFFICER_GENDEROFFICER_RACEOFFICER_AGE_RANGEACTIVITY_DATEACTIVITY_TIMEACTIVITY_LOCATIONACTIVITY_DIVISIONACTIVITY_BEATDRIVER_GENDERDRIVER_RACEDRIVER_AGE_RANGENUMBER_OF_PASSENGERSWAS_VEHCILE_SEARCHEDREASON_FOR_SEARCHObjectId
0COMPLAINT/CRIMINAL VIOLATIONDU03293CITATION ISSUEDMWHITE21 - 3001/02/202221:44M ST ...4TH DIVISIONBEAT 4MWHITE26 - 302YES01
1COMPLAINT/CRIMINAL VIOLATIONDV75866CITATION ISSUEDMWHITE51 - 6007/21/202202:00KEEGAN WAY ...7TH DIVISIONBEAT 1MHISPANIC16 - 191YES42
2COMPLAINT/CRIMINAL VIOLATIONDV87754CITATION ISSUEDMWHITE51 - 6007/21/202202:00KEEGAN WAY ...7TH DIVISIONBEAT 1MHISPANIC16 - 191NO03
3COMPLAINT/CRIMINAL VIOLATIONDW19051CITATION ISSUEDMWHITE21 - 3001/25/202211:234500 BLOCK SOUTHERN PKWY4TH DIVISIONBEAT 6MWHITE20 - 250YES44
4COMPLAINT/CRIMINAL VIOLATIONDX65321CITATION ISSUEDMWHITE31 - 4001/13/202205:30PRESTON HWY @ OUTER LOOP ...7TH DIVISIONBEAT 6MWHITE51 - 601YES35
\n", "
" ], "text/plain": [ " TYPE_OF_STOP CITATION_CONTROL_NUMBER ACTIVITY_RESULTS \\\n", "0 COMPLAINT/CRIMINAL VIOLATION DU03293 CITATION ISSUED \n", "1 COMPLAINT/CRIMINAL VIOLATION DV75866 CITATION ISSUED \n", "2 COMPLAINT/CRIMINAL VIOLATION DV87754 CITATION ISSUED \n", "3 COMPLAINT/CRIMINAL VIOLATION DW19051 CITATION ISSUED \n", "4 COMPLAINT/CRIMINAL VIOLATION DX65321 CITATION ISSUED \n", "\n", " OFFICER_GENDER OFFICER_RACE OFFICER_AGE_RANGE ACTIVITY_DATE ACTIVITY_TIME \\\n", "0 M WHITE 21 - 30 01/02/2022 21:44 \n", "1 M WHITE 51 - 60 07/21/2022 02:00 \n", "2 M WHITE 51 - 60 07/21/2022 02:00 \n", "3 M WHITE 21 - 30 01/25/2022 11:23 \n", "4 M WHITE 31 - 40 01/13/2022 05:30 \n", "\n", " ACTIVITY_LOCATION ACTIVITY_DIVISION \\\n", "0 M ST ... 4TH DIVISION \n", "1 KEEGAN WAY ... 7TH DIVISION \n", "2 KEEGAN WAY ... 7TH DIVISION \n", "3 4500 BLOCK SOUTHERN PKWY 4TH DIVISION \n", "4 PRESTON HWY @ OUTER LOOP ... 7TH DIVISION \n", "\n", " ACTIVITY_BEAT DRIVER_GENDER DRIVER_RACE DRIVER_AGE_RANGE \\\n", "0 BEAT 4 M WHITE 26 - 30 \n", "1 BEAT 1 M HISPANIC 16 - 19 \n", "2 BEAT 1 M HISPANIC 16 - 19 \n", "3 BEAT 6 M WHITE 20 - 25 \n", "4 BEAT 6 M WHITE 51 - 60 \n", "\n", " NUMBER_OF_PASSENGERS WAS_VEHCILE_SEARCHED REASON_FOR_SEARCH ObjectId \n", "0 2 YES 0 1 \n", "1 1 YES 4 2 \n", "2 1 NO 0 3 \n", "3 0 YES 4 4 \n", "4 1 YES 3 5 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# View the 1st 5 rows with pandas' head function\n", "tbl.table.head()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "262b0614", "metadata": {}, "source": [ "## Finding Datasets\n", "OPD provides the `datasets` module for querying what datasets are available in OPD. To get all available datasets, query the source table with no inputs:" ] }, { "cell_type": "code", "execution_count": 11, "id": "c39721a9", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StateSourceNameAgencyAgencyFullTableTypecoverage_startcoverage_endlast_coverage_checkDescriptionsource_urlreadmeURLYearDataTypedate_fielddataset_idagency_fieldmin_version
0ArizonaGilbertGilbertGilbert Police DepartmentCALLS FOR SERVICE2006-11-152023-05-1405/15/2023<NA>https://data.gilbertaz.gov/maps/2dcb4c20c9a444...<NA>https://maps.gilbertaz.gov/arcgis/rest/service...MULTIPLEArcGISEventDate<NA><NA><NA>
1ArizonaGilbertGilbertGilbert Police DepartmentEMPLOYEENaTNaT05/15/2023A data set of all employees that have previous...https://data.gilbertaz.gov/datasets/TOG::gilbe...<NA>https://services1.arcgis.com/JLuzSHjNrLL4Okwb/...NONEArcGIS<NA><NA><NA><NA>
2ArizonaGilbertGilbertGilbert Police DepartmentSTOPS2008-01-012018-05-2305/15/2023Standardized stop data from the Stanford Open ...https://openpolicing.stanford.edu/data/https://github.com/stanford-policylab/opp/blob...https://stacks.stanford.edu/file/druid:yg821jf...MULTIPLECSVdate<NA><NA><NA>
3ArizonaMesaMesaMesa Police DepartmentCALLS FOR SERVICE2017-01-012023-05-1205/15/2023<NA>https://data.mesaaz.gov/Police/Police-Computer...<NA>data.mesaaz.govMULTIPLESocratacreation_datetimeex94-c5ad<NA><NA>
4ArizonaMesaMesaMesa Police DepartmentINCIDENTS2016-01-012023-03-3105/15/2023Incidents based on initial police reports take...https://data.mesaaz.gov/Police/Police-Incident...<NA>data.mesaaz.govMULTIPLESocratareport_date39rt-2rfj<NA><NA>
\n", "
" ], "text/plain": [ " State SourceName Agency AgencyFull TableType \\\n", "0 Arizona Gilbert Gilbert Gilbert Police Department CALLS FOR SERVICE \n", "1 Arizona Gilbert Gilbert Gilbert Police Department EMPLOYEE \n", "2 Arizona Gilbert Gilbert Gilbert Police Department STOPS \n", "3 Arizona Mesa Mesa Mesa Police Department CALLS FOR SERVICE \n", "4 Arizona Mesa Mesa Mesa Police Department INCIDENTS \n", "\n", " coverage_start coverage_end last_coverage_check \\\n", "0 2006-11-15 2023-05-14 05/15/2023 \n", "1 NaT NaT 05/15/2023 \n", "2 2008-01-01 2018-05-23 05/15/2023 \n", "3 2017-01-01 2023-05-12 05/15/2023 \n", "4 2016-01-01 2023-03-31 05/15/2023 \n", "\n", " Description \\\n", "0 \n", "1 A data set of all employees that have previous... \n", "2 Standardized stop data from the Stanford Open ... \n", "3 \n", "4 Incidents based on initial police reports take... \n", "\n", " source_url \\\n", "0 https://data.gilbertaz.gov/maps/2dcb4c20c9a444... \n", "1 https://data.gilbertaz.gov/datasets/TOG::gilbe... \n", "2 https://openpolicing.stanford.edu/data/ \n", "3 https://data.mesaaz.gov/Police/Police-Computer... \n", "4 https://data.mesaaz.gov/Police/Police-Incident... \n", "\n", " readme \\\n", "0 \n", "1 \n", "2 https://github.com/stanford-policylab/opp/blob... \n", "3 \n", "4 \n", "\n", " URL Year DataType \\\n", "0 https://maps.gilbertaz.gov/arcgis/rest/service... MULTIPLE ArcGIS \n", "1 https://services1.arcgis.com/JLuzSHjNrLL4Okwb/... NONE ArcGIS \n", "2 https://stacks.stanford.edu/file/druid:yg821jf... MULTIPLE CSV \n", "3 data.mesaaz.gov MULTIPLE Socrata \n", "4 data.mesaaz.gov MULTIPLE Socrata \n", "\n", " date_field dataset_id agency_field min_version \n", "0 EventDate \n", "1 \n", "2 date \n", "3 creation_datetime ex94-c5ad \n", "4 report_date 39rt-2rfj " ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "all_datasets = opd.datasets.query()\n", "all_datasets.head()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b682fcda", "metadata": {}, "source": [ "The source table provides the information needed to create sources and load data as well as background information. It is a DataFrame that can be filtered with [pandas filtering operations](https://pandas.pydata.org/docs/getting_started/intro_tutorials/03_subset_data.html#min-tut-03-subset). Key information includes:\n", "\n", " * **State**: Optionally used when creating a `Source` to distinguish ambiguous sources (i.e. same city name in different states)\n", " * **SourceName**: Original source of the data (typically a shortened name for a police department). Used when creating a `Source`.\n", " * **Agency**: Shortened agency / police department name. Typically the same as SourceName. However, it may be `MULTIPLE` if a datasets contains data for multiple agencies.\n", " * **TableType**: Type of data (TRAFFIC STOPS, USE OF FORCE, etc.). Used when loading data.\n", " * **coverage_start**: Start date of data contained in dataset. Combined with coverage_end, this determines the years available for this datasets when loading data. NOTE: Often, agencies store their data in different datasets for different years so one table type may be spread across multiple datasets corresponding to each year of data.\n", " * **coverage_end**: Most recently checked date for data contained in dataset. Combined with coverage_start, this determines the years available for this datasets when loading data. If the data has been updated by the dataset owner since the date in `last_coverage_check`, more recent years may be available. NOTE: Often, agencies store their data in different datasets for different years so one table type may be spread across multiple datasets corresponding to each year of data.\n", " * **source_url**: Homepage for dataset\n", " * **readme**: Direct URL for data dictionary containing definitions of columns, etc. If empty, the `source_url` may also contain a data dictionary.\n", "\n", "With its optional inputs, `query` can be used to filter for desired data. Here is a very specific query using all optional inputs:" ] }, { "cell_type": "code", "execution_count": 15, "id": "e67f90d8-7008-4878-b3bb-99ead5653fa2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StateSourceNameAgencyAgencyFullTableTypecoverage_startcoverage_endlast_coverage_checkDescriptionsource_urlreadmeURLYearDataTypedate_fielddataset_idagency_fieldmin_version
73CaliforniaMenlo ParkMenlo ParkMenlo Park Police DepartmentCALLS FOR SERVICE2018-01-012018-12-3105/15/2023<NA>https://data.menlopark.org/datasets/4036c27030...https://data.menlopark.org/datasets/4036c27030...https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/...2018ArcGIS<NA><NA><NA><NA>
74CaliforniaMenlo ParkMenlo ParkMenlo Park Police DepartmentCALLS FOR SERVICE2019-01-012019-12-3105/15/2023<NA>https://data.menlopark.org/datasets/e88877f5d9...https://data.menlopark.org/datasets/e88877f5d9...https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/...2019ArcGIS<NA><NA><NA><NA>
75CaliforniaMenlo ParkMenlo ParkMenlo Park Police DepartmentCALLS FOR SERVICE2020-01-012020-12-3105/15/2023<NA>https://data.menlopark.org/datasets/510eb69337...https://data.menlopark.org/datasets/510eb69337...https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/...2020ArcGIS<NA><NA><NA><NA>
76CaliforniaMenlo ParkMenlo ParkMenlo Park Police DepartmentCALLS FOR SERVICE2021-01-012021-12-3105/15/2023<NA>https://data.menlopark.org/datasets/4c04a71c71...https://data.menlopark.org/datasets/4c04a71c71...https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/...2021ArcGIS<NA><NA><NA><NA>
\n", "
" ], "text/plain": [ " State SourceName Agency AgencyFull \\\n", "73 California Menlo Park Menlo Park Menlo Park Police Department \n", "74 California Menlo Park Menlo Park Menlo Park Police Department \n", "75 California Menlo Park Menlo Park Menlo Park Police Department \n", "76 California Menlo Park Menlo Park Menlo Park Police Department \n", "\n", " TableType coverage_start coverage_end last_coverage_check \\\n", "73 CALLS FOR SERVICE 2018-01-01 2018-12-31 05/15/2023 \n", "74 CALLS FOR SERVICE 2019-01-01 2019-12-31 05/15/2023 \n", "75 CALLS FOR SERVICE 2020-01-01 2020-12-31 05/15/2023 \n", "76 CALLS FOR SERVICE 2021-01-01 2021-12-31 05/15/2023 \n", "\n", " Description source_url \\\n", "73 https://data.menlopark.org/datasets/4036c27030... \n", "74 https://data.menlopark.org/datasets/e88877f5d9... \n", "75 https://data.menlopark.org/datasets/510eb69337... \n", "76 https://data.menlopark.org/datasets/4c04a71c71... \n", "\n", " readme \\\n", "73 https://data.menlopark.org/datasets/4036c27030... \n", "74 https://data.menlopark.org/datasets/e88877f5d9... \n", "75 https://data.menlopark.org/datasets/510eb69337... \n", "76 https://data.menlopark.org/datasets/4c04a71c71... \n", "\n", " URL Year DataType \\\n", "73 https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/... 2018 ArcGIS \n", "74 https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/... 2019 ArcGIS \n", "75 https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/... 2020 ArcGIS \n", "76 https://services7.arcgis.com/uRrQ0O3z2aaiIWYU/... 2021 ArcGIS \n", "\n", " date_field dataset_id agency_field min_version \n", "73 \n", "74 \n", "75 \n", "76 " ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ds = opd.datasets.query(source_name=\"Menlo Park\", state=\"California\", agency=\"Menlo Park\", table_type=\"CALLS FOR SERVICE\")\n", "ds" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d9d7023e-40ac-4e55-81d4-9f1efb9ec2db", "metadata": {}, "source": [ "`get_table_types` finds available table types in OPD. Here, we use optional `contains` input to only get the table types containing the word \"STOPS\":" ] }, { "cell_type": "code", "execution_count": 5, "id": "4d0a3fbd-be70-4717-8705-3339434a1c1f", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['PEDESTRIAN STOPS', 'STOPS', 'TRAFFIC STOPS']" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table_types = opd.datasets.get_table_types(contains=\"STOPS\")\n", "table_types" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c9075d6b-eb04-4760-8de9-1ff4f289dbb9", "metadata": {}, "source": [ "## Loading Data\n", "The `Source` class is used to explore datasets and load data. We first need to create a source, which we can use to view all datasets from that source. Let's create a source of Columbia, South Carolina. We need to specify the state because there are datasets from Columbias from multiple states" ] }, { "cell_type": "code", "execution_count": 6, "id": "7f8680d3-ee9a-413f-935c-1eb50a4c8c50", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
StateSourceNameAgencyAgencyFullTableTypecoverage_startcoverage_endlast_coverage_checkDescriptionsource_urlreadmeURLYearDataTypedate_fielddataset_idagency_fieldmin_version
744South CarolinaColumbiaColumbiaColumbia Police DepartmentARRESTS2016-01-012022-12-3107/07/2023<NA>https://coc-colacitygis.opendata.arcgis.com/da...<NA>https://services1.arcgis.com/Mnt8FoJcogKtoVBs/...MULTIPLEArcGISArrest_Date<NA><NA>0.2
745South CarolinaColumbiaColumbiaColumbia Police DepartmentFIELD CONTACTS2016-01-012022-12-3107/07/2023Field Interview is a collection of data result...https://coc-colacitygis.opendata.arcgis.com/da...<NA>https://services1.arcgis.com/Mnt8FoJcogKtoVBs/...MULTIPLEArcGISTOC<NA><NA><NA>
\n", "
" ], "text/plain": [ " State SourceName Agency AgencyFull \\\n", "744 South Carolina Columbia Columbia Columbia Police Department \n", "745 South Carolina Columbia Columbia Columbia Police Department \n", "\n", " TableType coverage_start coverage_end last_coverage_check \\\n", "744 ARRESTS 2016-01-01 2022-12-31 07/07/2023 \n", "745 FIELD CONTACTS 2016-01-01 2022-12-31 07/07/2023 \n", "\n", " Description \\\n", "744 \n", "745 Field Interview is a collection of data result... \n", "\n", " source_url readme \\\n", "744 https://coc-colacitygis.opendata.arcgis.com/da... \n", "745 https://coc-colacitygis.opendata.arcgis.com/da... \n", "\n", " URL Year DataType \\\n", "744 https://services1.arcgis.com/Mnt8FoJcogKtoVBs/... MULTIPLE ArcGIS \n", "745 https://services1.arcgis.com/Mnt8FoJcogKtoVBs/... MULTIPLE ArcGIS \n", "\n", " date_field dataset_id agency_field min_version \n", "744 Arrest_Date 0.2 \n", "745 TOC " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "src = opd.Source(\"Columbia\", state=\"South Carolina\")\n", "src.datasets" ] }, { "attachments": {}, "cell_type": "markdown", "id": "cbf410b5", "metadata": {}, "source": [ "To get a list of available table types:" ] }, { "cell_type": "code", "execution_count": 17, "id": "b8dc350c", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['ARRESTS', 'FIELD CONTACTS']" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "src.get_tables_types()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "79d8047b", "metadata": {}, "source": [ "You can get the number of records for a dataset using `get_count`. Let's get the number of records in the year 2022 for the FIELD CONTACTS dataset." ] }, { "cell_type": "code", "execution_count": 18, "id": "edafbd84", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "2382" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "src.get_count(2022, \"FIELD CONTACTS\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8c07a6c1", "metadata": {}, "source": [ "You can find which years are available for a given table type:" ] }, { "cell_type": "code", "execution_count": 12, "id": "086ba8f4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[2016, 2017, 2018, 2019, 2020, 2021, 2022]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "src.get_years(table_type=\"FIELD CONTACTS\")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b2d809bb-c661-439a-a378-81dc18fccfe5", "metadata": {}, "source": [ "Now, let's load in some field contacts data for 2022." ] }, { "cell_type": "code", "execution_count": 7, "id": "c86f7b24", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ " \r" ] }, { "data": { "text/plain": [ "state: South Carolina,\n", "source_name: Columbia,\n", "agency: Columbia,\n", "table_type: FIELD CONTACTS,\n", "year: 2022,\n", "description: Field Interview is a collection of data resulting from citizen contact related to suspicious activity.,\n", "url: https://services1.arcgis.com/Mnt8FoJcogKtoVBs/arcgis/rest/services/FieldInterview/FeatureServer/0,\n", "date_field: TOC,\n", "source_url: https://coc-colacitygis.opendata.arcgis.com/datasets/ColaCityGIS::field-interview-1-1-2016-3-31-2022/about" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbl = src.load_from_url(2022, \"FIELD CONTACTS\")\n", "tbl" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e1ae5a67", "metadata": {}, "source": [ "The loaded data is contained in a [pandas DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) in the table attribute." ] }, { "cell_type": "code", "execution_count": 13, "id": "5122367e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
OBJECTIDCase_NumTOCAddressCityZipStateAgeRaceSexContact_TypeYeargeometry
0253512200001082022-01-01 21:47:0012XX Main St2920132WMField Interview2022.0POINT (1989801.776 788862.968)
1253502200001612022-01-02 15:05:0021XX Main St2920129BMField Interview2022.0POINT (1988210.189 793174.093)
\n", "
" ], "text/plain": [ " OBJECTID Case_Num TOC Address City Zip State \\\n", "0 25351 220000108 2022-01-01 21:47:00 12XX Main St 29201 \n", "1 25350 220000161 2022-01-02 15:05:00 21XX Main St 29201 \n", "\n", " Age Race Sex Contact_Type Year geometry \n", "0 32 W M Field Interview 2022.0 POINT (1989801.776 788862.968) \n", "1 29 B M Field Interview 2022.0 POINT (1988210.189 793174.093) " ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbl.table.head(2)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c109c211", "metadata": {}, "source": [ "Data can be saved locally as CSV files. This allows you to:\n", "\n", " * Open the data using the software of your choice\n", " * Re-open the data in OPD from a local copy" ] }, { "cell_type": "code", "execution_count": 19, "id": "de4fd694", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
OBJECTIDCase_NumTOCAddressCityZipStateAgeRaceSexContact_TypeYeargeometry
025351220000108.02022-01-01 21:47:0012XX Main StNaN29201NaN32.0WMField Interview2022.0POINT (1989801.7762467265 788862.9678477645)
125350220000161.02022-01-02 15:05:0021XX Main StNaN29201NaN29.0BMField Interview2022.0POINT (1988210.189304456 793174.0931758583)
\n", "
" ], "text/plain": [ " OBJECTID Case_Num TOC Address City Zip \\\n", "0 25351 220000108.0 2022-01-01 21:47:00 12XX Main St NaN 29201 \n", "1 25350 220000161.0 2022-01-02 15:05:00 21XX Main St NaN 29201 \n", "\n", " State Age Race Sex Contact_Type Year \\\n", "0 NaN 32.0 W M Field Interview 2022.0 \n", "1 NaN 29.0 B M Field Interview 2022.0 \n", "\n", " geometry \n", "0 POINT (1989801.7762467265 788862.9678477645) \n", "1 POINT (1988210.189304456 793174.0931758583) " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbl.to_csv()\n", "new_src = opd.Source(\"Columbia\", state=\"South Carolina\")\n", "new_tbl = new_src.load_from_csv(2022, table_type=\"FIELD CONTACTS\")\n", "new_tbl.table.head(2)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8cd627a7", "metadata": {}, "source": [ "Some datasets contain data for every agency in a state. In this case, you may want to know what agencies are available and optionally, only want agencies containing the word Arlington." ] }, { "cell_type": "code", "execution_count": 8, "id": "dcd649e2", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[\"Arlington County Sheriff's Office\", 'Arlington County Police Department']" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "src = opd.Source(\"Virginia\")\n", "agencies = src.get_agencies(table_type=\"STOPS\", partial_name=\"Arlington\")\n", "agencies" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8c8d7e82", "metadata": {}, "source": [ "We may also want only load data from a specific agency:" ] }, { "cell_type": "code", "execution_count": 9, "id": "9de443dc", "metadata": {}, "outputs": [], "source": [ "tbl = src.load_from_url(2022, table_type=\"STOPS\", agency=\"Arlington County Police Department\")" ] }, { "cell_type": "markdown", "id": "361ec854", "metadata": {}, "source": [ "## Data Standardization\n", "One of the challenges in analyzing police data is that different agencies will use different column names for the same data and will use different codes and terms for the data in the columns. Particularly, if you are looking at multiple datasets, it is valuable for the data to be standardized so that you know in advance what some key columns will be called and what values will be in those columns. \n", "\n", "To provide the user with more consistent column names and data, OpenPoliceData provides powerful tools to automatically standardize column names and data in order. Columns that OpenPoliceData can standardize include:\n", "\n", "* Date\n", "* Time,\n", "* Gender\n", "* Age\n", "* Race\n", "* Ethnicity \n", "\n", "In addition, OpenPoliceData will combine separate date and time columns into a single datetime column and race and ethnicity into a single combined race column.\n", "\n", "Let's examine what columns we have in our data:" ] }, { "cell_type": "code", "execution_count": 10, "id": "6ec227a3", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['incident_date', 'agency_name', 'jurisdiction', 'reason_for_stop',\n", " 'person_type', 'race', 'ethnicity', 'age', 'gender', 'english_speaking',\n", " 'action_taken', 'specific_violation', 'virginia_crime_code',\n", " 'person_searched', 'vehicle_searched', 'physical_force_by_officer',\n", " 'physical_force_by_subject', 'residency'],\n", " dtype='object')" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbl.table.columns" ] }, { "cell_type": "markdown", "id": "6ddb81d8", "metadata": {}, "source": [ "The DataFrame currently has date, race, ethnicity, age, and gender columns called `incident_date`, `race`, `ethnicity`, `age`, and `gender`, respectively.\n", "\n", "The unique values in the `race` and `ethnicity` columns are:" ] }, { "cell_type": "code", "execution_count": 11, "id": "fbe2683a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The unique values in the race column are ['WHITE' 'BLACK OR AFRICAN AMERICAN' 'AMERICAN INDIAN OR ALASKA NATIVE'\n", " 'ASIAN OR NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER' 'UNKNOWN']\n", "The unique values in the ethnicity column are ['HISPANIC OR LATINO' 'NOT HISPANIC OR LATINO' 'UNKNOWN']\n" ] } ], "source": [ "print(f\"The unique values in the race column are {tbl.table['race'].unique()}\")\n", "print(f\"The unique values in the ethnicity column are {tbl.table['ethnicity'].unique()}\")" ] }, { "cell_type": "markdown", "id": "d3fe7b74", "metadata": {}, "source": [ "Now, let's standardize the data:" ] }, { "cell_type": "code", "execution_count": 12, "id": "e7a596c5", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The columns after standardization are:\n" ] }, { "data": { "text/plain": [ "Index(['DATE', 'SUBJECT_RACE', 'SUBJECT_AGE', 'SUBJECT_GENDER', 'AGENCY',\n", " 'jurisdiction', 'reason_for_stop', 'person_type', 'english_speaking',\n", " 'action_taken', 'specific_violation', 'virginia_crime_code',\n", " 'person_searched', 'vehicle_searched', 'physical_force_by_officer',\n", " 'physical_force_by_subject', 'residency', 'SUBJECT_ETHNICITY',\n", " 'SUBJECT_RACE_ONLY', 'RAW_incident_date', 'RAW_agency_name', 'RAW_race',\n", " 'RAW_ethnicity', 'RAW_age', 'RAW_gender'],\n", " dtype='object')" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "tbl.standardize()\n", "\n", "print(\"The columns after standardization are:\")\n", "tbl.table.columns" ] }, { "cell_type": "markdown", "id": "9176e7a7", "metadata": {}, "source": [ "The columns have been standardized to include `DATE`, `SUBJECT_RACE` (subject indicates the demographics it applies to a subject/civilian rather than a police officer), `SUBJECT_AGE`, `SUBJECT_GENDER`, `AGENCY`, `SUBJECT_RACE_ONLY`, and `SUBJECT_ETHNICTY`. Each of these columns has had its data standardized into a common format that would be applied to any data that the function `standardize` is applied to. Note also that the original non-standardized columns still exist but they now have \"RAW_\" appended to their column names.\n", "\n", "The original `race` and `ethnicity` columns were standardized to `SUBJECT_RACE_ONLY` and `SUBJECT_ETHNICTY` and then combined into `SUBJECT_RACE`. This combination is consistent with practice used in many fields. The standardized race/ethnicities are Latino of all races, White Non-Latino, Black Non-Latino, etc.\n", "\n", "The standardized values in the `SUBJECT_RACE` column are:" ] }, { "cell_type": "code", "execution_count": 14, "id": "53dd1ced", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array(['HISPANIC/LATINO', 'BLACK', 'INDIGENOUS', 'WHITE',\n", " 'ASIAN / PACIFIC ISLANDER', 'UNKNOWN'], dtype=object)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# This would produce the same thing:\n", "# tbl.table[opd.defs.columns.RACE_SUBJECT].unique()\n", "tbl.table[\"SUBJECT_RACE\"].unique()" ] }, { "cell_type": "markdown", "id": "b7e16e61", "metadata": {}, "source": [ "These standardized names will be used in all tables with race columns that are standardized (although they can be customized)." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.3" } }, "nbformat": 4, "nbformat_minor": 5 }