{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Data Source Table Dictionary/Key\n", "\n", "The source table provides the information needed to create sources and load data as well as background information about each dataset. Below are the definitions for each column in the source table:\n", "\n", " * **State**: Name of the state where the agency or agencies described in the data are located. If agencies are in multiple states, the value is `MULTIPLE`. This column is optionally used when creating a `Source` to distinguish ambiguous sources (e.g., the same city name in different states).\n", " * **SourceName**: Original source of the data (typically a shortened name of a police department). Used when creating a `Source`.\n", " * **Agency**: Shortened agency / police department name. Typically the same as SourceName. Value is `MULTIPLE` if a dataset contains data for multiple agencies.\n", " * **AgencyFull**: Full name of agency.\n", " * **TableType**: Type of data (TRAFFIC STOPS, USE OF FORCE, etc.). Used when loading data.\n", " * **coverage_start**: Start date of data contained in dataset. Combined with coverage_end, this determines the years available for this dataset when loading data. NOTE: Often, agencies store their data in different datasets for different years so one table type may be spread across multiple datasets corresponding to each year of data.\n", " * **coverage_end**: End date of data contained in dataset at the time of the most recent update. Combined with coverage_start, this determines the years available for this dataset when loading data. If the data has been updated by the dataset owner since the date in `last_coverage_check`, more recent data may be available. 
NOTE: Often, agencies store their data in different datasets for different years so one table type may be spread across multiple datasets corresponding to each year of data.\n", " * **last_coverage_check**: Date that `coverage_start` and `coverage_end` were last updated.\n", " * **Year**: Year of the dataset. Either a single year for data that is released annually, `MULTIPLE` for data containing multiple years, or `NONE` if the data is not for a particular year or set of years.\n", " * **agency_originated**: Whether the data was originally generated by the agency it describes. If the value is 'Yes' or empty, the data originated with the agency described.\n", " * **supplying_entity**: The organization that supplied the data if it was not the agency described in the data.\n", " * **Description**: Description of the dataset\n", " * **source_url**: Homepage for dataset\n", " * **readme**: URL for data dictionary containing definitions of columns, etc. If empty, the `source_url` may also contain a data dictionary.\n", " * **URL**: Location of data or API endpoint. If `dataset_id` is not empty, URL is combined with `dataset_id` to locate data.\n", " * **DataType**: Type of data (CSV, Excel, ArcGIS, Socrata, etc.)\n", " * **date_field**: Column in the data where date information is stored. Absence of this value does not indicate that there is no date field. This value may be empty if OPD does not internally require a value to be set.\n", " * **dataset_id**: If required, one or more dataset IDs are stored here that are used in combination with `URL` to locate data.\n", " * **agency_field**: For multi-agency data, this is the column in the data that indicates which agency a row corresponds to.\n", " * **min_version**: Minimum OPD version required to load a dataset\n", " * **py_min_version**: Minimum Python version required to load a dataset\n", " * **query**: Query to perform on the data after loading from the source but before providing it to the user. 
This is only used in rare cases where the data must be filtered in order to match the dataset described in the source table. For example, a dataset could have data for both a municipality's police and fire departments, and the fire department data may be filtered out since OPD only provides information on law enforcement agencies. Another example is if the `TableType` is Officer-Involved Shootings and the full dataset contains all shootings (not just officer-involved), the data would be filtered so that only officer-involved shootings are returned.\n", "\n", "With its optional inputs, `query` can be used to filter for desired data. Here is a very specific query using all optional inputs:" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }