DataSF | Data Standards Handbook

Column Headers & Order

Last updated 2 years ago

Column Headers

  • Use only alphanumeric characters and these three special characters: period (.), dash (-), and underscore (_)

    • Replace an ampersand (&) with “and” if needed

  • Each header must be unique

    • For example, a dataset can’t have two headers called "duration"

  • Units of measure should be omitted

    • Units can and should be provided with the data dictionary

  • Keep headers short (fewer than 30 characters)

    • A full description can and should be provided with the data dictionary

Examples: date_received, applicants_address, supervisor_district
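
The header rules above can be checked mechanically. The following is a minimal sketch, not an official DataSF tool: the function name `check_headers` and the problem labels are hypothetical, and it assumes "alphanumeric" means ASCII letters and digits.

```python
import re
from collections import Counter

# Assumption: "alphanumeric" means ASCII letters and digits.
# Allowed specials are period, dash, and underscore.
ALLOWED = re.compile(r"[A-Za-z0-9._-]+")
MAX_LENGTH = 30  # headers should be shorter than this many characters


def check_headers(headers):
    """Return a list of (header, problem) tuples; an empty list means no issues found."""
    problems = []
    # Counter keeps first-seen order, so each distinct header is reported once.
    for header, count in Counter(headers).items():
        if not ALLOWED.fullmatch(header):
            problems.append((header, "disallowed character"))
        if count > 1:
            problems.append((header, "duplicate"))
        if len(header) >= MAX_LENGTH:
            problems.append((header, "too long"))
    return problems


issues = check_headers(["duration", "duration", "parks & rec"])
```

Here `issues` flags "duration" as a duplicate and "parks & rec" for its disallowed characters (the space and the ampersand). The rule about omitting units of measure is not covered, since it depends on what a header means and needs human review.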

Column Order

  • Unique identifiers should be in the left-most column if applicable

  • Date and time variables should be in the first column for time series data

  • Fixed or classified variables should be ordered with the highest-level variable on the left and the most granular variable on the right, for example:

    • 311 cases: service_name, service_subtype, service_details

    • Police incidents: category, descript

  • Observed variables should always be in the rightmost columns. These are measured variables, often numeric, for example:

    • Duration

    • Number of Units

    • Number of Stories

    • Year Built

    • People Served
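
The ordering rules above can be sketched as a small helper that joins column groups from left to right. This is an illustrative sketch only: `order_columns` and the column `case_id` are hypothetical, and placing identifiers ahead of date and time columns when a dataset has both is an assumption, since the guidance does not say which takes precedence.

```python
def order_columns(identifiers=(), datetimes=(), classified=(), observed=()):
    """Join column groups left to right.

    Assumption: identifiers come before date/time columns when both exist.
    The `classified` group is expected to be listed from highest level
    to most granular already; this function does not reorder within a group.
    """
    ordered = [*identifiers, *datetimes, *classified, *observed]
    if len(set(ordered)) != len(ordered):
        raise ValueError("a column name appears in more than one group")
    return ordered


columns = order_columns(
    identifiers=["case_id"],  # hypothetical identifier column
    datetimes=["date_received"],
    classified=["service_name", "service_subtype", "service_details"],
    observed=["duration"],
)
```

The resulting list can be passed as the field order when writing a file, for example as `fieldnames` to Python's `csv.DictWriter`.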
