DataSF | Data Standards Handbook
Text


Last updated 5 years ago


  • UTF-8 encoding should be used

    • This ensures that special characters can be decoded by users

  • No line breaks within cells

    • This can break parsing in software like Excel, introducing data integrity issues

    • There are many ways to detect and remove line breaks; the right approach depends on how you're extracting the data
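As one illustration, assuming the data has already been extracted into Python as rows of values, the sketch below replaces line breaks inside cells with a single space and writes the result as a UTF-8 encoded CSV. The function names and the choice of a space as the replacement are assumptions; if you extract directly from a database, doing the replacement in the query may be simpler.

```python
import csv
import io
import re

# One or more consecutive line breaks: CRLF, a lone CR or LF, and the
# Unicode line/paragraph separators (U+2028, U+2029).
LINE_BREAKS = re.compile(r"(?:\r\n|[\r\n\u2028\u2029])+")


def strip_line_breaks(value) -> str:
    """Replace line breaks inside a cell with a single space.

    Missing values become empty strings, and leading/trailing whitespace
    is trimmed.
    """
    if value is None:
        return ""
    return LINE_BREAKS.sub(" ", str(value)).strip()


def rows_to_csv(rows) -> str:
    """Serialize rows to CSV text with no line breaks inside any cell."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default row terminator is \r\n
    for row in rows:
        writer.writerow([strip_line_breaks(cell) for cell in row])
    return buffer.getvalue()


def write_utf8_csv(rows, path: str) -> None:
    """Write the cleaned CSV to disk, explicitly encoded as UTF-8."""
    # newline="" stops Python from translating the writer's \r\n row endings.
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(rows_to_csv(rows))
```

For example, `write_utf8_csv(rows, "output.csv")` (a hypothetical file name) writes the cleaned rows to disk. Passing `encoding="utf-8"` explicitly avoids depending on the operating system's default encoding.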

Considerations for categorical variables

  • Please maintain consistency with canonical and standard reference lists

    • This helps with analysis across departments and data systems

  • Document which reference list you used, including the departmental steward of the list where applicable and links to the data
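A minimal sketch of checking a categorical column against a reference list, assuming the list is available as plain labels. The reference values and function names below are hypothetical placeholders, not one of the handbook's actual reference lists.

```python
from collections import Counter
from typing import Iterable, Mapping, Optional


def build_canonical_map(reference: Iterable[str]) -> dict:
    """Index canonical labels by a case-insensitive, whitespace-trimmed key."""
    return {label.strip().casefold(): label for label in reference}


def standardize(value: Optional[str], canonical: Mapping[str, str]) -> Optional[str]:
    """Return the canonical label for a raw value, or None if it is not recognized."""
    if value is None:
        return None
    return canonical.get(value.strip().casefold())


def find_nonstandard_values(values: Iterable[Optional[str]], reference: Iterable[str]) -> Counter:
    """Count values that do not exactly match a label in the reference list.

    Blank and missing values are skipped; review them separately.
    """
    allowed = set(reference)
    return Counter(v for v in values if v not in (None, "") and v not in allowed)


# Hypothetical reference list and raw values, for illustration only.
STATUS_REFERENCE = ["Open", "Closed", "Pending"]
raw = ["Open", "open", "Closed", "CLOSED ", None]

print(find_nonstandard_values(raw, STATUS_REFERENCE))  # Counter({'open': 1, 'CLOSED ': 1})
canonical = build_canonical_map(STATUS_REFERENCE)
print([standardize(v, canonical) for v in raw])  # ['Open', 'Open', 'Closed', 'Closed', None]
```

Keeping the report separate from the mapping lets you review unrecognized values before rewriting them to the canonical labels.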

Character case

Text should be presented in the format that is easiest to read and interpret, where appropriate.

Title case

  • Address String

  • Categories, when the source system presents them this way or their title-case form is easy to infer from the source
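Python's built-in `str.title()` treats any letter that follows a digit as the start of a new word, so an address such as "1ST AVE" becomes "1St Ave". The sketch below instead capitalizes word by word, keeps a small set of tokens in upper case, and uses an override table for names that simple capitalization gets wrong. Both token sets and the sample address are hypothetical examples, not a complete or official list.

```python
# Hypothetical tokens to keep in upper case, e.g. state codes and acronyms.
UPPER_TOKENS = {"CA", "PSA"}

# Hypothetical overrides for names that word-by-word capitalization gets wrong.
SPECIAL_CASES = {"MCALLISTER": "McAllister", "O'FARRELL": "O'Farrell"}


def to_title_case(text: str) -> str:
    """Title-case a string one whitespace-separated word at a time.

    str.capitalize() upper-cases only the first character and lower-cases the
    rest, so "1ST" becomes "1st" (str.title() would give "1St"). Runs of
    whitespace are collapsed to a single space.
    """
    words = []
    for word in text.split():
        key = word.upper()
        if key in UPPER_TOKENS:
            words.append(key)
        elif key in SPECIAL_CASES:
            words.append(SPECIAL_CASES[key])
        else:
            words.append(word.capitalize())
    return " ".join(words)


print(to_title_case("123 MCALLISTER ST, SAN FRANCISCO, CA 94102"))
# 123 McAllister St, San Francisco, CA 94102
```

Tokens with attached punctuation (for example "CA,") will not match the lookup sets as written, so strip punctuation before the lookup if your data needs it.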

Upper case

  • Acronyms - e.g. PSA (Park Service Area)

  • States - e.g. CA

Lower case

  • Categories, when the source system presents them in all caps and there's no reliable way to convert them to title case
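The case rules above could be applied to a single categorical value roughly as follows. This is a sketch under assumptions: the acronym set and the optional mapping from all-caps source values to their title-case form are hypothetical inputs you would maintain yourself, and values the source already presents in mixed or lower case are left unchanged.

```python
from typing import Mapping, Optional

# Hypothetical acronyms that should always be upper case, e.g. PSA (Park Service Area).
ACRONYMS = {"PSA"}


def normalize_category(value: str, title_case_map: Optional[Mapping[str, str]] = None) -> str:
    """Apply the character-case rules to one categorical value.

    - Known acronyms are returned in upper case.
    - All-caps values with a known title-case form are mapped to it.
    - Any other all-caps value is converted to lower case.
    - Anything else is returned as the source presents it (trimmed).
    """
    cleaned = value.strip()
    if cleaned.upper() in ACRONYMS:
        return cleaned.upper()
    if cleaned.isupper():
        if title_case_map and cleaned in title_case_map:
            return title_case_map[cleaned]
        return cleaned.lower()
    return cleaned


# Hypothetical category values, for illustration only.
known_titles = {"STREET CLEANING": "Street Cleaning"}
print(normalize_category("psa"))                            # PSA
print(normalize_category("STREET CLEANING", known_titles))  # Street Cleaning
print(normalize_category("ABANDONED VEHICLE"))              # abandoned vehicle
print(normalize_category("Graffiti"))                       # Graffiti
```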

Notes

Reference lists: common reference lists are provided within this document.

Character case: lower case is generally preferred because research suggests it is easier for humans to read than upper case, and it is just as useful to machines. Note the exceptions above.

Is anything wrong, unclear, missing?

Leave a comment.