Text
UTF-8 encoding should be used
This ensures that special characters, such as accented letters, can be decoded correctly by users and their software
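As a minimal sketch of the UTF-8 guidance above, the Python below writes rows to CSV bytes encoded as UTF-8 and reads them back. The function names and the sample rows are hypothetical, chosen only for illustration; your export tool may expose an encoding option instead.

```python
import csv
import io


def rows_to_utf8_csv(rows):
    """Serialize rows (lists of strings) to CSV and encode the result as UTF-8 bytes."""
    buffer = io.StringIO(newline="")  # newline="" leaves the csv module's line endings untouched
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue().encode("utf-8")


def utf8_csv_to_rows(data):
    """Decode UTF-8 CSV bytes back into a list of rows."""
    text = data.decode("utf-8")
    return list(csv.reader(io.StringIO(text, newline="")))


# Hypothetical sample data containing non-ASCII characters
sample_rows = [["name", "neighborhood"], ["José Peña", "Visitación Valley"]]
encoded = rows_to_utf8_csv(sample_rows)
round_tripped = utf8_csv_to_rows(encoded)
```

If the round-tripped rows match the originals, the special characters survived encoding and decoding.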
No line breaks within cells
Line breaks inside a cell can break parsing in software like Excel, introducing data integrity issues
There are many ways to detect and remove line breaks; the right approach depends on how you're extracting the data
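One possible way to detect and remove line breaks, assuming the cell values are available as Python strings, is sketched below. The helper names and the choice of a single space as the replacement are assumptions, not a prescribed method.

```python
import re

# Matches Windows (CRLF), old Mac (CR), and Unix (LF) line breaks
_LINE_BREAK = re.compile(r"\r\n|\r|\n")


def has_line_break(value):
    """Return True if the cell value contains any line break."""
    return bool(_LINE_BREAK.search(value))


def remove_line_breaks(value, replacement=" "):
    """Replace each line break, then collapse runs of spaces and trim the ends."""
    cleaned = _LINE_BREAK.sub(replacement, value)
    return re.sub(r" {2,}", " ", cleaned).strip()


# Hypothetical cell value with an embedded Windows line break
cell = "123 Main St\r\nApt 4"
if has_line_break(cell):
    cell = remove_line_breaks(cell)
```

Replacing with a space rather than deleting outright keeps words on either side of the break from running together.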
Considerations for categorical variables
Please maintain consistency with canonical and standard reference lists
This helps with analysis across departments and data systems
Common reference lists are provided within this document, including the departmental steward of the list where applicable and links to the data
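A consistency check against a reference list could look like the sketch below. The function name and the example values are hypothetical; in practice the allowed values would come from the relevant reference list for that field.

```python
def find_nonstandard_values(values, reference_list):
    """Return the distinct values that do not appear in the reference list, sorted."""
    allowed = set(reference_list)
    return sorted({value for value in values if value not in allowed})


# Hypothetical reference list and column values, for illustration only
reference = ["Mission", "Sunset"]
column = ["Mission", "mission", "Sunset", "SOMA"]
unexpected = find_nonstandard_values(column, reference)
```

The comparison is deliberately exact, so differences in case or spelling are reported rather than silently accepted.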
Character case
Text should be presented in the format that is easiest to read and interpret, where appropriate.
Title case
Address strings
Categories, when the source system presents them in title case or title case is easy to derive from the source
Upper case
Acronyms, e.g. PSA (Park Service Area)
State abbreviations, e.g. CA
Lower case
Categories, when the source system presents them in caps and there is no reliable way to convert them to title case
Research suggests that lower case is easier for humans to read than upper case and just as useful to machines; note the exceptions above
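The case rules above can be sketched in Python as follows. The function names and the `keep_upper` parameter are hypothetical; the set of acronyms and state abbreviations to preserve is an assumption you would supply for your own data.

```python
def to_lower_case(value, keep_upper=frozenset()):
    """Lower-case each word, leaving listed acronyms and abbreviations in upper case."""
    return " ".join(
        word.upper() if word.upper() in keep_upper else word.lower()
        for word in value.split()
    )


def to_title_case(value, keep_upper=frozenset()):
    """Capitalize each word, leaving listed acronyms and abbreviations in upper case."""
    return " ".join(
        word.upper() if word.upper() in keep_upper else word.capitalize()
        for word in value.split()
    )


# Hypothetical examples: a category from an all-caps source and an address string
category = to_lower_case("VEHICLE THEFT IN PSA", keep_upper={"PSA"})
address = to_title_case("123 MAIN ST SAN FRANCISCO CA", keep_upper={"CA"})
```

Splitting on whitespace is a simplification: punctuation attached to a word (for example "CA,") would not match the `keep_upper` set without further handling.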