2. Review Datasets

Summary

Similar to the data system review, reviewing datasets requires downloading your department's list from the Open Data Portal, reviewing that list, and making any necessary changes, deletions, or additions.

What is a dataset?

A dataset is the contents of a single database table, worksheet, or defined view (for example, an Excel sheet or a table in a SQL database). It is stored in a data system and is used for analysis, reporting, or recording information. This inventory covers both published datasets (available on the Open Data Portal) and unpublished datasets (available to departments but not on the Open Data Portal). Luckily, you do not have to start from scratch: you can start with a list of your department's previously inventoried datasets.

Where can I find the list of datasets?

The Dataset Inventory is a public dataset on the Open Data Portal containing all known datasets across the city. Find your department's datasets by following the steps below:

  1. Visit the Dataset Inventory page on the Open Data Portal

  2. Click Actions -> Query Data in the upper right corner of the page

  3. Use the Filter function in the lower-left corner to filter for Department or Division

    • You need to press "Apply" at the very bottom for the filter to run

  4. Next, click Export in the upper right corner to download your list of datasets
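If you prefer to download the full inventory and filter it on your own computer, the filtering step can be sketched in a few lines of Python. This is a minimal sketch, not an official DataSF tool: the column names `dataset_name` and `department_or_division` and the sample rows are placeholders, so check the header row of your actual export and adjust them.

```python
import csv
import io


def read_inventory(text):
    """Parse CSV text (for example, the contents of an exported file) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_by_department(rows, department, column="department_or_division"):
    """Keep rows whose department column matches, ignoring case and surrounding spaces.

    The default column name is an assumption; use the name from your export.
    """
    wanted = department.strip().lower()
    return [row for row in rows if (row.get(column) or "").strip().lower() == wanted]


# Placeholder data standing in for a real export.
sample = """dataset_name,department_or_division
Example Dataset A,Department X
Example Dataset B,Department Y
Example Dataset C,department x
"""

rows = read_inventory(sample)
mine = filter_by_department(rows, "Department X")
print([row["dataset_name"] for row in mine])
```

To use this with a real file, read the file's text with `open(path, newline="", encoding="utf-8").read()` and pass it to `read_inventory`.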

How to review datasets

Once the list of existing datasets has been downloaded, the review should focus on three questions:

  • Is the list complete? Does the list have every dataset used by your department? Coordinate with data stewards and other department employees to ensure the completeness of the list. Add new datasets to the spreadsheet if any are not included.

    • Note: it can be helpful to brainstorm datasets one data system at a time, or to think of processes that use data and work backwards to find the datasets behind them.

  • Can any be removed? It is possible your department is no longer using a dataset. If any dataset has been deprecated or is no longer owned or maintained by your department, please remove it from the list.

  • Is the information correct? Each dataset has metadata associated with it, such as data classification, lawful bias, and purpose. Please have an owner review each dataset's metadata to ensure it is accurate. A description of each column is included on the dataset primer page on the Open Data Portal; scroll down to the "Columns in this Dataset" section.
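The completeness and removal questions above amount to comparing two lists: the downloaded inventory and the datasets your data stewards say are in use. One possible way to do that comparison is sketched below. The function names and sample dataset names are hypothetical, and the matching is only by name (ignoring case and extra spaces), so treat the output as a starting point for discussion rather than a final answer.

```python
def normalize(name):
    """Lowercase a dataset name and collapse repeated whitespace for comparison."""
    return " ".join(name.lower().split())


def compare_lists(inventory_names, steward_names):
    """Compare the downloaded inventory with the list gathered from data stewards.

    Returns (missing, unconfirmed):
      missing     - named by stewards but absent from the inventory (candidates to add)
      unconfirmed - in the inventory but not named by any steward (candidates to review for removal)
    """
    inventory = {normalize(n): n for n in inventory_names}
    known = {normalize(n): n for n in steward_names}
    missing = sorted(known[k] for k in known.keys() - inventory.keys())
    unconfirmed = sorted(inventory[k] for k in inventory.keys() - known.keys())
    return missing, unconfirmed


# Placeholder names for illustration only.
missing, unconfirmed = compare_lists(
    ["Example Dataset A", "Example Dataset B"],
    ["example dataset a", "Example Dataset C"],
)
print("Possibly missing from inventory:", missing)
print("Not confirmed by stewards:", unconfirmed)
```

A dataset that appears as unconfirmed is not necessarily unused; confirm with its owner before marking it for deletion.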

How to update information on datasets

If everything is accurate, no further action is required for the dataset inventory and you can send everything back to DataSF. If you want to make any changes, please add a new column to the far right indicating what changes you've made, using one of the values below:

  • Updated: Some information in the row has been modified (no need to specify what)

  • Delete: This dataset is no longer in use

  • New: This is a new dataset that was added in this inventory
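If you track your changes in a script rather than by hand, adding the change column might look like the sketch below. The column name `change`, the key column `dataset_name`, and the sample rows are assumptions made for illustration; only the three values Updated, Delete, and New come from this guide. Rows with no changes get an empty value.

```python
import csv
import io

# Values described in this guide, plus "" for rows that were not changed.
ALLOWED = {"", "Updated", "Delete", "New"}


def add_change_column(rows, changes, key="dataset_name", column="change"):
    """Return copies of rows with a change column appended as the last (far-right) column.

    `changes` maps a dataset name to "Updated", "Delete", or "New".
    """
    marked = []
    for row in rows:
        status = changes.get(row[key], "")
        if status not in ALLOWED:
            raise ValueError(f"Unexpected change value: {status!r}")
        new_row = dict(row)          # copy so the original row is untouched
        new_row[column] = status     # appended last, so it lands at the far right
        marked.append(new_row)
    return marked


def to_csv(rows):
    """Serialize rows to CSV text, keeping the column order of the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder rows for illustration only.
rows = [
    {"dataset_name": "Example Dataset A", "purpose": "Reporting"},
    {"dataset_name": "Example Dataset B", "purpose": "Analysis"},
]
marked = add_change_column(rows, {"Example Dataset B": "Updated"})
print(to_csv(marked))
```

Newly added datasets still need their own rows in the list; mark those rows as New in the same column.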

Return your reviewed dataset list

Email your list to datasf@sf.gov and cc dan.tonkovich@sfgov.org.

Next

Once the dataset review is complete, a publishing plan can be created.
