Data management

Behind any excellent dashboard is a robust data model. For public dashboards, there are specific dataset (and data connection) best practices that differ from those for internal operational dashboards.

Your underlying data should be publicly available, preferably on the Open Data Portal.

If the underlying data is on the Open Data Portal, connect your dashboard to that published dataset so that the two stay in sync and the data pipeline stays simple.
📌
Power BI tip: In Power BI, when connecting to an API, raise the row limit so that all rows are brought in by adding ?$limit=999999999999 to the end of the CSV link.
  • Double-click the "Source" step and select "Ignore all quoted line breaks".
  • For geographic datasets, do not bring in the multipolygon field. Instead, name only the columns you need with a $select parameter, for example: https://data.sfgov.org/resource/d2ef-idww.csv?$limit=99999999999999999&$select=specimen_collection_date,area_type,id,acs_population,new_confirmed_cases,last_updated_at
Screenshot of the data source settings in Power BI. Add a limit statement to the end of the API link to get all rows, and ignore quoted line breaks.
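The link described in the tip above can also be assembled in a script, which helps avoid typos in long column lists. This is a minimal sketch, not an official DataSF tool: the function name build_csv_url and its parameters are hypothetical, while the endpoint and column names are the ones from the example link.

```python
from urllib.parse import urlencode

# Endpoint taken from the example link in the tip above.
BASE_URL = "https://data.sfgov.org/resource/d2ef-idww.csv"


def build_csv_url(base_url, columns=None, limit=999_999_999_999):
    """Return a CSV link with a $limit and, optionally, a $select column list.

    Hypothetical helper for illustration only.
    """
    params = {"$limit": limit}
    if columns:
        # Naming columns explicitly leaves out heavy fields such as multipolygons.
        params["$select"] = ",".join(columns)
    # safe="$," keeps "$" and "," unescaped so the link matches the hand-written form.
    return f"{base_url}?{urlencode(params, safe='$,')}"


url = build_csv_url(
    BASE_URL,
    columns=[
        "specimen_collection_date",
        "area_type",
        "id",
        "acs_population",
        "new_confirmed_cases",
        "last_updated_at",
    ],
)
print(url)
```

The printed link can be pasted into the Power BI data source dialog in place of the bare CSV address.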

Review the underlying data for any private, health, or sensitive information before publication.

If you are a CCSF employee, DataSF can help with this, along with your department’s privacy or compliance office. Assume that once you publish the dashboard, users can access every underlying data table in it. If that is a problem, aggregate the data to an appropriate level of observation before linking it into the dashboard.
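As an illustration of aggregating before the data reaches the dashboard, the sketch below collapses row-level records into counts per area and date, so that individual identifiers never enter the published tables. The records, field names, and the aggregate_counts helper are all made up for this example; the right level of aggregation is a decision for your privacy or compliance review.

```python
from collections import Counter

# Made-up row-level records; the field names are illustrative only.
records = [
    {"case_id": 1, "area": "Mission", "date": "2021-01-04"},
    {"case_id": 2, "area": "Mission", "date": "2021-01-04"},
    {"case_id": 3, "area": "Sunset", "date": "2021-01-04"},
    {"case_id": 4, "area": "Mission", "date": "2021-01-05"},
]


def aggregate_counts(rows, keys):
    """Count rows per combination of `keys`, dropping every other field."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [dict(zip(keys, combo), count=n) for combo, n in sorted(counts.items())]


# Only area, date, and a count are kept; case_id is not carried through.
summary = aggregate_counts(records, keys=("area", "date"))
for row in summary:
    print(row)
```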

Aim for a small, simple underlying data model.

Just as the data visuals for the public need to be clear, simple, and accessible, the underlying data model should mirror this.
If there are complicated data pipelines feeding into your dashboard, aim to have those combined and cleaned upstream from the dashboard itself. This ensures that no complicated logic exists solely in Power BI or Tableau and is inaccessible to other analysts or curious journalists.
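One way to keep that logic upstream is a small script that combines and cleans the sources and writes a single tidy table for the dashboard to read. The sketch below assumes two made-up extracts with hypothetical field names and placeholder values; the helper names are likewise illustrative, and a real pipeline would read from your source systems instead.

```python
import csv
import io

# Made-up extracts with placeholder values; field names are hypothetical.
cases = [
    {"area": " mission", "new_cases": "12"},
    {"area": "Sunset ", "new_cases": "5"},
]
population = [
    {"area": "Mission", "population": "1000"},
    {"area": "Sunset", "population": "2000"},
]


def clean_area(name):
    """Normalize spacing and capitalization so the two sources join cleanly."""
    return name.strip().title()


def build_dashboard_table(case_rows, population_rows):
    """Join cases to population by area and convert text fields to integers."""
    pop_by_area = {clean_area(r["area"]): int(r["population"]) for r in population_rows}
    table = []
    for r in case_rows:
        area = clean_area(r["area"])
        table.append(
            {"area": area, "new_cases": int(r["new_cases"]), "population": pop_by_area[area]}
        )
    return table


def to_csv(rows):
    """Serialize the cleaned rows as CSV text, ready to publish or load."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


tidy_csv = to_csv(build_dashboard_table(cases, population))
print(tidy_csv)
```

Because the cleaning lives in a script rather than in the dashboard file, other analysts can read, review, and rerun it.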

Maintain internal data documentation.

Documentation should be accessible to the entire team maintaining public dashboards. This is critical for the sustainability and resiliency of your dashboards. Create the documentation in whatever tool is convenient and available to your whole team (this could be SharePoint, Word, Visio, Excel, Miro, Trello, Asana, etc.).
The documentation should contain vital information on each dashboard: enough that someone else on the team could use it to debug a problem and republish. This includes:
  • Data source information (including all the datasets within the dashboard, links to that data, contact information for the owner of the source data, the time it updates, and any special notes).
  • Access/audience (confirm the approved audience level for each).
  • Location of dashboards (with links to the online reports, datasets, and, if applicable, the desktop files).
  • Any step-by-step guidance that someone unfamiliar with the dashboard would need to troubleshoot an issue.
  • Links to any other resources your team may use (such as templates, standard work, checklists, etc.).

📌
Power BI data tips

Organize your measures into folders. This is particularly helpful if you have a lot of measures and dynamic titles and alt text. Read more about organizing measures into folders.
Maintain a separate public workspace to contain all public dashboards. This is good for documentation and resiliency (if anything goes down, your whole team will know where to go), and for web performance. Ensure that workspace is not a premium workspace. Read more in DataSF's Publish to Web Tip Sheet.
If you have multiple dashboards/reports that need to connect to one data source (or group of data sources), leverage Power BI’s dataset capability. This will save you many headaches when troubleshooting, updating, or correcting data. Read more about Power BI datasets: