Often defined as % of data values that are complete, e.g. % of buildings with a parcel number. Mandatory elements should be 100% complete. You can also define completeness for a dataset (versus a single field).
We have a universe of 300 buildings but only 250 entries in our dataset, so its completeness rate is 83% (250/300).
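A minimal sketch of this completeness metric in Python (the function name is illustrative, not from the guide):

```python
def completeness_rate(records, universe_size):
    """Share of the known real-world universe present in the dataset."""
    return len(records) / universe_size

# 250 building entries against a universe of 300 buildings
rate = completeness_rate(list(range(250)), 300)
print(f"{rate:.0%}")  # prints 83%
```

The same function works per field: pass only the records whose mandatory field is populated and compare against the total record count.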
Each data field has a syntax, range, or set of rules it should conform to. These can be the contents of a pick list (a constrained list of possible values), a valid numerical range (e.g. greater than 0), or a format such as UTM coordinates. During Step 2, you should have listed what counts as a valid input; then ensure that input methods and checks reinforce those rules.
The date of building occupancy should be recorded as YYYYMMDD. (Note - for usability purposes, you wouldn't want to require end users to input in this format. A drop down calendar would be much easier.)
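A validity check for the YYYYMMDD rule might be sketched like this in Python; it rejects both malformed strings and impossible calendar dates (the function name is hypothetical):

```python
from datetime import datetime

def is_valid_occupancy_date(value):
    """True only if value is a real calendar date in strict YYYYMMDD form."""
    if len(value) != 8 or not value.isdigit():
        return False  # wrong shape, e.g. "2024-02-29" or "2024229"
    try:
        datetime.strptime(value, "%Y%m%d")  # rejects impossible dates
        return True
    except ValueError:
        return False

print(is_valid_occupancy_date("20240229"))  # True  (2024 is a leap year)
print(is_valid_occupancy_date("20230229"))  # False (2023 is not)
```

A drop-down calendar in the entry form makes such a check redundant at input time, but it remains useful for auditing data loaded from other sources.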
How timely is the data? What is the time difference between when the event or activity took place and when you obtain the data? You can measure timeliness as that time difference or as the frequency of collection. An example rule: if we collect data on paper forms, it must be entered within a week. Timeliness should meet the business needs of the dataset.
The core building data is collected annually. A subset of fields must be updated quarterly.
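The one-week paper-form rule above can be sketched as a simple lag check (names and the seven-day threshold are taken from the example, the function itself is illustrative):

```python
from datetime import date, timedelta

MAX_LAG = timedelta(days=7)  # paper forms must be entered within a week

def is_timely(event_date, entry_date):
    """True if the record was entered within the allowed lag."""
    return entry_date - event_date <= MAX_LAG

print(is_timely(date(2024, 3, 1), date(2024, 3, 6)))   # True
print(is_timely(date(2024, 3, 1), date(2024, 3, 12)))  # False
```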
If an object or event is unique in the real world, it should be unique in your dataset. You could define a metric such as items in dataset / count in real world. If the metric is 100%, you are on target; if not, you either have too many records (duplicates) or you are missing buildings. Another example: having no more than 1 enrollment record per student per school year. This allows for multiple records of the same student in the same dataset, but with different values in related fields, e.g. year.
For our 300 buildings example, we should have no more than 300 buildings in our dataset.
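One way to sketch both halves of the uniqueness check, the overall ratio and the duplicate hunt, in Python (IDs and the helper name are made up for illustration):

```python
from collections import Counter

def uniqueness_metric(ids, real_world_count):
    """Return (items in dataset / count in real world, list of duplicate ids)."""
    duplicates = [i for i, n in Counter(ids).items() if n > 1]
    return len(ids) / real_world_count, duplicates

# Four entries against a universe of 300 buildings, with "B2" entered twice
ratio, dupes = uniqueness_metric(["B1", "B2", "B2", "B3"], 300)
print(dupes)  # ['B2']
```

For the enrollment example, the ids would instead be (student, school year) pairs, so the same student may legitimately appear once per year.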
Does your dataset share fields with other datasets, for example neighborhoods, districts, or gender? If so, use the same definitions for those shared fields (e.g. client information, address, race/ethnicity, gender). Make geographic boundaries like planning district, supervisor district, or census tract standard as well. Having standards and being consistent in implementation makes it easier to combine datasets. Standards may come from City, State, or Federal governments as well as professional organizations. Consistency also applies to format: for example, dates should have a consistent format across all datasets. It can also apply to derived data, for example how you calculate age from date of birth. You can also constrain how one data element relates to another. For example, an address with a zip code of 94102 must mean the state is "CA".
The field building use must match the definition in other datasets (e.g. commercial, public, private, etc).
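The zip-code constraint is a cross-field consistency rule; a minimal sketch in Python, using a toy one-entry lookup table (a real check would use an authoritative zip-to-state reference):

```python
# Toy lookup table for illustration only
ZIP_STATE = {"94102": "CA"}

def zip_state_consistent(record):
    """True if the record's state matches the state implied by its zip code."""
    expected = ZIP_STATE.get(record["zip"])
    return expected is None or record["state"] == expected

print(zip_state_consistent({"zip": "94102", "state": "CA"}))  # True
print(zip_state_consistent({"zip": "94102", "state": "NY"}))  # False
```

Unknown zip codes pass by design here; whether to flag them instead is a policy choice for the dataset owner.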
Accuracy is the degree to which your dataset represents reality. It is trickier to check because you have to look outside your dataset. Ways to check accuracy include comparing to similar datasets and doing spot checks or audits. Use the accuracy rates in sample inspections to estimate the accuracy of the dataset and/or fields. You should also establish procedures for incorporating accuracy checks into your change management processes.
Annually, staff takes a sample of buildings and compares the database to site visits.
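Estimating dataset accuracy from those site-visit samples can be sketched as below; the sample size and error count are invented for illustration:

```python
def estimated_accuracy(sample_results):
    """Estimate dataset accuracy from a list of pass/fail spot checks."""
    return sum(sample_results) / len(sample_results)

# Hypothetical annual audit: 30 sampled buildings, 2 site visits
# that disagreed with the database
checks = [True] * 28 + [False] * 2
print(f"{estimated_accuracy(checks):.0%}")  # prints 93%
```

A larger random sample narrows the uncertainty of the estimate; the single rate here is a point estimate, not a guarantee about the full 300-building dataset.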