Step 2: Define the Dataset
In Step 1, you collected needs and requirements. You defined the purpose, identified your stakeholders, and assessed their needs and requirements. In Step 2, you will define the data you will obtain. Follow the steps below to define your data model. The steps will be iterative:
2.1 Collect data standards and requirements
2.2 Define your tables
2.3 Define your fields
2.4 Define business rules
2.1 Collect data standards and requirements
Assemble existing data standards and requirements, such as:
Naming conventions and standards
Existing field definitions and standards (stay tuned for future resources on this)
Data documentation requirements
Reference data lists and sources
Modeling tools or guidance and/or expectations
You'll reference and incorporate these as you build your model.
2.2 Define your tables
Your data may be a single table. If you have more than one table, you may want to create a diagram. This is known as data modeling or creating an entity relationship diagram. There are many helpful templates and references for doing this. The basic process is to:
List and define the objects, people, places, events or concepts in your business process.
Describe how they relate to each other.
You can use powerpoint or diagramming software (there are many) to model your data.
2.3 Define your fields
For each table, define the fields in the table. As appropriate, capture the following for each field:
Adapted from:
"A guide to data development", Australian Institute of Health and Welfare, Caberra, 2007.
"Guide to Using Data Collection Form and Data Specifications", Adult Cardiac Surgery Database, v2.61 Training Manual, Society of Thoracic Surgeons.
2.4 Define business rules
What constraints do common sense, policy and users put on the data? What input triggers an action or event? For example:
House numbers on a certain street must be within a certain range
Real estate sale prices shouldn't be negative
And no one alive today was born before 1900
When a 'status' field changes to "closed", automatically populate the 'dateclosed' field with the current date and time
Restate business rules in the language of your database (e.g., Date of Birth >= 01/01/1900). Enforce business rules at the point of collection or during maintenance. This will help you identify:
Mistakes such as incorrect data input or
Systemic issues, such as out of date reference data or address updates not being available in a timely manner
2.4.1 Questions to elicit business rules
Use the following questions to help identify business rules:
What are examples of nonsensical entries? What rules would exclude nonsense?
What commonsense rules follow from the time course of events associated with each record?
If two records describe earlier and later versions of one entity, are the two records consistent?
For each money transaction, there is a corresponding transaction somewhere in the world. Should that transaction be visible somewhere else in the database?
What laws or policies govern the subject of the data? What rules do laws and policies impose?
2.4.2 Sample business rules
The following business rules are from the design of a Campaign Finance database.
A report from B that states B contributed to A should have a match report from A that states that A received a contribution from B
Starting balance on report should match ending balance on previous report
If no previous report, starting cash balance is $0
Date of transaction must be within proper date range of the report.
Total of contributions from individual to candidate must be less than or equal to legal limit for each time period
Amended status must be "N" for first report submitted for any given year and cycle
Amended status must be "Y" for all but first report submitted for any given year and cycle
Expense descriptions are not null and must have content other than "expense", "reimbursement", etc
Employer and Employer Address are not null for larger donations (as defined by law)
All amount fields must be positive
Last updated