Step 2: Define the Dataset

In Step 1, you collected needs and requirements. You defined the purpose, identified your stakeholders, and assessed their needs and requirements. In Step 2, you will define the data you will obtain. Follow the steps below to define your data model. The steps will be iterative:

  • 2.1 Collect data standards and requirements

  • 2.2 Define your tables

  • 2.3 Define your fields

  • 2.4 Define business rules

2.1 Collect data standards and requirements

Assemble existing data standards and requirements, such as:

  • Naming conventions and standards

  • Existing field definitions and standards (stay tuned for future resources on this)

  • Data documentation requirements

  • Reference data lists and sources

  • Modeling tools or guidance and/or expectations

You'll reference and incorporate these as you build your model.

2.2 Define your tables

Your data may be a single table. If you have more than one table, you may want to create a diagram. This is known as data modeling or creating an entity relationship diagram. There are many helpful templates and references for doing this. The basic process is to:

  • List and define the objects, people, places, events or concepts in your business process.

  • Describe how they relate to each other.

You can use powerpoint or diagramming software (there are many) to model your data.

2.3 Define your fields

For each table, define the fields in the table. As appropriate, capture the following for each field:

Adapted from:

2.4 Define business rules

What constraints do common sense, policy and users put on the data? What input triggers an action or event? For example:

  • House numbers on a certain street must be within a certain range

  • Real estate sale prices shouldn't be negative

  • And no one alive today was born before 1900

  • When a 'status' field changes to "closed", automatically populate the 'dateclosed' field with the current date and time

Restate business rules in the language of your database (e.g., Date of Birth >= 01/01/1900). Enforce business rules at the point of collection or during maintenance. This will help you identify:

  • Mistakes such as incorrect data input or

  • Systemic issues, such as out of date reference data or address updates not being available in a timely manner

2.4.1 Questions to elicit business rules

Use the following questions to help identify business rules:

  • What are examples of nonsensical entries? What rules would exclude nonsense?

  • What commonsense rules follow from the time course of events associated with each record?

  • If two records describe earlier and later versions of one entity, are the two records consistent?

  • For each money transaction, there is a corresponding transaction somewhere in the world. Should that transaction be visible somewhere else in the database?

  • What laws or policies govern the subject of the data? What rules do laws and policies impose?

2.4.2 Sample business rules

The following business rules are from the design of a Campaign Finance database.

  • A report from B that states B contributed to A should have a match report from A that states that A received a contribution from B

  • Starting balance on report should match ending balance on previous report

    • If no previous report, starting cash balance is $0

  • Date of transaction must be within proper date range of the report.

  • Total of contributions from individual to candidate must be less than or equal to legal limit for each time period

  • Amended status must be "N" for first report submitted for any given year and cycle

  • Amended status must be "Y" for all but first report submitted for any given year and cycle

  • Expense descriptions are not null and must have content other than "expense", "reimbursement", etc

  • Employer and Employer Address are not null for larger donations (as defined by law)

  • All amount fields must be positive

Last updated