Sarah Cohen
The Washington Post

You've had a good daily story, or gotten a good tip and worked your way through the bureaucratic morass of public records. The data is on its way. It's sitting in your email in a huge attachment, the CD is being picked up or the tape was sent out for reading. Do the data dance now. You might never get the chance again.

Early reporting splits into two forms -- the nerd work and the reporting work. Don't shortchange either one once the data arrives.

1. Did you get what you asked for? Did you get all the columns (fields) you think you were entitled to? Did you get about the right number of records? Are there any big holes? Some of this you won't know until you run some reports on your data. You might have missing years, missing people or missing cases.

2. Is it in the form you expected? Increasingly, "helpful" public affairs offices are compiling reports or importing your database into an Excel spreadsheet. Don't let this happen - if there are mistakes, you want to be the one to make them.

3. Regroup and check the story. By the time you get data, you've usually moved ahead on your story and it no longer resembles the original tip. Look at the data you got in a new light -- how would it help you document the newer ideas? Does the original story still work now that the data has arrived? One big problem in database work is that, if you limited your original request too much, you may not have the data you need now to move ahead. It's a good reason to ask for everything, not just what you think you'll need.

4. Look for the impossible. Run early queries to find examples that should never happen - old people going to high school, children with driving infractions, convictions without charges. Report them out. These might lead you to wrongdoing or point out errors - either way, you want to know.

5. Look for the illegal or improbable. Closely related to #4, but this time you're looking for things that the rules say should not happen, but might be good stories if they do -- buildings flunking inspections but getting occupancy permits; tickets wiped out without being paid or contested, repeat offenses without punishment. Report them out. They might be data errors, they might be stories. Either way, they'll help you know what you can do with the data.

6. Make the first (of many) reality checks. Look up cases you've already reported out -- the day story that prompted your investigation, or a tip that a source led you to believe would make a good story. How do they look in the data? How would you find more? Does the data reflect the real world? How can you distinguish these newsworthy cases from the mundane? Look for the M.O.

7. Make an "audit trail" plan NOW. Resist the urge to start correcting mistakes you stumble on in the data. Instead, decide how you will document changes and updates you make, and whether you want to correct errors in some records when you can't correct them in all. Key to this is making sure that every table has a unique identifier that can be retraced if you have to go back to the original data or work with an updated copy from an agency. Sometimes it's just a matter of assigning a sequential number to each record as it's imported. Consider how much "hand work" and pointing-and-clicking you want to do (rather than programming). If you do much, document it profusely. You will need it.

8. Review the basics. Go back to the investigative reports from the GAO, inspectors general, auditors or even academic papers to see how they have used similar data and how they identified cases. Look at the data for anything they missed, or review the data to see if you can repeat their work.

9. Interview (or re-interview) the nerd at the agency. Now that you have the data, it is often much easier to get the agency to allow you to talk with the person who created the database for you. (Now, you just have "technical" questions on what you got.) Sit down with the expert -- ask for advice, and ask if there are any fields you failed to get that you need. With that contact, you can often get any data you forgot to ask for directly, treating it as a "clarification" of your original request.

10. Talk with your editor about your goal. The key decision you will have to make early in this process is whether you will use your database as a source of leads for reporting, a source of statistics, or both. If you need the statistics, look carefully at the flaws in the data and decide if they are acceptable. Talk with your editor about the difference. If it's OK to say "at least 10 cases were confirmed", then you don't need stats. If you need to say "the agency dropped one-third of its cases" then you do. Evaluate the data both ways. Data impossible to use for statistics is often great for story leads.

11. Look for complementary data NOW. Maybe the information you'd hoped for isn't in the data you got. Maybe you can find it elsewhere. Start those public records requests now.

12. Draft the strongest sentence you would hope to write. Once you've written your dream sentence, go back and see whether the data would be able to document it. Keep changing the sentence until you find one that can be proven or disproved with the data. Decide how much work it would take and how many cases have to be reported out to let you write it. Don't fill in the numbers now in your sentence, but make sure you know how many are "enough", and how pervasive a trend must be to be newsworthy. Ten murderers who escaped from halfway houses might be enough. Ten contracts with the county executive's pals might not be.
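
Several of the early checks above lend themselves to a few lines of code: counting records by year, spotting missing years, finding cases that should never happen, and assigning a sequential number to each record on import. What follows is only a rough sketch in Python, not a prescribed method. The sample rows are made up, and the field names (driver, birth_year, ticket_year), the function names and the minimum driving age of 15 are all assumptions chosen for illustration; substitute whatever your own data and local rules call for.

```python
from collections import Counter

# Made-up sample rows standing in for an imported agency file.
# The field names here are hypothetical, not from any real database.
RAW_ROWS = [
    {"driver": "A", "birth_year": 1950, "ticket_year": 1998},
    {"driver": "B", "birth_year": 1990, "ticket_year": 1998},
    {"driver": "C", "birth_year": 1975, "ticket_year": 2000},
    {"driver": "D", "birth_year": 1980, "ticket_year": 2001},
]


def add_record_ids(rows):
    """Audit trail: give each record a sequential number as it is imported."""
    return [{"record_id": i, **row} for i, row in enumerate(rows, start=1)]


def records_per_year(rows, year_field):
    """Count records by year to see whether the totals look about right."""
    return Counter(row[year_field] for row in rows)


def missing_years(rows, year_field, first_year, last_year):
    """List the years in the expected range that have no records at all."""
    seen = {row[year_field] for row in rows}
    return [y for y in range(first_year, last_year + 1) if y not in seen]


def too_young_to_drive(rows, min_age=15):
    """Look for the impossible: tickets issued to children.

    min_age is an assumed cutoff; check the rules where you are.
    """
    return [
        row for row in rows
        if row["ticket_year"] - row["birth_year"] < min_age
    ]


rows = add_record_ids(RAW_ROWS)

print("Records per year:", dict(records_per_year(rows, "ticket_year")))
print("Years with no records:", missing_years(rows, "ticket_year", 1998, 2001))
for row in too_young_to_drive(rows):
    # Keep the record_id so each case can be traced back to the original file.
    print("Report this one out:", row["record_id"], row["driver"])
```

The flagged records are left untouched on purpose. The sketch only lists them, with their record numbers, so that each one can be reported out and any later correction can be documented against the original data.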

On to Phase II -- Lab to Field and Back Again