Turning emergency department data into public health insight

New York’s SPARCS emergency department database made it possible to study disease burden across both space and time, but only after extensive cleaning and geocoding work.

The underlying data was large and messy. Address fields were inconsistent, incomplete, and shaped by the realities of emergency room intake rather than the needs of research. Making the data usable required large-scale preprocessing with regular expressions and Python scripting, along with batch geocoding workflows built to handle variation across more than 100 million records.
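The regular-expression cleanup described above can be illustrated with a small normalization pass. This is a minimal sketch, not the workflow actually run on SPARCS: the suffix table, the unit-designator pattern, and the `normalize_address` name are hypothetical, and a production pass over intake addresses would need far more rules.

```python
import re

# Hypothetical, deliberately short suffix table; a real pass would need a much
# fuller list of street-type abbreviations.
SUFFIXES = {
    "STREET": "ST",
    "AVENUE": "AVE",
    "BOULEVARD": "BLVD",
    "ROAD": "RD",
    "PLACE": "PL",
}

# Unit designators ("APT 4B", "FLOOR 2", "#12") are dropped because
# street-level geocoding cannot use them.
UNIT_PATTERN = re.compile(
    r"\b(?:APT|APARTMENT|UNIT|FL|FLOOR|RM|ROOM)\b\.?\s*\S+|#\s*\S+"
)
SUFFIX_PATTERN = re.compile(r"\b(" + "|".join(SUFFIXES) + r")\b")


def normalize_address(raw: str) -> str:
    """Return an uppercase, punctuation-free street address for geocoder input."""
    text = raw.upper()
    text = UNIT_PATTERN.sub(" ", text)          # strip apartment/unit details
    text = re.sub(r"[.,;]", " ", text)          # drop common punctuation
    text = SUFFIX_PATTERN.sub(lambda m: SUFFIXES[m.group(1)], text)
    return re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace


print(normalize_address("123 Main Street, Apt. 4B"))  # 123 MAIN ST
print(normalize_address("45  west 4th avenue"))       # 45 WEST 4TH AVE
```

Normalizing before matching means that variants such as "Street", "St." and "ST" all reach the geocoder in the same form, which is usually cheaper than teaching the matcher to tolerate every variant.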

Custom geocoders were developed using TIGER/Line data and reference data gathered from local government sources across New York State. After cleaning and processing, 97.3% of the records were successfully geocoded at a match accuracy of 95% or higher.
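One way to picture a geocoder that draws on more than one reference source is a cascade: try local government reference data first, fall back to TIGER/Line-derived data, and accept a candidate only when its match score clears a threshold. The sketch below assumes both sources have already been loaded into in-memory tables keyed by normalized address. It uses Python's `difflib.SequenceMatcher` ratio as a stand-in match score and a 95-point cutoff to echo the figure above. The actual geocoders' scoring method, source order, and data structures are not described here, so every name and table in this sketch is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]      # (latitude, longitude)
Reference = Dict[str, Point]     # normalized address -> coordinates


def best_candidate(address: str, reference: Reference) -> Tuple[Optional[str], float]:
    """Return the closest reference address and a 0-100 similarity score.

    A linear scan keeps the sketch short; a real geocoder would narrow the
    search by street name, ZIP code, or house-number range first.
    """
    best_key, best_score = None, 0.0
    for key in reference:
        score = SequenceMatcher(None, address, key).ratio() * 100
        if score > best_score:
            best_key, best_score = key, score
    return best_key, best_score


def geocode(
    address: str,
    sources: Sequence[Tuple[str, Reference]],
    min_score: float = 95.0,
) -> Optional[dict]:
    """Try each reference source in order; return the first acceptable match."""
    for name, reference in sources:
        key, score = best_candidate(address, reference)
        if key is not None and score >= min_score:
            return {"source": name, "point": reference[key], "score": score}
    return None  # unmatched: left for review rather than forced onto a location


def match_rate(results: Sequence[Optional[dict]]) -> float:
    """Share of records that received an accepted match."""
    return sum(r is not None for r in results) / len(results) if results else 0.0


# Hypothetical reference tables with made-up coordinates.
local = {"123 MAIN ST": (40.7000, -73.9000)}
tiger = {"45 WEST 4TH AVE": (40.7300, -74.0000)}
sources = [("local", local), ("tiger", tiger)]

results = [geocode(a, sources) for a in ["123 MAIN ST", "45 WEST 4TH AVE", "999 NOWHERE RD"]]
print(round(match_rate(results), 3))  # 0.667
```

Returning `None` instead of the best low-scoring candidate keeps weak matches out of the map, at the cost of a lower match rate; the threshold is the knob that trades coverage against positional accuracy.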

That level of preparation changed what the database could be used for. Once records could be reliably located in space and linked with ICD-9 codes and visit dates, the database became useful for examining disease burden geographically and over time. Patterns that would be difficult to detect in tables or citywide summaries became visible at a more local scale.
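Once each visit carries a location, a diagnosis code, and a date, examining burden across space and time largely reduces to counting visits by area and period. The sketch below assumes a hypothetical `Visit` record with a census-tract identifier attached during geocoding and filters on an ICD-9 code prefix (493 is the ICD-9 category for asthma). The field names, the code format, the made-up tract IDs, and the choice of tract and month as units are illustrative, not the project's schema.

```python
from collections import Counter
from datetime import date
from typing import Iterable, NamedTuple


class Visit(NamedTuple):
    tract: str          # hypothetical census-tract ID attached after geocoding
    visit_date: date
    icd9: str           # diagnosis code; assumed here to be stored without a dot


def monthly_counts(visits: Iterable[Visit], icd9_prefix: str) -> Counter:
    """Count visits per (tract, "YYYY-MM") whose code starts with icd9_prefix."""
    counts: Counter = Counter()
    for v in visits:
        if v.icd9.startswith(icd9_prefix):
            counts[(v.tract, v.visit_date.strftime("%Y-%m"))] += 1
    return counts


# Made-up visits: three asthma-coded visits and one non-asthma visit.
visits = [
    Visit("36061000100", date(2010, 1, 5), "49392"),
    Visit("36061000100", date(2010, 1, 20), "49300"),
    Visit("36061000100", date(2010, 2, 3), "4660"),
    Visit("36047000200", date(2010, 1, 9), "49390"),
]
for key, n in sorted(monthly_counts(visits, "493").items()):
    print(key, n)
# ('36047000200', '2010-01') 1
# ('36061000100', '2010-01') 2
```

Counts like these are raw visit totals; comparing areas fairly would also require population denominators, which this sketch leaves out.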

The limits of the data were part of the work as well. Some hospitals, for example, assigned a common placeholder location to represent homeless patients. That kind of institutional practice creates visible concentrations in the data that do not necessarily reflect residential patterns. Issues like this shape how the results must be interpreted, and they are part of what makes this kind of systems work necessary.
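Placeholder locations like these can be screened for before mapping. One simple heuristic, sketched below under assumed thresholds, flags any address that accounts for an unusually large number and share of a single facility's records. The function name, the `min_count` and `min_share` values, and the (facility, address) input shape are hypothetical. Flagged addresses would still need manual review, since large residential buildings or nursing homes can also produce legitimate clusters.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, Set, Tuple


def flag_placeholder_addresses(
    records: Iterable[Tuple[str, str]],
    min_count: int = 50,
    min_share: float = 0.01,
) -> Set[Tuple[str, str]]:
    """Return (facility, address) pairs that look like intake placeholders.

    records yields (facility_id, normalized_address) pairs. An address is
    flagged when it appears at least min_count times for one facility AND
    makes up at least min_share of that facility's records.
    """
    per_facility: DefaultDict[str, Counter] = defaultdict(Counter)
    for facility, address in records:
        per_facility[facility][address] += 1

    flagged: Set[Tuple[str, str]] = set()
    for facility, counts in per_facility.items():
        total = sum(counts.values())
        for address, n in counts.items():
            if n >= min_count and n / total >= min_share:
                flagged.add((facility, address))
    return flagged


# Made-up example: facility H1 files 60 of its 1,000 visits under one address.
records = [("H1", "0 UNKNOWN ST")] * 60
records += [("H1", f"{i} MAIN ST") for i in range(940)]
records += [("H2", "0 UNKNOWN ST")] * 5
print(flag_placeholder_addresses(records))  # {('H1', '0 UNKNOWN ST')}
```

Grouping by facility matters here: a placeholder is an institutional habit, so the same address can be a heavy outlier at one hospital and unremarkable at another.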

What matters here is not the map for its own sake. It is the ability to make patterns visible that would otherwise remain buried inside a large institutional system.

