Public Data: A Data Collection Specialist’s Perspective

3/3/2017 | JAMIE STONE

Jamie Stone | Senior Software Engineer

Data is ever-growing, and its rate of growth is exponential. In fact, 90% of the world’s existing data was created in the last two years alone. We live in a world that is constantly looking for new ways to access data and use it to be more connected to the people around us, both far and near.

Over the past few decades, we have drastically changed the way that we handle and analyze data. In 1990, if you wanted to know what year Colorado became a state, you would break out an encyclopedia and spend minutes flipping through pages trying to find a single piece of information. In 2017, a simple Google search will produce the same result in 0.76 seconds. No paper cuts, no lugging around 20-pound books, and no need to worry that a page or book you need isn’t in your collection.

Every second, over 40,000 searches are sent through Google from all over the world. People are constantly looking for answers to new questions, as well as for faster ways to find those answers. People are constantly demanding knowledge, attention, interaction, and connection, all of which are provided in some way by searching for and accessing data. Data is spread far and wide throughout the web, in a multitude of forms. But it isn’t always easy to find exactly what you are looking for, and Google isn’t always the answer. This is where I come in.

As a data collection specialist, I have spent years behind a keyboard, researching public data and how it is managed and accessed throughout the web.  In my years of experience, I have figured out a few important points about public data:

  1. Public data is not necessarily easy to find.
  2. Public data is not always free.
  3. Public data is not always presented in a useable format.

Working with Water Sage, I quickly found out that the three points above ring true for many states’ public data systems. Land parcel data is a perfect example. Colorado has digital land parcel data available for 47 of its 64 counties. We were able to collect and aggregate all of these counties into a single dataset in our application for easier use, but it was not an easy effort. It took months to search through each county’s website to see which counties have data available and which do not. As you can imagine, when we began working on Texas, this process took even longer for Texas’ 254 counties.

The other option currently available is to pay another company to do a similar process. There are a couple of companies that maintain similar data warehouses of land parcel data for counties across the United States. It is very convenient to get data from these companies, but they charge a large fee because of the effort needed to obtain the information. For instance, if you wanted to purchase parcel data for the entire state of Texas from one company, it would cost over $45,000 for a one-time download of the current data they have available, some of which may be months old. You would then have to pay that same amount each time you wished to have it updated in the future. Although this data is public, it definitely is not free.

While building Water Sage, we have also come to realize that public data isn’t always available in simple or easy-to-use formats. Some entities provide data in modern, easy-to-use formats, like shapefiles, KML (Google Earth) or CSV (comma-separated values). This makes it very easy to read the data and insert it into our data warehouse. Other times we might find the data only in formats that require manual processing. For example, many counties still maintain their land parcel data in paper format stored at the county courthouse. This would require that someone go to the courthouse to access these files or have the staff spend hours scanning documents. You would then have to spend the time and money to convert the data into a useable format. Keeping an up-to-date version of an entire county’s dataset this way would take so much work that it would be impossible to maintain.
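To illustrate why even the "easy" formats take effort to combine, here is a minimal sketch in Python of merging CSV files from two counties that name their columns differently. The county names, column names, and sample rows are all hypothetical, and this is not a description of how Water Sage actually processes its data; it only shows the general idea of mapping each source onto one common layout.

```python
import csv
import io

# Hypothetical per-county header mappings: source column -> common column.
# Real county files would each need their own mapping, worked out by hand.
COUNTY_COLUMN_MAPS = {
    "County A": {"PARCEL_ID": "parcel_id", "OWNER": "owner", "ACRES": "acres"},
    "County B": {"pin": "parcel_id", "owner_name": "owner", "acreage": "acres"},
}


def normalize_county_csv(county, csv_text):
    """Yield rows from one county's CSV text using the common column names."""
    column_map = COUNTY_COLUMN_MAPS[county]
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {"county": county}
        for source_name, common_name in column_map.items():
            record[common_name] = row[source_name].strip()
        # Store acreage as a number so it can be compared across counties.
        record["acres"] = float(record["acres"])
        yield record


def aggregate(sources):
    """Combine several counties' CSV text into one list of records."""
    combined = []
    for county, csv_text in sources.items():
        combined.extend(normalize_county_csv(county, csv_text))
    return combined


# Made-up sample data standing in for two downloaded county files.
SAMPLE_SOURCES = {
    "County A": "PARCEL_ID,OWNER,ACRES\n001, Smith ,40.5\n",
    "County B": "pin,owner_name,acreage\nB-17,Jones,12\n",
}

parcels = aggregate(SAMPLE_SOURCES)
for parcel in parcels:
    print(parcel)
```

Every new county means another entry in the mapping table, which is one reason the effort grows with the number of counties involved.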

We have invested years into collecting and organizing both public and private data sources from western states to provide the wealth of knowledge we now offer with Water Sage.  We currently have the largest collection of Colorado land parcels available in one system, and this alone took months to accomplish.   Each state that we add to our application requires effort to set up automated collections to ensure that our data is as up to date as possible so that our users have access to the best collection of data available.
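As a rough illustration of what one piece of an automated collection might look like, the Python sketch below decides whether a downloaded file has changed since the last collection by comparing SHA-256 fingerprints. The function names and the source identifier are assumptions made for this example, and a real collection process would involve much more (scheduling, downloading, error handling); this is not a description of the Water Sage pipeline itself.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the file contents."""
    return hashlib.sha256(data).hexdigest()


def needs_refresh(source_id, data, known_fingerprints):
    """Return True, and remember the new fingerprint, when content changed.

    known_fingerprints is a plain dict mapping a source identifier to the
    fingerprint recorded the last time that source was collected.
    """
    new_fingerprint = fingerprint(data)
    if known_fingerprints.get(source_id) == new_fingerprint:
        return False
    known_fingerprints[source_id] = new_fingerprint
    return True


# Hypothetical usage: the bytes stand in for a downloaded county file.
seen = {}
first_check = needs_refresh("example-county", b"parcel data v1", seen)
second_check = needs_refresh("example-county", b"parcel data v1", seen)
third_check = needs_refresh("example-county", b"parcel data v2", seen)
```

A check like this lets a scheduled job skip reprocessing when a source has not published anything new.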

Water Sage is a great example of how the right data (no pun intended), with proper management and analysis, can provide a simple and efficient way to answer questions that previously required hours or days of work. Our team puts a lot of effort into aggregating a multitude of complex datasets into a single platform that is not only functional but user-friendly as well. The data that we have worked into Water Sage comes from a variety of sources, ranging from state and federal entities to local associations and everywhere in between. We provide a consistent way to research the collected data and to get answers much faster than through other means.

In the end, the better we can manage and analyze data, the better we can provide Water Sage users with quick and easy access to answers to their questions. Simple, easy-to-use data revolutionizes the way research is completed.