Wrangling the data sets that you can so efficiently access in Water Sage is a full-time endeavor for a whole team of people. From cleansing data to understanding legalities to creating intuitive visuals, our team brings order and ease to the world of water rights research. Read this, the first in a series of technical blog posts, to gain some insight into our approach to all things data.
If you’re interested in water issues in Western states, you’ve probably seen the photo above. (Is that Will Ferrell on the right?) And you’ve heard the adage. (It was not coined by Mark Twain, though it sounds like him.) And if you’re new to water rights, from these two tidbits you can see that the driving force around water rights in Western states is contention. There’s only so much water out there and a whole lot of people want to use it. If you don’t want to run the risk of getting shot, you’ve got to know how much water you have a right to, and what you can use it for, and where you can put it. You’ve also got to know if you can even take it out. Where do you stand in the prior appropriation system (respect thy seniors!)? Will someone else get all the water before you, this year? When can you divert the water? If you don’t have access to enough, who can you call on? How has the water right you just bought performed in the past?
There’s a lot to know. When I started at Ponderosa, I thought that, as important as water is, surely there must be a central data set to peruse? Surely there must be some kind of data standard? As a database professional, naturally I’m on a continuous quest for order. But I soon found out why a lot of people, on hearing what we’re trying to do, say, “Oh wow, that sounds hard!” or just “Good luck.” From a data point of view, it can be hard to tell what a water right is, much less find all the information about it.
Thankfully, I get to make a lot of water puns. Well, can you blame me? They just flow. This is a drop in the bucket.
Actually, various state websites provide a lot of data. As much as they can. But often state governments aren’t funded well enough in the IT realm to be able to do everything they want to with the information they have. It’s time-consuming enough for these state agencies to convert paper data into digital format and verify its accuracy, in addition to their day-to-day administration of water rights. And if you’re comparing one state to another, formats and websites differ widely. Regulations and legal considerations vary widely. If you want to get a grip on multiple states, you’ll have to learn multiple websites, conventions, nuances, and legal definitions, and probably build your own geospatial database to hold it all, not to mention cleanse the data itself. You’ll need technical know-how, legal expertise, and lots and lots of time and patience.
That’s where the Ponderosa team comes in. We aim to augment and combine these disparate data sets into one where geospatial queries are easy, more search functionality is provided, and legalities and other important state-specific factors are distilled (whiskey pun!) into the presentation of the data. And it takes a full-time team. I oversee the data: I have market insight flowing in from a team of research analysts, data flowing in from a team of programmers who acquire it, and I help shape the data to flow into Water Sage, where our team of developers takes over and creates the slick interface you can now buy access to.
We hope our tools will help reduce the fightin’. The drinkin’ is a matter of discretion.
In this blog series, I’ll chronicle the ways in which building the databases behind Water Sage is, and will continue to be, an adventure. With each new state we study, it seems we’re awash in a sea of unruly information, for a time. It’s not a typical database project, since every state has different regulatory and legal frameworks that affect how the data is organized. Each state also defines the elements that constitute a water right differently, and the data varies in quality and completeness. Sometimes there’s a unique identifier, sometimes not. Sometimes priority dates are given in the state system, sometimes not, or there may be additional ways of ranking water rights. There may be duplicate records, or types of records that supersede other records. There are usually multiple spellings of commonly used tags that need to be resolved into a normalized table. Sometimes the township/range/section information of a place of use isn’t within the state (turns out it’s hard to use Montana water in Kathmandu). It’s all in a day’s work for a business intelligence developer, though, and today we have two states completed, a third on the way, and a plan to tackle the rest of the West ahead of us.
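To make a couple of those cleanup steps concrete, here is a minimal Python sketch of resolving tag spellings and dropping duplicate records. It is an illustration only, not the actual Water Sage pipeline: the `WaterRightRecord` fields, the `CANONICAL_USES` mapping, and the function names are all hypothetical, and real state data would need far more rules than this.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from spelling variants to one canonical tag.
# A real normalized table would be much larger and state-specific.
CANONICAL_USES = {
    "irrigation": "IRRIGATION",
    "irr": "IRRIGATION",
    "irrig.": "IRRIGATION",
    "stock": "STOCK",
    "stockwater": "STOCK",
    "stock water": "STOCK",
}


@dataclass(frozen=True)
class WaterRightRecord:
    """Assumed, simplified shape of a record; not a real state schema."""
    right_id: Optional[str]       # some sources have no unique identifier
    use_tag: str                  # free-text use description
    priority_date: Optional[str]  # ISO date string, when the state provides one


def normalize_use(tag: str) -> str:
    """Lowercase, collapse whitespace, then look up the canonical tag."""
    key = " ".join(tag.strip().lower().split())
    return CANONICAL_USES.get(key, "UNKNOWN")


def clean(records):
    """Return (cleaned, needs_review).

    Records with no identifier or an unrecognized tag are set aside for a
    person to look at; exact duplicates (after normalization) are dropped.
    """
    seen = set()
    cleaned, needs_review = [], []
    for rec in records:
        normalized = WaterRightRecord(
            rec.right_id, normalize_use(rec.use_tag), rec.priority_date
        )
        if normalized.right_id is None or normalized.use_tag == "UNKNOWN":
            needs_review.append(normalized)
            continue
        if normalized in seen:
            continue  # same right, same tag, same date: a duplicate
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned, needs_review
```

Setting questionable records aside rather than guessing is a deliberate choice in this sketch: a missing identifier or an unfamiliar tag usually needs a person who knows the state’s rules to decide what it means.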
As for our visual tool, it’s created through a fairly fast-paced and fluid feedback loop within the team. I build, clean, and manage our data sets for each state based on a month-long design phase, the developers build out a Water Sage interface around the data structures, and then we test and modify both, responding to market feedback and new requirements that may arise. We want to devise the best way to show every important aspect of a water right.
What does it mean that a water right is involved in an augmentation plan or a temporary water use agreement? How significant is it that a water right is located in a designated basin, or that the groundwater is classified as non-tributary? How will engineers use the data differently than attorneys or real estate professionals or individual water right holders? How will we find and provide more data, and make it intuitive to use? How in the world do we model all of these data sets to generate actionable intelligence? We’ll talk about these issues and more in this technical blog column.