Data science and the creek

9/10/2017 | SAMANTHA FOX

Samantha Fox | Senior Product and Data Analyst

My path to data science started in a Carolina creek, and ended up in the world of Western water.

One of my earliest memories is playing in the shallow creek behind our house when I was about 8. I watched the way water bumped over the rocks, the dips and eddies it took, how the flow slowed down along the edges, and the irregular way it splashed, and I thought: if I knew everything about this stream, how fast it moved and how much water it carried, the arrangement of every rock, where it hit sand or finer mud – could I predict exactly how this water would flow? For some reason I kept coming back to this thought over the years. I wondered exactly how much we could know with certainty, and how much of it I could figure out.

Years later, thanks to that formative thought, I earned a degree in engineering physics. But by graduation, I had already become attached to the database work I did while working my way through school. To me, data management and analytics are like getting paid to do puzzles every day. So for the next 15 years or so I worked as an analyst and database professional. I learned how to organize, clean, store, retrieve, report, and visualize data. I learned how to optimize for speed and scale. I learned about modeling and forecasting data, and how to draw stories out of it. All of it mirrored that thought at the creek, and led me to think the answer to my question was yes.

The natural sciences study vast bodies of information and phenomena, seeking to elucidate the rules that govern them, which can be used to predict possible future outcomes. Data science does the same. Search engines and social media are the most obvious examples of “big data,” sifting enormous volumes of information to find patterns and relationships, even predicting who you want to connect with and what you’ll like. There is abundant data about weather and climate, data on money changing hands, data on demographics, crime, disease, and food production. You might think, as I did, that since water is so vital to life in the arid West, it would have equally rich data. Not quite. It’s the most challenging body of data I’ve ever worked on. It’s troubling to think that you can’t find the answers you need because you don’t have the data you need, and that the answer to my long-ago formative question is no.

There is enough data on water to get by, for now. The infrastructure and investment in water conveyance and administration in the western half of the US is certainly impressive, and has made civilization possible here. Water rights are generally recorded in databases, and there is data on use and availability. There are some good systems and there are places at the forefront of innovation. But overall, water scarcity is forcing water managers, users, and scientists alike to look at the data we have and see that we need to know more and quantify more about water in order, quite literally, to survive. The current problems stem from the fact that water information has grown organically since the Gold Rush, with none of the overarching vision or structure that, for example, commodity markets or digital technologies now have. As with all data sets that have grown this way, it’s time for an overhaul. We have to map the rocks and sands in our creek, and the situation upstream of us, in order to know how (or if) the water will flow.

In my three years on the water team at Water Sage, I’ve seen that too much of the data we need – on supply, demand, and rules that govern water – lives in people’s heads, paper copies, and even microfiche. If it’s digital, it’s stored in hundreds of databases in various stages of repair or neglect. It is often siloed as well – agencies keep only the data they need, in the format they need it, sometimes with very little capacity for sharing it in accessible formats or in real time. At Water Sage we work on building data bridges to understand multiple interconnected regulatory regimes, geopolitical divisions, measurement methods, and time series. As our population grows, our infrastructure ages, and the climate changes, the pressures on water will snowball. A finer understanding of local cause and effect, as well as big-picture insights, will be needed in order to maximize efficiencies and find creative solutions. We’ll need an “Internet of Water” to tie everything together and put our finger on the pulse. We’ll need our data to be correct, integrated across state lines, and well mapped, in order to make sense of it and draw meaningful inferences and predictions from it. I’m happy to be working on a team that is innovating to make the Internet of Water a reality.