Data, Data, Everywhere…Or Not

This year’s Climate and Society class is out in the field (or lab or office) completing a summer internship or thesis. They’ll be documenting their experiences one blog post at a time. Read on to see what they’re up to.

Tae Hamm, C+S ’18


The purple area on the map to the right comes from the world atlas of large flood events (shown on the map to the left), clipped to an overlay of Vietnam. Each boundary shows the area affected by a flood and represents a discrete flood event.

Data is important. Really, really important. It is the key ingredient for building a model, which can then produce useful insights. For the last two semesters, I have dealt with many different types of data and the models to plug that data into. While I have learned how data can affect the model outcome (i.e., garbage in, garbage out), I’ve learned relatively little about how data is managed in different parts of the world. When it came to finding data for many of my projects, I was spoiled. All I had to do was visit the International Research Institute for Climate and Society (IRI) Data Library, select the product and region I was interested in and hit download. But that all changed this summer during my internship at the IRI with Pietro Ceccato.

This summer, I’ve been working on precipitation and flood data for Vietnam. Specifically, I’ve been working to correlate a long period of precipitation data with flood occurrences in search of patterns, and to create a map that shows vulnerable populations and households at risk of flooding. To achieve this, I needed appropriate data for precipitation, floods and various social indicators like poverty levels.
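For readers curious what "correlating" the two series can look like in practice, here is a minimal sketch in Python. It computes a Pearson correlation coefficient, which is one common choice and not necessarily the method used in this project. The variable names and the numbers are made-up placeholders for illustration, not values from the IRI Data Library or DFO.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder values, NOT real observations:
# one precipitation total and one flood count per year.
annual_precip_mm = [1500.0, 1720.0, 1610.0, 1980.0, 1830.0]
annual_flood_count = [1, 3, 2, 5, 3]

r = pearson(annual_precip_mm, annual_flood_count)
print(f"Pearson r = {r:.2f}")
```

A value of r near 1 would suggest that wetter years tend to have more recorded floods, but a result like this is only as trustworthy as the flood records that go into it.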

Precipitation and flood data were easy to come by thanks to the IRI Data Library and the Dartmouth Flood Observatory (DFO). I soon found out, however, that the flood data from DFO had a major red flag. Areas affected by floods in the dataset were selected using “natural neighbor interpolation,” a spatial analysis method that generates a natural neighbor region surrounding the data points, which here are the areas reported to have experienced a flood. Floods do not affect a single location, so it is logical to use this algorithm, but the data from DFO included locations quite far away from the reported area. For instance, for a flood that occurred in Chongzuo, China, the data included areas that extended to Danang, Vietnam, nearly 440 miles away. Since my study focused on Vietnam, it was crucial for me to have accurate data about whether or not floods actually affected Vietnam, which was hard to confirm with DFO data.
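One simple way to sanity-check a case like this is to compute the great-circle distance between the reported flood location and the farthest place the polygon reaches. The sketch below uses the haversine formula. The coordinates are approximate values I am assuming for Chongzuo (the Guangxi city the text appears to refer to) and Danang, and the `is_plausible` helper and its threshold are hypothetical, not part of any DFO tool.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius
KM_PER_MILE = 1.609344


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_plausible(reported, candidate, max_km):
    """Hypothetical check: is the candidate within max_km of the reported site?"""
    return haversine_km(*reported, *candidate) <= max_km


# Approximate (lat, lon) in degrees; assumed values for illustration only.
CHONGZUO = (22.38, 107.36)
DA_NANG = (16.05, 108.21)

distance_km = haversine_km(*CHONGZUO, *DA_NANG)
distance_miles = distance_km / KM_PER_MILE
print(f"{distance_km:.0f} km, about {distance_miles:.0f} miles")

# The 300 km cutoff is an arbitrary example threshold, not a DFO rule.
print(is_plausible(CHONGZUO, DA_NANG, max_km=300))
```

With these approximate coordinates the distance should land in the neighborhood of the figure quoted above, far enough that a single flood polygon covering both places deserves a second look.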

Precipitation data plotted along with the number of floods that occurred from 1985 to 2010.

To make matters worse, the social indicators necessary to create the flood vulnerability map, such as the poverty index, total households and senior population data, were nowhere to be found. There were no government agencies offering an open data platform for any of these indicators. I was extremely frustrated, but it also made me wonder why the Vietnamese government hasn’t initiated top-down open data projects like the ones led by the Federal Emergency Management Agency (FEMA) in the U.S., such as the data feeds that provide data for disaster and emergency declarations.

While I continued working on the correlation between precipitation and floods, all the trouble just to find the right data left me wondering: why is it so much more difficult to create a map for Vietnam than one I can whip up in an hour for the U.S.?

It turns out that there is an international charter on open data, but most governments, particularly those in developing countries, are not meeting the basic principles of the charter. That’s largely because there is often a significant discrepancy between the supply of and demand for open data. Oftentimes, datasets can improve the transparency of the government (for example, regarding its commitment to international agreements), something that’s not always in the political interest of the country.

For instance, countries that are not complying with international agreements over water resources might not want to share relevant data. Moreover, collecting data can be very expensive for countries with limited budgets and many needs to fulfill. If providing open data is more difficult for developing countries, are there any benefits for them in even making the effort? Sure, open data in Vietnam would have made my internship a lot easier, but it also would have served a great purpose for the people of Vietnam, allowing the local government and other public entities to allocate resources more effectively in times of emergency.

Most importantly, open data is a great avenue for citizens to get more involved in governmental decisions. It offers people with diverse specialties a chance to use the available data for useful product development and analysis.

With a better set of data, what I’ve been able to create during my internship could possibly have helped the National Meteorological and Hydrological Service in Vietnam to understand the relationship between precipitation and floods, as well as their impact, and potentially helped improve its flood forecasts. Instead, I found my end product to be limited to a qualitative assessment. Quality data can encourage better analysis and study by the public, something that I hope countries without open data platforms can take note of.
