In 2014, the World Bank and the Government of Tamil Nadu designed and implemented a Participatory Tracking project that surveyed approximately 32,000 Self-Help Group (SHG) women with around 70 multiple-choice questions about their daily well-being. Data from a sample of 2,000 women in the pilot district, Theni, was used to design the initial prototype. Visualizing this data would allow the project to give the collected data back to the women.
We wanted to present the data in a revealing way, while still remaining accessible to even the least literate and numerate women. This article discusses how our three guiding objectives led us to several design paradigms.
1: Revealing visualizations that are multi-dimensional
The goal of displaying information graphically is to reveal interesting relationships found in the data. Since many questions in the survey explored topics that were related (e.g. marriage, food, sanitation), we wanted to incorporate related questions into a single visualization.
Conventional plots are insufficient because a woman who answered five questions about a topic would ideally be represented as a 5-dimensional data point. Traditional plots take either 1- or 2-dimensional slices of high-dimensional data, so any interesting higher-order relationships would be lost.
Therefore, using conventional charts to show data for 5 questions about marriage may lead us to produce 5 histograms, i.e. one for each question. However, this visualization would miss how one woman’s answers relate to each other. Scatterplots fare little better: we would need 10 of them to plot all the possible two-way pairings between the five variables. That’s a lot of charts – charts that would still miss some of the interesting and critical relationships between questions.
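As a quick check on the counts above, the short Python sketch below enumerates the two-way pairings among five questions. The question labels are hypothetical placeholders, not the actual survey items.

```python
from itertools import combinations

# Hypothetical labels standing in for five related survey questions.
questions = ["q1", "q2", "q3", "q4", "q5"]

# One histogram per question.
n_histograms = len(questions)

# One scatterplot per unordered pair of questions.
pairs = list(combinations(questions, 2))
n_scatterplots = len(pairs)

print(n_histograms, n_scatterplots)
```

Even with all of these charts, each one shows at most two questions at a time, so a pattern that only appears across three or more answers would not be visible in any single chart.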
For our asset, diet, and marriage visualizations, we explored representing each woman as a point in high-dimensional space using a small multiple representation (Tufte, 1983). A small multiple uses the same custom graphic design structure to represent each view of the data, allowing viewers to easily compare them.
Each small multiple takes up screen space, so we found it was sometimes impossible to show an entire village’s data on the screen. To solve this, the computer randomly samples a subset of data points to display, and viewers can click to randomly resample the dataset.
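The sampling step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `sample_for_display`, the `max_on_screen` limit, and the example village size are all assumptions made for the sketch.

```python
import random

def sample_for_display(respondents, max_on_screen, rng=None):
    """Return at most max_on_screen respondents, chosen at random.

    respondents: one record per woman (hypothetical structure).
    rng: optional random.Random instance, useful for reproducible tests.
    """
    rng = rng or random
    respondents = list(respondents)
    if len(respondents) <= max_on_screen:
        # Everything fits on screen, so no sampling is needed.
        return respondents
    # Sample without replacement so no woman is drawn twice.
    return rng.sample(respondents, max_on_screen)

# Hypothetical village of 200 respondents and room for 48 small multiples.
village = list(range(200))
shown = sample_for_display(village, 48)

# A click on "resample" would simply call the function again.
shown_again = sample_for_display(village, 48)
```

Sampling without replacement keeps each displayed small multiple tied to a distinct respondent, and repeated resampling lets viewers eventually see the whole village.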
2: Accessible visualizations that are redundant
The women viewing the visualizations came from many different backgrounds. They included women who had little to no schooling, women with some primary or high school education, and a few who had completed high school or had some college education.
To make a visualization that would serve this diverse audience, we needed to use redundant methods for communicating the same data. Redundancy is a common technique in computer networks, where the same information is sent multiple times to increase the chances that the message will get through.
In our visualizations, we attempted to send messages through different channels such as photographs, cartoon images, colors, numerical scales, and, when necessary, text. For our non-traditional visualizations, we attempted to remove the dependence on text entirely. Even for the visualizations that use text, such as the histogram visualization, a viewer who is both illiterate and innumerate should still be able to obtain information from images, colors, and discussions occurring around them.
One byproduct of designing for redundancy is that visualizations became more intuitive even to populations that were literate and numerate. Many of our visualizations can be read at a glance from the shade of green (good) or grey (bad) on the page.
3: Actionable visualizations generated using fuzzy indices
Actionable visualizations should inform definitive steps the women could take to improve their situation. To this end, the women wanted to compare their village to other villages.
One way to allow regional comparisons is to use an index that boils down the high-dimensional data point of a village into a single numerical value. However, using a single number to represent multidimensional data seemed to oversimplify the complex connections within the data, and we hotly debated whether to include an index in our visualizations. Eventually, we decided that we could use an index if we could make the use of the index less damaging.
To build the index, we first removed all questions for which we felt uncomfortable declaring a correct answer; for example:
If you are ill, what is the method of treatment you take?
a. Your own treatment
b. Nearby medical shop
c. Government Hospital
d. Private Hospital
In this question, it seemed prescriptive to claim that your own treatment of your illness would be better than treatment from a nearby medical shop, or vice versa. Similarly, some of the tribal villages we visited had a tradition of marrying blood relatives, and we decided that the index should respect these customs.
Secondly, in an attempt to prevent fixation on an arbitrary index value, we reinforced this fuzziness in the visualizations. No numbers are ever presented for the indices because they are not actual measurements. Instead, we communicate the score of the index by mapping each score to a color on a spectrum from grey to green using a piecewise linear scale*. Close values on an index are hard to distinguish visually, thus reflecting our own ambiguity about the index.
*The color spectrum maps from grey to green, as opposed to from one saturated color to another (e.g. red to green). Borland (2007) shows that people perceive values on a grey-to-color spectrum more continuously than on a color-to-color spectrum; the latter causes false striations in the visualization that are not present in the data.
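To illustrate what a piecewise linear grey-to-green scale might look like, here is a minimal Python sketch. The breakpoints, the RGB values, and the assumption that index scores run from 0 to 1 are all invented for illustration; the source does not specify any of them.

```python
def score_to_color(score, stops):
    """Map a score to an RGB tuple by piecewise linear interpolation.

    stops: list of (score, (r, g, b)) pairs with strictly increasing scores.
    Scores outside the range are clamped to the first or last color.
    """
    if score <= stops[0][0]:
        return stops[0][1]
    if score >= stops[-1][0]:
        return stops[-1][1]
    # Find the segment containing the score and blend its two end colors.
    for (s0, c0), (s1, c1) in zip(stops, stops[1:]):
        if s0 <= score <= s1:
            t = (score - s0) / (s1 - s0)
            return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))

# Hypothetical stops: grey at the low end, green at the high end.
STOPS = [
    (0.0, (160, 160, 160)),  # grey
    (0.5, (150, 190, 150)),  # greyish green
    (1.0, (0, 140, 60)),     # green
]

low = score_to_color(0.0, STOPS)
quarter = score_to_color(0.25, STOPS)
high = score_to_color(1.0, STOPS)
```

Because nearby scores map to nearly identical colors, a viewer sees only the general shade rather than a precise value, which matches the intent of keeping the index fuzzy.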
While we wanted the visualizations to be actionable and empowering, we continuously wrestled with the idea that actionable visualizations may contradict one of the central tenets of developmental research, i.e. to not impose Western beliefs, biases and methodologies onto other cultures. We endeavored to cautiously balance these two values during our project, and hope that our final design honors both principles.
Our next article covers how the visualizations evolved using community feedback.
Related to a forthcoming paper: “Democratizing Data: Participatory Tracking in Tamil Nadu” written by Nethra Palaniswamy, Vijayendra Rao, Smriti Sakhamuri, R.V. Shajeevana and Cassandra Xia.