
9.16.2010

Location's Fourth Dimension

While I was prepping for the Minnesota State Fair edition of the Foursquare Traffic Report, it became apparent that a fourth dimension is noticeably missing from the current discussion about location.
Those of us who work with location-based data know what a pain it is to define where something is. We start with coordinates on the globe, a single point on a quasi two-dimensional plane. From there we look at area, but is that enough? Do we define the Minnesota State Fairgrounds as a single set of coordinates? What about locations that fall within the boundaries of the grounds?
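One way to frame that nesting problem is simple point-in-polygon containment: a venue is a point, the fairgrounds are a polygon, and a single checkin can belong to both at once. Here's a rough sketch, assuming the shapely library and using rough placeholder coordinates rather than the real fairgrounds boundary:

```python
from shapely.geometry import Point, Polygon

# Placeholder boundary, not the actual Minnesota State Fairgrounds polygon.
fairgrounds = Polygon([
    (-93.175, 44.975),
    (-93.165, 44.975),
    (-93.165, 44.985),
    (-93.175, 44.985),
])

# A single venue inside the grounds, represented as one point (also made up).
cookie_stand = Point(-93.170, 44.980)

# The venue is its own "where", but it also sits inside a larger "where".
print(fairgrounds.contains(cookie_stand))  # True
```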

Over the course of the Minnesota State Fair, there were over 200 unique venues used within the Minnesota State Fairgrounds. Only 12 were duplicates, and nearly half of the checkins came from a single venue, the 2010 Minnesota State Fair.
The 2010 Fair itself is a more popular venue than the fairgrounds. How often does when trump where? If someone checks in from Austin, TX, during the second week of March, chances are they're at South by Southwest. Which of the expansive list of venues are they going to use? I'm willing to bet that a popular new venue will show up in early March called 'SXSW 2011'.
As more and more people use location-based services, we'll see more 'temporary venues' based around events. Right now Foursquare has pages for venue owners. What happens when several 'temporary venues' exist within a larger venue? Does this change the way venue administrators look at managing their traffic data?
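For anyone curious how a tally like the one above comes together, here's a minimal sketch; the checkin records below are invented stand-ins for what a Foursquare venue feed might return:

```python
from collections import Counter

# Hypothetical checkin log: (user, venue_name) pairs pulled from an API.
# The venue names and counts here are made up for illustration.
checkins = [
    ("alice", "2010 Minnesota State Fair"),
    ("bob", "2010 Minnesota State Fair"),
    ("carol", "Sweet Martha's Cookie Jar"),
    ("dave", "Ye Old Mill"),
    ("erin", "2010 Minnesota State Fair"),
]

# Count checkins per venue name.
venue_counts = Counter(venue for _, venue in checkins)

print(f"unique venues: {len(venue_counts)}")
top_venue, top_count = venue_counts.most_common(1)[0]
print(f"top venue: {top_venue} with {top_count / len(checkins):.0%} of checkins")
```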

5.31.2010

James Squire Guide to Beer

I love the way visualizations simplify decision-making. On a recent trip to Sydney I had dinner at James Squire, and when I was deciding which of their signature beers to pair with my meal, the following chart made my task a lot easier:
It brought to mind this lovely number that was on display last year at Bobby Van's Steakhouse in Manhattan:
I find so much power in a simple two-axis matrix for choosing within a set of products. I'd love to see more services use this type of presentation to help consumers navigate the sometimes overwhelming choices they face. I've been thinking about these types of presentations in the context of The Paradox of Choice by Barry Schwartz and Nudge by Thaler and Sunstein. Does this type of visualization make increasingly complex decisions easier to navigate? I tend to think so. I like an aromatic beer with savory undertones, so I should choose an ale rather than a lager. Rather than struggling to identify the nuances among the myriad beer choices, I'm given a useful map.

What other types of choices could be simplified by a little visual design? I'd wager the categories are endless. It highlights for me the importance of empathy in design. If the first mantra of the information designer is to know the purpose of your data, the second may be to respect your audience. Be aware that as a designer you're steering people to a product; you may as well steer them to the product they're going to like the most.
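As a toy illustration of how little it takes to build one of these two-axis maps, here's a rough matplotlib sketch; the beer names and flavor scores are invented for illustration, not James Squire's actual chart:

```python
import matplotlib.pyplot as plt

# Hypothetical flavor scores on two axes (0-10); not the real James Squire data.
beers = {
    "Golden Ale": (7, 3),   # (aromatic, savory)
    "Amber Ale":  (6, 6),
    "Porter":     (4, 8),
    "Pilsener":   (3, 2),
}

fig, ax = plt.subplots(figsize=(5, 5))
for name, (aromatic, savory) in beers.items():
    ax.scatter(aromatic, savory)
    ax.annotate(name, (aromatic, savory), textcoords="offset points", xytext=(5, 5))

ax.set_xlabel("more aromatic \u2192")
ax.set_ylabel("more savory \u2192")
ax.set_title("Which beer fits your meal? (illustrative data)")
plt.show()
```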

3.31.2010

Transparency Camp

Transparency Camp was great! I facilitated a session on intellectual accessibility and data visualization. The conversation started with the image below:

Right off the bat, Brian Behlendorf from Health and Human Services guessed that it was happiness over the course of a person's life. Totally threw me for a loop (as I'd never presented this graphic before). Apparently the "Trough of Despair" (a.k.a. middle school and high school) is a little hard to miss. I think it still works as a demonstration of the power of context and storytelling, though.

We continued our discussion with a meta-visualization, laying out the types of data visualizations that are currently out there on a scatter plot with continuum axes. I tend to think about info vis like this, as a matrix of options that fit into different families or categories.
I'm kicking myself a little for not documenting the results of our discussion better. Never again shall I neglect to take pictures!

Something really interesting that came out of the conversation was the idea of establishing a 'Visualizer Code of Ethics'. It was clear from our discussion that information designers like me can feel just as much pressure to massage our visualizations as data analysts and statisticians do. I think the challenge in a code of ethics is that a visualization relies on the story or message of the analysis.

I've become increasingly conflicted about the use of pyramid and 3-D pie charts. I've always loathed them, but it forces the question: what is a graphic for? If you think about the continuum of visualization methods above, statistical graphics are one small part of a universe of visual communication techniques. I've seen people painstakingly massage Venn diagrams into the appropriate area relationships to represent percentages, and I have to ask, what's the point? As a concept graphic, a Venn's purpose is to demonstrate intellectual relationships. If you're trying to show precise proportional relationships, there still isn't anything better than a bar chart. (Linear differences are so much easier to see than area differences.)
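A little back-of-the-envelope arithmetic shows why: if you make a circle's area proportional to a value, its radius has to scale with the square root of that value, so big differences get visually compressed. A quick sketch with invented numbers:

```python
import math

# Two hypothetical percentages we want to compare.
a, b = 20, 40  # b is twice a

# Bar chart: bar length is proportional to the value, so the ratio is obvious.
print(f"bar length ratio: {b / a:.2f}x")          # 2.00x

# Area-proportional circles (as in a "massaged" Venn): radius ~ sqrt(value),
# so a 2x difference in value shows up as only a ~1.41x difference in radius.
r_a, r_b = math.sqrt(a / math.pi), math.sqrt(b / math.pi)
print(f"circle radius ratio: {r_b / r_a:.2f}x")   # 1.41x
```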

Maybe a good place to start with a visualization code of ethics is "know the purpose of your data".

3.11.2010

Google Public Data Explorer

I made the chart above using the new Public Data Explorer from Google. It doesn't have the capability to upload your own data set yet, but it's still pretty nifty. As is usually the case, the way the data set is structured has a huge impact on the ease of analysis. Some of the data sets have better categorization than others, and it isn't possible to select a whole set of variables at one time, which would be nice. As tools go, it's really easy to get the HTML needed to embed a chart in any website. The base visualizations are really clean, which is fantastic, but the tool doesn't give much flexibility in terms of combining seemingly unrelated data sets (like showing the rate of unemployment and HIV infection on the same chart). I'm looking forward to seeing how Google will handle the data uploading issue, as well as the continued development of the data files. Nat Torkington just wrote an illuminating post on open data formatting issues. I hope that (as he suggests) open data will follow the model of the open source community.

3.09.2010

UPA Usability Metrics Workshop

I went to a great UPA workshop last month, presented by Bill Albert and Tom Tullis. It was a jam-packed day, covering a nearly overwhelming amount of content. Tom and Bill walked us through an overview of metric types and collection methods, and then concentrated on four different types of metrics: Performance, Self-Reported, Combined, and Observational.

One thing that struck me was the relationship between these metrics as they were presented. Performance and Self-Reported metrics are both gleaned through a structured interaction with a sample set of users. Combined metrics give an overall view of the health of an interaction, and are calculated using both Performance and Self-Reported metrics. Observational metrics are used less often than Performance metrics, and are seldom used in a Combined metric format.
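To make the Combined idea concrete, here's a minimal sketch of one common approach: standardize each metric across participants, flip the ones where lower is better, and average the z-scores. The numbers are invented, and this isn't necessarily the exact method Tom and Bill presented:

```python
import statistics

# Hypothetical per-participant results for one task.
task_time = [42, 55, 38, 61, 47]        # seconds (Performance; lower is better)
success   = [1.0, 1.0, 0.0, 1.0, 1.0]   # task success (Performance; higher is better)
ease      = [6, 5, 3, 4, 6]             # 1-7 self-reported ease (Self-Reported)

def zscores(values, higher_is_better=True):
    """Standardize a metric across participants; flip sign if lower is better."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    scores = [(v - mean) / sd for v in values]
    return scores if higher_is_better else [-s for s in scores]

# One combined score per participant: the average of the standardized metrics.
combined = [
    statistics.mean(parts)
    for parts in zip(
        zscores(task_time, higher_is_better=False),
        zscores(success),
        zscores(ease),
    )
]
print([round(c, 2) for c in combined])
```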

Given that I've been working within a traditional survey-based market research environment for the past four years, I found the combination of these different types of performance indicators very interesting, particularly the practice of using self-reported data to create derived KPIs (key performance indicators). It struck me that our rapidly changing access to data has the capacity to completely overhaul the way we approach measuring the success or failure of a product. Ultimately we must be able to demonstrate a profitable return on investment, but what happens when we have larger data sets to work with? Can we combine post-launch observational metrics with pre-launch performance tests to validate continuing development?

Market research and usability have their roots in the production and sale of products, not services. As we shift focus from the production of meatworld goods to the design of online services, we suddenly have the ability to reach larger and more disparate groups of consumers. With agile methodologies creating ever-shorter development cycles, we have more opportunities to use our user communities as test subjects for working services. Google famously tested 41 shades of blue for one of its toolbars, measuring click-through rates to determine which shade is most appropriate. Services like Clickable provide the ability to make immediate decisions about search advertising ad copy. Users can see the real results on their pay-per-click online ads and use that information to refine their messaging (compare that to a six-week messaging study). Granted, this is easy to do in a low-cost medium with essentially infinite room for variation. (When a prototype takes at least a year to develop and costs at least a million dollars to produce, the barrier to this kind of live experimentation is high.) Happily, we're living in a new age now, where interfaces and services and messages are expected to shift and grow and refine with changing consumer demand.
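For a taste of what that kind of live experiment looks like in code, here's a minimal sketch of a two-proportion z-test comparing click-through rates for two ad variants; the counts are invented:

```python
import math

# Hypothetical results from running two ad variants side by side.
clicks_a, impressions_a = 180, 12000
clicks_b, impressions_b = 230, 12500

p_a = clicks_a / impressions_a
p_b = clicks_b / impressions_b

# Two-proportion z-test: is variant B's click-through rate reliably higher?
p_pool = (clicks_a + clicks_b) / (impressions_a + impressions_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / impressions_a + 1 / impressions_b))
z = (p_b - p_a) / se

print(f"CTR A: {p_a:.2%}, CTR B: {p_b:.2%}, z = {z:.2f}")
# |z| > 1.96 corresponds to roughly p < 0.05 for a two-sided test.
```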

So what does this mean for usability metrics? For virtual services we will need to find a way to integrate those traditional metrics with live Observational metrics. Heck, we could even create a way to automate this type of reporting. I've got some ideas on how to do this, but that will have to wait for another day.

2.21.2010

OK Trends continues to kick ass

It looks like the folks at OK Cupid are paying attention to their critics. As I've mentioned before, their analysis is inspired, but their data graphics need some work. Case in point: this post about profile photo myths. The graphs had a lot of rookie mistakes.

For example, the chart on the right was taken from OK Trends, and it has a number of problems. It's always a good idea to eliminate unnecessary, non-data pixels: the outlines around the bars of these charts provide visual distraction that can easily be eliminated. In addition, the legend requires a bit of ocular calisthenics. Your brain only keeps five to nine chunks of information in short-term memory, and in order to read any chart you need to understand both the numbers and the context. The further you remove the context from the numbers, the harder your brain has to work to put the pieces together. In this example, the numbers in green at the top of the chart can't be understood until you read the title at the top of the chart and the legend at the bottom. In the Western world we read from left to right, top to bottom, but this chart forces the viewer to read the top, then the bottom, then side to side. And while the green data series is the first your eyes encounter, it's the last one listed in the legend.


My version of the chart is on the right. Notice the legend is on the right of the chart, in the same order as the data series. Your eye reads left to right, the way it naturally wants to. I added an axis showing the sum of all the categories, so you don't need to do the mental math to know that these are parts of a whole population. In addition, the unnecessary outlines around the data series have been removed.

What I find interesting is the use of a stacked column chart. I think sometimes the impulse is to use a stacked column chart to show data that adds to 100%. I tend to use stacked column charts when I'm showing categories on a continuum. At times it may be better to show the information in a clustered bar chart, like the next one shown here. It depends on what you're trying to express with the data graphic. All of these charts contain the same data, but the last chart makes it easier to see the differences between men and women when it comes to profile pictures. It also makes it easier to see relative differences between categories.
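To play with the difference yourself, here's a rough matplotlib sketch that draws the same invented data both ways: stacked columns with the bar outlines removed and the legend ordered to match the series, and a clustered version that makes the men-versus-women comparison more direct. The category labels and percentages are made up, not OK Trends' numbers:

```python
import matplotlib.pyplot as plt
import numpy as np

# Invented percentages for illustration only.
categories = ["MySpace shot", "In bed", "Outdoors", "Drinking", "With animal"]
men   = np.array([10, 5, 30, 35, 20])
women = np.array([25, 15, 25, 15, 20])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(11, 4))

# Stacked columns: no bar outlines, legend placed to the right of the chart
# and ordered top-to-bottom to match the stacked segments.
bottom = np.zeros(2)
for label, m, w in zip(categories, men, women):
    ax1.bar(["Men", "Women"], [m, w], bottom=bottom, linewidth=0, label=label)
    bottom += np.array([m, w])
ax1.set_ylabel("% of profile photos")
ax1.set_title("Stacked (parts of a whole)")
handles, labels = ax1.get_legend_handles_labels()
ax1.legend(handles[::-1], labels[::-1], loc="center left", bbox_to_anchor=(1, 0.5))

# Clustered bars: easier to compare men vs. women within each category.
x = np.arange(len(categories))
ax2.bar(x - 0.2, men, width=0.4, label="Men", linewidth=0)
ax2.bar(x + 0.2, women, width=0.4, label="Women", linewidth=0)
ax2.set_xticks(x)
ax2.set_xticklabels(categories, rotation=20, ha="right")
ax2.set_title("Clustered (comparing groups)")
ax2.legend()

plt.tight_layout()
plt.show()
```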

I'm really impressed with the improvements in OK Trends' most recent post. They're employing a wider variety of visualizations, and the graphics they're using are much more sophisticated. If you look at the fourth chart I've posted here, the information is presented in a more manageable form: grid lines have been provided for quick and easy reference, there's a label indicating the sample size, and the labels are all oriented so they can be read left to right. It makes me really happy to see their data graphics rise to the quality of their analysis.