Virtual Tool Cupboard | e-lab

Digital Creation: Information Visualization: Word Clouds, Phrase Nets, Tree Maps

A visualization is a way of quickly and clearly expressing complex information. Information visualizations are constantly being used and created: a quick sketch, a scribbled map, and the pictorial instructions provided to help assemble furniture are all examples of everyday information visualization. We are used to seeing and interpreting bar charts and pie charts in presentations, reports, and even on sign boards. Most of us have used a pen and napkin to draw out ideas for someone else to understand; digital tools can extend this common presentational practice to various forms of data. There are numerous tools designed to help you make charts and graphs, some of which can be found within your word processing software. However, there are many other open source tools which can be found online and used for free. Beyond professionalizing your portfolio, learning how to create visualizations will give you another way of experimenting with and articulating arguments about data.

Visualization of data does not necessarily have to involve numbers. Data can be represented in many other forms. Word Lists can segment documents into long word strings; Word Clouds and Tree Maps can create graphic ways of seeing data; and Word Trees and Phrase Nets can point to connections between word strings.

Beginning Your Work

A good place to begin creating visualizations is with a simple ‘cut and paste’ resource such as a Word Cloud tool like Wordle (http://www.wordle.net). This site gives you three ways to input data and rapidly produces an image you can save as a screenshot or save to the site. Secondary research can begin with resource hubs like IBM’s Many Eyes (http://www-958.ibm.com/software/data/cognos/manyeyes/). Although you will need to create a user account and log into the site, Many Eyes brings together and hosts a large collection of visualization software and takes users through the basics of preparing their data.

One of the main things to remember when it comes to creating and using visualizations is that they present a different way of reading. In order to present data, software programs chop text up into its component parts. In order to make most effective use of the software, you will need to organize and normalize your research data, and you will need to know how to “read” the visualization.

Your data must be normalized before the program can classify the text and create the visualizations. Most frequently this means stripping out some of the formatting that a word processing tool applies to the text. There are a number of ways of doing this: through your browser, in your word processor, or in a notepad widget. For more advanced visualizations you may also need to preprocess your data by getting it into a spreadsheet so the program can read it.
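The kind of clean-up described above can also be scripted. The following is a minimal Python sketch, not the routine any particular tool uses; the function name normalize_text and the specific clean-up steps (dropping HTML-style tags, straightening curly quotes, collapsing whitespace, lowercasing) are assumptions chosen for illustration.

```python
import html
import re
import unicodedata

# Map curly quotation marks to their plain equivalents.
QUOTE_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",
    "\u201c": '"', "\u201d": '"',
})

def normalize_text(raw: str) -> str:
    """Reduce pasted word-processor or web text to plain lowercase text.

    Hypothetical helper: the steps here are illustrative assumptions.
    """
    text = re.sub(r"<[^>]+>", " ", raw)         # drop HTML-style tags
    text = html.unescape(text)                  # turn &amp; into &, etc.
    text = unicodedata.normalize("NFKC", text)  # fold compatibility characters
    text = text.translate(QUOTE_MAP)            # straighten curly quotes
    text = re.sub(r"\s+", " ", text)            # collapse runs of whitespace
    return text.strip().lower()

if __name__ == "__main__":
    print(normalize_text("<b>Word</b>   Clouds &amp; Tree&nbsp;Maps"))
```

The cleaned string can then be pasted into a cloud tool, or split into words and saved as rows in a spreadsheet.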

In many visualizations, size is equated with frequency: the physical size of a word in the cloud corresponds to the frequency of its appearance in the text being analyzed. For the most part, these tools will remove some of the most frequently used “stop words” such as “and” and “the,” making it easier for the researcher to see the remaining “hot words” and to identify some of the more interesting words in the text. However, because cloud tools remove form and context, the quick picture they develop may represent an accurate pictorial concordance, but it does not retell a text. Analysis begins with the scholar’s interpretation of what is present and what is absent. Alternatively, concept maps and phrase nets preserve some context, enabling scholars to “read” across visualizations; but these types of visualization also remove phrases from the larger framework. So, as with the cloud tools, concept maps and phrase nets enable discovery, but the scholar must be careful not to rest assertions about the text on visualization alone and should support them with evidence from the text itself.
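To make the link between frequency and size concrete, here is a minimal Python sketch of how a cloud tool might count words and scale them. It is not Wordle's actual algorithm; the function name word_sizes, the tiny stop word list, the point sizes, and the linear scaling are all assumptions made for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stop word list; real tools use much longer ones.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

def word_sizes(text, min_pt=12, max_pt=48):
    """Return {word: (count, font_size)} with size scaled to frequency.

    Hypothetical helper: the most frequent word gets max_pt, and other
    words are scaled linearly by their share of that top count.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: (count, round(min_pt + (max_pt - min_pt) * count / top))
        for word, count in counts.most_common()
    }

if __name__ == "__main__":
    sample = "the cloud and the tree and the cloud of the cloud"
    for word, (count, size) in word_sizes(sample).items():
        print(f"{word}: appears {count} times, drawn at {size}pt")
```

Note that the output keeps only counts: the order of the words and the sentences they came from are gone, which is why the resulting picture cannot retell the text.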

Secondary Uses

One of the off-label prescriptions for these tools is also one of the most useful. These resources can be employed to analyze your own writing. Phrase nets and Word Trees can be used to look for over-used phrases. Word clouds are excellent ways of detecting repetition across documents that you have authored, which is helpful, since repetition is not easy to recognize in your own work.
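The same self-check can be approximated with a short script. The sketch below is a simplified stand-in for what a phrase net or word tree surfaces, not a reimplementation of either; the function name repeated_phrases and the defaults of three-word phrases appearing at least twice are assumptions.

```python
import re
from collections import Counter

def repeated_phrases(text, n=3, min_count=2):
    """List n-word phrases that occur at least min_count times.

    Hypothetical helper: phrases are counted across sentence boundaries,
    which is a simplification.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    phrases = Counter(
        " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
    )
    return [(p, c) for p, c in phrases.most_common() if c >= min_count]

if __name__ == "__main__":
    draft = (
        "It is important to note that clouds help. "
        "It is important to note that trees help."
    )
    for phrase, count in repeated_phrases(draft):
        print(f"{count}x  {phrase}")
```

Running this over a draft, or over several documents joined together, gives a quick list of candidate phrases to vary or cut.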

Web Tool Visualizations

Word Clouds
http://www.wordle.net/

Visualization Tools Hub
http://www-958.ibm.com/software/data/cognos/manyeyes/

Additional Reading

Gambette, Philippe and Jean Veronis. “Visualizing a Text with a Tree Cloud.” IFCS'09: International Federation of Classification Societies Conference (2009).

Vuillemot, Romain, Tanya Clement, Catherine Plaisant, and Amit Kumar. “What’s Being Said Near ‘Martha’? Exploring Name Entities in Literary Text Collections.”

Wattenberg, Martin and Fernanda B. Viégas. “The Word Tree, an Interactive Visual Concordance.” IEEE Transactions on Visualization and Computer Graphics 14.6 (2008).

Burke, Christopher. “Isotype: Representing Social Facts Pictorially.” Information Design Journal (IDJ) 17.3 (2009).