Run a Word Frequency query


You can use Word Frequency queries to list the most frequently occurring words or concepts in your sources.


Understand Word Frequency queries


You can restrict the search to selected sources, nodes, sets, folders or search folders. You can also choose whether to search the textual content of your sources, the annotations, or both.

You could use a Word Frequency query to:

  • Identify possible themes, particularly in the early stages of a project

  • Analyze the most frequently used words in a particular demographic. For example, to analyze the most common words used by farmers when discussing climate change, you could run a coding query to gather all content coded at climate change and at nodes with the attribute farmer, then select the result node as the criteria for the Word Frequency query.

You can look for exact words, or broaden your search to find the most frequently occurring concepts. For example, if you look for the most frequent words in a dataset survey, you might find that water, health, and harmful are the most frequently occurring words. However, if you group similar words together, you might find that the concept of pollution (including pollutants, pollution, polluted, and pollutes) occurs most frequently.
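
The difference between exact words and grouped concepts can be sketched in a few lines of Python. This is only an illustration and not how NVivo works internally: the survey answers and the `CONCEPTS` grouping table are invented for the example, and a real tool would derive its groups from stemming or a thesaurus rather than a hand-written table.

```python
from collections import Counter

# Hypothetical survey answers, invented for this illustration.
responses = [
    "polluted water is harmful to health",
    "pollution affects water and health",
    "pollutants make water harmful",
    "the factory pollutes the water",
    "pollution is harmful to health",
]

# Hypothetical hand-written grouping table: word form -> concept label.
CONCEPTS = {
    "pollutants": "pollution",
    "polluted": "pollution",
    "pollutes": "pollution",
}

words = [word for line in responses for word in line.split()]

# Exact matches only: every distinct spelling is counted separately.
exact = Counter(words)

# Similar words merged: each word form is replaced by its concept label.
grouped = Counter(CONCEPTS.get(word, word) for word in words)

print(exact["water"], exact["pollution"])      # 4 2
print(grouped["water"], grouped["pollution"])  # 4 5
```

With exact matching, water is the most frequent word; once the word forms are grouped, pollution overtakes it.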

Before you run a Word Frequency query, make sure the text content language is set to the language of your source materials—refer to Set the text content language and stop words for more information.

NOTE  

  • If you are working with French or Japanese text, we recommend that all team members use NVivo 10 for Windows Service Pack 3 (or later).

  • If you are working with Chinese text, we recommend that all team members use NVivo 10 for Windows Service Pack 4 (or later).


Create a Word Frequency query using the Wizard

  1. On the Query tab, in the Create group, click Query Wizard.

The Query Wizard opens.

  2. Click Identify frequently occurring terms in content, and then click Next.

  3. On Step 2 of the Wizard, you can:

  • Limit the number of words displayed in the results (for example, show only the top 20 words).

  • Exclude small words (for example, include only words with four letters or more).

  • Adjust the Grouping slider, if you want to group related words together. By default, the slider is set to find 'Exact matches'. Refer to Understand text match settings for more information.

  4. On Step 3 of the Wizard, choose whether you want to count words in all your sources, or restrict the count to words in selected items or folders.

  5. On Step 4 of the Wizard, choose whether you want to run the query just once or add it to your project (and run it). If you choose to add it to your project, you must enter a name. You can optionally enter a description.

  6. Click Run.

The query is executed and the results are displayed in Detail View.
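
As a rough mental model of the two filtering options (limiting the number of words and excluding small words), the following Python sketch counts words in a string. The function name `most_frequent` and its parameters are hypothetical and are not part of NVivo. The sketch also splits on whitespace only, which is simpler than the counting rules NVivo applies.

```python
from collections import Counter

def most_frequent(text, limit=None, min_length=1):
    """Count words, drop those shorter than min_length, keep the top `limit`."""
    words = [w for w in text.lower().split() if len(w) >= min_length]
    return Counter(words).most_common(limit)

sample = "the water is bad and the water is rising"

# Keep only words of four letters or more, and show only the top word.
print(most_frequent(sample, limit=1, min_length=4))  # [('water', 2)]
```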

NOTE  If you want to use Word Frequency query features that are not available via the Wizard—for example, only count words in sources created by specific users—you can add the query to your project and update it later. If you are familiar with NVivo queries, you may prefer to create the query outside the Wizard.


Create a Word Frequency query outside the Wizard

If you are not familiar with NVivo queries, you may want to create your Word Frequency query using the Wizard—the Wizard guides you through the process of setting your query criteria. However, not all query features are available in the Wizard, so you may sometimes want to create your Word Frequency queries outside the Wizard, as described below:

  1. On the Query tab, in the Create group, click Word Frequency.

The Word Frequency Query dialog box opens.

  2. Adjust the Finding matches slider, if you want to find concepts rather than words. By default, the slider is set to find exact words only. Refer to Understanding text match settings for more information.

  3. In the Search in box, select whether you want to search in Text, Annotations or both.

  4. To change the scope of the query:

  • In the Of box, select which project items you want to include in the search. Use the Select button to choose specific project items.

  • In the Where box, choose to include only project items created or modified by selected users. Use the Select button to select the users.

  5. Under Display Words, you can choose:

  • All to include all words found in the selected project items.

  • <number> most frequent to include a specific number of words (for example, you could display the 100 most frequently occurring words).

  6. (Optional) Enter a With minimum length to exclude short words from the results. For example, enter 7 to display only words with seven or more letters.

  7. Click Run.

NOTE To save the Word Frequency query, select the Add to Project check box and enter the name and description (optional) in the General tab.


Understand the results

When you run a Word Frequency query the results are displayed in Detail View. There are four tabs on the right—the Summary, Word Cloud, Tree Map and Cluster Analysis tabs. You can change which tab is displayed by default—refer to the display options in Set application options for more information.

Summary tab

1  The most frequently occurring words excluding any stop words. If you adjusted the slider to return similar words, the most frequently occurring word from the group is displayed in this column.

2  Length—the number of letters or characters in the word.

3  Count—the number of times that the word occurs within the project items searched. If you adjusted the slider to include similar words, this count is the total for all the similar words.

4  Weighted Percentage—the frequency of the word relative to the total words counted. If you adjusted the slider to include similar words, a word may be part of more than one group of similar words. The weighted percentage assigns a portion of the word's frequency to each group so that the overall total does not exceed 100%.

5  Similar Words—other words that have been included as a result of adjusting the slider to include similar words—for example, if you include words with the same stem, then pollutants, pollution, and polluted would be grouped together. This column is not available if you use 'Exact match only'.
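
The idea behind the Weighted Percentage column can be illustrated with a small Python sketch. It rests on an assumption that the source does not confirm: a word that belongs to several groups has its count split equally between them. NVivo may apportion the count differently. The function name `weighted_percentages` and the sample data are hypothetical.

```python
def weighted_percentages(counts, groups):
    """counts: word -> occurrences; groups: group label -> set of member words."""
    total = sum(counts.values())

    # How many groups does each word belong to?
    membership = {}
    for members in groups.values():
        for word in members:
            membership[word] = membership.get(word, 0) + 1

    # Assumption: a word shared by n groups gives 1/n of its count to each.
    result = {}
    for label, members in groups.items():
        share = sum(counts[word] / membership[word] for word in members)
        result[label] = 100 * share / total
    return result

counts = {"talk": 6, "speak": 2, "discuss": 2}
groups = {
    "talk": {"talk", "speak"},
    "discuss": {"discuss", "talk"},  # "talk" appears in both groups
}

print(weighted_percentages(counts, groups))  # {'talk': 50.0, 'discuss': 50.0}
```

Without the split, each group would claim 8 of the 10 counted words and the percentages would add up to 160%.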

Word Cloud tab

This tab displays up to 100 words in varying font sizes, where frequently occurring words are in larger fonts.

When you view the results as a Word Cloud, you can change the style: on the Word Cloud tab (ribbon), choose from a gallery of styles.

Tree Map tab

The Tree Map tab displays up to 100 words as a series of rectangles, where frequently occurring words are in larger rectangles.

Cluster Analysis tab

The Cluster Analysis tab displays up to 100 words as a horizontal dendrogram, where words that co-occur are clustered together.

When you click on the cluster analysis diagram, the Cluster Analysis tab (on the ribbon) becomes available. You can use the commands on this ribbon tab to:

  • Change the diagram type: you can show the data as a horizontal or vertical dendrogram, a circle graph, or a 2D or 3D cluster map.

  • Size the bubbles in 2D or 3D cluster maps by word frequency: select the Word Frequency check box.

For more information, refer to Change the appearance or content of a cluster analysis diagram.
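
To get a feel for what "words that co-occur" means, the sketch below scores pairs of words by how many sources they share. The Jaccard coefficient is used here as one common similarity measure. It is an assumption made for this illustration, not a description of the metric NVivo uses, and the source names and word data are invented.

```python
from itertools import combinations

# Hypothetical data: for each word, the set of sources it appears in.
word_sources = {
    "drought": {"interview1", "interview2", "survey"},
    "rainfall": {"interview1", "interview2"},
    "subsidy": {"survey", "report"},
}

def jaccard(a, b):
    """Shared sources divided by all sources in which either word appears."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Score every pair of words once.
similarity = {
    (w1, w2): jaccard(word_sources[w1], word_sources[w2])
    for w1, w2 in combinations(sorted(word_sources), 2)
}

# The highest-scoring pair would be the first to be joined in a dendrogram.
closest = max(similarity, key=similarity.get)
print(closest)  # ('drought', 'rainfall')
```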


See all the references for a selected word

When you run a Word Frequency query, a preview node is created for each word—this lets you see all references to the word. To open a preview node:

  • In the Summary, Word Cloud, Tree Map or Cluster Analysis tab, double-click the word you want to explore.

In the preview node, you see each occurrence of the selected keyword in context.

The context (the text around the word) is displayed in grey—by default it is a 'narrow' context. To expand the context for a selected reference, on the View tab, in the Detail View group, click Node and choose the coding context.

You can also change the definition of 'narrow' to show more or fewer words on each side of the selected word. Refer to Narrow and broad and custom reach settings for more information.
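
The notion of a context of a few words on each side of a match can be sketched as follows. The function `keyword_in_context` and its `width` parameter are hypothetical, and the default of five words is an assumption chosen for the example rather than a confirmed NVivo setting.

```python
def keyword_in_context(text, keyword, width=5):
    """Return (left context, matched word, right context) for each occurrence."""
    words = text.split()
    hits = []
    for i, word in enumerate(words):
        # Compare without surrounding punctuation and without regard to case.
        if word.lower().strip(".,;:!?") == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, word, right))
    return hits

text = "Farmers say the drought is worse. Drought hits crops hard."
for left, word, right in keyword_in_context(text, "drought", width=2):
    print(f"{left} [{word}] {right}")
# say the [drought] is worse.
# is worse. [Drought] hits crops
```

A larger `width` corresponds to a broader context around each reference.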


When determining the frequency of words, NVivo applies the following rules:

  • Words containing punctuation (such as hyphens, periods and other symbols) are divided into separate words. For example, part-time will be counted as part and time.

  • Words containing apostrophes (such as o'clock and d'accord) are treated as one word, but a trailing 's is not included (Tom's would be counted as Tom).

  • In audio and video transcripts, only words in the Content field (column) are counted—any words in custom transcript fields are ignored.

  • In datasets, only words in codable fields (columns) are counted—any words in classifying fields are ignored.

  • When searching text in selected nodes, if a word is coded against multiple nodes, it is counted once for each node. Similarly, if a word has been coded by multiple users to the same node, it is counted once for each user.

  • Word Frequency queries do not include 'stop words'—refer to Exclude particular words when running Word Frequency queries for more information.

  • Word Frequency queries do not search text in framework matrix summaries.

  • Word Frequency queries do not search text within images. PDFs created by scanning paper documents may contain only images—each page is a single image. If you want to use Word Frequency queries to explore the text in these PDFs, then you should consider using optical character recognition (OCR) to convert the scanned images to text (before you import the PDF files into NVivo).

  • If the text content language is Japanese, the 'base form' is listed in the query results, but the count includes any alternate forms of the word—refer to Working with Japanese text in queries for more information.
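
The punctuation and apostrophe rules above can be approximated with a short Python tokenizer. This is a sketch under simplifying assumptions, not NVivo's implementation: it lowercases everything, treats any character that is not a letter, digit or apostrophe as a word break, and ignores the rules about transcripts, datasets, nodes and stop words. The names `WORD` and `tokenize` are hypothetical.

```python
import re
from collections import Counter

# One or more letters or digits, optionally joined by internal apostrophes.
# Hyphens, periods and other symbols are not matched, so they split words.
WORD = re.compile(r"[^\W_]+(?:['’][^\W_]+)*")

def tokenize(text):
    tokens = []
    for match in WORD.finditer(text.lower()):
        word = match.group()
        # A trailing 's is dropped, so "tom's" is counted as "tom".
        if word.endswith(("'s", "’s")):
            word = word[:-2]
        tokens.append(word)
    return tokens

print(tokenize("Tom's part-time job starts at nine o'clock"))
# ['tom', 'part', 'time', 'job', 'starts', 'at', 'nine', "o'clock"]

counts = Counter(tokenize("Part-time work, part time pay"))
print(counts["part"], counts["time"])  # 2 2
```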


Exclude particular words when running Word Frequency queries

Word Frequency queries do not include 'stop words'. By default, these are less significant words, such as conjunctions or prepositions, that may not be meaningful to your analysis. You can view and edit the list of stop words. Refer to Set the text content language and stop words for more information.

You can add a word displayed in your query results to the stop words list: select the word you want to exclude from the query results, then click Add to Stop Words List, in the Actions group on the Query tab. The words you add to the stop words list will be excluded the next time you run a Word Frequency or Text Search query.

NOTE  In server projects, only Project Owners can add words to the stop word list—refer to About teamwork in a server project for more information.
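
The effect of the stop words list on a word count can be sketched in Python. The starting list and the function `word_frequency` are hypothetical, since NVivo maintains its own list for each text content language. Adding a word to the set plays the role of the Add to Stop Words List command.

```python
from collections import Counter

# Hypothetical starting list, invented for this illustration.
stop_words = {"the", "and", "of", "to", "is"}

def word_frequency(words, stop_words):
    """Count every word that is not on the stop words list."""
    return Counter(word for word in words if word not in stop_words)

words = "the cost of water and the price of water".split()

first_run = word_frequency(words, stop_words)
print(first_run["water"])  # 2

# Add a word from the results to the list, then run the count again.
stop_words.add("water")
second_run = word_frequency(words, stop_words)
print("water" in second_run)  # False
```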


You can create a node that includes all the references to a word you select in the Word Frequency query results.

  1. Select the word you want to use to create a node.

  2. On the Create tab, in the Items group, click Create As Node.

The Select Location dialog box opens.

  3. Select a location and name the node.

  4. Click OK.

NOTE  If the text content language is Japanese, the node will include references to the base form or any alternate forms of the word—refer to Working with Japanese text in queries for more information.


You can run a Text Search query for a selected word in the Word Frequency query results.

  1. On the Query tab, in the Actions group, click Other Actions, and then click Run Text Search Query.

The Text Search Query dialog box opens.

  2. (Optional) Change the Text Search Criteria or Query Options. Refer to Run a Text Search query for more information.

  3. Click Run.

NOTE  If the text content language is Japanese, the Text Search query will find all occurrences of the base form or any alternate forms of the word—refer to Working with Japanese text in queries for more information.
