Automatic coding using existing coding patterns

Automatic coding based on existing coding patterns is an experimental feature introduced in NVivo 10 for Windows Service Pack 4.



Before you use this experimental feature

Pattern-based auto coding is an experimental feature that you can test and try out. This feature is designed to speed up the coding process for large volumes of textual content. Pattern-based coding was introduced in NVivo 10 for Windows Service Pack 4 and updated in Service Pack 5.

Pattern-based auto coding may work better for some projects than others, and it may code content that is irrelevant to a node. It is important to review the results. If you are not satisfied with them, you may need to 'undo' the auto coding (or revert to a backup copy of your project). We recommend that you make a backup copy of your project before using this feature.

We invite you to share your experiences with us on the NVivo forum, so that we can further develop this feature in future releases of the software. However, due to the experimental nature of this feature, we are not able to troubleshoot issues with pattern-based auto coding.


Understand auto coding using existing coding patterns

Pattern-based auto coding enables you to do approximate 'broad-brush' coding of large volumes of text quickly, which you can then review and refine.

Before you use pattern-based auto coding, you need to start with manual 'pilot' coding of your source materials. For example, if your research involves analyzing 100 interviews, you could manually code the first 10 interviews. Then, you could auto code the remaining interviews based on the coding patterns from your initial coding.

When you auto code using existing patterns, NVivo compares each text passage (for example, a sentence or paragraph) to the content already coded to existing nodes. If a passage is similar in wording to the content already coded to a node, the passage is coded to that node.

During pattern-based auto coding, the words in the content to be coded are compared to the words in the content already coded to existing nodes. Stop words are ignored in this comparison, and words with the same stem (for example, house, houses and housing) are grouped together. For best results, make sure the text content language is set to match the language of your source materials. Refer to Set the text content language and stop words for more information.

When comparing the text passages to content coded to existing nodes, any earlier pattern-based coding is ignored, in order to preserve the quality of the coding patterns.

Coding references that have been auto coded based on existing coding patterns are associated with the user profile 'NVivo' with the initials 'NV'.
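The comparison described above can be illustrated with a toy model. The following Python sketch is not NVivo's algorithm, which this help describes only as machine-learning based. It assumes a tiny hypothetical stop-word list, a crude suffix-stripping stemmer, a hypothetical (node, text, user initials) record for each coding reference, and a simple word-overlap score with a threshold. It shows three ideas from this section: stop words are dropped, words with the same stem are grouped, and references made by the 'NV' profile are ignored when building the node patterns.

```python
import re

# Tiny illustrative stop-word list (an assumption). NVivo uses the stop words
# for the project's text content language instead.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "it", "of",
              "on", "that", "the", "to", "was"}


def stem(word):
    """Crude suffix stripping (an assumption, not NVivo's stemmer).

    Maps house, houses and housing to the same stem, "hous".
    """
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
    elif word.endswith("es") and len(word) > 4:
        word = word[:-2]
    elif word.endswith("s") and len(word) > 3:
        word = word[:-1]
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word


def normalize(text):
    """Lowercase, split into words, drop stop words, then stem."""
    words = re.findall(r"[a-z']+", text.lower())
    return {stem(w) for w in words if w not in STOP_WORDS}


def build_patterns(references):
    """Collect the stems coded at each node.

    references: (node, text, user_initials) tuples (hypothetical model).
    References by the 'NV' profile, i.e. earlier pattern-based coding,
    are skipped so they do not feed back into the patterns.
    """
    patterns = {}
    for node, text, initials in references:
        if initials == "NV":
            continue
        patterns.setdefault(node, set()).update(normalize(text))
    return patterns


def suggest_nodes(passage, patterns, threshold=0.5):
    """Return nodes sharing at least `threshold` of the passage's stems."""
    stems = normalize(passage)
    if not stems:
        return []
    return sorted(node for node, known in patterns.items()
                  if len(stems & known) / len(stems) >= threshold)


pilot = [
    ("Housing", "Rents for houses in the city are rising.", "JS"),
    ("Transport", "The buses are always late.", "JS"),
    ("Transport", "Housing is unaffordable.", "NV"),  # ignored: pattern-based
]
patterns = build_patterns(pilot)
print(suggest_nodes("Housing rents keep rising", patterns))  # ['Housing']
```

In this sketch, raising `threshold` makes it code less content, which is loosely comparable to moving the Wizard's slider towards 'Less'. How NVivo actually applies its stricter criteria is not described in this topic.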

For more information, refer to How does pattern-based coding work?


When can pattern-based coding be useful?

We expect pattern-based auto coding to be most useful when coding to descriptive nodes. For example, it may be able to identify that paragraph 1 is about housing and paragraph 2 is about public transport.

You may also be able to use pattern-based coding to identify passages that mention particular people (for example, politicians or leaders), places or organizations that are important to your research.

We do not recommend using pattern-based auto coding to code to nodes that represent:

  • Sentiment (for example, positive or negative)

  • Attitudes, tones, or emotions (for example, enthusiastic, sarcastic, happy)

  • Interpretations of the data (for example, 'distance to school increases the likelihood of truancy')

  • The speaker in an interview transcript (pattern-based coding cannot accurately identify who was speaking)


Strategies for auto coding based on existing coding patterns

Before you use pattern-based auto coding, you need to start with manual 'pilot' coding of your source materials. Here are some strategies to consider when pilot coding:

  • The quality of the pilot coding will influence the quality of the results of auto coding.

  • If your pilot coding is too small or too narrowly focussed, you may get poor results (or no results).

  • Coding smaller passages—for example, a sentence—may achieve better results than coding larger passages.

  • You may achieve better results by coding only the text that is relevant to a particular theme. For example, if a paragraph contains two ideas, only code the part that is relevant to the theme.

If your project involves capturing datasets iteratively, you can pilot code the initial dataset. For example, if you capture Twitter data at multiple intervals or gather responses from SurveyMonkey with multiple collectors, then you can manually pilot code the initial data and then auto code the subsequent data.

If your source materials contain responses to questions on a range of topics or issues, you may get better results with pattern-based coding if you auto code the responses to one question at a time using specific thematic nodes that relate to that question. For example, if you have a dataset containing 1000 responses to a survey about public policy, you could:

  1. Gather the responses into a node for each question—refer to Automatic coding in dataset sources (Auto code a dataset at nodes for selected columns) for more information.

  2. Open the node for a particular question—for example Views on council funding—and for the first 50 responses, manually 'code on' to a group of thematic nodes (animals, libraries, parks, public health, recycling).

  3. Use pattern-based coding to auto code the question node Views on council funding to the specific thematic nodes for that question (animals, libraries, parks, public health, recycling).

The selections you make in the Auto Code Wizard can have an impact on the quality of the results.

  • Be selective about the nodes you choose to auto code to—for example, only use certain thematic nodes. Pattern-based auto coding is designed to work with thematic nodes rather than case nodes.

  • Experiment with the slider. If you choose 'Less', then NVivo applies stricter criteria when deciding whether to code the content.

Review the results of auto coding. If you are not satisfied with the results, 'undo' the Auto Code action, or revert to a backup copy of your project. Consider doing more pilot coding or adjusting the slider in the Auto Code Wizard next time you auto code.


Auto code using existing coding patterns

IMPORTANT Auto coding using existing coding patterns can perform a large amount of coding very quickly. It is a good idea to make a backup copy of the project before you start. If you are working in a server project, you may want to open the project exclusively before you auto code—this ensures that you can 'undo' the auto code if you are not satisfied with the results.

To auto code using existing coding patterns:

  1. In List View, select the items you want to auto code. You can select sources or nodes. Sources do not have to be of the same source type. If you want to select items from different folders, you can use a set or search folder.

  2. On the Analyze tab, in the Coding group, click Auto Code.

The Auto Code Wizard opens. Follow the steps in the Wizard, which are described below.

Choose how you would like to auto code

Click Auto code using existing coding patterns.

Select the nodes you would like to code at

You can code at Selected nodes or All nodes. Your choice here will depend on the nodes and previous coding in your project—for example, you may want to include only the thematic nodes that you included in your initial manual coding.

You can adjust the slider—if you choose 'Less', then NVivo applies stricter criteria when deciding whether to code the content.

Checking existing coding patterns

NVivo checks the existing coding patterns in the nodes you have selected to code at to determine their suitability for pattern-based coding.

Once the check is complete, you will receive feedback on the suitability of your nodes. If any issues are detected, NVivo displays a warning message—for example, to indicate that there is insufficient coding at a node.

Click the Expand buttons to view the nodes for each message. Nodes with warnings are not coded at because of the issues detected. However, if you still want to code at one of these nodes, you can select it again in this step of the Wizard.

Select how your text passages will be coded

Choose how finely NVivo should code text passages:

  • Code sentences if you want individual sentences to be coded.

  • Code paragraphs if you want entire paragraphs to be coded.

  • Code entire cell for datasets, transcripts and logs if you want entire cells to be coded for datasets, transcripts and picture logs. For other source types, entire paragraphs are coded.

NOTE If your text content language is Chinese or Japanese, you will not be able to select Code sentences.

By default, the results are saved as a node matrix in the Node Matrices folder. You can clear the Save auto code results in the Node Matrices folder check box. However, if you clear this check box, the results will be displayed as a temporary node matrix that you will not be able to save.
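To picture what the sentence and paragraph options in the Select how your text passages will be coded step change, here is a minimal Python sketch that splits text into passages at either granularity. The function name and the `granularity` values are hypothetical, and the sentence splitting is deliberately naive (it only recognizes ., ! and ? followed by whitespace), so it illustrates the idea rather than how NVivo segments text.

```python
import re


def split_passages(text, granularity):
    """Split text into passages at a hypothetical granularity setting.

    "paragraph": passages are separated by one or more blank lines.
    "sentence": each paragraph is further split after ., ! or ? followed
    by whitespace. This naive rule ignores abbreviations and does not
    handle scripts such as Chinese or Japanese.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    if granularity == "paragraph":
        return paragraphs
    if granularity == "sentence":
        return [s for p in paragraphs
                for s in re.split(r"(?<=[.!?])\s+", p) if s]
    raise ValueError(f"unknown granularity: {granularity!r}")


text = "Rents are rising. Buses are late.\n\nParks need more funding."
print(split_passages(text, "paragraph"))
# ['Rents are rising. Buses are late.', 'Parks need more funding.']
print(split_passages(text, "sentence"))
# ['Rents are rising.', 'Buses are late.', 'Parks need more funding.']
```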


Working with the results of auto coding

When you auto code using existing coding patterns, the results are displayed in Detail View, and by default, the results are saved as a node matrix in the Node Matrices folder. You can refer to the saved node matrix later if you want a record of the coding performed by the Wizard at a particular date and time. This node matrix is a static record that is not updated if you subsequently uncode some of the content.

In the node matrix of results:

  • Columns display the names of the nodes that have been coded to by the Wizard.

  • Rows display the sources that have been coded by the Wizard.

  • Cells display the number of coding references that were created for a source (row) at a node (column). You can change the display. For example, to transpose the columns and rows, click Transpose in the Rows & Columns group on the Layout tab.

  • Click the Chart tab to see a visual representation of the auto coding results.
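The counts in the results matrix can be modelled as a tally of coding references per (source, node) pair. This Python sketch assumes a hypothetical list of (source, node) pairs, one per reference created by the Wizard; it is not how NVivo stores its results. Transposing simply swaps each cell's row and column keys.

```python
from collections import Counter


def node_matrix(references):
    """Tally coding references per (source, node) cell.

    references: iterable of (source, node) pairs, one per coding reference
    created by the Wizard (a hypothetical data model).
    Returns (rows, columns, cells); missing cells count as 0.
    """
    cells = Counter(references)
    rows = sorted({source for source, _ in cells})
    columns = sorted({node for _, node in cells})
    return rows, columns, cells


def transpose(rows, columns, cells):
    """Swap rows and columns, as the Transpose command does for the display."""
    swapped = Counter({(node, source): n for (source, node), n in cells.items()})
    return columns, rows, swapped


def render(rows, columns, cells):
    """Format the matrix as tab-separated text."""
    lines = ["\t".join([""] + columns)]
    for row in rows:
        lines.append("\t".join([row] + [str(cells[(row, col)]) for col in columns]))
    return "\n".join(lines)


refs = [
    ("Interview 11", "Housing"),
    ("Interview 11", "Housing"),
    ("Interview 11", "Transport"),
    ("Interview 12", "Transport"),
]
rows, columns, cells = node_matrix(refs)
print(render(rows, columns, cells))
print(render(*transpose(rows, columns, cells)))
```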


Reviewing and fine-tuning auto coding

It is a good idea to review the coding to check the relevance of coding references. Pattern-based coding is a complex task—for example, the meaning of a word varies depending on the context in which it appears, so you may see coding references that are unrelated to the node.

You may want to confirm that you are satisfied with the auto coding before performing other actions in your project, so that you can undo the auto coding if you need to. The 'undo' function can reverse up to five recent actions. Alternatively, if you made a backup copy of your project prior to auto coding, then you can restore the backup copy.
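Why confirm before doing anything else? If the undo list holds only the five most recent actions, each further action pushes the auto coding closer to being out of reach. This Python sketch models a bounded, last-in-first-out undo list with `collections.deque(maxlen=5)`. The `UndoHistory` class is hypothetical and only illustrates the limit, not how NVivo stores its history.

```python
from collections import deque


class UndoHistory:
    """Hypothetical undo list that keeps only the most recent actions."""

    def __init__(self, limit=5):
        # A deque with maxlen silently discards the oldest item when full.
        self._actions = deque(maxlen=limit)

    def record(self, action):
        self._actions.append(action)

    def can_undo(self, action):
        """True while the action is still within the undo limit."""
        return action in self._actions

    def undo(self):
        """Remove and return the most recent action (last in, first out)."""
        return self._actions.pop()


history = UndoHistory()
history.record("Auto Code")
for n in range(1, 5):
    history.record(f"edit {n}")
print(history.can_undo("Auto Code"))  # True: 5 actions recorded
history.record("edit 5")
print(history.can_undo("Auto Code"))  # False: pushed out by the sixth action
```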

Here is one approach to reviewing the results:

  1. Review what has been coded. Double-click a cell in the matrix to see the content that was coded at the intersection of the source and node. Is the content relevant to that node? Take a look at other cells in the matrix too.

  2. Decide whether you are satisfied with the results—do you want to keep some of the coding or undo the entire auto coding operation?

  • If you are mostly satisfied with the results but need to fine-tune some of the auto coding, you may want to uncode some of the references. An uncoded reference is still displayed in Detail View for its cell, because the node matrix is a static record of the auto coding. If you want to view the coding references resulting from pattern-based coding (excluding uncoded references), refer to How can I identify pattern-based coding references?

  • If you are not satisfied with the overall results, you may want to 'undo' the auto coding completely. You may want to refer to Strategies for auto coding based on existing coding patterns before trying again.


Why am I getting unexpected results from pattern-based coding?

Pattern-based auto coding uses machine-learning algorithms to look for existing coding patterns in nodes you have previously coded to in your project. The coding patterns are then used to determine further coding. You may get unexpected results because the algorithms can be influenced by:

  • The quality of the coding in your project (irrelevant content may influence the results)

  • The amount of coding in your project

  • The presence of advertising in your source materials. If you are working with web pages, capture only the main content on the page before importing into your project.

  • Words with multiple meanings—the meaning of a word can vary depending on the context

  • The uniqueness of the words in the nodes—if your nodes predominantly contain the same words, then it is more difficult for the algorithms to identify patterns

If you want to understand how the machine-learning algorithms are used to determine pattern-based coding, refer to How does pattern-based coding work?


How can I identify pattern-based coding references?

Coding references that were created by the Wizard based on coding patterns are associated with the user profile 'NVivo' with the initials 'NV'.

If you have performed multiple pattern-based coding operations, you will not be able to tell which references were created by a particular operation. To see the references from a particular operation, view the corresponding node matrix in the Node Matrices folder.

You can run a matrix coding query to display the coding references currently associated with the user 'NVivo'. For example:

  • If you have used pattern-based coding on your sources, then display the sources in rows and display the nodes you coded at in columns. On the Columns tab, when choosing your columns, only display coding by the user 'NVivo'.

  • If you have used pattern-based coding operations to 'code on' from individual question nodes, then display the question nodes in rows and display the nodes you coded at in columns. On the Columns tab, when choosing your columns, only display coding by the user 'NVivo'.
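Restricting a matrix coding query to the user 'NVivo' amounts to filtering the current coding references by user before counting them. The Python sketch below assumes a hypothetical `CodingReference` record holding a source, a node and the user's initials; it is not an NVivo API. Because it counts only the references that currently exist, uncoded references do not appear, unlike in the static node matrix saved by the Wizard.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CodingReference:
    """Hypothetical record for one coding reference that currently exists."""
    source: str
    node: str
    user_initials: str


def pattern_coding_counts(references, initials="NV"):
    """Count references made by one user per (source, node) cell."""
    return Counter(
        (ref.source, ref.node)
        for ref in references
        if ref.user_initials == initials
    )


# Example: manual pilot coding by "JS", pattern-based coding by "NV".
current = [
    CodingReference("Interview 01", "Housing", "JS"),
    CodingReference("Interview 11", "Housing", "NV"),
    CodingReference("Interview 11", "Housing", "NV"),
    CodingReference("Interview 12", "Transport", "NV"),
]
counts = pattern_coding_counts(current)
print(counts[("Interview 11", "Housing")])  # 2
print(counts[("Interview 01", "Housing")])  # 0 (manual coding is excluded)
```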

Other ways that you can identify pattern-based coding references:
