About dataset sources

A dataset contains structured data arranged in records (rows) and fields (columns). You can create datasets by importing a range of structured data—for example, data from spreadsheets, responses from SurveyMonkey, or social media data gathered from Facebook, LinkedIn and Twitter using NCapture.

If you are working with a dataset of survey results, you may also want to watch the video tutorial Work with survey results.

In this topic

Understand dataset sources
Open and navigate dataset sources
What can I do in a dataset?
Import data from spreadsheets, text files and database tables
Import NCapture files containing social media data
Import survey responses from SurveyMonkey
Learn about codable and classifying fields
Setup demographic attributes based on the classifying information in a dataset
Learn about datasets that contain source shortcuts

Understand dataset sources

A dataset contains structured data arranged in records (rows) and fields (columns). Datasets are created by importing data, and cannot be edited inside NVivo.

You can create datasets by importing:

Spreadsheets, structured text files or database tables
NCapture files containing data collected from Facebook, LinkedIn, Twitter and YouTube.
Responses from SurveyMonkey

When you import data from spreadsheets, structured text files or database tables, you can choose which fields to import and decide which fields contain data that you want to code, and which contain classifying information (for example, demographic characteristics of a survey respondent).

When you import NCapture files containing data from Facebook, LinkedIn Twitter and YouTube, the fields in the dataset are predetermined—based on the type of social media data you are importing.

When you import from SurveyMonkey, you can choose which fields to import and give fields more meaningful names. The analysis type (codable or classifying) and data type (for example, Text or Integer) are automatically assigned based on the SurveyMonkey question type.

Example - A dataset containing online survey results

The table below displays an example of a dataset containing survey responses. Each record (row) represents a single survey respondent. The fields (columns) contain demographic information about the respondent or their responses to the survey questions.

Respondent	Age	Sex	Question 1 response	Question 2 response
Anna	29	Female	I think there should be more car-free zones.	Electric buses and taxis would help reduce pollution in the inner city.
Jack	31	Male	Pedestrians need to feel safe. There should be better lighting and more police.	We should create more green spaces.

If you auto code this dataset, you must select the columns (or values) that you want to use to create your nodes.

You can create classified nodes to represent the respondents using the Classify nodes from Dataset wizard.

Example - A dataset containing Facebook posts

The table below displays a simplified example of a dataset containing Facebook data imported from an NCapture file. Each row represents a single post or comment. The structure and fields (columns) are automatically defined.

Posted by Username	Post	Gender	Commenter Username	Comment Text
Mike Jones	I'm heading to a workshop on rainwater tank installation.	Male
			Mary Smith	You'll have to tell me about it afterwards.
			Carlos Garcia	I've been thinking of installing one too.
Adam Lee	Water storage levels are at a thirty year low.	Male

When you auto code datasets containing social media data like this one, there are extra options, that make it easy to auto code the content based on the predetermined fields—for example, you could code this Facebook content by Username or Conversation.

And, when you auto code by Username, classified nodes are created and biographical information about the Facebook user is stored as attribute values on the nodes (you do not need to use the Classify from Dataset Wizard for this).

Refer to Automatic coding in dataset sources for more detailed information.

Top of Page

Open and navigate dataset sources

You can double-click a dataset in List View to open it in Detail View.

When you open a dataset, it opens in Table View (below)—the records and fields are displayed in a grid:

Form View (below) shows only one record at a time, laid out as a form:

1 Classifying fields—contain information about your data—for example, the age and sex of survey respondents. Classifying fields have a grey background. Refer to Learn about codable and classifying fields (columns) for more information.

2 Codable fields—contain the information you want to analyze—for example, responses to open-ended survey questions. Codable fields have a white background. Refer to Learn about codable and classifying fields (columns) for more information.

3 Table and Form View tabs—use these tabs to switch between Table View and Form View. You may have additional tabs if your dataset was created by importing NCapture files containing social media data. For example, you may have a Chart tab or a Cluster Analysis tab.

When the dataset is open in Detail View, you can switch between Table View (view all records) and Form View (view one record at a time).

Each row in a dataset has a unique record ID, based on the order in which it is imported. The ID is the first column In Table View, and the first field in Form View. If you sort the dataset by the values in the ID column, the dataset is displayed in the order that the records were imported into NVivo.

You can navigate from record to record on Table or Form View by using the navigation buttons—you can move to the first, previous, next or last record.

1 Go to first record

2 Go to previous record

3 Current record

4 Go to next record

5 Go to last record

When you click in the Current record box, you can type a record number and then press ENTER to navigate to that record. The record number is counted sequentially from the beginning of the records as currently visible in Table View. Hidden records are not counted—the Status bar indicates if any records are hidden (filtered) or if all records are visible (the dataset is unfiltered) . The record number does not correspond to the ID value or any other field value.

You can also:

Use scroll bars to move up and down, or left and right
Use 'Go To' to quickly jump to a dataset record ID, see also link, or annotation If you try to jump to a record that is hidden, you will go to the next visible record.

Top of Page

What can I do in a dataset?

When working in a dataset source you can:

Sort or filter the dataset based on the values in classifying columns
Annotate the text in codable columns
Create see also links on the text in codable columns
Code or query the text in codable columns (you can also code source shortcuts)
Auto code to organize the data—for example, you could gather the survey responses for each participant in a case node
Organize demographic attributes for your case nodes using the classifying information in the dataset (for example, age or gender)

For more ideas and information on how you can work with datasets, refer to the following topics:

Top of Page

Import data from spreadsheets, text files and database tables

If you plan to create a dataset by importing a spreadsheet, text file or database table, you should consider how you want to use the data in NVivo.

You cannot change the data after you have imported it into NVivo, so before import, you should check that:

You have collected together all the data you need.
You have checked the quality and accuracy of the data.
You have considered the analysis type you will set for each field—classifying or codable. Refer to Learn about classifying and codable fields for more information.
You have considered what data type you will use for each classifying field—for example, text, date or decimal.

If you have survey responses, and you want to create a case node for each respondent, then the dataset must contain a unique identifier that identifies the responses of each individual. A unique identifier could be the respondent's name, however, in a large survey, names may not be unique. For uniqueness and to protect the identity of your respondents, you may prefer to assign each respondent a unique ID number. You can then gather all responses of an individual respondent to a single node—refer to Approaches to analyzing survey results for more information.

If you have a very large amount of data to import and analyze, it is a good idea to experiment with a subset of the data. If you import a small amount of data, you can experiment with the various approaches to analyzing a dataset. Once you are confident that you have imported the data in a way that supports your analysis, then you can import all the data, and commence coding in earnest. Make sure you delete the sample dataset that you used for experimental purposes.

Refer to Import data from spreadsheets and text files or Import data from database tables for more information.

Top of Page

Import NCapture files containing social media data

You can use NCapture to collect data from Facebook, LinkedIn, Twitter or YouTube as a dataset. For example, you can capture wall posts from Facebook and bring the posts and profile information about the users into NVivo. The data is saved to an NCapture file, which you can import into NVivo as a dataset.

For more information on importing social media data, refer to:

Top of Page

Import survey responses from SurveyMonkey

If you use SurveyMonkey to collect survey responses, you can import the responses directly into your NVivo project. The imported data becomes a dataset source that you can sort, filter or auto code.

The analysis type (codable or classifying) and data type (for example, Text or Date/Time) and are automatically assigned based on the SurveyMonkey question type.

If you have a large number of responses to import and analyze, it is a good idea to experiment with a subset of the data. You can import a random sample of responses and then experiment with the various approaches to analyzing the dataset.

Refer to Import from SurveyMonkey for more information.

Top of Page

Learn about codable and classifying fields

When you import data from spreadsheets (or text files and database tables), you can choose the 'analysis type' for each field (column)—you can select 'codable' or 'classifying'.

You cannot change the analysis type (codable or classifying) of a field (column) after import, so you should decide how you want to use your data before you create a new dataset.

Fields that contain data that you intend to code and analyze should be stored as codable fields—for example, responses to open-ended survey questions, such as How do you think we can reduce our carbon emissions?

Fields that describe your data (metadata) should be stored as classifying fields—for example, the ID number, Age, Sex and Annual Income of your survey respondents. Values in classifying fields:

Can be used to sort and filter the records in your dataset.
Provide context when you view coded dataset content in a node.
Can be used to build node structures that group your codable content—for example, by Age or Sex.
Can be used to create and classify nodes that represent the subjects (cases) of your research. For example, if you create a 'person' node for a survey respondent, you can use the classifying field values Age or Sex as attribute values on the node.

When you import social media data collected using NCapture, the analysis type of fields is predetermined. For example, in a dataset containing content from Facebook, the posts and comments are codable fields. Other fields—for example, Posted By Username, and Gender are classifying fields.

When you import survey responses from SurveyMonkey, the analysis type of the fields is based on the SurveyMonkey question type. For example, open-ended questions, such as text or comment boxes are imported as codable fields. Other types of questions, for example multiple choice questions, are imported as classifying fields.

The following table compares codable and classifying fields:

Comparison	Codable fields	Classifying fields
Type of content	Textual content that you want to analyze—for example, survey responses to open-ended questions such as What do you think is the most important environmental issue in your local area?	Values that describe the data—for example, in a set of survey responses, you may have classifying columns which contain the name, age or sex of the survey participants. Scaled responses—for example, your survey might include questions that are answered by choosing a point on a 'strength of agreement scale' containing points ranging from Strongly Disagree to Strongly Agree.
Data types	Text or source shortcut	Text, integer, decimal, date, time, date/time, or boolean
Background color	White	Grey
Edit content	No	No
Code content	Yes	No
Use values to build node hierarchies	No	Yes
Use values to populate node classification attribute values	No	Yes
Annotate & link	Yes	No
Query	Yes	No (see note below)
Sort & filter	No	Yes

NOTE You cannot directly query the values of classifying fields, but you can use these values to create nodes or attribute values, and use the nodes or attribute values when you run queries, generate charts or other visualizations. Refer to Approaches to analyzing survey results for more information.

Top of Page

Setup demographic attributes based on the classifying information in a dataset

After importing a dataset, you can create case nodes for the people, places or other cases that are represented in the data—for example, you could auto code to create a case node for each respondent in a survey and gather all their answers.

Then, you can assign attributes (like age or gender ) to the case nodes and use these attributes to make comparisons—what did the men say, how does it compare to what the women say?

You can setup this demographic data using the classifying fields in your dataset—refer to Classify nodes from values in a dataset for more detailed instructions.

For more information about working with demographic data in NVivo, refer to Organize your demographic data.

Learn about datasets that contain source shortcuts

When you import material into NVivo as a dataset—for example, an NCapture file containing Facebook data or a database file—there may be content that cannot be stored directly in the dataset.

Facebook

NCapture can capture publicly available photos included in posts on a Facebook wall. Photos on Page walls are generally public, whereas photos on User walls are usually private. The photos are stored as separate picture source files in a folder at the same location as the dataset.

The dataset containing the Facebook data displays an icon representing the picture source. The icon is a source shortcut.

Databases

Databases can store 'binary objects', including images, documents, audio or video files. If you import a database table that includes 'binary objects', any image, document, audio or video files in supported formats (except plain text files) will be imported into NVivo.

Binary objects are imported into NVivo as separate source files, and stored in a folder at the same location as the dataset. The dataset displays an icon representing the source:

Source type	Icon
Document
Picture
Audio
Video

Working with source shortcuts

The icon is a shortcut to the source—if you click the icon, the source opens in Detail View. If you code the source shortcut, the entire source is coded at the node.

If you move the source to a new project folder location, the source shortcut in the dataset is updated. If you delete the source, the source shortcut is deleted from the dataset.

Top of Page