# Categorical Data

A data set is a collection of data items; an item may, for instance, represent a country in a collection of statistical data. The characteristics of a data item are described by a collection of variables. A variable is a property, or characteristic, of a data item that may vary from one item to another or over time. In the example below, the data items represent countries, and the variables represent characteristics of those countries, such as population size, the percentage of the population in various age groups, or a category such as the continent a country belongs to or its classified Human Development Index (HDI) level.
A multivariate data set is simply a data set that includes two or more variables. The items of a multivariate data set can be thought of as points in a multidimensional space where each dimension represents a variable. A standard format for structuring multivariate data is an m-by-n matrix with m rows, usually representing data items, and n columns, usually representing variables. The table below displays a small example of a world data matrix in which the first two columns hold the identification (ISO value and Name) and columns C onwards are data variables.
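
As a rough illustration of this row-and-column layout (not the guide's own table), the sketch below builds a tiny matrix in plain Python. The country codes, names and values are hypothetical placeholders, and the `column` helper is an assumption made for the example.

```python
# Hypothetical world data matrix: one row per data item (country),
# one column per variable. All values are illustrative placeholders.
header = ["ISO", "Name", "Population", "Continent", "HDI Level"]
rows = [
    ["AAA", "Country A", 1_000_000, "Africa", "Low"],
    ["BBB", "Country B", 5_000_000, "Europe", "High"],
    ["CCC", "Country C", 2_500_000, "Asia", "Medium"],
]

# The first two columns identify the item; the rest are data variables.
id_columns = header[:2]
data_columns = header[2:]

def column(name):
    """Return all values of one variable, in row order."""
    idx = header.index(name)
    return [row[idx] for row in rows]

print(len(rows), "items,", len(data_columns), "data variables")
print(column("Continent"))
```

Each row is one point in the multidimensional space described above, and `column` reads out a single dimension across all items.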

Data variables are here classified into two types, following the taxonomy commonly used in the visualization and data mining literature: numerical (quantitative) and categorical (qualitative). Categorical data can be either plain names (Continent) or a classification that provides enough information to order the items (HDI Level).
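
The difference between the two kinds of categorical variable can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the level names in `HDI_ORDER`, their ordering, and the sample items are all hypothetical and not taken from the application.

```python
from collections import Counter

# Assumed ordering of the HDI levels, lowest first (illustrative names).
HDI_ORDER = ["Low", "Medium", "High", "Very high"]

# Hypothetical data items with one unordered and one ordered category.
items = [
    {"name": "Country A", "continent": "Africa", "hdi_level": "Medium"},
    {"name": "Country B", "continent": "Europe", "hdi_level": "Very high"},
    {"name": "Country C", "continent": "Africa", "hdi_level": "Low"},
]

# Ordered category: the levels have a rank, so items can be sorted by it.
rank = {level: position for position, level in enumerate(HDI_ORDER)}
by_hdi = sorted(items, key=lambda item: rank[item["hdi_level"]])

# Unordered category: names can only be grouped or counted, not ranked.
per_continent = Counter(item["continent"] for item in items)

print([item["name"] for item in by_hdi])
print(dict(per_continent))
```

Sorting by continent would only give alphabetical order, which carries no meaning about the items; sorting by HDI level reflects a real ranking.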

Read a more descriptive introduction to categorical data in this document.

Categorical data can be visualized as distinguishable colours with a special colour map, as shown in the example above of an interactive map linked with a scatter plot, where the African countries are grouped together with low values of "life expectancy" and "HDI". It can also be visualized as spatial groupings, as in the Distribution Plot and the Treemap.
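
One simple way to obtain distinguishable colours is to give each category its own evenly spaced hue. The sketch below illustrates that idea with Python's standard `colorsys` module; it is an assumption made for illustration, not the colour map the application actually uses, and the function name `category_colours` is hypothetical.

```python
import colorsys

def category_colours(categories):
    """Map each distinct category to a hex colour with an evenly spaced hue."""
    unique = sorted(set(categories))
    colours = {}
    for i, category in enumerate(unique):
        # Spread hues around the colour wheel; fixed saturation and value.
        r, g, b = colorsys.hsv_to_rgb(i / len(unique), 0.65, 0.9)
        colours[category] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)
        )
    return colours

colours = category_colours(["Africa", "Europe", "Asia", "Africa"])
for category, colour in colours.items():
    print(category, colour)
```

Unlike a continuous colour scale for numerical data, the hues here imply no order between the categories.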

Categorical data can be loaded into the application in two ways:

1. Excel files with mixed categorical and numeric indicators can be read in any of the supported structures described in the Excel Data section. Since categories should not vary over time, the categorical indicators do not need to be repeated for each time step when using time data.

2. You can also load categories for your data after loading the numeric indicators as usual. If the Excel data is structured like the file below, you can load it through the group and category control. The first column should contain the record IDs of the regions in your regular data, and the following columns should contain the categorical values for each region. You can also load Unicode files containing only categorical indicators by marking them with PARSETYPE "C".
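
The layout described in the second option (record ID in the first column, categorical values in the following columns) can be sketched as a join on the record ID. The example below is only an illustration in plain Python, assuming the categories have been exported as comma-separated text; it is not the application's loader, it does not model the PARSETYPE marker, and all IDs, column names and values are hypothetical.

```python
import csv
import io

# Numeric indicators already loaded, keyed by record ID (hypothetical values).
numeric = {
    "AAA": {"Population": 1_000_000},
    "BBB": {"Population": 5_000_000},
}

# Stand-in for a categorical file: first column is the record ID,
# the remaining columns are categorical indicators.
categorical_file = io.StringIO(
    "ID,Continent,HDI Level\n"
    "AAA,Africa,Low\n"
    "BBB,Europe,High\n"
    "ZZZ,Asia,Medium\n"
)

def attach_categories(numeric, handle):
    """Add categorical values to the records whose IDs match."""
    reader = csv.reader(handle)
    names = next(reader)[1:]          # indicator names, skipping the ID column
    merged = {rid: dict(values) for rid, values in numeric.items()}
    unmatched = []                    # IDs with no counterpart in the numeric data
    for row in reader:
        if not row:
            continue
        record_id, values = row[0], row[1:]
        if record_id not in merged:
            unmatched.append(record_id)
            continue
        merged[record_id].update(zip(names, values))
    return merged, unmatched

merged, unmatched = attach_categories(numeric, categorical_file)
print(merged)
print("Unmatched IDs:", unmatched)
```

Collecting unmatched IDs makes it easy to spot regions in the categorical file that do not exist in the regular data.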
