Terms and definitions lay the foundation for understanding larger concepts and practical applications in data visualization. This glossary covers a few important ideas to know when you start working with data.

## Basic Terms & Definitions

**Data Visualization**: The graphical representation of information. Data is encoded into elements like length, slope, color, volume, angle and area to provide visual cues to the audience to better understand the information being presented. Visualizations commonly take the form of maps, charts and graphs.

**Dataset** (or **Data Set**): A collection of data, and the source of information for a visualization.

**Chart Type**: The form a visualization takes. The most appropriate chart type depends on the message being conveyed. Examples include line chart, area chart, scatter plot, bubble chart, bar chart, stacked bar chart, and pie chart. To explore different types of charts and when to use them, read our Guide to Chart Types.

**Variable**: Any measure or attribute describing a particular item, or “record,” in a dataset.

**Categorical Variables**: Descriptive labels given to individual records, assigning them to different groups. The simplest categorical data is dichotomous, meaning that there are just two possible groups—in an election, for instance, people either voted, or they did not. But often there are multiple categories.

**Continuous Variables**: Data consisting of numbers that can have a range of values on a sliding scale. When working with weather data, for instance, continuous variables might include the total amount of rainfall recorded for each day.
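The difference between the two kinds of variable shows up in what you can do with them. The following is a minimal sketch in plain Python, using made-up values (the `votes` and `daily_rainfall_in` lists are illustrative assumptions, not real data): categorical values are counted per group, while continuous values are described by where they fall on a numeric scale.

```python
from collections import Counter

# Hypothetical categorical variable: each person falls into one of two groups.
votes = ["voted", "did not vote", "voted", "voted"]

# Hypothetical continuous variable: rainfall in inches, one value per day.
daily_rainfall_in = [0.0, 1.25, 0.5]

# Categorical data is usually tallied by group.
category_counts = Counter(votes)

# Continuous data is usually described by its position on a scale.
rainfall_range = (min(daily_rainfall_in), max(daily_rainfall_in))

print(category_counts)
print(rainfall_range)
```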

**Relationship**: A connection or correlation between two or more variables through the data presented, like the market cap of a given stock over time versus overall market trend.

**Comparison**: Setting one set of variables apart from another, and displaying how those sets interact, like the number of visitors to five competing websites in a single month.

**Composition**: Collecting different types of information that make up a whole and displaying them together, like the search terms that those visitors used to land on your site, or how many of them came from links, search engines, or direct traffic.

**Distribution**: A collection of related or unrelated information laid out to see how it correlates, if at all, and to understand if there’s any interaction between the variables, like the number of bugs reported during each month of a beta.

**Database**: Information organized into rows and columns. A spreadsheet or table is one of the most basic forms of a database.

**Tidy Data**: Well-structured data that is organized so that each variable is a column in a table and each record is a row.
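As a rough illustration, the sketch below reshapes a small "wide" table into tidy form using only plain Python. The table contents and the helper name `to_tidy` are hypothetical, chosen for this example; tools such as spreadsheets or data libraries offer their own ways of doing the same thing.

```python
# Hypothetical "wide" table: one row per city, one column per day.
# Here "day" is a variable, but it is hidden in the column names.
wide_rows = [
    {"city": "Berkeley", "mon": 0.0, "tue": 1.2},
    {"city": "Oakland", "mon": 0.3, "tue": 0.9},
]


def to_tidy(rows, id_key, variable_name, value_name):
    """Reshape wide rows so each variable is a column and each record is a row."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is carried onto every output row
            tidy.append({id_key: row[id_key], variable_name: key, value_name: value})
    return tidy


tidy_rows = to_tidy(wide_rows, id_key="city", variable_name="day", value_name="rainfall_in")
for record in tidy_rows:
    print(record)
```

In the tidy version, every row is one observation (one city on one day) and the three variables, `city`, `day`, and `rainfall_in`, each have their own column.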

**Sort**: A method for analyzing data. Ordering data from largest to smallest, oldest to newest, in alphabetical order, etc.
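A minimal sketch of sorting in plain Python, assuming a small made-up table of website visitor counts (the site names and numbers are illustrative only):

```python
# Hypothetical records: monthly visitors for three websites.
sites = [
    {"site": "beta.example", "visitors": 450},
    {"site": "alpha.example", "visitors": 1200},
    {"site": "gamma.example", "visitors": 980},
]

# Largest to smallest by visitor count.
by_visitors = sorted(sites, key=lambda row: row["visitors"], reverse=True)

# Alphabetical order by site name.
by_name = sorted(sites, key=lambda row: row["site"])

print([row["site"] for row in by_visitors])
print([row["site"] for row in by_name])
```

`sorted` returns a new list and leaves the original `sites` list unchanged, which makes it easy to compare several orderings of the same data.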

**Filter**: A method for analyzing data. Selecting a defined subset of the data.
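Filtering can be sketched in plain Python with a list comprehension. The rainfall readings and the one-inch threshold below are assumptions made up for the example:

```python
# Hypothetical records: daily rainfall readings in inches.
readings = [
    {"day": "mon", "rainfall_in": 0.0},
    {"day": "tue", "rainfall_in": 1.5},
    {"day": "wed", "rainfall_in": 0.25},
    {"day": "thu", "rainfall_in": 2.0},
]

# Keep only the records that meet a defined condition:
# here, days with at least one inch of rain.
THRESHOLD_IN = 1.0
wet_days = [row for row in readings if row["rainfall_in"] >= THRESHOLD_IN]

print([row["day"] for row in wet_days])
```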

**Summarize**: A method for analyzing data. Deriving one value from a series of other values to produce a summary statistic. Examples include count, sum, mean, maximum, and minimum, often calculated separately for each group in the data.
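The sketch below computes a few summary statistics in plain Python, first over all records and then per group. The city names and rainfall values are hypothetical, and the variable names are chosen only for this illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: rainfall readings in inches for two cities.
readings = [
    {"city": "Berkeley", "rainfall_in": 0.0},
    {"city": "Berkeley", "rainfall_in": 1.5},
    {"city": "Oakland", "rainfall_in": 0.5},
    {"city": "Oakland", "rainfall_in": 2.0},
]

# Each statistic reduces the whole series of values to a single value.
values = [row["rainfall_in"] for row in readings]
summary = {
    "count": len(values),
    "sum": sum(values),
    "mean": mean(values),
    "maximum": max(values),
    "minimum": min(values),
}

# Grouping: collect the values for each city, then summarize each group.
groups = defaultdict(list)
for row in readings:
    groups[row["city"]].append(row["rainfall_in"])
mean_by_city = {city: mean(vals) for city, vals in groups.items()}

print(summary)
print(mean_by_city)
```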

**Join**: A method for analyzing data. Merging entries from two or more tables of data based on common variables.
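As a rough sketch, the code below merges two small tables on a shared `site` variable. It implements one common kind of join (an inner join, which keeps only entries found in both tables); the tables, the `owner` column, and the helper name `inner_join` are all assumptions made for this example.

```python
# Hypothetical tables that share the "site" variable.
visitors = [
    {"site": "alpha.example", "visitors": 1200},
    {"site": "beta.example", "visitors": 450},
    {"site": "gamma.example", "visitors": 980},
]
owners = [
    {"site": "alpha.example", "owner": "Team A"},
    {"site": "gamma.example", "owner": "Team C"},
]


def inner_join(left, right, key):
    """Merge rows from both tables wherever the key values match."""
    # Index the right-hand table by key so matches are quick to look up.
    right_index = {}
    for row in right:
        right_index.setdefault(row[key], []).append(row)

    joined = []
    for left_row in left:
        for right_row in right_index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined


joined = inner_join(visitors, owners, key="site")
for row in joined:
    print(row)
```

Because `beta.example` has no entry in `owners`, it drops out of the result. Other kinds of join would keep it and leave the missing value blank.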

**About this Tutorial**

This information was adapted from sources such as Data Visualization for Storytellers with Peter Aldhous, and Digital Media Skills Certificate Course with Jeremy Rue.

This content may not be republished in print or digital form without express written permission from Berkeley Advanced Media Institute. Please see our content redistribution policy.

© 2020 The Regents of the University of California
