
Terminology of the Dataverse Project

The Dataverse Project is software, similar in concept to GitHub, for creating data repositories. Here we highlight the terminology used in the project.

Host

A Dataverse host is an installation of the Dataverse Project software. Harvard's host, for example, can be accessed on the web at https://dataverse.harvard.edu/. A host is usually set up by an institution so that its users and researchers can create projects, each of which is called a Dataverse.

Dataverse

As on GitHub, users create repositories, referred to here as Dataverses. A Dataverse can contain any number of datasets, and the datasets in turn hold the files.

Datasets

Datasets are containers, or folders, for files. While GitHub allows users to drop files directly into a repository root, files in a Dataverse are always placed within datasets. A dataset is more than a named folder, however: it can have a rich set of metadata applied to it, even before any files are added.
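
The relationship between hosts, Dataverses, datasets and files can be sketched as a small data model. This is only an illustration of the terminology, not the Dataverse Project's actual data structures or API: the class names, field names and sample values (other than the Harvard URL mentioned above) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dataset:
    """A container for files that also carries its own metadata."""
    title: str
    metadata: Dict[str, str] = field(default_factory=dict)
    files: List[str] = field(default_factory=list)  # file names only, for brevity


@dataclass
class Dataverse:
    """A repository-like project that groups datasets."""
    name: str
    datasets: List[Dataset] = field(default_factory=list)

    def file_count(self) -> int:
        # Files never sit directly in the Dataverse; they are reached via datasets.
        return sum(len(ds.files) for ds in self.datasets)


@dataclass
class Host:
    """One installation of the software, reachable at a base URL."""
    url: str
    dataverses: List[Dataverse] = field(default_factory=list)


# A dataset can be described with metadata before any files are added to it.
survey = Dataset(title="Field survey", metadata={"author": "Example Researcher"})
project = Dataverse(name="Example Lab", datasets=[survey])
host = Host(url="https://dataverse.harvard.edu/", dataverses=[project])

# Files are added to the dataset, not to the Dataverse itself.
survey.files.append("readings.txt")
```

Note that `Dataverse` has no `files` field of its own; in this sketch, as in the description above, a file only ever belongs to a dataset.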

Files

Files in a Dataverse dataset are the same as any files residing on your computer. Files might be images (e.g. .jpg, .tiff, .gif), text documents (e.g. .docx, .txt, .md), archives (e.g. .zip, .tar), or any other file type. Files may also have metadata applied to them, such as tags/labels to