Research Data Management: Use and Store

Data Storages

  

When choosing storage for your research data, consider the following:

  • What kind of research data will you produce, and how will it be processed? (The type and amount of research data may rule out some storage services.)
  • How are you going to save, store, use, back up and transfer your data?
  • With whom are you going to share the data?
  • What kind of access control do you need?
  • Are you going to actively modify your data?
  • Is your data sensitive?

Answering these questions helps you determine what kind of storage system you need. Storage services provided by the Tampere higher education community can be adapted as needed.

Data storage services at Tampere higher education community

Personal storage space

  • Protected and backed-up personal network disk space for members of the academic community
  • For storing sensitive or non-sensitive data (suitable for materials that may not be stored in a cloud service)
  • For storing personal work files and research data

Group storage space

  • Network disk space reserved for groups and protected from outsiders
  • For storing sensitive or non-sensitive data (suitable for materials that may not be stored in a cloud service)
  • Files stored on the group disk space are accessible to all group members

Cloud storage

  • Office 365 OneDrive for Business can be used as cloud storage and for sharing files with other users (including external users)
  • For storing non-sensitive data only
  • The storage space is 1 TB

Storage on a virtual server

  • A virtual server for storage needs or for other purposes
  • For storing sensitive or non-sensitive data
  • Can be modified according to need

IDA - Research data storage service

  • A service provided by CSC (IT Centre for Science)
  • Meant for storing stable research data
  • For non-sensitive data only
  • Can be used for sharing files (including with external users)
  • Becoming an IDA user

More information is available about data storage services (in Finnish) and computing and processing support (Tuni intra) at the Tampere higher education community. Please contact it-helpdesk@tuni.fi for more information.

File formats

Where possible, choose file formats that support long-term access and that are in common use in your research community. Favor the following properties:

  • Interoperability among diverse platforms and applications
  • Availability without fees or restrictions
  • Implementable by multiple software providers without any intellectual property restrictions

You may have to use certain formats during data collection and analysis, and others for long-term preservation. The formats you choose can depend on how you plan to analyse your data or on software compatibility. You may need to convert your data files to a preservation file format at some point in your research.

Some preferred file formats for long-term preservation:

  • Containers: TAR, GZIP, ZIP
  • Databases: XML, CSV
  • Geospatial: SHP, DBF, GeoTIFF, NetCDF
  • Moving images: MOV, MPEG (MPEG-1/2, MPEG-4), AVI, MXF
  • Sounds: WAV, AIFF, MP3, MXF
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Still images: TIFF, JPEG 2000, PDF/A, PNG, GIF, BMP
  • Tabular data: CSV
  • Text: XML (ODT, DOCX), PDF/A, HTML, ASCII (RTF, TXT)
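
As an illustration of the container formats listed above, the following is a minimal sketch of packing a project folder into a gzip-compressed TAR archive with Python's standard `tarfile` module. The function name `archive_folder` and the folder layout are hypothetical and chosen only for this example; it is not a procedure prescribed by this guide.

```python
import tarfile
from pathlib import Path


def archive_folder(source_dir: Path, archive_path: Path) -> list[str]:
    """Pack source_dir into a gzip-compressed TAR file.

    Returns the sorted names of the files stored in the archive so the
    result can be checked against what was expected.
    Assumption: archive_path lies outside source_dir, so the archive
    does not try to include itself.
    """
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname keeps only the folder name, not the full local path.
        archive.add(source_dir, arcname=source_dir.name)

    # Re-open the archive to confirm it is readable and list its files.
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(m.name for m in archive.getmembers() if m.isfile())
```

Calling `archive_folder(Path("dataset"), Path("dataset.tar.gz"))` would write the archive and return the stored file names. Note that a container only bundles files: the files inside it should themselves be in suitable preservation formats.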

More information about file formats is available in the Data Management Guidelines by the Finnish Social Science Data Archive (FSD) and in Recommended formats by the UK Data Service.

Backup and version control

Backing up means creating additional copies of your data that can be used to restore it if the original is damaged or deleted. You should always have at least three copies of your data, and the copies should be geographically distributed.

  • Original - Local copy - Remote copy

Many data storage systems have automatic backup functions, and some also include version management tools. If possible, take this into account when choosing your data storage. However, do not count backup functions as part of version management, and always keep other copies of your data in addition to the backups.

  • Whichever backup method is used, it is recommended not to overwrite old backups with new ones.
  • It is important to verify and validate backup files regularly by fully restoring them to another location and comparing them with the original.
  • Use the automatic backup systems provided by your storage service if possible, but consider keeping your own backups alongside.
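
The verification step described above (restore a backup to another location and compare it with the original) can be sketched as follows. This is a minimal example that assumes both the original and the restored copy are ordinary folders on disk; the function names `sha256_of` and `compare_trees` are hypothetical and chosen for illustration. It compares files by their SHA-256 checksums.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(original: Path, restored: Path) -> dict[str, list[str]]:
    """Compare every file under original with its copy under restored.

    The report lists files missing from the restored copy, files whose
    content differs, and files present only in the restored copy.
    """
    original_files = {p.relative_to(original) for p in original.rglob("*") if p.is_file()}
    restored_files = {p.relative_to(restored) for p in restored.rglob("*") if p.is_file()}

    report: dict[str, list[str]] = {"missing": [], "changed": [], "extra": []}
    for rel in sorted(original_files - restored_files):
        report["missing"].append(rel.as_posix())
    for rel in sorted(restored_files - original_files):
        report["extra"].append(rel.as_posix())
    for rel in sorted(original_files & restored_files):
        if sha256_of(original / rel) != sha256_of(restored / rel):
            report["changed"].append(rel.as_posix())
    return report
```

A backup can be considered verified in this sketch when all three lists in the report are empty. If the original has been edited since the backup was taken, differences are expected, so the comparison is most meaningful against the state that was backed up.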

Version control

The idea of version control is to keep copies of files as you process them further, so that you can go back and retrieve an earlier version of any file if needed. It is important to ensure that copies in all locations are subject to version control. You can manage version control in many ways. Simple version control can be based on saving multiple versions of your files and naming them descriptively. This may not be enough when collaborating with several people, in which case it is better to agree on ground rules for version control.

Some storage services offer version control features such as snapshots and backup versioning, some automatically create a new version of a file every time you modify it, and some offer version branching and merging. These are technical tools that support your work or help when your own version control fails. Snapshots and other backup-related tools should not be regarded as part of your version management, because backups are for recovering from worst-case scenarios in which your own version control fails.

Best practices:

  • Decide how many versions of a file to keep, which versions to keep and how to organise them
  • Decide on major and minor versioning and how to set milestones
  • Name files systematically
  • Record changes
  • Track the location of files if they are stored in a variety of locations
  • Synchronise files
  • Identify one location for the master versions
  • Control rights to edit files
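
Systematic naming with major and minor versions can be sketched in a few lines. The naming scheme `name_vMAJOR.MINOR.ext` and the function `next_version_name` below are assumptions made for this example, not a convention required by this guide; any scheme works as long as the whole group applies it consistently.

```python
import re
from pathlib import Path

# Hypothetical convention: "<name>_v<major>.<minor>" followed by one extension.
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<major>\d+)\.(?P<minor>\d+)$")


def next_version_name(filename: str, major_change: bool = False) -> str:
    """Return the file name to use for the next saved version.

    Assumption: the file name has exactly one extension (for example .csv).
    A minor change raises the minor number; a major change (a milestone)
    raises the major number and resets the minor number to 0.
    """
    path = Path(filename)
    match = VERSION_PATTERN.match(path.stem)
    if match is None:
        # Unversioned file: start the series at v1.0.
        return f"{path.stem}_v1.0{path.suffix}"

    major, minor = int(match["major"]), int(match["minor"])
    if major_change:
        major, minor = major + 1, 0
    else:
        minor += 1
    return f"{match['stem']}_v{major}.{minor}{path.suffix}"
```

For example, `next_version_name("survey_v1.3.csv")` gives `survey_v1.4.csv`, and passing `major_change=True` gives `survey_v2.0.csv`. The function only proposes a name; saving the new file under that name and recording what changed remain separate steps.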

Data security

Data security means protecting data from external and internal hazards, such as accidents and errors, as well as from unauthorized access or use. The level of data security should match the nature of the data and the risks involved. Data that contains confidential or personal information should be treated with a higher level of security.

It is advisable to make a data security plan as part of your data management plan (DMP). It can cover the following aspects:

  • Storage and security of data
  • Version control
  • Encrypting data
  • Backing up data
  • Checking data integrity
  • Erasing data

Actions to improve data security:

  • Encrypt sensitive data before it is stored or transmitted
  • Separate sensitive data content if possible
  • Take care when erasing data: simply deleting a file does not destroy the data

Please check the Data security policy at Tampere higher education community and contact tietoturva@tuni.fi for more details.

Contact us

Is there something you did not find in this guide? Or is some important information missing? You can always contact us for further information, and we will help you with research data management.