Research Data Management: Describe your data

Description of your data

  • What data will you use or produce in the project?
  • In which file formats will the data be?
  • How much data will you have (approximately)?
  • Will you use or develop special software or code to analyse your data?
  • Will you use personal or sensitive data (see the guidance after “Tips for best practices”)?

Your answers to the questions above form the general structure of your data management plan. Categorise your data in such a way that you can refer to it later in the plan. For example,

  1. Data collected or produced by you or your research group
  2. Data collected by other researchers
  3. Data from other sources such as registers, statistics, measuring stations etc.
  4. Other materials needed to use and understand the data, such as code, software, lab notebooks etc.

The categorisation follows the license policy of your data sets. For example, briefly describe the license under which you are entitled to (re)use the data.

In the DMP, describe the required disk space, not the number of informants who participated in the project. A rough estimate of the size of the data is sufficient, e.g. less than 100 GB, approx. 1 TB, or several petabytes.
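
If the data already exist on disk, a rough size estimate can be obtained with a short script. The following is a minimal sketch, not a prescribed method: the function names are illustrative, and decimal units (1 GB = 1000³ bytes) are an assumption, so the figure may differ slightly from what your operating system reports.

```python
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files below root (recursively)."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def human_readable(num_bytes: int) -> str:
    """Format a byte count with decimal units (assumption: 1 GB = 1000**3 bytes)."""
    size = float(num_bytes)
    for unit in ("B", "kB", "MB", "GB", "TB"):
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} PB"
```

For example, `human_readable(directory_size_bytes(Path("data")))` would report the size of a hypothetical `data` folder. For the DMP, rounding the result to an order of magnitude is enough.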

Tips for best practices 
  • When categorising your data, use bullet points to present concisely
    • the data types
    • the file formats (for example, .csv, .txt, .docx, .xlsx, .tiff). The formats used during the research project may differ from those used in archiving the data; list both. The file format is a primary factor in the accessibility and reusability of your data in the future. Favour software and formats based on open standards to enable data reuse, interoperability and sharing.
    • the size of the data sets
    • the software used (especially if the software is developed in your project)
    • other relevant information related to your data sets.
  • AVOID OVERLAPPING WITH THE RESEARCH PLAN! Data analysis and methodological issues related to data and materials should be described in your research plan.

File formats

Where possible, choose file formats that support long-term access and that are in common use in the research community. Favour the following properties:

  • Interoperability among diverse platforms and applications
  • Availability without fees or restrictions
  • Implementable by multiple software providers without any intellectual property restrictions

You may have to choose certain formats during data collection and analysis, and others for long-term preservation. The formats you choose can depend on how you plan to analyse your data or on software compatibility. You may need to convert your data files to a preservation file format at some point in your research.

Some preferred file formats for long-term preservation:

  • Containers: TAR, GZIP, ZIP
  • Databases: XML, CSV
  • Geospatial: SHP, DBF, GeoTIFF, NetCDF
  • Moving images: MOV, MPEG (MPEG-1/2, MPEG-4), AVI, MXF
  • Sounds: WAV, AIFF, MP3, MXF
  • Statistics: ASCII, DTA, POR, SAS, SAV
  • Still images: TIFF, JPEG 2000, PDF/A, PNG, GIF, BMP
  • Tabular data: CSV

More information about file formats is available in the Data Management Guidelines by the Finnish Social Science Data Archive (FSD) and in Recommended formats by the UK Data Service.
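
To see which working files may need conversion before archiving, file names can be compared against a list of preferred extensions. The sketch below is only an illustration: the set `PREFERRED_EXTENSIONS` covers a subset of the formats listed above, the mapping from format names to extensions is an assumption, and the function names are hypothetical. Check the archive's own recommendations before converting anything.

```python
from collections import Counter
from pathlib import Path

# Assumed extensions for a subset of the preferred formats listed above.
PREFERRED_EXTENSIONS = {
    ".tar", ".gz", ".zip", ".xml", ".csv", ".shp", ".dbf", ".nc",
    ".tif", ".tiff", ".wav", ".aiff", ".png", ".jp2",
}


def summarise_formats(filenames):
    """Count files per lower-cased file extension."""
    return Counter(Path(name).suffix.lower() for name in filenames)


def needs_conversion(filenames):
    """Return the files whose extension is not in PREFERRED_EXTENSIONS."""
    return [
        name for name in filenames
        if Path(name).suffix.lower() not in PREFERRED_EXTENSIONS
    ]
```

Given the hypothetical files `survey.xlsx`, `survey.csv`, `scan.TIFF` and `notes.docx`, `needs_conversion` would return `survey.xlsx` and `notes.docx`, which matches the advice to list both the working formats and the archiving formats in the plan.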

Non-digital research data

Some research data are still gathered and handled non-digitally. Examples include paper-based data, biospecimens, fossil specimens, art samples, artefacts and other physical objects that either cannot be converted to digital form at all or would require too much labour or other resources to digitise.

Regardless of whether research data are digital or non-digital, proper data management is always crucial. Non-digital data require different approaches, methods and tools for preservation and management than digital data. They may require, for example, filing cabinets, archive-friendly filing systems, physical storage solutions, specific climate conditions and other special tools and instructions for preservation.

Metadata production is a key element in non-digital research data management. Different types of metadata (e.g. descriptive, structural, process-related and administrative) are required to ensure the proper care and preservation of non-digital research data. The principles of data documentation are explained in the Metadata & Documentation section. In addition, Tampere University Archives give further instructions on metadata for non-digitally preserved research data.

For more information and advice, please contact:

  • Tampere University Laboratory Services provide health-related research projects with services regarding e.g. laboratory
    equipment and sample storage
  • Tampere University Archives provide assistance and information on e.g. long-term preservation, data classification,
    metadata management and physical storage for research data requiring long-term preservation.

If you have personal or sensitive data

Specify all dataset types that contain personal, sensitive or confidential data. Identifying the sensitive components of research data is particularly important, as the planning of data management focuses on the identification and management of related risks. If you work with personal data, specify the party serving as the controller.

Sensitive data is information that could cause damage if made public:

  • Sensitive personal data. No comprehensive list of sensitive personal data can be drawn up; the parties conducting the research are responsible for identifying data whose disclosure could harm the study subjects.
  • Sensitive data may relate to health or the risk of developing a disease, sexual orientation, ethnic background, trade union membership or religious conviction.
  • Sensitive species data, such as data concerning endangered animals and plants, data related to nature conservation or biosafety
  • Other confidential information, such as patents, trade secrets, military information, or organisational information that could, at the very least, damage your reputation or that of the organisation you represent.
  • According to its Data Protection Policy (chapter 7), Tampere University is designated as the data controller in research projects that are approved by the University and conducted with core funding granted to the University by the Finnish Government or with external funding provided by an external party.

Personal information includes all identifiers from which the person is identifiable directly or indirectly.

  • Direct identifiers: name, phone number, personal identity code, picture, voice, fingerprint, dental chart.
  • Indirect identifiers: gender, age, education, profession, nationality, work history, system log history, marital status, residence information, IP address, car license number.
  • Remember that spatial data can also contain enough information for identification of individuals.
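
As a first pass over tabular data, column names can be screened against the identifier examples above. This is a rough sketch under stated assumptions: the keyword sets are taken from the examples in the lists, while the name-matching heuristic and the function name `classify_column` are hypothetical. It cannot detect identifying content inside free text, images or coordinates, so it does not replace a manual review.

```python
import re

# Keywords drawn from the identifier examples above; matching on column
# names is an illustrative heuristic, not an established method.
DIRECT_KEYWORDS = {"name", "phone", "identity", "picture", "voice",
                   "fingerprint", "dental"}
INDIRECT_KEYWORDS = {"gender", "age", "education", "profession",
                     "nationality", "marital", "residence", "ip", "license"}


def classify_column(column: str) -> str:
    """Label a column name as 'direct', 'indirect' or 'unclassified'."""
    # Split into whole words so that e.g. "language" does not match "age".
    tokens = set(re.split(r"[^a-z0-9]+", column.lower()))
    if tokens & DIRECT_KEYWORDS:
        return "direct"
    if tokens & INDIRECT_KEYWORDS:
        return "indirect"
    return "unclassified"
```

Columns labelled "unclassified" are not necessarily safe; the label only means that no keyword matched.
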

Further information

Data service

Research Data Services assist the staff and students of the Tampere higher education community in matters related to research data management. What we do:

  • We organise training in research data management and data protection, covering topics such as describing your data, data protection, data storage services and sharing your data. The content of training sessions and workshops can be tailored to meet your needs. More detailed information on training will be posted on our website. Don't hesitate to contact us!
  • We provide you with this Data management guide and other instructions and resources for the planning, organising, storing and sharing of research data.
  • We comment on data management plans.

Please email us and let’s solve your problem together!