Researcher's guide to responsible and open science

What is responsible data management?

Responsible data management emphasises careful planning of the life cycle of data and consideration of the principles of research ethics in the collection, processing and opening of data. Responsible conduct of research applies to all research, and its basic principles are reliability, honesty, respect, and responsibility. In accordance with responsible conduct of research, the acquisition, processing, and storage of research data must meet the criteria for scientific research, and be ethically sustainable. In addition, research conducted in different fields may be regulated by field-specific ethical guidelines, which must always be followed when collecting and processing data.

Examples of field- and subject-specific ethical guidelines and recommendations:

Also, always check what kind of discussion is taking place about research ethics in the context of your own scientific field, and what kind of key sources are cited. In many fields, there is printed literature and online texts on field-specific ethical reflection (e.g. historical research, internet research, children's and youth research, fields of technology, media and communication studies).

Developing a research career requires competence in responsible data management, and internalising ethical principles is part of every researcher's basic skills. Each researcher and every member of the research group is responsible for ensuring that both general and field-specific principles of research ethics are followed when conducting research. Responsibility is based on planning and, in the case of data, on careful planning of its life cycle. A data management plan helps in planning and managing the life cycle of data.

Figure: Example of planning the ethical life cycle of data.

Responsible data management means identifying the research ethics perspectives that apply to your work and treating data and research subjects accordingly. Both data management and research ethics should be understood as cross-cutting themes for the entire research process. It is often difficult to give one correct answer to questions on research ethics, as ethical challenges vary depending on the research question and data.

Research data refers to the data produced, modified, and used in scientific research on which the results of the research are based. Research data also consists of metadata describing the context, datasets, or observational units of the data. The materials, methods, and results used and produced in research, development and innovation (RDI) projects are also research data in this context.

Different data types may have different requirements for data management, which the researcher must be able to understand.

  • Data collected from an archive
    • The data already has an owner and a storage location, and the data is already in long-term storage and can be reused. Therefore, data in an archive does not need to be made open again. The metadata of archival data is also typically already available. However, researchers should produce metadata on the archival data they use, especially if the data is collected from several different archives. Publishing metadata on the entire dataset and how it was collected may benefit other researchers.
  • Images
    • Where and how the image is collected or obtained affects data management. In the case of images produced by others, copyright may restrict the use of the image and, in particular, making it open access. There may also be restrictions on publishing images in research publications. Images received from research participants are also subject to copyright, and the participants must grant permission for the images to be used. If there are people in the image, it contains personal data and must be treated accordingly. The people featured in the images must give their consent to the publication of the image. The use of images and other copyrighted material is also affected by the term of protection for copyright.
  • Numerical data generated by a device
    • The amount of data can be large, so a lot of storage space may be required. There may also be a need for computing capacity. It may not always be necessary to publish or even store raw data. The possible disposal of raw data should therefore be considered. In metadata, it is essential to describe e.g. the hardware, the test situation, the tests, the sensors, the protocols, and other key elements related to data collection and processing.
  • Tissue or other biological samples
    • This is typically a physical sample, in which case its collection, transfer, and storage take place in a physical, controlled environment. Typically, the sample cannot be made open access for reuse. Obtaining samples may often require a permit or an application (e.g. biobanks).
  • Audio data
    • A recorded voice is personal data, so data protection and its prerequisites must always be considered when collecting audio data. Audio data is typically transcribed, and the transcribed text is anonymised. The storage of audio recordings must be carefully considered because they contain personal data, and the recordings must be kept separate from the anonymised transcripts. Making audio files open access is often impossible due to data protection. The Language Bank of Finland accepts audio files, but always check its criteria first.
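
The metadata elements listed above for device-generated numerical data can be captured in a small structured record stored next to the data. The following is only a minimal sketch: the class name, the field names, and all example values are hypothetical illustrations, not a standard metadata schema or a University template.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class DeviceDataMetadata:
    """Hypothetical metadata record for device-generated numerical data."""
    title: str
    hardware: str            # measuring device used
    sensors: List[str]       # sensors that produced the values
    test_situation: str      # conditions under which the data was collected
    protocol: str            # collection or processing protocol followed
    raw_data_retained: bool  # whether raw data is kept or disposed of
    notes: List[str] = field(default_factory=list)


def to_json(record: DeviceDataMetadata) -> str:
    """Serialise the record to JSON so it can be saved alongside the data."""
    return json.dumps(asdict(record), indent=2, ensure_ascii=False)


# Invented example values, for illustration only.
example = DeviceDataMetadata(
    title="Example accelerometer measurements",
    hardware="Example data logger",
    sensors=["accelerometer", "gyroscope"],
    test_situation="Laboratory walking test",
    protocol="Sampled at a fixed rate and filtered before analysis",
    raw_data_retained=False,
)

print(to_json(example))
```

Recording the decision about raw data in the same record makes a possible disposal of raw data visible to later users of the dataset.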

Responsible data management in a researcher's career plan

Strong data management skills help produce high-quality and impactful research in which the processes used and the outputs obtained are repeatable and transparent. Solid knowledge of data management is important at all stages of a researcher's career.

According to the general principles of research competence defined by the University, the University "places an exceedingly high value on research competence and provides its staff with support and resources for acquiring, assessing, developing and maintaining their research competence throughout their careers." To maintain these skills, each researcher at the University "must ensure that they have a basic level of competence at least in the areas of research ethics, data protection, data management, research methodologies and open science."

To achieve a basic level of competence in data management, a researcher should:

In the later stages of a researcher's career, in addition to the basic level of data management skills, the researcher must master several data management responsibilities related to project management. The good data management skills of the project manager serve as an example for the other researchers participating in the research and guarantee top-level research. Well-managed data management work enables:

  1. Making a more systematic data management plan that streamlines project work.
  2. A clear division of responsibilities for data management measures related to the different phases of the research project, which eases the workload of the principal investigator.
  3. Systematic and proactive consideration of the legal clauses applicable to research data, the principles of research ethics, and the requirements of funders and the researchers' own organisation.

The project manager’s data management responsibilities include:

  • Being responsible for maintaining the project's data management plan.
  • Being able to lead data management so that the requirements of funders and their own organisation are met.
  • Taking care of agreements related to research data. They must have an overall view of e.g. the requirements of legislation at all stages of the project.
  • Defining the resources required for data management (time, money, tools, and services).
  • Leading shared data management practices.
  • Delegating the research group's data management tasks, i.e. appointing the persons who are responsible for certain areas of data management.
  • Committing the research group to good data management and ensuring that the group has access to the necessary tools. They are also responsible for ensuring that everyone in the project has the necessary training in the area of data management for which they are responsible.
  • Considering the special characteristics of data management in domestic and international consortium projects. The principal investigator for the research should share their own responsibility with the other project participants.
  • Participating in the development of data-related matters in their own organisation by, for example, commenting, taking initiatives, and highlighting areas for development and any shortcomings.

To achieve the above-mentioned learning outcomes related to project management, the principal investigator of the research must also take care of their own competence in data management. The principal investigator shall:

  • Maintain their own basic data management skills and ensure that everyone in the project has sufficient data management skills.
  • Know and actively use data management support services.

Agreements regarding data

When conducting research, always agree on the rights of use, ownership, division of responsibilities, processing of personal data, and management of sensitive data even before starting data collection. This way you can ensure and clarify your own and other researchers' rights to use the data. If issues related to data rights have not been considered early enough, it may not be possible to share and make the data open access. Agreements regarding data, researchers' rights, responsibilities, and obligations are also part of responsible conduct of research. In addition, research funders may have conditions related to contracts, rights, and ownership in their funding terms.

Agreements regarding data are emphasised in research projects. It is the responsibility of the principal investigator to ensure that the necessary agreements are signed. It is safest to make the agreement in writing and update it if necessary. In this way, researchers' rights of use can be defined and confirmed, and the party that makes the decisions related to the data can be determined.

Always also agree on the authorship of the data in the project. Defining authorship helps downstream users of research data to refer to the author correctly. The production and distribution of datasets is also counted as a scientific merit in the template for researcher's curriculum vitae. The definition of authorship is also central from the perspective of responsible conduct of research. Make sure to agree on the authorship of the research data at the very beginning of the research project. The Finnish National Board on Research Integrity's recommendation on agreeing on authorship for research publications (PDF) can be used as a model.

Questions to consider when coming to an agreement regarding data:

  • Which data will be released for reuse?
  • If one or more researchers have contributed data from previous research to the research project, will this be included in the data released for reuse?
  • When can the data or parts of the data be released for further use?
  • For what purposes is the data released (e.g. only for research, teaching, and study purposes)?
  • Who has the right to sign an archiving agreement on the research data?
  • Do you want to set conditions for reuse?
  • If reuse is to be made subject to a permission, who decides on granting the permission?

You can get help in drafting research-related agreements from the University's legal services. Agreement templates for more general agreements can be found on the University's intranet. Research projects may include, among other things, the following agreements:

  • research agreement
  • transfer of rights agreement
  • consortium agreement
  • non-disclosure agreement
  • archiving agreement
  • agreements related to data protection and the processing of personal data
    • data processing agreement
    • consent forms
    • agreement on joint controllership

Licences

The rights related to data also include deciding on the licence of the published data. Licensing ensures that the data can be used under clear terms. In accordance with the Open Science Policy of the Tampere University community, machine-readable licences that allow reuse should be favoured in the publication of data. Funders may also have conditions regarding licences for data and metadata. Creative Commons (CC) licences allow your work to be shared and edited according to the conditions you set, and all of them except CC0 require people using your work to credit you as the original author in the manner you specify. The recommended Creative Commons licences for research data are CC BY and CC0. A CC0 licence is recommended for metadata. Read more about choosing a licence on the Creative Commons page.
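
To illustrate what stating a licence in machine-readable form can look like, the sketch below attaches licence identifiers and URLs to a dataset description. The identifiers `CC-BY-4.0` and `CC0-1.0` are the SPDX short names for these licences and the URLs point to the Creative Commons licence pages; the function name, the record layout, and the example values are assumptions made for illustration and do not follow any particular repository's schema.

```python
import json
from typing import Dict, List

# SPDX identifiers mapped to the corresponding Creative Commons URLs.
LICENCES: Dict[str, str] = {
    "CC-BY-4.0": "https://creativecommons.org/licenses/by/4.0/",
    "CC0-1.0": "https://creativecommons.org/publicdomain/zero/1.0/",
}


def describe_dataset(
    title: str,
    creators: List[str],
    data_licence: str = "CC-BY-4.0",
    metadata_licence: str = "CC0-1.0",
) -> dict:
    """Build a hypothetical dataset description with explicit licences."""
    for licence in (data_licence, metadata_licence):
        if licence not in LICENCES:
            raise ValueError(f"Unknown licence identifier: {licence}")
    return {
        "title": title,
        "creators": creators,
        "data_licence": {"id": data_licence, "url": LICENCES[data_licence]},
        "metadata_licence": {
            "id": metadata_licence,
            "url": LICENCES[metadata_licence],
        },
    }


# Invented example values, for illustration only.
record = describe_dataset("Example interview study", ["Researcher, Example"])
print(json.dumps(record, indent=2))
```

The defaults mirror the recommendation above: CC BY for the data and CC0 for the metadata. Passing `data_licence="CC0-1.0"` would describe data released under CC0 instead.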

Social media data

Research use of social media data has increased. The attractive aspects of social media data are the amount and diversity of data, as well as the opportunity to study social media with different methods and perspectives. There is no one right way to study social media data. Social media sets boundaries for the collection and use of data, which must always be considered before data is collected.

When collecting and using social media data, the following points should be considered:

  • Different platforms and their terms of use.
    • Usage rights: Platforms may have restrictions on how data from them can be collected, used, and shared. Automated collection or harvesting is often prohibited. Similarly, sharing data with third parties is typically prohibited.
    • The terms of use of the platforms can be difficult to interpret. The terms can change frequently and unpredictably.
    • The interests of platforms are primarily commercial and typically they own the data produced by users. A researcher is a third party between the platform producing the service and the user producing the content.
    • Platforms do not tend to enable large-scale data collection.
      • X has been the most research-positive, but data collection has become subject to a charge.
      • Meta's CrowdTangle tool is currently not recommended for use at Tampere University.

  • Social media data is often data produced by others (text, image, video, etc.) that cannot be used freely.
    • Copyright law applies to the use of content produced by others. You need permission to use the data for your purposes.
    • You probably will not be able to share copyrighted data and, for example, archive it.
  • Social media data often contains information about people.
    • The General Data Protection Regulation is applied, i.e. compliance with data protection principles is required.
    • What is being studied and how it is investigated determine whether data protection must be taken into account. Even if you do not directly study individuals, it can be difficult to detach them from a phenomenon or a subject.
    • A social media profile can be a fake profile or a bot.
    • Informing research participants may be difficult or impossible. Consider how participants can be informed and how their agreement to participate in the research can be obtained.
      • Deviating from informing research subjects may require a data protection impact assessment.
      • Any deviation from informed consent requires ethical review in human sciences.
      • When you collect data from public sources, you can publish a privacy notice and an information sheet for example on your research group's website.
  • Social media data is remarkably diverse data.
    • It can include text, images, multimedia, audio, videos, games, films, collective works (e.g. with multiple copyrights), etc.
    • Some of the data disappears quickly. Its collection and storage should be carefully planned.
  • The boundaries between public and private are not clear in all cases.
    • Please note that data published on social media is in many cases not public and does not compare, for example, to newspaper articles. Privacy settings for platforms, sites, and groups are subject to change.
    • Be careful when using data that is public, or appears to be public, and check with the site administrator or content creator that you can use the material for research purposes.
  • It may not be possible to make material collected from social media open access for reuse.
    • Platforms may prohibit the sharing of data with a third party in their terms of use.
    • Archiving data, for example in the Finnish Social Science Data Archive (FSD), is possible under the following conditions:
      • the data has been collected from a Finnish platform,
      • the archiving is approved by the service provider,
      • research subjects (people using the platform) have received information regarding the archiving,
      • the data can be anonymised.
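
The four archiving conditions above must all hold at the same time. As a rough sketch, they can be written as a checklist that reports which conditions are still unmet. The class and function names are hypothetical, and the check is only a planning aid: the actual decision always rests with the archive and the service provider.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ArchivingChecklist:
    """Hypothetical checklist for archiving social media data."""
    finnish_platform: bool    # data was collected from a Finnish platform
    provider_approved: bool   # the service provider approves the archiving
    subjects_informed: bool   # research subjects were informed of archiving
    can_be_anonymised: bool   # the data can be anonymised


def unmet_conditions(checklist: ArchivingChecklist) -> List[str]:
    """Return descriptions of the conditions that are not yet fulfilled."""
    checks = [
        (checklist.finnish_platform, "data collected from a Finnish platform"),
        (checklist.provider_approved, "archiving approved by the service provider"),
        (checklist.subjects_informed, "research subjects informed of the archiving"),
        (checklist.can_be_anonymised, "data can be anonymised"),
    ]
    return [label for fulfilled, label in checks if not fulfilled]


# Invented example: approval and anonymisation are still open questions.
missing = unmet_conditions(ArchivingChecklist(True, False, True, False))
for item in missing:
    print("Not yet fulfilled:", item)
```

An empty result means that none of the listed conditions blocks archiving, not that archiving has been approved.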

Further reading:

Ahteensuu, Marko (2019). Do you use social media data in your research? Responsible research web pages.

Laaksonen, Salla-Maaria (2018). Expert ethical online research. Responsible research web pages.

Rossi, Arianna (2022). The Hitchhiker's Guide to the Social Media Data Research Galaxy - A Primer. In: Bieker, F., Meyer, J., Pape, S., Schiering, I., Weich, A. (eds) Privacy and Identity Management. Privacy and Identity 2022. IFIP Advances in Information and Communication Technology, vol 671. Springer, Cham. https://doi.org/10.1007/978-3-031-31971-6_6

 

Examples of studies in which social media data has been used:

Hiippala, T., Hausmann, A., Tenkanen, H., Toivonen, T. (2019). Exploring the linguistic landscape of geotagged social media content in urban environments, Digital Scholarship in the Humanities, Volume 34, Issue 2, June 2019, 290–309, DOI: https://doi.org/10.1093/llc/fqy049.

Chen, Y., Sherren, K., Smit, M., & Lee, K. Y. (2023). Using social media images as data in social science research. New Media & Society, 25(4), 849-871. DOI: https://doi.org/10.1177/14614448211038761.

Jauho, M., Pääkkönen, J., Isotalo, V., Pöyry, E. & Laaksonen, S-M. (2023). How do trendy diets emerge? An exploratory social media study on the low-carbohydrate diet in Finland, Food, Culture & Society, 26:2, 344-369, DOI: 10.1080/15528014.2021.1971436.

Ohme, J., Araujo, T., Boeschoten, L., Freelon, D., Ram, N., Reeves, B. B., Robinson, T. N. (2023). Digital Trace Data Collection for Social Media Effects Research: APIs, Data Donation, and (Screen) Tracking, Communication Methods and Measures, DOI: 10.1080/19312458.2023.2181319.

Email: library@tuni.fi
P. 0294 520 900
