A safe depository

Illustration of a laptop and file folders. Image: FGC/shutterstock

How FAU researches databases

March 7, 2022

Databases regulate how information is stored and who is allowed to access it. Furthermore, they help to make scientific findings available to a wider community. FAU also has its fair share of databases used for research purposes, for mining data and for storing information.

Financial and insurance service providers, industrial companies, online retailers, public administration: they all use data, be it on bank accounts, insurance policies, products, suppliers, customers or taxpayers. What they all have in common is the need to store this information in a well-structured way and to access it quickly whenever required. At the same time, the data must be maintained in one central system, even when many people share access to it. This is why we have databases. ‘Some people think that an Excel file is a database,’ says Prof. Dr. Klaus Meyer-Wegener from the Chair of Computer Science 6 (Data Management) at FAU. ‘In some cases it may be, but only if access rights are regulated and there are no stray copies floating about in which the data take on a life of their own.’

It depends on the type

Unlike simple spreadsheets, database management systems coordinate access by what can be very large numbers of users, generally via an interface. When the tax office processes a tax return, for example, the personal information and documents are stored on a central server, where other staff can access the latest version. The same applies to data in manufacturing or retail, for example in enterprise resource planning (ERP) systems. Klaus Meyer-Wegener explains: ‘Database systems also communicate with other software programs and handle administrative processes such as monitoring and data backups.’

By far the most common type of database is the relational database. Relational systems are based on tables and contain only structured data, in other words categorised content and descriptions rather than free text. They typically use Structured Query Language (SQL), a query language developed at IBM in the 1970s. SAP, the most successful enterprise software in the world, is also based on relational databases. In recent years, non-relational (NoSQL) databases have become more common. They are less suited to coordinating complex processes, but because they offer considerably faster access to information, they are often used for search queries on the Internet.

In his latest research, Klaus Meyer-Wegener is investigating how to accelerate query processing in large databases, focusing on what is known as near-data processing. ‘Data, for example from large online retailers, are usually read from background storage. However, this is roughly one thousand times slower than the main memory in which the query is processed,’ he explains. Together with colleagues at FAU and the University of Magdeburg, he is investigating how these data can be filtered and restructured as soon as they are read from background storage. ‘We could ease the pressure on main memory by calculating subtotals in the background, or by transferring only those goods that are actually available for delivery,’ explains Meyer-Wegener. ‘When you consider that there are millions of such queries every day, changing our habits clearly has the potential to save a significant amount of energy as well.’
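The principle behind near-data processing can be illustrated with a toy model. The Python sketch below is not the system being developed at FAU and Magdeburg. It simply assumes that "background storage" is a list of records and counts how many records would have to be moved into main memory, either unfiltered or after filtering and pre-aggregation close to the data. The record fields (`sku`, `category`, `price_cents`, `in_stock`) and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Item:
    """One record in (simulated) background storage; the fields are hypothetical."""
    sku: str
    category: str
    price_cents: int
    in_stock: bool


def fetch_all_then_filter(storage: Iterable[Item],
                          predicate: Callable[[Item], bool]) -> tuple[list[Item], int]:
    """Conventional path: copy every record into main memory, then filter there.

    Returns the matching records and the number of records transferred.
    """
    transferred = list(storage)  # everything crosses the storage/memory boundary
    return [item for item in transferred if predicate(item)], len(transferred)


def filter_near_data(storage: Iterable[Item],
                     predicate: Callable[[Item], bool]) -> tuple[list[Item], int]:
    """Near-data path: the predicate is evaluated where the data live,
    so only matching records are transferred."""
    transferred = [item for item in storage if predicate(item)]
    return transferred, len(transferred)


def subtotals_near_data(storage: Iterable[Item]) -> dict[str, int]:
    """Pre-aggregation next to the data: one subtotal per category (in-stock
    items only) is passed on instead of the individual records."""
    totals: dict[str, int] = {}
    for item in storage:
        if item.in_stock:
            totals[item.category] = totals.get(item.category, 0) + item.price_cents
    return totals


def is_available(item: Item) -> bool:
    """Example query condition: only goods available for delivery."""
    return item.in_stock


storage = [
    Item("A1", "books", 1200, True),
    Item("A2", "books", 800, False),
    Item("B1", "toys", 2500, True),
    Item("B2", "toys", 1500, True),
]
matches_conventional, moved_conventional = fetch_all_then_filter(storage, is_available)
matches_near_data, moved_near_data = filter_near_data(storage, is_available)
print(moved_conventional, moved_near_data)  # 4 3
print(subtotals_near_data(storage))         # {'books': 1200, 'toys': 4000}
```

Both paths return the same three items; only the number of records moved into main memory differs. In a real system the filtering would run on hardware close to the storage device rather than in the same process, so this sketch only models the reduction in data transferred, not the speed-up itself.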

Valuable research data

Optimisations such as these matter less for research data. Here, the focus is on coping with the huge amounts of data generated at universities and other research institutions. As a result, research funding is increasingly tied to the condition that research data are recorded in a structured way and made available for interdisciplinary and inter-institutional research. Five years ago, the European Commission launched the GO-FAIR initiative, and FAU has embraced this concept. FAIR stands for Findable, Accessible, Interoperable and Reusable. ‘A lot of researchers experience difficulties when they try to put it into practice,’ says Dr. Marcus Walther. ‘They either have to acquire the knowledge for developing complex databases and workflows single-handedly or involve external specialists, who are often unaware of the peculiarities of the academic world.’

Databases make it easier to access information. Image: FGC/shutterstock

Walther is the managing director of the Competence Unit for Research Data and Information (CDI), which was established at FAU in early April 2021 and acts as the university's internal competence centre for research data management. With conventional publication practices, exchanging research data freely is very difficult. ‘Look at collections of historical artefacts, for example, be they medical devices or preserved animals. Semantic data models are suitable for describing them, but for structured searches to be effective, the metadata have to be standardised.’ Even values that appear unambiguous, such as physical measurements, can pose obstacles to interoperability. Units of measurement, for example degrees Celsius, kelvin or degrees Fahrenheit, need to be defined just as clearly as the measurement conditions and the parameters of the devices used.
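A minimal sketch of why units belong in the metadata: if each measurement carries an explicit unit code, temperatures recorded in Celsius, kelvin or Fahrenheit can be normalised before they are compared or searched. The `Measurement` record, its fields and the conversion table are hypothetical. The unit codes follow the style of the UCUM standard (`Cel`, `K`, `[degF]`), but a real repository would use whatever vocabulary its metadata schema prescribes.

```python
from dataclasses import dataclass
from typing import Callable


def _from_celsius(value: float) -> float:
    return value + 273.15


def _from_kelvin(value: float) -> float:
    return value


def _from_fahrenheit(value: float) -> float:
    return (value - 32.0) * 5.0 / 9.0 + 273.15


# Conversion to kelvin, keyed by unit code (UCUM-style codes, chosen for illustration).
TO_KELVIN: dict[str, Callable[[float], float]] = {
    "Cel": _from_celsius,
    "K": _from_kelvin,
    "[degF]": _from_fahrenheit,
}


@dataclass(frozen=True)
class Measurement:
    """Hypothetical metadata record: a value plus its unit and the device used."""
    value: float
    unit: str
    device: str


def to_kelvin(m: Measurement) -> float:
    """Normalise a temperature to kelvin; refuse to guess if the unit is unknown."""
    try:
        convert = TO_KELVIN[m.unit]
    except KeyError:
        raise ValueError(f"unknown or missing unit code: {m.unit!r}") from None
    return convert(m.value)


# Three readings of the same temperature, recorded in different units.
readings = [
    Measurement(21.0, "Cel", "thermometer-A"),
    Measurement(294.15, "K", "sensor-B"),
    Measurement(69.8, "[degF]", "logger-C"),
]
print([round(to_kelvin(r), 2) for r in readings])
```

Raising an error for an unrecognised code, rather than assuming a default unit, is the design choice that matters here: a bare number such as 21 is only interoperable once its unit has been stated explicitly.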

CDI is a central institution at FAU. Its four members of staff are currently continuing the work started by the working group on research data and research information (AGFD), which was launched at FAU in 2019. Supported by the approximately 40 members of CDI, they advise on handling digital research data, help set up IT systems for research data management and answer questions on licensing law and data protection. They work closely with the University Library and FAU's Data Protection Officer. In future, CDI will act as a hub between the Erlangen Regional Computing Centre (RRZE), the Medical Centre for Information and Communication Technology (MIK) at Universitätsklinikum Erlangen and the individual research teams at FAU.

Biobanks – Connecting patients’ data and samples

Special data are archived in biobanks: tissue and fluid samples as well as patient data. Image: angellodeco/shutterstock

Biobanks, such as those of the Faculty of Medicine at FAU and Universitätsklinikum Erlangen (UKER), are a special type of research database. Many clinics and departments at UKER archive not only patient information but also human samples, such as tissue samples of skin or tumour cells, or fluid samples, especially blood, urine or saliva. Most are taken during routine examinations, and less often as part of medical research. Storage is anything but straightforward: tissue samples are preserved in paraffin or, in some cases, in liquid nitrogen, and fluid samples are kept at a constant minus 80 degrees Celsius. The idea behind biobanks is to make samples available for medical research. So far, however, this has proved rather problematic. Anyone who wanted to investigate, for example, the correlation between enzyme activity and an autoimmune disease usually had to find out for themselves where the information and samples were archived, where the data originated and which procedures had to be followed for the data to be released.

Simplified procedures for researchers

To simplify this process, the Central Biobank Erlangen (CeBE) was established in late 2020. A total of 16 biobanks have already joined. CeBE is a member of the German Biobank Alliance (GBA), which includes nearly all German university hospitals. ‘We support decentralised biobanks in characterising, registering and archiving human samples,’ explains Dr. Christina Schüttler, the coordinator at CeBE. ‘At the same time, we are the central point of contact for the transfer office of the data integration centre, and we provide the research database with the metadata of all biological samples from UKER for which broad consent to secondary use has been given.’

This considerably eases the burden on researchers. In future, they will be able to send their project request directly to the transfer office and, provided that the Ethics Commission and the Use & Access Committee (UAC) approve it, receive the biological samples for their studies from CeBE.
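The release rule described above can be sketched as a simple check. The Python below is a hypothetical model, not the software used by CeBE or the transfer office. It assumes that sample metadata only reach the research database if consent to secondary use is recorded, and that samples are released only once both the Ethics Commission and the Use & Access Committee have approved the project. All class, field and function names are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleMetadata:
    """Hypothetical metadata for one archived sample (no patient identifiers)."""
    sample_id: str
    material: str        # e.g. "blood", "urine", "saliva", "tissue"
    broad_consent: bool  # consent to secondary use recorded?


@dataclass
class ProjectRequest:
    """Hypothetical project request submitted to the transfer office."""
    project_id: str
    requested_materials: set[str]
    ethics_approved: bool = False
    uac_approved: bool = False


def searchable_catalogue(samples: list[SampleMetadata]) -> list[SampleMetadata]:
    """Only samples covered by consent to secondary use are visible to researchers."""
    return [s for s in samples if s.broad_consent]


def release_samples(request: ProjectRequest,
                    samples: list[SampleMetadata]) -> list[str]:
    """Return the IDs of samples that may be released for this request.

    Nothing is released unless both approvals are in place.
    """
    if not (request.ethics_approved and request.uac_approved):
        return []
    return [s.sample_id for s in searchable_catalogue(samples)
            if s.material in request.requested_materials]


samples = [
    SampleMetadata("S-001", "blood", True),
    SampleMetadata("S-002", "blood", False),
    SampleMetadata("S-003", "saliva", True),
]
request = ProjectRequest("P-42", {"blood"}, ethics_approved=True)
print(release_samples(request, samples))  # [] (UAC approval still missing)
request.uac_approved = True
print(release_samples(request, samples))  # ['S-001']
```

Sample S-002 never appears, even after both approvals, because no consent to secondary use is recorded for it; the two checks are independent, which mirrors the separation between the catalogue and the approval procedure.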

Website of the Competence Unit for Research Data and Information

Website of the Central Biobank Erlangen

By Matthias Münch

Cover of alexander No. 117

Topics in the new issue include database systems and research at FAU, iris implants made from artificial muscles, a drug against Long COVID, the European University EELISA, in which European universities have joined forces to take engineering further, the second part of our series on FAU's strategy, the new Green Office and much more.