Continuous progress in understanding the relevant genomic fundamentals, e.g. for the treatment of cancer patients, collides with the tremendous amount of data that needs to be processed. For example, the human genome consists of approx. 3.2 billion base pairs, which translates to roughly 3.2 GB of raw data at single coverage when each base is stored as one byte. Quality assurance in medical services requires data with 30x coverage or more. Locating a specific sequence of 20 base pairs within the raw genomic data can take hours or even days if performed manually.
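To make these figures concrete, the following minimal sketch estimates the raw data volume and shows a naive search for a short sequence. It assumes an uncompressed encoding of one byte per base; the names `raw_size_gb` and `find_all` are hypothetical helpers introduced for illustration and are not part of our platform.

```python
from typing import List

BASE_PAIRS = 3_200_000_000  # approx. size of the human genome, as stated in the text
BYTES_PER_BASE = 1          # assumption: one uncompressed byte per base (A, C, G, T)


def raw_size_gb(coverage: int) -> float:
    """Approximate raw data volume in GB (10**9 bytes) for a given coverage."""
    return BASE_PAIRS * BYTES_PER_BASE * coverage / 1e9


def find_all(sequence: str, pattern: str) -> List[int]:
    """Return every 0-based start position of pattern in sequence (overlaps allowed)."""
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start)
        start = sequence.find(pattern, start + 1)
    return positions


if __name__ == "__main__":
    print(f"1x coverage:  {raw_size_gb(1):.1f} GB")
    print(f"30x coverage: {raw_size_gb(30):.1f} GB")
    print(find_all("ACGTACGT", "ACG"))
```

Under this assumption, 30x coverage corresponds to about 96 GB of raw bases per genome. A linear scan like `find_all` touches every base, which illustrates why searching at this scale benefits from indexing and fast data access.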
Processing and analyzing genomic data are challenges for medical and biological experts that have a significant impact on the progress of research projects. From a software engineering point of view, improving the analysis of genomic data is both a concrete research challenge and an engineering challenge. Learn more about how our platform can be applied to enhance the processing and analysis of big medical data.
Genome Data Acquisition
Before genome data analysis can start, the whole genome needs to be reconstructed. This is necessary because current next-generation sequencing devices read only a limited number of base pairs at a time, which yields many short DNA snippets. During the alignment phase, these DNA snippets are mapped to their corresponding positions within the whole genome. The reconstructed genome is then compared to a reference, e.g. normal tissue. Discrepancies between the two genomes are identified during variant calling. We support individual genome data processing pipelines to fit your specific requirements.
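The alignment and variant calling steps described above can be illustrated with a toy sketch. It assumes very short, error-free reads, a tiny reference string, and a brute-force search that tolerates a fixed number of mismatches. The functions `align` and `call_variants` are hypothetical and written only for illustration; production pipelines rely on dedicated alignment and variant calling tools, not on this approach.

```python
from collections import Counter
from typing import List, Optional, Tuple


def align(read: str, reference: str, max_mismatches: int = 1) -> Optional[int]:
    """Return the 0-based reference position with the fewest mismatches, or None."""
    best_pos, best_mm = None, max_mismatches + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches < best_mm:  # strict comparison keeps the first best position
            best_pos, best_mm = pos, mismatches
    return best_pos


def call_variants(reads: List[str], reference: str,
                  max_mismatches: int = 1) -> List[Tuple[int, str, str]]:
    """Pile up aligned reads and report (position, reference base, observed base)."""
    pileup = [Counter() for _ in reference]
    for read in reads:
        pos = align(read, reference, max_mismatches)
        if pos is None:
            continue  # read could not be placed within the mismatch budget
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        if not counts:
            continue  # no read covers this position
        majority_base, _ = counts.most_common(1)[0]
        if majority_base != reference[pos]:
            variants.append((pos, reference[pos], majority_base))
    return variants


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTA"
    # Reads taken from a sample that differs from the reference at position 6 (C -> T).
    reads = ["ACGTTG", "TTGTAA", "TGTAAG", "AGGCTA"]
    print(call_variants(reads, reference))
```

In this example, two overlapping reads cover position 6 and both carry a T where the reference has a C, so the sketch reports a single discrepancy at that position.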
Genome Data Analysis
After these processing steps are complete, the actual analysis of the acquired DNA data can take place. The requirements for this analysis are very specific and depend on the medical or research questions to be answered. Since there is no single analysis, it currently requires a variety of manual steps, such as literature search, identification of similar cases, or research in various annotation databases. Our Genome Browser enables the interactive exploration of genome variants, their comparison, and links to related knowledge databases and articles.
The combination of real-time analysis of big scientific data with specific applications that answer concrete research questions of medical and biological experts is the foundation of this project. For example, our analysis of patient cohorts supports the identification or verification of hypotheses regarding similarities and differences between patients. In interdisciplinary teams, we apply our knowledge of software engineering and in-memory database technology to improve the processing and analysis of genomic data. Thus, our work addresses the requirements of clinics, medical students, researchers, and physicians.