The ongoing progress in understanding relevant genomic functions, e.g. for the treatment of cancer patients, contrasts with the tremendous amount of genome data that must be processed to achieve it. For example, the human genome consists of approx. 3.2 billion base pairs, which translates to roughly 3.2 GB of raw genome data for a single run. Quality assurance in medical services requires coverage of 30-fold or more, which multiplies this volume accordingly. Identifying a specific sequence of 20 base pairs within the raw genome data takes hours or even days if performed manually.
Processing and analyzing genome data are challenges for medical and biological experts, and these challenges have a significant impact on the progress of research projects. From a software engineering point of view, improving the analysis of genome data is both a concrete research challenge and an engineering challenge. Learn more about applications of our platform to enhance the processing and analysis of big medical data.
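The figures above can be checked with a short calculation. The following is a minimal sketch, not part of our platform: it assumes one byte per base (an uncompressed, one-character-per-base encoding without quality scores), and the names `raw_size_gb` and `find_all` are purely illustrative. It also shows how a short sequence can be located programmatically rather than by hand.

```python
# Assumption: one byte per base, i.e. plain text without quality scores.
BASE_PAIRS_HUMAN_GENOME = 3_200_000_000
BYTES_PER_BASE = 1


def raw_size_gb(base_pairs, coverage=1, bytes_per_base=BYTES_PER_BASE):
    """Estimate the raw data volume in gigabytes (1 GB = 10**9 bytes)."""
    return base_pairs * coverage * bytes_per_base / 1e9


def find_all(sequence, pattern):
    """Return all 0-based start positions of pattern in sequence (overlaps included)."""
    positions = []
    start = sequence.find(pattern)
    while start != -1:
        positions.append(start)
        start = sequence.find(pattern, start + 1)
    return positions


if __name__ == "__main__":
    print(raw_size_gb(BASE_PAIRS_HUMAN_GENOME))               # single run
    print(raw_size_gb(BASE_PAIRS_HUMAN_GENOME, coverage=30))  # 30-fold coverage
    print(find_all("ACGTACGTTAGC", "ACGT"))
```

Under these assumptions, a single run corresponds to 3.2 GB and 30-fold coverage to 96 GB. Real sequencing output also carries quality information and metadata, so actual file sizes differ.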
Genome Data Acquisition
The whole genome needs to be reconstructed prior to genome data analysis. This is necessary because current next-generation sequencing devices read only a limited number of base pairs at a time, producing short DNA snippets. During the alignment phase, these DNA snippets are mapped to their corresponding positions within the whole genome. The reconstructed genome is then compared to a reference, e.g. normal tissue. Discrepancies between the two genomes are identified during variant calling. We support individual data processing pipelines to fit your specific requirements.
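To illustrate the two phases, here is a deliberately naive sketch. It is a toy model and not the method used in our pipelines: it assumes error-free reads, handles substitutions only (no insertions or deletions), and tries every position in the reference, which would be far too slow for a real genome. The function names `align_read` and `call_variants` are hypothetical.

```python
from collections import Counter


def align_read(read, reference):
    """Return the 0-based offset at which the read has the fewest mismatches."""
    best_pos, best_mismatches = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos


def call_variants(reads, reference):
    """Pile up aligned reads and report positions whose consensus differs from the reference."""
    pileup = [Counter() for _ in reference]
    for read in reads:
        pos = align_read(read, reference)
        for offset, base in enumerate(read):
            pileup[pos + offset][base] += 1

    variants = []
    for pos, counts in enumerate(pileup):
        if not counts:
            continue  # no read covers this position
        consensus = counts.most_common(1)[0][0]
        if consensus != reference[pos]:
            variants.append((pos, reference[pos], consensus))
    return variants


if __name__ == "__main__":
    reference = "ACGTACGTTAGC"
    # Reads taken from a sample that differs from the reference at position 6 (G -> C).
    reads = ["ACGTACCT", "GTACCTTA", "ACCTTAGC"]
    print(call_variants(reads, reference))
```

Each reported tuple contains the position, the reference base, and the base observed in the sample. Production-grade aligners and variant callers additionally use indexing, base quality scores, and statistical models.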
Genome Data Analysis
After completing these processing steps, the actual analysis of the acquired DNA data can take place. The requirements for this analysis are very specific and depend on the medical or research questions to be answered. Since there is no single standard analysis, it involves a variety of manual steps, such as literature search, identification of similar cases, or research in various annotation databases. Our Genome Browser enables the interactive exploration of genome variants, their comparison, and links to related knowledge databases and articles.
The combination of real-time analysis and specific applications for medical and biological experts is the foundation of our projects. For example, our analysis of patient cohorts supports the identification or verification of hypotheses regarding similarities and differences in patient cohorts. Together with interdisciplinary project teams, we apply our knowledge in software engineering and in-memory database technology to improve the processing and analysis of genome data. Thus, our work addresses the requirements of clinics, medical students, researchers, and physicians.