- A fully reproducible one-stop-shop for the analysis of iTRAQ/TMT data
Veit Schwämmle, University of Southern Denmark
Labelled peptide mass spectrometry provides fast, large-scale comparison of protein abundances across multiple conditions. To date, no one-stop-shop software solution exists that enables the average researcher to carry out the full analysis of the acquired raw data. Laboratories have typically established pipelines for this analysis by combining different software tools and in-house programs. Differing, frequently updated versions of these tools and compatibility issues between apparently interoperable tools make it very difficult to ensure reproducible proteomics data analysis.
We present a fully reproducible software protocol for the complete analysis of data from one of the most popular types of proteomics experiments. The protocol is based entirely on open source tools installed in a Docker container and additionally provides a user-friendly, interactive browser interface that guides the configuration and execution of the different operations. An example use case is provided that can be used for testing and adapted to one's own data sets. With this setup, analysis of labelled MS data will yield identical results on any computer that meets the computational requirements of the analysis.
- Kinetic modelling of metabolism – Concept and Application
Thomas Nägele, Ludwig-Maximilians-Universität München
Quantitative analysis of metabolism combines the experimental quantification of metabolites, proteins and/or transcripts with mathematical approaches. The ultimate aim is to derive a causal explanation of experimentally observed dynamics of metabolism that result from, for example, diurnal regulation or stress exposure. Kinetic modelling of metabolism functionally combines the dynamics of metabolite concentrations, i.e. substrates and products of biochemical reactions, with enzyme kinetic parameters to quantitatively describe and predict reaction rates in biochemical networks.
This workshop will introduce basic concepts of kinetic models and their application to solving a biochemical problem. It will introduce a step-wise procedure for drafting a biochemical reaction network, deriving a mathematical representation using ordinary differential equations (ODEs) and solving these equations using experimental information. Kinetic modelling will be performed in a hands-on session within the numerical software environment MATLAB®. The course will be held at an introductory level and will provide all necessary basics of enzyme kinetics, ODE modelling, parameter optimization and its computational application. Hence, a theoretical background in kinetic modelling is not a prerequisite for course participation.
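As a minimal sketch of the step from a reaction network to ODEs, consider a hypothetical two-step pathway S -> P -> (consumed), in which the first reaction follows Michaelis-Menten kinetics and the second is first-order. The workshop itself uses MATLAB; Python is used here purely for illustration, and the network, the parameter values and all function names are assumptions made for this example. The equations are integrated with a hand-written fourth-order Runge-Kutta step so that the sketch needs no external libraries.

```python
def rates(state, params):
    """Right-hand side of the ODE system for S -> P -> (consumed)."""
    s, p = state
    v1 = params["vmax"] * s / (params["km"] + s)  # Michaelis-Menten rate of S -> P
    v2 = params["k_deg"] * p                      # first-order consumption of P
    return (-v1, v1 - v2)                         # (dS/dt, dP/dt)


def rk4_step(f, state, params, dt):
    """Advance the state by one classical Runge-Kutta (RK4) step."""
    def shifted(k, factor):
        return tuple(x + factor * dt * kx for x, kx in zip(state, k))

    k1 = f(state, params)
    k2 = f(shifted(k1, 0.5), params)
    k3 = f(shifted(k2, 0.5), params)
    k4 = f(shifted(k3, 1.0), params)
    return tuple(
        x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state, params, t_end, dt=0.01):
    """Return a list of (time, (S, P)) pairs from t = 0 to t = t_end."""
    n_steps = int(round(t_end / dt))
    trajectory = [(0.0, state)]
    for i in range(n_steps):
        state = rk4_step(rates, state, params, dt)
        trajectory.append(((i + 1) * dt, state))
    return trajectory


# Illustrative (invented) parameters and initial concentrations.
PARAMS = {"vmax": 1.0, "km": 0.5, "k_deg": 0.2}
trajectory = simulate((2.0, 0.0), PARAMS, t_end=20.0)

for t, (s, p) in trajectory[::500]:
    print(f"t = {t:5.1f}   S = {s:.4f}   P = {p:.4f}")
```

In a parameter optimization setting, values such as `vmax` and `km` would not be fixed by hand but adjusted until the simulated trajectory matches measured metabolite concentrations.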
- Using machine learning methods to analyze metabolomics and proteomics data by COVAIN
Xiaoliang Sun, Vienna Metabolomics Center (VIME)
COVAIN, a software package for omics data analysis covering uni- and multivariate statistics, time-series and network analysis, is favoured by experimental biologists for its easy-to-use graphical user interface, comprehensive statistical analysis, convenient multi-level omics integration and versatile plotting functionality. Recently, a powerful supervised machine learning module has been added, which aims to extract the features most relevant for predicting treatment effects.
Typically, metabolomics data contain hundreds of variables (compounds) measured under different treatments or conditions. Usually not all variables are relevant to each treatment; some differ significantly and others do not, which can be revealed by univariate statistics such as the t-test. Selecting the best subset of variables to distinguish each treatment is a biomarker search approach. Traditional feature selection methods that are applied to each variable individually are based on a “filtering” concept, in which variables that fail a user-defined statistical threshold are filtered out. The filtering criterion can be the p-value from a t-test, ANOVA or stepwise regression, the size of the loadings in PCA (principal component analysis), the VIP scores of PLS (partial least squares), etc. However, these methods often fail to select the best subset because the combined effects of two or more variables are not considered when predicting the treatments.
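The filtering concept can be sketched in a few lines: each variable is scored on its own and kept only if it passes a threshold. The example below is an illustration and not COVAIN code. It uses Welch's t statistic with an arbitrary cut-off on |t| in place of a p-value, and the compound names and values are invented.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples."""
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def filter_features(control, treated, threshold):
    """Keep variables whose |t| reaches the threshold.

    Each variable is tested in isolation, so combined effects of
    several variables cannot be detected by this approach.
    """
    kept = []
    for name in control:
        if abs(welch_t(control[name], treated[name])) >= threshold:
            kept.append(name)
    return kept


# Invented toy measurements: variable name -> replicate values.
CONTROL = {
    "glucose": [5.1, 4.9, 5.0, 5.2],
    "alanine": [1.0, 1.3, 0.9, 1.2],
    "citrate": [2.0, 2.1, 1.9, 2.0],
}
TREATED = {
    "glucose": [7.0, 7.3, 6.8, 7.1],
    "alanine": [1.1, 1.0, 1.2, 1.1],
    "citrate": [2.6, 2.5, 2.7, 2.6],
}

selected = filter_features(CONTROL, TREATED, threshold=4.0)
print(selected)  # ['glucose', 'citrate']
```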
In this workshop, a genetic algorithm will be introduced as a multivariate feature selection method. A genetic algorithm mimics natural evolution: an initial population of candidate variable subsets is evaluated by a fitness function (prediction performance); subsets with good fitness produce offspring, which undergo crossover and random mutation to form the next generation. The process is repeated over many generations until the best subset is identified.
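The loop described above can be sketched as follows. This is a toy illustration under stated assumptions, not the COVAIN implementation: the data are synthetic, the fitness is the leave-one-out accuracy of a simple nearest-centroid classifier minus a small penalty per selected variable, and the selection scheme (tournaments of three with one elite individual) and all parameter values are choices made for this example.

```python
import random


def make_data(rng, n_per_class=15, n_features=8, informative=(0, 3)):
    """Synthetic data: only the `informative` columns differ between classes."""
    rows, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in informative:
                row[j] += 2.0 * label
            rows.append(row)
            labels.append(label)
    return rows, labels


def fitness(mask, rows, labels):
    """Leave-one-out nearest-centroid accuracy minus a subset-size penalty."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        dists = {}
        for cls in (0, 1):
            members = [r for k, (r, l) in enumerate(zip(rows, labels))
                       if l == cls and k != i]
            centroid = [sum(m[j] for m in members) / len(members) for j in cols]
            dists[cls] = sum((row[j] - c) ** 2 for j, c in zip(cols, centroid))
        correct += int(min(dists, key=dists.get) == label)
    return correct / len(rows) - 0.01 * len(cols)


def evolve(rows, labels, rng, pop_size=16, generations=15, mutation_rate=0.1):
    """Evolve bit masks (1 = variable selected); return best mask and history."""
    n = len(rows[0])
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, rows, labels)
        return cache[key]

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        history.append(score(ranked[0]))
        next_gen = [ranked[0][:]]  # elitism: the best subset survives unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            mum = max(rng.sample(population, 3), key=score)
            dad = max(rng.sample(population, 3), key=score)
            cut = rng.randrange(1, n)  # single-point crossover
            child = mum[:cut] + dad[cut:]
            # Random mutation: flip each bit with a small probability.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=score)
    history.append(score(best))
    return best, history


rng = random.Random(0)
ROWS, LABELS = make_data(rng)
best, history = evolve(ROWS, LABELS, rng)
print("selected variables:", [j for j, bit in enumerate(best) if bit])
print("best fitness per generation:", [round(h, 3) for h in history])
```

Because the best subset of each generation is carried over unchanged, the recorded best fitness can never decrease from one generation to the next.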
The prediction performance will be evaluated by the ROC curves and AUC values of classification models, for which many popular methods are available, such as SVM (support vector machine), random subspace, KNN (k-nearest neighbours) and logistic regression. We will use a recently published Gestational Diabetes Mellitus metabolomics data set to illustrate the whole workflow. Thus, participants will systematically learn the principles, advantages and disadvantages of popular machine learning methods, understand the general workflow and gain hands-on experience in using COVAIN to analyze their own data.
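For readers unfamiliar with ROC analysis, the following sketch shows how an ROC curve and its AUC can be computed from classifier scores. It is independent of COVAIN; the scores and labels are invented, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    The decision threshold is swept from the highest to the lowest score;
    samples with tied scores are added to the curve together.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        is_last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if is_last_of_tie:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Invented classifier scores (higher = more likely positive) and true labels.
SCORES = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
LABELS = [1, 1, 0, 1, 0, 0]

curve = roc_points(SCORES, LABELS)
print(curve)
print(round(auc(curve), 4))  # 0.8889
```

The AUC of 8/9 equals the fraction of positive-negative pairs in which the positive sample receives the higher score (8 of 9 pairs here), which is the usual probabilistic reading of the AUC.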
- Protein interaction networks
Jörg Menche, Research Center for Molecular Medicine of the Austrian Academy of Sciences (CEMM)
Protein interaction networks are fundamental to our understanding of complex genotype-to-phenotype relationships. In this workshop we will discuss various aspects of network biology. This includes analytical concepts, data generation and the impact of networks in genome and proteome research, ranging from the understanding of basic molecular mechanisms and cellular processes, through the importance of network approaches in understanding complex human diseases and drug action, to potential implications for practical medicine.