About this course
Introduction: What is a protein? What is protein function? Overview of protein function prediction. Predicting protein function using sequence: motifs, annotation transfer by homology (homology-based inference), de novo predictions. Predicting protein function using structure: structural motifs, annotation transfer via structure similarity. Prediction of: subcellular location, protein-protein interactions, protein-DNA and protein-RNA interactions, protein-substrate interactions, enzymatic activity, and functional classes (e.g. GO classes). Prediction of the effect of single point mutations (sequence variants) on protein function and the organism (focus on single amino acid variants).
As opposed to the first part (Protein Prediction I), protein structure plays a minor role confined to what is helpful to further our understanding of protein function. Another major difference is that alignment methods will not be discussed although their results (evolutionary information) will be central to almost all prediction methods.
Learning outcomes
Students understand crucial aspects of protein function prediction. They have learned how to solve problems particular to protein function prediction, and how to address challenges for AI/ML originating from constraints as imposed by the realities of function prediction, i.e., mostly by very limited noisy data.
As almost all major solutions are based on ML and AI, the most important challenges for the field are to avoid over-fitting, to cope with data bias, and generally to create ML/AI solutions that will not mislead users. Students understand key strategies to avoid data leakage and over-estimates of performance. These issues are particularly relevant for PP2 because there is substantially less experimental data available for protein function than for structure.
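One common strategy against data leakage can be sketched as follows: proteins are first grouped into clusters of similar sequences, and whole clusters (never individual proteins) are assigned to either the training or the test set. This is a minimal illustration and not the course's prescribed protocol. The function name `split_by_cluster`, its parameters, and the cluster IDs are hypothetical, and the clustering itself is assumed to have been computed beforehand by an external tool.

```python
import random
from collections import defaultdict


def split_by_cluster(protein_to_cluster, test_fraction=0.2, seed=0):
    """Split proteins into train/test so that no cluster spans both sets.

    protein_to_cluster: dict mapping protein ID -> cluster ID. The clusters
    are assumed to come from an external sequence-similarity clustering;
    this function does not compute them.
    """
    # Collect the members of each cluster.
    clusters = defaultdict(list)
    for protein, cluster in protein_to_cluster.items():
        clusters[cluster].append(protein)

    # Shuffle cluster IDs (not proteins) so related proteins stay together.
    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)

    # Fill the test set cluster by cluster until the target size is reached;
    # all remaining clusters go to the training set.
    target = test_fraction * len(protein_to_cluster)
    train, test = [], []
    for cid in cluster_ids:
        if len(test) < target:
            test.extend(clusters[cid])
        else:
            train.extend(clusters[cid])
    return train, test
```

Because clusters are kept intact, the test set may end up slightly larger than `test_fraction` suggests. A random per-protein split, by contrast, would typically place close homologs on both sides and lead to over-estimated performance.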
Students have learned how evolutionary information from multiple sequence alignments (MSAs) is crucial to advance prediction methods, and what the limitations of such solutions are. Due to the evolution of the field over the last few years, another major focus will be on understanding the power of protein Language Models (pLMs) which are now outperforming MSA-based methods in several aspects of function prediction.
Unlike the situation for structure prediction, function prediction poses particular challenges. What these are, how to possibly overcome them, and what to learn for other ML/AI applications from this are part of the lessons.
Ultimately, students have learned to develop their own prediction methods (in groups guided by tutors) by combining existing methods or algorithms and/or by creating their own new methods. Students are able to critically analyze and evaluate published methods (as readers of the publication, as peer-reviewers, and as competitors). Based on the outcome of these evaluations they can create a tool that is readily usable by experimental and computational biologists. Almost all solutions will involve developing or applying methods based on machine learning (ML) and artificial intelligence (AI). This means that they are able to convert an abstract idea of a solution, taking technical aspects into account, into pseudo-code and optionally further into executable programs during the exercises.
Examination
The module is graded by an oral exam at the end of the semester. The exam takes 20 minutes for an individual or 60 minutes for a group. The questions will be sampled entirely from the oral lectures and the project work.
Project work in the exercises leads up to an original scientific analysis - typically the development or testing of a machine learning-based solution - which will be presented by the group at the end of the semester. Students who contribute crucially to a successful project work will get a 0.3 grade bonus if the exam has been passed.
In the exam, successful participants demonstrate their ability to devise and discuss appropriate computational approaches for a solution for a biological problem in the field of protein function prediction. For instance, they choose the appropriate methods depending on the type of data available (such as sequence, structure, experimental annotations). They can also choose the appropriate data abstraction level (such as GO level, EC classes, or structural classes).
They demonstrate their understanding of the concepts by choosing appropriate solution approaches to the given tasks, and they can evaluate these by discussing the pros and cons of alternative approaches in biological as well as technical terms. They can demonstrate their ability to create a usable tool implementing a solution approach down to the level of pseudo-code.
Most crucial is for students to show that they have learned what it takes to communicate with researchers from other disciplines - including experimental and computational biologists - about their needs for particular aspects of advanced AI or about the strengths and limitations of particular state-of-the-art AI approaches.
More details are announced at the beginning of the lecture.
Course requirements
"Protein Prediction 2 - Function" for Informatics targets those who study Informatics/Computer Science at the Master level. Typically, students have completed a bachelor in informatics/computer science or a related subject at some university and have had hands-on experience with machine learning and artificial intelligence. Ideally, participants have successfully completed the first leg in this series, i.e., Protein Prediction 1 for Informaticians; however, very motivated students may benefit from taking PP2 without PP1. There is another course tailored to students of bioinformatics/computational biology.
Resources
- Overall on protein function (book too detailed for lecture, but good to scan): AM Lesk 2004 OUP
- Review protein function prediction: B Rost et al (2003) Cellular and Molecular Life Sciences 60: 2637–50
- Location: T Goldberg et al (2012) Bioinformatics 28: i458–i65, H Stark et al (2021) Bioinform Adv 1: vbab035
- Interactions: Y Ofran & B Rost (2003) JMB 325: 377–87, M Littmann et al (2021) Scientific Reports 11: 23916, T Hamp and B Rost (2015) Bioinformatics 31: 1945–50
- Variant effect: Y Bromberg & B Rost (2007) NAR 35: 3823–35, C Marquet, J Schlensok, M Abakarova, B Rost and E Laine (2024) Bioinformatics 40:
Activities
Lectures, Exercises, Questions & Answers (Q&A) sessions
- Lectures (include Q&A): Theoretical background for all topics will be presented in traditional lecture style with slides, as well as interactively through white board presentations and Q&A sessions.
- Exercises (include Q&A): Programming of a particular novel prediction method; this will deepen and apply the material presented in the lectures; occasionally, presentation of additional material needed for better understanding; exercises also include interactive Q&A sessions and presentations from the students.
Additional information
- More info: Course page on website of Technical University of Munich
- Contact a coordinator
- About studying within the Euroteq alliance: https://euroteq.eurotech-universities.eu/initiatives/building-a-european-campus/course-catalogue/
- Level: Master
- Contact hours per week: 0
- Instructors: Burkhard Rost
