Comparing Biological Sequences with Neural Networks
Biological sequences are often compared using a simple dynamic programming algorithm called the Needleman-Wunsch algorithm. Due to its nature, the algorithm admits only simple scoring schemes, which do not fully capture the properties of biological sequences. The goal of this thesis is to design an analogous algorithm in which the cells of the dynamic programming matrix are replaced with a neural network, thus admitting much more complex non-linear scoring schemes. A more general scoring scheme that can be trained automatically will allow us to apply sequence alignment to other sequence representations, including electrical signals from nanopore sequencing and various representations of biological sequences that include uncertainty.
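For orientation, the classical algorithm referred to above can be sketched as follows. This is a minimal illustration of Needleman-Wunsch global alignment scoring, not the method developed in the thesis. The function name `needleman_wunsch` and the default scores (`match=1`, `mismatch=-1`, `gap=-1`) are assumptions chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    The scoring values are illustrative assumptions: a fixed reward for a
    match, a fixed penalty for a mismatch, and a linear gap penalty.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]

    # Aligning a prefix against the empty sequence costs one gap per symbol.
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            # Each cell is a fixed function of its three neighbours.
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[n][m]


if __name__ == "__main__":
    print(needleman_wunsch("GATTACA", "GATTACA"))
```

In this sketch every cell is computed by the same fixed maximum over three sums, which is the simple scoring scheme that a trainable, non-linear cell function would generalize.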