Blog - Page 2 of 2 - Jarno N. Alanko

April 20, 2020 (updated February 17, 2021)

Predicting nucleotides in the E. coli genome

In this post I model the reference sequence U00096.3 of Escherichia coli strain K-12 using Markov models and simple neural networks. The goal is to predict the nucleotide at a given position of the genome, given some number of preceding nucleotides as context. Let’s start with some basic statistics. There are 4.6 million […]
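
To make the prediction task concrete, here is a minimal sketch of an order-k Markov predictor: it counts which nucleotide follows each length-k context and predicts the most frequent one. This is only an illustration under stated assumptions, not necessarily how the post implements it. The function names `train_markov` and `predict_next` and the toy string are made up for this example; the real input would be the U00096.3 sequence.

```python
from collections import Counter, defaultdict


def train_markov(sequence, k):
    """Count which nucleotide follows each length-k context in the sequence."""
    counts = defaultdict(Counter)
    for i in range(k, len(sequence)):
        context = sequence[i - k:i]
        counts[context][sequence[i]] += 1
    return counts


def predict_next(counts, context):
    """Return the most frequent successor of the context, or None if unseen."""
    followers = counts.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy input, not the real E. coli reference sequence.
toy_genome = "ACGTACGTACGAACGT"
model = train_markov(toy_genome, k=3)
prediction = predict_next(model, "ACG")
print(prediction)
```

In the toy string the context `ACG` is followed by `T` three times and by `A` once, so the sketch predicts `T`. A larger k gives more specific contexts but fewer observations per context.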

April 13, 2020 (updated February 17, 2021)

Visualizing k-mer statistics of bacterial genomes

Let us start with a brief recap of the biology of gene expression. A genome is a string of the nucleotides A, C, G and T. This string is the source code for the proteins that the cell can produce. Proteins are strings of amino acids, where the amino acids are selected from a set of 20 naturally occurring […]