The Science of Discworld II
goes on for about three billion letters.
The phase space for genomes, DNA-space, consists of all possible sequences of a given length. If we’re thinking about human beings, the relevant DNA-space comprises all possible sequences of three billion code letters C, G, A, T. How big is that space? It’s the same problem as the cars in the car park, mathematically speaking, so the answer is 4 × 4 × 4 × … × 4 with three billion 4s. That is, 4^3,000,000,000. This number is a lot bigger than the 70-digit number we got for the car-parking problem. It’s a lot bigger than L-space for normal-sized books, too. In fact, it has about 1,800,000,000 digits. If you wrote it out with 3,000 digits per page, you’d need a 600,000-page book to hold it.
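The digit count can be checked without writing the number out, because the number of decimal digits of 4^n is the whole-number part of n × log10(4), plus one. What follows is a minimal Python sketch of that arithmetic; the function and variable names are ours, chosen for illustration, and the genome length and digits-per-page figures are simply the round numbers quoted above.

```python
import math

def digits_in_power(base: int, exponent: int) -> int:
    """Count the decimal digits of base**exponent using logarithms.

    Relies on floating-point log10, so it is an estimate that could be
    off by one if exponent * log10(base) lands extremely close to a
    whole number. That is not the case for the values used here.
    """
    return math.floor(exponent * math.log10(base)) + 1

genome_length = 3_000_000_000   # code letters, the round figure from the text
digits_per_page = 3_000         # page capacity, also from the text

digits = digits_in_power(4, genome_length)
pages = math.ceil(digits / digits_per_page)

print(f"4^{genome_length:,} has about {digits:,} digits")
print(f"at {digits_per_page:,} digits per page that fills about {pages:,} pages")
```

Run as written, this gives a little over 1.8 billion digits and roughly 602,000 pages, which agrees with the rounded figures in the text.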
The image of DNA-space is very useful for geneticists who are considering possible changes to DNA sequences, such as ‘point mutations’ where one code letter is changed, say as the result of a copying error. Or an incoming high-energy cosmic ray. Viruses, in particular, mutate so rapidly that it makes little sense to talk of a viral species as a fixed thing. Instead, biologists talk of quasi-species, and visualise these as clusters of related sequences in DNA-space. The clusters slosh around as time passes, but they stay together as one cluster, which allows the virus to retain its identity.
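One way to picture a point mutation is as a step from one sequence to a neighbour in DNA-space that differs in exactly one letter. The Python sketch below lists those neighbours for a toy sequence. It is an illustration only: the function names and the seven-letter example are hypothetical, and real quasi-species analysis involves far more than counting letter differences.

```python
from typing import Iterator

ALPHABET = "CGAT"  # the four code letters named in the text

def point_mutants(sequence: str, alphabet: str = ALPHABET) -> Iterator[str]:
    """Yield every sequence that differs from `sequence` in exactly one letter."""
    for position, letter in enumerate(sequence):
        for replacement in alphabet:
            if replacement != letter:
                yield sequence[:position] + replacement + sequence[position + 1:]

def letter_differences(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

toy_sequence = "GATTACA"  # hypothetical example, not a real gene
neighbours = list(point_mutants(toy_sequence))

# Each of the 7 positions can change to any of the 3 other letters.
print(len(neighbours), "single-letter neighbours of", toy_sequence)
```

A sequence of length n has 3n such neighbours, so a three-billion-letter genome has nine billion sequences just one step away. A cluster in this picture is a set of sequences separated from one another by only a few such steps.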
In the whole of human history, the total number of people has been no more than ten billion, a mere 11-digit number. This is an incredibly tiny fraction of all those possibilities. So actual human beings have explored the tiniest portion of DNA-space, just as actual books have explored the tiniest portion of L-space. Of course, the interesting questions are not as straightforward as that. Most sequences of letters do not make up a sensible book; most DNA sequences do not correspond to a viable organism, let alone a human being.
And now we come to the crunch for phase spaces. In physics, it is reasonable to assume that the sensible phase space can be ‘pre-stated’ before tackling questions about the corresponding system. We can imagine rearranging the bodies of the solar system into any configuration in that imaginary phase space. We lack the engineering capacity to do that, but we have no difficulty imagining it done, and we see no physical reason to remove any particular configuration from consideration.
When it comes to DNA-space, however, the important questions are not about the whole of that vast space of all possible sequences. Nearly all of those sequences correspond to no organism whatsoever, not even a dead one. What we really need to consider is ‘viable-DNA-space’, the space of all DNA sequences that could be realised within some viable organism. This is some immensely complicated but very thin part of DNA-space, and we don’t know what it is. We have no idea how to look at a hypothetical DNA sequence and decide whether it can occur in a viable organism.
The same problem arises in connection with L-space, but there’s a twist. A literate human can look at a sequence of letters and spaces and decide whether it constitutes a story; they know how to ‘read’ the code and work out its meaning, if it’s in a language they understand. They can even make a stab at deciding whether it’s a good story or a bad one. However, we do not know how to transfer this ability to a computer. The rules that our minds use, to decide whether what we’re reading is a story, are implicit in the networks of nerve cells in our brains. Nobody has yet been able to make those rules explicit. We don’t know how to characterise the ‘readable books’ subset of L-space.
For DNA, the problem is compounded because there isn’t some kind of fixed rule that ‘translates’ a DNA code into an organism. Biologists used to think there would be, and had high hopes of learning the ‘language’ involved. Then the DNA for a genuine (potential) organism would be a code sequence that told a coherent story of biological development, and all other DNA sequences would be gibberish. In effect, the biologists expected to be able to look at the DNA sequence of a tiger and see the bit that specified the stripes, the bit that specified the claws, and so on.
This was a bit optimistic. The current state of the art is that we can see the bit of DNA that specifies the protein from which claws are made, or the bits that make