Nei said that for many years he has suspected that the statistical methods were faulty. "The methods assume that when natural selection occurs the number of nucleotide substitutions that lead to changes in amino acids is significantly higher than the number of nucleotide substitutions that do not result in amino acid changes," he said. "But this assumption may be wrong. Actually, the majority of amino acid substitutions do not lead to functional changes, and the adaptive change of a protein often occurs by a rare amino acid substitution. For this reason, statistical methods may give erroneous conclusions." Nei also believes that the methods are inaccurate when the number of nucleotide substitutions observed is small.
To demonstrate the flaws in the statistical methods, Nei's team compiled data collected by their Emory University colleague, Shozo Yokoyama, on the genes that control the abilities of fish to see light at different water depths and on the genes that control color vision in a variety of animals. The team used these data to compare statistically predicted sites of natural selection with the sites Yokoyama had determined experimentally. They found that the statistical methods rarely predicted the actual sites of natural selection. "In some cases, statistical method completely failed to identify the true sites where natural selection occurred," said Nei. "This particular exercise demonstrated the difficulty with which statistical methods are able to detect natural selection."
To demonstrate how small sample sizes can lead to incorrect results, the team used computer simulations to examine the evolution of genes in three primates: humans, chimpanzees, and macaques. The scientists mimicked the procedures used by the authors of a 2007 paper, which applied the branch-site method to 14,000 orthologous genes -- genes that are genealogically identical among different species -- and which found that the method predicted selection in 32 of the genes. Nei and his team also studied selection using Fisher's exact test, but this test did not detect any selection. "The results indicate that the number of nucleotide substitutions that occurred were too small to detect any selection; therefore, all of the 32 cases obtained by the branch-site method must be false positives," said Nozawa.
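A minimal sketch can show why a small number of substitutions leaves a Fisher's exact test with little power. It assumes one plausible 2x2 layout (changed versus unchanged sites, split into nonsynonymous and synonymous); the function name, the table layout, and the counts below are illustrative assumptions and are not taken from the study.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all margins fixed, of seeing a top-left
    count at least as large as `a` (hypergeometric upper tail).
    """
    row1 = a + b          # total in the first row
    col1 = a + c          # total in the first column
    total = a + b + c + d
    denom = comb(total, row1)
    upper = min(row1, col1)
    tail = 0
    for k in range(a, upper + 1):
        # math.comb returns 0 when the lower index exceeds the upper one,
        # so impossible tables contribute nothing to the sum.
        tail += comb(col1, k) * comb(total - col1, row1 - k)
    return tail / denom


# Hypothetical gene: 4 nonsynonymous and 0 synonymous substitutions on a
# branch, with 700 nonsynonymous and 300 synonymous sites in total.
n_sub, s_sub = 4, 0
n_sites, s_sites = 700, 300

# Rows: changed / unchanged sites. Columns: nonsynonymous / synonymous.
p_value = fisher_exact_greater(n_sub, s_sub, n_sites - n_sub, s_sites - s_sub)
print(f"one-sided p = {p_value:.3f}")
```

With these made-up counts, every observed substitution is nonsynonymous, yet the test does not come close to the conventional 0.05 threshold, because four substitutions carry too little information.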
Reliabilities of identifying positive selection by the branch-site and the site-prediction methods
Masafumi Nozawa et al.
Natural selection operating in protein-coding genes is often studied by examining the ratio (ω) of the rates of nonsynonymous to synonymous nucleotide substitution. The branch-site method (BSM) based on a likelihood ratio test is one of such tests to detect positive selection for a predetermined branch of a phylogenetic tree. However, because the number of nucleotide substitutions involved is often very small, we conducted a computer simulation to examine the reliability of BSM in comparison with the small-sample method (SSM) based on Fisher's exact test. The results indicate that BSM often generates false positives compared with SSM when the number of nucleotide substitutions is ≈80 or smaller. Because the ω value is also used for predicting positively selected sites, we examined the reliabilities of the site-prediction methods, using nucleotide sequence data for the dim-light and color vision genes in vertebrates. The results showed that the site-prediction methods have a low probability of identifying functional changes of amino acids experimentally determined and often falsely identify other sites where amino acid substitutions are unlikely to be important. This low rate of predictability occurs because most of the current statistical methods are designed to identify codon sites with high ω values, which may not have anything to do with functional changes. The codon sites showing functional changes generally do not show a high ω value. To understand adaptive evolution, some form of experimental confirmation is necessary.
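As a rough illustration of the ratio ω mentioned in the abstract, the sketch below computes it from counts of nonsynonymous and synonymous differences between two sequences. It assumes a simple proportion-based estimate with a Jukes-Cantor correction for multiple hits; the function names and example counts are hypothetical, and this is not the likelihood-based procedure that the branch-site method uses.

```python
from math import log


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must lie in [0, 0.75)")
    return -0.75 * log(1 - 4 * p / 3)


def omega(n_diff, s_diff, n_sites, s_sites):
    """Ratio of nonsynonymous (dN) to synonymous (dS) substitution rates.

    n_diff, s_diff   : observed nonsynonymous / synonymous differences
    n_sites, s_sites : numbers of nonsynonymous / synonymous sites
    """
    d_n = jukes_cantor(n_diff / n_sites)
    d_s = jukes_cantor(s_diff / s_sites)
    if d_s == 0:
        raise ValueError("dS is zero, so omega is undefined")
    return d_n / d_s


# Hypothetical counts for a gene with 700 nonsynonymous and 300 synonymous sites.
print(omega(10, 10, 700, 300))  # fewer nonsynonymous changes per site: below 1
print(omega(30, 5, 700, 300))   # more nonsynonymous changes per site: above 1
```

A value above 1 is conventionally read as a sign of positive selection, which is the interpretation the abstract questions for individual codon sites.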