Class Research Reveals Unexpected Discrepancies in Genomic Annotations
July 27, 2009
Contact: Bill Giduz
Students in the genomics course were Max Win '10, Will DeLoache '09, Matt Lotz '11, Jay McNair '09, Laura Voss '10, Nick Carney '11, Mary Gearing '11, Peter Bakke '09, Pallavi Penumetcha '11, and Samantha Simpson '09.
By structuring his genomics laboratory methods course as collaborative research, Davidson College Professor Malcolm Campbell anticipated that he and his students would walk an unpredictable path of discovery. But never did he imagine that his first offering of this one-semester course might end up shaking the foundations of the science they explored!
But that's what happened. A paper published this week in the journal PLoS One by Campbell, Associate Professor of Mathematics Laurie Heyer, a Danish colleague, and the 10 undergraduates in his fall 2008 "Laboratory Methods in Genomics" course explains their discovery that computer-based annotations of genome sequences agree with each other less than 50% of the time.
PLoS One publishes primary research on science and medicine. According to an editor, it is extremely rare that undergraduate research appears in this international, peer-reviewed online journal. And it is even more rare that the journal publishes a genomics paper. But the consequences of the Davidson class's findings were recognized as significant enough to have major impact on the field.
"Laboratory Methods" is an upper-level course in the college's genomics concentration, but it enrolls students from a variety of majors. It is a computer lab rather than a wet lab, teaching students to use computers to analyze genomes holistically and as individual genes or metabolic pathways. Rather than teach the course through a series of canned and predictable experiments, Campbell believed students would learn more effectively by conducting real and original laboratory research. So he arranged for his students to work with the Joint Genome Institute (JGI) to "annotate" the genome of the recently discovered microbe Halorhabdus utahensis.
"I am very excited about this new course," Campbell wrote prior to the semester. "Only 15 schools in the world get to participate in a pilot program run through JGI, and we are the only school taking on an entire genome solo. It will be a lot of fun to do real genomics research on a species which is poorly understood."
JGI is funded by the Department of Energy and was interested in Halorhabdus utahensis because of its potential to generate biofuel. The microbe lives only in highly salty environments, such as Utah's Great Salt Lake.
Campbell's students learned to use JGI software to analyze this previously unanalyzed microbe. JGI supplied the class with the microbe's genetic sequence, an uninterrupted string of about three million of the letters "G," "C," "A" and "T," which represent the four building blocks of DNA. Various combinations of those four letters identify specific genes, and genes are what control biological functions. The science of "bioinformatics" employs software to compare the combinations of letters of the unknown organism to known genes in other organisms. Once the Davidson team knew what genes were present in Halorhabdus utahensis, they could begin to answer questions such as how the microbe reproduces, how it creates proteins, and where it gets its energy.
The class was enhanced by the presence of Campbell's faculty colleague and frequent collaborator, Laurie Heyer, a bioinformatics specialist. In addition, four of the 10 students enrolled had taken Heyer's bioinformatics class the previous semester. Their expertise proved critical to the overall class success.
The "research-as-pedagogy" approach was unlike any Campbell had experienced in his 15 years of teaching at Davidson. "I wasn't in control," he said. "It was uncomfortable at times to not know where we were headed, but it was exciting at the same time to really be collaborating with students. I was learning as they were learning."
However, a month into the semester, things went awry. The Davidson team wasn't satisfied with the Halorhabdus utahensis genome annotation supplied by JGI, and decided to send the sequence to two other annotation services.
That seemingly inconsequential action turned the project in an entirely unexpected direction. As they looked at the three annotations side by side, the team noticed that they were not identical. In fact, fewer than half the genes were identical in all three annotations. Campbell was astounded. The annotations were different, and there was no way to know which one was biologically correct.
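For readers curious about what such a side-by-side comparison involves, the following is a minimal Python sketch, not the team's actual method or software. It assumes each annotation can be reduced to a set of gene calls, with each call represented as a hypothetical (start, end, strand) tuple, and it counts a gene as "identical" only when all three services report exactly the same tuple. The service names and coordinates are invented for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation of one predicted gene: (start, end, strand).
GeneCall = Tuple[int, int, str]


def agreement_summary(annotations: Dict[str, Set[GeneCall]]) -> Dict[str, float]:
    """Summarize how many gene calls are reported identically by every service."""
    # Every distinct gene call reported by at least one service.
    all_calls = set().union(*annotations.values())
    # Gene calls reported with identical coordinates by all services.
    shared = set.intersection(*annotations.values())
    fraction = len(shared) / len(all_calls) if all_calls else 0.0
    return {
        "total_distinct": len(all_calls),
        "shared_by_all": len(shared),
        "fraction_shared": fraction,
    }


# Invented example data: three services that disagree on some gene boundaries.
annotations = {
    "service_a": {(100, 400, "+"), (500, 900, "-"), (1000, 1600, "+"), (2000, 2300, "+")},
    "service_b": {(100, 400, "+"), (500, 900, "-"), (1003, 1600, "+"), (2000, 2300, "+")},
    "service_c": {(100, 400, "+"), (530, 900, "-"), (1000, 1600, "+")},
}

summary = agreement_summary(annotations)
print(summary)
```

Requiring an exact match is the strictest possible definition of agreement. A real comparison would likely also need to decide how to treat calls that differ only in their start position or in the function assigned to the gene.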
Annotating genomes of life forms has become a primary endeavor in biological science to develop medical treatments and new products in fields such as energy, ecology and evolution. The Davidson team's results could call into question results of countless annotations conducted since 1995, when the first scientific analysis of a complete genome was conducted. To check his suspicion about the significance of his class's discovery, Campbell asked colleagues in the genomics community if they had ever compared annotation services. He found no one who had done so. "Everyone just used one service to get one annotation and assumed it was correct," he said.
The discovery was akin to finding that different dictionaries spell the same word in different ways, or getting different results when running the same data through Excel and Lotus software. "Everyone sequencing genomes runs it through a service, and everyone assumes there's only one possible result," said Campbell. "But we showed the results are sometimes flawed, and showed how they're flawed."
Campbell hypothesized about the consequences of an incorrect annotation. He said, "A company might think it knows all the proteins a microbe makes, and start making drugs to alter ones that are pathogenic. But it might be that their annotation produced the wrong information, and find that it spent big bucks in a fruitless pursuit."
Campbell and the students then pursued means of developing more accurate and consistent results. The four students who had taken the bioinformatics course refined commonly used annotation software to give more consistent results. Other students studied specific genes and pathways that had been identified by the three different annotations, and figured out which annotation was likely to be the most biologically accurate.
To conclude the semester, Campbell assigned the students to write a paper about the overall research experience, and to highlight specific cases that illustrated how they resolved particular three-way contradictions. One eager and able student, Peter Bakke '09, volunteered to edit all 10 papers into a single manuscript for possible publication. Campbell said Bakke showed exemplary initiative, working on the article long after the course concluded, and even after his graduation. The final draft was submitted just days before Bakke headed to the Yukon for training for a career in wilderness education.
Though the course didn't proceed as originally planned, Campbell is tremendously proud of the results. "Major research institutes around the world have been ignorant of this problem, and 10 undergrads at Davidson discovered it."
He also said the results validate the use of research-based undergraduate courses. "The students demonstrated that if faculty give students assignments that are real, the class gets absorbed by the intellectual challenge and students do beautiful work," Campbell said. "Our students rose to the occasion. They hit a wall, figured out how to get around it, and in less than a year produced a paper that got published in a very good journal."
He concluded, "If we had stuck with the original goal, we might have characterized the genome of one microbe. But we ended up making a big impact in the whole field of genome annotation. It's a great lesson for students in how science can change directions, and how temporary frustrations can lead to new discoveries."
Davidson is a highly selective independent liberal arts college for 1,800 students located 20 minutes north of Charlotte in Davidson, N.C. Since its establishment in 1837 by Presbyterians, the college has graduated 23 Rhodes Scholars and is consistently regarded as one of the top liberal arts colleges in the country. Through The Davidson Trust, the college became the first liberal arts institution in the nation to replace loans with grants in all financial aid packages, giving all students the opportunity to graduate debt-free. Davidson competes in NCAA athletics at the Division I level, and a longstanding Honor Code is central to student life at the college.