Closing The Gap On Memory Performance For The Development Of Genetic Data Analysis Software
Dedicated to enhancing the health of children, the Research Institute at Nationwide Children's Hospital engages in high-quality, cutting-edge research. Nationwide Children's Hospital responds to the healthcare needs of children and their families through its acclaimed research programs, which include ongoing initiatives to improve the quality of life of the children it serves.
The Research Institute at Nationwide Children's Hospital is a subsidiary of Nationwide Children's Hospital of Columbus, Ohio, and is recognized as one of the nation's ten largest free-standing pediatric research centers.
An important, strategic goal established by Dr. Veronica Vieland of the Research Institute at Nationwide Children’s Hospital was to develop an in-house software package to implement statistical genetics methodologies developed within the Institute. The software could then be used to assist with other data analyses.
Dr. Vieland and her group first developed a novel class of Quasi-Bayesian models. The next step was to expand the set of genetic features handled by the underlying likelihoods on an ongoing basis.
A major goal of the computing group then became developing the high-performance computational approaches required to apply these statistical methods. Ultimately, the group wanted to produce a comprehensive genetic data analysis program with user-friendly interfaces to facilitate statistical genetic research that will lead to the identification and localization of disease susceptibility genes.
The Research Institute at Nationwide Children’s Hospital wanted to keep the ongoing costs associated with this project reasonable. Having already experienced the high cost of maintaining heavily customized software to run parallel computational research, the Institute realized they wanted to utilize a system developed from commodity off-the-shelf hardware that could handle the memory demands of their jobs. They also wanted to keep the system simple and easy enough for non-scientific members of their staff to understand. All of that meant investing in a system specifically designed for their unique requirements.
The Research Institute originally selected a 32-node Apex cluster, utilizing Pinnacle 10D200 servers. However, after getting their cluster up and running, they noticed almost immediately that there were times when as many as 1,000 jobs were queued up at once. Projects were taking as long as one week to finish running, creating a constant backlog in the queue. Given the number of concurrent projects, the time required to finish them, and the memory demands of their jobs, they began to realize that they had already run out of resources. They needed to upgrade their cluster to keep their jobs running optimally.
Following an intensive consultation with Advanced Clustering Technologies’ technical staff, the Institute needed to decide whether to purchase a new cluster or simply add to their existing one. All things considered, they felt confident that adding 32 Pinnacle 1O200 nodes and more memory to their existing Apex cluster was the way to go.
Since their upgrade, the project has been a success. Through the development of their software package, the Research Institute’s data analyses on various projects have identified regions of interest for different diseases and, in some cases, a functional single nucleotide polymorphism (SNP).
“What we ended up with were the necessary components that met the demands of our computationally intensive and memory-demanding data analyses,” says Huang. “With our new cluster we have found the ease and flexibility of job management to be most helpful. Through its performance, our cluster has aided our research by shortening project completion time and speeding up research findings.”
Confident that their Apex cluster is performing according to their expectations, the Research Institute already has plans to move into the next phase of their project: fine-tuning the performance of their newly developed software and adding more algorithms as options. As the amount of available genetic information grows dramatically, the Research Institute must constantly find ways to handle the increasing volume of data and make computation tractable. Their Apex cluster is the tool that will enable them to continuously improve upon their ability to conduct this highly specialized statistical genetics research.