A Quarterly Newsletter from GeneInsight
Issue 2: March 2014
We are pleased to share the Winter/Spring 2014 edition of our newsletter, GeneInsights. In this edition, we feature an editorial that examines the Network Approach to Genomic Interpretation and highlight the RBM20 gene as part of our ongoing Featured Assessment series.
We hope you will enjoy the latest issue of GeneInsights. Happy Spring!
Network Approach to Genomic Interpretation
This editorial highlights clinical challenges involved in the interpretation of genetic variants, and illuminates the movement in the clinical genomics community to address these challenges through data sharing and collaboration. We also discuss many of the risks, both real and perceived, associated with community-based data sharing models and our perspective on how these risks can be mitigated or navigated.
Many of the challenges involved in the clinical interpretation of variants are driven by the absence of clinically validated sources of content for labs to leverage during the variant interpretation process. While there are a number of commercially available databases and curation services, they often lack the level of validation required for clinical interpretation.
Characterizing novel variants to assess which variants are responsible for causing disease may require analysis of large populations in order to achieve strong statistical evidence for pathogenicity. What can be considered a single data point in isolation may become actionable content when pooled with information from other genetic testing laboratories. In fact, the rare occurrence of many variants means that no single laboratory will ever be able to characterize every genetic variant.
When laboratories encounter a new variant, they are particularly interested in understanding whether the variant has ever been seen before – more often than not this will require them to look outside of their existing laboratory. Heidi Rehm, PhD, Chief Laboratory Director, Partners’ Laboratory for Molecular Medicine (LMM) has noted that in her experience at the LMM “68% of pathogenic or likely pathogenic variants have only been seen once and 96% of variants have only been seen less than 10 times.” Pursuing strategies that help to increase sample size and pool content where appropriate will help elucidate clinical significance.
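The effect of pooling on rarely seen variants can be illustrated with a minimal Python sketch. The laboratory names, variant identifiers, and counts below are made up for illustration and do not represent LMM or network data; the function names are likewise hypothetical.

```python
from collections import Counter
from typing import Dict


def pool_observations(lab_counts: Dict[str, Dict[str, int]]) -> Counter:
    """Sum per-variant observation counts across all contributing laboratories."""
    pooled = Counter()
    for counts in lab_counts.values():
        pooled.update(counts)  # Counter.update adds counts for matching keys
    return pooled


def fraction_seen_once(counts: Dict[str, int]) -> float:
    """Return the share of variants that have been observed exactly one time."""
    if not counts:
        return 0.0
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(counts)


# Hypothetical counts: how many times each laboratory has observed each variant.
lab_a = {"variant_1": 1, "variant_2": 1, "variant_3": 4}
lab_b = {"variant_1": 2, "variant_4": 1}

pooled = pool_observations({"lab_a": lab_a, "lab_b": lab_b})
share_once_lab_a = fraction_seen_once(lab_a)
share_once_pooled = fraction_seen_once(pooled)
```

In this toy data, a variant that laboratory A has seen only once (variant_1) has three observations after pooling, which is the kind of gain in sample size the paragraph above describes.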
Faced with these challenges, it is clear that sharing of genomic data is not only useful to understand the genetic basis of disease, it is necessary.
Crowdsourcing can be a highly effective approach to solving complex problems, particularly when the relevant expertise is widely distributed. A networked, collaborative approach to determining the significance of genetic variants through structured ‘crowdsourcing’ requires infrastructure, ideally tightly connected distributed infrastructure, that enables labs to share genomic information with each other. Once such an infrastructure is in place, more clinical data and resources are accessible to geneticists and genetic counselors responsible for drafting clinical reports. More information may lead to higher quality interpretations that have the potential to improve patient care.
The GeneInsight Network® facilitates data sharing and collaboration among clinical laboratories and has enabled the development of a clinical consortium focused on a ‘share and share alike’ networking philosophy; this network includes Partners HealthCare’s Laboratory for Molecular Medicine (LMM), ARUP Laboratories, and Mount Sinai Genetic Testing Laboratory at Icahn School of Medicine in New York. Participating members leverage the GeneInsight knowledgebase infrastructure which can be federated with other laboratories’ knowledge repositories to enable de-identified variant level sharing.
Lisa Edelmann, Associate Professor and Director of Mount Sinai Genetic Testing Laboratory at Icahn School of Medicine in New York, a founding member of the network, said, “Sharing data in real time is critically important as our technical capabilities continue to grow. We are entering the age of clinical genomic testing and collaboration around variant interpretation is imperative to support quality care of patients.”
This network is committed to sharing and pooling knowledge, specifically variant level data interpreted in a translational or clinical setting, as a means to characterize the vast amount of genomic variation within and among populations. While the network members are starting with variant sharing they will presumably move to sharing de-identified case level information. However, data contributed to the network will not include any patient identifying information, i.e. Protected Health Information (PHI).
This type of real time structured sharing environment can serve as a staging area for laboratories open to further sharing their data with ClinVar, a NIH funded initiative to develop a robust, public, annotated database of genotype-phenotype information. The network enabled by GeneInsight infrastructure is synergistic with this effort and network members are committed to supporting ClinVar by facilitating the process of submitting well structured data to the ClinVar database.
While many members of the broader genomics community support the idea of networked infrastructure and ‘crowdsourced’ genomic interpretations, there are some who have expressed concerns over potential participation in such a model. These concerns are important to mention as part of this discussion, and we express our thoughts below.
Microattributions: Concern over the ability to cite microattributions, i.e. giving credit and priority to the original source of data, when leveraging data sourced from a network. Labs often note the difficulties of acknowledging individual contributions to the generation of data in the context of an overall interpretation of a genetic test result and as part of subsequent publications.
To be sure, there are technical challenges associated with tracking contributions as knowledge gets consumed and propagated through the system. However, the larger question being raised seems to center on the following: at what point do you ‘take control’ of the data?
When considering the question of control, it is important to realize that every day when clinicians and geneticists sign out a report, using various data sources or evidence to draw clinical conclusions, they take on the responsibility for the clinical interpretation of the patient’s condition. If you take on the responsibility, you inherently take control and accountability over that interpretation and related evidence.
Additionally, private networks can actually enable mechanisms to connect with other laboratories in order to co-author publications where their data has been leveraged. Data sharing through this mechanism can facilitate clinical and academic collaborations that enable members to leverage new information and connections in a different way.
Loss of Competitive Advantage: Concern related to potential adverse commercial implications and perceived risk of losing competitive advantage if they share their data with other clinical laboratories.
Share and share alike networks may actually serve to enhance, not diminish, competitive advantage for participating members through higher quality interpretations that may ultimately lead to better patient care. The clinical genomic community is moving toward more data sharing, and it will be difficult for laboratories that do not participate in share and share alike networks to compete effectively. The community as a whole is likely to reward those laboratories willing to participate and make their data available.
Additionally, there may be quality related issues associated with hoarding data, since a laboratory that does not share cannot cross-check its results against other interpretations. Labs that are not cross-checking are taking on additional risk. Ultimately, the goal of any clinical laboratory is to provide the highest quality and most accurate interpretations to ensure the highest level of clinical care for its patients.
Although some participants in the network enabled by GeneInsight just started sharing data at the end of 2013, they have already experienced the benefits associated with data sharing and collaboration. Recently, an FBN1 variant of unknown significance was re-classified to benign when data accumulated by ARUP Laboratories and the LMM was merged, providing both laboratories with more information and thus a stronger argument that the variant in question was not disease-causing.
Inequity between ‘takers’ and ‘givers’: Concern that there may be network participants who join with the intent to take data from the system, develop their own test, and leave the network.
The benefit of a share and share alike network is the ability to put safeguards and audit trails in place to make sure that there is fairness and professional courtesy between contributors. A major goal of the network enabled by GeneInsight is to continuously learn what works well and what doesn’t and develop guidelines that articulate good etiquette around data sharing. The network members collectively review applications from laboratories that wish to participate in order to vet potential participants before they join and reduce the risk of allowing in labs unwilling to participate in accordance with network policy.
Laboratories that decide to leave the network and no longer contribute data are only allowed to take information on the variants they have manually pulled across to establish their own interpretations. Furthermore, they only capture a static understanding of a body of knowledge that is known to evolve, often dramatically, over time.
Patient Privacy: Given the charged environment around patient privacy, there are understandably real concerns regarding efforts to share genetic information in a collaborative forum.
As previously mentioned, laboratories participating in the network enabled by GeneInsight will be limited to sharing data at the variant level, although there may be opportunities in the future to further enrich the variant data with de-identified case level information. Contributing labs share the responsibility to ensure any descriptive text or evidence fields associated with a shared genetic variant are free of Protected Health Information (PHI), specifically the 18 elements identified by HIPAA as identifiable.
The number of variants interrogated in the context of a clinical case is relatively small; however these variants represent the most valuable subset of information to share with participating laboratories.
In a recent article in Genetics in Medicine titled “Mining the Genome”, author Barbara Evans PhD, JD, draws a strong comparison between the value that modern medicine can extract from the human genome and the value that many “extractive industries” (e.g. mining, fishing, energy) reap from existing natural resources. Core to this point of comparison is the need for extractive industries to invest in and develop the necessary infrastructure required to unearth, refine, and process raw materials into a form that is available and accessible to the consumer. With respect to genetic information, there is an overwhelming need for IT infrastructure to help identify and communicate genetic discoveries to the end users, primarily clinicians and patients, in a manner that confers clinical utility and actionability.
Dr. Evans goes on to make an argument that a core principle of resource economics can also be extended to the field of genetics. This principle involves the dynamics of multiple “operators” working with a common pool of resources and the recognition that competitive efforts generally result in undue waste, while collaboration and coordination of efforts maximizes output and overall value. As the field of medical genetics continues to expand with an increasing number of institutions actively identifying, studying, and annotating new genetic variants, networking infrastructure and data sharing models are essential to prevent the growth of siloed and fragmented data resources.
Through the adoption of new data sharing strategies, the goal is to reduce “waste” and limit the time required to mine and extract raw data from the human genome and translate this information into new insights with real clinical impact.
Our Featured Assessment Series is intended to highlight the assessment of a specific gene or variant as part of an ongoing series included in the GeneInsights newsletter. This segment will review the process of assigning significance to a variant or gene in the clinical and/or research setting.
Reviewing and assessing the available evidence for the classification of a DNA variant identified during clinical testing has been standard practice in clinical genetic testing laboratories for many years. During this process, various levels of evidence must be met to determine if the variant is causative of disease (pathogenic).
For many years, sequencing technologies were expensive and therefore limited the number of genes that could be analyzed simultaneously. Consequently, gene panels were typically limited to genes that were well-researched and contributed a large fraction of the detection rate. Next generation sequencing has allowed for expanded gene panels and even exome/genome sequencing (ES/GS), such that it is now possible to cover genes for which only limited information associates them with a disease.
It is now more critical than ever to ensure that each gene included on a targeted panel, or in which a variant is identified by ES/GS, is thoroughly assessed to determine whether it is in fact associated with one or more diseases and how strong that evidence is. Without a complete understanding of the gene-disease relationship, it is not possible to truly assess the classification of any variant identified in that gene.
When assessing a gene, many pieces of evidence must be collected and reviewed, including:
- the phenotype of individuals carrying variant(s) in this gene (in the literature, research studies, or clinical labs)
- the evidence supporting the variants identified in affected individuals
- the location, function, expression, and interactions of the gene/protein (if known)
- in vitro studies and animal models
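As a hedged illustration only, the evidence categories listed above could be tracked in a simple record so that a reviewer can see which categories still lack supporting notes. The class name, field names, and placeholder gene below are hypothetical and are not part of any GeneInsight or LMM system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GeneEvidence:
    """Hypothetical record mirroring the four evidence categories listed above."""
    gene: str
    phenotypes: List[str] = field(default_factory=list)          # phenotype of carriers
    variant_evidence: List[str] = field(default_factory=list)    # support for variants in affected individuals
    gene_biology: List[str] = field(default_factory=list)        # location, function, expression, interactions
    functional_studies: List[str] = field(default_factory=list)  # in vitro studies and animal models

    def missing_categories(self) -> List[str]:
        """Return the names of evidence categories that have no notes yet."""
        categories = {
            "phenotypes": self.phenotypes,
            "variant_evidence": self.variant_evidence,
            "gene_biology": self.gene_biology,
            "functional_studies": self.functional_studies,
        }
        return [name for name, notes in categories.items() if not notes]


# Placeholder gene and notes, invented purely to show how the record is used.
record = GeneEvidence(
    gene="EXAMPLE_GENE",
    phenotypes=["phenotype reported in two unrelated families"],
    variant_evidence=["variants segregate with disease in one family"],
)
gaps = record.missing_categories()
```

A record like this does not decide whether a gene-disease association is valid; it only makes visible which kinds of evidence have been collected before that judgment is made.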
Gene: RBM20 (RNA-Binding Motif Protein 20)
During the systematic review of potential genes to be added to the first version of the Laboratory for Molecular Medicine’s expanded NGS panel for cardiomyopathy (the Pan Cardiomyopathy Panel, launched in July 2011), the RBM20 gene was assessed. The RBM20 gene was initially implicated in familial dilated cardiomyopathy (DCM) by Brauch and colleagues in 2009 after performing linkage studies on 2 large multi-generational families with DCM. This resulted in the identification of 2 missense variants in exon 9.
Further study in a larger cohort identified additional variants that all clustered in exon 9, specifically the RS domain. All of these variants segregated with disease and were not identified in controls. This association was further examined by Li and colleagues in 2010, identifying both novel and previously reported variants (Brauch 2009) in the same region of exon 9 in a different cohort of individuals with DCM. More recent studies provide further support for this association (Guo 2012, Refaat 2012).
While further study is needed, variants in this particular region of RBM20 appear to be associated with an earlier age of onset, high penetrance, end-stage heart failure and a high mortality.
While only a handful of variants had been identified in this gene, the strength of evidence for those variants and the available studies on this gene all indicated that it would be an important addition to our newest gene panel. Today, after 3 years of testing, our laboratory has identified 3 clinically significant variants (classified as pathogenic or likely pathogenic) in this hotspot region of exon 9 in 6 families with DCM; 2 of these families had been previously tested by our laboratory and were negative for any disease-causing variants. All 3 of these variants had been previously reported (Brauch 2009, Li 2010), further supporting the conclusion that the RS region of exon 9 is likely a hotspot of variation. In addition, the phenotype in these families is consistent with those described previously, which emphasizes the importance of identifying families with disease-causing variation in RBM20 so that the proper management and screening can be implemented.
Brauch KM, Karst ML, Herron KJ, de Andrade M, Pellikka PA, Rodeheffer RJ, Michels VV, Olson TM. 2009. Mutations in ribonucleic acid binding protein gene cause familial dilated cardiomyopathy. J. Am. Coll. Cardiol. 54(10):930-41.
Guo W, Schafer S, Greaser ML, Radke MH, Liss M, Govindarajan T, Maatz H, Schulz H, Li S, Parrish AM, Dauksaite V, Vakeel P, Klaassen S, Gerull B, Thierfelder L, Regitz-Zagrosek V, Hacker TA, Saupe KW, Dec GW, Ellinor PT, MacRae CA, Spallek B, Fischer R, Perrot A, Özcelik C, Saar K, Hubner N, Gotthardt M. 2012. RBM20, a gene for hereditary cardiomyopathy, regulates titin splicing. Nat. Med. 18(5):766-73.
Li D, Morales A, Gonzalez-Quintana J, Norton N, Siegfried JD, Hofmeyer M, Hershberger RE. 2010. Identification of novel mutations in RBM20 in patients with dilated cardiomyopathy. Clin Transl Sci. 3(3):90-7.
Linke WA, Bücker S. 2012. King of hearts: a splicing factor rules cardiac proteins. Nat. Med. 18(5):660-1.
Refaat MM, Lubitz SA, Makino S, Islam Z, Frangiskakis JM, Mehdi H, Gutmann R, Zhang ML, Bloom HL, MacRae CA, Dudley SC, Shalaby AA, Weiss R, McNamara DM, London B, Ellinor PT. 2012. Genetic variation in the alternative splicing regulator RBM20 is associated with dilated cardiomyopathy. Heart Rhythm. 9(3):390-6.
Wells QS, Becker JR, Su YR, Mosley JD, Weeke P, D’Aoust L, Ausborn NL, Ramirez AH, Pfotenhauer JP, Naftilan AJ, Markham L, Exil V, Roden DM, Hong CC. 2013. Whole Exome Sequencing Identifies a Causal RBM20 Mutation in a Large Pedigree with Familial Dilated Cardiomyopathy. Circ Cardiovasc Genet.
GeneInsight in the News
GeneInsight team member, Matthew Lebo, PhD, was recently profiled in GenomeWeb.
GeneInsight will be exhibiting at the conferences below in 2014.
American College of Medical Genetics and Genomics (ACMG) Annual Clinical Genetics Meeting
March 25-29, 2014, Nashville, TN