The following nomenclature may be used to describe both Regulatory Fusions and Chimeric Transcript Fusions in the context of Categorical Gene Fusions or Assayed Gene Fusions, as applicable. The nomenclature components are organized into three categories: Gene Components, Transcript Sequence Components, and Regulatory Nomenclature. These may be used interchangeably, in accordance with the General Rules below.
All components are joined together by the double-colon (::) operator. Additional rules apply for sub-components of Regulatory Nomenclature.
When describing Chimeric Transcript Fusions, structural components are ordered in 5’ to 3’ orientation with respect to the transcribed gene product.
When describing Regulatory Fusions, the regulatory element is indicated first (e.g. reg_e@GATA2::EVI1).
When describing Chimeric Transcript Fusions by Junction Components (in lieu of full Transcript Segment Components), the 5’ fusion partner junction must be the first component, and the 3’ fusion partner junction must be the last component.
Throughout the nomenclature components, some information may be provided optionally. In these cases, the optional text is colored orange and may be omitted.
Some fusions are inferred from an assayed genomic rearrangement, typically in the context of a phenotypic presentation that is associated with the inferred gene fusion event. In these cases, the nomenclature may indicate that the fusion was inferred by placing parentheses around the double-colon operator (shown in red):
<Gene Symbol>(::)<Gene Symbol>
An example of this is provided in the Unknown Gene Component section.
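The joining rule above can be sketched in Python. This is a minimal illustration, not part of the nomenclature itself: the function name join_components, the requirement of at least two components, and the choice to parenthesize every operator when a fusion is inferred are assumptions, and GENE1 and GENE2 are hypothetical placeholder symbols.

```python
def join_components(components, inferred=False):
    """Join fusion components with the double-colon operator.

    When inferred is True, each operator is written as "(::)".
    """
    # Assumption: a fusion description has at least two components.
    if len(components) < 2:
        raise ValueError("a fusion needs at least two components")
    operator = "(::)" if inferred else "::"
    return operator.join(components)


# Regulatory element first, as in the General Rules example.
print(join_components(["reg_e@GATA2", "EVI1"]))
# Hypothetical inferred fusion between two placeholder genes.
print(join_components(["GENE1", "GENE2"], inferred=True))
```

The caller remains responsible for ordering: components are expected in 5’ to 3’ orientation, with any regulatory element first.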
Gene components are used in coarse representation of gene fusions by constituent gene partners, and are generally aligned with previous recommendations on gene-gene fusion nomenclature as provided by HGNC [Bruford2021], with attention paid to additional considerations discussed in our response to this publication [Wagner2021]. The most common of these is the Specific Gene Component, which is complemented by the Multiple Possible Gene Component (for Categorical Gene Fusions) and the Unknown Gene Component (for Assayed Gene Fusions).
Specific Gene Component
The syntax for a specific gene is as follows:
First use of a gene in a document: <Gene Symbol>(<Gene ID>)
Subsequent use in a document: <Gene Symbol>(<Gene ID>)
An example fusion using two Specific Gene Components:
Unknown Gene Component
The syntax for an unknown (typically inferred) gene component (used for Assayed Gene Fusions) is a
An example fusion using an unknown gene component may be inferred from an ALK break-apart assay:
Multiple Possible Gene Component
The syntax for a multiple possible gene component (used for Categorical Gene Fusions) is a
Transcript Sequence Components
Transcript sequence components are used in precise representation of gene fusions by sequence representations, and are designed for compatibility with the Human Genome Variation Society (HGVS) variant nomenclature. Primary among these components is the Transcript Segment Component, and the closely related 5’ and 3’ Junction Components. Additional components are used to represent intervening sequences, provided as a stand-alone literal sequence (Linker Sequence Component) or as a sequence derived from a Genomic Location (Templated Linker Sequence Component).
Transcript Segment Component
The Transcript Segment Component explicitly describes a segment of a transcript sequence by its start and end exons, and is represented using the following syntax:
<Transcript ID>(<Gene Symbol>):e.<start exon><+/- offset>_<end exon><+/- offset>
An omitted offset indicates that there is no offset from the segment boundary (which is often the case in gene fusions). For a full description of the use of exon coordinates and offsets, see Structural Elements.
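As an illustration of the syntax above, the following Python sketch parses a Transcript Segment Component with a regular expression. The names SEGMENT_PATTERN and parse_segment are hypothetical, the transcript ID and gene symbol in the example are placeholders, and the sketch assumes that the gene symbol is always present and that an omitted offset means zero.

```python
import re

# <Transcript ID>(<Gene Symbol>):e.<start exon><+/- offset>_<end exon><+/- offset>
SEGMENT_PATTERN = re.compile(
    r"^(?P<transcript>[^()\s:]+)"                    # transcript ID
    r"\((?P<gene>[^()\s]+)\)"                        # gene symbol in parentheses
    r":e\.(?P<start>\d+)(?P<start_offset>[+-]\d+)?"  # start exon, optional offset
    r"_(?P<end>\d+)(?P<end_offset>[+-]\d+)?$"        # end exon, optional offset
)


def parse_segment(text):
    """Split a Transcript Segment Component string into its parts."""
    match = SEGMENT_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a transcript segment component: {text!r}")
    return {
        "transcript": match["transcript"],
        "gene": match["gene"],
        "start_exon": int(match["start"]),
        # An omitted offset is read as no offset from the boundary.
        "start_offset": int(match["start_offset"] or 0),
        "end_exon": int(match["end"]),
        "end_offset": int(match["end_offset"] or 0),
    }


# Placeholder identifiers, not real records.
print(parse_segment("NM_000000.0(GENE1):e.2_5"))
print(parse_segment("NM_000000.0(GENE1):e.2+10_5-3"))
```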
Transcript segment components would be used, for example, to represent COSMIC Fusion 165 (COSF165) under the gene fusion nomenclature as follows:
The 5’ and 3’ Junction Components represent only 5’ and 3’ junction locations, respectively, for Chimeric Transcript Fusions. These components contrast with the Transcript Segment Component, which represents a full segment. As noted in the General Rules, these components must be used only as the beginning or ending components, respectively, for a fusion.
The syntax for these components follows:
5’ Junction Component: <Transcript ID>(<Gene Symbol>):e.<end exon><+/- offset>
3’ Junction Component: <Transcript ID>(<Gene Symbol>):e.<start exon><+/- offset>
Optional use of offsets has the same meaning as in the Transcript Segment Component.
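The positional rule for Junction Components can be checked mechanically. The sketch below is one possible approach, not a reference implementation: it treats any component with a single exon coordinate (no underscore-separated end exon) as junction-shaped and verifies that such components appear only in the first or last position. The names JUNCTION and check_junction_positions, and all identifiers in the examples, are hypothetical.

```python
import re

# Shared pieces: "<Transcript ID>(<Gene Symbol>):e." and "<exon><+/- offset>".
_PREFIX = r"[^()\s:]+\([^()\s]+\):e\."
_EXON = r"\d+(?:[+-]\d+)?"

# A junction has exactly one exon coordinate; a segment would have two.
JUNCTION = re.compile(rf"^{_PREFIX}{_EXON}$")


def check_junction_positions(components):
    """Return True if junction-shaped components occur only first or last."""
    last = len(components) - 1
    for index, component in enumerate(components):
        if JUNCTION.match(component) and index not in (0, last):
            return False
    return True


# Placeholder fusion: 5' junction, a full segment, then 3' junction.
valid = [
    "NM_000000.0(GENE1):e.3",
    "NM_222222.2(GENE3):e.2_5",
    "NM_111111.1(GENE2):e.4",
]
# Same components with a junction moved into the middle.
invalid = [valid[0], valid[2], valid[1]]

print(check_junction_positions(valid))
print(check_junction_positions(invalid))
```

Because the 5’ and 3’ forms share the same syntax, the sketch cannot tell them apart by shape; position in the fusion is what distinguishes them.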
In the description of gene fusions, at most one regulatory element component may be used to describe the fusion, and it must be designated first (see General Rules). However, regulatory components are complex data objects themselves, and may comprise multiple subcomponents which collectively describe the regulatory element of interest. This section specifies the nomenclature for defining regulatory elements, which may be used as a component in the broader description of Regulatory Fusions.
Every regulatory element component begins with a description of the regulatory element class, which is typically an enhancer or promoter. This is designated as reg_e or reg_p, respectively. In rare cases, it may be necessary to represent other classes of regulatory elements within the INSDC regulatory class vocabulary, which may be specified using this syntax by appending the regulatory class name to reg_ as applicable (e.g.
Feature ID subcomponent
A regulatory element may be described by reference to a registered identifier, such as the registered cis-regulatory elements from ENCODE. These are represented using the syntax:
An example registered enhancer element is reg_e_EH38E1516972.
Only one of a Feature ID OR a Feature location subcomponent may be specified.
Feature location subcomponent
A regulatory element may be described by reference to a Genomic Location. These are represented using the syntax:
<Chromosome ID>(chr <1-22, X, Y>):g.<start coordinate>_<end coordinate>
Only one of a Feature Location OR a Feature ID subcomponent may be specified.
Associated gene subcomponent
A regulatory element may also be described by reference to an associated gene. An associated gene is represented using the syntax:
First use of a gene in a document: @<associated gene symbol>(<associated gene ID>)
Subsequent use in a document: @<associated gene symbol>(<associated gene ID>)
An associated gene may be indicated in addition to, or in lieu of, a Feature ID subcomponent or Feature location subcomponent. If representing a regulatory element without an associated feature ID or feature location subcomponent, an associated gene subcomponent MUST be used. The associated gene subcomponent is always placed at the end of the regulatory element description.
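The subcomponent rules above can be summarized in a short Python sketch. The function name build_regulatory_element and its parameters are hypothetical. The underscore before the Feature ID is inferred from the example reg_e_EH38E1516972; using the same separator before a Feature location is an assumption.

```python
def build_regulatory_element(element_class, feature_id=None,
                             feature_location=None, associated_gene=None):
    """Assemble a regulatory element description from its subcomponents."""
    # Only one of a Feature ID or a Feature location may be specified.
    if feature_id is not None and feature_location is not None:
        raise ValueError("specify a Feature ID or a Feature location, not both")
    # Without either feature subcomponent, an associated gene MUST be given.
    if feature_id is None and feature_location is None and associated_gene is None:
        raise ValueError("an associated gene is required when no feature is given")

    parts = [f"reg_{element_class}"]
    if feature_id is not None:
        parts.append(f"_{feature_id}")
    if feature_location is not None:
        # Assumption: the location uses the same "_" separator as a Feature ID.
        parts.append(f"_{feature_location}")
    if associated_gene is not None:
        # The associated gene is always placed at the end.
        parts.append(f"@{associated_gene}")
    return "".join(parts)


print(build_regulatory_element("e", feature_id="EH38E1516972"))
print(build_regulatory_element("e", associated_gene="GATA2"))
```

The second call reproduces the regulatory element used in the General Rules example, which can then be joined to a gene component with the double-colon operator.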
Bruford EA, et al., HUGO Gene Nomenclature Committee (HGNC) recommendations for the designation of gene fusions. Leukemia (October 2021). doi:10.1038/s41375-021-01436-6
Wagner AH, et al., Recommendations for future extensions to the HGNC gene fusion nomenclature. Leukemia (December 2021). doi:10.1038/s41375-021-01493-x