The myNEO ImmunoEngine neoantigen prediction platform builds on years of development to identify as many antigens as possible with confidence. Its filters and ranking mechanisms, which are based on biological processes, retain the neoantigen targets that provide the most clinical benefit in terms of response strength, duration (memory induction), breadth, and overall survival.
The advanced features are divided among three modules, according to their purpose:
Tumour-specific variant: to identify all the tumour-specific alterations with high confidence based on the sequencing data of the tumour.
Tumour-specific surface peptide: to predict which of the identified alterations will lead to changes in the presented antigen repertoire on the surface of the tumour cells.
Immunogenicity response: to evaluate the immunogenic impact of a therapy against the selected peptides.
A non-synonymous Single Nucleotide Variant (SNV) occurs when a mutation alters a single nucleotide, producing a codon that encodes a different amino acid in the translated peptide. In tumours with a high mutational burden, somatic SNVs can occur at rates of up to 10 mutations per megabase. Since SNVs are a highly important source of cancer mutations, myNEO’s pipeline has been optimised to detect SNVs with high precision and sensitivity.
An indel is caused by the insertion or deletion of a few nucleotides, potentially causing a frameshift and thus altering all subsequent codons in the transcript. This event occurs less frequently than SNVs, but is responsible for a large share of the final list of mutant peptides.
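The frameshift condition can be made concrete: an indel shifts the reading frame only when the net number of inserted or deleted nucleotides is not a multiple of three. The following is a minimal sketch of that check; the function name `is_frameshift` and the VCF-style `ref`/`alt` arguments are illustrative assumptions and are not taken from the myNEO pipeline.

```python
def is_frameshift(ref: str, alt: str) -> bool:
    """Return True if replacing `ref` with `alt` shifts the reading frame.

    `ref` and `alt` are the reference and alternate alleles written as
    nucleotide strings (hypothetical, VCF-style inputs).
    """
    # The frame is preserved only when the net length change is a multiple of 3.
    return (len(alt) - len(ref)) % 3 != 0


# Illustrative alleles only.
print(is_frameshift("A", "AT"))    # one inserted base
print(is_frameshift("ACGT", "A"))  # three deleted bases
```

An in-frame indel still changes the peptide, but only locally; a frameshift changes every downstream codon, which is why it can contribute many mutant peptides.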
Chromosomal rearrangement events are frequent in cancer and can result in gene fusions, where parts of two different genes find themselves joined. On average 34 gene fusions can be detected in a tumour sample, depending on the tumour type. These gene fusions can result in chimeric proteins and are thus an interesting source of neoantigens, which the myNEO pipeline takes into account.
Alternative splicing events are common in cancer, and result in the expression of unusual transcripts (‘neoisoforms’). These novel isoforms can be caused by a variety of events in the tumour, such as intron retention or exon skipping. In addition to potentially contributing to the oncogenic process, neoisoforms constitute a potential source of targetable neoantigens. myNEO utilises its own set of tools to identify these neoisoforms and the neoantigens that arise from them.
Transposable elements (TEs) are DNA sequences that can change their position within a genome, where they can alter gene expression or lead to the synthesis of chimeric proteins. The disruption of cellular control mechanisms during cancer, such as TE activation through DNA demethylation, facilitates novel insertion events of transposable elements. As such, the myNEO ImmunoEngine includes detection capabilities for TE insertion events and the neoantigens potentially derived from them.
Neoantigens can also derive from non-canonical or cryptic peptides, including those translated from alternative open reading frames, novel exon-exon junctions, intronic sequences and 5′ untranslated regions (5′UTR). Since these non-canonical peptides are recognised by the anti-tumour cytotoxic T-lymphocytes, T-cell-mediated surveillance of the integrity of the genome extends to some intronic regions. These ncRNA mutations are therefore also taken into account.
In recent years, the paradigm that genomic abnormalities in cancer cells arise through progressive accumulation of mutational events has been challenged by the discovery of single catastrophic events (chromoanagenesis). Examples of these events are chromothripsis (chromosomal breaks at multiple points followed by random reassembly), chromoanasynthesis (rearrangements due to defective DNA replication), and chromoplexy (accumulation of linked translocations involving multiple chromosomes). Due to the high impact of these events, it is important to take their occurrence into account when considering neoantigens.
Tumours have been shown to alter their peptide presentation machinery, causing a different antigen repertoire to be presented than expected. As an example, defects in the antigen-processing machinery are often caused by deficiency of the peptide transporter TAP. In these tumours, certain antigens (TEIPP antigens) are selectively presented.
RNA editing is a post-transcriptional process that modifies the primary RNA and microRNA transcripts. This process can result in nonsynonymous protein-coding substitutions or alternative splicing, and thus alters the antigen repertoire of the cell. Interestingly, deregulated RNA editing contributes to cancers, and over-editing gives rise to tumour-associated self-antigens. It has been shown that T-cells with cytotoxic reactivity against RNA-edited peptides are naturally present in cancer tissue, and thus in patients, without evidence of severe side effects.
Phosphorylation is a post-translational modification of serine, threonine or tyrosine residues within a peptide, often observed on regulatory proteins. Deregulation of signalling pathways is a hallmark of malignant transformation, and the amount of phosphopeptides present is thus a measure of the aggressiveness and malignancy of the tumour sample. Phosphopeptides can serve as targetable antigens, due to their prevalence in tumour samples. These modified peptides can be presented on MHC I and II molecules and be specifically recognised by T-cells. The research group of Cobbold (2013) observed that CD8+ T-cell lines specific for these phosphopeptides recognised and killed leukaemia cell lines.
Other types of post-translational modifications (e.g. methyl, disaccharide, and N-linked GlcNAc groups) have demonstrated similar characteristics.
DNA copy number variants (CNV), as one of the types of DNA mutations, have been associated with various human cancers. CNVs vary in size from 1 bp up to one complete chromosome arm, and can cause aberrant patterns to be seen when considering the allelic frequency (and clonality by extension) of predicted variants.
Transcription is the process by which the information in a strand of DNA is copied into a new molecule of messenger RNA (mRNA). Therefore, sequencing of mRNA can be used to infer expression across the transcriptome. In order to retain only expressed somatic mutations, the final set of myNEO variants is obtained by intersecting the variants detected in the tumour DNA with those detected at the RNA level.
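Keeping only variants that are seen in both the tumour DNA and the tumour RNA amounts to a set intersection. The sketch below shows this under the assumption that a variant is identified by chromosome, position, reference allele and alternate allele; the `Variant` type, the function name and the example calls are hypothetical.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def expressed_variants(dna_calls: Iterable[Variant],
                       rna_calls: Iterable[Variant]) -> List[Variant]:
    """Return variants called in both the DNA and the RNA data, sorted."""
    return sorted(set(dna_calls) & set(rna_calls))


# Made-up calls for illustration.
dna_calls = [Variant("chr1", 100, "A", "T"), Variant("chr2", 250, "G", "C")]
rna_calls = [Variant("chr1", 100, "A", "T"), Variant("chr3", 75, "C", "T")]

expressed = expressed_variants(dna_calls, rna_calls)
print(expressed)
```

A variant found only in the DNA (not expressed) or only in the RNA (no somatic DNA support) is dropped.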
The variant allelic frequency (VAF) can be defined as the fraction of reads observed matching a specific variant allele. Low VAFs may suggest low tumour purity or the presence of subclonal populations of cancer cells (commonly known as tumour heterogeneity). To ensure the identification of mutations that are present in a large fraction of the cancer cells, mutations with high VAFs are prioritised in myNEO’s pipeline.
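As a worked illustration of the definition above, the VAF is the number of reads supporting the variant allele divided by the total number of reads covering the position. The sketch below computes it and sorts candidate variants from high to low VAF; the read counts and variant labels are invented, and the actual prioritisation rule used in the pipeline is not specified here.

```python
def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a position that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


# Hypothetical observations: (label, variant-supporting reads, total reads).
observations = [("var_a", 45, 100), ("var_b", 6, 120), ("var_c", 30, 80)]

# Sort so that variants with the highest VAF come first.
ranked = sorted(
    observations,
    key=lambda obs: variant_allele_frequency(obs[1], obs[2]),
    reverse=True,
)
for label, alt, total in ranked:
    print(label, round(variant_allele_frequency(alt, total), 3))
```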
To maximise the true positive fraction in the variant calling analysis, an ensemble approach is taken, combined with custom filters based on benchmarking studies. This ensures that only high confidence alterations are kept for further investigation.
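One common way to build an ensemble of variant callers is to keep a variant only when a minimum number of callers report it. The text does not state which callers or which combination rule the pipeline uses, so the sketch below is an assumption: it shows a simple support-count vote, with placeholder caller names and variant labels.

```python
from collections import Counter
from typing import Dict, Set


def ensemble_calls(calls_per_caller: Dict[str, Set[str]],
                   min_support: int = 2) -> Set[str]:
    """Keep variants reported by at least `min_support` callers."""
    support = Counter()
    for calls in calls_per_caller.values():
        # Each caller contributes at most one vote per variant (sets).
        support.update(calls)
    return {variant for variant, votes in support.items() if votes >= min_support}


# Placeholder caller names and variant labels.
calls = {
    "caller_a": {"chr1:100:A>T", "chr2:250:G>C"},
    "caller_b": {"chr1:100:A>T", "chr3:75:C>T"},
    "caller_c": {"chr1:100:A>T", "chr2:250:G>C"},
}
consensus = ensemble_calls(calls, min_support=2)
print(sorted(consensus))
```

Raising `min_support` trades sensitivity for precision; any custom filters derived from benchmarking would be applied on top of such a consensus set.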
Not every mutation results in an aberrant protein with altered functionality. Therefore, myNEO’s pipeline assesses the impact of each mutation on normal protein function. To do so, only nonsynonymous mutations resulting in missense variants are considered. In addition, the resulting amino acid substitution is annotated and evaluated for functional impact.
Large datasets containing mutations detected in a wide variety of cancer types are readily available. Using these databases, myNEO’s pipeline checks whether a mutation has been reported before. In addition, databases are used to evaluate whether a mutation affects a known cancer driver gene.
Somatic mutations are genetic alterations acquired by cancer cells that have not been inherited and are thus absent from all other cells. It is important that only somatic mutations are considered, as these mutations result in peptides that are unique to the tumour and absent from all healthy tissue. If the mutation is also present in the genome of healthy cells (a germline mutation), targeting it can trigger autoimmune reactions.
Mutagens are chemical agents (such as those in tobacco smoke) or forms of radiation (such as ultraviolet light or X-rays) that cause irreversible mutations in DNA. These mutagens tend to alter the DNA in a specific manner, resulting in a unique mutational signature. Based on the similarity between the mutational profile of the tumour and the known mutational signature of a mutagen, mutations can be prioritised. For example, a UV signature has a high percentage of C→T mutations. In a tumour with a confirmed UV signature, C→T mutations are therefore prioritised because they are less likely to be sequencing artefacts.
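The comparison between a tumour's mutational profile and a known signature can be sketched with cosine similarity, a widely used measure for this purpose. The text does not say which similarity measure is used, so this choice is an assumption, and both vectors below are invented numbers over the six substitution classes rather than a published signature.

```python
import math
from typing import Sequence

# Order of the six base-substitution classes used for both vectors.
CLASSES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-negative vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Invented example: mutation counts in a tumour and a C>T-dominated signature.
tumour_profile = [5, 3, 80, 4, 6, 2]
ct_rich_signature = [0.02, 0.02, 0.90, 0.02, 0.03, 0.01]

similarity = cosine_similarity(tumour_profile, ct_rich_signature)
print(round(similarity, 3))
```

A similarity close to 1 would support treating the substitution class that dominates the signature (here C→T) as more likely to be genuine.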
Cancer progression is an evolutionary process driven by stepwise, somatic cell mutations with sequential, sub-clonal selection. Therefore, a tumour does not consist of only one cancer cell genotype, but rather comprises a multitude of different cancer cell subpopulations. Mutations that are present in the majority of the cancer cells are called clonal. Subclonal mutations, on the other hand, can only be found in a small subpopulation of the cancer cells. In order to target most of the cancer cells and decrease the odds of tumour escape, clonal mutations are prioritised in the myNEO pipeline.
The efficiency at which mRNA is translated to proteins depends on multiple factors. These include specific mRNA signatures and the pool of translation components like tRNAs, mRNAs and translational factors. For vaccine design, we consider mRNA sequence, i.e. the choice of codons, to optimise thermodynamic stability and translational control of mRNA molecules.
Another determining factor of peptide abundance in the neoantigen repertoire is the turnover time of the originating protein. Longer-lived proteins have a higher likelihood of being picked up by the proteasome and being processed and presented on the tumour surface bound to MHC molecules.
An essential step in peptide presentation is the degradation of the originating protein by the (immuno)proteasome at proteasomal cleavage sites. Modelling the probability of proteasomal degradation could thus lead to an improved prediction of the final antigen repertoire on the tumour.
After proteasomal degradation, the peptides are transported to the cell surface to be presented. Modelling this transportation process could lead to an extra factor to consider when determining the final antigen repertoire.
The transporter associated with antigen processing (TAP) protein complex plays an important role in the transport of peptide fragments to MHC class I molecules. Thus, when considering the probability that a peptide will be present in the MHC-I antigen repertoire, binding of the peptide to the TAP protein complex should be taken into account.
The largest factor determining whether a peptide will be present in the antigen repertoire of a patient is the binding affinity of that peptide to the MHC molecules of that specific patient. The myNEO ImmunoEngine assesses both alleles of the MHC-I and MHC-II genes and predicts neoantigen-MHC affinity.
It has been postulated that the presence of a peptide in the antigen repertoire depends not only on its affinity to the MHC molecules, but also on the stability of that binding. In addition, neoantigen-MHC stability has been shown to correlate with the immunogenicity of the neoantigen.
myNEO has developed a deep-learning algorithm, applicable across patients, that predicts the probability of peptide presence on the tumour surface in a holistic approach, without propagating the uncertainty that comes from approximating each individual step of antigen presentation. This model, named the neoMS algorithm, is based on patterns detected in over 2.5 million mass spectrometry ligandome data points. Because the neural network is trained on MS data representative of the complete antigen presentation process (including peptide processing, peptide transportation and localisation, and MHC binding), it predicts a more complete process than counterparts that use biochemical data based on just one of these steps (mostly the MHC-binding affinity). It also offers an alternative to per-patient mass spectrometry validation experiments.
When selecting a preferred peptide (neoantigen) for vaccine production, it is of utmost importance that this peptide is only expressed in tumour tissue. Presence of this peptide on a normal cell can either cause autotoxicity responses after vaccination (i.e. the immune system attacks its own cells) or render the vaccine non-immunogenic due to central tolerance mechanisms. To avoid these undesired effects, efforts are taken to verify that the selected epitopes are not present in the patient’s normal peptidome.
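The check against the normal peptidome can be pictured as an exact-match lookup: a candidate is discarded if the same amino acid sequence occurs anywhere in the patient's normal protein sequences. The sketch below is a simplified assumption of such a filter (real pipelines may also consider near matches); the function names and the toy sequences are made up.

```python
from typing import Dict, Iterable, List, Set


def kmers(proteins: Iterable[str], k: int) -> Set[str]:
    """All length-k substrings of the given protein sequences."""
    return {p[i:i + k] for p in proteins for i in range(len(p) - k + 1)}


def tumour_specific(candidates: Iterable[str],
                    normal_proteins: List[str]) -> List[str]:
    """Keep candidate peptides that do not occur in any normal protein."""
    cache: Dict[int, Set[str]] = {}
    kept = []
    for peptide in candidates:
        k = len(peptide)
        if k not in cache:
            # Build the lookup set once per peptide length.
            cache[k] = kmers(normal_proteins, k)
        if peptide not in cache[k]:
            kept.append(peptide)
    return kept


# Toy sequences for illustration only.
normal_proteins = ["MKTAYIAKQR"]
candidates = ["TAYIAKQR", "TAYIVKQR"]  # second differs by one residue

print(tumour_specific(candidates, normal_proteins))
```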
Molecular mimicry involves the cross-activation of T- or B-cells against tumour neoantigens that show high similarity with microbial-derived peptides. As the microbial-derived peptides are able to elicit a strong anti-cancer immune response due to cross-reactive memory T-cells, this molecular mimicry opens new opportunities for the identification of bacterial antigens resembling tumour neoantigens, or vice versa, for the development of personalised cancer vaccines.
A promising feature incorporated in the myNEO pipeline is the assessment of neopeptide immunogenicity. Although the dissimilarity of neopeptides to self is already maximised, matching neoantigens to a database of known bacteria- and virus-derived immunogenic peptides ensures the selection of neoantigen candidates with the highest immunogenic potential.
One important step in the induction of an effective immune response is the interaction between the peptide-MHC complex and the T-cell receptor (TCR). It is becoming increasingly understood that not all well-presented peptides are strongly immunogenic, suggesting the existence of peptide features that influence T-cell recognition independently of peptide processing and presentation. Therefore, in addition to presentation by MHC molecules, structural features of the peptide are considered for predicting neoantigen immunogenicity.
Not all reads in the sequencing dataset of the tumour biopsy are attributable to tumour cells. Depending on the purity of the sample, certain sequencing reads can be attributed to either normal neighbouring tissue or infiltrating immune cells. The reads of the latter are used to gain more information about the immune system activity.
The ImmunoEngine also analyses the tumour microenvironment and its immune signature, contributing to a deeper understanding of the tumour phenotype. The levels of immune cell infiltration are determined for factors both positive and negative to patient outcome (e.g. CD8 T-cells and Treg cells respectively).
The transcriptional profile of the tumour is analysed for an exhaustive set of cancer driver genes, immune-related genes, and CTA genes. This profile is compared with that of other specimens of the same tumour type, and with the mean values across other malignancies.
Various clinical patient values, besides the sequencing datasets, are also incorporated into the myNEO platform for optimal analysis. These include cancer type, disease stage, primary and metastatic tumour sites, histological diagnosis, and previous treatments.
Construct windows surrounding the epitope are selected to target both CD4 and CD8 T-cells. Candidates producing multiple epitopes are prioritised, and depending on the scenario, one of three ranking mechanisms is used for the windows. The max, mean, and supermax window scoring functions respectively examine the highest score of all epitopes the window contains, the mean score of all epitopes it contains, or a convolution maximising the number of highly scored epitopes it contains.
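The three window scoring functions can be sketched as follows. The max and mean functions follow directly from the description. The supermax function is only described as a convolution maximising the number of highly scored epitopes, so the version here is an assumed simplification that counts the epitopes at or above a threshold; the threshold value, the window names and the scores are all hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence


def max_score(scores: Sequence[float]) -> float:
    """Highest epitope score in the window."""
    return max(scores)


def mean_score(scores: Sequence[float]) -> float:
    """Mean epitope score in the window."""
    return mean(scores)


def supermax_score(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Assumed simplification: number of epitopes scoring at or above threshold."""
    return sum(1 for s in scores if s >= threshold)


def rank_windows(windows: Dict[str, List[float]],
                 scorer: Callable[[Sequence[float]], float]) -> List[str]:
    """Window names ordered from best to worst under the given scorer."""
    return sorted(windows, key=lambda name: scorer(windows[name]), reverse=True)


# Hypothetical epitope scores per construct window.
windows = {
    "window_1": [0.9, 0.1, 0.2],    # one strong epitope
    "window_2": [0.6, 0.7, 0.55],   # several moderately strong epitopes
}

for scorer in (max_score, mean_score, supermax_score):
    print(scorer.__name__, rank_windows(windows, scorer))
```

In this toy example the max function favours the window with a single strong epitope, while the mean and supermax functions favour the window that carries several good epitopes, which illustrates why the choice depends on the scenario.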
mRNAs encoding the same polypeptide via different codon assignments can vary dramatically in the amount of protein translated (Nguyen 2004; Angov 2011; Zhao 2017). Furthermore, synonymous codon changes can affect protein conformation and stability, change sites of post-translational modifications, and alter protein function (Hanson 2018; Mauro 2014). Once the final peptide sequence is known, it is thus key to select the correct codons to encode this epitope, which are not necessarily those of the nucleotide sequence detected in the tumour.
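To illustrate that the same peptide can be encoded by different nucleotide sequences, the sketch below back-translates a peptide using the standard genetic code, once with default codons and once with a table of preferred codons. The preference table is a hypothetical placeholder, not a measured codon-usage table, and the function names are illustrative; a real optimisation would also weigh factors such as mRNA stability. Sequences are written in the DNA alphabet (T instead of U).

```python
from itertools import product
from typing import Dict, List, Optional

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous_codons(aa: str) -> List[str]:
    """All codons for one amino acid, in alphabetical order."""
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


def back_translate(peptide: str,
                   preferred: Optional[Dict[str, str]] = None) -> str:
    """Encode a peptide, using a preferred codon where one is given."""
    preferred = preferred or {}
    codons = []
    for aa in peptide:
        codon = preferred.get(aa, synonymous_codons(aa)[0])
        if CODON_TABLE[codon] != aa:
            raise ValueError(f"{codon} does not encode {aa}")
        codons.append(codon)
    return "".join(codons)


peptide = "SIINFEKL"
hypothetical_preferences = {"L": "CTG", "S": "TCC"}  # placeholder values

default_seq = back_translate(peptide)
preferred_seq = back_translate(peptide, hypothetical_preferences)
print(default_seq)
print(preferred_seq)
```

Both sequences translate back to the same peptide even though they differ at the nucleotide level, which is the degree of freedom that codon selection exploits.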