TUDP: genomics, bioinformatics and knowledge for undiagnosed diseases

Fondazione Telethon’s TUDP combines genomics, bioinformatics and research to identify the genetic causes of rare paediatric diseases without a diagnosis.

Some children and families go through diagnostic journeys lasting as long as eight years, without ever receiving an answer. Vincenzo Nigro, Professor of Medical Genetics at the University of Campania “Luigi Vanvitelli” and coordinator of the Telethon Undiagnosed Diseases Programme (TUDP) at TIGEM, calls them “impossible cases”: patients who have already undergone all available genetic tests and whom no centre has been able to diagnose. Since 2016, TIGEM’s TUDP has been addressing these cases through an advanced research approach that integrates three complementary genomic platforms, DNA and genetic-molecular analysis, bioinformatics and international comparison.

The results, documented in a study published in Genetics in Medicine OPEN, are clear: out of more than 1,300 paediatric cases collected across 22 Italian clinical centres between 2016 and 2023, 49% now have an identified genetic cause, with 16 new disease genes published and a median time of 18 months from enrolment to answer. More than 330 genes are involved, and the programme has generated 74 scientific publications.

“Our work is not simply to make a diagnosis, but to carry out very deep research where diagnoses have failed. Our dimension is knowledge” explains Nigro.

What is new is the structured process, identical for every patient: the programme accepts only severe and significant paediatric cases and analyses them as trios, that is, together with both biological parents. The technologies used are advanced and continuously updated, and the institute is ready to adopt new ones as they become available.

This is an approach that does not merely seek answers for individual patients, but generates new scientific knowledge: every solved case can rewrite a chapter of medical genetics, and every unsolved case opens the way to new hypotheses and new avenues of research.

The genomic technologies behind the TUDP project

The programme involves collecting samples from patients and their families and analysing them through a complex diagnostic and research pathway. The integrated approach begins with a detailed phenotypic analysis, which includes the clinical course of the disease, clinical, biochemical and psychological assessments, growth curves and, in the case of motor disorders, movement images. All this is carried out at an extremely high level of detail: this set of characteristics is assigned unique codes, which are useful for identifying other similar patients.
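One way to picture how coded phenotypes help find similar patients is a set-overlap score. The following is a minimal sketch, not the programme's actual method: it assumes each patient is described by a set of phenotype codes and uses Jaccard similarity as an illustrative measure. The codes (`P001`, ...), patient identifiers and function names are all hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of codes in common: |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_similar(query: set[str], cohort: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Return (patient_id, score) pairs, most similar first."""
    scores = [(pid, jaccard(query, codes)) for pid, codes in cohort.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical phenotype codes for an undiagnosed patient and two other patients.
undiagnosed = {"P001", "P002", "P003", "P004"}
cohort = {
    "patient_A": {"P001", "P002", "P003"},
    "patient_B": {"P004", "P009"},
}

ranking = rank_similar(undiagnosed, cohort)
print(ranking)  # [('patient_A', 0.75), ('patient_B', 0.2)]
```

A plain overlap score treats every code as equally informative. A real comparison would probably weight rare or specific features more heavily.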

After this step, the programme moves on to genetic analysis using three technologies: short-read sequencing, long-read sequencing and optical mapping.

All steps are carried out exclusively as trio analyses, meaning that samples are collected and analysed from the patient and both biological parents. Seventy per cent of rare diseases of genetic origin are caused by de novo mutations, which are absent in both parents and appear for the first time in the child. Comparing the three DNA samples makes it possible to identify these mutations rapidly, reduce background noise, determine which parental allele each variant comes from and improve the overall quality of the reads. Without trio analysis, detection power falls more than fourfold. For this reason, when the biological parents are not available, the programme can only offer standard diagnostics.
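The logic of trio comparison can be sketched in a few lines. This is a simplified illustration, not the programme's pipeline: it assumes biallelic variants with VCF-style genotype strings (`0/0`, `0/1`, `1/1`, `./.` for missing), and the variant names and `TrioCall` structure are hypothetical. A variant is kept as a de novo candidate only if the child carries the alternate allele and both parents are confidently homozygous for the reference allele.

```python
from dataclasses import dataclass

HOM_REF = {"0/0", "0|0"}  # both alleles match the reference


@dataclass(frozen=True)
class TrioCall:
    variant: str  # hypothetical label, e.g. "chr2:200 C>T"
    child: str    # VCF-style genotype strings
    mother: str
    father: str


def carries_alt(genotype: str) -> bool:
    """True if at least one allele is the alternate allele (biallelic sites only)."""
    return "1" in genotype.replace("|", "/").split("/")


def is_de_novo_candidate(call: TrioCall) -> bool:
    """Child carries the variant; both parents are homozygous reference.

    A missing parental genotype ("./.") is not in HOM_REF, so the call is
    rejected rather than assumed to be de novo.
    """
    return (
        carries_alt(call.child)
        and call.mother in HOM_REF
        and call.father in HOM_REF
    )


calls = [
    TrioCall("chr1:100 A>G", child="0/1", mother="0/1", father="0/0"),  # inherited
    TrioCall("chr2:200 C>T", child="0/1", mother="0/0", father="0/0"),  # de novo
    TrioCall("chr3:300 G>A", child="0/0", mother="0/1", father="0/0"),  # not in child
    TrioCall("chr4:400 T>C", child="0/1", mother="./.", father="0/0"),  # parent missing
]

de_novo = [c.variant for c in calls if is_de_novo_candidate(c)]
print(de_novo)  # ['chr2:200 C>T']
```

The last example shows why the parents matter: with one parental genotype missing, the variant cannot be classified as de novo. In practice a filter like this would also check read depth and genotype quality in all three samples before trusting a call.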

At the beginning of the project, TIGEM researchers analysed patients’ exomes, meaning only the coding portions of the genome, excluding the regions between genes and the non-coding sequences known as introns. Even today, many diagnostic laboratories still work by sequencing the exome.

However, the exome accounts for only 2% of the entire genome: the remaining 98% non-coding portion includes elements that may be potential causes of disease, but are much more difficult to interpret. “The exome of these patients has already been analysed during their previous diagnostic journey. This is why our work focuses on whole genome sequencing” explains Nigro.

The first technological approach is short-read sequencing: a complete sequencing of the human genome, but in fragments, carried out using Illumina technology and NovaSeq instruments. Although highly accurate, it cannot resolve repeated regions and does not work in certain areas: “this approach can read 80–85% of the entire genome” explains Nigro.

To overcome this limitation, two further technologies are needed. The second is long-read sequencing with Oxford Nanopore technology, which reads DNA electrically: the molecule passes through a pore and alters the flow of electric current, making it possible to identify each DNA base in transit and its methylation status. The fragments read are on average 15,000–20,000 nucleotides long, around 100 times longer than those read by short-read sequencing. “The second system has a much higher error rate, so it cannot replace short-read sequencing, but it can complement it. The two techniques are complementary” comments Nigro.

The third technique, optical mapping using Bionano technology, analyses the distribution of DNA fragments without reading them. The molecules, labelled with fluorescent signals at specific positions, are passed through microfluidic channels. The resulting pattern makes it possible to reconstruct the chromosomal structure with a resolution of up to 500 bases, revealing translocations, deletions and inversions that sequencing techniques do not capture.

“We have the very small dimension of the first reading, the intermediate dimension of long-read sequencing, and finally the larger dimension of optical mapping. Integrating these three elements allows us to completely reconstruct a genome from the first to the last nucleotide” comments Nigro.

DNA analyses are complemented by RNA analyses, which can also be performed using short or long fragments. Long-read sequencing is particularly useful when a gene presents alternative splicing, because it makes it possible to reconstruct the different splicing variants across tissues.

From data analysis to interpretation: bioinformatics, AI and human factors

The data produced by the three genomic platforms and by RNA analysis must be processed through a multi-step bioinformatics pipeline. For short reads, protocols are well established internationally; for long reads, however, standardisation is still immature and the protocols remain partly bespoke, making the analysis more complex.

“Artificial intelligence — through tools such as the VarGenius pipeline, developed internally at TIGEM, or EMEDGENE (Illumina) — helps us solve cases where the gene is already known. But in most TUDP cases, the gene responsible for the disease does not appear in the ranking, because it is still unknown and the system cannot make predictions about something that is absent from the scientific literature” explains Nigro.

Once the bioinformatics analysis has narrowed the field down to one or two candidate genes, the programme launches an international call: the patient’s anonymised data and the variants in these genes are shared with research groups in other countries to look for a second case with the same genetic defect and a similar phenotype. “Sometimes, from a single patient, it is possible to identify as many as 10–15 cases, which confirms that the discovery is robust” comments Nigro. Without this confirmation, the research remains pending.

Unresolved cases are not abandoned: they are systematically reanalysed every year, drawing on updates in bioinformatics technologies, the publication of new disease genes in the international literature, and collaboration with the European Solve-RD project. Among the 425 cases reanalysed as of December 2023, this process produced a 17.2% diagnostic increase.
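As a rough reading of these figures, the reanalysis yield can be converted into a number of cases. The calculation below assumes that the 17.2% refers to the share of the 425 reanalysed cases that received a diagnosis. The text does not state this explicitly, so the result is only an estimate.

```python
# Figures from the text; the interpretation of the percentage is an assumption.
reanalysed_cases = 425
diagnostic_increase = 0.172  # assumed: fraction of reanalysed cases newly diagnosed

additional_diagnoses = round(reanalysed_cases * diagnostic_increase)
print(additional_diagnoses)  # 73
```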

Ultimately, the limiting factor remains human interpretation. Each individual carries around 5 million DNA variants compared with others: identifying which one is responsible for the disease requires experience, analytical ability and in-depth investigation that no algorithm can replace.

Facing the unknown: unknown genes, VUS and validation

Not all cases solved by the TUDP are the same. In some, the cause is an already known gene, but with a clinical presentation so different from what was expected — or with such an atypical mutation — that it had not been identified by standard tests. In the most scientifically relevant cases, however, the gene had never previously been associated with any disease, and the condition itself had not yet been described.

These are the most difficult cases, and the reason lies in the numbers: every human genome contains thousands of variants of uncertain significance (VUS), variants for which it is not possible to establish whether they are pathogenic or entirely harmless. When one of these variants falls within a gene whose function is unknown, interpretation becomes enormously more complex. “The mutation may be visible, but alongside many others: the problem is understanding which one is actually responsible for the disease” explains Nigro.

Confirming a candidate variant, however, is not straightforward. If no other patients are found, the process moves on to biological validation: the suspected mutation is reproduced in animal models, mainly zebrafish (Danio rerio) and mice (Mus musculus), or in cellular models, including patient-derived iPS cells. This can take anything from a few months to one or two years.

The RNU4-2 case illustrates the entire pathway. The patients, children with neurodevelopmental disorders, had no mutations in coding genes. By extending the analysis to non-coding regions, researchers discovered that they all shared a de novo mutation in RNU4-2, a gene that produces a molecule involved in regulating RNA splicing. The mutation, initially classified as being of uncertain significance, recurred in several patients. An international call was launched, and the TIGEM group solved eleven cases at once. Today, RNU4-2 syndrome is considered one of the most common neurodevelopmental syndromes in this type of patient.

From diagnosis to therapy, and the challenges that remain

When the programme identifies the genetic cause, the patient receives a clinical report validated according to ISO procedures: the mutation is confirmed on an independent sample, and the result is communicated to the family together with genetic counselling and, where possible, guidance towards a therapeutic pathway.

Knowing the gene opens up a concrete perspective. In some cases, conventional drugs may already exist that are effective on the mechanism of action involved. For the rarest mutations, TIGEM is exploring an N-of-one approach: a therapy targeted not at the disease, but at the individual mutation, using antisense oligonucleotides capable of correcting the genetic defect. At present, no patients have been formally enrolled as candidates for this therapy, but the group coordinated by Diego di Bernardo and Nicola Brunetti-Pierri at TIGEM has already selected several promising TUDP profiles to begin preclinical studies.

The most severe cases, moreover, do not remain isolated: they represent the tip of an iceberg of less dramatic situations, in which milder mutations in the same genes give rise to intermediate phenotypes and potentially more common diseases. Studying extreme cases can therefore open up our understanding of a much broader and still largely unexplored territory. As Nigro points out, “we think that reading means understanding, but in fact we read and yet do not understand most genes”. Of around 63,000 human genes, a cause-and-effect relationship with a disease is known for fewer than 5,000.

To move beyond these limits, TIGEM is developing new genomic analysis technologies capable of detecting single-cell mosaicism and tissue-specific methylation states: alterations that current platforms do not capture and that could be responsible for clinical pictures that remain unexplained today. But technology alone is not enough. Genomic analyses of this complexity require large numbers and centralisation: “If you do not work with large numbers, you cannot clearly see all the problems and pitfalls. With small numbers, the risk is making big mistakes” warns Nigro.

Looking ahead, the programme is therefore focused on four objectives: transferring this expertise to the National Health Service, deepening the study of candidate genes through animal and cellular models, identifying patients for targeted therapeutic pathways, and strengthening diagnostics with next-generation instruments. The new phase of the Fondazione Telethon project will be structured precisely in this direction, offering researchers at TIGEM and other institutes the opportunity to carry out targeted investigations into the most promising genes.
