Listed below is the main software developed by InfOmics lab members and released to the bioinformatics community. For a complete list of the software developed by our lab, click here.
CRISPRme is a tool for comprehensive off-target assessment, available as an online web application, an offline application, and a command-line tool. It integrates human genetic variant datasets with orthogonal genomic annotations to predict and prioritize CRISPR-Cas off-target sites at scale. The method considers both single-nucleotide variants (SNVs) and indels, accounts for bona fide haplotypes, accepts spacer:protospacer mismatches and bulges, and is suitable for population and personal genome analyses. CRISPRme takes care of every step in the process, including downloading the data, executing the complete search, and presenting an exhaustive report with tables and figures within an interactive web-based GUI.
GRAFIMO (GRAph-based Finding of Individual Motif Occurrences) is a command-line tool that extends the traditional Position Weight Matrix (PWM) scanning procedure to pangenome variation graphs (VGs). GRAFIMO can search for the occurrences of a given PWM in many genomes in a single run, accounting for the effects that SNPs, indels, and potentially any structural variation (handled by VG) have on the potential motif occurrences found. As a result, GRAFIMO produces a report of the statistically significant motif candidates found, giving their frequency within the haplotypes embedded in the scanned VG and indicating whether they contain genomic variants or belong to the reference genome sequence.
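The traditional PWM scanning procedure that GRAFIMO extends can be sketched on a plain linear sequence as follows. This is only an illustration of the baseline technique, not GRAFIMO's code or interface: the toy motif, the uniform background, the fixed score threshold (GRAFIMO reports statistical significance instead), and the names `PWM`, `log_odds`, and `scan` are all assumptions made for the example.

```python
import math

# Hypothetical toy motif: per-position nucleotide probabilities (width 3).
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background frequency


def log_odds(window):
    """Sum of per-position log2(p / background) for one window."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(PWM, window))


def scan(sequence, threshold=0.0):
    """Return (start, window, score) for every window scoring above threshold."""
    width = len(PWM)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = log_odds(window)
        if score > threshold:
            hits.append((start, window, score))
    return hits


hits = scan("TTAGCAAGCT")
print([(start, window) for start, window, _ in hits])  # [(2, 'AGC'), (6, 'AGC')]
```

On a variation graph the same window scoring would be applied to the sequences spelled by paths through the graph rather than to a single reference string.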
APDB is a user-friendly platform for inspecting collected air pollutants and searching for similar molecules in the investigated chemical spaces. The web interface allows users to (i) browse data by category, (ii) query the database by molecules and targets, and (iii) visualize and download molecular descriptors and structures as well as computed similarities.
Stardust is a clustering algorithm for Spatial Transcriptomics data. Given an expression matrix, the positions of the spots, and a space weight configuration as input, it derives a vector that assigns a cluster identity to each spot in the input data. The similarity between spots is computed by integrating transcript profiles and the physical distance between spots in a user-driven fashion.
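One plausible way to integrate the two sources of information is a weighted sum of a transcriptional distance matrix and a physical distance matrix, with the space weight controlling how much the spot positions count. The sketch below assumes that form; the Euclidean metric, the rescaling to [0, 1], and the function names are assumptions made for illustration and are not taken from the Stardust implementation.

```python
import math


def pairwise(points):
    """Euclidean distance matrix for a list of equal-length vectors."""
    return [[math.dist(p, q) for q in points] for p in points]


def rescale(matrix):
    """Divide by the largest entry so that distances lie in [0, 1]."""
    top = max(max(row) for row in matrix) or 1.0
    return [[value / top for value in row] for row in matrix]


def combined_distance(expression, positions, space_weight):
    """Transcriptional distance plus space_weight times physical distance."""
    expr = rescale(pairwise(expression))
    phys = rescale(pairwise(positions))
    n = len(expr)
    return [
        [expr[i][j] + space_weight * phys[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: three spots, two genes.
expression = [[5.0, 0.0], [5.0, 1.0], [0.0, 5.0]]
positions = [(0, 0), (0, 1), (4, 4)]

expression_only = combined_distance(expression, positions, space_weight=0.0)
with_space = combined_distance(expression, positions, space_weight=1.0)
```

The combined matrix would then be handed to a clustering method of choice to obtain one cluster identity per spot. With a space weight of 0 only the transcript profiles matter, and larger weights pull physically close spots together.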
Signals pertaining to transcriptional activation are transferred from enhancers found in intergenic loci to genes in the form of transcription factors, cofactors, and various transcriptional machinery such as RNA Pol II. Where this information originates and where it is transmitted is central to decoding the regulatory landscape of any gene and to identifying enhancers. Esearch3D is an unsupervised algorithm designed to predict enhancer regions from 3C data. It works by reverse engineering the flow of regulatory information from enhancers to genes. Using this approach we were able to identify intergenic regulatory enhancers using solely gene expression and 3D genomic data. Esearch3D models chromosome conformation capture (3C) data as a chromatin interaction network (CIN) and then exploits graph-theory algorithms to integrate RNA-seq data and calculate an imputed activity score (IAS) for intergenic regions. We also provide a visualisation tool to allow easy interpretation of the results.
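The idea of pushing gene expression back through a chromatin interaction network can be illustrated with a generic random walk with restart. This is a stand-in technique chosen for the example, not necessarily the graph algorithm Esearch3D uses; the toy network, the restart value, and the name `propagate` are assumptions.

```python
def propagate(adjacency, initial, restart=0.5, iterations=50):
    """Random walk with restart on an undirected graph given as adjacency lists."""
    scores = dict(initial)
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            # Each neighbour splits its current score equally among its own neighbours.
            incoming = sum(scores[n] / len(adjacency[n]) for n in neighbours)
            updated[node] = (1 - restart) * incoming + restart * initial[node]
        scores = updated
    return scores


# Hypothetical toy CIN: two genes and three intergenic fragments.
adjacency = {
    "geneA": ["frag1"],
    "geneB": ["frag1", "frag2"],
    "frag1": ["geneA", "geneB"],
    "frag2": ["geneB", "frag3"],
    "frag3": ["frag2"],
}
# Genes start with their (toy) expression value, intergenic fragments with zero.
initial = {"geneA": 1.0, "geneB": 1.0, "frag1": 0.0, "frag2": 0.0, "frag3": 0.0}

imputed = propagate(adjacency, initial)
```

In this toy network `frag1`, which contacts both expressed genes, ends up with the highest imputed score among the intergenic fragments, followed by `frag2` and then `frag3`.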
CRISPRitz is a software suite for in silico CRISPR analysis and assessment. The suite contains five tools dedicated to predictive analysis and result assessment for CRISPR/Cas experiments.
LErNet is a method to define and predict the roles of lncRNAs in silico. The core of the approach is a network expansion algorithm that enriches the genomic context of lncRNAs. The context is built by integrating the protein-coding genes found close to the non-coding elements at both the genomic and the system level. The pipeline is particularly useful when the functions of the discovered lncRNAs are not yet known.
PanDelos is a standalone tool for discovering pan-genome content among phylogenetically distant genomes. The methodology is based on information theory and network analysis. It is parameter-free because thresholds are automatically deduced from the context.
A plethora of incomplete genomes is available in public resources: the reconstruction of their genomic sequence is not complete, and only fragments of the sequence are currently available. However, the information contained in these fragments may play an essential role in the analyses. We present PanDelos-frags, a computational tool that exploits and extends the previous results of PanDelos on the analysis of complete genomes. It provides a new methodology for inferring missing genetic information and thus for managing incomplete genomes. PanDelos-frags outperforms state-of-the-art approaches in reconstructing gene families in synthetic benchmarks and in a real metagenomics use case.
GraphGrepSX is a querying system for databases of graphs, based on its predecessor GraphGrep. The system implements efficient graph searching algorithms together with advanced filtering techniques that allow an initial approximate search. It allows users to select candidate subgraphs rather than entire graphs. The method searches for subgraph isomorphism (monomorphism) between a query graph (pattern) and the graphs in a database of graphs (targets).
cuRnet is a package that wraps parallel graph algorithms developed in CUDA for the R environment. It makes GPU solutions available to R end users in a transparent way by including basic data structures for representing graphs and parallel implementations of BFS (Breadth-First Search), SCC (Strongly Connected Components), and SSSP (Single-Source Shortest Paths) customized for biological network analysis on GPUs (Busato and Bombieri, 2016).
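For readers unfamiliar with two of the algorithms named above, the following are sequential reference versions of BFS and SSSP. They only show what the algorithms compute; they are not the cuRnet R interface and do not reflect its CUDA implementation. The graph encoding, the use of Dijkstra's algorithm (which assumes non-negative edge weights), and the function names are assumptions made for the example.

```python
import heapq
from collections import deque


def bfs_levels(graph, source):
    """Number of hops from source to every reachable node."""
    levels = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, {}):
            if neighbour not in levels:
                levels[neighbour] = levels[node] + 1
                queue.append(neighbour)
    return levels


def sssp(graph, source):
    """Dijkstra's algorithm; edge weights are assumed to be non-negative."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry, a shorter path was already found
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return dist


# Hypothetical weighted, directed toy network: node -> {neighbour: weight}.
graph = {
    "P1": {"P2": 1.0, "P3": 4.0},
    "P2": {"P3": 1.0, "P4": 5.0},
    "P3": {"P4": 1.0},
    "P4": {},
}

print(bfs_levels(graph, "P1"))  # {'P1': 0, 'P2': 1, 'P3': 1, 'P4': 2}
print(sssp(graph, "P1"))        # {'P1': 0.0, 'P2': 1.0, 'P3': 2.0, 'P4': 3.0}
```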
APPAGATO is a stochastic and parallel algorithm for finding approximate occurrences of a query network in biological networks. APPAGATO allows node and edge mismatches. To speed up the querying process, APPAGATO has also been implemented in parallel to run on graphics processing units (GPUs).
GRAPES is a querying system for parallel searching in databases of graphs, or in a single target graph, using symmetric multiprocessing (SMP) architectures. It implements a parallel version of well-established graph searching algorithms, providing efficient solutions for graph indexing and matching. The GRAPES method consists of three main phases: indexing, filtering, and matching. In the indexing phase, features are extracted from the target graphs and a database index is built off-line. A feature is a labeled path present in a graph. GRAPES extracts all paths up to a fixed length (lp) and stores them in a compact trie structure, together with the starting nodes of such paths. Once the index is built, the system is able to find subgraph isomorphisms between a query and all the target graphs. The filtering phase discards a priori the database graphs that do not contain the query's features. It is also able to recognize in advance non-matching nodes, and entire regions, of database graphs. This behavior allows GRAPES to deal with both large databases and single target graphs. Finally, an exact subgraph matching algorithm is run in parallel. See the related scientific paper for more details.
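The indexing and filtering phases can be sketched sequentially as follows. This is a simplified illustration, not the GRAPES implementation: it ignores parallelism and the stored starting nodes, and it assumes that paths are simple (no repeated nodes), that lp counts edges, and that a target is kept only if it contains every query feature at least as many times as the query does. All function names are hypothetical.

```python
def label_paths(graph, labels, lp):
    """Count the label sequences of all simple paths with at most lp edges."""
    counts = {}

    def extend(path):
        key = tuple(labels[n] for n in path)
        counts[key] = counts.get(key, 0) + 1
        if len(path) - 1 < lp:
            for nxt in graph[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in graph:
        extend([start])
    return counts


def new_node():
    return {"count": 0, "children": {}}


def build_trie(counts):
    """Store the label paths in a trie; each node keeps its occurrence count."""
    root = new_node()
    for key, count in counts.items():
        node = root
        for label in key:
            node = node["children"].setdefault(label, new_node())
        node["count"] = count
    return root


def trie_count(trie, key):
    """Occurrences of a label path in the indexed graph (0 if absent)."""
    node = trie
    for label in key:
        node = node["children"].get(label)
        if node is None:
            return 0
    return node["count"]


def passes_filter(query_counts, target_trie):
    """Keep a target only if it has every query feature at least as often."""
    return all(trie_count(target_trie, k) >= c for k, c in query_counts.items())


# Hypothetical undirected toy graphs, written as symmetric adjacency lists.
query = {0: [1], 1: [0]}
query_labels = {0: "A", 1: "B"}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
triangle_labels = {0: "A", 1: "B", 2: "C"}
edge = {0: [1], 1: [0]}
edge_labels = {0: "A", 1: "C"}

query_counts = label_paths(query, query_labels, lp=2)
triangle_trie = build_trie(label_paths(triangle, triangle_labels, lp=2))
edge_trie = build_trie(label_paths(edge, edge_labels, lp=2))
```

Here `passes_filter(query_counts, triangle_trie)` keeps the triangle as a candidate, while the A-C edge is discarded because it contains no node labeled B. Only the surviving candidates would go on to the exact matching phase.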
RI is a general-purpose algorithm for the one-to-one exact subgraph isomorphism problem that maintains topological constraints. It is both a C++ library and a standalone tool, providing a development API and a command-line interface, with no dependencies beyond the standard GNU C++ library. RI works on Unix and Mac OS X systems with G++ installed, and it can be compiled under Windows using Cygwin. Working graphs may be directed or undirected, and may be multigraphs, with optional attributes on both nodes and edges. Customizable features allow user-defined behaviors for attribute comparisons and for the algorithm's flow. RI aims to provide a better search strategy for the commonly used backtracking approach to the subgraph isomorphism problem. It can be integrated with additional preprocessing steps, or it can be used to verify candidate structures coming from data mining, data indexing, or other filtering techniques. RI is able to find graph isomorphisms, subgraph isomorphisms, and induced subgraph isomorphisms. It is distributed in several versions, divided chiefly into two groups, for static and for dynamically changing attributes respectively. All proposed versions are developed taking into account the trade-offs between time and memory requirements. Optional behaviors such as stopping at the first match encountered, processing of the resulting matches, the type of isomorphism, and additional features may be enabled thanks to the high modularity of the code and the library's API.
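The backtracking approach that RI sets out to improve can be sketched as below for undirected, unlabeled graphs. This is a generic textbook backtracking search for non-induced subgraph isomorphism (monomorphism), written in Python for illustration; it is not RI's C++ API, and the degree-based visiting order is a simple placeholder rather than RI's search strategy. The `stop_at_first` flag mirrors the optional "stop at first match" behavior mentioned above, and its name is an assumption.

```python
def find_matches(pattern, target, stop_at_first=False):
    """Enumerate injective maps pattern -> target that preserve pattern edges."""
    # Visit high-degree pattern nodes first (a simple heuristic, not RI's ordering).
    order = sorted(pattern, key=lambda n: -len(pattern[n]))
    matches = []

    def consistent(p_node, t_node, mapping):
        # Every already-mapped neighbour of p_node must be adjacent to t_node.
        return all(
            mapping[nb] in target[t_node]
            for nb in pattern[p_node]
            if nb in mapping
        )

    def backtrack(depth, mapping, used):
        if depth == len(order):
            matches.append(dict(mapping))
            return stop_at_first
        p_node = order[depth]
        for t_node in target:
            if t_node in used or not consistent(p_node, t_node, mapping):
                continue
            mapping[p_node] = t_node
            used.add(t_node)
            done = backtrack(depth + 1, mapping, used)
            del mapping[p_node]
            used.remove(t_node)
            if done:
                return True
        return False

    backtrack(0, {}, set())
    return matches


# Hypothetical toy graphs as symmetric adjacency sets.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
# A square 1-2-3-4 with the diagonal 1-3, which contains two triangles.
square_with_diagonal = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}

all_matches = find_matches(triangle, square_with_diagonal)
first_match = find_matches(triangle, square_with_diagonal, stop_at_first=True)
```

Each of the two triangles in the target is reported once per automorphism of the pattern, so `all_matches` holds 12 mappings, while `first_match` holds a single one.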