About RummagenexRummaGEO

Recently, the Ma'ayan Lab developed two new resources: Rummagene and RummaGEO. Rummagene is a web server application that provides access to over 750,000 gene sets extracted from the supporting materials of more than 140,000 articles, found after scanning 6 million articles available on PubMed Central. Similarly, RummaGEO is a web server application that automatically extracts and categorizes over 300,000 human and mouse gene sets from the Gene Expression Omnibus (GEO) to enable comprehensive gene expression signature search.

Since these two new resources produced massive collections of independently annotated gene sets, we sought to intersect them to discover gene sets that overlap highly but originate from different, seemingly unrelated studies. In total, we intersected 748,220 gene sets from Rummagene with 158,062 RummaGEO mouse gene sets and 135,264 RummaGEO human gene sets. This comparison revealed over 16 million gene set pairs with high overlap (p-value < 0.001). The top 1 million pairs (ranked by p-value and odds ratio) are stored in a database and can be searched on the RummagenexRummaGEO website. In addition to making these overlapping pairs searchable, we generated hypotheses for the top 1,000 pairs (ranked by p-value and odds ratio), which are also available on the site.
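As a rough illustration of how one pair of gene sets might be scored, the Python sketch below computes the overlap, a one-sided Fisher's exact test p-value (the upper tail of the hypergeometric distribution), and the odds ratio, then keeps pairs under the p < 0.001 cutoff and ranks them. This page does not say which test or background gene universe was used, so Fisher's exact test, the `background_size` parameter, and the rule of sorting by p-value and then by descending odds ratio are all assumptions; `overlap_stats` and `top_pairs` are hypothetical helpers, not the site's code.

```python
from math import comb, inf


def overlap_stats(set_a, set_b, background_size):
    """Return (overlap, one-sided Fisher p-value, odds ratio) for two gene sets.

    Assumption: both sets are drawn from a shared background of
    `background_size` genes (this page does not specify the universe).
    """
    a, b = set(set_a), set(set_b)
    k = len(a & b)
    n_a, n_b, n = len(a), len(b), background_size
    # P(X >= k) for X ~ Hypergeometric(population n, n_a successes, n_b draws),
    # computed exactly with big integers before the final division.
    upper = min(n_a, n_b)
    tail = sum(comb(n_a, i) * comb(n - n_a, n_b - i) for i in range(k, upper + 1))
    p = tail / comb(n, n_b)
    # Cells of the 2x2 contingency table (in A / not in A vs. in B / not in B).
    tp, fp, fn = k, n_a - k, n_b - k
    tn = n - n_a - n_b + k
    odds = inf if fp * fn == 0 else (tp * tn) / (fp * fn)
    return k, p, odds


def top_pairs(results, k, p_threshold=0.001):
    """Keep pairs with p < p_threshold, then rank them.

    Assumed ranking: ascending p-value, ties broken by descending odds ratio.
    """
    significant = [r for r in results if r["p"] < p_threshold]
    return sorted(significant, key=lambda r: (r["p"], -r["odds"]))[:k]
```

For example, `overlap_stats(range(10), range(10), 100)` reports an overlap of 10 and an infinite odds ratio, because every gene in one set is also in the other. Summing exact binomial coefficients keeps the sketch dependency-free; at the scale of millions of pairs, a vectorized routine such as SciPy's hypergeometric survival function would be the more practical choice.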

Furthermore, the top overlapping pairs whose source abstracts are dissimilar were examined for possible new connections between biological and biomedical concepts: for example, a drug and its previously unknown mechanism of action, or two diseases not previously known to share molecular mechanisms.
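This page does not say how abstract dissimilarity was measured. As one possible stand-in, the sketch below scores two abstracts by the cosine similarity of their word counts and keeps only the pairs that fall below a cutoff. The tokenizer, the similarity measure, the `dissimilar_pairs` helper, and the 0.2 threshold are all hypothetical choices; text embeddings would be a common alternative.

```python
import math
import re
from collections import Counter


def _bag_of_words(text):
    # Lowercased alphanumeric tokens; deliberately crude (no stop-word removal).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine similarity of word-count vectors, in [0, 1]."""
    va, vb = _bag_of_words(text_a), _bag_of_words(text_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def dissimilar_pairs(pairs, max_similarity=0.2):
    """Keep overlapping pairs whose source abstracts share little vocabulary.

    `pairs` is an iterable of (abstract_a, abstract_b, payload) tuples;
    the 0.2 cutoff is a hypothetical default, not a documented value.
    """
    return [p for p in pairs if cosine_similarity(p[0], p[1]) < max_similarity]
```

Filtering on low textual similarity after ranking by gene overlap is what surfaces pairs where the biology agrees but the studies' stated topics do not.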

This site is programmatically accessible via a GraphQL API.
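A minimal sketch of calling a GraphQL API from Python with only the standard library. GraphQL requests are conventionally sent as an HTTP POST whose JSON body holds `query` and `variables`. The endpoint URL below is a hypothetical placeholder rather than the site's documented address, and the example uses the standard introspection query instead of guessing at RummagenexRummaGEO's schema; `build_request` and `run_query` are illustrative helpers.

```python
import json
import urllib.request

# Hypothetical placeholder: substitute the site's actual GraphQL endpoint.
GRAPHQL_URL = "https://example.org/graphql"

# Standard GraphQL introspection query: valid against any GraphQL server,
# it returns the name of the schema's root query type without assuming
# anything about this site's field names.
INTROSPECTION_QUERY = "{ __schema { queryType { name } } }"


def build_request(query, variables=None, url=GRAPHQL_URL):
    """Build (but do not send) a GraphQL POST request with a JSON body."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def run_query(query, variables=None, url=GRAPHQL_URL, timeout=30):
    """Send the query and return the `data` field, raising on GraphQL errors."""
    with urllib.request.urlopen(build_request(query, variables, url), timeout=timeout) as resp:
        payload = json.load(resp)
    if "errors" in payload:
        raise RuntimeError(payload["errors"])
    return payload["data"]
```

Against a real endpoint, `run_query(INTROSPECTION_QUERY)["__schema"]["queryType"]["name"]` should return the root query type's name; fuller introspection queries can then list the fields available for searching gene set pairs.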


RummagenexRummaGEO is being actively developed by the Ma'ayan Lab.