Implementation of oneAPI Analytics Toolkit in Medical Science
Abhishek Nandy
Kolkata, WB
Project status: Published/In Market
Intel Technologies
oneAPI
Intel Python
DevCloud
Overview / Usage
We will be exploring single-cell data (e.g., scRNA-seq).
We will be porting Clustergrammer2 to the AI Analytics Toolkit.
Clustergrammer2 produces highly interactive visualizations that enable intuitive exploration of high-dimensional data. It also has several optional biology-specific features (e.g., enrichment analysis) that facilitate the exploration of gene-level biological data.
It is a web-based tool for visualizing and analyzing high-dimensional data (e.g., single-cell RNA sequencing data) as interactive and shareable heatmaps.
Methodology / Approach
The project runs on Intel DevCloud.
We will be exploring gene expression data, which is widely used in studying diseases such as cancer.
As we explore the heatmaps, the information we get is useful for studying where gene mutation has occurred.
Porting Clustergrammer2 to the AI Analytics Toolkit lets us interactively explore data from 2,700 PBMCs (peripheral blood mononuclear cells) obtained from the 10x Genomics dataset.
We will use Intel Optimized Python from the AI Analytics Toolkit and run the programs on Intel DevCloud.
We will also use an external dataset for exploration known as CIBERSORT (this dataset provides an estimation of the abundances of cell types in a mixed population using gene expression data).
We will load the data in sparse matrix format.
The dataset consists of 32 thousand genes and 2700 single cells.
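As a rough illustration of why a sparse format suits this kind of data, the sketch below builds a small synthetic count matrix (not the 10x Genomics data) and compares its dense and sparse memory footprints. The matrix size, the Poisson rate, and the helper name sparse_nbytes are assumptions made for the example.

```python
import numpy as np
from scipy import sparse


def sparse_nbytes(matrix):
    """Bytes used by the three arrays that back a CSC/CSR matrix."""
    return matrix.data.nbytes + matrix.indices.nbytes + matrix.indptr.nbytes


# Synthetic genes-by-cells counts; most entries are zero, as in scRNA-seq data.
rng = np.random.default_rng(0)
dense_counts = rng.poisson(lam=0.1, size=(2000, 300))

# CSC stores only the nonzero values and keeps per-cell (column) access cheap.
sparse_counts = sparse.csc_matrix(dense_counts)

print("shape:", sparse_counts.shape)
print("nonzero fraction:", sparse_counts.nnz / dense_counts.size)
print("dense bytes:", dense_counts.nbytes)
print("sparse bytes:", sparse_nbytes(sparse_counts))
```

Real 10x Genomics matrices are commonly distributed in Matrix Market format, which scipy.io.mmread can read into a sparse matrix.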
Using Intel Optimized Python, we will normalize the gene expression (GEX) data and find the top expressing genes.
Then we will apply an arcsinh transform and Z-score normalization.
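The preprocessing steps above can be sketched with pandas and NumPy as follows. This is a minimal illustration, not the project's actual code: the function names, the per-cell target sum of 10,000, the arcsinh cofactor of 5, and the choice to rank genes by total expression are all assumptions. Rows are genes and columns are cells.

```python
import numpy as np
import pandas as pd


def normalize_per_cell(df, target_sum=10_000):
    """Scale each cell (column) so its counts sum to target_sum (assumed value)."""
    totals = df.sum(axis=0).replace(0, np.nan)  # avoid dividing by zero
    return df.div(totals, axis=1).fillna(0) * target_sum


def top_expressed_genes(df, n):
    """Keep the n genes (rows) with the largest total expression."""
    keep = df.sum(axis=1).nlargest(n).index
    return df.loc[keep]


def arcsinh_transform(df, cofactor=5.0):
    """Compress the dynamic range; the cofactor is an assumed parameter."""
    return np.arcsinh(df / cofactor)


def zscore_rows(df):
    """Center each gene at zero and scale it to unit variance across cells."""
    mean = df.mean(axis=1)
    std = df.std(axis=1, ddof=0).replace(0, np.nan)  # constant rows become 0
    return df.sub(mean, axis=0).div(std, axis=0).fillna(0)


# Tiny made-up matrix used only to show the call order.
toy = pd.DataFrame(
    [[1, 0, 3], [2, 2, 2], [0, 5, 1], [4, 1, 0]],
    index=["gene_a", "gene_b", "gene_c", "gene_d"],
    columns=["cell_1", "cell_2", "cell_3"],
    dtype=float,
)
processed = zscore_rows(arcsinh_transform(top_expressed_genes(normalize_per_cell(toy), 3)))
print(processed.round(2))
```

Z-scoring by row puts every gene on a comparable scale, so a heatmap highlights relative differences between cells rather than which genes are most abundant overall.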
After that, we load the data into the Clustergrammer2 build that we ported to the AI Analytics Toolkit and explore the resulting interactive heatmaps.
Here are the features of Clustergrammer2:
- Zooming and panning
Allows users to zoom into and pan across their heatmap by scrolling and dragging.
- Mouseover interactions
Mousing over elements in the heatmap brings up additional information using tooltips.
- Row and column reordering
- Interactive dimensionality reduction
Dimensionality reduction is a useful data analysis technique that is often used to reduce high-dimensional datasets down to a number of dimensions that can be visualized.
- Interactive dendrogram
Clustergrams typically have dendrogram trees (for both rows and columns) to depict the hierarchy of row and column clusters produced by hierarchical clustering. The height of the branches in the dendrogram depicts the distance between clusters. Clustergrammer depicts this hierarchical tree one slice at a time using trapezoids.
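To make the dendrogram idea concrete, the sketch below runs hierarchical clustering on a tiny made-up matrix with SciPy. It is an illustration only and does not reproduce Clustergrammer2's internals: the function name cluster_rows, the average-linkage method, and the Euclidean metric are assumptions. The third column of the linkage matrix holds the merge distances, which correspond to branch heights, and cutting the tree at a given number of clusters is comparable to looking at one slice of the hierarchy.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage


def cluster_rows(matrix, n_clusters=2):
    """Cluster rows hierarchically; return linkage, leaf order, and flat labels."""
    # Each linkage row is (cluster_i, cluster_j, merge_distance, size).
    tree = linkage(matrix, method="average", metric="euclidean")
    order = leaves_list(tree)  # row order along the dendrogram leaves
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")  # one slice
    return tree, order, labels


# Made-up data: three rows near the origin and three rows near (10, 10).
toy_rows = np.array(
    [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
     [10.0, 10.1], [10.1, 10.0], [10.0, 10.0]]
)
tree, order, labels = cluster_rows(toy_rows, n_clusters=2)
print("branch heights:", tree[:, 2].round(3))
print("leaf order:", order)
print("cluster labels:", labels)
```

Reordering the heatmap rows by the leaf order places similar rows next to each other, which is what makes cluster structure visible in a clustergram.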
Sample Code on Intel DevCloud
import numpy as np
import pandas as pd
from clustergrammer2 import net, Network, CGM2

# Suppress library warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')

# Load data: the cleaned expression matrix and the column metadata tables
df = {}
df['clean'] = pd.read_csv('../data/rc_two_cat_clean.csv', index_col=0)
df['meta_col'] = pd.read_csv('../data/meta_col.csv', index_col=0)
df['meta_cat'] = pd.read_csv('../data/meta_cat_col.csv', index_col=0)

# Widget viewer: pass the matrix and metadata to Clustergrammer2 and display the widget
net.load_df(df['clean'], meta_col=df['meta_col'])
net.set_manual_category(col='Category', preferred_cats=df['meta_cat'])
net.widget()
Technologies Used
Intel oneAPI
Intel oneAPI AI Analytics Toolkit
Intel Optimized Python
Intel Optimized Scikit-learn