Run INSPIRE on the human breast cancer Xenium sections
In this tutorial, we apply INSPIRE to human breast cancer sections profiled by Xenium to reveal microenvironment heterogeneity across tumor subtypes.
The human breast cancer Xenium sections are publicly available at https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast.
Import packages
[1]:
import pandas as pd
import numpy as np
import scanpy as sc
import anndata as ad
import umap
import os
import scipy.sparse
from matplotlib.pyplot import get_cmap  # matplotlib.cm.get_cmap was removed in Matplotlib 3.9
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib.lines import Line2D
import INSPIRE
import warnings
warnings.filterwarnings("ignore")
Data preprocessing
Each Xenium section contains a large number of cells. For efficiency, we preprocess each section separately.
[2]:
# ## load data 1
# data_dir = "/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tumor_microenvironment/data/Xenium_BC/Sample1_replicate1"
# adata = sc.read_10x_h5(data_dir + "/cell_feature_matrix.h5")
# coor_df = pd.read_csv(data_dir + "/cells.csv.gz")
# adata.obsm["spatial"] = coor_df[["x_centroid", "y_centroid"]].to_numpy()
# adata.obsm["spatial"][:,0] = - adata.obsm["spatial"][:,0]
# adata.obsm["spatial"][:,1] = - adata.obsm["spatial"][:,1]
# adata.obsm["spatial"] = adata.obsm["spatial"].astype(np.float32)
# adata.var_names_make_unique()
# adata.obs_names_make_unique()
# adata_1 = adata.copy()
# adata_1.obs.index = adata_1.obs.index + "-0"
# ## load data 2
# data_dir = "/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tumor_microenvironment/data/Xenium_BC/Sample1_replicate2"
# adata = sc.read_10x_h5(data_dir + "/cell_feature_matrix.h5")
# coor_df = pd.read_csv(data_dir + "/cells.csv.gz")
# adata.obsm["spatial"] = coor_df[["x_centroid", "y_centroid"]].to_numpy()
# adata.obsm["spatial"][:,0] = - adata.obsm["spatial"][:,0]
# adata.obsm["spatial"][:,1] = - adata.obsm["spatial"][:,1]
# adata.obsm["spatial"] = adata.obsm["spatial"].astype(np.float32)
# adata.var_names_make_unique()
# adata.obs_names_make_unique()
# adata_2 = adata.copy()
# adata_2.obs.index = adata_2.obs.index + "-1"
# shared_genes = adata_1.var.index.intersection(adata_2.var.index)
# adata_1 = adata_1[:, shared_genes].copy()
# adata_2 = adata_2[:, shared_genes].copy()
# del adata
# ## preprocess data 1
# INSPIRE.utils.calculate_node_features_LGCN(
#     adata=adata_1,
#     slice_name="BC_sample1_rep1_qc6_radcutoff20",
#     preprocessed_data_path="/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tumor_microenvironment/calculate_node_features/Xenium_BC",
#     min_genes_qc=6,
#     min_cells_qc=6,
#     rad_cutoff=20,
# )
# ## preprocess data 2
# INSPIRE.utils.calculate_node_features_LGCN(
#     adata=adata_2,
#     slice_name="BC_sample1_rep2_qc6_radcutoff20",
#     preprocessed_data_path="/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tumor_microenvironment/calculate_node_features/Xenium_BC",
#     min_genes_qc=6,
#     min_cells_qc=6,
#     rad_cutoff=20,
# )
The preprocessed data are saved to preprocessed_data_path.
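The rad_cutoff=20 argument suggests that each cell is linked to the spatial neighbors lying within a fixed radius. The sketch below illustrates that idea on toy data by concatenating each cell's expression with the mean expression of its neighbors. It is not INSPIRE's implementation: the function name radius_neighbor_features, the mean aggregation, and the concatenation are assumptions made purely for illustration.

```python
import numpy as np

def radius_neighbor_features(coords, expr, rad_cutoff):
    """Hypothetical helper: append the mean expression of nearby cells to each cell's own expression."""
    # Pairwise Euclidean distances (brute force, suitable for a toy example only).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Two cells are neighbors if they lie within rad_cutoff; self-loops are excluded.
    adj = (dist <= rad_cutoff) & ~np.eye(len(coords), dtype=bool)
    # Average over neighbors; a cell without neighbors keeps a zero vector.
    n_neighbors = adj.sum(axis=1, keepdims=True)
    neighbor_mean = (adj.astype(expr.dtype) @ expr) / np.maximum(n_neighbors, 1)
    return np.hstack([expr, neighbor_mean])

# Three toy cells: the first two are 10 units apart, the third is isolated.
coords = np.array([[0.0, 0.0], [10.0, 0.0], [100.0, 0.0]], dtype=np.float32)
expr = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]], dtype=np.float32)
features = radius_neighbor_features(coords, expr, rad_cutoff=20)
print(features.shape)
```

With two toy genes the output has four columns per cell. A real section with hundreds of thousands of cells would need a spatial index rather than a dense distance matrix.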
Load preprocessed data
[3]:
slice_name_list = ["BC_sample1_rep1_qc6_radcutoff20", "BC_sample1_rep2_qc6_radcutoff20"]
preprocessed_data_path = "/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tumor_microenvironment/calculate_node_features/Xenium_BC"
adata_st_list, adata_full = INSPIRE.utils.prepare_inputs_LGCN(
    slice_name_list=slice_name_list,
    preprocessed_data_path=preprocessed_data_path,
    num_hvgs=500,
    spot_size=20.,
    min_concat_dist=100,
)
Finding highly variable genes...
Load data BC_sample1_rep1_qc6_radcutoff20
Load data BC_sample1_rep2_qc6_radcutoff20
Find 313 shared highly variable genes among datasets.
Store counts and library sizes for Poisson modeling...
Normalize data...
Load data BC_sample1_rep1_qc6_radcutoff20
Load data BC_sample1_rep2_qc6_radcutoff20
Load and prepare node features for LGCN...
Load node features BC_sample1_rep1_qc6_radcutoff20
Node features for slice 0 : (164596, 626)
Load node features BC_sample1_rep2_qc6_radcutoff20
Node features for slice 1 : (118467, 626)
Prepare an adata containing full spot locations and slice labels for better visualization...
Run INSPIRE model
[4]:
model = INSPIRE.model.Model_LGCN(
    adata_st_list=adata_st_list,
    n_spatial_factors=50,
    n_training_steps=20000,
    batch_size=2048,
)
[5]:
model.train(adata_st_list)
0%| | 6/20000 [00:02<1:35:14, 3.50it/s]
Step: 0, d_loss: 1.4172, Loss: 1567.6923, recon_loss: 499.5919, fe_loss: 79.1669, geom_loss: 74.5035, beta_loss: 986.3552, gan_loss: 1.0883
3%|▎ | 507/20000 [00:12<06:32, 49.63it/s]
Step: 500, d_loss: 1.3132, Loss: 1309.3512, recon_loss: 339.4700, fe_loss: 45.5029, geom_loss: 88.7015, beta_loss: 921.8173, gan_loss: 0.7870
5%|▌ | 1007/20000 [00:22<06:37, 47.72it/s]
Step: 1000, d_loss: 1.2793, Loss: 1182.5730, recon_loss: 213.1364, fe_loss: 44.0688, geom_loss: 81.6868, beta_loss: 922.9014, gan_loss: 0.8327
8%|▊ | 1507/20000 [00:32<06:17, 48.95it/s]
Step: 1500, d_loss: 1.2698, Loss: 1102.9878, recon_loss: 134.4910, fe_loss: 43.4723, geom_loss: 76.3164, beta_loss: 922.5912, gan_loss: 0.9072
10%|█ | 2009/20000 [00:42<06:07, 48.96it/s]
Step: 2000, d_loss: 1.2445, Loss: 1035.4191, recon_loss: 67.2140, fe_loss: 43.1071, geom_loss: 68.3831, beta_loss: 922.8494, gan_loss: 0.8808
13%|█▎ | 2510/20000 [00:52<05:59, 48.63it/s]
Step: 2500, d_loss: 1.2286, Loss: 1002.5932, recon_loss: 34.3764, fe_loss: 42.6707, geom_loss: 61.9228, beta_loss: 923.3207, gan_loss: 0.9869
15%|█▌ | 3006/20000 [01:03<05:41, 49.76it/s]
Step: 3000, d_loss: 1.2076, Loss: 966.3399, recon_loss: -0.8271, fe_loss: 42.3578, geom_loss: 57.3993, beta_loss: 922.6725, gan_loss: 0.9887
18%|█▊ | 3507/20000 [01:13<05:32, 49.53it/s]
Step: 3500, d_loss: 1.1812, Loss: 959.6121, recon_loss: -7.1217, fe_loss: 42.3555, geom_loss: 53.9558, beta_loss: 922.3357, gan_loss: 0.9634
20%|██ | 4007/20000 [01:23<05:24, 49.33it/s]
Step: 4000, d_loss: 1.1691, Loss: 949.7222, recon_loss: -17.0247, fe_loss: 42.3874, geom_loss: 51.2098, beta_loss: 922.2996, gan_loss: 1.0357
23%|██▎ | 4509/20000 [01:33<05:22, 47.99it/s]
Step: 4500, d_loss: 1.1437, Loss: 942.9811, recon_loss: -23.8967, fe_loss: 42.2599, geom_loss: 48.3503, beta_loss: 922.5485, gan_loss: 1.1023
25%|██▌ | 5009/20000 [01:43<05:10, 48.24it/s]
Step: 5000, d_loss: 1.1096, Loss: 935.4135, recon_loss: -31.5005, fe_loss: 41.9880, geom_loss: 44.9566, beta_loss: 922.9172, gan_loss: 1.1095
28%|██▊ | 5509/20000 [01:54<04:56, 48.88it/s]
Step: 5500, d_loss: 1.0913, Loss: 929.1166, recon_loss: -37.2718, fe_loss: 42.0714, geom_loss: 43.7849, beta_loss: 922.2804, gan_loss: 1.1610
30%|███ | 6010/20000 [02:04<04:44, 49.17it/s]
Step: 6000, d_loss: 1.0835, Loss: 928.4330, recon_loss: -37.6152, fe_loss: 42.0797, geom_loss: 43.0180, beta_loss: 921.9680, gan_loss: 1.1402
33%|███▎ | 6510/20000 [02:14<04:34, 49.19it/s]
Step: 6500, d_loss: 1.0708, Loss: 932.1746, recon_loss: -34.6176, fe_loss: 41.9906, geom_loss: 41.3810, beta_loss: 922.8387, gan_loss: 1.1352
35%|███▌ | 7009/20000 [02:24<04:23, 49.21it/s]
Step: 7000, d_loss: 1.0598, Loss: 922.6492, recon_loss: -43.4826, fe_loss: 41.9655, geom_loss: 41.5733, beta_loss: 922.1033, gan_loss: 1.2315
38%|███▊ | 7509/20000 [02:34<04:18, 48.27it/s]
Step: 7500, d_loss: 1.0216, Loss: 917.3104, recon_loss: -49.1152, fe_loss: 41.8728, geom_loss: 40.6217, beta_loss: 922.3845, gan_loss: 1.3559
40%|████ | 8011/20000 [02:44<04:01, 49.68it/s]
Step: 8000, d_loss: 1.0168, Loss: 912.3078, recon_loss: -53.9664, fe_loss: 41.8964, geom_loss: 40.7532, beta_loss: 922.2964, gan_loss: 1.2664
43%|████▎ | 8507/20000 [02:55<03:53, 49.21it/s]
Step: 8500, d_loss: 1.0246, Loss: 907.8286, recon_loss: -58.2036, fe_loss: 41.8341, geom_loss: 40.7000, beta_loss: 922.0864, gan_loss: 1.2977
45%|████▌ | 9008/20000 [03:05<03:49, 47.81it/s]
Step: 9000, d_loss: 0.9640, Loss: 914.9151, recon_loss: -51.6274, fe_loss: 41.9215, geom_loss: 41.1876, beta_loss: 922.3317, gan_loss: 1.4655
48%|████▊ | 9508/20000 [03:15<03:34, 48.95it/s]
Step: 9500, d_loss: 0.9606, Loss: 903.0141, recon_loss: -63.1390, fe_loss: 41.6479, geom_loss: 40.1974, beta_loss: 922.2338, gan_loss: 1.4675
50%|█████ | 10008/20000 [03:25<03:23, 48.99it/s]
Step: 10000, d_loss: 0.9530, Loss: 911.0151, recon_loss: -55.3885, fe_loss: 41.8912, geom_loss: 40.8551, beta_loss: 922.2137, gan_loss: 1.4815
53%|█████▎ | 10508/20000 [03:35<03:13, 49.11it/s]
Step: 10500, d_loss: 0.9357, Loss: 908.7469, recon_loss: -57.5353, fe_loss: 41.9110, geom_loss: 41.9312, beta_loss: 922.0583, gan_loss: 1.4742
55%|█████▌ | 11006/20000 [03:45<03:02, 49.17it/s]
Step: 11000, d_loss: 0.9120, Loss: 905.8323, recon_loss: -60.0040, fe_loss: 41.8304, geom_loss: 41.6544, beta_loss: 921.7123, gan_loss: 1.4605
58%|█████▊ | 11506/20000 [03:56<02:54, 48.78it/s]
Step: 11500, d_loss: 0.8887, Loss: 894.6031, recon_loss: -72.1850, fe_loss: 41.6670, geom_loss: 40.7881, beta_loss: 922.7667, gan_loss: 1.5386
60%|██████ | 12006/20000 [04:06<02:42, 49.08it/s]
Step: 12000, d_loss: 0.9045, Loss: 910.2371, recon_loss: -55.5765, fe_loss: 42.0149, geom_loss: 42.2740, beta_loss: 921.6388, gan_loss: 1.3144
63%|██████▎ | 12506/20000 [04:16<02:32, 49.06it/s]
Step: 12500, d_loss: 0.8669, Loss: 908.5894, recon_loss: -57.8074, fe_loss: 41.8952, geom_loss: 42.7527, beta_loss: 922.0972, gan_loss: 1.5493
65%|██████▌ | 13006/20000 [04:26<02:22, 48.94it/s]
Step: 13000, d_loss: 0.8823, Loss: 895.9685, recon_loss: -70.3891, fe_loss: 41.6324, geom_loss: 42.0520, beta_loss: 922.2167, gan_loss: 1.6674
68%|██████▊ | 13506/20000 [04:36<02:13, 48.80it/s]
Step: 13500, d_loss: 0.8444, Loss: 906.3595, recon_loss: -60.0980, fe_loss: 41.8539, geom_loss: 42.8700, beta_loss: 922.0403, gan_loss: 1.7059
70%|███████ | 14006/20000 [04:47<02:03, 48.47it/s]
Step: 14000, d_loss: 0.8451, Loss: 904.1049, recon_loss: -62.0766, fe_loss: 41.7981, geom_loss: 43.1761, beta_loss: 921.9168, gan_loss: 1.6029
73%|███████▎ | 14506/20000 [04:57<01:52, 48.74it/s]
Step: 14500, d_loss: 0.8344, Loss: 906.0906, recon_loss: -60.7666, fe_loss: 41.6471, geom_loss: 44.2953, beta_loss: 922.5738, gan_loss: 1.7504
75%|███████▌ | 15008/20000 [05:07<01:42, 48.70it/s]
Step: 15000, d_loss: 0.8374, Loss: 906.1649, recon_loss: -60.0918, fe_loss: 41.7930, geom_loss: 44.6190, beta_loss: 921.9781, gan_loss: 1.5933
78%|███████▊ | 15508/20000 [05:17<01:31, 49.13it/s]
Step: 15500, d_loss: 0.8169, Loss: 892.4849, recon_loss: -73.7491, fe_loss: 41.6577, geom_loss: 44.4146, beta_loss: 921.9598, gan_loss: 1.7281
80%|████████ | 16008/20000 [05:28<01:21, 49.03it/s]
Step: 16000, d_loss: 0.8051, Loss: 899.8423, recon_loss: -66.5584, fe_loss: 41.7782, geom_loss: 46.0948, beta_loss: 921.8125, gan_loss: 1.8881
83%|████████▎ | 16508/20000 [05:38<01:11, 49.17it/s]
Step: 16500, d_loss: 0.8042, Loss: 904.9505, recon_loss: -61.3754, fe_loss: 41.7375, geom_loss: 46.3722, beta_loss: 921.7482, gan_loss: 1.9127
85%|████████▌ | 17009/20000 [05:48<01:00, 49.10it/s]
Step: 17000, d_loss: 0.7950, Loss: 908.1415, recon_loss: -58.2322, fe_loss: 41.8173, geom_loss: 47.9613, beta_loss: 921.9771, gan_loss: 1.6201
88%|████████▊ | 17510/20000 [05:58<00:50, 48.94it/s]
Step: 17500, d_loss: 0.7699, Loss: 886.5103, recon_loss: -79.6134, fe_loss: 41.7839, geom_loss: 46.7810, beta_loss: 921.5244, gan_loss: 1.8798
90%|█████████ | 18010/20000 [06:08<00:40, 48.87it/s]
Step: 18000, d_loss: 0.7473, Loss: 907.1054, recon_loss: -59.2270, fe_loss: 41.8680, geom_loss: 48.8641, beta_loss: 921.6464, gan_loss: 1.8407
93%|█████████▎| 18507/20000 [06:18<00:30, 49.00it/s]
Step: 18500, d_loss: 0.8016, Loss: 898.2635, recon_loss: -68.2579, fe_loss: 41.7153, geom_loss: 50.1626, beta_loss: 922.0256, gan_loss: 1.7772
95%|█████████▌| 19007/20000 [06:29<00:20, 48.60it/s]
Step: 19000, d_loss: 0.7876, Loss: 894.5960, recon_loss: -71.3222, fe_loss: 41.5344, geom_loss: 49.3036, beta_loss: 921.6761, gan_loss: 1.7215
98%|█████████▊| 19507/20000 [06:39<00:10, 48.87it/s]
Step: 19500, d_loss: 0.7333, Loss: 906.0635, recon_loss: -60.4748, fe_loss: 41.8529, geom_loss: 50.6278, beta_loss: 921.8535, gan_loss: 1.8193
100%|██████████| 20000/20000 [06:49<00:00, 48.85it/s]
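The training log prints one line of loss terms every 500 steps. To inspect convergence, these lines can be parsed into dictionaries, as in the minimal sketch below. The helper parse_loss_line is hypothetical and not part of INSPIRE; it only assumes the "name: value" layout seen in the log above, and the two sample lines are copied from that log.

```python
import re

def parse_loss_line(line):
    """Hypothetical helper: turn 'Step: 500, d_loss: 1.3132, ...' into a dict of floats."""
    pairs = re.findall(r"(\w+): (-?\d+(?:\.\d+)?)", line)
    return {name: float(value) for name, value in pairs}

log_lines = [
    "Step: 0, d_loss: 1.4172, Loss: 1567.6923, recon_loss: 499.5919, fe_loss: 79.1669, "
    "geom_loss: 74.5035, beta_loss: 986.3552, gan_loss: 1.0883",
    "Step: 19500, d_loss: 0.7333, Loss: 906.0635, recon_loss: -60.4748, fe_loss: 41.8529, "
    "geom_loss: 50.6278, beta_loss: 921.8535, gan_loss: 1.8193",
]
records = [parse_loss_line(line) for line in log_lines]

# Compare the first and last logged values of a few terms.
first, last = records[0], records[-1]
for name in ("Loss", "recon_loss", "beta_loss"):
    print(f"{name}: {first[name]:.4f} -> {last[name]:.4f}")
```

Collecting all logged lines this way makes it straightforward to plot each term against the step number with matplotlib.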
Access cell representations, proportions of spatial factors in cells, and gene loading matrix
We evaluate the cell representations and the proportions of spatial factors in cells using minibatches.
[6]:
adata_full, basis_df = model.eval_minibatch(
    adata_st_list,
    adata_full,
    batch_size=10000,
)
Evaluate Z and beta using minibatch...
Evaluation for slice 0
Evaluation for slice 1
We calculate 2D UMAP coordinates of cells across sections based on INSPIRE’s inferred cell representations.
[7]:
reducer = umap.UMAP(n_neighbors=30,
                    n_components=2,
                    metric="correlation",
                    n_epochs=None,
                    learning_rate=1.0,
                    min_dist=0.3,
                    spread=1.0,
                    set_op_mix_ratio=1.0,
                    local_connectivity=1,
                    repulsion_strength=1,
                    negative_sample_rate=5,
                    a=None,
                    b=None,
                    random_state=1234,
                    metric_kwds=None,
                    angular_rp_forest=False,
                    verbose=True)
embedding = reducer.fit_transform(adata_full.obsm['latent'])
adata_full.obsm["X_umap"] = embedding
UMAP(angular_rp_forest=True, local_connectivity=1, metric='correlation', min_dist=0.3, n_neighbors=30, random_state=1234, repulsion_strength=1, verbose=True)
Thu May 29 10:20:57 2025 Construct fuzzy simplicial set
Thu May 29 10:20:57 2025 Finding Nearest Neighbors
Thu May 29 10:20:57 2025 Building RP forest with 32 trees
Thu May 29 10:21:01 2025 NN descent for 18 iterations
1 / 18
2 / 18
3 / 18
4 / 18
Stopping threshold met -- exiting after 4 iterations
Thu May 29 10:21:28 2025 Finished Nearest Neighbor Search
Thu May 29 10:21:32 2025 Construct embedding
completed 0 / 200 epochs
completed 20 / 200 epochs
completed 40 / 200 epochs
completed 60 / 200 epochs
completed 80 / 200 epochs
completed 100 / 200 epochs
completed 120 / 200 epochs
completed 140 / 200 epochs
completed 160 / 200 epochs
completed 180 / 200 epochs
Thu May 29 10:27:50 2025 Finished embedding
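According to the timestamps above, the UMAP fit took roughly seven minutes for the 283,063 cells in the two sections. For quick exploratory plots, one option is to fit UMAP on a random subset of cells drawn evenly from each section. The sketch below only selects the indices; the helper subsample_per_slice and the toy labels are assumptions for illustration and are not part of INSPIRE.

```python
import numpy as np

def subsample_per_slice(slice_labels, n_per_slice, seed=0):
    """Hypothetical helper: sorted indices of at most n_per_slice random cells per slice."""
    rng = np.random.default_rng(seed)
    slice_labels = np.asarray(slice_labels)
    chosen = []
    for label in np.unique(slice_labels):
        idx = np.flatnonzero(slice_labels == label)
        n_take = min(n_per_slice, idx.size)
        # Sample without replacement so that no cell is picked twice.
        chosen.append(rng.choice(idx, size=n_take, replace=False))
    return np.sort(np.concatenate(chosen))

# Toy labels standing in for adata_full.obs["slice_label"].
labels = np.array(["0"] * 1000 + ["1"] * 600)
keep = subsample_per_slice(labels, n_per_slice=200)
print(keep.size)
```

The resulting indices could then be used to pass adata_full.obsm['latent'][keep] to reducer.fit_transform instead of the full matrix.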
[8]:
adata_full.obs["slice_label"] = adata_full.obs["slice_label"].values.astype(str)
sc.pl.umap(adata_full, color=["slice_label"])
Save results
[9]:
res_path = "/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tutorials/new_examples/human_breast_cancer_xenium"
adata_full.write(res_path + "/adata_inspire.h5ad")
basis_df.to_csv(res_path + "/basis_df_inspire.csv")
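When the saved table is read back later, the row labels written by to_csv need to be restored explicitly. The sketch below shows a round trip on a small stand-in table; the toy values, the factor and gene names, and the row/column layout are hypothetical and do not reflect the real contents of basis_df.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for basis_df; the real table comes from model.eval_minibatch.
toy_basis = pd.DataFrame(
    [[0.5, 0.25], [0.125, 0.75]],
    index=["factor_0", "factor_1"],
    columns=["GENE_A", "GENE_B"],
)

with tempfile.TemporaryDirectory() as tmp_dir:
    res_dir = os.path.join(tmp_dir, "results")
    os.makedirs(res_dir, exist_ok=True)  # create the output folder if it is missing
    csv_path = os.path.join(res_dir, "basis_df_inspire.csv")
    toy_basis.to_csv(csv_path)
    # index_col=0 restores the row labels that to_csv wrote as the first column.
    restored = pd.read_csv(csv_path, index_col=0)

print(restored.equals(toy_basis))
```

The AnnData file can be reloaded in the same spirit with sc.read_h5ad(res_path + "/adata_inspire.h5ad").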