Run INSPIRE on the human breast cancer Xenium sections

In this tutorial, we apply INSPIRE to human breast cancer sections profiled by Xenium to reveal microenvironment heterogeneity across tumor subtypes.

The human breast cancer Xenium sections are publicly available at https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast.

Import packages

[1]:
import pandas as pd
import numpy as np
import scanpy as sc
import anndata as ad
import umap
import os
import scipy.sparse
from matplotlib.cm import get_cmap
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib.lines import Line2D

import INSPIRE

import warnings
warnings.filterwarnings("ignore")

Data preprocessing

Each Xenium section contains a large number of cells. For efficiency, each section was preprocessed separately.

[2]:
# ## load data 1
# data_dir = "/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tumor_microenvironment/data/Xenium_BC/Sample1_replicate1"
# adata = sc.read_10x_h5(data_dir + "/cell_feature_matrix.h5")
# coor_df = pd.read_csv(data_dir + "/cells.csv.gz")
# adata.obsm["spatial"] = coor_df[["x_centroid", "y_centroid"]].to_numpy()
# adata.obsm["spatial"][:,0] = - adata.obsm["spatial"][:,0]
# adata.obsm["spatial"][:,1] = - adata.obsm["spatial"][:,1]
# adata.obsm["spatial"] = adata.obsm["spatial"].astype(np.float32)
# adata.var_names_make_unique()
# adata.obs_names_make_unique()
# adata_1 = adata.copy()
# adata_1.obs.index = adata_1.obs.index + "-0"


# ## load data 2
# data_dir = "/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tumor_microenvironment/data/Xenium_BC/Sample1_replicate2"
# adata = sc.read_10x_h5(data_dir + "/cell_feature_matrix.h5")
# coor_df = pd.read_csv(data_dir + "/cells.csv.gz")
# adata.obsm["spatial"] = coor_df[["x_centroid", "y_centroid"]].to_numpy()
# adata.obsm["spatial"][:,0] = - adata.obsm["spatial"][:,0]
# adata.obsm["spatial"][:,1] = - adata.obsm["spatial"][:,1]
# adata.obsm["spatial"] = adata.obsm["spatial"].astype(np.float32)
# adata.var_names_make_unique()
# adata.obs_names_make_unique()
# adata_2 = adata.copy()
# adata_2.obs.index = adata_2.obs.index + "-1"

# shared_genes = adata_1.var.index.intersection(adata_2.var.index)
# adata_1 = adata_1[:, shared_genes].copy()
# adata_2 = adata_2[:, shared_genes].copy()
# del adata

# ## preprocess data 1
# INSPIRE.utils.calculate_node_features_LGCN(adata=adata_1,
#                                            slice_name="BC_sample1_rep1_qc6_radcutoff20",
#                                            preprocessed_data_path="/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tumor_microenvironment/calculate_node_features/Xenium_BC",
#                                            min_genes_qc=6,
#                                            min_cells_qc=6,
#                                            rad_cutoff=20
#                                           )

# ## preprocess data 2
# INSPIRE.utils.calculate_node_features_LGCN(adata=adata_2,
#                                            slice_name="BC_sample1_rep2_qc6_radcutoff20",
#                                            preprocessed_data_path="/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tumor_microenvironment/calculate_node_features/Xenium_BC",
#                                            min_genes_qc=6,
#                                            min_cells_qc=6,
#                                            rad_cutoff=20
#                                           )

The preprocessed data are saved to the directory given by preprocessed_data_path.
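
Restricting both sections to their shared genes, as in the commented cell above, is a set intersection on the gene indices. The following is a minimal sketch on toy gene names (the lists are illustrative stand-ins, not the actual Xenium panel), using pandas’ Index.intersection:

```python
import pandas as pd

# Toy gene indices standing in for adata_1.var.index and adata_2.var.index.
genes_1 = pd.Index(["ERBB2", "ESR1", "KRT5", "CD3E"])
genes_2 = pd.Index(["ESR1", "CD3E", "ERBB2", "MKI67"])

# Explicit set intersection of the two indices.
shared_genes = genes_1.intersection(genes_2)
print(len(shared_genes), sorted(shared_genes))
```

Each AnnData object can then be subset with the resulting index, as in adata_1[:, shared_genes].copy().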

Load preprocessed data

[3]:
slice_name_list = ["BC_sample1_rep1_qc6_radcutoff20", "BC_sample1_rep2_qc6_radcutoff20"]
preprocessed_data_path = "/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tumor_microenvironment/calculate_node_features/Xenium_BC"

adata_st_list, adata_full = INSPIRE.utils.prepare_inputs_LGCN(slice_name_list=slice_name_list,
                                                              preprocessed_data_path=preprocessed_data_path,
                                                              num_hvgs=500,
                                                              spot_size=20.,
                                                              min_concat_dist=100)
Finding highly variable genes...
Load data BC_sample1_rep1_qc6_radcutoff20
Load data BC_sample1_rep2_qc6_radcutoff20
Find 313 shared highly variable genes among datasets.
Store counts and library sizes for Poisson modeling...
Normalize data...
Load data BC_sample1_rep1_qc6_radcutoff20
Load data BC_sample1_rep2_qc6_radcutoff20
Load and prepare node features for LGCN...
Load node features BC_sample1_rep1_qc6_radcutoff20
Node features for slice 0 : (164596, 626)
Load node features BC_sample1_rep2_qc6_radcutoff20
Node features for slice 1 : (118467, 626)
Prepare an adata containing full spot locations and slice labels for better visualization...
[Figure: plot produced by prepare_inputs_LGCN]

Run INSPIRE model

[4]:
model = INSPIRE.model.Model_LGCN(adata_st_list=adata_st_list,
                                 n_spatial_factors=50,
                                 n_training_steps=20000,
                                 batch_size=2048
                                )
[5]:
model.train(adata_st_list)
  0%|          | 6/20000 [00:02<1:35:14,  3.50it/s]
Step: 0, d_loss: 1.4172, Loss: 1567.6923, recon_loss: 499.5919, fe_loss: 79.1669, geom_loss: 74.5035, beta_loss: 986.3552, gan_loss: 1.0883
  3%|▎         | 507/20000 [00:12<06:32, 49.63it/s]
Step: 500, d_loss: 1.3132, Loss: 1309.3512, recon_loss: 339.4700, fe_loss: 45.5029, geom_loss: 88.7015, beta_loss: 921.8173, gan_loss: 0.7870
  5%|▌         | 1007/20000 [00:22<06:37, 47.72it/s]
Step: 1000, d_loss: 1.2793, Loss: 1182.5730, recon_loss: 213.1364, fe_loss: 44.0688, geom_loss: 81.6868, beta_loss: 922.9014, gan_loss: 0.8327
  8%|▊         | 1507/20000 [00:32<06:17, 48.95it/s]
Step: 1500, d_loss: 1.2698, Loss: 1102.9878, recon_loss: 134.4910, fe_loss: 43.4723, geom_loss: 76.3164, beta_loss: 922.5912, gan_loss: 0.9072
 10%|█         | 2009/20000 [00:42<06:07, 48.96it/s]
Step: 2000, d_loss: 1.2445, Loss: 1035.4191, recon_loss: 67.2140, fe_loss: 43.1071, geom_loss: 68.3831, beta_loss: 922.8494, gan_loss: 0.8808
 13%|█▎        | 2510/20000 [00:52<05:59, 48.63it/s]
Step: 2500, d_loss: 1.2286, Loss: 1002.5932, recon_loss: 34.3764, fe_loss: 42.6707, geom_loss: 61.9228, beta_loss: 923.3207, gan_loss: 0.9869
 15%|█▌        | 3006/20000 [01:03<05:41, 49.76it/s]
Step: 3000, d_loss: 1.2076, Loss: 966.3399, recon_loss: -0.8271, fe_loss: 42.3578, geom_loss: 57.3993, beta_loss: 922.6725, gan_loss: 0.9887
 18%|█▊        | 3507/20000 [01:13<05:32, 49.53it/s]
Step: 3500, d_loss: 1.1812, Loss: 959.6121, recon_loss: -7.1217, fe_loss: 42.3555, geom_loss: 53.9558, beta_loss: 922.3357, gan_loss: 0.9634
 20%|██        | 4007/20000 [01:23<05:24, 49.33it/s]
Step: 4000, d_loss: 1.1691, Loss: 949.7222, recon_loss: -17.0247, fe_loss: 42.3874, geom_loss: 51.2098, beta_loss: 922.2996, gan_loss: 1.0357
 23%|██▎       | 4509/20000 [01:33<05:22, 47.99it/s]
Step: 4500, d_loss: 1.1437, Loss: 942.9811, recon_loss: -23.8967, fe_loss: 42.2599, geom_loss: 48.3503, beta_loss: 922.5485, gan_loss: 1.1023
 25%|██▌       | 5009/20000 [01:43<05:10, 48.24it/s]
Step: 5000, d_loss: 1.1096, Loss: 935.4135, recon_loss: -31.5005, fe_loss: 41.9880, geom_loss: 44.9566, beta_loss: 922.9172, gan_loss: 1.1095
 28%|██▊       | 5509/20000 [01:54<04:56, 48.88it/s]
Step: 5500, d_loss: 1.0913, Loss: 929.1166, recon_loss: -37.2718, fe_loss: 42.0714, geom_loss: 43.7849, beta_loss: 922.2804, gan_loss: 1.1610
 30%|███       | 6010/20000 [02:04<04:44, 49.17it/s]
Step: 6000, d_loss: 1.0835, Loss: 928.4330, recon_loss: -37.6152, fe_loss: 42.0797, geom_loss: 43.0180, beta_loss: 921.9680, gan_loss: 1.1402
 33%|███▎      | 6510/20000 [02:14<04:34, 49.19it/s]
Step: 6500, d_loss: 1.0708, Loss: 932.1746, recon_loss: -34.6176, fe_loss: 41.9906, geom_loss: 41.3810, beta_loss: 922.8387, gan_loss: 1.1352
 35%|███▌      | 7009/20000 [02:24<04:23, 49.21it/s]
Step: 7000, d_loss: 1.0598, Loss: 922.6492, recon_loss: -43.4826, fe_loss: 41.9655, geom_loss: 41.5733, beta_loss: 922.1033, gan_loss: 1.2315
 38%|███▊      | 7509/20000 [02:34<04:18, 48.27it/s]
Step: 7500, d_loss: 1.0216, Loss: 917.3104, recon_loss: -49.1152, fe_loss: 41.8728, geom_loss: 40.6217, beta_loss: 922.3845, gan_loss: 1.3559
 40%|████      | 8011/20000 [02:44<04:01, 49.68it/s]
Step: 8000, d_loss: 1.0168, Loss: 912.3078, recon_loss: -53.9664, fe_loss: 41.8964, geom_loss: 40.7532, beta_loss: 922.2964, gan_loss: 1.2664
 43%|████▎     | 8507/20000 [02:55<03:53, 49.21it/s]
Step: 8500, d_loss: 1.0246, Loss: 907.8286, recon_loss: -58.2036, fe_loss: 41.8341, geom_loss: 40.7000, beta_loss: 922.0864, gan_loss: 1.2977
 45%|████▌     | 9008/20000 [03:05<03:49, 47.81it/s]
Step: 9000, d_loss: 0.9640, Loss: 914.9151, recon_loss: -51.6274, fe_loss: 41.9215, geom_loss: 41.1876, beta_loss: 922.3317, gan_loss: 1.4655
 48%|████▊     | 9508/20000 [03:15<03:34, 48.95it/s]
Step: 9500, d_loss: 0.9606, Loss: 903.0141, recon_loss: -63.1390, fe_loss: 41.6479, geom_loss: 40.1974, beta_loss: 922.2338, gan_loss: 1.4675
 50%|█████     | 10008/20000 [03:25<03:23, 48.99it/s]
Step: 10000, d_loss: 0.9530, Loss: 911.0151, recon_loss: -55.3885, fe_loss: 41.8912, geom_loss: 40.8551, beta_loss: 922.2137, gan_loss: 1.4815
 53%|█████▎    | 10508/20000 [03:35<03:13, 49.11it/s]
Step: 10500, d_loss: 0.9357, Loss: 908.7469, recon_loss: -57.5353, fe_loss: 41.9110, geom_loss: 41.9312, beta_loss: 922.0583, gan_loss: 1.4742
 55%|█████▌    | 11006/20000 [03:45<03:02, 49.17it/s]
Step: 11000, d_loss: 0.9120, Loss: 905.8323, recon_loss: -60.0040, fe_loss: 41.8304, geom_loss: 41.6544, beta_loss: 921.7123, gan_loss: 1.4605
 58%|█████▊    | 11506/20000 [03:56<02:54, 48.78it/s]
Step: 11500, d_loss: 0.8887, Loss: 894.6031, recon_loss: -72.1850, fe_loss: 41.6670, geom_loss: 40.7881, beta_loss: 922.7667, gan_loss: 1.5386
 60%|██████    | 12006/20000 [04:06<02:42, 49.08it/s]
Step: 12000, d_loss: 0.9045, Loss: 910.2371, recon_loss: -55.5765, fe_loss: 42.0149, geom_loss: 42.2740, beta_loss: 921.6388, gan_loss: 1.3144
 63%|██████▎   | 12506/20000 [04:16<02:32, 49.06it/s]
Step: 12500, d_loss: 0.8669, Loss: 908.5894, recon_loss: -57.8074, fe_loss: 41.8952, geom_loss: 42.7527, beta_loss: 922.0972, gan_loss: 1.5493
 65%|██████▌   | 13006/20000 [04:26<02:22, 48.94it/s]
Step: 13000, d_loss: 0.8823, Loss: 895.9685, recon_loss: -70.3891, fe_loss: 41.6324, geom_loss: 42.0520, beta_loss: 922.2167, gan_loss: 1.6674
 68%|██████▊   | 13506/20000 [04:36<02:13, 48.80it/s]
Step: 13500, d_loss: 0.8444, Loss: 906.3595, recon_loss: -60.0980, fe_loss: 41.8539, geom_loss: 42.8700, beta_loss: 922.0403, gan_loss: 1.7059
 70%|███████   | 14006/20000 [04:47<02:03, 48.47it/s]
Step: 14000, d_loss: 0.8451, Loss: 904.1049, recon_loss: -62.0766, fe_loss: 41.7981, geom_loss: 43.1761, beta_loss: 921.9168, gan_loss: 1.6029
 73%|███████▎  | 14506/20000 [04:57<01:52, 48.74it/s]
Step: 14500, d_loss: 0.8344, Loss: 906.0906, recon_loss: -60.7666, fe_loss: 41.6471, geom_loss: 44.2953, beta_loss: 922.5738, gan_loss: 1.7504
 75%|███████▌  | 15008/20000 [05:07<01:42, 48.70it/s]
Step: 15000, d_loss: 0.8374, Loss: 906.1649, recon_loss: -60.0918, fe_loss: 41.7930, geom_loss: 44.6190, beta_loss: 921.9781, gan_loss: 1.5933
 78%|███████▊  | 15508/20000 [05:17<01:31, 49.13it/s]
Step: 15500, d_loss: 0.8169, Loss: 892.4849, recon_loss: -73.7491, fe_loss: 41.6577, geom_loss: 44.4146, beta_loss: 921.9598, gan_loss: 1.7281
 80%|████████  | 16008/20000 [05:28<01:21, 49.03it/s]
Step: 16000, d_loss: 0.8051, Loss: 899.8423, recon_loss: -66.5584, fe_loss: 41.7782, geom_loss: 46.0948, beta_loss: 921.8125, gan_loss: 1.8881
 83%|████████▎ | 16508/20000 [05:38<01:11, 49.17it/s]
Step: 16500, d_loss: 0.8042, Loss: 904.9505, recon_loss: -61.3754, fe_loss: 41.7375, geom_loss: 46.3722, beta_loss: 921.7482, gan_loss: 1.9127
 85%|████████▌ | 17009/20000 [05:48<01:00, 49.10it/s]
Step: 17000, d_loss: 0.7950, Loss: 908.1415, recon_loss: -58.2322, fe_loss: 41.8173, geom_loss: 47.9613, beta_loss: 921.9771, gan_loss: 1.6201
 88%|████████▊ | 17510/20000 [05:58<00:50, 48.94it/s]
Step: 17500, d_loss: 0.7699, Loss: 886.5103, recon_loss: -79.6134, fe_loss: 41.7839, geom_loss: 46.7810, beta_loss: 921.5244, gan_loss: 1.8798
 90%|█████████ | 18010/20000 [06:08<00:40, 48.87it/s]
Step: 18000, d_loss: 0.7473, Loss: 907.1054, recon_loss: -59.2270, fe_loss: 41.8680, geom_loss: 48.8641, beta_loss: 921.6464, gan_loss: 1.8407
 93%|█████████▎| 18507/20000 [06:18<00:30, 49.00it/s]
Step: 18500, d_loss: 0.8016, Loss: 898.2635, recon_loss: -68.2579, fe_loss: 41.7153, geom_loss: 50.1626, beta_loss: 922.0256, gan_loss: 1.7772
 95%|█████████▌| 19007/20000 [06:29<00:20, 48.60it/s]
Step: 19000, d_loss: 0.7876, Loss: 894.5960, recon_loss: -71.3222, fe_loss: 41.5344, geom_loss: 49.3036, beta_loss: 921.6761, gan_loss: 1.7215
 98%|█████████▊| 19507/20000 [06:39<00:10, 48.87it/s]
Step: 19500, d_loss: 0.7333, Loss: 906.0635, recon_loss: -60.4748, fe_loss: 41.8529, geom_loss: 50.6278, beta_loss: 921.8535, gan_loss: 1.8193
100%|██████████| 20000/20000 [06:49<00:00, 48.85it/s]
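
To inspect convergence, the logged losses can be collected into records and then tabulated or plotted. The sketch below assumes the training output was captured as text; parse_training_log is a hypothetical helper written for this example and is not part of INSPIRE.

```python
import re

# Matches lines such as "Step: 500, d_loss: 1.3132, Loss: 1309.3512, ..."
STEP_RE = re.compile(r"^Step:\s*(\d+),\s*(.*)$")

def parse_training_log(text):
    """Return one dict per logged step, e.g. {"step": 500, "d_loss": 1.3132, ...}."""
    records = []
    for line in text.splitlines():
        match = STEP_RE.match(line.strip())
        if match is None:
            continue  # skip progress-bar lines and blank lines
        record = {"step": int(match.group(1))}
        for item in match.group(2).split(","):
            name, value = item.split(":")
            record[name.strip()] = float(value)
        records.append(record)
    return records

# Two entries copied from the training output above.
log_text = """\
  0%|          | 6/20000 [00:02<1:35:14,  3.50it/s]
Step: 0, d_loss: 1.4172, Loss: 1567.6923, recon_loss: 499.5919, fe_loss: 79.1669, geom_loss: 74.5035, beta_loss: 986.3552, gan_loss: 1.0883
 98%|█████████▊| 19507/20000 [06:39<00:10, 48.87it/s]
Step: 19500, d_loss: 0.7333, Loss: 906.0635, recon_loss: -60.4748, fe_loss: 41.8529, geom_loss: 50.6278, beta_loss: 921.8535, gan_loss: 1.8193
"""

records = parse_training_log(log_text)
for record in records:
    print(record["step"], record["Loss"], record["recon_loss"])
```

The resulting list of dicts can be passed directly to pd.DataFrame(records) for plotting.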

Access cell representations, proportions of spatial factors in cells, and gene loading matrix

We evaluate the cell representations and the proportions of spatial factors in cells using minibatches.

[6]:
adata_full, basis_df = model.eval_minibatch(adata_st_list,
                                            adata_full,
                                            batch_size=10000
                                           )
Evaluate Z and beta using minibatch...
Evaluation for slice 0
Evaluation for slice 1
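
Minibatch evaluation bounds memory use by processing a fixed number of cells at a time. The sketch below illustrates only the general chunking pattern for batch_size=10000. How INSPIRE batches cells internally is not shown in this tutorial and may differ, and iter_minibatches is a hypothetical helper.

```python
def iter_minibatches(n_cells, batch_size):
    """Yield (start, stop) index pairs that cover range(n_cells) without overlap."""
    for start in range(0, n_cells, batch_size):
        yield start, min(start + batch_size, n_cells)

# Cell counts taken from the node-feature shapes printed when loading the data.
for slice_idx, n_cells in enumerate([164596, 118467]):
    batches = list(iter_minibatches(n_cells, batch_size=10000))
    print(f"slice {slice_idx}: {len(batches)} batches, last batch = {batches[-1]}")
```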

We calculate 2D UMAP coordinates of cells across sections based on INSPIRE’s inferred cell representations.

[7]:
reducer = umap.UMAP(n_neighbors=30,
                    n_components=2,
                    metric="correlation",
                    n_epochs=None,
                    learning_rate=1.0,
                    min_dist=0.3,
                    spread=1.0,
                    set_op_mix_ratio=1.0,
                    local_connectivity=1,
                    repulsion_strength=1,
                    negative_sample_rate=5,
                    a=None,
                    b=None,
                    random_state=1234,
                    metric_kwds=None,
                    angular_rp_forest=False,
                    verbose=True)
embedding = reducer.fit_transform(adata_full.obsm['latent'])
adata_full.obsm["X_umap"] = embedding
UMAP(angular_rp_forest=True, local_connectivity=1, metric='correlation', min_dist=0.3, n_neighbors=30, random_state=1234, repulsion_strength=1, verbose=True)
Thu May 29 10:20:57 2025 Construct fuzzy simplicial set
Thu May 29 10:20:57 2025 Finding Nearest Neighbors
Thu May 29 10:20:57 2025 Building RP forest with 32 trees
Thu May 29 10:21:01 2025 NN descent for 18 iterations
         1  /  18
         2  /  18
         3  /  18
         4  /  18
        Stopping threshold met -- exiting after 4 iterations
Thu May 29 10:21:28 2025 Finished Nearest Neighbor Search
Thu May 29 10:21:32 2025 Construct embedding
        completed  0  /  200 epochs
        completed  20  /  200 epochs
        completed  40  /  200 epochs
        completed  60  /  200 epochs
        completed  80  /  200 epochs
        completed  100  /  200 epochs
        completed  120  /  200 epochs
        completed  140  /  200 epochs
        completed  160  /  200 epochs
        completed  180  /  200 epochs
Thu May 29 10:27:50 2025 Finished embedding
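
The UMAP call above uses metric="correlation", which compares the shape of two latent vectors rather than their magnitude. The sketch below computes the distance as one minus the Pearson correlation on toy vectors; correlation_distance is an illustrative helper, not a function from umap or INSPIRE.

```python
import numpy as np

def correlation_distance(u, v):
    """Return 1 minus the Pearson correlation of two 1-D vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - np.corrcoef(u, v)[0, 1]

z = np.array([0.2, -1.0, 0.7, 1.5])  # toy latent vector

# Same profile with a different scale and offset: distance close to 0.
print(correlation_distance(z, 3.0 * z + 2.0))
# Opposite profile: distance close to 2.
print(correlation_distance(z, -z))
```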
[8]:
adata_full.obs["slice_label"] = adata_full.obs["slice_label"].values.astype(str)
sc.pl.umap(adata_full, color=["slice_label"])
[Figure: UMAP of cells from both sections, colored by slice_label]

Save results

[9]:
res_path = "/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tutorials/new_examples/human_breast_cancer_xenium"
adata_full.write(res_path + "/adata_inspire.h5ad")
basis_df.to_csv(res_path + "/basis_df_inspire.csv")
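
When the saved table is reloaded later, passing index_col=0 restores the row labels written by to_csv. The round trip below uses a small hypothetical table in place of basis_df; the real table’s row and column labels may differ.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for basis_df, used only to demonstrate the round trip.
toy_basis = pd.DataFrame([[0.1, 0.9], [0.6, 0.4]],
                         index=["factor_0", "factor_1"],
                         columns=["ESR1", "ERBB2"])

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "basis_df_inspire.csv")
    toy_basis.to_csv(csv_path)
    # index_col=0 reads the first column back as the row index.
    reloaded = pd.read_csv(csv_path, index_col=0)

print(reloaded)
```

The AnnData file can be read back in the same way with ad.read_h5ad(res_path + "/adata_inspire.h5ad").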