Run INSPIRE on the human breast cancer Xenium sections

In this tutorial, we apply INSPIRE to human breast cancer sections profiled by Xenium, revealing microenvironment heterogeneity across tumor subtypes.

The human breast cancer Xenium sections are publicly available at https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast.

Import packages

[1]:

import pandas as pd
import numpy as np
import scanpy as sc
import anndata as ad
import umap
import os
import scipy.sparse
from matplotlib import colormaps  # replaces matplotlib.cm.get_cmap, which was removed in Matplotlib 3.9
import matplotlib.pyplot as plt
import matplotlib as mpl
from matplotlib.lines import Line2D

import INSPIRE

import warnings
warnings.filterwarnings("ignore")

Data preprocessing

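Each Xenium section contains a large number of cells, so for efficiency we preprocess each section separately. The preprocessing cell below is commented out because its outputs were already written to disk; uncomment it and update the paths to rerun preprocessing on your own copy of the data.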
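The loading code repeats the same three bookkeeping steps for each section: it negates both centroid coordinates, appends a section suffix to every cell barcode so barcodes stay unique across sections, and restricts both sections to their shared gene panel. A minimal, library-free sketch of those steps follows. The helper names (flip_coords, suffix_barcodes, shared_genes) and the toy barcodes and genes are hypothetical and not part of INSPIRE or scanpy.

```python
def flip_coords(coords):
    """Negate both x and y, i.e. rotate the section by 180 degrees."""
    return [(-x, -y) for x, y in coords]


def suffix_barcodes(barcodes, section_index):
    """Append '-<section_index>' so barcodes stay unique across sections."""
    return [f"{b}-{section_index}" for b in barcodes]


def shared_genes(genes_a, genes_b):
    """Genes present in both panels, kept in the order of genes_a.

    This is a set intersection, the same operation as
    Index.intersection in pandas. Note that `&` between two pandas
    Index objects is element-wise (not a set operation) in pandas >= 2.0.
    """
    in_b = set(genes_b)
    return [g for g in genes_a if g in in_b]


coords = flip_coords([(10.0, 5.0), (3.5, -2.0)])
cells_0 = suffix_barcodes(["cell_a", "cell_b"], 0)
cells_1 = suffix_barcodes(["cell_a"], 1)          # same barcode, other section
genes = shared_genes(["ERBB2", "ESR1", "KRT7"], ["KRT7", "ERBB2"])

print(coords)            # [(-10.0, -5.0), (-3.5, 2.0)]
print(cells_0 + cells_1)  # ['cell_a-0', 'cell_b-0', 'cell_a-1']
print(genes)             # ['ERBB2', 'KRT7']
```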

[2]:

# ## load data 1
# data_dir = "/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tumor_microenvironment/data/Xenium_BC/Sample1_replicate1"
# adata = sc.read_10x_h5(data_dir + "/cell_feature_matrix.h5")
# coor_df = pd.read_csv(data_dir + "/cells.csv.gz")
# adata.obsm["spatial"] = coor_df[["x_centroid", "y_centroid"]].to_numpy()
# # negate both coordinates (rotates the section by 180 degrees)
# adata.obsm["spatial"][:,0] = - adata.obsm["spatial"][:,0]
# adata.obsm["spatial"][:,1] = - adata.obsm["spatial"][:,1]
# adata.obsm["spatial"] = adata.obsm["spatial"].astype(np.float32)
# adata.var_names_make_unique()
# adata.obs_names_make_unique()
# adata_1 = adata.copy()
# # suffix barcodes with the section index so they stay unique across sections
# adata_1.obs.index = adata_1.obs.index + "-0"


# ## load data 2
# data_dir = "/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tumor_microenvironment/data/Xenium_BC/Sample1_replicate2"
# adata = sc.read_10x_h5(data_dir + "/cell_feature_matrix.h5")
# coor_df = pd.read_csv(data_dir + "/cells.csv.gz")
# adata.obsm["spatial"] = coor_df[["x_centroid", "y_centroid"]].to_numpy()
# # negate both coordinates (rotates the section by 180 degrees)
# adata.obsm["spatial"][:,0] = - adata.obsm["spatial"][:,0]
# adata.obsm["spatial"][:,1] = - adata.obsm["spatial"][:,1]
# adata.obsm["spatial"] = adata.obsm["spatial"].astype(np.float32)
# adata.var_names_make_unique()
# adata.obs_names_make_unique()
# adata_2 = adata.copy()
# # suffix barcodes with the section index so they stay unique across sections
# adata_2.obs.index = adata_2.obs.index + "-1"

# # keep genes measured in both sections; Index.intersection is the set operation
# # (`&` between two pandas Index objects is element-wise in pandas >= 2.0)
# shared_genes = adata_1.var.index.intersection(adata_2.var.index)
# adata_1 = adata_1[:, shared_genes].copy()
# adata_2 = adata_2[:, shared_genes].copy()
# del adata

# ## preprocess data 1
# INSPIRE.utils.calculate_node_features_LGCN(adata=adata_1,
#                                            slice_name="BC_sample1_rep1_qc6_radcutoff20",
#                                            preprocessed_data_path="/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tumor_microenvironment/calculate_node_features/Xenium_BC",
#                                            min_genes_qc=6,
#                                            min_cells_qc=6,
#                                            rad_cutoff=20
#                                           )

# ## preprocess data 2
# INSPIRE.utils.calculate_node_features_LGCN(adata=adata_2,
#                                            slice_name="BC_sample1_rep2_qc6_radcutoff20",
#                                            preprocessed_data_path="/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tumor_microenvironment/calculate_node_features/Xenium_BC",
#                                            min_genes_qc=6,
#                                            min_cells_qc=6,
#                                            rad_cutoff=20
#                                           )

The preprocessed data are saved to preprocessed_data_path under each slice_name and are reloaded by slice name in the next step.
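calculate_node_features_LGCN takes min_genes_qc and min_cells_qc. Their names follow the usual cell and gene QC convention (as in scanpy's sc.pp.filter_cells(min_genes=...) and sc.pp.filter_genes(min_cells=...)). Assuming they mean the same thing here, the numpy sketch below applies that filter to a toy count matrix. It is an illustration rather than INSPIRE's preprocessing code, and filtering genes on the already-filtered cells is an assumption about the order of the two steps.

```python
import numpy as np


def qc_filter(counts, min_genes, min_cells):
    """Return boolean masks (keep_cells, keep_genes) for a cells x genes matrix.

    A cell is kept if it has at least `min_genes` detected genes; a gene is
    kept if it is detected in at least `min_cells` of the kept cells.
    """
    detected = counts > 0
    keep_cells = detected.sum(axis=1) >= min_genes
    keep_genes = detected[keep_cells].sum(axis=0) >= min_cells
    return keep_cells, keep_genes


rng = np.random.default_rng(0)
counts = rng.poisson(0.5, size=(200, 30))  # toy sparse count matrix
counts[:5] = 0                             # five empty cells that should be dropped

keep_cells, keep_genes = qc_filter(counts, min_genes=6, min_cells=6)
filtered = counts[keep_cells][:, keep_genes]
print(filtered.shape)
```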

Load preprocessed data

[3]:

slice_name_list = ["BC_sample1_rep1_qc6_radcutoff20", "BC_sample1_rep2_qc6_radcutoff20"]
preprocessed_data_path = "/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tumor_microenvironment/calculate_node_features/Xenium_BC"

adata_st_list, adata_full = INSPIRE.utils.prepare_inputs_LGCN(slice_name_list=slice_name_list,
                                                              preprocessed_data_path=preprocessed_data_path,
                                                              num_hvgs=500,
                                                              spot_size=20.,
                                                              min_concat_dist=100)

Finding highly variable genes...
Load data BC_sample1_rep1_qc6_radcutoff20
Load data BC_sample1_rep2_qc6_radcutoff20
Find 313 shared highly variable genes among datasets.
Store counts and library sizes for Poisson modeling...
Normalize data...
Load data BC_sample1_rep1_qc6_radcutoff20
Load data BC_sample1_rep2_qc6_radcutoff20
Load and prepare node features for LGCN...
Load node features BC_sample1_rep1_qc6_radcutoff20
Node features for slice 0 : (164596, 626)
Load node features BC_sample1_rep2_qc6_radcutoff20
Node features for slice 1 : (118467, 626)
Prepare an adata containing full spot locations and slice labels for better visualization...

[Figure: output of cell [3], generated while preparing adata_full]
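Two numbers in the log above are worth checking. First, num_hvgs=500 was requested but 313 genes were kept; the log suggests genes are selected per section and only those selected in every section are retained. Second, each slice's node feature matrix has 626 = 2 × 313 columns. One plausible reading, which we have not confirmed against the INSPIRE source, is that each cell's own expression is concatenated with an aggregate over its spatial neighbors within rad_cutoff. The numpy sketch below reproduces both shapes under those assumptions; the variance ranking and the neighbor mean are stand-ins, not INSPIRE's exact methods, and all helper names are hypothetical.

```python
import numpy as np


def top_variable_genes(expr, genes, k):
    """Stand-in HVG ranking: the k genes with the largest variance."""
    order = np.argsort(expr.var(axis=0))[::-1][:k]
    return {genes[i] for i in order}


def neighbor_mean_features(expr, coords, rad_cutoff):
    """Concatenate each cell's expression with the mean over cells within rad_cutoff.

    Uses brute-force pairwise distances, fine for a toy example but far
    too slow and memory-hungry for ~10^5 cells.
    """
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = (dist <= rad_cutoff).astype(float)  # includes the cell itself
    neigh_mean = adj @ expr / adj.sum(axis=1, keepdims=True)
    return np.hstack([expr, neigh_mean])


rng = np.random.default_rng(1)
genes = [f"gene{i}" for i in range(40)]
sections = [
    (rng.poisson(1.0, (120, 40)).astype(float), rng.uniform(0, 200, (120, 2))),
    (rng.poisson(1.0, (90, 40)).astype(float), rng.uniform(0, 200, (90, 2))),
]

# Keep only genes selected in every section (cf. "shared highly variable genes").
shared = set.intersection(*(top_variable_genes(x, genes, k=25) for x, _ in sections))
shared_idx = [i for i, g in enumerate(genes) if g in shared]

feature_shapes = []
for expr, coords in sections:
    feats = neighbor_mean_features(expr[:, shared_idx], coords, rad_cutoff=20.0)
    feature_shapes.append(feats.shape)
print(feature_shapes)  # each entry is (n_cells, 2 * len(shared))
```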

Run INSPIRE model

[4]:

model = INSPIRE.model.Model_LGCN(adata_st_list=adata_st_list,
                                 n_spatial_factors=50,
                                 n_training_steps=20000,
                                 batch_size=2048
                                )
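n_training_steps and batch_size together determine how many cells the model sees during training. Assuming each step draws batch_size cells in total (INSPIRE may instead sample per slice; we have not checked), the hypothetical helper below converts these settings into approximate passes over the 164,596 + 118,467 cells reported by prepare_inputs_LGCN.

```python
def approx_passes(n_steps, batch_size, n_cells):
    """Number of full passes over n_cells implied by n_steps minibatches."""
    return n_steps * batch_size / n_cells


n_cells = 164_596 + 118_467  # node-feature rows for slice 0 and slice 1
passes = approx_passes(20_000, 2_048, n_cells)
print(f"{n_cells} cells, ~{passes:.1f} passes")  # 283063 cells, ~144.7 passes
```

If sampling is per slice instead, the same helper can be applied to each slice's cell count separately.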

[5]:

model.train(adata_st_list)

  0%|          | 6/20000 [00:02<1:35:14,  3.50it/s]

Step: 0, d_loss: 1.4172, Loss: 1567.6923, recon_loss: 499.5919, fe_loss: 79.1669, geom_loss: 74.5035, beta_loss: 986.3552, gan_loss: 1.0883

  3%|▎         | 507/20000 [00:12<06:32, 49.63it/s]

Step: 500, d_loss: 1.3132, Loss: 1309.3512, recon_loss: 339.4700, fe_loss: 45.5029, geom_loss: 88.7015, beta_loss: 921.8173, gan_loss: 0.7870

  5%|▌         | 1007/20000 [00:22<06:37, 47.72it/s]

Step: 1000, d_loss: 1.2793, Loss: 1182.5730, recon_loss: 213.1364, fe_loss: 44.0688, geom_loss: 81.6868, beta_loss: 922.9014, gan_loss: 0.8327

  8%|▊         | 1507/20000 [00:32<06:17, 48.95it/s]

Step: 1500, d_loss: 1.2698, Loss: 1102.9878, recon_loss: 134.4910, fe_loss: 43.4723, geom_loss: 76.3164, beta_loss: 922.5912, gan_loss: 0.9072

 10%|█         | 2009/20000 [00:42<06:07, 48.96it/s]

Step: 2000, d_loss: 1.2445, Loss: 1035.4191, recon_loss: 67.2140, fe_loss: 43.1071, geom_loss: 68.3831, beta_loss: 922.8494, gan_loss: 0.8808

 13%|█▎        | 2510/20000 [00:52<05:59, 48.63it/s]

Step: 2500, d_loss: 1.2286, Loss: 1002.5932, recon_loss: 34.3764, fe_loss: 42.6707, geom_loss: 61.9228, beta_loss: 923.3207, gan_loss: 0.9869

 15%|█▌        | 3006/20000 [01:03<05:41, 49.76it/s]

Step: 3000, d_loss: 1.2076, Loss: 966.3399, recon_loss: -0.8271, fe_loss: 42.3578, geom_loss: 57.3993, beta_loss: 922.6725, gan_loss: 0.9887

 18%|█▊        | 3507/20000 [01:13<05:32, 49.53it/s]

Step: 3500, d_loss: 1.1812, Loss: 959.6121, recon_loss: -7.1217, fe_loss: 42.3555, geom_loss: 53.9558, beta_loss: 922.3357, gan_loss: 0.9634

 20%|██        | 4007/20000 [01:23<05:24, 49.33it/s]

Step: 4000, d_loss: 1.1691, Loss: 949.7222, recon_loss: -17.0247, fe_loss: 42.3874, geom_loss: 51.2098, beta_loss: 922.2996, gan_loss: 1.0357

 23%|██▎       | 4509/20000 [01:33<05:22, 47.99it/s]

Step: 4500, d_loss: 1.1437, Loss: 942.9811, recon_loss: -23.8967, fe_loss: 42.2599, geom_loss: 48.3503, beta_loss: 922.5485, gan_loss: 1.1023

 25%|██▌       | 5009/20000 [01:43<05:10, 48.24it/s]

Step: 5000, d_loss: 1.1096, Loss: 935.4135, recon_loss: -31.5005, fe_loss: 41.9880, geom_loss: 44.9566, beta_loss: 922.9172, gan_loss: 1.1095

 28%|██▊       | 5509/20000 [01:54<04:56, 48.88it/s]

Step: 5500, d_loss: 1.0913, Loss: 929.1166, recon_loss: -37.2718, fe_loss: 42.0714, geom_loss: 43.7849, beta_loss: 922.2804, gan_loss: 1.1610

 30%|███       | 6010/20000 [02:04<04:44, 49.17it/s]

Step: 6000, d_loss: 1.0835, Loss: 928.4330, recon_loss: -37.6152, fe_loss: 42.0797, geom_loss: 43.0180, beta_loss: 921.9680, gan_loss: 1.1402

 33%|███▎      | 6510/20000 [02:14<04:34, 49.19it/s]

Step: 6500, d_loss: 1.0708, Loss: 932.1746, recon_loss: -34.6176, fe_loss: 41.9906, geom_loss: 41.3810, beta_loss: 922.8387, gan_loss: 1.1352

 35%|███▌      | 7009/20000 [02:24<04:23, 49.21it/s]

Step: 7000, d_loss: 1.0598, Loss: 922.6492, recon_loss: -43.4826, fe_loss: 41.9655, geom_loss: 41.5733, beta_loss: 922.1033, gan_loss: 1.2315

 38%|███▊      | 7509/20000 [02:34<04:18, 48.27it/s]

Step: 7500, d_loss: 1.0216, Loss: 917.3104, recon_loss: -49.1152, fe_loss: 41.8728, geom_loss: 40.6217, beta_loss: 922.3845, gan_loss: 1.3559

 40%|████      | 8011/20000 [02:44<04:01, 49.68it/s]

Step: 8000, d_loss: 1.0168, Loss: 912.3078, recon_loss: -53.9664, fe_loss: 41.8964, geom_loss: 40.7532, beta_loss: 922.2964, gan_loss: 1.2664

 43%|████▎     | 8507/20000 [02:55<03:53, 49.21it/s]

Step: 8500, d_loss: 1.0246, Loss: 907.8286, recon_loss: -58.2036, fe_loss: 41.8341, geom_loss: 40.7000, beta_loss: 922.0864, gan_loss: 1.2977

 45%|████▌     | 9008/20000 [03:05<03:49, 47.81it/s]

Step: 9000, d_loss: 0.9640, Loss: 914.9151, recon_loss: -51.6274, fe_loss: 41.9215, geom_loss: 41.1876, beta_loss: 922.3317, gan_loss: 1.4655

 48%|████▊     | 9508/20000 [03:15<03:34, 48.95it/s]

Step: 9500, d_loss: 0.9606, Loss: 903.0141, recon_loss: -63.1390, fe_loss: 41.6479, geom_loss: 40.1974, beta_loss: 922.2338, gan_loss: 1.4675

 50%|█████     | 10008/20000 [03:25<03:23, 48.99it/s]

Step: 10000, d_loss: 0.9530, Loss: 911.0151, recon_loss: -55.3885, fe_loss: 41.8912, geom_loss: 40.8551, beta_loss: 922.2137, gan_loss: 1.4815

 53%|█████▎    | 10508/20000 [03:35<03:13, 49.11it/s]

Step: 10500, d_loss: 0.9357, Loss: 908.7469, recon_loss: -57.5353, fe_loss: 41.9110, geom_loss: 41.9312, beta_loss: 922.0583, gan_loss: 1.4742

 55%|█████▌    | 11006/20000 [03:45<03:02, 49.17it/s]

Step: 11000, d_loss: 0.9120, Loss: 905.8323, recon_loss: -60.0040, fe_loss: 41.8304, geom_loss: 41.6544, beta_loss: 921.7123, gan_loss: 1.4605

 58%|█████▊    | 11506/20000 [03:56<02:54, 48.78it/s]

Step: 11500, d_loss: 0.8887, Loss: 894.6031, recon_loss: -72.1850, fe_loss: 41.6670, geom_loss: 40.7881, beta_loss: 922.7667, gan_loss: 1.5386

 60%|██████    | 12006/20000 [04:06<02:42, 49.08it/s]

Step: 12000, d_loss: 0.9045, Loss: 910.2371, recon_loss: -55.5765, fe_loss: 42.0149, geom_loss: 42.2740, beta_loss: 921.6388, gan_loss: 1.3144

 63%|██████▎   | 12506/20000 [04:16<02:32, 49.06it/s]

Step: 12500, d_loss: 0.8669, Loss: 908.5894, recon_loss: -57.8074, fe_loss: 41.8952, geom_loss: 42.7527, beta_loss: 922.0972, gan_loss: 1.5493

 65%|██████▌   | 13006/20000 [04:26<02:22, 48.94it/s]

Step: 13000, d_loss: 0.8823, Loss: 895.9685, recon_loss: -70.3891, fe_loss: 41.6324, geom_loss: 42.0520, beta_loss: 922.2167, gan_loss: 1.6674

 68%|██████▊   | 13506/20000 [04:36<02:13, 48.80it/s]

Step: 13500, d_loss: 0.8444, Loss: 906.3595, recon_loss: -60.0980, fe_loss: 41.8539, geom_loss: 42.8700, beta_loss: 922.0403, gan_loss: 1.7059

 70%|███████   | 14006/20000 [04:47<02:03, 48.47it/s]

Step: 14000, d_loss: 0.8451, Loss: 904.1049, recon_loss: -62.0766, fe_loss: 41.7981, geom_loss: 43.1761, beta_loss: 921.9168, gan_loss: 1.6029

 73%|███████▎  | 14506/20000 [04:57<01:52, 48.74it/s]

Step: 14500, d_loss: 0.8344, Loss: 906.0906, recon_loss: -60.7666, fe_loss: 41.6471, geom_loss: 44.2953, beta_loss: 922.5738, gan_loss: 1.7504

 75%|███████▌  | 15008/20000 [05:07<01:42, 48.70it/s]

Step: 15000, d_loss: 0.8374, Loss: 906.1649, recon_loss: -60.0918, fe_loss: 41.7930, geom_loss: 44.6190, beta_loss: 921.9781, gan_loss: 1.5933

 78%|███████▊  | 15508/20000 [05:17<01:31, 49.13it/s]

Step: 15500, d_loss: 0.8169, Loss: 892.4849, recon_loss: -73.7491, fe_loss: 41.6577, geom_loss: 44.4146, beta_loss: 921.9598, gan_loss: 1.7281

 80%|████████  | 16008/20000 [05:28<01:21, 49.03it/s]

Step: 16000, d_loss: 0.8051, Loss: 899.8423, recon_loss: -66.5584, fe_loss: 41.7782, geom_loss: 46.0948, beta_loss: 921.8125, gan_loss: 1.8881

 83%|████████▎ | 16508/20000 [05:38<01:11, 49.17it/s]

Step: 16500, d_loss: 0.8042, Loss: 904.9505, recon_loss: -61.3754, fe_loss: 41.7375, geom_loss: 46.3722, beta_loss: 921.7482, gan_loss: 1.9127

 85%|████████▌ | 17009/20000 [05:48<01:00, 49.10it/s]

Step: 17000, d_loss: 0.7950, Loss: 908.1415, recon_loss: -58.2322, fe_loss: 41.8173, geom_loss: 47.9613, beta_loss: 921.9771, gan_loss: 1.6201

 88%|████████▊ | 17510/20000 [05:58<00:50, 48.94it/s]

Step: 17500, d_loss: 0.7699, Loss: 886.5103, recon_loss: -79.6134, fe_loss: 41.7839, geom_loss: 46.7810, beta_loss: 921.5244, gan_loss: 1.8798

 90%|█████████ | 18010/20000 [06:08<00:40, 48.87it/s]

Step: 18000, d_loss: 0.7473, Loss: 907.1054, recon_loss: -59.2270, fe_loss: 41.8680, geom_loss: 48.8641, beta_loss: 921.6464, gan_loss: 1.8407

 93%|█████████▎| 18507/20000 [06:18<00:30, 49.00it/s]

Step: 18500, d_loss: 0.8016, Loss: 898.2635, recon_loss: -68.2579, fe_loss: 41.7153, geom_loss: 50.1626, beta_loss: 922.0256, gan_loss: 1.7772

 95%|█████████▌| 19007/20000 [06:29<00:20, 48.60it/s]

Step: 19000, d_loss: 0.7876, Loss: 894.5960, recon_loss: -71.3222, fe_loss: 41.5344, geom_loss: 49.3036, beta_loss: 921.6761, gan_loss: 1.7215

 98%|█████████▊| 19507/20000 [06:39<00:10, 48.87it/s]

Step: 19500, d_loss: 0.7333, Loss: 906.0635, recon_loss: -60.4748, fe_loss: 41.8529, geom_loss: 50.6278, beta_loss: 921.8535, gan_loss: 1.8193

100%|██████████| 20000/20000 [06:49<00:00, 48.85it/s]
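The printed losses fluctuate from step to step (for example, Loss rises from 894.6031 at step 11500 to 910.2371 at step 12000), so trends are easier to judge when plotted. Assuming the printed output has been saved as text, the sketch below parses the "Step: ..." lines into per-step records using only the standard library; parse_training_log is a hypothetical helper, not part of INSPIRE.

```python
import re

STEP_LINE = re.compile(r"Step: (\d+), (.*)")
FIELD = re.compile(r"(\w+): (-?\d+(?:\.\d+)?)")


def parse_training_log(text):
    """Return a list of {'step': int, <loss name>: float, ...} dicts."""
    records = []
    for line in text.splitlines():
        match = STEP_LINE.match(line.strip())
        if match is None:
            continue  # skip progress-bar and blank lines
        record = {"step": int(match.group(1))}
        for name, value in FIELD.findall(match.group(2)):
            record[name] = float(value)
        records.append(record)
    return records


log = """
  0%|          | 6/20000 [00:02<1:35:14,  3.50it/s]
Step: 0, d_loss: 1.4172, Loss: 1567.6923, recon_loss: 499.5919, fe_loss: 79.1669, geom_loss: 74.5035, beta_loss: 986.3552, gan_loss: 1.0883
Step: 3000, d_loss: 1.2076, Loss: 966.3399, recon_loss: -0.8271, fe_loss: 42.3578, geom_loss: 57.3993, beta_loss: 922.6725, gan_loss: 0.9887
"""
records = parse_training_log(log)
print([(r["step"], r["recon_loss"]) for r in records])
# [(0, 499.5919), (3000, -0.8271)]
```

The resulting list can be passed to pandas.DataFrame and plotted with matplotlib.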

Access cell representations, proportions of spatial factors in cells, and gene loading matrix

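We evaluate the cell representations and the proportions of spatial factors in cells in minibatches (batch_size=10000), which keeps memory usage bounded for sections with more than 100,000 cells. The cell representations are stored in adata_full.obsm["latent"], and the gene loading matrix is returned as basis_df.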
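Minibatch evaluation follows a standard pattern: run the model on consecutive fixed-size chunks of rows and stack the outputs, so peak memory depends on the batch size rather than on the total number of cells. The generic numpy sketch below illustrates the pattern; encode is a hypothetical stand-in for a trained network, and this is not INSPIRE's internal implementation.

```python
import numpy as np


def eval_in_minibatches(features, encode, batch_size):
    """Apply `encode` to consecutive row chunks and stack the results."""
    outputs = [
        encode(features[start:start + batch_size])
        for start in range(0, features.shape[0], batch_size)
    ]
    return np.vstack(outputs)


rng = np.random.default_rng(2)
weights = rng.normal(size=(626, 50))     # toy linear "encoder"
features = rng.normal(size=(2_500, 626))  # 2,500 toy cells, 626 features each


def encode(batch):
    return batch @ weights


# Three chunks: 1,000 + 1,000 + 500 rows.
latent = eval_in_minibatches(features, encode, batch_size=1_000)
print(latent.shape)  # (2500, 50)
```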

[6]:

adata_full, basis_df = model.eval_minibatch(adata_st_list,
                                            adata_full,
                                            batch_size=10000
                                           )

Evaluate Z and beta using minibatch...
Evaluation for slice 0
Evaluation for slice 1

We calculate 2D UMAP coordinates of cells across sections based on INSPIRE’s inferred cell representations.
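The UMAP call below uses metric="correlation", where the distance between two vectors is 1 minus their Pearson correlation. This distance is unchanged by shifting a vector or rescaling it by a positive factor, so cells are compared by the pattern of their latent coordinates rather than by magnitude. A small numpy check of that property (correlation_distance is our own helper written from the definition):

```python
import numpy as np


def correlation_distance(u, v):
    """1 - Pearson correlation between vectors u and v."""
    u_c = u - u.mean()
    v_c = v - v.mean()
    return 1.0 - (u_c @ v_c) / (np.linalg.norm(u_c) * np.linalg.norm(v_c))


rng = np.random.default_rng(3)
u = rng.normal(size=50)
v = rng.normal(size=50)

d = correlation_distance(u, v)
d_scaled = correlation_distance(3.0 * u + 7.0, v)  # rescale and shift u
print(np.isclose(d, d_scaled))                                 # True
print(np.isclose(correlation_distance(u, 2.0 * u + 1.0), 0.0))  # True
```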

[7]:

reducer = umap.UMAP(n_neighbors=30,
                    n_components=2,
                    metric="correlation",
                    n_epochs=None,
                    learning_rate=1.0,
                    min_dist=0.3,
                    spread=1.0,
                    set_op_mix_ratio=1.0,
                    local_connectivity=1,
                    repulsion_strength=1,
                    negative_sample_rate=5,
                    a=None,
                    b=None,
                    random_state=1234,
                    metric_kwds=None,
                    angular_rp_forest=False,
                    verbose=True)
embedding = reducer.fit_transform(adata_full.obsm['latent'])
adata_full.obsm["X_umap"] = embedding

UMAP(angular_rp_forest=True, local_connectivity=1, metric='correlation', min_dist=0.3, n_neighbors=30, random_state=1234, repulsion_strength=1, verbose=True)
Thu May 29 10:20:57 2025 Construct fuzzy simplicial set
Thu May 29 10:20:57 2025 Finding Nearest Neighbors
Thu May 29 10:20:57 2025 Building RP forest with 32 trees
Thu May 29 10:21:01 2025 NN descent for 18 iterations
         1  /  18
         2  /  18
         3  /  18
         4  /  18
        Stopping threshold met -- exiting after 4 iterations
Thu May 29 10:21:28 2025 Finished Nearest Neighbor Search
Thu May 29 10:21:32 2025 Construct embedding

        completed  0  /  200 epochs
        completed  20  /  200 epochs
        completed  40  /  200 epochs
        completed  60  /  200 epochs
        completed  80  /  200 epochs
        completed  100  /  200 epochs
        completed  120  /  200 epochs
        completed  140  /  200 epochs
        completed  160  /  200 epochs
        completed  180  /  200 epochs
Thu May 29 10:27:50 2025 Finished embedding

[8]:

adata_full.obs["slice_label"] = adata_full.obs["slice_label"].values.astype(str)
sc.pl.umap(adata_full, color=["slice_label"])

[Figure: UMAP of INSPIRE cell representations, colored by slice_label]

Save results

[9]:

res_path = "/gpfs/gibbs/pi/zhao/jz874/project/jiazhao/inspire_revision/tutorials/new_examples/human_breast_cancer_xenium"
os.makedirs(res_path, exist_ok=True)  # create the output folder if it does not exist
adata_full.write(res_path + "/adata_inspire.h5ad")
basis_df.to_csv(res_path + "/basis_df_inspire.csv")
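To resume later without retraining, the saved files can be read back: adata_inspire.h5ad with sc.read_h5ad, and basis_df with pd.read_csv. Passing index_col=0 to pd.read_csv restores the row labels written by to_csv as the index instead of an extra "Unnamed: 0" column. The round trip below demonstrates this on a small made-up DataFrame in a temporary directory; its labels and values are illustrative only.

```python
import os
import tempfile

import pandas as pd

toy = pd.DataFrame(
    [[0.1, 0.9], [0.4, 0.6]],
    index=["label_a", "label_b"],
    columns=["col_0", "col_1"],
)

with tempfile.TemporaryDirectory() as res_path:
    path = os.path.join(res_path, "basis_df_inspire.csv")
    toy.to_csv(path)
    without_index = pd.read_csv(path)          # row labels become a column
    reloaded = pd.read_csv(path, index_col=0)  # row labels restored as index

print(list(without_index.columns))  # ['Unnamed: 0', 'col_0', 'col_1']
print(reloaded.equals(toy))         # True
```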
