Defining the optimal setting for transcriptomic analyses on blood samples for response prediction in immunotherapy-treated NSCLC patients | Scientific Reports
HomeHome > Blog > Defining the optimal setting for transcriptomic analyses on blood samples for response prediction in immunotherapy-treated NSCLC patients | Scientific Reports

Defining the optimal setting for transcriptomic analyses on blood samples for response prediction in immunotherapy-treated NSCLC patients | Scientific Reports

Nov 01, 2024

Scientific Reports volume 14, Article number: 26026 (2024) Cite this article

151 Accesses

Metrics details

Transcriptomic profiling of blood immune cells offers a promising alternative to invasive, sampling bias-prone tissue-based biomarker assays for predicting immune checkpoint inhibitor (ICI) therapy response in non-small cell lung cancer (NSCLC) patients. However, the optimal analytical approach to identify systemic correlates of response still needs to be explored. We collected peripheral blood mononuclear cells and whole blood (WB) samples from 33 ICI-treated NSCLC patients before ICI treatment and at the first response evaluation. After bulk polyadenylated RNA-sequencing, we assessed differences in gene expression profiles between non-responders and responders using differential expression analysis, single sample gene set enrichment analysis (ssGSEA), and cell type deconvolution. We evaluated gene expression values, ssGSEA scores, and deconvolved cell type proportions to distinguish non-responders from responders via ROC curve (AUC) analysis, training a logistic regression classification model. Gene expression values and deconvolved proportions yielded the best results with WB samples after treatment (AUC = 0.87 and 0.85, respectively). Overall, ssGSEA scores showed superior classification performance across all sample types and timepoints (AUC > 0.7). In conclusion, transcriptomic analysis through ssGSEA demonstrated the best performance as a non-invasive biomarker for predicting clinical benefit in ICI-treated NSCLC patients, with gene expression and deconvolution on post-treatment WB samples also showing promising results.

Lung cancer is the deadliest cancer worldwide, accounting for 18% of all cancer-related deaths1. In recent years, immune checkpoint inhibitors (ICIs) have revolutionized the treatment of non-small cell lung cancer (NSCLC), expanding the therapeutic options and improving the overall clinical outcomes considerably. Unfortunately, the proportion of NSCLC patients who benefit from ICIs still remains below 30%2,3. Given the high cost of the therapy and the potential development of immune-related adverse effects, there is an urgent need for biomarkers that can predict the response to ICIs. Currently, PD-L1 expression in the tumor microenvironment (TME), the presence of microsatellite instability/defective mismatch repair and the tumor mutational burden (TMB) are the only Food and Drug Administration (FDA)-approved biomarkers for selecting patients likely to benefit from ICIs. Alternative biomarkers such as the presence, abundance and location of tumor infiltrating lymphocytes (TILs), e.g. CD8 + T cells4,5, or different gene expression signatures in the tumor6, have been explored but further research is needed to validate them. In addition, these are all tissue-based biomarkers and thus require invasive tumor tissue biopsies, which pose potential risks for the patients and are subject to sampling bias due to intratumoral heterogeneity.

Recent research has highlighted the crucial role that systemic immunity plays in the development and maintenance of successful responses to ICI therapy7, and several studies have found associations between the proportion of circulating immune cell types and the response to anti-PD1 or anti-PD-L1 therapy in NSCLC patients, as summarized in a review by our group8. Next to flow cytometry, single-cell RNA sequencing (scRNA-seq) of blood cells has also contributed to the identification and characterization of circulating immune cell subpopulations that are involved in and/or associated with the response to ICIs in several cancer types9,10. Additionally, bulk RNA-seq on whole blood samples has been employed to identify gene expression signatures associated with the response to ICIs11. However, bulk RNA-seq data lacks information about the cellular context, a limitation that can be addressed using gene set enrichment analysis (GSEA) and cell type deconvolution. The first method determines whether a set of genes shows significant differences between two biological states, whereas the latter estimates the cellular composition of heterogeneous mixtures such as blood12.

The main goal of this study was to identify the most informative experimental (analytical and computational) setup that interrogates the peripheral blood immune transcriptome landscape and this to discriminate NSCLC patients that do or do not respond to immunotherapy. To this end, we assessed three distinct factors, namely sample type, timepoint of blood draw and analysis strategy. First, we compared results generated from whole blood (WB) and peripheral blood mononuclear cell (PBMC) samples. WB samples are easier to handle compared to PBMCs, since they do not require any pre-processing protocol, which can take 2–3 h of hands-on time for PBMCs. Hence, detecting signal in WB would ease future collections and finally clinical implementation. The differences in cell type composition between both sample types might be beneficial as transcriptomic signals of e.g. neutrophils might be captured in WB, or a downside as T cell subtype proportions are lower. Differences in immune gene expression between PBMC and WB samples have been previously reported using probe-based gene expression profiling methods13,14, but these results have not been confirmed yet with RNA-seq data, nor studied in the context of immunotherapy response. Secondly, we investigated the timepoint of blood draw. Preferentially, we aim to develop a test that predicts upfront response, but if the predictive value of a blood draw early in therapy is much higher compared to before, this would also be of value for clinicians. Third, we assessed the analysis strategy. By combining genes in gene sets or through deconvolution, the power of the analysis might increase compared to measuring single genes. Since it has been demonstrated that the abundance of certain immune cell types are predictive for response, we decided to use immune cell type-specific gene sets for the GSEA analysis. A schematic representation of this workflow is depicted in Fig. 1.

Sample collection, data and result generation workflows performed in this study.

This study included 33 patients diagnosed with metastatic NSCLC and treated with ICIs only (nivolumab, durvalumab, pembrolizumab or nivolumab-ipilimumab) between September 2014 and December 2021 at Ghent University Hospital. Nivolumab was administered at 3 mg/kg every 2 weeks, durvalumab at 10 mg/kg every 2 weeks, pembrolizumab at 200 mg every 3 weeks and ipilimumab at 1 mg/kg every 6 weeks in combination with nivolumab. Response to ICI treatment was assessed by the Response Evaluation Criteria In Solid Tumors (RECIST) v.1.1., considering the best response achieved by each patient. When the best response was progressive disease or stable disease, patients were considered non-responders, whereas responders correspond to patients who achieved a partial or a complete response and did not develop disease progression within 6 months after starting the therapy. Patient samples were collected after signed informed consent and stored at the Thoracic Oncology Prospective and Integrative Biobank according to the Ethical Committee of Ghent University Hospital approval EC/2017/1207, following the ICH Good Clinical Practice rules and the Declaration of Helsinki.

Blood samples were collected at two timepoints: before the start of the treatment and at the first image-based response evaluation, which was performed after 3 ± 1 cycles of ICI therapy. Peripheral blood for PBMC isolation was collected in EDTA tubes (Becton Dickinson/BD Biosciences, San Jose, CA, USA) and WB was collected in PAXgene Blood RNA tubes (Becton Dickinson/BD Biosciences, San Jose, CA, USA). The PBMC isolation was performed as follows: blood specimens in EDTA tubes were transferred upon arrival to 50 ml Leucosep tubes (Greiner Bio-One) containing 15 ml of Ficoll-Paque Plus (Cytiva, Washington, D.C., USA). The blood was diluted in the same volume of 1x DPBS (ThermoFisher Scientific, Waltham, MA, USA), and the tubes were centrifuged at 800rcf for 18 min without using brake. The buffy coat containing the PBMCs was then pipetted to a new tube, and washed with 1x DPBS (ThermoFisher Scientific, Waltham, MA, USA) and centrifuged at 400rcf for 7 min twice. PBMCs were then counted using a hemocytometer and trypan blue to assess cell viability. At least two different squares with a difference lower than 10% in the cell count value were considered. PBMCs were resuspended in freezing mixture containing complete RPMI medium (RPMI + 1% pen/strep + 10% Fetal Bovine Serum) + 10% dimethyl sulfoxide, and distributed in cryovials, with 5–6 × 106 PBMCs per vial. Cryovials were placed in Nalgene Mr. Frosty containers (Thermo Fisher, Waltham, MA, USA) for 24 h at − 80 °C, and then transferred to a − 150 °C freezer until further processing. The PAXgene Blood RNA tubes were incubated at room temperature for 2 h upon arrival, stored at − 20 °C for 24 h and then transferred to − 80 °C until RNA extraction was performed.

RNA from PBMCs was extracted using the miRNeasy Mini kit (Qiagen, Hilden, Germany) following the manufacturers’ instructions, after being defrosted and washed with 1x DPBS. RNA from PAXgene tubes was extracted using the automated Maxwell RSC simplyRNA Blood kit (Promega, Madison, WI, USA) according to the manufacturers’ guidelines, after being defrosted for 2 h at room temperature. RNA concentration was measured with Nanodrop (ThermoFisher Scientific, Waltham, MA, USA) and RNA quality was assessed with a Fragment Analyzer System (Agilent Technologies, Santa Clara, CA, USA). Then, library preparation was performed using the Truseq Stranded mRNA library prep kit (Illumina, San Diego, CA, USA), with 400 ng of total RNA as input. On top, the FastSelect Globin Depletion kit (Qiagen, Hilden, Germany) was used, according to the manufacturers’ instructions, to deplete non-informative globin transcripts before sequencing. The quality of the final libraries was assessed with a Fragment Analyzer System (Agilent Technologies, Santa Clara, CA, USA). The libraries were then pooled equimolarly, and the concentration of the pools was measured using the KAPA Library Quantification Kit (Roche Diagnostics, Diegem, Belgium). The library pool was sequenced on a NovaSeq 6000 instrument in five batches using NovaSeq 6000 SP Reagent Kit v1.5 (100 cycles, paired-end, 2 × 50 cycles) (Illumina, San Diego, CA, USA), with a final loading concentration of 1725 nM and 1% PhiX. The median number of sequenced reads per sample was 29.5 M (9.9–103.4 M), and the mean %Q30 value was 92,96%.

The data were processed using Kallisto (v0.48.0)15 for pseudoalignment against the human genome (hg38) and gene count quantification. Then, R (v4.3.2)16 including tidyverse (v2.0.0)17 and biomaRt (v2.56.1)18,19 were used to analyze and visualize the data. On the gene counts, sequencing batch correction was performed using the sva package (v3.48.0) with the function ‘ComBat_seq’20. To perform DEA, we first filtered out all genes with less than 10 counts in at least half of the samples of any of the subgroups defined by response (non-responders, responders), timepoint (before treatment, at first response evaluation) and sample type (PBMC, WB). Subsequently, limma-voom (v3.56.2)21 was used for actual DEA in which gene counts were normalized using the TMM method. The package fgsea (v1.26.0)22 was used for GSEA based on the adjusted p-value ranked list, and the GSVA package (v1.50.0)23 was used for single-sample GSEA (ssGSEA). The Cell Type Signature (C8) pathways were retrieved from the Molecular Signatures Database (MSigDB), and the immune-related gene sets were selected for analysis.

Based on our previous study on benchmarking RNA-seq based deconvolution tools12, and on a smaller in-house benchmarking study using both simulated (pseudobulk) data and our own bulk PBMC and WB data (data not shown), the Dampened Weighted Least Squares (DWLS) (v0.1.0) method was selected as a tool to accurately infer the cell type compositions on our bulk poly-A + RNA-seq samples24. To provide an appropriate reference to our bulk data, we used a publicly available single cell (sc)RNA-seq reference dataset (EGAS00001004571) that contained both PBMC and WB samples (erythrocyte depleted)25. Only the data of healthy donors were included, and we divided the samples into two sets according to the sample type (PBMCs vs. WB). The scRNA-seq data for each sample type labelled with 9 different cell types (CD4+ T cells, CD8+ T cells, B cells, neutrophils, natural killler (NK) cells, monocytes, erythrocytes, eosinophils, and hematopoietic stem cells) was used as an input for constructing a signature matrix which is then used for deconvolution. In R (v4.3.2), a signature matrix for the different cell types for each sample type was built from the single cell data using the function ‘buildSignatureMatrixMAST’ with default parameters. Subsequently, common genes between bulk and signature data were filtered with the function ‘trimData’, after which deconvolution was performed using the function ‘solDWLS’.

All analyses and figures (except Fig. 1) were performed using the R statistical software package (v4.3.2). Principal component analyses were performed using the ‘prcomp’ function with the normalized gene expression values, gene set enrichment scores and deconvolved cell type proportions, after transforming the data into 0 to 1 values for each sample, to correct for sequencing depth differences. Two-sample Hotelling’s T2 test was used on the PCA scores to quantify and evaluate the clustering of samples belonging to a specific subgroup based on a specific variable. Two-tailed unpaired T-test and Mann–Whitney U test were used to compare variable means between groups, for parametric and non-parametric distributions, respectively. P-values were adjusted using the Benjamini–Hochberg false discovery rate correction when multiple tests were performed simultaneously. An adjusted p-value < 0.05 was used as threshold for statistical significance. To compare the ability of the different sample types (PBMC vs. WB), timepoints (before vs. after treatment) and methods (gene expression vs. ssGSEA vs. deconvolution) to classify the patient samples into the non-responder or responder groups, a logistic regression model was fitted to the data and trained and tested using a 5-fold cross validation method, with the ‘train’ function and the ‘glmnet’ method from the caret package (v.6.0-94). Given the lack of an independent dataset for validation, this strategy allowed us to split the dataset in 5 subsets, with one of the subsets being used as a validation dataset, while the rest of the subsets being used as a training dataset to fit the model. This was repeated with each of the subsets as validation datasets. The normalized gene expression values of the 5000 most variably expressed genes (calculated using the ‘rowVars’ function across all samples), ssGSEA normalized enrichment scores (NES) and deconvolved cell type proportions were used to train the classification models at the four different sample collection conditions. Each of the models was run 10 times and the mean area under the ROC curve (AUC) values for each of the 5 validation cycles of every model was calculated.

For this study, we included 33 NSCLC patients who underwent ICI treatment, of which 16 were classified as responders and 17 as non-responders (Table 1, Supplementary Fig. 2). The average follow-up time was 17.7 (1.9–80.9) months for non-responders and 58.6 (4.2–107.6) months for responders. We collected PBMC and WB samples from each patient before ICI administration (non-responders: PBMC = 17, WB = 16; responders: PBMC = 16, WB = 15) and at the first response evaluation for the majority of cases (non-responders: PBMC = 10, WB = 11; responders: PBMC = 15, WB = 14), and performed bulk poly-A + RNA-seq. The quality and sequencing depth of the PBMC and WB samples was similar. To prove that the transcriptome of PBMCs and WB samples are different, a principal component, differential expression and gene set enrichment analyses were performed (Supplementary Fig. 1), evidencing the known cell type abundance differences.

As we wanted to compare the potential of the PBMCs versus WB transcriptomes to identify biomarkers for predicting the response to ICIs, we first analyzed the gene expression profiles of non-responders and responders at the two different timepoints (before and after treatment) and for PBMCs and WB samples separately. First, we performed principal component analysis (PCA) using the normalized gene counts on the four different conditions. Non-responder and responder samples did not completely cluster separately. However, this separation was statistically significant in WB samples after treatment (p = 0.0041, Fig. 2A), as well as in PBMC samples after treatment (p = 0.0453, Supplementary Fig. 3B) and not significant in the baseline samples (Supplementary Fig. 3A–C). We also plotted the PCA results grouping the samples based on several patient clinical features (sex, age, ICI treatment and smoking history) to assess the association of these factors with the gene expression profiles. However, no clear groups were observed either (p > 0.05, Supplementary Fig. 4). Similarly, DEA revealed differentially expressed genes (DEGs) in WB samples after treatment (1640 overexpressed genes in non-responders and 1145 genes overexpressed in responders (Supplementary Table 1)) and PBMC samples after treatment (5 genes with higher expression in non-responders versus responders: STAB1, MARCO, ANPEP, TMEM150B and LIMS3). Among the DEGs in WB samples after treatment, 25 of the 1640 and 10 of the 1145 had an absolute logFC > 2 (Fig. 2B). Among the overexpressed genes in non-responders, there were erythrocyte (GYBP, CA1), platelet (GP1BB) and granulocyte (ARG1, CD177, ELANE) markers, as well as genes involved in innate immunity processes, such as the complement activation (C1QC). Within the overexpressed genes in responders, KLRC2 stood out as a key regulator of the effector functions of cytotoxic lymphocytes (Fig. 2C, Supplementary Fig. 5). Unsupervised clustering of the WB samples after treatment based on the expression of these 35 DEGs formed two distinct clusters, one responder-dominated cluster and a non-responder-dominated cluster. Interestingly, the non-responder cluster was enriched in never-smoker patients, consistent with the previously observed association between the habit of smoking and a better response to ICIs in NSCLC patients, and vice versa, probably due to a higher TMB in smokers26.

Differential expression analysis of non-responders (NR) vs. responders (R) in WB samples after treatment. (A) Unsupervised principal component analysis of normalized gene expression values. Two-sample Hotelling’s T2 test was used to assess statistical differences in sample clustering. (B) Volcano plot showing 28 overexpressed genes in NR and 11 overexpressed genes in R. Overexpressed genes were defined as having an adjusted p-value < 0.05 and an absolute log(fold change) > 2. (C) Heatmap of normalized gene expression values showing the 35 overexpressed genes in NR and R. Hierarchical clustering shows separation in a non-responder dominated cluster and a responder dominated cluster.

Apart from looking at individual genes, we also analyzed gene sets using ssGSEA and conventional GSEA, resulting in increased power. More specifically, we focused on immune cell-related gene sets (see methods section) that are differential in non-responders versus responders. Again, PCA using the ssGSEA NES did not show a complete separation of the samples based on ICI response (Supplementary Figs. 6A, 7A, 8A), although the responder and the non-responder WB samples collected after treatment formed two significantly different clusters (p = 0.0055, Fig. 3A). As opposed to gene expression, PCA plots of PMBC samples (before and after treatment) based on ssGSEA NES values showed significant group separations based on the patients’ sex (pbefore = 0.0002, pafter = 0.0044) and age (pbefore = 0.0004) (Supplementary Fig. 9). Hierarchical clustering of these samples based on the NES values resulted in 3 major clusters including a non-responder-dominated cluster, a responder-dominated cluster and a mixed cluster. The non-responder cluster was characterized by a high NES for myeloid cells, whereas lymphoid cells characterized the responder cluster (Fig. 3B). Interestingly, this trend was also observed in PBMC samples (Supplementary Figs. 6B, 7B), but not in WB samples collected before treatment (Supplementary Fig. 8B). In PBMC samples at baseline, the responder dominated cluster was also enriched in female and younger patients, suggesting that these factors might also influence the PBMC landscape. Additionally, conventional GSEA showed that, in all the conditions, neutrophil, immature neutrophil and eosinophil-specific gene sets were significantly enriched in non-responders versus responders (p < 0.05) (Fig. 3C, Supplementary Figs. 6C, 7C, 8C). Monocyte and platelet-specific gene sets were significantly enriched in non-responders in all situations, except for PBMCs before treatment. On the contrary, gene sets specific for several T cell and B cell subsets (naïve T cells, CD8+ T cells, pro-B cells and plasma cells) were enriched in responder patients in most of the conditions. Lastly, dendritic cell and NK cell-specific gene sets showed contradictory results depending on the sample type and timepoint.

Immune cell type gene set enrichment analysis of non-responders (NR) versus responders (R) in WB samples after treatment. (A) Unsupervised principal component analysis of single sample gene set enrichment analysis (ssGSEA) normalized enrichment score (NES) values. Two-sample Hotelling’s T2 test was used to assess statistical differences in sample clustering. (B) Heatmap of ssGSEA NES values. Hierarchical clustering shows separation in a non-responder dominated cluster, a responder dominated cluster and a mixed cluster. (C) Bar plot showing the significantly (adjusted p-value < 0.05) enriched immune cell type-specific gene sets in NR versus R.

As a third data analysis approach, we estimated the cell type proportions of the PBMC and WB samples using the DWLS deconvolution algorithm24. Significant differences in cell proportions between non-responders and responders were observed in WB samples after treatment, but not in the other conditions (Supplementary Figs. 10A, 11A, 12A). In particular, responders had higher proportions of CD4+ T (p = 0.035) and CD8+ T cells (p = 0.008) than non-responders. Neutrophils, however, were more abundant in non-responders (p = 0.008). In addition, B cells and NK cells seemed to be more frequent in responders, although the differences were not sufficient to reach significance (Fig. 4A). Similarly to ssGSEA, PCA on the deconvolved cell type proportions showed a significant separation of non-responder and responder WB samples after treatment (p = 0.0018, Fig. 4B), which was not the case for the other conditions (Supplementary Figs. 10B, 11B, 12B). Again, PBMC samples significantly clustered in distinct groups based on the patients’ sex (pbefore = 0.0089, pafter = 0.0232) and age (pbefore = 0.0168) using the deconvolution data (Supplementary Fig. 13). Moreover, hierarchical clustering of the WB samples after treatment based on the deconvolved proportions formed a non-responder-dominated cluster, characterized by a high percentage of neutrophils, and a responder-dominated cluster, characterized by a high proportion of lymphocytes and monocytes (Fig. 4C). However, this trend was not present in the other three conditions (Supplementary Figs. 10C, 11C, 12C). In PBMC samples, we could not observe a clear association between the clusters and the sex and age of the patients, as the PCA plots suggested.

Cell type deconvolution analysis of non-responders (NR) versus responders (R) using WB samples after treatment. (A) Boxplots showing the deconvolved proportions of 9 different blood immune cell types in NR versus R. Mann–Whitney U test was used to assess statistical differences, and FDR-adjusted p-values are shown. (B) Unsupervised principal component analysis of normalized deconvolved cell type proportions. Two-sample Hotelling’s T2 test was used to assess statistical differences in sample clustering. (C) Heatmap of normalized deconvolved proportions. Hierarchical clustering shows separation in a non-responder dominated cluster and a responder dominated cluster.

To compare the performance of the gene expression values versus ssGSEA NES versus deconvolved cell type proportions in classifying non-responder and responder ICI-treated NSCLC patients, we performed a 5-fold cross validation to train and test a logistic regression classification model with these feature sets. This approach might give an idea about the robustness of the prediction by looking at the spread of the obtained AUC values and, by comparing these values, the approaches can be evaluated. Since this is a relatively small dataset, independent validation is needed to ensure that the AUC values are not overfitted. We ran every model 10 times and computed and plotted the mean and standard deviation of the AUC values. Overall, the best performing data type were ssGSEA NES, as the mean AUC values of all the 4 sample collection conditions were above 0.7, and only WB samples before treatment being below 0.8. Interestingly, PBMCs had higher AUC values using ssGSEA NES (AUC > 0.8) than normalized gene expression values (AUC < 0.8) or deconvolved cell type proportions (AUC < 0.7). On the contrary, the highest mean AUC values using gene expression and cell type deconvolution data were achieved by the WB samples after treatment (AUC = 0.87 and 0.85, respectively), in line with the previous results (Fig. 5). To evaluate the impact of the patients’ clinical variables (sex and age) on the classification performance of the models, we also ran them including these features in the training data besides the gene expression, ssGSEA and deconvolution data. However, despite slightly decreasing the AUC values for ssGSEA and deconvolution, the classification performance of the models was similar with and without including the sex and age variables (Supplementary Fig. 14).

Five-fold cross validation fitting a logistic regression classification model. Normalized gene expression values of the 5000 most variably expressed genes, single sample gene set enrichment analysis (ssGSEA) normalized enrichment scores and deconvolved cell type proportions using the DWLS algorithm were used to train the classification models at four different sample collection conditions. Each model was run 10 times and the mean area under the ROC curve (AUC) values for each of the 5 folds of every model was calculated and plotted. The red dots represent the mean of the 5 mean ROC AUC values of every model. The error bars represent the standard deviation.

Transcriptomic analyses have previously been used to identify tissue-based biomarkers for immunotherapy response prediction in NSCLC6,27. Similar studies using peripheral blood are rare, and they have only been performed using (sc)RNA-seq on (subsets of) PBMCs10,28, but not from WB. PBMCs require more hands-on time prior to RNA extraction, and they lack the granulocyte compartment, issues that can be overcome using WB samples. To our knowledge, a direct comparison of RNA-sequencing profiles generated on these two sample types in the context of immunotherapy response is currently lacking. In this study, we present the first transcriptomic dataset comprising matched PBMC and WB samples from ICI-treated NSCLC patients, collected before treatment initiation and at the first response evaluation.

The DEA analysis revealed, in the WB samples collected at the first response evaluation, abundance differences in several genes previously identified as regulators of the anti-tumor immune response and the response to ICIs between responders and non-responders. Standing out among these genes are factors that are known to contribute to immune suppression in the TME, and whose expression was higher in non-responders than in responders. MAOA is expressed in tumor associated macrophages (TAMs) and contributes to their immune suppressive function, with its inhibition being explored as a cancer immunotherapeutic29,30. ARG-1 is a key driver of T-cell suppression likewise expressed by TAMs31, and targeting this protein in combination with ICIs leads to restored CD8+ T cell function in human resected pancreatic ductal adenocarcinomas32. CD177, expressed in neutrophils and a subset of tumor infiltrating T regulatory cells (Tregs), is associated with a poorer prognosis in several cancer types33. Specifically in NSCLC, it was included in a six-gene neutrophil-differentiation signature associated with immunotherapy response34. Lastly, inhibition of PCSK9 enhances the effect of anti-PD1 in various in vitro and in vivo cancer models by promoting the infiltration of T cells and exclusion of Tregs from the TME35. Also, a higher expression of PCSK9 in baseline tumor tissue from anti-PD-1-treated NSCLC patients was associated with a shorter PFS and a lower response rate36.

As a second strategy, we performed ssGSEA and conventional GSEA to identify enriched immune cell type-specific gene sets. This analysis showed similar enriched populations for all the conditions, although a better separation of non-responder and responder patients through hierarchical clustering was obtained on the after treatment samples. Neutrophils and immature neutrophil-specific gene sets were the most enriched in non-responders, which is in line with what has been previously reported in tumor tissue samples from NSCLC patients treated with ICIs34. Surprisingly, this result was observed in PBMCs as well as in WB. Although we do not expect to find neutrophils in PBMC samples since they are depleted during the PBMC isolation process, a small fraction of the low-density neutrophils (LDNs) is probably still preserved. Concordant with our results, circulating LDNs have been found in patients with resistance to first line ICIs in NSCLC37. Monocyte-specific gene sets were also enriched in non-responders in all cases except for PBMCs before treatment. The abundance of circulating monocytes has been associated with response to ICIs in melanoma38, but not in NSCLC. However, it is known that monocytic MDSCs (mMDSCs) are mainly composed by immature monocytes39, and a high frequency of these cells in the peripheral blood correlates with resistance to anti-PD-1 in NSCLC40. Conversely, responder patients showed a predominant enrichment in gene sets specific for naïve T cells, CD8+ T cells and several B cell stages, including pro B cells, follicular B cells and plasma cells. The presence of a peripheral T cell gene expression signature in the TME of anti-PD-1-treated NSCLC patients was associated with a longer PFS, and showed a higher AUC value than PD-L1 expression, TMB and TILs41. As for B cells, the abundance of B cell-associated tertiary lymphoid structures in the TME promotes immunotherapy response in several cancer types, including NSCLC42,43. Similarly, peripheral blood B cells have also been associated with a better response to ICIs in pan-cancer studies that included NSCLC44,45.

As a third data analysis approach, we performed cell type deconvolution analysis using the DWLS algorithm to further investigate the peripheral blood immune cell content of non-responders and responders in PBMCs and WB. Confirming our DEA results, we only obtained significant differences in the cell type proportions between non-responders and responders when using WB samples collected at the first response evaluation. In particular, the proportion of neutrophils was higher in non-responders than in responders, whereas CD8+ and CD4+ T cells were more abundant in responders. This confirms what has been previously described by multiple research groups, and supports the idea that these are genuinely circulating immune cell types predictors for response to ICI therapy in NSCLC8.

All three analyses agree with what has been described so far, but what is now the better approach? Therefore, we compared the classification performance of the sample types, timepoints and analysis strategies. Overall, the best performing data type were the ssGSEA NES values. Normalized gene expression values and deconvolved cell type proportions, however, showed generally lower mean AUC values than ssGSEA NES, except for WB samples after treatment. This is in line with studies describing changes in the proportion of certain systemic immune cell types after several weeks of ICI therapy in NSCLC patients46,47. Other investigations have found similar results before treatment8, but our deconvolution analysis was unable to replicate them. ssGSEA NES values represent the degree to which the genes in a certain gene set are coordinately (instead of individually) up- or down-regulated within a sample. Therefore, ssGSEA represents a better tool to biologically interpret transcriptomic data than individual gene expression. In addition, PBMC samples showed remarkably higher AUC values using ssGSEA NES than gene expression values and deconvolved proportions. This suggests that both sample types displayed the same underlying transcriptomic patterns at the gene set level, but that only the differences between non-responders and responders on the WB samples after treatment were large enough to emerge at the individual gene level and after cell type deconvolution.

It is known that sex and ageing affect the systemic immune landscape48. Therefore, our patient cohort was selected with a balanced proportion of responders and non-responders across the binary sex and age categories, minimizing the potential bias introduced by these variables. Additionally, we evaluated the effect of these two variables on our transcriptomic-derived data through PCA, hierarchical clustering and by performing the logistic regression classification models including the patients’ sex and age information. Overall, these factors did not affect the performance of the classification models, and did not influence the differences in response captured by WB samples, suggesting that they are not independent predictors of the response to ICIs in NSCLC patients.

Altogether, our results suggest that WB is more likely than PBMCs to capture differences in the peripheral blood immune cell transcriptomic landscape between non-responder and responder NSCLC patients undergoing ICI treatment, across different data analysis strategies. PBMCs need to be isolated from WB as soon as the sample collection is performed, which is not always possible due to logistic reasons. Furthermore, they must be cryopreserved for storage until thawing for RNA isolation, which affects the cell viability and function, and might induce changes in the gene expression profile and thus introduce bias. Conversely, WB can be collected in dedicated RNA-stabilizing collection tubes, preserving the integrity and expression profile of the RNA during long-term storage. In addition, the granulocyte fraction is lost (except for LDNs) in PBMCs, but it is present in WB. Since neutrophils represent one of the most important cellular regulators of the response to ICIs, depleting them may have a great impact on the results of studies seeking non-invasive biomarkers for immunotherapy response prediction. However, WB samples require the use of a red blood cell lysis buffer or a globin transcript depletion method prior to the RNA-seq process to ensure that the expression of less abundant but more informative transcripts is not concealed.

This study has several limitations. The differences we found in deconvolved proportions of CD8+ and CD4+ T cells and neutrophils between non-responders and responders using WB samples after treatment could not be supported by a significant result through Cox proportional hazards model and log-rank test survival analysis (data not shown), probably due to the small cohort size. For the same reason, we were not able to split our cohort into a training set and a testing set, and we performed a 5-fold cross validation within the full cohort, which provides limited robustness to the outcome. Therefore, our results need to be validated, ideally on a larger cohort of ICI-treated NSCLC patients. To our knowledge, such dataset similar to ours has not been published yet. A transcriptomic dataset of WB from renal cell carcinoma patients is available11, but using it for validation purposes in this study would presume that there are no tumor specific differences in gene expression. The recruitment of more patients might provide sufficient statistical power to obtain similar results before treatment initiation, which should be the aim of any predictive biomarker. In addition, the collection of another blood sample earlier on treatment, i.e. before the second cycle of ICI therapy, would shed light on the early immunotherapy-induced systemic response. Another important limitation is the lack of more fine-grained and curated gene sets and cell type deconvolution references that would allow to find more ICI related transcriptomic-based differences or yet unannotated immune cell subtypes between non-responders and responders. Fine-tuning this based on single cell data might increase the predictive value.

In conclusion, we present the first study comparing the protein coding transcriptome profile of PBMCs and WB samples from ICI-treated NSCLC patients through massively parallel poly-A+ RNA-seq. We report notable differences between both sample types via DEA, GSEA and cell type deconvolution. Additionally, we compared the gene expression patterns between non-responder and responder patients, to assess the suitability of PBMCs and WB before and after treatment to find non-invasive transcriptomic biomarkers for the prediction of ICI therapy response. Our results suggest that ssGSEA is the most accurate data analysis tool to identify such biomarkers. Moreover, the differential expression analysis and cell type deconvolution results indicate that the largest differences in the systemic immune landscape between non-responders and responders are captured using WB samples collected at first response evaluation, and that CD8+, CD4+ T cells and neutrophils are the cell types with the most predictive potential. WB samples collected in RNA-stabilizing tubes are easy to handle and ensure a reliable characterization of the peripheral blood immune cell RNA content. If validated and/or fine-tuned, these results may be useful for future research on non-invasive biomarkers for immunotherapy response prediction in NSCLC, provide more insights on the systemic regulators of the response to ICIs in NSCLC patients and be applicable to other cancer entities.

The dataset generated during the current study are available in the European Genome Archive (EGA) repository, under the accession number EGAD50000000389.

Immune checkpoint inhibitor

Non-small cell lung cancer

Tumor microenvironment

Tumor mutational burden

Tumor infiltrating lymphocyte

Single-cell RNA sequencing

Gene set enrichment analysis

Peripheral blood mononuclear cell

Whole blood

Polyadenylated

Differential expression analysis

Molecular signatures database

Single-sample gene set enrichment analysis

Deconvolution with Dampened Weighted Least Squares

Natural killer

Normalized enrichment score

Area under the ROC curve

Principal component analysis

Differentially expressed gene

Logarithm of the fold change

Tumor associated macrophage

Regulatory T cell

Low-density neutrophil

Sung, H. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA. Cancer J. Clin. 71, 209–249 (2021).

Article PubMed Google Scholar

Borghaei, H. et al. Nivolumab versus docetaxel in advanced nonsquamous non–small-cell lung cancer. N. Engl. J. Med. 373, 1627–1639 (2015).

Article CAS PubMed PubMed Central Google Scholar

Rittmeyer, A. et al. Atezolizumab versus docetaxel in patients with previously treated non-small-cell lung cancer (OAK): A phase 3, open-label, multicentre randomised controlled trial. Lancet 389, 255–265 (2017).

Article PubMed Google Scholar

Yeong, J. et al. Intratumoral CD39+CD8+ T cells predict response to programmed cell death protein-1 or programmed death ligand-1 blockade in patients with NSCLC. J. Thorac. Oncol Off. Publ. Int. Assoc. Study Lung Cancer 16, 1349–1358 (2021).

CAS Google Scholar

Qin, A. et al. Cellular engagement and interaction in the tumor microenvironment predict non-response to PD-1/PD-L1 inhibitors in metastatic non-small cell lung cancer. Sci. Rep. 12, 9054 (2022).

Article ADS CAS PubMed PubMed Central Google Scholar

Ravi, A. et al. Genomic and transcriptomic analysis of checkpoint blockade response in advanced non-small cell lung cancer. Nat. Genet. 55, 807–819 (2023).

Article CAS PubMed PubMed Central Google Scholar

Spitzer, M. H. et al. Systemic immunity is required for effective cancer immunotherapy. Cell 168, 487-502.e15 (2017).

Article CAS PubMed PubMed Central Google Scholar

Marcos Rubio, A., Everaert, C., Van Damme, E., De Preter, K. & Vermaelen, K. Circulating immune cell dynamics as outcome predictors for immunotherapy in non-small cell lung cancer. J. Immunother. Cancer 11, e007023 (2023).

Article PubMed PubMed Central Google Scholar

Zhang, Y. et al. Single-cell analyses reveal key immune cell subsets associated with response to PD-L1 blockade in triple-negative breast cancer. Cancer Cell 39, 1578-1593.e8 (2021).

Article CAS PubMed Google Scholar

Fehlings, M. et al. Late-differentiated effector neoantigen-specific CD8+ T cells are enriched in peripheral blood of non-small cell lung carcinoma patients responding to atezolizumab treatment. J. Immunother. Cancer 7, (2019).

Nagumo, Y. et al. Whole-blood gene expression profiles correlate with response to immune checkpoint inhibitors in patients with metastatic renal cell carcinoma. Cancers 14, 6207 (2022).

Article CAS PubMed PubMed Central Google Scholar

Avila Cobos, F., Alquicira-Hernandez, J., Powell, J. E., Mestdagh, P. & De Preter, K. Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat. Commun. 11, (2020).

He, D. et al. Whole blood vs PBMC: Compartmental differences in gene expression profiling exemplified in asthma. Allergy Asthma Clin. Immunol. 15, 67 (2019).

Article CAS PubMed PubMed Central Google Scholar

Van Der Sijde, F. et al. RNA from stabilized whole blood enables more comprehensive immune gene expression profiling compared to RNA from peripheral blood mononuclear cells. PLOS ONE 15, e0235413 (2020).

Article PubMed PubMed Central Google Scholar

Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).

Article CAS PubMed Google Scholar

R: The R Project for Statistical Computing. https://www.r-project.org/.

Wickham, H. et al. Welcome to the tidyverse. J. Open Source Softw. 4, 1686 (2019).

Article ADS Google Scholar

Durinck, S. et al. BioMart and Bioconductor: A powerful link between biological databases and microarray data analysis. Bioinformatics 21, 3439–3440 (2005).

Article CAS PubMed Google Scholar

Durinck, S., Spellman, P. T., Birney, E. & Huber, W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4, 1184–1191 (2009).

Article CAS PubMed PubMed Central Google Scholar

Zhang, Y., Parmigiani, G. & Johnson, W. E. ComBat-seq: Batch effect adjustment for RNA-seq count data. NAR Genom. Bioinform. 2, Iqaa078 (2020).

Article Google Scholar

Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).

Article PubMed PubMed Central Google Scholar

Korotkevich, G. et al. Fast gene set enrichment analysis. 060012. https://doi.org/10.1101/060012 (2021).

Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: Gene set variation analysis for microarray and RNA-Seq data. BMC Bioinform. 14, 7 (2013).

Article Google Scholar

Tsoucas, D. et al. Accurate estimation of cell-type composition from gene expression data. Nat. Commun. 10, (2019).

Schulte-Schrepping, J. et al. Severe COVID-19 is marked by a dysregulated myeloid cell compartment. Cell 182, 1419-1440.e23 (2020).

Article CAS PubMed PubMed Central Google Scholar

Lee, C. K. et al. Clinical and molecular characteristics associated with survival among patients treated with checkpoint inhibitors for advanced non-small cell lung carcinoma: A systematic review and meta-analysis. JAMA Oncol. 4, 210–216 (2018).

Article PubMed Google Scholar

Thommen, D. S. et al. A transcriptionally and functionally distinct pd-1+ cd8+ t cell pool with predictive potential in non-small-cell lung cancer treated with pd-1 blockade. Nat. Med. 24, 994–1004 (2018).

Article CAS PubMed PubMed Central Google Scholar

Zhang, F. et al. Dynamics of peripheral T cell clones during PD-1 blockade in non-small cell lung cancer. Cancer Immunol. Immunother. 69, 2599–2611 (2020).

Article CAS PubMed PubMed Central Google Scholar

Wang, Y.-C. et al. Targeting monoamine oxidase A-regulated tumor-associated macrophage polarization for cancer immunotherapy. Nat. Commun. 12, 3530 (2021).

Article ADS CAS PubMed PubMed Central Google Scholar

Wang, X. et al. Targeting monoamine oxidase A for T cell–based cancer immunotherapy. Sci. Immunol. 6, eabh2383 (2021).

Article ADS MathSciNet CAS PubMed Google Scholar

Menjivar, R. E. et al. Arginase 1 is a key driver of immune suppression in pancreatic cancer. eLife 12, e80721 (2023).

Article CAS PubMed PubMed Central Google Scholar

Canè, S. et al. Neutralization of NET-associated human ARG1 enhances cancer immunotherapy. Sci. Transl. Med. 15, eabq6221 (2023).

Article PubMed Google Scholar

Kim, M.-C. et al. CD177 modulates the function and homeostasis of tumor-infiltrating regulatory T cells. Nat. Commun. 12, 5764 (2021).

Article ADS CAS PubMed PubMed Central Google Scholar

Pang, J. et al. Integrating Single-cell RNA-seq to construct a Neutrophil prognostic model for predicting immune responses in non-small cell lung cancer. J. Transl. Med. 20, 531 (2022).

Article CAS PubMed PubMed Central Google Scholar

Liu, X. et al. Inhibition of PCSK9 potentiates immune checkpoint therapy for cancer. Nature 588, 693–698 (2020).

Article ADS CAS PubMed PubMed Central Google Scholar

Gao, X. et al. PCSK9 regulates the efficacy of immune checkpoint therapy in lung cancer. Front. Immunol. 14, (2023).

Arasanz, H. et al. Circulating low density neutrophils are associated with resistance to first line anti-PD1/PDL1 immunotherapy in non-small cell lung cancer. Cancers 14, 3846 (2022).

Article CAS PubMed PubMed Central Google Scholar

Krieg, C. et al. High-dimensional single-cell analysis predicts response to anti-PD-1 immunotherapy. Nat. Med. 24, 144–153 (2018).

Article CAS PubMed Google Scholar

Veglia, F., Perego, M. & Gabrilovich, D. Myeloid-derived suppressor cells coming of age. Nat. Immunol. 2018(19), 108–119 (2018).

Article Google Scholar

Limagne, E. et al. Tim-3/galectin-9 pathway and mMDSC control primary and secondary resistances to PD-1 blockade in lung cancer patients. OncoImmunology 8, (2019).

Hwang, S. et al. Immune gene signatures for predicting durable clinical benefit of anti-PD-1 immunotherapy in patients with non-small cell lung cancer. Sci. Rep. 2020(10), 1–10 (2020).

Google Scholar

Helmink, B. A. et al. B cells and tertiary lymphoid structures promote immunotherapy response. Nature 2020(577), 549–555 (2020).

Article ADS Google Scholar

Sun, X. et al. Maturation and abundance of tertiary lymphoid structures are associated with the efficacy of neoadjuvant chemoimmunotherapy in resectable non-small cell lung cancer. J. Immunother. Cancer 10, e005531 (2022).

Article PubMed PubMed Central Google Scholar

Yuan, S., Liu, Y., Till, B., Song, Y. & Wang, Z. Pretreatment peripheral B cells are associated with tumor response to anti-PD-1-based immunotherapy. Front. Immunol. 11, (2020).

Barth, D. A. et al. Patterns of peripheral blood B-cell subtypes are associated with treatment response in patients treated with immune checkpoint inhibitors: A prospective longitudinal pan-cancer study. Front. Immunol. 13, (2022).

Kamphorst, A. O. et al. Proliferation of PD-1+ CD8 T cells in peripheral blood after PD-1-targeted therapy in lung cancer patients. Proc. Natl. Acad. Sci. U. S. A. 114, 4993–4998 (2017).

Article ADS CAS PubMed PubMed Central Google Scholar

Alessi, J. V. et al. Low peripheral blood derived neutrophil-to-lymphocyte ratio (dNLR) is associated with increased tumor T-cell infiltration and favorable outcomes to first-line pembrolizumab in non-small cell lung cancer. J. Immunother. Cancer 9, (2021).

Huang, Z. et al. Effects of sex and aging on the immune cell landscape as assessed by single-cell transcriptomic analysis. Proc. Natl. Acad. Sci. U. S. A. 118, (2021).

Download references

We thank Leander Meuris and the HPC-UGent team for their support during statistical and computational data analysis. Figure 1 was created using Biorender.com.

AMR is supported by Kom op tegen Kanker (Stand up against Cancer), while CE and EVD are supported by Fund for Scientific Research Flanders (FWO) postdoctoral and doctoral grants (grant IDs 11L7122N and 1226821 N). This work was also supported by Kom op tegen Kanker (grant ID not available).

Katleen De Preter and Celine Everaert: Shared last authors.

Department of Biomolecular Medicine, VIB-UGent Center for Medical Biotechnology, Ghent University, Technologiepark-Zwijnaarde 75, 9052, Ghent, Belgium

Álvaro Marcos Rubio, Seoyeon Oh, Sofie Roelandt, Eufra Van Damme, Katleen De Preter & Celine Everaert

Cancer Research Institute Ghent (CRIG), Medical Research Building 2 (MRB2) – UZ Gent - Corneel Heymanslaan 10, 9000, Ghent, Belgium

Álvaro Marcos Rubio, Seoyeon Oh, Sofie Roelandt, Dieter Stevens, Eufra Van Damme, Karim Vermaelen, Katleen De Preter & Celine Everaert

Department of Pulmonary Medicine and Immuno-Oncology Network Ghent, Ghent University Hospital, Ghent, Belgium

Dieter Stevens & Karim Vermaelen

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

You can also search for this author in PubMed Google Scholar

AMR, CE, KDP and KV conceptualized and designed the study. AMR and SR performed the experiments. AMR and SO performed the data and statistical analysis. DS, EVD and KV provided the patient samples and clinical information. AMR and SO created the figures and wrote the first draft. EVD, CE, KDP and KV critically reviewed and commented on the manuscript. CE, KDP and KV supervised the work and obtained funding. All authors contributed to the article and approved the submitted version.

Correspondence to Celine Everaert.

The authors declare no competing interests.

The studies involving humans were approved by medical Ethical Committee of Ghent University Hospital, Ghent, Belgium (EC/2017/1207). The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study.

Not applicable.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Below is the link to the electronic supplementary material.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

Marcos Rubio, Á., Oh, S., Roelandt, S. et al. Defining the optimal setting for transcriptomic analyses on blood samples for response prediction in immunotherapy-treated NSCLC patients. Sci Rep 14, 26026 (2024). https://doi.org/10.1038/s41598-024-76982-x

Download citation

Received: 21 May 2024

Accepted: 18 October 2024

Published: 29 October 2024

DOI: https://doi.org/10.1038/s41598-024-76982-x

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative