European Defence Fund (EDF) Project 101103176 (Closed).

AI / ML and Data

IMS and GC-IMS generate high-dimensional spectral and chromatographic data that cannot be reliably interpreted using simple threshold logic.

Artificial intelligence constitutes a core enabling technology of TeChBioT, transforming raw spectral signals into robust, real-time classification outputs under laboratory and field conditions.

Core Analytical Challenges

The AI/ML framework addresses three fundamental challenges:

  • High dimensionality of chromatographic and ion mobility data

  • Noise, baseline drift and environmental variability

  • Reliable discrimination of chemical and biological signatures in complex matrices

AI/ML for Chemical Agent Detection

Chemical detection within TeChBioT focuses on volatile and semi-volatile chemical warfare agent (CWA) simulants and selected real CWAs measured under controlled laboratory conditions.

The detection platform operates using:

  • Standalone HT-IMS

  • Hyphenated HT-GC-IMS configurations

depending on the operational scenario.

These systems produce multidimensional datasets including:

  • Retention time

  • Drift time (or inverse reduced mobility K₀)

  • Intensity distributions

  • Dual-polarity ion information

In realistic environments such as gasoline vapour backgrounds or variable humidity conditions, spectral overlap and matrix effects significantly complicate interpretation.

AI-based pattern recognition was implemented to improve:

  • Sensitivity

  • Selectivity

  • Decision reliability

Preprocessing Pipeline

Before model training, chromatograms undergo systematic preprocessing.

  • Baseline drift and baseband noise are removed using wavelet transform techniques.

  • High-frequency noise is attenuated by wavelet shrinkage or Savitzky–Golay smoothing while preserving peak morphology.

  • Persistent homology methods are employed for robust peak detection, applied locally within automatically defined regions of interest to reduce computational load and avoid global threshold artefacts.
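As an illustration of the peak-detection idea above, the following is a minimal sketch of zero-dimensional persistent homology on a single 1D trace. It is not the project's implementation: the function name persistent_peaks, the min_persistence parameter and the restriction to one trace (rather than automatically defined regions of interest) are assumptions made for this example. Each local maximum is "born" at its own height and "dies" when its component merges into a taller neighbour; the height difference is its persistence.

```python
def persistent_peaks(signal, min_persistence):
    """Return sorted (index, persistence) pairs for peaks of a 1D trace.

    Hypothetical helper: samples are added from highest to lowest, and
    components of already-added samples are tracked with union-find.
    """
    n = len(signal)
    if n == 0:
        return []
    order = sorted(range(n), key=lambda i: signal[i], reverse=True)
    parent = [None] * n   # None means "sample not added yet"
    peak_of = {}          # component root -> index of its highest sample
    persistence = {}      # peak index -> birth height minus death height

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in order:
        parent[i] = i
        peak_of[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                survivor, dying = find(i), find(j)
                if survivor == dying:
                    continue
                # Elder rule: the component with the lower peak dies.
                if signal[peak_of[survivor]] < signal[peak_of[dying]]:
                    survivor, dying = dying, survivor
                lifetime = signal[peak_of[dying]] - signal[i]
                if lifetime > 0:  # skip zero-persistence non-peaks
                    persistence[peak_of[dying]] = lifetime
                parent[dying] = survivor

    # The highest peak never merges into anything; by convention its
    # persistence is measured against the global minimum.
    top = peak_of[find(order[0])]
    persistence[top] = signal[top] - min(signal)
    return sorted((i, p) for i, p in persistence.items() if p >= min_persistence)


trace = [0, 1, 0, 5, 0, 2, 1, 3, 0]
print(persistent_peaks(trace, min_persistence=2))  # [(3, 5), (7, 3)]
```

Because persistence is a relative measure (peak height above the saddle that separates it from a taller peak), a cut-off on it behaves differently from a global intensity threshold: a small peak on a high baseline and a small peak on a low baseline are treated alike.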

For GC-MS datasets:

  • Retention times are normalized using the Kovats retention index to mitigate variations arising from different column lengths and temperature programs.

  • Signals are scaled and standardized to ensure numerical stability and balanced feature weighting during training.
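The two steps above can be sketched as follows. This is an illustrative example under stated assumptions, not the project's code: the function names are hypothetical, the bracketing n-alkane retention times are assumed to be known, and z-score standardization is assumed as the scaling method (the text does not name one). The logarithmic form is the classical isothermal Kovats index computed from adjusted retention times; the linear form is the variant commonly used for temperature-programmed runs.

```python
import math
from statistics import mean, pstdev


def kovats_index(t_x, t_n, t_n1, n):
    """Isothermal Kovats index from adjusted retention times.

    t_n and t_n1 belong to the n-alkanes with n and n+1 carbon atoms
    that elute just before and just after the analyte (t_x).
    """
    if not 0 < t_n <= t_x <= t_n1 or t_n == t_n1:
        raise ValueError("analyte must be bracketed by two distinct alkanes")
    fraction = (math.log(t_x) - math.log(t_n)) / (math.log(t_n1) - math.log(t_n))
    return 100.0 * (n + fraction)


def linear_retention_index(t_x, t_n, t_n1, n):
    """Linear retention index, the usual choice for temperature programs."""
    if not t_n <= t_x <= t_n1 or t_n == t_n1:
        raise ValueError("analyte must be bracketed by two distinct alkanes")
    return 100.0 * (n + (t_x - t_n) / (t_n1 - t_n))


def standardize(values):
    """Z-score scaling: zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# An analyte eluting between octane (n = 8) and nonane (n = 9).
print(linear_retention_index(150.0, 100.0, 200.0, 8))  # 850.0
print(standardize([1.0, 2.0, 3.0]))
```

Expressing retention on the alkane scale makes values comparable across columns and temperature programs, because the analyte is located relative to reference compounds measured under the same conditions.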

Supervised Learning Approaches

Multiple supervised learning algorithms were evaluated for chemical classification:

  • Support Vector Machines (SVM)

  • Logistic Regression

  • XGBoost

  • Multilayer Perceptrons (MLP)

Two modelling strategies were assessed:

  • Holistic strategy: performs end-to-end classification directly from chromatographic input.

  • Modular strategy: embeds ML into specific pipeline stages such as noise filtering or peak selection.

Laboratory and Field Performance

In laboratory experiments involving five CWA simulants:

  • Classification accuracy was consistently high across models.

  • SVM demonstrated particular robustness under class imbalance.

In outdoor validation campaigns involving the simulant DPM:

  • AI models produced rapid alarm decisions in real time.

  • No false positives or false negatives were observed.

The user interface displayed clear red (alarm) and green (safe) signals, demonstrating the feasibility of automated decision support in mobile deployment scenarios.

AI/ML for Biological Agent Detection

Biological detection presents fundamentally different analytical constraints.

Bacteria and viruses are non-volatile entities and require fragmentation prior to analysis.

Compared to chemical detection, biological classification must address:

  • Greater inter-class similarity

  • Lower signal-to-noise ratios

  • Higher variability across environmental matrices

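Benchmarking was performed using two reference techniques:

  • MALDI-TOF (matrix-assisted laser desorption/ionization time-of-flight mass spectrometry)

  • Py-GC-MS (pyrolysis gas chromatography–mass spectrometry)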

Evaluated ML and DL Models

A broad range of ML and DL models was evaluated to assess their ability to capture biologically meaningful patterns.

Classical ML approaches:

  • Random Forest

  • Support Vector Machines

  • Ridge Classifiers

  • k-Nearest Neighbors

  • XGBoost

  • Partial Least Squares Discriminant Analysis (PLS-DA)

Deep learning architectures:

  • One-dimensional and two-dimensional Convolutional Neural Networks (CNN1D and CNN2D)

  • Fully Connected Neural Networks (FCNN)

  • Denoising autoencoders

  • Established computer vision backbones such as ResNet and VGG

MALDI-TOF Preprocessing and Results

For MALDI-TOF spectra, preprocessing included:

  • Asymmetric least squares baseline subtraction

  • Savitzky–Golay smoothing

  • Normalization

  • Truncation to the 2,000–12,000 Da mass range (m/z) to remove matrix noise and low-information regions
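Three of the steps listed above (truncation, smoothing and normalization) can be sketched in a few lines. This is a simplified illustration, not the project's pipeline: the function names are hypothetical, the smoothing uses the fixed 5-point quadratic Savitzky–Golay kernel, and normalization to unit total intensity is an assumption, since the text does not specify the normalization method. Asymmetric least squares baseline subtraction is omitted here because it requires a linear solver.

```python
# 5-point quadratic/cubic Savitzky-Golay smoothing weights (divide by 35).
SG5_WEIGHTS = (-3, 12, 17, 12, -3)


def truncate(mz, intensity, lo=2000.0, hi=12000.0):
    """Keep only the points whose m/z lies inside [lo, hi]."""
    kept = [(m, y) for m, y in zip(mz, intensity) if lo <= m <= hi]
    return [m for m, _ in kept], [y for _, y in kept]


def savgol5(intensity):
    """Smooth with a 5-point Savitzky-Golay kernel; edge samples are kept as-is."""
    out = list(intensity)
    for i in range(2, len(intensity) - 2):
        window = intensity[i - 2:i + 3]
        out[i] = sum(w * y for w, y in zip(SG5_WEIGHTS, window)) / 35.0
    return out


def normalize_total(intensity):
    """Scale a spectrum so that its intensities sum to 1 (assumed method)."""
    total = sum(intensity)
    if total <= 0:
        return list(intensity)
    return [y / total for y in intensity]


mz = [1000.0, 2000.0, 5000.0, 12000.0, 15000.0]
counts = [1.0, 2.0, 3.0, 5.0, 9.0]
kept_mz, kept_counts = truncate(mz, counts)
print(kept_mz)                       # [2000.0, 5000.0, 12000.0]
print(normalize_total(kept_counts))  # [0.2, 0.3, 0.5]
```

A Savitzky–Golay kernel fits a local polynomial instead of averaging, which is why it attenuates noise while leaving peak height and width largely intact. In practice a library routine such as scipy.signal.savgol_filter would normally replace the hand-written kernel.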

Internal datasets achieved perfect classification performance for:

  • Discrimination between bacteria and viruses

  • Gram-positive versus Gram-negative bacteria

  • A panel of seven bacterial and five viral species

External validation using an independent reference database from the Robert Koch Institute confirmed that the models generalize beyond the training dataset. On this external data, the Extra Trees Classifier achieved:

  • 100% accuracy for Gram classification

  • Approximately 80% accuracy for multi-class species identification

Py-GC-MS Analysis

A comprehensive Py-GC-MS dataset comprising 22 bacterial and viral classes was analysed using multiple data representations, including:

  • Full 2D GC×MS chromatograms

  • Total ion current (TIC) profiles

  • Fatty acid methyl ester (FAME) features

  • Principal component features

Deep learning applied to the 2D GC×MS representation achieved the highest classification performance, highlighting the value of preserving spatial structure in chromatographic–mass spectral data.
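A small example helps show why the 2D representation can carry more information than a TIC profile. In this sketch the 2D data is assumed to be a nested list indexed as gcms[scan][mz_channel]; the function name and the toy values are hypothetical. The TIC sums each scan over all m/z channels, so two chromatograms with different mass spectra can collapse to the same profile.

```python
def total_ion_current(gcms):
    """Collapse a 2D GC x MS matrix to a 1D TIC profile.

    gcms[i][j] is the intensity at retention-time index i and m/z channel j.
    """
    return [sum(scan) for scan in gcms]


# Two toy samples with the same elution profile but different mass spectra.
sample_a = [[1, 0], [0, 2]]
sample_b = [[0, 1], [2, 0]]

print(total_ion_current(sample_a))  # [1, 2]
print(total_ion_current(sample_b))  # [1, 2]
print(sample_a == sample_b)         # False
```

The two samples are indistinguishable after the reduction, while a model that sees the full matrix can still separate them.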

Py-GC-IMS Preprocessing Pipeline

For Py-GC-IMS datasets, a structured preprocessing pipeline standardizes the chromatograms prior to modelling.

Processing steps include:

  • Two-dimensional interpolation to ensure fixed spatial resolution

  • Region-of-interest restriction based on retention time and inverse reduced mobility ranges

  • Savitzky–Golay smoothing to attenuate high-frequency noise while preserving peak structure

  • Reactant Ion Peak (RIP) identification and removal to avoid dominance of non-analyte features

  • Baseline correction via white top-hat morphological filtering

  • Intensity thresholding
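The last two steps of the list above can be illustrated on a single 1D slice. This is a minimal sketch with hypothetical function names; the pipeline described here operates on two-dimensional chromatograms, where a library routine such as scipy.ndimage.white_tophat would be the usual choice. The white top-hat is the signal minus its morphological opening (erosion followed by dilation) with a flat structuring element, so features narrower than the element are kept and the slowly varying baseline is removed.

```python
def _sliding(values, half_width, reducer):
    """Apply min or max over a centred window that is clipped at the edges."""
    n = len(values)
    return [
        reducer(values[max(0, i - half_width):min(n, i + half_width + 1)])
        for i in range(n)
    ]


def white_top_hat(values, width):
    """Signal minus its opening with a flat window of the given odd width."""
    half_width = width // 2
    eroded = _sliding(values, half_width, min)   # erosion removes narrow peaks
    opened = _sliding(eroded, half_width, max)   # dilation restores the baseline
    return [v - o for v, o in zip(values, opened)]


def threshold(values, level):
    """Set every sample below the given level to zero."""
    return [v if v >= level else 0 for v in values]


slice_1d = [10, 10, 10, 15, 10, 10, 10]   # constant baseline with one peak
corrected = white_top_hat(slice_1d, width=3)
print(corrected)                # [0, 0, 0, 5, 0, 0, 0]
print(threshold(corrected, 1))  # [0, 0, 0, 5, 0, 0, 0]
```

The window width is the main design choice: it has to be wider than the broadest analyte peak, otherwise the opening follows the peak and the top-hat removes part of it.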

Py-GC-IMS Classification Performance

For laboratory-generated Py-GC-IMS datasets:

  • CNN-based deep learning models achieved superior performance compared to classical ML approaches

  • CNN1D reached perfect classification accuracy under positive-polarity measurements

  • Dual-polarity configurations were preferred for robustness across conditions

During outdoor validation campaigns:

  • FCNN, CNN2D and denoising autoencoder architectures achieved accuracies above 85% despite domain shift between laboratory and field environments

  • Only a single misclassification was observed in the evaluated dataset

Data Simulation and Future Scalability

Because access to hazardous compounds and large annotated datasets is inherently limited, TeChBioT developed a chromatogram data simulator capable of generating synthetic chromatograms with:

  • Variable peak shapes

  • Baseline drift

  • White noise

  • Column degradation artefacts

This simulator:

  • Enhances model robustness

  • Enables stress testing under diverse synthetic scenarios

  • Supports future transfer learning strategies
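To make the idea concrete, a toy generator along these lines might look as follows. It is an assumption-laden sketch and not the TeChBioT simulator: the function name and all parameters are hypothetical, peaks are modelled as Gaussians, baseline drift as a linear ramp, white noise as seeded Gaussian noise, and column degradation simply as a common broadening factor applied to every peak.

```python
import math
import random


def simulate_chromatogram(peaks, n_points=500, drift_slope=0.0,
                          noise_sd=0.0, broadening=1.0, seed=None):
    """Generate a synthetic 1D chromatogram.

    peaks: iterable of (centre, width, height) tuples for Gaussian peaks.
    drift_slope: linear baseline drift added per sample.
    noise_sd: standard deviation of additive white Gaussian noise.
    broadening: factor applied to every peak width (crude stand-in for
        column degradation in this sketch).
    """
    rng = random.Random(seed)
    trace = []
    for i in range(n_points):
        value = drift_slope * i
        for centre, width, height in peaks:
            sigma = width * broadening
            value += height * math.exp(-0.5 * ((i - centre) / sigma) ** 2)
        if noise_sd > 0:
            value += rng.gauss(0.0, noise_sd)
        trace.append(value)
    return trace


clean = simulate_chromatogram([(50, 5.0, 2.0)], n_points=100)
noisy = simulate_chromatogram([(50, 5.0, 2.0)], n_points=100,
                              drift_slope=0.001, noise_sd=0.05, seed=42)
print(clean.index(max(clean)), max(clean))  # 50 2.0
```

Fixing the seed makes each synthetic scenario reproducible, which matters when the same stress test is rerun against different models.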

From Sensor to Intelligent Decision Support

AI integration transforms TeChBioT from a signal-producing analytical device into an intelligent decision-support platform.

AI and deep learning significantly enhance operational reliability by:

  • Reducing false alarms

  • Compensating for environmental variability

  • Enabling hierarchical biological classification

The combination of HT-GC-IMS technology and advanced AI provides a scalable foundation for future deployment on:

  • Mobile platforms

  • UAVs

  • Networked CBRN monitoring systems

AI/ML is not an auxiliary component but a central innovation pillar of the TeChBioT architecture.