Whole-Slide Edge Tomograph System for High-Speed 3D Cytology Imaging
A whole-slide edge tomograph system has been developed for high-speed 3D imaging and edge-side data processing of cytology samples. The system integrates mechanical components with on-device computational units to deliver real-time volumetric imaging.
System Overview
The system features an illumination unit with a high-power light-emitting diode and a motorized iris, which projects light through the cytology sample. The transmitted light is captured by a camera board housing a CMOS sensor. The camera is mounted on a Z stage for precise axial scanning, while an XY stage moves the slide laterally to cover the whole slide.
The mechanical components are integrated with an edge computer that forms the core of the system's control and processing. The edge computer contains an image sensor FPGA, a real-time controller with an additional FPGA, and microcontrollers dedicated to XY stage and illumination control. A System-on-Module (SOM) unit orchestrates internal communications and provides a multi-core Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a hardware encoder, and main memory sized for image buffering. Captured images are first sent to the FPGA for signal conditioning and protocol conversion before further processing.
Data Acquisition and Processing
The system employs a dual four-lane MIPI interface, doubling the throughput of a single link. This interface sustains continuous transmission of 4,480 × 4,504 images at up to 50 frames per second from the FPGA to the SOM, which is essential for real-time handling of large volumetric datasets.
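As a rough sanity check on this bandwidth, the required link rate follows from the resolution and frame rate. The 12-bit readout depth and per-lane rate below are illustrative assumptions, not figures from the system description.

```python
# Estimate the MIPI link rate needed for 4,480 x 4,504 frames at 50 fps.
width, height, fps = 4480, 4504, 50   # from the text
bits_per_pixel = 12                   # assumed RAW12 readout (not specified)

pixel_rate = width * height * fps                  # ~1.01 Gpixel/s
payload_gbps = pixel_rate * bits_per_pixel / 1e9   # ~12.1 Gbit/s

# A single four-lane MIPI D-PHY link at an assumed 2.5 Gbit/s per lane tops
# out near 10 Gbit/s, which motivates the dual-link design at this data rate.
print(f"pixel rate: {pixel_rate / 1e9:.2f} Gpixel/s, payload: {payload_gbps:.1f} Gbit/s")
```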
Upon receipt, image data pass through a three-step processing pipeline on the SOM:
- 3D image acquisition.
- 3D reconstruction through axial alignment, leveraging both the GPU and CPU.
- Real-time compression to HEVC format using the on-board NVENC hardware encoder (a sketch follows this list).
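A minimal sketch of the compression step, using ffmpeg's `hevc_nvenc` encoder as a stand-in for the SOM's on-board NVENC pipeline (the production implementation is C++/CUDA); file names, frame layout, and settings are illustrative:

```python
# Encode a Z-stack of sectional images into an HEVC video file on the GPU.
import subprocess

def compress_z_stack(frame_pattern: str, output_path: str, bitrate: str = "24M") -> None:
    """Compress one region's image stack with NVIDIA's hardware HEVC encoder."""
    subprocess.run([
        "ffmpeg", "-y",
        "-framerate", "50",
        "-i", frame_pattern,      # e.g. "region_%04d.png", one frame per Z layer
        "-c:v", "hevc_nvenc",     # NVENC hardware HEVC encoder
        "-b:v", bitrate,          # target bit rate (cf. the 40.36/24.21/8.07 Mbps modes)
        output_path,
    ], check=True)

compress_z_stack("region_%04d.png", "region_0001.mp4")
```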
Once processed, the compressed image data are stored locally on an integrated solid-state drive.
Back-End Server Operations
Following local storage, compressed image data are transmitted to a dedicated back-end server, which stitches the individual images into full-slide 3D volumes and stores them on a Network-Attached Storage (NAS) system. These reconstructed volumes serve interactive visualization and AI-based computational analysis.
The back-end server hosts a DZI viewer that decompresses and transmits requested tile regions on demand in response to user inputs such as zooming, panning, and focus adjustments. These operations are accelerated by an NVIDIA RTX 4000 Ada GPU, which handles stitching, image rendering, and hardware-accelerated decoding.
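What the tile-serving image API might look like, sketched here with Flask; the route layout, tile size, and the `get_tile()` helper are hypothetical stand-ins for the NVDEC-backed decode path:

```python
# Serve Deep Zoom tiles addressed by pyramid level and column/row indices.
from io import BytesIO

from flask import Flask, send_file
from PIL import Image

app = Flask(__name__)
TILE_SIZE = 256  # assumed Deep Zoom tile size

def get_tile(slide_id: str, level: int, col: int, row: int) -> Image.Image:
    # Placeholder: the real back end decodes the requested HEVC region with
    # NVDEC and crops it from the stitched whole-slide volume.
    return Image.new("RGB", (TILE_SIZE, TILE_SIZE), "white")

@app.route("/slides/<slide_id>/tiles/<int:level>/<int:col>_<int:row>.jpg")
def tile(slide_id: str, level: int, col: int, row: int):
    buf = BytesIO()
    get_tile(slide_id, level, col, row).save(buf, format="JPEG")
    buf.seek(0)
    return send_file(buf, mimetype="image/jpeg")
```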
An AI analysis server retrieves compressed data from the NAS, decodes it with hardware acceleration, and performs diagnostic or morphological inference on an NVIDIA RTX 6000 Ada GPU. The predictions and associated metadata are stored back on the NAS for review and later analysis.
Sectional 3D Image Construction and Compression Workflow
The imaging workflow requires real-time coordination across numerous software and hardware components. The real-time controller steps the Z stage to acquire sectional 2D images at specified focal depths. Each captured image is transmitted to the FPGA's image signal processing unit and then forwarded to the GPU buffer on the edge computer.
Once image acquisition for a region is complete, the XY stage moves to the next region while image processing and compression for the previous region begin. This pipelined execution is key to maintaining high throughput.
A dedicated 3D image construction module enhances color uniformity, optimizes dynamic range, and selects the optimal focal planes. Concurrently, a 3D image compression module uses the SOM's hardware encoder to compress the processed image stack into an HEVC-format video file. The two modules run simultaneously to sustain high-throughput scanning, as sketched below.
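A schematic of this producer-consumer overlap; stage control, construction, and encoding are placeholder functions, and the scan grid and layer count are illustrative:

```python
# Overlap stage motion/acquisition with construction and compression.
import queue
import threading
import time

def move_xy_stage(region): time.sleep(0.01)        # placeholder stage motion
def acquire_z_stack(region): return [region] * 40  # placeholder: 40 sectional images
def construct_3d_image(stack): return stack        # placeholder construction module
def compress_to_hevc(volume): time.sleep(0.02)     # placeholder NVENC encoding

work: queue.Queue = queue.Queue()

def processing_worker() -> None:
    while (stack := work.get()) is not None:
        compress_to_hevc(construct_3d_image(stack))  # runs while the next region is scanned

worker = threading.Thread(target=processing_worker)
worker.start()

for region in [(x, y) for y in range(4) for x in range(4)]:  # assumed 4x4 scan grid
    move_xy_stage(region)
    work.put(acquire_z_stack(region))  # hand off; acquisition continues immediately

work.put(None)  # sentinel: no more regions
worker.join()
```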
System performance was evaluated across 10, 20, and 40 Z-layer configurations. XY stage motion time remained constant, while the durations for image acquisition, construction, and compression increased linearly with the number of Z layers. Compression was assessed using the HEVC codec at high (40.36 Mbps), medium (24.21 Mbps), and low (8.07 Mbps) bit rates. Quality degradation, quantified by PSNR, was negligible above 40 dB and imperceptible above 42 dB; both the high and medium settings consistently yielded PSNR values suitable for cytological assessment.
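For reference, PSNR is 10·log₁₀(MAX²/MSE), where MSE is the mean squared error against the uncompressed reference. A minimal NumPy implementation with synthetic 8-bit frames:

```python
# Compute PSNR between a reference frame and its compressed counterpart.
import numpy as np

def psnr(reference: np.ndarray, compressed: np.ndarray, max_val: float = 255.0) -> float:
    mse = np.mean((reference.astype(np.float64) - compressed.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val**2 / mse)

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (512, 512), dtype=np.uint8)
degraded = np.clip(ref + rng.normal(0, 2, ref.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(ref, degraded):.1f} dB")  # ~42 dB for this noise level
```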
Notably, compression quality settings had only a minor impact on image compression time and did not significantly affect overall system throughput. File size increased with Z layers and decreased with stronger compression, while imaging time scaled linearly with Z layers and was not substantially affected by compression settings.
Sectional 3D Image Decompression and Viewing
The DZI viewer system provides interactive web-based visualization of whole-slide 3D cytology images and comprises front-end, back-end, and data layers. After acquisition, sectional 3D images are transmitted to the back-end server, which stitches them using positional metadata to reconstruct the full 3D whole-slide image.
The front end lets users browse slides, zoom, pan, rotate, and navigate through Z layers, supported by a preview image and annotation tools. The back end responds to requests via a slide API for metadata and an image API for tile access. Compressed frames are retrieved and decompressed using NVDEC, the NVIDIA GPU hardware video decoder, with the Decord library providing hardware-accelerated HEVC frame decoding and random access. The system handles more than ten concurrent tile requests while maintaining an average response time under 100 ms per request.
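A minimal sketch of that decode path with Decord; `ctx=gpu(0)` routes decoding through NVDEC, and the file name and requested Z indices are placeholders:

```python
# Randomly access individual Z-layer frames from a compressed HEVC stack.
from decord import VideoReader, gpu

vr = VideoReader("region_0001.mp4", ctx=gpu(0))  # NVDEC-backed hardware decoding
z_indices = [0, 10, 20]                          # requested focal layers
frames = vr.get_batch(z_indices)                 # decodes only the needed frames
print(frames.shape)                              # (3, height, width, 3)
```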
AI Components for Cell Analysis
Detection of Cell Nuclei
Cell nuclei were detected with a YOLOX object detection model trained on 348 cytology-specific images (278 for training, 70 for validation) acquired directly from the tomograph and containing 242,669 annotated nuclei. Training and inference used downsampled 1,024 × 1,024 pixel images. An intersection-over-union threshold of 0 and a detection probability cutoff of 0.005 were chosen to maximize sensitivity. Automated nucleus counts showed strong agreement with manual counts (y = 1.0098x, R² = 0.9487); false positives near cell boundaries were considered acceptable given subsequent downstream filtering.
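A sketch of applying these sensitivity-oriented settings at inference time with ONNX Runtime (the software stack exports models to ONNX); the model path, the zeroed placeholder input, and the assumption that the export emits decoded boxes as rows of (x1, y1, x2, y2, score, class) are all illustrative:

```python
# Run the exported nucleus detector and keep low-confidence candidates.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("yolox_nuclei.onnx", providers=["CUDAExecutionProvider"])
image = np.zeros((1, 3, 1024, 1024), dtype=np.float32)  # placeholder 1,024 x 1,024 input

(raw,) = session.run(None, {session.get_inputs()[0].name: image})
detections = raw[raw[:, 4] > 0.005]  # detection probability cutoff of 0.005
print(f"{len(detections)} candidate nuclei")
```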
Extraction of In-Focus Single-Cell Images
Morphologically informative single-cell images were extracted through a four-step pipeline:
- Nucleus detection: The YOLOX model was applied to downsampled 3D whole-slide images, subsampled at 3-μm intervals from the Z-stack.
- Z-layer grouping: An algorithm clustered spatially proximate and axially aligned detections across Z layers into single nucleus instances.
- Focus evaluation: For each grouped nucleus, full-resolution image patches within the identified Z range were retrieved, and a focus metric identified the slice with the best optical focus (a stand-in metric is sketched after this list).
- Image cropping: A 224 × 224 pixel patch centered on the nucleus was cropped from the best-focus slice and served as input to the subsequent classification models.
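The focus metric itself is not specified; a common stand-in is the variance of the Laplacian, sketched here with OpenCV on synthetic patches:

```python
# Pick the sharpest slice in a per-nucleus Z range by Laplacian variance.
import cv2
import numpy as np

def sharpness(gray: np.ndarray) -> float:
    # Variance of the Laplacian: larger values indicate more high-frequency
    # detail, i.e. better optical focus.
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())

def best_focus_index(patches: list) -> int:
    return int(np.argmax([sharpness(p) for p in patches]))

rng = np.random.default_rng(0)
stack = [rng.integers(0, 256, (224, 224), dtype=np.uint8) for _ in range(5)]
print(best_focus_index(stack))
```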
Classification of Single-Cell Images
Single-cell images were classified into ten cytological categories using a MaxViT-base vision transformer model: leukocytes, superficial/intermediate squamous cells, parabasal cells, squamous metaplasia cells, glandular cells, miscellaneous cell clusters, LSIL cells, HSIL cells, adenocarcinoma cells, and irrelevant objects. The model was trained on expert-annotated cell images from 354 donor-derived whole-slide images with data augmentation.
A second round of training refined the model using 14 additional whole-slide samples from center K. This expanded the taxonomy to 11 classes by introducing navicular cells, incorporating 11,210 annotated cells (scaled sevenfold to 78,470 training images) to address specific morphological observations.
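A sketch of instantiating such a classifier with timm; the specific variant name `maxvit_base_tf_224` is an assumption consistent with the 224 × 224 crops, not a confirmed detail of the study:

```python
# Build a MaxViT-base classifier with an 11-class head and score one crop.
import timm
import torch

model = timm.create_model("maxvit_base_tf_224", pretrained=False, num_classes=11)
model.eval()

patch = torch.randn(1, 3, 224, 224)       # one 224 x 224 single-cell crop
with torch.no_grad():
    probs = model(patch).softmax(dim=1)   # per-class probabilities (the CMD vector)
print(probs.shape)                        # torch.Size([1, 11])
```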
CMD-Based Cell Population Analysis
The vision transformer model generates a 10- or 11-dimensional class probability vector, referred to as the CMD (Class Membership Discriminant) values, for each classified cell image. These values quantify the model's confidence in each class assignment.
Whole-slide-level analysis used these CMD vectors for visualization and gating. A 2D scatter plot of the 'irrelevant objects' and 'leukocytes' CMD probabilities enabled negative gating to isolate epithelial-lineage cells. Histograms of the LSIL, HSIL, and adenocarcinoma CMD values were then generated within this gated population to evaluate lesion-associated probability distributions, applying class-specific thresholds for abnormal populations.
Each cell received a hierarchical rule-based class label for UMAP visualization: cells were labeled 'irrelevant' if that CMD value exceeded its threshold, then 'leukocyte', then the lesion categories (LSIL, HSIL, adenocarcinoma) if their respective thresholds were crossed. Remaining cells were assigned to the class with the highest CMD value among the other categories (squamous, parabasal, glandular, miscellaneous). UMAP plots used these assignments, with alpha transparency reflecting prediction confidence; the rule is sketched below.
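A sketch of this hierarchical rule; the class names follow the text, while the threshold values are hypothetical:

```python
# Assign one rule-based label per cell from its CMD probability vector.
GATE_ORDER = ["irrelevant", "leukocyte", "lsil", "hsil", "adenocarcinoma"]
THRESHOLDS = {"irrelevant": 0.5, "leukocyte": 0.5,       # assumed gating thresholds
              "lsil": 0.9, "hsil": 0.9, "adenocarcinoma": 0.9}
REMAINING = ["squamous", "parabasal", "glandular", "miscellaneous"]

def label_cell(cmd: dict) -> str:
    for cls in GATE_ORDER:                 # negative gates first, then lesion classes
        if cmd[cls] >= THRESHOLDS[cls]:
            return cls
    return max(REMAINING, key=lambda c: cmd[c])  # argmax over the other categories

cmd = {"irrelevant": 0.10, "leukocyte": 0.20, "lsil": 0.05, "hsil": 0.02,
       "adenocarcinoma": 0.01, "squamous": 0.40, "parabasal": 0.10,
       "glandular": 0.07, "miscellaneous": 0.05}
print(label_cell(cmd))  # -> "squamous"
```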
Clinical Study and Evaluation
Human Participants and Sample Preparation
Cervical cytology samples were collected from 770 patients at the Cancer Institute Hospital of JFCR (C), with an additional 222 (T), 384 (K), and 199 (J) samples from three other centers. Consent was obtained via an opt-out process. Samples were prepared using established ThinPrep or SurePath methods, stained with Papanicolaou, and evaluated by cytotechnologists according to the Bethesda System. HPV testing was also conducted. Both ThinPrep and SurePath preparations yielded similarly favorable ROC curves and AUC values, demonstrating the model's robustness to variations in slide preparation.
Clinical-Grade Performance Evaluation
A clinical-grade performance analysis was conducted on 318 cervical liquid-based cytology samples, each with an expert diagnosis and an HPV test result. Whole-slide image acquisition and CMD-based classification were performed, and AI results were aggregated per slide. Total cell counts across the ten classes and counts of specific abnormal cell classes were visualized, grouped by cytological diagnosis.
The ratio of superficial/intermediate squamous cells in NILM samples was plotted against donor age, alongside absolute counts of five normal epithelial components, to illustrate age-related variation. AI-detected LSIL and HSIL cell counts were visualized with violin plots, grouped by cytological diagnosis and HPV test result. HPV-negative and HPV-positive slides were compared using one-sided Mann–Whitney U-tests. ROC analysis for LSIL+ (LSIL, ASC-H, HSIL, SCC) and HSIL+ (HSIL, SCC) detection was conducted on all 318 samples, using total LSIL+HSIL counts as the score for LSIL+ and HSIL counts for HSIL+.
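A sketch of the per-slide ROC computation with scikit-learn, using synthetic counts in place of the study data:

```python
# Score slides by AI-detected abnormal-cell counts against expert LSIL+ labels.
import numpy as np
from sklearn.metrics import auc, roc_curve

rng = np.random.default_rng(0)
is_lsil_plus = rng.integers(0, 2, 318)         # synthetic expert LSIL+ labels
counts = rng.poisson(5 + 40 * is_lsil_plus)    # synthetic LSIL+HSIL cell counts

fpr, tpr, _ = roc_curve(is_lsil_plus, counts)  # counts act as the slide score
print(f"AUC = {auc(fpr, tpr):.3f}")
```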
Multicenter Evaluation
The clinical performance study was then expanded to 1,124 whole-slide images from four centers (C, T, K, J), accounting for their differing age distributions. All slides were imaged with 40 Z layers at the high image-quality setting and processed with the 11-class detector-classifier that incorporates navicular cells. Per-slide cell burdens were summarized, and age-dependent trends were visualized, with navicular cell counts peaking among donors in their early 20s.
Distributions of whole-slide cell counts were visualized. AI-detected LSIL and HSIL counts were summarized by cytological diagnosis (NILM, ASC-US, LSIL, ASC-H, HSIL, SCC) within each center. For HPV-stratified analyses (n = 814 slides with HPV results), counts were compared between HPV− and HPV+ slides using one-sided Mann–Whitney U-tests.
AI performance was benchmarked against conventional cytology, using HPV positivity as the reference, to generate AI-based ROC curves. Human operating points were defined from routine cytology as LSIL+ and ASC-US+ (ASC-US, LSIL, ASC-H, HSIL, SCC). Pairwise comparisons between AI and human performance were conducted at matched specificity and at matched sensitivity. Center-wise ROC curves were constructed for the LSIL+ and HSIL+ endpoints. Threshold sensitivity was assessed by sweeping the per-cell probability threshold from 0.60 to 0.99. Spatial correspondence between AI detections and expert annotations was confirmed by overlaying AI-detected LSIL cells on whole-slide images.
Statistical Analysis and Software
Group comparisons were conducted using one-sided Mann–Whitney U-tests with Benjamini–Hochberg-adjusted q values and Cliff's δ effect sizes. No statistical method was used to predetermine sample size, and no randomization was performed. Cytology assessment was blinded to HPV and AI results, and AI analyses were carried out without access to labels.
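A sketch of this comparison recipe with synthetic counts; SciPy supplies the U-test and statsmodels the Benjamini–Hochberg adjustment, while Cliff's δ is computed directly:

```python
# One-sided Mann-Whitney U-tests with BH-adjusted q values and Cliff's delta.
import numpy as np
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

def cliffs_delta(a: np.ndarray, b: np.ndarray) -> float:
    # P(a > b) - P(a < b) over all cross-group pairs.
    return float(np.sign(a[:, None] - b[None, :]).mean())

rng = np.random.default_rng(0)
comparisons = {  # synthetic per-slide counts: HPV+ vs HPV- groups
    "LSIL": (rng.poisson(30, 100), rng.poisson(5, 100)),
    "HSIL": (rng.poisson(12, 100), rng.poisson(2, 100)),
}

p_values = [mannwhitneyu(pos, neg, alternative="greater").pvalue
            for pos, neg in comparisons.values()]
_, q_values, _, _ = multipletests(p_values, method="fdr_bh")

for (name, (pos, neg)), q in zip(comparisons.items(), q_values):
    print(f"{name}: q = {q:.3g}, Cliff's delta = {cliffs_delta(pos, neg):.2f}")
```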
Software for 3D image construction and compression was developed in C++ and CUDA. Decompression, deep learning, and statistical analysis were implemented in Python using open-source libraries including NumPy, pandas, matplotlib, seaborn, scikit-learn, statsmodels, PyTorch, torchvision, albumentations, OpenCV, timm, and ONNX Runtime. Deep learning models were developed in PyTorch/timm and exported to ONNX for GPU-accelerated inference. Image annotations were created using the Computer Vision Annotation Tool.