mednet.config.data.shenzhen.datamodule

Shenzhen DataModule for computer-aided diagnosis.

Database reference: [MONTGOMERY-SHENZHEN-2014]

Module Attributes

CONFIGURATION_KEY_DATADIR

Key to search for in the configuration file for the root directory of this database.

Functions

make_split(basename)

Return a database split for the Shenzhen database.

Classes

DataModule(split_filename)

Shenzhen DataModule for computer-aided diagnosis.

RawDataLoader([config_variable])

A specialized raw-data-loader for the Shenzhen dataset.

mednet.config.data.shenzhen.datamodule.CONFIGURATION_KEY_DATADIR = 'datadir.shenzhen'

Key to search for in the configuration file for the root directory of this database.
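As a minimal sketch of how a dotted key such as 'datadir.shenzhen' might be resolved, the example below assumes the configuration file is parsed into nested tables, with one table per dot-separated part. The helper `lookup`, the file layout shown in the comment, and the placeholder path are illustrative assumptions, not part of mednet.

```python
# Assumed layout of the configuration file (shown here as a comment only):
#
#   [datadir]
#   shenzhen = "/path/to/shenzhen"
#
# The dictionary below is what such a file would look like once parsed.
parsed_config = {"datadir": {"shenzhen": "/path/to/shenzhen"}}

CONFIGURATION_KEY_DATADIR = "datadir.shenzhen"


def lookup(config: dict, dotted_key: str):
    """Hypothetical helper: follow each dot-separated part into nested tables."""
    value = config
    for part in dotted_key.split("."):
        value = value[part]
    return value


datadir = lookup(parsed_config, CONFIGURATION_KEY_DATADIR)
```

A missing key raises `KeyError` in this sketch; a real loader may instead fall back to a default directory.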

class mednet.config.data.shenzhen.datamodule.RawDataLoader(config_variable='datadir.shenzhen')

Bases: RawDataLoader

A specialized raw-data-loader for the Shenzhen dataset.

Parameters:

config_variable (str) – Key to search for in the configuration file for the root directory of this database.

datadir: Path

This variable contains the base directory where the database raw data is stored.

sample(sample)

Load a single image sample from the disk.

Parameters:

sample (tuple[str, int]) – A tuple containing the path suffix, within the dataset root folder, where to find the image to be loaded, and an integer, representing the sample label.

Return type:

tuple[Tensor, Mapping[str, Any]]

Returns:

The sample representation.

label(sample)

Return the integer label of a single image sample.

Parameters:

sample (tuple[str, int]) – A tuple containing the path suffix, within the dataset root folder, where to find the image to be loaded, and an integer, representing the sample label.

Return type:

int

Returns:

The integer label associated with the sample.
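Both methods take the same (path suffix, label) tuple. The sketch below shows how such a tuple could be consumed, assuming the image path is the dataset root joined with the suffix and that the label is read from the tuple itself. `SketchLoader`, its `resolve` method, and the file name are hypothetical stand-ins, not the mednet implementation, and no image decoding is performed.

```python
from pathlib import Path


class SketchLoader:
    """Hypothetical stand-in for a raw-data loader; not the mednet class."""

    def __init__(self, datadir: str):
        # Base directory where the raw data is stored.
        self.datadir = Path(datadir)

    def resolve(self, sample: tuple[str, int]) -> Path:
        # First element: path suffix relative to the dataset root.
        return self.datadir / sample[0]

    def label(self, sample: tuple[str, int]) -> int:
        # Second element: integer label of the sample.
        return sample[1]


loader = SketchLoader("/path/to/shenzhen")
entry = ("CXR_png/CHNCXR_0001_0.png", 0)  # hypothetical file name and label

image_path = loader.resolve(entry)
label = loader.label(entry)
```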

mednet.config.data.shenzhen.datamodule.make_split(basename)

Return a database split for the Shenzhen database.

Parameters:

basename (str) – Name of the .json file containing the split to load.

Return type:

Mapping[str, Sequence[Any]]

Returns:

An instance of DatabaseSplit.
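The return type above, a mapping from names to sequences of samples, suggests a split file shaped roughly like the one below. This is a sketch under assumptions: the subset names ("train", "validation", "test"), the file names, and the helper `load_split` are illustrative, and the actual .json files shipped with mednet may be organised differently.

```python
import json

# Hypothetical split contents: each subset lists [path suffix, label] pairs.
SPLIT_JSON = """
{
  "train": [["CXR_png/a.png", 0], ["CXR_png/b.png", 1]],
  "validation": [["CXR_png/c.png", 0]],
  "test": [["CXR_png/d.png", 1]]
}
"""


def load_split(text: str) -> dict[str, list[tuple[str, int]]]:
    """Parse split JSON into a mapping of subset name to (path, label) tuples."""
    raw = json.loads(text)
    return {
        name: [(path, int(label)) for path, label in entries]
        for name, entries in raw.items()
    }


split = load_split(SPLIT_JSON)
counts = {name: len(samples) for name, samples in split.items()}
```

Each tuple in the resulting mapping has the same (path suffix, label) shape that `RawDataLoader.sample` and `RawDataLoader.label` accept.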

class mednet.config.data.shenzhen.datamodule.DataModule(split_filename)

Bases: CachingDataModule

Shenzhen DataModule for computer-aided diagnosis.

The standard digital image database for tuberculosis was created by the National Library of Medicine, Maryland, USA, in collaboration with Shenzhen No.3 People’s Hospital, Guangdong Medical College, Shenzhen, China. The chest X-rays come from out-patient clinics and were captured as part of the daily routine using Philips DR Digital Diagnose systems.

Data specifications:

  • Raw data input (on disk):

    • PNG 8-bit RGB images (grayscale content stored as RGB, with an “inverted” gray scale that requires special treatment)

    • Variable width and height, 3000 x 3000 pixels or less

  • Output image:

    • Transforms:

      • Load raw PNG with PIL

      • Remove black borders

      • Torch center cropping to get square image

    • Final specifications:

      • Grayscale, encoded as a single-plane tensor of 32-bit floats; square, with a resolution that depends on the input image

      • Labels: 0 (healthy), 1 (active tuberculosis)
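The last two transforms listed above (black-border removal followed by a center crop to a square) can be sketched without any imaging dependency. The example below is a pure-Python illustration on nested lists of floats, not the PIL/torch code used by mednet; the function names, the `threshold` parameter, and the toy image are assumptions made for the sketch.

```python
Image = list[list[float]]  # rows of pixel intensities in [0.0, 1.0]


def remove_black_borders(image: Image, threshold: float = 0.0) -> Image:
    """Trim outer rows and columns whose pixels are all <= threshold."""
    rows = [i for i, row in enumerate(image) if any(p > threshold for p in row)]
    cols = [
        j for j in range(len(image[0]))
        if any(row[j] > threshold for row in image)
    ]
    if not rows or not cols:
        return image  # fully black image: nothing sensible to crop to
    return [row[cols[0]:cols[-1] + 1] for row in image[rows[0]:rows[-1] + 1]]


def center_crop_square(image: Image) -> Image:
    """Crop the largest centered square out of the image."""
    height, width = len(image), len(image[0])
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    return [row[left:left + side] for row in image[top:top + side]]


# Toy 4 x 6 image with a one-pixel black border around a 2 x 4 content area.
raw = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.4, 0.6, 0.8, 0.0],
    [0.0, 0.3, 0.5, 0.7, 0.9, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]

trimmed = remove_black_borders(raw)   # 2 x 4 content area
square = center_crop_square(trimmed)  # 2 x 2 centered square
```

Because the crop side is the shorter dimension of the trimmed image, the output resolution varies from sample to sample, which matches the "square, varying resolution" note in the final specifications.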

Parameters:

split_filename (str) – Name of the .json file containing the split to load.