Tutorial 2: Working with phenopype (prototyping, low throughput, high throughput)

Analysis of scientific images can be an iterative process that may require frequent user input to preprocess images, adjust settings and evaluate the obtained results. In phenopype, users can start this process by identifying the appropriate functions and settings to analyse a series of images (i.e. which segmentation algorithm is to be used). For the actual analysis, users can then switch to a workflow with higher throughput and reproducibility. Phenopype offers workflows that are appropriate for all stages of the scientific process:


| Workflow | Use case | Principle of operation | Explicitness | Reproducibility |
| --- | --- | --- | --- | --- |
| Prototyping | analysis prototyping, self-education and evaluation | images are loaded as arrays and functions are applied one by one | High | Low |
| Low throughput | single pictures and very small datasets | images are loaded into phenopype containers | Medium | Low |
| High throughput | medium and large datasets - default analysis workflow | images are loaded from a phenopype project directory tree, and analyzed with the pype method | Low | High |


For all three workflows, users assemble a stack of computer vision functions from phenopype's five core modules (preprocessing, segmentation, measurement, export, visualization - for an overview, check the API reference). However, the workflows differ in the degree of user interaction and visual feedback, in the mode by which these functions are applied to images, and in reproducibility.

In the prototyping and low throughput workflows, users write a phenopype function stack directly in Python code. This is recommended for users who wish to familiarize themselves with the basic principles of computer vision and to explore the phenopype function library. Output from all intermediate steps is returned by the functions and can be evaluated, which also makes these routines appropriate for prototyping and testing.

To process image datasets with high throughput and reproducibility, users should work from a phenopype directory structure in conjunction with the pype method. To get started with phenopype's high throughput workflow, see below, check Tutorial 3 and consult the pype section in the API reference.

Phenopype workflow example

Fig. 1: Workflow demonstration using a stickleback (Gasterosteus aculeatus) stained with alizarin red. Traits of interest are bone-plate area and shape, and, within the detected plates, pixel intensities that denote bone density. The computer vision functions used to extract these traits are the same in all workflows, but the workflows differ in the amount of code required and in reproducibility.

Prototyping workflow

The prototyping workflow starts with the path to an image stored on the hard drive. load_image imports the file as a three-channel [1] numpy array (ndarray), together with image meta-data (file name, exposure, dimensions, etc.) as a pandas DataFrame. The array is passed on to the threshold function, which returns a binary array of the same dimensions. This array needs to be passed on to the find_contours function, which returns a dictionary with the detected contours. This dictionary, together with the original array, can then be passed to the colour_intensity function, which collects the average colour value within the perimeter coordinates of each contour and returns a pandas DataFrame containing those values. Finally, the DataFrame can be exported as a csv file with save_colour. By passing on the initially created meta-data, the function automatically expands the provided meta-data columns into the exported csv.

[1] To learn more about the basics of computer vision, check the resources section of the phenopype documentation.
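
Expressed in code, the chain just described looks roughly like the following minimal sketch (note that the demo cell further down uses save_contours instead of the colour chain; the colour_intensity and save_colour argument names here are assumptions modelled on the other calls in this tutorial - verify them against the API reference):

import phenopype as pp

## load image as array, and meta-data as DataFrame
image, image_data = pp.load_image(r"images/stickleback_side.jpg", df=True, meta=True)

## threshold to a binary array of the same dimensions
image_bin = pp.segmentation.threshold(image, method="adaptive", channel="red",
                                      blocksize=199, constant=5)

## detect contours on the binary array
contours = pp.segmentation.find_contours(image_bin, df_image_data=image_data,
                                         retrieval="ext", min_area=150)

## collect the average colour value within each contour's perimeter
## (argument names are assumptions - check the API reference)
colours = pp.measurement.colour_intensity(image, df_image_data=image_data,
                                          df_contours=contours)

## export as csv; the supplied meta-data columns are expanded into the file
pp.export.save_colour(colours, dirpath=r"../_temp/output")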

Fig. 2: Schematic of Phenopype's prototyping workflow

[10]:
import phenopype as pp

filepath = r"images/stickleback_side.jpg"

## load image as array, supply image_data (DataFrame containing meta data)
image, image_data = pp.load_image(filepath, df = True, meta=True)
## draw mask
mask = pp.preprocessing.create_mask(image, tool="polygon")

## thresholding converts multichannel to binary image
image_bin = pp.segmentation.threshold(image, method="adaptive",
                                      channel="red", blocksize=199,
                                      constant=5, df_masks=mask)
## perform morphology operations on binarized image
image_morph = pp.segmentation.morphology(image_bin, operation="close",
                                         shape="ellipse", kernel_size=3,
                                         iterations=3)
## detect contours on binary image
contours = pp.segmentation.find_contours(image_morph, df_image_data=image_data,
                                         retrieval="ext", min_area=150)
## draw detected contours onto canvas
image_drawn = pp.visualization.draw_contours(image,
                                             df_contours=contours)
## export contours to csv
pp.export.save_contours(contours, dirpath = r"../_temp/output")
## show canvas
pp.show_image(image_drawn)
no meta-data found
- create mask
- include mask "mask1" pixels
- contours saved under ../_temp/output\contours.csv (overwritten).

While analyzing the image, you can explore the output from the different steps to see what is going on - for example, the binary image resulting from the thresholding:

[11]:
pp.show_image(image_bin)

Low throughput workflow

The load_image function can also load an image into a phenopype container - a Python class that incorporates loaded images, dataframes, detected contours, intermediate output, etc., so that they are available for inspection or storage at the end of the analysis. The advantage of using containers is that they don't litter the global environment and namespace, while still holding all intermediate steps (e.g. binary masks or contour DataFrames). Containers can be used manually to analyze images, but typically they are used automatically within the pype routine that is part of phenopype's high throughput workflow (see below).

Phenopype low throughput workflow

[12]:
import phenopype as pp

filepath = r"images/stickleback_side.jpg"

## load image as a phenopype container which will include all images, dataframes,
## detected contours and intermediate output
container = pp.load_image(filepath, cont=True, meta=True)

## afterwards, same as in the prototyping workflow, functions are applied
## directly to the container
pp.preprocessing.create_mask(container, tool="polygon")
pp.segmentation.threshold(container, method="adaptive", channel="red",
                          blocksize=199, constant=5)
pp.segmentation.morphology(container, operation="close", shape="ellipse",
                           kernel_size=3, iterations=3)
pp.segmentation.find_contours(container, retrieval="ext", min_area=150)
pp.visualization.draw_contours(container)
pp.export.save_contours(container, dirpath = r"../_temp/output")
pp.show_image(container.canvas)
no meta-data found
- create mask
- include mask "mask1" pixels
- contours saved under ../_temp/output\contours.csv (overwritten).

Although the intermediate outputs from the functions are not present as objects in the namespace, you can access and evaluate them from the container. Again, we will look at the binary image:

[13]:
pp.show_image(container.image_bin)

Use dir() to inspect all the components of the container:

[14]:
print(dir(container))
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'canvas', 'df_contours', 'df_image_data', 'df_image_data_copy', 'df_masks', 'dirpath', 'image', 'image_bin', 'image_copy', 'image_data', 'image_gray', 'image_mod', 'load', 'reset', 'save', 'save_suffix']

High throughput workflow

The pype routine is phenopype's standard method for analysing medium and large image datasets, where the function stack is constructed in the human-readable yaml syntax (see below). Users can execute the pype method on a filepath, an array, or a phenopype directory, which will always trigger three actions:

  1. open the contained yaml configuration with the default OS text editor

  2. parse the contained functions and execute them in sequence

  3. open a Python-window showing the processed image.

After one iteration of these steps, users can evaluate the results and decide either to modify the opened configuration file (e.g. change function parameters or add new functions) and run the pype again, or to terminate the pype and save all results. The processed image, any extracted phenotypic information, and the modified config-file are stored inside the image directory. Together with the raw images, which may be stored separately or within the directory tree, users can thereby provide the full image analysis pipeline to anyone who wishes to reproduce the obtained results.

Further information

  • For more detailed information on the pype method, configuration files, and default behavior, see below, and the pype section of the API reference

  • For more information on how to use the pype in conjunction with phenopype projects, please refer to Tutorial 3.

  • Also, check the examples (e.g. Example 2), which include code for both low and high throughput.

Phenopype high throughput workflow

[15]:
import phenopype as pp

filepath = r"images/stickleback_side.jpg"

pp.pype(image=filepath, # input - can be also an array or a phenopype directory
        dirpath = r"../_temp/output", ## where output is stored - folder needs to exist
        name="demo", # name of the  pype routine, appended to all results-files
        config_preset="demo1" # template for the analysis - you can create your own!
        )
../_temp/output\pype_config_demo.yaml


------------+++ new pype iteration 2020:04:20 16:14:24 +++--------------


AUTOLOAD
- drawing loaded from attributes.yaml
- masks_demo.csv
PREPROCESSING
create_mask
- mask with label mask1 already created (overwrite=False)
SEGMENTATION
threshold
- include mask "mask1" pixels
morphology
find_contours
VISUALIZATION
select_canvas
- invalid selection - defaulting to raw image
draw_contours
draw_masks
 - show mask: mask1.
EXPORT
save_contours
- contours saved under ../_temp/output\contours_demo.csv (overwritten).
save_canvas
- canvas saved under ../_temp/output\canvas_demo.jpg (overwritten).
AUTOSAVE
save_masks
- masks not saved - file already exists (overwrite=False).
save_drawing
- drawing saved (overwriting)


TERMINATE
[15]:
<phenopype.main.pype at 0x2c52e39a8c8>

At the current stage of development, the pype method is prone to errors resulting from incorrect yaml syntax, e.g. missing spaces or wrong indentation. The pype will still try to run from top to bottom and pass exceptions, but this may result in errors that cascade through the function stack. Consult the section below on how to modify yaml syntax.

Moreover, the pype triggers specific behavior in some functions to facilitate the user experience when working with large datasets. For example, some functions are called automatically (e.g. from the visualization and export modules), but they don't necessarily show the default behavior documented in the API (e.g. export.save_canvas will always have overwrite=True to save the output canvas). Consult the section below to learn about pype behavior.

yaml-syntax

The configuration files needed to run the pype are written in yaml (a recursive acronym for “YAML Ain’t Markup Language”). In principle, these are just text files that follow a specific syntax, with rules for indentation and separation.

[16]:
import phenopype as pp
print(pp.presets.demo1)

preprocessing:
- create_mask:
    tool: polygon
segmentation:
- threshold:
    method: adaptive
    blocksize: 199
    constant: 5
    channel: red
- morphology:
    operation: close
    shape: ellipse
    kernel_size: 3
    iterations: 3
- find_contours:
    retrieval: ext
    min_diameter: 0
    min_area: 150
measurement:
visualization:
- select_canvas:
    canvas: image
- draw_contours:
    line_width: 2
    label_width: 1
    label_size: 1
    fill: 0.3
- draw_masks
export:
- save_contours:
    overwrite: true
- save_canvas:
    resize: 0.5
    overwrite: true

The text inside the yaml configuration files is parsed by Python from top to bottom and converted to phenopype modules and functions. The first level, without any indentation (e.g. segmentation, visualization and export in the above example), denotes the core module that a function is part of. The second level (e.g. - threshold and - find_contours) denotes functions inside the segmentation module. The third level (e.g. method: adaptive and blocksize: 199) contains the arguments passed on to the function. Following this notation, the yaml parser in Python would interpret the first item in segmentation as follows:

pp.segmentation.threshold(image, method="adaptive", blocksize=199, constant=5, channel="red")

When running the pype routine, image is automatically loaded and passed to all functions. Apart from that, you can add or remove functions as you like. Note the hyphen followed by a space (- ) in front of a function: this notation indicates that, during parsing, the items are interpreted as part of a list. This is important, as it allows you to specify the same function as many times as you want.
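
You can see this list structure by parsing a fragment yourself - here with PyYAML, purely for illustration (the fragment is a hypothetical two-step segmentation config):

import yaml  ## PyYAML

config = """
segmentation:
- threshold:
    method: adaptive
    blocksize: 199
    constant: 5
- find_contours:
    retrieval: ext
    min_area: 150
"""

## each module maps to a *list* of single-entry dicts, which is why
## the same function name may appear more than once
parsed = yaml.safe_load(config)
print(parsed["segmentation"])
## [{'threshold': {'method': 'adaptive', 'blocksize': 199, 'constant': 5}},
##  {'find_contours': {'retrieval': 'ext', 'min_area': 150}}]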

When adding or modifying modules and functions, it is important to keep in mind that the function stack is executed sequentially. So, if you want to perform a morphology operation on a binary image, it should come after, and not before, the main segmentation function (in this case threshold). Also, the order of the modules should always be:

preprocessing > segmentation > measurement > visualization > export

Here are the most important syntax rules summed up:

  • indentation rules: 0 spaces for modules, hyphen+space in front of functions, 4 spaces in front of arguments

  • separation rules: modules and functions with arguments are followed by a colon (:) and a new line; functions without specified arguments don’t need a colon; arguments are followed by a colon, a space and then the value

  • modules and functions can be empty (e.g. see `measurement:` and `- draw_masks` above), but function arguments *cannot* (e.g. `overwrite:` needs to be `true` or `false`)

  • as per Python syntax, optional function arguments can, but don’t have to be specified and the functions will just run on default values

  • if you need to add modules (not all presets contain all modules), stick to this order: preprocessing > segmentation > measurement > visualization > export

  • functions can be added multiple times, but sometimes their output may be overwritten (e.g. `- threshold` only works once, `- create_mask` multiple times [1])

[1] Note that a single create_mask operation can already create multiple masks - see Tutorial 5.

pype behavior

The pype function has specific implicit [1] behavior that aims at supporting speed and robustness when working in “production” (i.e. performing the actual analysis of large image datasets, as opposed to the prototyping and low throughput workflows). Here I list some important aspects of that behavior.

Window control

Handling the image and text windows that pop up (if your text window didn’t pop up, check the installation instructions).

  1. Editing and saving the opened configuration file will close the image window, run the functions in the control file, and show the updated results.

  2. Closing the image window manually (with the X button in the upper right) also runs the functions in the control file and shows the updated results.

  3. Esc will close all windows and interrupt the pype routine (Esc triggers sys.exit(), which will also end a Python session if run from the command line) [2,3].

  4. Each step that requires user interaction (e.g. create_mask or landmarks) needs to be confirmed with Return until the next function in the sequence is executed [2,3].

  5. At the end of the analysis, when the final steps (visualization and export functions) have run, you can end the pype routine with another Return keystroke [2,3].

[1] At this point, this behavior is not configurable, but later versions will allow it to be customized.

[2] Sometimes keystrokes will not be recognized, so they need to be executed multiple times - see Tutorial 5.

[3] The image window needs to be highlighted to detect keystrokes, not the text editor or the console - see Tutorial 5.

Function execution

Handling the function stack specified in the config files.

  1. The pype function will execute all functions in sequence and overwrite past iterations, except data from interactive user input (e.g. create_mask or landmarks - see point 2).

  2. To overwrite interactive user input, set the argument overwrite: true at the specific function in the configuration file (see the sketch after these notes). Remember to remove it after the next run [1].

  3. If a pype is initialized on a directory (either from a Phenopype project or a directory specified with the argument dirpath), it will attempt to load input data (e.g. masks) that contain the provided name argument [2].

  4. To edit the loaded input data, set the argument overwrite: true as explained in point 2.

[1] If you forget to remove an overwrite argument and are prompted to overwrite previous input, simply remove the overwrite: true argument and save to run the pype again - it will fall back onto the input from the last iteration.

[2] For example, pp.pype(image, name="run1", dirpath=r"path\to\directory") will attempt to load any saved files in the directory that contain the suffix "run1" (e.g. "masks_run1.csv").
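
For example, to force re-drawing of a previously saved mask, overwrite: true can be added to the function in question (a sketch based on the demo1 preset shown above):

preprocessing:
- create_mask:
    tool: polygon
    overwrite: true    # forces the mask to be re-drawn; remove again after the next run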

Visualizing the results

Handling aspects of visual feedback (can be completely suppressed by setting feedback=False in the pype arguments).

  1. Visual feedback is always generated automatically by an internal function that shows results (i.e. output from landmarks, find_contours or create_mask) on top of a “canvas”.

  2. The canvas can be the image at any step of the analytic process (i.e. raw image, binary image, or a colour channel [gray, red, green or blue]) and is selected with `- select_canvas` as part of the `visualization` module.

  3. If `- select_canvas` is not specified explicitly, it is called automatically and defaults to the raw image as canvas.

  4. Output from all functions needs to be specified manually. So, after using `- landmarks`, `- draw_landmarks` should be called in the `visualization` module [1].

  5. Visual parameters of interactive tools (e.g. point_size or line_thickness) are specified separately in the respective function and in the visualization module (see the sketch after these notes).

[1] Experimental: use the flag autoshow=True in the pype arguments to automatically show results.
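
For example, a visualization module that draws landmarks on top of the binary image might look like this (a sketch: canvas: binary and the point_size value are assumptions based on the options described above):

visualization:
- select_canvas:
    canvas: binary     # use the thresholded image as canvas instead of the raw image
- draw_landmarks:
    point_size: 15     # visual parameters for drawing are set here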

Exporting the results

Saving results and canvas for quality control.

  1. All results are saved automatically, even if the respective functions in export are not specified, with the name argument in pype as suffix [1,2].

  2. If a file already exists in the directory and the respective function is not listed under export:, it will not be overwritten. If an export function is specified under export:, it will overwrite any existing file [3].

  3. The canvas is an exception: it will always be saved and always be overwritten to show the output from the last iteration. However, users can modify the canvas name with the name argument to save different outputs side by side [4]; see the sketch after these notes.

[1] Experimental: use the flag autosave=False in the pype arguments to deactivate this behavior.

[2] For example, pp.pype(image, name="run1") will save "masks_run1.csv" or "contours_run1.csv".

[3] For example, listing - save_landmarks under export: will overwrite landmarks_run1.csv.

[4] For example, name: binary under - save_canvas: will save the canvas as canvas_binary.jpg.
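
Putting footnotes [3] and [4] together, an export module that explicitly overwrites landmark results and saves a separately named canvas could look like this (a sketch based on the behavior described above):

export:
- save_landmarks:
    overwrite: true    # listed explicitly, so an existing landmarks_run1.csv is overwritten
- save_canvas:
    name: binary       # saved as canvas_binary.jpg, next to other canvas versions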
