Tutorial 2: Working with phenopype (prototyping, low throughput, high throughput)
Analysis of scientific images can be an iterative process that may require frequent user input to preprocess images, adjust settings and evaluate the obtained results. In phenopype, users can start this process by identifying the appropriate functions and settings to analyse a series of images (e.g. which segmentation algorithm is to be used). For the actual analysis, users can then switch to a workflow that has higher throughput and is more reproducible. Phenopype offers workflows that are appropriate for all stages of the scientific process:
Workflow, use case and principle of operation:
Prototyping: analysis prototyping, self-education and evaluation. Images are loaded as arrays and functions are applied one by one.
Low throughput: single pictures and very small datasets. Images are loaded into phenopype containers.
High throughput: medium and large datasets (default analysis workflow). Images are loaded from a phenopype project directory tree and analyzed with the pype method.
For all three workflows, users assemble a stack of computer vision functions from phenopype's five core modules (preprocessing, segmentation, measurement, export, visualization - for an overview check the API reference). However, the workflows differ in the degree of user interaction, the visual feedback, the mode by which these functions are applied to images, and in reproducibility.
In the prototyping and low throughput workflows, users write a phenopype function stack directly in Python code. This is recommended for users who wish to familiarize themselves with the basic principles of computer vision and to explore the phenopype function library. Output from all intermediate steps is returned from the functions and can be evaluated, which also makes these routines appropriate for prototyping and testing.
To process image datasets with high throughput and reproducibility, users should work from a phenopype directory structure in conjunction with the pype-method. To get started with Phenopype’s high throughput workflow, see below, check Tutorial 3 and consult the pype section in the API reference.
Fig. 1: Workflow demonstration using a stickleback (Gasterosteus aculeatus) stained with alizarin red. Traits of interest are bone-plate area and shape, and, within the detected plates, pixel intensities that denote bone density. The computer vision functions used to extract the traits of interest (bone-plate area, shape and pixel intensity) are the same in all workflows, but the workflows differ in the amount of code necessary and in reproducibility.
The low throughput workflow starts with the path to an image that is stored on the hard drive. load_image imports the file as a three-channel numpy array (ndarray), together with image meta data (file name, exposure, dimensions, etc.) as a pandas DataFrame. The array gets passed on to the threshold function, which will return a binary array of the same dimensions. This array needs to be passed on to the find_contours function, which will return a dictionary with the detected contours. This dictionary, together with the original array, can then be passed to the colour_intensity function. This function will collect the average color value from within the perimeter coordinates for each contour and return a pandas dataframe containing those values. Finally, the dataframe can be exported as a csv file with save_colour. By passing on the initially created meta-data, the function will automatically expand the provided columns of meta-info into the exported csv.
To learn more about the basics of computer vision, check the resources section of the phenopype documentation.
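To make the chain of steps described above more concrete, here is a minimal pure-Python sketch of the same idea on a tiny grayscale "image": thresholding, grouping foreground pixels into regions, and averaging the intensity within each region. The helper names (threshold, find_regions, mean_intensity) and the toy image are assumptions made for illustration only. They are not phenopype functions, and phenopype's own implementations work on numpy arrays and real contours.

```python
# Illustrative stand-ins only: none of these helpers are part of phenopype.

def threshold(image, value):
    """Return a binary image: 1 where a pixel is darker than `value`, else 0."""
    return [[1 if px < value else 0 for px in row] for row in image]

def find_regions(binary):
    """Group foreground pixels into 4-connected regions (a stand-in for contours)."""
    height, width = len(binary), len(binary[0])
    seen, regions = set(), []
    for y in range(height):
        for x in range(width):
            if binary[y][x] == 1 and (y, x) not in seen:
                stack, region = [(y, x)], []
                seen.add((y, x))
                while stack:
                    cy, cx = stack.pop()
                    region.append((cy, cx))
                    # visit the four direct neighbours of the current pixel
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and binary[ny][nx] == 1 and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                regions.append(region)
    return regions

def mean_intensity(image, region):
    """Average the original pixel values inside one region."""
    return sum(image[y][x] for y, x in region) / len(region)

# toy grayscale image: bright background (200) with two dark objects
image = [
    [200, 200, 200, 200, 200],
    [200,  40,  60, 200, 200],
    [200,  50, 200, 200,  90],
    [200, 200, 200, 200,  70],
]

binary = threshold(image, value=128)
regions = find_regions(binary)
intensities = [mean_intensity(image, region) for region in regions]
print(len(regions), intensities)
```

Each step consumes the output of the previous one, which is the same hand-over pattern that the prototyping workflow follows with arrays, contour dictionaries and DataFrames.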
Fig. 2: Schematic of Phenopype’s prototyping workflow
import phenopype as pp

filepath = r"images/stickleback_side.jpg"

## load image as array, supply image_data (DataFrame containing meta data)
image, image_data = pp.load_image(filepath, df = True, meta=True)

## draw mask
mask = pp.preprocessing.create_mask(image, tool="polygon")

## thresholding converts multichannel to binary image
image_bin = pp.segmentation.threshold(image, method="adaptive", channel="red", blocksize=199, constant=5, df_masks=mask)

## perform morphology operations on binarized image
image_morph = pp.segmentation.morphology(image_bin, operation="close", shape="ellipse", kernel_size=3, iterations=3)

## detect contours on binary image
contours = pp.segmentation.find_contours(image_morph, df_image_data=image_data, retrieval="ext", min_area=150)

## draw detected contours onto canvas
image_drawn = pp.visualization.draw_contours(image, df_contours=contours)

## export contours to csv
pp.export.save_contours(contours, dirpath = r"../_temp/output")

## show canvas
pp.show_image(image_drawn)
no meta-data found
- create mask
- include mask "mask1" pixels
- contours saved under ../_temp/output\contours.csv (overwritten).
While analyzing the image, you can explore output from the different steps to see what is going on. For example, the binary image resulting from the thresholding:
Low throughput workflow
The load_image function can also load an image into a phenopype container, which is a Python class that incorporates loaded images, dataframes, detected contours, intermediate output, etc. so that they are available for inspection or storage at the end of the analysis. The advantage of using containers is that they don’t litter the global environment and namespace, while still containing all intermediate steps (e.g. binary masks or contour DataFrames). Containers can be used manually to analyze images, but typically they are used automatically within the pype-routine that is part of phenopype’s high throughput workflow (see below).
import phenopype as pp

filepath = r"images/stickleback_side.jpg"

## load image as a phenopype container which will include all images, dataframes,
## detected contours and intermediate output
container = pp.load_image(filepath, cont=True, meta=True)

## afterwards, same as in the prototyping workflow, functions are applied
## directly to the container
pp.preprocessing.create_mask(container, tool="polygon")
pp.segmentation.threshold(container, method="adaptive", channel="red", blocksize=199, constant=5) # 3/4
pp.segmentation.morphology(container, operation="close", shape="ellipse", kernel_size=3, iterations=3) # 5
pp.segmentation.find_contours(container, retrieval="ext", min_area=150) # 6
pp.visualization.select_canvas(container, canvas="raw")
pp.visualization.draw_contours(container) # 6
pp.export.save_contours(container, dirpath = r"../_temp/output")
pp.show_image(container.canvas)
dirpath defaulted to file directory - E:\git_repos\phenopype\tutorials\images
no meta-data found
Directory to save files set at - E:\git_repos\phenopype\tutorials\images
- create mask
- include mask "mask1" pixels
Found 6 contours that match criteria.
- raw image
- contours saved under ../_temp/output\contours.csv.
Although the intermediate steps from the functions are not present as objects in the namespace, you can access and evaluate them from the container. Again, we will look at the binary image:
Use dir to inspect all the components of the container:
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'canvas', 'df_contours', 'df_image_data', 'df_image_data_copy', 'df_masks', 'dirpath', 'image', 'image_bin', 'image_copy', 'image_data', 'image_gray', 'image_mod', 'load', 'reset', 'save', 'save_suffix']
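Most entries in the listing above are Python's built-in double-underscore attributes. As a small sketch, the names can be filtered so that only the container's own components remain. DemoContainer and public_attributes below are hypothetical stand-ins written for this example (the real container class has more attributes, as shown above); the filtering itself works on any Python object.

```python
class DemoContainer:
    """Hypothetical stand-in for a phenopype container (not the real class)."""

    def __init__(self, image):
        self.image = image        # the loaded image
        self.image_bin = None     # filled in by a thresholding step
        self.df_contours = None   # filled in by a contour detection step

    def reset(self):
        """Discard intermediate results but keep the loaded image."""
        self.image_bin = None
        self.df_contours = None


def public_attributes(obj):
    """Return attribute names that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


container = DemoContainer(image=[[0, 255], [255, 0]])
print(public_attributes(container))
```

Applied to an actual container, the same list comprehension should leave only entries such as canvas, df_contours or image_bin from the listing above.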
High throughput workflow
The pype routine is phenopype’s standard method to analyse medium and large image datasets, where a function stack is constructed with the human readable yaml syntax (see below). Users can execute the pype method on a filepath, an array, or a phenopype directory, which will always trigger three actions:
open the contained yaml configuration with the default OS text editor
parse the contained functions and execute them in sequence
open a Python-window showing the processed image.
After one iteration of these steps, users can evaluate the results and decide either to modify the opened configuration file (e.g. change function parameters or add new functions) and run the pype again, or to terminate the pype and save all results. The processed image, any extracted phenotypic information, as well as the modified config-file are stored inside the image directory. Together with the raw images, which may be either stored separately or within the directory tree, users can thereby provide the full image analysis pipeline to anyone who wishes to reproduce the obtained results.
For more information on how to use the pype in conjunction with phenopype projects, please refer to Tutorial 3.
Also, check the examples (e.g. Example 2), which include code for both low and high throughput.
import phenopype as pp

filepath = r"images/stickleback_side.jpg"

pp.pype(image=filepath,               # input - can be also an array or a phenopype directory
        dirpath = r"../_temp/output", # where output is stored - folder needs to exist
        name="demo",                  # name of the pype routine, appended to all results-files
        config_preset="demo1"         # template for the analysis - you can create your own!
       )
dirpath defaulted to file directory - E:\git_repos\phenopype\tutorials\images
Directory to save files set at - E:\git_repos\phenopype\tutorials\images
Did not find "pype_config_demo.yaml" - create at following location? (y/n): ../_temp/output\pype_config_demo.yaml
y
pype config generated from demo1.
Created and saved new pype config "pype_config_demo.yaml" in folder ../_temp/output
../_temp/output\pype_config_demo.yaml
------------+++ new pype iteration 2020:08:27 17:09:47 +++--------------
Nothing loaded.
PREPROCESSING
create_mask
- create mask
SEGMENTATION
threshold
- include mask "mask1" pixels
morphology
find_contours
Found 6 contours that match criteria.
VISUALIZATION
select_canvas
- invalid selection - defaulting to raw image
draw_contours
draw_masks
- show mask: mask1.
EXPORT
save_contours
- contours saved under ../_temp/output\contours_demo.csv.
save_canvas
- canvas saved under ../_temp/output\canvas_demo.jpg.
AUTOSAVE
save_masks
- masks saved under ../_temp/output\masks_demo.csv.
TERMINATE
<phenopype.main.pype at 0x247a2b31188>
At the current stage of development, the pype method is prone to errors resulting from incorrect yaml syntax, e.g. missing spaces or wrong indentation. The pype will still try to run from top to bottom and pass exceptions, but this may result in errors that cascade through the function stack. Consult the section below on how to modify yaml-syntax.
Moreover, the pype will trigger specific behavior of some functions to facilitate user experience when working with large data sets. For example, some functions get called automatically (e.g. from the export modules), but they don’t necessarily show default behavior as documented in the API (e.g. visualization.save_canvas will always have overwrite=True to save output canvas). Consult the section below to learn about pype-behavior.
The configuration files needed to run the pype are written in yaml (a recursive acronym for “YAML Ain’t Markup Language”). In principle, these are just text files that follow specific rules for indentation and separation.
import phenopype as pp

print(pp.presets.demo1)
preprocessing:
- create_mask:
    tool: polygon
segmentation:
- threshold:
    method: adaptive
    blocksize: 199
    constant: 5
    channel: red
- morphology:
    operation: close
    shape: ellipse
    kernel_size: 3
    iterations: 3
- find_contours:
    retrieval: ext
    min_diameter: 0
    min_area: 150
measurement:
visualization:
- select_canvas:
    canvas: image
- draw_contours:
    line_width: 2
    label_width: 1
    label_size: 1
    fill: 0.3
- draw_masks
export:
- save_contours:
    overwrite: true
- save_canvas:
    resize: 0.5
    overwrite: true
The text inside the yaml configuration files is parsed by Python from top to bottom, i.e. converted to modules and functions in Phenopype. The first level without any indentation, e.g. export from the above example, denotes the core module that a function is part of. The second level, e.g. - threshold and - find_contours, are functions inside the segmentation module. The third level, e.g. method: adaptive and blocksize: 199, are arguments passed on to the function. Following this notation, the yaml parser in Python would interpret the first item in segmentation as follows:
pp.segmentation.threshold(image, method="adaptive", blocksize=199, constant=5, channel="red")
When running the pype routine, image is automatically loaded and passed to all functions. Apart from that you can add or remove functions as you like. Note the hyphen followed by a space (- ) in front of a function: this notation indicates that during parsing the items are interpreted as part of a list. This is important, as it allows you to specify the same function as many times as you want.
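The mapping from configuration entries to function calls can be sketched in a few lines of plain Python. The nested structure below is what a YAML parser such as PyYAML would typically return for a shortened version of the preset: modules become dictionary keys, functions with arguments become one-key dictionaries inside a list, functions without arguments become plain strings, and an empty module becomes None. The to_calls helper is a hypothetical illustration that only builds the call as text. It is not how phenopype dispatches functions internally.

```python
# Parsed form of a shortened preset (written out by hand so that the example
# has no dependency on a YAML library).
config = {
    "segmentation": [
        {"threshold": {"method": "adaptive", "blocksize": 199, "constant": 5, "channel": "red"}},
        {"find_contours": {"retrieval": "ext", "min_area": 150}},
    ],
    "measurement": None,               # module present but empty
    "visualization": ["draw_masks"],   # function without arguments
}


def to_calls(config):
    """Translate a parsed config into a list of call strings, in file order."""
    calls = []
    for module, functions in config.items():
        for item in functions or []:          # empty modules contribute nothing
            if isinstance(item, str):
                name, kwargs = item, {}
            else:
                (name, kwargs), = item.items()  # each list item holds one function
                kwargs = kwargs or {}
            args = "".join(f", {key}={value!r}" for key, value in kwargs.items())
            calls.append(f"pp.{module}.{name}(image{args})")
    return calls


calls = to_calls(config)
for call in calls:
    print(call)
```

Because each module holds a list rather than a dictionary, the same function name can appear several times without one entry replacing another, which is the reason for the hyphen notation.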
When adding or modifying modules and functions, it is important to keep in mind that the function stack is executed sequentially. So, if you want to perform a morphology operation on a binary image, it should come after and not before the main segmentation function (in this case threshold). Also, the order for modules should always be:
preprocessing > segmentation > measurement > visualization > export
Here are the most important syntax rules summed up:
indentation rules: 0 spaces for modules, hyphen+space in front of functions, 4 spaces in front of arguments
separation rules: modules and functions with arguments are followed by a colon (:) and a new line; functions without specified arguments don’t need a colon; arguments are followed by a colon, a space and then the value
modules and functions can be empty (e.g. see - draw_masks above), but function arguments cannot (e.g. overwrite: needs to be
as per Python syntax, optional function arguments can, but don’t have to, be specified, and the functions will just run on default values
if you need to add modules (not all presets contain all modules), stick to this order: preprocessing > segmentation > measurement > visualization > export
functions can be added multiple times, but sometimes their output may be overwritten (e.g. - threshold only works once, - create_mask multiple times)
Note that one create_mask operation can already create multiple masks - see Tutorial 5.
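Two of the rules above, the fixed module order and the requirement that arguments carry a value, are easy to check mechanically once a configuration has been parsed. The following is a minimal sketch of such a check on the parsed (dictionary) form of a config. check_config and the two example configs are hypothetical and written for this tutorial; phenopype does not necessarily ship a validator like this.

```python
MODULE_ORDER = ["preprocessing", "segmentation", "measurement", "visualization", "export"]


def check_config(config):
    """Return a list of human-readable problems found in a parsed config."""
    problems = []
    modules = list(config)

    # rule: only the five core modules are allowed
    for module in modules:
        if module not in MODULE_ORDER:
            problems.append(f"unknown module: {module}")

    # rule: modules must appear in the documented order
    known = [module for module in modules if module in MODULE_ORDER]
    if known != sorted(known, key=MODULE_ORDER.index):
        problems.append("modules are out of order")

    # rule: an argument that is listed must have a value
    for module, functions in config.items():
        for item in functions or []:
            if isinstance(item, dict):
                for name, kwargs in item.items():
                    for key, value in (kwargs or {}).items():
                        if value is None:
                            problems.append(f"{module}.{name}: argument '{key}' has no value")
    return problems


good = {
    "segmentation": [{"threshold": {"method": "adaptive"}}],
    "export": [{"save_contours": {"overwrite": True}}],
}
bad = {
    "export": [{"save_contours": {"overwrite": None}}],  # 'overwrite:' left empty
    "segmentation": ["threshold"],                       # module listed too late
}

print(check_config(good))
print(check_config(bad))
```

Running a check of this kind before starting the pype can catch the most common slips early, although indentation errors that prevent the file from being parsed at all would still surface in the YAML parser first.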
The pype function has specific implicit behavior that aims at supporting speed and robustness when working in “production” (i.e. when performing the actual analysis of large image datasets, as opposed to the prototyping and low throughput workflows). Here I list some important aspects of that behavior.
Handling the popped up image and text windows (if your text window didn’t pop up, check the installation instructions).
Editing and saving the opened configuration file will close the image window, run the functions in the control file, and show the updated results.
Closing the image window manually (with the X button in the upper right) also runs the functions in the control file and shows the updated results.
Esc will close all windows and interrupt the pype routine (sys.exit(), which will also end a Python session if run from the command line) [2,3].
Each step that requires user interaction (e.g. landmarks) needs to be confirmed with Return before the next function in the sequence is executed [2,3].
At the end of the analysis, when the final steps (visualization and export functions) have run, you can end the pype routine with another
[1] At this point, it’s not configurable, but later versions will allow customizing the behavior.
[2] Sometimes keystrokes will not be recognized, so they need to be executed multiple times - see Tutorial 5.
[3] The image window needs to be highlighted to detect keystrokes, not the text editor or the console - see Tutorial 5.
Handling the function stack specified in the config files
The pype function will execute all functions in sequence and overwrite past iterations, except data from interactive user input (e.g. landmarks - see point 2.).
To overwrite interactive user input, set the argument overwrite: true at the specific function in the configuration file. Remember to remove it after the next run.
If pype is initialized on a directory (either from a Phenopype project or a directory specified with the argument dirpath), it will attempt to load input data (e.g. masks) that contain the provided
To edit the loaded input data, set the argument overwrite: true as explained in point 2.
If you forget to remove an overwrite argument and are prompted to overwrite previous input, simply remove the overwrite: true argument and save to run the pype again; it will fall back onto input from the last iteration.
For example, pp.pype(image, name="run1", dirpath=r"path\to\directory") will attempt to load any saved files in directory that contains the suffix
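The output log of the demo run further up shows result files named contours_demo.csv, canvas_demo.jpg and masks_demo.csv for name="demo", i.e. the pype name is appended to the file stem. Assuming that naming pattern, the files belonging to one pype name can be collected with a few lines of standard-library Python. files_with_suffix is a hypothetical helper for illustration, not a phenopype function, and the file names are created in a temporary directory only so that the example runs on its own.

```python
import tempfile
from pathlib import Path


def files_with_suffix(dirpath, name):
    """List files in `dirpath` whose stem ends with '_<name>', sorted by file name."""
    return sorted(
        path.name
        for path in Path(dirpath).iterdir()
        if path.is_file() and path.stem.endswith(f"_{name}")
    )


with tempfile.TemporaryDirectory() as tmp:
    # create a few empty files that mimic results from two different pype runs
    for filename in ["masks_run1.csv", "contours_run1.csv", "masks_run2.csv", "notes.txt"]:
        (Path(tmp) / filename).touch()
    found = files_with_suffix(tmp, "run1")

print(found)
```

Keeping a distinct name per analysis therefore keeps the inputs and outputs of different runs apart, even when they share one directory.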
Visualizing the results
Handling aspects of visual feedback (can be completely suppressed by setting feedback=False in the
Visual feedback is always generated automatically by an internal function that shows results (i.e. output from create_mask) on top of a “canvas”.
The canvas can be the image at any step of the analytic process (i.e. raw image, binary image, or a colour channel [gray, red, green or blue]) and is selected with - select_canvas as part of the
If - select_canvas is not specified explicitly, it is called automatically and defaults to the raw image as canvas.
Output from all functions needs to be specified manually. So, after using
- draw_landmarks should be called in the
Visual parameters of interactive tools (e.g. line_thickness) are specified separately in the respective function, and in the
Experimental: use the flag autoshow=True in the pype arguments to automatically show results.
Exporting the results
Saving results and canvas for quality control.
All results are saved automatically, even if the respective functions in export are not specified, with the name of the pype as suffix [1,2].
If a file already exists in the directory, and the respective function is not listed under export:, then it will not be overwritten. If an export function is specified under export:, it will overwrite any existing file [3].
The canvas is an exception: it will always be saved and always be overwritten to show the output from the last iteration. However, users can modify the canvas name with name in the arguments to save different output side by side [4].
[1] Experimental: use the flag autosave=False in the pype arguments to deactivate this behavior.
[2] For example, pp.pype(image, name="run1") will save
[3] For example, listing - save_landmarks under export: will overwrite
[4] For example, name: binary under - save_canvas: save the canvas as
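The saving rules described in this section can be condensed into a small decision function. This is only a sketch of the rules as they are worded here, with hypothetical names (will_write, kind, listed_under_export). It is not phenopype code, and it ignores the experimental autosave flag.

```python
def will_write(kind, file_exists, listed_under_export):
    """Decide whether a result file is written at the end of a pype iteration.

    kind                -- "canvas" or any other result type, e.g. "contours"
    file_exists         -- a file of that name is already in the directory
    listed_under_export -- the matching save function appears under export:
    """
    if kind == "canvas":
        return True              # the canvas is always saved and overwritten
    if not file_exists:
        return True              # results are saved automatically
    return listed_under_export   # existing files are only replaced when listed


# print the outcome for every combination of the three inputs
for kind in ("canvas", "contours"):
    for file_exists in (False, True):
        for listed in (False, True):
            print(kind, file_exists, listed, "->", will_write(kind, file_exists, listed))
```

The only case in which nothing is written is an existing result file whose save function is not listed under export:, which is what protects earlier results from being replaced by accident.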