Tutorial 4: Project-wide actions (e.g. size and colour reference)

Unless images are taken in a highly standardized environment, e.g. via a scanner or a microscope, they will vary in exposure, zoom, and distance between camera and photographed object. To compensate for this variation among images within and across datasets, Phenopype contains some preprocessing tools that can automatically correct images.

Load project

First we load the project we created in the last tutorial. If it does not exist yet, we create it:

[1]:
import phenopype as pp
import os

# os.chdir(r"/phenopype-master/tutorials")

proj_dir = r"../_temp/my_project/phenopype"


[2]:
if os.path.isdir(proj_dir):
    myproj = pp.project.load(proj_dir)     ## from /phenopype-master/tutorials
else:
    images = "../../../tutorials/images"
    myproj = pp.project(root_dir=proj_dir)
    myproj.add_files(image_dir=images,
                     include="stickle",
                     exclude=["side","top"])
    myproj.add_config(name = "lm",
                      config_preset="landmarks_plain")
    pp.project.save(myproj, overwrite=True)
--------------------------------------------
Project loaded and current working directory changed to

E:\git_repos\phenopype\_temp\my_project\phenopype
--------------------------------------------

Create reference template

With this method you can set a project-specific scale by measuring the pixel-to-mm ratio in a reference image. The steps for this are:

  1. Click on two points with a known distance (in mm) in between, e.g. on a piece of mm-paper that you put in the image, and hit Enter.

  2. Enter the length in mm. This either returns the pixel-to-mm ratio as a float (in prototyping, i.e. low throughput, mode), or adds it to all attribute files within a project (in high throughput mode).

  3. Optional: You can mask the reference card with the option mask=True to exclude it from all thresholding algorithms, or use template=True to create a template for automatic scale detection that is saved in the project’s root directory.
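
The arithmetic behind steps 1 and 2 can be sketched in plain Python. This is only an illustration, not Phenopype's own code: the function name px_mm_ratio and the click coordinates are hypothetical, chosen so that the result matches the ratio of 35 that appears later in this tutorial.

```python
import math

def px_mm_ratio(point_a, point_b, distance_mm):
    """Return pixels per millimetre from two clicked points and their known distance."""
    if distance_mm <= 0:
        raise ValueError("distance_mm must be positive")
    # Euclidean distance between the two clicked points, in pixels
    distance_px = math.hypot(point_b[0] - point_a[0], point_b[1] - point_a[1])
    return distance_px / distance_mm

# Hypothetical click coordinates (x, y) in pixels, 10 mm apart on the mm-paper
ratio = px_mm_ratio((100, 200), (450, 200), distance_mm=10)
print(ratio)  # 35.0
```

A longer reference distance generally makes the ratio less sensitive to small click errors.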

If you used automatic scale detection, the resulting data frames contain the column “template_px_mm_ratio” for the ratio measured in the template, and the column “current_px_mm_ratio” for the ratio detected in the current picture by “find_scale”.

Adding a scale

The reference image can be any image, but choose it carefully: if you plan on doing brightness and colour corrections, it should be in the middle of the distribution of all exposures and colours so corrections will not over-expose or over-saturate the images.
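
One simple way to find an image "in the middle" of the exposure distribution is to compare mean brightness values. The sketch below is a heuristic under stated assumptions, not a Phenopype feature: pick_reference is a hypothetical helper, and the synthetic arrays stand in for grayscale photographs you would load yourself.

```python
import numpy as np

def pick_reference(images):
    """Return the name of the image whose mean brightness is closest to the median."""
    means = {name: float(np.mean(img)) for name, img in images.items()}
    median = float(np.median(list(means.values())))
    return min(means, key=lambda name: abs(means[name] - median))

# Synthetic grayscale "images" standing in for real photographs
images = {
    "dark.jpg": np.full((4, 4), 60, dtype=np.uint8),
    "medium.jpg": np.full((4, 4), 120, dtype=np.uint8),
    "bright.jpg": np.full((4, 4), 200, dtype=np.uint8),
}
print(pick_reference(images))  # medium.jpg
```

Mean brightness is a crude summary; for colour corrections you may also want to inspect the candidates by eye.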

We will use the image stickleback_side.jpg from the image folder in tutorials:

[3]:
images = "../../../tutorials/images"
os.listdir(images)
[3]:
['isopods.jpg',
 'isopods_fish.mp4',
 'phytoplankton.jpg',
 'snail.jpg',
 'stickle1.JPG',
 'stickle2.JPG',
 'stickle3.JPG',
 'stickleback_side.jpg',
 'stickleback_top.jpg',
 'worms.jpg']

Within a project, the reference image is set with the add_scale method of the project object:

  1. Click on two points inside the provided image.

  2. Enter the distance in mm (this returns the pixel-to-mm ratio).

  3. Drag a rectangle mask over the reference card.

The pixel-to-mm ratio of the reference image is saved to every image included in the project, and the mask is stored as a template for automatic scale detection with the find_scale function.

[5]:
myproj.add_scale(reference_image="../../../tutorials/images/stickleback_side.jpg", overwrite=True)
- scale template saved under scale_template.jpg (overwritten).
- measure pixel-to-mm-ratio
Scale set
- add column length
Template selected
- scale pixel-to-mm-ratio already measured (overwrite=False)
added scale information to 0__stickle1 (overwritten)
added scale information to 0__stickle2 (overwritten)
added scale information to 0__stickle3 (overwritten)

If we now load the first image from the Phenopype directory folders (myproj.dirpaths[0]), we retrieve the scale information we just collected:

[6]:
image, df_img_data = pp.load_directory(myproj.dirpaths[0],  # first directory in project folder
                                       df=True,             # return DataFrame
                                       cont=False)          # return as image, not as container (default)
df_img_data

[6]:
filename width height size_ratio_original template_px_mm_ratio
0 stickle1.JPG 2400 1600 1 35

We then supply the image, the DataFrame containing template_px_mm_ratio, and the scale template to the find_scale function. The template mask is always stored in the root directory, which becomes the current Python working directory when a project is created or loaded. The function returns the updated image meta DataFrame, a mask DataFrame containing the coordinates of the detected reference card, and the original image.

[7]:
templ = pp.load_image("scale_template.jpg")
df_image_data, masks, image = pp.preprocessing.find_scale(image,
                                                          template=templ,
                                                          df_image_data = df_img_data)
df_image_data
---------------------------------------------------
Reference card found with 243 keypoint matches:
template image has 0    35
Name: template_px_mm_ratio, dtype: int64 pixel per mm.
current image has 0    33.8
Name: template_px_mm_ratio, dtype: float64 pixel per mm.
= 96.71 % of template image.
---------------------------------------------------
[7]:
filename width height size_ratio_original template_px_mm_ratio current_px_mm_ratio
0 stickle1.JPG 2400 1600 1 35 33.8
[8]:
image = pp.visualization.draw_masks(image, df_masks=masks)
pp.show_image(image)
 - show mask: scale.
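
The detected ratio matters because every length measured in pixels has to be divided by the ratio of the image it was measured in. The following sketch shows the conversion with the two ratios printed above; px_to_mm and the 676 px length are hypothetical and only serve as an illustration.

```python
def px_to_mm(length_px, px_mm_ratio):
    """Convert a length in pixels to millimetres using a pixel-per-mm ratio."""
    if px_mm_ratio <= 0:
        raise ValueError("px_mm_ratio must be positive")
    return length_px / px_mm_ratio

template_ratio = 35.0   # template_px_mm_ratio from the table above
current_ratio = 33.8    # current_px_mm_ratio detected in stickle1.JPG
length_px = 676         # hypothetical measurement taken in stickle1.JPG

print(round(px_to_mm(length_px, current_ratio), 2))   # 20.0
print(round(px_to_mm(length_px, template_ratio), 2))  # 19.31
```

Using the template ratio instead of the per-image ratio would underestimate this length by roughly 0.7 mm, which is the kind of error automatic scale detection is meant to remove.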

If the scale contains colour information, you can also set the equalize=True flag. This will adjust the target image’s pixel-value histogram (https://en.wikipedia.org/wiki/Image_histogram) to the histogram of the scale in the reference image, using only the pixel values inside the detected scale.

[9]:
df_image_data, masks, image = pp.preprocessing.find_scale(image,
                                                          template=templ,
                                                          df_image_data = df_img_data,
                                                          equalize=True)
pp.show_image(image)
---------------------------------------------------
Reference card found with 239 keypoint matches:
template image has 0    35
Name: template_px_mm_ratio, dtype: int64 pixel per mm.
current image has 0    33.9
Name: template_px_mm_ratio, dtype: float64 pixel per mm.
= 96.764 % of template image.
---------------------------------------------------
histograms equalized
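
The idea behind equalize=True can be illustrated with a small histogram-matching sketch in NumPy. It is not Phenopype's implementation: matching_lut is a hypothetical helper, the arrays are tiny synthetic single-channel images, and the approach (classic histogram specification via cumulative distributions) is an assumption about one reasonable way to do it. The lookup table is computed only from the pixels of the reference card and then applied to the whole target image.

```python
import numpy as np

def matching_lut(target_card, reference_card):
    """Build a 256-entry lookup table mapping target grey levels to reference levels."""
    # Cumulative distribution of grey levels inside each card region
    target_cdf = np.cumsum(np.bincount(target_card.ravel(), minlength=256)) / target_card.size
    reference_cdf = np.cumsum(np.bincount(reference_card.ravel(), minlength=256)) / reference_card.size
    # For every target level, take the first reference level with an equal or larger CDF value
    lut = np.searchsorted(reference_cdf, target_cdf, side="left")
    return np.clip(lut, 0, 255).astype(np.uint8)

# Synthetic example: the card in the target image is 20 grey levels darker
reference_card = np.array([[50, 100], [150, 200]], dtype=np.uint8)
target_card = reference_card - 20
target_image = np.array([[30, 80, 130], [180, 30, 80]], dtype=np.uint8)

lut = matching_lut(target_card, reference_card)
corrected = lut[target_image]   # apply the card-derived mapping to the whole image
print(corrected.tolist())       # [[50, 100, 150], [200, 50, 100]]
```

For colour images the same mapping would be built and applied separately for each channel.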

Find scale in high throughput workflow

These operations are of course also possible on the fly as part of the high throughput workflow. For this we just add a new configuration file that contains the find_scale instruction. However, we need to switch to interactive configuration mode and add the equalize: true string to the configuration:

[1]:
myproj.add_config(name = "lm2",
                  config_preset="landmarks_scale",
                  overwrite=True,
                  interactive=True)
[2]:
for directory in myproj.dirpaths:
    p1 = pp.pype(directory,
           name="lm2")

Collect results

With collect_results you can search the project folder for results and copy them to a folder in the root directory (“results” is the default, but this can be changed).

[ ]:
myproj.collect_results(name="lm2",          # these two arguments create the search string for "landmarks_lm2.csv"
                       files=["landmarks"], #
                       overwrite=True)
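
To make the search-and-copy idea concrete, here is a minimal sketch using only the standard library. It does not reproduce what collect_results does internally: collect_files, the folder layout, and the naming of the copied files are assumptions made for this illustration, which runs in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path

def collect_files(root_dir, filename, results_dir="results"):
    """Copy every file called `filename` below root_dir into root_dir/results_dir."""
    root = Path(root_dir)
    target = root / results_dir
    target.mkdir(exist_ok=True)
    copied = []
    for path in sorted(root.rglob(filename)):
        if target in path.parents:
            continue  # never re-collect files from the results folder itself
        # Prefix with the image folder name so that copies do not overwrite each other
        destination = target / f"{path.parent.name}_{path.name}"
        shutil.copy(path, destination)
        copied.append(destination.name)
    return copied

with tempfile.TemporaryDirectory() as tmp:
    # Build a toy project with one result file per image folder
    for folder in ("0__stickle1", "0__stickle2"):
        image_dir = Path(tmp) / folder
        image_dir.mkdir()
        (image_dir / "landmarks_lm2.csv").write_text("x,y\n1,2\n")
    collected = collect_files(tmp, "landmarks_lm2.csv")

print(collected)  # ['0__stickle1_landmarks_lm2.csv', '0__stickle2_landmarks_lm2.csv']
```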