Tutorial 3: Setting up and managing phenopype projects

A central aspect of using Phenopype efficiently is to work from within a project. A Phenopype project consists of a directory tree in which each folder contains a copy of, or a link to, a single raw image file (1). Alongside the images to be processed, users can store configuration files for the pype routine that we covered in the previous tutorial: configuration files can be created from preconfigured templates, which can easily be modified (2). Once raw images have been added and configuration files are in place, the pype routine is run until the results are satisfactory (3). Together, the raw images, the pype configuration files and the saved results files are all that is needed to completely reproduce any phenotypic data collected in the process (4).

Create project and add images

Step 1

Step 1: Create a phenopype project and organize raw images into separate folders where all relevant data, attributes and results are stored.

A Phenopype project directory can be initialized with the project function. The phenopype project root folder should be separate from the raw data, e.g. a folder inside your main project folder:

[3]:
import phenopype as pp
import os
os.getcwd()
[6]:
myproj = pp.project(root_dir=r"../_temp/my_project/phenopype") ## doesn't have to be "myproj", can be named anything
--------------------------------------------
Phenopype will create a new project at

E:\git_repos\phenopype\_temp\my_project\phenopype

and change the current working directory to this location.
Proceed? (y/n)
y

project attributes written to E:\git_repos\phenopype\_temp\my_project\phenopype\attributes.yaml
--------------------------------------------
[9]:
os.getcwd()
[9]:
'E:\\git_repos\\phenopype\\_temp\\my_project\\phenopype'
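
Because creating the project changes the current working directory, a relative path written before that call points somewhere else afterwards. The following is a minimal sketch of that effect using only the standard library; the helper name `resolve_relative` and the temporary folder are illustrative assumptions and not part of phenopype:

```python
import os
import tempfile


def resolve_relative(path):
    """Return the absolute location that a relative path currently points to."""
    return os.path.abspath(path)


start_dir = os.getcwd()
before = resolve_relative("images")

with tempfile.TemporaryDirectory() as project_root:
    try:
        # Stands in for the directory change performed when the project is created
        os.chdir(project_root)
        after = resolve_relative("images")
    finally:
        # Restore the original working directory before the temp folder is removed
        os.chdir(start_dir)

# The same relative path resolved to two different locations
print(before)
print(after)
```

Converting paths such as the image directory to absolute paths before creating the project is one way to avoid this ambiguity.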

The next step is to add images to the project. You can do so with the add_files method of the project you created (a method is a function that belongs to an existing object, in this case “myproj”; see this SO question). The method offers some flexibility in terms of which files to import. The most important arguments are include, exclude and filetypes. For example, given the following list of images:

[16]:
images = "../../../tutorials/images"
os.listdir(images)
[16]:
['isopods.jpg',
 'isopods_fish.mp4',
 'phytoplankton.jpg',
 'snail.jpg',
 'stickle1.JPG',
 'stickle2.JPG',
 'stickle3.JPG',
 'stickleback_side.jpg',
 'stickleback_top.jpg',
 'worms.jpg']

If we want to import “stickle1”, “stickle2”, and “stickle3”, we can combine include and exclude (the output also prints all other default settings):

[17]:
myproj.add_files(image_dir=images,
                 include="stickle",       ## can be type "str" or type "list"
                 exclude=["side","top"]   ## can be type "str" or type "list"
                )
--------------------------------------------
phenopype will search for files at

E:\git_repos\phenopype\tutorials\images

using the following settings:

filetypes: ['jpg', 'JPG', 'jpeg', 'JPEG', 'tif', 'png'], include: stickle, exclude: ['side', 'top'], raw_mode: copy, search_mode: dir, unique_mode: path

Found image stickle1.JPG - phenopype-project folder 0__stickle1 created
no meta-data found
Found image stickle2.JPG - phenopype-project folder 0__stickle2 created
no meta-data found
Found image stickle3.JPG - phenopype-project folder 0__stickle3 created
no meta-data found

Found 3 files
--------------------------------------------

The three images share the same (nonstandard) uppercase file ending, so we can also use the filetypes argument (and the overwrite argument, because we have already added them above):

[18]:
myproj.add_files(image_dir=images,
                 filetypes="JPG",           ## can be type "str" or type "list"
                 exclude=["side","top"],    ## can be type "str" or type "list"
                 overwrite=True
                )
--------------------------------------------
phenopype will search for files at

E:\git_repos\phenopype\tutorials\images

using the following settings:

filetypes: JPG, include: [], exclude: ['side', 'top'], raw_mode: copy, search_mode: dir, unique_mode: path

Found image stickle1.JPG - phenopype-project folder 0__stickle1 created (overwritten)
no meta-data found
Found image stickle2.JPG - phenopype-project folder 0__stickle2 created (overwritten)
no meta-data found
Found image stickle3.JPG - phenopype-project folder 0__stickle3 created (overwritten)
no meta-data found

Found 3 files
--------------------------------------------

The remaining settings are raw_mode, search_mode and unique_mode. raw_mode determines whether raw files are copied into each folder of the Phenopype directory tree (copy [default]) or whether only a link to the original file is stored (link), which can be useful if data sets contain many or very large images. search_mode indicates whether only the top directory (dir [default]) or also all subdirectories (recursive) are searched. unique_mode indicates whether files should be unique by their path (the default; printed as path in the settings above and listed as filepath in the docstring below) or only by their name (filename). Duplicate files are skipped.
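
To make the interplay of filetypes, include and exclude concrete, here is a minimal standard-library sketch of a selection rule that reproduces the two results shown above. It assumes plain substring matching for include and exclude and suffix matching for filetypes; `select_files` and `as_list` are hypothetical helpers, not phenopype's implementation:

```python
DEFAULT_FILETYPES = ("jpg", "JPG", "jpeg", "JPEG", "tif", "png")


def as_list(value):
    """Accept either a single string or a sequence of strings."""
    return [value] if isinstance(value, str) else list(value)


def select_files(filenames, filetypes=DEFAULT_FILETYPES, include=(), exclude=()):
    """Keep files with a matching ending that match include and do not match exclude."""
    filetypes, include, exclude = as_list(filetypes), as_list(include), as_list(exclude)
    selected = []
    for name in filenames:
        if not any(name.endswith("." + ending) for ending in filetypes):
            continue  # wrong file ending
        if include and not any(pattern in name for pattern in include):
            continue  # include patterns were given, but none matched
        if any(pattern in name for pattern in exclude):
            continue  # exclude overrules include
        selected.append(name)
    return selected


images = ["isopods.jpg", "isopods_fish.mp4", "phytoplankton.jpg", "snail.jpg",
          "stickle1.JPG", "stickle2.JPG", "stickle3.JPG",
          "stickleback_side.jpg", "stickleback_top.jpg", "worms.jpg"]

by_pattern = select_files(images, include="stickle", exclude=["side", "top"])
by_ending = select_files(images, filetypes="JPG", exclude=["side", "top"])
print(by_pattern)
print(by_ending)
```

Both calls select stickle1.JPG, stickle2.JPG and stickle3.JPG, matching the folders created in the two add_files runs.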

[19]:
help(pp.project.add_files)
Help on function add_files in module phenopype.main:

add_files(self, image_dir, filetypes=['jpg', 'JPG', 'jpeg', 'JPEG', 'tif', 'png'], include=[], exclude=[], raw_mode='copy', search_mode='dir', unique_mode='path', overwrite=False, resize=1, **kwargs)
    Add files to your project from a directory, can look recursively.
    Specify in- or exclude arguments, filetypes, duplicate-action and copy
    or link raw files to save memory on the harddrive. For each found image,
    a folder will be created in the "data" folder within the projects root
    directory. If found images are in subfolders and search_mode is
    recursive, the respective phenopype directories will be created with
    flattened path as prefix.

    E.g., with "raw_files" as folder with the original image files
    and "phenopype_proj" as rootfolder:

    - raw_files/file.jpg ==> phenopype_proj/data/file.jpg
    - raw_files/subdir1/file.jpg ==> phenopype_proj/data/1__subdir1__file.jpg
    - raw_files/subdir1/subdir2/file.jpg ==> phenopype_proj/data/2__subdir1__subdir2__file.jpg

    Parameters
    ----------
    image_dir: str
        path to directory with images
    filetypes: list or str, optional
        single or multiple string patterns to target files with certain endings.
        "default_filetypes" are configured in settings.py
    include: list or str, optional
        single or multiple string patterns to target certain files to include
    exclude: list or str, optional
        single or multiple string patterns to target certain files to exclude -
        can overrule "include"
    raw_mode: {"copy", "link"} str, optional
        how should the raw files be passed on to the phenopype directory tree:
        "copy" will make a copy of the original file, "link" will only send the
        link to the original raw file to attributes, but not copy the actual
        file (useful for big files)
    search_mode: {"dir", "recursive"}, str, optional
        "dir" searches current directory for valid files; "recursive" walks
        through all subdirectories
    unique_mode: {"filepath", "filename"}, str, optional:
        how to deal with image duplicates - "filepath" is useful if identically
        named files exist in different subfolders (folder structure will be
        collapsed and goes into the filename), whereas filename will ignore
        all similar named files after their first occurrence.
    kwargs:
        developer options
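
The flattened folder names mentioned in the docstring can be sketched as follows. This is only an illustration of the naming pattern: it assumes the prefix is the number of subfolders followed by the subfolder names, and it drops the file ending, as in the folder names printed above (for example 0__stickle1). `flattened_folder_name` is a hypothetical helper and not a phenopype function:

```python
from pathlib import PurePosixPath


def flattened_folder_name(relative_path):
    """Build '<depth>__<subdir>__...__<file stem>' from a path relative to image_dir."""
    path = PurePosixPath(relative_path)
    subdirs = path.parts[:-1]  # every path component except the file itself
    return "__".join([str(len(subdirs)), *subdirs, path.stem])


for example in ["stickle1.JPG", "subdir1/file.jpg", "subdir1/subdir2/file.jpg"]:
    print(example, "==>", flattened_folder_name(example))
```

Because the subfolder names become part of the folder name, identically named files from different subfolders end up in different project folders.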

Add pype-configuration files

Step 2

Step 2: Create configuration files and store them alongside the raw images.

In the next step we prepare the files we added for use with the pype routine by adding a configuration file with the add_config method. Instead of adding the functions one by one, we can load presets that are appropriate for the given computer vision analysis.

Currently, the different templates are stored inside a Python file, and can be inspected using dir(pp.presets) to show all existing presets, and print(pp.presets.landmarks_plain) to show the contents.

[20]:
dir(pp.presets)
[20]:
['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'demo1',
 'demo2',
 'ex7',
 'inverted1',
 'landmarks_2',
 'landmarks_plain',
 'object_detection_morph',
 'object_detection_plain',
 'preset1',
 'preset3',
 'preset4',
 'watershed']
[21]:
print(pp.presets.landmarks_plain)

measurement:
- landmarks:
    point_size: 25
    point_colour: green
    label_size: 3
    label_width: 5
visualization:
- draw_landmarks:
    point_size: 25
    point_colour: green
    label_size: 3
    label_width: 5
export:
- save_landmarks

For example, if we want to place landmarks, we can use one of the corresponding presets.

[22]:
myproj.add_config(name = "lm", config_preset="landmarks_plain")
pype config generated from landmarks_plain.
pype_lm.yaml created for 0__stickle1
pype_lm.yaml created for 0__stickle2
pype_lm.yaml created for 0__stickle3

Now all image folders contain a configuration file in yaml format (see Tutorial 2 and the resources section of the Documentation for details).

An important feature of add_config is the opportunity to evaluate and edit the template before it gets saved to the folders. This is done by setting the flag interactive=True in the arguments. For example, if we want to globally change the point and label size of the landmark preset, we can do:

Edit template

Edit the templates before saving them to the image folders.

NOTE 1: The pype function opens a text editor and a Python window. To modify the pype configuration templates, by default the first image in your project directory tree will be copied over to the phenopype project root directory. After the windows have opened, they can be controlled as described in Tutorial 2.

NOTE 2: If you have issues with this step, e.g. no text editor window is popping up, make sure you have set the default app for opening yaml files. Furthermore, consult the installation instructions and check if your text editor is configured correctly.

[23]:
myproj.add_config(name = "lm",
                  config_preset="landmarks_plain",
                  interactive=True,
                  overwrite=True                 ## needed because config with the name "lm" already exists in the folders
                 )
pype config generated from landmarks_plain.
E:\git_repos\phenopype\_temp\my_project\phenopype\pype_config_template-lm.yaml


------------+++ new pype iteration 2020:04:20 16:10:59 +++--------------


MEASUREMENT
landmarks
- setting landmarks
VISUALIZATION
draw_landmarks


TERMINATE
pype_lm.yaml created for 0__stickle1 (overwritten)
pype_lm.yaml created for 0__stickle2 (overwritten)
pype_lm.yaml created for 0__stickle3 (overwritten)
[24]:
help(pp.project.add_config)
Help on function add_config in module phenopype.main:

add_config(self, name, config_preset='preset1', interactive=False, overwrite=False, idx=0, **kwargs)
    Add pype configuration presets to all image folders in the project, either by using
    the templates included in the presets folder, or by adding your own templates
    by providing a path to a yaml file. Can be tested and modified using the
    interactive flag before distributing the config files.

    Parameters
    ----------

    name: str
        name of config-file. this gets appended to all files and serves as and
        identifier of a specific analysis pipeline
    preset: str, optional
        can be either a string denoting a template name (e.g. preset1, preset2,
        landamarking1, ... - in "phenopype/settings/presets.py") or a path to a
        compatible yaml file
    interactive: bool, optional
        start a pype and modify preset before saving it to phenopype directories
    overwrite: bool, optional
        overwrite option, if a given pype config-file already exist
    kwargs:
        developer options
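
According to the docstring, a preset can also be given as a path to a compatible yaml file. As a non-interactive alternative to editing the template by hand, the sketch below changes the point and label size in the landmarks_plain text printed earlier and writes the result to a file. It uses only the standard library; `override_settings`, the chosen values and the file name are illustrative assumptions, and the final add_config call is left as a comment because it has not been verified here:

```python
import tempfile
from pathlib import Path

# Text of the landmarks_plain preset as printed earlier in this tutorial
LANDMARKS_PLAIN = """\
measurement:
- landmarks:
    point_size: 25
    point_colour: green
    label_size: 3
    label_width: 5
visualization:
- draw_landmarks:
    point_size: 25
    point_colour: green
    label_size: 3
    label_width: 5
export:
- save_landmarks
"""


def override_settings(template, **settings):
    """Replace the value of every 'key: value' line whose key is in settings."""
    lines = []
    for line in template.splitlines():
        stripped = line.strip()
        key = stripped.split(":")[0]
        if key in settings and not stripped.endswith(":"):
            indent = line[: len(line) - len(line.lstrip())]  # keep the yaml indentation
            line = f"{indent}{key}: {settings[key]}"
        lines.append(line)
    return "\n".join(lines) + "\n"


custom = override_settings(LANDMARKS_PLAIN, point_size=40, label_size=5)
template_path = Path(tempfile.mkdtemp()) / "landmarks_large.yaml"
template_path.write_text(custom)
print(custom)

# Assumption based on the docstring above (path to a compatible yaml file):
# myproj.add_config(name="lmlarge", config_preset=str(template_path))
```

The line-based replacement only works for simple key-value settings like the ones in this preset; a yaml library would be the more robust choice for nested or multi-line values.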

Saving and loading a project

Project objects can be saved using the static method save (static = not bound to any object). This saves the project data to the project’s root directory. Currently, the only useful information stored in the project object is the list of all contained directories. Future releases will make more use of the project object.

NOTE: pp.project.save saves ONLY the project data; all data collected with the pype method or any of the other workflows needs to be saved inside the folders using the appropriate export functions.

[40]:
pp.project.save(myproj, overwrite=True)
Project data saved under E:\git_repos\phenopype\_temp\my_project\phenopype\project.data.

To load the project again, pass a path to the load method. In the example below this is the project's root folder, which contains the project.data file:

[30]:
import phenopype as pp

myproj = pp.project.load(".") ## "." because we are still in the same working directory
myproj.dirpaths
--------------------------------------------
Project loaded and current working directory changed to

E:\git_repos\phenopype\_temp\my_project\phenopype
--------------------------------------------
[30]:
['data/0__stickle1', 'data/0__stickle2', 'data/0__stickle3']

Applying the pype to project folders

Step 3

Step 3: Apply the pype function image by image.

After adding images and configuration files, everything is set to process your dataset with high throughput. Using a simple for loop, we go through all directories one by one. You can modify the configuration file and control the window as described in Tutorial 2. The skip argument lets you skip folders that already contain results for a given config name, so you can return to the point where you left off.

NOTE 1: Make sure to specify the name of the config file you added before, in this case “lm”. The config file name serves two purposes: it tells the pype function which configuration to load if a directory contains several, and it gets appended to all results files produced with that configuration.

NOTE 2: Consult Tutorial 2 to understand pype behavior. For example, the pype will automatically save all collected data, and by default overwrite any existing results files, but the latter only if indicated in the config file.

[31]:
for folder in myproj.dirpaths:
    directory = os.path.join(myproj.root_dir, folder)
    print(directory)
E:\git_repos\phenopype\_temp\my_project\phenopype\data/0__stickle1
E:\git_repos\phenopype\_temp\my_project\phenopype\data/0__stickle2
E:\git_repos\phenopype\_temp\my_project\phenopype\data/0__stickle3
[32]:
os.path.isdir(directory)
[32]:
True
[33]:
for folder in myproj.dirpaths:
    directory = os.path.join(myproj.root_dir, folder)
    pp.pype(directory,
            name="lm",         ## loads the config file "pype_config_lm.yaml". "lm" gets appended to all results files
            skip=True          ## skip=True will skip over any directories that already contain results files with "lm"
           )
E:\git_repos\phenopype\_temp\my_project\phenopype\data/0__stickle1\pype_config_lm.yaml


------------+++ new pype iteration 2020:04:20 16:12:12 +++--------------


MEASUREMENT
landmarks
- setting landmarks
VISUALIZATION
draw_landmarks
EXPORT
save_landmarks
- landmarks saved under E:\git_repos\phenopype\_temp\my_project\phenopype\data/0__stickle1\landmarks_lm.csv.
AUTOSAVE
save_canvas
- canvas saved under E:\git_repos\phenopype\_temp\my_project\phenopype\data/0__stickle1\canvas_lm.jpg.


TERMINATE
E:\git_repos\phenopype\_temp\my_project\phenopype\data/0__stickle2\pype_config_lm.yaml


------------+++ new pype iteration 2020:04:20 16:12:16 +++--------------


MEASUREMENT
landmarks
- setting landmarks
An exception has occurred, use %tb to see the full traceback.

SystemExit:

TERMINATE (by user)

WARNING: To exit: use 'exit', 'quit', or Ctrl-D.
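
After an interrupted run like the one above, it can help to check which folders still lack results before resuming. The following standard-library sketch looks for the results file name seen in the output (landmarks_lm.csv). `pending_folders` is a hypothetical helper, the demo folders are created in a temporary directory, and how skip=True decides internally what to skip may differ:

```python
import os
import tempfile


def pending_folders(root_dir, dirpaths, name, results_stem="landmarks"):
    """Return the folders that do not yet contain '<results_stem>_<name>.csv'."""
    results_file = f"{results_stem}_{name}.csv"
    return [
        folder for folder in dirpaths
        if not os.path.isfile(os.path.join(root_dir, folder, results_file))
    ]


# Demo: recreate the state after the interrupted run in a temporary directory
with tempfile.TemporaryDirectory() as root_dir:
    dirpaths = ["data/0__stickle1", "data/0__stickle2", "data/0__stickle3"]
    for folder in dirpaths:
        os.makedirs(os.path.join(root_dir, folder))
    # Only the first folder received a results file before the run was terminated
    open(os.path.join(root_dir, dirpaths[0], "landmarks_lm.csv"), "w").close()
    remaining = pending_folders(root_dir, dirpaths, name="lm")

print(remaining)
```

In a real project, root_dir and dirpaths would come from myproj.root_dir and myproj.dirpaths, as in the loop above.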

Step 4

Step 4: Each folder contains all information necessary to reproduce the collected phenotypic data. Output from different pype runs can be stored side by side in the same folders.

As mentioned above, it’s possible to have multiple configuration files side by side in phenopype folders. For example, if we want to implement an alternative set of landmarks, we can simply do:

[34]:
myproj.add_config(name = "lm2",                  ## add different name (may not contain underscores or other special characters)
                  config_preset="landmarks_plain"    ## same preset
                 )
pype config generated from landmarks_plain.
pype_lm2.yaml created for 0__stickle1
pype_lm2.yaml created for 0__stickle2
pype_lm2.yaml created for 0__stickle3
[35]:
for img in myproj.dirpaths:
    pp.pype(img,
            name="lm2",         ## loads the config file "pype_config_lm2.yaml". "lm2" gets appended to all results files
            skip=True          ## skip=True will skip over any directories that already contain results files with "lm2"
           )
data/0__stickle1\pype_config_lm2.yaml


------------+++ new pype iteration 2020:04:20 16:13:01 +++--------------


MEASUREMENT
landmarks
- setting landmarks
An exception has occurred, use %tb to see the full traceback.

SystemExit:

TERMINATE (by user)

WARNING: To exit: use 'exit', 'quit', or Ctrl-D.
[39]:
os.listdir(r"data/0__stickle1")
[39]:
['attributes.yaml',
 'canvas_lm.jpg',
 'landmarks_lm.csv',
 'pype_config_lm.yaml',
 'pype_config_lm2.yaml',
 'raw.JPG']
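
Because the config name is appended to every file it produces and may not contain underscores, the files of different pype runs can be told apart by the text after the last underscore. A minimal sketch of that grouping, applied to the listing above (`group_by_config_name` is a hypothetical helper, not part of phenopype):

```python
from collections import defaultdict
from pathlib import PurePath


def group_by_config_name(filenames):
    """Group files by the suffix after the last underscore in their stem."""
    groups = defaultdict(list)
    for filename in filenames:
        stem = PurePath(filename).stem
        if "_" not in stem:
            continue  # e.g. attributes.yaml and raw.JPG belong to no config
        groups[stem.rsplit("_", 1)[1]].append(filename)
    return dict(groups)


folder_content = ["attributes.yaml", "canvas_lm.jpg", "landmarks_lm.csv",
                  "pype_config_lm.yaml", "pype_config_lm2.yaml", "raw.JPG"]
grouped = group_by_config_name(folder_content)
print(grouped)
```

Here “lm” has a config file, a canvas and a landmarks file, whereas “lm2” only has its config file because that run was terminated before any results were saved.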