Converting Files To EMDB-SFF

Introduction

Converting files to use the EMDB-SFF data model is one of the core functions of sfftk. This guide describes in detail how to accomplish conversions.

Synopsis

Running

sff convert
sff convert -h
sff convert --help

displays all conversion options.

sff convert
usage: sff convert [-h] [-D DETAILS] [-R PRIMARY_DESCRIPTOR] [-v] [-x]
                                   [--json-indent JSON_INDENT] [--json-sort]
                                   [-o OUTPUT | -f FORMAT] [-p CONFIG_PATH] [-b] [-a] [-m]
                                   [--subtype-index SUBTYPE_INDEX] [--image IMAGE]
                                   [--label-tree LABEL_TREE]
                                   [--subtomogram-average SUBTOMOGRAM_AVERAGE]
                                   [--image-name-field IMAGE_NAME_FIELD]
                                   [--euler-angle-convention {zyz,zxz,xyx,xzx,yxy,yzy}]
                                   [--radians]
                                   [from_file [from_file ...]]

Perform conversions to EMDB-SFF

positional arguments:
  from_file             file to convert from

optional arguments:
  -h, --help            show this help message and exit
  -D DETAILS, --details DETAILS
                                                populates <details>...</details> in the XML file
  -R PRIMARY_DESCRIPTOR, --primary-descriptor PRIMARY_DESCRIPTOR
                                                populates the
                                                <primary_descriptor>...</primary_descriptor> to this
                                                value [valid values: three_d_volume, mesh_list,
                                                shape_primitive_list]
  -v, --verbose         verbose output
  -x, --exclude-geometry
                                                do not include the geometry in the conversion;
                                                geometry is included by default [default: False]
  --json-indent JSON_INDENT
                                                size in spaces of the JSON indent [default: 2]
  --json-sort           output JSON sorted lexicographically [default: False]
  -o OUTPUT, --output OUTPUT
                                                file to convert to; the extension (.sff, .hff, .json)
                                                determines the output format [default: None]
  -f FORMAT, --format FORMAT
                                                output file format; valid options are: sff (XML), hff
                                                (HDF5), json (JSON) [default: sff]
  -p CONFIG_PATH, --config-path CONFIG_PATH
                                                path to configs file
  -b, --shipped-configs
                                                use shipped configs only if config path and user
                                                configs fail [default: False]
  -a, --all-levels      for segments structured hierarchically (e.g. Segger
                                                from UCSF Chimera and Chimera X) convert all segment
                                                leves in the hierarchy [default: False]
  -m, --multi-file      enables convert to treat multiple files as individual
                                                segments of a single segmentation; only works for the
                                                following filetypes: stl, map, mrc, rec, star
                                                [default: False]
  --subtype-index SUBTYPE_INDEX
                                                some file extensions are used by multiple file types
  --image IMAGE         specify the segmented EMDB MAP/MRC file from which to
                                                determine the correct image-to-physical transform
  --label-tree LABEL_TREE
                                                a JSON file produced by running 'sff prep mergemask'
                                                which captures: 1) the mask labels (key:
                                                'mask_to_label') and 2) the hierarchical relationship
                                                between labels (key: 'label_tree')
  --subtomogram-average SUBTOMOGRAM_AVERAGE
                                                the result of subtomogram averaging or a particle mask
                                                for visualisation in CCP4 format (.mrc, .map, .rec)
  --image-name-field IMAGE_NAME_FIELD
                                                the field in the star file that contains the image
                                                name [default: '_rlnImageName']
  --euler-angle-convention {zyz,zxz,xyx,xzx,yxy,yzy}
                                                the Euler angle convention used in the subtomogram
                                                averaging [default: 'zyz' - case insensitive]
  --radians             use radians instead of degrees for Euler angles
                                                [default: False i.e. use degrees]

Quick Start

Output to XML (Default)

sff convert file.seg

Note

New in version 0.7.0.

Without any other arguments, the above command will print out the following warning:

Warning: missing --image <file.map> option to accurately determine image-to-physical transform

sfftk assumes that the original segmented image is specified as an MRC-like file e.g. *.map (images deposited into EMDB), *.mrc or compatible formats such as *.rec used by IMOD, which have metadata to compute this transform.

Specifying the Segmented Image File

First, ensure that the segmented image is specified in an MRC-like format. For images deposited in the EMDB use the deposited image.

Use the --image argument to specify the segmented image.

sff convert --image file.map file.seg

Some segmentation file formats specify the transform and sfftk will fall back on this if the image is not specified.

Note

Viewing transforms

To view the transform inferred from the image file use sff view --transform file.map. Visit the view documentation pages Show Image-to-Physical Transform for more information.

Specify Output File

sff convert file.seg -o file.sff
sff convert file.seg --output /path/to/output/file.sff
sff convert file.seg -o file.hff
sff convert file.seg -o --exclude-geometry file.json # only metadata; no geometrical data

Specify Output Format

sff convert file.seg -f hff
sff convert file.seg --format hff

EMDB-SFF Format Interconversion

sff convert file.sff --output /path/to/output/file.hff
sff convert file.hff --format json
sff convert file.sff --format sff # reduntant but should work

Verbose Operation

sff convert -v file.hff
sff convert --verbose file.hff

Include All Segments

When a segmentation is defined hierarchically only the top level of segments (i.e. those just under the root) will be included by default. Use -a/--all-levels argument to include all segments. This can lead to very large files. Segger (.seg) segmentations are one such example.

sff convert -a file.seg
sff convert --all-levels file.seg

Set Details

sff convert -D "Lorem ipsum dolor..." file.seg # strings must be quoted (single/double)
sff convert --details "Lorem ipsum dolor..." file.seg

Change Primary Descriptor

sff convert -R shape_primitive_list file.surf # IMOD file
sff convert --primary-descriptor shape_primitive_list file.surf # IMOD file

Input Formats

sfftk can convert several segmentation file formats (see Supported Formats) into EMDB-SFF files.

Output Formats

EMDB-SFF files can be output as XML (.sff, .xml), HDF5 (.hff, .h5 or .hdf5) or JSON (.json).

  • Both XML and HDF5 are quite compact and in many cases would be smaller than the original segmentation file.

  • JSON EMDB-SFF files may exclude geometry if created with -x/--exclude-geometry flag; they are primarily used as temporary files during annotation for speed.

  • Interconversion of the three formats is lossless (with the exception of geometrical data when converting to JSON - if geometrical data is excluded).

There are two ways to perform conversion:

  • Specifying the output path with -o/--output flag

  • Specifying the output format with -f/--format flag

Specifying the output path with -o/--output flag

Conversion is performed as follows:

sff convert file.seg -o file.sff
sff convert file.seg --output /path/to/output/file.sff

The output file extension determines the output format i.e.

sff convert file.seg -o file.hff

will result in an HDF5 file while

sff convert file.seg --output file.json

will be a JSON file.

Specifying the output format with -f/--format flag

The -f/–format options ensures that the output file will be in the same directory as the original segmentation file. The -f flag takes one of three values:

  • sff for XML files

  • hff for HDF5 files

  • json for JSON files.

Any other value raises an error.

sff convert file.seg -f hff
sff convert file.seg --format hff

The default format (if none is specified) is sff (XML).

sff convert file.seg

results in file.sff as output.

EMDB-SFF Format Interconversion

It is also possible to perform interconversions between XML, HDF5 and JSON EMDB-SFF files.

sff convert file.sff --output /path/to/output/file.hff

or using --format

sff convert file.hff --format json

Even null conversions are possible:

sff convert file.sff --format sff

Conversions from JSON to XML/HDF5 where the latter excluded the geometry will not reinstate the geometrical description information.

Verbose Operation

As with many Linux shell programs the -v/--verbose option prints status information on the terminal.

sff convert --verbose file.hff
Tue Sep 12 15:29:18 2017 Seting output file to file.sff
Tue Sep 12 15:29:18 2017 Converting from EMDB-SFF (HDF5) file file.hff
Tue Sep 12 15:30:03 2017 Created SFFSegmentation object
Tue Sep 12 15:30:03 2017 Exporting to file.sff
Tue Sep 12 15:30:07 2017 Done

Including All Segments

Segger segmentations include hundreds to thousands of sub-segmentations due to how the algorithm it uses (watershed algorithm) to segment the volume. The segmentations thus form a tree with the root having an ID of zero. Mostly, we are only interested in the children of the root which are in themselves roots of another tree. By default we only transfer the children of the global root into the EMDB-SFF file.

Consider the following tree of segments:

_images/converting-01.png

The segmentation contains different levels commencing from the root down, with children segments contained within parent segments. Specifying -a/--all-levels treats all children of the root as segments and includes all segments. Therefore, running

sff convert --all-levels file.seg

on the above will produce an EMDB-SFF file with hundreds of segments. The default operation results in a more compact file.

Specify Details

The EMDB-SFF data model provides for an optional <details/> tag for auxilliary information. The contents of this option will be put into <details/>.

sff convert --details "Lorem ipsum dolor..." file.seg

Todo

Allow a user to pass a file whose contents will be inserted into <details/>.

Changing The Primary Descriptor

The EMDB-SFF data model provides for three possible geometrical descriptors: meshes (mesh_list), shape primitives (shape_primitive_list) and 3D volumes (three_d_volume).

In some cases, such as with IMOD segmentations, more than one geometrical descriptor may have been specified for the same segmentations.

The mandatory <primaryDescriptor/> field specifies the main geometrical descriptor to be used when performing conversions and other processing tasks. Only valid values are allowed; otherwise a ValueError is raised.

The table below shows valid primary descriptors by file type.

File format

Valid primary descriptors

AmiraMesh

three_d_volume

AmiraHxSurface

mesh_list

SuRVoS

three_d_volume

CCP4 masks

three_d_volume

IMOD

mesh_list (default), shape_primitive_list

Segger

three_d_volume

STL

mesh_list

Note

IMOD files must have a mesh generated using imodmesh command. Open contours will need to be converted to tubes using the -t <obj_list> option. For example, for an IMOD file file.mod with three objects all of which are open contours we can run:

~$ imodmesh -t 1,2,3 -d 10 -E -Z 1.0 file.mod

which convert to tubes objects 1, 2 and 3 (-t 1,2,3), cap ends (-E) with domes at a scale of 1.0 (-Z 1.0) and a diameter of 10 pixels (-d 10).

You can find out much more about using imodmesh at its documentation page.

Note that the primary descriptor should only be changed to a value of a geometrical descriptor that is actually present in the EMDB-SFF file.

Working with Multifile Segmentations

Some of the segmentation file formats supported are designed to hold one segment per file. Therefore, representing a complete segmentation will require multiple files.

Currently the following file formats are multifile by design:

  • CCP4 and related files - these files store segments as a 3D volume with segment region marked by specific voxel values (e.g. 1 for in segment voxels and 0 for the background. Specific file formats have .mrc, .map and .rec.

  • Stereolithography files - while it is possible to concatenate several STL files into one, STL files do not contain metadata such as segment colour. Therefore, it is best to handle them as multifiles. STL files have a .stl extension.

Multifiles utilise the -m/--multi-file argument followed by all the files each of which should specify a single segment.

sff convert -m file1.map file2.map file3.map

The above command will use default options and write an EMDB-SFF file to file1.sff. Alternatively, the user should specify the output file

sff convert -m file1.map file2.map file3.map --output file.sff

Converting Subtomogram Averages

Subtomogram averages are the result of subtomogram averaging and are typically stored in CCP4 format (.mrc, .map, .rec). sfftk currently only works with RELION subtomogram averages in .star format. Conversion of subtomogram averages is similar to that of other file formats in that the user specifies the file to convert and the output file. Optionally, the user may provide the image file from which the subtomogram average was derived using the --image option to accurately determine the image-to-physical transform.

sff convert --image file.map file.star

Specifying the Particle File

It is also possible to specify the particle file from which the subtomogram average was derived using the --subtomogram-average option. This is useful when the subtomogram average is a particle mask for visualisation. However, given that the particle will be of higher resolution that the original tonmogram, the user may need to downsample the particle for visualisation purposes.

sff convert --subtomogram-average particle.mrc --image image.mrc file.star

A tool that can be used to prepare artificial particles is available from https://github.com/emdb-empiar/masks.git.

Multiple Subtomogram Averages

As with multifile segmentations, it is possible to convert multiple subtomogram averages into a single EMDB-SFF file. The user should use the -m/--multi-file option to specify the subtomogram averages. However, only one --subtomogram-average option should be used to specify the particle file.

sff convert -m file1.star file2.star file3.star --output file.sff

Specifying Euler Angle Convention

The Euler angle convention used in the subtomogram averaging can be specified using the --euler-angle-convention option. The default is zyz. The case is not important (zyz``=``ZYZ).

Specifying Angle Type

The user can specify whether the Euler angles are in degrees or radians using the --radians option. The default is degrees.

Specifying Configurations To Use

sfftk makes use of persistent configurations which affect how certain operations are performed. There are three types of configurations detailed in the dedicated documentation on configs (see Where Configurations Are Stored) in decreasing order of priority:

  • custom configs defined in a path/to/sff.conf file;

  • user configs stored in ~/.sfftk/sff.conf;

  • shipped configs which will sit with the installed sfftk package.

Custom configs are invoked using the -p/--config-path option:

sff convert -p path/to/configs file.seg
sff convert --config-path path/to/configs file.seg

User configs are default and require no special flags.

Shipped configs use the -b/--shipped-configs flag with no arguments:

sff convert -b file.am
sff convert --shipped-configs file.am