sfftk.readers package
Note
Each module in this package implements an ad hoc reader for a particular file type. The naming convention is <ext>reader, where <ext> is a short and unique description for that file format, typically (but not exclusively) the file extension for the file format e.g. am for AmiraMesh, hence the amreader module. By ad hoc we mean that the module is designed to conform to the data model of the file format. The formats package adapts the ad hocness to that of the EMDB-SFF data model.
Each module should implement a top-level function get_data that takes a string filename plus *args and **kwargs and returns an object representing a segmentation from the relevant file format.
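As a rough illustration of this convention, the sketch below shows what a reader module for an imaginary format might look like. The `ToySegmentation` class and the one-label-per-line format are invented for the example and are not part of sfftk; only the `get_data(fn, *args, **kwargs)` signature comes from the description above.

```python
class ToySegmentation:
    """Hypothetical ad hoc model for an imaginary format (not part of sfftk)."""

    def __init__(self, fn):
        self.file_name = fn
        # assumption: the imaginary format stores one integer label per line
        with open(fn) as f:
            self.labels = [int(line) for line in f if line.strip()]


def get_data(fn, *args, **kwargs):
    """Top-level entry point: take a file name, return a segmentation object."""
    # extra positional and keyword arguments are accepted but unused in this sketch
    return ToySegmentation(fn)
```

A caller would then only need `get_data('some_file.toy')` regardless of which reader module handles the file.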
AmiraMesh reader
sfftk.readers.amreader
Ad hoc reader for AmiraMesh files
SuRVoS reader
sfftk.readers.survosreader
Ad hoc reader for SuRVoS segmentation files
- class sfftk.readers.survosreader.SuRVoSSegmentation(fn, dataset='/data', mask_value=1)[source]
Bases:
object
A SuRVoS segmentation
SuRVoS segmentations are based on integer annotations. To the best of my understanding no textual information is saved in the segmentation.
- property colours
A list of ordered colours
- property data
The underlying segmentation data
- property labels
A list of labels used
- property names
A list of ordered names
- property shape
The shape of the segmentation volume
- sfftk.readers.survosreader.get_data(fn, *args, **kwargs)[source]
Main entry point for reader
We need to return an object with a handle on the segments:
each segment is a 3D volume with only that segment's voxel values retained
we reference each segment on the segmentation through an index-like interface e.g. s1 = Segmentation[1] returns the segment with annotation value of ‘1’
s = SuRVoSSegmentation(fn)
s.segment_ids() # returns a list of segment IDs
s[s.segment_ids()[0]] # gets the first segment
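The index-like interface described here could be sketched as follows. `LabelledVolume` is a hypothetical stand-in that uses nested lists instead of an array, and it assumes that the value 0 marks background; the actual SuRVoSSegmentation class may behave differently.

```python
class LabelledVolume:
    """Hypothetical stand-in for a reader's segmentation class."""

    def __init__(self, data):
        # data: nested lists indexed [z][y][x] holding integer annotations
        self.data = data

    def segment_ids(self):
        """Return the sorted non-zero annotation values (0 assumed background)."""
        values = {v for plane in self.data for row in plane for v in row}
        return sorted(values - {0})

    def __getitem__(self, segment_id):
        """Return a volume in which only `segment_id` voxels are retained."""
        return [
            [[v if v == segment_id else 0 for v in row] for row in plane]
            for plane in self.data
        ]


volume = LabelledVolume([[[0, 1], [2, 1]], [[2, 2], [0, 0]]])
first = volume[volume.segment_ids()[0]]  # only voxels annotated 1 survive
```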
CCP4 mask reader
sfftk.readers.mapreader
Ad hoc reader for CCP4 masks
References
The following article is useful as it exposes many internals of map files:
- class sfftk.readers.mapreader.Map(fn, header_only=False, *args, **kwargs)[source]
Bases:
object
Class to encapsulate a CCP4 mask
- fix_mask(mask_value=1.0, voxel_values_threshold=3)[source]
Try to fix this mask
A mask should have only two voxel values: some non-zero value (usually 1) and zero (0) for masked-out regions. Sometimes the process of manipulating the mask (e.g. volume rotation) relies on interpolation, which converts a mask to have more than two voxel values. This function attempts to fix that, provided that the number of distinct voxel values is not greater than voxel_values_threshold.
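One way such a fix might work is sketched below on a flat list of voxel values. The function name `fix_mask_values` is hypothetical, and the rule that every non-zero value becomes `mask_value` is an assumption; the actual method may decide differently which interpolated values belong to the mask.

```python
def fix_mask_values(voxels, mask_value=1.0, voxel_values_threshold=3):
    """Return a two-valued copy of `voxels`, or raise if it has too many values."""
    distinct = set(voxels)
    if len(distinct) > voxel_values_threshold:
        raise ValueError(
            f"{len(distinct)} distinct voxel values; not attempting a fix"
        )
    # assumption: every non-zero value belongs to the mask
    return [mask_value if v != 0 else 0.0 for v in voxels]


fixed = fix_mask_values([0.0, 0.5, 1.0, 0.0])
```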
- property is_mask
Determine if this is a mask or not
- Return bool status:
mask or not
- property labels
A string of labels found in the CCP4 mask file
- read(f, header_only=False)[source]
Read data from an EMDB Map mask
- Parameters:
f (file) – file object
header_only (bool) – only read the header [default: False]
- Return int status:
0 on success; fail otherwise
- property skew_matrix
Skew matrix as a numpy array
- property skew_matrix_data
Skew matrix data as a space-separated string
- property skew_translation
Skew translation as a numpy array
- property skew_translation_data
Skew translation as a space-separated string
- property voxels
The voxel mask
- sfftk.readers.mapreader.compute_transform(fn, header_only=True)[source]
Compute the transform that connects the image to physical space
- Parameters:
- Returns:
a 3x4 transformation matrix
- Return type:
IMOD reader
sfftk.readers.modreader
Ad hoc reader for IMOD (.mod) files.
.mod files are chunk files and loosely follow the Interchange File Format (IFF). In summary, IFF files consist of a four-byte header (all caps chunk name e.g. ‘IMOD’) followed by an integer of the number of bytes in the chunk. The chunk is then structured according to the author’s design requirements. Not all .mod chunks follow this convention (e.g. ‘OBJT’ chunks do not include the size of the chunk immediately after the chunk ID).
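A minimal sketch of reading one IFF-style chunk header is given below. The helper `read_chunk_header` is hypothetical, and it assumes that the size is stored as a big-endian signed 32-bit integer; it only applies to chunks that do carry a size after the ID.

```python
import io
import struct


def read_chunk_header(f):
    """Read a four-byte chunk ID and the four-byte size that follows it."""
    chunk_id = f.read(4).decode("ascii")
    # assumption: the size is a big-endian signed 32-bit integer
    (size,) = struct.unpack(">i", f.read(4))
    return chunk_id, size


# a made-up chunk: ID 'MINX' followed by 8 bytes of payload
stream = io.BytesIO(b"MINX" + struct.pack(">i", 8) + bytes(8))
chunk_id, size = read_chunk_header(stream)
payload = stream.read(size)
```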
A description of the structure of .mod files can be found at the following URL: https://bio3d.colorado.edu/imod/betaDoc/binspec.html. This module consists of a set of classes each identified by the respective chunk names. The following patterns are observed in the design of these classes:
The name of the class is the name of the chunk e.g. OBJT class refers to OBJT chunks.
All classes have one public method: read(f), which takes a file handle and returns a file handle at the current unread position.
Some chunks are nested (despite the serial nature of IFF files). Contained chunks are read with public methods defined as add_<chunk> e.g. OBJT objects are containers of CONT objects and therefore have an add_cont() method which takes a CONT object as argument. Internally, container objects use (ordered) dictionaries to store contained objects.
All chunk classes inherit from the object class and have the object.__repr__() method implemented to print objects of that class.
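The design patterns listed above could look roughly like the sketch below. `ToyContour` and `ToyObject` are hypothetical classes with an invented one-integer payload; they only mirror the `read(f)`, `add_<chunk>` and `__repr__` conventions and are not the module's own classes.

```python
import io
import struct
from collections import OrderedDict


class ToyContour:
    """Hypothetical contained chunk that stores a single integer."""

    def __init__(self, f):
        self.value = None
        self.read(f)

    def read(self, f):
        # assumption: one big-endian 32-bit integer per chunk
        (self.value,) = struct.unpack(">i", f.read(4))
        return f  # the handle is now at the next unread byte

    def __repr__(self):
        return f"ToyContour(value={self.value})"


class ToyObject:
    """Hypothetical container chunk."""

    def __init__(self):
        self.conts = OrderedDict()

    def add_cont(self, cont):
        # key contained chunks by insertion order
        self.conts[len(self.conts)] = cont


stream = io.BytesIO(struct.pack(">ii", 7, 9))
obj = ToyObject()
obj.add_cont(ToyContour(stream))
obj.add_cont(ToyContour(stream))
```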
In addition, there are several useful dictionary constants and functions and classes (flags) that interpret several fields within chunks.
Note
The order of classes is based on their position in the module. This can be changed if needed.
The most important classes are modreader.IMOD, modreader.OBJT, modreader.CONT and modreader.MESH.
- class sfftk.readers.modreader.CLIP(f)[source]
Bases:
object
CLIP chunk class
- Parameters:
f (file) – file handle of the IMOD segmentation file
- class sfftk.readers.modreader.CLIP_FLAGS(*args, **kwargs)[source]
Bases:
FLAGS
Flags in the CLIP chunk
- class sfftk.readers.modreader.CONT(f)[source]
Bases:
object
CONT chunk class
- Parameters:
f (file) – file handle of the IMOD segmentation file
- class sfftk.readers.modreader.CONTOUR_FLAGS(*args, **kwargs)[source]
Bases:
FLAGS
Flags in the CONT chunk
- class sfftk.readers.modreader.COST(f)[source]
Bases:
object
COST chunk class
- Parameters:
f (file) – file handle of the IMOD segmentation file
- class sfftk.readers.modreader.COST_FLAGS(*args, **kwargs)[source]
Bases:
FLAGS
Flags in the COST chunk
- class sfftk.readers.modreader.FLAGS(int_value, num_bytes, endian='little')[source]
Bases:
object
Base class of bit flags
- class sfftk.readers.modreader.IMAT_FLAGS(*args, **kwargs)[source]
Bases:
FLAGS
Flags in the IMAT chunk.
- class sfftk.readers.modreader.IMOD(f)[source]
Bases:
object
Class encapsulating the data in an IMOD file
The top-level of an IMOD file is an IMOD chunk specifying various data members.
- property x_length
The length of X side of the image in angstrom
- property y_length
The length of Y side of the image in angstrom
- property z_length
The length of Z side of the image in angstrom
- class sfftk.readers.modreader.MCLP(f)[source]
Bases:
object
MCLP chunk class
Model clipping plane parameters
- Parameters:
f (file) – file handle of the IMOD segmentation file
- class sfftk.readers.modreader.MCLP_FLAGS(*args, **kwargs)[source]
Bases:
FLAGS
Flags in the MCLP chunk
- class sfftk.readers.modreader.MEPA(f)[source]
Bases:
object
MEPA chunk class
- Parameters:
f (file) – file handle of the IMOD segmentation file
- class sfftk.readers.modreader.MEPA_FLAGS(*args, **kwargs)[source]
Bases:
FLAGS
Flags in the MEPA chunk
- class sfftk.readers.modreader.MESH(f)[source]
Bases:
object
MESH chunk class
- Parameters:
f (file) – file handle of the IMOD segmentation file
- class sfftk.readers.modreader.MESH_FLAGS(*args, **kwargs)[source]
Bases:
FLAGS
Flags in the MESH chunk
- class sfftk.readers.modreader.MEST(f)[source]
Bases:
object
MEST chunk class
- Parameters:
f (file) – file handle of the IMOD segmentation file
- class sfftk.readers.modreader.MINX(f)[source]
Bases:
object
MINX chunk class
Model to image transformation. Documented as 72 bytes but works with 76 bytes.
- Parameters:
f (file) – file handle of the IMOD segmentation file
- class sfftk.readers.modreader.MODEL_FLAGS(*args, **kwargs)[source]
Bases:
FLAGS
Flags in the MODEL chunk
- class sfftk.readers.modreader.MOST(f)[source]
Bases:
object
MOST chunk class
Class encapsulating storage parameters for the top-level sfftk.readers.modreader.IMOD chunk.
- Parameters:
f (file) – file handle of the IMOD segmentation
- class sfftk.readers.modreader.OBJECT_FLAGS(*args, **kwargs)[source]
Bases:
FLAGS
Flags in the OBJT chunk
- class sfftk.readers.modreader.OBJECT_SYM_FLAGS(*args, **kwargs)[source]
Bases:
FLAGS
Additional flags in the OBJT chunk
- class sfftk.readers.modreader.OBJT(f)[source]
Bases:
object
OBJT chunk class
An IMOD file has several sfftk.readers.modreader.OBJT chunks, each of which contains the data either as contours (sfftk.readers.modreader.CONT) or meshes (sfftk.readers.modreader.MESH). OBJT chunks also contain sfftk.readers.modreader.CLIP, sfftk.readers.modreader.IMAT, sfftk.readers.modreader.MEPA and a sfftk.readers.modreader.OBST storage chunk.
- class sfftk.readers.modreader.SLAN(f)[source]
Bases:
object
SLAN chunk class
- Parameters:
f (file) – file handle of the IMOD segmentation file
- class sfftk.readers.modreader.STORE(f)[source]
Bases:
object
Generic storage class for models (MOST), objects (OBST), contours (COST), and meshes (MEST)
- Parameters:
f (file) – file handle of the IMOD segmentation file
- class sfftk.readers.modreader.VIEW(f, first_view=False)[source]
Bases:
object
VIEW chunk class
- Parameters:
f (file) – file handle of the IMOD segmentation file
- class sfftk.readers.modreader.VIEW_FLAGS(*args, **kwargs)[source]
Bases:
FLAGS
Flags in the VIEW chunk
- sfftk.readers.modreader.angstrom_multiplier(units)[source]
Determine a multiplier to convert units to angstrom
\[1\textrm{Å} = 10^{-10}\textrm{m} \Rightarrow 1\textrm{m} = 10^{10}\textrm{Å}\]
Consider some generic unit U with a power of 10
\[1\textrm{U} = 10^x \textrm{m} \Rightarrow 1\textrm{m} = 10^{-x}\textrm{U}\]
We need a unit factor that relates Å to U. Dividing both expressions for \(1\textrm{m}\)
\[1 = \frac{10^{10}}{10^{-x}} \textrm{Å per U (Å/U)} = 10^{10 + x} \textrm{Å/U}\]
To convert U to Å we multiply by \(10^{10 + x}\) Å/U.
Example: To convert 3 nm to Å we consider that \(x = -9\) for nm. So:
\[\begin{aligned} 3\textrm{nm} & = 3\textrm{nm} \times 10^{10 + (-9)}\textrm{Å/nm} \\ & = 3\textrm{nm} \times 10^{10 - 9}\textrm{Å/nm} \\ & = 3\textrm{nm} \times 10\textrm{Å/nm} \\ & = 30\textrm{Å} \end{aligned}\]
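The derivation above can be checked numerically with a short sketch. The function name `angstrom_multiplier_sketch` and the `UNIT_EXPONENTS` table are assumptions made for illustration; how the actual function identifies units is not shown in this documentation.

```python
# assumption: units are identified by their power of ten relative to the metre
UNIT_EXPONENTS = {"m": 0, "mm": -3, "um": -6, "nm": -9, "pm": -12}


def angstrom_multiplier_sketch(unit):
    """Return the factor 10**(10 + x) that converts `unit` to angstrom."""
    x = UNIT_EXPONENTS[unit]
    return 10.0 ** (10 + x)


# the worked example: 3 nm expressed in angstrom
length_in_angstrom = 3 * angstrom_multiplier_sketch("nm")
```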
- sfftk.readers.modreader.find_chunk_length(f)[source]
Determine the size (in bytes) of the current chunk. Also, return the name of the next chunk.
Assumes that current position in the file is immediately after the chunk header.
- sfftk.readers.modreader.get_data(fn)[source]
Extract chunks from the IMOD model file named fn
- Parameters:
fn (str) – name of IMOD file
- Raises:
ValueError – if it doesn’t start with an IMOD chunk
ValueError – if the file lacks an IEOF chunk
- sfftk.readers.modreader.print_model(fn)[source]
Pretty print the IMOD model
- Parameters:
fn (str) – name of IMOD file
Segger reader
sfftk.readers.segreader
Ad hoc reader for Segger files
- class sfftk.readers.segreader.SeggerSegmentation(fn, *args, **kwargs)[source]
Bases:
object
Encapsulation of a Segger segmentation
- property descriptions
Returns a dictionary of descriptions for each region
- property file_name
File name
- property file_path
File path
- property format
Format of the segmentation
- property format_version
Format version
- property header
Collate group-level attributes in a dictionary
- property ijk_to_xyz_transform
Image-to-physical space transform
- property map_level
Map level (contour level)
- property map_path
Path to map file
- property map_size
Map dimensions (I, J, K)
- property mask
The mask (TM)
- property name
Name of the segmentation
- property parent_ids
A dictionary of region_ids to parent_ids
- property ref_points
A dictionary of region_ids to ref_points
- property region_colours
A dictionary of region_ids to region_colors
- property region_ids
An iterable of region_ids
- property root_parent_ids
The
- simplify_mask(mask, replace=True)[source]
Simplify the mask by replacing all region_ids with their root_parent_id
The region_ids and parent_ids are paired from which a tree is inferred. The root of this tree is value 0. region_ids that have a corresponding parent_id of 0 are penultimate roots. This method replaces each region_id with its penultimate parent_id. It simplifies the volume.
- property smoothing_levels
A dictionary of region_ids to smoothing_levels
- sfftk.readers.segreader.get_data(fn, *args, **kwargs)[source]
Gets segmentation data from a Segger file
- sfftk.readers.segreader.get_root(region_parent_zip, region_id)[source]
Return the penultimate parent_id for any region_id.
The penultimate parent is one layer below the root (0). The set of penultimate parents are the distinct regions contained in the segmentation. They correspond to putative functional regions.
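The walk up the tree to the penultimate parent can be sketched as follows. The helper `find_penultimate_parent` is hypothetical and assumes that the pairs are (region_id, parent_id) tuples forming a tree without cycles, with 0 as the root.

```python
def find_penultimate_parent(region_parent_pairs, region_id):
    """Walk up the tree until the parent is the root (0)."""
    parent_of = dict(region_parent_pairs)
    current = region_id
    while parent_of[current] != 0:
        current = parent_of[current]
    return current


# regions 1 and 2 sit directly below the root; 3 and 4 are nested under 1
pairs = [(1, 0), (2, 0), (3, 1), (4, 3)]
roots = {region: find_penultimate_parent(pairs, region) for region, _ in pairs}

# simplifying a (flattened) mask then amounts to a lookup per voxel
simplified = [roots.get(v, 0) for v in [0, 3, 4, 2]]
```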
STAR reader
sfftk.readers.starreader
STAR files are generic data modelling files much in the same way as XML files. RELION uses a particular format of STAR file to store particle data. This module provides several classes to read STAR files: a generic reader and two RELION-specific ones.
In practice, the whole STAR file is loaded into memory during the parsing process. The API we provide enables the user to access the main ways the data is stored in the STAR file: key-value pairs and tables. This reader is designed only to extract the data from the STAR file and does not attempt to understand STAR file conventions.
Generic STAR files can have any number of key-value pairs and tables. For our use case, we are interested in capturing the relationship between a refined particle (subtomogram average) and a source tomogram. Since each such particle is expressed in terms of its orientation within the tomogram, we need to capture the affine transform that maps the particle to the tomogram.
Therefore, this imposes some constraints on the STAR file:
The STAR file must have a table with the following columns: _rlnCoordinateX, _rlnCoordinateY, _rlnCoordinateZ, _rlnAngleRot, _rlnAngleTilt, _rlnAnglePsi. These columns represent the position and orientation of the particle in the tomogram.
The STAR file must reference only one tomogram in the _rlnImageName column. This is because we are only interested in the relationship between a single particle and a single tomogram. If the STAR file references multiple tomograms, then a prior preparation step will need to be performed to partition the STAR file into multiple files, each referencing a single tomogram. (more on that to come)
For this reason, we distinguish between ‘composite’ RELION STAR files and ‘simple’ RELION STAR files. Composite RELION STAR files must be partitioned into simple RELION STAR files before they can be converted into EMDB-SFF files.
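The partitioning step could, in principle, group rows by the tomogram they reference. The sketch below assumes each row is available as a plain dictionary keyed by column name; `partition_by_tomogram` is a hypothetical helper and not part of the documented API.

```python
from collections import defaultdict


def partition_by_tomogram(rows, image_name_field="_rlnImageName"):
    """Group rows by the value of the image name column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[image_name_field]].append(row)
    return dict(groups)


# made-up rows referencing two tomograms
rows = [
    {"_rlnImageName": "tomo_a.mrc", "_rlnCoordinateX": "10"},
    {"_rlnImageName": "tomo_b.mrc", "_rlnCoordinateX": "20"},
    {"_rlnImageName": "tomo_a.mrc", "_rlnCoordinateX": "30"},
]
groups = partition_by_tomogram(rows)
```

Each group would then be written to its own 'simple' STAR file.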
Anatomy of a STAR file
A STAR file is made up of one or more data blocks.
data_block_1
In the example above, the name of the data block is block_1. The name is optional.
Data is stored in the form of key-value pairs and tables. Key-value pairs are simple and are stored in a dictionary.
_key value
Tables are designated by the loop_ keyword followed by a sequence of tags/labels, each of which is prefixed by an underscore. Each line after the tags/labels is then a row with values for each tag/label.
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.pdbx_formal_charge
_atom_site.auth_seq_id
_atom_site.auth_comp_id
_atom_site.auth_asym_id
_atom_site.auth_atom_id
_atom_site.pdbx_PDB_model_num
ATOM 1 N N . LYS A 1 7 ? 12.364 -13.639 8.445 1.00 54.67 ? 527 LYS A N 1
ATOM 2 C CA . LYS A 1 7 ? 11.119 -12.888 8.550 1.00 49.59 ? 527 LYS A CA 1
ATOM 3 C C . LYS A 1 7 ? 9.961 -13.651 7.926 1.00 44.77 ? 527 LYS A C 1
ATOM 4 O O . LYS A 1 7 ? 9.055 -14.126 8.617 1.00 49.39 ? 527 LYS A O 1
ATOM 5 C CB . LYS A 1 7 ? 11.255 -11.538 7.841 1.00 49.41 ? 527 LYS A CB 1
ATOM 6 C CG . LYS A 1 7 ? 10.169 -10.531 8.174 1.00 53.16 ? 527 LYS A CG 1
ATOM 7 C CD . LYS A 1 7 ? 10.523 -9.771 9.432 1.00 59.71 ? 527 LYS A CD 1
ATOM 8 C CE . LYS A 1 7 ? 11.779 -8.947 9.195 1.00 63.60 ? 527 LYS A CE 1
ATOM 9 N NZ . LYS A 1 7 ? 12.353 -8.381 10.443 1.00 64.85 ? 527 LYS A NZ 1
ATOM 10 N N . ARG A 1 8 ? 10.011 -13.762 6.603 1.00 40.03 ? 528 ARG A N 1
All values are treated as strings.
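To make the table layout concrete, here is a minimal sketch of parsing a single loop_ block. `parse_loop` is a hypothetical helper, not the module's parser, and it assumes that values contain no quoted whitespace.

```python
def parse_loop(lines):
    """Parse the lines of one loop_ block into (tags, rows); values stay strings."""
    tags, rows = [], []
    for line in lines:
        line = line.strip()
        if not line or line == "loop_":
            continue
        if line.startswith("_") and not rows:
            # tags/labels come first, one per line
            tags.append(line)
        else:
            # assumption: values are separated by plain whitespace
            rows.append(line.split())
    return tags, rows


text = """loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
ATOM 1 N
ATOM 2 C
"""
tags, rows = parse_loop(text.splitlines())
```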
RELION STAR files
These adhere to the following conventions (https://relion.readthedocs.io/en/latest/Reference/Conventions.html#star-format):
Euler angle convention used is ZYZ
The coordinate system is right-handed with positive rotations being anticlockwise/counterclockwise
_rlnCoordinateX, _rlnCoordinateY, _rlnCoordinateZ are tomogram PIXEL positions
_rlnAngleRot, _rlnAngleTilt, _rlnAnglePsi are angles in DEGREES
The first rotation is called rlnAngleRot and is around the Z-axis;
The second rotation is called rlnAngleTilt and is around the new Y-axis;
The third rotation is called rlnAnglePsi and is around the new Z axis;
If present, _rlnOriginXAngstrom and _rlnOriginYAngstrom are in ANGSTROMS
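The ZYZ convention above can be illustrated with a small pure-Python sketch that composes the three rotations and appends a translation. The function `zyz_affine` is hypothetical, and it assumes an active rotation acting on column vectors; RELION and sfftk may use the transposed (passive) form, so the result should be treated as illustrative only.

```python
import math


def rot_z(angle):
    """Rotation matrix about the Z axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def rot_y(angle):
    """Rotation matrix about the Y axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)
    ]


def zyz_affine(rot, tilt, psi, x, y, z, degrees=True):
    """Build a 3x4 [R | t] matrix from ZYZ Euler angles and a position."""
    if degrees:
        rot, tilt, psi = (math.radians(a) for a in (rot, tilt, psi))
    # intrinsic Z, new Y, new Z rotations compose left to right
    # (assumed active convention on column vectors)
    r = matmul(matmul(rot_z(rot), rot_y(tilt)), rot_z(psi))
    return [row + [t] for row, t in zip(r, (x, y, z))]


# a 90 degree rotation about Z with the particle at pixel (10, 20, 30)
transform = zyz_affine(90.0, 0.0, 0.0, 10.0, 20.0, 30.0)
```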
API
There are two main classes to read STAR files.
sfftk.readers.starreader.StarReader: a generic reader that can parse any STAR file;
sfftk.readers.starreader.RelionStarReader: a reader that can parse RELION STAR files, which validates the constraints described above.
Both readers provide the same API. The examples below use the generic reader but the same applies to the RELION reader.
First, users must instantiate the reader:
star_reader = StarReader()
Then, users must parse the file:
star_reader.parse('file.star')
The reader will then parse the file and store the data in memory. The user can then access the data in the following ways:
star_reader.keys(): returns a list of key-value pairs;
print(star_reader.keys) # show key-value pairs
print(star_reader.keys['key']) # get the value for the given key
star_reader.tables: returns a dictionary of tables where the key is the name of the table and the value is a sfftk.readers.starreader.StarTable object and each row in the table is a sfftk.readers.starreader.StarTableRow object. By default, we automatically infer the type of the values in the table. If the user wishes to disable this behaviour, they can pass infer_types=False to the parse method.
star_reader.parse('file.star', infer_types=False) # disable type inference
We can now access each table by name:
print(star_reader.tables) # print the list of tables
print(star_reader.tables['_atom_site']) # the name of a table is the name of the label prefix (separated by a period)
print(star_reader.tables['_atom_site'].columns) # print the columns in the table
print(star_reader.tables['_atom_site'][0]) # print the first row in the table
print(star_reader.tables['_atom_site'][0][4]) # print the fifth column in the first row
Note
For RELION STAR files, the name of the table is _rln.
print(star_reader.tables['_rln'])
Additionally, each row can be converted into an affine transform matrix using the to_affine_transform method:
print(star_reader.tables['_rln'][0].to_affine_transform()) # print the affine transform matrix for the first row
print(star_reader.tables['_rln'][0].to_affine_transform(axes="ZXZ")) # change the orientation convention
print(star_reader.tables['_rln'][0].to_affine_transform(degrees=False)) # use radians instead of degrees
- class sfftk.readers.starreader.RelionStarReader(image_name_field='_rlnImageName', *args, **kwargs)[source]
Bases:
StarReader
StarReader subclass which applies some constraints to the STAR file. These constraints are:
The STAR file must have one and only one table
The table must have the following columns: _rlnCoordinateX, _rlnCoordinateY, _rlnCoordinateZ, _rlnAngleRot, _rlnAngleTilt, _rlnAnglePsi. These columns represent the position and orientation of the particle in the tomogram.
The STAR file must reference only one tomogram in the _rlnImageName column. This is because we are only interested in the relationship between a single particle and a single tomogram. If the STAR file references multiple tomograms, then a prior preparation step will need to be performed to partition the STAR file into multiple files, each referencing a single tomogram. (more on that to come)
reader = RelionStarReader()
reader.parse('my_star_file.star')
print(reader) # output some information e.g. number of rows, fields, etc.
# if no warnings are raised then the file was successfully parsed
for row in reader:
    # we can print the affine transform matrix for each row
    print(row.to_affine_transform())
- class sfftk.readers.starreader.StarReader[source]
Bases:
object
A generic star file reader. The user must specify which fields are required/optional and the reader will then assess whether a provided file has the specified field.
Once the file is parsed, the user can then iterate over the object to get the required data.
reader = StarReader()
reader.parse('my_star_file.star')
print(reader) # output some information e.g. number of rows, fields, etc.
print(reader.keys())
print(reader.tables)
print(reader.tables['default'])
print(reader.tables['default'].columns)
print(reader.tables['name']) # if 'name' exists
# if no warnings are raised then the file was successfully parsed
for row in reader.tables: # read from the 'default' table or the only table present
    # do something with the row
    # we drop the leading underscore so as not to imply private variables
    print(row.col1, row.col2, row.col3, row.col4, row.col5, row.col6)
- property tables
Return all the tables in the STAR file
- class sfftk.readers.starreader.StarTable(loop, name, infer_types=True, *args, **kwargs)[source]
Bases:
object
A section of tabular data in a star file
- property columns
Return the columns in this block
- property header
Return the header of the table
- class sfftk.readers.starreader.StarTableRow(name, loop, values, axes='ZYZ', degrees=True)[source]
Bases:
object
Each row in a star table
Stereolithography reader
sfftk.readers.stlreader
Ad hoc reader for Stereolithography (STL) files
Depends on the numpy-stl package
Reads both ASCII and binary files
Amira HyperSurface reader
sfftk.readers.surfreader
Ad hoc reader for Amira HyperSurface files
- class sfftk.readers.surfreader.HxSurfSegment(material, vertices, triangles, prune=True)[source]
Bases:
object
Generic HxSurface segment class
The ahds package provides a better abstraction of this filetype
- property colour
The colour of the segment
- property id
The segment ID
- property name
The name of the segment
- property triangles
A list of triangles (lists with 3 vertex IDs) in this segment
- property vertices
A dictionary of vertices in this segment indexed by vertex ID
- sfftk.readers.surfreader.extract_segments(af, *args, **kwargs)[source]
Extract patches as segments
- Parameters:
af (ahds.AmiraFile) – an AmiraFile object
- Return dict segments:
a dictionary of segments with keys set to Material Ids (voxel values)
- sfftk.readers.surfreader.get_data(fn, *args, **kwargs)[source]
Get segmentation data from the Amira HxSurface file
- Parameters:
fn (str) – file name
- Return header:
AmiraHxSurface header
- Rtype header:
- Return dict segments:
segments each of class sfftk.readers.surfreader.HxSurfSegment