NIFTI plain and simple

I work with medical images, and that means a lot of work with medical image file formats. DICOM is the big one, since it is the industry standard, but I've also spent a lot of time with MINC and NIFTI, which are central to research within the clinical neurosciences. This post concerns NIFTI, which is both the simplest format of the three and, at times, the most vexing. NIFTI is a fixed-header format, so each piece of information resides at a specific byte offset within the file. Writing code to read or write such a file should be easy; in fact, that is exactly why it uses a fixed header. However, the fixed header causes problems when the format needs to evolve.
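To make the fixed-offset idea concrete, here is a sketch of a format check (nifti_version is a hypothetical helper of my own, but the sizes 348 and 540 come from the NIFTI-1 and NIFTI-2 specifications): the first field of the header is an int32 that states the header's own size, so reading four bytes is enough to identify the format and the byte order.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: peek at the first four bytes of a NIFTI file.
   In both formats this is the int32 "sizeof_hdr" field, which must be
   348 for NIFTI-1 and 540 for NIFTI-2; a byte-swapped value means the
   file was written on a machine of the opposite endianness. */
int nifti_version(FILE *fp)
{
    int32_t hdrsize;
    if (fread(&hdrsize, sizeof(hdrsize), 1, fp) != 1) return -1;
    if (hdrsize == 348 || hdrsize == 0x5C010000) return 1; /* 348 swapped */
    if (hdrsize == 540 || hdrsize == 0x1C020000) return 2; /* 540 swapped */
    return 0; /* not a NIFTI file */
}
```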

When it came time to make a "NIFTI-2" specification, it was necessary to create a completely new header structure that was incompatible with the old one. This means that old NIFTI-1 software cannot read the new files, and new NIFTI software has to essentially contain two decoders: one for NIFTI-1, and one for NIFTI-2. Also, the people who wrote the NIFTI-2 header goofed: the header contains elements that must be 8-byte aligned (i.e. 64-bit aligned), but the header itself is 540 bytes long, which is not divisible by 8. This means that, whenever a 64-bit compiler creates a nifti_2_header struct in memory, it allocates an extra 4 bytes of padding. If one calls sizeof(nifti_2_header) on a 64-bit machine, it returns 544 instead of 540. On my first attempt to read a NIFTI-2 file, I used read(sizeof(nifti_2_header)) and ended up reading 4 bytes too many! The silliest part of this is that the last element of nifti_2_header is just a 15-byte space reserved for future expansion, so they could have reserved 19 bytes instead and avoided this issue altogether! As a "solution", the official NIFTI-2 header uses the dangerous "#pragma pack(1)" setting to force the compiler to forgo alignment in favor of making sizeof() return the desired value.
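The padding behavior is easy to reproduce in miniature. The structs below are stand-ins of my own, not the real nifti_2_header, but they show the same effect: 8-byte-aligned members with a field total that is not divisible by 8 get tail padding, unless packing is forced.

```c
#include <stdint.h>

/* A stand-in for the problem (not the real nifti_2_header): members
   that need 8-byte alignment, with fields totalling 68 bytes. */
struct unpacked_hdr {
    int64_t dim[8];   /* 64 bytes, 8-byte aligned */
    char    extra[4]; /* brings the total to 68, not divisible by 8 */
};                    /* sizeof() is padded up to 72 on typical 64-bit ABIs */

#pragma pack(push, 1)
struct packed_hdr {   /* identical members, but alignment is forced off */
    int64_t dim[8];
    char    extra[4];
};                    /* sizeof() is exactly 68 */
#pragma pack(pop)
```

With packing forced, sizeof() returns the on-disk size, at the cost of potentially unaligned member access.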

Even before NIFTI-2 arrived, there were problems because the NIFTI-1 header was based on the old Analyze-7.5 header, but it had to "repurpose" several header fields in incompatible ways. This meant that, by avoiding these fields, people could write NIFTI files that could be read by software designed for Analyze-7.5 files. Conversely, it was altogether possible to write NIFTI files that were, essentially, corrupted Analyze-7.5 files. That is why fixed-header file formats are bad. Instead of simply obsoleting old header fields and adding new ones, any significant evolution of the format requires changing the header in incompatible ways.



DICOM images have been a big part of my job for over a decade, but until recently I swore that I would never write my own programming library for reading DICOM files. Not when there were mature, free DICOM libraries that I could use instead. But I needed something that would integrate seamlessly with the image analysis software that we are writing, which is all based on VTK. Hence, the genesis of vtk-dicom, a DICOM library specifically for VTK.

There are a few features that I am especially proud of:

  • It can handle files with over 4 dimensions (both read and write).
  • It works with the new "enhanced" multi-frame DICOM files.
  • DICOM scans are managed as full series, rather than one image at a time.
  • The meta data is stored in a hash table with a very compact interface.
  • The dictionary is built as a static hash table right in the code.
  • Both the reader and writer use an 8k buffer for efficient I/O operations.

Getting source and build version with CMake

These are some useful snippets of CMake code that I wrote today, so I thought that I'd share them. They aren't CMake macros or functions or anything tidy like that; in fact, I have them in the main CMakeLists.txt file of a project that I'm currently working on. The purpose of this code is to make it possible for my programs to print out full version information, so that even if I am working on some strange branch and haven't tagged a release, I'll always be able to check what source was used to produce that program:

myprogram --version
myprogram version 1.0.0 (master 1a8e6107, 16 Jun 2013, 17:50:26)

The first thing is to grab the git ref for the current head. I read ".git/HEAD" directly, rather than calling upon git itself (the "git describe" command is the more official way of getting such info, but it requires that at least one tag exists in the repository).

# Store the git hash of the current head
  file(READ "${PROJECT_SOURCE_DIR}/.git/HEAD" GIT_HEAD)
  string(REGEX REPLACE "^ref: *([^ \n\r]*).*" "\\1"
    GIT_REF "${GIT_HEAD}")
  file(READ "${PROJECT_SOURCE_DIR}/.git/${GIT_REF}" GIT_HASH)
  string(SUBSTRING "${GIT_HASH}" 0 8 GIT_HASH)
  get_filename_component(GIT_BRANCH "${GIT_REF}" NAME)
  set(PROJECT_SOURCE_VERSION "${GIT_BRANCH} ${GIT_HASH}")

Now that we have a PROJECT_SOURCE_VERSION variable set in CMake, the next thing is to grab the date and time and put them into PROJECT_BUILD_DATE and PROJECT_BUILD_TIME. This can be done easily on Linux and OS X by calling the "date" command, but Windows does not provide any way of formatting the output of its date command, so some string manipulation is required:

# Store the build date
if(WIN32)
  execute_process(COMMAND "cmd" " /c date /t"
    OUTPUT_VARIABLE DATE OUTPUT_STRIP_TRAILING_WHITESPACE)
  string(REGEX REPLACE "[^0-9]*(..).*" "\\1" MONTH "${DATE}")
  set(MONTHS ""
    "Jan" "Feb" "Mar" "Apr" "May" "Jun"
    "Jul" "Aug" "Sep" "Oct" "Nov" "Dec")
  list(GET MONTHS "${MONTH}" MONTH)
  string(REGEX REPLACE "[^/]*/(..)/(....).*" "\\1 ${MONTH} \\2"
    PROJECT_BUILD_DATE "${DATE}")
  execute_process(COMMAND "cmd" " /c echo %TIME%"
    OUTPUT_VARIABLE TIME OUTPUT_STRIP_TRAILING_WHITESPACE)
  string(REGEX REPLACE "[^0-9]*(..:..:..).*" "\\1"
    PROJECT_BUILD_TIME "${TIME}")
else()
  execute_process(COMMAND "date" "+%d %b %Y/%H:%M:%S"
    OUTPUT_VARIABLE DATE_TIME OUTPUT_STRIP_TRAILING_WHITESPACE)
  string(REGEX REPLACE "([^/]*)/.*" "\\1"
    PROJECT_BUILD_DATE "${DATE_TIME}")
  string(REGEX REPLACE "[^/]*/([0-9:]*).*" "\\1"
    PROJECT_BUILD_TIME "${DATE_TIME}")
endif()

This isn't quite everything: so far, the needed information is only in CMake variables. It is still necessary to add a file that can be configured by CMake to make these values available to your C++ code:

/* Source and Build version info. */
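The rest of that configured file might look something like this sketch (the filename version.h.in and its pairing with configure_file() are my assumptions; the macros simply mirror the CMake variables set above):

```c
/* version.h.in -- run through configure_file(version.h.in version.h) */
#define PROJECT_VERSION        "@PROJECT_VERSION@"
#define PROJECT_SOURCE_VERSION "@PROJECT_SOURCE_VERSION@"
#define PROJECT_BUILD_DATE     "@PROJECT_BUILD_DATE@"
#define PROJECT_BUILD_TIME     "@PROJECT_BUILD_TIME@"
```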

Now your program has a set of string macros that it can print when the user asks what version of the source was used to build it.
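For completeness, here is a sketch of the C side. The macro values are hard-coded stand-ins copied from the example output earlier; in a real program they would come from the configured header.

```c
#include <stdio.h>
#include <string.h>

/* Stand-in values; a real build gets these from the configured header. */
#define PROJECT_VERSION        "1.0.0"
#define PROJECT_SOURCE_VERSION "master 1a8e6107"
#define PROJECT_BUILD_DATE     "16 Jun 2013"
#define PROJECT_BUILD_TIME     "17:50:26"

/* Format the banner printed in response to --version. */
void format_version(char *buf, size_t size)
{
    snprintf(buf, size, "myprogram version %s (%s, %s, %s)",
             PROJECT_VERSION, PROJECT_SOURCE_VERSION,
             PROJECT_BUILD_DATE, PROJECT_BUILD_TIME);
}
```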

Nearest power of two

I was inspired by some code that I saw on StackOverflow that computed base-two logarithms in O(log2(N)) operations, where N is the number of bits in the integer. The following function computes the nearest power of two that is not less than a given value. I've compiled it with gcc and clang, and it optimizes into branch-free code.

 int nearest_power_of_two(int x)
 {
   unsigned int z = ((x > 0) ? x - 1 : 0);
   z |= z >> 1;
   z |= z >> 2;
   z |= z >> 4;
   z |= z >> 8;
   z |= z >> 16;
   return (int)(z + 1);
 }

Here is a related function, a simple log2() calculator for 64-bit values. Unlike the function above, it rounds down instead of rounding up. Note that the compiler should automatically unroll the loop:

 int log2_int(unsigned long long x)
 {
   static const unsigned long long t[6] = {
     0xffffffff00000000ull, 0x00000000ffff0000ull,
     0x000000000000ff00ull, 0x00000000000000f0ull,
     0x000000000000000cull, 0x0000000000000002ull };
   int y = 0;
   int j = 32;
   int i;
   for (i = 0; i < 6; i++) {
     int k = (((x & t[i]) == 0) ? 0 : j);
     y += k;
     x >>= k;
     j >>= 1;
   }
   return y;
 }

More efficient implementations exist that use CPU intrinsics, but I love the simplicity of these algorithms. I can't claim credit for either of these, since they are just slight modifications of existing code.

Fast power-of-two check

[Figure: "Is N a power of two?" — the bit patterns of n and n-1]

This fits into the "bit trick" category. How can you check if a number is a power of two, without writing a loop? Here is a method I found on a couple websites a few years back (can't remember which ones):

  if (n != 0 && (n & (n-1)) == 0) {
      /* n is a power of two */
  }

The reason that "n & (n-1)" is zero for a power of two is shown in the picture. Subtracting one will clear the highest bit only if it is the only bit that is set (and powers of two are the only numbers that have exactly one bit turned on). However, you also have to check that n is nonzero, because zero ANDed with anything is zero, so n = 0 would otherwise pass the test.
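Wrapped in a function (a trivial wrapper of my own, just for illustration), the check is easy to exercise:

```c
/* The check from the snippet above, as a function: true only when
   exactly one bit of n is set. */
int is_power_of_two(unsigned int n)
{
    return (n != 0 && (n & (n - 1)) == 0);
}
```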

A very compact list type

I still do some programming in C, and like to indulge myself by creating my own container types instead of using the standard template library. One of my favorite custom containers is a null-terminated list that has one-third the memory overhead of std::vector and std::list but is still efficient and growable.

Let’s say that you need to store a list of pointers or IDs, and you need to efficiently add to, remove from, or iterate through the list. If the item size is only 8 bytes (e.g. for a 64-bit pointer), then the std::list overhead of 16 bytes per node (plus the malloc chunk overhead) means that at most one third of the allocated memory will be used to store your items. The std::vector doesn’t have any per-item overhead, but it does have a fixed overhead of 24 bytes.

A null-terminated list (or vector, if you prefer) has a fixed overhead of just 8 bytes when empty, i.e. the size of a pointer. Because it is null-terminated, it doesn’t need a “size” member like std::vector, and a simple trick allows it to be growable without storing its size. In terms of functionality, it is essentially a very compact std::multiset with O(N) operations, i.e. it is at its best when used to store a small number of items. For a large number of items, the small overhead provides little benefit.

Here’s an example of a list of pointers, with append() and remove() methods:

  typedef struct _list_type {
      void **entries; /* the list is just a "void**" */
  } list_type;

  void list_append(list_type *l, void *p)
  {
      size_t i = 0;
      size_t n = 0;

      if (l->entries == 0) {
          /* create a new list if none exists */
          l->entries = (void **)malloc(2*sizeof(void *));
      }
      else {
          /* find the null at the end of the list */
          while (l->entries[n] != 0) { n++; }
          /* if n+1 is a power of two, double the list size */
          if (n+1 >= 2 && (n & (n+1)) == 0) {
              void **tmp = l->entries;
              l->entries = (void **)malloc((n+1)*2*sizeof(void *));
              for (i = 0; i < n; i++) { l->entries[i] = tmp[i]; }
              free(tmp);
          }
      }
      l->entries[n++] = p;
      l->entries[n] = 0; /* null-terminate the list */
  }

  void list_remove(list_type *l, void *p)
  {
      size_t i = 0;

      if (l->entries) {
          /* find the item in the list */
          while (l->entries[i] != 0 && l->entries[i] != p) { i++; }
          /* remove the item by shifting the rest of the list down */
          while (l->entries[i] != 0) {
              l->entries[i] = l->entries[i+1];
              i++;
          }
      }
  }

The trick here is the check for when the list length reaches a power of two, which is a hint to double the allocated space. There isn't any code to reduce the amount of allocated memory; it is assumed that after the initial growth of the list, the number of items will remain relatively constant.

I’d like to talk about how I used this to add weak pointers to vtkObjectBase, about how the "n & (n-1)" power-of-two check works, and about how the above code can be modified to work more efficiently with the way malloc() and new[] handle memory chunks, but this post has probably raised enough eyebrows already. What can I say, I still love C programming, regardless of all the fancy stuff that C++ provides.

Abstract interpolator interface

Ten years ago I wrote a filter for VTK called vtkImageReslice. Its purpose was simple: to resample an image volume through a 4×4 matrix transformation using interpolation. For my PhD work I also required deformable transformations, so I coded some. In order to connect these to my vtkImageReslice class, I included a vtkAbstractTransform class that described an abstract interface for coordinate transformation operations in VTK. This work was sufficiently well-developed that I managed to publish a paper on it.

The interpolation code was not as well-developed; it was tightly integrated into vtkImageReslice and not modular at all. It did its job, certainly, and very efficiently, but as I added more interpolators the size of vtkImageReslice gradually expanded to over 5000 lines of source code, code that was becoming increasingly difficult to manage. I have finally completed a long-overdue project that saw the creation of a vtkAbstractImageInterpolator class, providing an abstract interpolation interface and allowing me to move all the interpolation code from vtkImageReslice into several tidy little classes.

So far I provide two concrete interpolators. The first is a basic, default interpolator that provides nearest-neighbor, linear and cubic interpolation. The second is a sinc interpolator that provides high-quality interpolation and optionally performs antialiasing. Soon, I hope to also add some b-spline interpolation code that I have lying around. More information can be found at the VTK wiki page that I wrote for these classes:

New image display classes for VTK

Finally, I've finished a big and rather important project that I started in December (though I was interrupted a few times): a complete re-write of the image display classes for VTK. In brief, these classes replace vtkImageActor and provide a much higher level of functionality. Whereas image display used to be a clunky add-on feature in VTK, it is now a central feature just like geometry and volume display.

The main additions are three new classes:

  • vtkImageSlice – the new actor class for images
  • vtkImageResliceMapper – mapper for drawing images for MPR viewers
  • vtkImageProperty – for controlling lookup tables, window/level, opacity, etc

These are meant to mirror the vtkActor, vtkMapper, and vtkProperty classes used for geometry. The vtkImageSlice and vtkImageProperty classes are very simple, just information containers: vtkImageSlice controls the position and orientation of the data with respect to the world coordinate system, and vtkImageProperty controls the color table, interpolation, and other display-related properties. The vtkImageResliceMapper (and its associated classes vtkImageMapper3D, vtkImageSliceMapper, and vtkOpenGLImageSliceMapper) is where all the work gets done.

My primary goal was to simplify user interaction with the image display, and I decided that the easiest way to do this, since most VTK interaction occurs via the scene "camera", was to make the camera control both the position and orientation of the slice that is displayed, via the camera focal point and view point. This turned out to be a very intuitive way of doing things, and also allowed the VTK interactor classes to be used for image viewing with very little modification.

A brief run-down of the features:

  1. oblique views (obviously!)
  2. nearest-neighbor, linear, and cubic interpolation
  3. easy modification of image transformation via prop's UserTransform
  4. fully multi-threaded reslicing and color mapping operations
  5. streaming, i.e. only pulls the data needed for rendering
  6. works well with very large images (as long as they fit in memory)
  7. LOD-style interaction (switches to faster rendering for interaction)
  8. blend images by adding multiple vtkImageSlice objects to a vtkImageStack
  9. checkerboard images for easy comparison after registration
  10. thick-slab views (see vtkImageResliceMapper for details)
  11. works with vtkCellPicker for picking image voxels as points or cells

For large image viewing, on a 4GB computer (64-bit OS X) I’ve displayed images up to 3GB in size with full interactivity, but YMMV. As long as the mapper’s ResampleToScreenPixels option is On, the mapper will never try to load the full image onto the GPU (which is what causes the vtkImageActor to be slow for large images). The size of the images that you can display is limited only by main memory, I have displayed stacks of RGB images that are 10000×10000 pixels in size.

Template wrapping

Finally, I've achieved one of my long-term goals for the VTK Python wrappers: automatic wrapping of templated types! If I had known how much work it would entail, I probably never would have started, but still, I'm glad that I did it.

The first part, parsing the templates from the C++ header files, is something that I did about a year ago. Templates are one thing that my own parser handles very differently from gccxml. Whereas gccxml spits out information only for the templates that are instantiated in the code, my parser spits out descriptions of the generic templates prior to instantiation. So the biggest part of my current project was creating subroutines that could take the info about generic templates, and generate info for instantiated templates that could then be wrapped. A big advantage of doing things this way is that the wrappers can produce any instantiations that are desired, whether or not they appeared in the original header files. Another advantage is that my wrappers can provide documentation for both the generic and the instantiated class templates.

In Python, the templates act as class generators of sorts. They look kind of like a class, and kind of like a dict. They are used as follows:

v = vtkVector[float,3]([1.0, 2.5, 10.2])

The template args are provided in square brackets. Given the template args, the template "vtkVector" instantiates itself to produce a class. The class is then given arguments in parentheses, which it uses to construct an object. Other than the use of square brackets instead of angle brackets, the main difference from C++ template instantiation is that only a limited selection of template args is supported. Calling help() on the template will list all the available args.

Wrapping C++ operators in Python

Ah, the VTK Python wrappers… my favorite little programming project when there are other things that I should be doing instead. My goal for the Python wrappers is to make anything that can be done with VTK in C++ also doable in Python. And bit by bit, I think that I'm getting there.

My most recent mini-project was to wrap the C++ "[ ]" operator in Python. Not too difficult, because last year I had already enhanced vtkParse so that it pulls all operator methods from the C++ header files. One thing that I don't like about how I did it is that the range checking (i.e. to make sure that a bad index does not crash the program) uses a hard-coded range. There is no mechanism in VTK to add a hint to the header files so that the wrappers know what the valid index range for the "[ ]" operator is. Eventually, there will be… but that is a battle for another day.

There is another wrapping project that I've been working on, one that is slightly more ambitious: wrapping templated VTK classes. Templates are also something that I added to vtkParse last year, but hadn't added to the wrappers. Without going into the details of the implementation, the idea is that in Python, a template class like vtkVector<float,3> will be specialized and instantiated like so:

v = vtkVector['f',3]()

where 'f' is the Python type character for 'float'. So more or less, the syntax is very similar to C++ except that square brackets are used instead of angle brackets. Of course the trick is that, unlike C++, new template specializations cannot be created on-the-fly… only the template specializations that are wrapped are available. So what I'm working on is a method to figure out what template specializations will be needed so that they can be wrapped ahead of time. This can be done by looking at how templated types are used within VTK, i.e. looking at where the templated types appear either as method arguments or as superclass types of other classes.