Hamming distance project

Initial problem

  • Bottleneck in computation: parallelized python code to calculate hamming distances from large set of gene expressions
  • Pre-processing step that takes many hours to complete even using many cores

What we did

  • Replace with a highly optimized c++ version
  • Packaged as a python library to fit into the existing user workflow
  • Set up Continuous Integration to provide automated testing and deployment of the code


  • Code now runs in minutes on a single core