Hamming distance project

Initial problem

  • Bottleneck in computation: parallelized python code to calculate hamming distances from large set of gene expressions
  • Pre-processing step that takes many hours to complete even using many cores

What we did

  • Replace with a highly optimized, vectorized and parallelized c++ version
  • Packaged as a python library hammingdist, to fit into the existing user workflow
  • Set up Continuous Integration and Continuous Delivery to provide automated testing and deployment of the code

Result