Hamming distance project
Initial problem
- Bottleneck in computation: parallelized Python code that calculates Hamming distances across a large set of gene expression profiles
- Pre-processing step that takes many hours to complete even using many cores
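
For reference, the Hamming distance between two equal-length sequences is the number of positions at which they differ. Below is a minimal pure-Python sketch of the all-pairs calculation. It is not the project's original code: the function names and the binarized sample profiles are hypothetical, chosen only to show why the cost grows quadratically with the number of profiles.

```python
from itertools import combinations


def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hamming(profiles):
    """Distance for every unordered pair: O(n^2 * m) for n profiles of length m."""
    return {
        (i, j): hamming_distance(profiles[i], profiles[j])
        for i, j in combinations(range(len(profiles)), 2)
    }


# Hypothetical binarized expression profiles (assumption: 1 = expressed, 0 = not).
profiles = [
    [1, 0, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 1],
]
distances = pairwise_hamming(profiles)
print(distances)
```

With n profiles there are n(n-1)/2 pairs, so the work grows rapidly as the data set gets larger, which is consistent with this step becoming the bottleneck.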
What we did
- Replaced it with a highly optimized C++ implementation
- Packaged it as a Python library to fit into the existing user workflow
- Set up continuous integration (CI) to provide automated testing and deployment of the code
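
The summary does not say how the C++ version achieves its speed. One common technique for Hamming distances on binary data, shown here purely as an illustration and not as the project's actual method, is to pack each profile into an integer bit field, XOR the two fields, and count the set bits. The helper names `pack_bits` and `hamming_packed` are hypothetical, and the sketch assumes the profiles can be represented as 0/1 values.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into one integer, first value most significant."""
    value = 0
    for bit in bits:
        value = (value << 1) | (1 if bit else 0)
    return value


def hamming_packed(x, y):
    """Hamming distance between two packed bit fields of equal original length."""
    # XOR leaves a 1 wherever the inputs differ; counting those 1s gives the distance.
    return bin(x ^ y).count("1")


a = pack_bits([1, 0, 1, 1, 0])
b = pack_bits([1, 1, 1, 0, 0])
print(hamming_packed(a, b))
```

In a compiled language the same idea maps onto machine words and a hardware population-count instruction, so many positions are compared per operation instead of one.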
Result
- Code now runs in minutes on a single core