XBitPar - Bit-parallel String Comparison Tools on Accelerators


Project links



Bit-parallelism encodes calculated values into bit arrays. Thus, multiple values can be updated simultaneously by a single operation. This technique can be used to develop efficient algorithms for string comparison problems (e.g. the approximate pattern matching problem, the longest common substring problem).

An important parameter which has direct impact on the efficiency of bit-parallel algorithms is the machine word size w (e.g. w = 32 or 64 bits for conventional CPUs). On modern high performance computing architectures, such as Intel Xeon Phi coprocessors or CUDA-enabled GPUs, it is possible to efficiently simulate 512-bit or 1024-bit words which could contribute to the implementations of bit-parallel algorithms. In addition, the massive computation resource of these types of accelerator allows the development of powerful tools to process large text.

This project aims to provide bit-parallel string comparison tools on modern accelerators, as well as on conventional multi-core CPUs. Fundamentally, these tools are specified for biological data (i.e. DNA, RNA sequences), however, they could also be extended to process other types of text data.


Our CUDA-based implementation on a GeForce Titan graphics card runs up to 2.9 times faster than our implementation on a Xeon Phi 5110P. The results are presented in a journal paper which has been submitted to Journal of Parallel Computing.


XBitPar version 2.0 consists of 5 implementations of the Wu-Manber algorithm, which allow performing approximate pattern matching with the Levenshtein distance.

GPU implementations

  • wmCudaTile: CUDA-based version with the tiled parallelization approach.
  • wmCudaNaive: CUDA-based version which uses the global memory of the GPU as the main working memory.
Both GPU implementations simulate 1024-bit words by CUDA warps.

Xeon Phi implementations

  • wmIntr: Xeon Phi multi-threaded version, which simulates 512-bit words with Xeon Phi intrinsic function.
  • wmAutoVec: Xeon Phi multi-threaded version, which simulates 512-bit words with arrays of 16 unsigned integers.
Multicore CPU implementations
  • wmHost: CPU multi-threaded version, which simulates 512-bit words with arrays of 16 unsigned integers.
These implementations are specified for DNA sequences (i.e : the alphabet consists of 4 characters A, C, G, T).


Latest source code (v2.0)


  • Tran, T, Schindel, S, Liu, Y, and Schmidt, B: "Bit-Parallel approximate pattern matching on the Xeon Phi coprocessor". 26th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2014). pp. 81-88.
  • Tran, T, Liu, Y, and Schmidt, B: "Bit-Parallel approximate pattern matching: Kepler GPU versus Xeon Phi". Parallel Computing. 2015, under review.

Change Log

  • 15 August 2015: Second version of XBitPar (2.0) and Information Updated
  • 16 January 2015: Information Updated
  • 05 August 2014: First version of XBitPar (1.0)

Last update: 08/2015
Powered by sourceforge.net