tidytcells: Standardise TR/MH data


Cartoon of a T cell receptor binding to a peptide MH complex (left), and a cartoon demonstrating how messy data can be cleaned using tidytcells (right).

a) A diagram of a T cell receptor (TR) interacting with a peptide-Major Histocompatibility (MH) complex. The V, D and J minigenes comprising each TR chain are distinguished by color. The red dotted lines mark the junction sequences of both TR chains. b) An illustration of how tidytcells can help clean TR data. With tidytcells, non-standard nomenclature in the “messy data” is corrected, and any invalid values are filtered out.

tidytcells is a lightweight Python package that cleans and standardizes T cell receptor (TR) and Major Histocompatibility Complex (MH) data to be IMGT-compliant (IMGT/GENE-DB, IMGT Repertoire). Its main purpose is to make it easier to parse and collate non-standardized TR datasets. Compiling TR data from multiple sources is often difficult because each dataset encodes TR and MH gene names in a slightly different format or nomenclature, and some datasets are even internally inconsistent. tidytcells addresses this by automatically correcting and standardizing your data.
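To make the problem concrete, here is a minimal, self-contained sketch of the kind of clean-up involved. It is not how tidytcells is implemented and it does not use the tidytcells API: the function name `toy_standardize_tr_gene`, the tiny `KNOWN_TR_GENES` set, and the three correction rules are all assumptions made purely for illustration.

```python
import re
from typing import Optional

# Hypothetical, deliberately tiny reference set (an assumption for this sketch).
# A real tool would validate against the complete IMGT nomenclature.
KNOWN_TR_GENES = {"TRAV1-1", "TRAJ1", "TRBV20-1", "TRBJ2-7"}


def toy_standardize_tr_gene(raw: str) -> Optional[str]:
    """Return a cleaned TR gene symbol, or None if it cannot be resolved."""
    # Normalise whitespace and letter case.
    symbol = raw.strip().upper()

    # Some sources write the prefix as "TCR" instead of "TR".
    symbol = re.sub(r"^TCR", "TR", symbol)

    # Some sources drop the prefix entirely, e.g. "AJ1" or "BV20-1".
    if re.match(r"^[ABGD][VDJC]", symbol):
        symbol = "TR" + symbol

    # Some sources zero-pad the numbers, e.g. "TRBJ02-07".
    symbol = re.sub(r"(?<=[VDJC-])0+(?=\d)", "", symbol)

    # Anything that still is not a known symbol is treated as invalid.
    return symbol if symbol in KNOWN_TR_GENES else None


messy = [" trav1-1 ", "TCRBV20-1", "aJ1", "TRBJ02-07", "foobar"]
cleaned = [toy_standardize_tr_gene(value) for value in messy]
```

Each messy value is either mapped to a symbol in the reference set or replaced with `None`, mirroring the "correct what can be corrected, filter out what is invalid" behaviour described above. Hand-written rules like these quickly become hard to maintain across species and gene families, which is the gap a dedicated package is meant to fill.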

Contents

Indices and tables