tidytcells: Standardize TR/MH/IG data
a) A diagram of a T cell receptor (TR) interacting with a peptide-Major Histocompatibility (MH) complex.
The V, D and J genes that make up the TR chains are color-coded.
The red dotted lines mark the junction sequences of both TR chains.
b) An illustration of how tidytcells can help clean TR data.
By using tidytcells, non-standard nomenclature in the “messy data” is corrected, and any invalid values are filtered out.
tidytcells is a lightweight Python package that cleans and standardizes T cell receptor (TR), Major Histocompatibility Complex (MH), and Immunoglobulin (IG) data to be IMGT-compliant (IMGT/GENE-DB, IMGT Repertoire).
The main purpose of the package is to solve the problem of parsing and merging non-standardized TR datasets.
Compiling TR data from multiple sources is often difficult because each dataset encodes TR and MH gene names with slightly different formats and nomenclature, and some datasets are not even internally consistent.
tidytcells can ameliorate this issue by auto-correcting and auto-standardizing your data!
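To make the nomenclature problem concrete, here is a deliberately tiny, hypothetical normalizer for TR gene symbols. It handles only two common discrepancies, the legacy "TCR" prefix and zero-padded gene numbers (e.g. TCRBV05-01*01), and rejects anything that does not roughly look like an IMGT-style TR gene symbol. The function name `toy_standardize_tr` and its rules are made up for illustration; this is not how tidytcells works internally and is not part of its API.

```python
import re
from typing import Optional

# Hypothetical toy normalizer for illustration only; NOT the tidytcells
# implementation or API. The shape check below is rough (it would reject
# valid symbols such as TRAC, for example).
_IMGT_TR_LIKE = re.compile(r"TR[ABGD][VDJC]\d+(?:-\d+)?(?:\*\d{2})?")


def toy_standardize_tr(symbol: str) -> Optional[str]:
    """Return an IMGT-style TR gene symbol, or None if it cannot be parsed."""
    s = symbol.strip().upper()
    # Legacy "TCR" prefix (e.g. TCRBV...) -> IMGT "TR" prefix
    s = re.sub(r"^TCR", "TR", s)
    # Drop zero-padding in gene numbers (V05-01 -> V5-1), but leave
    # two-digit allele designations (*01) untouched
    s = re.sub(r"(?<=[A-Z-])0+(?=\d)", "", s)
    # Reject anything that does not look like a TR gene symbol
    return s if _IMGT_TR_LIKE.fullmatch(s) else None


print(toy_standardize_tr("TCRBV05-01*01"))  # TRBV5-1*01
print(toy_standardize_tr(" trbj2-7 "))      # TRBJ2-7
print(toy_standardize_tr("not a gene"))     # None
```

Real standardization also has to check symbols against IMGT reference data for the relevant species rather than a regular expression, which is the kind of work tidytcells takes on.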
Tip
The tidytcells.ig submodule is newly added! It provides functionality for standardizing, querying, and retrieving amino acid sequences for immunoglobulin genes/alleles, similar to the existing TR and MH modules. Thanks to Lonneke for implementing this module!