tidytcells.junction

Functions to manage junction (CDR3) data.

Functions

tidytcells.junction.standardise(*args, **kwargs)

Alias for tidytcells.junction.standardize().

Return type:

Junction

tidytcells.junction.standardize(seq, locus, j_symbol=None, v_symbol=None, species=None, allow_c_correction=None, allow_fw_correction=None, enforce_functional_v=None, enforce_functional_j=None, max_v_reconstruction=None, max_j_reconstruction=None, log_failures=None, suppress_warnings=None)

Corrects a given CDR3/Junction sequence into a valid and complete Junction sequence, based on alignment to V and J genes. This may include recovery of incorrectly trimmed amino acids, correction of sequencing errors in the conserved start and end positions, or trimming of unnecessary amino acids at the beginning and end of the sequence.

Parameters:
  • seq (str) – The junction sequence.

  • locus (str) – String value representing the locus (TRA, TRB, IGH, IGL, etc.; TR or IG may be used if a more precise locus is unknown). This is used to select the applicable subset of V and J genes for junction correction.

  • j_symbol (str) – The TR/IG J symbol used to correct the end of the junction sequence. If a specific J allele is supplied (e.g., human TRAJ1*01), the junction is corrected according to the known sequence for this J allele. If a less precise gene or subgroup is given (e.g., human TRAJ23 which has multiple alleles), all associated allele sequences will be tested for the best alignment. If no J symbol is given, all J genes for the given species + locus will be tested.

  • v_symbol (str) – The TR/IG V symbol used to correct the start of the junction sequence. If a specific V allele is supplied (e.g., human TRAV1-1*01), the junction is corrected according to the known sequence for this V allele. If a less precise gene or subgroup is given (e.g., human TRAV1-1 which has multiple alleles, or TRAV1 which has multiple genes), all associated allele sequences will be tested for the best alignment. If no V symbol is given, all V genes for the given species + locus will be tested.

  • species (str) – The species that produced the underlying receptor. Defaults to homosapiens.

  • allow_c_correction (bool) – Whether to allow the first amino acid in the input sequence to be corrected to ‘C’ if it is a potential sequencing error (only “W”, “S”, “R”, “G”, “Y”, “F”) and correction improves the V gene alignment. Defaults to False.

  • allow_fw_correction (bool) – Whether to allow the last amino acid in the input sequence to be corrected to ‘F’ or ‘W’ if it is a potential sequencing error (only “I”, “L”, “V”, “Y”, “S”, “C”, “G”, “R”) and correction improves the J gene alignment. Defaults to False.

  • enforce_functional_v (bool) – Only consider V genes which are annotated to be ‘functional’ (excluding ORFs and pseudogenes). Defaults to True.

  • enforce_functional_j (bool) – Only consider J genes which are annotated to be ‘functional’ (excluding ORFs and pseudogenes). Because J genes are short, it can be difficult to determine whether a J gene is truly functional, so including ORFs/pseudogenes is recommended for J genes but not for V genes. Defaults to False.

  • max_v_reconstruction (int) – The maximum number of amino acids to reconstruct based on V gene information. Defaults to 1 (only reconstruct the conserved C). Setting this to a value greater than 1 is recommended only if V symbol information is supplied, and values larger than 3 are generally not recommended.

  • max_j_reconstruction (int) – The maximum number of amino acids to reconstruct based on J gene information. Defaults to 1 (only reconstruct the conserved F / W / C). Setting this to a value greater than 1 is recommended only if J symbol information is supplied, and values larger than 3 are generally not recommended.

  • log_failures (bool) – Report standardization failures through logging (at level WARNING). Defaults to True.

  • suppress_warnings (bool) – Disable warnings that are usually logged when standardization fails. Deprecated in favour of log_failures.
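
The residue lists given for allow_c_correction and allow_fw_correction can be used to pre-screen input before deciding whether to enable those options. The sketch below is illustrative only and is not part of tidytcells: the function and constant names are hypothetical, and it checks only whether the terminal residue falls in the documented set. It does not perform the V/J alignment that the library additionally requires before applying a correction.

```python
# Residue sets copied from the parameter descriptions above.
# Names are hypothetical; tidytcells does not expose these constants.
C_MISCALL_RESIDUES = frozenset("WSRGYF")      # may be corrected to 'C'
FW_MISCALL_RESIDUES = frozenset("ILVYSCGR")   # may be corrected to 'F' or 'W'


def first_residue_is_c_candidate(seq: str) -> bool:
    """True if the first residue is one that allow_c_correction may rewrite."""
    return bool(seq) and seq[0].upper() in C_MISCALL_RESIDUES


def last_residue_is_fw_candidate(seq: str) -> bool:
    """True if the last residue is one that allow_fw_correction may rewrite."""
    return bool(seq) and seq[-1].upper() in FW_MISCALL_RESIDUES


print(first_residue_is_c_candidate("SASSIRSSYEQYF"))  # True: 'S' is in the set
print(first_residue_is_c_candidate("CASSIRSSYEQYF"))  # False: already starts with 'C'
print(last_residue_is_fw_candidate("CASSIRSSYEQYL"))  # True: 'L' is in the set
```

A True result means only that the residue is eligible; per the descriptions above, the correction is applied only when it also improves the gene alignment.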

Returns:

A standardized CDR3 junction wrapped in a Junction object. For details on how to use this output, please refer to the class documentation.

Return type:

Junction
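
A minimal usage sketch, assuming tidytcells is installed in a version that matches the signature documented above. The wrapper name standardize_example is hypothetical, and the import guard exists only so the snippet degrades gracefully when the package is absent. The input sequence is an illustrative human TRB junction with its first and last residues removed; no particular output is claimed here.

```python
import importlib.util


def standardize_example(seq, locus, j_symbol=None, v_symbol=None):
    """Call tidytcells.junction.standardize, or return None if tidytcells is missing.

    Hypothetical helper for illustration; only keyword arguments listed in the
    documented signature are passed through.
    """
    if importlib.util.find_spec("tidytcells") is None:
        return None  # package not installed, nothing to demonstrate

    import tidytcells as tt

    return tt.junction.standardize(
        seq=seq,
        locus=locus,
        j_symbol=j_symbol,
        v_symbol=v_symbol,
        species="homosapiens",
        max_v_reconstruction=1,  # documented default: only the conserved C
        max_j_reconstruction=1,  # documented default: only the conserved F / W / C
    )
```

For example, standardize_example("ASSIRSSYEQY", "TRB", j_symbol="TRBJ2-7") passes a trimmed junction together with a J gene symbol. The result is a Junction object; its attributes are described in the class documentation and are not assumed here.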