tidytcells.junction

Functions to manage junction (CDR3) data.

Functions

tidytcells.junction.standardise(*args, **kwargs)

Alias for tidytcells.junction.standardize().

Return type:

str | None

tidytcells.junction.standardize(seq, j_symbol=None, species=None, allow_uncertain_118=None, fix_missing_conserved=None, on_fail=None, log_failures=None, j_strict=None, strict=None, suppress_warnings=None)

Ensures that a string value looks like a valid junction (CDR3) amino acid sequence.

A valid junction sequence must:

  1. Be a valid amino acid sequence

  2. Begin with a cysteine (C)

  3. End with a phenylalanine (F), tryptophan (W) or cysteine (C) in a way consistent with j_symbol if supplied
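
The three rules above can be sketched in plain Python. This is an illustrative check only, not the tidytcells implementation: the helper name looks_like_junction, its expected_end argument (standing in for the residue that j_symbol would resolve to) and the 20-letter alphabet are all assumptions made for this example. It follows the list literally and ignores allow_uncertain_118.

```python
from typing import Optional

# Assumed alphabet: the 20 standard amino acid one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def looks_like_junction(seq: str, expected_end: Optional[str] = None) -> bool:
    """Hypothetical checker for the three rules above (not tidytcells code)."""
    # Rule 1: a non-empty string made only of amino acid letters.
    if not seq or not set(seq) <= AMINO_ACIDS:
        return False
    # Rule 2: a leading cysteine.
    if not seq.startswith("C"):
        return False
    # Rule 3: a trailing F, W or C, narrowed to a single residue
    # when the expected ending is known.
    allowed_ends = {expected_end} if expected_end else {"F", "W", "C"}
    return seq[-1] in allowed_ends


print(looks_like_junction("CASSPGQGAYEQYF"))       # True
print(looks_like_junction("CASSPGQGAYEQYF", "W"))  # False: ends with F, not W
print(looks_like_junction("ASSPGQGAYEQY"))         # False: no leading C
```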

Parameters:
  • seq (str) – The junction sequence.

  • j_symbol (str) – The TR/IG J symbol used to determine the correct conserved trailing amino acid at position 118 (F / W / C). If the symbol does not resolve to a single allele, but all productive alleles consistent with it share the same conserved residue, that residue is used as the expected ending. If the supplied symbol does not map to any known J allele or group of alleles, the function raises a ValueError.

  • species (str) – The species that produced the underlying receptor. Defaults to "homosapiens".

  • allow_uncertain_118 (bool) – If False, standardization immediately fails if the expected conserved trailing amino acid at position 118 cannot be determined with certainty using j_symbol, or if j_symbol is not supplied. If True, in the event of an uncertain residue at position 118, either F or W is accepted, and if a trailing residue must be appended (see parameter fix_missing_conserved), an F will be added. Defaults to True.

  • fix_missing_conserved (bool) – If False, standardization immediately fails for any input sequence that does not start and end with the expected conserved residues. If True, any inputs that are valid amino acid sequences but do not start and end as expected are corrected by adding a C at the beginning and the expected trailing residue (see allow_uncertain_118) at the end. Defaults to True.

  • on_fail (str) – Behaviour when standardization fails. If set to "reject", returns None on failure. If set to "keep", returns the original input. Defaults to "reject".

  • log_failures (bool) – Report standardization failures through logging (at level WARNING). Defaults to True.

  • j_strict (bool) – Deprecated in favor of allow_uncertain_118, of which it is the inverse (j_strict=True corresponds to allow_uncertain_118=False).

  • strict (bool) – Deprecated in favor of fix_missing_conserved, of which it is the inverse (strict=True corresponds to fix_missing_conserved=False).

  • suppress_warnings (bool) – Deprecated in favor of log_failures. Disables the warnings that are usually logged when standardization fails.
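
Because the three deprecated parameters are inverses of their replacements, existing call sites can be migrated mechanically. The helper below is a hypothetical sketch (migrate_deprecated_kwargs is not part of tidytcells) that rewrites a keyword dictionary according to the descriptions above. It assumes that an explicitly supplied new-style keyword should take precedence over its deprecated counterpart.

```python
# Deprecated keyword -> replacement keyword (each replacement is the inverse).
DEPRECATED = {
    "j_strict": "allow_uncertain_118",
    "strict": "fix_missing_conserved",
    "suppress_warnings": "log_failures",
}


def migrate_deprecated_kwargs(kwargs: dict) -> dict:
    """Hypothetical helper: translate deprecated keywords to current ones."""
    # Keep every current keyword that was actually set (None means "unset").
    migrated = {
        name: value
        for name, value in kwargs.items()
        if name not in DEPRECATED and value is not None
    }
    # Invert each deprecated flag unless its replacement was given explicitly.
    for old, new in DEPRECATED.items():
        if kwargs.get(old) is not None and new not in migrated:
            migrated[new] = not kwargs[old]
    return migrated


old_style = {"j_strict": True, "suppress_warnings": True, "on_fail": "keep"}
print(migrate_deprecated_kwargs(old_style))
# {'on_fail': 'keep', 'allow_uncertain_118': False, 'log_failures': False}
```

The resulting dictionary could then be unpacked into a call such as standardize(seq, **migrated).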

Returns:

The standardized version of the input string if standardization succeeds. Otherwise, the result is determined by on_fail: None for "reject", or the original input for "keep".
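
To make the interplay of allow_uncertain_118, fix_missing_conserved and on_fail concrete, here is a simplified pure-Python sketch of the documented behaviour. It is a reading of the parameter descriptions above, not the library's code: standardize_sketch and its expected_end argument (a stand-in for the residue resolved from j_symbol) are hypothetical, and steps the real function may perform, such as J symbol lookup or logging, are left out.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # assumed alphabet


def standardize_sketch(seq, expected_end=None, allow_uncertain_118=True,
                       fix_missing_conserved=True, on_fail="reject"):
    """Hypothetical model of the documented behaviour (not tidytcells code)."""
    # What to return on failure: None for "reject", the input for "keep".
    failure = None if on_fail == "reject" else seq

    if not seq or not set(seq) <= AMINO_ACIDS:
        return failure

    if expected_end is None:
        # Position 118 is uncertain: fail, or accept F/W and append F if needed.
        if not allow_uncertain_118:
            return failure
        accepted, to_append = {"F", "W"}, "F"
    else:
        accepted, to_append = {expected_end}, expected_end

    starts_ok = seq.startswith("C")
    ends_ok = seq[-1] in accepted
    if starts_ok and ends_ok:
        return seq
    if not fix_missing_conserved:
        return failure

    # Add whichever conserved residue is missing.
    fixed = seq if starts_ok else "C" + seq
    return fixed if ends_ok else fixed + to_append


print(standardize_sketch("ASSPGQGAYEQY"))
# CASSPGQGAYEQYF
print(standardize_sketch("ASSPGQGAYEQY", fix_missing_conserved=False))
# None
print(standardize_sketch("ASSPGQGAYEQY", fix_missing_conserved=False, on_fail="keep"))
# ASSPGQGAYEQY
```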

Return type:

str | None