tidytcells.ig

Functions to manage IG gene data.

Functions

tidytcells.ig.get_aa_sequence(symbol=None, species=None, gene=None)

Look up the amino acid sequence of a given IG allele.

Parameters:
  • symbol (str) – Standardized allele symbol, specified to the level of the allele. Some alleles, notably those of non-functional genes, do not have resolvable amino acid sequences.

  • species (str) – Species to which the IG gene in question belongs (see above for supported species). Defaults to "homosapiens".

  • gene (str) – Alias for symbol.

Returns:

A dictionary with keys corresponding to names of different sequence regions within the allele, and values corresponding to their amino acid sequences.

Return type:

Dict[str, str]
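
Because the symbol must be resolved to the allele level, it can help to check this before calling the lookup. The sketch below is not part of tidytcells: `is_allele_level` is a hypothetical helper, and it assumes the IMGT naming convention in which an allele is written as the gene name followed by `*` and a numeric allele designation.

```python
import re

# Assumption: IMGT-style allele symbols end in "*" followed by digits,
# e.g. "IGHV1-2*02". This helper is illustrative, not a tidytcells function.
_ALLELE_SUFFIX = re.compile(r"\*\d+$")


def is_allele_level(symbol: str) -> bool:
    """Return True if the symbol carries an allele designation."""
    return bool(_ALLELE_SUFFIX.search(symbol))


if __name__ == "__main__":
    for candidate in ("IGHV1-2*02", "IGHV1-2"):
        print(candidate, is_allele_level(candidate))
```

A gene-level symbol that fails this check would need to be resolved to a specific allele before its sequence can be looked up.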

tidytcells.ig.query(species=None, precision=None, functionality=None, contains_pattern=None)

Query the list of all known IG genes / alleles.

Parameters:
  • species (str) – Species to query (see above for supported species). Defaults to "homosapiens".

  • precision (str) – The level of precision to query. "allele" queries from the set of all possible alleles. "gene" queries from the set of all possible genes. Defaults to "allele".

  • functionality (str) – Gene/allele functionality to subset by. "any" queries from all possible genes/alleles. "F" queries from functional genes/alleles. "NF" queries from pseudogenes and ORFs. "P" queries from pseudogenes. "ORF" queries from ORFs. An allele is considered queryable if its functionality label matches the description. A gene is considered queryable if the functionality label of at least one of its alleles matches the description. Defaults to "any".

  • contains_pattern (str) – An optional regular expression string used to filter the query result. If supplied, only genes/alleles whose symbols contain a match for the regular expression are returned. Defaults to None.

Returns:

The set of all genes / alleles that satisfy the given constraints.

Return type:

FrozenSet[str]
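
The filtering rules described for precision, functionality and contains_pattern can be illustrated with a small stand-alone model. This is a sketch of the documented semantics only, not the library's implementation: `toy_query`, `TOY_ALLELES` and the placeholder symbols in it are hypothetical, and the functionality labels are assumed to be exactly "F", "P" and "ORF".

```python
import re
from typing import Dict, FrozenSet, Optional

# Hypothetical allele -> functionality table. The names are placeholders,
# not real IMGT entries.
TOY_ALLELES: Dict[str, str] = {
    "GENEA*01": "F",
    "GENEA*02": "P",
    "GENEB*01": "ORF",
    "GENEC*01": "P",
}

# Labels accepted by each `functionality` value, following the parameter
# description ("NF" covers pseudogenes and ORFs).
ACCEPTED_LABELS = {
    "any": {"F", "P", "ORF"},
    "F": {"F"},
    "NF": {"P", "ORF"},
    "P": {"P"},
    "ORF": {"ORF"},
}


def toy_query(
    precision: str = "allele",
    functionality: str = "any",
    contains_pattern: Optional[str] = None,
) -> FrozenSet[str]:
    """Model of the documented filtering rules over TOY_ALLELES."""
    if precision not in ("allele", "gene"):
        raise ValueError(f"unsupported precision: {precision!r}")
    accepted = ACCEPTED_LABELS[functionality]

    # An allele qualifies if its own label matches.
    matching = {a for a, label in TOY_ALLELES.items() if label in accepted}

    # A gene qualifies if at least one of its alleles matches.
    if precision == "gene":
        matching = {allele.split("*")[0] for allele in matching}

    # Keep only symbols containing a match for the regular expression.
    if contains_pattern is not None:
        matching = {s for s in matching if re.search(contains_pattern, s)}

    return frozenset(matching)


if __name__ == "__main__":
    print(sorted(toy_query(precision="gene", functionality="P")))
```

In this model GENEA counts as a pseudogene hit at gene precision because one of its two alleles is labelled "P", even though the other is functional.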

tidytcells.ig.standardise(*args, **kwargs)

Alias for tidytcells.ig.standardize().

Return type:

str | None

tidytcells.ig.standardize(symbol=None, species=None, enforce_functional=None, precision=None, on_fail=None, log_failures=None, gene=None, suppress_warnings=None)

Attempt to standardize an IG gene / allele symbol to be IMGT-compliant.

Parameters:
  • symbol (str) – Potentially non-standardized IG gene / allele symbol.

  • species (str) – Can be specified to standardise to an IG symbol that is known to be valid for that species (see above for supported species). Currently, only Homo sapiens is supported, but this parameter is retained so that the interface stays compatible with that of its sister function in tidytcells.tr. Defaults to "homosapiens".

  • enforce_functional (bool) – If True, disallows IG genes / alleles that are recognised by IMGT but are marked as non-functional (ORF or pseudogene). Defaults to False.

  • precision (str) – The maximum level of precision to standardize to. "allele" standardizes to the maximum precision possible. "gene" standardizes only to the level of the gene. Defaults to "allele".

  • on_fail (str) – Behaviour when standardization fails. If set to "reject", returns None on failure. If set to "keep", returns the original input. Defaults to "reject".

  • log_failures (bool) – Report standardisation failures through logging (at level WARNING). Defaults to True.

  • gene (str) – Alias for the parameter symbol.

  • suppress_warnings (bool) – Disable warnings that are usually logged when standardisation fails. Deprecated in favour of log_failures.

Returns:

If the specified species is supported and symbol could be standardized, returns the standardized symbol name. If species is unsupported, the function does not attempt to standardize and returns the unaltered symbol string. Otherwise, it follows the behaviour set by on_fail.

Return type:

Optional[str]
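
The interaction between on_fail and log_failures can be sketched independently of the library. The code below is a minimal model of the documented behaviour, not the tidytcells implementation: `resolve`, `toy_lookup` and the logger name are hypothetical, and the lookup table contains a placeholder symbol rather than real IMGT data.

```python
import logging
from typing import Callable, Optional

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("on_fail_sketch")


def resolve(
    original: str,
    lookup: Callable[[str], Optional[str]],
    on_fail: str = "reject",
    log_failures: bool = True,
) -> Optional[str]:
    """Apply `lookup`, then fall back according to `on_fail`."""
    if on_fail not in ("reject", "keep"):
        raise ValueError(f"unsupported on_fail value: {on_fail!r}")

    standardized = lookup(original)
    if standardized is not None:
        return standardized

    # Failures are reported at level WARNING when log_failures is True.
    if log_failures:
        logger.warning("Failed to standardize %r", original)

    # "keep" returns the original input, "reject" returns None.
    return original if on_fail == "keep" else None


# Placeholder lookup: maps one non-standard spelling to a made-up symbol.
toy_lookup = {"genea": "GENEA"}.get

if __name__ == "__main__":
    print(resolve("genea", toy_lookup))
    print(resolve("unknown", toy_lookup, on_fail="keep", log_failures=False))
```

Choosing "keep" is convenient when cleaning a column in place, while "reject" makes failures easy to filter out afterwards because they appear as None.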