tidytcells.mhc

Functions to clean and standardise MHC gene data.

Functions

tidytcells.mhc.classify(*args, **kwargs)

Alias for tidytcells.mhc.get_class().

tidytcells.mhc.get_chain(gene: str | None = None, suppress_warnings: bool = False, gene_name: str | None = None) → str

Given a standardised MHC gene name, detect whether it codes for an alpha or a beta chain molecule.

Note

This function currently only recognises HLAs, and not MHCs from other species.

Parameters:
  • gene (str) – Standardised MHC gene name

  • suppress_warnings (bool) – Disable warnings that are usually emitted when chain classification fails. Defaults to False.

  • gene_name (str) – Alias for the parameter gene.

Returns:

'alpha' or 'beta' if gene is recognised and its chain is known, else None.

Return type:

str or None

tidytcells.mhc.get_class(gene: str | None = None, suppress_warnings: bool = False, gene_name: str | None = None) → int

Given a standardised MHC gene name, detect whether it belongs to a class I or a class II MHC molecule.

Note

This function currently only recognises HLAs, and not MHCs from other species.

Parameters:
  • gene (str) – Standardised MHC gene name

  • suppress_warnings (bool) – Disable warnings that are usually emitted when classification fails. Defaults to False.

  • gene_name (str) – Alias for the parameter gene.

Returns:

1 or 2 if gene is recognised and its class is known, else None.

Return type:

int or None
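Because get_class() returns None instead of raising when a gene is not recognised, callers typically need to handle that case explicitly. The following is a minimal sketch of one way to do so. The names group_by_class and stand_in_classifier are hypothetical and not part of tidytcells; the stand-in exists only so the sketch runs without the library installed.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional


def group_by_class(
    genes: Iterable[str],
    classifier: Callable[[str], Optional[int]],
) -> Dict[Optional[int], List[str]]:
    """Group gene names by the class the classifier reports (None = unknown)."""
    groups: Dict[Optional[int], List[str]] = defaultdict(list)
    for gene in genes:
        groups[classifier(gene)].append(gene)
    return dict(groups)


def stand_in_classifier(gene: str) -> Optional[int]:
    """Hypothetical stand-in for a real classifier; covers three genes only."""
    lookup = {"HLA-A": 1, "HLA-B": 1, "HLA-DRB1": 2}
    return lookup.get(gene)  # None for anything not in the lookup


grouped = group_by_class(["HLA-A", "HLA-DRB1", "not-a-gene"], stand_in_classifier)
print(grouped)
```

With tidytcells installed, the same helper could presumably be called as group_by_class(genes, tidytcells.mhc.get_class), so that unrecognised genes end up under the None key.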

tidytcells.mhc.query(species: str = 'homosapiens', precision: str = 'allele', contains: str | None = None) → FrozenSet[str]

Query the list of all known MHC genes/alleles.

Note

tidytcells’ knowledge of MHC alleles is limited, especially outside of humans. tidytcells will allow you to query HLA alleles up to the level of the protein (first two allele designators), but that is the highest resolution available. For Mus musculus, there is currently only support for gene-level querying.

Parameters:
  • species (str) – Species to query (see above for supported species). Defaults to 'homosapiens'.

  • precision (str) – The level of precision to query. 'allele' queries the set of all known alleles. 'gene' queries the set of all known genes. Defaults to 'allele'.

  • contains (str) – An optional regular expression string used to filter the query result. If supplied, only genes/alleles that contain a match for the regular expression are returned. Defaults to None.

Returns:

The set of all genes/alleles that satisfy the given constraints.

Return type:

FrozenSet[str]
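The effect of the contains filter can be pictured with the standard re module. This is a sketch under the assumption that "contains" means a match anywhere in the name (re.search semantics), which is how the description reads; it is not the library's implementation. filter_names is a hypothetical helper, and sample is a small hand-written list rather than real query() output.

```python
import re
from typing import FrozenSet, Iterable, Optional


def filter_names(names: Iterable[str], contains: Optional[str] = None) -> FrozenSet[str]:
    """Keep only names in which the regular expression matches somewhere."""
    if contains is None:
        return frozenset(names)
    pattern = re.compile(contains)
    return frozenset(name for name in names if pattern.search(name))


# Hand-written sample of HLA allele names (not produced by tidytcells).
sample = ["HLA-A*01:01", "HLA-B*07:02", "HLA-DRB1*15:01"]

drb1_only = filter_names(sample, contains=r"DRB1")
first_field_01_or_07 = filter_names(sample, contains=r"\*0[17]")
print(sorted(drb1_only), sorted(first_field_01_or_07))
```

Note that "*" is a regular-expression metacharacter, so a pattern meant to match the literal asterisk in an allele name needs to escape it, as in the second call.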

tidytcells.mhc.standardise(gene: str | None = None, species: str = 'homosapiens', precision: str = 'allele', suppress_warnings: bool = False, gene_name: str | None = None) → tuple

Attempt to standardise an MHC gene name to be IMGT-compliant.

Note

This function only verifies the validity of an MHC gene/allele up to the level of the protein. More precise allele designators are not verified, apart from the requirement that their format (colon-separated numbers) looks valid. There are three reasons for this: new alleles at that level are added to the IMGT list quite often, which makes accurate verification difficult; people rarely need verification at such a precise level; and such verification costs more computational effort with diminishing returns.
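A format-only check of this kind can be pictured as a regular-expression test. The sketch below is an assumption about what "colon-separated numbers" means in practice, not the library's actual implementation; looks_like_allele_fields is a hypothetical helper, and it deliberately ignores any non-numeric suffixes that real allele names may carry.

```python
import re

# One or more purely numeric fields separated by single colons, e.g. "01:01:01:01".
_FIELDS = re.compile(r"\d+(:\d+)*")


def looks_like_allele_fields(fields: str) -> bool:
    """Return True if the whole string is colon-separated numbers."""
    return _FIELDS.fullmatch(fields) is not None


print(looks_like_allele_fields("01:01:01:01"))
print(looks_like_allele_fields("01::01"))
```

Such a check says nothing about whether the allele actually exists; it only rejects strings whose shape is wrong.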

Parameters:
  • gene (str) – Potentially non-standardised MHC gene name.

  • species (str) – Species to which the MHC gene belongs (see above for supported species). Defaults to 'homosapiens'.

  • precision (str) – The maximum level of precision to standardise to. 'allele' standardises to the maximum precision possible. 'protein' keeps allele designators up to the level of the protein. 'gene' standardises only to the level of the gene. Defaults to 'allele'.

  • suppress_warnings (bool) – Disable warnings that are usually emitted when standardisation fails. Defaults to False.

  • gene_name (str) – Alias for the parameter gene.

Returns:

If species is supported and gene could be standardised, returns the standardised gene name. If species is unsupported, the function does not attempt standardisation and returns the gene string unaltered. Otherwise (species is supported but gene could not be standardised), returns None.

Return type:

str or None
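The three precision levels can be illustrated on an already standardised HLA name. This sketch only truncates a string and assumes the "GENE*field:field:..." layout; truncate_to_precision is a hypothetical helper, not a tidytcells function, and unlike standardise() it performs no validation or correction of the name.

```python
def truncate_to_precision(name: str, precision: str = "allele") -> str:
    """Cut a standardised allele name down to the requested precision."""
    if precision not in ("allele", "protein", "gene"):
        raise ValueError(f"unknown precision: {precision!r}")

    # Split "HLA-A*01:01:01:01" into the gene part and the allele fields.
    gene, sep, fields = name.partition("*")
    if precision == "gene" or not sep:
        return gene
    if precision == "protein":
        # The protein level corresponds to the first two allele designators.
        fields = ":".join(fields.split(":")[:2])
    return f"{gene}*{fields}"


for level in ("allele", "protein", "gene"):
    print(level, truncate_to_precision("HLA-A*01:01:01:01", level))
```

A name that carries fewer designators than the requested level is returned unchanged, which mirrors the description of precision as a maximum rather than a required level.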

tidytcells.mhc.standardize(*args, **kwargs)

Alias for tidytcells.mhc.standardise().