`gpmap.core`¶

The core container module exposes GenotypePhenotypeMap and the schema validator validate_encoding_table.

`GenotypePhenotypeMap`¶

class GenotypePhenotypeMap:
    def __init__(
        self,
        wildtype: str,
        genotypes: list[str] | np.ndarray,
        phenotypes: list[float] | np.ndarray,
        *,
        stdeviations: list[float] | np.ndarray | None = None,
        n_replicates: list[int] | np.ndarray | None = None,
        mutations: Mapping[int, list[str] | None] | None = None,
        site_labels: list[str] | None = None,
        include_binary: bool = True,
    ) -> None: ...

Parameters¶

wildtype (str, required): Reference sequence. Must be a non-empty string. All genotypes must have this length.
genotypes (list[str] | np.ndarray, required): Observed sequences, each of length len(wildtype). Will be coerced to a numpy object array of strings.
phenotypes (list[float] | np.ndarray, required): Measured phenotypes, one per genotype (same length as genotypes). Coerced to float64. If the coercion changes any value (for example, an int64 value that float64 cannot represent exactly, such as 2**53 + 1), a UserWarning is emitted once.
stdeviations (list[float] | np.ndarray | None, default None): Per-genotype standard deviations. Defaults to np.nan for every genotype when omitted.
n_replicates (list[int] | np.ndarray | None, default None): Per-genotype replicate counts. Defaults to 1 for every genotype.
mutations (Mapping[int, list[str] | None] | None, default None): Per-site alphabets. Keys must be 0..L-1, where L = len(wildtype). A None value marks a frozen site (only the wildtype letter is allowed). If omitted, the alphabet is inferred from observed letters.
site_labels (list[str] | None, default None): Optional per-site labels. Defaults to the site indices as strings, "0" through str(L - 1).
include_binary (bool, default True): If True, eagerly build binary_packed at construction. Set to False to defer the cost until the first access.
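
The alphabet rules for mutations can be illustrated with a small stand-alone sketch. The helpers infer_mutations and is_allowed are hypothetical and are not part of gpmap.core; they rely only on what the parameter descriptions state (keys 0..L-1, None for a frozen site, inference from observed letters). The letter ordering (wildtype first, then sorted) and the choice to mark an unvaried site as None are assumptions, not documented behavior.

```python
from typing import Dict, List, Mapping, Optional


def infer_mutations(wildtype: str, genotypes: List[str]) -> Dict[int, Optional[List[str]]]:
    """Hypothetical helper: infer per-site alphabets from observed letters."""
    if not wildtype:
        raise ValueError("wildtype must be a non-empty string")
    for g in genotypes:
        if len(g) != len(wildtype):
            raise ValueError(f"genotype {g!r} does not have length {len(wildtype)}")

    mutations: Dict[int, Optional[List[str]]] = {}
    for site, wt_letter in enumerate(wildtype):
        # Letters seen at this site other than the wildtype letter.
        observed = {g[site] for g in genotypes} - {wt_letter}
        # Assumption: a site that never varies is recorded as frozen (None).
        mutations[site] = [wt_letter, *sorted(observed)] if observed else None
    return mutations


def is_allowed(genotype: str, wildtype: str,
               mutations: Mapping[int, Optional[List[str]]]) -> bool:
    """Hypothetical helper: check a genotype against the per-site alphabets."""
    if len(genotype) != len(wildtype):
        return False
    for site, letter in enumerate(genotype):
        alphabet = mutations[site]
        # A frozen site (None) admits only the wildtype letter.
        allowed = [wildtype[site]] if alphabet is None else alphabet
        if letter not in allowed:
            return False
    return True


muts = infer_mutations("AA", ["AA", "AB"])
```

Here site 0 never varies, so the sketch marks it as frozen, while site 1 receives the two observed letters.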

Attributes¶

Attribute	Type	Notes
`wildtype`	`str`	Reference sequence
`genotypes`	`np.ndarray[object]`	Length n, dtype object of `str`
`phenotypes`	`np.ndarray[float64]`	Length n
`stdeviations`	`np.ndarray[float64]`	Length n, NaN by default
`n_replicates`	`np.ndarray[int64]`	Length n, 1 by default
`mutations`	`dict[int, list[str] | None]`	Per-site alphabets
`site_labels`	`list[str]`	Length L
`length`	`int`	L, the wildtype length
`n`	`int`	n, the number of genotypes
`encoding_table`	`pd.DataFrame`	Lazy, schema-locked, cached
`binary_packed`	`np.ndarray[uint8]`	Shape (n, n_bits), lazy, cached
`binary`	`np.ndarray[object]`	Length n, dtype object of '0'/'1' strings, lazy, cached
`n_mutations`	`np.ndarray[int64]`	Per-genotype Hamming weight
`data`	`pd.DataFrame`	Lazy, cached, see the `data` columns section
`stdeviation_map`	`StandardDeviationMap`	View returning raw stdeviations
`standard_error_map`	`StandardErrorMap`	View returning stdeviations / sqrt(n_replicates)
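
As a rough illustration of the quantity behind standard_error_map, the sketch below divides each standard deviation by the square root of its replicate count. The function standard_errors is a hypothetical stand-in written with the standard library only; it is not the library's StandardErrorMap view and says nothing about how that view is implemented.

```python
import math
from typing import List


def standard_errors(stdeviations: List[float], n_replicates: List[int]) -> List[float]:
    """Hypothetical helper: per-genotype standard error, std / sqrt(n)."""
    if len(stdeviations) != len(n_replicates):
        raise ValueError("stdeviations and n_replicates must have the same length")
    return [s / math.sqrt(n) for s, n in zip(stdeviations, n_replicates)]


# The last entry mimics the documented defaults: NaN deviation, one replicate.
stdeviations = [0.2, 0.3, math.nan]
n_replicates = [4, 9, 1]
errors = standard_errors(stdeviations, n_replicates)
```

With the default values (NaN deviation, one replicate), the result stays NaN, because NaN propagates through the division.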

`data` columns¶

The data property returns a DataFrame with these columns, in order:

Column	dtype
`genotypes`	`string`
`phenotypes`	`float64`
`stdeviations`	`float64`
`n_replicates`	`Int64`
`binary`	`string`
`n_mutations`	`Int64`

`from_dataframe`¶

@classmethod
def from_dataframe(
    cls,
    df: pd.DataFrame,
    *,
    wildtype: str | None = None,
    mutations: Mapping[int, list[str] | None] | None = None,
    site_labels: list[str] | None = None,
    include_binary: bool = True,
) -> GenotypePhenotypeMap

Hydrate a GenotypePhenotypeMap from a pandas DataFrame.

Required columns: genotypes, phenotypes. Optional: stdeviations, n_replicates. Other columns are ignored.

wildtype may be omitted if the DataFrame attrs carry a "wildtype" key. Otherwise it must be provided.
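
The column and wildtype rules above can be sketched without pandas. The helpers select_columns and resolve_wildtype are hypothetical; they operate on a plain list of column names and a plain attrs mapping. Letting an explicit wildtype argument take precedence over the attrs entry, and the exception types raised, are assumptions rather than documented behavior.

```python
from typing import List, Mapping, Optional, Sequence

REQUIRED_COLUMNS = ("genotypes", "phenotypes")
OPTIONAL_COLUMNS = ("stdeviations", "n_replicates")


def select_columns(columns: Sequence[str]) -> List[str]:
    """Hypothetical helper: keep recognised columns, ignore everything else."""
    missing = [c for c in REQUIRED_COLUMNS if c not in columns]
    if missing:
        raise KeyError(f"missing required columns: {missing}")
    return [c for c in (*REQUIRED_COLUMNS, *OPTIONAL_COLUMNS) if c in columns]


def resolve_wildtype(attrs: Mapping[str, object], wildtype: Optional[str] = None) -> str:
    """Hypothetical helper: pick the wildtype from the argument or from attrs."""
    # Assumption: an explicitly passed wildtype wins over the attrs entry.
    if wildtype is not None:
        return wildtype
    if "wildtype" in attrs:
        return str(attrs["wildtype"])
    raise ValueError("wildtype was not passed and attrs has no 'wildtype' key")


kept = select_columns(["genotypes", "phenotypes", "notes", "n_replicates"])
wt = resolve_wildtype({"wildtype": "AA"})
```

With a real DataFrame, the inputs would be list(df.columns) and df.attrs.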

`get_missing_genotypes`¶

def get_missing_genotypes(self) -> np.ndarray

Returns a NumPy object array of the genotype strings that belong to the full Cartesian product of the per-site alphabets in mutations but are absent from self.genotypes. The call is subject to the SpaceTooLargeError guard; for huge spaces, use enumerate_genotypes_str(..., allow_huge=True) directly.
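
The idea can be sketched with itertools.product. Everything below is a stand-alone illustration: missing_genotypes is a hypothetical function, the SpaceTooLargeError class is a local stand-in for the library's exception, and the max_genotypes threshold is an invented value, since the real limit is not stated here. The sketch returns a plain list rather than a NumPy object array.

```python
import itertools
import math
from typing import List, Mapping, Optional


class SpaceTooLargeError(Exception):
    """Local stand-in for the library's guard exception."""


def missing_genotypes(
    wildtype: str,
    genotypes: List[str],
    mutations: Mapping[int, Optional[List[str]]],
    max_genotypes: int = 1_000_000,  # hypothetical threshold
) -> List[str]:
    """Hypothetical helper: genotypes in the full space that were not observed."""
    # A frozen site (None) contributes only its wildtype letter.
    alphabets = [
        [wildtype[site]] if mutations[site] is None else list(mutations[site])
        for site in range(len(wildtype))
    ]
    # Check the size of the space before enumerating it.
    size = math.prod(len(alphabet) for alphabet in alphabets)
    if size > max_genotypes:
        raise SpaceTooLargeError(f"space has {size} genotypes, limit is {max_genotypes}")

    observed = set(genotypes)
    candidates = ("".join(letters) for letters in itertools.product(*alphabets))
    return [g for g in candidates if g not in observed]


missing = missing_genotypes("AA", ["AA", "AB"], {0: ["A", "B"], 1: ["A", "B"]})
```

Computing the size from the alphabet lengths first means the guard can fire without materialising the Cartesian product.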

`validate_encoding_table`¶

def validate_encoding_table(table: pd.DataFrame) -> None

Raises SchemaError if the table is missing required columns, has NaN in site_index or binary_index_stop, or has a binary_index_stop that disagrees with the implied total bit count. Returns None on success.

Use this when you build an encoding table externally (or read one from a third-party source) and want to confirm it matches the schema before feeding it into genotypes_to_binary_packed.
