Quickstart¶
Build a GenotypePhenotypeMap from a wildtype string, a list of genotypes, and their measured phenotypes. Everything else (binary encoding, encoding table, pandas view) is derived on demand.
Construct a map¶
from gpmap import GenotypePhenotypeMap
gpm = GenotypePhenotypeMap(
wildtype="AAA",
genotypes=["AAA", "AAT", "ATA", "TAA", "ATT", "TAT", "TTA", "TTT"],
phenotypes=[0.1, 0.2, 0.2, 0.6, 0.4, 0.6, 1.0, 1.1],
stdeviations=[0.05] * 8,
)
The constructor figures the per-site alphabet out from observed letters. Pass mutations={i: [...]} explicitly when you want to lock in an alphabet that does not appear in every column of your measured genotypes.
Inspect what you got back¶
gpm.genotypes # np.ndarray of strings, shape (8,)
gpm.phenotypes # np.ndarray[float64], shape (8,)
gpm.binary_packed # np.ndarray[uint8], shape (8, 3), the fast path
gpm.binary # np.ndarray of '0'/'1' strings, back-compat accessor
gpm.n_mutations # per-genotype Hamming weight
gpm.encoding_table # pandas DataFrame per SCHEMA.md
gpm.data # pandas DataFrame view for Jupyter
binary_packed is the recommended input for downstream consumers that need the encoded representation. It is computed once at construction (when include_binary=True, the default), then cached.
Inspecting in a notebook
gpm.data is the right thing to print in a notebook. It returns a pd.DataFrame view with genotypes, phenotypes, stdeviations, n_replicates, binary, and n_mutations columns, suitable for displaying inline.
Round-trip through JSON¶
from gpmap import to_json, read_json
to_json(gpm, "map.json")
gpm2 = read_json("map.json")
assert (gpm.genotypes == gpm2.genotypes).all()
assert (gpm.phenotypes == gpm2.phenotypes).all()
The JSON file carries schema_version so legacy v1 files still load with a UserWarning.
Round-trip through CSV¶
from gpmap import to_csv, read_csv
to_csv(gpm, "map.csv") # writes map.csv + map.csv.meta.json
gpm3 = read_csv("map.csv")
CSV stores the data columns; the sidecar map.csv.meta.json carries the wildtype, per-site alphabets, and site labels. Keep both files together.
Hydrate from a DataFrame¶
import pandas as pd
from gpmap import GenotypePhenotypeMap
df = pd.DataFrame({
"genotypes": ["AAA", "AAT", "ATA", "TAA"],
"phenotypes": [0.1, 0.2, 0.2, 0.6],
})
gpm = GenotypePhenotypeMap.from_dataframe(df, wildtype="AAA")
from_dataframe is also how you load the output of gpm.data back into a fresh map after manipulating it externally.
Simulate a landscape¶
For toy data, the gpmap.simulate subpackage provides a small zoo of generators:
import numpy as np
from gpmap.simulate import NKSimulation
sim = NKSimulation(
wildtype="AAAA",
mutations={i: ["A", "T"] for i in range(4)},
K=2,
rng=np.random.default_rng(0),
)
sim.phenotypes.shape # (16,)
Each simulator is a GenotypePhenotypeMap subclass, so anything you can do with gpm you can do with sim. See the simulators guide for the full menagerie (Mount Fuji, multi-peak Fuji, House of Cards, random, mask).
Where to next¶
- The container surface and invariants: Genotype-phenotype maps.
- The encoding-table contract for downstream consumers: Encoding table.
- File formats and round-tripping: Loading and saving.
- Toy landscapes: Simulators.