Quickstart

Read a structure once, then choose one of the three main MolScope paths: descriptors, graph ML, or coarse-grained beads.

import molscope as ms

mol = ms.read("examples/data/1fqy.pdb")
print(mol.summary())
mol.plot()

PDB to descriptors

features = mol.descriptors()
X, names = ms.featurize_many(["a.pdb", "b.pdb", "c.xyz"], return_names=True)

Use this path for quick structure summaries, batch QC, and classical ML tables.

PDB to graph/GNN

g = mol.to_graph()
G = mol.to_networkx()

Use this path for atom/bond message passing, residue-contact graphs, or framework exports such as PyTorch Geometric and DGL.
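To make the idea of a residue-contact graph concrete, here is a minimal standard-library sketch. It is not MolScope's implementation: the `contact_graph` helper, the 8 Å cutoff, and the coordinates are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import dist


def contact_graph(coords, cutoff=8.0):
    """Return an adjacency dict linking points that lie within `cutoff`.

    Hypothetical helper for illustration; not part of MolScope.
    """
    adjacency = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if dist(coords[i], coords[j]) <= cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


# Made-up residue centres in angstroms: three close together, one far away.
centres = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
graph = contact_graph(centres, cutoff=8.0)
print(graph)
```

The first three points end up mutually connected and the fourth stays isolated. A message-passing model would then exchange features along these edges.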

For a whole dataset, build_dataset reads a folder of structures, featurises them, joins the labels, and splits the result in one call. The resulting GraphDataset then feeds a training loop:

ds = ms.build_dataset("data/*.pdb", fmt="pyg", labels="labels.csv",
                      split=(0.8, 0.1, 0.1), cache_dir=".graph_cache")
scaler = ds.standardize_targets()                 # fit on train only
for batch in ds.loader("train", batch_size=32):   # batches come from a PyG/DGL DataLoader
    ...
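The "fit on train only" comment matters because scaling statistics computed over validation or test labels leak information into training. The sketch below shows the general pattern with the standard library only. The `fit_scaler` and `transform` helpers and the target values are hypothetical, and MolScope's scaler may differ in detail.

```python
from statistics import fmean, pstdev


def fit_scaler(train_targets):
    """Compute mean and standard deviation from the training targets only."""
    mean = fmean(train_targets)
    std = pstdev(train_targets) or 1.0  # fall back to 1.0 for constant targets
    return mean, std


def transform(values, mean, std):
    """Apply statistics that were fitted elsewhere; never refit here."""
    return [(v - mean) / std for v in values]


# Made-up regression targets for two splits.
train_y = [1.0, 2.0, 3.0]
val_y = [4.0]

mean, std = fit_scaler(train_y)          # statistics come from train only
train_z = transform(train_y, mean, std)
val_z = transform(val_y, mean, std)      # validation reuses the train statistics
print(train_z, val_z)
```

The validation value lands outside the range of the scaled training targets, which is expected: it is measured on the training scale rather than rescaled to look typical.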

To start from RCSB accession IDs instead of local files, use ms.fetch_dataset(ids, labels=...).

PDB to coarse-grained beads

cg = mol.coarse_grain("residue_com")
print(cg.mapping_report())

Use this path for reduced representations, mapping inspection, and bead-level graph prototypes. MolScope does not generate production simulation topologies.
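As an illustration of what a centre-of-mass mapping computes, the sketch below places one bead per residue at the mass-weighted mean of its atoms. It assumes that "residue_com" means residue centre of mass; the `residue_centres_of_mass` helper, the small mass table, and the atom records are hypothetical and not MolScope code.

```python
from collections import defaultdict

# Approximate atomic masses in daltons for the elements used below.
MASSES = {"C": 12.011, "N": 14.007, "O": 15.999}


def residue_centres_of_mass(atoms):
    """Map each residue id to the mass-weighted mean position of its atoms.

    `atoms` is an iterable of (residue_id, element, (x, y, z)) tuples.
    """
    # Per residue: [sum(m*x), sum(m*y), sum(m*z), sum(m)]
    totals = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])
    for residue_id, element, (x, y, z) in atoms:
        mass = MASSES[element]
        acc = totals[residue_id]
        acc[0] += mass * x
        acc[1] += mass * y
        acc[2] += mass * z
        acc[3] += mass
    return {
        rid: (sx / m, sy / m, sz / m)
        for rid, (sx, sy, sz, m) in totals.items()
    }


# Made-up atoms: residue 1 has two carbons, residue 2 a single oxygen.
atoms = [
    (1, "C", (0.0, 0.0, 0.0)),
    (1, "C", (2.0, 0.0, 0.0)),
    (2, "O", (5.0, 5.0, 5.0)),
]
beads = residue_centres_of_mass(atoms)
print(beads)
```

Each residue collapses to a single bead, so a mapping report for this toy input would list two atoms under bead 1 and one atom under bead 2.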

Supporting moves

Transformations return new molecules:

moved = mol.centered().rotate("z", 90).translate((1, 2, -1))
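Returning a new object from each transformation is what makes chaining safe: the original is never modified. The sketch below shows that pattern on a toy immutable class. `Points` and its methods are hypothetical stand-ins, and the conventions (angle in degrees, counterclockwise rotation about z, geometric centring) are assumptions that may not match MolScope.

```python
from dataclasses import dataclass
from math import cos, radians, sin


@dataclass(frozen=True)
class Points:
    """Toy immutable coordinate container; not a MolScope class."""

    coords: tuple

    def centered(self):
        n = len(self.coords)
        cx, cy, cz = (sum(p[i] for p in self.coords) / n for i in range(3))
        return Points(tuple((x - cx, y - cy, z - cz) for x, y, z in self.coords))

    def rotate_z(self, degrees):
        c, s = cos(radians(degrees)), sin(radians(degrees))
        return Points(tuple((c * x - s * y, s * x + c * y, z) for x, y, z in self.coords))

    def translate(self, shift):
        dx, dy, dz = shift
        return Points(tuple((x + dx, y + dy, z + dz) for x, y, z in self.coords))


original = Points(((0.0, 0.0, 0.0), (2.0, 0.0, 0.0)))
moved = original.centered().rotate_z(90).translate((1, 2, -1))

print(original.coords)  # still the input coordinates
print(moved.coords)
```

Because every step builds a fresh object, intermediate results can be kept, compared, or discarded without worrying about side effects on `original`.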

Read all models from an NMR PDB file:

models = ms.read_pdb_models("examples/data/1aml.pdb")
matrix = ms.ensemble.rmsd_matrix(models[:5])
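For readers unfamiliar with RMSD matrices, the sketch below computes pairwise root-mean-square deviations between small coordinate sets. It deliberately skips superposition, and whether MolScope aligns models before measuring is not stated here, so treat this as an illustration of the output shape rather than of the library's method. The `rmsd` and `pairwise_rmsd` helpers and the toy models are hypothetical.

```python
from math import sqrt


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length coordinate lists.

    No superposition is applied; the coordinates are compared as given.
    """
    if len(a) != len(b):
        raise ValueError("models must have the same number of atoms")
    squared = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return sqrt(squared / len(a))


def pairwise_rmsd(models):
    """Return a symmetric matrix of RMSD values with a zero diagonal."""
    n = len(models)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = rmsd(models[i], models[j])
    return matrix


# Made-up two-atom models, each shifted 1 angstrom further along z.
toy_models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
]
toy_matrix = pairwise_rmsd(toy_models)
for row in toy_matrix:
    print(row)
```

The matrix is symmetric with zeros on the diagonal, and larger entries mark model pairs that differ more.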