Ensemble Analysis¶
Read all models from an NMR PDB file:
models = ms.read_pdb_models("examples/data/1aml.pdb")
Compute ensemble descriptors:
from molscope import ensemble
aligned = ensemble.align_all(models)
avg = ensemble.average(models)
rmsf = ensemble.rmsf(models)
matrix = ensemble.rmsd_matrix(models)
Cluster structures by RMSD:
result = ensemble.cluster(models, n_clusters=3)
result.labels
result.representatives()
Contact frequency across models:
freq = ms.ensemble_contact_frequency(models, cutoff=8.0)
freq.plot()
Concerted motions (dynamical cross-correlation)¶
contact_frequency tells you which contacts form, but not whether parts of
the structure move in a coordinated way. The dynamical cross-correlation matrix
(DCCM) answers that: each entry is the correlation of two atoms' displacements
about their mean positions, from +1 (moving together in lockstep) through 0
(uncorrelated) to -1 (moving in opposite directions). Coupled off-diagonal
blocks are the classic fingerprint of allosteric communication.
import molscope as ms
models = ms.read_pdb_models("examples/data/1aml.pdb")
ca = [m.alpha_carbons() for m in models] # residue-level DCCM
corr = ms.cross_correlation(ca) # (n_residues, n_residues), in [-1, 1]
ms.plot_cross_correlation(corr)
Structures are Kabsch-superposed onto the first model first (align=True) so
that rigid-body tumbling does not swamp the internal motion, exactly as rmsf
does. Omit the alpha-carbon selection to get an all-atom map. It is a few NumPy
operations over the coordinate stack: lightweight, but O(N²) in memory, so
prefer the residue-level (alpha-carbon) map for large systems.
Streaming trajectory-lite analysis¶
The functions above take a list of models held in memory. For a long trajectory,
analyze_stream walks the frames in a single pass and keeps only the
reference frame in memory, tracking a few scalars per frame:
analysis = ms.analyze_stream("trajectory.pdb", secondary_structure=True)
analysis.radius_of_gyration # (n_frames,) Rg per frame
analysis.rmsd # (n_frames,) RMSD to the first frame (rmsd[0] == 0)
analysis.helix_fraction # helix/strand/coil fractions (proteins; else None)
analysis.summary() # means, spread, and drift
analysis.plot() # Rg / RMSD / SS timeline panels
source is a path to a multi-frame file (multi-model PDB, multi-frame XYZ, or
multi-record SDF, streamed via ms.stream) or any iterable of Molecule
frames. RMSD is Kabsch-superposed over selection — "auto" (C-alphas when
present, else all atoms), "ca", or "all". Frames must share the first frame's
atom count.
Lite timeline, not a trajectory engine
analyze_stream reads the multi-frame formats MolScope already reads and
computes a handful of scalars. It does not read binary MD trajectories
(DCD/XTC/TRR), unwrap periodic boundaries, or track time/topology across
frames. For those, use a dedicated trajectory library such as MDAnalysis or
MDTraj.