
GraphIDGenerator

Core Python implementation with full parameter control.

API Reference

graph_id.core.graph_id.GraphIDGenerator

Core Python implementation of Graph ID generation.

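GraphIDGenerator converts atomic structures into deterministic identifiers by analyzing the topological and compositional properties of the structure graph. Structures with equivalent structure graphs receive the same ID, which is what duplicate detection with this class relies on.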

The algorithm:

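  1. Constructs a graph where atoms are nodes and edges connect the neighbors found by the nn strategy (MinimumDistanceNN by default)
  2. Computes compositional sequences for each atom (local environment fingerprints)
  3. Iteratively refines the sequences until the number of distinct sequences stops changing
  4. Hashes the sorted sequences of each connected component, then hashes the sorted component hashes with BLAKE2b to produce the final ID, which by default is prefixed with the reduced formula and the dimensionality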
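As a rough illustration of steps 2 to 4, the sketch below refines node labels on a toy graph and hashes the result. It is not the library's implementation: it uses a plain adjacency dict instead of a pymatgen StructureGraph, a Weisfeiler-Lehman-style neighbor-label update stands in for compositional sequences, and refine_labels and graph_hash are hypothetical names. Like the source below, it stops refining once the number of distinct labels no longer changes and hashes a sorted, joined label string with BLAKE2b.

```python
from hashlib import blake2b


def refine_labels(adjacency, labels):
    """Refine node labels until the number of distinct labels stops changing.

    adjacency: {node: [neighbor, ...]} for an undirected graph
    labels:    {node: initial label}, e.g. element symbols
    """
    num_unique = len(set(labels.values()))
    while True:
        new_labels = {}
        for node, neighbors in adjacency.items():
            # A node's new label combines its own label with its sorted neighbor labels
            neighbor_part = ",".join(sorted(labels[nb] for nb in neighbors))
            signature = f"{labels[node]}|{neighbor_part}"
            # Hash the signature so labels stay short from one round to the next
            new_labels[node] = blake2b(signature.encode("ascii"), digest_size=8).hexdigest()
        labels = new_labels
        new_unique = len(set(labels.values()))
        if new_unique == num_unique:
            return labels
        num_unique = new_unique


def graph_hash(adjacency, elements, digest_size=8):
    """Hash the sorted multiset of refined labels into a hex string."""
    labels = refine_labels(adjacency, dict(elements))
    long_str = "-".join(sorted(labels.values()))
    return blake2b(long_str.encode("ascii"), digest_size=digest_size).hexdigest()


# A four-atom Na-Cl-Na-Cl chain, numbered two different ways
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
renumbered = {"c": ["a"], "a": ["c", "d"], "d": ["a", "b"], "b": ["d"]}
h1 = graph_hash(path, {0: "Na", 1: "Cl", 2: "Na", 3: "Cl"})
h2 = graph_hash(renumbered, {"c": "Na", "a": "Cl", "d": "Na", "b": "Cl"})
print(h1 == h2)  # True: atom numbering does not affect the hash
```

Because the labels are sorted before hashing, the numbering of the atoms does not matter; as with any fixed-size digest, hash collisions remain theoretically possible.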

Examples:

>>> from pymatgen.core import Structure
>>> from graph_id.core.graph_id import GraphIDGenerator
>>> structure = Structure.from_file("NaCl.cif")
>>> gen = GraphIDGenerator()
>>> gen.get_id(structure)
'NaCl-3D-88c8e156db1b0fd9'

See Also

GraphIDMaker : High-level interface with simpler API
DistanceClusteringGraphID : Variant using distance clustering

Source code in graph_id/core/graph_id.py
class GraphIDGenerator:

    """Core Python implementation of Graph ID generation.

    GraphIDGenerator converts atomic structures into unique, deterministic identifiers
    by analyzing the topological and compositional properties of the structure graph.

    The algorithm:

    1. Constructs a graph where atoms are nodes and bonds are edges
    2. Computes compositional sequences for each atom (local environment fingerprints)
    3. Iteratively refines sequences until convergence
    4. Hashes the sequences to produce the final ID

    Examples
    --------
    >>> from pymatgen.core import Structure
    >>> from graph_id.core.graph_id import GraphIDGenerator
    >>> structure = Structure.from_file("NaCl.cif")
    >>> gen = GraphIDGenerator()
    >>> gen.get_id(structure)
    'NaCl-3D-88c8e156db1b0fd9'

    See Also
    --------
    GraphIDMaker : High-level interface with simpler API
    DistanceClusteringGraphID : Variant using distance clustering

    """

    def __init__(  # noqa: PLR0913
        self,
        nn=None,
        wyckoff=False,
        diameter_factor=2,
        additional_depth=1,
        symmetry_tol=0.1,
        topology_only=False,
        loop=False,
        digest_size=8,
        prepend_composition=True,
        prepend_dimensionality=True,
    ):
        """Initialize the GraphIDGenerator.

        Parameters
        ----------
        nn : NearNeighbors, optional
            A neighbor-finding strategy from pymatgen.analysis.local_env.
            If None, defaults to MinimumDistanceNN().
        wyckoff : bool, default False
            If True, include Wyckoff position information in the ID.
            Cannot be used together with ``loop=True``.
        diameter_factor : int, default 2
            Multiplier for the graph diameter to determine traversal depth.
            The total depth is ``diameter_factor * diameter + additional_depth``.
        additional_depth : int, default 1
            Extra depth added to the calculated traversal depth.
        symmetry_tol : float, default 0.1
            Tolerance for symmetry operations when detecting Wyckoff positions.
            Only used when ``wyckoff=True``.
        topology_only : bool, default False
            If True, generate topology-only IDs that ignore element types.
            Useful for finding isostructural materials.
            Cannot be used together with ``loop=True``.
        loop : bool, default False
            If True, use loop/ring-based identification algorithm.
            Cannot be used together with ``wyckoff=True`` or ``topology_only=True``.
        digest_size : int, default 8
            Size of the BLAKE2b hash digest in bytes.
            The output will be ``2 * digest_size`` hexadecimal characters.
        prepend_composition : bool, default True
            If True, prepend the reduced chemical formula to the ID.
        prepend_dimensionality : bool, default True
            If True, prepend the dimensionality (0D, 1D, 2D, 3D) to the ID.

        Raises
        ------
        ValueError
            If incompatible options are specified:

            - ``wyckoff=True`` and ``loop=True``
            - ``loop=True`` and ``topology_only=True``

        Examples
        --------
        >>> gen = GraphIDGenerator()  # Default settings
        >>> gen = GraphIDGenerator(topology_only=True)  # Topology-only
        >>> gen = GraphIDGenerator(wyckoff=True, symmetry_tol=0.01)  # With Wyckoff
        >>> gen = GraphIDGenerator(diameter_factor=3, additional_depth=2)  # Deeper traversal

        """
        if wyckoff and loop:
            msg = "wyckoff and loop cannot be True at the same time"
            raise ValueError(msg)

        if loop and topology_only:
            msg = "loop and topology_only cannot be True at the same time"
            raise ValueError(msg)

        if nn is None:
            self.nn = MinimumDistanceNN()
        else:
            self.nn = nn

        self.wyckoff = wyckoff
        self.additional_depth = additional_depth
        self.diameter_factor = diameter_factor
        self.symmetry_tol = symmetry_tol
        self.topology_only = topology_only
        self.loop = loop
        self.digest_size = digest_size
        self.prepend_composition = prepend_composition
        self.prepend_dimensionality = prepend_dimensionality

    def _join_cs_list(self, cs_list):
        """Join and hash a list of compositional sequences."""
        return blake("-".join(sorted(cs_list)))

    def _component_strings_to_whole_id(self, component_strings):
        """Combine component hashes into a single ID."""
        long_str = ":".join(np.sort(component_strings))
        return blake2b(long_str.encode("ascii"), digest_size=self.digest_size).hexdigest()

    def get_id(self, structure):
        """Generate a Graph ID for the given structure.

        Parameters
        ----------
        structure : Structure
            A pymatgen Structure object representing the crystal or molecule.

        Returns
        -------
        str
            The Graph ID. Format depends on configuration:

            - Default: ``"{formula}-{dim}D-{hash}"``
            - ``prepend_composition=False``: ``"{dim}D-{hash}"``
            - ``prepend_dimensionality=False``: ``"{formula}-{hash}"``
            - Both False: ``"{hash}"``

        Examples
        --------
        >>> gen = GraphIDGenerator()
        >>> gen.get_id(nacl_structure)
        'NaCl-3D-88c8e156db1b0fd9'

        >>> gen = GraphIDGenerator(prepend_composition=False)
        >>> gen.get_id(nacl_structure)
        '3D-88c8e156db1b0fd9'

        """
        sg = self.prepare_structure_graph(structure)
        n = len(sg.cc_cs)
        array = np.empty(
            [
                n,
            ],
            dtype=object,
        )
        for i, component in enumerate(sg.cc_cs):
            array[i] = self._join_cs_list(component["cs_list"])
        gid = self._component_strings_to_whole_id(array)

        return self.elaborate_comp_dim(sg, gid)

    def elaborate_comp_dim(self, sg, gid):
        """Add composition and dimensionality prefixes to a Graph ID.

        Parameters
        ----------
        sg : StructureGraph
            The prepared structure graph.
        gid : str
            The base Graph ID hash.

        Returns
        -------
        str
            The elaborated Graph ID with prefixes.

        """
        if self.prepend_dimensionality:
            dim = get_dimensionality_larsen(sg)
            gid = f"{dim}D-{gid}"

        if self.prepend_composition and not self.topology_only:
            gid = f"{sg.structure.composition.reduced_formula}-{gid}"

        return gid

    def get_id_catch_error(self, structure):
        """Generate a Graph ID with error handling.

        Parameters
        ----------
        structure : Structure
            A pymatgen Structure object.

        Returns
        -------
        str
            The Graph ID, or an empty string if an error occurs.

        Notes
        -----
        This method catches all exceptions silently and returns an empty string.
        Useful for batch processing where some structures may fail.

        """
        try:
            return self.get_id(structure)
        except Exception:  # noqa: BLE001
            return ""

    def get_many_ids(self, structures, parallel=False):
        """Generate Graph IDs for multiple structures.

        Parameters
        ----------
        structures : list of Structure
            A list of pymatgen Structure objects.
        parallel : bool, default False
            If True, use parallel processing with all available CPU cores.
            Shows a progress bar via tqdm.

        Returns
        -------
        list of str
            A list of Graph IDs corresponding to each input structure.
            With ``parallel=True``, structures that fail yield an empty
            string; with ``parallel=False``, the first failure raises.

        Examples
        --------
        >>> gen = GraphIDGenerator()
        >>> structures = [Structure.from_file(f) for f in cif_files]
        >>> ids = gen.get_many_ids(structures, parallel=True)

        """
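        if parallel:
            n_cores = multi.cpu_count()

            # The context manager terminates the worker processes once every
            # result has been collected, instead of leaving the pool open.
            with Pool(n_cores) as p:
                imap = p.imap(self.get_id_catch_error, structures)
                return list(tqdm(imap, total=len(structures)))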

        return [self.get_id(s) for s in structures]

    def get_component_ids(self, structure):
        """Get Graph IDs for each connected component in the structure.

        For structures with multiple disconnected fragments (e.g., molecular
        crystals), this returns a separate ID for each component.

        Parameters
        ----------
        structure : Structure
            A pymatgen Structure object.

        Returns
        -------
        numpy.ndarray
            Array of dictionaries, each containing:

            - ``site_i``: Set of site indices in this component
            - ``graph_id``: The Graph ID for this component

        Examples
        --------
        >>> gen = GraphIDGenerator()
        >>> components = gen.get_component_ids(molecular_crystal)
        >>> for comp in components:
        ...     print(f"Sites {comp['site_i']}: {comp['graph_id']}")

        """
        sg = self.prepare_structure_graph(structure)
        cc_gid = np.empty(
            [
                len(sg.cc_cs),
            ],
            dtype=object,
        )
        for i, component in enumerate(sg.cc_cs):
            each_long_str = blake("-".join(sorted(component["cs_list"])))
            gid = blake2b(each_long_str.encode("ascii"), digest_size=16).hexdigest()
            cc_gid[i] = {"site_i": component["site_i"], "graph_id": gid}

        return cc_gid

    def _molecule_to_structure(self, mol: Molecule, vacuum: float = 10.0) -> Structure:
        """Embed a Molecule in an orthorhombic box with ``vacuum`` Å of padding on every side."""
        coords = mol.cart_coords
        species = mol.species

        min_c = coords.min(axis=0)
        max_c = coords.max(axis=0)
        lengths = max_c - min_c + 2 * vacuum

        lattice = Lattice.from_parameters(lengths[0], lengths[1], lengths[2], 90, 90, 90)

        shifted_coords = coords - min_c + vacuum

        return Structure(lattice, species, shifted_coords, coords_are_cartesian=True)

    def get_merged_id(self, materials_list: list[Structure | Molecule | Atoms]):
        """Generate a merged Graph ID for multiple materials.

        This method computes Graph IDs for all connected components found in
        the input materials and merges them into a single identifier. The
        connected components extracted from each material are converted into
        canonical strings, sorted, concatenated with ``:`` separators, and
        hashed using BLAKE2b to produce the final merged Graph ID.

        The input materials may be provided as pymatgen ``Structure`` objects,
        pymatgen ``Molecule`` objects, or ASE ``Atoms`` objects. Molecules and
        ASE atoms are internally converted to ``Structure`` objects before
        graph analysis.

        Parameters
        ----------
        materials_list : list of Structure or Molecule or Atoms
            A list of materials to be included in the merged Graph ID. Each
            item must be one of the following types:

            - ``pymatgen.core.Structure``
            - ``pymatgen.core.Molecule``
            - ``ase.Atoms``

        Returns
        -------
        str
            A hexadecimal string representing the merged Graph ID of all
            connected components found in the input materials.

        Raises
        ------
        TypeError
            If an item in ``materials_list`` is not a supported material type.

        Examples
        --------
        >>> gen = GraphIDGenerator()
        >>> gid = gen.get_merged_id([structure1, structure2])
        >>> print(gid)

        The method can also accept mixed object types:

        >>> gid = gen.get_merged_id([structure, molecule, ase_atoms])
        >>> print(gid)

        """
        array_list = []
        for material in materials_list:
            if isinstance(material, Structure):
                structure = material
            elif isinstance(material, Molecule):
                structure = self._molecule_to_structure(material)
            elif isinstance(material, Atoms):
                structure = AseAtomsAdaptor.get_structure(material)
            else:
                error_message = (
                    "Item of materials_list must be pymatgen.core.Structure, "
                    f"pymatgen.core.Molecule, or ase.Atoms, got {type(material).__name__}"
                )
                raise TypeError(error_message)

            sg = self.prepare_structure_graph(structure)
            n = len(sg.cc_cs)
            array = np.empty(
                [
                    n,
                ],
                dtype=object,
            )
            for i, component in enumerate(sg.cc_cs):
                array[i] = self._join_cs_list(component["cs_list"])
            array_list.extend(array)

        long_str = ":".join(np.sort(array_list))

        return blake2b(long_str.encode("ascii"), digest_size=self.digest_size).hexdigest()

    def are_same(self, structure1, structure2):
        """Check if two structures have the same Graph ID.

        Parameters
        ----------
        structure1 : Structure
            The first pymatgen Structure object.
        structure2 : Structure
            The second pymatgen Structure object.

        Returns
        -------
        bool
            True if both structures have identical Graph IDs, False otherwise.

        Examples
        --------
        >>> gen = GraphIDGenerator()
        >>> if gen.are_same(struct1, struct2):
        ...     print("Structures are topologically equivalent")

        """
        return self.get_id(structure1) == self.get_id(structure2)

    def prepare_structure_graph(self, structure):
        """Build and prepare the structure graph with compositional sequences.

        This method constructs a graph representation of the structure,
        computes compositional sequences for each site, and iteratively
        refines them until convergence.

        Parameters
        ----------
        structure : Structure
            A pymatgen Structure object.

        Returns
        -------
        StructureGraph
            The prepared structure graph with compositional sequence node
            attributes. The graph also has a ``cc_cs`` attribute containing
            the compositional sequences for each connected component.

        Notes
        -----
        This is primarily an internal method, but can be useful for
        advanced analysis of the structure graph.

        """
        sg = StructureGraph.from_local_env_strategy(structure, self.nn)
        use_previous_cs = False

        compound = sg.structure
        prev_num_uniq = len(compound.composition)

        if self.topology_only:
            for site_i in range(len(sg.structure)):
                sg.structure.replace(site_i, Element("H"))

        if self.wyckoff:
            sg.set_wyckoffs(symmetry_tol=self.symmetry_tol)

            # remove nx?
            prev_num_uniq = len(list(set(nx.get_node_attributes(sg.graph, "compositional_sequence").values())))

        elif self.loop:
            sg.set_loops(
                diameter_factor=self.diameter_factor,
                additional_depth=self.additional_depth,
            )

        else:
            sg.set_elemental_labels()

        while True:
            sg.set_compositional_sequence_node_attr(
                hash_cs=True,
                wyckoff=self.wyckoff,
                additional_depth=self.additional_depth,
                diameter_factor=self.diameter_factor,
                use_previous_cs=use_previous_cs or self.wyckoff,
            )

            num_unique_nodes = len(list(set(nx.get_node_attributes(sg.graph, "compositional_sequence").values())))
            use_previous_cs = True

            if prev_num_uniq == num_unique_nodes:
                return sg

            prev_num_uniq = num_unique_nodes

    def get_unique_structures(self, structures: list[Structure]) -> list[Structure]:
        """Filter a list of structures to keep only unique ones.

        Removes duplicate structures based on their Graph IDs. When duplicates
        are found, only the first occurrence is kept.

        Parameters
        ----------
        structures : list of Structure
            A list of pymatgen Structure objects, possibly containing duplicates.

        Returns
        -------
        list of Structure
            A list containing only unique structures (first occurrence of each).

        Examples
        --------
        >>> gen = GraphIDGenerator()
        >>> all_structures = load_many_cifs()
        >>> unique = gen.get_unique_structures(all_structures)
        >>> print(f"Reduced {len(all_structures)} to {len(unique)} unique")

        """
        unique_structures = []
        graph_ids = set()

        for strct in structures:
            new_graph_id = self.get_id(strct)
            if new_graph_id not in graph_ids:
                graph_ids.add(new_graph_id)
                unique_structures.append(strct)

        return unique_structures
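In get_merged_id above, the component strings of all input materials are pooled with extend, sorted, joined with ":" and hashed. Two consequences follow from that code: the order of materials_list does not affect the result, and passing the same material twice changes it, because duplicate components are not collapsed. The sketch below reproduces only that final pooling step on placeholder strings; merged_hash is a hypothetical name, and Python's sorted stands in for np.sort.

```python
from hashlib import blake2b


def merged_hash(component_lists, digest_size=8):
    """Pool per-component strings from several materials and hash them.

    component_lists: one list of component strings per material
    """
    pooled = []
    for components in component_lists:
        pooled.extend(components)  # duplicates are kept, not collapsed
    long_str = ":".join(sorted(pooled))
    return blake2b(long_str.encode("ascii"), digest_size=digest_size).hexdigest()


material_a = ["comp-x", "comp-y"]  # placeholder component strings
material_b = ["comp-z"]
print(merged_hash([material_a, material_b]) == merged_hash([material_b, material_a]))  # True
print(merged_hash([material_a]) == merged_hash([material_a, material_a]))  # False
```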

Methods:

__init__(nn=None, wyckoff=False, diameter_factor=2, additional_depth=1, symmetry_tol=0.1, topology_only=False, loop=False, digest_size=8, prepend_composition=True, prepend_dimensionality=True)

Initialize the GraphIDGenerator.

Parameters:

  • nn (NearNeighbors, default None): A neighbor-finding strategy from pymatgen.analysis.local_env. If None, defaults to MinimumDistanceNN().
  • wyckoff (bool, default False): If True, include Wyckoff position information in the ID. Cannot be used together with loop=True.
  • diameter_factor (int, default 2): Multiplier for the graph diameter to determine traversal depth. The total depth is diameter_factor * diameter + additional_depth.
  • additional_depth (int, default 1): Extra depth added to the calculated traversal depth.
  • symmetry_tol (float, default 0.1): Tolerance for symmetry operations when detecting Wyckoff positions. Only used when wyckoff=True.
  • topology_only (bool, default False): If True, generate topology-only IDs that ignore element types. Useful for finding isostructural materials. Cannot be used together with loop=True.
  • loop (bool, default False): If True, use the loop/ring-based identification algorithm. Cannot be used together with wyckoff=True or topology_only=True.
  • digest_size (int, default 8): Size of the BLAKE2b hash digest in bytes. The hash part of the ID has 2 * digest_size hexadecimal characters.
  • prepend_composition (bool, default True): If True, prepend the reduced chemical formula to the ID. Has no effect when topology_only=True.
  • prepend_dimensionality (bool, default True): If True, prepend the dimensionality (0D, 1D, 2D, 3D) to the ID.

Raises:

ValueError: If incompatible options are specified:

  • wyckoff=True and loop=True
  • loop=True and topology_only=True

Examples:

>>> gen = GraphIDGenerator()  # Default settings
>>> gen = GraphIDGenerator(topology_only=True)  # Topology-only
>>> gen = GraphIDGenerator(wyckoff=True, symmetry_tol=0.01)  # With Wyckoff
>>> gen = GraphIDGenerator(diameter_factor=3, additional_depth=2)  # Deeper traversal
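The diameter_factor and additional_depth parameters combine as diameter_factor * diameter + additional_depth. The worked sketch below evaluates that formula on a small toy graph. It assumes "diameter" means the ordinary graph diameter (the longest shortest path, counted in edges) of a finite connected graph; how graph_id measures the diameter of a periodic structure graph is not shown on this page, and graph_diameter and traversal_depth are hypothetical helpers.

```python
from collections import deque


def graph_diameter(adjacency):
    """Longest shortest-path length, in edges, of a connected undirected graph."""

    def eccentricity(source):
        # Breadth-first search gives shortest-path distances from `source`
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adjacency[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        return max(dist.values())

    return max(eccentricity(node) for node in adjacency)


def traversal_depth(adjacency, diameter_factor=2, additional_depth=1):
    # Formula quoted from the diameter_factor parameter description
    return diameter_factor * graph_diameter(adjacency) + additional_depth


chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # diameter 3
print(traversal_depth(chain))  # 2 * 3 + 1 = 7
print(traversal_depth(chain, diameter_factor=3, additional_depth=2))  # 3 * 3 + 2 = 11
```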

get_id(structure)

Generate a Graph ID for the given structure.

Parameters:

  • structure (Structure, required): A pymatgen Structure object representing the crystal or molecule.

Returns:

str: The Graph ID. Format depends on configuration:

  • Default: "{formula}-{dim}D-{hash}"
  • prepend_composition=False: "{dim}D-{hash}"
  • prepend_dimensionality=False: "{formula}-{hash}"
  • Both False: "{hash}"
  • topology_only=True: the formula is never prepended, so the default becomes "{dim}D-{hash}"

Here {hash} has 2 * digest_size hexadecimal characters.

Examples:

>>> gen = GraphIDGenerator()
>>> gen.get_id(nacl_structure)
'NaCl-3D-88c8e156db1b0fd9'
>>> gen = GraphIDGenerator(prepend_composition=False)
>>> gen.get_id(nacl_structure)
'3D-88c8e156db1b0fd9'
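The ID layout follows from get_id and elaborate_comp_dim in the source above: component strings are sorted, joined with ":" and hashed with BLAKE2b, then the dimensionality and, unless topology_only=True, the reduced formula are prepended. The sketch below rebuilds only that layout from placeholder inputs. assemble_graph_id is a hypothetical helper: it takes the dimensionality as a plain integer instead of computing it with get_dimensionality_larsen, and it does not reproduce real hash values such as the NaCl example.

```python
from hashlib import blake2b


def assemble_graph_id(formula, dim, component_strings, digest_size=8,
                      prepend_composition=True, prepend_dimensionality=True,
                      topology_only=False):
    """Rebuild the '{formula}-{dim}D-{hash}' layout from precomputed parts."""
    long_str = ":".join(sorted(component_strings))
    gid = blake2b(long_str.encode("ascii"), digest_size=digest_size).hexdigest()
    if prepend_dimensionality:
        gid = f"{dim}D-{gid}"
    # As in elaborate_comp_dim, topology_only suppresses the formula prefix
    if prepend_composition and not topology_only:
        gid = f"{formula}-{gid}"
    return gid


parts = ["placeholder-a", "placeholder-b"]
print(assemble_graph_id("NaCl", 3, parts).split("-")[:2])  # ['NaCl', '3D']
print(assemble_graph_id("NaCl", 3, parts, topology_only=True).startswith("3D-"))  # True
print(len(assemble_graph_id("NaCl", 3, parts, digest_size=4,
                            prepend_composition=False, prepend_dimensionality=False)))  # 8
```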

get_id_catch_error(structure)

Generate a Graph ID with error handling.

Parameters:

  • structure (Structure, required): A pymatgen Structure object.

Returns:

str: The Graph ID, or an empty string if an error occurs.

Notes

This method catches any Exception raised by get_id and returns an empty string instead of raising. Useful for batch processing where some structures may fail.


get_many_ids(structures, parallel=False)

Generate Graph IDs for multiple structures.

Parameters:

  • structures (list of Structure, required): A list of pymatgen Structure objects.
  • parallel (bool, default False): If True, use parallel processing with all available CPU cores and show a progress bar via tqdm.

Returns:

list of str: A list of Graph IDs corresponding to each input structure. With parallel=True, structures that fail yield an empty string; with parallel=False, the first failure raises its exception.

Examples:

>>> gen = GraphIDGenerator()
>>> structures = [Structure.from_file(f) for f in cif_files]
>>> ids = gen.get_many_ids(structures, parallel=True)
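In the source above, only the parallel branch of get_many_ids goes through get_id_catch_error; the serial branch calls get_id directly, so one failing structure raises. To get the empty-string convention without starting a process pool, a caller can apply get_id_catch_error in a plain loop, as sketched below. StandInGenerator and ids_with_failures are hypothetical; with the real library, pass a GraphIDGenerator instance instead.

```python
class StandInGenerator:
    """Hypothetical stand-in for GraphIDGenerator so the example runs without pymatgen."""

    def get_id(self, structure):
        if structure is None:
            raise ValueError("could not build a structure graph")
        return f"ID-{structure}"

    def get_id_catch_error(self, structure):
        # Same pattern as GraphIDGenerator.get_id_catch_error
        try:
            return self.get_id(structure)
        except Exception:
            return ""


def ids_with_failures(gen, structures):
    """Serial loop that never raises: returns every ID plus the indices that failed."""
    ids = [gen.get_id_catch_error(s) for s in structures]
    failed = [i for i, gid in enumerate(ids) if gid == ""]
    return ids, failed


ids, failed = ids_with_failures(StandInGenerator(), ["a", None, "b"])
print(ids)     # ['ID-a', '', 'ID-b']
print(failed)  # [1]
```

Since a successful Graph ID is never empty, an empty string unambiguously marks a failure.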

are_same(structure1, structure2)

Check if two structures have the same Graph ID.

Parameters:

  • structure1 (Structure, required): The first pymatgen Structure object.
  • structure2 (Structure, required): The second pymatgen Structure object.

Returns:

bool: True if both structures have identical Graph IDs, False otherwise.

Examples:

>>> gen = GraphIDGenerator()
>>> if gen.are_same(struct1, struct2):
...     print("Structures are topologically equivalent")

get_unique_structures(structures: list[Structure]) -> list[Structure]

Filter a list of structures to keep only unique ones.

Removes duplicate structures based on their Graph IDs. When duplicates are found, only the first occurrence is kept.

Parameters:

  • structures (list of Structure, required): A list of pymatgen Structure objects, possibly containing duplicates.

Returns:

list of Structure: A list containing only unique structures (first occurrence of each).

Examples:

>>> gen = GraphIDGenerator()
>>> all_structures = load_many_cifs()
>>> unique = gen.get_unique_structures(all_structures)
>>> print(f"Reduced {len(all_structures)} to {len(unique)} unique")
Source code in graph_id/core/graph_id.py
def get_unique_structures(self, structures: list[Structure]) -> list[Structure]:
    """Filter a list of structures to keep only unique ones.

    Removes duplicate structures based on their Graph IDs. When duplicates
    are found, only the first occurrence is kept.

    Parameters
    ----------
    structures : list of Structure
        A list of pymatgen Structure objects, possibly containing duplicates.

    Returns
    -------
    list of Structure
        A list containing only unique structures (first occurrence of each).

    Examples
    --------
    >>> gen = GraphIDGenerator()
    >>> all_structures = load_many_cifs()
    >>> unique = gen.get_unique_structures(all_structures)
    >>> print(f"Reduced {len(all_structures)} to {len(unique)} unique")

    """
    unique_structures = []
    graph_ids = set()

    for strct in structures:
        new_graph_id = self.get_id(strct)
        if new_graph_id not in graph_ids:
            graph_ids.add(new_graph_id)
            unique_structures.append(strct)

    return unique_structures
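get_unique_structures keeps the first structure for each Graph ID and discards the rest. You may also need to know which inputs collapsed together. The following minimal sketch covers that case. `group_by_id` is a hypothetical helper, not part of graph_id, and it accepts any ID function, for example `gen.get_id`. Like get_unique_structures, it computes each ID once per item. Calling are_same on every pair would recompute both IDs for each comparison.

```python
from collections import defaultdict
from typing import Callable, Hashable, Sequence, TypeVar

T = TypeVar("T")


def group_by_id(items: Sequence[T], get_id: Callable[[T], Hashable]) -> dict[Hashable, list[int]]:
    """Map each ID to the indices of all items that share it (hypothetical helper)."""
    groups: dict[Hashable, list[int]] = defaultdict(list)
    for index, item in enumerate(items):
        # get_id is called exactly once per item
        groups[get_id(item)].append(index)
    # Keys keep first-seen order because dicts preserve insertion order
    return dict(groups)


# Stand-in ID function for illustration; with graph_id you would pass gen.get_id
names = ["NaCl", "KCl", "nacl"]
print(group_by_id(names, str.lower))  # {'nacl': [0, 2], 'kcl': [1]}
```

With a generator, `groups = group_by_id(all_structures, gen.get_id)` maps each Graph ID to the input indices that share it. `[all_structures[idx[0]] for idx in groups.values()]` then gives the same first-occurrence list that get_unique_structures returns.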

get_component_ids(structure)

Get Graph IDs for each connected component in the structure.

For structures with multiple disconnected fragments (e.g., molecular crystals), this returns a separate ID for each component.

Parameters:

Name Type Description Default
structure Structure

A pymatgen Structure object.

required

Returns:

Type Description
ndarray

Array of dictionaries, each containing:

  • site_i: Set of site indices in this component
  • graph_id: The Graph ID for this component

Examples:

>>> gen = GraphIDGenerator()
>>> components = gen.get_component_ids(molecular_crystal)
>>> for comp in components:
...     print(f"Sites {comp['site_i']}: {comp['graph_id']}")
Source code in graph_id/core/graph_id.py
def get_component_ids(self, structure):
    """Get Graph IDs for each connected component in the structure.

    For structures with multiple disconnected fragments (e.g., molecular
    crystals), this returns a separate ID for each component.

    Parameters
    ----------
    structure : Structure
        A pymatgen Structure object.

    Returns
    -------
    numpy.ndarray
        Array of dictionaries, each containing:

        - ``site_i``: Set of site indices in this component
        - ``graph_id``: The Graph ID for this component

    Examples
    --------
    >>> gen = GraphIDGenerator()
    >>> components = gen.get_component_ids(molecular_crystal)
    >>> for comp in components:
    ...     print(f"Sites {comp['site_i']}: {comp['graph_id']}")

    """
    sg = self.prepare_structure_graph(structure)
    cc_gid = np.empty(
        [
            len(sg.cc_cs),
        ],
        dtype=object,
    )
    for i, component in enumerate(sg.cc_cs):
        # Sort the component's compositional sequences so the ID does not depend on site order
        each_long_str = blake("-".join(sorted(component["cs_list"])))
        gid = blake2b(each_long_str.encode("ascii"), digest_size=16).hexdigest()
        cc_gid[i] = {"site_i": component["site_i"], "graph_id": gid}

    return cc_gid
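The two hashing steps in get_component_ids can be reproduced with `hashlib` alone. This minimal sketch makes one assumption: the module-level `blake` helper, which this page does not show, returns the hex BLAKE2b digest of its string argument. If the real helper differs, the exact IDs will also differ. The overall shape of the computation stays the same. Sorting makes the result independent of site order, and `digest_size=16` produces a 32-character hex string.

```python
from hashlib import blake2b


def blake(s: str) -> str:
    # ASSUMPTION: stand-in for graph_id's own `blake` helper (not shown on this page)
    return blake2b(s.encode("utf-8")).hexdigest()


def component_graph_id(cs_list: list[str]) -> str:
    """Hash one component's compositional sequences, following get_component_ids."""
    # Sorting removes any dependence on the order in which sites were listed
    each_long_str = blake("-".join(sorted(cs_list)))
    # A 16-byte digest is rendered as 32 hexadecimal characters
    return blake2b(each_long_str.encode("ascii"), digest_size=16).hexdigest()


print(component_graph_id(["b", "a"]) == component_graph_id(["a", "b"]))  # True
```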

prepare_structure_graph(structure)

Build and prepare the structure graph with compositional sequences.

This method constructs a graph representation of the structure, computes compositional sequences for each site, and iteratively refines them until convergence.

Parameters:

Name Type Description Default
structure Structure

A pymatgen Structure object.

required

Returns:

Type Description
StructureGraph

The prepared structure graph with compositional sequence node attributes. The graph also has a cc_cs attribute containing the compositional sequences for each connected component.

Notes

This is primarily an internal method, but can be useful for advanced analysis of the structure graph.

Source code in graph_id/core/graph_id.py
def prepare_structure_graph(self, structure):
    """Build and prepare the structure graph with compositional sequences.

    This method constructs a graph representation of the structure,
    computes compositional sequences for each site, and iteratively
    refines them until convergence.

    Parameters
    ----------
    structure : Structure
        A pymatgen Structure object.

    Returns
    -------
    StructureGraph
        The prepared structure graph with compositional sequence node
        attributes. The graph also has a ``cc_cs`` attribute containing
        the compositional sequences for each connected component.

    Notes
    -----
    This is primarily an internal method, but can be useful for
    advanced analysis of the structure graph.

    """
    sg = StructureGraph.from_local_env_strategy(structure, self.nn)
    use_previous_cs = False

    compound = sg.structure
    prev_num_uniq = len(compound.composition)

    if self.topology_only:
        for site_i in range(len(sg.structure)):
            sg.structure.replace(site_i, Element("H"))

    if self.wyckoff:
        sg.set_wyckoffs(symmetry_tol=self.symmetry_tol)

        # Baseline: number of distinct compositional_sequence values after Wyckoff labelling
        prev_num_uniq = len(list(set(nx.get_node_attributes(sg.graph, "compositional_sequence").values())))

    elif self.loop:
        sg.set_loops(
            diameter_factor=self.diameter_factor,
            additional_depth=self.additional_depth,
        )

    else:
        sg.set_elemental_labels()

    while True:
        sg.set_compositional_sequence_node_attr(
            hash_cs=True,
            wyckoff=self.wyckoff,
            additional_depth=self.additional_depth,
            diameter_factor=self.diameter_factor,
            use_previous_cs=use_previous_cs or self.wyckoff,
        )

        num_unique_nodes = len(list(set(nx.get_node_attributes(sg.graph, "compositional_sequence").values())))
        use_previous_cs = True

        if prev_num_uniq == num_unique_nodes:
            return sg

        prev_num_uniq = num_unique_nodes
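The loop above stops when a refinement pass no longer increases the number of distinct compositional sequences. Weisfeiler-Lehman-style color refinement uses the same stopping rule, and the sketch below shows that idea on a plain adjacency dictionary. It is an illustration only, not graph_id's algorithm. It does not use StructureGraph, periodic images, Wyckoff labels, or loop labels. The label format is an assumption chosen for the example: the current label plus the sorted neighbor labels, hashed with BLAKE2b.

```python
from hashlib import blake2b


def refine_labels(adjacency: dict[int, list[int]], labels: dict[int, str]) -> dict[int, str]:
    """Refine node labels until the number of distinct labels stops changing."""
    prev_num_unique = len(set(labels.values()))
    while True:
        new_labels = {}
        for node, neighbors in adjacency.items():
            # Signature = own label + sorted multiset of neighbor labels
            signature = labels[node] + "|" + ",".join(sorted(labels[n] for n in neighbors))
            new_labels[node] = blake2b(signature.encode("utf-8"), digest_size=8).hexdigest()
        labels = new_labels
        num_unique = len(set(labels.values()))
        if num_unique == prev_num_unique:
            return labels
        prev_num_unique = num_unique


# A three-site chain with identical initial labels, as in topology-only mode
chain = {0: [1], 1: [0, 2], 2: [1]}
refined = refine_labels(chain, {n: "H" for n in chain})
print(refined[0] == refined[2], refined[0] == refined[1])  # True False
```

Each new label includes the node's previous label, so a pass can split groups of nodes but never merge them. An unchanged count therefore means the grouping itself has stopped changing, which makes the count a safe convergence test.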

Quick Example

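from pymatgen.core import Structure
from graph_id.core.graph_id import GraphIDGenerator

structure = Structure.from_file("NaCl.cif")  # any structure file pymatgen can read
gen = GraphIDGenerator()
graph_id = gen.get_id(structure)  # e.g. 'NaCl-3D-88c8e156db1b0fd9'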

Common Configurations

Topology-Only Mode

gen = GraphIDGenerator(topology_only=True)
# Output format: "{dim}D-{hash}" (no formula)

With Wyckoff Positions

gen = GraphIDGenerator(wyckoff=True, symmetry_tol=0.1)

Minimal Output

gen = GraphIDGenerator(
    prepend_composition=False,
    prepend_dimensionality=False
)
# Output format: "{hash}" (hash only)

Output Format

Configuration Format
Default {formula}-{dim}D-{hash}
prepend_dimensionality=False {formula}-{hash}
prepend_composition=False {dim}D-{hash}
Both False {hash}
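The fields are joined with hyphens, so you can split an ID back into its parts if you know which prepend options produced it. The sketch below is a hypothetical helper, not part of graph_id. It assumes that neither the reduced formula nor the dimensionality field contains a hyphen. It returns the dimensionality with its `D` suffix, as the field appears in the ID.

```python
def parse_graph_id(
    graph_id: str,
    prepend_composition: bool = True,
    prepend_dimensionality: bool = True,
) -> dict[str, str]:
    """Split a Graph ID into its fields (hypothetical helper; assumes no '-' inside fields)."""
    parts = graph_id.split("-")
    expected = 1 + int(prepend_composition) + int(prepend_dimensionality)
    if len(parts) != expected:
        raise ValueError(f"expected {expected} '-'-separated fields, got {len(parts)}: {graph_id!r}")
    fields = {}
    if prepend_composition:
        fields["formula"] = parts[0]
    if prepend_dimensionality:
        # The dimensionality always sits just before the hash
        fields["dimensionality"] = parts[-2]
    fields["hash"] = parts[-1]
    return fields


print(parse_graph_id("NaCl-3D-88c8e156db1b0fd9"))
# {'formula': 'NaCl', 'dimensionality': '3D', 'hash': '88c8e156db1b0fd9'}
```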

Invalid Combinations

  • wyckoff=True + loop=True → ValueError
  • loop=True + topology_only=True → ValueError
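In the prepare_structure_graph source above, `self.wyckoff` is checked before `self.loop`. If both flags were set, the `elif self.loop` branch would never run. You might want to validate user-supplied settings, for example from a config file, before any structure is processed. The following minimal sketch checks the two documented rules. `validate_options` is a hypothetical helper. This page does not say where GraphIDGenerator itself raises the ValueError or what its message says.

```python
def validate_options(wyckoff: bool = False, loop: bool = False, topology_only: bool = False) -> None:
    """Raise ValueError for the option combinations listed as invalid (hypothetical helper)."""
    if wyckoff and loop:
        raise ValueError("wyckoff=True cannot be combined with loop=True")
    if loop and topology_only:
        raise ValueError("loop=True cannot be combined with topology_only=True")


for options in ({"wyckoff": True, "loop": True}, {"loop": True, "topology_only": True}, {"loop": True}):
    try:
        validate_options(**options)
        print(options, "OK")
    except ValueError as err:
        print(options, "->", err)
```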

See Also