SMILES Tutorial: Atoms


Atom specification

The SMILES atom specification sublanguage represents these atomic properties: element identity, isotope, formal charge, and implicit hydrogen count. The syntax for this sublanguage is:


   atom : '[' <mass> symbol <chiral> <hcount> <sign<charge>> ']'
        ;
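
The grammar above can be sketched as a regular expression. The following Python fragment is a minimal illustration, not Daylight's implementation: the names `BRACKET_ATOM`, `Atom`, and `parse_bracket_atom` are hypothetical, and the chirality field is assumed to be `@` or `@@`, which this section does not define. Fields that are absent fall back to the bracket defaults (unspecified mass, charge 0, hydrogen count 0).

```python
import re
from typing import NamedTuple, Optional

# Mirrors: '[' <mass> symbol <chiral> <hcount> <sign<charge>> ']'
BRACKET_ATOM = re.compile(
    r"""
    \[
    (?P<mass>\d+)?                            # optional leading isotope mass
    (?P<symbol>\*|[A-Z][a-z]?|[a-z]{1,2})     # symbol, star, or lower-case symbol
    (?P<chiral>@{1,2})?                       # ASSUMPTION: chirality written as @ or @@
    (?:H(?P<hcount>\d*))?                     # optional hydrogen count; bare H means 1
    (?P<charge>\+\d+|-\d+|\++|-+)?            # +2, -1, or a repeated sign such as ++
    \]
    """,
    re.VERBOSE,
)


class Atom(NamedTuple):
    symbol: str
    mass: Optional[int]    # None means "unspecified mass"
    chiral: Optional[str]
    hcount: int
    charge: int


def parse_bracket_atom(text: str) -> Atom:
    """Parse one bracketed atom such as '[OH-]' into its fields."""
    m = BRACKET_ATOM.fullmatch(text)
    if m is None:
        raise ValueError(f"not a bracket atom: {text!r}")

    mass = int(m["mass"]) if m["mass"] else None

    if m["hcount"] is None:        # no H written at all
        hcount = 0
    else:                          # 'H' alone counts as one hydrogen
        hcount = int(m["hcount"]) if m["hcount"] else 1

    raw = m["charge"]
    if raw is None:
        charge = 0
    else:
        sign = 1 if raw[0] == "+" else -1
        digits = raw.lstrip("+-")
        # With digits, use the value; otherwise count the repeated signs.
        charge = sign * (int(digits) if digits else len(raw))

    return Atom(m["symbol"], mass, m["chiral"], hcount, charge)
```

With this sketch, `parse_bracket_atom("[Fe++]")` and `parse_bracket_atom("[Fe+2]")` return the same `Atom`, each with a charge of 2.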

Elemental identity is represented by a standard atomic symbol, which is required for every atom. The second letter of a two-character symbol is lower case, e.g., "Br", not "BR". Two-character symbols are also used for the transactinides (Rf, Ha, and Sg). Any property not specified inside the brackets takes its default: unspecified mass, zero charge, and a hydrogen count of zero. The atomic symbols used by SMILES are listed below as a periodic table:

Table 2. Periodic table with atomic symbols.
  1a  2a  3b  4b  5b  6b  7b   8   8   8  1b  2b  3a  4a  5a  6a  7a   0

   1                                                                   2
   H                                                                  He

   3   4                                           5   6   7   8   9  10
  Li  Be                                           B   C   N   O   F  Ne

  11  12                                          13  14  15  16  17  18
  Na  Mg                                          Al  Si   P   S  Cl  Ar

  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
   K  Ca  Sc  Ti   V  Cr  Mn  Fe  Co  Ni  Cu  Zn  Ga  Ge  As  Se  Br  Kr

  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
  Rb  Sr   Y  Zr  Nb  Mo  Tc  Ru  Rh  Pd  Ag  Cd  In  Sn  Sb  Te   I  Xe

  55  56  57  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86
  Cs  Ba  La  Hf  Ta   W  Re  Os  Ir  Pt  Au  Hg  Tl  Pb  Bi  Po  At  Rn

  87  88  89 104 105 106
  Fr  Ra  Ac  Rf  Ha  Sg

              58  59  60  61  62  63  64  65  66  67  68  69  70  71
              Ce  Pr  Nd  Pm  Sm  Eu  Gd  Tb  Dy  Ho  Er  Tm  Yb  Lu

              90  91  92  93  94  95  96  97  98  99 100 101 102 103
              Th  Pa   U  Np  Pu  Am  Cm  Bk  Cf  Es  Fm  Md  No  Lr

Elements in the following "organic subset" typically have well-defined valences and may be written without brackets if the number of attached hydrogens conforms to the lowest normal valence consistent with the explicit bonds. The subset, with the normal valences of each element in parentheses, is:

B(3), C(4), N(3,5), O(2), P(3,5), S(2,4,6), F(1), Cl(1), Br(1), I(1)
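
The rule above can be illustrated with a short Python sketch. It assumes the caller already knows the sum of the orders of the atom's explicit bonds. The names `NORMAL_VALENCES` and `implicit_hydrogens` are hypothetical, and returning 0 when the bonds exceed every listed valence is an assumption, not something this section specifies.

```python
# Normal valences of the organic subset, as listed above.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol: str, bond_order_sum: int) -> int:
    """Hydrogens implied for an unbracketed organic-subset atom.

    Picks the lowest normal valence that is at least the sum of the
    explicit bond orders and fills the remainder with hydrogens.
    """
    for valence in NORMAL_VALENCES[symbol]:   # valences are listed in ascending order
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    # ASSUMPTION: if the explicit bonds exceed every normal valence,
    # no hydrogens are added.
    return 0
```

An isolated `C` has no explicit bonds, so `implicit_hydrogens("C", 0)` gives 4 (methane); likewise `P` gives 3, `S` gives 2, and `Cl` gives 1.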

Advanced issues

The full SMILES language allows sp2-hybridized atoms to be indicated by writing the atomic symbol in lower case. This seemingly strange convention is explained in the section on aromaticity.

The symbol '*' ("star") is treated by SMILES as a valid atomic symbol meaning "unspecified atomic number" and is represented as an atom of atomic number zero.
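
As a rough illustration of these two conventions, the Python sketch below maps a symbol to an atomic number, with the star mapped to zero and a lower-case symbol resolved to the same element as its capitalized form. The table is deliberately truncated to a few elements, and the names `ATOMIC_NUMBERS` and `atomic_number` are hypothetical.

```python
# A small excerpt of the symbol table; '*' stands for "unspecified atomic number".
ATOMIC_NUMBERS = {
    "*": 0, "H": 1, "C": 6, "N": 7, "O": 8, "S": 16,
    "Fe": 26, "Br": 35, "Au": 79, "U": 92,
}


def atomic_number(symbol: str) -> int:
    """Look up the atomic number for a SMILES atomic symbol.

    A fully lower-case symbol (e.g. 'c') names the same element as its
    capitalized form. Other spellings, such as 'BR', are rejected.
    """
    key = symbol.capitalize() if symbol.islower() else symbol
    try:
        return ATOMIC_NUMBERS[key]
    except KeyError:
        raise ValueError(f"unknown atomic symbol: {symbol!r}") from None
```

Here `atomic_number("*")` is 0, `atomic_number("c")` and `atomic_number("C")` are both 6, and `atomic_number("BR")` raises `ValueError`.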

Examples

Table 3. Atomic specification in SMILES.
SMILES            Name               Remark
[S]               elemental sulfur   Defaults inside brackets: mass unspecified, charge 0, hcount 0.
[Au]              elemental gold     Second character of two-character symbols is lower case.
C                 methane            Normal valence of carbon is 4.
P                 phosphine          Lowest normal valence of phosphorus is 3.
S                 hydrogen sulfide   Lowest normal valence of sulfur is 2.
Cl                hydrogen chloride  Lowest normal valence of the halogens is 1.
[OH-] or [OH-1]   hydroxide anion    If the charge value is missing, 1 is assumed: '+' is equivalent to '+1' and '-' to '-1'.
[Fe+2] or [Fe++]  iron(II) cation    The charge sign may be repeated or carry a value: '++' is equivalent to '+2'.
[235U]            uranium-235        A leading integer specifies the atomic mass.
[*+2]             not a molecule     An atom of unknown atomic number with a +2 formal charge.

Daylight Chemical Information Systems, Inc.
info@daylight.com