The SMILES atom specification sublanguage represents the following atomic properties: element identity, isotope, formal charge, and implicit hydrogen count. The syntax for this sublanguage is:
atom : '[' <mass> symbol <chiral> <hcount> <sign<charge>> ']' ;
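As a rough illustration, the grammar rule above can be approximated by a regular expression that splits a bracketed atom into its raw fields. This is a minimal sketch, not a reference parser: the names `BRACKET_ATOM` and `tokenize_bracket_atom` are hypothetical, chirality is simplified to `@` or `@@`, and the symbol is checked only for its shape, not against the list of valid elements.

```python
import re

# Hypothetical sketch of the bracket-atom grammar as a regular expression.
# Assumptions: chirality is limited to "@" or "@@", and the symbol is only
# checked for shape (it is not looked up in the periodic table).
BRACKET_ATOM = re.compile(
    r"""
    \[
    (?P<mass>\d+)?                          # optional isotope mass
    (?P<symbol>[A-Z][a-z]?|[a-z]{1,2}|\*)   # element, lowercase form, or star
    (?P<chiral>@{1,2})?                     # simplified chirality mark
    (?P<hcount>H\d*)?                       # attached hydrogens
    (?P<charge>[+-]\d+|\++|-+)?             # signed value or repeated sign
    \]
    """,
    re.VERBOSE,
)


def tokenize_bracket_atom(text):
    """Return the raw fields of a bracketed atom; absent fields are None."""
    match = BRACKET_ATOM.fullmatch(text)
    if match is None:
        raise ValueError(f"not a bracketed atom: {text!r}")
    return match.groupdict()


if __name__ == "__main__":
    for example in ("[S]", "[OH-]", "[Fe+2]", "[235U]", "[*+2]"):
        print(example, tokenize_bracket_atom(example))
```

The fields are returned as raw strings so that the interpretation of each field (defaults, charge values) stays separate from the syntax.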
Elemental identity is represented by a standard atomic symbol, which is required for every atom. The second letter of two-character symbols is lower case, e.g., "Br", not "BR". Two-character symbols are used for the transactinides (Rf, Ha, and Sg). Properties omitted inside brackets take default values: unspecified mass, a charge of 0, and a hydrogen count of 0. Here's a list of the atomic symbols used by SMILES, as a periodic table:
| 1a | 2a | 3b | 4b | 5b | 6b | 7b | 8 | 8 | 8 | 1b | 2b | 3a | 4a | 5a | 6a | 7a | 0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 H | | | | | | | | | | | | | | | | | 2 He |
| 3 Li | 4 Be | | | | | | | | | | | 5 B | 6 C | 7 N | 8 O | 9 F | 10 Ne |
| 11 Na | 12 Mg | | | | | | | | | | | 13 Al | 14 Si | 15 P | 16 S | 17 Cl | 18 Ar |
| 19 K | 20 Ca | 21 Sc | 22 Ti | 23 V | 24 Cr | 25 Mn | 26 Fe | 27 Co | 28 Ni | 29 Cu | 30 Zn | 31 Ga | 32 Ge | 33 As | 34 Se | 35 Br | 36 Kr |
| 37 Rb | 38 Sr | 39 Y | 40 Zr | 41 Nb | 42 Mo | 43 Tc | 44 Ru | 45 Rh | 46 Pd | 47 Ag | 48 Cd | 49 In | 50 Sn | 51 Sb | 52 Te | 53 I | 54 Xe |
| 55 Cs | 56 Ba | 57 La | 72 Hf | 73 Ta | 74 W | 75 Re | 76 Os | 77 Ir | 78 Pt | 79 Au | 80 Hg | 81 Tl | 82 Pb | 83 Bi | 84 Po | 85 At | 86 Rn |
| 87 Fr | 88 Ra | 89 Ac | 104 Rf | 105 Ha | 106 Sg | | | | | | | | | | | | |
| | | | 58 Ce | 59 Pr | 60 Nd | 61 Pm | 62 Sm | 63 Eu | 64 Gd | 65 Tb | 66 Dy | 67 Ho | 68 Er | 69 Tm | 70 Yb | 71 Lu | |
| | | | 90 Th | 91 Pa | 92 U | 93 Np | 94 Pu | 95 Am | 96 Cm | 97 Bk | 98 Cf | 99 Es | 100 Fm | 101 Md | 102 No | 103 Lr | |

Elements in the following "organic subset" typically have well-defined valence and may be written without brackets if the number of attached hydrogens conforms to the lowest normal valence consistent with explicit bonds. The elements of this subset, and their lowest normal valences, are: B (3), C (4), N (3), O (2), P (3), S (2), F (1), Cl (1), Br (1), and I (1).
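To make the hydrogen rule for unbracketed atoms concrete, here is a minimal sketch that computes the number of attached hydrogens from the sum of an atom's explicit bond orders. The function name `implicit_hydrogens` and the table `NORMAL_VALENCES` are hypothetical, and the higher valences listed for N, P, and S are an assumption drawn from common SMILES documentation rather than from this text.

```python
# Normal valences for organic-subset elements, lowest first.
# Assumption: the higher values for N, P, and S come from common SMILES
# documentation; they are not stated in the surrounding text.
NORMAL_VALENCES = {
    "B": (3,),
    "C": (4,),
    "N": (3, 5),
    "O": (2,),
    "P": (3, 5),
    "S": (2, 4, 6),
    "F": (1,),
    "Cl": (1,),
    "Br": (1,),
    "I": (1,),
}


def implicit_hydrogens(symbol, bond_order_sum=0):
    """Hydrogens implied for an unbracketed atom.

    Picks the lowest normal valence that is at least as large as the sum
    of explicit bond orders and fills the remainder with hydrogens. If the
    explicit bonds exceed every normal valence, no hydrogens are added.
    """
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0


if __name__ == "__main__":
    # An isolated atom has no explicit bonds, so bond_order_sum is 0.
    for symbol in ("C", "P", "S", "Cl"):
        print(symbol, implicit_hydrogens(symbol))
```

With no explicit bonds, this yields four hydrogens for `C` (methane), three for `P`, two for `S`, and one for `Cl`.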
The full SMILES language allows sp2-hybridized atoms to be indicated by writing the atomic symbol in lower case. This seemingly strange convention is explained in the section on aromaticity.
The symbol `*` ("star") is treated by SMILES as a valid atomic symbol meaning "unspecified atomic number" and is represented as an atom of atomic number zero.

SMILES | Name | Remark |
---|---|---|
[S] | elemental sulfur | Defaults inside brackets: mass unspecified, charge 0, hcount 0. |
[Au] | elemental gold | Second character of two-character symbols is lower case. |
C | methane | Normal valence of carbon is 4. |
P | phosphine | Lowest normal valence of phosphorus is 3. |
S | hydrogen sulfide | Lowest normal valence of sulfur is 2. |
Cl | hydrogen chloride | Lowest normal valence of the halogens is 1. |
[OH-] or [OH-1] | hydroxide anion | If the charge value is missing, 1 is assumed, i.e., `+` is equivalent to `+1` and `-` is equivalent to `-1`. |
[Fe+2] or [Fe++] | iron(II) cation | The charge sign may be repeated or have a signed value, e.g., `++` is equivalent to `+2`. |
[235U] | uranium-235 | A leading integer represents a specified atomic mass. |
[*+2] | not a molecule | An atom of unknown atomic number with a +2 formal charge. |
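
The charge conventions in the table (a bare sign means 1, a repeated sign counts its repetitions, a signed value is taken literally) can be sketched as a small conversion function. The name `charge_value` is hypothetical, and rejecting mixed forms such as `+-` or `++2` is an assumption of this sketch.

```python
import re


def charge_value(token):
    """Convert a SMILES charge token to an integer formal charge.

    An empty or missing token gives the default charge of 0.
    """
    if not token:
        return 0
    # Signed value, e.g. "+2" or "-1".
    signed = re.fullmatch(r"([+-])(\d+)", token)
    if signed:
        magnitude = int(signed.group(2))
        return magnitude if signed.group(1) == "+" else -magnitude
    # Bare or repeated sign, e.g. "+", "++", "-", "--".
    if set(token) == {"+"}:
        return len(token)
    if set(token) == {"-"}:
        return -len(token)
    # Assumption: anything else (mixed signs, "++2") is treated as invalid.
    raise ValueError(f"unrecognized charge token: {token!r}")


if __name__ == "__main__":
    for token in ("", "+", "+1", "-", "-1", "++", "+2"):
        print(repr(token), charge_value(token))
```

Under this reading, `[OH-]` and `[OH-1]` carry the same charge of -1, and `[Fe++]` and `[Fe+2]` both carry +2.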