Monday, June 10, 2013

Hyphenation and Chemical Names - Guest Post by Albert Dijkstra

Dividing and Rules - What do you do when you reach the end of the line?

Proofreading is an integral part of publishing, and checking whether words have been properly divided at the end of a line is an integral part of proofreading. I always check these divisions at the end of my proofreading since previous corrections may have affected the need to divide or the likely location of the division. It is not surprising that German and French printers have difficulty in dividing English words correctly, but when I recently read the proofs of a French chapter on oils and fats to be printed in France, incorrect word divisions accounted for over a third of the corrections and comments I made. One of the possible reasons is that most of these incorrect divisions concern chemical names. These names can be long and are thus more likely to have to be divided. Besides, there are hardly any rules telling people how to divide chemical names[1]. It is the purpose of this contribution to propose novel, unambiguous, and generally applicable dividing rules for chemical names.

When finding out how to divide an English word properly, I never rely on my memory but look up every word concerned in Webster’s since this dictionary tells me how to divide individual words. Collins, my English dictionary, and the Oxford English Dictionary, which I have on hard disk, provide no guidance in this respect. Moreover, I find it very difficult to remember what strikes me as somewhat arbitrary.

Take the words analyze and analysis. They have the same origins and you might therefore expect them to be divided identically. No way: an-a-lyze and anal-ysis. Having learned Greek at grammar school[2], I recognize from which elements these words were assembled[3]. To me, it is only to be expected that a word combining several elements should be divided between these elements: ana-ly . . . but Webster is apparently not aware of this.

Different languages have different ways of dealing with the division of words at the end of a line. Their needs also differ. Some languages such as German and Dutch combine elements into a single word whereas English maintains these elements as separate words[4]. The French language also tends to keep them separate but often reverses their order.

Language   Lemma (a)
German     Sandbad
Dutch      Zandbad
English    Sand bath
French     Bain de sable

(a) Lemma: citation form of a word.

So for the English language, the need to divide words at the end of a line is of less importance than for some other languages, and this may well be the reason that systematic chemical names, which can be very long indeed, pose a problem. Accordingly, I think we could do with some clear rules describing the ways we should divide chemical names at the end of a line. The division problem is aggravated because these systematic names often contain hyphens and, according to word processing software, these hyphens show where a word is to be divided if it happens to be longer than the space left on the line. As a result, these programs cause chemical names to be divided in an illogical manner. To discuss this aspect in more detail, I first include definitions of the various hyphens[5]:
  • Normal hyphen. This is the hyphen that results from typing a hyphen. It shows on screen and it is printed as such. If it happens to be at the end of the line, it just functions as a hyphen.
  • Soft hyphen. Formerly, authors just handed their handwritten articles to their secretaries. Nowadays, most authors submit their manuscripts themselves, and publishers often only accept electronic versions. So these poor authors must (learn how to) work with a computer and word processing software and perhaps even devote some attention to copyediting. The latter means that they can make life easier for the typesetter by indicating where a word can be divided. Authors can do this by inserting a so-called soft hyphen, which shows up in print as a hyphen when the word is divided at that location but does not show up in print when the word is not divided. In MSWord, a soft hyphen can be inserted by typing “Ctrl + hyphen.” When Show/Hide (Ctrl + *, or the ¶ button) is switched on to reveal the formatting codes, the soft hyphen shows up on screen as a hyphen with a small line descending from its right side. Like any letter, the soft hyphen can be deleted by using Delete or Backspace. (Microsoft Word 2010 refers to this as an optional hyphen.)
  • Hard hyphen. This hyphen is not regarded as a possible dividing point by the word processing software. Consequently, a line never ends on a hard hyphen. In MSWord, it can be introduced into a word by typing “Ctrl + Shift + hyphen”; it shows on screen like a slightly elongated hyphen and it is also printed as a hyphen. (Microsoft Word 2010 refers to this as a nonbreaking hyphen.)
  • Dash (long hyphen; also called an en-dash). According to IUPAC (Recommendations 1993) R‑, this should be used to connect the names of components of an addition compound, and the example used is carbon monoxide–borane.
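
For readers who work with plain text rather than a word processor, the four characters above have Unicode counterparts. The short Python sketch below lists them. The dictionary name HYPHENS is only an illustrative choice, and how a particular word processor stores these characters internally may differ from the code points shown.

```python
import unicodedata

# Unicode counterparts of the four characters described in the list above.
# The labels follow the terms used in this article.
HYPHENS = {
    "normal hyphen": "\u002D",  # prints, and a line may end on it
    "soft hyphen": "\u00AD",    # prints only when the word is divided here
    "hard hyphen": "\u2011",    # prints, but a line never ends on it
    "dash": "\u2013",           # en dash, as in carbon monoxide–borane
}

for label, char in HYPHENS.items():
    # Show the code point and the official Unicode character name.
    print(f"{label:14} U+{ord(char):04X}  {unicodedata.name(char)}")
```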

My wife is English, and when I discussed with her how to divide words at the end of a line, she pointed out that the bit left at the end of the line should preferably indicate how the entire word should be pronounced. So the words li-bel and li-cense have the hyphen after the “i,” indicating that this “i” will be pronounced as in “light.” Words like lib-eral and lim-onene are divided after the letter following the “i,” causing the “i” to be pronounced as in “fish.” We agreed that this approach might well stem from the lack of phoneticism in the English language[6]. This led us to discover why The ACS Style Guide divides meth-yl the way it does, whereas a slightly longer alkyl group is divided as pro-pyl. That is just American pronunciation. When I pronounce propyl, it rhymes with se-nile and not with dev-il. This discovery reinforced my notion that there is a need for a multilingual method of dividing chemical names at the end of a line.

The ACS Style Guide (3rd edition, page 247) has an example of a chemical name, 5-(2-chloroethyl)-9-(diaminomethyl)-2-anthracenol, and it indicates all the places (15 in all) where this word can be divided:

[Figure: the example name with its 15 possible dividing points marked]

I disagree with this approach and therefore want to propose the following novel end-of-line dividing system for systematic chemical names:
1.    If a systematic chemical name must be divided at the end of a line, this can be done by a hyphen separating its “moieties.” So what do I call a moiety in the present context? What about a descriptor of a chemical structure? In the above example, there are three moieties: 5-(2-chloroethyl), 9-(diaminomethyl), and 2-anthracenol. As you can see, I have included the numbers in the anthracene molecule where the groups are attached.
2.    Moieties themselves can be divided at the end of a line between “terms” unless this makes them awkward to pronounce or a generally accepted way of hyphenation can be applied. So how do I define a “chemical term”? This can be an element such as hydrogen; a root such as palmit, cholester, methyl, or amino; a prefix such as ortho, mono, or iso; or a suffix such as anoic, ate, or amide. I exclude numbers indicating positions (such as in 4-methyl) and letters indicating isomers (such as in α‑tocopherol). Terms mean something to a chemist. Because this proposal keeps terms together, it reduces the number of places where a chemical name can be divided. The ACS Style Guide gives a number of end-of-line hyphenation examples that I compare with my proposal in the table below:

ACS Style Guide      My proposal
ace-to-ni-trile      aceto-nitrile
ac-ryl-am-ide[7]     acryl-amide
cy-clo-hex-ane       cyclo-hexane
iso-cy-a-nate        iso-cyanate
per-chlo-rate        per-chlorate
phos-pho-lip-id      phospho-lipid
sul-fu-rous          sulfur-ous
tol-u-ene            toluene
vi-nyl-i-dene        vinyl-idene

3.    The above rule has a clause about “awkward pronunciation.” Take the word anthracene. This consists of two chemical terms: the root word anthrac[8] and the suffix ene, but nobody says anthrac-ene; people say anthra-cene, making it rhyme with Magdalene. Moreover, if the line had ended on anthrac-, people might have started to pronounce this as anthrak. So anthra-cene is the generally accepted way of dividing this word.
4.    There are also generally accepted ways of hyphenating long terms such as phos-pho-rus, pra-seo-dym-i-um (a rare earth element), and si-tos-ter-ol[9]. However, this last example shows that divisions presented by Webster may lack chemical sense. Sitosterol is a sterol and should be divided as sito-sterol[10]. Again, by focusing on terms, I reduced the number of hyphenations from three to just one.
5.    In general, the division of terms should be avoided, and no syllable shall have fewer than three letters. So prefixes such as iso, meta, neo, ortho, and para will not be divided, which makes reading the name much easier.
6.    As a general rule, I could say: “When in doubt, don’t divide.”
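
The three-letter minimum in rule 5 is the one part of the list above that can be checked mechanically. The Python sketch below does only that: it takes a word written with its dividing points as hyphens and reports whether every fragment has at least three letters. The function name fragments_long_enough is hypothetical, and the check knows nothing about moieties or terms (rules 1 and 2), which still need a chemist's judgment.

```python
MIN_LETTERS = 3  # rule 5: no fragment with fewer than three letters


def fragments_long_enough(divided: str, minimum: int = MIN_LETTERS) -> bool:
    """Return True if every hyphen-separated fragment has at least `minimum` letters."""
    return all(len(fragment) >= minimum for fragment in divided.split("-"))


# Rows taken from the comparison table in this article.
acs_style = ["ace-to-ni-trile", "cy-clo-hex-ane", "per-chlo-rate", "vi-nyl-i-dene"]
proposed = ["aceto-nitrile", "cyclo-hexane", "per-chlorate", "vinyl-idene"]

for word in acs_style + proposed:
    verdict = "ok" if fragments_long_enough(word) else "fragment too short"
    print(f"{word:18} {verdict}")
```

Note that per-chlo-rate passes this length test; it is excluded from the proposal only because rule 2 keeps the term chlorate together.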

In the example from The ACS Style Guide mentioned above, the number of potential dividing points has also been reduced. According to my proposed end-of-line dividing system, this word would only be divided at the underlined hyphens shown below:

[Figure: the example name with the five permitted dividing points underlined]

So the number of possible dividing points has been reduced from 15 to five, which in practice should be enough, especially since they are spread fairly evenly. Let’s have a look at the various hyphens. Dividing after the “5” would leave this number on its own at the end of the line. Printers know that a single number or letter by itself at the end of the line is not allowed, so the name should not be divided at this location anyway. According to the proposed system, this “5” forms an integral part of the first moiety. Dividing after the “2” would be prevented by the novel dividing system since it concerns a prefix indicating a position and dividing chlo-ro is not permitted either, since this would lead to a syllable with only two letters. This brings us to the first hyphen that has been underlined; it is between chloro and ethyl, both chemical terms and thus a logical dividing point. The novel system would not allow ethyl to be divided according to eth-yl or e-thyl and thus avoid this controversy.

The next hyphen is after the bracket that closes the grouping 5-(2-chloroethyl). A division at this point is between two moieties and thus fully in line with the novel end-of-line dividing system, but just as the “5” belongs to the first moiety, so does the “9” to the second. Accordingly, a division after the “9” is not allowed. No divisions are possible as in di-ami-no, because they would lead to syllables with only two letters. Just as ethyl was not divided, the term methyl is not divided either; but after the moiety bracket, division is allowed. The subsequent “2” clearly belongs to the moiety thereafter so the hyphen between the two should be regarded as a hard hyphen that does not permit division at the end of a line. Finally, the division of the term anthra-cene is generally accepted as shown, so the corresponding alcohol can be divided in an analogous manner.

When proposing a hyphenation system, we must ask ourselves who will be involved in its use. That will first of all be the authors, then there may be copyeditors, and finally there are the printers. Authors of a paper containing chemical names will nearly always be chemists for whom the end-of-line dividing rules outlined above will be quite straightforward.

In other words, authors should type their manuscripts in such a way that what appears in print cannot be but correct. So they should make use of the possibilities offered by word processing programs. Where a name contains a hyphen that should not be used to divide the name at the end of a line, the author should type a hard hyphen. Hyphens in the name that can be used to divide the name should be typed as normal hyphens. This ensures that they appear in print as a hyphen and can also divide the name. Locations where the name can be divided but that do not show a hyphen in normal print should be indicated by a soft hyphen. In chloro-ethyl the author should insert the hyphen as a soft hyphen. Then it is printed as chloroethyl except when it is divided at the end of a line.
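
The typing instructions above can be sketched in Python using the Unicode soft hyphen (U+00AD) and nonbreaking hyphen (U+2011). This is an illustration under stated assumptions, not a definitive implementation: the functions join_moiety and join_name are hypothetical, and the segmentation of the example name into locants, brackets, and terms is supplied by hand. It follows one reading of the rules, in which diamino is kept whole because a fragment "di" would be too short, and anthra/cenol follows the accepted division of anthra-cene.

```python
NORMAL_HYPHEN = "-"     # prints, and the name may be divided after it
SOFT_HYPHEN = "\u00AD"  # prints only if the name is divided at this point
HARD_HYPHEN = "\u2011"  # prints, but the name is never divided after it


def join_moiety(tokens: list[str]) -> str:
    """Join the tokens of one moiety.

    A locant (a token made of digits) is tied to what follows by a hard
    hyphen, brackets are attached without any hyphen, and two adjacent
    terms are separated by a soft hyphen.
    """
    out = tokens[0]
    for prev, cur in zip(tokens, tokens[1:]):
        if prev.isdigit():
            out += HARD_HYPHEN
        elif prev != "(" and cur != ")":
            out += SOFT_HYPHEN
        out += cur
    return out


def join_name(moieties: list[list[str]]) -> str:
    """Join moieties with normal hyphens, which print and also permit a division."""
    return NORMAL_HYPHEN.join(join_moiety(m) for m in moieties)


# Hand-made segmentation of the example name (an assumption, see above).
example = [
    ["5", "(", "2", "chloro", "ethyl", ")"],
    ["9", "(", "diamino", "methyl", ")"],
    ["2", "anthra", "cenol"],
]

name = join_name(example)

# Dividing points are the normal and soft hyphens; hard hyphens do not count.
break_points = name.count(NORMAL_HYPHEN) + name.count(SOFT_HYPHEN)
print(break_points)

# Undivided printed form: soft hyphens vanish, hard hyphens look like hyphens.
print(name.replace(SOFT_HYPHEN, "").replace(HARD_HYPHEN, "-"))
```

With this segmentation the count of dividing points comes to five, which agrees with the number given in the discussion of the example.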

So when I use the symbol “=” for a hard hyphen, the symbol “~” for the soft hyphen, and the symbol “-” for the normal or regular hyphen, the name of the anthracenol compound used above would look like:

[Figure: the example name written with the symbols “=”, “~”, and “-”]
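
The symbol notation lends itself to a small Python sketch that lists every permitted division of a marked name. The coded name itself is not available in this text, so the string RECONSTRUCTED below is a reconstruction from the rules and the discussion of the example, not the author's original. The function names are hypothetical as well.

```python
# Symbols used in the text, mapped to Unicode characters.
SYMBOL_TO_CHAR = {
    "=": "\u2011",  # hard (nonbreaking) hyphen
    "~": "\u00AD",  # soft (optional) hyphen
    "-": "-",       # normal hyphen
}


def from_notation(marked: str) -> str:
    """Replace the symbols =, ~ and - by the characters they stand for."""
    return marked.translate(str.maketrans(SYMBOL_TO_CHAR))


def printed(marked: str) -> str:
    """Undivided printed form: soft hyphens vanish, hard hyphens show."""
    return marked.replace("~", "").replace("=", "-")


def break_options(marked: str) -> list[tuple[str, str]]:
    """Return (end of line, start of next line) for each permitted division."""
    options = []
    for i, symbol in enumerate(marked):
        if symbol == "~":    # soft hyphen: becomes visible when used
            options.append((printed(marked[:i]) + "-", printed(marked[i + 1:])))
        elif symbol == "-":  # normal hyphen: already visible
            options.append((printed(marked[:i + 1]), printed(marked[i + 1:])))
    return options


# Reconstructed from the rules in this article; an assumption, not a quotation.
RECONSTRUCTED = "5=(2=chloro~ethyl)-9=(diamino~methyl)-2=anthra~cenol"

for head, tail in break_options(RECONSTRUCTED):
    print(head, "/", tail)
```

Each printed pair shows what would stand at the end of one line and at the start of the next; hard hyphens never produce a pair.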

My main reason for submitting this paper is to solicit comment on how the system could be improved. It would be great if readers were to think of names that they feel cannot be divided in an acceptable manner when adhering to the rules that I listed above. If I knew about them, that might cause me to amend these rules to accommodate these names. So please let me know them. My postal and e-mail addresses are given below.

I have already received some comment[11] on a draft of this paper that mentioned that IUPAC nomenclature books have a small section on the use of punctuation and hyphens but do not bother about how to divide a chemical name at the end of a line—about time someone did.

Albert J. Dijkstra
47210 St Eutrope-de-Born, France

[1] The only one I have come across so far is that there should be at least three letters at the end or the beginning of a line. The number at the end of a line can be controlled by setting a minimum right-hand margin in the word processing program, but I haven’t discovered how to prevent a new line from starting with only two letters. The compound 2,5-cyclohexadiene-5-on could be divided in such a way that only two letters move to the next line. Widows [a single, usually short line of type, as one ending a paragraph, carried over to the top of the next page or column] and orphans [a line of type beginning a new paragraph at the bottom of a column or page] refer to the number of lines at the top and bottom of a page. Something similar for letters at the end or beginning of a line has not been designed.
[2] This was the former Latin school in Rotterdam, Netherlands, which was founded in 1328. It is now called Gymnasium Erasmianum.
[3] These are the preposition ανα meaning “spread out”, and the verb λυειν meaning “to loosen”. So αναλυειν means: to detach, unravel, reduce to.
[4] The Dutch for acetic acid anhydride (three separate words) is azijnzuuranhydride (one word).
[5] There is some confusion in the nomenclature of these hyphens. One of the first word processor programs I used was WordPerfect; it used the term soft hyphen, but in Word97, this is called an optional hyphen. What WordPerfect calls (or called?) a hard hyphen is a nonbreaking hyphen in Word97 and the normal hyphen is called a regular hyphen in Word97.
[6] Written and spoken English have little in common. For example, in Japan, they are taught separately. The standard course comprises the Roman alphabet and reading English. Then there is a separate course called “English conversation.”
[7] There are other inconsistencies in this list: acet-amide vs. ac-ryl-am-ide; ac-ry-late vs. acryl-ic.
[8] This is the Greek word ανθραξ, meaning charcoal.
[9] This is how Webster divides the word, but I wonder how generally accepted this is.
[10] Often, people do not know the roots of a word. Ascorbic acid (vitamin C) is effective against scurvy (Latin: scorbutus) so the “a” in ascorbic acts as a privative “a” (i.e., a prefix indicating the absence of something). Logically, it should be divided according to a-scor-bic. Nobody does.
[11] I acknowledge this contribution by Dr Ir L. Maat, who is the secretary of the Dutch/Flemish Committee for the Nomenclature of Organic Chemistry.

