Last update: 7 November 2007
Languages are ultimately based on sounds, not letters; romanisation is thus not the art of designing your phonology by choosing an interesting use for every single letter and several combinations of letters. We've all been guilty of this; my first conlang used both <s z> and <t d> for /s z/, while /t d/ had to be represented by <pt bd>!
Diacritics should generally be used consistently and sparingly. They are useful if, for example, you have two parallel series of phonemes for which similar transcriptions are clearly desirable, such as long and short vowels, or plain and palatalised consonants. On the other hand, more or less as Katherine "Deverry" Kerr put it, your readers don't necessarily want to be confronted with words which bristle like porcupines; if you decide to use diacritics, it is better to save them for less common phonemes.
Digraphs, on the other hand, are potentially ambiguous. For example, if you use <s> and <h> for /s/ and /h/ respectively, and decide to use <sh> for /S/, you have to think about how to represent the sequence /sh/. If it doesn't occur at all in your conlang, or the ambiguity doesn't trouble you, there's obviously no problem; otherwise you could separate the letters with an apostrophe, viz. <s'h>.
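The apostrophe rule can be applied mechanically when writing words out. The following is a minimal sketch, assuming a made-up five-phoneme inventory in X-SAMPA and a hypothetical `romanise` helper; it only handles two-letter digraphs.

```python
# Hypothetical spelling table: X-SAMPA phoneme -> romanisation.
SPELLING = {"s": "s", "h": "h", "S": "sh", "a": "a", "i": "i"}

# Every multi-letter spelling is a digraph that could be read by accident.
DIGRAPHS = {g for g in SPELLING.values() if len(g) > 1}


def romanise(phonemes):
    """Spell a list of phonemes, separating accidental digraphs with <'>."""
    out = ""
    for p in phonemes:
        g = SPELLING[p]
        # If the last letter written plus the first letter of the next
        # spelling would look like a digraph, insert an apostrophe.
        if out and (out[-1] + g[0]) in DIGRAPHS:
            out += "'"
        out += g
    return out


print(romanise(["a", "S", "a"]))       # asha
print(romanise(["a", "s", "h", "a"]))  # as'ha
```

If the sequence /sh/ never occurs in your conlang, the check simply never fires and the output contains no apostrophes.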
Finally, extra letters are useful for one-off transcriptions of otherwise difficult phonemes, but there aren't very many of them, and they tend to have well-defined uses which may not fit your conlang. They also tend to stand out somewhat as unusual, and they're not always easy to type.
Ultimately it's down to your personal preference which you use; the discussion below will present examples of all three.
<c> is probably the most troublesome single consonant letter; it is natural for /ts/ and /k/ in Slavic- and Celtic-flavoured romanisations respectively, and represents /tS/ in Indonesian, Malay, and Sanskrit. Less justifiably, it could be used for /S/, as I did in older transcriptions of some of my other conlangs; it might also be a possibility for /c/. Generally speaking, though, it should be treated with caution.
<g> is best used for plain /g/, unless you really have a rule like in English where /g/ often becomes /dZ/ before front vowels. It could also be used for the velar nasal /N/, or for /dZ/ or /Z/, if your conlang has no /g/.
<h> on its own is useful for /h/, or for /x/ if there is no /h/.
<j> most commonly suggests either /j/ or /dZ/; the first is typically Slavic or Germanic, the second English. It could also be used for /Z/, as in French.
<q> could be used for /kw/, or the uvular stop /q/ if you have one. Generally speaking, <c k q> all suggest back voiceless stops in increasing order of backness.
<w> is reasonable for /w/; /v/, to give a German or Polish flavour, is possible at a pinch.
<x> is awkward, and as with <c> should be treated with caution. Normally it suggests /ks/, for which <ks> is usually preferable; <x> is useful, however, if you have various clusters of stop + /s/ and want to represent them by single letters where possible. Among the other possibilities are /S/ (Portuguese and Old Spanish) and /x/ (from IPA via Cyrillic).
<y> as a consonant is best used for /j/ only.
It is no doubt possible to contrive a use for <ß>, the German "Eszett", but it has no uppercase form, and there's probably a better transcription anyway.
<h> is best used after stops to indicate aspiration or spirantisation; for example, <th> suggests /t_h/ or /T/. Beware of <ch>, which is perhaps best used for /tS/; it could also suggest any of /S C x/. <sh zh> are obvious choices for /S Z/.
<h> before or after <m n l r w y> generally suggests voicelessness, as with <wh> or <hw> for /W/. Alternatively you could use, for example, <lh> to represent another lateral which contrasts with whatever <l> represents; <nh> for /J/, as in Portuguese; or <rh> for /R/, as Mark Rosenfelder did with his Verdurian.
<j> suggests palatalisation, especially if you already use it on its own for /j/; for example <tj> implies a palatal stop /c/ or a palatalised coronal stop /t_j/. If you already use <j> for /Z/, <tj> could represent /tS/.
<y> could be used with the same meaning as <j>, although it is more likely to be read as a vowel.
<z> might be used to give your orthography a Polish flavour, for example with <sz cz> for /S tS/.
If you really want to use <c> for /S/, <tc> will naturally represent /tS/; and similarly, if <j> is /Z/, then <dj> is /dZ/. I don't recommend either of these unless there's good reason.
<ng>, as in English, is probably the most convenient and least surprising representation of the velar nasal /N/. If you need to distinguish it from /Ng/ or /ng/, you could spell it <ng'>, as in Swahili, or leave it as <ng> and spell the cluster as <n'g> or the rather unwieldy <ngg>.
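To see how the <ng> versus <n'g> convention plays out for a reader, here is a rough sketch of reading a word back into phonemes. It assumes a tiny made-up inventory and a hypothetical `read` function; it tries the longest spelling first and treats the apostrophe purely as a separator.

```python
# Hypothetical inventory: romanisation -> X-SAMPA phoneme.
GRAPHEMES = {"ng": "N", "n": "n", "g": "g", "a": "a", "i": "i"}

# Try longer spellings first so that <ng> wins over <n> + <g>.
ORDER = sorted(GRAPHEMES, key=len, reverse=True)


def read(word):
    """Split a romanised word into phonemes, longest spelling first."""
    phonemes = []
    i = 0
    while i < len(word):
        if word[i] == "'":
            # The apostrophe only blocks a digraph reading; skip it.
            i += 1
            continue
        for g in ORDER:
            if word.startswith(g, i):
                phonemes.append(GRAPHEMES[g])
                i += len(g)
                break
        else:
            raise ValueError(f"cannot read {word!r} at position {i}")
    return phonemes


print(read("anga"))   # ['a', 'N', 'a']
print(read("an'ga"))  # ['a', 'n', 'g', 'a']
```

The Swahili-style <ng'> convention would need the opposite default, with bare <ng> read as the cluster; this sketch does not cover it.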
Double letters normally imply gemination ("long consonants"). If you don't have these, you could use the odd double consonant for an otherwise problematic phoneme; Welsh, for example, uses <dd ll> for /D K/.
Otherwise, the only diacritics which are systematically useful on consonants are the hachek or caron (the little <v> used in many Slavic languages) and the acute accent. Generally, they suggest palatalisation or palato-alveolar consonants; <č š ž> are thus reasonable representations of /tS S Z/, and the corresponding transcription of /dZ/ is then <dž>. <ğ> might do for /dZ/, although this implies that <g> represents /dz/; it is better used for something like /G/.
<ť ď> (the lowercase forms of <Ť Ď>), if anything, suggest /t_j d_j/ or /c J\/; they could be used for /T D/ if you don't like <th dh> or <þ ð> and don't otherwise use hacheks. <ň> could be used for /n_j/ or /J/, and <ř> for /r_j/, /R/, or the Czech /r_r/ if you have it.
<ć ĺ ń ŕ ś ź> should have a systematic relationship to <c l n r s z> if possible, of which they suggest palatal or palatalised equivalents. Otherwise you could use, for example, <ś ź> for /S Z/ and <ĺ ŕ> for another lateral and rhotic.
Digraphs are a natural possibility for representing long vowels, as with <oo> or <ou> for /o:/.
| Accent | Examples | Uses |
|---|---|---|
| Acute | áéíóúý | Rising tone; length; more close quality ([i e o u]) |
| Grave | àèìòù | Falling tone; more open quality ([I E O U]) |
| Circumflex | âêîôû | Complex tone; length |
| Tilde | ãĩõũ | Nasality |
| Diaeresis or umlaut | äëïöü | Systematically modified qualities (see below) |
| Macron | āēīōū | Length |
| Breve | ăĕĭŏŭ | Shortness |
| Ogonek | ąęįų | Nasality |
| Double umlaut | őű | Long umlauted vowels, typically /2: y:/ |
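Accented vowels like those in the table can be awkward to type. One workaround, sketched here with Python's standard `unicodedata` module, is to attach a Unicode combining mark to a plain letter and normalise to the precomposed form. The `COMBINING` table and the `accent` helper are names made up for this illustration; what the table calls the double umlaut is the combining double acute in Unicode.

```python
import unicodedata

# Unicode combining marks for the accents in the table above.
COMBINING = {
    "grave": "\u0300",
    "acute": "\u0301",
    "circumflex": "\u0302",
    "tilde": "\u0303",
    "macron": "\u0304",
    "breve": "\u0306",
    "diaeresis": "\u0308",
    "double acute": "\u030B",  # the "double umlaut" of the table
    "ogonek": "\u0328",
}


def accent(letter, name):
    """Return the letter with the named accent, precomposed where possible."""
    return unicodedata.normalize("NFC", letter + COMBINING[name])


print(accent("e", "acute"))         # é
print(accent("o", "double acute"))  # ő
print(accent("a", "ogonek"))        # ą
```

Combinations with no precomposed character in Unicode stay as a base letter followed by a combining mark, and how well those display depends on the font.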
The original use of the diaeresis was to indicate the fronting of an original back vowel, as with <ä ö ü> for /{ 2 y/. This has often been extended in conlangs and some phonetic transcription systems to indicate generalised reversal of backness, as with <ë ï> for /7 M/, although this use is not found in any natural language. <ë> is also a common choice for phonemic schwa /@/.
If you have diphthongs, one option is to represent them as digraphs which indicate their component vowels; thus /ai au/ are best transcribed <ai au> or, for something slightly more exotic, <ay aw> or <ae ao>. But note that if you have, for example, <a ä> for /a {/ and an /{i/ but no /ai/, there's no point in representing the diphthong with <äi>; <ai> is simpler and thus preferable.
Tones are probably best indicated with diacritics, although you may be able to work out a system with digraphs.
A conlang which uses lots of vowels with diacritics, and which uses them well, is Alurhsa.
| Place | Voiceless stop or affricate | Transcriptions | Voiced stop or affricate | Transcriptions | Voiceless fricative | Transcriptions | Voiced fricative | Transcriptions | Nasal | Transcriptions |
|---|---|---|---|---|---|---|---|---|---|---|
| Labial | p | p | b | b | f | f (ph) | v | v (bh w) | m | m |
| Dental | t | t | d | d | T | th þ ŧ (ť) | D | dh ð đ (ď) | n | n |
| Alveolar | ts | ts c (ţ) | dz | dz | s | s | z | z | | |
| Palato-alveolar | tS | ch č | dZ | j dž | S | sh š | Z | zh ž | | |
| Palatal | c | tj ć ť | J\ | dj ď | C | (ch ś) | j\ | (jh ź) | J | nj ny ñ ň ń |
| Velar | k | k (c) | g | g | x | h kh (x ch) | G | gh ğ | N | ng ŋ (ñ) |
| Uvular | q | q (k) | G\ | ğ | X | qh (x xh) | R | rh | N\ | |
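Several transcriptions appear in more than one cell of the table (<ch>, <x>, <ğ>, <ñ>), so picking options row by row can leave two phonemes with the same spelling. A quick check is easy to sketch; the `collisions` function and the `draft` orthography below are hypothetical, with phonemes written in X-SAMPA.

```python
from collections import defaultdict


def collisions(spelling):
    """Return each transcription that is used for more than one phoneme."""
    by_grapheme = defaultdict(list)
    for phoneme, grapheme in spelling.items():
        by_grapheme[grapheme].append(phoneme)
    return {g: ps for g, ps in by_grapheme.items() if len(ps) > 1}


# A draft orthography that picks <ch> for both /tS/ and /x/.
draft = {"tS": "ch", "S": "sh", "x": "ch", "k": "k", "h": "h"}

print(collisions(draft))  # {'ch': ['tS', 'x']}
```

A collision is not necessarily a mistake, since you may be happy for context to resolve it, but it is worth knowing about before you commit to a scheme.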
It's much harder to provide a similar table for vowels, so I won't try. Instead, here are some possible transcriptions of a vowel system like that of Vulgar Latin; I'm sure you can invent many more.
| Method | i | e | E | a | O | o | u | Notes |
|---|---|---|---|---|---|---|---|---|
| Digraphs 1 | i | e | ea | a | oa | o | u | 1 |
| Digraphs 2 | i | ei | e | a | o | ou | u | 2 |
| Digraphs 3 | i | ei | ea | a | oa | ou | u | 3 |
| Diacritics 1 | i | e | è | a | ò | o | u | 1 |
| Diacritics 2 | i | é | e | a | o | ó | u | 2 |
| Diacritics 3 | i | é | è | a | ò | ó | u | 3 |
| Extra letters | i | e | æ | a | å | o | u | - |
| Roman only 1 | i | y | e | a | o | w | u | 4 |
| Roman only 2 | y | i | e | a | o | u | w | 4 |
Notes: