Last update: 27 March 2006
To use SCA, you'll need to install Python on your system; most Linices should have it already installed. You can get it from here. I leave writing a Perl version up to someone else; I tried it myself once, but it got horribly messy!
SCA works well enough for me, and while I hope it will be found useful, I can't guarantee that it will be suitable for your requirements; if you've any comments or suggestions for improvements, let me know. It works with Python 2.4 but not 1.5, for example.
Meanwhile, a nice chap by the name of "pharazon" has turned SCA into a CGI-based webpage, which I would have done myself if my ISP allowed CGI...
NOTE: The Ruby implementation will not be supported until further notice.
python sc_apply.py -c<sc-file> [arguments] <words>
sc-file is the file containing the sound changes; its format is described below. The file should have the extension ".sc"; this extension can be omitted from the command-line.
words are the words to process, separated by spaces. If no words are supplied, and no -l argument is present, nothing will happen.
[arguments] are optional arguments, as follows:
For example:
python sc_apply.py -cspanish.sc flamma gutta ossu petra cu:pa
>>> flamma S: llama
>>> gutta S: gota
>>> ossu S: hueso
>>> petra S: piedra
>>> cu:pa S: cuba
Of course, it can't get everything right; if you have any improvements, let me know!
If you want to save your output to a file, on Linux and Window$ just add " > name-of-file" to the command-line. I don't know what to do on a Mac.
For example, these two lines define two categories called ufric and vfric:
ufric = pTsx
vfric = fDzG
This line defines a category fric which consists of all the sounds in the two categories already defined:
fric = ufric~vfric
And this line defines foo to be fric plus two other sounds:
foo = fric~hH
Note that the letters which represent sounds are meaningful only to you and to the sound change rules; the program has no way of knowing that ufric is supposed to be a voiceless fricative, for example.
Note too that if you really want to define one category to be equal to another, you have to add a tilde to the second definition, for example:
avowels = aáâàäã
low = avowels~
Otherwise Strange Things will happen; in this example, the phonemes represented by a, v, o, w, e, l and s will be treated as low vowels, which probably isn't what was intended.
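The behaviour of category definitions described above can be sketched in Python. This is only an illustration of the rules as stated here, not SCA's actual code: the function name define_category and the dictionary categories are hypothetical, and the sketch assumes that a right-hand side without a tilde is always taken as a literal string of sounds.

```python
def define_category(categories, line):
    """Add the category defined by a line such as 'fric = ufric~vfric'."""
    name, _, value = line.partition("=")
    name, value = name.strip(), value.strip()
    if "~" in value:
        # Each tilde-separated part is either the name of an existing
        # category (expanded to its sounds) or a literal string of sounds.
        parts = value.split("~")
        value = "".join(categories.get(part, part) for part in parts)
    # Without a tilde, the right-hand side is kept as literal sounds.
    categories[name] = value
    return categories


categories = {}
for line in ["ufric = pTsx", "vfric = fDzG", "fric = ufric~vfric",
             "foo = fric~hH", "avowels = aáâàäã", "low = avowels~"]:
    define_category(categories, line)

print(categories["foo"])  # pTsxfDzGhH
print(categories["low"])  # aáâàäã
```

Dropping the trailing tilde from the last definition would leave low equal to the seven letters of the word "avowels", which is the Strange Thing mentioned above.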
Include = FILE will immediately read in the contents of the file named FILE, which is handy if for example you have a set of common definitions you don't want to copy into several other files.
dialects = DIAL indicates that each character in DIAL is a separate dialect; the program will output one word for each dialect. For example, dialects = PGSCFOIR might be used by someone who's interested in comparing cognates in the Romance languages.
A rule consists of four space-separated fields:
DIALECTS BEFORE AFTER ENV
which informally mean "for each dialect in DIALECTS, BEFORE changes to AFTER in the environment ENV".
DIALECTS is a string of letters or numbers which specifies which dialects the rule applies to. Not all of the characters have to represent dialects, which may make the file visually easier to follow; for example:
PGSC.... (rule applying to Portuguese, Galician, Spanish, and Catalan)
...CFO.. (ditto Catalan, French, and Occitan)
BEFORE and AFTER can contain a sound, a category, or a sequence of sounds or categories; these must not be empty! If you really mean "nothing", you must use '0' (i.e. zero).
ENV contains the character '_' (underscore), optionally preceded and followed by a sound, a category, or a sequence of sounds or categories. Essentially, a_b means "when preceded by a and followed by b"; _ by itself means "always".
Categories are always enclosed in <angle brackets>; thus, the following rule:
<ufric> <vfric> <vowel>_<vowel>
means "voiceless fricatives become voiced fricatives between vowels" - assuming, of course, that the categories have previously been defined to consist of characters which you use to represent the sounds in question.
If the name of the category is preceded by "^", the category is complemented; in other words <^category> means "anything not in category". So:
<front> <back> _<^soft>
might mean "a front vowel becomes the corresponding back vowel when not before a soft consonant".
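One plausible way to picture categories and their complements is as regular-expression character classes. The sketch below is an assumption about how such a translation could look, written for a recent Python 3 rather than the Python 2.4 mentioned earlier; the function name expand_categories and the contents of the vowel and soft categories are made up for illustration.

```python
import re


def expand_categories(text, categories):
    """Replace each <name> or <^name> in text by a regex character class."""
    def to_class(match):
        negation, name = match.group(1), match.group(2)
        # '<^name>' becomes a negated class: anything not in the category.
        return "[" + negation + re.escape(categories[name]) + "]"
    return re.sub(r"<(\^?)(\w+)>", to_class, text)


# Hypothetical category contents, for illustration only.
categories = {"vowel": "aeiou", "soft": "cj"}

print(expand_categories("<vowel>_<vowel>", categories))  # [aeiou]_[aeiou]
print(expand_categories("_<^soft>", categories))         # _[^cj]
```

The underscore is left in place here; turning a whole ENV into a working pattern would need a further step that this sketch does not attempt.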
| Character | Where valid | Meaning |
| # | ENV | beginning or end of a word |
| % | ENV | BEFORE |
| 0 (zero) | BEFORE, AFTER | null phoneme, empty, blank |
| < | BEFORE, AFTER | part of ENV preceding the underscore |
| > | BEFORE, AFTER | part of ENV following the underscore |
Within BEFORE, AFTER and ENV, categories are enclosed in <angle brackets>. If AFTER consists solely of a category, so must BEFORE, and both categories must contain the same number of sounds; a sound in BEFORE which occurs in the appropriate environment is then replaced by the corresponding sound in AFTER.
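The position-by-position correspondence between two categories can be illustrated with a short Python sketch. It is not taken from sc_apply.py: the function apply_category_rule and its parameters are hypothetical, the environment is limited to one sound on each side, and the category contents are those used in the examples on this page.

```python
import re


def apply_category_rule(word, before, after, left, right):
    """Replace each sound of `before` by the sound at the same position in
    `after`, when it stands between a `left` sound and a `right` sound."""
    if len(before) != len(after):
        raise ValueError("categories must contain the same number of sounds")
    corresponding = dict(zip(before, after))
    pattern = "([%s])([%s])([%s])" % (
        re.escape(left), re.escape(before), re.escape(right))
    return re.sub(
        pattern,
        lambda m: m.group(1) + corresponding[m.group(2)] + m.group(3),
        word)


ufric, vfric, vowel = "pTsx", "fDzG", "aeiou"
print(apply_category_rule("casa", ufric, vfric, vowel, vowel))  # caza
```

Here s is the third sound of ufric, so it is replaced by z, the third sound of vfric.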
Because the strings are converted internally to regular expressions, regular expression metacharacters may also appear in any of BEFORE, AFTER and ENV. A dot (.) may be used to represent "any character"; the most useful of the others are |, +, * and ?. For example:
h 0 _<vobs>|<fric>
u o _<cons>*a
u o _<cons>+a
ja E #h?_
mean respectively:
Note that a+ is equivalent to aa*, and that ENV in the first example rule could just as easily be _<foo>, where foo is defined earlier as fric~vobs.
* <vowel><liquid> #2#1 _<cons>|#
Non-positive numbers, if you need them, count from the end: #-1 refers to the last, #-2 to the second-last, and so on; the #2#1 above could thus also be written as any of #-1#1, #-1#-2 or #2#-2, for whatever that's worth.
Another example comes courtesy of the spelling rules of Liotan and Breathanach, which require purely orthographic "glide" vowels to be inserted in certain situations. Here I is a pseudo-phoneme which indicates that a preceding consonant is slender, and evow and ivow are all varieties (long and short) of /e/ and /i/ respectively:
* I<evow> #2i _<slender>
* I<evow> #2a _
* I<ivow> #2 _<slender>
* I<ivow> #2o _
You can see how the rules make it easy to throw the I away when the relevant rule is applied. Without the indexing, I needed eleven separate rules to do this.
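The indexing notation can be mimicked in a few lines of Python. This is a sketch under assumptions, not the program's own logic: fill_indices is a hypothetical name, the matched sounds are passed in as a plain string, and #0 raises an error because the text does not say what it would mean.

```python
import re


def fill_indices(matched, after):
    """Expand #N references in an AFTER string using the matched sounds."""
    def pick(ref):
        i = int(ref.group(1))
        if i == 0:
            raise ValueError("#0 is not defined in this sketch")
        # Positive indices count from 1; negative ones count from the end.
        return matched[i - 1] if i > 0 else matched[i]
    return re.sub(r"#(-?\d+)", pick, after)


print(fill_indices("ar", "#2#1"))    # ra
print(fill_indices("ar", "#-1#-2"))  # ra
print(fill_indices("Ie", "#2i"))     # ei
```

The last call shows the glide-vowel case: the pseudo-phoneme I is dropped simply because #1 never appears in AFTER.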
* <lvow><lvow> #1`short`long _
Here the strange-looking AFTER means "the character in long which occupies the same position as the first phoneme of BEFORE occupies in short". Note that there are no angle brackets around the sound category names in AFTER if you are using a backquote rule.
Here's another example. The following rules, which assimilate a nasal to a following voiced stop consonant:
nasal = mnN
* <nasal> m _<labial>
* <nasal> n _<dental>
* <nasal> N _<velar>
can be represented more succinctly by this:
vstop = bdg
* <nasal> >`vstop`nasal _<vstop>
With a little bit of thought you should be able to work out that this rule will convert nb to mb.
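For anyone who would rather see the lookup spelled out, here is a rough Python equivalent of the backquote rule. The helper names backquote and assimilate_nasals are invented for this sketch, and the regular expression is only one way of expressing "a nasal before a voiced stop"; it is not claimed to be what SCA does internally.

```python
import re

nasal = "mnN"  # labial, dental, velar
vstop = "bdg"  # the same places of articulation, in the same order


def backquote(sound, source, target):
    """Return the sound in `target` at the position `sound` has in `source`."""
    return target[source.index(sound)]


def assimilate_nasals(word):
    # The following stop is captured inside a lookahead, so it is
    # inspected but not itself replaced.
    pattern = "[%s](?=([%s]))" % (nasal, vstop)
    return re.sub(pattern,
                  lambda m: backquote(m.group(1), vstop, nasal),
                  word)


print(assimilate_nasals("anba"))  # amba
print(assimilate_nasals("anga"))  # aNga
```

In "anba" the following sound b is the first member of vstop, so the nasal is replaced by m, the first member of nasal.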
A shorthand is available for certain types of backquoting rule. If you have rules like:
* <A> >`B`A _<B>
* <A> <`B`A <B>_
you can replace them with, respectively:
* <A> `> _<B>
* <A> `< <B>_
So the second example could also be:
* <nasal> `> _<vstop>
which is conveniently read as "a nasal assimilates to a following voiced stop" - assuming, of course, that the categories nasal and vstop represent the same places of articulation in the same order. Similarly, assimilation in the other direction (/nb/ to /nd/) could be represented by:
* <vstop> `< <nasal>_
| Type of change | What to do | Rule | BEFORE | AFTER | ENV |
| Loss | Use 0 (zero) for AFTER | /t/ disappears word-finally | t | 0 | _# |
| Epenthesis | Use 0 (zero) for BEFORE | /d/ is inserted between /n/ and /r/ | 0 | d | n_r |
| Assimilation | Use < or > for AFTER | /h/ assimilates to a preceding fricative | h | < | <fric>_ |
| Simplification | Use % in ENV | Two identical fricatives are reduced to one | <fric> | 0 | _% or %_ |
| Metathesis | Use #2#1 for AFTER | <nasal> before <liquid> moves to after it | <nasal><liquid> | #2#1 | _ |
By the metathesis example rule, assuming the categories are properly defined, canra will become carna.
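Since rules are turned into regular expressions anyway, it may help to see the changes in the table written directly as Python substitutions. These patterns are approximate hand translations, not the expressions SCA generates; the category contents (fsz for fricatives, mn for nasals, lr for liquids) and all test words except canra are assumptions made for the example.

```python
import re

# (type of change, pattern, replacement, example word)
CHANGES = [
    ("loss",           r"t$",             "",       "amat"),
    ("epenthesis",     r"(?<=n)(?=r)",    "d",      "honra"),
    ("assimilation",   r"([fsz])h",       r"\1\1",  "asha"),
    ("simplification", r"([fsz])\1",      r"\1",    "massa"),
    ("metathesis",     r"([mn])([lr])",   r"\2\1",  "canra"),
]

results = {}
for kind, pattern, replacement, word in CHANGES:
    results[kind] = re.sub(pattern, replacement, word)
    print(kind, word, "->", results[kind])
```

The metathesis line reproduces the canra to carna example, with the two captured groups standing in for #1 and #2.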
* <cat1>X <cat1> _
to mean "X disappears after something of category cat1", you'll get the error "Error: Translation strings different lengths". The correct way is this:
* X 0 <cat1>_
* 0 ; <vowel>_<nasal><obs>
* 0 ; <vowel>_<nasal><liquid>
* 0 ; <vowel>_<nasal>#
* a|u A _;
* e|i E _;
* á|â|ú|û U _;
* é|ê|í|î I _;
* ; 0 _
The first three rules set up the nasalisation environment, the next four nasalise the vowels, and the last rule removes the environment since it's no longer useful. Without this there would need to be twelve rules, since for technical reasons it's not possible to have a rule like the following:
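The same mark, change, unmark sequence can be imitated in Python to see what each stage does. This is a sketch only: the contents of the vowel, nasal, obs and liquid categories are guesses made for the example, the test words are invented, and the patterns are hand-written rather than produced by SCA.

```python
import re

# Assumed category contents, for illustration only.
VOWEL = "aeiouáâúûéêíî"
NASAL = "mn"
OBS = "ptkbdg"
LIQUID = "lr"

RULES = [
    # Insert the marker ';' after a vowel in each nasalising environment.
    (f"(?<=[{VOWEL}])(?=[{NASAL}][{OBS}])", ";"),
    (f"(?<=[{VOWEL}])(?=[{NASAL}][{LIQUID}])", ";"),
    (f"(?<=[{VOWEL}])(?=[{NASAL}]$)", ";"),
    # Nasalise the vowels which now stand before the marker.
    ("[au](?=;)", "A"),
    ("[ei](?=;)", "E"),
    ("[áâúû](?=;)", "U"),
    ("[éêíî](?=;)", "I"),
    # Remove the marker again.
    (";", ""),
]


def nasalise(word):
    for pattern, replacement in RULES:
        word = re.sub(pattern, replacement, word)
    return word


print(nasalise("kanto"))  # kAnto
print(nasalise("fin"))    # fEn
print(nasalise("kana"))   # kana
```

The last word is unchanged because its nasal is followed by a vowel, so no marker is ever inserted.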
* <vowel> <nasvow> _<nasal>(<obs>|<liquid>|#)
Although you can of course do this:
* <vowel> <nasvow> _<nasal>#
vstop = bdg
vfric = vðG
vowel = aeiou
* <vstop> <vfric> <vowel>_<vowel>
and you run "abadaga" through it. You'll get "avadaGa" back, not "avaðaGa" as you might expect. So you try this:
* 0 ; <vowel><vstop>_<vowel>
* <vstop> <vfric> _;
* ; 0 _
with the same result. Before you take my name in vain, don't panic: this is an instance of the banana problem (search for it in the Jargon File). Essentially, are there one or two instances of the string ana in the word banana? What is happening in this example is that once the trailing <vowel> matches the "a" following the "b", this "a" is then considered to have been processed, and next time round the <vowel> section matches the "a" after the "d" instead.
In summary, if obvious-looking conversions are being missed, and there's some overlap between the two parts of ENV, the solution is simple: just include your rule twice in succession:
* <vstop> <vfric> <vowel>_<vowel>
* <vstop> <vfric> <vowel>_<vowel>
and all should now be OK.
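The effect is easy to reproduce with Python's re module, which likewise does not let matches overlap. The sketch below assumes, as the explanation above suggests, that the vowels on both sides are consumed as part of each match; the names lenite_once and LOOKAROUND are invented for the example.

```python
import re

VSTOP, VFRIC, VOWEL = "bdg", "vðG", "aeiou"
TO_FRIC = dict(zip(VSTOP, VFRIC))

# Both vowels are part of the match, so they are used up by it.
CONSUMING = re.compile(f"([{VOWEL}])([{VSTOP}])([{VOWEL}])")


def lenite_once(word):
    return CONSUMING.sub(
        lambda m: m.group(1) + TO_FRIC[m.group(2)] + m.group(3), word)


once = lenite_once("abadaga")
twice = lenite_once(once)
print(once)   # avadaGa
print(twice)  # avaðaGa

# For comparison: lookarounds test the vowels without consuming them,
# so a single pass is enough in plain Python.
LOOKAROUND = re.compile(f"(?<=[{VOWEL}])[{VSTOP}](?=[{VOWEL}])")
print(LOOKAROUND.sub(lambda m: TO_FRIC[m.group(0)], "abadaga"))  # avaðaGa
```

The lookaround version is a feature of Python regular expressions and is shown only to explain the cause; within a .sc file, repeating the rule remains the fix.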