Sogdian is one of the Eastern Middle Iranian languages, spoken in Sogdiana (northern Uzbekistan and Tajikistan) before the Islamization of the area in the 10th century. The Sogdians were traders along the Silk Roads and founded many diaspora communities along the routes, with the result that the bulk of the Sogdian materials was discovered in Turfan and Dunhuang in western China. The Sogdian language was written in three scripts: Sogdian, Manichean, and Syriac. While only religious texts were written in Manichean and Syriac scripts, all kinds of texts, both religious and secular, are recorded in Sogdian script, which was a kind of national script, although it ultimately originated from Aramaic script.

The history of Sogdiana is largely obscure. It constituted a satrapy of the Achaemenian Empire, which was conquered by Alexander the Great in the fourth century BCE. Later it was governed by or under the influence of neighboring empires, such as the Kushan (1st to 3rd centuries CE), Sasanian (3rd to 4th centuries), Kidarite (5th century), Hephthalite (6th century), Western Turk (6th to 7th centuries), and Tang China (7th to 8th centuries). However, until Sogdiana was conquered by the Arabs in the 8th century it enjoyed a degree of independence, during which the Sogdians played an active role as international traders along the Silk Roads between China and the West, so much so that Sogdian became a kind of lingua franca of the region and nomadic peoples like the Turks and Uighurs adopted it as their administrative language (see SOGDIAN TRADE).

Since the language of Samarqand and Bukhara of the 10th century reported by Moqaddasi is nothing but a local variety of New Persian (Yoshida, 2009a, p. 330), Sogdian had begun to lose ground by that time. However, Moqaddasi’s statement that Soḡd (the area between Samarqand and Bukhara) had its own language, which was similar to that spoken in the suburb of Bukhara, seems to indicate that at that time Persian was still restricted to the urban area (Moqaddasi, tr., p. 335). Later Sogdian gradually gave way to Persian and then to Turkish. A language called Yaghnobi, still spoken in the valley of Yaḡnāb, a tributary of the Zarafšān River, is the only descendant of a dialect of the Sogdian language and is sometimes referred to as “Modern Sogdian,” whereas the language of texts known to us is believed to represent the standard Sogdian spoken in Samarqand.

Studies of the language were begun by F. W. K. Müller (1904, pp. 96-103; Figure 1), who cited two manuscripts discovered by the German Turfan Expedition (see Turfan Expeditions). At that time he referred to the language as “ein bisher nicht bekannter Pehlevi-Dialekt,” although he mentions “soghdisch” later in “Nachträge und Verbesserungen” (idem, p. 111) without giving any reason. Later, in 1908 it became clear that it was F. C. Andreas who first identified the language; he compared some Sogdian calendar terms recorded by Biruni with those found in the Turfan texts (Andreas apud Müller, 1908, p. 3, n. 3). The first Sogdian grammar was written by Robert Gauthiot (1914-23) and Émile Benveniste (1929), who mainly referred to the Buddhist and Christian texts published by that time. This was greatly improved by W. B. Henning (1937), who based his analysis on phonologically more reliable materials written in Manichean script. The most extensive description of the grammar remains that of I. Gershevitch (1954; to be consulted with Gershevitch, 1945), which covers all aspects of the grammar, including the syntax. Since then major improvements have been made by N. Sims-Williams, who has published a series of important articles mainly on the phonology and morphology. His discoveries were incorporated into his compact description published in 1989. Sims-Williams’s later discoveries have been included in Y. Yoshida’s (2009a) description. Apart from useful glossaries accompanying editions of substantial texts, three lexicons have been made public: B. Gharib (1995), Sims-Williams and Durkin-Meisterernst (2012), and Sims-Williams (2016a). Personal names found in Sogdian texts are collected in Lurje (2010).

Materials and publications. Most of the Sogdian manuscripts have been discovered in what is now the western part of China, in particular in the oasis of Turfan and the so-called “library cave” of the Caves of the Thousand Buddhas in Dunhuang (see DUNHUANG ii. Buddhist and Other Texts in Iranian Languages). An important exception is the site at Mount Mug in Sogdiana proper (120 km east of Samarqand), where 70-odd documents of the early 8th century were unearthed.

Apart from the so-called Ancient Letters of the early 4th century, inscriptions discovered in Kultobe in Kazakhstan (see Sims-Williams et al., 2007; 1st-2nd century CE?) and in the Upper Indus valleys (see Sims-Williams, 1989/1992 and 2000, Yoshida, 2013; 5th century?), the bulk of Sogdian materials dates back to the 7th to early 11th centuries. The latest datable material is the inscription of 1026 CE discovered in Kyrgyzstan (Livshits, 2008, p. 379, idem, 2015, p. 290). While the Mug documents represent part of the archives of a local ruler Dhēwaštīč (ruled until 722), almost all the manuscripts from Turfan and Dunhuang are religious texts translated from the originals in Chinese (Buddhist texts), Middle Persian and Parthian (Manichean), and Syriac (Christian texts; see CHRISTIANITY iv. Christian Literature in Middle Iranian Languages).

While virtually all the texts discovered in Dunhuang have been published, a substantial number of the Turfan texts remain unpublished. The studies of the Buddhist, Manichean, and Christian texts have been surveyed, respectively, by Yoshida (2009b), Sundermann (2009), and Sims-Williams (2009). Useful catalogues and lists of the Turfan texts have also been made public (Boyce, 1960; Kudara et al., 1997; Reck, 2006; Morano, 2007; Sims-Williams, 2012; Reck, 2016). Recent substantial editions include Durkin-Meisterernst and Morano (2010), Sundermann (2012), and Sims-Williams (2014, 2015). On the Mug documents, which include letters and administrative, economic, and legal documents, see the fundamental publications by Livshits (2008) and its English version published in 2015 and Bogoljubov and Smirnova (1963). Some texts were revised by Grenet and la Vaissière (2002) and Yakubovitch (2002, 2006). Upper Indus inscriptions were read by Sims-Williams (1989/1992). Since Hans Reichelt (1931) published the Ancient Letters, some of them have been revised by Sims-Williams (2001a, Letter II) and Sims-Williams et al. (1998, Letter V). See also Sims-Williams (2005). For several secular documents and inscriptions of various periods see Sims-Williams (1993, Ladakh), Yoshida and Moriyasu (1988, Astana, Turfan), Sims-Williams and Bi Bo (2011 and 2015, Khotan), Yoshida (2005, Xi’an), Livshits (2008/2015, Semirech’e, etc.). For the three inscriptions discovered in Mongolia, see Kljashtorny and Livshits (1971, Sevrey; 1972, Bugut), Yoshida apud Moriyasu and Ochir (1999, pp. 122-25, Bugut; pp. 225-27, Sevrey), and Yoshida, 1988 and 2011 (Karabalgasun); see also KARABALGASUN ii. The Inscription.

Scripts, orthography, and basic phonology. Sogdian texts are written in three scripts: Sogdian, Manichean, and Syriac. While Manichean script is restricted to Manichean texts and Syriac to Christian, Sogdian script is a kind of national script employed for all kinds of texts, regardless of the religious affiliation of the writer. On the scripts, see Skjærvø (1996) and Yoshida (2009a, pp. 281-84). In this article, spellings in the three scripts are indicated by S[ogdian], M[anichean], and C[hristian], when distinction is necessary. Apart from them a handful of fragments written in Brahmi script are known (cf. Maue and Sims-Williams, 1991; Sims-Williams, 1996c). Sogdian script is an adaptation of the Achaemenian chancellery script deriving from Aramaic, from which resulted the phenomenon called “heterography” where several Aramaic words appeared in their original spellings but were pronounced with their Sogdian equivalents (see ARAMAIC). However, heterograms or ideograms in Sogdian are far fewer than in Middle Persian or Parthian. In the printed texts these ideograms are transliterated with capital letters, e.g., CWRH (cf. Aram. ṣwr-h “his neck, throat”) was pronounced as γrīw “self, body.” However, employment of the ideograms is not consistent at all, in that the corresponding Sogdian forms are often spelled out in different places of the same text, e.g., CWRH ~ γrywh, KZNH = mʾδ “thus,” ZK = ʾxw (article).

Because of its long history, texts written in Sogdian script contain many historical spellings, from which those in Manichean and Syriac scripts are virtually free, e.g., ʾxšʾyʾʾδ “king” < OIr. *xšāyaθya; cf. M xšyδ. In the texts later than the 7th or 8th century, two styles of Sogdian script are discerned, formal and cursive (Sims-Williams, 1976, pp. 44-45). The former used to be referred to as “sūtra script,” since it was encountered mainly but not exclusively in Buddhist texts or sūtras. The cursive script was later applied to write down the Uighur language and became Uighur script, which developed further into Mongolian and Manchu scripts (Sims-Williams, 1981a, 2016b; Kara, 1996; Moriyasu, 1997). It is of some interest that the Sogdian alphabet differs from the Uighur counterpart in that the former still includes d (daleth), ṭ (teth), ʿ (ʿain), and q (koph), which are not employed for transcribing Sogdian sounds, while the latter disposes of them totally. However, the forms of these four letters in the Sogdian alphabet are completely different from what one expects as the natural developments from the original Aramaic letters (Livshits, 1970, 2008, pp. 298-305, 2015, pp. 227-32; Yoshida, 1995). Possibly under Chinese influence, Sogdian script began to be written vertically rather than horizontally in the latter half of the 5th century (Yoshida, 2013). In fact, all the epitaphs later than Kultobe and Upper Indus inscriptions are inscribed vertically.

Sogdian is a dead language, and the phonemic tables given below are largely based on comparative phonology and internal reconstruction. The phonetic values of certain sounds are inevitably speculative (see Table 1 and Table 2). In discussion of phonology, Sogdian forms are cited in orthographic as well as in vocalized forms; in succeeding sections transliterated forms are mainly given. For understanding of the Sogdian phonology and morphology, the so-called “rhythmic law” is of vital importance (Sims-Williams, 1981a, 1984, 1989, pp. 181-82). According to this law, the retention or loss of proto-Sogdian final vowels depends on whether they bore a stress; in its turn, the position of the stress in words of more than one syllable is determined by the quantity of those syllables: all stems in Sogdian, whether nominal or verbal, containing a long vowel are heavy and bear a stress, while a stress falls on the endings of light stems that consist only of short vowel(s). In what follows, light stems are written with a final hyphen (e.g., wn- [wan-] “to do”) to distinguish them from heavy stems (e.g., wyn [wēn] “to see”).
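The classification behind the rhythmic law can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a stem is treated as heavy as soon as its vocalized form contains one of the long vowels ā, ē, ī, ō, ū, and the function names are hypothetical. Other sources of heavy stems, such as rhotacized or nasal vowels, are deliberately left out.

```python
import unicodedata

# Assumed inventory of long vowels in the vocalized transcription.
LONG_VOWELS = set("āēīōū")

def is_heavy(stem: str) -> bool:
    """Return True if the vocalized stem contains a long vowel."""
    # Normalize so that a vowel plus combining macron compares equal
    # to the precomposed character.
    normalized = unicodedata.normalize("NFC", stem)
    return any(ch in LONG_VOWELS for ch in normalized)

def stress_position(stem: str) -> str:
    """Heavy stems are stressed on the stem, light stems on the ending."""
    return "stem" if is_heavy(stem) else "ending"

if __name__ == "__main__":
    # Light stems carry the final hyphen used in the article.
    for form in ("wan-", "wēn", "βaγ-", "βāγ"):
        print(form, "->", stress_position(form))
```

Applied to the forms cited above, the sketch puts the stress on the ending of wan- and on the stem of wēn.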

All three writing systems, which ultimately originate from Aramaic, are consonantal and have many features in common; they are hardly adequate for representing the vowels of Sogdian. The vowels are indicated, or rather hinted at, by means of three matres lectionis, i.e., ʾ, w, and y. Thus ʾ (initially ʾʾ as well) represents [ā], w [w, u/ū, o/ō] and y [y, i/ī, e/ē], although the short vowel [a] is not spelled out except for the initial and final positions (indicated by ʾ- or -ʾ, respectively). The situation is somewhat better in Syriac script, where the employment, though not consistent, of special diacritics enables the distinction between [i/ī] and [e/ē], [u/ū] and [o/ō]; sometimes word-medial [a] is also indicated by two diacritic points. Unfortunately, the texts written in Brāhmī script are not many and are not of much help in this respect (Maue and Sims-Williams, 1991; Sims-Williams, 1996c).

Some pairs are cited to illustrate the vowel phonemes: βaγ- (βγ-) “god,” βāγ (C bʾγ) “garden, farm”; fneš- (C fnyš-) “to be deceived,” fnēš (C fnyš) “to deceive”; witar- (wytr-) “to go,” wītar (wytr) “he went” (3rd sg. imperfect); but- (pwt-) “Buddha,” pūtē (pwtʾk) “rotten”; roxšn- (rwxšn-) “light, bright,” rōγn (rwγn) “oil.” In Table 1, the sounds in parentheses are variants of the phoneme /a/. In addition to the simple vowels shown above, Sogdian possessed three rhotacized vowels, ər, ir, and ur, and a nasal vowel ṃ. The rhythmic law shows that a syllable containing a simple vowel followed by a rhotacized or nasal vowel is treated as heavy, e.g., maərγ (mrγ) “forest,” kaṃθ (knδh) “city” in contrast to light stems such as mərγ- (mrγ-) “bird,” kirm- (kyrm-) “snake,” purn- (pwrn-) “full.” There are some uncertainties surrounding the vowel written as wy. Most of the forms in question originate from *u/wa/wā followed by *y or *aya, e.g., wyžp- “terror” < *ubǰyā-, xwyr “sun” < *xwarya-, and xwyr “to feed” < *xwāraya-. The fact that in the Uighur orthography the front rounded vowels ü and ö are represented by wy suggests that Sogdian also possessed front rounded vowels, while the spellings in Brāhmī script do not seem to support this assumption (Sims-Williams, 1981a; idem 1996b). In any case, most forms lose the -y- element in later texts, e.g., C wžp- “terror” and C xwr “sun.”

In Table 2, the sounds in parentheses are allophones (voiced stops and [ŋ]) and marginal phonemes (ts, l, and h) mainly employed in foreign words. Complex consonant clusters do not seem to be avoided, as in roxšn- (rwxšn-) “light (adj.),” xēpθt (C xypθt) “one’s own (pl.),” γurδtk- (C γwrdtq-) “kidney.” Typologically unusual combinations of voiced fricative and voiceless plosive or affricate are also not uncommon, e.g., əzpaərt (ʾzprt) “pure,” āδč (ʾʾδc) “something,” paδk- (pδk-) “law.” The second consonant is likely to be realized as a weak, voiceless plosive (Sims-Williams 1989a, p. 179).

Correspondences between the twenty-two letters of the Aramaic alphabet and Sogdian phonemes are as follows: ʾ (aleph): a, ā, ə; b (beth): β, f; g (gimel): γ; d (daleth): only in ideograms; h/H (he; only used in word-final position): -a, zero (cf. Sims-Williams 1981a, p. 350, e.g., wana (S wnh) “tree,” rāθ (S rʾδh) “road”; feminine nouns usually end with -h; Wendtland (1998) transcribes -H and takes it for an ideogram indicating feminine forms); w (vau): w, u/ū, o/ō; z (zain): z, ž; x (cheth): x, (h); ṭ (teth): not attested; y (jod): y, i/ī, e/ē; k (kaph): k, (g); δ/L (lamed): δ, θ, (l); m (mem): m; n (nun): n, ṃ; s (samech): s; ʿ (ain): only in ideograms; p (pe): p, (b), f; c (tzaddi): č, (ǰ); q (koph): not attested; r (resh): r, Vr, (l); š (schin): š; t (tau): t, (d). Note that the letter lamed is transliterated as δ when it represents the Sogdian sounds [δ] and [θ]. On the historical background against which the letter lamed came to stand for [δ], see Sims-Williams, 2016b. Where one letter stands for more than one sound, or two letters coalesce to assume one and the same form, diacritics are not uncommon in texts later than the 8th century, e.g., frāγāzt (S frʾγʾzt) “he begins” with β provided with an extra stroke, žamnu (ẓmnw) “time, hour” with a point below z, which is otherwise indistinguishable from n and z (Sims-Williams, 1978, pp. 257-58). Some of the diacritics are shared by the Uighur script and could have originated from the Uighur usage.
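For readers who want to query the correspondences above programmatically, they can be restated as a small lookup table. The following Python sketch is merely a transcription of the list into a dictionary; the keys (letter names), the marker "∅" for zero, and the helper names are choices made here for illustration, and allophones and marginal phonemes given in parentheses in the text are listed without distinction.

```python
# Aramaic letter name -> Sogdian values, following the list in the text.
# An empty tuple means the letter is not used for Sogdian sounds
# (only in ideograms, or not attested).
CORRESPONDENCES = {
    "aleph": ("a", "ā", "ə"),
    "beth": ("β", "f"),
    "gimel": ("γ",),
    "daleth": (),
    "he": ("a", "∅"),          # word-final only; "∅" marks zero
    "vau": ("w", "u", "ū", "o", "ō"),
    "zain": ("z", "ž"),
    "cheth": ("x", "h"),
    "teth": (),
    "jod": ("y", "i", "ī", "e", "ē"),
    "kaph": ("k", "g"),
    "lamed": ("δ", "θ", "l"),
    "mem": ("m",),
    "nun": ("n", "ṃ"),
    "samech": ("s",),
    "ain": (),
    "pe": ("p", "b", "f"),
    "tzaddi": ("č", "ǰ"),
    "koph": (),
    "resh": ("r", "Vr", "l"),  # "Vr" stands for a rhotacized vowel
    "schin": ("š",),
    "tau": ("t", "d"),
}

def ambiguous_letters() -> list[str]:
    """Letters that can stand for more than one Sogdian value."""
    return [name for name, vals in CORRESPONDENCES.items() if len(vals) > 1]

def unused_letters() -> list[str]:
    """Letters with no Sogdian sound value of their own."""
    return [name for name, vals in CORRESPONDENCES.items() if not vals]

if __name__ == "__main__":
    print(len(ambiguous_letters()), "ambiguous letters")
    print("unused:", ", ".join(unused_letters()))
```

Counting the entries this way makes the ambiguity of the script concrete: most of the letters in use have more than one possible reading.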

In the texts written in the Manichean script, new signs were introduced to transcribe β, δ, γ, and the letters b, d, g represent the voiced allophones of p, t, k, while the letters q and ṭ are interchangeable with k and t, respectively. ʿ (ain) stands for the prothetic vowel [ɨ], an allophone of /a/, preceding a cluster beginning with s, e.g., ɨspurn- (M ʿspwrn-) “full.” Otherwise it combines with the following y, namely ʿy, to indicate front vowels, e.g., ēw (M ʿyw) “one.” Doubling of matres lectionis and ṭ (teth), as in wʾʾxšṭṭ “words,” is not uncommon but is of no phonetic or phonemic relevance (Gershevitch, 1954, §76). In addition to the original 22 letters of the Syriac alphabet, the new characters f, x, and ž were invented. Though b (beth) and d (daleth) transcribe /β/ and /δ/, respectively, in most manuscripts the phoneme /γ/ is represented not by g (gimel) but by ʿ (ain), which is transliterated as γ. It is only in the materials written in the Syriac script that /δ/ (= daleth) and /θ/ (= tau) are distinguished, e.g., θβar- “to give” (S/M δβr-, C θβr-) ~ δūr “far” (S/M δwr, C dwr).

Prosodic features can only be inferred from the rhythmic law: a stress accent falls on the ending of a light stem or on the stem of a heavy stem: βaγ-i “god (nom. sg.)” ~ βāγ-ī “garden (obl. sg.).” Light stems lose their ending when they become clitics, e.g., βaγ-a “o lord” ~ rti-βaγ “and o lord.” Several function words including numerals have both independent and clitic forms: kδ ~ kδʾ “if, when,” δs ~ δsʾ “ten,” wyʾ ~ ʾwy “(article, loc. sg.),” (ʾ)sty ~ (ʾ)st “is.”

Historical phonology. The typologically marked system with voice opposition found only with fricatives is due to the East Iranian sound change in which the voiced plosives and affricate *b, *d, *g, and *ǰ have become the respective fricatives β, δ, γ, and ž even in initial position, and also due to the Sogdian conservatism which preserves the voiceless plosives and affricate *p, *t, *k, and *č even after a vowel: βr- “to bring” < *bara-, δʾr “to hold, have” < *dāra-, γr- “mountain” < *gari-, C žʾr “poison” < *ǰanθra-; ʾʾp “water” < *āp-, wʾt “wind” < *wāta-, ʾwtʾk “country” < *awa-tāka-, wʾc “to send, release” < *wāča-. Sogdian conservatism is also seen in its preservation of Old Iranian *θ, e.g., xēpθ (C xypθ) “(one’s) own” < *xwaipaθya-. The voiced plosives occur only after nasalized vowels, in which position the Old Iranian voiced and voiceless series have fallen together, as in saṃg (S snk, cf. M sngcyk “stony”) “stone” < *asanga- vs. zaṃg (S znk, M zng) “kind, sort” < *zanaka-. Sogdian shared with other East Iranian languages the voicing of the fricatives found in the OIr. clusters *-ft- and *-xt- to -βt- and -γt-, e.g., (ə)βta (C (ʾ)btʾ) “seven” < *hafta, ōsuγtē (M ʾwswγtyy) “pure” < *awa-suxtaka-. Another East Iranian feature is the shortening of some long vowels followed by *y and *w, e.g., sayāk (M syʾk) “shadow” < *sāyāka-, žu- (M jw-) “to live” < *ǰīwa-.

While consonants are relatively conservative, vowels underwent extreme innovations due to the shift of the stress accent and subsequent loss of unaccented short vowels. The accented syllables *-aC/āC were palatalized by the following *-y or *-aya-, which were later lost, as in nišēδ (M nšyδ) “to seat, place” < *nišādaya-, and neš- (M nyš-) “to be spoiled” < *nasya-, frēž (C fryž) “to make straight” < *fra-rāzaya-. Note also the simultaneous palatalization of s and z resulting in š and ž.

Among the combinatory changes of Old Iranian consonant clusters the following are to be mentioned: sp < OIr. *tsw < IE *ḱṷ, e.g., ʾsp- “horse”; zβ < OIr. *dzw < IE *ǵh, e.g., ʾzβʾk “tongue”; δβ < OIr. *dw, e.g., δβr- “door” < *dwara-; ž < OIr. *dr, e.g., C žwk “healthy” < *druwaka-; š < *čy, sr, θr, e.g., šw- “go” < *čyava-, C šwn “hips” < *srauni-, C žwšy “offering” < *zauθraka- (with assimilation of z- to ž). Sometimes a metathesis precedes the merger of δr and θr, e.g., ʾrδʾyšp “banner” < *drafša-, S cyrδpʾδw “quadruped” < *čaθru-pāda-. The dialect variation of tiray ~ siray “three” (cf. S ʾδry, C šy < *θrayah) found in Yaghnobi or Modern Sogdian suggests a relatively late development of *θr to š. The development of OIr. *θw has been extensively discussed by Sims-Williams (2004).

Since no centralized government was established in Sogdiana, orthographic rules were never fixed for Sogdian, and one encounters a considerable number of spelling variations, some of which are due to historical spelling and others to variations in pronunciation. Most conspicuous is the metathesis of [w, u], which is so common that almost every word containing the sounds has alternative forms (e.g., δwγth ~ δγwth [δuγta ~ δγuta] “daughter”), and this phenomenon is also observed among loanwords (e.g., smwtr- ~ swmtr- < Skt. samudra “world ocean”). When č comes into contact with t, it often becomes š, e.g., sʾct ~ sʾšt “it is necessary.” Some forms found in Christian texts in Syriac script seem to reflect vulgar and chronologically later pronunciations. The loss of nasal vowels before continuants (e.g., C [kaθ] < S knδh [kaṃθ] “city”) and that of ər in the coda position (e.g., C zyn [zen] “gold” < S zyrn [zeərn]) are peculiar to Christian texts. See also the reduction of the durative particle skun to (s)kən (C sqn ~ qn after -t), and (s)k (C sq ~ q after -t), and that of the future particle kām to (C ). Syncope of medial vowels and resultant changes of consonants are also common, e.g., C wγdʾrt < M wγtwδʾrt “he said” (Sims-Williams, 1985a, p. 97 with n. 21). Total loss of a prothetic vowel preceding consonant clusters is also peculiar to Christian texts, e.g., C psʾq ~ S ʾpsʾk “garland.” On the spelling variations see also Provasi, 2003.

Morphology. Nouns. In the nominal declension the three numbers and three genders (masculine, feminine, and neuter) of Old Iranian are preserved, although the survival of the neuter is marginal, and many old neuter nouns have shifted to masculine or feminine. Old dual forms have come to be used in the position immediately following a numeral, not only two but also higher numbers. Therefore, this special form is called “numerative” (Sims-Williams, 1979). Nouns are classified into several declensions. Apart from the distinction between light and heavy stems, a few light stems ending with -u (-w) inflect differently from ordinary light stems. A considerable number of stems go back to forms extended by the suffixes *-aka (m.) and *-ākā (f.) and are conventionally referred to as aka-stem and ākā-stem. The two stems are often written with historical spellings in Sogdian script, the former with -(ʾ)k, etc. and the latter -ʾkh, e.g., βaṃtē “slave” (S βntʾk, M βndy), xānā “house” (S xʾnʾkh, M xʾnʾ). Finally there are a small number of indeclinable nouns ending with -ī, e.g., martī (mrty) “man.”

Light stems distinguish six cases: nominative, accusative, genitive-dative, instrumental-ablative, locative, and vocative. Instrumental-ablative forms are never used independently and are always accompanied by the preposition S cnn/cʾwn (C cn) “from” or S δnn/δʾwn (C dn) “with.” Although the old dative and instrumental merged with the genitive and ablative, respectively, the system is largely in accordance with the pattern of Old Iranian. Most of the plural forms are characterized by the ending -t, and these plural stems are treated as feminine singular. Case endings are: masculine nom. -i, acc. -u, gen.-dat. -e, inst.-abl. -a, loc. -ya, voc. -a, numerative, nom./acc. -a; feminine nom./acc. -a, gen.-dat./inst.-abl./loc. -ya, voc. -e. Examples: m. nom. βaγ-i (βγy) “god,” acc. βaγ-u (βγw), gen.-dat. βaγ-e (βγy), inst.-abl. βaγ-a (βγʾ), loc. δβar-ya (δβryʾ) “door,” num. βaγ-a (βγʾ), voc. βaγ-a (βγʾ); f. nom./acc. wan-a (wnʾ) “tree,” gen.-dat./inst.-abl./loc. wan-ya (wnyʾ), voc. δuγte (δwγty, cf. Sims-Williams, 2013); plural nom./acc. δβar-ta (M δβrṭʾ), gen.-dat./inst.-abl./loc. δβar-tya (M δβrṭyʾ), etc. “doors.”
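The light stem paradigm described above lends itself to a mechanical illustration. The Python sketch below simply attaches the endings listed in the text to a vocalized stem; the function name and the data layout are hypothetical, and plural forms are generated by adding -t and applying the feminine singular endings, as the text states. Forms produced this way that are not cited in the text are mechanical extrapolations and should not be taken as attested.

```python
# Case endings of light stems as listed in the text.
MASCULINE = {
    "nom.": "i", "acc.": "u", "gen.-dat.": "e", "inst.-abl.": "a",
    "loc.": "ya", "voc.": "a", "num.": "a",
}
FEMININE = {
    "nom.": "a", "acc.": "a", "gen.-dat.": "ya", "inst.-abl.": "ya",
    "loc.": "ya", "voc.": "e",
}

def decline_light(stem: str, gender: str, plural: bool = False) -> dict[str, str]:
    """Attach light-stem endings to a stem such as 'βaγ-' or 'wan-'.

    Plural stems take -t and are treated as feminine singular.
    """
    base = stem.rstrip("-")
    if plural:
        endings, link = FEMININE, "-t"
    else:
        endings = MASCULINE if gender == "m" else FEMININE
        link = "-"
    return {case: f"{base}{link}{ending}" for case, ending in endings.items()}

if __name__ == "__main__":
    for case, form in decline_light("βaγ-", "m").items():
        print(case, form)
    print(decline_light("δβar-", "m", plural=True)["gen.-dat."])
```

The forms generated for βaγ-, wan-, and the plural of δβar- can be compared directly with the examples quoted in the paragraph above.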

The heavy stem declension does not differentiate masculine from feminine and distinguishes only direct and oblique cases. The oblique case is provided with the ending -y: C myθ “day” (sg. dir.), myθ-y (sg. obl.); myθt (pl. dir.), myθ-ty (pl. obl.). The light stem feminine vocative ending -e is borrowed by the heavy stem and the pl. stem ending with -t, e.g., S xwtynyh “o queen,” C xwtʾwty “o lords” (Sims-Williams, 2013).

In the aka- and ākā-stems the original intervocalic -k- was lost, and the resulting hiatus was later contracted (Sims-Williams, 1990). Thus, the m. nom. sg. ending *-aki first became -*aʾi, and then -ē, while the acc. ending originates from *-aku via *-aʾu. In one Christian Sogdian manuscript, C2 (now E27), the original situation after the contraction is well preserved, while in all the other texts, the ending is generalized from nom., gen.-dat., and loc. sg. to the other cases. The plural ending -t is suffixed to the nominative form, i.e., -ē-t, and inflects as a heavy stem: mrtxmy “man” (sg. dir.), mrtxmy-t (pl. dir.), mrtxmy-t-y (pl. obl.). Similarly, in the case of the ākā-stem, the effect of vowel contraction leads to a pattern similar to the heavy stem declension: C šmʾrʾ “thought” (sg. dir.), šmʾry (sg. obl.); šmʾry-t (pl. dir.), šmʾry-t-y (pl. obl.). Tendencies to substitute the nominative for other cases and for the heavy stem to borrow light stem endings are observable. Some animate nouns, mostly light stems, constitute plural forms by means of the ending -īšt(): βγ-yšt (dir.), βγ-yšt-y (obl.) “gods.” On the origin of the ending, see Sims-Williams (1979). The plural forms of βrʾt “brother” and δwγt- “daughter” are peculiar in ending with -rt, e.g., βrʾt-rt and δwγt-rt. The archaic pl. gen. ending -ān (< OIr. *-ānām) is sometimes encountered in such stereotyped phrases as βγʾn βxtm “most divine of gods (= Skt. devātideva).” In the Christian manuscript C5 (now E5), the nominal inflection shows a strong tendency to use the nominative form in all the case functions, and a new oblique case begins to be formed by adding -y to the nom. sg. and similar generalized forms, e.g., nom. m. rmy “people” ~ obl. rmy-y, nom. f. wnʾ “tree” ~ obl. wnʾ-y (Sims-Williams, 1982). At this stage, so-called “differential object marking” with this oblique ending is also observed: the new oblique case is employed for marking a direct object that is both definite and human (cf. Yoshida, 2009a, p. 307), e.g., fšmdʾrt qw wyšnt sʾ xypθ zʾty-y “He sent (fšmdʾrt) his son (xypθ zʾty-y) to them (qw wyšnt sʾ).”

The inflections of Sogdian nouns and adjectives are almost identical, and what is described about nouns applies also to adjectives, which, however, lack numerative forms. The productive suffix of the comparative is -(y)str, e.g., γwʾncyk-str “more sinful.” Somewhat obsolete is the suffix -tr, which is not attached to stems derived with suffixes or to compounded forms: δwr-tr “farther,” but M mndγrβʾk-str “more stupid.” The adjective and adverb possess a special formation called “elative” with the meaning “so much ~, very ~.” It is formed by means of the prefixes cʾ- “how much” and wʾ- (wt- before s-) “so much,” and the suffixes -t, -(ʾ)st, in various combinations: wʾ-zʾry “so miserable,” cʾ-zʾry-ʾst “id.,” etc. The old superlative ending -tm has become obsolete and survives only in such fixed expressions as βγʾn βxtm and in forms extended by another suffix, cf. ʾskʾtmcyk “highest, most.”

Word-formation of nominal stems. Productive ways of forming new words are derivation by affixes and compounding. Derivational prefixes are not so common as suffixes. Many suffixes are aka-stems, of which the feminine counterpart ends not with -ʾkʾh but -c. The most productive suffixes are the following: -yny (f. -ync) “(made out) of ~,” e.g., δʾrwkync < δʾrwk “wood”; -cyk, e.g., γrcyk “of mountain” < γr-; -mync, e.g., ʾyncmync “female” < ʾync “woman.” Nouns provided with the suffix -ʾnc denote the female counterparts of titles and occupations, e.g., nγwšʾkʾnch “female auditor” < nγwšʾk. For its origin, see Avestan f. ahurānī- < m. ahura- “lord.” Abstract nouns are derived with such suffixes as -wny (-ōnī), -yʾk (-yʾ with heavy stems), and -ʾwy (-āwē), e.g., tʾywny “theft” < tʾy “thief,” M rwxšnyʾk “lightness, light,” S ptptʾynʾwʾk “isolation” < ptptyn (Sims-Williams, 1981b). The suffix -ʾmnty (-āmaṃtē), which derives verbal nouns from the present stem, is very common in Manichean and Christian texts, but rare in Buddhist texts, e.g., C ʾysʾmnty < ʾys “to come.” Productive prefixes are ʾʾw- “co-,” and privatives nʾ-, (ʾ)pw-, and mnt-, e.g., ʾʾwnʾm “namesake” < nʾm “name,” M nʾ-pδkcyq “unlawful,” M mndγrβʾk “foolish,” ʾpw-ptšmʾr “innumerable.” The most common compounds are bahuvrīhis and agent nouns/adjectives consisting of the present stem as the second member; both types end with the aka-suffix, e.g., M xii-ryṭy (δwāts-rītē, cf. ryt “face”) “having twelve faces” and S ypʾk-βrʾk “furious” (ypʾk “anger” + βr- “bear”). On Sogdian compounds in general, see Gershevitch (1945).

Numerals. Cardinal numbers except for “2” are indeclinable. Oblique forms ending with -nw are occasionally encountered: 1 ʾyw [ēw] (M ʿyw, C yw [yō]), 2 m. (ʾ)δw(ʾ), f./n. S ʾδwy, C dwy, obl. δyβnw, 3 M ʾδryy (C šy), 4 M ctfʾr, 5 pnc, 6 wxwšw ~ wxwšwnw (C xwšw), 7 S ʾβt(ʾ) ~ S ʾβtnw, 8 S ʾšt(ʾ), 9 S nw(ʾ), 10 S δs(ʾ), 20 C wyst, 30 šys, 100 C stw, 200 C dwyst, 1000 zʾr, S 1-LPw-nw ([zārnu]), 10,000 βrywr. The light stem forms without -ʾ like ʾβt are originally proclitic forms. In the cardinal numbers “11” to “19,” the unit is followed by δs “10,” which itself is reduced to [ts]; cf. B pncδs, M pncṭsyḥ (obl.), C pncc “15.” Except for 1st (M ʾftm-, etc.), 2nd (δβty-, etc.), and 3rd (ʾšty-, etc.), ordinals are formed with the suffix -m (LS -my) or -myk, e.g., S ʾβtmy ~ ʾβtmyk “7th.”

Pronouns. Personal pronouns and their enclitic forms are as follows: 1st sg. dir. (ʾ)zw, obl. mnʾ, encl. -my; 2nd sg. dir. tγw, obl. twʾ, encl. -fy (in some texts acc. -f, gen.-dat. -t(y)); 1st pl. mʾx, encl. -mn; 2nd pl. šmʾx, encl. -fn; 3rd sg. encl. -šy/šw, pl. -šn. In some texts M/C tʾmʾ (S tʾmʾkh) and M/C tʾfʾ (S tʾβʾkh) function as direct object forms of the 1st and 2nd person sg. pronouns. In the latest stage of the language attested in the Christian text C5 (now E5), the secondary oblique ending -y is also attached to the personal pronouns: mnʾyy, šmʾxy, mʾxy, wnyy (Sims-Williams, 1989a, p. 186). Enclitic forms are attached to the first element of the sentence irrespective of their function. On the independent forms of the 3rd person pronoun, see below.

Sogdian demonstratives are characterized by the triple system of deixis depending on the three persons, each showing two bases: y-/m- “this (with me),” š-/t- “that (with you),” and x-/w- “that (with him, etc.)” (Sims-Williams, 1994). The three are also seen in such adverbs as mδy “here (by me),” tδy “there (by you),” and wδy “there (by him, etc.).” The y-/m- and x-/w- bases have both anaphoric and deictic uses, but š-/t-, of which the instances are not many, seems to be employed only as deictic. Among the demonstratives, simple and extended forms are distinguished; the former function as the article and 3rd person pronouns, while the latter are extended from the former and serve as strong demonstratives. The masculine (and feminine) singular forms of x-/w- deixis are as follows: nom. xw (f. ); acc. ʾw (f. ); gen.-dat. wyny ~ ʾwyn (wyʾ ~ ʾwy); loc. m./f. wyʾ ~ ʾwy. The corresponding ideograms are based on ZK, e.g., ZK = xw, ZKwy = ʾwy. In late texts articles undergo phonetic reduction and are prefixed to the following nouns: w-mʾn “mind” (acc.), n-γrʾmy “wealth” (gen.-dat.), y-mʾny (loc.). The extended forms are nom. xwny or xwnx (f. xʾnʾ); acc. ʾwnw (f. wʾnʾ); gen.-dat. (w)nywʾnt; loc. wyʾwnt. Another extended form is provided with the element -ēδ, e.g., x-yδ and w-yδ. In the texts written in Sogdian script (ʾ)xw = ZK is sometimes used as a copula. The corresponding y-/m- forms are nom. yw = ZNH, ywny (f. yʾnʾ); acc. ʾmw, mwnw (f. mʾnʾ); gen.-dat. ʾmyn, nymʾnt; loc. myʾ, myʾmnt; ʾyδ and myδ. It is to be noted that the employment of the article is declining and one finds many fewer articles in Christian texts than in the Mug documents of the early 8th century and the contemporary Buddhist texts. The Sogdian prepositions are portmanteau forms containing a weak pronoun or article (cf. below) and they are scarcely followed by another article. This indicates that no distinction is made between definite and indefinite by means of the articles, of which the function is more syntactic than semantic/pragmatic. On the Sogdian weak demonstratives or articles, see Wendtland (2011). An adjective wysp- “all” sometimes takes pronominal endings, e.g., gen.-dat. wyspny, inst.-abl. wyspnʾδ, wyspnʾc, and pl. nom. wyspy, gen.-dat. wyspyšnw.

Sogdian pronouns also have portmanteau forms containing a prepositional element: with 1st and 2nd sg. pronouns: S c-ʾmʾkh “from me,” c-ʾβʾkh, δ-ʾmʾkh, etc. (the t- element of tʾmʾ, etc. mentioned above is an obsolete preposition ʾt(ʾ) “to”); with articles: c-ʾwn (< OIr. *hača awanā; Sims-Williams, 1990, p. 277, n. 5) “from the ~,” δ-ʾwn “with the ~,” pr-w “on the ~,” etc.; with strong demonstratives: pr-ywʾnt/pr-ywyδ “on that/those,” etc.

No special reflexive pronoun is known, but a feminine noun γryw “body” functions as a kind of reflexive. The forms derived from OIr. *xwa- are xwty “self” and xypδ (C xypθ); the former emphasizes a personal pronoun (often not expressed), e.g., ʾzw xwty “I myself,” while the latter expresses possession “(one’s) own.”

Interrogatives also function as relatives, which are often provided with the conjunction (ʾ)t(y), on which see below: ky (obl. kyʾ, abl. cknʾc) “who,” cw “what,” kδʾ “when,” kw “where,” cʾnw “how,” cʾf “how much,” etc.

Indefinite pronouns are ʾʾδy, ʾyδy “someone, anyone,” nyδy “nobody,” ʾʾδc, ʾyδc “something, anything,” nyδc “nothing.” Reciprocals are expressed by the combination of ʾyw “one” and an inflected form of δβty- “the second.”

Verbs. Each Sogdian verb has two stems: present and past. Historically the former originates from the Old Iranian present stem and the latter, which always ends in -t, from the past participle derived with the suffix *-ta. The two stems may therefore differ considerably: e.g., pres. stem (k)wn- “to do, make” ~ past stem S ʾkrt- (C qt-), going back to OIr. *kṛnau- and *kṛta-, respectively. Several common verbs derive their present and past stems from different roots, e.g., wʾβ/wγt- “to say,” ʾʾβr/ʾʾγt “to bring.” The productive way to derive past stems from the present is to add a suffix -(ʾ)t. Some irregular verbs have both forms, e.g., S ʾprs- (C ps-) “to ask” and its past stems M fšt- and S psʾt.

Pairs of transitive/causative and intransitive/passive stems are frequent, e.g., xwyr “to feed” ~ xwr- “to eat,” βr- “to bring” ~ βyr- “to be brought.” Some of the intransitive/passive forms derive from the old inchoative stems (Weber, 1970), e.g., swc/swγt- “to burn” ~ swxs- “to be burnt,” wγryš/wγrʾt “to arouse, wake” ~ wγrʾs “to wake up.”

In addition to the present and past stems, certain verbs possess distinct imperfect stems characterized by the preservation, or analogical extension, of the Old Iranian augment *a-. In principle, only those stems containing preverbs undergo certain processes by which the old augment (inserted between a preverb and a stem) and a final vowel (*-i or *-a) of the preverb are fused into -ī- or -ā-: e.g., patγōš (ptγwš) “to hear” > patīγōš (ptyγwš), cf. OIr. *pati-a-gauša-; framāy (frmʾy) “to order” > frāmāy (frʾmʾy), cf. OIr. *fra-a-māya-. Contrary to expectation, verbs containing a preverb β- (< OIr. *abi) show the augment -ā- rather than -ī-, e.g., ʾβžʾy- “to increase (intransitive)” > βʾžy (< OIr. *abi-jawya-); similarly, those beginning with *us/uz- take -ī-, e.g., (ʾ)zwʾrt “to turn” > zywrt (< OIr. *uz-warta-). Present stems beginning with a preverb ʾn- (< OIr. *ham-) form their imperfect stem by adding m- to the present, e.g., ʾnxz “to rise” > mnxz. This m- was later extended to verbs beginning with the preverb ā- (< OIr. *ā-), e.g., ʾʾβr “to bring” > mʾβr. Analogical extension to etymologically simple verbs is also known: snʾy “to wash” > synʾy (< OIr. *snāya-).
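The preverb-dependent patterns just described can be summarized as a small rule table. The following Python sketch is only an illustration under stated assumptions: it operates on transliterated stems, the function name `imperfect_stem` and the way preverbs are passed in are hypothetical, and it covers only four of the patterns exemplified above (pt-, fr-, ʾn-, ʾʾ-). The replacement of the initial aleph by m- is a transliteration-level generalization from the two cited examples, not a claim about the underlying phonology.

```python
ALEPH = "ʾ"  # transliteration sign for aleph

def imperfect_stem(present: str, preverb: str) -> str:
    """Derive a transliterated imperfect stem from a present stem.

    `preverb` is the transliterated preverb with which the stem begins.
    Only the patterns cited in the text are handled (assumption).
    """
    if not present.startswith(preverb):
        raise ValueError(f"{present!r} does not begin with preverb {preverb!r}")
    rest = present[len(preverb):]
    if preverb == "pt":
        # < OIr. *pati-: preverb vowel + augment fuse to -ī-, written y
        return preverb + "y" + rest
    if preverb == "fr":
        # < OIr. *fra-: preverb vowel + augment fuse to -ā-, written with aleph
        return preverb + ALEPH + rest
    if preverb in (ALEPH + "n", ALEPH + ALEPH):
        # < OIr. *ham- and, by extension, *ā-: m- takes the place of the initial aleph
        return "m" + present[1:]
    raise ValueError("pattern not covered by this sketch")

for stem, pv in [("ptγwš", "pt"), ("frmʾy", "fr"), ("ʾnxz", ALEPH + "n"), ("ʾʾβr", ALEPH * 2)]:
    print(stem, ">", imperfect_stem(stem, pv))
```

The β- and *us/uz- cases and analogical forms such as synʾy are deliberately left out, since they would need lexical information that a purely string-based rule cannot supply.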

Forms based on the present stem. Sogdian has six moods: indicative, subjunctive, optative, injunctive, irrealis, and imperative. The so-called precative has been shown to be the middle conjugation of the optative (Sims-Williams, 1989a, p. 188; Yoshida, 2009a and 2009c). The tense and aspect system comprises present, imperfect, optative imperfect, preterit based on the past stem, ʾz-imperfect, and yq (w)mʾt imperfect, the latter two being productive only in some Christian Sogdian texts. The perfect tense and the periphrastic passive are formed from the past participle (i.e., the past stem extended by the aka-suffix) and auxiliary verbs. Old middle endings are almost all displaced by the active, only a few of them having survived, among which is the occasionally encountered -ʾymn (1st pl.). However, 2nd sg. and 3rd sg. optative middle and 3rd sg. imperfect middle endings are formally so salient that they serve as starting-points for the analogical development of new paradigms. The old force of the middle voice is only perceivable in the 3rd sg. present with the ending -ty, which conveys passive meaning when used with transitive verbs, e.g., wyn-ty “(he) is seen” as against active wyn-t “(he) sees.” Durative and future meanings are conveyed, respectively, by particles -(ʾ)skwn (in a few texts occasionally -ʾštn as well; cf. Benveniste, 1966, who compares it with Yaghn. -išt) and -kʾm, e.g., šwʾm-ʾskwn, šwʾm-ʾštn (cf. Yaghn. šawomišt) “I am going” and šwʾm-kʾm “I will go.” While kʾm is attested in the Ancient Letters, ʾskwn or ʾštn does not occur there.

Present indicative forms (for the sake of convenience, idealized forms based on sample stems kun- “to do, make” and patγōš “to hear” are given): 1st sg. kun-ām (patγōš-a/ām), 2nd sg. kun-e (patγōš-e), 3rd sg. kun-ti (patγōš-t), 1st pl. kun-ēm (patγōš-ēm), 2nd pl. kun-ta/θa (patγōš-θ/ta), 3rd pl. kun-aṃt (patγōš-aṃt); imperative: 2nd sg. kun-a (patγōš-ø), 2nd pl. kun-θa/ta (patγōš-θ/ta); imperfect: 1st sg. kun-u (patīγōš-u), 2nd sg. kun-i (patīγōš-i), 3rd sg. kun-a (patīγōš-ø), 1st pl. kun-ēm (patīγōš-ēm), 2nd pl. kun-θa/ta (patīγōš-θ/ta), 3rd pl. kun-aṃt (patīγōš-aṃt); optative: 1st sg. kun-ē, 2nd sg. kun-ē/ya, 3rd sg. kun-ē, 1st pl. kun-ēm, 2nd pl. kun-ēθ, 3rd pl. kun-ēṃt; subjunctive: 1st sg. kun-ān (patγōš-an/ān), 2nd sg. kun-a (patγōš-a), 3rd sg. kun-āt (patγōš-at/āt); injunctive: 1st sg. kun-u (patγōš-u), 2nd sg. kun-i (patγōš-ø), 3rd sg. kun-a (patγōš-ø), 1st pl. (not attested), 2nd pl. kun-θa/ta (patγōš-θ/ta), 3rd pl. kun-aṃt (patīγōš-aṃt). Optative middle endings are: 1st sg. -ēm/-ētu, 2nd sg. -ēš/-ēta, 3rd sg. -ēt/-ētē, 1st pl. -ētēman, 2nd pl. -ēšθ(a), 3rd pl. -ētēṃt; irrealis endings are: 1st sg. -ōtu, 2nd sg. -ōta, 3rd sg. -ōtē, 2nd pl. -ōtēšta. In the āz-imperfect and imperfect middle, the imperfect endings follow the present stem enlarged respectively by -ʾz and -t: δār-āz-u “I was holding,” ās-t-u “I took.” However, only two verbs, (k)wn- and ʾʾs “to take” show the imperfect middle (Sims-Williams, 1996b, p. 177, n. 11). Optative forms sometimes denote durative/iterative past and are referred to as “imperfect optative,” which often takes the imperfect stem rather than the present: wʾptʾy “was falling, fell repeatedly” < ʾwpt.
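The present indicative paradigm above can also be laid out as a lookup table, which makes the differences between the two sample stems easier to compare. The Python sketch below is a minimal illustration: the names `PRESENT_INDICATIVE` and `present_indicative` and the slot labels such as `"3sg"` are hypothetical, and the table holds only the idealized endings cited for kun- and patγōš, so it cannot be applied to other stems.

```python
# Idealized present indicative endings as cited for the two sample stems.
# Alternative endings are kept as written in the text, separated by "/".
PRESENT_INDICATIVE = {
    "kun": {
        "1sg": "ām", "2sg": "e", "3sg": "ti",
        "1pl": "ēm", "2pl": "ta/θa", "3pl": "aṃt",
    },
    "patγōš": {
        "1sg": "a/ām", "2sg": "e", "3sg": "t",
        "1pl": "ēm", "2pl": "θ/ta", "3pl": "aṃt",
    },
}

def present_indicative(stem: str) -> dict[str, str]:
    """Return the segmented present indicative forms of a sample stem."""
    endings = PRESENT_INDICATIVE[stem]  # KeyError for stems not in the table
    return {slot: f"{stem}-{ending}" for slot, ending in endings.items()}

for slot, form in present_indicative("kun").items():
    print(slot, form)
```

Storing the endings per sample stem, rather than computing them from a stem class, avoids asserting any rule that the text does not state at this point.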

Nonfinite forms based on the present stem are present participles formed with -(ē) (f. -(ē)), e.g., S ptʾʾwnʾk “enduring,” S δβrʾynʾk “giving,” future passive participles ending with -(ī)čīk or , e.g., M swmbcyk “to be bored,” S βyry “to be found,” and gerunds formed by the suffix -kya/kī, e.g., βrkyʾ “having brought,” wʾβky “having said.” Infinitives derived from the present stem are also found. Their formation and usage vary from text to text (Yoshida, 1979). One type of them, the so-called pr-infinitive consisting of the preposition pr and the present stem, is common only in Christian texts. In some Christian Sogdian texts, the present participle ending in -ēk or -ēsk construes with the auxiliary verb (w)mʾt, which itself is the preterit form of the copula, and the combination denotes the durative past, namely yq (w)mʾt-imperfect, e.g., C šwyqmʾt “he was going.” In the texts in which āz-imperfect is attested the yq (w)mʾt-imperfect is virtually absent (Yoshida, 1980). The Old Iranian present participle ending with *-ant is no longer productive. The form extended with the aka-suffix is found in some fossilized adjectives, e.g., žuwaṃtē (M jwʾndy) “living.”

Forms based on the past stem. The intransitive or passive preterit consists of the past stem and the enclitic forms of the copula, except for the 3rd sg., which is nothing but the nom. sg. form of the past stem (idealized forms based on S βw-/ʾkrt- “to become” and ʾys/ʾʾγt “to come”): sg. 1st əkt-im, 2nd əkt-iš, 3rd əkt-i (both masculine and feminine, but occasionally f. sg. əkt-a; āγat-ø), pl. 1st əkt-ēm, 2nd əkt-(aṃ), 3rd əkt-aṃt. The corresponding transitive preterit is formed with the auxiliary verb δʾr “to have.” The light stem takes the old acc. sg. ending -u while the heavy stem has no ending: 1st sg. žaγtuδāram (M jγṭw-δʾrm) “I held (M δʾr/jγt-),” patxrīt-δāram (M pṭxryṭδʾrm) “I hired (ptxryn/ptxryt),” etc. The modal forms of the preterit tense are formed by inflecting the auxiliary verb, e.g., S wmʾt-ʾt “(he) would have been (subj.),” M ʾwjγystδʾrn “(if) I should have settled down (subj.; M ʾwjγynd/ʾwjγyst).” Meager survivals of the so-called “ergative construction” based on the past stem of the transitive verbs are also encountered: ʾḤRZYm ZK δykh wyth “I (-m) saw the letter (δykh) (wyn/wyt “to see”; cited from one of the Ancient Letters).” The potentialis is a periphrasis consisting of a past stem in -t (heavy) or -tu/ta (light) plus auxiliary verb (k)wn- “to make” (trans.) or β(w)- (intrans./pass.). On the origin of the construction, see Sims-Williams (2007b). Two senses can be distinguished, one “potential” (e.g., M ṭwγṭʾ kwnyy (3rd sg. opt.) “he might be able to pay”), the other “anterior” (e.g., S cʾnʾkw ... wγtw wntʾ “when he had said”). Anterior meaning is also expressed by the non-finite construction based on the potentialis: čan ~tu/ta kārī, e.g., S mnʾ cnn nyrβʾn wytʾrt kʾry “after my passing into nirvāṇa (wytr-/wytrt “to depart”),” S cnn pwγtʾ-kʾry “after cooking (it) (pc-/pwγt- “to cook”).”
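The two preterit formations described at the start of this paragraph can be sketched as simple string operations. The Python below is a hedged illustration, not a description of any existing tool: the function names are hypothetical, the hyphen before the auxiliary is added uniformly for readability (the cited form žaγtuδāram is written without one), and the intransitive table contains only the light-stem endings cited for əkt-, omitting the 2nd pl., whose ending is given in parentheses in the text.

```python
# Enclitic copula endings as cited for the light past stem əkt- (2nd pl. omitted).
INTRANSITIVE_ENDINGS = {"1sg": "im", "2sg": "iš", "3sg": "i", "1pl": "ēm", "3pl": "aṃt"}

def intransitive_preterit(past_stem: str) -> dict[str, str]:
    """Past stem plus enclitic copula (light-stem pattern only)."""
    return {slot: f"{past_stem}-{ending}" for slot, ending in INTRANSITIVE_ENDINGS.items()}

def transitive_preterit_1sg(past_stem: str, light: bool) -> str:
    """Past stem (with old acc. sg. -u if the stem is light) plus 1st sg. of δār-."""
    linked = (past_stem + "u") if light else past_stem
    return f"{linked}-δāram"

print(intransitive_preterit("əkt")["1sg"])
print(transitive_preterit_1sg("žaγt", light=True))
print(transitive_preterit_1sg("patxrīt", light=False))
```

Whether a stem counts as light or heavy is passed in as a flag here, since deciding it would require the rhythmic-law criteria, which this sketch does not attempt to model.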

The past participle is derived from the past stem with the aka-suffix, e.g., m. parθaγtē ~ f. parθaγ(t)č < S prδʾync/prδʾγt- “to pull.” The periphrastic passive and intransitive perfect are formed with the past participle (agreeing with the subject) and the auxiliary verbs, which are β(w)- (passive) and existential verbs (perfect), e.g., M sfryṭyṭ wβʾnd “they are created (pass.)” and S tγtʾy ʾskwʾy “you have entered (perfect).” The active perfect with the auxiliary verb δʾr is attested but not common: S ptγrβʾtʾk δʾrʾnt “they have received.” More often the passive construction is employed: S δβʾrty ʾstʾt “(if) it should have been given.” The past infinitive ends with -te (light) or -t (heavy) and always follows the main verb: e.g., S sʾct ʾPZYšy ... cšmtʾ knt “it is necessary (sʾct) to dig out (kn-/knt) his eyes.” On the Sogdian infinitives, see Yoshida (1979).

Indeclinables. Both prepositions and postpositions are common. Those which are construed with accusative case of the light stem are pr, prw, prm “in, on, by, for,” and (ʾ)kw “to, toward,” which has replaced the obsolete ʾt(ʾ). cnn, cʾwn “from” and δnn, δʾwn “with” take the instrumental-ablative. Somewhat rare is wsn “for the sake of” (Sims-Williams, 2001b).

The most common postpositions are sʾr (C ) “toward, from,” prʾyw (C prw; < pr + ʾyw “lit. in one”) “together with,” prm “up to, until,” and pyδʾr “for the sake of, because of” which are often preceded by (ʾ)kw or cnn/cʾwn, e.g., kw δynh sʾr “to the church,” cnn rymʾyš pyδʾr “because of rebuke.” Many adverbs and oblique case forms of nouns function as postpositions, e.g., myδʾny “among” < myδʾn “middle, waist,” ryty “in front of” < ryt “face,” cyntr “inside,” cwpr “upon.” Thus, Sogdian postpositions constitute an open class in contrast to the prepositions.

Sentences are negated by either (M ny(y), C ny) or (M/C ) preceding finite verbs, both being masked by the ideogram . In the case of the perfect and periphrastic passive, precedes the auxiliary verb. While negates proposition, is a prohibitive particle: M nyy wʾβʾmkʾm “I shall not speak,” M kpyy nyy nyʾtδʾrt “he did not catch the fish,” C ʾyty ny bwtqʾ “it will not be taken,” C nʾ psʾ “don’t ask!” The negation of the imperfect is formed not by negating the imperfect verb, but by negating the present indicative or injunctive form, occasionally preceded by an enclitic element S (y) (Sims-Williams, 1996b): S rty-βy Lʾ δwry zʾyh šwt (pres.) “He did not go far.”

Conjunctions. Coordinate conjunctions are (ʾ)rt(y) (= ʾḤRZY) which marks the beginning of the clauses, ʾt(y) (= ʾPZY, ZY) “and,” and ktʾr (= ʾWZY) “or.” Subordinate conjunctions are: (a) preceding the main clauses: cw “if,” cʾnw “when, while, since,” (ʾ) “if,” mnt “when, while,” kw prm, kwδprm “as long as”; (b) following the main clauses: ʾt(y) = ZY “that,” cʾnw (ʾ)ty “as (= like), than,” pʾr(w)ty “for, (not...) but,” prʾw ʾt(y) “because,” and ywʾr (ʾty) “however.” On the element ʾty, etc. (= ZY), see below. Combinations of and ʾt(y)/-wty and mʾδ “thus” and ʾt(y) come to be new subordinate conjunctions kt (C qt) “that” and mʾt “that” respectively. Somewhat rare is twty “that, then, and” (combination of ʾt(y) “and” and the complementizer, on which see below).

Syntax and phraseology. A syntactic feature peculiar to Sogdian (and for that matter Bactrian as well) is that in each clause (both main and subordinate) an enclitic complementizer (ʾ)t(y), -wty, yty (= ʾPZY, ZY) stands in the second position, to which other enclitic elements are added (Sims-Williams, 1985b). On the usage of the corresponding ideogram ZY, see Yakubovitch (2005). For example, the most frequent conjunction rty = ʾḤRZY marking the beginning of a clause consists of an obsolete adverb r- (cf. Khotanese rro “also”) and -ty. This conjunction is also affixed to the first element of a direct quotation, e.g., S KZNH ptʾyškwy <wγšʾ ZY βγʾ xwtʾw pʾrZY γrʾnh ʾkrtʾym> “She said thus (to him), ‘Rejoice, o lord king, because I have become pregnant’” (Weber, 1971). Similarly, the above-mentioned subordinate conjunctions (e.g., pʾr-(w)ty, etc.) contain the complementizer. This is also true of relative clauses, where the relative pronouns or adverbs are often followed by it: M xii δβrṭʾ ky-ʾṭy wyʾ smʾnyty ʾskwnd “Twelve doors which exist on the heavens.” During the period in which Sogdian is attested, use of the complementizer declined, and in later texts it survives only in fossilized forms like C qyt (< ky ʾty), M/C kt/qt (< kδ-wty), etc. Similarly, fewer and fewer examples of rt(y) are attested in later texts.

Word order. Sogdian is basically an OV language where heads follow the dependent elements. The basic structure of the Sogdian sentences is SOV: M ʾrṭy xww mrγʾrṭy xypδʾwnd mʾyδ pwskfṭy ww 100 δynʾr zyrn ṭwj “Thus under such constraint the owner (xypδʾwnd) of the pearls (mrγʾrṭy) paid (ṭwj) the hundred gold dinars (δynʾr zyrn).” However, Sogdian is far from being a consistent OV language and attests a number of counterexamples. For example, both prepositions and postpositions are common and the relative clause always follows the head noun: M wyspw ʾrk cw ʾṭy-my ṭγw frmʾyy “All the work (wyspw ʾrk) which (cw ʾṭy) you order me.”

As a Middle Iranian language, Sogdian still observes “Wackernagel’s law,” by which enclitics occupy the second position in the sentence. In classical Sogdian the second position is usually occupied by the complementizer (ʾ)t(y), and the enclitics are added to it: M pʾr-ṭy-šy xw wynʾ jnyy frmʾṭ-δʾryy “But you ordered (frmʾṭ-δʾryy) him (-šy) to play (pres. inf. jnyy) the lute (wynʾ).” Among the enclitics are the above-mentioned (y), a hypothetical particle -n (Sims-Williams, 2007a, p. 192b), and other clitic elements. These enclitics can pile up: ʾḤRZYmβc δykh ʾʾγt “A letter (δykh) came to me (-m-) from you (-β-c).” Once this rule of syntax was no longer obligatory, enclitic pronouns became more independent and could stand without their host: S ... šw kδʾc Lʾ wʾcʾyδ kʾm šw ms Lʾ ptxwyδʾ “Never let him escape! Moreover, do not kill him!”
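The second-position placement and stacking of enclitics can be pictured as attaching a chain of clitics to the first word of the clause. The following Python sketch is a toy illustration only: the function name `attach_enclitics` is hypothetical, clitics are treated as plain strings given in their surface order, and the clause-initial word is simply assumed to be the complementizer-bearing host.

```python
def attach_enclitics(words: list[str], enclitics: list[str]) -> str:
    """Attach a chain of enclitics to the clause-initial host word.

    `words` is the clause without its enclitics; `enclitics` are given
    in the order in which they surface after the host (assumption).
    """
    if not words:
        raise ValueError("a clause needs at least a host word")
    host, *rest = words
    return " ".join([host + "".join(enclitics), *rest])

# The example cited above: host ʾḤRZY + -m "to me" + -β-c "from you".
print(attach_enclitics(["ʾḤRZY", "δykh", "ʾʾγt"], ["m", "β", "c"]))
```

The sketch does not model the later stage mentioned in the text, in which enclitic pronouns occur without a host.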

With regard to the relaxed agreement phenomena in Sogdian, cf. Sims-Williams’s (1989, p. 190) remark: “The rules of agreement which apply to light stems are relaxed in the case of heavy and contracted stems, leading to phenomena such as group inflexion. To a large extent the suffixes (obl.) and -t (plural) are treated as optional, being often omitted where clarity is unimpaired.” “Group inflexion” is a phenomenon where only the last in a series of (usually asyndetically coordinated) words is inflected (Gershevitch, 1954, §§1639-43): e.g., M. cn ʾnxr pxryty βyq “with the exception of fixed stars (instead of pl. obl. form ʾnxrty) and planets.” The rule of group inflexion is not compulsory either.

Honorific and polite expressions are collected by Yoshida (2006). Among them are frmʾy “talk” (the subject being someone higher in rank), lit. “order,” and its opposite ptškwʾy, lit. “entreat”; both mean “to say, speak.” The addressor’s humble feeling is also expressed by the 1st sg. injunctive and a verb rxn- “dare”: S rtyn ʾzw cʾnw rxnʾw ʾPZYn ywnʾk ʾsβrʾckʾ ʾprsʾw “How dare I ask (the Buddha) about this matter?” Note also the use of the hypothetical particle -n in this sentence.

Language contact and lexicon. The Sogdian lexicon consists of three groups of words: (a) those inherited from Old Iranian, (b) loanwords, and (c) foreign elements or nonce forms temporarily appearing in texts. Some of the native Iranian elements show distinctively East Iranian features, e.g., S knδh “town,” kp- “fish,” and C myθ (S, M myδ) “day.” Fully assimilated loanwords mainly originate from Western Iranian (Middle Persian and Parthian) and Indian. For example, the names of the seven days of a week are of Middle Persian origin: myr “Sunday,” mʾx “Monday” (see HAFTA). Other examples include šʾnšʾy “king of kings” and spʾntn “grain of mustard” (cf. original Sogdian form šywšpδn). In view of the pronunciation -c, rwc “day (of a month)” was borrowed from Old Persian raučah rather than Middle Persian rōz. Parthian forms are more common: swkβʾr “monk,” msyδr “presbyter,” βγpwryc “divine maiden,” etc. The word mγδβ- “minister” (< Parth. maγβeδ) sometimes construes with yet another Parthian loan word: wzʾrkt mγδβtʾ “great ministers,” cf. Parth. wzrg (Sims-Williams, 1983b, p. 44). Of Indian origin are mwδy “price,” rtn- “jewel,” prʾny “insect,” swmtr- “ocean,” mkr- “monkey,” škr- “sugar,” prmʾn “reliable,” etc. (Sims-Williams, 1983a; idem, 2007, p. 252). Two doublets of Sogdian and Bactrian words are found: Sogd. cxr- ~ Bactr. sxr- “wheel,” Sogd. rδδ- ~ Bactr. rxʾkh “cart.” An Iranian-Indian hybrid word sʾrtpʾw “caravan leader” (cf. Skt. sārthavāha- “id.”) may also originate from Bactrian (Sims-Williams, 1996a, p. 51, n. 37). Greek loanwords are not few, but their immediate origins are obscure: δyδym “diadem,” nwm “law,” M qpyδ “shop” (Sims-Williams, 1996a, p. 51, n. 39); δrxm- “drachma” is from Greek, but δynʾr “dinar” from Latin. Foreign elements are Sanskrit forms in Buddhist texts (Provasi, 2013), Western Middle Iranian lexical items in Manichean texts, and Syriac words in Christian texts. (On the Syriac elements, see Sims-Williams, 1988.) 
The foreign elements are quite numerous in translations, and their number seems to depend on the scholarship of each translator. Chinese elements are found, but they seem to be cultural words shared by other Central Asian languages: šnk “pint” (Chin. sheng, Uighur šing, Khotanese śiṃga/ṣaṃga, Tocharian ṣak), tym “inn” (Chin. dian, Uighur tem, Pers. tim), mkʾ “ink” (Chin. mo, Uighur mäkkä) (Yoshida, 1994, p. 379; Sims-Williams, 1996a, p. 62).

Sogdian lent numerous words to Old Turkic, in particular Uighur: ažun (< ʾʾžwn) “living being,” tamu (< tm-) “hell,” užik (< M ʾwjkʾk, ʾwjʾk) “letter, character,” etc. Sogdians’ cultural influence may be seen in the word for “paper” kʾγδ(y)ʾ, which was borrowed into several languages: Uighur kägdä, Persian kāḡaḏ, Central Asian Sanskrit kākali. However, this word itself seems to have been borrowed from Greek chartés (Sims-Williams, 1996a, p. 62). Since the New Persian literary language came into being among the speakers of Eastern Iranian languages, including Sogdian, a number of Sogdian words found their way into New Persian: čaγz (< cγz) “frog,” naγz “good, excellent” (< nγz-), pasāk (< (ʾ)psʾk) “garland,” setāγ (< stʾγ) “childless.” It is curious to see Sogdian θ and δ merge into l in New Persian loanwords, e.g., lenj “to pull out” < *θēnč, cf. zβāk-θēṃčē (S zβʾʾk-δyncʾk) “one who pulls out a tongue,” lastpardarak “handkerchief” < C dstprtry “id.” (see SOGDIAN LANGUAGE ii. Loanwords in Persian).

Varieties of Sogdian and Yaghnobi. The Sogdian language documented in the bulk of materials handed down to us most probably represents the standard variety spoken in the area surrounding Samarkand during the 7th to 10th centuries; the Bukharan dialect cited by Islamic writers is slightly different (Henning, 1958, pp. 85-86; Sims-Williams, 1989b, pp. 165-66). Linguistic features of the much earlier language are attested in the Ancient Letters of the early 4th century (see Ancient Letters). The so-called “Turco-Sogdian” is one of the latest varieties of the language, which shows very strong Turkish influence, not only in vocabulary but also in syntax. It was spoken (or merely written?) by those who were bilingual in Sogdian and Old Turkic (Sims-Williams and Hamilton, 1990, pp. 10-11; Sims-Williams, 2008; Yoshida, 2009d, 2011a; Sundermann, 2016). The wide range of linguistic differences that once existed in Sogdiana may be inferred by comparing Sogdian with Yaghnobi, the so-called “Modern Sogdian.” For example, the formation of imperfect stem by adding the augment a- to any present stem (Yaghn. piraxs- > a-piraxs- “left” vs. Sogd. prxs- > pʾrxs “id.”) and the 3rd pl. ending -or (a-wen-or “they saw” vs. wyn-ʾnt) cannot be attributed to the linguistic change from Sogdian to Yaghnobi. In the formation of the imperfect stem, Choresmian is much closer to Sogdian than to Yaghnobi (MacKenzie, 1974). On the other hand, it may be worth noting that Yaghnobi shares the 3rd pl. -r ending with the neighboring Choresmian and Khotanese.


(Yutaka Yoshida)

