Tocharian is the conventional name for two closely related Indo-European languages that were spoken in northwest China, in the north of the Tarim Basin in present-day Xīnjiāng. The two languages, usually referred to as ‘Tocharian B’ and ‘Tocharian A,’ are attested by predominantly Buddhist manuscript fragments found in the three regions of Kuča (Chin. Kùchē) in the west, Qarašahr (Chin. Yānqí, Uyg. Qarašähär, Skt. Agni) in the middle, and Turfan (Chin. Tǔlǔfān) in the east. The Tocharian B fragments probably date from the 5th to the 10th centuries CE, and those in Tocharian A from the 7th to the 10th centuries.

Tocharian B shows an internal chronological development (Peyrot, 2008 with references). Since the oldest (‘archaic’) of the three linguistic stages is attested only in Kuča, Tocharian B must be originally from there (see also Lévi, 1913). The middle (‘classical’) and late stages are also found in Qarašahr and Turfan, which is due to a later spread to the east in approximately the 7th century. Tocharian A is only attested in Qarašahr and Turfan. It seems to be originally from the Qarašahr region (Ogihara, 2014) and spread east to Turfan together with Tocharian B.

Tocharian B is attested in ca. 10,000 manuscript fragments and Tocharian A in ca. 2,000; most of these are very small. The majority of the corpus consists of fragments of Indian-style oblong poṭhī leaves of Buddhist content. There are also glosses in Sanskrit manuscripts, administrative and legal documents in the form of scrolls or wooden tablets, as well as painting captions and graffiti. The corpus is divided among collections in Berlin, Paris, London, St. Petersburg, Japan, and China. For the corpus, see M. Malzahn (2007, ed.) and cetom.

The name of the Tocharian languages. The name of the Tocharian languages has been a subject of lively debate for more than a century. The conventional name ‘Tocharian’ goes back to F. W. K. Müller (1907, p. 960), who relied on a colophon of the Old Uyghur Maitrisimit, which indicates that it was translated from the tohrı language. The name was accepted by E. Sieg and W. Siegling (1908, pp. 916, 928), the decipherers of Tocharian, who distinguished two different languages that they labelled ‘Gruppe A’ and ‘Gruppe B’ (corresponding to Tocharian A and Tocharian B, respectively). Müller and Sieg could later prove that the Old Uyghur Maitrisimit was indeed translated from the Tocharian A Maitreyasamitināṭaka (1916), and therefore there can be no doubt that the Old Uyghur name for Tocharian A is tohrı or tohrı tili “tohrı language.”

However, there is widespread agreement today that the speakers of the Tocharian languages A and B cannot be equated with the historical people of Afghanistan called τόχαροι in Greek, tochari in Latin and tukhāra in Sanskrit (pace Müller, 1907 and Sieg and Siegling, 1908). An important role in the discussion has been played by the Yuèzhī 月氏 migration from the Tarim Basin to Afghanistan (Bactria) as recorded in Chinese sources. The Yuèzhī might indeed be related to the historical Tocharians of Afghanistan, but there is no evidence that they spoke Tocharian or that there is any connection between the speakers of Tocharian and the Yuèzhī in the Tarim Basin (Mallory, 1989, p. 60). On the other hand, the Old Uyghur designation tohrı is likely to be connected with the Sogdian ctβʾr twγrʾystn “four Tuγri countries” of the Karabalgasun inscription (line 19), which seems to refer to the area around Qarašahr (Yoshida, 2011, p. 532b; Henning, 1938, p. 550 and passim).

A Sanskrit-Tocharian B bilingual text has also been an important issue in the discussion on the name of the Tocharian language. This bilingual has been thought to prove that Sanskrit tokharika refers to the Tocharian B language of Kuča because of the sequence tokharika : kucaññe iṣ̱ca̱ke, which was thought to mean “Tocharian : Kuchean ...” or “Tocharian : Kuchean woman.” This interpretation was not correct. “Kuchean” is kuśiññe in Tocharian B, not kucaññe, and a word iṣcäke “woman” (as it is most probably to be read) does not exist. Instead, a more probable interpretation takes tokharika to be a Prakritism for Skt. tūbarikā “fragrant earth” and the Tocharian B translation to mean “a kind of clay” (Pinault, 2002b; Adams, 2013, pp. 191-92; for iṣcäke, see also below).

Thus, although the name ‘Tocharian’ should best be kept for practical reasons, it is not correct. The Tocharian B name for the Tocharian B language was probably derived from kuśi “Kuča” (cf. Skt. Kuci and classical Chin. Qiūcí 龜茲 with spelling variants); cf. kuśiñ pele rekisa “in speech in Kuča manner” (Pinault, 1989, p. 21) and the adj. kuśiññe “Kuchean.” The Tocharian A name for the Tocharian A language was ārśi-käntu “Ārśi language” (Sieg, 1918); ārśi is the old name of Qarašahr, corresponding to Skt. Agni and Chin. Yānqí 焉耆 (Sieg, 1937).

Writing, phonology and grammar. Tocharian is written in the Brāhmī alphabet, from left to right with akṣaras that denote a consonant or group of consonants as well as the following vowel. It is transcribed according to the Sanskrit values of the characters. Thus, <c> is a palatal stop or affricate; <ś> is a palatal sibilant; <ṣ> is a retroflex sibilant; <ñ> is a palatal nasal. Two additional sounds are expressed with digraphs: <ly> is a palatal lateral and <ts> a dental affricate. For two further sounds the script has been adapted more radically: a character <w> has been added, probably for a labio-velar approximant, and <ä> (expressed either by a vowel diacritic or by a special consonant sign without vowel diacritic) for a weak vowel close to schwa. The vowels <i> and <u> are sometimes used without syllabic value, transcribed in subscript, e.g., TochB <krui> [krwi] “if.”
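The transcription conventions just described can be collected in a small lookup table. The following Python sketch is only an illustration: the names TRANSCRIPTION_VALUES, DIGRAPHS, segment, and describe are hypothetical, the descriptions merely restate the values given above, and the sketch assumes that every sequence <ly> or <ts> in a transcribed word is the digraph.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so that letters with diacritics are single code points."""
    return unicodedata.normalize("NFC", text)


# Hypothetical table: the keys are transcription signs, the values restate the article.
TRANSCRIPTION_VALUES = {
    nfc("c"): "palatal stop or affricate",
    nfc("ś"): "palatal sibilant",
    nfc("ṣ"): "retroflex sibilant",
    nfc("ñ"): "palatal nasal",
    nfc("ly"): "palatal lateral",
    nfc("ts"): "dental affricate",
    nfc("w"): "probably a labio-velar approximant",
    nfc("ä"): "weak vowel close to schwa",
}

# Assumption: these two sequences are always read as digraphs.
DIGRAPHS = {"ly", "ts"}


def segment(word):
    """Split a transcribed word into signs, keeping the digraphs together."""
    word = nfc(word)
    units = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            units.append(pair)
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def describe(word):
    """Pair each sign with its special value, or note that the Sanskrit value applies."""
    return [(u, TRANSCRIPTION_VALUES.get(u, "Sanskrit value")) for u in segment(word)]
```

For instance, segment("tsain") keeps the initial affricate together as a single unit, while describe("kuśiññe") marks the sibilant and the two palatal nasals and leaves the remaining signs at their Sanskrit values.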

There is no distinctive vowel length in Tocharian. i and u interchange with ī and ū in both languages. In TochA, <ā, a, ä> denote three distinct vowel phonemes, approximately /a, ʌ, ə/. In TochB, these three vowel signs stand for two vowel phonemes, <ā> being accented /á/, <ä> unaccented /ə/, and <a> unaccented /a/ or accented /ə́/. The consonantal system is relatively simple: there is no distinctive voice or aspiration (e.g., no d, th, dh, but only t); there are no retroflex stops and nasals (e.g., no ṭ, ṭh, ṇ); there are no flat fricatives (e.g., no x, f, θ); and there are no v and h. Consequently, many characters occur only in loanwords from Sanskrit. Anusvāra, transcribed as <ṃ>, usually denotes /n/ (not /m/, but sometimes /ñ/). The language allows rather heavy consonant clusters, e.g., tk, tsk, rtk, initial pk-, pkl-, ścm-.

The morphology of both languages is relatively complex and can typologically best be compared with Khotanese, while it is simpler than that of, for instance, Avestan and Sanskrit. The noun has a two-layer inflectional system consisting of the fusional primary cases nominative, oblique (= accusative), genitive, and vocative (the latter only in TochB), supplemented by agglutinative secondary case suffixes, attached to the oblique, for the ablative, locative, allative, comitative, perlative, causal (the latter only in TochB), and instrumental (only in TochA). On the basis of the formation of the plural and the primary cases, around thirty nominal inflectional categories with often only minor differences can be distinguished. The verb has five basic stems: present, subjunctive, preterite, imperative, preterite participle. From the present is derived the imperfect, and from the subjunctive the optative. Finite verb forms are inflected for person, number, and voice (active vs. middle). There is a large number of different patterns for the formation of the basic stems. The basic word order is SOV: determiner, adjective, and genitive precede the noun, there is a strong preference for postpositions, and main clauses generally follow subclauses. For the grammar of the Tocharian languages, see E. Sieg, W. Siegling, and W. Schulze (1931), W. Krause and W. Thomas (1960); and further G.-J. Pinault (1989, 2008), Malzahn (2010), M. Peyrot (2013).
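The two-layer case system described above can be summarized as a small data structure. The Python sketch below is a hedged illustration only: the names PRIMARY_CASES, SECONDARY_CASES, case_inventory, and secondary_form are hypothetical, and the placeholder strings stand in for real forms, since no actual endings are given here.

```python
# Which language attests which case, as stated in the text ("A" = TochA, "B" = TochB).
PRIMARY_CASES = {
    "nominative": {"A", "B"},
    "oblique": {"A", "B"},
    "genitive": {"A", "B"},
    "vocative": {"B"},        # only in TochB
}

SECONDARY_CASES = {
    "ablative": {"A", "B"},
    "locative": {"A", "B"},
    "allative": {"A", "B"},
    "comitative": {"A", "B"},
    "perlative": {"A", "B"},
    "causal": {"B"},          # only in TochB
    "instrumental": {"A"},    # only in TochA
}


def case_inventory(language):
    """List the cases of one language, primary layer first, then secondary."""
    merged = {**PRIMARY_CASES, **SECONDARY_CASES}
    return [case for case, languages in merged.items() if language in languages]


def secondary_form(oblique, suffix):
    """Secondary cases are agglutinative: a suffix is attached to the oblique form."""
    return oblique + suffix


# Placeholder strings, not real Tocharian forms.
example = secondary_form("OBLIQUE", "-SUFFIX")
```

Under these assumptions, case_inventory("B") yields ten cases and case_inventory("A") nine; the difference lies in the vocative and causal (TochB only) and the instrumental (TochA only).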

Loanwords from Indian. In view of the Buddhist content of the texts, which belong to the hīnayāna Sarvāstivāda and Mūlasarvāstivāda schools, it comes as no surprise that both languages contain a large number of loanwords from Sanskrit. Undoubtedly some of these made their way into the spoken language, but the majority clearly belongs to a learned register, as is shown also by the attempt to keep the original Sanskrit spelling in spite of the frequent occurrence of phonemes absent from genuine Tocharian. One of the few regular adaptations is that the Skt. stem vowels -a and -ā are deleted (sometimes -ä is found). Further, non-final a and ā are often not rendered fully correctly, and occasionally consonantal phonemes are replaced with their genuine Tocharian equivalents. Example: Skt. gaṅgā- “Ganges” : TochB gaṅk, gaṅgä, gāṅg, gāṅgä, gāṅk, TochA gaṅk, gāṅk. Feminine personal names regularly keep the stem vowel as -a in TochB and as -ā in TochA, while the stem vowel -a of masculine personal names is in TochA often and in TochB always replaced by -e. Example: Skt. ānanda- : TochB ānande, TochA ānand; Skt. nanda- : TochB nande, nānde, TochA nande, nānde; Skt. nandā- : TochB nānda, TochA nandā.

There are also loanwords from Prakrit, which are mostly better integrated into the language and do not necessarily belong to a higher register. Example: TochB ṣamāne, TochA ṣāmaṃ “mendicant monk, śramaṇa”; cf. Niya-Pkt. ṣamana. In some cases, borrowing through Khotanese needs to be assumed, while in other cases it cannot be excluded; cf. in this case Khot. ṣṣamana- “id.”

Loanwords from Iranian. While the contacts with Indian are from a relatively recent date, with many of the loanwords belonging to a superficial layer of the lexicon, contacts with Iranian languages have taken place over a much longer period. Loanwords from Iranian are mostly well integrated into the language and do not show marginal or loan phonemes, and their morphological patterns are often indistinguishable from Tocharian words directly inherited from Proto-Indo-European. The main problem with the Iranian loanwords in Tocharian is that they are heterogeneous, dating from different periods and coming from different sources, and that the exact source cannot always be identified. While many words may come from dialects for which the relevant word happens to be unattested, or from earlier stages of dialects that are known only from a later stage, it is practically certain that there are also borrowings from extinct Eastern Iranian dialects that have left no written testimony. Important questions are when and where these words were borrowed, from which dialect, and under what circumstances.

From the viewpoint of Tocharian, the most important chronological criterion is whether a word has been borrowed into both languages separately or into Proto-Tocharian. Proto-Tocharian is the common ancestor of Tocharian A and B that is not attested directly but has to be reconstructed through the comparison of the daughter languages. The latest stage of Proto-Tocharian before the split cannot be dated precisely but is commonly estimated to have been approximately between 1000 BCE and 500 BCE. Evidently, words that were borrowed into early stages of the two daughter languages soon after the split may be hard to distinguish from loanwords into Proto-Tocharian.

The identifiable Iranian source languages conform to the geographical position of Tocharian: Bactrian, Khotanese, Sogdian. Words demonstrably from Bactrian date from after the split of Proto-Tocharian and are probably due to the influence of the Kushans in the Tarim Basin in the 2nd century CE, possibly also to that of the Hephthalites in the 5th century. Loanwords from Khotanese are partly recent and may date from the historical period; borrowings from earlier stages of Khotanese or other Saka dialects are expected, but the dialect assignment is often difficult. In the Tocharian-Iranian contacts, Iranian is almost exclusively the donor language: borrowings from Tocharian into Iranian are exceedingly rare (of the items given by X. Tremblay, 2005, p. 444, yolo is to be removed; see Peyrot, forthcoming). The semantic fields of warfare and society are well represented, but in general the loanwords cover a broad semantic range. The most important contributions on the Iranian loanwords in Tocharian are: D. Q. Adams (2013), G. Carling (2009), O. Hansen (1940), L. Isebaert (1980), Pinault (2002a), K. T. Schmidt (1985), M. Schwartz (1974), N. Sims-Williams (2000-2012), Tremblay (2005), W. Winter (1971).

Loanwords from Old Iranian. A number of Iranian loanwords are to be dated before the break-up of Proto-Tocharian and in part reflect Old Iranian forms. Characteristic of this early layer is the representation of OIr. *a as PToch. *e, also in final position.

– TochB waipecce “possession” ← OIr. *hwai-paϑya- “own” (cf. Av. xvaēpaiϑiia-). The initial h- is lost without a trace.

– TochB perne, TochA paräṃ “rank, dignity” < PToch. *perne ← OIr. *farnah- “glory” (Av. xvarənah-). The *f is represented by *p; *h is lost. The development in TochA is regular: *perne > *parna > *parn > *parən = paräṃ.

– TochB keṣe, TochA kaṣ “fathom” < PToch. *keṣe ← OIr. *kaša- “armpit” (Av. kaša-). The Toch. word looks non-Toch. because there is no regular source for ṣ in this position; the necessary Proto-Indo-European *Kosē(n) is not warranted morphologically.

– TochB tsain, pl. tsainwa “weapon” < PToch. *tsainu ← OIr. *dzainu- < IIr. *ȷ́hai- (cf. Av. zaēnuš- “baldric,” which M. de Vaan, 2000, p. 531, derives from an earlier *zai-nu-; the u-stem is also found in Arm. zēn). Interestingly, Toch. shows the intermediate stage *dz of the development of PIIr. *ȷ́h > z- as in Av. The u-formant is reflected in the Toch. plural.

A further feature of loanwords from Old Iranian is the frequent syncope of *a in medial syllables, which cannot be explained within Tocharian and must be attributed to the Iranian source dialect. Although such a syncope has taken place in Bactrian, the relevant words are older and cannot derive from historical Bactrian.

– TochB retke, TochA ratäk “army” < PToch. *retke ← *ratka- < OIr. *rataka- “line of battle” (cf. MP rdg “line, rank, row”).

– TochB speltke, TochA spaltäk “zeal” < PToch. *speltke ← *spaḍka-? < OIr. *spardaka- (cf. Av. spərəd-, a different formation). Toch. lt may reflect *rd assimilated to *rḍ or *.
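The TochA outcomes of the Proto-Tocharian forms cited above (*perne > paräṃ, *keṣe > kaṣ, *retke > ratäk, *speltke > spaltäk) follow the same sequence of steps that is spelled out for *perne. The Python sketch below is a deliberately simplified toy model, not a full account of TochA historical phonology: the function name toch_a_stages and the way the three rules are formulated are assumptions made for illustration, tuned only to the four examples in this section. The weak vowel is written ä throughout, and word-final n is spelled with anusvāra <ṃ>.

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so that letters with diacritics are single code points."""
    return unicodedata.normalize("NFC", text)


VOWELS = set(nfc("aeiouäāīū"))


def toch_a_stages(proto_form):
    """Return the intermediate stages from a Proto-Tocharian form to TochA.

    Toy rules (assumed for illustration only):
      1. PToch. *e becomes a.
      2. A word-final vowel is lost.
      3. A word-final cluster of two consonants is broken up by ä.
      4. Spelling: word-final n is written as anusvāra.
    """
    form = nfc(proto_form).lstrip("*")

    # 1. *e > a
    stage1 = form.replace("e", "a")

    # 2. loss of the final vowel
    stage2 = stage1[:-1] if stage1 and stage1[-1] in VOWELS else stage1

    # 3. epenthesis of ä before the last consonant of a final cluster
    ends_in_cluster = (
        len(stage2) >= 2
        and stage2[-1] not in VOWELS
        and stage2[-2] not in VOWELS
    )
    stage3 = stage2[:-1] + nfc("ä") + stage2[-1] if ends_in_cluster else stage2

    # 4. final n spelled <ṃ>
    stage4 = stage3[:-1] + nfc("ṃ") if stage3.endswith("n") else stage3

    return [stage1, stage2, stage3, stage4]
```

Calling toch_a_stages("*perne") reproduces the chain parna, parn, parän, paräṃ, and the last element for "*retke", "*keṣe", and "*speltke" matches the TochA forms given in the entries. Forms outside this small set are not guaranteed to come out correctly.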

Loanwords from Bactrian. There is no necessity to assume Bactrian loanwords into Proto-Tocharian. It has been argued (Pinault, 2002a, pp. 262-64) that TochB kamartāññe “rulership” and TochA kākmart “master,” which seem to go back to Proto-Tocharian, are borrowed from Bactrian καμιρδο, a god, i.e., “chief god” < *kamṛda- “head” (cf. Av. kamərəδa-). However, these words derive from a base *kamarta, and the related TochB kamartīke “ruler” and TochA kākmärtik reflect *kamartike (Adams, 2013, p. 149). The problem is that the a of the second syllable cannot be derived from Bctr. ι, but points instead to another dialect in which *ṛ > ar. Closer in this respect is Khot. kamala- “head” < *kamarda-. Another problem of a derivation from Bactrian is that καμιρδο is late for regular *καμιρλο, which is not a possible source because of -ρλ-.

A special problem of loanwords from Bactrian is that they could sometimes also be from Sogdian and that some of them were borrowed from TochB into TochA.

– TochB pärmaṅk “hope” ← Bctr. φρομιγγο. TochA pärmaṅk is borrowed from TochB.

– TochB kṣuṃ “regnal year” ← Bctr. χþονο (but cf. Niya-Pkt. kṣ̄una, Khot. kṣuṇa-).

– TochB pärkāu, TochA pärko “profit” ← Bctr. φρογαοο.

– TochB perāk “faithful, credible” ← Bctr. πηρο “belief,” *πηραγο. TochA perāk is either borrowed from TochB or from Bctr. directly.

– TochB postak, TochA postäk “book” ← Bctr. πωσταγο (°ογο, °ιγο, °ιιο) “document” (but cf. Sogd. pwstk). TochA cannot be from TochB but must have been borrowed independently.

The following words are close to historical Bactrian forms but cannot derive from them directly:

– TochB sapule “jar”: not from Bctr. σαβολο because of the final -e.

– TochB mālo, obl. māla “alcohol”: not from Bctr. μολο because the vocalism deviates.

– TochB ārte “river branch” or “canal”: not from Bctr. αρλο “side, bank,” because both -ρλ- and -ο do not match.

The Tocharian B word for “drachma” (see DIRHAM) obviously has come by way of the Greco-Bactrian cultural sphere (cf. Greek δραχμή “drachma”) but seems to have been borrowed from Gāndhārī or Niya Prakrit rather than from Bactrian itself. It has the shape drakhma/// (Peyrot, 2014, p. 152) or trākäṃ and is used as a measure of weight in medical texts, just as Niya-Pkt. drakhma (KI 702). Bctr. δδραχμο or Δραχμο is so far only attested in the meaning “dirham; money,” but may of course have been used as a measure of weight too. It is in particular the TochB spelling with kh (which cannot stand for Bctr. χ /x/) that suggests Indian origin.

Loanwords from Sogdian and Khotanese. A few examples of loanwords from Sogdian and Khotanese:

– TochB mot “wine” ← Sogd. mwδ- (next to mδw).

– TochB ñyās, TochAB ñās “need” ← Sogd. nyʾz.

– TochAB menāk “comparison, example” ← Sogd. mynʾk.

– TochA twantaṃ “reverence” ← Khot. tvaṃdanä.

Loanwords from the BMAC language. Another source of loanwords may be the language of the Bactria-Margiana Archaeological Complex (BMAC; see IRAN vii. NON-IRANIAN LANGUAGES (1) Overview), remnants of which are preserved as substrate words in Indo-Iranian (Lubotsky, 2001) and possibly also Tocharian itself (Pinault, 2003, 2006). The main argument to assume borrowing from the BMAC language is that no exactly corresponding Indo-Iranian forms can be found. At the same time, this means that the proposed etymologies are more uncertain than usual:

– IIr. *išt(i̯)a- “brick” (Skt. íṣṭakā-, Av. ištiia-, OP išti-): TochB iścem “clay,” iṣcäke (for iścake*) “clay” (see also above);

– IIr. *ćaru̯a- name of a deity associated with shooting arrows (Skt. śarvá-, Av. sauruua-): possibly TochB śerwe “hunter,” TochA śaru “id.”;

– Skt. āṇí- “axle-pin; part of the leg just above the knee”: possibly TochB oñiye*, obl. sg. oñi “hip.”

In view of the heterogeneity of the Iranian loanwords in Tocharian, of which only some examples could be given above, it seems that the contacts took place over a long period but were not particularly intensive. This suggests that Tocharian has had many apparently culturally dominant Iranian neighbors without really being part of the Iranian world.


D. Q. Adams, A Dictionary of Tocharian B, 2nd ed., Amsterdam and New York, 2013.

G. Carling, Dictionary and Thesaurus of Tocharian A. Part 1: A−J, Wiesbaden, 2009.

cetom = A Comprehensive Edition of Tocharian Manuscripts.

O. Hansen, “Tocharisch-iranische Beziehungen,” Zeitschrift der Deutschen Morgenländischen Gesellschaft 94, 1940, pp. 139-64.

W. B. Henning, “Argi and the “Tocharians”,” Bulletin of the School of Oriental Studies 9, 1938, pp. 545-71.

L. Isebaert, De Indo-Iraanse bestanddelen in de Tocharische woordenschat, Diss. Leuven, 1980.

W. Krause and W. Thomas, Tocharisches Elementarbuch, I. Grammatik, Heidelberg, 1960.

S. Lévi, “Le ‟tokharien B”, langue de Koutcha,” Journal Asiatique, 11e série, 2, 1913, pp. 311-80.

A. M. Lubotsky, “The Indo-Iranian Substratum,” in Chr. Carpelan, A. Parpola, P. Koskikallio, eds., Early Contacts between Uralic and Indo-European: Linguistic and Archaeological Considerations, Helsinki, 2001, pp. 301-17.

J. P. Mallory, In Search of the Indo-Europeans: Language, Archaeology and Myth, London, 1989.

M. Malzahn, The Tocharian Verbal System, Leiden and Boston, 2010.

Idem, ed., Instrumenta Tocharica, Heidelberg, 2007.

F. W. K. Müller, “Beitrag zur genaueren Bestimmung der unbekannten Sprachen Mittelasiens,” Sitzungsberichte der Königlich Preußischen Akademie der Wissenschaften, Berlin, 1907, pp. 958-60.

F. W. K. Müller and E. Sieg, “Maitrisimit und »Tocharisch«,” Sitzungsberichte der Königlich Preußischen Akademie der Wissenschaften, Berlin, 1916, pp. 395-417.

H. Ogihara, “Fragments of Secular Documents in Tocharian A,” Tocharian and Indo-European Studies 15, 2014, pp. 103-29.

M. Peyrot, Variation and Change in Tocharian B, Amsterdam and New York, 2008.

Idem, The Tocharian Subjunctive. A Study in Syntax and Verbal Stem Formation, Leiden and Boston, 2013.

Idem, “La relation entre la chronologie du tokharien B et la paléographie,” in N. Balbir and M. Szuppe, eds., Lecteurs et copistes dans les traditions manuscrites iraniennes, indiennes et centrasiatiques, Roma, 2014, pp. 121-47.

Idem, “Language Contact in Central Asia: On the Etymology of Tocharian B yolo ‘bad’,” forthcoming.

G.-J. Pinault, “Introduction au tokharien,” LALIES 7, 1989, pp. 3-224.

Idem, “Tocharian and Indo-Iranian: Relations between two Linguistic Areas,” in N. Sims-Williams, ed., Indo-Iranian Languages and Peoples, Proceedings of the British Academy, 2002a, pp. 243-84.

Idem, “Tokh. B kucaññe, A kuciṃ et skr. tokharika,” Indo-Iranian Journal 45, 2002b, pp. 311-45.

Idem, “Sanskrit kalyāṇa- interpreté à la lumière des contacts en Asie Centrale,” Bulletin de la Société de Linguistique 98, 2003, pp. 123-61.

Idem, “Further Links between the Indo-Iranian Substratum and the BMAC Language,” in B. Tikkanen and H. Hettrich, eds., Themes and Tasks in Old and Middle Indo-Aryan Linguistics, Delhi, 2006, pp. 167-96.

Idem, Chrestomathie tokharienne, textes et grammaire, Leuven and Paris, 2008.

K. T. Schmidt, “Zu einigen der ältesten iranischen Lehnwörter im Tocharischen,” in U. Pieper and G. Stickel, eds., Studia linguistica, diachronica et synchronica. Werner Winter sexagenario anno MCMLXXXIII gratis animis ab eius collegis, amicis discipulisque oblata, Berlin and New York, 1985, pp. 757-67.

M. Schwartz, “Irano-Tocharica,” in Ph. Gignoux and A. Tafazzoli, eds., Mémorial Jean de Menasce, Louvain, 1974, pp. 399-411.

E. Sieg, “Ein einheimischer Name für Toχrï,” Sitzungsberichte der Königlich Preußischen Akademie der Wissenschaften, Berlin, 1918, pp. 560-65.

Idem, “Und dennoch »Tocharisch«,” Sitzungsberichte der Preußischen Akademie der Wissenschaften, Philosophisch-historische Klasse, 1937, pp. 130-39.

E. Sieg and W. Siegling, “Tocharisch, die Sprache der Indoskythen. Vorläufige Bemerkungen über eine bisher unbekannte indogermanische Literatursprache,” Sitzungsberichte der Königlich Preußischen Akademie der Wissenschaften, Berlin, 1908, pp. 915-32.

E. Sieg, W. Siegling and W. Schulze, Tocharische Grammatik, Göttingen, 1931.

N. Sims-Williams, Bactrian Documents from Northern Afghanistan, 3 vols., London, 2000-2012.

X. Tremblay, “Irano-Tocharica et Tocharo-Iranica,” Bulletin of the School of Oriental and African Studies 68, 2005, pp. 421-49.

M. de Vaan, “Die Lautfolge āum im Vīdēvdād,” in B. Forssman and R. Plath, eds., Indoarisch, Indoiranisch und die Indogermanistik. Arbeitstagung der Indogermanischen Gesellschaft vom 2. bis 5. Oktober 1997 in Erlangen, Wiesbaden, 2000, pp. 523-33.

W. Winter, “Baktrische Lehnwörter im Tocharischen,” in R. Schmitt-Brandt, ed., Donum Indogermanicum. Festgabe für Anton Scherer zum 70. Geburtstag, Heidelberg, 1971, pp. 217-23.

Y. Yoshida, “Karabalgasun ii. The inscription,” in Encyclopædia Iranica XV, 2011, pp. 530b–533b.

(Michaël Peyrot)

Originally Published: July 27, 2015

Last Updated: July 27, 2015

Cite this entry:

Michaël Peyrot, "TOCHARIAN LANGUAGE," Encyclopædia Iranica, online edition, 2015, available at (accessed on 27 July 2015).