Wakhi is one of the Pamir languages in the Eastern Iranian branch of the Indo-European languages. Native speakers of Wakhi live in Tajikistan, Afghanistan, Pakistan and China. The total number of Wakhi speakers is about 70 thousand. The language has no unified writing system, literary norm or educational status, but as a koine it occupies a dominant position in Wakhan. Although Wakhi is officially an unwritten language, several writing systems have been created for it: Wakhis in Tajikistan prefer the Cyrillic alphabet, in Afghanistan the Arabic script is used, and in Pakistan and China the Latin script is applied. Russian scholars traditionally use a Latin transcription with additional Cyrillic letters. The Latin script is also used in this study.
Most Wakhis are multilingual: in Tajikistan the second language is Tajik and the third is Russian. The majority of Wakhis in Afghanistan speak Dari and Pashto. Wakhis of Northern Pakistan speak Shina and Burushaski as well as the official languages (English and Urdu). For native speakers of Wakhi in China, Uighur is the second language and Chinese the third.
The Tajik variant of Wakhi has two main dialects, upper and lower; sometimes a central dialect is distinguished as an intermediate one. Over the past decades, the differences between the dialects have become more clearly visible, and they show up in lexical borrowings.
In China, Wakhi has features of the upper dialect, with subtle differences in the lexicon. According to the available data, owing to its remote location and presumably to a number of borrowings from non-Iranian languages, the Xinjiang variant appears to be a distinct dialect of Wakhi; however, for lack of reliable relevant materials, the extent of the divergence cannot be clearly established.
Phonetically, positional quantitative changes in vowels are typical for Wakhi. Six vowel phonemes are distinguished: i, ?, a, o, u, ?, with ? marked by increased variability and the ability to be realized as ü and ö. In addition, the vowel e appears in borrowings from Tajik. Vowels are characterized by inherent duration, which was studied by Sokolova V.S., Pakhalina T.N., Lashkarbekov B.B., Grunberg A.L. and Steblin-Kamensky I.M. Duration intervals for Wakhi vowels were computed by Ivanov V.B.
Purpose of the study
Before this research, Wakhi word stress had not been studied experimentally. It was considered to be expiratory (dynamic) [3, p. 175], i.e. intensity was assumed to be the marker of a stressed syllable. The purpose of this work is to provide a comprehensive instrumental analysis of Wakhi word stress in order to determine the stress type in this language and the relative importance of intensity, pitch and duration.
Methodology and course of study
All the vowels served as syllable nuclei in our experiments, in which the parameters of syllable nuclei in stressed and unstressed syllables were compared. The phonetic environment of a syllable nucleus can be either voiced (sonorant) or voiceless, and consequently varies in terms of vocal cord activity. Since the search for an acoustic correlate requires the same set of parameters for every item studied, the phonetic environment was disregarded, and we dealt only with the syllable nuclei themselves.
In the first experiment, we analyzed the speech of four Wakhi speakers from Tajikistan (students, two men and two women), recorded solely via the acoustic channel. Two- and three-compound numerals were taken as the lexical material, such as:
c?b?r – four;
?asiw – eleven;
?asbuy (?as?tbuy) – twelve;
?astruy (?as?ttruy) – thirteen;
?asc?b?r – fourteen;
?aspan? (?as?tpan?) – fifteen;
?assad – sixteen;
?as?b – seventeen;
?asat (?ashat) – eighteen;
?asnaw – nineteen;
tru?as – thirty.
The segmentation of the speech signal and the acoustic analysis of the recorded realizations were carried out using Praat at the Laboratory of Experimental Phonetics of the Institute of Asian and African Studies, Moscow State University. In each syllable nucleus, duration (T) was measured as an independent parameter.
In normal speech, intensity (I) and pitch frequency (F0) are linked through subglottal pressure: as subglottal pressure increases, both parameters rise at the same time (though to different degrees). In singing, these parameters are separate, since vocalists are able to control them independently of each other. We proceed from the assumption that the interaction of these three main parameters depends on the prosodic system of the language and differs from language to language.
In addition to these main parameters, derived integral parameters were examined. The F0-area parameter represents the area of the figure bounded by the F0 curve and the time axis. It depends on both pitch frequency (F0) and duration (T), and consequently correlates with both. The intensity area (I-area) parameter, which represents the area of the figure under the intensity curve, is constructed in the same way. The Volume (V) parameter is the volume of a three-dimensional figure bounded by the I-curve, the F0-curve and the center line (see Fig. 1).
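The integral parameters can be illustrated with a minimal Python sketch. It assumes that the F0 and intensity curves are available as samples at known time points and that the areas are approximated by the trapezoidal rule; the paper does not state which numerical method was used. The treatment of Volume as the time integral of the product F0(t) · I(t) is only an assumption made for illustration, and the function names and sample values are hypothetical, not measured data.

```python
from typing import Sequence


def trapezoid_area(times: Sequence[float], values: Sequence[float]) -> float:
    """Area between a sampled curve and the time axis (trapezoidal rule)."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two samples of equal length")
    area = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        area += 0.5 * (values[k] + values[k - 1]) * dt
    return area


def volume(times: Sequence[float], f0: Sequence[float],
           intensity: Sequence[float]) -> float:
    """ASSUMPTION: Volume taken as the time integral of F0(t) * I(t)."""
    products = [f * i for f, i in zip(f0, intensity)]
    return trapezoid_area(times, products)


# Hypothetical samples for one syllable nucleus (illustrative only).
t = [0.00, 0.05, 0.10]          # time, seconds
f0 = [200.0, 220.0, 200.0]      # pitch frequency, Hz
inten = [60.0, 70.0, 60.0]      # intensity, dB

f0_area = trapezoid_area(t, f0)      # depends on both F0 and duration T
i_area = trapezoid_area(t, inten)    # depends on both I and duration T
vol = volume(t, f0, inten)
print(f0_area, i_area, vol)
```

Because each area grows with both the height of the curve and the length of the nucleus, a longer or higher-pitched nucleus yields a larger F0-area, which is why the integral parameters correlate with their components.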
A table was compiled from our measurements and processed with the SPSS statistical package using a multivariate linear model. The statistical connection between stress and the above-mentioned parameters of the syllable nucleus was tested. Along with the absolute values of the parameters, their relative values, expressed in percent (%), were calculated and considered. The maximum value of the parameter within the phonetic word was taken as 100%. In total, 120 syllables were examined (58 stressed and 62 unstressed).
The results of the statistical processing showed a highly significant connection between all the parameters and stress (p < 0.001), which is exceptional for the Iranian languages: in other Iranian languages, stress is either quantitative or tonic. So far, stress marked by all the parameters mentioned has been found in only one language, the isolate Burushaski. It is worth noting that in the north of Pakistan all Wakhi speakers are multilingual, and one of the combinations is Burushaski-Wakhi bilingualism. Although we worked with Tajik Wakhi speakers, it is possible that, owing to migration and interaction, the prosodic system of Burushaski has influenced that of the Tajik variant of Wakhi. This problem requires special research.
Fig. 1. Integral parameters of syllable nuclei: a) representation of F0-area; b) representation of I-area; c) representation of Volume
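The conversion to relative values can be sketched as follows. This is a minimal illustration of the rule stated above (the maximum within the phonetic word equals 100%); the function name and the duration values are hypothetical and do not come from the study.

```python
from typing import List, Sequence


def to_relative_percent(values: Sequence[float]) -> List[float]:
    """Scale one parameter across the syllables of a phonetic word
    so that the largest value becomes 100 %."""
    peak = max(values)
    if peak <= 0:
        raise ValueError("the maximum must be positive")
    return [100.0 * v / peak for v in values]


# Hypothetical nucleus durations (ms) in a two-syllable word.
durations_ms = [80.0, 120.0]
relative = to_relative_percent(durations_ms)
print(relative)
```

Normalizing within the word removes differences in overall tempo, loudness and voice register between speakers, so that only the contrast between the syllables of the same word is compared.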
At this stage, we could draw a conclusion that the stress in the Wakhi language was multicomponent – dynamic, tonic and quantitative. In order to clarify the role of various parameters in Wakhi prosody, the second experiment was planned.
In the second experiment, the speech of three other native speakers of Wakhi (two men and one woman) was recorded at the Laboratory of Experimental Phonetics of IAAS MSU using the Real-Time EGG hardware and software system in two-channel mode: the microphone signal was captured in the first channel and the glottographic signal in the second. The informants were asked to read aloud the text of a Wakhi folk tale recorded by T.N. Pakhalina during her field research in 1965 from Makhmudov Khanjarbek (20 years old, village of Vrang, Tajikistan). We then selected the phonetic words in which the syllable nuclei were in suitable positions and no phonetic process affected the accuracy of the experiment; all of them were disyllabic:
a-bu: а – а, bu – two;
arzuq – orzuk, naan bread;
a-ska: а – а, ska – this;
bowar – faith;
dis?vd – show;
k?shun – hang;
pac?n – prepare;
potsho – tsar, king;
p?d-i – legs;
r?ch?n – go;
ruz-i – once;
sayish – you;
s?po – our;
s?wor – on horseback;
shafshish – hair;
tuw?tk – was;
t?sha – provisions;
v?d?k – road;
wizit – come;
w?z?md – bring;
xoli – only, empty;
yaw-?n – he/she has;
?m?t – (he/she) has.
In the work of Hussain Q. and Mielke J., the Pakistani variant of Wakhi was examined using a glottograph. Since we worked with the Tajik variant of Wakhi, this part of the study, like the first experiment, is novel. The two idioms can be considered different dialects. Although speakers of the different variants speak the same language, Tajik informants note that they no longer understand Pakistani Wakhi speakers once the latter begin to communicate among themselves. This fact suggests that in Wakhi local dialects coexist with koine forms. Native speakers are aware of these differences and choose the appropriate register depending on the "friend-or-foe" situation.
Fig. 2. Segmentation of the word bowar ‘faith’ performed by informant M.: intonogram; F0-graph; glottogram; infrasound larynx fluctuations; VLP
Using a glottograph, one registers the frequency of vocal cord vibration (F0) and the open quotient of the glottis (Q), i.e. the ratio of the time the glottis is open to the entire period of vocal cord vibration, expressed in percent (%). Other definitions of the quotient are possible, provided the numerator is smaller than the denominator. In the type of phonation where the open phase lasts longer than the closed phase, the voice becomes hoarse.
A common approach is also to record and measure the vertical larynx position (VLP), which makes it possible to assess the length of the vocal tract. The rise of the larynx was clearly visible in the glottogram and was measured in relative units with respect to the zero line. Laryngeal movements in the vertical plane can be characterized by an average infrasound frequency, which in our case fell in the range of 7–37 Hz. At such a low frequency, only 1–2 fluctuations could be detected within a syllable nucleus.
Two more parameters of the syllable nucleus were added to those of the first experiment: subsonic frequency (Sub) and vertical larynx position (VLP). In total, 88 syllables (44 stressed and 44 unstressed) were analyzed. The additional information channel, the glottogram, significantly increased the accuracy of speech segmentation compared with single-channel recording, because in two-channel recording the harmonic component and the noise are presented separately.
Fig. 2 shows a realization of the word bowar ‘faith’ produced by informant M. (a woman). The total duration, 0.3978 seconds, is plotted on the x-axis. The y-axis shows pitch frequency (F0) in Hz. The F0 graph is drawn as bold dots. The lower vertical lines extending from the x-axis mark the boundaries between segments. The horizontal center line of the glottogram runs above them.
The sawtooth curve that crosses it is the glottogram. It is captured by electrodes placed around the informant’s throat. When the vocal cords close, the skin’s resistance decreases, which is reflected in an upward shift of the curve. Consequently, each "tooth" of the sawtooth curve corresponds to one closing of the vocal cords.
Furthermore, the skin’s resistance diminishes when the larynx moves up. This is reflected in the large waves of the sawtooth curve. The magnitude of the upward deflection of the larynx is indicated in the figure by the VLP parameter. Praat measures it in relative units. The peaks of the waves of upward larynx displacement are marked by vertical lines above the glottogram. These irregular waves correspond to the infrasound fluctuations of the larynx, measured in Hz.
It should also be noted that the wavy infrasound curve in the graph is modulated by the frequency of vocal cord vibration. The microphone signal is shown as an intonogram in the upper part of Fig. 2.
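The definition of the quotient can be written out as a short Python sketch. It assumes that the duration of the open phase and the full vibration period have already been measured from the glottogram; the function names and the numbers are hypothetical and serve only to illustrate the ratio described above.

```python
def open_quotient_percent(open_time: float, period: float) -> float:
    """Q: share of the vibration period during which the glottis is open, in %."""
    if period <= 0:
        raise ValueError("period must be positive")
    if not 0 <= open_time <= period:
        raise ValueError("open time must lie within the period")
    return 100.0 * open_time / period


def open_phase_dominates(open_time: float, period: float) -> bool:
    """True when the open phase lasts longer than the closed phase (Q > 50 %)."""
    return open_quotient_percent(open_time, period) > 50.0


# Hypothetical single vibration cycle (illustrative values only).
period_s = 0.005      # one period of vocal cord vibration, seconds
open_s = 0.002        # time the glottis is open within that period

q = open_quotient_percent(open_s, period_s)
f0_hz = 1.0 / period_s    # F0 is the reciprocal of the period
print(q, f0_hz, open_phase_dominates(open_s, period_s))
```

Because the open time can never exceed the period, Q stays between 0 and 100%, which matches the requirement that the numerator be smaller than the denominator.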
As in the first experiment, all the absolute parameter values were converted to relative values in percent (%). The resulting table was processed with the SPSS statistical package using a multivariate linear model. The presence of syllable stress (0 or 1) was taken as the independent parameter, while the other eight parameters were treated as dependent.
The statistical analysis showed that duration (T) and pitch frequency (F0) are significantly related to the stress of the syllable nucleus (p < 0.001). The integral parameters F0-area and Volume (V), in which these parameters are jointly included as components, proved equally significant. Intensity (I) (p = 0.001) and its derived parameter I-area (p = 0.002) were found to be slightly less relevant. The laryngeal parameters, infrasound frequency (Sub) and vertical larynx position (VLP), turned out to be insignificant for prosody (p = 0.838 and p = 0.897, respectively).
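The comparison behind such a model can be sketched for a single dependent parameter. With one binary factor (stressed versus unstressed), a linear model for one parameter amounts to a one-way comparison of the two groups, summarized by an F statistic. The sketch below is an illustration under that assumption, not a reproduction of the SPSS procedure; the data are invented for the example, and the p-value lookup is omitted because it requires the F distribution.

```python
from statistics import mean
from typing import List


def f_statistic(stressed: List[float], unstressed: List[float]) -> float:
    """One-way F statistic for one parameter with stress (0/1) as the factor."""
    groups = [stressed, unstressed]
    all_values = stressed + unstressed
    grand_mean = mean(all_values)
    # Variation explained by the stress factor.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Residual variation inside each group.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    if ss_within == 0 or df_within <= 0:
        raise ValueError("not enough within-group variation")
    return (ss_between / df_between) / (ss_within / df_within)


# Invented relative durations (%), NOT data from the study.
stressed_t = [100.0, 95.0, 100.0, 90.0]
unstressed_t = [60.0, 70.0, 65.0, 75.0]

f_value = f_statistic(stressed_t, unstressed_t)
print(round(f_value, 2))
```

A large F value means that the difference between the stressed and unstressed group means is large relative to the scatter within the groups, which corresponds to a small p-value in the reported results.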
Results and conclusion
According to the results of both experiments, the most significant parameters for indicating a stressed syllable appear to be pitch frequency (F0) and duration (T), which mark stressed syllables in almost all cases. Intensity (I) and its integral counterpart (I-area) also rise in the stressed syllable, but less regularly. No statistically significant connection between stress and the laryngeal parameters (subsonic frequency (Sub) and vertical larynx position (VLP)) was found at this stage of the study. Thus, stress in the Wakhi language can be defined as quantitative-tonic.
The question of how strongly the intensity factor contrasts with the other acoustic characteristics of stress requires further analysis. This can be done through speech synthesis followed by listening tests. In addition, the connection of laryngeal movements with prosody and intonation also requires further study. To this end, it is necessary to consider the role of the Quotient parameter in speech production.