Research Ideas and Outcomes : Research Article
Language evolution is not limited to speech acquisition: a large study of language development in children with language deficits highlights the importance of the voluntary imagination component of language
Andrey Vyshedskiy
‡ Boston University, Boston, United States of America
Open Access


Did the boy bite the cat or was it the other way around? When processing a sentence with several objects, one has to establish ‘who did what to whom’. When a sentence cannot be interpreted by recalling an image from memory, we rely on the special type of voluntary constructive imagination called Prefrontal synthesis (PFS). PFS is defined as the ability to juxtapose mental visuospatial objects at will. We hypothesised that PFS has fundamental importance for language acquisition. To test this hypothesis, we designed a PFS-targeting intervention and administered it to 6,454 children with language deficiencies (age 2 to 12 years). The results from the three-year-long study demonstrated that children who engaged with the PFS intervention showed 2.2-fold greater improvement in combinatorial language comprehension than children with similar initial evaluations. These findings suggest that language can be improved by training the PFS and expose the importance of the visuospatial component of language. This manuscript reflects on the experimental findings from the point of view of human language evolution. When used as a proxy for evolutionary language acquisition, the study results suggest a dichotomy of language evolution, with its speech component and its visuospatial component developing in parallel. The study highlights the radical idea that evolutionary acquisition of language was driven primarily by improvements of voluntary imagination rather than by improvements in the speech apparatus.


linguistics, comparative cognition, developmental psychology, evolution of imagination, language evolution, human evolution, recursive language, human language, syntactic language, modern language, neurolinguistics, combinatorial language, language-as-thought-system

Prefrontal Synthesis is an essential component of recursive language

Language cannot be equated with speech alone. An essential component of language is Prefrontal Synthesis (PFS), which is defined as the process of juxtaposing mental visuospatial objects at will. Consider the two sentences: “The lion carries the monkey” and “The monkey carries the lion.” The two sentences use identical words and the same grammatical structure. Appreciating the delight of the first sentence and the absurdity of the second sentence depends on the visualisation of the scene, which is accomplished by the lateral prefrontal cortex (LPFC) synthesising the mental object of the monkey and the mental object of the lion into a novel picture (hence the name Prefrontal Synthesis or PFS).

The PFS ability is essential to imagine a hybrid object with the head of a lion and the body of a human; to predict the outcome of an imaginary event (“The tiger ate the lion. Who is alive?”); to add two two-digit numbers mentally; to imagine yesterday’s football game from a friend’s description; and to follow a fairy tale (“…the Shark took a deep breath and, as he breathed, he drank in the Marionette as easily as he would have sucked an egg. Then he swallowed him so fast that Pinocchio, falling down into the body of the fish, lay stunned for a half hour...”) (Collodi 2008). The head-spinning drama of Carlo Collodi’s classic tale is only as good as the mind’s ability to produce an image of a wooden boy trapped inside the belly of a shark.

Full language comprehension depends on the PFS ability. PFS is necessary for grasping the meaning of sentences with spatial prepositions (e.g. “Put the pen {under|on|behind} the table”), time prepositions (e.g. “Touch your nose {before|after} you touch your ear”), the passive voice (e.g. “The boy was defeated by the girl”) and nested sentences (e.g. “John lives below Mary, who lives below Steve”). Nesting in sentences is also called recursion. For this reason, linguists refer to modern human languages (that rely on PFS) as recursive languages.

The majority of people report actively imagining the scenes when reading a fairy tale, but a small minority (~0.8% of the population) claim a life-long trait in which visual mental imagery is entirely absent, a condition called aphantasia (Dance et al. 2022). The relationship between aphantasia and PFS ability remains unclear. Some aphantasiacs may have normal PFS and deficits in metacognition that prevent them from introspecting accurately about their thoughts (Flavell 1979, de Vito and Bartolomeo 2016). Other aphantasiacs may indeed have PFS paralysis and corresponding deficits in recursive language comprehension.

Neurology of recursive language

The association of language with Wernicke’s and Broca’s areas is well-known. Less common is the realisation that understanding of the full language depends on the lateral prefrontal cortex (LPFC). Wernicke’s area primarily links words with objects (Friederici 2011), Broca’s area interprets the grammar and assigns words in a sentence to a grammatical group such as noun, verb or preposition (Friederici 2011), but only the LPFC can synthesise the objects from memory into a novel mental image according to the provided description (Vyshedskiy et al. 2017, Vyshedskiy et al. 2017b). This latter visuospatial function may be called imagination, but a more specific term, Prefrontal Synthesis (PFS), is superior, for it distinguishes this function from other components of imagination, such as simple memory recall, dreaming, spontaneous insight, mental rotation and integration of modifiers, that evolved at different times (Vyshedskiy 2019b).

PFS was hypothesised to be mediated by LPFC-dependent synchronisation of object-encoding neuronal ensembles (Dunn and Vyshedskiy 2015). The scientific consensus is that each familiar object is encoded in the brain by a network of neurons known as a neuronal ensemble (Hebb 1949). The sensory component of each object stored in memory is physically encoded by neurons of the posterior cortex, which was auspiciously named by Christof Koch and colleagues ‘the posterior cortical hot zone’ for its ability to single-handedly generate conscious experience (Koch et al. 2016). When one recalls any object, the object-encoding neuronal ensemble (objectNE) in the posterior cortical hot zone activates into synchronous resonant activity that results in conscious perception of the object (Quiroga et al. 2008). The neuronal ensemble binding mechanism, based on the Hebbian principle “neurons that fire together, wire together,” came to be known as the Binding-by-Synchrony hypothesis (Malsburg 1981, Singer and Gray 1995, Singer 2007). However, while the Hebbian principle explains how we perceive a familiar object, it does not explain the infinite number of novel objects that humans can imagine. To account for the limitless human imagination, it was proposed that synchronisation of objectNEs is a general mechanism underlying any novel imaginary experience (the Neuronal Ensembles Synchronisation hypothesis or NES) (Wilson et al. 2011, Dunn and Vyshedskiy 2015, Vyshedskiy and Dunn 2015, Vyshedskiy 2019b). When the synchronisation of objectNEs is driven from the front by the LPFC, we refer to it as the PFS; when the synchronisation is driven from the back, we refer to it as dreaming or hallucination. The synchronisation hypothesis has never been directly tested, but is indirectly supported by several lines of experimental evidence (Rodriguez et al. 1999, Hirabayashi 2005, Uhlhaas and Singer 2006, Sehatpour et al. 2008, Hipp et al. 2011).

The PFS is a component of voluntary imagination. The word “voluntary” is always associated with activity initiated in and controlled by the frontal cortex. Voluntary muscle contraction is initiated in and controlled by the motor cortex (Li et al. 2015), voluntary thinking is initiated in and controlled by the lateral prefrontal cortex (LPFC) (Luria 1980, Duncan et al. 1995, Baker et al. 1996, Christoff and Gabrieli 2013, Fuster 2015, Waltz et al. 2016) and voluntary talking is initiated in and controlled by Broca’s area in the frontal cortex (Friederici 2011). When activity is initiated outside of the frontal cortex, it is never described as voluntary. In contrast to voluntary muscle contractions, spasmodic skeletal muscle contractions are neither initiated by nor controlled from the frontal cortex: they result from spontaneous action potentials in muscle fibres. Involuntary swearing, observed in patients with expressive aphasia, is initiated by the subcortical structure called the basal ganglia (Jay 1999). Involuntary imagery during REM-sleep dreaming is neither initiated nor controlled by the LPFC. The dramatic decrease of blood flow to the LPFC (Braun 1997) and reduction of EEG power in the LPFC (Siclari et al. 2017) demonstrate that the LPFC is inactive during sleep: the dreaming hallucinations are the result of spontaneous activation of neuronal ensembles in the posterior cortex.

A stroke affecting the motor cortex commonly results in paralysis of voluntary movement, but cannot prevent involuntary muscle spasms. A stroke in the LPFC often results in paralysis of voluntary imagination, but does not affect dreaming (Solms 1997). Thus, the neurological difference between the voluntary and involuntary imagination is linked to the LPFC: the voluntary imagination is controlled by the LPFC and the involuntary imagination is LPFC-independent.

Voluntary imagination includes multiple neurologically distinct components: integration of colour, integration of size and PFS. The time of acquisition of different voluntary imagination components (Vyshedskiy 2019b) has a direct bearing on language evolution: hominins who could not mentally re-size and re-colour objects could not have used colour and size adjectives; and hominins who could not juxtapose two mental objects could not have used spatial prepositions.

Prefrontal synthesis and Chomskyan Merge

Chomskyan Merge (Chomsky 2008) is defined linguistically as a combination of any two syntactic objects to create a new one. Importantly, PFS is defined independently of language. Juxtaposing objects in visuospatial mental space does not directly depend on knowledge of any words. An individual does not need to know the names of objects in order to combine them mentally into a novel hybrid object or scene. One can mentally combine objects of strange geometrical shape that do not have names at all.

Neurologically, the Merge operation depends on a broad range of distinct mechanisms. Interpreting a sentence ‘ship sinks,’ can be accomplished via simple memory recall, i.e. by remembering a previously-seen picture of a sinking ship. Memory recall involves activation of a single objectNE in the posterior cortex and only minimally involves the LPFC (Gabay et al. 2016). Combination of an adjective and a noun is a Merge operation that relies on the LPFC ability to modify the activity of a small group of neurons within a single objectNE (Gabay et al. 2016). Combination of two or more nouns with spatial prepositions is a Merge operation that relies on the LPFC ability to synchronise independent objectNEs in the process of PFS (Goodale and Milner 1992, Cohen et al. 1996, Lee et al. 2006, Schendan and Stern 2007, Zacks 2008). These neurological mechanisms are dissociable in psychology tests and also are acquired by children at different ages. To score above 73 on a standardised IQ test, individuals usually have to demonstrate simple memory recall ability; to score above 77, they have to demonstrate the integration of modifiers ability; and to score above 85, they have to demonstrate the PFS ability (Vyshedskiy et al. 2017). Children can understand combinations of verbs and nouns before 2 years of age, learn to integrate an adjective and a noun around three years of age and acquire PFS around four years of age (Vyshedskiy et al. 2020a). In other words, the Merge operation is not a unitary all-or-none ability, but an assembly of several skills that rely on neurologically distinct mechanisms that differ between individuals (Martins and Boeckx 2019, Benítez-Burraco et al. 2021).

Therefore, it is impossible to describe PFS in terms of the Merge operation. PFS, defined as deliberate visuospatial juxtaposition of mental objects, is mediated by a single neurological mechanism: synchronisation of objectNEs. The Merge operation employs the neurological process of PFS for some functions, but many of the Merge operations rely exclusively on simpler neurological mechanisms: simple recall, categorically-primed spontaneous imagination, integration of modifiers etc. (Vyshedskiy 2019b). The overly broad definition of the Merge makes it useless for the neurological discussion of language evolution, as the different visuospatial mechanisms underlying the Merge were acquired at different times phylogenetically and also develop at different ages ontogenetically (Vyshedskiy 2019b).

Dissociation of PFS and articulate speech in patients with brain damage

Patients with damage to the LPFC (Waltz et al. 2016) or the frontoposterior fibres (Skeide et al. 2015) or to the posterior cortical hot zone (Dragoy et al. 2017) (where the sensory objectNEs are encoded) often experience PFS paralysis (Fig. 1). The distinguished neuroscientist Joaquin Fuster calls their condition “prefrontal aphasia” (Fuster 2015) and the renowned psychologist Alexander Luria calls it “semantic aphasia” (Luria 1970). Fuster explains that “although the pronunciation of words and sentences remains intact, language is impoverished and shows an apparent diminution of the capacity to ‘prepositionize.’ The length and complexity of sentences are reduced. There is a dearth of dependent clauses and, more generally, an under-utilisation of what Chomsky characterises as the potential for recursiveness of language” (page 194). Luria reports that “these patients had no difficulty grasping the meaning of complex ideas, such as ‘causation,’ ‘development’ or ‘cooperation’. They were also able to hold abstract conversations. But difficulties developed when they were presented with complex grammatical constructions which coded logical relations. ... Such patients find it almost impossible to understand phrases and words which denote relative position and cannot carry out a simple instruction like ‘draw a triangle above a circle’” (Luria 1970) (page 45).

Figure 1.  

The “high-speed” connections between the front (marked as Lateral Prefrontal Cortex) and the back of the brain (marked as Posterior Cortex), such as arcuate fasciculus and superior longitudinal fasciculus, mediate voluntary imagination and combinatorial language comprehension. The connections are marked Frontoposterior connections.

There is no established term for ‘PFS paralysis’ in the English-speaking literature. Henry Head, an English neurologist, first identified this condition in aphasiacs in 1920 and named it “semantic aphasia” (Head 1920). Nordic countries, Spanish-speaking countries and Russia adopted “semantic aphasia” to describe this condition. However, in English-speaking countries, the term semantic aphasia is used to describe a deficit in understanding word meanings. It is a very different condition stemming from damage to Wernicke’s area. Thus, in English-speaking countries, semantic aphasia means a difficulty on the word level, while in Nordic countries, Spanish-speaking countries and Russia, it means a difficulty on the sentence level. The naming uncertainty results in clinical confusion, scientific misunderstanding and scarcity of research on this condition (Dragoy et al. 2017). To resolve this confusion, we suggest calling this condition ‘PFS paralysis.’ PFS paralysis also makes greater semantic sense than the aphasia term, since aphasia is translated from Greek as “speechless” and these patients often experience no speech deficit, but rather a visuospatial combinatorial deficit.

Acquisition of PFS in children

Typically developing children acquire PFS between the ages of 3 and 4 years (Vyshedskiy et al. 2020). Atypically developing children often struggle with PFS acquisition. In developmental psychology this problem is traditionally described as stimulus overselectivity, tunnel vision or lack of multi-cue responsivity (Lovaas et al. 1979, Schreibman 1988, Ploog 2010). Affected children have difficulty accomplishing seemingly trivial tasks, such as an instruction to “pick up a blue straw that is under the table,” which requires them to combine three different features i.e. the object itself (straw), its colour (blue) and its location (under the table). These children may “over-select” the word “straw” and ignore both its location and the fact that it should also be blue, therefore picking up any available straw; alternatively, they can “over-select” on the colour, therefore picking up any blue object. (The name of this phenomenon is erroneous. It is not that a child “over-selects” any single feature, rather it is the failure of mental integration. In other words, it is not an attention or focus problem (Vyshedskiy et al. 2020), but paralysis of voluntary imagination.)

Failure to acquire PFS results in a lifelong inability to understand recursive language, including spatial prepositions, time prepositions, fairytales (that require the listener to imagine unrealistic situations) and recursion (here and later, recursion is used to refer to sentence level recursion only as in this example: “John lives below Mary, who lives below Steve”). Amongst individuals diagnosed with Autism Spectrum Disorder (ASD), the prevalence of lifelong PFS paralysis is 30 to 40% (Fombonne 2003) and can be as high as 60% amongst children enrolled into special ASD schools (Vyshedskiy et al. 2020). These individuals are frequently referred to as having low-functioning ASD. They usually exhibit full-scale IQ below 70 (Beglinger and Smith 2001; Boucher et al. 2008) and typically perform below the score of 85 in non-verbal IQ tests (Boucher et al. 2008).

Accordingly, ASD children with language deficits could serve as a proxy for early hominins who were not exposed to recursive language (Murphy and Benítez-Burraco 2017). Would visuospatial PFS exercises improve their language? To answer this question, we have conducted a study that had both humanitarian and scientific goals. The humanitarian goal was to improve language in individuals with ASD. The scientific goal was to investigate language acquisition in early hominins.

Voluntary imagination exercises are associated with improvement of combinatorial language in children with autism

We hypothesised that language in ASD children could be significantly improved with voluntary imagination exercises. Accordingly, we developed voluntary imagination exercises, organised them into an application and provided this application to ASD children aged 2 to 12 years (Dunn and Vyshedskiy 2015, Dunn et al. 2017a, Dunn et al. 2017b, Vyshedskiy et al. 2018).

This application includes both non-verbal and verbal gamified exercises. Non-verbal activities aim to provide voluntary imagination training visually through implicit instructions. For example, a child can be presented with two separate images: that of a train and a window pattern. The task is to mentally integrate the train and the window pattern and to match the result of integration to the picture of the complete train positioned amongst several incorrect trains. The child is encouraged to avoid trial-and-error, focusing instead on integrating separate train parts mentally, thus training voluntary imagination. Different games use various tasks and visual patterns to keep the child engaged. Verbal activities train the same voluntary imagination ability by using higher forms of language, such as noun-adjective combinations, spatial prepositions, recursion and syntax. For example, a child can be instructed to put the cup {behind|in front of|on|under} the table or take animals home following an explanation that the lion lives above the monkey and under the cow. In every activity, a child listens to a short story and then works within an immersive interface to generate an answer. Correct answers are rewarded with pre-recorded encouragement and animations.
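The “take animals home” task above asks the child to combine several pairwise spatial relations into one vertical arrangement. As a rough illustration of what the task demands (not a description of how the application itself is implemented), the sketch below resolves the relations from the example into a top-to-bottom order. The function name `vertical_order` and the `(upper, lower)` pair encoding are hypothetical choices made for this example only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def vertical_order(above_pairs):
    """Return the residents from top floor to bottom floor.

    above_pairs: iterable of (upper, lower) tuples, each meaning
    "upper lives above lower". Hypothetical encoding for illustration.
    """
    sorter = TopologicalSorter()
    for upper, lower in above_pairs:
        # Declaring `upper` as a predecessor of `lower` makes it
        # appear earlier in the resulting order.
        sorter.add(lower, upper)
    return list(sorter.static_order())


# "The lion lives above the monkey and under the cow."
relations = [("lion", "monkey"), ("cow", "lion")]
order = vertical_order(relations)
print(order)
```

Neither relation alone fixes the position of all three animals; the answer only emerges once both are combined, which is the step the exercise is meant to train.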

In a 3-year clinical study of 6,454 ASD children, children who engaged with voluntary imagination exercises showed 2.2-fold greater combinatorial language comprehension improvement and 1.4-fold greater expressive language improvement than children with similar initial evaluations (Vyshedskiy et al. 2020b). These differences were statistically significant: p < 0.0001 and p = 0.0144, respectively. No statistically significant change was detected in other subscales not targeted by the exercises (Fig. 2). The complete methods and the discussion of results can be found in Vyshedskiy et al. 2020b.

Figure 2.  

Longitudinal plots of subscale scores LS Means. Horizontal axis shows months from the 1st evaluation (0 to 36 months). Error bars set at 95% confidence interval. To facilitate comparison between subscales, all vertical axes ranges have been normalised to show 35% of their corresponding subscale’s maximum available score. A lower score indicates symptoms improvement. P-value is marked: ***< 0.0001; **< 0.001; *< 0.05. (A) Receptive Language score. (B) Expressive Language score. (C) Sociability score. (D) Cognitive awareness score. (E) Health score. The test group included study participants who completed more than one thousand PFS exercises and made no more than one error per exercise. The control group was selected from the rest of participants by a matching procedure. Each test group participant was matched to the control group participant by age, gender, expressive language, receptive language, sociability, cognitive awareness and health score at 1st evaluation using propensity score analysis. The complete methods and the discussion of results can be found in Vyshedskiy et al. 2020b, from which the figure (which is available under a Creative Commons Attribution 4.0 licence) is reproduced.
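The caption states only that controls were selected “using propensity score analysis”; the exact procedure is described in Vyshedskiy et al. 2020b. Purely to illustrate the general idea of score-based matching, the sketch below pairs each test participant with the closest unused control on a precomputed propensity score. Greedy 1:1 matching without replacement, the `caliper` threshold, the function name `match_controls` and the sample scores are all assumptions made for this example and are not taken from the study.

```python
def match_controls(test_scores, control_scores, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    test_scores / control_scores: dicts mapping a participant id to a
    precomputed propensity score in [0, 1] (hypothetical inputs).
    caliper: maximum allowed score difference for an accepted match
    (an illustrative assumption, not a value from the study).
    """
    available = dict(control_scores)
    pairs = {}
    # Process test participants in order of increasing score.
    for test_id, score in sorted(test_scores.items(), key=lambda kv: kv[1]):
        if not available:
            break
        best = min(available, key=lambda cid: abs(available[cid] - score))
        if abs(available[best] - score) <= caliper:
            pairs[test_id] = best
            del available[best]  # each control is used at most once
    return pairs


# Made-up scores for illustration only.
test_scores = {"t1": 0.30, "t2": 0.62}
control_scores = {"c1": 0.28, "c2": 0.60, "c3": 0.90}
pairs = match_controls(test_scores, control_scores)
print(pairs)
```

In practice the propensity score itself would first be estimated from the baseline covariates listed in the caption (age, gender and the five subscale scores at the 1st evaluation); that estimation step is omitted here.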

These findings suggest that language may be improved by training voluntary imagination and expose the importance of voluntary imagination in language evolution.

Evolution of voluntary imagination

The LPFC is smaller in apes and the frontoposterior fibres (such as arcuate fasciculus) mediating all aspects of voluntary imagination in humans are much smaller or absent in apes (Rilling et al. 2008). Thus, it is not surprising that PFS has never been demonstrated in non-human animals. Even simpler components of voluntary imagination (Vyshedskiy et al. 2017), such as integration of modifiers, seem to be out of reach for animals (Yang et al. 2017). Animals which know the names of objects, colours and sizes are not capable of integrating colour, size and objects together – they are incapable of finding “a large red Lego” amongst multi-coloured, multi-sized pieces of Lego, crayons and pencils.

Evolutionary improvement of voluntary imagination can be followed by looking at the stone tools evolution (Fig. 3). According to Ian Tattersall, stone tools manufacturing demanded “a mental template in the mind of the toolmaker that determined the eventual form of the tool” (Tattersall 1999). This “mental template” must have been created voluntarily by a toolmaker, based on the unique features of each cobble. Thus, the quality of manufactured stone tools provides a window into the voluntary imagination abilities of our ancestors.

Figure 3.  

Evolution of stone tool culture. Chimpanzees make use of cobbles to break nuts, but they do not modify them. Homo habilis was one of the earliest hominin species that intentionally modified cobbles to manufacture the crude, Mode One choppers. Homo habilis was only able to break out large flakes from a cobble; its voluntary control of its mental template was quite crude. Homo erectus, on the other hand, was able to break off much smaller flakes and produce the fine, symmetrical, Mode Two handaxes; therefore, Homo erectus was most likely capable of finer voluntary control of its mental template. (Ape reproductions as photographed by the author at the evolution exhibit at the Valladolid Science Museum, Spain.)

Apes do not manufacture stone tools in the wild and attempts to teach stone tools manufacturing to apes have failed (Toth et al. 1993), suggesting that this ability was acquired after humans split from the chimpanzee line 6 million years ago. The first stone tools, Mode One choppers, dated to about 3.3 (Harmand et al. 2015) to 2.5 (Semaw et al. 1997) million years ago (ya) are crude and asymmetrical. Starting from about 2 million ya, hominins were capable of manufacturing fine symmetrical Mode Two handaxes with a long cutting edge (Klein 2009). Neanderthals manufactured even better Mode Three Mousterian tools found in the archaeological record from about 0.4 million ya (Klein and Edgar 2002). It is likely that the main reason for stasis in each stone tools culture was not the inability to find proper materials or inferior hand dexterity (Crast et al. 2009), but limitation in voluntary imagination. Hominins who could not imagine the final tool could not manufacture it either. If the quality of stone tools is informing us of the LPFC ability to control their mental template, then stone tools provide a time record of voluntary imagination gradually improving in hominins over the last 3.3 million years.

Speech and voluntary imagination could have been acquired separately

The two components of language – articulate speech and the voluntary imagination – are mediated by different cortical areas and, therefore, it is possible that the two processes have evolved separately. It has been hypothesised that the visuospatial control by the LPFC evolved in response to the predation pressure (Isbell and Etting 2016, Vyshedskiy 2021). As fighting off larger and stronger felines was impossible, the only option for hominins travelling from site to site to collect food and water was early identification of predators. Big cats favour an unexpected attack (Hart and Sussman 2018). If detected by prey from a distance, the feline often abandons the hunt and moves to a new location (Turkel and Dunbar 1999). In the feline-infested savannah, early identification and harassment of big cats by throwing rocks and sticks was the only path to safe food foraging. However, it is notoriously hard to detect a camouflaged motionless feline crouching under the cover of tall savannah grasses. Hominins’ survival in the savannah depended on their ability to distinguish a feline from the background – the function of the LPFC control over the visual cortical areas of the posterior cortex. Thus, it is likely that predation from camouflaged motionless felines was driving enlargement of the LPFC and its frontoposterior connections and the resulting improvement of the visuospatial control by the LPFC, i.e. voluntary imagination.

The evolutionary pressure for improvement of the speech apparatus likely came from a different and independent source. Speech apparatus evolution was hypothesised to be the result of hundreds of mutations, each of which incrementally improved articulation ability by enhancing the control of the diaphragm, lips, tongue, cheeks, vocal cords, larynx position in the trachea and so on (Vyshedskiy 2021). The first mutation that improved articulation could have increased the number of distinct vocalisations from around 40 words, as in chimps (Goodall 1965, Mitani et al. 1992, Slocombe and Zuberbühler 2007, Slocombe et al. 2008) to 100 words. After many generations, a second mutation could have expanded the vocabulary to 300 words. Thousands of years later another mutation may have extended the vocabulary to 600 distinct words and so on. Greater vocabulary of a tribe leader must have improved his survival chances by increasing food procurement through better organisation, job assignment and social adhesion, which was critically important for hominins, who were regularly moving from one place to another and needed to find a protective shelter, edible food, a source of clean water and a myriad of other things in each new place (Bramble and Lieberman 2004). (Homo erectus was moving so much that the species diffused out of Africa and settled in most of Europe and Asia starting around 1.8 million years ago (Carbonell et al. 1995, Broadfield et al. 2001, Lordkipanidze et al. 2013).) Even if no one else in the group but the leader was able to call each person by name, generate organisational calls and assign jobs without the need to point to each object, both the leader and the tribe would have gained an advantage. Two-word sentences could communicate job assignment: “John flint,” meaning that John is expected to collect flint stones; “Peter sticks,” meaning that Peter is expected to find sticks; “Patrick tubers,” meaning that Patrick is expected to dig tubers; and so on. The leader could also instruct the selected workers in what to take with them: handaxes for cutting trees, spears for hunting or a sack for carrying throwing stones back to the shelter. Critically, such a communication system with many nouns does not rely on voluntary constructive imagination. In fact, apes, dogs and some other animals can learn hundreds of nouns (Cuaya et al. 2022).

When articulate speech mutations originate in a leader, they result in an immediate improvement in communication, albeit one-way communication from the leader to tribe members, and, consequently, increase the tribe’s productivity and the leader’s survival chances. As an alpha male, the leader would have a high number of children and, thus, his “improved vocal apparatus” mutation would have been fixed in the population.

Thus, articulate speech could have developed separately from voluntary imagination: their evolutionary driving forces could have been different and hundreds of mutations associated with improvement of each function could have been independent.

When was speech acquired by hominins?

There is general consensus that articulate speech was acquired from 2 million to 600,000 ya (Conde-Valverde et al. 2021). Dediu and Levinson cite five lines of converging evidence pointing to acquisition of modern speech apparatus by 600,000 ya (Dediu and Levinson 2013):

  1. the changes in hyoid bone,
  2. the flexion of the bones of the skull base,
  3. increased voluntary control of the muscles of the diaphragm,
  4. anatomy of external and middle ear and
  5. the evolution of the FOXP2 gene.

1. The changes in hyoid bone. This small U-shaped bone lies in the front of the neck between the chin and the thyroid cartilage. The hyoid does not contact any other bone. Rather, it is connected by tendons to the musculature of the tongue and the lower jaw above, the larynx below and the epiglottis and pharynx behind. The hyoid aids in tongue movement used for swallowing and sound production. Accordingly, phylogenetic changes in the shape of the hyoid provide information on the evolution of the vocal apparatus.

The hyoid bone of a chimpanzee is very different from that of a modern human (Frayer 1999). The australopith hyoid bone discovered in Dikika, Ethiopia and dated to 3.3 million ya closely resembles that of a chimpanzee (Alemseged et al. 2006). The Homo erectus hyoid bone recovered at Castel di Guido, Italy and dated to about 400,000 ya reveals the “bar-shaped morphology characteristic of Homo, in contrast to the bulla-shaped body morphology of African apes and Australopithecus” (Capasso et al. 2008). Neanderthal hyoids are essentially identical to that of a modern human in size and shape: these have been identified in Kebara, Israel (Arensburg et al. 1989) and El Sidrón, Spain (Rodríguez et al. 2003). At the same time, these are also identical to the hyoid of Homo heidelbergensis from Sima de los Huesos, Spain (Martínez et al. 2008), suggesting that the latter was a direct ancestor of both Homo neanderthalensis and Homo sapiens and had already possessed a nearly modern hyoid bone (D’Anastasio et al. 2013, Dediu and Levinson 2013). The similarities between Neanderthal and modern human hyoids make it likely that the position and connections of the hyoid and larynx were also similar between the two groups.

2. The flexion of the bones of the skull base. Laitman (Laitman and Reidenberg 1988) observed that the roof of the vocal tract is also the base of the skull and suggested that the evolving vocal tract is reflected in the degree of curvature of the underside of the base of the skull (called basicranial flexion). The skull of Australopithecus africanus dated to 3 million ya shows no flexing of the basicranium, as is the case with chimpanzees (Laitman and Heimbuch 1982). The first evidence of increased curvature of the basicranium is displayed in Homo erectus from Koobi Fora, Kenya, 1.75 million ya (Laitman et al. 1979). A fully flexed, modern-like basicranium is found in several specimens of Homo heidelbergensis from Ethiopia, Broken Hill 1 and Petralona from about 600,000 ya (Laitman and Reidenberg 1988).

3. Increased voluntary control of respiratory muscles. Voluntary cortical control of respiratory muscles is a crucial prerequisite for complex speech production (MacLarnon and Hewitt 1999). Greater cortical control is associated with additional innervation of the diaphragm, which can be detected in fossils as an enlarged thoracic vertebral canal. Homo erectus from 1.5 million ya (Turkana Boy) has no such enlarged canal, but both modern humans and Neanderthals do (Dediu and Levinson 2013), providing converging evidence for acquisition of a modern-like vocal apparatus by 600,000 ya.

4. The anatomy of the external and middle ear. Modern humans show increased sensitivity to sounds between 1 kHz and 6 kHz and particularly between 2 kHz and 4 kHz. Chimpanzees, on the other hand, are not particularly sensitive to sounds in this range (Martínez et al. 2013). Since species using complex auditory communication systems tend to match their broadcast frequencies and the tuning of perceptual acuity (Kojima 1990), it was argued that changes in the anatomy of the external and middle ear in hominins are indicative of the developing speech apparatus. Data from several Neanderthal and Homo heidelbergensis fossils indicate a modern-human-like pattern of sound perception with highest sensitivity in the region around 4 kHz, which is significantly different from that of chimpanzees (Quam and Rak 2008, Martínez et al. 2013).

5. The evolution of the FOXP2 gene. The most convincing evidence for the timing of the acquisition of the modern speech apparatus is provided by DNA analysis. The FOXP2 gene is the first identified gene that, when mutated, causes a specific language deficit in humans. Patients with FOXP2 mutations exhibit great difficulties in controlling their facial movements, as well as with reading, writing, grammar and oral comprehension (Vargha-Khadem et al. 1995). The protein encoded by the FOXP2 gene is a transcription factor. It regulates genes involved in the production of many different proteins. The FOXP2 protein sequence is highly conserved. There is only one amino acid difference in the chimpanzee lineage going back some 70 million years to the common ancestor with the mouse (Haesler 2007). The FOXP2 proteins of chimpanzee, gorilla and rhesus macaque are all identical. This resistance to change suggests that FOXP2 is extraordinarily important for vertebrate development and survival. Interestingly, there is a change of two amino acids in FOXP2 that occurred over the last 6 million years, during the time when the human lineage had split off from the chimpanzee. These two amino acid substitutions predate the human-Neanderthal split. Both amino acid substitutions were found in two Neanderthals from Spain (Krause et al. 2007), as well as in Neanderthals from Croatia (Green et al. 2010) and in Denisovans, an extinct Asian hominin group related to Neanderthals (Reich et al. 2010). This indicates that Homo heidelbergensis, the common ancestor of Homo sapiens and Neanderthals, already had the two “human specific” amino acid substitutions. Despite evidence of possible further evolution of FOXP2 in Homo sapiens (Maricic et al. 2012), the comparatively fast mutation rate of FOXP2 in hominins indicates that there was strong evolutionary pressure on development of the speech apparatus before Homo sapiens diverged from Neanderthals over 500,000 ya (Green et al. 2008).

Conclusions on acquisition of articulate speech. Based on these five lines of evidence (the structure of the hyoid bone, the flexion of the bones of the skull base, increased voluntary control of the muscles of the diaphragm, anatomy of external and middle ear and the FOXP2 gene evolution), most paleoanthropologists conclude that the speech apparatus experienced significant development starting with Homo erectus about 2 million ya and that it reached modern or nearly modern configurations in Homo heidelbergensis about 600,000 years ago (Tattersall 1999, Dediu and Levinson 2013). Dediu and Levinson wrote: “there is ample evidence of systematic adaptation of the vocal apparatus to speech and we have shown that this was more or less in place by half a million ya” (Dediu and Levinson 2013). We will never know the extent of Homo heidelbergensis neurological control of their speech; however, considering that the chimpanzee communication system already has 20 to 100 different vocalisations (Goodall 1965, Mitani et al. 1992, Slocombe and Zuberbühler 2007, Slocombe et al. 2008), it is likely that the modern-like remodelling of the vocal apparatus in Homo heidelbergensis extended their range of vocalisations by orders of magnitude. In other words, by 600,000 ya, the number of distinct verbalisations used by hominins for communication could have been on par with the number of words in modern languages.

When was prefrontal synthesis acquired?

When was PFS, the most advanced component of voluntary imagination mechanisms, acquired by hominins? Voluntary imagination was improving slowly in our ancestors over the last 3.3 million years, as revealed by the changing quality of stone tools (Vyshedskiy 2019b). Gradual accretion of ‘symbolic artifacts’ over the last several hundred thousand years (use of pigments, presumably in body decoration (Zilhão et al. 2010); perforated shells (Zilhão et al. 2010); intentional burials (Klein 2009)) further supports the notion of developing voluntary imagination and symbolic thinking. However, symbolic thinking is not equivalent to PFS. PFS is not necessary for using an object as a symbol. For example, the use of red ochre may be highly symbolic due to its association with blood and battles. However, this association may be entirely based on memory. Memory recall and spontaneously formed imagery do not rely on PFS (Vyshedskiy 2019b) and, therefore, use of red ochre is not an indication of PFS abilities in hominins. Similarly, personal ornaments, such as perforated shells (Henshilwood et al. 2004, d'Errico et al. 2005, Bouzouggar et al. 2007, Zilhão et al. 2010, Sehasseh et al. 2021), could have been used as symbols of social power. However, neither their manufacturing nor their use requires voluntary mental juxtaposition of two independent objects (i.e. PFS). The line marks on stones and shells (Henshilwood et al. 2009), as well as geometrical figures and hand stencils painted on cave walls, are undoubtedly associated with general improvement in the LPFC function in their creators, but not a single artifact dated before 70,000 ya requires the PFS ability for its manufacture.

What artifacts unambiguously signify acquisition of PFS?

1) Composite figurative arts. Depiction of composite objects that do not exist in nature provides undeniable evidence of PFS. These composite objects must have been imagined by the artists by first mentally synthesising parts of two independent mental objects together and then executing the product of this mental creation in ivory or other material.

2) Bone needles with an eye. Bone needles are used for stitching clothing. To cut and stitch an animal hide into a well-fitting garment, one needs first to mentally simulate the process, i.e. imagine how the parts can be combined into a finished product that fits the body. Such mental simulation is impossible without PFS.

3) Construction of dwellings. An integral part in construction of a dwelling is visual planning, which relies on the mental simulation of all the necessary construction steps, which is impossible without PFS.

4) Religious beliefs. An individual without PFS cannot be induced into believing in spirits, as they cannot understand a description of gods, cyclops, mermaids or any other hybrid creatures. Therefore, religious beliefs and beliefs in the afterlife are the ultimate manifestations of PFS. The origin of religious beliefs can be traced by following the evidence of beliefs in the afterlife. Beliefs in the afterlife, in turn, are thought to be associated with adorned burials. Hence, the development of religious beliefs may be inferred by studying the time period when humans started to bury their deceased in elaborate graves with accompanying “grave goods.”

The PFS hypothesis can be rejected if these four types of artifacts appear in the archaeological record at different times: if composite figurative arts appeared in the archaeological record 100,000 years before bone needles with an eye, that would indicate that their manufacturing is not associated with the same underlying cognitive ability. Conversely, the PFS hypothesis would be strengthened if all four types of artifacts were associated with each other in time and geography. Let us look at the archaeological evidence.

1. Composite figurative objects. Multiple composite objects appear in the archaeological record around 40,000 ya. The Löwenmensch (“lion-man”) sculpture excavated from the caves of the Lone Valley in Germany was dated to 39,000 years ago (Dalton 2003) (Fig. 4). The hunting scene depicting part-human, part-animal figures from the limestone cave of Leang Bulu’ Sipong 4 (Sulawesi, Indonesia) was dated to 44,000 ya (Aubert et al. 2019). A bird-man from Lascaux was dated to 32,000 ya. A lion-woman from Chauvet was dated to 30,000 ya. The engraving of a bird-horse-man from Hornos de la Peña was dated to 18,000 ya. These composite objects provide direct evidence that by 44,000 years ago humans were capable of PFS.

Figure 4.  

“Lion-man”, statuette carved from mammoth tusk. Site: Hohlenstein-Stadel cave, Germany, dated to 39,000 years ago (ya), Inv. Ulmer Museum Prä Slg. Wetzel Ho-St. 39/88. Photo Thomas Stephan © Ulmer Museum, Ulm, Germany. Used with permission.

2. Bone needles with an eye. The earliest bone needles are dated to 61,000 years ago (Backwell et al. 2008) and they provide an unambiguous indication of PFS. Pre-PFS hominins also processed animal hides, but they likely wore them like a blanket. PFS enabled stitching animal hides into well-fitting clothing.

This time period was also marked by the arrival of bow-and-arrow and musical instruments. The earliest quartz-tipped arrows have been dated to about 64,000 years ago (Lombard 2011). The oldest flute was discovered at Divje Babe in Slovenia and dates back to about 43,000 years ago. It is made out of the femur of a juvenile cave bear, with several holes. The next oldest flute was found in the Geißenklösterle cave and dates back to 42,000-43,000 years ago (Higham et al. 2012). The five-holed flute made from the wing bone of a vulture dates back to 35,000 years ago and was discovered in Hohle Fels Cave near Ulm, Germany (Higham et al. 2012). More flutes were found in the Geißenklösterle Cave in southern Germany: one made from a mammoth tusk (dating back to 37,000-30,000 years ago) and another one made from swan bones (dating back to about 36,000 years ago).

3. Construction of dwellings. There is little evidence of hominins constructing dwellings or fire hearths until the arrival of Homo sapiens. While Neanderthals controlled the use of fire, their hearths were usually very simple: most were just shallow depressions in the ground. There is almost a complete lack of evidence of any dwelling construction at this period (Kolen 1999). Conversely, the arrival of Homo sapiens is marked by a multitude of constructed structures including stone-lined and dug-out fireplaces, as well as unambiguous remains of dwellings, which all flourished starting around 30,000 years ago. These include foundations for circular hut structures at Vigne-Brune (Villerest) in eastern France, dating back to 27,000 years ago (Mellars 1996); postholes and pit clusters at a site near the village of Dolní Věstonice in the Czech Republic, dating back to 26,000 years ago (Verpoorte 2000) and mammoth bone structures at Kostienki, Russia and Mezirich, Ukraine (Holliday et al. 2007).

4. Religious beliefs. The oldest known human burial, dated at 500,000 years ago and attributed to Homo heidelbergensis, was found in the Sima de los Huesos site in Atapuerca, Spain and consists of various corpses deposited in a vertical shaft (Arsuaga et al. 1997). A significant number of burials are also associated with Neanderthals: La Chapelle-aux-Saints, La Ferrassie and Saint-Cesaire in France; Teshik-Tash in Uzbekistan; Shanidar Cave in Iraq (Delson 2004). These early burials, however, completely lack the “grave goods” that would indicate the belief in an afterlife (Tattersall 1999).

Human skeletal remains that were intentionally stained with red ochre were discovered in the Skhul and Qafzeh Caves in the Levant and dated to approximately 100,000 years ago (Bar-Yosef Mayer et al. 2009). One of the burials contains a skeleton with a mandible of a wild boar, another contains a woman with a small child at her feet and yet another one contains a young man with a possible offering of deer antlers and red ochre (McCown 1940). While these burials are clearly intentional, whether or not they indicate the belief in an afterlife is uncertain. The ochre by itself is inconclusive evidence. For example, ochre could have been used during lifetime (e.g. to protect skin from insects (Horváth et al. 2019)) and the deceased could have been buried still bearing the ochre marks. The small number of “offerings” found in these burial sites may have simply been objects that fell into the burial pit accidentally. In any case, there is not enough conclusive evidence from these early burials to judge the occupants’ beliefs in an afterlife.

The number of known adorned burials and the sophistication of the offerings significantly increased around 40,000 years ago. To date, over one hundred graves of Homo sapiens have been discovered that date back to the period between 42,000 and 20,000 years ago (Giacobini 2016). In many cases several bodies were interred in a single grave. Burial offerings were commonplace and ochre was used abundantly. Examples include: a burial in Lake Mungo, Australia, dating back to 42,000 years ago (Habgood and Franklin 2008); an elaborate burial in Sungir, Russia that includes two juveniles and an adult male wearing a tunic adorned with beads and carefully interred with an astonishing variety of decorative and useful objects, dating back to 30,000 years ago (Pettitt and Bader 2015) (Fig. 5); a grave in Grimaldi, Italy, which contains the remains of a man and two adolescents along with burial offerings from around 40,000 years ago (Giacobini 2016); and a site in Dolni Vestonice, in the Czech Republic where a woman was buried between two men and all three skulls were covered in ochre dating back to 28,000 years ago (Klima 1987).

Figure 5.  

An elaborate burial of a 60-year-old man found in Sungir, Russia. The man is wearing bracelets, necklaces, pendants and a tunic adorned with thousands of mammoth-ivory beads. Two juvenile burials were found at the same site. The site and the skeletons date back to 30,000 ya (Pettitt and Bader 2015). Photo José-Manuel Benito Álvarez [Public domain].

Conclusions from paleontological evidence. Multiple types of archaeological artifacts unambiguously associated with PFS appear simultaneously around 65,000 ya in multiple geographical locations. This abrupt change in the quality of archaeological artifacts, indicating modern imagination, has been characterised by paleoanthropologists as the “Upper Paleolithic Revolution” (Bar-Yosef 2016), the “Cognitive revolution” (Harari 2014) and the “Great Leap Forward” (Diamond 2014). Notably, it coincides with migration out of Africa 65,000 ya (detected by mitochondrial DNA (Zhivotovsky et al. 2003, Soares et al. 2009)). The genetic bottleneck that has been detected around 70,000 ya (Amos and Hoffman 2009) is consistent with the “founder effect” of a few individuals who acquired PFS and spread their genes by eliminating other contemporaneous males with the use of PFS-enabled stratagems and newly-developed weapons, such as the bow-and-arrow. (We note that the notion of the Upper Paleolithic Revolution has recently become unpopular amongst evolutionary researchers (Kissel and Fuentes 2018). The alternative hypothesis explains the abrupt change in the quality of archaeological artifacts 70,000 ya by the fact that items closer in time are better conserved and complex artifacts have a strong cultural component that builds up over time. The proponents of this hypothesis, however, do not appreciate the neurological difference between PFS and other components of voluntary imagination and, as a result, do not differentiate symbolic artifacts (such as perforated shells) from PFS artifacts (such as the lion-man, bone needles with an eye and “grave goods”).)

Additional evidence of PFS acquisition by humans migrating out of Africa 65,000 ya is provided by a significant change in hunting strategy. Without PFS, one cannot envision the building of an animal trap, for example, a pitfall trap, which requires digging a deep pit and camouflaging it with twigs and branches. While Neanderthals hunted large animals, such as mammoths, they were not using traps or stratagems. The high frequency of bone fractures found in Neanderthal skeletons, especially in the ribs, femur, fibulae, spine and skull, suggests that their primary hunting technique was to use thrusting spears (Klein 2009) in an attempt to stab their prey (Tattersall 1999). The demise of the Pleistocene megafauna at the hands of Homo sapiens after 70,000 ya (Barnosky et al. 2004, Smith et al. 2018) is likely associated with the invention of animal trapping. PFS aids trap building in three ways. First, a leader can use PFS to mentally simulate multiple ways to build a trap. Second, a leader can use PFS to think through the step-by-step process of building a trap. Finally, a leader can communicate the plan to the tribe: “We will make a trap by digging a large pit and covering it with tree branches. A mammoth will then fall into the pit; no need to attack a mammoth head on”. In fact, early modern humans are known for building traps; traps for herding gazelle, ibex, wild asses and other large animals were found in the deserts of the Near East. These funnel-shaped traps comprised two long stone walls, up to 60 km (37 miles) in length, that converged on an enclosure or pit at the apex (Holzer et al. 2010). Animals were probably herded into the funnel until they reached the enclosure at the apex surrounded by pits, at which point the animals were trapped and killed. Some traps date back to as early as the 7th millennium BC (Holzer et al. 2010).
The building process must have been pre-planned by a tribe leader (or several leaders) and then explained to all the workers. Each worker, in turn, would have had to understand exactly what they needed to do: collect proper stones, assemble stones into a wall and have the two walls meet at the apex 60 km away from where they started. The correlation of human migration with the demise of the Pleistocene megafauna is consistent with PFS that would have enabled mental planning of sophisticated attack strategies with the use of animal traps (Holzer et al. 2010).

Furthermore, trapping large animals must have provided a significant boost to our ancestors’ diet and set their population growth on to an exponential trajectory. In fact, both the extent and the speed of colonisation of the planet by Homo sapiens 70,000 to 65,000 years ago are unprecedented. Our ancestors quickly settled in Europe and Asia and crossed open water to the Andaman Islands in the Indian Ocean by 65,000 years ago (Macaulay et al. 2005) and Australia as early as 62,000 years ago (Thorne et al. 1999). The abrupt appearance of the four types of unambiguous PFS archaeological evidence (composite figurative arts, bone needles with an eye, constructed dwellings and grave goods), the change of hunting strategy to animal trapping, the dramatic rise of human population, the crossing of open water to the Andaman Islands and Australia and the genetic bottleneck detected 70,000 ya are consistent with acquisition of PFS by several individuals 70,000 ya (Vyshedskiy 2019a) and disease-like spread of modern imagination thereafter.

Non-recursive communication system in pre-PFS hominids is counter-intuitive

If PFS was acquired around 70,000 ya and articulate speech was acquired before 600,000 ya, there must have been an interval of at least half a million years when hominins were using non-recursive communication systems. Visualising a pre-PFS hominin from before 70,000 ya is extremely counterintuitive. Students tend to imagine an ape that has learned several thousand words, gained an ability to generate articulate sounds, acquired control over its impulses and improved its imagination. A better way to visualise a pre-PFS hominin is to imagine a modern human with an LPFC lesion that resulted in PFS paralysis. Waltz et al. have demonstrated that these individuals can perform many voluntary imagination tasks, such as integration of modifiers and mental rotation, but fail precipitously in visuospatial and verbal relational questions that require PFS (Waltz et al. 2016). They have good crystallised intelligence, normal memory, normal articulate speech, normal ability to abstract and generalise, can be pleasant and inviting, but have an IQ ≤ 85, because they cannot answer PFS questions like “The girl is taller than the boy. The monkey is taller than the girl. Who is the shortest?” (Waltz et al. 2016). Luria explains that “...patients with this type of lesion have no difficulty articulating words. They are also able to retain their ability to hear and understand most spoken language. Their ability to use numerical symbols and many different kinds of abstract concepts also remains undamaged... these patients had no difficulty grasping the meaning of complex ideas such as ‘causation,’ ‘development’ or ‘cooperation’. They were also able to hold abstract conversations. ... They can repeat and understand sentences that simply communicate events by creating a sequence of verbal images” (Cole et al. 2014).
Luria further explains that their disability shows only when patients have to imagine several objects or persons in a novel combination (revealing the problem of PFS): “But difficulties developed when they were presented with complex grammatical constructions which coded logical relations. ... Such patients find it almost impossible to understand phrases and words which denote relative position and cannot carry out a simple instruction like ‘draw a triangle above a circle.’ This difficulty goes beyond parts of speech that code spatial relations. Phrases like ‘Sonya is lighter than Natasha’ also prove troublesome for these patients, as do temporal relations like ‘spring is before summer’. ...Their particular kind of aphasia becomes apparent only when they have to operate with groups or arrangements of elements. If these patients are asked, ‘Point to the pencil with the key drawn on it’ or ‘Where is my sister's friend?’ they do not understand what is being said. As one patient put it, ‘I know where there is a sister and a friend, but I don't know who belongs to whom’” (Cole et al. 2014).

Individuals with PFS paralysis (as a result of lesion or a neurodevelopmental condition) do not understand recursive sentences (e.g. “John lives below Mary, who lives below Steve”) and spatial prepositions and, therefore, by definition, use a non-recursive communication system. They provide the best window into the non-recursive communication system of pre-PFS hominins living before 70,000 ya.
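
The logical structure of such relational questions can be made concrete with a short sketch. The Python snippet below is purely illustrative: the function name find_extreme and the encoding of each statement as a (greater, lesser) pair are assumptions introduced for this example, and it is not offered as a model of how the brain performs PFS. It only shows that answering “Who is the shortest?” requires combining two separately stated relations into a single arrangement, which is the step that patients with PFS paralysis reportedly cannot perform.

```python
# Illustrative sketch only. The (greater, lesser) tuple encoding and the
# function name are assumptions made for this example; they do not come
# from the studies cited in the text.

def find_extreme(relations, lowest=True):
    """Return the single item at the bottom (or top) of an ordering.

    relations: list of (greater, lesser) pairs, e.g. ("girl", "boy")
    encodes "the girl is taller than the boy".
    """
    greater_items = {greater for greater, _ in relations}
    lesser_items = {lesser for _, lesser in relations}
    if lowest:
        # The lowest item appears only on the "lesser" side of a relation.
        candidates = lesser_items - greater_items
    else:
        # The highest item appears only on the "greater" side of a relation.
        candidates = greater_items - lesser_items
    if len(candidates) != 1:
        raise ValueError("relations do not determine a unique answer")
    return candidates.pop()


# "The girl is taller than the boy. The monkey is taller than the girl."
height = [("girl", "boy"), ("monkey", "girl")]
shortest = find_extreme(height, lowest=True)
tallest = find_extreme(height, lowest=False)

# "John lives below Mary, who lives below Steve."
floors = [("Mary", "John"), ("Steve", "Mary")]
lowest_floor = find_extreme(floors, lowest=True)

print(shortest, tallest, lowest_floor)
```

Neither statement alone identifies the shortest individual; the answer emerges only when both relations are held and compared together.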

The great synergy: marriage of articulate speech and PFS creates modern language

While speech apparatus and voluntary imagination were improving as a result of separate independent evolutionary pressures over several million years, it does not mean that there was no synergy between them. Recent studies demonstrate a clear synergistic relationship between language proficiency and voluntary imagination in children. Deaf individuals communicating through a formal sign language from an early age develop normal voluntary imagination. However, in the absence of early communication or when the sign language is lacking spatial prepositions and recursion, deaf individuals show clear deficits of voluntary imagination. Deaf individuals who had learned American Sign Language (ASL) early in life were found to be more accurate than later learners at identifying whether two complex-shape figures presented at different degrees of rotation were identical or mirror images of each other (Emmorey et al. 1993). Individuals who learned ASL earlier were also faster than later learners at identifying whether two-dimensional body-shaped figures (bears with one paw raised) presented at different rotations were identical or mirror images of each other (Martin 2009). Even after decades of signing experience, the signers who learned ASL earlier were better at mental rotation accuracy (Martin et al. 2013). Amongst deaf individuals who acquire sign language at the same age, the richness of “spatial” language makes a difference. Specifically, two cohorts of signers were tested with the first cohort of signers acquiring the emerging sign language in Nicaragua when this language was just invented and had few spatial prepositions, while the second cohort of signers acquired the language in a more complex form with more spatial prepositions. Predictably, the second cohort of signers (tested when they were in their 20s) outperformed the first cohort of signers (tested when they were in their 30s) in several mental rotation tasks (Pyers et al. 2010). 
Finally, deaf individuals who are never exposed to formal sign language until puberty invariably suffer lifelong PFS paralysis despite learning significant vocabulary through intensive post-pubertal language therapy (Vyshedskiy et al. 2017b).

All available experimental evidence from modern-day children suggests the existence of an ontogenetic synergistic relationship between early childhood recursive language use and voluntary imagination skills. It is likely that similar synergy also existed on the phylogenetic level. Improving speech apparatus enabled better visuospatial processing and vice versa. The greatest synergy between articulate speech and voluntary imagination has been achieved with acquisition of PFS. PFS has enabled articulate speech to communicate an infinite number of novel object combinations with the use of a finite number of words, the system of communication that we call recursive language. At the same time, PFS endowed the human mind with the most efficient way to simulate the future in the neocortex: by voluntarily combining and re-combining mental objects from memory. The marriage of articulate speech and voluntary imagination at approximately 70,000 ya resulted in the birth of a practically new species – the modern Homo sapiens, the species with the same creativity and imagination as modern humans.

Improvement of voluntary imagination defined the pace of language evolution

In this manuscript, we have presented multiple theoretical and experimental observations that argue for dissociation of articulate speech and voluntary imagination: 1) The neurological apparatus for articulate speech (the Broca’s and Wernicke’s areas) is distinct from the neurological apparatus for voluntary imagination (the LPFC control over the visual areas in the posterior cortex). 2) Double dissociation of PFS and articulate speech in patients with brain lesions: patients with PFS paralysis do not demonstrate changes in articulate speech and patients with expressive aphasia can have normal PFS. 3) Double dissociation of PFS and articulate speech in childhood language development: some children acquire normal articulate speech while showing clear deficits in voluntary imagination, while others can have trouble in articulate speech, but attain normal PFS. 4) Our recent data from a large group of children with autism demonstrate that children improve their language following a course of voluntary imagination exercises. All these observations point to the dichotomy of recursive language evolution and the importance of the visuospatial component of language.

The dichotomy of recursive language evolution poses a dilemma: which of the two components of language was driving recursive language acquisition in hominins? Since articulate speech is so obviously different between humans and apes, this question has been commonly answered in favour of articulate speech. Charles Darwin wrote in 1871: “I cannot doubt that language owes its origin to the imitation and modification, aided by signs and gestures, of various natural sounds, the voices of other animals, and man’s own instinctive cries” (Darwin 1871). In his view, Darwin followed Max Müller (1861) who assumed that once hominins had stumbled upon the appropriate mechanism for producing articulate speech, a communication system would develop and language would evolve. However, as clearer understanding of differences in voluntary imagination between humans and apes emerges, this conventional wisdom is put in doubt. Apes who learned hundreds of words do not show any improvement of their voluntary imagination: they cannot integrate modifiers or juxtapose various mental objects at will to demonstrate PFS ability.

In this paper, we propose a radical idea that evolutionary acquisition of recursive language was limited not by the capacities of the speech apparatus, but by the improvement of voluntary imagination (i.e. the gradual progress in the development of the visuospatial control by the LPFC). Voluntary imagination is mediated via some of the longest fibres in the brain (arcuate fasciculus). Fine-tuning of these fibres by experience-dependent myelination is far more complex and slower than acquisition of vocabulary. Typically-developing children commonly acquire articulate speech by 2 years of age, but do not acquire PFS until 4 years of age (Vyshedskiy et al. 2020).

In fact, the argument in favour of the speech apparatus limiting the acquisition of recursive language is fundamentally weak, as speech is not an obligatory component of recursive language at all. If hominins had the neurological machinery for voluntary imagination, they could have invented sign language. A sign language does not require the hundreds of mutations necessary for an articulate speech apparatus and apes easily learn hundreds of signs (Patterson and Gordon 2002, Segerdahl et al. 2005). All formal sign languages include spatial prepositions and other recursive elements. In the largest natural experiment of language origin, four hundred Nicaraguan deaf children assembled in two schools in the 1970s (genetically modern children, with the propensity for normal voluntary imagination) spontaneously invented a new recursive sign language in just a few generations (Senghas and Coppola 2001). Thus, the capacities of the speech apparatus could not have been a limiting factor in the acquisition of recursive language. The only possible explanation for not acquiring recursive language earlier during human evolution is the unavailability of PFS in our ancestors before 70,000 ya.

Additional supporting evidence for this hypothesis comes from the observation of the variety of sound boxes in birds and the uniqueness of human voluntary imagination. Articulate sounds can be generated by Grey parrots and thousands of songbird species (Pepperberg 2010). This shows that improving sound articulation is, evolutionarily speaking, a simpler process than improving voluntary imagination.

On the basis of neurological observations, archaeological findings, studies of children, the sign language argument and the variety of sound boxes in birds, we argue that the evolution of the hominin speech apparatus must have followed (rather than led to) the improvements in voluntary imagination. Contrary to Darwin’s prediction, not speech, but voluntary imagination appears to define the pace of recursive language evolution.


We wish to thank Dr. P. Ilyinskii for careful editing of this manuscript.

Conflicts of interest

Authors declare no competing interests.