Immersive virtual reality (VR) technology has developed extremely dynamically over the last decade. One reason is the impressive progress in 3D graphics, together with the increasing accessibility of hardware powerful enough to render high-quality 3D environments smoothly. Virtual reality immerses the user by replacing the physical world with a computer-generated 3D scenario. Its power lies in embodied, complex, and vivid scenarios that are rich in context and dynamically engage the sensorimotor system, thereby provoking more naturalistic behavioural and physiological responses than abstract stimuli. Congruent multisensory stimulation builds the virtual experience and makes us forget about the real world. Strong illusions of presence and embodiment can cause deep cognitive, affective, and behavioural changes, leading to enhanced learning, perspective-taking, and treatment outcomes. However, some types of sensory input have received much more attention than others: mainly visual and auditory cues, but also, to some extent, nociception and touch. The human sensory system is much richer and more complex than this, encompassing many other senses such as smell and taste, as well as thermoception, interoception, equilibrioception, and others. Most VR experiences are built on rich audiovisual cues, since these two modalities allow us to perceive the virtual world and interact with it. The remaining senses fare worse: touch, although extremely important, is often represented in a reductionist form, and other modalities, such as smell or thermoception, are ignored completely.