Sorting each game audio artifact into a category helps to organize production, spawn new ideas, and create balanced and meaningful audio. The traditional framework for categorizing game audio is strongly production-oriented. It basically divides all game audio into
This classical taxonomy is understood by anyone in the business and is useful for managing the production workflow. Our services page, for example, is subdivided in that manner. However, this categorization is very technically oriented and offers little insight into the function of audio with respect to game design.
Sander Huiberts and Richard van Tol from the Utrecht School of the Arts developed a new framework for categorizing audio. The IEZA framework builds upon work from the film music business and gives us a taxonomy to better understand the purpose of audio in games.
One nice thing about IEZA is that it is two-dimensional. We knew long before the advent of tagging that one-dimensional categorization is often not a suitable tool. Aspect-oriented programming, UML and plain algebra all teach us that subdividing a field along multiple dimensions can greatly improve results. The IEZA framework categorizes game audio along two dimensions: 1) source and 2) expression.
IEZA Framework
The first dimension stems from classical film theory, which subdivides all audio events into diegetic and non-diegetic.
A diegetic sound is basically any sound coming from within the movie or game scene (this includes sound sources not visible in the current view frustum). Examples are footsteps of the player character, chatter of non-player characters, ambient city noise or the sound of rain and thunder. This category most often consists of real-world sound effects. Sometimes music is also used in a diegetic way (e.g. coming from a radio or a street singer in the scene).
Non-diegetic sounds have no direct source in the visual scene and usually comprise music and mood-setting sound effects. A typical example is an orchestral soundtrack or a narrator. Of course these audio events are somehow connected to the visual scene (at least in good game audio), but they can't be directly attributed to objects in the virtual world.
The second dimension subdivides all sound into those directly triggerable by the player and those not directly triggerable.
Activity: Sounds that the player can trigger directly fall into this category. Typical examples are gunshots or button-click sounds from the HUD.
Setting: This is audio that is only indirectly affected by the player's activities. Examples are a background music track or ambient weather sounds.
Together, the two dimensions described above form four categories of audio, which greatly help to organize the creation of sound for a game:
Zone: These are diegetic setting sounds. They logically originate from objects in the virtual world, but have no true vector coordinates to locate them (typically there is only one layer of ambient zone sounds). An example would be the noise of a cheering crowd in a soccer game or a wind sound in an adventure. Zone audio consists mostly of real-world sounds. Its main purpose is to enhance the visual representation of a location and make it clear to the player. Zone audio is sometimes also used to prevent complete silence when no other sounds are triggered.
Effect: These are diegetic activity sounds. This category is what most people think of when they talk about sound effects. These sounds are consciously triggered by the player. Gunshots or collision sounds are typical examples. Most voice-overs also fall into this category.
Affect: These are non-diegetic setting sounds. The most typical example of this category is the musical soundtrack. Music is usually not attributed to a sound source on screen and cannot be consciously triggered by the player. However, affect audio can (and in our opinion should) still react to what the player does in the game world. If, for example, the player approaches a closed door with enemies waiting behind it, the music should intensify and become more frightening with each step. Another example would be a battle in which the music mirrors the current state of the fight on a scale from "almost defeated" to "close to victory". These interactive affect sound changes are the core of our interactive audio endeavors at Hans HiScore.
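The door example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular engine or our own tooling works: the function names, the 20-unit hearing range and the idea of fading in three stacked music layers are all hypothetical choices made for this sketch.

```python
from typing import List


def door_tension(distance: float, max_distance: float = 20.0) -> float:
    """Map the player's distance to the door onto a tension value in [0, 1].

    0.0 means far away (calm music), 1.0 means standing at the door.
    max_distance is an assumed range beyond which the music stays calm.
    """
    clamped = min(max(distance, 0.0), max_distance)
    return 1.0 - clamped / max_distance


def layer_volumes(intensity: float, layers: int = 3) -> List[float]:
    """Fade in stacked music layers one after another as intensity rises.

    Returns one volume in [0, 1] per layer; lower layers become fully
    audible before the next layer starts to fade in.
    """
    scaled = min(max(intensity, 0.0), 1.0) * layers
    return [min(max(scaled - i, 0.0), 1.0) for i in range(layers)]


# Player walks from 20 units away up to the door in steps of 5 units.
for distance in (20.0, 15.0, 10.0, 5.0, 0.0):
    tension = door_tension(distance)
    print(distance, tension, layer_volumes(tension))
```

The same `layer_volumes` function could be fed with a battle score between "almost defeated" (0.0) and "close to victory" (1.0) instead of a distance; the point is only that a single game-state value drives the affect audio.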
Interface: These are non-diegetic activity sounds. A typical example is the click sound of a button in the HUD or the sound played when picking up a power-up. Sometimes these sounds mimic natural sounds, but most often they are artificially created bleeps or sounds completely removed from their original context (e.g. a skidding-tires sound when clicking a button). The purpose of interface audio is to give audible feedback to the player. Interface sounds should match the mood and setting of the whole game; using standard button sounds often destroys the mood created by all the other elements of the game.
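The two dimensions and the four resulting categories can be written down as a small lookup. The following Python sketch is one possible way to tag assets during production; the enum, the function name and the example asset names are assumptions made for illustration and are not part of the IEZA framework itself.

```python
from enum import Enum


class Category(Enum):
    INTERFACE = "Interface"  # non-diegetic, activity
    EFFECT = "Effect"        # diegetic, activity
    ZONE = "Zone"            # diegetic, setting
    AFFECT = "Affect"        # non-diegetic, setting


def ieza_category(diegetic: bool, activity: bool) -> Category:
    """Classify a sound by the two dimensions described above.

    diegetic: the sound originates from within the game world.
    activity: the sound is directly triggered by the player
              (False means it belongs to the setting).
    """
    if diegetic:
        return Category.EFFECT if activity else Category.ZONE
    return Category.INTERFACE if activity else Category.AFFECT


# Hypothetical assets: name -> (diegetic, activity)
assets = {
    "gunshot": (True, True),
    "stadium_crowd": (True, False),
    "hud_button_click": (False, True),
    "orchestral_score": (False, False),
}

for name, (diegetic, activity) in assets.items():
    print(name, "->", ieza_category(diegetic, activity).value)
```

Tagging every asset this way makes it easy to check whether a sound design is balanced, for instance by counting how many assets fall into each of the four categories.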
The IEZA Framework is another approach to a taxonomy for game audio. The categorization is already used by several game audio composers and has proven its benefits. It helps to better understand the function of different parts of music and audio effects when producing audio for a game. We at Hans HiScore use this framework as a tool and constantly enhance it with our own ideas from theory and practice.
Photo under CC license by Chang'r