Sound is an important component of games. Sounds can be a cue to the player that something interesting or important has happened. They can direct attention, set mood, and/or provide rewards.
(Examples of sounds in games.)
First lecture: sounds; second lecture: backgrounds/music. Distinction: short vs long.
Sound is pressure waves in the air. It comes from any vibrating object, travels (at ~340 m/s in air), and you can feel it with your ears [and the rest of your body, depending on strength and frequency].
How to look at it: waveform (good for loudness) & spectrum (good for everything else, also loudness).
Simplest sound: sine wave (power at just one frequency). Simple & boring.
Perception: people can hear ~20Hz to ~20kHz. Two important perception things: 2*freq sounds like the "same" pitch, just an octave higher (we'll come back to this); 0.5*amplitude != 0.5*loudness.
Thing to think about: we have two ears but we can detect whether sounds are left or right and high or low and in front or in back.
Thing to think about: we have two ears but "surround sound" is still a thing.
Recording.
Easy way to get sound. Record pressure at evenly spaced time points. (How fast to record? At least twice the highest frequency you want to keep, hence 44.1kHz or 48kHz.)
Synthesis.
Make sound with electronics or code; basic ideas in synthesis: oscillator, envelope, trigger, filter, LFO, FM/PM.
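A minimal sketch of two of those pieces, an oscillator shaped by an attack/decay envelope, assuming float samples at 48kHz (the name `beep` and the envelope times are made up for illustration):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Fill a buffer with a sine oscillator shaped by a simple attack/decay
// envelope: a toy version of the oscillator + envelope + trigger idea.
std::vector<float> beep(float freq, float seconds, float sample_rate = 48000.0f) {
    constexpr float PI = 3.14159265f;
    const size_t count = size_t(seconds * sample_rate);
    std::vector<float> out(count);
    const float attack = 0.01f * sample_rate; // 10ms fade-in to avoid a click
    for (size_t i = 0; i < count; ++i) {
        float t = float(i) / sample_rate;
        float osc = std::sin(2.0f * PI * freq * t);          // oscillator
        float env = std::min(1.0f, float(i) / attack)        // attack ramp
                  * (1.0f - float(i) / float(count));        // linear decay to silence
        out[i] = osc * env;
    }
    return out;
}
```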
Environmental modelling: (Reverb; echo; impulse response; panning; falloff.)
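To get a feel for how cheap some of these can be, here's a sketch of a single echo: mix a delayed, attenuated copy of the signal back into itself (the delay length and gain would come from whatever space you're modelling):

```cpp
#include <vector>

// Mix a delayed, attenuated copy of the signal back in. Because the loop reads
// the already-modified buffer, earlier echoes feed later ones, so the echo
// repeats and decays over time (a simple feedback delay).
void add_echo(std::vector<float> &samples, size_t delay_samples, float gain) {
    for (size_t i = delay_samples; i < samples.size(); ++i) {
        samples[i] += gain * samples[i - delay_samples];
    }
}
```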
Let's design a basic model [asset pipeline diagram].
A straightforward game sound setup consists of background sounds (long form, perhaps music) and sound effects (short sounds, clicks and beeps). Generally background sounds are slowly evolving and you only have one or two playing, and sfx are short and many play (potentially at once). Also some SFX loop.
How might you manage SFX and music in-game? A very simple model: SFX are stored uncompressed and can be triggered (potentially with pitch modification), returning a handle to the playing sample. Music is stored compressed and streamed as needed.
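In code, that model might look something like this sketch; the `Sound`/`PlayingSound` names and fields are hypothetical, not taken from any particular engine:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Uncompressed SFX: raw samples, ready to mix immediately.
struct Sound {
    std::vector<float> samples; // mono, at the engine's sample rate
};

// Handle to a currently-playing instance, so gameplay code can stop or retune it.
struct PlayingSound {
    std::shared_ptr<Sound> sound;
    size_t position = 0;        // next sample to mix
    float pitch = 1.0f;         // playback-rate multiplier
    float volume = 1.0f;
    bool loop = false;
};

// Trigger an SFX; the returned handle can be kept (to adjust/stop it) or ignored.
std::shared_ptr<PlayingSound> play(std::shared_ptr<Sound> sound,
                                   float pitch = 1.0f, float volume = 1.0f);

// Music is different: decoded from a compressed stream a chunk at a time.
void stream_music_chunk(/* decoder state, output buffer */);
```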
Audio output is really straightforward: an output device plays back a waveform of some bit depth at some sampling rate. So your game needs to supply properly formatted waveform data to the underlying audio system, at whatever rate the device consumes it.
SDL provides two ways to do this: in one, you provide a callback which is invoked whenever new audio is needed, while in the other, your code is responsible for queuing audio as it notices that the queue is draining.
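A sketch of the callback route using SDL2's SDL_OpenAudioDevice; the queueing route instead leaves `callback` as NULL and pushes buffers with SDL_QueueAudio whenever the queued amount runs low:

```cpp
#include <SDL.h>

// Called on SDL's audio thread whenever the device needs more samples.
static void audio_callback(void *userdata, Uint8 *stream, int len) {
    float *out = reinterpret_cast<float *>(stream);
    int count = len / int(sizeof(float));
    for (int i = 0; i < count; ++i) out[i] = 0.0f; // silence; mix your sounds here
}

int main() {
    SDL_Init(SDL_INIT_AUDIO);

    SDL_AudioSpec want, have;
    SDL_zero(want);
    want.freq = 48000;
    want.format = AUDIO_F32SYS;     // 32-bit float samples
    want.channels = 2;              // stereo
    want.samples = 1024;            // buffer size: latency vs. CPU tradeoff
    want.callback = audio_callback; // leave NULL to use SDL_QueueAudio instead

    SDL_AudioDeviceID dev = SDL_OpenAudioDevice(nullptr, 0, &want, &have, 0);
    SDL_PauseAudioDevice(dev, 0);   // start playback
    SDL_Delay(2000);                // let it run for a bit
    SDL_CloseAudioDevice(dev);
    SDL_Quit();
    return 0;
}
```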
NOTE: buffer size trades latency against CPU time spent on audio; smaller buffers respond faster but require more frequent mixing.
NOTE: the callback runs on a separate audio thread, so be careful about sharing state with the rest of the game.
So, we've got the abstraction on one hand and the underlying need for output on the other hand. What do?
Pretty simple! For every playing effect, sum its samples into the output. That's it. (Quick note on panning: you can't just linearly fade between left and right without a perceptual "dip" in the center. Nonlinear effects!)
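A sketch of that mixing loop, assuming the hypothetical PlayingSound handles from the earlier sketch (pitch modification and looping omitted):

```cpp
#include <memory>
#include <vector>

// Sum every playing effect into the output buffer.
// Uses the hypothetical PlayingSound struct sketched above.
void mix(std::vector<std::shared_ptr<PlayingSound>> &playing, float *out, int count) {
    for (int i = 0; i < count; ++i) out[i] = 0.0f;
    for (auto &p : playing) {
        for (int i = 0; i < count && p->position < p->sound->samples.size(); ++i) {
            out[i] += p->volume * p->sound->samples[p->position++];
        }
    }
    // (Erase entries whose position reached the end; clamp or soft-limit 'out' if it gets loud.)
}
```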
Audio perception is weirdly non-linear. That is to say, a wave that is half the amplitude will *not* sound like it is half as loud.
This can lead to some weird effects, like having a "dip" in the center of the sound field when doing what seems like a straightforward stereo pan (look up "equal power panning" to see why you should weight by cos/sin).
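A sketch of equal-power panning: weight the two channels by cos/sin of the pan position so the summed power stays constant across the sweep:

```cpp
#include <cmath>

// pan in [0,1]: 0 = hard left, 1 = hard right, 0.5 = center.
// Since cos^2 + sin^2 = 1, total power stays constant and the center doesn't dip.
void equal_power_pan(float pan, float &left_gain, float &right_gain) {
    constexpr float PI = 3.14159265f;
    float angle = pan * PI / 2.0f;
    left_gain  = std::cos(angle);
    right_gain = std::sin(angle);
}
```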
Audio perception is frequency-domain, but our perceptual apparatus tends to fold 2x frequency multiples (octaves) together, hearing them as the "same" note. Weird stuff.
Trying to deal with real environments requires thinking about the shape of the external portion of the ears, how relative velocity shifts sound (Doppler), and how the world reverberates. Also: falloff with distance. If you want to go this way, consider OpenAL (like OpenGL, but for 3D audio).
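A sketch of distance falloff; strict inverse-square blows up close to the source, so games often clamp to a reference distance and use something like 1/distance for gain (the constants here are made up):

```cpp
#include <algorithm>

// Attenuate by distance: full volume inside 'reference', then roughly 1/d beyond it.
// (Strict inverse-square on power would be (reference / d)^2 on gain-squared;
// many engines just use 1/d for amplitude.)
float distance_gain(float distance, float reference = 1.0f) {
    float d = std::max(distance, reference);
    return reference / d;
}
```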
Stereo isn't everything. Real environments echo in really interesting ways.
From cheap-to-less-cheap:
Workflow 0: your phone. Just go record stuff, adjust in Audacity. Call it good.
Workflow 1: loops. Just pick some layers that seem good, audio done.
Workflow 2: chords. WELCOME TO MUSIC THEORY WORLD.
Sound will probably not break a game, but it can certainly make the game.
It's not clear if music theory reflects any essential perceptual truth or simply a set of self-reinforcing cultural preferences. Regardless, here's how it works.
We break every frequency doubling (octave) into 12 steps (pitches), each step the same frequency ratio of 2^(1/12) (equal temperament).
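Concretely, each step multiplies frequency by 2^(1/12), so converting semitones-above-A440 to frequency is a one-liner (a quick sketch):

```cpp
#include <cmath>

// Frequency of the pitch 'semitones' above (or below, if negative) A = 440 Hz.
float pitch_to_freq(int semitones) {
    return 440.0f * std::pow(2.0f, semitones / 12.0f);
}
// pitch_to_freq(12) == 880 Hz (one octave up); pitch_to_freq(7) ~= 659.3 Hz (a fifth up).
```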
We then subset these pitches into sub-groups based on... well, it's not really clear, but we call them scales.
If you start at a pitch and walk 2,2,1,2,2,2,1 you get back to where you started (mod 12) and end up walking over what gets called a "major" scale. Rotate that pattern (2,1,2,2,1,2,2) and you get a (natural) "minor" scale.
Scales don't need to be 7 notes. 2,2,3,2,3 is a (major) "pentatonic" scale. 2,1,2,1,2,1,2,1 is a "whole-half" scale.
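A sketch of walking a step pattern to get the pitch classes of a scale (0 is the starting pitch; everything is mod 12):

```cpp
#include <vector>

// Walk an interval pattern from a root pitch class; e.g. {2,2,1,2,2,2,1} from 0
// gives the major scale {0,2,4,5,7,9,11} (the final step just lands back on 0 mod 12).
std::vector<int> scale(int root, const std::vector<int> &steps) {
    std::vector<int> pitches{root % 12};
    int p = root;
    for (size_t i = 0; i + 1 < steps.size(); ++i) {
        p += steps[i];
        pitches.push_back(p % 12);
    }
    return pitches;
}
```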
Scales are basically subsets of the 12 steps that we commonly put together, which means they sound good together. (Or is it the other way around?)
Regardless, further subsetting scales gives you chords. Take the first, third, and fifth notes of a major scale to get a major triad. (That's also a 4,3 in terms of half-steps.) Throw in the 7th note of the scale to get a 7th chord. 7th chords sound super sweet.
And remember that you can move notes by an octave without changing their meaning. This always blows my mind, but you can basically just copy and paste your chord all over the scale (well, in +/- 12 half-steps).
Basic idea: pick a chord, express it, move occasionally. Or sometimes move just part of the chord, resolve to a new chord. Some instruments will just be there to build and maintain the chord. Other instruments will pick around the scale for interest.
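A sketch of that: take a degree of the scale plus the degrees a third and a fifth above it (indices +0, +2, +4) to build a triad anywhere in the scale, bumping wrapped notes up an octave:

```cpp
#include <vector>

// Triad built on a scale degree: that degree plus the two degrees stacked above it,
// wrapping around the scale and adding 12 half-steps (an octave) on each wrap.
std::vector<int> triad(const std::vector<int> &scale_pitches, int degree) {
    std::vector<int> chord;
    const int offsets[] = {0, 2, 4};
    for (int offset : offsets) {
        int idx    = (degree + offset) % int(scale_pitches.size());
        int octave = (degree + offset) / int(scale_pitches.size());
        chord.push_back(scale_pitches[idx] + 12 * octave);
    }
    return chord;
}
// triad(scale(0, {2,2,1,2,2,2,1}), 0) -> {0, 4, 7}: a major triad (4,3 in half-steps).
```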
You can use DAWs for sound design also. This is actually kinda fun. Using old-style synths can give you old-style video game sounds.
Reverb and distortion to make things excellent.
Consider your listener's speakers.