Why keyframes are bad for character animation

One of the somewhat controversial opinions I’ve expressed on this blog is that keyframe animation* is bad and should be replaced with raw poses. I’ve always been a little vague about this, though, and never clearly stated exactly why. That’s because it’s been more of a feeling than anything else: a frustration with all the futzing around with keyframes we’re forced to do when animating.

I recently submitted a talk proposal for SIGGRAPH 2019, and this required me to be much more rigorous about stating what it is exactly that I think is wrong with the process, and it clarified my thinking. I now think that you can boil down the issues with both keyframe animation and hierarchical rigging to this statement:

Animation curves and a rig together form a system that generates character motion. Animators do not create motion--they edit inputs to that system in the form of keyframe values. But with conventional keyframing and rigs, you cannot, by looking at the end results, understand the system and inputs that produced them.

Everything wrong with the keyframe animation process flows from this basic fact. Crossing effects between multiple layers of control, unwanted spline behavior, the mess created by space switching and FK/IK switches/blends, even gimbal lock issues: all of these reduce to the fact that there is no one-to-one relationship between the inputs and the result, and the animator must therefore mentally model the keyframe/rig system to understand what inputs will produce the desired results. But since multiple possible sets of inputs (i.e. different combinations of key placement and rig state) can produce visually indistinguishable results, that mental model degrades extraordinarily quickly, and sussing out the real relationship between inputs and results requires constant attention and interpretation. This is true even for a “blocking plus” process, as the moment you spline generally reveals, even on a very tightly blocked shot.
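Two toy illustrations of that many-to-one relationship, written as a Python sketch. Everything in it is hypothetical: the 2D "rig" simply adds a child control's rotation to its parent's, and the "auto" tangents are uniform Catmull-Rom, which real packages may compute differently or clamp to avoid exactly this bump. The first part shows two different sets of inputs yielding the same visible pose; the second shows a blocked hold that drifts past its key value once it is splined.

```python
# Toy examples only; not modeled on any particular animation package.

# 1. Layered controls: in this hypothetical 2D rig, a child control's
#    world-space angle is just its parent's rotation plus its own.
def world_angle(parent_deg: float, child_deg: float) -> float:
    """World-space rotation (degrees) of a child control under a rotating parent."""
    return parent_deg + child_deg

inputs_a = (30.0, 0.0)   # all of the rotation keyed on the parent
inputs_b = (10.0, 20.0)  # rotation split between parent and child
# Different inputs, identical visible result: looking at the pose
# cannot tell you which of these you are actually editing.
assert world_angle(*inputs_a) == world_angle(*inputs_b)


# 2. Splining a blocked hold, using uniform Catmull-Rom tangents as a
#    stand-in for a generic "auto" tangent (an assumption, see above).
def catmull_rom(p0: float, p1: float, p2: float, p3: float, t: float) -> float:
    """Value between keys p1 and p2 at t in [0, 1] (cubic Hermite form)."""
    m1 = (p2 - p0) / 2.0  # tangent at key p1
    m2 = (p3 - p1) / 2.0  # tangent at key p2
    t2, t3 = t * t, t * t * t
    h00 = 2 * t3 - 3 * t2 + 1  # Hermite basis functions
    h10 = t3 - 2 * t2 + t
    h01 = -2 * t3 + 3 * t2
    h11 = t3 - t2
    return h00 * p1 + h10 * m1 + h01 * p2 + h11 * m2

# A pose held across two keys (value 10) between two rest keys (value 0).
keys = [0.0, 10.0, 10.0, 0.0]
mid_hold = catmull_rom(*keys, t=0.5)
print(mid_hold)  # prints 11.25: the "hold" overshoots the pose it was meant to hold
```

Nothing in the keyed poses themselves hints at the overshoot; it exists only between the keys.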

Put this way, the entire history of CG animation technique sounds completely insane, doesn’t it? Why the hell is this how we decided to animate characters? Why would anyone think this was a good idea?

There are multiple factors involved, but I think a lot of it comes down to what I’ve begun to think of as the “nondestructiveness problem” in computer art. “Nondestructive” in this case might also be described as “parameterized” or “procedural”--basically any case in which the end result is continually regenerated from a set of inputs that can be altered at any time. Nondestructive techniques are one of the major advantages to doing art with a computer...except when they aren’t. A nondestructive technique that in one instance allows you to do the work of ten artists working with more traditional techniques will in another instance absolutely cripple your ability to get anything done at all.

As an example, let's say you’re designing a logo, something along these lines:

This is a flat shape with very well defined, simple curves. I did this in Illustrator by placing down bezier handles, because that’s the obvious way to approach something like this. If I’d tried to paint the shape it would have taken forever to tune the shape to the right curvature, and I would probably have ended up with something that looked a bit wobbly no matter how long I worked on it.

Clear win for the nondestructive technique, right? Tuning a simple shape through bezier handles is much faster than painting it. Now imagine a completely naive observer, a hypothetical, possibly alien intelligence that has never encountered this thing you Earth people call “art” before. Such a being could be forgiven for concluding that a nondestructive approach is always correct. Something that can be quickly adjusted just by tweaking a few bezier handles has got to be better than thousands of messy pixels.

Listen, Zog...can I call you Zog?...let's put that idea to the test. You’re an alien superintelligence, so you should be able to use Adobe Illustrator, which was clearly designed for your kind and not for actual human beings. Only I don’t want you to make a logo. I want you to make this:

This background, used in the Monkey test that Chris Perry and I produced for Vintata, was painted by Jeet Dzung and Ta Lan Hanh.

Kind of a different situation, isn’t it? When drawing clean shapes, vectors are the obvious choice, but painting by placing bezier handles for every stroke is immensely inefficient,** even for a being of Zog’s incalculable intellect.

Now this doesn’t mean that nondestructive techniques have no utility for digital painters at all. Layers, for instance, are clearly very useful. And yet, the number of layers you can keep around and still have something useful to interact with is actually pretty limited. Dividing a painting into foreground/midground/background or into layers for tone and color makes sense. Putting every alteration you make to the painting on a new layer, on the other hand, leaves you with an incomprehensible stack that you’re going to end up having to either collapse or basically leave in place and never modify (in which case you might as well never have made it at all). It’s the same problem as with keyframes: the mapping between inputs (the pixels in each layer) and output (the final image) is too complex to hold in your head, and eventually it becomes more work than treating everything as a flat image. Compare this to CG modeling, where surfaces generated from control points (such as NURBS and subdivision surfaces) make modeling and adjusting simple shapes very easy, but are vastly inferior to sculpting tools that use micropolygons or voxels when it comes to a complex shape like a character.

I think of nondestructive vs destructive means of creation as being on a graph like this:

When complexity is low, nondestructive techniques are clearly superior, sometimes by a lot. But the difficulty of using nondestructive techniques increases exponentially as complexity increases, whereas the difficulty of destructive techniques increases linearly. There is a point at which the two lines cross, and a primarily nondestructive workflow (as opposed to a mostly destructive workflow with nondestructive assistance) flips from great to terrible.
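The graph can be mocked up as a toy cost model in Python. The functional forms follow the paragraph above (exponential for nondestructive, linear for destructive), but every constant is a made-up assumption chosen only to put the crossover somewhere visible; none of it is measured data.

```python
import math

# Hypothetical constants, chosen only to make the shape of the graph visible.
ND_BASE, ND_GROWTH = 1.0, 0.5  # nondestructive: cheap to start, grows exponentially
D_BASE, D_SLOPE = 5.0, 1.0     # destructive: higher fixed cost, grows linearly


def nondestructive_cost(complexity: float) -> float:
    return ND_BASE * math.exp(ND_GROWTH * complexity)


def destructive_cost(complexity: float) -> float:
    return D_BASE + D_SLOPE * complexity


def crossover(lo: float = 0.0, hi: float = 20.0, iters: int = 60) -> float:
    """Bisect for the complexity at which the two cost curves meet."""

    def diff(c: float) -> float:
        return nondestructive_cost(c) - destructive_cost(c)

    # Nondestructive must be cheaper at lo and more expensive at hi.
    assert diff(lo) < 0 < diff(hi), "curves must cross inside [lo, hi]"
    for _ in range(iters):
        mid = (lo + hi) / 2
        if diff(mid) < 0:
            lo = mid  # still left of the crossing
        else:
            hi = mid  # at or right of the crossing
    return (lo + hi) / 2


print(f"crossover near complexity {crossover():.2f}")
```

Below the returned value the nondestructive curve is cheaper; above it the destructive one is. Changing the hypothetical constants moves the crossing point but, as long as the nondestructive curve starts lower, does not remove it.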

And that’s the crux of the issue. A lot of the things you might want to animate with a computer are on the left-hand side of the graph. If you want to animate a bouncing ball, then a graph editor is the right thing. It’s the right thing for motion graphics, for camera movement, and for mechanical/vehicular motion. But the right-hand side of the graph? That includes all character animation, because there is really no such thing as a character performance that isn’t complex.

Now, I do want to take a moment to discuss my use of the term “complexity” here. I’m using it because I don’t have a better term, but it could be misleading, because what I mean isn’t quite the same thing as visual complexity. It’s quite easy to make something that’s very visually complex through nondestructive means: think of any fractal pattern. The best I can do to nail down this definition of “complexity” is that it has less to do with the number of elements present and more to do with how distinct those elements are. A painting or a character performance is extremely specific, and cannot be easily broken down into constituent elements. It doesn’t “parameterize” very well. Art that has, one might say, specific complexity is on the right side of the graph, and should be authored in as direct a manner as possible.

There is a pretty important exception to this rule: cases where the process that generates the end result has to be kept, because it is going to be applied to multiple sets of inputs. For instance, a compositing graph might well be complex and difficult to reason about. That’s just too bad, because “collapsing” the graph would make the results useless.

I suggest that this is an indication that compositing moving images is actually a completely different class of problem: just like rigging, compositing is in fact programming. A crucial difference here is that, unlike an animator, who authors inputs to the keyframe/rigging system, a compositor creates the system itself, i.e. the graph that will take in rendered or filmed inputs (plus inputs the compositing artist may have created, such as keyframes, ramps, and masks) and output final frames. The question is whether what you are creating is fundamentally data that will be fed into a process (keyframes, poses, pixels, bezier handle locations, vertices, voxels, etc.) or both the data and the process that will operate on it (probably in the form of a node graph).
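To make the data-versus-process distinction concrete, here is a minimal Python sketch of a compositing graph treated as a program. It assumes single premultiplied RGBA pixels in place of whole images, uses the standard Porter-Duff "over" operator, and invents a two-node graph (`grade`, then `over`) with a hypothetical gain value; it is not modeled on any particular compositing package.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Pixel:
    """A single premultiplied RGBA pixel (toy stand-in for a whole image)."""
    r: float
    g: float
    b: float
    a: float


def over(fg: Pixel, bg: Pixel) -> Pixel:
    """Porter-Duff 'over' for premultiplied color: fg + bg * (1 - fg.a)."""
    k = 1.0 - fg.a
    return Pixel(fg.r + bg.r * k, fg.g + bg.g * k, fg.b + bg.b * k, fg.a + bg.a * k)


def grade(p: Pixel, gain: float) -> Pixel:
    """Scale color by a gain: an artist-authored parameter, like a keyframe value."""
    return Pixel(p.r * gain, p.g * gain, p.b * gain, p.a)


def comp_graph(inputs: Dict[str, Pixel]) -> Pixel:
    """The compositor's actual deliverable: a function from inputs to output.

    Node names ("render", "plate") and the gain are hypothetical.
    """
    graded_fg = grade(inputs["render"], gain=0.8)
    return over(graded_fg, inputs["plate"])


# The same graph runs on every frame's inputs; collapsing it into a single
# output would throw away exactly the part that gets reused.
frames = [
    {"render": Pixel(0.5, 0.25, 0.0, 0.5), "plate": Pixel(0.0, 0.0, 1.0, 1.0)},
    {"render": Pixel(0.0, 0.0, 0.0, 0.0), "plate": Pixel(0.2, 0.2, 0.2, 1.0)},
]
outputs = [comp_graph(f) for f in frames]
```

The gain is data, like a keyframe, but what the artist really delivers is `comp_graph` itself, which is run again for every frame's render and plate.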

This idea isn’t at all new--Shake files used to be called “scripts” after all--but it’s not how people usually think about what a compositing artist is creating. And it’s not necessarily just true of node-graph-based systems. Are you using a lot of nested comps and complex layer interactions in After Effects? Congratulations, you’re a programmer. You’re using an extremely obfuscated system to do your programming, but that doesn’t make you not a programmer. Also, being a programmer doesn’t make you not an artist. There is nothing whatsoever mutually exclusive about those roles.

I’m really, really old***, so I remember the early days of CG. I was excited when I first saw Chromosaurus and Stanley & Stella in Breaking the Ice. I remember how much promise there was supposed to be in computer art, how the ability to tweak anything via a simple parameter was supposed to take the drudgery out of the artistic creation process. And sometimes, it did. Sometimes you got the kind of “big win” represented by nonlinear editing, which we now just call “editing” because doing things the old-fashioned way is so impractical by comparison that it barely exists. But just as often, the promise of “parameterized” art failed.

In the ensuing decades most disciplines gradually settled into an understanding of what different techniques were and weren’t good for. You generate a cityscape procedurally, but you sculpt a character. This understanding has never truly emerged for animation, and we’re still stuck with a system that is fundamentally built on a “parameterized” approach in every case. I’d go so far as to say that the fundamental assumption behind Maya is that this is the only approach, even though the actual animators and TDs using the software have been trying to turn it in other directions for the sake of their sanity for decades now.

To do something about this, it would help to gain some sort of understanding of why nondestructive techniques are good for some things and terrible for others. I don’t think this conception of the problem fully captures all its aspects, but I think it’s a start.

You see, Zog? We’re a young species, but we show great promise. You were not so different, once.

*To be completely clear, by “keyframe animation” I mean animation based on curves with keyframes as control points. This is not the same, and in fact is in some ways opposed to, the concept of “key poses” as used by traditional animators.

**There are, of course, vector-based painting systems, but I’d argue that the user interacts with them more like raster painting systems than like Illustrator. The problem of nondestructive workflows is a user interaction problem; whatever the program uses under the hood to represent what the user is creating may be a separate question.

***I’m 37, but I got out ahead of the pack and started being super old at a young age. As evidence, I not only remember when “desktop video” was a thing, I remember when “desktop publishing” was a thing. For you youngsters, that’s what we now refer to as “publishing.”