Computer Science Hard Things

There is a popular saying about Computer Science:

There are only two hard things in Computer Science: cache invalidation and naming things.

— Phil Karlton

A funny variation adds a third item: “There are only two hard problems in Computer Science: cache invalidation, naming things, and off-by-one errors.”

I propose there are actually three hard things:

  1. Naming things
  2. Cache invalidation
  3. Dependency resolution

My criteria for being a “hard thing”:

  1. Must be applicable to multiple scopes
  2. Must not be fully solved

Examined this way, it is interesting to see why these three deserve to be on the list:

  1. Naming things
    1. Applicable to every area in computer science – variable names, class names, machine names, network names, security policy names, URIs, etc. It even applies to this list: think of the difference between naming the first item “cache invalidation” versus just “caching”.
    2. Not at all solved. You can barely say we have good heuristics for this.
  2. Cache invalidation
    1. Applicable to multiple layers of the memory and storage hierarchy: CPU registers, L1, L2, and L3 caches, disk caches, network resource caches, DNS caches, etc.
    2. Solved in the sense that we know it is a balancing act between efficiency and correctness. Not solved for the general case, however, if there even is a “general case” at all.
  3. Dependency resolution
    1. Applicable to multiple domains: run-time (think Dependency Injection), build time (think Apache Ivy and Maven), hardware-software, distributed systems, and probably more.
    2. Solved in the sense that we know topological sorting helps with ordering transitive dependencies.
      For run-time, the entire sub-field of dependency injection has multiple solutions: Spring Framework, Guice, PicoContainer. Does anybody remember DLL Hell? That shows that “API definition” (which is a candidate for its own “Hard Thing” entry) is a sub-problem of dependency resolution.
      For build time, the better build systems make it easy to specify your dependencies and add global exclusions to get you out of transitive dependency issues.
      For hardware-software, think about the hardware requirements for running a particular application or installing a particular driver.
      For distributed systems, think about (for example) which version of which database your application requires. Provisioning has been partially solved by Chef, Puppet, and others. Detection is still very much roll-your-own.
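
To make the topological sorting point concrete, here is a minimal sketch in Python using the standard library's graphlib module (Python 3.9+). The package names and the resolve_order helper are hypothetical, made up for illustration. It only covers the "solved" part: ordering a known dependency graph and detecting cycles. Version conflicts, exclusions, and detection are exactly the parts it leaves out.

```python
from graphlib import CycleError, TopologicalSorter


def resolve_order(dependencies):
    """Return an order in which every item comes after its dependencies.

    `dependencies` maps each item to the set of items it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    try:
        # static_order() yields dependencies before their dependents.
        return list(sorter.static_order())
    except CycleError as err:
        # The second element of args holds the nodes forming the cycle.
        raise ValueError(f"circular dependency: {err.args[1]}") from err


# Hypothetical package graph: each key depends on the names in its set.
deps = {
    "app": {"web-framework", "db-driver"},
    "web-framework": {"http-lib"},
    "db-driver": {"http-lib"},
    "http-lib": set(),
}

order = resolve_order(deps)
print(order)
```

In this example "http-lib" always comes first and "app" always comes last; the relative order of "web-framework" and "db-driver" is not something the algorithm promises. That gap between "a valid order exists" and "the order you actually wanted" is a small taste of why the problem is not fully solved.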

So, did I create any converts? Do you agree there are 3 Hard Things in Computer Science?
