column: When your best engineers log off for good, don’t be surprised when the cloud forgets how DNS works

  • D61 [any]@hexbear.net
    17 days ago

    Pretty common upper-management mindset: “Once something is done, it’s done forever and needs no specialized experience to maintain. So anybody can do it; even young, inexperienced new hires will be able to figure out how to take care of everything without any senior staff around to help them.”

    • trinicorn [comrade/them]@hexbear.net
      17 days ago

      In some cases it’s not even about young or inexperienced new hires. In a project of any complexity, understanding as many as possible of the moving pieces that make it tick becomes more important than general experience or skill. So they hire a few really experienced people to throw at the problem and to manage the young and inexperienced, and those people struggle and burn out too.

    • lamassu@lemmygrad.ml
      17 days ago

      The entire hypothesis is nothing but speculation. It’s based on the assumption that troubleshooting steps live in the mind of some main-character dev instead of where they probably live: in a runbook for the service. I don’t doubt that some of the problems at Amazon that the author calls out are real, but sometimes outages aren’t caused by the first source you investigate, and sometimes fallback solutions fail.

  • thethirdgracchi [he/him, they/them]@hexbear.net
    17 days ago

    Nah, AWS has been having massive DNS-related outages in us-east-1 for the past decade. This is nothing new. Recovery time for this one was particularly brutal because it was Diwali yesterday and all their Indian support engineers were out enjoying the holiday.