1. Amanda: understanding the scheduler

    One of the most worrisome things for me about Amanda is that amdump will always use a new tape. An early test of the dangers inherent in this (as amdump also loops around your set of vtapes) was this:

    for i in `seq 12`; do  # I have only 10 tapes
        sudo -u backup amdump DailySet1
    done
    

    The result was as I feared: my full backup was overwritten by an incremental one. That sucks. However, as I understood Amanda better I realized that the above is not a valid test. Amanda’s scheduler takes into account a complex relationship between dates, scheduling, the dumpcycle, runspercycle, and tapes that is still a bit weird for me but not indecipherable.

    In your amanda.conf, two variables control important aspects of the scheduler. dumpcycle is the max number of days between full backups and runspercycle is the number of times that amdump will be run during that cycle. So if you want a full backup every week but nightly incrementals, you’d set dumpcycle = 7 and runspercycle = 7 and add a midnight crontab entry to run amdump. If you wanted incremental backups every 12 hours and a full backup every week you’d set dumpcycle = 7 and runspercycle = 14 and add a crontab entry to run amdump every 12 hours. By default the harddisk template we used will only create 10 vtapes. Since Amanda will always want to use a different tape for each backup, running amdump more often than you promised in runspercycle and a greater number of times than you have vtapes will cause overwrites.

    For a system that wants you to trust its smart scheduler this seems like pretty dumb behavior.
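
    To make the arithmetic concrete, here’s a toy model of the two things described above: the run spacing implied by dumpcycle and runspercycle, and what happens when runs outnumber tapes. It assumes a plain round-robin over the vtapes and nothing else; it is not Amanda’s real scheduler, and the function names are my own.

```python
def hours_between_runs(dumpcycle_days, runspercycle):
    """Spacing of amdump runs implied by the two settings."""
    return dumpcycle_days * 24 / runspercycle


def simulate_runs(num_tapes, num_runs):
    """Toy model: every run takes the next tape, wrapping around.

    Returns a list of (run, tape, overwritten_run) tuples, where
    overwritten_run is None while there are still unused tapes.
    This only models the round-robin part, not Amanda's scheduler.
    """
    last_run_on_tape = {}
    results = []
    for run in range(1, num_runs + 1):
        tape = (run - 1) % num_tapes + 1
        results.append((run, tape, last_run_on_tape.get(tape)))
        last_run_on_tape[tape] = run
    return results


# dumpcycle = 7, runspercycle = 14 means a run every 12 hours.
print(hours_between_runs(7, 14))

# Twelve runs against ten tapes, as in the loop test above.
for run, tape, lost in simulate_runs(num_tapes=10, num_runs=12):
    if lost is not None:
        print(f"run {run} reuses tape {tape}, overwriting run {lost}")
```

    With ten tapes, runs 11 and 12 land on tapes 1 and 2 again, which is exactly where my first (full) backup lived.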

     
  2. Recovering Amanda Backups

    Having backups running isn’t very useful if you can’t actually recover your data. Recovery is done through amrecover and requires that amanda-client.conf name a tape_server (which holds the data) and an index_server (which holds the metadata). In simple cases both will be the same box that you configured as your Amanda server.

    There are six critical commands to use in amrecover:

    • sethost - which box do you want to restore? You don’t have to restore the same box you’re running amrecover on.

    • setdisk - which DLE you’re trying to recover.

    • setdate - from which date do you want to do the recovery? This command is optional; by default the last backup is selected.

    • cd - after you set the DLE you’re recovering you can traverse it using cd to exactly what you want to recover.

    • add <path> - add to the list of files/folders to extract. Adding directories is recursive. You can add . to add the current directory and its children (quick way to restore the disk).

    • extract - Do the actual recovery. There will be prompts to check that the right “tapes” are in and accessible.

    When you extract, the only files extracted are the ones backed up at the nearest backup date before the one you set. This is great for restoring a couple of files but less good if a whole system needs restoring. For that you first need to restore to the last full backup (the best way to find it is to use amrecover’s history command and du of the tape drive directory) and then restore to the latest date.
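
    The two-step restore above boils down to picking two entries out of the backup history. Here’s a minimal sketch of that selection, assuming the history is just a list of dates and dump levels with level 0 meaning a full backup. The history data and the restore_plan name are made up for illustration; amrecover’s history command is where the real list would come from.

```python
from datetime import date


def restore_plan(history, target):
    """Pick the dumps to restore for `target`, per the two-step approach.

    `history` is a list of (dump_date, level) tuples, level 0 being a
    full backup. Returns (last_full, latest): the most recent full
    backup on or before `target`, and the most recent dump of any
    level on or before it.
    """
    candidates = sorted(d for d in history if d[0] <= target)
    if not candidates:
        raise ValueError("no backup on or before the target date")
    fulls = [d for d in candidates if d[1] == 0]
    if not fulls:
        raise ValueError("no full backup on or before the target date")
    return fulls[-1], candidates[-1]


# Hypothetical history, made up for illustration.
history = [
    (date(2010, 6, 1), 0),
    (date(2010, 6, 2), 1),
    (date(2010, 6, 3), 1),
    (date(2010, 6, 8), 0),
    (date(2010, 6, 9), 1),
]

full, latest = restore_plan(history, date(2010, 6, 4))
print("restore first:", full)
print("then restore: ", latest)
```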

     
  3. Amanda: checking config and running a backup

    With Amanda now configured we can check the configuration and then run a backup.

    amcheck configname will run “a number of tests”. Most importantly, it will check that tapes load right (using the same process as amdump). It will also check the client connections and permissions. If everything’s ok we can do our backup!

    Backups are done (usually cron’d) using amdump configname. Amanda’s big thing is that it’s smart enough to do the “right kind of backup”. On the first run it will do a full backup and then it will do incremental dumps “right”. While you can force full backups you really ought to trust the scheduler. I’ll cover how it works soon. amdump, like most of Amanda, reports using mail and using log files in your config dir:

    • Mail reports are the most useful for everyday use (hence why they’re mailed). They include status, size, speed, and success in handy tables. If there’s a problem you’ll have to look at one of the logs below.

    • configdir/log.YYYYMMDDHHMMSS.N is a trace log detailing what got put where and when. They’re not just logs for the user/sysadmin but represent the Amanda catalog. Deleting them can break recovery and reporting. More info on the wiki.

    • configdir/amdump.1 is always the latest dump log. Whenever a new dump is run the previous amdump.1 becomes amdump.2, amdump.2 becomes amdump.3, etc. This is the verbose log that details status even during runtime. More info on the wiki.
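
    The amdump.N renumbering is easier to see as a mapping. This is only a model of the naming scheme described above, operating on a list of file names rather than touching a real config dir; the rotate function is my own invention, not something Amanda ships.

```python
import re


def rotate(names):
    """Return {old_name: new_name} for one rotation of amdump.N logs.

    Every amdump.N becomes amdump.N+1, which frees amdump.1 for the
    next run. Names that do not match amdump.N are ignored.
    """
    numbered = []
    for name in names:
        match = re.fullmatch(r"amdump\.(\d+)", name)
        if match:
            numbered.append(int(match.group(1)))
    # Highest number first, so applying the renames in this order
    # never clobbers a file that has not been moved yet.
    return {f"amdump.{n}": f"amdump.{n + 1}" for n in sorted(numbered, reverse=True)}


print(rotate(["amdump.1", "amdump.2", "amdump.3", "log.20100601000000.0"]))
```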

     
  4. ColourLovers: helping make colouring fun and easy again

    Design is hard for me. Wireframes I can do (for better or worse) but what’s really hard is anything to do with colour. For that I need all the help I can get. One of my favourite resources for colour help nowadays is ColourLovers.com.

    One of the craziest (to my un-artistic mind) and most useful (to my … you know) features is the Palettes section. They have loads of different palettes which you can use as inspiration. If you find that you like a palette but it’s just a bit off, you can select a colour in the palette and see what other palettes use it. For the colour sense deficient it’s a wonder!

    By default all works are Creative Commons licensed (yay!) but CC-BY-NC-SA. That means you can’t just grab a palette and build a theme around it (I think). On the flip side ColourLovers provides a pretty cool basic and advanced (COPASO) palette editor which can help you act on your inspiration (you maintain ownership of all IP). They also link to a bunch of cool colour and pattern tools.

     
  5. Thoughts on routing in Python Web Frameworks

    A friend and I were discussing Flask as he’s quite taken with its simplicity. I’m more on the fence, but our discussion helped me realize how much of my thinking is now Django oriented. A little concerned that I might be seeing everything as a Django nail for my Django hammer, I started thinking: what does a web framework need?

    Going back to first principles, the core job of a web framework is to easily and obviously route requests to my code. Everything else, all the “batteries included”, all the community projects, are extra. Here’s how three popular frameworks handle the routing problem:

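    • Flask uses a decorator on the view function:
    @app.route("/")
    def hello():
        return "Hello World!"
    
    • Pyramid splits it into two steps, first declaring the route:
    config.add_route('helloworld', 'helloworld')

    and then pairing the view and route:

    config.add_view(hello, route_name='helloworld')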
    

    Though a decorator syntax is also available for the add_view call.

    • Django uses a url object to represent the URL and a urlconf to group them:
    patterns('', 
             url(r'^helloworld$', hello, name="helloworld"),
            )
    

    Looking at these three approaches… I still think Django’s is the most obvious. With all the things that a framework actually has to do (help with sessions, input validation, data persistence, authorization, etc.) routing is a pretty basic part, so it definitely shouldn’t be the only thing one looks for in a framework, and choosing Django just for the routing is probably a poor idea… Still I like Django best.
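
    Stripped of any framework, all three approaches come down to a table mapping URL patterns to callables. Here’s a minimal, framework-free sketch of that idea. It is not how Flask, Pyramid, or Django implement routing; the route and dispatch names are my own and exist only for this illustration.

```python
import re

routes = []  # list of (compiled_pattern, view) pairs, checked in order


def route(pattern):
    """Decorator that registers a view for a regex pattern."""
    def register(view):
        routes.append((re.compile(pattern), view))
        return view
    return register


def dispatch(path):
    """Call the first view whose pattern matches `path`."""
    for pattern, view in routes:
        match = pattern.match(path)
        if match:
            # Named groups in the pattern become keyword arguments.
            return view(**match.groupdict())
    return "404 Not Found"


@route(r"^/helloworld$")
def hello():
    return "Hello World!"


@route(r"^/hello/(?P<name>\w+)$")
def hello_name(name):
    return f"Hello {name}!"


print(dispatch("/helloworld"))
print(dispatch("/hello/amanda"))
print(dispatch("/nope"))
```

    The decorator here is the Flask-ish half and the regex table is the Django-ish half; the frameworks differ mostly in where they make you write the table down.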

     
  6. Amanda: Disklist, DLEs, and dumptypes

    The disklist file can be a relatively simple file (it can frequently look like three columns, whitespace delimited) with a simple, though critical, goal: say what to back up, from whom, and how. It also exposes us to the important concept of dumptypes.

    Each line in the file is referred to as a Disk List Entry (DLE). The word disk is misleading as Amanda can back up directories as well as whole disks. In the simple directory case you only need to specify three things:

    host.example.org /path/to/directory dumptype
    

    Dumptype, in Amanda, refers to how a backup should be done. Dumptypes are defined in your backup configuration’s amanda.conf (which might use an include directive to include other files). They describe what archiving tool is used, whether the archive is compressed and/or encrypted and if so where (client or server) and how, and other options. Much of the time spent configuring Amanda is really spent configuring dumptypes (a topic worthy of its own post). A base set of dumptypes lives in /etc/amanda/template.d/dumptypes and your amanda.conf.
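
    To show how little there is to the simple three-column form, here’s a small parser sketch. It assumes whitespace-separated fields and ‘#’ comments, handles only the three-field case from the example above, and rejects anything else rather than guessing. The DLE class and parse_disklist function are hypothetical names, not part of Amanda.

```python
from typing import NamedTuple


class DLE(NamedTuple):
    host: str
    path: str
    dumptype: str


def parse_disklist(text):
    """Parse simple three-column disklist lines into DLE tuples.

    Blank lines and '#' comments are skipped. Anything that is not
    exactly three whitespace-separated fields raises ValueError, since
    this sketch does not handle the richer forms.
    """
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        entries.append(DLE(*fields))
    return entries


sample = """
# host              directory           dumptype
host.example.org    /path/to/directory  dumptype
"""
print(parse_disklist(sample))
```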

    With basic communication and some DLEs, you’re ready to check your config and possibly run your first backup - more on that next time!

     
  7. Amanda: Setting Client/Server Communication In A Test Environment.

    Having set up a basic config on our server, we now need to set up a config on our client. The client can be either the same box or any number of others. The Ubuntu package amanda-client works out of the box (10.04). Client configuration is done via /etc/amanda-client.conf (obviously, on the client).

    Until we get to recovery, amanda-client.conf really only needs one setting: auth. auth needs to have the same setting on the server and on the client. By default, Amanda uses bsdtcp and a white list file ~backup/.amandahosts which limits who can execute what. It’s worth noting that bsdtcp only does host authentication, not user authentication: once you white list an IP/domain any packet from that domain will be honoured. Naturally you don’t want to use this in production - but for testing using some VMs on your box it’s convenient.

    The wiki documents setting up bsdtcp pretty well so I’ll only comment on a few things. No dumptype overrides the auth setting so feel free to set it in the global dumptype at the top of amanda.conf. The wiki talks about setting up xinetd to work with Amanda but I found that the default install on Ubuntu 10.04 worked out of the box.

    Next: setting up DLEs!

     
  8. Amanda: Basic Config Setup

    Setting up an Amanda backup config requires that you have installed Amanda correctly. This may mean you need to do some extra steps after installing the .deb (if you’re using the Ubuntu 10.04 .deb). It also requires you to have a working mail command (which might mean a mail server). This is mandatory: you can’t turn mail communication off and Amanda will pitch a fit if the command fails.

    Setting up an empty config (including getting the virtual tapes, etc.) is fairly simple as Amanda comes with a bunch of “templates” which you use as the base config of your system. Templates, which live in /var/lib/amanda/template.d, include ones for Amazon S3 and hard disk. Although the template files all have a prefix of “amanda-”, leave the prefix off when running the amserverconfig command:

    #syntax: amserverconfig <config_name> -t <template_name>
    amserverconfig DailySet1 -t harddisk
    

    This will create a DailySet1 config directory in /etc/amanda and the required virtual tapes in /var/lib/amanda/vtapes/.

    Most of the changes you’ll be making will be to either DailySet1/amanda.conf (the main config file) or DailySet1/disklist. amanda.conf holds all the general config information: how frequently you want full backups, what types of backups it can do (compressed, tar’d, encrypted), how to do authentication (bsdtcp or ssh), and so on. disklist is what will specify what to back up and how (which “dumptype” to use). I’ll expand on these two files over the coming days.

     
  9. Amanda: No, it’s not an rsync wrapper

    I like simple apps. Particularly for important functions like system recovery. But eventually a job gets too complex to be handled in a simple way. Eventually, one needs a more complex, even Enterprise, tool. Enter Amanda, a system I’ve spent the past few weeks (on and off) learning and configuring.

    Amanda is very different from any system that I would pick. My idea of backups is rsync + compression. And compression is optional. Instead, Amanda is written around the concept of using tapes (not a physical requirement, it can use virtual tapes) and supports doing incremental backups across these tapes. While offering some neat features (multiple time snapshots without using n times as much disk space) it also adds complexity.

    Throughout this week I’ll likely be writing about configuring and using Amanda. But just to close off today here are some useful links:

     
  10. Tumblr and Mail from the same domain

    For a recent project I needed to set up tumblr and mail on the same domain. That is, going to http://example.com showed you a tumblr page and mailing to someone@example.com sent the mail (actually forwarded to some gmail addresses). There were a couple of hiccups, one around forwarding and one around DNS. The main issue with the latter is that, embarrassingly, I never really bothered to learn the details of how DNS works. I know about A records and CNAME records, I know how key lookups work, and maybe a couple other things. But it’s always something I have to look up whenever I have to work with it - about every couple years.

    So here’s a simple guide to having mail and tumblr on the same domain:

    1. Change the A record for the domain you want to point at Tumblr’s IP. The Tumblr settings page will tell you what it is:

      example.com A TUMBLRIP
       
    2. Create a subdomain with an A record pointing at where your mail server is located:

      mail.example.com A MAILSERVERIP

    3. Set an MX record on the main domain to point at the mail subdomain:

      example.com MX mail.example.com

    I had never worked with MX records directly before. I was surprised that MX records do NOT accept an IP but must point to a hostname that has its own address record (A or AAAA; per RFC 2181 the target should not be a CNAME).

    Also, MX records aren’t required when hosting a web and mail server on the same box as SMTP will implicitly use A, AAAA, and CNAME records for the specified domain if MX is absent.  
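
    Here’s a small sketch of the lookup logic described above: use the MX target’s address if there is an MX record, otherwise fall back to the domain’s own A record. It runs against an in-memory table rather than real DNS, ignores MX priorities, AAAA, and CNAME handling, and the IPs are placeholders from the documentation range standing in for TUMBLRIP and MAILSERVERIP. The mail_hosts name is made up for this example.

```python
def mail_hosts(domain, records):
    """Return the IPs mail for `domain` would be delivered to.

    `records` maps (name, type) to a list of values. If the domain has
    MX records, each target is looked up as an A record; otherwise
    the domain's own A record is used (the implicit-MX fallback).
    """
    targets = records.get((domain, "MX")) or [domain]
    ips = []
    for target in targets:
        ips.extend(records.get((target, "A"), []))
    return ips


# The three records from the guide, with placeholder IPs.
records = {
    ("example.com", "A"): ["192.0.2.10"],         # TUMBLRIP stand-in
    ("mail.example.com", "A"): ["192.0.2.25"],    # MAILSERVERIP stand-in
    ("example.com", "MX"): ["mail.example.com"],
}

# With the MX record, mail goes to the mail server, not to Tumblr.
print(mail_hosts("example.com", records))

# Without it, mail would fall back to the domain's own A record.
no_mx = {key: value for key, value in records.items() if key[1] != "MX"}
print(mail_hosts("example.com", no_mx))
```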

    Happy mailing and posting  - and remember to give it time to propagate!