Predictable reads with DynamoDB

The scenario

Three weeks ago I decided to replace a MySQL database with a couple of DynamoDB tables. The main reason for this was that Tido are still very much in the ‘discovery phase’ of startup life and so requirements are changing thick and fast. MongoDB might have been a more obvious choice but that opens up lots of configuration options. I wanted something much simpler.

At Tido I’m the backend server guy, frontend developer and ops guy all in one; true ‘full stack’. DynamoDB seemed like an ideal solution: zero setup overhead and very limited configuration choices. All you get are two knobs to twiddle to make things go faster: read and write capacity.

I still felt a little uncomfortable, having never used DynamoDB before; it felt like we were stepping into the unknown. First off, how do we interact with it? We could use the aws-sdk directly, but this seemed very verbose and clunky since you have to specify the type of every property you set on every object. I looked at dynamoose but it didn’t seem to be actively maintained (whatever that means these days). Then at a SushiJS event a friend told me about vogels. This was exactly what I had been looking for: a sequelize/mongoose-style abstraction over the aws-sdk.
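To give a feel for the difference, here’s a minimal sketch using a hypothetical Product model (the table, attributes and values are illustrative, not our real schema). With the raw aws-sdk every attribute carries a type wrapper; with vogels you define a model once and then work with plain objects:

// Raw aws-sdk: every attribute needs an explicit type descriptor
var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB({ region: 'eu-west-1' });

dynamodb.putItem({
  TableName: 'products',
  Item: {
    id:    { S: 'abc-123' },
    title: { S: 'Moonlight Sonata' },
    price: { N: '9.99' } // even numbers are passed as strings
  }
}, function (err, data) {
  if (err) { return console.log(err); }
  console.log(data);
});

// vogels: define the model once, set the two capacity knobs, then use plain objects
var vogels = require('vogels');
var Joi = require('joi');

vogels.AWS.config.update({ region: 'eu-west-1' });

var Product = vogels.define('Product', {
  hashKey: 'id',
  timestamps: true,
  schema: {
    id: vogels.types.uuid(),
    title: Joi.string(),
    price: Joi.number()
  }
});

// read and write capacity are the only tuning options
vogels.createTables({
  'Product': { readCapacity: 100, writeCapacity: 10 }
}, function (err) {
  if (err) { return console.log(err); }

  Product.create({ title: 'Moonlight Sonata', price: 9.99 }, function (err, product) {
    if (err) { return console.log(err); }
    console.log(product.get('id'));
  });
});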

I won’t bore you with the details of our setup but we went from 11 MySQL tables to two DynamoDB tables.

The reduction in the amount of code needed was nice and it ‘felt’ fast; not only to read and write but to work with too. We were able to completely swap out the database in a couple of weeks.

This feeling that we’d made the right decision was short lived. Our iOS team had started implementing against the API and were reporting that it wasn’t responding. Sure enough, a quick look in the logs showed response times well above 30s, which is way too high for an API that is just returning around 100 products. With some basic browser refreshes I struggled to replicate the issue. Enter Artillery (formerly minigun).

Testing

This isn’t the best use of Artillery as we didn’t need to walk through a user journey as such; we just had to replicate a couple of iOS developers reloading apps and browsers simultaneously. From what I’d read about DynamoDB, the read capacity units (RCU) you set on a table determine how many reads per second it can serve (one RCU covers a strongly consistent read of an item up to 4KB). If you are happy with ‘dirty’ (eventually consistent) reads then you get twice the throughput. Based on this I estimated that 100 RCU would be plenty for our product catalogue, so I tested it by running the following commands:

artillery quick -d 60 -r 1 "https://dynamo.eu-west-1.elasticbeanstalk.com/volumes"
artillery quick -d 60 -r 2 "https://dynamo.eu-west-1.elasticbeanstalk.com/volumes"
artillery quick -d 60 -r 3 "https://dynamo.eu-west-1.elasticbeanstalk.com/volumes"

At this point I started to see some throttled requests, but I wasn’t seeing enough of a trend to predict what the correct value for the RCU would be. Running the tests for longer seemed sensible, so I started running them for 10 minutes. I also increased the rate to 5 requests per second and the RCU to 200:

artillery quick -d 600 -r 5 "https://dynamo.eu-west-1.elasticbeanstalk.com/volumes"

This is the resulting CloudWatch monitoring output for those first two sets of tests. The spike on the far left is the first set of three tests (1-3 req/s) and the persistently high blue line on the right is the 10 minute test:

First test runs

That last test was interesting because a consistent behaviour was starting to appear. For a catalogue of 100 items at 5 ‘dirty’ requests per second, the metrics were showing 250 consumed RCU (100 x 5 / 2). You can see from the Artillery output that I was not seeing the best performance, but that was down to the throttled requests: if your consumed RCU goes above your provisioned RCU then the extra requests will be throttled or rejected. AWS do give you a little leeway, but not when you’re over for a prolonged period of time.

First test results
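As an aside, that back-of-envelope maths can be captured in a few lines. This is just a sketch based on my reading of the docs, and it assumes every item is under 4KB:

// Rough estimate of consumed RCU: one RCU is one strongly consistent read
// per second of an item up to 4KB; eventually consistent reads cost half.
function estimateConsumedRcu(itemsPerRequest, requestsPerSecond, eventuallyConsistent) {
  var costPerItem = eventuallyConsistent ? 0.5 : 1;
  return itemsPerRequest * requestsPerSecond * costPerItem;
}

console.log(estimateConsumedRcu(100, 5, true));  // 250, matching the metrics above
console.log(estimateConsumedRcu(100, 10, true)); // 500, the guess for the doubled test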

I now wanted to see how predictable this really could be. If I doubled the requests per second, could I guess the correct RCU? The results:

Second test run

And from the Artillery metrics I was pleased to see better performance than in the previous test, as very few requests were being throttled.

Second test results

Configuring an AWS Elastic Beanstalk application

Having spent some more time recently deploying apps to Elastic Beanstalk for Tido, I have stumbled across some really nice features and some really annoying gotchas. I predominantly use the CLI but have been known to tamper in the AWS console UI too. That ended up being one of the gotchas, as you’ll find out.

## Application Setup

When building an app you might not initially think about how you want your servers configured, but it is actually really important. Docker addresses this by bundling the container configuration alongside the application code. Here is how you can do something similar with Elastic Beanstalk.

Firstly create a .elasticbeanstalk directory in the root of your application. In here you’ll create a config.yml file similar to this:

branch-defaults:
  master:
    environment: api-prod
global:
  application_name: store-api
  default_platform: Node.js
  default_region: eu-west-1
  profile: eb-cli
  sc: git

If you run the eb init command you will be taken through a wizard which will create this for you.

The next command you’ll need is eb create. This will again take you through another wizard but this time it will deploy your app to EC2.

If you run this command from a different branch it will link the new environment to the current branch. For example we’re going to deploy our develop branch to a dev environment so we now have the following config:

branch-defaults:
  master:
    environment: api-prod
  develop:
    environment: api-dev
global:
  application_name: store-api
  default_platform: Node.js
  default_region: eu-west-1
  profile: eb-cli
  sc: git

## Application Configuration

From within your application root folder create a folder called .ebextensions. In here you can put as many .config files as you wish. These will be run in alphabetical order, so I prefix mine with a two digit number, e.g. 00-environment-variables.config.

For example the aws-sdk npm module recommends setting a couple of environment variables which we can do like so:

option_settings:
  - option_name: AWS_ACCESS_KEY_ID
    value: MY_AWS_ACCESS_KEY
  - option_name: AWS_SECRET_ACCESS_KEY
    value: SUPER_SECRET_AWS_SECRET
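Once those variables are in place the aws-sdk picks them up automatically, so nothing needs to be hard-coded in the application code; a minimal sketch:

// The aws-sdk reads AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the
// environment, so only the region needs to be specified here
var AWS = require('aws-sdk');
var dynamodb = new AWS.DynamoDB({ region: 'eu-west-1' });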

You can also configure all other aspects of how your Elastic Beanstalk app runs, for example the health check URL and the minimum and maximum cluster size:

option_settings:
  - namespace:  aws:elasticbeanstalk:application
    option_name:  Application Healthcheck URL
    value:  /ping
  - namespace: aws:autoscaling:asg
    option_name: MinSize
    value: 5
  - namespace: aws:autoscaling:asg
    option_name: MaxSize
    value: 40    

This page in the AWS docs has all the available options and details the defaults.

You can also configure the proxy settings. For example we wanted to accept longer urls than the default Nginx setting.

files:
  "/etc/nginx/conf.d/proxy.conf" :
    mode: "000755"
    owner: root
    group: root
    content: |
     large_client_header_buffers 8 256k;

### Precedence Gotcha

This caught me out big time! I had previously gone into the admin UI and changed the minimum number of nodes for our cluster from the default of 1 to 2.

This meant that when I tried to configure the min and max settings from within my .ebextensions files it wasn’t setting the values I wanted. Eventually I found a solution with the eb config command. Running this command from the linked branch will open up the default text editor with the current config file for that environment:

Live editing config

By deleting the two highlighted lines and saving the file, those settings reverted to their defaults. Running eb deploy once more then applied my .ebextensions config files correctly.

npm scripts on AWS Elastic Beanstalk

I’ve been building a little app running a Hapi server and decided I wanted to plug a web client in front of it. I decided to try webpack as I’d heard people raving about it. The front-end tech is irrelevant for this post as what I’ve learnt applies to any pre-deployment process: running Grunt tasks, Less compilation, etc. The basic problem I’m trying to solve is that I want to run some front-end compilation before starting my Hapi server. In my case this will create a bundle.js file containing the webpacked JS code.

As a quick hack I knew I could just include the bundle.js file in my git repo. This works, but it’s not very DRY: all the code is in the repository twice. The other option I looked at was the npm postinstall script. This worked perfectly on my local environment, but when I pushed to Elastic Beanstalk a couple of things didn’t work quite how I expected:

  1. When Elastic Beanstalk runs npm install it includes the --production flag.
  2. The postinstall script doesn’t run.

For #1, this is kind of obvious, but I’d considered my front-end libraries as build-time only: I was only going to ship the bundled code, so why would I need to include the likes of Backbone, Marionette, etc. in my dependencies? Since Elastic Beanstalk skips devDependencies, the fix was to move them from devDependencies to dependencies.

For #2 I decided to run an experiment. I created a package.json file with all the possible scripts with just a console.log in each one:

{
  "name": "npm-script-test",
  "version": "1.0.0",
  "description": "",
  "scripts": {
    "prepublish": "node -e \"console.log('prepublish');\"",
    "publish": "node -e \"console.log('publish');\"",
    "postpublish": "node -e \"console.log('postpublish');\"",
    "preinstall": "node -e \"console.log('preinstall');\"",
    "install": "node -e \"console.log('install');\"",
    "postinstall": "node -e \"console.log('postinstall');\"",
    "preuninstall": "node -e \"console.log('preuninstall');\"",
    "uninstall": "node -e \"console.log('uninstall');\"",
    "postuninstall": "node -e \"console.log('postuninstall');\"",
    "preversion": "node -e \"console.log('preversion');\"",
    "version": "node -e \"console.log('version');\"",
    "postversion": "node -e \"console.log('postversion');\"",
    "pretest": "node -e \"console.log('pretest');\"",
    "test": "node -e \"console.log('test');\"",
    "posttest": "node -e \"console.log('posttest');\"",
    "prestop": "node -e \"console.log('prestop');\"",
    "stop": "node -e \"console.log('stop');\"",
    "poststop": "node -e \"console.log('poststop');\"",
    "prestart": "node -e \"console.log('prestart');\"",
    "start": "node -e \"console.log('start');\"",
    "poststart": "node -e \"console.log('poststart');\"",
    "prerestart": "node -e \"console.log('prerestart');\"",
    "restart": "node -e \"console.log('restart');\"",
    "postrestart": "node -e \"console.log('postrestart');\""
  }
}

The resulting log on Elastic Beanstalk in /var/log/nodejs/nodejs.log:

> worth-sharing@1.0.0 prestart /var/app/current
> node -e "console.log('prestart');"

prestart

> worth-sharing@1.0.0 start /var/app/current
> node -e "console.log('start');"

start

> worth-sharing@1.0.0 poststart /var/app/current
> node -e "console.log('poststart');"

poststart

It appears that these are the only npm scripts run by Elastic Beanstalk.

My solution was to move my webpack compilation from the postinstall to prestart and everything started working!
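For reference, the relevant part of my package.json ended up looking roughly like this (the webpack config and server entry point names are illustrative), with webpack itself listed under dependencies rather than devDependencies so it survives the --production install:

{
  "scripts": {
    "prestart": "webpack --config webpack.config.js",
    "start": "node server.js"
  }
}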

Migrating from SVN to git

When other attendees at conferences and meetups asked what version control we used I was always a little embarrassed to say that we were still using SVN. Starting the CP4 project at Concrete allowed us to start afresh and we had reasonably free rein to pick the technologies that we wanted to use. This meant that picking git for versioning was the obvious choice.

This has worked great for the past nine or so months for our new repositories, but it’s now time to migrate our old SVN repositories over. The main CP3 application is in its own repository and a team of engineers work on it daily, so my view was that it wasn’t going to be a simple switch-over. This isn’t the sort of thing you do every day, so there’s always going to be a bit of risk, and it couldn’t impact our feature delivery schedule or the monthly release cycle.

When to do the migration

The monthly release is pushed out into production on the last Wednesday of every month. The week preceding this is our UAT/QA week, where we run our regression tests and perform final demos to the clients, and with our sprints running Monday to Friday we have a code freeze on the Friday before the UAT week. On that Friday we normally create a new branch for the next release, and I felt that this would be the point at which we would make the switch from SVN to git.

Preparing to migrate

There are a number of resources that have helped with getting this working but I wanted to summarise our process as we have a few other repositories to migrate too:

1. Create the authors mapping text file

This command will output all the users that have ever committed to your SVN repository. This is really useful when you have an old repository like ours as it means you can re-create the git commits using real git users later.

svn log --xml | grep author | sort -u | perl -pe 's/.*>(.*?)<.*/$1 = /'

We edited the authors file as our GitLab server is hooked up to Active Directory and our SVN server isn’t, so every user’s details were slightly different.
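Each line of authors-transform.txt maps an SVN username to a git identity; the names and emails below are made up, but the format is:

jbloggs = Joe Bloggs <joe.bloggs@example.com>
asmith = Anna Smith <anna.smith@example.com>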

2. Checkout/clone the svn repository as a git repo

Using the text file created in the previous step I created a local git version of our svn repository.

git svn clone --authors-file=../authors-transform.txt \
    -T trunk \
    -b branches/releases \
    http://[SVN_SERVER]/svn/concretePlatform

I had to use the -b branches/releases flag as we hadn’t followed the standard SVN naming convention of having branches directly under the branches folder.

3. Create a remote repository on gitlab

I created a new repository in the CP3 group called concrete-platform and then added it as a remote to the newly checked-out repository:

git remote add gitlab git@[GITLAB_SERVER]:cp3/concrete-platform.git

Notice that I’ve named the GitLab remote gitlab. This is because the SVN remote is, by default, named origin. GitLab, GitHub and Bitbucket all tell you to add a remote called origin, but when I did this on the repository checked out from SVN I could no longer fetch new commits from the SVN remote. I’m sure there must be a way to correctly reference the SVN/origin remote versus the git/origin one, but I found the easiest and clearest way was to call the new git remote gitlab.

4. Checkout and push trunk/master branch to the gitlab remote

Navigate into the repository folder and check out the master branch. Once this is complete it can be pushed to the newly created remote pointing at the GitLab server.

cd concretePlatform/
git checkout master
git push gitlab master

5. Checkout and push the 2015-apr release branch

We only have one other active branch at this time so I also checked this one out and pushed it up to gitlab.

git checkout 2015-apr
git push gitlab 2015-apr

We now have a copy of our SVN repository on our gitlab server. During the time between the initial clone and code freeze a number of new commits have come in but it’s just a matter of running the following to fetch the commits from SVN and push them up to gitlab:

git svn fetch
git checkout 2015-apr
git push gitlab 2015-apr

The final step is to duplicate our Jenkins jobs and update the SCM details. We’re going to run a number of dev site builds from git and then run the April release to production on 29th April.

Both the git tutorial and Atlassian’s tutorial were very useful in helping with this process.

Using AWS SQS with Node.js

At Concrete we’re looking at utilising some of AWS’s services to speed up development of our new platform. As part of this I’ve been investigating the use of their SQS service to help coordinate the delivery of thumbnails and assets globally. It’s quite a simple queueing mechanism that offers a guarantee of ‘at least once’ delivery. This isn’t really a problem for us: we generate keys in our apps, and if we create the same thumbnail twice it’s just a little wasted work and nothing more.

Pumping messages into the queue was simple enough using the aws-sdk module:

var AWS = require('aws-sdk');

AWS.config.update({accessKeyId: 'KEY', secretAccessKey: 'SECRET'});

var sqs = new AWS.SQS({region:'eu-west-1'}); 

var msg = { payload: 'a message' };

var sqsParams = {
  MessageBody: JSON.stringify(msg),
  QueueUrl: 'QUEUE_URL'
};

sqs.sendMessage(sqsParams, function(err, data) {
  if (err) {
    // Bail out here so we don't log an undefined response on error
    return console.log('ERR', err);
  }

  console.log(data);
});

The response will look something like this

{ 
  ResponseMetadata: { RequestId: '232c557d-b1ed-54a1-a88c-180f7aaf3eb3' },
  MD5OfMessageBody: '80cbb15af483887b15534f2ac3dfa46f',
  MessageId: '6cc50b09-17a8-4907-beeb-ed3a620b562f' 
}

On the other end of the queue you need to create a worker to do something with the message. The sqs-consumer module by the BBC handles polling the queue for you. Using this module my worker looked something like this:

var Consumer = require('sqs-consumer');

var app = Consumer.create({
  queueUrl: 'QUEUE_URL',
  region: 'eu-west-1',
  batchSize: 10,
  handleMessage: function (message, done) {

    var msgBody = JSON.parse(message.Body);
    console.log(msgBody);

    return done();

  }
});

app.on('error', function (err) {
  console.log(err);
});

app.start();

Calling done() handles the removal of the message from the queue. Clearly, you will want to do a little bit more with your message but it gives an idea of what’s going on.

All this was very exciting, but what I didn’t realise was that when deploying an app to an Elastic Beanstalk worker environment you don’t need to worry about polling the queue yourself: the built-in daemon reads the queue for you, and all your app needs to do is expose a POST route that takes the message as the payload. I’m a big fan of the Hapi framework so my worker ended up like this:

var Hapi = require('hapi');

var server = new Hapi.Server();
server.connection({ port: process.env.PORT || 80 });

server.route({
  method: 'post',
  path: '/',
  handler: function (request, reply) {

    var msgBody = request.payload;
    console.log(msgBody);

    // Respond with a 200 so the worker daemon knows the message was
    // processed and can remove it from the queue
    return reply();

  }
});

server.start(function() {
  console.log('server started');
});

Using JSON to produce a CV

For over six months I’ve been working with Node.js at Concrete, so I wanted to give my CV a little refresh. I could have just downloaded my existing CV from Google Docs, edited it and re-uploaded it, but I’ve had this CV format for about 10 years and it gets a bit messed up when it gets parsed. There must be a better way.

Enter JSON Resume

This was remarkably easy to get up and running by following the getting started guide. Beyond that it was just a matter of copying the text from my Word document version into a JSON template.
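The template is just a resume.json file. A heavily trimmed sketch of the shape, with made-up details and field names as per the JSON Resume schema at the time:

{
  "basics": {
    "name": "Joe Bloggs",
    "label": "Node.js Developer",
    "email": "joe@example.com"
  },
  "work": [
    {
      "company": "Concrete",
      "position": "Senior Developer",
      "startDate": "2014-06-01",
      "summary": "Building APIs with Node.js and Hapi."
    }
  ]
}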

Here’s the finished product…

So why is this better than a word doc?

  1. I can put it in git and host it on github keeping a history of any changes I make.
  2. The templates can be edited to give it a personal feel or you can create your own from scratch.

What’s not so nice?

  1. This is more an issue with my IDE, but spell-checking a CV is quite important and having it in JSON makes that harder.

There is also the Jekyll CV Generator which might be worth a try at some point too.

I clearly could have exported my CV from LinkedIn but that would have been too easy.

New Year, New Blog

Creating a blog using GitHub Pages is really quite easy. I did a bit of googling before I started and found plenty of articles saying how easy it was, but you don’t really know until you try. Well, here it is. All the blog posts I found are, I’m sure, useful, but I found the GitHub docs perfectly adequate.

I still need to work on the colours but thanks to Jekyll and with a theme by dbtek the framework is now there.

Why pick Jekyll?

I was trying to write my own blogging platform using Railo, but why bother? Why pay for hosting when GitHub will host it for you? Not only is it hosted by GitHub, but through the magic of git it’s all now properly versioned. If anyone spots a typo they can use GitHub issues to let me know.

So, now that I don’t have to waste time trying to write my own platform I can concentrate on writing more posts.

Happy new year!

First ever public presentation

It’s done! After a few weeks of sleepless nights I’ve finally presented in public. I’d like to say thank you for all the advice and support I’ve received over the last few days; it really helped. I’m sorry to say that I didn’t break out into song! In the end I really enjoyed it; it certainly was a buzz! I was chuffed with the great questions at the end and the pre-presentation discussion too.

None of the bits and pieces I presented are perfect, and I’m not sure they ever will be (I think that was the point of my talk). I’m always happy to chat about this stuff though, and I’m always looking for new and better ways of doing things.

The slides from my presentation can be found on SlideShare.

Thanks also to the SOTR team for organising a great event. Roll on SOTR15!

Scotch on the Rocks 2014

That’s right, I’m going to be speaking about Technical Debt at Scotch on the Rocks 2014. To say I’m a little nervous is an understatement; I’ve never spoken at a conference before. Having attended SOTR twice already I’m aware of the format and understand the need to keep the crowd entertained.

So what am I going to be talking about?

Well, that’s what I now need to figure out. Technical debt is not a physical entity that can be measured as such, so how do we know if we’re doing the right thing? How can it be measured? Where is it? How does it manifest itself in our daily lives? I’ve recently taken on a house renovation project and there’s definitely some technical debt in there! But what about our software? Is it just in the code, or is it in our databases, processes and behaviours?

I know it sounds like I could be waffling for the majority of my talk but that is why I have set up this site. I want to start documenting my thought processes and hopefully add some structure to my debut talk!

Head here for more information on the conference: SOTR