Mandatory 5: Horizontal Scaling. Continuous Integration

Learning objective:

To explore horizontal scaling by making multiple replicas of the daemon, and to use (BitBucket) build pipelines for CI.

Deadline

December 12th at 23.59

Exercise 'horizontal-scaling-failing'

Learning Goal: To demonstrate the inability of SkyCave to scale horizontally.

Note: This exercise may strain your laptop quite a bit. If you can spare it, increase the memory of the VM (to 6 GB), as I had to do.

We aim to make a stack running 2-3 SkyCave daemons as replicas, but configured to accept any login credentials, and then generate some load on it.
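As a rough idea, such a swarm stack file could look something like the sketch below. This is only my sketch: the service name, image name, and port are assumptions, not the course's actual artifacts, so adapt them to your own setup.

# Sketch of a stack file for 'docker stack deploy' (illustrative only)
version: "3.7"

services:
  daemon:
    image: yourhubaccount/skycave-daemon:latest   # hypothetical image name
    ports:
      - "7777:7777"        # swarm's ingress network load-balances calls across the replicas
    deploy:
      replicas: 3          # the 2-3 daemon replicas we want to experiment with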

Experiment Part I: Setting up

Experiment Part II: Scaling

Exercise 'session-database' [M 20]

Prerequisite: You have done the experiment above, as it is required as part of the hand-in.

The previous exercise should establish that our SkyCave daemon does not presently scale horizontally, as it is stateful, and that state is not shared across daemon replicas.

You should update SkyCave so it handles horizontal scaling of the 'daemon' using the 'session database' pattern.

Requirements:

Hand-in:

Evaluation: I see your screenshots, I see your code, I like it, I award 20 points. And by now, you know what I like...

Hint: The solution to this exercise is extremely simple, once you 'see the light', and SkyCave is already prepared for it through the configurable 'PlayerNameService'. So - if you find yourself wanting to code a lot of Java, start new databases, or add new service points to the CPF, then you are on the wrong track! You should notice that the current SkyCave design already has a database which stores current session data...

Exercise 'horizontal-scaling' [M 40]

Update 'daemon' so it runs two or three SkyCave daemon instances in a horizontally scaled / load-balanced setup in Docker Swarm.

Note: Solve the previous exercises first, as they contain 95% of the solution ;-).

Requirements:

Hand-in:

Evaluation: I will evaluate analyzability and correctness of the compose file, with (0,4,7,10) points for each learning goal, as outlined below. I will then multiply by two to get the final score.

Learning goals and assessment parameters:

Analyzability: The compose file is easy to analyze (read, understand, reason correctly about). The compose file contains a preamble outlining how to use it (build instructions, stack deploy command, etc.).

Correctness: I will (as best possible) evaluate the correctness of the compose file from your screenshots, the file contents, and potentially by executing it. The screenshots must be detailed and comprehensive enough that I can convince myself that everything works correctly.
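For reference, deploying and inspecting such a stack typically boils down to commands along these lines (the stack and service names are my own choices):

docker swarm init                          # once, if the node is not already a swarm manager
docker stack deploy -c docker-compose.yml skycave
docker service ls                          # check that the daemon service reports e.g. 3/3 replicas
docker service ps skycave_daemon           # list the individual replica tasks
docker stack rm skycave                    # tear the stack down again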

Exercise 'horizontal-scaling-haproxy'

This exercise is "very optional" and only works on Linux!

In this exercise we will run two daemons and use HAProxy as load balancer. HAProxy defaults to true 'round-robin' load balancing, which makes it much easier to detect failures when horizontally scaling the daemon. Thanks to former students Mark and Finn for defining the HAProxy configuration file.

Use the HAProxy load balancer for a manually configured, local setup: your 'cmd' should contact the IP of the load balancer (localhost:7777), which in turn routes all calls to two instances of the daemon (e.g. localhost:6677 and localhost:6688). Naturally, you have to run the daemons with the Redis storage attached (but all other services may be test-doubled). Also create CPF files that match a suitable configuration for each of the two daemons (you need to set SKYCAVE_APPSERVER to make the two daemons run on individual ports).
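A sketch of the port-related lines in the two CPF files, assuming the usual 'host:port' value format (all other keys as in your normal local configuration). In daemon-one.cpf:

SKYCAVE_APPSERVER = localhost:6677

and in daemon-two.cpf:

SKYCAVE_APPSERVER = localhost:6688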

You can find Finn and Mark's (slightly modified) haproxy configuration, which should more or less work out of the box if you run all services on localhost.
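If you cannot locate their file, a minimal configuration in the same spirit could look roughly like the sketch below. It is my sketch, not their actual file; 'mode http' assumes the HTTP-based broker variant, use 'mode tcp' for the socket variant.

# Minimal round-robin front for two local daemons (sketch)
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s

frontend skycave_frontend
    bind *:7777
    default_backend skycave_daemons

backend skycave_daemons
    balance roundrobin
    server daemon1 localhost:6677
    server daemon2 localhost:6688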

The (as always) by far easiest way to run the HAProxy service is using Docker, but beware that you need to run it with --network host so the load balancer is on the host machine's network (otherwise it cannot forward requests to the localhost-based daemons).

Note: you have to copy the haproxy configuration file into the haproxy container. Either build a small Dockerfile which creates a local image you can run (it is described on Docker Hub), or use volume mounting (-v) to make the configuration file available from within the container. I did the former, and it boils down to:

# Build file for the HAProxy server alone

FROM haproxy:2.0
COPY haproxy.cfg /usr/local/etc/haproxy/haproxy.cfg
        
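Building and running it then amounts to something like the following (the image name is my choice; the volume-mount alternative is included for completeness):

docker build -t my-haproxy .
docker run -d --network host my-haproxy

# Alternative: mount the configuration file instead of baking it into an image
docker run -d --network host \
    -v "$(pwd)/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro" haproxy:2.0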

Process:

Exercise 'horizontal-scaling-mq'

As an alternative to swarm's ingress network or a load balancer, you can use a Message Broker instead. We may return to MQ systems in course two, as they have a lot of nice Availability Quality aspects.

In RabbitMQ's RPC Tutorial for Java you can find both client side and server side example code, which is rather easily modified to fill the 'ClientRequestHandler' and 'ServerRequestHandler' roles of the FRDS.Broker pattern.

So, start RabbitMQ in a Docker container, code the CRH and SRH roles, and let the Broker pattern use the MQ as the IPC layer instead of HTTP or sockets.
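Starting the broker itself is a one-liner; the 'management' image variant also gives you a web UI on port 15672:

docker run -d --name rabbitmq -p 5672:5672 -p 15672:15672 rabbitmq:3-management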

Exercise 'bitbucket-pipeline-ci' (*) [M 80]

We have a lot of trickery going on to move from a code change until it provides value for our customers, and while Gradle and Docker IaC help a lot, we still do a lot by hand. In this exercise, we start automating a build pipeline, focusing on the production of a 'release candidate deployment unit'.

In this exercise, you should write a BitBucket IaC pipeline, which does Continuous Integration, to produce a (Docker) Hub image release candidate.
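To give an idea of the shape of such a file, a much-reduced sketch is shown below. The build image, repository names, and step contents are my assumptions; your actual pipeline must contain the four stages required below.

# bitbucket-pipelines.yml -- illustrative sketch only
image: gradle:jdk17              # hypothetical build image; pick one matching your JDK

pipelines:
  default:
    - step:
        name: Build and test
        script:
          - gradle test
    - step:
        name: Build and push release candidate image
        services:
          - docker
        script:
          - docker build -t yourhubaccount/skycave-daemon:rc .
          - echo ${DOCKERHUB_PASSWORD} | docker login --username "$DOCKERHUB_USERNAME" --password-stdin
          - docker push yourhubaccount/skycave-daemon:rc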

Requirements: Your pipeline must (at least) consist of four stages:

Hand-in

If you are used to another pipeline system, like GitLab, you can do the exercise using that particular system, but please provide some good comments as to what is going on, so I can get an understanding :-).

Evaluation: The 80 points are split into (20/60). I use the (0, 4, 7, 10) scale to evaluate your pipeline IaC file's analyzability - and then multiply by two. The remaining 60 points are awarded if your screenshots leave no doubt in my mind that you have actually made a working pipeline.

Learning goals and assessment parameters:

Submission: Required artifacts (pipeline file, screenshots) are all present. All four required steps in the pipeline have been made.

Analyzability: The pipeline file is easy to analyze (read, understand, reason correctly about). It contains author/group information. It contains concise comments on central aspects of each step.

Hint: You have to log into Docker Hub to push your image, but do NOT put your credentials in the script! Instead you can enter your credentials securely in the pipeline account variables, which are then exposed as normal Linux environment variables in your pipeline scripts. To set a variable, you have to open the BitBucket repository for your project, next click Repository settings (in the menu on the left), then scroll down to Pipelines/Repository variables where you can add an environment variable and its value, which is stored encrypted. Alternatively you can set a 'workspace variable' from your root BitBucket account page.

Example: I have set the workspace variable DOCKERHUB_PASSWORD to be my hub password (and DOCKERHUB_USERNAME similarly) and can then do stuff like

          - echo ${DOCKERHUB_PASSWORD} | docker login --username "$DOCKERHUB_USERNAME" --password-stdin
        

If you use hub.baerbak.com, remember that in your login!

Exercise 'alternative-ci'

Redo the above exercise but with Jenkins, or GitLab CD, or Concourse, or Hudson, or ... instead.