Deadline: 1st December 2021.
Please report any issues or unclear points, and I will update the exercise continuously.
Improve Availability (Stability) of SkyCave by introducing timeouts (or circuit breakers), and introducing (simple) health monitoring. (To limit workload substantially, you only need to address integration points related to your group's own microservice.)
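If you go for the circuit breaker option, the core idea can be sketched as below. This is only a minimal illustration and not part of the SkyCave code base: the class name SimpleCircuitBreaker, its parameters, and the use of IllegalStateException to signal an open circuit are all assumptions made for the example. It holds its lock during the remote call, which is a simplification you may not want in a real daemon.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical, minimal circuit breaker; not part of the SkyCave code base. */
class SimpleCircuitBreaker {
  enum State { CLOSED, OPEN, HALF_OPEN }

  private final int failureThreshold;   // consecutive failures before opening
  private final long openMillis;        // how long to fail fast before retrying
  private final LongSupplier clock;     // injected so tests can control time
  private State state = State.CLOSED;
  private int failures = 0;
  private long openedAt = 0;

  SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
    this.failureThreshold = failureThreshold;
    this.openMillis = openMillis;
    this.clock = clock;
  }

  /** Runs the remote call, or fails fast while the circuit is open. */
  synchronized <T> T call(Supplier<T> remoteCall) {
    if (state == State.OPEN) {
      if (clock.getAsLong() - openedAt < openMillis) {
        throw new IllegalStateException("Circuit open: service assumed unavailable");
      }
      state = State.HALF_OPEN; // cool-down elapsed: allow one trial call
    }
    try {
      T result = remoteCall.get();
      failures = 0;
      state = State.CLOSED;
      return result;
    } catch (RuntimeException e) {
      failures++;
      if (state == State.HALF_OPEN || failures >= failureThreshold) {
        state = State.OPEN;
        openedAt = clock.getAsLong();
      }
      throw e; // the caller decides how to report the failure
    }
  }

  synchronized State state() { return state; }
}
```

A caller would wrap each call to the remote service in call(...), for instance with System::currentTimeMillis as the clock, and treat the exception thrown for an open circuit like any other failure of that service.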
You have a working SkyCave microservice architecture from the previous exercise including your own and two other (one-man groups: one other) groups' services.
Hand-in:
Evaluation:
Your report is initially evaluated as pass/not pass (with some ideas for improvement). The final report (which also includes the next mandatory exercise) is evaluated together with the oral defense to give the final grade for this course.
Note that this differs from the 'timeout-quote-service' exercise from the seminar, in which you could get away with just returning an 'it-did-not-work-sorry' quote from the quote service. In this case you have to do something in the daemon's architecture (the PlayerServant, or perhaps better, the invokers) to properly address the safe failure mode. It is not sufficient to return, say, a 'RoomRecord' with the description "this room does not exist", right? So: catch connection/timeout exceptions from your library, convert them to CaveIPCExceptions (or your own variants; I have made a subclass called CaveFailureModeException), and let the server side catch them to provide proper feedback to the 'cmd'.
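The catch-and-convert step can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the JDK's own java.net.http.HttpClient (Java 11+) rather than whatever HTTP library your group uses, and ServiceFailureException and TimeoutGuardedClient are hypothetical names. In SkyCave you would throw a CaveIPCException or your own subclass instead of the stand-in.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical stand-in for CaveIPCException or a subclass of it. */
class ServiceFailureException extends RuntimeException {
  ServiceFailureException(String message, Throwable cause) {
    super(message, cause);
  }
}

/** Hypothetical client that never waits longer than the configured timeouts. */
class TimeoutGuardedClient {
  private final HttpClient client;
  private final Duration requestTimeout;

  TimeoutGuardedClient(Duration connectTimeout, Duration requestTimeout) {
    this.client = HttpClient.newBuilder().connectTimeout(connectTimeout).build();
    this.requestTimeout = requestTimeout;
  }

  /** Performs a GET; any connection or timeout problem becomes one exception type. */
  String get(String url) {
    HttpRequest request = HttpRequest.newBuilder(URI.create(url))
        .timeout(requestTimeout)
        .GET()
        .build();
    try {
      return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    } catch (IOException e) {
      // HttpTimeoutException and ConnectException are both IOExceptions
      throw new ServiceFailureException("Service unreachable or too slow: " + url, e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt flag
      throw new ServiceFailureException("Interrupted while calling: " + url, e);
    }
  }
}
```

The point of the design is that the servant or invoker only sees one well-defined exception type, so it never has to invent a fake domain object to signal the failure.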
Implementation hint: The 'cmd' receives marshalled 'ReplyObject's, and if a reply's 'statusCode' is outside the 200-299 interval, the client side Broker library will instead throw a 'frds.broker.IPCException', which is caught in the CmdInterpreter's 'readEvalLoop()' method. This is a proper place to handle safe failure modes by informing the player that something unusual happened and that other actions need to be taken. Have a look at the test case 'TestCmdFailureHandling' in the client project.
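The pattern in the hint can be illustrated with the small sketch below. It does not use the real Broker classes: IpcFailure is a hypothetical stand-in for 'frds.broker.IPCException', and FailureAwareInterpreter, requireSuccess and evalOne are names invented for the example, as is the wording of the message to the player.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for frds.broker.IPCException. */
class IpcFailure extends RuntimeException {
  final int statusCode;

  IpcFailure(int statusCode, String message) {
    super(message);
    this.statusCode = statusCode;
  }
}

/** Hypothetical sketch of failure handling around a single command. */
class FailureAwareInterpreter {
  /** Mirrors the rule described above: only 200-299 counts as success. */
  static void requireSuccess(int statusCode, String errorDescription) {
    if (statusCode < 200 || statusCode > 299) {
      throw new IpcFailure(statusCode, errorDescription);
    }
  }

  /** Runs one command and turns a failure into a message for the player. */
  static String evalOne(Supplier<String> command) {
    try {
      return command.get();
    } catch (IpcFailure e) {
      return "*** Sorry, something unusual happened (status " + e.statusCode
          + "). Please try again later.";
    }
  }
}
```

In the real client the try/catch would sit inside the read-eval loop, so that a failing command informs the player without terminating the session.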
You have to add an additional GET /health path to your REST service, but you are not required to update the first API specification section of your report.
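A health path can be very small. The sketch below is an assumption-laden illustration: it uses the JDK's built-in com.sun.net.httpserver.HttpServer rather than the web framework your service is actually built on, and the class name HealthEndpoint, the storageIsReachable check, and the JSON body are all hypothetical. It answers 200 when the check passes and 503 otherwise.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

/** Hypothetical health endpoint; adapt to your own REST framework. */
class HealthEndpoint {
  /** Starts a server with a /health path; port 0 picks a free port. */
  static HttpServer start(int port, BooleanSupplier storageIsReachable) throws IOException {
    HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
    server.createContext("/health", exchange -> {
      boolean healthy = storageIsReachable.getAsBoolean();
      int status = healthy ? 200 : 503;
      String json = "{\"status\":\"" + (healthy ? "UP" : "DOWN") + "\"}";
      byte[] body = json.getBytes(StandardCharsets.UTF_8);
      exchange.getResponseHeaders().set("Content-Type", "application/json");
      exchange.sendResponseHeaders(status, body.length);
      try (OutputStream out = exchange.getResponseBody()) {
        out.write(body);
      }
    });
    server.start();
    return server;
  }
}
```

A monitoring tool or container health check can then poll GET /health and treat any non-200 answer (or no answer within a timeout) as unhealthy.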
If you add healthchecks to the SkyCave daemon itself, you either just update the 'CaveUriTunnelServerRequestHandler' code, or add a new implementation/subclass which forces you to overwrite the SKYCAVE_SERVERREQUESTHANDLER_IMPLEMENTATION in the CPF. To inspect the state of the container, the
docker inspect --format='{{json .State.Health}}' (container-id)
comes in handy (or use 'docker ps').
If you are new to writing reports in an academic setting, please consult my review guide. Basically, you should write clearly and concisely, demonstrate systematic work, and document your work convincingly. Easier said than done...