[RFC] GitHub integration research

Abstract

To develop the integration between GitHub and Fyndr, some research is needed in advance. This RFC describes the research that was done and the conclusions it led to.

Status

💬 Being discussed
🎉 Implemented in !303
🎉 Planned in #330

Background

To secure this integration, I first had to research GitHub's webhook security. The hypothesis is that GitHub's webhooks are secure enough to use, provided we follow GitHub's security guidelines when securing our endpoint. This matters because the integration essentially adds an endpoint that can directly post, update and delete articles on a Fyndr instance.

Problem 1

We need to secure our webhook endpoint with an API key. But the question is, how do we make an API key? Can we not just use a JWT (JSON Web Token) and call it a day? We already use JWTs in our backend so we just need to give GitHub a JWT to use in the header of the request and it should work. Except it doesn't work like that.

Solution

My initial thought was to use a JWT because we already use these in our API, which would make it the easiest solution to implement. JWTs don't need to be saved in the database because all the information is in the token itself, and the validation process is relatively simple and considered a standard in the industry¹. However, JWTs are not meant to be used as API keys. JWTs are used for user-level access: they provide authentication and authorization for individual users, where every user gets their own level of access. API keys provide application-level access, giving every 'user' the same access level¹. This type of security is not quite what I was looking for. The main reason for me to use JWTs would be their expiration date. It is very easy to add an expiration date, and some libraries do this automatically. A JWT is usually valid for no longer than 24 hours, but this short lifespan makes the solution quite unusable here: I can't alter GitHub's script, so I also can't give GitHub the ability to use a refresh token.

But what if we generate a JWT with a very long expiration date? Technically this would work. I could add information to the JWT, like the repository URL, to represent the user in this case. The problem is that JWTs are vulnerable to multiple types of attacks²³. If an attacker captures a JWT, they can use it until it expires. Since JWTs are stateless, it's not possible to invalidate one remotely unless you keep a blacklist of specific tokens, and that defeats the purpose of a stateless token. Increasing the lifespan therefore makes them less secure. So JWTs are not a reliable solution for this exact problem. However, using JWTs instead of API keys is still possible as long as the client application supports it.
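The trade-off described above can be sketched with a hand-rolled HS256 token built from the Python standard library only. This is an illustration under assumptions, not how our backend issues tokens: the secret, the `jti` value, the repository URL and the one-year lifetime are all hypothetical, and a real service would use a maintained JWT library. The sketch shows that a purely stateless check keeps accepting a captured long-lived token, and that rejecting it requires server-side state (the blacklist, called a denylist here).

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Build a compact HS256 token: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        _b64url(json.dumps(header, separators=(",", ":")).encode("utf-8")),
        _b64url(json.dumps(claims, separators=(",", ":")).encode("utf-8")),
    ]
    signing_input = ".".join(parts)
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_jwt(token: str, secret: bytes, now: float, denylist=frozenset()) -> bool:
    """Check signature and expiry; the denylist is the only stateful part."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        signing_input = (header_b64 + "." + payload_b64).encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return False
        claims = json.loads(_b64url_decode(payload_b64))
        if claims["exp"] <= now:
            return False
        return claims.get("jti") not in denylist
    except (ValueError, KeyError):
        # Malformed token, bad base64/JSON, or missing "exp" claim.
        return False


# Hypothetical values for illustration only.
secret = b"hypothetical-signing-secret"
issued_at = 1_700_000_000
one_year = 365 * 24 * 3600
token = sign_jwt(
    {"sub": "https://github.com/example/repo", "jti": "token-1", "exp": issued_at + one_year},
    secret,
)

a_month_later = issued_at + 30 * 24 * 3600
# A captured copy still passes a purely stateless check.
print(verify_jwt(token, secret, now=a_month_later))
# Rejecting it early needs server-side state.
print(verify_jwt(token, secret, now=a_month_later, denylist=frozenset({"token-1"})))
```

The first call prints `True` and the second prints `False`: the only way to stop the captured token before its `exp` is to remember its identifier on the server, which is exactly the state a JWT was supposed to avoid.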

After working on this integration I found out that using JWTs is not an option. GitHub signs its webhook requests with HMAC-SHA256, using a shared secret token as the key, and sends the resulting signature in the X-Hub-Signature-256 header. Technically it's possible to use a JWT as that secret, but the receiver cannot recover the secret from the signature, since it is a hash and not encryption. Therefore none of the features of a JWT can be used, and the JWT would serve only as a random string⁴.
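A minimal sketch of how the receiving side could check such a signature, using only the Python standard library. GitHub sends the signature in the `X-Hub-Signature-256` header as `sha256=` followed by the hex digest of the raw request body. The function names and the choice of a 32-byte random secret are assumptions made for illustration, not the code used in Fyndr.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def generate_webhook_secret() -> str:
    """Create a random secret to paste into the GitHub webhook settings.

    Hypothetical helper: 32 random bytes, hex-encoded.
    """
    return secrets.token_hex(32)


def compute_signature(secret: str, body: bytes) -> str:
    """Return the value GitHub would send for this body and secret."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def is_valid_signature(secret: str, body: bytes, header_value: Optional[str]) -> bool:
    """Compare the received header with our own computation in constant time."""
    if not header_value:
        return False
    expected = compute_signature(secret, body)
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))


# Usage with made-up data: the body must be the raw bytes as received,
# not a re-serialised copy of the parsed JSON.
webhook_secret = generate_webhook_secret()
raw_body = b'{"action": "opened"}'
header = compute_signature(webhook_secret, raw_body)

print(is_valid_signature(webhook_secret, raw_body, header))          # genuine request
print(is_valid_signature(webhook_secret, raw_body + b" ", header))   # tampered body
print(is_valid_signature(webhook_secret, raw_body, None))            # header missing
```

Because the check only needs the shared secret, any sufficiently long random string works, which is why a JWT adds nothing here.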

Background - Problem 2

The integration needs to be tested with integration tests. This requirement came from a risk analysis that was written beforehand. Before I can write these tests, I need to find out how this kind of testing is done.

Problem

One of the requirements for this integration is that it has to be tested. For this integration it is not sufficient to only write unit tests. There must be some sort of integration test to ensure the quality.

Solution ⁵

After doing some research on integration testing I discovered that integration tests are not quite what I was looking for. Integration tests check the compatibility between modules and/or classes⁶. That is not what I'm interested in: I need something that tests the endpoints in our API and makes sure the API responds appropriately.

So that's when I started looking for different testing methods⁷. Some of the testing methods I found were specifically for the front-end, others were for manual testing. So I set a couple of requirements for the type of testing I need:

- I need to be able to mock the requests that GitHub sends so that I can test how the API will respond to requests.
- It must not include the front-end.
- It must have the possibility to be automated.
- A single test must cover an entire functionality (like posting a repository).

From this list of testing methods⁷, the following method matched my requirements best:

Functional testing⁸
- Functional testing is the process of validating functionality of a software application. Pass or fail is the result of a functional test, because either a feature works as designed or it does not.
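To make the requirements concrete, here is a minimal sketch of an automated functional test that mocks a GitHub request and checks only the endpoint's response. It is not the Fyndr API: the handler is a stand-in endpoint, and the secret, the `/webhook` path, the payload fields and the 204/401 status codes are assumptions chosen for illustration. Only the Python standard library is used.

```python
import hashlib
import hmac
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

SECRET = b"hypothetical-test-secret"  # assumption: shared webhook secret


class WebhookHandler(BaseHTTPRequestHandler):
    """Stand-in endpoint: accepts correctly signed requests, rejects the rest."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        expected = "sha256=" + hmac.new(SECRET, body, hashlib.sha256).hexdigest()
        received = self.headers.get("X-Hub-Signature-256", "")
        valid = hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
        self.send_response(204 if valid else 401)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep test output quiet


def post_mock_event(url: str, payload: dict, secret: bytes) -> int:
    """Send a request shaped like a GitHub delivery and return the status code."""
    body = json.dumps(payload).encode("utf-8")
    signature = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-GitHub-Event": "push",
            "X-Hub-Signature-256": signature,
        },
    )
    # Bypass any proxy configured in the environment for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def run_functional_test() -> dict:
    """Start the stand-in API on a free port, send two mock deliveries, stop it."""
    server = HTTPServer(("127.0.0.1", 0), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/webhook"
    payload = {"repository": {"full_name": "example/repo"}}  # hypothetical shape
    try:
        return {
            "valid": post_mock_event(url, payload, SECRET),
            "forged": post_mock_event(url, payload, b"wrong-secret"),
        }
    finally:
        server.shutdown()
        server.server_close()


print(run_functional_test())
```

Each call is a pass-or-fail check of one complete behaviour: a correctly signed delivery should be accepted and a forged one refused, without involving the front-end.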

Some other methods also looked interesting to me:

Load testing ⁹
- Load testing is where you test how the application performs under a specific load.
Performance testing ⁹
- Performance testing is where you measure the responsiveness of an application with different amounts of users.

When searching for tools to help me with these types of testing I very quickly stumbled upon Postman. I can run performance and functional tests with ease through Postman. I can technically run load tests as well, but Postman is limited to 100 VUs (virtual users). I have run performance tests at 100 VUs and the results were not as accurate as I had hoped: I was using ngrok, which has a request limit, causing requests to return 429 Too Many Requests. This was only an issue because ngrok is meant for development and not for production. Besides that, the API performed really well, with an average response time of around 60 ms¹⁰.

Source list

Edited Aug 15, 2023 by Tim Verkleij