The term webhook, first used by Jeff Lindsay in 2007, is an extension of the term hook, a mechanism for altering the behavior of a system. Webhooks are user-defined callbacks that allow a third-party application to alter the data or behavior of a website. They are typically not built or maintained by the developers who use them, so they have to be able to interact with a variety of systems.
Webhooks are used for a variety of functions. Most commonly they are used in continuous integration systems to trigger builds and releases, and in tracking systems to notify developers of bugs before they become a problem. A common website use is in payment processing, so that the client does not have to worry about personally identifiable information or financial security; those are the responsibility of the third party providing the webhook.
When triggered, the client will send an HTTP request to the webhook’s URL, where the service will act on the information and call back into the client’s system with a return object. Events like submitting an order request or accepting a pull request into the main branch are typical triggers for webhooks. The URL, triggering events, and sometimes the return objects are configurable when setting up the webhook.
At some point in your career you are likely to find yourself building or maintaining a webhook; if not, you will definitely be consuming them. Understanding the basics as well as some of the best practices around them will help you to navigate what could otherwise be a confusing world of HTTP requests and callback logic. Use the information here to better understand how to build and work with webhooks. It is not comprehensive, but rather a starting place that shows you where to look in order to dive deeper.
User Interface for Setting Up Webhooks
Each distinct event that your system emits needs to be able to call a different webhook, since users may want it to call into a different system. Users need to be able to configure each environment (dev, test, uat, prod) individually. Developers do not need access to production, nor should you want access as a developer.
Your calls need to have an authentication mechanism, such as a key pair, to prove that you are who you say you are. The credentials need to be different between environments. Once the webhook is set up, the user interface needs to automatically test it to check that it is actually working.
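As a rough illustration of authenticating calls per environment, here is a minimal sketch. It swaps in a shared-secret HMAC signature rather than a true key pair, because that is simpler to show with the standard library. The secrets, event name, and function names are all hypothetical.

```python
import hashlib
import hmac
import json


def sign_payload(secret: bytes, body: bytes) -> str:
    """Return a hex HMAC-SHA256 signature of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, signature: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign_payload(secret, body)
    return hmac.compare_digest(expected, signature)


# Hypothetical secrets for illustration; each environment gets its own.
SECRETS = {"dev": b"dev-secret", "prod": b"prod-secret"}

body = json.dumps({"event": "order.submitted", "order_id": 42}).encode()
signature = sign_payload(SECRETS["dev"], body)
```

The sender would typically put the signature in a request header, and the receiver would recompute it over the raw body. A signature made with the dev secret does not verify against the prod secret, which is the point of keeping them separate.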
Testing Mechanism for Developers
Developers need to be able to shape the data and send it through the process for each webhook available. Make sure to include the ability to test error conditions. That makes manual testing easier for the developer while setting up. The developer also needs to be able to see how their system responded and in what kind of time period it did so.
Special characters and things like dates need to be rich input so that developers can see how the system will handle them based on what they send in via JSON. Constraints that are on the outgoing payload also need to apply to the testing and incoming payload.
Understand Return Object Shape
Be specific about what is expected in the return, especially HTTP status codes, so that consumers know what to expect when using the webhook. The return payload needs to be in the HTTP body, not in headers or other weird areas. Make sure the testing interface highlights it when this is done incorrectly.
Provide examples or sample return objects so that consumers can build for what they will receive before they begin consuming. Be lenient in what you accept, but be specific about the formatting of known types. Make sure that the return error codes are understandable and easy to find in the documentation. Allow developers to manually validate their return payload by copying it in, in case they can’t open up the firewall for testing.
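The manual validation described above could look something like the following sketch. The required fields and their types are a made-up contract for illustration only. The idea is to report every problem in plain language while ignoring fields the contract does not know about.

```python
import json

# Hypothetical contract: the fields a consumer must return, with their types.
REQUIRED_FIELDS = {"status": str, "order_id": int}


def validate_return_payload(raw: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"body is not valid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["body must be a JSON object"]
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing required field '{name}'")
        elif not isinstance(payload[name], expected_type):
            problems.append(f"field '{name}' must be of type {expected_type.__name__}")
    # Unknown extra fields are deliberately ignored: lenient in what is accepted.
    return problems
```

A developer could paste a response body into a form backed by a function like this and see the list of problems without your system ever calling into theirs.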
Follow the Guidelines as if Your Client Implemented the Protocol (REST)
Use correct HTTP verbs for the action you are intending to take. Most of the time webhooks are updating a system or object, so they will be a PUT and may occasionally be a POST. You will not see a GET or DELETE as often.
Make sure to follow the other rules of REST such as authentication protocols and query string parameters. You will need to have an agreed-upon set of query string parameters for GETs. For the other HTTP verbs, the parameters will be placeholders in the path.
Avoid overloading the consumer’s server by having a form of rate limiting on your calls. This isn’t REST as much as it is good manners. You would expect someone you are calling to rate limit you so when calling out you should rate limit yourself.
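One common way to rate limit your own outgoing calls is a token bucket. The sketch below is one possible approach, not a prescribed one. The class name and parameters are assumptions, and the clock is injectable so the behavior can be tested without sleeping.

```python
import time


class TokenBucket:
    """Refill `rate` tokens per second, allowing bursts of up to `capacity` calls."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller should wait."""
        now = self.clock()
        # Refill in proportion to the time elapsed, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Keeping one bucket per consumer means a burst of events for one consumer does not use up the allowance of another.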
Log Request Process
The client needs to be able to see what the results of the webhook were on your end, including any errors. This section should also vividly display any problems with the payload coming back, so that they can be dealt with by the dev.
The validation system in development needs to be the same one that you use in production so that development will be able to address issues that may come up in the production environment. Provide at least some basic statistics for usage and any problems that may arise using your system or calling back into their system. This will allow the consumer to troubleshoot sets of problems without having to crawl through logs of individual errors.
Separate Testing Environment
The authentication must be separate between production, development, and test/uat. This will prevent data leakage between environments, especially dev and prod. Endpoints will vary between environments. The host names will likely be the only part that varies whereas your paths should stay the same between environments.
Give the consumer the ability to change their paths and structure of the URLs in development without changing them in other environments. Developers can’t always open a port on the firewall for you to call into, so you may also want to represent (dev-only) payloads in some other usable format, such as one understood by Postman, so that they can test locally without getting an actual call from you.
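To make the separation between environments concrete, here is a small configuration sketch. The host names, credential names, and path are invented for illustration. Each environment has its own host and its own credential, while the path defaults to the same value everywhere and can be overridden, for example in development.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WebhookTarget:
    host: str
    secret_name: str  # name of a credential in a secret store, not the secret itself


# Hypothetical hosts and credential names, one entry per environment.
TARGETS = {
    "dev": WebhookTarget("hooks.dev.example.com", "webhook-key-dev"),
    "uat": WebhookTarget("hooks.uat.example.com", "webhook-key-uat"),
    "prod": WebhookTarget("hooks.example.com", "webhook-key-prod"),
}

ORDER_PATH = "/webhooks/orders"  # default path, shared by every environment


def webhook_url(environment: str, path: str = ORDER_PATH) -> str:
    """Build the URL for an environment; only the host differs by default."""
    target = TARGETS[environment]
    return f"https://{target.host}{path}"
```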
Retry Process with Idempotency
An idempotent method is one in which calling it more than once leaves the system in the same state as calling it once. A method that sets a value equal to a number is idempotent whereas a method that increments that value is not. Have a built-in retry process for calling the client because servers may not always be available or may be going through an update when you call into them.
The retries should not happen at a fixed interval; instead, the wait between attempts should increase with each failure. This gives the call a chance to go through without overwhelming the server when it comes back up. If the result of a webhook call initiates an action on your system that changes data, that action must be idempotent. You cannot ensure that it will only be called once in a scalable environment.
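The two ideas above can be sketched together. This example assumes exponential backoff with optional jitter as the increasing schedule, which is one common choice and not the only one. It also assumes each delivery carries a unique ID that the receiver can use to ignore duplicates. All names and default values are hypothetical.

```python
import random


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 300.0, jitter: bool = True):
    """Yield one delay in seconds per retry attempt, doubling each time up to `cap`."""
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        # Optional "full jitter": pick a random delay up to the computed bound so
        # that many senders do not all retry at the same moment.
        yield random.uniform(0, delay) if jitter else delay


class OrderStore:
    """Toy receiver that applies each delivery at most once, keyed by a delivery ID."""

    def __init__(self):
        self.seen = set()
        self.total = 0

    def apply(self, delivery_id: str, amount: int) -> None:
        if delivery_id in self.seen:
            return  # duplicate delivery from a retry: ignore it
        self.seen.add(delivery_id)
        self.total += amount
```

With jitter turned off, five attempts wait 1, 2, 4, 8, and 16 seconds. Applying the same delivery twice to `OrderStore` changes the total only once, which is what makes retries safe.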
Refresh Mechanism for Entities
Things will get out of sync when the retry process fails. There needs to be a manual way to resync data. While it is possible to do this with a GET, it is better to do this as a POST that causes a hook to fire at some point in the future. This allows you to control the way in which your system scales, versus having your client control it.
This also helps with rate limiting. It’s not just about keeping your client from shooting you in the foot, but about keeping them from loading the gun that you use to shoot THEM in the foot. Your least stable clients will be the heaviest users of this part of the system, so the architecture needs to have a bias towards keeping them from inserting instability into your stuff.
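A refresh endpoint that queues work, rather than doing it inline, might be structured like the sketch below. It is an in-memory stand-in for whatever durable queue a real system would use, and the class and method names are assumptions. Repeated requests for the same entity collapse into one, and your side decides how many hooks fire per batch.

```python
from collections import deque


class RefreshQueue:
    """Collect resync requests and let the sender drain them at its own pace."""

    def __init__(self):
        self._pending = deque()
        self._queued = set()

    def request_refresh(self, entity_id: str) -> bool:
        """Handle a refresh POST; return False if the entity is already queued."""
        if entity_id in self._queued:
            return False
        self._queued.add(entity_id)
        self._pending.append(entity_id)
        return True

    def drain(self, limit: int) -> list[str]:
        """Return up to `limit` entity IDs whose hooks should fire now."""
        batch = []
        while self._pending and len(batch) < limit:
            entity_id = self._pending.popleft()
            self._queued.discard(entity_id)
            batch.append(entity_id)
        return batch
```

A client that hammers the refresh endpoint for one entity therefore costs you a single queued hook, not one per request.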
Versioning and Deprecation Policy
Always plan what is in a version, and keep the interfaces stable. The typical agile process may have to be abandoned at the edges, but you cannot subject a client’s production environment to a webhook that is constantly changing. This includes stuff that you haven’t considered, such as possible new error conditions, adding properties, changing the format of properties, etc. Even a tiny change can break someone else’s production.
This also means that your client should be able to use different versions in different environments. They have to be able to test. You also need to be sane in your deprecation policy, with a long tail of deprecation before you remove the ability to use an old version. Some people’s systems simply can’t change very quickly, for a variety of reasons.
You need correlation identifiers. In other words, if there is a multi-step process, the client needs to be able to trace the same request all the way through. It’s often better if the client can set the correlation identifier themselves, rather than allowing you to set it. It will often be a part of a database key, so they need to be in control of structure.
These correlation identifiers should also be exposed in any logs and analytical outputs that may require them. The client will likely need them to track down particular use cases that are causing them problems. It goes without saying (right?) that if it is going into logs and transiting the wire a lot, the correlation identifier shouldn’t be PII, PCI data, etc.
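As a sketch of how a correlation identifier might travel with a request, the following wraps a payload in an envelope and logs each step under the same ID. The envelope layout, logger name, and function names are hypothetical. A client-supplied ID is preferred, with a random UUID as a fallback since it carries no personal data.

```python
import logging
import uuid
from typing import Optional

logger = logging.getLogger("webhooks")


def build_delivery(event: str, payload: dict, correlation_id: Optional[str] = None) -> dict:
    """Wrap a payload in an envelope carrying a correlation ID."""
    # Prefer the client's own ID; fall back to a random, non-identifying UUID.
    cid = correlation_id or str(uuid.uuid4())
    logger.info("sending event=%s correlation_id=%s", event, cid)
    return {"event": event, "correlation_id": cid, "data": payload}


def handle_response(delivery: dict, status_code: int) -> str:
    """Log the outcome under the same correlation ID and return the log line."""
    line = (
        f"event={delivery['event']} "
        f"correlation_id={delivery['correlation_id']} status={status_code}"
    )
    logger.info(line)
    return line
```

Because the same ID appears in both log lines, the client can search for it and follow one request through every step.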
Your webhook code should be forgiving of errors sent by the client. While you should specify what potential errors the client may emit, your code must be tolerant of whatever comes out from the client. Assume that people don’t read documentation (or that your docs will become inaccurate, because these things will happen concurrently).
Also be aware of the impact of soft errors on your system (such as calls that simply become dog-slow for some unknown reason). These poorly behaved calls can mean that a badly behaving system at one client can hurt performance for other clients. You may want to structure things so they only hurt themselves.
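One way to keep a misbehaving client from hurting others is to track failures and slow calls per client and pause delivery to the offenders. The sketch below is a simplified circuit breaker under assumed thresholds, with hypothetical names throughout. A real one would also need a cooldown that lets a paused client be tried again.

```python
from collections import defaultdict


class ClientBreaker:
    """Track consecutive slow or failed calls per client and pause the worst offenders."""

    def __init__(self, max_failures: int = 3, slow_after: float = 5.0):
        self.max_failures = max_failures
        self.slow_after = slow_after  # seconds before a call counts as a soft error
        self.failures = defaultdict(int)

    def record(self, client_id: str, ok: bool, elapsed: float) -> None:
        """Record the outcome of one call to a client."""
        if ok and elapsed <= self.slow_after:
            self.failures[client_id] = 0  # a healthy call resets the count
        else:
            self.failures[client_id] += 1

    def is_paused(self, client_id: str) -> bool:
        """Return True once a client has reached the consecutive-failure limit."""
        return self.failures[client_id] >= self.max_failures
```

A call that succeeds but takes too long counts against the client in the same way as an outright failure, which covers the soft errors described above.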
Pay attention to what happens when your system is slow. If a message is a day late, there is a vast difference between something that requires quick turnaround and something that requires immediate turnaround – you need to know which is which. This may vary between webhooks in the same system.