API Best Practices
Designing a web API (or Application Programming Interface) that lives on a webserver can be very difficult. Not only must you be concerned with security, but also with working well with clients that may be written in any number of other programming languages, running on who-knows-what platforms, possibly with slow or intermittent connectivity. You’ll also have to future-proof your API in such a way that you can safely upgrade it as you discover new things that it needs to do. In addition, you want to control how other parties use your API, so that they can’t damage the system. You also need to consider how you will handle things like errors, outages, and upgrading the system. All in all, there is a lot of stuff you need to consider in order to successfully build a new API that other people will use and that you can actually support.
“There’s all sorts of things that can go wrong.”
Rolling out an API is easy. Theoretically, you just stand up an endpoint and have your clients call it. Nothing could be simpler. The problems come in when you have to maintain it, when someone has to use it, and when you have to support it. It’s also possible to break systems, including your own, with a bad API. Many managers are not aware of all the risks of simply kicking out an API and walking away from it. This is the sort of thing that a manager can push and then get promoted for, while two years later, you get blamed. This is not a position you want to be in, especially not with large clients or partners in the mix.
There’s a lot to consider when building an API. While we frequently try to simplify the process down to where it feels the same as making a simple library for our own use, this approach really doesn’t get us where we need to be. Instead, it’s better to embrace the complexity and to realize that building an API is a bit bigger than a lot of typical development experiences and adjust accordingly.
Episode Breakdown
09:40 Versioning
Trying to retrofit versioning on an API is a pill. Bear in mind that if you require something new, return something new, or put things in a different shape, congratulations, you’ve incremented the version. Small changes can break other people’s apps. Note that you can usually add functionality, but not change or remove functionality without problems.
“Versioning stuff while it’s live is really hard to get right.”
If you didn’t plan ahead, different parts of the app can end up with widely different versions. This is a good place for waterfall development, at least as far as your interfaces. Versioning goes badly when done without proper planning. It can also turn documentation and support into a really nasty piece of work, so it’s important to get this right so you don’t drastically increase the costs to the company.
“This is a really good place for waterfall development.”
There are a few ways to handle this on the web. Most approaches boil down to having separate endpoints for different versions of the API. So you would do something like /api/v1 for version 1 and /api/v2 for version 2. Above all, this needs to be consistent between APIs. There are also alternative options instead of using separate endpoints, directories, or domains for version incrementing, but these make it harder to tell which version of the API is being used if you aren’t careful.
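As a rough illustration of the separate-endpoints approach, here is a minimal Python sketch that routes /api/v1 and /api/v2 paths to different handlers. The "users" resource, the handler names, and the response shapes are made up for the example, and a real service would use its web framework's router instead of a hand-rolled one.

```python
from typing import Callable, Dict, Tuple

# Hypothetical handlers. Each version keeps its own response shape.
def get_user_v1(user_id: str) -> dict:
    return {"id": user_id, "name": "Ada Lovelace"}

def get_user_v2(user_id: str) -> dict:
    # The name field changed shape, so this had to become a new version.
    return {"id": user_id, "name": {"first": "Ada", "last": "Lovelace"}}

# (version, resource) -> handler. Old versions stay registered until retired.
ROUTES: Dict[Tuple[str, str], Callable[[str], dict]] = {
    ("v1", "users"): get_user_v1,
    ("v2", "users"): get_user_v2,
}

def dispatch(path: str) -> Tuple[int, dict]:
    """Route a path like /api/v1/users/42 to the handler for that version."""
    parts = path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "api":
        return 404, {"error": "not found"}
    _, version, resource, item_id = parts
    handler = ROUTES.get((version, resource))
    if handler is None:
        return 404, {"error": f"unknown version or resource: {version}/{resource}"}
    return 200, handler(item_id)
```

Because the version is part of the path, a v1 client keeps getting the old shape from dispatch("/api/v1/users/42") while v2 clients get the new one, and it is always obvious from the URL which version is in use.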
Another issue is present here as well, and that is how you deal with beta versions and sunsetted/deprecated versions. Usually with beta versions, you advise users not to use them in production, but you still need to be careful. If the user is a client with a large account, you can get stuck with an interface you don’t want because you put it in the beta. Be careful. You also need to have a reasonable strategy for promoting a beta to a real, live production system, including informing users about when the transition will occur, and when the old one will go away. Similarly, you need to have a strategy for informing callers before an older version is removed. You need to do this some time in advance. Again, if a big client is using it and isn’t moving any time soon, you may be stuck supporting it.
21:24 Error Handling
Your application will have errors at some point. Error conditions and messages are part of the interface specification and as such, they need to be designed with it. It goes without saying that your code should be very defensive.
“All our stuff is internal so the developer on the other end is on the same team as me.”
Communicating errors clearly is key here. The error message should clearly express the problem in terms that allow the consumer to decide what to do. Note that the consumer program can still change the message they display, but you need to make it clear to the developer on the other end what is wrong. Note that you may need to provide the error message in multiple languages (human or computer) along with enough diagnostics to help them troubleshoot the issue.
Here are some thoughts on how to mitigate these problems. Use error numbers for known conditions. Look up the error number in a table for the actual error message you give to the client. If you do this, you can also make the error message a template with information specific to the client. Don’t give them a stack trace, but give them an identifier for the particular instance of the error, so you can keep a stack trace that you can look up. Be careful about what you reveal in an error message, as hack attempts will get lots of error messages in many cases. Also don’t forget to watch for the usual nasty web shenanigans.
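The error-table idea can be sketched in a few lines of Python. The error numbers, the message templates, and the build_error helper are all assumptions made for this example, not anything from a particular framework. The stack trace goes to the server log under an incident identifier, and the client only ever sees the number, the message, and that identifier.

```python
import logging
import traceback
import uuid
from typing import Optional

logger = logging.getLogger("api.errors")

# Hypothetical error table: error number -> message template.
ERROR_TABLE = {
    1001: "The field '{field}' is required.",
    1002: "Account '{account}' has exceeded its request limit.",
    9999: "An unexpected error occurred.",
}
UNKNOWN_ERROR = 9999

def build_error(code: int, exc: Optional[Exception] = None, **details: str) -> dict:
    """Build the client-facing error body and log the trace server-side."""
    incident_id = str(uuid.uuid4())
    if exc is not None:
        # The stack trace stays in our logs, keyed by the incident identifier.
        trace = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
        logger.error("incident=%s code=%s\n%s", incident_id, code, trace)
    if code not in ERROR_TABLE:
        # Unknown conditions collapse to a generic message that reveals nothing.
        code, details = UNKNOWN_ERROR, {}
    return {
        "code": code,
        "message": ERROR_TABLE[code].format(**details),
        "incident_id": incident_id,
    }
```

When a client reports a problem, they quote the incident_id and support looks up the matching log entry. Note that this sketch raises a KeyError if a template is used without its details, which a real implementation would want to guard against.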
25:40 Rate Limiting
Given a big enough API someone will misuse it, whether intentionally or not. This can be anything from a hack attempt, to someone trying to bulk load data into the system. If there is a cost in servicing an API request and users are paying a fixed amount for using the API, then you have to rate limit. You should build this in from the beginning, as it can be difficult to redesign a system to handle rate limiting, especially if you are dealing with the kind of problems that happen when you don’t have it in place.
“Management doesn’t think about stuff like this, this is your job.”
Some users will also try to abuse the system. Having a great API is not really enough. With a large enough population of users, someone will eventually attempt bad things using your service.
There are a few ways to handle this. The gist of it is that you need to limit requests to no more than a certain number within a given time period from a given account. You will almost certainly need to set these limits on an account by account basis. Someone with a starter account shouldn’t get the same limits as an enterprise account. Also, when someone asks to raise the rate limit, you need to find out why. In many cases, you’ll be better off keeping the rate limit in place and adding bulk processing functionality.
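One simple way to express "no more than a certain number within a given time period from a given account" is a fixed-window counter, sketched below in Python. The plan names come from the discussion above, but the numbers, the RateLimiter class, and the in-memory storage are assumptions for the example. A production system would likely keep counters in a shared store and might prefer a sliding window or token bucket.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Assumed per-plan limits, in requests per window.
PLAN_LIMITS = {"starter": 60, "enterprise": 6000}

class RateLimiter:
    """Fixed-window request counter, tracked per account."""

    def __init__(
        self,
        limits: Optional[Dict[str, int]] = None,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limits = limits if limits is not None else PLAN_LIMITS
        self.window = window_seconds
        self.clock = clock  # injectable so tests can control time
        # account id -> (window index, requests seen in that window)
        self._counters: Dict[str, Tuple[int, int]] = {}

    def allow(self, account_id: str, plan: str) -> bool:
        """Return True if the request fits within the account's limit."""
        current = int(self.clock() // self.window)
        index, count = self._counters.get(account_id, (current, 0))
        if index != current:
            index, count = current, 0  # a new window started, reset the count
        if count >= self.limits[plan]:
            self._counters[account_id] = (index, count)
            return False
        self._counters[account_id] = (index, count + 1)
        return True
```

A request handler would call limiter.allow(account_id, plan) first and reject the request (for HTTP, typically with a 429 status) when it returns False.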
29:15 Asynchronous and Batch Processes
Long-running processes in an API context add some difficulties. Clients will tend to want to check to see if the process is finished (or get results). Many clients don’t consider what happens when they (and everybody else) call your API too frequently, which can knock a server down in no time. Clients may also need something processed within a certain time frame under a service level agreement (SLA), which can add complexity. An SLA is an agreement between a provider of a service and the client. This is basically the contract of what is and is not provided to the client. Many clients may try to kick off processes within the same time frame, while off-peak periods can be slow.
You may also not want to run processes on an ad-hoc basis in general because of rate limiting on services that you use. This may mean that you end up running batches, or queuing up work. This can get really tricky if you find yourself under load and have aggressive SLAs.
“You reach into your wallet and you find those green pictures of Benjamin Franklin and you start throwing them.”
There are several options here, depending on the problem. You should prefer web hook setups over allowing clients to poll for status updates. This sets it up so you tell the clients when something is done, rather than them asking you. You may also need to look into automatically provisioning additional servers to run jobs as load increases so that you can meet SLAs. If you are automatically provisioning cloud servers for extra load, you have to be careful that you don’t over-provision them and that you can quickly decommission them. The above approach generally means that you are going to have to think carefully about how you deploy your software. It also goes without saying that you need to think about how to scale your databases in response to higher load.
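To make the webhook-over-polling idea concrete, here is a small Python sketch of a job queue that calls the client back when each job finishes. The Job and JobRunner names, the payload fields, and the injected send function are assumptions for the example. In a real service send would be an HTTP POST to the client's registered URL, with retries and some form of authentication.

```python
import json
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Job:
    job_id: str
    callback_url: str  # where the client asked to be notified
    payload: dict
    status: str = "queued"

class JobRunner:
    """Queues work and notifies the client on completion."""

    def __init__(self, send: Callable[[str, str], None]) -> None:
        # send(url, body) stands in for an HTTP POST to the client's webhook.
        self._send = send
        self._queue: List[Job] = []

    def submit(self, job_id: str, callback_url: str, payload: dict) -> Job:
        job = Job(job_id, callback_url, payload)
        self._queue.append(job)
        return job

    def run_pending(self, work: Callable[[dict], dict]) -> None:
        """Run queued jobs in order and push each outcome to its callback."""
        while self._queue:
            job = self._queue.pop(0)
            try:
                result = work(job.payload)
                job.status = "succeeded"
                body = {"job_id": job.job_id, "status": job.status, "result": result}
            except Exception:
                job.status = "failed"
                body = {"job_id": job.job_id, "status": job.status}
            self._send(job.callback_url, json.dumps(body))
```

The client submits a job once and then waits for the callback, so nobody is hammering a status endpoint every few seconds.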
34:25 Developer API Playgrounds
When on-boarding a new client into using your API, you probably don’t want them using a production environment. Users testing out the API need to be able to quickly get to the point where they can get something working. They also need to be controlled so that they don’t damage production environments. They also need a reliable way to simulate known error conditions so that they can effectively test. The transition between the development sandbox and a production environment needs to be straightforward.
“They’re going to need to know what errors you are going to throw.”
Don’t put the production environment on a big server cluster and put the development playground on a laptop somewhere with 8 gigs of RAM and spotty internet. Don’t allow the development system to get drastically behind the production system. Don’t take down or change the development system without clear messaging to the clients. Don’t ever change interfaces between development and production, or you’ll be forcing them to test in production.
The developer sandbox should set the developers using your system up for success. Changing from one environment to the other should require only a change in endpoint and a change in keys. You should have known inputs to your development system that reliably simulate common errors and system conditions that the developers are likely to experience. If providing webhooks to the client, your development API system should have some mechanism for triggering them. With webhooks, you may also need to provide samples of the payload (JSON, for instance) that will be sent into the webhook under various conditions so that developers can test without opening a hole in their firewall.
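A sketch of what "known inputs that reliably simulate errors" might look like, in Python. The hostnames, the key prefixes, the magic order IDs, and the error numbers are all invented for the example. The point is only that the sandbox answers certain documented inputs with a fixed error every time, and that switching environments touches nothing but the endpoint and the key.

```python
from typing import Dict, Tuple

# Assumed environment settings. Only the endpoint and the key differ.
ENVIRONMENTS: Dict[str, Dict[str, str]] = {
    "sandbox": {"base_url": "https://sandbox.api.example.com/api/v1", "key_prefix": "test_"},
    "production": {"base_url": "https://api.example.com/api/v1", "key_prefix": "live_"},
}

# Documented magic inputs: each one always produces the same error.
SANDBOX_TRIGGERS: Dict[str, Tuple[int, dict]] = {
    "test-not-found": (404, {"code": 1404, "message": "Order not found."}),
    "test-rate-limited": (429, {"code": 1002, "message": "Too many requests."}),
    "test-server-error": (500, {"code": 9999, "message": "An unexpected error occurred."}),
}

def sandbox_get_order(order_id: str) -> Tuple[int, dict]:
    """Sandbox-only handler: magic IDs simulate errors, anything else succeeds."""
    if order_id in SANDBOX_TRIGGERS:
        return SANDBOX_TRIGGERS[order_id]
    return 200, {"order_id": order_id, "status": "shipped"}
```

A client developer can then write a test that requests "test-rate-limited" and checks that their retry logic kicks in, without having to actually flood the API.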
39:45 API Approval
“You really don’t want to use an API that would have you as a client.”
You probably shouldn’t allow just anyone to set up an account and start using your API. This has a pretty obvious potential for abuse and makes it easier for abusers to just create a new account and continue. It also means a stray post on Slashdot, Hacker News, or wherever can instantly overwhelm your system through hundreds of new accounts, even if none of the individuals are abusive. You probably also want to know something about the people using your API. Or at least your marketing department does. None of this applies if you are Google, because you can handle auto-DDoSing yourself, can easily get information about clients, etc. For the rest of us, caution is required.
In the early days of an API, people signing up should be vetted manually. You might think you know where things will fail to scale. You might even be right. A restricted beta to a limited number of clients allows you to evolve the API based on feedback before everybody and their dog is sending HTTP requests your way.
“There’s realities to living on the internet.”
As an API matures, you can probably go to a more automated signup process. An automated signup should still have a little bit of a delay or the bots will play. You should also consider a probationary period for new accounts with tighter limits on behavior. Above all, don’t let the marketing and non-tech types entirely dictate the decision. They won’t be there when the servers fall over.
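The probationary period can be as simple as scaling down a new account's limits until it has been around for a while. Here is a minimal Python sketch. The 30-day period and the divide-by-ten rule are arbitrary numbers picked for the example, and effective_limit is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

PROBATION_PERIOD = timedelta(days=30)  # assumed length of probation
PROBATION_DIVISOR = 10                 # assumed: new accounts get a tenth of the limit

def effective_limit(plan_limit: int, created_at: datetime, now: datetime) -> int:
    """Return the request limit to enforce for an account at time `now`."""
    if now - created_at < PROBATION_PERIOD:
        # Never drop to zero, or a brand-new account could not make any calls.
        return max(1, plan_limit // PROBATION_DIVISOR)
    return plan_limit
```

For an account created on the first of the month with a plan limit of 600, this enforces 60 for the first 30 days and the full 600 afterwards.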
43:20 Multi-Language Support for Computers
Your potential clients probably aren’t going to be using the same programming language as you. Heterogeneous environments are the norm. Very few have the luxury of a single environment. This can restrict what kinds of things you are able to do. For instance, statically typed languages can be painful to use if your API returns data that isn’t always shaped the same way. Note that the above can be an issue with some languages, but not all. Other people’s platforms can be weird to call into as well. For instance, many platforms will throw errors if you attempt to post HTML to an endpoint because of cross-site scripting concerns.
Multiple languages can also make it tricky to give everybody a decent library that they can pull in with a package manager. Your library will stink if you have someone who isn’t fluent in a particular language attempt to build an SDK in it. However, your company probably can’t afford enough developers to cover the ten most common languages, much less the many less common ones.
“I know a guy that can write Clipper code in any language.”
You really need more tooling. The default stuff in your API editor is probably ok, but isn’t necessarily the best choice when having to support a lot of other languages. You should probably consider a setup like Swagger, which gives you a lot of tooling to help you design your API, make it work across lots of different platforms, and even see how it behaves. This tool can also help you standardize your API approach across teams, which will help you reduce the number of surprises that your clients experience in using your API.
47:05 Multi-Language Support for Humans
“English is actually the third most common language.”
It turns out that most people on the planet don’t use English as their main language. It also turns out that unless you spend a lot of effort attempting to disable it, people who don’t speak your language will be attempting to call/use your app. Should the number of users of a second language be high enough, your employer will probably want to be able to provide API interactions in more than one language. This can be pretty annoying if you’ve hard-coded a lot of strings or are building them dynamically based on the rules of your own language.
To fix this, you pretty much have to make sure that any client-facing strings in your API have the following characteristics. They are not hard coded in. Clients can specify the language, time zone, and other settings that change with internationalization, and you respect those settings. You don’t build up messages based upon the syntax of your language, nor do you rely on string replacement. This means error message text has to be fixed, with the details for a particular instance separated. Then you outsource the translation to someone else. Don’t let anyone make you do the translation yourself unless you are multilingual.
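Here is one way the "fixed text, details separated" rule could look in Python. The message catalog, the error number, the locale fallback order, and the localized_error helper are assumptions for the example, and the Spanish text is only illustrative. In practice the catalog would come from translation files, not from a dictionary in the source.

```python
from typing import Dict

DEFAULT_LOCALE = "en"

# Hypothetical catalog: error number -> locale -> fixed, pre-translated text.
MESSAGES: Dict[int, Dict[str, str]] = {
    1001: {
        "en": "A required field is missing.",
        "es": "Falta un campo obligatorio.",
    },
}

def localized_error(code: int, locale: str, **details: str) -> dict:
    """Pick the fixed message for the client's locale and attach details separately."""
    translations = MESSAGES[code]
    language = locale.split("-")[0].lower()  # "es-MX" falls back to "es"
    text = (
        translations.get(locale)
        or translations.get(language)
        or translations[DEFAULT_LOCALE]
    )
    # The specifics are structured data, never spliced into the sentence.
    return {"code": code, "message": text, "details": details}
```

Calling localized_error(1001, "es-MX", field="email") returns the Spanish sentence untouched, with the field name carried in details, so the translators never have to worry about where a variable lands in the grammar of their language.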
52:50 Communication with Clients
People that are relying on your API tend to expect your services to continue running. This means that you can’t just push updates and changes in the middle of when people expect things to be working, except in extreme circumstances. This also means that you have to give a lot of notice before you make changes that can cause people problems. Don’t be like Microsoft is with Windows Update and assume that your needs are the most important and that your clients are idiots who need to be forced to take an update right away.
“I’m seriously considering the next personal laptop I get will be a Mac.”
There are several things you have to get right here. You should only do non-critical maintenance in a planned outage window. Either that window recurs on a predictable schedule that is communicated to the client in advance, or you schedule it as needed and still warn people well in advance. When critical events happen, you may have to take things down in a hurry. You need to admit to the system being under emergency maintenance, both on Twitter and whatever other communication mechanism you use. This at least reassures users that the problem is being looked at.
You also need to make sure that you regularly contact your users and yes, this is kind of a development role. They need to know what’s coming up in the API, with actual useful examples. They also need to know about impending downtime, maintenance, etc. Most of these things are not properly marketing responsibilities, although marketing may try to take them over.
IoTease: Project
Ten best practices for securing the Internet of Things in your organization.
Security in the world of internet of things has become a big issue. It is even more of an issue at an organizational or enterprise level. This article, while a little over a year old, goes into detail on ten best practices for IoT security. These include understanding and mitigating risks of external devices and BYOD workplaces as well as ways to secure API endpoints for devices. One interesting practical tip is to move from device level access to individual identity access since devices are allowing for multiple users.
Tricks of the Trade
This isn’t quite a development thing, but I’m digging into a little more of the digital marketing stuff. I figure at least some of our audience is looking into doing the same. If you are, you may want to check out FIMP!. FIMP is the Free Internet Marketing Project and is a set of free courses on how to think about picking out marketing niches and the like in tech. If you are looking into getting into the marketing game or just trying to understand how it all fits together, then this stuff may help you out. It’s really top-notch content, especially for free. I think it’s better than a lot of stuff I’ve paid for.
Yesterday morning, a coworker came to my desk and said he was starting work on an API and wanted to know if I had any thoughts about best practices. I said “let me tell you about this great podcast I was listening to on the drive in this morning!”
A while later I heard two familiar neckbeardy voices coming from his PC speakers.
Congratulations on the new listener, and keep up the good work!