Optimizing Algorithm Efficiency
Podcast: Play in new window | Download (41.6MB) | Embed
Subscribe: Apple Podcasts | Spotify | Email | RSS | More
Typically, when someone uses the term efficiency, they are describing how much effort is required to achieve a result. The term is quantified and used for comparison to something else’s efficiency. The assertion that something is “efficient” without comparing it to something else isn’t particularly useful. There are numerous ways to quantify efficiency, but most are limited in that they don’t address how well something scales, or they provide insufficient mathematical rigor to justify the decision when there is risk or a lot of money at stake.
“Efficiency” is a weasel word that frequently means whatever the person saying it means for it to mean.
Premature optimization will really harm your efficiency as a developer, without providing any real benefit to the execution of your application. That doesn’t mean you never do, but you have to choose the places where your effort will not negatively impact your own efficiency. Be efficient when trying to be more efficient, as it were. Typically, people tend to care about efficiency when the application is using too many scarce resources, or when it is projected to do so. Usually, you worry about how an algorithm will scale within a given range of available resources.
Episode Breakdown
16:55 The Big O Notation
Way of measuring the growth rate of the worst case of performance for an algorithm. The worst case is usually the one that gets you an irate phone call from a customer, slows a server to a crawl, or initiates a chain of cascading failures. Essentially what the notation does is attempt to describe, mathematically the relationship between the load on an algorithm and the number of operations required to retrieve a result at worst case.
It’s also important to note that big O is a model for the worst case scenario under the following constraints:
1. Normal system operation. If your harddrive is starting to get bad sectors or your comcast connection is performing like AOL from the mid-90s, this doesn’t apply to you.
2. The growth rate is described as a mathematical equation. It should be understood that this equation is within a range and is not necessarily absolute to extremely large values. For instance, a hash table that has a constant lookup time may not be performant on extremely large datasets due to things like paging memory to disk, or hash collisions.
3. Similarly, as the dataset in use gets smaller, the overhead of initial setup of the data structure starts to take a larger slice of the total cost. It is a fixed cost, perhaps, but if it is large enough, it can outstrip the dynamic cost that is added atop it.
4. It’s also very hard to truly predict growth rates on many modern systems, especially in managed languages that automatically do garbage collection.
29:29 O(1) or Constant Time
Growth rates expressed in this fashion are exactly the same regardless of how large a set is. This is what you get when you access an array item by index. You are simply specifying the number of bytes to travel to get to the item in question. Larger jumps are not more expensive than smaller ones. This is why you’ll see things like hash sets.
31:52 O(N) or Linear Time
The required number of steps to achieve a particular result scales in a fashion that is directly related to the number of items in the list (or is some multiple thereof). This is the sort of scaling you would get if you loop through the array to find a particular element; the number of steps you have to take to find the item (in the worst case) would be if you had to iterate through the whole thing. The best case would be if it was the first item, but that’s not what we are measuring.
35:15 O(N2) or Square Time
Growth rates expressed in this fashion scale to the square of the number of items in the list. If you have a list of names and Id’s each with a child list of related Ids. If you want to find the names of those related to a particular name on the parent list you would first find that name, then get the list of Id’s from the child list. Next you need to look through the parent list again to get the names of those items based on their Id. This would be O(N2) because it’s O(N) * O(N).
38:30 O(2N) or Exponential Time
This is an exponential growth rate where each successive generation produces 2 N as much as the previous. If you start with 2 they produce 4, then 16, then 65,536. Growth rates expressed in this fashion are particularly insane. These are best avoided unless you have a really small dataset and are trying to brute force things.
41:10 O(log N) or Logarithmic Time
Using this in a search if you start in the middle and if the desired result is less go with the first half and more go with the second half. The number of entries matters, but because the set is splitting each time, it doesn’t increase linearly with the number of people.
44:48 Real Life Uses
This appears more in interviews than in actual development, at least from the perspective of quantifying efficiency. Being able to characterize how something uses more memory, CPU, storage, or network bandwidth as the application scales is absolutely critical. Having some mathematical understanding around it will keep you from just guessing.
Be very cautious in regards to optimizing at the wrong time. It’s more efficient to use a hash table for constant lookup time, and it will matter if you are doing it a lot. It won’t matter much if you are doing it just once.
Remember that any discussion of scaling that doesn’t include phase changes when system constraints are reached is not a particularly good one to rely on. When you scale up, you will have to adjust. Remember that theoretical physics is beautiful and that actual physics includes equipment failure and bad measurement.
IoTease: Project
Make your own smartwatch from an old cell phone
“The primary reason for this project is simply that I had an old cell phone laying around and wanted to find a creative way to repurpose it.”
I found this project on TinkerNut.com and like all the best projects this started out as someone trying to see if they could do it. It’s a tutorial on how to take an old cell phone and turn it into a relatively smart watch. Looks like a fun weekend project for anyone interested in tinkering around with some old tech.
Hardware
- Old Monochrome Cell Phone
- Arduino Uno
- Ribbon Cable
- Resistors( 1.8 kOhm(4), 3.3 kOhm(4), 10 kOhm)
- Diode
- Bluetooth Module
- Vibrating Motor
- Momentary Switch
- Slide Switch
- Micro USB Charging Board
- 3.7v Lithium Ion Battery
Tricks of the Trade
A short rant on mathematical rigor in programming specifically testing. Developers tend to test things for 1,000 or 10,000 iterations based simply on being a large number. This indicates a lack of analysis. Take time to learn statistics to know the number needed to get a confidence interval.