I wanted to go through the exercise of contributing to open source with a project of my own. After thinking about it for probably 15 minutes, I decided I wanted to try to build my own caching system in Java. Too bad I knew next to nothing about caching. I went off and did some research.
There are certain well-known algorithms that have become popular for implementing caches. Since a cache has a finite size (eventually you run out of memory or disk space), cache algorithms are used to manage it: they determine things like how long an item remains in the cache and what gets booted out when the cache reaches its maximum size. Wikipedia notes that the most efficient caching algorithm "would be to always discard the information that will not be needed for the longest time in the future". You need to take a look at the data you want to cache before deciding on a caching strategy. Do you need to support random access (access to the data is uniformly distributed) or sequential access (you're interested in large chunks of data at a time)? Is certain data accessed more often than other pieces of data?
Here are a couple of common algorithms:
- Least Recently Used (LRU) - the items that have gone the longest without being accessed get the boot first. This can be implemented by keeping a timestamp for each item in the cache, or by keeping the items ordered by access. Check out this simple LRU implementation.
- Least Frequently Used (LFU) - the items sitting in the cache that have been accessed the least are booted out first. This is implemented with a counter that tracks how often each item is accessed.
- First In First Out (FIFO) - the item that entered the cache first is the first to go when the cache is full. This can be easily implemented with a queue.
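To make the LRU idea concrete, here's a minimal sketch in Java built on `java.util.LinkedHashMap`, which can keep its entries in access order for exactly this purpose. The class and field names are my own, not from any particular library:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal LRU cache sketch. With accessOrder=true, LinkedHashMap
// moves an entry to the end of its internal list on every access,
// so the "eldest" entry is always the least recently used one.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    public LruCache(int maxSize) {
        // initial capacity 16, load factor 0.75, accessOrder = true
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once we grow past maxSize.
        return size() > maxSize;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // touch "a", so "b" is now least recently used
        cache.put("c", 3); // exceeds maxSize, evicts "b"
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

As a nice side effect, passing `false` (or nothing) for `accessOrder` keeps entries in insertion order instead, which turns the same class into a FIFO cache.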