I'm a big fan of hyperconvergence, as you might know. One of the important capabilities of hyperconverged products is the ability to deduplicate storage. In this article, one of the pioneers in deduplication, Atlantis Computing, explains what's important about data deduplication.
Understanding the available data reduction technologies, with their strengths and weaknesses, is a good place to start. But perhaps just as important is understanding your own data set. For example, hoping for a high level of data reduction on an already compressed file format is just that: a hope. At the same time, there is a widespread misconception that introducing data reduction technology will necessarily, or at least in most cases, impact application performance.
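One quick way to get a feel for your own data set is to compress a representative sample and compare sizes. Below is a minimal sketch in Python using the standard `zlib` module; the sample data and the idea of a ratio check are my own illustration, not something prescribed by the white paper:

```python
import os
import zlib

def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size / compressed size; a ratio near 1.0
    means the data is effectively incompressible."""
    if not data:
        return 1.0
    return len(data) / len(zlib.compress(data, level))

# Highly repetitive text compresses very well...
text = b"the quick brown fox " * 1000

# ...while high-entropy data (a stand-in for already-compressed
# formats like JPEG or ZIP) barely compresses at all.
random_like = os.urandom(20_000)

print(f"repetitive text: {compression_ratio(text):.1f}x")
print(f"random bytes:    {compression_ratio(random_like):.2f}x")
```

Running a check like this on samples of your real workload tells you whether compression at the storage layer has anything left to gain.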
Atlantis Computing has published a comprehensive white paper on the key data reduction technologies. You can find the paper here.
Here is an excerpt from the white paper that talks about the top ten questions you should ask yourself when choosing the right data reduction technology:
- Is my data set de-duplicable?
- Is my data set compressible? If so, is compression already turned on at the application layer?
- What's the granularity (page size) of deduplication? Smaller page sizes like 4K are likely to yield significantly better savings vs. 8K or 16K.
- Does the page size always stay granular or does it become fat (e.g. going to 32K or greater)?
- Is data reduction always inline, or does it sometimes fall back to a post-process mode?
- Is I/O offload a benefit of that particular data reduction technology?
- Does the data reduction technology improve my overall latency?
- Does the data reduction technology improve the life of my solid state storage?
- Will the data reduction technology make my storage management more efficient?
- Do storage data services benefit from the underlying data reduction technology?
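The granularity question can be explored empirically: chunk a sample of your data at different page sizes, fingerprint each chunk, and count how many are unique. This is a hedged sketch, not Atlantis's implementation; the synthetic data set and the use of SHA-256 as the block fingerprint are my own assumptions for illustration:

```python
import hashlib
import random

def dedup_ratio(data: bytes, block_size: int) -> float:
    """Logical blocks divided by unique blocks (higher = more savings)."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(b).digest() for b in blocks}
    return len(blocks) / len(unique)

# Synthetic volume: a random sequence of two distinct 4 KiB pages.
# Duplicates line up at 4K granularity but are increasingly broken
# apart as the page size grows.
random.seed(42)
a, b = b"\x00" * 4096, b"\xff" * 4096
data = b"".join(random.choice((a, b)) for _ in range(256))

for bs in (4096, 8192, 16384):
    print(f"{bs // 1024:>2}K pages: {dedup_ratio(data, bs):6.1f}x")
```

On this contrived data set the 4K page size finds far more duplicates than 8K or 16K, which is the effect the granularity question in the list above is probing for; real savings depend entirely on your actual data.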
Read more at the Atlantis Computing blog: https://community.atlantiscomputing.com/blog/Atlantis/December-2015/The-Top-10-Considerations-for-Data-Reduction