ILM Tiered Storage Management

Broadband, fast computers, cheap storage, and multi-megapixel color cameras and scanners have resulted in an explosion of large-content files. Ten years ago, the file data managed by document and content management systems was dominated by black-and-white TIFF images. The rule of thumb for bitonal TIFF images was that one compressed scanned page took about 30 KB. In that scenario, a 100 GB disk was probably more than adequate for most bitonal imaging applications, since it could hold on the order of three million such pages. But while black-and-white TIFF images are still pretty common, content like high-resolution color images, video, audio, and business documents with embedded multimedia is in even greater demand now. Enterprise Content Management (ECM) systems provide secure storage for any kind of content, and large “multi-media” content is no exception.

A single high-resolution color image will typically be multiple megabytes, and even modestly sized videos can occupy tens to hundreds of megabytes of storage. That kind of data could eat up a 100 GB disk pretty fast.

As data ages, it typically loses importance and is referenced less and less frequently. Keeping large amounts of old data on-line at a minimum complicates full-system backups, and it has a good chance of eventually slowing down or gumming up overall system performance.

That’s where the idea of Information Lifecycle Management (ILM) comes in. Much of the ILM concept centers on Records Management (RM): how long to retain documents, and when and how to dispose of or archive them. But in terms of storage and cost efficiency, ILM also has the notion that a piece of content might gradually migrate through different tiers of storage media. When content is new, very relevant, and accessed frequently, it is kept on the highest-performance storage media. As the content ages and becomes less relevant, it is migrated to media that is cheaper but slower.
Data could still be searchable and available on-line, but it may take longer to retrieve the content.

To realize a multi-tiered storage strategy in an ECM system, the system needs to be designed to support distributed storage across multiple volumes. This sounds like a simple requirement, but there are many systems that can’t handle it.

Many ECM systems are built around the concept of storing large binary content as database BLOBs. But relying on a central database to store all content leads to problems when the total size of the managed content starts to get big. Tiered storage is very hard to support in this architecture: identifying and offloading BLOBs for storage on other media may not even be possible.

In the Formtek system, we did not use BLOBs as the primary storage method for content. Formtek | Orion by default stores binary content outside the database that houses the document metadata, although it doesn’t completely exclude the use of BLOBs. In some situations BLOBs may make sense, for example for storing frequently displayed small thumbnail images.

The advantages of storing binary content external to the database are many. First, support for an ILM tiered storage policy is greatly simplified. Transaction processing is simplified. The system scales well and can be administered much more easily. And finally, random access within a file and file streaming, especially for video and audio, are much easier to accomplish.

Formtek can simultaneously manage data distributed across many devices by using the Formtek Storage Server component. The Storage Server is an agent that runs on the machine where the data is stored and that can exchange data with the main Orion server and/or directly with user machines accessing the content system applications. With this architecture, content files can be distributed across many devices, each of which is managed by a Storage Server agent.
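
To make the age-based tiering idea above a little more concrete, here is a minimal Python sketch. It is not Formtek's implementation. The tier names, the age thresholds, and the `ContentRecord`, `target_tier`, and `plan_migrations` names are all hypothetical, chosen only for illustration. The sketch assumes each document's metadata record carries the tier and path of its externally stored file, so a migration only needs to move the file and update that record.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tiers, ordered from fastest/most expensive to slowest/cheapest.
# Each entry is (tier name, minimum time since last access). Values are illustrative.
TIERS = [
    ("fast", timedelta(days=0)),
    ("nearline", timedelta(days=90)),
    ("archive", timedelta(days=365)),
]


@dataclass
class ContentRecord:
    """Metadata kept in the database; the binary file itself lives outside it."""
    doc_id: str
    last_accessed: datetime
    tier: str   # tier that currently holds the file
    path: str   # location of the file on that tier's volume


def target_tier(last_accessed: datetime, now: datetime) -> str:
    """Return the coldest tier whose age threshold the content has reached."""
    age = now - last_accessed
    chosen = TIERS[0][0]
    for name, min_age in TIERS:
        if age >= min_age:
            chosen = name
    return chosen


def plan_migrations(records, now):
    """List (doc_id, from_tier, to_tier) for every record that should move."""
    moves = []
    for rec in records:
        wanted = target_tier(rec.last_accessed, now)
        if wanted != rec.tier:
            moves.append((rec.doc_id, rec.tier, wanted))
    return moves
```

A scheduled job could call `plan_migrations` over the metadata records, copy each listed file to a volume on the new tier, and then update the record's `tier` and `path`. Because the search metadata stays in the database throughout, content remains findable on-line even after its file has moved to slower media.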
