Most of this data is created in an unstructured way: it is stored as an individual "file" rather than as a relational object carrying information on how it relates to the other data stored (for example, a record with your name could link to your age, job, income, etc). It's estimated that 80 per cent of all data is unstructured, which adds to the difficulty of managing it.
Organisations are struggling to manage big data

The International Data Corporation (IDC) states that the amount of data we have created, captured or replicated has exceeded available storage for the first time since 2007. In 2012, the total data held by mankind was ten times larger than it was five years earlier. Data storage requirements are growing at an exponential rate as more people generate and store data day to day – every photo on Facebook, every phone call and every email generates data that has to be stored. Forrester Research predicts:
- Most organisations will grow their data by 50 per cent in the next year;
- Corporate data will grow by 94 per cent;
- Database systems will grow by 97 per cent; and
- Backups will grow by 89 per cent.
The three challenges

There are three challenges with big data: storing, processing and managing. The first two have largely been addressed in the last five years through "scale-out" storage architectures (which store large amounts of data) and dedicated storage appliances (which have drastically improved processing). What is still missing is how to manage big data effectively through its lifecycle, and how to associate separate pieces of data to create useful information for the organisation.
How can big data be managed?

The majority of big data generated at the moment is either duplicated data or artificial data (data generated and appended to the original while it is processed). For a medium-sized organisation that generates a lot of data (a research company, for example), this could mean storing, backing up and processing petabytes of data that contain only a few hundred terabytes of unique data. The first step, therefore, is to reduce the data to its unique set, cutting the amount that has to be managed. Next, use the power of virtualisation technology: unique data is virtualised onto central storage so that multiple applications can use it without each needing to store its own copy. This allows smaller datasets to be stored on any vendor's storage device. With the data footprint reduced, data management is immediately improved in three key areas:
- Less time is required by applications to process data;
- Data can be better secured, since management is centralised while access is distributed; and
- Results of data analysis are more accurate since all copies of data are visible.
A smaller, virtualised footprint also:
- Allows big data to be backed up more efficiently;
- Makes recovery easier; and
- Makes the data more accessible, while freeing some of your IT team's resources to look at the strategic use of the data rather than constantly fire-fighting.
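The first step described above – reducing data to its unique set – is the idea behind content-addressed deduplication: hash each piece of data and keep only the first copy of each digest. A minimal Python sketch (the function name and sample data are hypothetical, and real deduplication appliances work on block or chunk level rather than whole files):

```python
import hashlib

def dedupe(chunks):
    """Keep one copy of each unique chunk, keyed by its SHA-256 digest."""
    store = {}
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # first copy wins; duplicates are skipped
    return store

# Three logical files, two of which are identical copies.
files = [b"quarterly results", b"quarterly results", b"raw sensor log"]
unique = dedupe(files)
print(len(files), "stored as", len(unique))  # 3 stored as 2
```

Applications can then reference data by digest rather than by copy, which is what lets multiple applications share one virtualised dataset instead of each holding its own duplicate.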