Most of it is created in an unstructured way: as an individual “file” rather than as a relational object that carries information on how it relates to the other data stored (for example, a record holding your name can carry information explaining how it relates to your age, job, income, etc).
It’s estimated that 80 per cent of all data is unstructured, adding to the problem of managing the data.
Organisations are struggling to manage big data
The International Data Corporation (IDC) states that the amount of data we’ve created, captured or replicated has exceeded available storage since 2007. In 2012, the total volume of data held by mankind is ten times what it was five years ago.
Data storage requirements are growing at an exponential rate as more people generate and store data on a day-to-day basis (every photo on Facebook, phone call, email, etc generates some data that has to be stored).
Forrester Research predicts:
- Most organisations will grow their data by 50 per cent in the next year;
- Corporate data will grow by 94 per cent;
- Database systems will grow by 97 per cent; and
- Backups will grow by 89 per cent.
This exponential growth means organisations need to find smarter management methods to sort and optimise their data. At present, most try to fight the problem by throwing more storage at it. For an organisation generating a large amount of data, however, this could mean buying new capacity every six months. This is not only expensive, but also forces the staff responsible for storage to focus on integrating new hardware and on the physical management of the data. Strategic initiatives on how to manage the data, and how to extract the most useful information from it, should be considered instead.
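To put rough numbers on this growth, the short sketch below works through the compound-growth arithmetic. It assumes the 50 per cent annual figure quoted above compounds smoothly through the year. The 100 TB and 150 TB figures, and the function names, are purely illustrative and do not come from any of the reports cited.

```python
import math


def doubling_time_years(annual_growth: float) -> float:
    """Years for a dataset to double at a compound annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


def months_until_full(used_tb: float, capacity_tb: float, annual_growth: float) -> float:
    """Months until data growing at a compound annual rate fills the capacity."""
    if used_tb >= capacity_tb:
        return 0.0
    years = math.log(capacity_tb / used_tb) / math.log(1.0 + annual_growth)
    return years * 12.0


if __name__ == "__main__":
    growth = 0.50  # 50 per cent per year, as in the forecast above
    print(f"Doubling time: {doubling_time_years(growth):.2f} years")
    # Hypothetical estate: 100 TB in use on 150 TB of purchased capacity.
    print(f"Months of headroom: {months_until_full(100.0, 150.0, growth):.1f}")
```

Under these assumptions the data doubles in well under two years, and capacity bought with 50 per cent headroom is used up within twelve months. That is why buying more disk only postpones the problem.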
The three challenges
There are three challenges with big data: storing, processing and managing. The first two have been largely addressed in the last five years with the advance of “scale-out” storage architectures (that store large amounts of data) and dedicated storage appliances (that have drastically improved processing).
What is missing is a way to manage big data effectively through its lifecycle, and to associate separate pieces of data to create useful information for the organisation.
How can big data be managed?
The majority of big data generated at the moment is either duplicated data or artificial data (data that is generated and added to the original data while it is processed). For a medium-sized organisation that generates a lot of data (a research company, for example), this could mean you are storing, backing up and processing petabytes of data to hold only a few hundred terabytes of unique data.
Therefore, the first step is to bring the data down to its unique set, reducing the amount of data to be managed. The next step is to use virtualisation technology: unique data must be virtualised onto central storage so that multiple applications can use it without needing to store more than one copy. This will allow you to store smaller datasets on any vendor’s storage device.
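As an illustration of these two steps, here is a minimal sketch of a single-copy store. It assumes duplicates are detected by hashing whole files with SHA-256, which is only one possible approach. The article does not prescribe a technique, and commercial products typically work on smaller blocks. The class name `SingleCopyStore` and the file names are hypothetical.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Identify content by its SHA-256 digest, so identical bytes share a key."""
    return hashlib.sha256(data).hexdigest()


class SingleCopyStore:
    """Keeps one physical copy of each unique blob behind many logical names."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}  # content key -> the single stored copy
        self._names: dict[str, str] = {}    # logical name -> content key

    def put(self, name: str, data: bytes) -> str:
        key = content_key(data)
        self._blobs.setdefault(key, data)   # store only if not already present
        self._names[name] = key
        return key

    def get(self, name: str) -> bytes:
        return self._blobs[self._names[name]]

    def logical_bytes(self) -> int:
        """Size the applications believe they hold, duplicates included."""
        return sum(len(self._blobs[key]) for key in self._names.values())

    def stored_bytes(self) -> int:
        """Size actually kept on central storage."""
        return sum(len(blob) for blob in self._blobs.values())


if __name__ == "__main__":
    store = SingleCopyStore()
    store.put("finance/report.csv", b"a,b\n1,2\n")
    store.put("research/report_copy.csv", b"a,b\n1,2\n")  # duplicate content
    store.put("research/notes.txt", b"unique notes")
    print(store.logical_bytes(), "logical bytes,", store.stored_bytes(), "stored")
```

Each application still sees its own file name, but the duplicate report is stored once. The gap between logical and stored bytes is, at a very small scale, the same gap as between the petabytes and the few hundred terabytes described above.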
Now that the data footprint is smaller, data management is immediately improved in three key areas:
- Less time is required by applications to process data;
- Data can be better secured, since management is centralised while access is distributed; and
- Results of data analysis are more accurate since all copies of data are visible.
Virtualisation of big data gives many additional benefits, from extra flexibility for your users and applications through to lower costs from not being locked into a specific vendor contract for your storage.
In conclusion, a well thought-out data management approach:
- Allows big data to be backed up more efficiently;
- Makes it easier to recover; and
- Makes the data more accessible, while freeing some of your IT team’s resources to look at the strategic use of the data rather than constantly fire-fighting.
David Barker is founder and technical director of 4D Data Centres, which he founded at age 14, and was a finalist for the Young Entrepreneur Award at the Growing Business Awards 2012.