The Concise Guide to Modernizing Your Legacy Data System.

Big Data is a term used to describe large structured and unstructured data sets that must be analyzed computationally to reveal patterns, relationships, and trends.

Introduction

Many organizations still run traditional relational database management systems (RDBMS) for legacy reasons, including in some of their critical business areas. These systems served well over the past decades, when data input channels were controlled. However, the growth of cloud systems and increased information inflow through social media channels have put tremendous stress on these traditional systems. The challenges are multifold:

  1. Speed of data influx 
  2. Sheer volume of data 
  3. Unstructured data 
  4. Types of data 
  5. Increased need for real-time data processing
 

The traditional data warehouse systems have been unable to efficiently handle these changes. A large number of customers have Teradata as the backbone of their data warehouse system and are now struggling to keep up with business demands.  The need to migrate to Big Data platforms has never been more critical.


Drivers of Data Migration

Drivers for moving from legacy systems like Teradata to Big Data platforms like Hadoop:

  • Modernization of infrastructure: The cost of maintaining legacy infrastructure escalates year over year, and legacy systems cannot scale to the large volumes of data that organizations want to store and process.

  • Changing business needs: Most companies leverage social media platforms to connect directly with their customer base. Information now travels across the world at a speed that can significantly boost or ruin an organization's reputation, so it is imperative that a company respond to external stimuli in real time. Existing systems, including Teradata, are not designed to process large volumes of data in real time. Some of these needs, such as streaming analytics, cannot be handled by traditional data warehouses.

  • Cost containment: Increasing the capacity on legacy systems like Teradata often increases the annual cost by millions of dollars. Most companies simply do not have the resources or the appetite to increase spending so heavily. This is one of the primary drivers for companies to look at moving away from Teradata and similar legacy systems. 

  • Need for speed of response to changing external stimuli: As mentioned earlier, the proliferation of social media and the huge volumes of data moving through multiple channels have made it necessary for organizations to respond in real time. Companies need to be nimble and able to change their strategy almost instantaneously, and their infrastructure must be able to support such quickly changing scenarios.



Process for Data Migration 

While Teradata implementations differ across companies in many ways, there are some steps that need to be addressed before embarking on any migration plan. These include:

  1. Understand the real drivers for the migration and the current pain points that need to be addressed.  
    Cost often seems to be the primary driver for most data migrations. However, the initial drivers are often business scenarios that cannot be supported using the existing infrastructure. A good analysis of current pain points and the company's business roadmap will give the IT team a clearer picture of what needs to change so that the infrastructure is ready for future business needs.

  2. Align with the company IT roadmap. 
    The IT roadmap is usually aligned to the business goals, so any migration effort must be fully aligned with it.

  3. Define detailed strategy for the migration. 
    Understanding all of the risks involved in migrating data is the first step in defining a strategy. Minimizing business impact, and eliminating it completely where possible, should be a major focus of any migration strategy.

  4. Consider options for cloud vs. on-premises, automation opportunities, etc.
    While cost is often the primary driver, any organization needs to consider the total cost of ownership and the maintainability of its infrastructure. Provisioning in the cloud offers almost unlimited scalability, often at a fraction of the cost of scaling in-house, which can consume significant staff time and money. In the cloud, many tasks can also be automated, which improves both speed and scalability.

  5. Complete a thorough impact analysis 
    A thorough impact analysis must be conducted for any migration to ensure there are no unforeseen issues. Both the users and the IT team need to contribute to this analysis. Pay special attention to edge cases, such as an annual report that is only run at the end of December. These can easily be overlooked when a data migration occurs in June. Starting with the user group usually leads to the best impact analysis.

  6. Create a detailed phase-wise execution and transition plan.  
    Once the impact analysis is fully understood, a detailed execution plan should be created. The plan should proceed in phases, with the least impactful area migrated first. This makes it possible to fine-tune the plan and to discover any additional risks early. The detailed plan should include:
    1. Solution architecture
    2. Governance, processes, user privileges, etc.
    3. Communication and training plan for users
    4. Detailed validation plan with well-defined test cases and outputs
    5. A period of running both systems in parallel until migration onto the new system is complete

The overhead of a parallel run is offset by the business benefit of avoiding the risk of a "big bang" approach. Yes, the data will be duplicated for some time. But utilities built into the plan can reduce this overhead significantly.
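During a parallel run, one common validation step is to reconcile each migrated table against its legacy counterpart. The following is a minimal sketch of that idea, not a prescribed tool: it compares row counts and an order-independent checksum. The function names (`table_fingerprint`, `reconcile`) and the sample rows are hypothetical, and in practice the rows would come from queries against the legacy warehouse and the new platform.

```python
import hashlib
from typing import Dict, Iterable, Sequence, Tuple


def table_fingerprint(rows: Iterable[Sequence]) -> Tuple[int, str]:
    """Return (row_count, checksum) where the checksum ignores row order."""
    count = 0
    combined = 0
    for row in rows:
        # Hash each row on its own, then add the digests modulo 2**256.
        # Addition is commutative, so the result does not depend on row order.
        encoded = "\x1f".join(repr(value) for value in row).encode("utf-8")
        digest = hashlib.sha256(encoded).digest()
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, f"{combined:064x}"


def reconcile(legacy_rows: Iterable[Sequence], new_rows: Iterable[Sequence]) -> Dict:
    """Compare a legacy table extract with its migrated copy."""
    legacy_count, legacy_sum = table_fingerprint(legacy_rows)
    new_count, new_sum = table_fingerprint(new_rows)
    return {
        "legacy_count": legacy_count,
        "new_count": new_count,
        "row_count_match": legacy_count == new_count,
        "checksum_match": legacy_sum == new_sum,
    }


# Hypothetical sample data: same rows, different order on the new platform.
legacy = [(1, "alice", 10.5), (2, "bob", 7.0)]
migrated = [(2, "bob", 7.0), (1, "alice", 10.5)]
print(reconcile(legacy, migrated))
```

This sketch assumes both systems return values with identical Python types. Real extracts usually need normalization first (for example, decimals, timestamps, and trailing whitespace), otherwise equal rows can produce different checksums.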
 
 


 

Best Practices 

Every organization should strive to ensure that all changes are as seamless as possible for business users.  

  1. ETL vs ELT 
    Traditional data warehouses complete the transformation of source data before inserting it into the warehouse (extract, transform, load, or ETL). This is a time-tested method that works very well for structured data. However, with the growth of unstructured data and unknown data types, and the speed at which data is generated in the source systems, it is virtually impossible to transform data before loading. Hence the newer method is ELT (extract, load, transform as needed). In a big data store, more commonly known as a data lake, most structures are defined when the data is queried.

  2. Data Lifecycle 
    Understand how the data is added to the data lake as there may be some modifications needed when adding to a big data platform like Hadoop. Given that the actual transformations will happen after the data is loaded, there is no need to pre-define a schema. Once the data is available, the query mechanism can be defined so that meaningful insights can be extracted and shared with key business stakeholders in a visually appealing format. 

  3. Reusable tools/utilities 
    The entire migration process is usually long and involves a significant amount of manual work. Using pre-existing tools or utilities can significantly reduce the implementation time and also reduce manual errors. Many providers have large libraries of these utilities, built from their previous work on large-scale migrations.
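The ELT approach from the first practice above, where structure is applied at query time rather than at load time, can be illustrated with a small sketch. This is a simplified, standard-library illustration under stated assumptions: the event records, field names, and function names (`load_raw`, `read_with_schema`, `revenue_by_channel`) are all hypothetical, and an in-memory list stands in for the data lake.

```python
import json
from typing import Dict, Iterator, List

# Hypothetical raw events, kept exactly as they arrived from the source.
RAW_EVENTS = [
    '{"user": "a1", "channel": "social", "amount": "19.99"}',
    '{"user": "b2", "channel": "web"}',
    '{"user": "c3", "channel": "social", "amount": "5.00", "campaign": "spring"}',
]


def load_raw(lines: List[str]) -> List[str]:
    """Load step: store records untouched; no schema is enforced here."""
    return list(lines)


def read_with_schema(raw: List[str]) -> Iterator[Dict]:
    """Transform step, applied only when the data is queried."""
    for line in raw:
        record = json.loads(line)
        yield {
            "user": record["user"],
            "channel": record.get("channel", "unknown"),
            # Amounts arrive as strings or are missing; cast and default on read.
            "amount": float(record.get("amount", 0.0)),
        }


def revenue_by_channel(raw: List[str]) -> Dict[str, float]:
    """Example query: total amount per channel, computed over the raw store."""
    totals: Dict[str, float] = {}
    for row in read_with_schema(raw):
        totals[row["channel"]] = totals.get(row["channel"], 0.0) + row["amount"]
    return totals


lake = load_raw(RAW_EVENTS)
print(revenue_by_channel(lake))
```

Because the raw records are stored unchanged, a field such as `campaign` that no current query uses is still available later. A new query can define its own view of the data without reloading anything.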
     



Conclusion

To determine the urgency of a possible data migration, companies with traditional data warehouse platforms need to understand the company's business plan for the next five years. Infrastructure needs should be determined by this plan, with cost as a second-tier consideration. A detailed analysis of the current challenges and the changing data storage needs will influence the migration decisions. But it is safe to assume that nearly every company will need to migrate its data to a more robust, scalable, and cost-effective solution at some point in the near future.