The Concise Guide To Modernizing Your Legacy Data System

Big Data refers to large sets of structured and unstructured data that must be analyzed computationally to reveal patterns, relationships, and trends.

Introduction

Make crucial business decisions

Many organizations still run traditional RDBMS systems for legacy reasons, including in some of their critical business areas. These systems served well over past decades, when data arrived through controlled input channels. However, the growth of Cloud systems and the increased inflow of information through social media channels have put tremendous stress on them. The challenges are many:

  1. Speed of data influx  
  2. Sheer volume of data  
  3. Unstructured data  
  4. Types of data  
  5. Increased need for real-time data processing

Traditional data warehouse systems have been unable to handle these changes efficiently. Many customers rely on Teradata as the backbone of their data warehouse and are now struggling to keep up with business demands. The need to migrate to Big Data platforms has never been more critical.

Drivers Of Data Migration

Drivers for moving from legacy systems like Teradata to Big Data platforms like Hadoop.

Process For Data Migration

While Teradata implementations differ across companies in many ways, a few key steps apply before embarking on any migration plan. These include:

  1. Understand the real drivers for the migration and the current pain points that need to be addressed.
    Cost often appears to be the primary driver for data migrations. However, the initial drivers are often business scenarios that the existing infrastructure cannot support. A good analysis of current pain points and the company’s business roadmap gives the IT team a clearer picture of what must change so that the infrastructure is ready for future business needs.
  2. Align with the company’s IT roadmap.
    The IT roadmap is usually aligned to the business goals, so any migration effort must be fully aligned with it as well.
  3. Define detailed strategy for the migration.
    Understanding all the risks involved in migrating data is the first step in defining a strategy. Minimizing business impact, and eliminating it entirely where possible, should be a central focus of any migration strategy.
  4. Consider options for Cloud vs. on premise, automation opportunities, etc.
    While cost is often the primary driver, any organization needs to consider the total cost of ownership and maintainability of its infrastructure. Provisioning on the Cloud offers near-unlimited, on-demand scalability, often at a fraction of the cost of scaling in-house, which can consume significant resources in both staff hours and budget. Cloud provisioning also makes many tasks easier to automate, which improves speed and scalability.
  5. Complete a thorough impact analysis.
    A thorough impact analysis must be conducted before any migration to surface issues that could otherwise go unforeseen. Both the users and the IT team need to contribute to this analysis. Pay special attention to edge cases, such as an annual report that only runs at the end of December; these are easily overlooked when a data migration happens in June. Starting with the user group usually leads to the best impact analysis.
  6. Create a detailed phase-wise execution and transition plan. 
    Once the impact analysis is fully understood, a detailed execution plan should be created. It should be phase-wise, migrating the least impactful area first. This allows the plan to be fine-tuned and any additional risks to be discovered early. The detailed plan should:
    1. Define solution architecture.
    2. Define governance, processes, user privileges, etc.
    3. Have a communication and training plan for users.
    4. Detail a validation plan with well-defined test cases and outputs.
    5. Run parallel systems for some time, until the migration onto the new system is complete.
The overhead of a parallel run is offset by the business benefit of avoiding the risk of a “big bang” cutover. Yes, the data will be duplicated for some time, but utilities built into the plan can reduce that overhead significantly.
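Part of the validation plan and parallel run described above can be checked mechanically. The Python sketch below compares an extract of the same table from the legacy system and from the new platform by row count and by an order-independent content digest. The names `table_fingerprint` and `reconcile` are hypothetical, not features of Teradata or Hadoop, and the sketch assumes both extracts arrive as sequences of rows whose values have already been normalized to render identically (for example, decimals formatted the same way in both systems).

```python
import hashlib
from typing import Iterable, Sequence

# Hypothetical row type: one extracted row, as returned by either system's client.
Row = Sequence[object]

_MODULUS = 2 ** 256  # SHA-256 digests are 256-bit integers


def table_fingerprint(rows: Iterable[Row]) -> tuple[int, str]:
    """Return (row_count, order-independent digest) for one table extract."""
    count = 0
    total = 0
    for row in rows:
        # Assumption: values are already normalized so both systems render them alike.
        # A unit-separator join and a NULL marker keep ("a", "") distinct from ("a", None).
        canonical = "\x1f".join("<NULL>" if v is None else str(v) for v in row)
        digest = hashlib.sha256(canonical.encode("utf-8")).digest()
        # Summing (mod 2**256) ignores row order but, unlike XOR, does not let
        # duplicate rows cancel each other out.
        total = (total + int.from_bytes(digest, "big")) % _MODULUS
        count += 1
    return count, format(total, "064x")


def reconcile(legacy_rows: Iterable[Row], new_rows: Iterable[Row]) -> dict[str, object]:
    """Compare the legacy extract with the migrated extract of the same table."""
    legacy_count, legacy_digest = table_fingerprint(legacy_rows)
    new_count, new_digest = table_fingerprint(new_rows)
    return {
        "legacy_count": legacy_count,
        "new_count": new_count,
        "counts_match": legacy_count == new_count,
        "content_match": legacy_digest == new_digest,
    }


if __name__ == "__main__":
    legacy = [(1, "Alice", "120.50"), (2, "Bob", None)]
    migrated = [(2, "Bob", None), (1, "Alice", "120.50")]  # same rows, new order
    print(reconcile(legacy, migrated))
```

In practice the row iterables would come from each system's query client, and a digest mismatch on matching counts would be the trigger for a row-level diff of that table.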

Best Practices 

Every organization should strive to ensure that all changes are as seamless as possible for business users.
  1. ETL vs. ELT
    Traditional data warehouses transform source data before inserting it into the warehouse (ETL: extract, transform, load). This time-tested method works very well for structured data. However, with the growth of unstructured data of unknown types, and the speed at which source systems generate it, it is often impractical to transform data before loading. Hence the newer approach is ELT (extract, load, transform as needed). In a big data store, more commonly known as a data lake, most structure is defined when the data is queried rather than when it is loaded.
  2. Data lifecycle
    Understand how data is added to the data lake, since some modifications may be needed when loading into a big data platform like Hadoop. Because the actual transformations happen after the data is loaded, there is no need to predefine a schema. Once the data is available, query mechanisms can be defined to extract meaningful insights and share them with key business stakeholders in a visually appealing format.
  3. Reusable tools/utilities
    The entire migration process is usually long and involves a significant amount of manual work. Pre-existing tools or utilities can significantly reduce implementation time and manual errors, and many providers maintain large libraries of such utilities built from their previous work on large-scale migrations.
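The ELT and data lifecycle practices above amount to “schema-on-read”: raw records are loaded untouched, and each query applies only the structure it needs. The Python sketch below illustrates the idea with in-memory JSON lines standing in for a data lake's raw zone. `RAW_ZONE`, `read_with_schema`, and the two view schemas are hypothetical names chosen for illustration; they are not part of Hadoop or any query engine, which apply the same idea at far larger scale.

```python
import json
from typing import Any, Callable, Iterable, Iterator

# Hypothetical raw zone: records are stored exactly as they arrived (the "L" in ELT).
RAW_ZONE: list[str] = [
    '{"order_id": "1001", "amount": "19.99", "channel": "web"}',
    '{"order_id": "1002", "amount": "5", "extra": {"coupon": "X1"}}',
    "not valid json",
]

# A query-time schema maps each wanted field to a converter for its raw value.
Schema = dict[str, Callable[[Any], Any]]


def _cast(value: Any, converter: Callable[[Any], Any]) -> Any:
    """Apply a converter, returning None when the raw value does not fit."""
    try:
        return converter(value)
    except (TypeError, ValueError):
        return None


def read_with_schema(lines: Iterable[str], schema: Schema) -> Iterator[dict[str, Any]]:
    """Apply structure at query time (the "T" in ELT): project and cast fields."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip malformed input at read time; the raw line stays stored.
        if not isinstance(record, dict):
            continue  # Only JSON objects can be projected onto named fields.
        # Missing fields become None; present fields are cast by this query's schema.
        yield {
            field: _cast(record[field], cast) if field in record else None
            for field, cast in schema.items()
        }


# Two consumers read the same raw data through different schemas.
revenue_view: Schema = {"order_id": int, "amount": float}
channel_view: Schema = {"order_id": str, "channel": str}

if __name__ == "__main__":
    print(sum(r["amount"] for r in read_with_schema(RAW_ZONE, revenue_view)))
    print(list(read_with_schema(RAW_ZONE, channel_view)))
```

Because nothing is discarded at load time, the malformed line and the unused `extra` field remain in the raw zone; a later query with a different schema or a more forgiving parser can still make use of them.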

Conclusion

To determine how urgent a data migration is, companies with traditional data warehouse platforms need to understand their business plan for the next five years. Infrastructure needs should follow from that plan, with cost a second-tier consideration. A detailed analysis of current challenges and changing data storage needs will inform migration decisions. But it is safe to assume that nearly every company will need to migrate its data to a more robust, scalable, and cost-effective solution at some point in the near future.