Automation can accelerate all stages of data management and data warehousing, including data collection, integration, preparation, storage, sharing, and analysis. It can even speed up the identification of siloed data sources and the migration of that data from legacy systems to the data warehouse.
With this in mind, there are seven key steps businesses should follow to ensure data warehousing success. The future success of the business depends on getting this right.

Rigid, inflexible architecture: In this modern age, businesses need to be more adaptable and agile than ever before, and this requires an IT architecture that can be changed quickly, on demand.
High complexity and redundancy: Because of the inflexible structure of traditional data warehouses (TDWs), most organizations purchase hardware add-ons and tools to meet their data needs more quickly.
Slow and degrading performance: The volume of data that businesses need to store, process, and analyze has grown exponentially over the last decade.

Outdated technologies: We mentioned outdated hardware as a cause of slow performance in TDWs, but it is actually an entirely separate challenge in its own right.
Beyond performance issues, outdated technologies and hardware can cause the following problems in traditional data warehouses: Scalability issues: You cannot always scale up vertically to meet your requirements.

Here are the benefits of a cloud data warehouse over a TDW:

Highly scalable: Everything in the cloud is scalable, from processing power to storage capacity. You can even auto-scale your infrastructure, scaling up during peak demand and scaling down during low demand to minimize costs.
Reduced cost: With a cloud DW, there are no physical servers to own, and configuring the environment is far simpler, so you save on infrastructure costs as well as the cost of maintenance and administration.

Built-in data processing ecosystem: Most cloud DWs integrate easily with other cloud services and tools, such as BI and data analytics tools. You get the benefit of parallel processing and ready-made tools that can significantly reduce the time required to meet business requirements.
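The auto-scaling behavior described above can be sketched as a simple policy: scale out when query queues build up during peak demand, and scale back in when utilization drops, to minimize cost. This is an illustrative sketch, not any vendor's actual algorithm; the function name and thresholds are hypothetical.

```python
# Hypothetical auto-scaling policy for a cloud data warehouse compute tier.
# Thresholds and the function itself are illustrative, not a real vendor API.

def desired_cluster_count(current, queued_queries, cpu_utilization,
                          min_clusters=1, max_clusters=8):
    """Return the cluster count a simple auto-scaler would target."""
    if queued_queries > 10 or cpu_utilization > 0.80:   # peak demand: scale out
        return min(current + 1, max_clusters)
    if queued_queries == 0 and cpu_utilization < 0.30:  # low demand: scale in
        return max(current - 1, min_clusters)
    return current                                      # steady state: no change

print(desired_cluster_count(2, queued_queries=25, cpu_utilization=0.90))  # scales out to 3
print(desired_cluster_count(2, queued_queries=0, cpu_utilization=0.10))   # scales in to 1
```

Real cloud warehouses expose this as configuration (minimum/maximum compute, scaling triggers) rather than code, but the cost logic is the same: capacity follows demand instead of being provisioned for the peak.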
Data Warehousing Automation Tools Complement the Modern Cloud Data Warehousing Approach

Want to overcome the challenges of your traditional data warehouse or move to a cloud data warehousing platform? Data warehousing automation tools can help. Their features include an easy-to-use drag-and-drop interface for designing and building your data warehouse pipelines. The simplified process allows you to cut down on the resources required for configuring and maintaining your data warehouse.
Executives understand outcomes and high-level results. Getting these groups on the same page is an essential task in these projects, and also one of the most difficult. Managing communications between groups is a constant throughout the data warehouse life cycle. Among the reasons why data warehouse projects fail, this one is a factor in almost every failed initiative.
From initial requirements gathering to setting expectations, from deployment to training, those managing the DW project must constantly ensure that each of these groups understands the others. From the outcomes and deliverables to the jargon used, it is critical to ensure that each group is moving toward the same finish line. When time runs short on a data warehouse project, testing and validation are often the victims.
An inexperienced project manager or architect might be enticed by the time savings of cutting or eliminating testing and validation.
Conversely, someone who has rescued a project that stalled or failed due to inadequate testing knows well that this part of the project is as critical as any other. There are problems that can only be discovered through proper testing and validation.
These take time, but are essential to the success of the data warehouse initiative. Resist the urge to ease scheduling pressure by cutting back on this valuable exercise.

Designing, building, and testing the extract-transform-load (ETL) logic is the most time-consuming part of every data warehouse project. It is also frequently underestimated during project scheduling. Often the ETL process is viewed as simply a copy operation, wherein data is read from one location and written to another.
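In reality, even a small ETL step involves cleansing, conforming, and validating data, not just copying it. A minimal Python sketch makes the point; the table shape, column names, and quality rules are hypothetical, chosen only to illustrate why "just a copy" underestimates the work:

```python
# Minimal ETL sketch: extract, transform (cleanse and conform), then validate.
# Column names and rules are hypothetical, for illustration only.

def extract(source_rows):
    """Extract: in a real pipeline this would query the source system."""
    return list(source_rows)

def transform(rows):
    """Transform: cleanse and conform each row; this is where the work hides."""
    out = []
    for row in rows:
        amount = row.get("amount")
        if amount is None:               # reject rows failing basic quality checks
            continue
        out.append({
            "customer_id": int(row["customer_id"]),          # conform types
            "amount": round(float(amount), 2),               # conform currency precision
            "region": row["region"].strip().upper(),         # standardize codes
        })
    return out

def validate(source_rows, loaded_rows):
    """Validate: reconcile counts before declaring the load successful."""
    rejected = sum(1 for r in source_rows if r.get("amount") is None)
    assert len(loaded_rows) == len(source_rows) - rejected, "row counts do not reconcile"
    assert all(r["region"] for r in loaded_rows), "blank region slipped through"

source = [
    {"customer_id": "101", "amount": "19.99", "region": " east "},
    {"customer_id": "102", "amount": None, "region": "West"},   # bad row: no amount
]
loaded = transform(extract(source))
validate(source, loaded)
print(loaded)
```

Every branch in `transform` and every assertion in `validate` is a decision that has to be designed, built, and tested, which is exactly the effort that gets lost when ETL is scheduled as a copy operation.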
The ETL layer is like the foundation of a house: get it wrong and the rest of the structure will be unstable. Take the time to do it right, following ETL best practices along the way.

While building data warehouses is a lot of work for technical folks like us, learning to use the new data warehouse requires a lot of work as well. Proper training goes a long way toward easing this transition. Invest the time to train essential personnel. Train them in terms they understand, using whatever medium (run book, video, in-person training) works for them.
One of the worst potential outcomes of such a project is that nobody uses the new data warehouse. Without proper training, data consumers might just keep doing things the old, manual way.

In one project, the person responsible for project management insisted that a single fact be broken into several different metrics during ETL. The idea sounded reasonable: cut down the number of rows in the fact table so that the front-end tool could generate reports more quickly. Unfortunately, there were several problems with this approach: First, the ETL became unnecessarily complex.
Not only were "case"-type statements now needed in the ETL, but because the fact could not always be broken down neatly into the corresponding metrics due to inconsistencies in the data, a lot of additional logic was needed to handle the exceptions. Second, it is never advisable to design the data model and the ETL process based on what suits the front-end tool the most.
Third, at the end of the day, the reports ended up having to sum these separate metrics back together to get what the users were truly after, meaning that all the extra work was for naught.

Lack of Clear Ownership: Because data warehousing projects typically touch many different departments, it is natural that the project involves multiple teams. For the project to be successful, though, there must be clear ownership, not just of different components of the project, but of the project itself.
I have seen cases where multiple groups each owned a portion of the project. Needless to say, these projects never got finished as quickly as they should have, tended to underdeliver, and ended up with inflexible infrastructure, as each group did what was best for itself rather than for the project as a whole.