Use all your data with a data fabric

Discover the new features IBM® Data Fabric offers

Overview

What is DataStage for IBM Cloud Pak for Data?

IBM Cloud Pak for Data is a cloud-native insight platform, built on the Red Hat® OpenShift® container orchestration platform, that integrates the tools needed to collect, organize and analyze data within a data fabric architecture. It dynamically and intelligently orchestrates data across a distributed landscape to create a network of instantly available information for data consumers. IBM Cloud Pak for Data can be deployed on premises, as a service on IBM Cloud® or on any vendor’s cloud.

DataStage is available as an add-on to an IBM Cloud Pak for Data software license or as a service through IBM Cloud Pak for Data as a Service.

Read the FAQ about upgrading

Learn about the as-a-service launch

How much can you save by upgrading?

Take the business value assessment to see the savings with DataStage for IBM Cloud Pak for Data.

Features

DataStage highlights

Accelerate AI with trusted data

Full spectrum of data and AI services

Manage the data and analytics lifecycle on the IBM Cloud Pak for Data platform. Services include data science, event messaging, data virtualization and data warehousing.

Parallel engine and automated load balancing

Process data at scale by optimizing ETL performance with a best-in-breed parallel engine and load balancing that maximizes throughput.

Metadata support for policy-driven data access

Protect sensitive data with metadata exchange using IBM Watson® Knowledge Catalog. Use data lineage to see how data flows through transformation and integration.

Automated delivery pipelines for production

Automate continuous integration/continuous delivery (CI/CD) job pipelines from dev to test to production and help reduce development costs.

Extensive set of prebuilt connectors and stages

Use prebuilt connectivity and stages to move data between multiple cloud sources and data warehouses, such as IBM Netezza® and IBM Db2® Warehouse on Cloud.

IBM DataStage Flow Designer

Increase developer productivity with machine learning-assisted design in a user-friendly interface, helping cut development costs.

In-flight data quality

Trust data delivery using IBM InfoSphere® QualityStage® to automatically resolve quality issues when data is ingested by target environments.

Automated failure detection

Reduce infrastructure management effort by 65% to 85%², letting users focus on higher-value tasks.

Distributed data processing

Execute cloud runtimes remotely wherever the data resides, while maintaining data sovereignty and minimizing costs.

Deployment options

As a service

Access all the latest capabilities available as part of IBM DataStage on IBM Cloud Pak for Data as a Service, a subscription model for a set of integrated services fully managed on IBM Cloud.

On premises or any cloud

Add IBM DataStage Enterprise (or IBM DataStage Enterprise Plus) to IBM DataStage on IBM Cloud Pak for Data as a Service to run workloads on premises or on any cloud.

On premises

Run basic ETL jobs on premises using IBM DataStage on IBM Cloud Pak for Data as a Service. Parallel processing and enterprise connectivity deliver a scalable platform.

Modernize your existing capabilities with IBM DataStage for IBM Cloud Pak for Data — AI-powered data delivery, anywhere

Product images

Collaborate

Screen shot showing a flow in DataStage

Work with your peers on DataStage flows and control access to your projects.

Build data pipelines

Screen shot showing the Merge properties

Efficiently perform data integration work in a no-code/low-code environment with a user-friendly interface. Hundreds of prebuilt functions and connectors reduce development time and improve consistency of design and deployment.

Auto workload balancing

Screen shot showing the job run details for the insurance customer policy view

DataStage has a best-in-breed, highly scalable parallel engine that processes substantial data volumes. Built-in auto workload balancing provides high performance and elastic management of compute resources.

Platform connections and integration points

Screen shot showing a table of data about drivers with an insurance policy

Accelerate DataOps with shared platform connections and integrations with other products in IBM Cloud Pak for Data, including data virtualization, governance, business intelligence, and data science services.

What’s new

IBM ranks second highest in the Data Fabric Use Case

See why in the 2021 Gartner Critical Capabilities for Data Integration Tools.

How to get more from IBM DataStage

Learn how to improve productivity by connecting to new sources and targets more quickly when building ETL jobs.

IBM is a Leader

See why in the Gartner 2021 Magic Quadrant for Data Integration Tools.

IBM Cloud Pak for Data

An open, extensible data and AI platform that runs on any cloud

IBM InfoSphere® Information Server Enterprise Edition

An end-to-end data integration platform to help you cleanse, monitor, transform and deliver quality data

IBM InfoSphere® Information Server for Data Integration

A tool to extract and transform data in any style and load the data into any system