Overview

Introduction to DPS

Data Pipeline Service (DPS) is a web service running on the public cloud. It enables you to easily automate the movement and transformation of data between different services.

DPS provides a set of pre-packaged data sources and activities.

  • With DPS, you can orchestrate different activities and data sources to build a service-based pipeline, and then schedule, run, manage, and monitor your pipeline.
  • DPS lets you process data reliably and move it between internal data sources, or between computing and storage services, at specified intervals.
  • DPS helps you create fault-tolerant, reusable, and highly available pipelines.

Table 1 describes the basic concepts used in this document.

Table 1 Basic concepts

| Term | Description |
|------|-------------|
| Pipeline | A logical group of activities and data sources that work together to execute a data processing task. |
| Activity | An operation, such as movement or transformation, performed on data. For example, the DistCp activity can transfer data between the data sources Object Storage Service (OBS) and Hadoop Distributed File System (HDFS). |
| Data source | A location where data is stored, such as OBS, HDFS, Relational Database Service (RDS), or HBase. |
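
To make the relationship between these three concepts concrete, the following minimal Python sketch models them as plain data classes. It is illustrative only and is not part of any DPS SDK: the class names, field names, and the example names `raw-logs`, `cluster-input`, `copy-logs`, and `daily-log-import` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DataSource:
    """A location where data is stored (for example OBS or HDFS)."""
    name: str
    kind: str  # e.g. "OBS", "HDFS", "RDS", "HBase"


@dataclass(frozen=True)
class Activity:
    """An operation performed on data, such as movement or transformation."""
    name: str
    kind: str           # e.g. "DistCp"
    source: DataSource  # where the data is read from
    target: DataSource  # where the data is written to


@dataclass
class Pipeline:
    """A logical group of activities and the data sources they use."""
    name: str
    activities: List[Activity] = field(default_factory=list)

    def data_sources(self) -> List[DataSource]:
        """Return the distinct data sources used by the activities, in first-seen order."""
        seen: List[DataSource] = []
        for activity in self.activities:
            for ds in (activity.source, activity.target):
                if ds not in seen:
                    seen.append(ds)
        return seen


# Hypothetical example: a DistCp activity that copies data from OBS to HDFS.
obs = DataSource(name="raw-logs", kind="OBS")
hdfs = DataSource(name="cluster-input", kind="HDFS")
copy_logs = Activity(name="copy-logs", kind="DistCp", source=obs, target=hdfs)
pipeline = Pipeline(name="daily-log-import", activities=[copy_logs])
```

In this sketch the pipeline owns only its activities, and the data sources are derived from them, which mirrors the description of a pipeline as a group of activities and data sources that cooperate on one task.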

API Classification

DPS provides a set of application programming interfaces (APIs), including APIs for creating, modifying, running, and pausing pipelines. DPS API requests carry the following header fields: X-Sdk-Date, Authorization, Host, Content-Type, and Content-Length. For details about these header fields, see Common Request Headers.
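
As a rough illustration of the five header fields named above, the sketch below assembles them for a request body. Several details are assumptions not stated in this overview: the helper name `build_request_headers`, the host `dps.example.com`, the JSON content type, the `YYYYMMDDTHHMMSSZ` timestamp format for X-Sdk-Date, and the placeholder Authorization value. Refer to Common Request Headers for the actual formats and for how the Authorization value is produced.

```python
import json
from datetime import datetime, timezone


def build_request_headers(host, body, authorization, now=None):
    """Assemble the five header fields for a request with the given body.

    The timestamp format and the JSON content type are assumptions made
    for this sketch, not documented DPS behavior.
    """
    now = now or datetime.now(timezone.utc)
    payload = body.encode("utf-8")
    return {
        "X-Sdk-Date": now.strftime("%Y%m%dT%H%M%SZ"),  # assumed format
        "Authorization": authorization,                # produced elsewhere
        "Host": host,
        "Content-Type": "application/json",            # assumed body type
        "Content-Length": str(len(payload)),           # length in bytes
    }


# Hypothetical usage with a placeholder host and credential.
body = json.dumps({"name": "demo"})
headers = build_request_headers(
    host="dps.example.com",
    body=body,
    authorization="<signature goes here>",
    now=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
)
```

Content-Length is computed from the encoded bytes rather than the character count, so it stays correct when the body contains non-ASCII text.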