ETL workbooks

Understanding workbooks

An ETL (extract-transform-load) workbook is where you configure the collection, transformation, and publishing of data into the Cloud Cruiser database. A workbook can contain one or more collections, each of which is a set of instructions for collecting a specific set of data from a single data source.

The workbook also contains one or more worksheets, similar to sheets in a Microsoft Excel file. A worksheet contains a flow, which is a progression of steps that process data, and a grid that displays sample data so that you can see the results of each step as you develop it. You can add whichever steps you need to manipulate your collected data. Most flows start with an Import Collections step, which brings rows of data from one or more collections into the worksheet for processing, and end with a Publish Data step, which publishes the rows to a schema in the Cloud Cruiser database.
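
To make the shape of a flow concrete, the following Python sketch models it as an ordered list of steps applied to rows of data. It is only a conceptual illustration and is not Cloud Cruiser code: the function names, the row fields (vm, usage_mb, usage_gb), and the in-memory list standing in for a database schema are all hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


def import_collections(collections: List[List[Row]]) -> Step:
    """Hypothetical stand-in for an Import Collections step."""
    def step(rows: List[Row]) -> List[Row]:
        # Bring the rows of every collection into the worksheet.
        return rows + [dict(r) for c in collections for r in c]
    return step


def convert_mb_to_gb(rows: List[Row]) -> List[Row]:
    """Hypothetical transformation step that adds a derived column."""
    return [{**r, "usage_gb": r["usage_mb"] / 1024} for r in rows]


def publish_data(schema: List[Row]) -> Step:
    """Hypothetical stand-in for a Publish Data step."""
    def step(rows: List[Row]) -> List[Row]:
        # Append the processed rows to the target schema (a plain list here).
        schema.extend(rows)
        return rows
    return step


def run_flow(steps: List[Step]) -> List[List[Row]]:
    """Run the steps in order and keep the rows produced by each one,
    much as the grid shows the result of each step."""
    rows: List[Row] = []
    snapshots = []
    for step in steps:
        rows = step(rows)
        snapshots.append(rows)
    return snapshots


collections = [[{"vm": "a", "usage_mb": 2048}], [{"vm": "b", "usage_mb": 512}]]
published: List[Row] = []
snapshots = run_flow([
    import_collections(collections),
    convert_mb_to_gb,
    publish_data(published),
])
```

After the run, snapshots holds one list of rows per step, and published holds the rows that the final step wrote.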

For users familiar with previous versions of Cloud Cruiser, a workbook is the equivalent of a process in batch XML, and each collection and flow the equivalent of a job.

The ribbon and its Workbook tab

The ribbon gives you access to all the tasks you can perform in a workbook, similar to the ribbon in Microsoft Excel. It's organized into three tabs based on the scope of each action: Workbook, Sheet, and Rows & Columns. This section explains the actions you can perform in the Workbook tab. The other two tabs are explained in Working with flows.

Screenshot of the Workbook tab in the ribbon

The Workbook tab lets you perform actions that affect the entire workbook. This includes basic actions like opening, saving, and deleting, and more specialized actions such as:

  • Settings: Set the time zone of origin for data collected by the workbook. Each collection converts usage start and end times from this time zone to UTC before writing CC Record output, so rows of sample data in a worksheet display UTC times. In this dialog box you also set the select date used to collect sample data and play simulations. You can change this date when running a collection or flow.
  • Parameters: Parameters are variables that you can use to pass values to a flow at the time it runs. You can copy a parameter's syntax from the Parameters dialog box and paste it into an input field in a step or processor as a placeholder for the runtime value of the parameter. For example, you might use ${env.selectDate} in a file path so that when a flow runs every day it writes a file whose name includes the relevant date.
    There is a fixed set of system parameters whose values are determined automatically. For information about these, see System parameters. You can create as many workbook parameters as you like. Though you give each a default value that is used in workbook simulations, you can set a different value for each run. For example, you might design a flow to process only a named portion of data collected from a given source and use a workbook parameter to control which portion is processed on each run of the flow by setting the parameter value.
  • Status: View the validation errors and warnings for any collections and flows in the workbook. This ribbon icon displays yellow if you have one or more warnings or red if you have one or more errors.
  • Run: Run one or more components of the workbook. You can choose any combination of collections and flows, and set options for the run such as the select date, workbook parameter values, and whether to publish data.
  • Worksheet: Create, copy, or edit the notes for a worksheet.
  • Collections: Create or edit a collection.
  • Lookup Data: View the lookup files created by flows in this workbook or edit systemwide lookup tables.
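
Two behaviors described in the list above, converting times from the time zone of origin to UTC and replacing a parameter placeholder with its runtime value, can be illustrated with a short Python sketch. This is not how Cloud Cruiser implements either feature. The function names are hypothetical, the time zone is simplified to a fixed UTC offset, and the YYYYMMDD value shown for ${env.selectDate} is an assumed format chosen only for the example.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches placeholders written as ${name}, for example ${env.selectDate}.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand_parameters(template: str, values: dict) -> str:
    """Replace each ${name} placeholder with its runtime value."""
    def lookup(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for parameter {name!r}")
        return str(values[name])
    return PLACEHOLDER.sub(lookup, template)


def to_utc(local_time: datetime, utc_offset_hours: float) -> datetime:
    """Interpret a naive timestamp in the origin time zone (simplified
    here to a fixed offset from UTC) and convert it to UTC."""
    origin = timezone(timedelta(hours=utc_offset_hours))
    return local_time.replace(tzinfo=origin).astimezone(timezone.utc)


# Assumed value format for the select date; the real format may differ.
path = expand_parameters("/data/usage_${env.selectDate}.csv",
                         {"env.selectDate": "20180115"})

# A usage start time recorded at 09:00 in a zone eight hours behind UTC.
start_utc = to_utc(datetime(2018, 1, 15, 9, 0), -8)
```

Here path becomes /data/usage_20180115.csv and start_utc is 17:00 UTC on the same day. A fixed offset ignores daylight saving time, so a fuller version would use named time zones instead.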
© Copyright 2018 Hewlett Packard Enterprise Development LP