Home > Cloud Cruiser 4 > Collecting, transforming, and publishing > ETL workbooks > Working with flows

Working with flows

This article is part of To collect, transform, and publish data:

3. Transform data
with a flow

This article explains how to edit the flow in a worksheet. The steps to perform will vary according to what you need to accomplish, so rather than providing a step-by-step procedure, the article teaches concepts and capabilities that you can put together as you see fit.

The Cloud Cruiser Portal provides online help for all steps and processors. This is the best way to learn how to use a specific step or processor. In the editor, click Help icon to the right of a step name, processor name, or field to see help for that item.

Overview

A flow processes collected data. Most commonly this is data about the usage of IT services for which your organization charges (or at least shows cost back to) its customer. In such cases the flow needs to produce CC Records that provide context around usage amounts and also associate those amounts to specific customers. This requires the following steps:

  1. Use an Import Collections step to bring data into the worksheet so that the flow can process it.
    When you create a new workbook, its first worksheet contains an Import Collections step by default. It imports every collection in the workbook, but you can edit it to import only specified collections.
  2. Use a Transformation step to associate usage amounts to specific customers.
    Set one dimension for each level in the account ID of the customer responsible for the usage amounts in each row. You probably will need some processing prior to this to determine and format those values.
    All of this might require more than a single Transformation step, depending upon what you need to accomplish and the order in which those things must happen.
  3. Use a Publish Data step to publish the data to a schema in the Cloud Cruiser database.
    You map the dimensions and measures in the worksheet to fields in the schema, including the levels of the account ID. Published data is available to Cloud Cruiser Insights modules, and publishing triggers the charge process, so the usage is charged to customer accounts and those charges are available in standard reports.

For a flow that does not have this most common objective, use a different set of steps. The following sections explain ways to use the worksheet interface to create the flow you need.

Transformation steps and processors

It is important to understand the difference between a Transformation step and other steps and what a transformation processor does. Though many of the steps available for a flow can be considered transformation in the broad sense of an extract, transform, and load (ETL) process, one step is specifically named Transformation.

Other steps perform a specific action on each row in their input dataset or on the dataset as a whole. For more information, see Sheet tab on the ribbon and Rows & Columns tab on the ribbon. A Transformation step is a container that you can fill with processors, each of which performs a specific action on each row in the input dataset.

  • Each step in a flow reads input from a file and writes output to a file, including a Transformation step. But rows are passed from one transformation processor to the next in memory. This means that you can improve the performance of a flow by grouping transformation processors into as few steps as possible.
  • Sorting rows can be done as a Sort Rows step or as part of a step that does an implicit sort as part of its work, but not within a Transformation step. This means that if you must sort rows between two actions that would otherwise be consecutive transformation processors, you must separate those processors into separate steps with a Sort Rows step in between.

For more information about the processors you can use in Transformation steps, see Transformation processors.

For information about copying steps and processors for reuse, see Copying steps and processors.

Sheet tab on the ribbon

Some steps act against the entire dataset or worksheet at once or perform other actions on behalf of the sheet. In the ribbon, these actions are located in the Sheets tab.

Screenshot of the Sheets tab

For example, writing the dataset to a lookup file (
Create Lookup), posting it to another Cloud Cruiser instance as usage data (Post Usage), or passing it to an external process (Run Script) are in this tab.

The Simulation actions control the simulated results of the flow that are shown in the sample data in the worksheet as you select different steps and processors in the flow. To see the state of the data after running the flow from the first step through the selected step, Play the simulation. To move around more quickly in the flow by eliminating the delay caused by processing on each click, Pause the simulation.

Rows & Columns tab on the ribbon

All transformation processors, and some steps, act on their input dataset one row at a time, working through the dataset in the order that the rows were sorted by prior processing.

Screenshot of the Rows & Columns tab

 
The result of an action in this tab differs from row to row based on the values in each row, and sometimes affect a particular column or columns. For example, Set Measure from Dimension reads the value of a dimension in the first row and uses it to set the value of a measure in that row, then moves to the second row and does the same, and so on.

Entering values

Most values that you enter in a flow are plain text strings or items selected from a list. This section explains a few situations where you enter values in other ways.

Parameters

Parameters are variables that you enter in your flow to reference values that aren't known at design time or that can change over time. You might also use them in place of a long string to avoid typing mistakes. For example, when specifying a file path you might use a variable to pass the value of the system's usage files directory:

  • Without parameter: C:\cc-working\usage_files\3PARarrays
  • With parameter: ${env.usageDir}\3PARarrays 

 There is a fixed set of system parameters whose values are determined automatically. For information about these, see System parameters. You can also create as many workbook parameters in a given workbook as you like. Though you give each a default value that is used in workbook simulations, you can set a different value for each run. For example, you might design a flow to process only a named portion of data collected from a given source and use a workbook parameter to control which portion is processed on each run of the flow by setting the parameter value.

To work with parameters, click Parameters on the Workbook tab of the ribbon. You can copy parameter syntax to paste into a collection, step, or processor, and you can edit workbook parameters.

Input formulas

Some transformation processors such as Set Dimension from Dimensions use as input a string expression built from a dimension value. This is called an input formula, and can include all or part of the dimension value with leading or trailing characters padded to a desired length plus literal text.

For example, if the ServiceName dimension has a value of LargeVM then the input formula  ServiceName(3, 10, LEADING, "0") + ":" evaluates to . For an explanation of how to build input formulas, see Input formulas in transformation processors.

Regular expressions

Some transformation processors such as Normalize Series Names require you to specify information in the form of a regular expression (RegEx). A RegEx is an efficient way to specify a set of items by matching a pattern and it also allows you to capture a substring from a matching item for use in another operation.

If you don't have experience using regular expressions, you can find resources on the Internet for learning about them, such as this site.

Formula bar

Similar to a Microft Excel workbook, a Cloud Cruiser ETL workbook offers a formula bar for expert users to quickly edit a flow. Located under the ribbon, it lets you add and edit steps and processors using compact formulas. The following screenshot shows the formula version of a partially-complete Set Dimension From Dimensions processor that includes a condition:

Screenshot of the formula bar with autocomplete enabled
(click image to enlarge)


As shown, Cloud Cruiser can suggest values as you type so that you can select them with arrow and Enter keys instead of typing complete names. To see suggestions, press Ctrl+Space while typing a word.

To learn the syntax of the formula bar, view an entire flow in that syntax by right-clicking in the flow and choosing View Expert Flow.

What to do next

To collect, transform, and publish data:

Last modified

Tags

Classifications

This page has no classifications.
© Copyright 2018 Hewlett Packard Enterprise Development LP