Debugging a Pipeline


Start a Test Instance

Open your pipeline and click Test Run to begin a debugging instance.

A new instance of your pipeline will be created and the instance view will open. Click Go to trigger the first (or next) worker to run.

Exceptions

Exceptions occur when there’s an error during a specific worker run.

They’re shown as red failures on that worker and give info about the number of batches which threw an exception. Clicking the coloured info card shows all the failing batches and gives more information about what happened.

Now you’ll see a list of all the exceptions that occurred on that worker during the test run. Click on a row to view more information.

Now the exception itself will be shown. The input data to the worker, the output data (which won’t exist when there’s an error) and the runtime for the worker are all available in their respective tabs here too.

In this instance the web worker’s root URL doesn’t have an environment configured. It should always have one; when I set it to default and hit the Go button to rerun the worker, it succeeds.

Understanding & Configuring Batches

Batches are the data sets handled by worker instances. Modifying the permitted batch size allows you to control how much the data processing can be parallelised, how errors are handled and how fast the pipeline runs.

By default all incoming data records are processed as part of a single batch (e.g. 100 records → 1 batch), but for debugging it is sometimes better to handle this differently.

Default behaviour: 100 records → 1 batch: when 1 record fails → 1 batch fails → all 100 records fail
Helpful testing behaviour: 100 records → 100 batches: when 1 record fails → 1 batch fails → only 1 record fails (the other 99 succeed)

When any record in a batch causes an error, the entire batch fails. This is great for highlighting errors, but if some failures are tolerable, it can be more beneficial to separate every record into its own batch and allow failed batches to stop processing at the point of error, while successful batches progress onwards to the next pipeline worker.
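The sketch below (plain Python for illustration only, not this tool's actual API) shows why the batch size matters: with one big batch a single bad record discards everything, while with per-record batches only the offending record is lost.

def process_record(record):
    # Hypothetical per-record work; record 7 fails to simulate one bad input.
    if record == 7:
        raise ValueError(f"record {record} is invalid")
    return record * 2

def run_in_batches(records, batch_size):
    # Split the records into batches; a failing record fails its whole batch.
    batches = [records[i:i + batch_size] for i in range(0, len(records), batch_size)]
    succeeded, failed = [], []
    for batch in batches:
        try:
            results = [process_record(r) for r in batch]
        except ValueError:
            failed.extend(batch)        # every record in the batch is lost
        else:
            succeeded.extend(results)
    return succeeded, failed

records = list(range(100))

ok, bad = run_in_batches(records, batch_size=100)
print(len(ok), len(bad))   # 0 100 -> one bad record fails the single batch

ok, bad = run_in_batches(records, batch_size=1)
print(len(ok), len(bad))   # 99 1 -> only the bad record's batch fails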

Example

In this pipeline, the web worker Get BOMs makes several HTTPS requests to fetch data. By default it processes all requests as part of a single batch, resulting in a single failure/success outcome. This makes it hard to debug the requests individually, since the whole batch stops processing after a single error.

To change this, we can decrease the batch size to 1. Then each request will be part of its own batch, will throw its own exception (if any) and will have its own failure/success outcome.

Clicking the whitespace on the worker will open the common worker config (the settings themselves are not shared across workers, but they are available for individual configuration on every worker).

In the common worker config, the batch size is 0 (meaning infinite). I’ll change it to 1.

Then, closing the dialog and hitting Go to rerun the worker, much better information is available. We can see that 8 of the requests actually succeed. That’s exactly what I expected from this specific API, since the endpoint returns a 404 error when I request a BOM that doesn’t exist.
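Conceptually, the worker is now doing something like the sketch below (the URL and BOM ids are hypothetical placeholders, not the real endpoint): each request gets its own outcome, so a 404 for a missing BOM no longer hides the successful responses.

import requests

bom_ids = ["A100", "A101", "MISSING", "A102"]   # one id that does not exist
results, failures = {}, {}

for bom_id in bom_ids:                           # one "batch" per request
    url = f"https://example.com/api/boms/{bom_id}"   # placeholder endpoint
    response = requests.get(url, timeout=10)
    if response.ok:
        results[bom_id] = response.json()
    else:
        failures[bom_id] = response.status_code  # e.g. 404 for a missing BOM

print(f"{len(results)} succeeded, {len(failures)} failed")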

What I want now is for the pipeline to take the results of the successful requests and continue onwards. But if I hit Go, my pipeline won’t move onwards. It still registers the errors as a reason to stop the pipeline from progressing.

Of course there’s a setting for this too, but it cannot be configured while running the test instance.

By clicking on the pipeline name, we can return to the main pipeline view.

Here, we’ll click on the worker card’s whitespace to open the common worker config.

Then, by toggling off the Abort on Error setting, we can configure the worker so that the pipeline won’t abort when an error occurs.
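Conceptually, the setting changes the behaviour sketched below (illustrative Python only, not the product's implementation): with Abort on Error enabled a single failed batch stops the run, while with it disabled failed batches are set aside and the successful ones continue to the next worker.

def run_worker(batches, process, abort_on_error=True):
    passed, failed = [], []
    for batch in batches:
        try:
            passed.append(process(batch))
        except Exception as exc:
            failed.append((batch, exc))
            if abort_on_error:
                # A single failed batch stops the whole pipeline run.
                raise RuntimeError("pipeline aborted: a batch failed") from exc
    # Only the passed batches are handed onwards to the next worker.
    return passed, failed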

We can close the dialog now and return to the test run instance that we were using before by clicking Show Instances.

Then we can locate the instance 99 (at the top of the list since it’s most recent) and click the row to open it again.

Now when we click Go, the pipeline will continue onwards and succeed.