
Workflow Designer

Workflow Designer allows molecular biologists who are not familiar with any programming language to create and run complex computational workflows effectively. Workflows are reproducible, reusable and self-documenting research routines with a simple, unambiguous visual representation suitable for publications.
The Workflow Designer helps make better use of available computing resources and automate daily activities in UGENE.
Overview
Dataflow model
Managing Parameters and Iterations
Validation and execution
Examples of usage
Upcoming features

Overview

Using the intuitive drag-and-drop interface of the Workflow Designer, you can visually construct computational diagrams from a set of algorithmic blocks, or processes. Two processes can be connected with a data-flow channel if the output port of one and the input port of the other carry the same data type. Each process in a diagram also has a set of configurable parameters specific to the algorithm it represents. You can tune the parameters while designing the diagram or set them before running the workflow.
The designed schemas can be displayed in a neat, self-describing layout and exported to PDF documents or vector images of publication quality.
An iterative execution facility lets you run the same workflow with varied parameters, scheduling parallel execution automatically where appropriate.

To launch the Workflow Designer, open the "Tools" submenu in the main menu of UGENE and select the "Workflow Designer" item. The tool has a multi-window user interface, so you can open as many Designer windows as you like and build and run several workflows in parallel.

Each window consists of a palette of building blocks, the main drawing scene and the Property Editor.
The palette contains building blocks for most of the algorithms integrated into UGENE, as well as sets of common input and output routines. These are grouped into categories according to their use or features.
The Property Editor shows information about the currently selected diagram item and lets you configure it.
All areas within the window are resizable: the borders between them can be dragged with the mouse, so you can arrange the workspace to fit your needs.

The "Actions" menu in the main menu bar provides all the standard actions for manipulating the workflow, using the clipboard, and so on. The most common actions are also available on the main toolbar, and some features are available through context menus over the corresponding areas. On most platforms, a right mouse click opens a context menu.

Workflow Designer interface

The scene is initially empty, so you start by adding the necessary blocks to your diagram. Either click an item in the palette and then click the intended place on the scene, or drag the item from the palette and drop it onto the scene. A new process object appears. Objects can be moved around the scene by dragging them and can be positioned freely.

With several processes in the diagram, you can define data flows between them. Each process on the diagram has prominent knobs: these are its input and output ports. Input and output ports are easy to tell apart by their shape: an open bubble marks an input port and a closed bubble an output port. Select a port by clicking on it to see more information about it in the Property Editor, in particular the data it produces or consumes.
To create a data flow, drag a port of one process (an arrow appears indicating the direction of the flow) and drop it onto a matching port of another process.

Creating data flow
While you drag the arrow, all matching ports of the available processes are highlighted, and the arrow snaps to a nearby match as you drag closer. If a process has only one matching port, you can simply drop the arrow on the process itself to create a correct connection.
A connection colored red means that the data flow is void in the current context. It may become valid once you complete the schema or rearrange the preceding connections; otherwise it is genuinely incorrect. See the next section for a better understanding of the underlying data propagation model.
Once created, a connection follows the movements of the linked objects; you cannot redirect or reshape the connection arrow, only remove it. However, you can change the orientation of a port relative to its parent process by holding the Alt key while dragging the port. This helps fine-tune the visual layout of a workflow.
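The connection rule above can be sketched as a small Python model: a dragged port matches ports of the opposite direction that carry the same data type, and dropping onto a whole process succeeds only when exactly one port matches. This is a hypothetical illustration, not UGENE's internal API; the names Port, Process, matching_ports and connect_to_process are invented here, data types are reduced to plain strings, and the "HMM Search" ports are simplified.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Port:
    name: str
    direction: str  # "in" (open bubble) or "out" (closed bubble)
    data_type: str  # hypothetical stand-in for a real data type, e.g. "sequence"


@dataclass
class Process:
    name: str
    ports: List[Port] = field(default_factory=list)


def matching_ports(dragged: Port, target: Process) -> List[Port]:
    """Ports of `target` that would be highlighted while `dragged` is dragged."""
    wanted = "in" if dragged.direction == "out" else "out"
    return [p for p in target.ports
            if p.direction == wanted and p.data_type == dragged.data_type]


def connect_to_process(dragged: Port, target: Process) -> Optional[Tuple[Port, Port]]:
    """Dropping on the process itself connects only if the match is unambiguous."""
    candidates = matching_ports(dragged, target)
    if len(candidates) == 1:
        return dragged, candidates[0]
    return None  # zero or several matches: drop onto a specific port instead


# Usage: a sequence output dropped onto a process with one sequence input.
seq_out = Port("sequence", "out", "sequence")
search = Process("HMM Search", [Port("model", "in", "hmm"),
                                Port("sequence", "in", "sequence")])
link = connect_to_process(seq_out, search)
```

Requiring exactly one candidate mirrors the behaviour described in the text: when several ports could accept the flow, the sketch refuses to guess and the user must target a specific port.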

Dataflow model

In the library of computational algorithms provided with UGENE, we try to minimize the role of loops, conditional merge/synchronization points and other complex data-flow elements. The idea is that any region containing pairs of split and merge points can safely be replaced with a single linear region. To meet the input requirements of a process in such a region, the process must be able to access the results produced by any process located before it in the workflow, and every result must keep the data context in which it was created. We call this the Data Context Propagation Model, or DCPM.
Note that making part of a workflow linear does not mean that this part must be evaluated sequentially. The data dependencies of each process can be derived before the workflow is run, so optimization and parallel execution of workflows depend on the scheduler implementation and hardware constraints.
The model also naturally supports complex execution graphs in which one output branches to feed multiple inputs.
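Because data dependencies are known before the run, one way to find work that can run in parallel is a level-by-level topological sort (Kahn's algorithm). The Python sketch below is a generic illustration of that technique, not a description of UGENE's actual scheduler; the function name parallel_stages and the process names in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def parallel_stages(processes: List[Hashable],
                    edges: Iterable[Tuple[Hashable, Hashable]]) -> List[List[Hashable]]:
    """Group processes into stages with a level-by-level topological sort.

    Every process depends only on processes in earlier stages, so the
    processes inside one stage could be dispatched in parallel."""
    indegree: Dict[Hashable, int] = {p: 0 for p in processes}
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1

    stages = []
    current = [p for p in processes if indegree[p] == 0]
    while current:
        stages.append(current)
        following = []
        for p in current:
            for q in successors[p]:
                indegree[q] -= 1
                if indegree[q] == 0:
                    following.append(q)
        current = following

    if sum(len(s) for s in stages) != len(processes):
        raise ValueError("the dependency graph contains a cycle")
    return stages


# Usage: one reader output branches into two independent processes.
stages = parallel_stages(
    ["read", "search", "find_repeats", "write"],
    [("read", "search"), ("read", "find_repeats"),
     ("search", "write"), ("find_repeats", "write")],
)
# stages == [["read"], ["search", "find_repeats"], ["write"]]
```

The branching output of "read" ends up feeding two processes in the same stage, which is the situation the paragraph describes; how many of them actually run at once would still depend on the scheduler and hardware.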

Thus, any process in a linear workflow region has a set of incoming data slots, some of which are used as input parameters of the process. The outgoing set consists of all incoming data slots plus the slots that hold the results of the process. Every time the process emits results, a copy of the data in the incoming slots is made and propagated further along the workflow together with those results. In this way, related data values are delivered via a single data flow (also known as a data bus).
In this model, input and output ports effectively act as synchronization points for the data slots of a process. Consequently, a process can have more than one input port, depending on its nature. For example, the "HMM Search" task can search a batch of sequences with a single model, so it has separate input ports for models and sequences.
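The propagation rule of DCPM can be illustrated with a short Python sketch in which a message on the data bus is a plain dict of slot names to values. This is an assumed, simplified model for explanation only: run_process, the slot names and the toy process are hypothetical and do not reflect how UGENE stores data internally.

```python
from typing import Callable, Dict, Iterable, List

Message = Dict[str, object]  # slot name -> value carried on the data bus


def run_process(incoming: Message,
                inputs: List[str],
                compute: Callable[..., Iterable[Message]]) -> List[Message]:
    """Apply one process under a DCPM-style model.

    The process reads only the slots listed in `inputs`, but every result it
    emits is merged into a fresh (shallow) copy of all incoming slots, so the
    data context in which the result was created travels with it."""
    args = [incoming[name] for name in inputs]
    outgoing: List[Message] = []
    for result in compute(*args):
        message = dict(incoming)  # copy of the incoming data slots
        message.update(result)    # plus the slots produced by this process
        outgoing.append(message)
    return outgoing


# Usage: a toy process that emits one result per half of the sequence.
bus = {"sequence": "ATGCGC", "source_file": "a.fa"}
halves = run_process(bus, ["sequence"],
                     lambda s: [{"half": s[:3]}, {"half": s[3:]}])
# halves == [{"sequence": "ATGCGC", "source_file": "a.fa", "half": "ATG"},
#            {"sequence": "ATGCGC", "source_file": "a.fa", "half": "CGC"}]
```

Each emitted result gets its own copy of the context, so a downstream process can still see which sequence and source file a given result came from, even though it only receives one coupled message.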

The mapping of incoming data slots to the required input slots is configured individually for each connection. Select the destination port of a connection, or the connection arrow itself, to inspect and modify the mapping in the Property Editor.
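A per-connection slot mapping can be pictured as a small lookup table from each required input slot to a slot available on the incoming data bus. The Python sketch below assumes that representation; resolve_inputs and slot names such as "reader.sequence" are hypothetical and only illustrate the idea.

```python
from typing import Dict, Optional


def resolve_inputs(bus: Dict[str, object],
                   mapping: Dict[str, Optional[str]]) -> Dict[str, object]:
    """Fill a destination port's required input slots from the incoming data bus.

    `mapping` belongs to one connection: required input slot -> incoming slot
    (None means the slot has not been mapped yet)."""
    unresolved = [need for need, have in mapping.items()
                  if have is None or have not in bus]
    if unresolved:
        raise KeyError(f"input slots without data: {unresolved}")
    return {need: bus[have] for need, have in mapping.items()}


# Usage: the same incoming bus, with a per-connection choice of slots.
bus = {"reader.sequence": "ATGC", "builder.model": "profile-1"}
inputs = resolve_inputs(bus, {"sequence": "reader.sequence",
                              "model": "builder.model"})
# inputs == {"sequence": "ATGC", "model": "profile-1"}
```

Raising on an unresolved slot loosely corresponds to a connection that cannot deliver what its destination needs; in the Designer such problems surface visually and during validation rather than as exceptions.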

Mapping data slots

Managing Parameters and Iterations

The Property Editor displays complete information about the currently selected object on the scene: short documentation for the object, its configurable properties, and the designation names of processes.

Property Editor
The available properties depend on the object being edited. To modify a property value, click the corresponding field; an appropriate editor widget appears immediately, such as a drop-down list for boolean or choice properties or a spin box for numerical values. Some properties must be assigned non-empty values for the represented process to execute correctly; such properties are indicated by bold labels.

Another important feature of UGENE workflows is the facility for iterative execution of the same schema with varied parameters. A single iteration is always defined by default and is shown in the corresponding frame of the Property Editor. In fact, the Property Editor always shows the properties of a process for the active (selected) iteration.
You can add an extra iteration by pressing the appropriate button on the toolbar inside the "Iterations" frame, and then switch the current iteration by selecting the desired item in the list of iterations. When switching between iterations, you can see that parameter modifications are tied to the selected iteration and are lost when you delete that iteration. Iterations can be renamed directly in the list.

The list of iterations and the current selection are global to the workflow. It is usually best to build the workflow topology first and then configure the iterations. A dedicated dialog lets you review and configure all parameters of the workflow and compare their values across iterations side by side. Open it by selecting the "Configure iterations" item in the "Actions" menu or by clicking the same button on the main toolbar.

Configuring iterations
This dialog resembles an extended Property Editor multiplied across iterations, and in fact that is exactly what it is. The left side shows a hierarchical list of all configurable parameters of the workflow, along with their default values. The right side displays the parameter values for all defined iterations. The default values are editable and apply to all iterations unless explicitly overridden in a particular iteration. The default values are indicated by gray labels.
After tuning the overall configuration, you can launch the workflow immediately or return to editing the diagram.
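The relationship between default values and per-iteration overrides can be sketched in Python as a simple dictionary merge. This is an illustrative model under assumed names (Iteration, effective_params, inherited, and the parameter keys are all hypothetical), not the way UGENE stores its configuration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

ParamKey = Tuple[str, str]  # (process name, parameter name)


@dataclass
class Iteration:
    name: str
    overrides: Dict[ParamKey, object] = field(default_factory=dict)


def effective_params(defaults: Dict[ParamKey, object],
                     iteration: Iteration) -> Dict[ParamKey, object]:
    """Defaults apply to every iteration unless the iteration overrides them."""
    params = dict(defaults)
    params.update(iteration.overrides)
    return params


def inherited(defaults: Dict[ParamKey, object], iteration: Iteration) -> List[ParamKey]:
    """Parameters that still fall back to the default value in this iteration."""
    return [key for key in defaults if key not in iteration.overrides]


# Usage: two iterations of the same schema that differ in one parameter.
defaults = {("search", "threshold"): 10.0, ("writer", "format"): "genbank"}
iterations = [Iteration("default"),
              Iteration("strict", {("search", "threshold"): 0.01})]
configs = {it.name: effective_params(defaults, it) for it in iterations}
```

Keeping overrides separate from defaults means that editing a default in the dialog immediately affects every iteration that has not overridden it, and deleting an iteration discards only its own overrides.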

Validation and execution

Before a workflow can actually be executed, it should be verified by the Workflow Designer. During verification, the Workflow Designer checks for errors in the data-flow logic and for unspecified parameters, and can give the user optimization or layout hints. If no errors are found, the workflow is valid and can be run.
You can request workflow validation at any stage of workflow design via the main menu or toolbar, or by pressing Ctrl+E. Either a list of identified issues and warnings, if any, or a popup notification of successful validation appears.
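One of the checks mentioned above, unspecified required parameters, can be sketched in Python by checking every required (bold-labelled) parameter in every iteration. This is a hypothetical simplification: validate, the "writer" process and its "url" parameter are invented for illustration, and a real validator would also examine the data-flow logic. Each issue records the iteration and process so that it can point back at the faulty object.

```python
from typing import Dict, List, Tuple

ParamKey = Tuple[str, str]    # (process name, parameter name)
Issue = Tuple[str, str, str]  # (iteration, process, message)


def validate(required: Dict[str, List[str]],
             defaults: Dict[ParamKey, object],
             iterations: Dict[str, Dict[ParamKey, object]]) -> List[Issue]:
    """Report required parameters left empty in any iteration.

    An empty result means only that this particular check passed."""
    issues: List[Issue] = []
    for it_name, overrides in iterations.items():
        for process, names in required.items():
            for param in names:
                key = (process, param)
                value = overrides.get(key, defaults.get(key))
                if value is None or value == "":
                    issues.append((it_name, process,
                                   f"required parameter '{param}' is not set"))
    return issues


# Usage: the output file is set only in the second iteration.
required = {"writer": ["url"]}
iterations = {"default": {}, "run2": {("writer", "url"): "out.gb"}}
report = validate(required, {("writer", "url"): ""}, iterations)
```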

Workflow validation report
Double-clicking an item in the list selects the faulty object or iteration.

Once you are satisfied with the designed workflow diagram and have configured it, click the "RUN" button on the main toolbar (or select the "Run schema" item in the "Actions" menu). The schema is verified and scheduled for background execution. Further editing of the workflow does not affect the launched execution. You can control the workflow execution via the common UGENE Task Manager: watch its progress, cancel it, and so on.
Upon completion, the workflow produces a summary report. To view it, open the UGENE Task View and click the blue Information button of the corresponding task.

Workflow execution report
The report displays the status of each iteration along with other details. For the curious, the Task View may also give some insight into how workflow processes are dispatched.

Examples of usage

Upcoming features

The following features are planned for upcoming releases:

  • Extensibility of the library with user-designed workflows
  • Visual tracing of workflow execution
  • More integrated algorithms
  • Support for distributed/grid execution environments
  • And a lot more...
