Data Ingestion

The Load Step

Once Validate passes (green), the user clicks the Load task on the Workflow task bar. The Load step takes cleansed, validated data from the Stage tables and moves it into the Finance (Analytic) Engine — the Cube. This is where Stage "buckets" become Cube "cells." This guide covers the mechanics, performance characteristics, and edge cases.

What Happens During Load

The Load process follows these steps:
  1. Analyze prior data loads — The engine evaluates previously loaded Data Units to determine what needs to be cleared
  2. Clear existing data — The engine clears the Data Units previously loaded by this Workflow Unit, at the account level by default
  3. Load data in parallel by Entity — The Cube identifies all Entities in the staged data and processes them simultaneously
  4. Write to the Cube tables — Data lands in three database tables: DataRecord tables (partitioned by year), CalcStatus, and DataUnitCacheTimeStamp
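The four steps above can be modeled in miniature. This is an illustrative Python sketch, not the OneStream API (business rules there are written against a .NET object model); the `cube` dict stands in for the Cube tables, and entities stand in for Data Units.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def load_workflow_unit(cube, staged_rows):
    """Simulate the four Load steps for one Workflow Unit.

    cube: dict of entity -> {account: amount} (stand-in for the Cube tables)
    staged_rows: list of dicts with 'entity', 'account', 'amount' keys
    """
    staged_entities = {r["entity"] for r in staged_rows}

    # 1. Analyze prior loads: which Data Units (entities) were loaded before?
    prior = [e for e in cube if e in staged_entities]

    # 2. Clear existing data for those Data Units (account level)
    for entity in prior:
        cube[entity].clear()

    # 3. Load in parallel by Entity
    by_entity = defaultdict(list)
    for row in staged_rows:
        by_entity[row["entity"]].append(row)

    def write_entity(item):
        entity, rows = item
        # 4. Write this entity's records to the "Cube tables"
        target = cube.setdefault(entity, {})
        for row in rows:
            target[row["account"]] = row["amount"]
        return entity

    with ThreadPoolExecutor() as pool:
        return sorted(pool.map(write_entity, by_entity.items()))
```

Note that step 2 clears the whole Data Unit, not just the accounts present in the new file — which is why a reload fully replaces what that Workflow Unit loaded before.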
[Diagram: Stage to Cube Data Flow]

When complete, the Load task turns from blue to green, and the Workflow advances to the next configured step (typically Process Cube or Certify).

Performance: First Load vs. Subsequent Loads

The performance profile differs significantly between the first load and subsequent loads for the same Workflow/Scenario/Time combination:
| Load | Database Operation | Relative Speed |
| --- | --- | --- |
| First load | Insert — creates new records in DataRecord, CalcStatus, and Timestamp tables | Slower |
| Subsequent loads | Update — modifies existing records in the same tables | Faster |
💡Tip
If your initial data load seems slow, this is expected. Subsequent loads for the same Workflow/Scenario/Time combination will be notably faster because the database performs updates rather than inserts.
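The insert-versus-update distinction is the same one an upsert makes at the SQL level. A minimal sketch using SQLite — the `DataRecord` table name echoes the documentation, but the schema is invented for illustration and is not the real OneStream schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DataRecord ("
    " entity TEXT, account TEXT, amount REAL,"
    " PRIMARY KEY (entity, account))"
)

def load(rows):
    # First load: no key conflicts, so every row is an INSERT (new rows,
    # new index entries -> slower). Reloading the same slice: every row
    # hits the conflict branch and becomes an in-place UPDATE -> faster.
    conn.executemany(
        "INSERT INTO DataRecord (entity, account, amount) VALUES (?, ?, ?) "
        "ON CONFLICT(entity, account) DO UPDATE SET amount = excluded.amount",
        rows,
    )

load([("E1", "Sales", 100.0), ("E1", "COGS", 40.0)])  # first load: inserts
load([("E1", "Sales", 120.0), ("E1", "COGS", 35.0)])  # reload: updates
```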

Partitioning by Entity

The single most important performance principle for data loading in OneStream:
Always partition by Entity.
There is an inherent relationship between the Workflow and Entity that aligns all the way through the data structures — from Stage partitions to Cube Data Units. The Cube loads Entities in parallel, which is how OneStream achieves high throughput.
For very large datasets (millions of rows), the recommended approach is:
  • Give larger Entities their own Workflows (e.g., a dedicated Workflow Profile for the biggest Entity)
  • Group smaller Entities together into shared Workflow Profiles
  • Use the parallel batch API to load multiple Workflows simultaneously (see the Automating IVL guide)
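The first two recommendations amount to a simple sizing rule. This hypothetical sketch (the threshold, names, and `WF_` prefix are all invented for illustration) shows one way to plan that split before building Workflow Profiles:

```python
def assign_workflows(entity_row_counts, dedicated_threshold=1_000_000):
    """Give the biggest entities their own Workflow; pool the rest.

    entity_row_counts: dict of entity name -> approximate staged row count
    """
    workflows, shared = {}, []
    for entity, rows in sorted(entity_row_counts.items(),
                               key=lambda kv: kv[1], reverse=True):
        if rows >= dedicated_threshold:
            workflows[f"WF_{entity}"] = [entity]   # dedicated Workflow Profile
        else:
            shared.append(entity)                  # grouped into a shared profile
    if shared:
        workflows["WF_Shared"] = shared
    return workflows

plan = assign_workflows({"Corp": 5_000_000, "EMEA": 900_000, "APAC": 50_000})
```

Each resulting Workflow could then be submitted as its own parallel batch load, so the large Entity no longer gates the smaller ones.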
⚠️Warning
Never put all data into a single bucket if it contains many Entities. The Stage Engine works on entire buckets at a time — if one record has a problem, the whole bucket must be emptied and refilled. Smaller, Entity-focused buckets are more efficient and more resilient.
🛑Danger
In a real-world case, a client partitioned 8 million Forecast records by time period instead of by Entity. When they loaded different months simultaneously to the same Entities, the clear-and-replace cycle wiped data from incorrect periods. Repartitioning by Entity resolved the issue. This is the core reason Entity-based partitioning matters — it aligns with how the Cube clears and loads Data Units.

Multiple Import Child Profiles

A Workflow Profile can have multiple Import Child profiles — for example, one for GL data and another for Sales Detail. The Workflow Engine handles this automatically, but its clear-and-load behavior depends on how the profiles relate:

Single Import Child (Behavior 1)

Basic clear-and-replace. The engine follows standard Data Load Execution Steps — clear previous data for the Data Unit, then load the new data.

Multiple Import Children (Behavior 2)

When two or more Import Child profiles under the same parent might load to the same Cube Data Unit:
  1. The engine checks for overlapping Data Units between sibling profiles
  2. If overlap is found, it clears all previously loaded Data Units for both siblings
  3. Then reloads both using an accumulate (merge) method in the order they appear in the Workflow hierarchy
  4. If both profiles load to the same cells, the values are added together
ℹ️Info
The merge behavior for multiple Import Children is automatic. You do not configure it — the Workflow Engine detects overlapping Data Units and switches from replace to accumulate. If this is not the behavior you want, restructure your Workflow hierarchy so the profiles do not overlap.
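The replace-versus-accumulate switch can be sketched as follows. This is an illustrative model, not engine code: the Data Unit is simplified to just the Entity portion of each cell key, and the real engine's clear step is folded into rebuilding the cube from scratch.

```python
def load_import_children(children):
    """children: list of (profile_name, {(entity, account): amount}),
    in Workflow hierarchy order."""
    cube = {}
    # Behavior 2 check: do any sibling profiles share a Data Unit (entity)?
    unit_sets = [{entity for (entity, _) in data} for _, data in children]
    overlap = len(unit_sets) > 1 and bool(set.intersection(*unit_sets))

    for _, data in children:
        for cell, amount in data.items():
            if overlap:
                cube[cell] = cube.get(cell, 0) + amount  # accumulate (merge)
            else:
                cube[cell] = amount                      # standard replace
    return cube

gl = ("GL_Import", {("E1", "Sales"): 100, ("E1", "COGS"): 40})
sales = ("SalesDetail", {("E1", "Sales"): 25})
cube = load_import_children([gl, sales])
```

Because both profiles touch Entity E1, the shared Sales cell accumulates to 125 rather than the second load replacing the first.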

Multiple Import Parents (Behavior 3)

If a central Workflow Profile (e.g., Central HR Load) needs to load data for Entities assigned to other Workflow Profiles, set Can Load Unrelated Entities to True on the central profile. Otherwise, the Workflow Engine will prevent the load.

YTD and MTD in the Same Profile

If a single Workflow Profile needs to handle both YTD data in one Origin and MTD/Periodic data in another Origin:
  1. Navigate to the Workflow Profile → expand the hierarchy
  2. Select the Origin that submits MTD/Periodic data
  3. Choose the Scenario Type
  4. Set the View behavior override to control how that Origin's data is interpreted

Load and the Origin Dimension

The Workflow Engine always uses the Import member of the Origin dimension when loading data. It also forces the Local member of the Consolidation dimension. These are automatic — you do not set them, and you cannot override them.
This built-in behavior provides a data protection layer between:
  • Imported data (Origin: Import)
  • Manual data entry (Origin: Forms)
  • Journal adjustments (Origin: AdjInput)
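The protection works because Origin is part of the cell address. A minimal sketch (the key layout and function names are invented for illustration): each source writes to its own Origin slice, so a reload's clear-and-replace can only ever touch Origin: Import cells.

```python
ORIGIN_FOR_SOURCE = {"import": "Import", "form": "Forms", "journal": "AdjInput"}

def write_cell(cube, source, entity, account, amount):
    # Origin is part of the cell address, so each data source lands in its
    # own slice of the Cube.
    origin = ORIGIN_FOR_SOURCE[source]
    cube[(entity, account, origin)] = amount

def clear_import(cube, entity):
    # A Load's clear step removes only Origin=Import cells for the entity,
    # leaving Forms and AdjInput data untouched.
    for key in [k for k in cube if k[0] == entity and k[2] == "Import"]:
        del cube[key]

cube = {}
write_cell(cube, "import", "E1", "Sales", 100)  # data load
write_cell(cube, "form", "E1", "Sales", 5)      # manual form entry
clear_import(cube, "E1")                        # reload clears only Import
write_cell(cube, "import", "E1", "Sales", 110)
```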