How-To: Use QueryJobs API Pagination

The QueryJobs API returns a 200-event result buffer for filter (non-aggregate) queries. To retrieve all matching events, you must use cursor-based pagination via the around parameter.

Scripts for both LogScale and NG-SIEM are available for download.

Script Usage

LogScale (direct API)

```shell
# One-time setup
python3 -m venv .venv
source .venv/bin/activate
pip install requests

# Environment variables (required)
export LOGSCALE_TOKEN="your-api-token"
export LOGSCALE_BASE_URL="https://your-logscale-instance.com/"
export LOGSCALE_REPO="your-repo-name"

# Run (wrap the query in single quotes to preserve double quotes in CQL)
python queryjob_paginator.py -q '#event_simpleName=ProcessRollup2'

# With double quotes in the query
python queryjob_paginator.py -q '#event_simpleName=ProcessRollup2 | groupBy([aid], function=[count(as="Event Count")]) | sort("Event Count", limit=max)'

# Optional flags
python queryjob_paginator.py -q '...' -s 1h -e now -o results.json --max-events 5000 --page-size 200

# Help
python queryjob_paginator.py -h
```

Available parameters:

| Flag | Description | Default |
|------|-------------|---------|
| -q, --query | CQL query string (required) | (none) |
| -s, --start | Start time | 15m |
| -e, --end | End time | now |
| -o, --output | Output JSON file | queryjob_results.json |
| --max-events | Max events to retrieve | unlimited |
| --page-size | Events per cursor page | 200 |

NG-SIEM (FalconPy)

```shell
# One-time setup (same venv)
pip install crowdstrike-falconpy

# Environment variables (required)
export FALCON_CLIENT_ID="your-client-id"
export FALCON_CLIENT_SECRET="your-client-secret"
export FALCON_BASE_URL="https://api.us-2.crowdstrike.com"  # optional, defaults to US-1
export CA_BUNDLE="/path/to/ca-bundle.pem"                  # optional, for corporate proxy

# Run
python ngsiem_queryjob_paginator.py
```

Configuration is in the script's Configuration section (REPO, QUERY_STRING, START, END, PAGE_SIZE, MAX_EVENTS).
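For orientation, the constants named above might look like this in the script's Configuration section (a sketch; the values shown are examples, not the script's actual defaults):

```python
# Illustrative Configuration section for ngsiem_queryjob_paginator.py.
# Constant names come from the text above; values are example placeholders.
REPO = "search-all"                              # repository/view to query
QUERY_STRING = "#event_simpleName=ProcessRollup2"  # CQL filter query
START = "15m"                                    # relative start time
END = "now"                                      # end time
PAGE_SIZE = 200                                  # events per cursor page
MAX_EVENTS = 5000                                # stop after this many events
```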

Required API scope: NGSIEM: Read + Write

How QueryJobs API Works

The QueryJobs API flow is as follows:

  1. Create a QueryJob → returns a job ID

  2. Poll (GET) → returns up to 200 events + metadata

  3. Check metadata → hasMoreEvents="true" means more events exist beyond the buffer

  4. Paginate using the around parameter to walk through remaining events
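Steps 1-3 can be sketched with the requests library against the direct LogScale endpoints (a minimal sketch: function names are illustrative and error handling is kept to raise_for_status):

```python
import time
import requests  # pip install requests

def job_payload(query, start="15m", end="now"):
    # Request body for POST /api/v1/repositories/{repo}/queryjobs
    return {"queryString": query, "start": start, "end": end}

def create_queryjob(base_url, token, repo, query):
    # Step 1: create the QueryJob and return its job ID.
    resp = requests.post(
        f"{base_url.rstrip('/')}/api/v1/repositories/{repo}/queryjobs",
        headers={"Authorization": f"Bearer {token}"},
        json=job_payload(query),
    )
    resp.raise_for_status()
    return resp.json()["id"]

def poll_until_done(base_url, token, repo, job_id):
    # Steps 2-3: poll until done=true, waiting metaData.pollAfter ms between polls.
    while True:
        resp = requests.get(
            f"{base_url.rstrip('/')}/api/v1/repositories/{repo}/queryjobs/{job_id}",
            headers={"Authorization": f"Bearer {token}"},
        )
        resp.raise_for_status()
        job = resp.json()
        if job.get("done"):
            return job
        time.sleep(job.get("metaData", {}).get("pollAfter", 1000) / 1000)
```

Remember that done=true only means the query finished; whether all results were delivered is decided by hasMoreEvents.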

Key Metadata Fields

| Field | Meaning |
|-------|---------|
| metaData.resultBufferSize | Events in the buffer (default 200 for filter queries) |
| metaData.eventCount | Number of events in the current result set |
| metaData.processedEvents | Total matching events found by the query |
| metaData.extraData.hasMoreEvents | "true" (a string!) if results exceed the buffer |
| metaData.isAggregate | Aggregate queries return all results in one shot |
| metaData.pollAfter | Milliseconds to wait before the next poll |

Pagination Mechanisms

  1. Offset/Limit (within the buffer only) - Query parameters on the GET poll request:

    • ?paginationLimit=50&paginationOffset=0 - page within the 200-event buffer

    • Only useful for paging within what's already buffered, not for getting more events

  2. Cursor-based, via the around parameter (for results beyond the buffer) - around creates a new QueryJob anchored on a specific event:

    ```json
    {
      "queryString": "#event_simpleName=ProcessRollup2",
      "start": "15m",
      "end": "now",
      "around": {
        "eventId": "@id of anchor event",
        "timestamp": 1777469793821,
        "numberOfEventsBefore": 200,
        "numberOfEventsAfter": 0
      }
    }
    ```

    Critical detail: LogScale returns newest events first. The last event in the buffer is the oldest. To get more events, anchor on the oldest event and request numberOfEventsBefore (older events).

  3. What Does NOT Work:

    • Automatic cursor advancement - the server does NOT track which segments you've consumed. Repeated polls return the same 200 events.

    • dataspaces endpoint - legacy alias; use /api/v1/repositories/ instead.

    • Offset/limit beyond the buffer - paginationOffset only pages within resultBufferSize, not beyond it.
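Putting the cursor mechanism in code: the around body can be built from the last (oldest) event of the previous page. A minimal sketch, where around_payload is an illustrative helper name:

```python
def around_payload(query, anchor_event, start="15m", end="now", page_size=200):
    """Build a new QueryJob body anchored on the oldest event in the buffer.

    Results arrive newest-first, so `anchor_event` should be the LAST event
    of the previous page; we then ask for `numberOfEventsBefore` (older)
    events relative to that anchor.
    """
    return {
        "queryString": query,
        "start": start,
        "end": end,
        "around": {
            "eventId": anchor_event["@id"],
            "timestamp": anchor_event["@timestamp"],
            "numberOfEventsBefore": page_size,
            "numberOfEventsAfter": 0,
        },
    }
```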

Pagination algorithm:

  1. Create QueryJob, poll until done=true

  2. Collect initial 200 events

  3. If hasMoreEvents="true":

    1. Take the LAST event (oldest, since results are newest-first)

    2. Create NEW QueryJob with around:

      • eventId = last event's @id

      • timestamp = last event's @timestamp

      • numberOfEventsBefore = 200

      • numberOfEventsAfter = 0

    3. Poll new job until done, collect events

    4. Deduplicate by @id (boundary event may repeat)

    5. Repeat from sub-step 1 (take the new page's last event as the next anchor) until no new events are returned
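The algorithm above can be sketched as a loop. This is a minimal sketch: run_job is an assumed helper that creates a QueryJob from a request body, polls it to completion, and returns the final response dict (with "events" and "metaData"):

```python
def paginate(run_job, query, start="15m", end="now", page_size=200):
    """Collect all events for a filter query via around-based pagination."""
    body = {"queryString": query, "start": start, "end": end}
    seen_ids, events = set(), []
    while True:
        result = run_job(body)
        # Deduplicate by @id: the anchor event may repeat at page boundaries.
        fresh = [e for e in result["events"] if e["@id"] not in seen_ids]
        for event in fresh:
            seen_ids.add(event["@id"])
            events.append(event)
        has_more = (
            result["metaData"].get("extraData", {}).get("hasMoreEvents") == "true"
        )
        if not fresh or not has_more:
            return events
        # Anchor on the LAST event of the page (oldest, since newest-first)
        # and request older events before it.
        anchor = result["events"][-1]
        body = {
            "queryString": query,
            "start": start,
            "end": end,
            "around": {
                "eventId": anchor["@id"],
                "timestamp": anchor["@timestamp"],
                "numberOfEventsBefore": page_size,
                "numberOfEventsAfter": 0,
            },
        }
```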

API Endpoints

LogScale (direct API)

| Action | Endpoint |
|--------|----------|
| Create | POST /api/v1/repositories/{repo}/queryjobs |
| Poll | GET /api/v1/repositories/{repo}/queryjobs/{id} |
| Cancel | DELETE /api/v1/repositories/{repo}/queryjobs/{id} |

NG-SIEM (via CrowdStrike API gateway)

| Action | Endpoint |
|--------|----------|
| Create | POST /humio/api/v1/repositories/{repo}/queryjobs |
| Poll | GET /humio/api/v1/repositories/{repo}/queryjobs/{id} |
| Cancel | DELETE /humio/api/v1/repositories/{repo}/queryjobs/{id} |

NG-SIEM repositories/views: search-all, investigate_view, third-party, falcon_for_it_view, forensics_view

Important Notes

  • hasMoreEvents is a string ("true" / "false"), not a boolean

  • done=true means the query finished, NOT that all results are delivered

  • QueryJobs auto-delete after 90 seconds of no polling

  • Aggregate queries (groupBy(), count(), etc.) return all results in one response - no pagination needed

  • Rate limit: 6000 concurrent query jobs per CID
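The first note is an easy trap in Python, since a non-empty string is always truthy. A small illustrative helper:

```python
def has_more_events(meta_data):
    # hasMoreEvents arrives as the string "true"/"false", not a boolean.
    # A truthiness test would be wrong: bool("false") is True in Python.
    return meta_data.get("extraData", {}).get("hasMoreEvents") == "true"
```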

SSL / Corporate Proxy

For Zscaler environments, set the CA_BUNDLE env var to the path of the CA certificate bundle. The FalconPy script reads this and passes it as ssl_verify. If the bundle doesn't cover the API endpoint, ssl_verify=False works, but it disables certificate verification entirely; treat it as a last resort.
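One way the script might resolve that value (a sketch; resolve_ssl_verify is an illustrative helper, and the NGSIEM class name should be checked against the FalconPy documentation):

```python
import os

def resolve_ssl_verify():
    """Value for FalconPy's ssl_verify keyword: the CA-bundle path when
    CA_BUNDLE is set, otherwise default certificate verification (True).
    """
    return os.environ.get("CA_BUNDLE") or True

# Hypothetical usage:
# from falconpy import NGSIEM
# client = NGSIEM(
#     client_id=os.environ["FALCON_CLIENT_ID"],
#     client_secret=os.environ["FALCON_CLIENT_SECRET"],
#     ssl_verify=resolve_ssl_verify(),
# )
```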