Module: Parse::PipelineSecurity

Defined in:
lib/parse/pipeline_security.rb

Overview

Canonical security validator for MongoDB aggregation pipelines and filter hashes that the SDK forwards to the driver or to Parse Server.

Previously the codebase had three different validators with three different rule sets:

  • Parse::Agent::PipelineValidator — strict allowlist for the Agent (read-only paths only)
  • Parse::Query#validate_pipeline! — outer-stage-only denylist
  • Parse::MongoDB.assert_no_denied_operators! — recursive denylist of server-side JS operators

Parse::AtlasSearch.convert_filter_for_mongodb was a complete passthrough that bypassed all three: a user-supplied filter containing $where/$expr/$function/$regex was injected straight into the pipeline's $match stage without passing through any guard.

This module consolidates the rules. Every entry point that forwards a caller-supplied pipeline or filter to MongoDB now routes through one of the two public methods here:

  • PipelineSecurity.validate_pipeline! (strict mode: allowlist + size/depth caps). Used by Parse::Agent for user-facing aggregation entry points.

  • PipelineSecurity.validate_filter! (permissive mode: recursive denylist only). Used by Parse::MongoDB.find/aggregate, by Parse::Query#aggregate (see the caveat below), and by the Atlas Search filter passthrough, where the pipeline is constructed by SDK code but a user-controlled filter hash is interpolated. Refuses $where/$function/$accumulator and the data-mutating stages at any nesting depth.

Policy: allowlist top-level, denylist recursive

Strict mode enforces ALLOWED_STAGES ONLY at the top-level stage key — nested sub-pipelines (inside $lookup.pipeline, $unionWith.pipeline, $facet.*, $graphLookup) are walked with the operator denylist but NOT with the stage allowlist. This is intentional: Atlas Search and uncommon-but-legitimate read stages like $densify and $fill must be allowed inside sub-pipelines even when the outer pipeline is strict-validated. The denylist is the security boundary; the allowlist is a shape check.
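The split between the two layers can be illustrated with a small standalone sketch. It is not the SDK's implementation: the module name PolicySketch, its method names, and the shortened constant lists are all hypothetical, and the real walker applies further checks (depth caps, field-reference screens) that are omitted here.

```ruby
# Illustrative sketch only; names and shortened lists are hypothetical.
module PolicySketch
  ALLOWED_STAGES = %w[$match $group $sort $project $limit $lookup $facet].freeze
  DENIED_OPERATORS = %w[$where $function $accumulator $out $merge].freeze

  # Recursively collect every denied operator key in a Hash/Array tree.
  def self.denied_keys(node, found = [])
    case node
    when Hash
      node.each do |key, value|
        found << key.to_s if DENIED_OPERATORS.include?(key.to_s)
        denied_keys(value, found)
      end
    when Array
      node.each { |child| denied_keys(child, found) }
    end
    found
  end

  # The allowlist is applied to the outer stage key only;
  # the denylist is applied at every depth.
  def self.strict_ok?(pipeline)
    pipeline.all? do |stage|
      stage.is_a?(Hash) &&
        ALLOWED_STAGES.include?(stage.keys.first.to_s) &&
        denied_keys(stage).empty?
    end
  end
end

# $densify is absent from the allowlist, yet it passes inside $lookup.pipeline.
nested_ok = [{ "$lookup" => { "from" => "Reading", "as" => "r",
                              "pipeline" => [{ "$densify" => { "field" => "ts" } }] } }]
# The same stage at the top level fails the shape check.
top_level = [{ "$densify" => { "field" => "ts" } }]
# A denied operator is refused however deep it sits.
nested_bad = [{ "$lookup" => { "from" => "Reading", "as" => "r",
                               "pipeline" => [{ "$match" => { "$where" => "true" } }] } }]

puts PolicySketch.strict_ok?(nested_ok)   # true
puts PolicySketch.strict_ok?(top_level)   # false
puts PolicySketch.strict_ok?(nested_bad)  # false
```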

Caveat for Query#aggregate callers

Parse::Query#aggregate routes through PipelineSecurity.validate_filter!, not PipelineSecurity.validate_pipeline!, so user-supplied pipelines are checked against the denylist only. Permissive mode does NOT block $lookup, $graphLookup, or $unionWith reading from arbitrary collections — these are legitimate read stages but powerful enough to cross Parse ACL/CLP boundaries when the source collection lacks row-level enforcement. Never pass raw attacker-controlled input into Parse::Query#aggregate. Construct the pipeline in SDK code and interpolate only validated values.

Capability gap: $expr

$expr itself is not in DENIED_OPERATORS. The recursive walker catches $function/$accumulator nested inside $expr, so the immediate JavaScript-execution risk is closed. A future Atlas operator gated under $expr would slip until DENIED_OPERATORS is extended. Defense-in-depth callers concerned about expensive aggregation expressions ($regexMatch ReDoS, large $reduce loops) should validate user input shape before reaching this module.
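One way a caller might validate input shape before reaching this module is to accept only flat field-to-scalar filters, so that no operator (including $expr) can appear at all. The sketch below is an assumption about what such a pre-check could look like; FilterShapeSketch and flat_equality_filter? are hypothetical names and are not part of the SDK.

```ruby
# Hypothetical caller-side pre-check; not part of Parse::PipelineSecurity.
module FilterShapeSketch
  SCALARS = [String, Numeric, TrueClass, FalseClass, NilClass].freeze

  # Accept only { "field" => scalar } pairs: no operator keys, no nesting.
  def self.flat_equality_filter?(filter)
    return false unless filter.is_a?(Hash)
    filter.all? do |key, value|
      !key.to_s.start_with?("$") && SCALARS.any? { |klass| value.is_a?(klass) }
    end
  end
end

puts FilterShapeSketch.flat_equality_filter?({ "name" => "Ada", "age" => 36 })        # true
puts FilterShapeSketch.flat_equality_filter?({ "$expr" => { "$gt" => ["$a", 1] } })   # false
puts FilterShapeSketch.flat_equality_filter?({ "name" => { "$regex" => "^A" } })      # false
```

A check this strict rules out legitimate operators too, so it only suits entry points where equality filters are all the caller needs.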

Defined Under Namespace

Classes: Error

Constant Summary

DENIED_OPERATORS =

Operators that are ALWAYS refused at any nesting depth. These either execute server-side JavaScript ($where, $function, $accumulator) or mutate the database ($out, $merge) or the server itself ($collMod, $createIndex, $dropIndex, $planCacheSetFilter, $planCacheClear). None of them are needed for read queries.

%w[
  $where $function $accumulator
  $out $merge
  $collMod $createIndex $dropIndex
  $planCacheSetFilter $planCacheClear
].freeze
DENIED_FIELD_REFS =

Field-reference paths (string values inside $expr whose first byte is $) that point at server-internal columns and must never be reachable from a user-influenced pipeline. A boolean expression inside $expr over any of these is a 1-bit-per-query side channel that bisects the value of a bcrypt hash, session token, or password-reset token. Names match Parse Server's internal column layout (cf. MongoStorageAdapter).

%w[
  $_hashed_password $_password_history
  $_session_token $_sessionToken
  $_email_verify_token $_perishable_token
  $_failed_login_count $_account_lockout_expires_at
  $_rperm $_wperm
  $_auth_data
].freeze
DENIED_FIELD_REF_PREFIXES =

String prefix for per-provider auth-data field references inside $expr. Parse Server stores per-provider columns as _auth_data_facebook, _auth_data_google, etc. — none of these should be reachable from a user-influenced pipeline. The prefix $_auth_data_ covers all of them without requiring an exhaustive list.

%w[$_auth_data_].freeze
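As a rough illustration of how the exact-match list and the prefix list combine, the sketch below screens a single string value. FieldRefSketch and denied_ref? are hypothetical names, the lists are shortened copies of the constants above, and the handling of dotted paths (for example a sub-field of $_auth_data) is deliberately left out because the text does not describe it.

```ruby
# Illustrative sketch only; names are hypothetical and lists are shortened.
module FieldRefSketch
  DENIED = %w[$_hashed_password $_session_token $_rperm $_wperm $_auth_data].freeze
  DENIED_PREFIXES = %w[$_auth_data_].freeze

  # True when a string value is a field reference to a server-internal column.
  def self.denied_ref?(value)
    return false unless value.is_a?(String)
    DENIED.include?(value) || DENIED_PREFIXES.any? { |prefix| value.start_with?(prefix) }
  end
end

# The side-channel shape: each query answers one yes/no question about the hash.
probe = { "$match" => { "$expr" => { "$gt" => ["$_hashed_password", '$2b$10$M'] } } }
operands = probe["$match"]["$expr"]["$gt"]

puts operands.any? { |v| FieldRefSketch.denied_ref?(v) }      # true
puts FieldRefSketch.denied_ref?("$_auth_data_facebook")       # true
puts FieldRefSketch.denied_ref?("$username")                  # false
```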
ALLOWED_UNDERSCORE_COLLECTIONS =

MongoDB collection names that an SDK aggregation IS permitted to name in from:/coll:. Any name starting with _ outside this set is refused as an internal Parse Server collection. The four entries here are the only _-prefixed collections that hold Parse SDK data classes; everything else with a leading _ is server-managed state (_SCHEMA discloses class-level permissions; _Hooks discloses Cloud Code webhook URLs + secret keys; _GraphQLConfig discloses GraphQL schema state; _Audit holds operational telemetry; _Idempotency/_PushStatus/_JobStatus/_JobSchedule/_GlobalConfig/_Audience hold internal Parse Server bookkeeping).

%w[_User _Role _Installation _Session].freeze
INTERNAL_FIELDS_DENYLIST =

Field names that are internal to Parse Server's storage layout and must never appear in returned documents. Most are stripped by Parse::MongoDB.convert_document_to_parse, but a raw-result path (raw: true) bypasses that conversion and would otherwise surface the bcrypt hash, session token, or reset token.

sessionToken / session_token (no leading underscore) are the credential column on _Session rows. Unlike the _User-side _session_token, the Session class declares it as a regular property, so without this entry a master-key agent that has had the class explicitly unhidden would receive raw bearer tokens in every row of a query_class("_Session") response. The denylist is the process-level floor — independent of class-visibility state — so even a deliberate agent_unhidden on _Session (or a compromised superadmin tool) cannot exfiltrate active tokens.

%w[
  _hashed_password _password_history
  _session_token _sessionToken
  sessionToken session_token
  _email_verify_token _perishable_token
  _failed_login_count _account_lockout_expires_at
  _rperm _wperm _tombstone
  _auth_data
].freeze
INTERNAL_FIELDS_PREFIX_DENYLIST =

Prefix covering per-provider auth-data columns (_auth_data_facebook, _auth_data_google, …). Used by strip_internal_fields and by the walk_for_denied! field-name screen.

%w[_auth_data_].freeze
FORENSIC_OPERATORS =

Forensic string-introspection operators. When any of these appears INSIDE $expr with a field-reference input string, the query becomes a per-character oracle even though the operator itself is otherwise legitimate. Refused inside $expr regardless of the input — the validator does not try to introspect operand shapes deeply, and these operators have no legitimate use against Parse-Server-managed columns from an SDK aggregation.

%w[
  $regexMatch $regexFind $regexFindAll
  $substr $substrBytes $substrCP
  $indexOfBytes $indexOfCP
  $strLenBytes $strLenCP
  $strcasecmp
].freeze
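A minimal sketch of the "refused inside $expr" rule follows. It assumes the operators are left alone outside $expr, which the description implies but does not state outright. ForensicSketch and forensic_in_expr? are hypothetical names, and the operator list is a shortened copy.

```ruby
# Illustrative sketch only; names are hypothetical and the list is shortened.
module ForensicSketch
  OPERATORS = %w[$regexMatch $substrCP $strLenCP $indexOfCP $strcasecmp].freeze

  # True when a forensic operator appears anywhere beneath a $expr key.
  def self.forensic_in_expr?(node, in_expr = false)
    case node
    when Hash
      node.any? do |key, value|
        k = key.to_s
        (in_expr && OPERATORS.include?(k)) ||
          forensic_in_expr?(value, in_expr || k == "$expr")
      end
    when Array
      node.any? { |child| forensic_in_expr?(child, in_expr) }
    else
      false
    end
  end
end

# Per-character oracle: "is the first character of the hash equal to 'a'?"
oracle = { "$match" => { "$expr" => {
  "$eq" => [{ "$substrCP" => ["$_hashed_password", 0, 1] }, "a"]
} } }
# The same operator in a $project, outside $expr, is not flagged by this sketch.
projection = { "$project" => { "initial" => { "$substrCP" => ["$name", 0, 1] } } }

puts ForensicSketch.forensic_in_expr?(oracle)      # true
puts ForensicSketch.forensic_in_expr?(projection)  # false
```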
ALLOWED_STAGES =

Top-level pipeline stages permitted by the strict validator. The set covers Parse-Stack's own aggregation use, plus Atlas Search entry points ($search, $searchMeta, $listSearchIndexes) so that Parse::AtlasSearch calls do not break. $vectorSearch is included for Parse::VectorSearch — like $search, it is a read-only Atlas index stage and must be the FIRST stage of the pipeline (Atlas refuses it otherwise).

%w[
  $match $group $sort $project $limit $skip $unwind $lookup
  $count $addFields $set $unset $bucket $bucketAuto $facet
  $sample $sortByCount $replaceRoot $replaceWith $redact
  $graphLookup $unionWith
  $search $searchMeta $listSearchIndexes $vectorSearch
].freeze
STAGE0_ONLY_ATLAS_STAGES =

Atlas operators that are valid only as the FIRST stage of a pipeline (Atlas refuses them anywhere else). They are present in ALLOWED_STAGES so the SDK's own modules — Parse::AtlasSearch and Parse::VectorSearch — can emit them; both of those modules bypass validate_pipeline! and build their pipelines internally. Caller-supplied pipelines (e.g. through Parse::Agent::Tools.aggregate) must NOT include these stages: the Agent's tenant-scope $match prepend would push them off stage 0, and the proper agent surface for full-text and vector search is the dedicated atlas_search / semantic_search tools, not raw aggregate.

%w[
  $search $searchMeta $vectorSearch $listSearchIndexes
].freeze
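The stage-0 problem described above can be seen in a short sketch: once a scope $match is prepended, a caller-supplied $search no longer sits at index 0. Stage0Sketch, its offenders method, and the tenantId field are hypothetical and only illustrate the idea; they are not the Agent's actual code.

```ruby
# Illustrative sketch only; module, method, and field names are hypothetical.
module Stage0Sketch
  STAGE0_ONLY = %w[$search $searchMeta $vectorSearch $listSearchIndexes].freeze

  # Return [index, stage_name] pairs for each stage-0-only stage
  # found at the top level of the pipeline.
  def self.offenders(pipeline)
    pipeline.each_with_index.filter_map do |stage, idx|
      name = stage.is_a?(Hash) ? stage.keys.first.to_s : ""
      [idx, name] if STAGE0_ONLY.include?(name)
    end
  end
end

caller_pipeline = [{ "$search" => { "text" => { "query" => "ruby", "path" => "title" } } }]
tenant_match = { "$match" => { "tenantId" => "t1" } } # hypothetical scope filter
final_pipeline = [tenant_match] + caller_pipeline

# After the prepend, $search is at index 1, where Atlas would refuse it.
p Stage0Sketch.offenders(final_pipeline) # [[1, "$search"]]
```

Refusing any caller pipeline for which offenders is non-empty is one simple way to enforce the rule before the prepend happens.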
MAX_REGEX_PATTERN_LENGTH =

Cap on the length of a caller-supplied $regex pattern string (or the regex: field inside $regexMatch / $regexFind / $regexFindAll). This is partial ReDoS protection: it does not catch every pathological pattern (short patterns like (a+)+$ can still backtrack catastrophically), but it bounds the worst class of caller-shipped patterns and stops the "1MB regex" denial-of-service shape that an attacker could send through vector_filter: / filter: / where:. Legitimate Parse Server queries are well under this limit.

512
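A length cap of this kind might be checked as in the sketch below. RegexCapSketch and pattern_ok? are hypothetical names, and treating exactly 512 characters as acceptable is an assumption; the text does not say whether the real comparison is inclusive.

```ruby
# Illustrative sketch only; names are hypothetical.
module RegexCapSketch
  MAX_REGEX_PATTERN_LENGTH = 512

  # Accept either a String pattern or a Regexp and compare its source length
  # to the cap. Assumption: a pattern of exactly the cap length is allowed.
  def self.pattern_ok?(pattern)
    source = pattern.is_a?(Regexp) ? pattern.source : pattern.to_s
    source.length <= MAX_REGEX_PATTERN_LENGTH
  end
end

puts RegexCapSketch.pattern_ok?("^Ada")       # true
puts RegexCapSketch.pattern_ok?("a" * 513)    # false
# A short catastrophic pattern still passes: the cap is not a ReDoS detector.
puts RegexCapSketch.pattern_ok?("(a+)+$")     # true
```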
MAX_PIPELINE_STAGES =

Cap on number of top-level stages in a strict-validated pipeline.

20
MAX_DEPTH =

Cap on nested object/array depth during recursive walks. Stops a caller from forcing the validator into a near-infinite traversal. Legitimate Parse-generated pipelines with $facet containing $lookup with let and correlated sub-pipelines ($match.$expr.$and.[…]) can reach depth 12+ on a normal read, so we keep comfortable headroom above the real ceiling.

20
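A depth-capped recursive walk can be sketched as follows. DepthSketch, TooDeep, walk, and nest are hypothetical names, and raising once depth exceeds the cap (rather than on reaching it) is an assumption about the boundary.

```ruby
# Illustrative sketch only; names and the exact boundary are assumptions.
module DepthSketch
  MAX_DEPTH = 20
  class TooDeep < StandardError; end

  # Visit every nested Hash/Array, refusing structures deeper than the cap.
  def self.walk(node, depth = 0)
    raise TooDeep, "nesting exceeds #{MAX_DEPTH} levels" if depth > MAX_DEPTH
    case node
    when Hash then node.each_value { |value| walk(value, depth + 1) }
    when Array then node.each { |value| walk(value, depth + 1) }
    end
    true
  end

  # Build a Hash nested `levels` deep around a scalar leaf.
  def self.nest(levels)
    (1..levels).reduce("leaf") { |inner, _| { "k" => inner } }
  end
end

puts DepthSketch.walk(DepthSketch.nest(12)) # true

begin
  DepthSketch.walk(DepthSketch.nest(25))
rescue DepthSketch::TooDeep => e
  puts "refused: #{e.message}"
end
```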

Class Method Summary

Class Method Details

.assert_collection_allowed!(name) ⇒ Object

Refuses any collection name reserved for Parse Server's internal state. Accepts the four SDK-data system classes (_User, _Role, _Installation, _Session) and any non-_-prefixed name. Used by LookupRewriter and by the Agent's pipeline walker to enforce a hard floor independent of any per-Agent MetadataRegistry.hidden? policy.

Parameters:

  • name (String, Symbol, nil)

    the collection name from from:/coll:. nil is treated as "no collection named" -- the caller passes through.

Raises:

  • (Error)

    if name starts with _ and is not listed in ALLOWED_UNDERSCORE_COLLECTIONS.


# File 'lib/parse/pipeline_security.rb', line 309

def assert_collection_allowed!(name)
  return if name.nil?
  str = name.to_s
  return if str.empty?
  return unless str.start_with?("_")
  return if ALLOWED_UNDERSCORE_COLLECTIONS.include?(str)
  raise Error.new(
    "SECURITY: Collection '#{str}' is reserved for Parse Server's internal " \
    "state and is not reachable from an SDK aggregation pipeline.",
    operator: str,
    reason: :denied_internal_collection,
  )
end

.refuse_protected_field_references!(pipeline, collection_name, resolution) ⇒ void

This method returns an undefined value.

Wave-3 TRACK-CLP-4: refuse caller-supplied pipelines that reference a protected field via $<field> on the RHS of a $project / $addFields / $set / $group / $bucket / $replaceWith / $lookup.let clause.

The protectedFields enforcement layer (CLPScope.redact_protected_fields!) strips the field by NAME from the result rows. But a pipeline can launder a protected field through a rename:

{ "$addFields" => { "ssn_copy" => "$ssn" } }
{ "$project"   => { "renamed"  => "$ssn", "objectId" => 1 } }
{ "$group"     => { "_id" => "$ssn", "n" => { "$sum" => 1 } } }

The post-fetch strip walks the rows and deletes ssn keys, but the value is now stored under ssn_copy / renamed / _id, so the strip walks past it. This scanner runs BEFORE the pipeline reaches Mongo: any $<field> string whose unprefixed name is in the class's protected-fields set raises CLPScope::Denied so the caller knows the join was refused, rather than silently leaking the renamed value.

Variable references ($$ROOT, $$CURRENT, $$user_var) are NOT field references — they're aggregation variables. The walker checks the leading $ is single, not double, before treating the string as a field path.

Master mode + nil resolution short-circuit at the entry: the walker is a no-op when the caller can read everything anyway.
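The single-versus-double $ distinction and the rename-laundering scan can be sketched as below. ProtectedRefSketch and its methods are hypothetical; the sketch returns matches instead of raising CLPScope::Denied, and reducing a dotted path such as "$ssn.last4" to its first segment is an assumption that the text does not confirm.

```ruby
# Illustrative sketch only; names are hypothetical.
module ProtectedRefSketch
  # Return the referenced field name, or nil for non-references and for
  # aggregation variables such as $$ROOT.
  def self.field_name(value)
    return nil unless value.is_a?(String)
    return nil unless value.start_with?("$")
    return nil if value.start_with?("$$")
    # Assumption: a dotted path is judged by its first segment.
    value[1..].split(".").first
  end

  # Collect every string value that references a protected field.
  def self.protected_refs(node, protected_set, found = [])
    case node
    when Hash then node.each_value { |v| protected_refs(v, protected_set, found) }
    when Array then node.each { |v| protected_refs(v, protected_set, found) }
    when String
      name = field_name(node)
      found << node if name && protected_set.include?(name)
    end
    found
  end
end

protected_set = ["ssn"]
launder = { "$addFields" => { "ssn_copy" => "$ssn" } }
harmless = { "$replaceRoot" => { "newRoot" => "$$ROOT" } }

p ProtectedRefSketch.protected_refs(launder, protected_set)   # ["$ssn"]
p ProtectedRefSketch.protected_refs(harmless, protected_set)  # []
```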

Parameters:

  • pipeline (Array<Hash>)

    the caller-supplied pipeline, before SDK-side ACL stages are prepended.

  • collection_name (String)

    the queried collection / class.

  • resolution (Parse::ACLScope::Resolution, nil)

    the resolved scope; nil-or-master short-circuits.

Raises:

  • (CLPScope::Denied)

    if a $<field> reference names a field in the class's protected-fields set.


# File 'lib/parse/pipeline_security.rb', line 374

def refuse_protected_field_references!(pipeline, collection_name, resolution)
  return if resolution.nil? || (resolution.respond_to?(:master?) && resolution.master?)
  return if pipeline.nil? || pipeline.empty?
  perms = resolution.respond_to?(:permission_strings) ? resolution.permission_strings : nil
  return if perms.nil?

  # Lazy-require to avoid forcing CLPScope load order when the
  # caller hasn't otherwise needed it.
  require_relative "clp_scope" unless defined?(Parse::CLPScope)

  protected_set = Parse::CLPScope.protected_fields_for(collection_name, perms)
  return if protected_set.nil? || protected_set.empty?

  pipeline.each_with_index do |stage, idx|
    walk_for_protected_ref!(stage, protected_set, collection_name, "pipeline[#{idx}]")
  end
  nil
end

.strip_internal_fields(doc) ⇒ Object

Strip INTERNAL_FIELDS_DENYLIST keys from a Hash document (one level deep -- raw search documents are flat). Returns a new Hash; the input is not mutated. Non-Hash inputs return unchanged so callers can pipe arbitrary cursor entries through this.



# File 'lib/parse/pipeline_security.rb', line 327

def strip_internal_fields(doc)
  return doc unless doc.is_a?(Hash)
  doc.each_with_object({}) do |(key, value), out|
    k = key.to_s
    next if INTERNAL_FIELDS_DENYLIST.include?(k)
    next if INTERNAL_FIELDS_PREFIX_DENYLIST.any? { |prefix| k.start_with?(prefix) }
    out[key] = value
  end
end
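The one-level behaviour is easiest to see on a sample row. The sketch below re-creates the same filtering outside the SDK so that it runs standalone; StripSketch is a hypothetical module with shortened copies of the two denylists, and the row contents are made up for illustration.

```ruby
# Standalone re-creation for illustration; StripSketch is hypothetical.
module StripSketch
  DENYLIST = %w[_hashed_password _session_token sessionToken _rperm _wperm _auth_data].freeze
  PREFIXES = %w[_auth_data_].freeze

  # Return a copy of the Hash without denylisted top-level keys.
  def self.strip(doc)
    return doc unless doc.is_a?(Hash)
    doc.reject do |key, _value|
      k = key.to_s
      DENYLIST.include?(k) || PREFIXES.any? { |prefix| k.start_with?(prefix) }
    end
  end
end

row = {
  "objectId" => "abc123",
  "username" => "ada",
  "_hashed_password" => "(redacted)",
  "_auth_data_facebook" => { "id" => "1" },
  "profile" => { "_rperm" => ["*"] } # nested: untouched by a one-level strip
}

cleaned = StripSketch.strip(row)
p cleaned.keys                        # ["objectId", "username", "profile"]
puts row.key?("_hashed_password")     # true (the input is not mutated)
puts cleaned["profile"].key?("_rperm") # true (only the top level is screened)
```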

.valid_filter?(node) ⇒ Boolean

Returns true if the node passes permissive validation.

Returns:

  • (Boolean)

    true if the node passes permissive validation.



# File 'lib/parse/pipeline_security.rb', line 290

def valid_filter?(node)
  validate_filter!(node)
  true
rescue Error
  false
end

.valid_pipeline?(pipeline) ⇒ Boolean

Returns true if the pipeline passes strict validation.

Returns:

  • (Boolean)

    true if the pipeline passes strict validation.



# File 'lib/parse/pipeline_security.rb', line 282

def valid_pipeline?(pipeline)
  validate_pipeline!(pipeline)
  true
rescue Error
  false
end

.validate_filter!(node, allow_internal_fields: false) ⇒ true

Permissive validation: walks the given Hash or Array (or anything else, which is a no-op) and refuses any nested key that appears in DENIED_OPERATORS. Does NOT check the top-level stage allowlist or the stage count cap. Used by direct-MongoDB sinks where callers have explicit intent and want flexibility in stage selection, but server-side JS and data-mutating operators must still be refused.

Parameters:

  • node (Hash, Array, Object)

    the structure to walk.

  • allow_internal_fields (Boolean) (defaults to: false)

    when true, skip the INTERNAL_FIELDS_DENYLIST check (e.g. for SDK-generated ACL filters that legitimately reference _rperm/_wperm via Query#readable_by_role and friends). The DENIED_OPERATORS walk and forensic-operator gating still apply. Default false for callers that forward raw, user-influenced pipelines (e.g. Agent MCP tools).

Returns:

  • (true)

Raises:

  • (Error)

    if a denied operator is found at any depth.



# File 'lib/parse/pipeline_security.rb', line 276

def validate_filter!(node, allow_internal_fields: false)
  walk_for_denied!(node, depth: 0, allow_internal_fields: allow_internal_fields)
  true
end

.validate_pipeline!(pipeline) ⇒ true

Strict validation: pipeline must be a non-empty Array of at most MAX_PIPELINE_STAGES Hashes, each Hash's top-level key must be in ALLOWED_STAGES, and no entry in DENIED_OPERATORS may appear at any nesting depth.

Parameters:

  • pipeline (Array<Hash>)

    the aggregation pipeline.

Returns:

  • (true)

Raises:

  • (Error)

    if validation fails.



# File 'lib/parse/pipeline_security.rb', line 238

def validate_pipeline!(pipeline)
  unless pipeline.is_a?(Array)
    raise Error.new("Pipeline must be an Array, got #{pipeline.class}", reason: :invalid_type)
  end
  if pipeline.empty?
    raise Error.new("Pipeline cannot be empty", reason: :empty_pipeline)
  end
  if pipeline.size > MAX_PIPELINE_STAGES
    raise Error.new(
      "Pipeline exceeds maximum of #{MAX_PIPELINE_STAGES} stages (got #{pipeline.size})",
      reason: :too_many_stages,
    )
  end

  pipeline.each_with_index do |stage, idx|
    validate_stage!(stage, idx)
  end
  true
end
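The three guard clauses at the top of the method each tag the error with a reason: symbol, which a caller could branch on. The sketch below mirrors only those guards in a standalone module. GuardSketch is hypothetical, and the Error class here is a stand-in: the real Error accepts operator: and reason: keywords (as the listings show), but exposing them through reader methods is an assumption.

```ruby
# Illustrative sketch only; GuardSketch and its Error class are stand-ins.
module GuardSketch
  MAX_PIPELINE_STAGES = 20

  class Error < StandardError
    attr_reader :operator, :reason # assumption: readers exist on the real class

    def initialize(message, operator: nil, reason: nil)
      super(message)
      @operator = operator
      @reason = reason
    end
  end

  # Mirror of the type, emptiness, and stage-count guards.
  def self.check_shape!(pipeline)
    unless pipeline.is_a?(Array)
      raise Error.new("Pipeline must be an Array, got #{pipeline.class}", reason: :invalid_type)
    end
    raise Error.new("Pipeline cannot be empty", reason: :empty_pipeline) if pipeline.empty?
    if pipeline.size > MAX_PIPELINE_STAGES
      raise Error.new("Pipeline exceeds maximum of #{MAX_PIPELINE_STAGES} stages", reason: :too_many_stages)
    end
    true
  end

  # Return the failure reason, or nil when the shape is acceptable.
  def self.reason_for(pipeline)
    check_shape!(pipeline)
    nil
  rescue Error => e
    e.reason
  end
end

p GuardSketch.reason_for({ "$match" => {} })                      # :invalid_type
p GuardSketch.reason_for([])                                      # :empty_pipeline
p GuardSketch.reason_for(Array.new(21) { { "$limit" => 1 } })     # :too_many_stages
p GuardSketch.reason_for([{ "$match" => { "done" => true } }])    # nil
```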