Module: Parse::PipelineSecurity
- Defined in: lib/parse/pipeline_security.rb
Overview
Canonical security validator for MongoDB aggregation pipelines and filter hashes that the SDK forwards to the driver or to Parse Server.
Previously the codebase had three different validators with three different rule sets:
- Parse::Agent::PipelineValidator: strict allowlist for the Agent (read-only paths only)
- Parse::Query#validate_pipeline!: outer-stage-only denylist
- Parse::MongoDB.assert_no_denied_operators!: recursive denylist of server-side JS operators
Parse::AtlasSearch.convert_filter_for_mongodb was a complete
passthrough that bypassed all three. A user-supplied filter containing
$where/$expr/$function/$regex was injected straight into the
pipeline $match stage, bypassing every existing constraint guard.
This module consolidates the rules. Every entry point that forwards a caller-supplied pipeline or filter to MongoDB now routes through one of the two public methods here:
- PipelineSecurity.validate_pipeline!: strict mode (allowlist + size/depth caps). Used by Parse::Agent and by Parse::Query#aggregate for user-facing aggregation entry points.
- PipelineSecurity.validate_filter!: permissive mode (recursive denylist only). Used by Parse::MongoDB.find/aggregate and Atlas Search filter passthrough where the pipeline is constructed by SDK code but a user-controlled filter hash is interpolated. Refuses $where/$function/$accumulator and the data-mutating stages at any nesting depth.
Policy: allowlist top-level, denylist recursive
Strict mode enforces ALLOWED_STAGES ONLY at the top-level stage
key — nested sub-pipelines (inside $lookup.pipeline,
$unionWith.pipeline, $facet.*, $graphLookup) are walked with
the operator denylist but NOT with the stage allowlist. This is
intentional: Atlas Search and uncommon-but-legitimate read stages
like $densify and $fill must be allowed inside sub-pipelines
even when the outer pipeline is strict-validated. The denylist is
the security boundary; the allowlist is a shape check.
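The split between a top-level allowlist and a recursive denylist can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: `SketchError`, `sketch_walk!`, `sketch_validate!`, and the shortened `SKETCH_*` constants are hypothetical stand-ins.

```ruby
# Hypothetical, shortened stand-ins for ALLOWED_STAGES / DENIED_OPERATORS.
SKETCH_ALLOWED_STAGES = %w[$match $lookup $facet $limit].freeze
SKETCH_DENIED_OPERATORS = %w[$where $function $accumulator $out $merge].freeze

class SketchError < StandardError; end

# Recursive denylist: applied to every Hash key at every depth.
def sketch_walk!(node)
  case node
  when Hash
    node.each do |key, value|
      if SKETCH_DENIED_OPERATORS.include?(key.to_s)
        raise SketchError, "denied operator #{key}"
      end
      sketch_walk!(value)
    end
  when Array
    node.each { |child| sketch_walk!(child) }
  end
  nil
end

# Allowlist: applied only to the top-level key of each stage.
def sketch_validate!(pipeline)
  pipeline.each do |stage|
    raise SketchError, "stage must be a Hash" unless stage.is_a?(Hash)
    name = stage.keys.first.to_s
    unless SKETCH_ALLOWED_STAGES.include?(name)
      raise SketchError, "stage #{name} not allowed"
    end
    sketch_walk!(stage)
  end
  true
end

# $densify is absent from the allowlist, but here it is nested inside
# $lookup.pipeline, so only the denylist applies and the pipeline passes.
NESTED_OK = [
  { "$lookup" => { "from" => "Post", "as" => "posts",
                   "pipeline" => [{ "$densify" => { "field" => "ts" } }] } },
].freeze

# The same stage at the top level fails the allowlist.
TOP_LEVEL_REJECTED = [{ "$densify" => { "field" => "ts" } }].freeze

# $function is refused wherever it appears, including inside $expr.
NESTED_DENIED = [
  { "$match" => { "$expr" => { "$function" => { "body" => "return true" } } } },
].freeze
```

In this sketch `sketch_validate!(NESTED_OK)` returns true, while the other two pipelines raise `SketchError`, which mirrors the policy that the denylist is the boundary and the allowlist only constrains the outer shape.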
Caveat for Query#aggregate callers
Parse::Query#aggregate routes through PipelineSecurity.validate_filter!, not
PipelineSecurity.validate_pipeline!, so user-supplied pipelines are checked
against the denylist only. Permissive mode does NOT block
$lookup, $graphLookup, or $unionWith reading from arbitrary
collections — these are legitimate read stages but powerful enough
to cross Parse ACL/CLP boundaries when the source collection lacks
row-level enforcement. Never pass raw attacker-controlled input
into Parse::Query#aggregate. Construct the pipeline in SDK code
and interpolate only validated values.
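One way to follow this advice is to keep every stage in trusted code and accept only validated scalars from the caller. The sketch below is illustrative: `build_status_pipeline`, `PERMITTED_STATUSES`, and the field names `status` and `createdAt` are hypothetical and not part of the SDK.

```ruby
# Hypothetical allowlist of values a caller may filter on.
PERMITTED_STATUSES = %w[draft published archived].freeze

# Builds the pipeline in trusted code. The caller contributes only a
# validated scalar and a clamped integer, never a Hash or a stage.
def build_status_pipeline(status, limit)
  status = status.to_s
  unless PERMITTED_STATUSES.include?(status)
    raise ArgumentError, "unsupported status: #{status.inspect}"
  end
  # Integer() raises on non-numeric input; clamp bounds the page size.
  capped_limit = Integer(limit).clamp(1, 100)
  [
    { "$match" => { "status" => status } },
    { "$sort" => { "createdAt" => -1 } },
    { "$limit" => capped_limit },
  ]
end
```

Because the caller never supplies a stage name or a nested Hash, there is no path for a `$lookup` or `$unionWith` against another collection to reach the pipeline.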
Capability gap: $expr
$expr itself is not in DENIED_OPERATORS. The recursive walker
catches $function/$accumulator nested inside $expr, so the
immediate JavaScript-execution risk is closed. A future Atlas
operator gated under $expr would slip until DENIED_OPERATORS
is extended. Defense-in-depth callers concerned about expensive
aggregation expressions ($regexMatch ReDoS, large $reduce
loops) should validate user input shape before reaching this
module.
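A caller that wants this extra layer could screen the shape of user input before it reaches the validator. The sketch below assumes the application only needs flat equality filters. `ShapeError`, `assert_plain_equality_filter!`, and the `max_string` default are hypothetical.

```ruby
class ShapeError < StandardError; end

# Accepts only a flat Hash of field name => scalar. Any "$"-prefixed key,
# nested Hash/Array value, or oversized String is rejected, so operators
# such as $expr, $regexMatch, or $reduce cannot arrive via user input.
def assert_plain_equality_filter!(input, max_string: 256)
  raise ShapeError, "filter must be a Hash" unless input.is_a?(Hash)
  input.each do |key, value|
    raise ShapeError, "operator key #{key}" if key.to_s.start_with?("$")
    case value
    when String
      raise ShapeError, "value too long for #{key}" if value.length > max_string
    when Numeric, true, false, nil
      # Scalars are accepted as-is.
    else
      raise ShapeError, "non-scalar value for #{key}"
    end
  end
  input
end
```

This is deliberately stricter than the module's denylist: it trades expressiveness for a shape that cannot carry an aggregation expression at all.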
Defined Under Namespace
Classes: Error
Constant Summary
- DENIED_OPERATORS =
Operators that are ALWAYS refused at any nesting depth. These either execute server-side JavaScript ($where, $function, $accumulator) or mutate the database ($out, $merge) or the server itself ($collMod, $createIndex, $dropIndex, $planCacheSetFilter, $planCacheClear). None of them are needed for read queries.
    %w[
      $where $function $accumulator $out $merge
      $collMod $createIndex $dropIndex $planCacheSetFilter $planCacheClear
    ].freeze
- DENIED_FIELD_REFS =
Field-reference paths (string values inside $expr whose first byte is $) that point at server-internal columns and must never be reachable from a user-influenced pipeline. A boolean expression inside $expr over any of these is a 1-bit-per-query side channel that bisects the value of a bcrypt hash, session token, or password-reset token. Names match Parse Server's internal column layout (cf. MongoStorageAdapter).
    %w[
      $_hashed_password $_password_history $_session_token $_sessionToken
      $_email_verify_token $_perishable_token $_failed_login_count
      $_account_lockout_expires_at $_rperm $_wperm $_auth_data
    ].freeze
- DENIED_FIELD_REF_PREFIXES =
String prefix for per-provider auth-data field references inside $expr. Parse Server stores per-provider columns as _auth_data_facebook, _auth_data_google, etc.; none of these should be reachable from a user-influenced pipeline. The prefix $_auth_data_ covers all of them without requiring an exhaustive list.
    %w[$_auth_data_].freeze
- ALLOWED_UNDERSCORE_COLLECTIONS =
MongoDB collection names that an SDK aggregation IS permitted to name in from:/coll:. Any name starting with _ outside this set is refused as an internal Parse Server collection. The four entries here are the only _-prefixed collections that hold Parse SDK data classes; everything else with a leading _ is server-managed state (_SCHEMA discloses class-level permissions; _Hooks discloses Cloud Code webhook URLs + secret keys; _GraphQLConfig discloses GraphQL schema state; _Audit holds operational telemetry; _Idempotency/_PushStatus/_JobStatus/_JobSchedule/_GlobalConfig/_Audience hold internal Parse Server bookkeeping).
    %w[_User _Role _Installation _Session].freeze
- INTERNAL_FIELDS_DENYLIST =
Field names that are internal to Parse Server's storage layout and must never appear in returned documents. Most are stripped by Parse::MongoDB.convert_document_to_parse, but a raw-result path (raw: true) bypasses that conversion and would otherwise surface the bcrypt hash, session token, or reset token.
sessionToken/session_token (no leading underscore) are the credential column on _Session rows. Unlike the _User-side _session_token, the Session class declares it as a regular property, so without this entry a master-key agent that has had the class explicitly unhidden would receive raw bearer tokens in every row of a query_class("_Session") response. The denylist is the process-level floor, independent of class-visibility state, so even a deliberate agent_unhidden on _Session (or a compromised superadmin tool) cannot exfiltrate active tokens.
    %w[
      _hashed_password _password_history _session_token _sessionToken
      sessionToken session_token _email_verify_token _perishable_token
      _failed_login_count _account_lockout_expires_at _rperm _wperm
      _tombstone _auth_data
    ].freeze
- INTERNAL_FIELDS_PREFIX_DENYLIST =
Prefix covering per-provider auth-data columns (_auth_data_facebook, _auth_data_google, …). Used by strip_internal_fields and by the walk_for_denied! field-name screen.
    %w[_auth_data_].freeze
- FORENSIC_OPERATORS =
Forensic string-introspection operators. When any of these appears INSIDE $expr with a field-reference input string, the query becomes a per-character oracle even though the operator itself is otherwise legitimate. Refused inside $expr regardless of the input: the validator does not try to introspect operand shapes deeply, and these operators have no legitimate use against Parse-Server-managed columns from an SDK aggregation.
    %w[
      $regexMatch $regexFind $regexFindAll $substr $substrBytes $substrCP
      $indexOfBytes $indexOfCP $strLenBytes $strLenCP $strcasecmp
    ].freeze
- ALLOWED_STAGES =
Top-level pipeline stages permitted by the strict validator. The set covers Parse-Stack's own aggregation use, plus Atlas Search entry points ($search, $searchMeta, $listSearchIndexes) so that Parse::AtlasSearch calls do not break. $vectorSearch is included for Parse::VectorSearch: like $search, it is a read-only Atlas index stage and must be the FIRST stage of the pipeline (Atlas refuses it otherwise).
    %w[
      $match $group $sort $project $limit $skip $unwind $lookup $count
      $addFields $set $unset $bucket $bucketAuto $facet $sample $sortByCount
      $replaceRoot $replaceWith $redact $graphLookup $unionWith
      $search $searchMeta $listSearchIndexes $vectorSearch
    ].freeze
- STAGE0_ONLY_ATLAS_STAGES =
Atlas operators that are valid only as the FIRST stage of a pipeline (Atlas refuses them anywhere else). They are present in ALLOWED_STAGES so the SDK's own modules (Parse::AtlasSearch and Parse::VectorSearch) can emit them; both of those modules bypass validate_pipeline! and build their pipelines internally. Caller-supplied pipelines (e.g. through Parse::Agent::Tools.aggregate) must NOT include these stages: the Agent's tenant-scope $match prepend would push them off stage 0, and the proper agent surface for full-text and vector search is the dedicated atlas_search/semantic_search tools, not raw aggregate.
    %w[ $search $searchMeta $vectorSearch $listSearchIndexes ].freeze
- MAX_REGEX_PATTERN_LENGTH =
Cap on the length of a caller-supplied $regex (or the regex: field inside $regexMatch/$regexFind/$regexFindAll) pattern string. ReDoS protection: it does not catch every pathological pattern (small patterns like (a+)+$ can still backtrack catastrophically), but it caps the worst class of caller-shipped patterns and stops the "1MB regex" denial-of-service shape that an attacker could send through vector_filter:/filter:/where:. Legitimate Parse-Server queries are well under this.
    512
- MAX_PIPELINE_STAGES =
Cap on number of top-level stages in a strict-validated pipeline.
    20
- MAX_DEPTH =
Cap on nested object/array depth during recursive walks. Stops a caller from forcing the validator into a near-infinite traversal. Legitimate Parse-generated pipelines with $facet containing $lookup with let and correlated sub-pipelines ($match.$expr.$and.[…]) can reach depth 12+ on a normal read, so we keep comfortable headroom above the real ceiling.
    20
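The two numeric caps on the walk (MAX_DEPTH and MAX_REGEX_PATTERN_LENGTH) can be illustrated with a small recursive checker. This is a sketch under assumptions: `LimitError`, `sketch_check_limits!`, and the `SKETCH_*` constants are hypothetical, and the way depth is counted here (one level per Hash or Array) may differ from the module's own walker.

```ruby
SKETCH_MAX_DEPTH = 20
SKETCH_MAX_REGEX_LENGTH = 512

class LimitError < StandardError; end

# Walks a filter or stage, refusing over-deep nesting and over-long
# regex pattern strings. Depth grows by one per Hash or Array level.
def sketch_check_limits!(node, depth = 0)
  raise LimitError, "nesting exceeds #{SKETCH_MAX_DEPTH}" if depth > SKETCH_MAX_DEPTH
  case node
  when Hash
    node.each do |key, value|
      # "$regex" is the query operator; "regex" is the option key used by
      # $regexMatch and friends. Simplification: this sketch applies the
      # cap to any key with either name.
      if %w[$regex regex].include?(key.to_s) && value.is_a?(String) &&
         value.length > SKETCH_MAX_REGEX_LENGTH
        raise LimitError, "regex pattern longer than #{SKETCH_MAX_REGEX_LENGTH}"
      end
      sketch_check_limits!(value, depth + 1)
    end
  when Array
    node.each { |child| sketch_check_limits!(child, depth + 1) }
  end
  true
end
```

As the MAX_REGEX_PATTERN_LENGTH description notes, a length cap like this bounds payload size only; a short pattern with nested quantifiers still passes it.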
Class Method Summary
- .assert_collection_allowed!(name) ⇒ Object
Refuses any collection name reserved for Parse Server's internal state.
- .refuse_protected_field_references!(pipeline, collection_name, resolution) ⇒ void
Wave-3 TRACK-CLP-4: refuse caller-supplied pipelines that reference a protected field via $<field> on the RHS of a $project/$addFields/$set/$group/$bucket/$replaceWith/$lookup.let clause.
- .strip_internal_fields(doc) ⇒ Object
Strip INTERNAL_FIELDS_DENYLIST keys from a Hash document (one level deep -- raw search documents are flat).
- .valid_filter?(node) ⇒ Boolean
True if the node passes permissive validation.
- .valid_pipeline?(pipeline) ⇒ Boolean
True if the pipeline passes strict validation.
- .validate_filter!(node, allow_internal_fields: false) ⇒ true
Permissive validation: walks the given Hash or Array (or anything else, which is a no-op) and refuses any nested key that appears in DENIED_OPERATORS.
- .validate_pipeline!(pipeline) ⇒ true
Strict validation: pipeline must be a non-empty Array of Hashes, each Hash's top-level key must be in ALLOWED_STAGES, and no entry in DENIED_OPERATORS may appear at any nesting depth.
Class Method Details
.assert_collection_allowed!(name) ⇒ Object
Refuses any collection name reserved for Parse Server's internal
state. Accepts the four SDK-data system classes (_User,
_Role, _Installation, _Session) and any non-_-prefixed
name. Used by LookupRewriter and by the Agent's pipeline
walker to enforce a hard floor independent of any per-Agent
MetadataRegistry.hidden? policy.
    # File 'lib/parse/pipeline_security.rb', line 309
    def assert_collection_allowed!(name)
      return if name.nil?
      str = name.to_s
      return if str.empty?
      return unless str.start_with?("_")
      return if ALLOWED_UNDERSCORE_COLLECTIONS.include?(str)
      raise Error.new(
        "SECURITY: Collection '#{str}' is reserved for Parse Server's internal " \
        "state and is not reachable from an SDK aggregation pipeline.",
        operator: str,
        reason: :denied_internal_collection,
      )
    end
.refuse_protected_field_references!(pipeline, collection_name, resolution) ⇒ void
This method returns an undefined value.
Wave-3 TRACK-CLP-4: refuse caller-supplied pipelines that
reference a protected field via $<field> on the RHS of a
$project / $addFields / $set / $group / $bucket /
$replaceWith / $lookup.let clause.
The protectedFields enforcement layer (CLPScope.redact_protected_fields!) strips the field by NAME from the result rows. But a pipeline can launder a protected field through a rename:
{ "$addFields" => { "ssn_copy" => "$ssn" } }
{ "$project" => { "renamed" => "$ssn", "objectId" => 1 } }
{ "$group" => { "_id" => "$ssn", "n" => { "$sum" => 1 } } }
The post-fetch strip walks the rows and deletes ssn keys, but
the value is now stored under ssn_copy / renamed / _id,
so the strip walks past it. This scanner runs BEFORE the pipeline
reaches Mongo: any $<field> string whose unprefixed name is in
the class's protected-fields set raises CLPScope::Denied
so the caller knows the pipeline was refused, rather than silently
leaking the renamed value.
Variable references ($$ROOT, $$CURRENT, $$user_var) are
NOT field references — they're aggregation variables. The walker
checks the leading $ is single, not double, before treating the
string as a field path.
Master mode + nil resolution short-circuit at the entry: the walker is a no-op when the caller can read everything anyway.
    # File 'lib/parse/pipeline_security.rb', line 374
    def refuse_protected_field_references!(pipeline, collection_name, resolution)
      return if resolution.nil? || (resolution.respond_to?(:master?) && resolution.master?)
      return if pipeline.nil? || pipeline.empty?
      perms = resolution.respond_to?(:permission_strings) ? resolution.permission_strings : nil
      return if perms.nil?
      # Lazy-require to avoid forcing CLPScope load order when the
      # caller hasn't otherwise needed it.
      require_relative "clp_scope" unless defined?(Parse::CLPScope)
      protected_set = Parse::CLPScope.protected_fields_for(collection_name, perms)
      return if protected_set.nil? || protected_set.empty?
      pipeline.each_with_index do |stage, idx|
        walk_for_protected_ref!(stage, protected_set, collection_name, "pipeline[#{idx}]")
      end
      nil
    end
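The distinction between a `$field` reference and a `$$variable` can be sketched independently of CLPScope. The helpers below are hypothetical (`sketch_field_ref`, `sketch_protected_refs`), and two behaviors are assumptions rather than documented facts: the sketch returns matches instead of raising, and it treats the first segment of a dotted path such as `$ssn.last4` as the field name.

```ruby
# Returns the top-level field name for a "$field" or "$field.sub" reference,
# or nil for anything else, including "$$ROOT"-style aggregation variables.
def sketch_field_ref(value)
  return nil unless value.is_a?(String)
  return nil unless value.start_with?("$") && !value.start_with?("$$")
  name = value[1..-1].split(".").first
  name.nil? || name.empty? ? nil : name
end

# Collects every protected field referenced in the values of a stage.
# Hash keys (operator names such as "$sum") are not treated as references.
def sketch_protected_refs(node, protected_set, found = [])
  case node
  when Hash
    node.each_value { |v| sketch_protected_refs(v, protected_set, found) }
  when Array
    node.each { |v| sketch_protected_refs(v, protected_set, found) }
  else
    name = sketch_field_ref(node)
    found << name if name && protected_set.include?(name)
  end
  found
end

# The rename from the description above is detected before execution.
RENAME_STAGE = { "$addFields" => { "ssn_copy" => "$ssn" } }.freeze
RENAME_HITS = sketch_protected_refs(RENAME_STAGE, ["ssn"])
```

Here `RENAME_HITS` contains `"ssn"`, whereas a stage such as `{ "$replaceWith" => "$$ROOT" }` yields no hits because the double `$` marks a variable rather than a field path.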
.strip_internal_fields(doc) ⇒ Object
Strip INTERNAL_FIELDS_DENYLIST keys from a Hash document (one level deep -- raw search documents are flat). Returns a new Hash; the input is not mutated. Non-Hash inputs return unchanged so callers can pipe arbitrary cursor entries through this.
    # File 'lib/parse/pipeline_security.rb', line 327
    def strip_internal_fields(doc)
      return doc unless doc.is_a?(Hash)
      doc.each_with_object({}) do |(key, value), out|
        k = key.to_s
        next if INTERNAL_FIELDS_DENYLIST.include?(k)
        next if INTERNAL_FIELDS_PREFIX_DENYLIST.any? { |prefix| k.start_with?(prefix) }
        out[key] = value
      end
    end
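To make the "one level deep" behavior concrete, the following stand-in copies the logic above with shortened constants. `sketch_strip_internal_fields`, the `SKETCH_*` constants, and the sample row are hypothetical; the real method uses the full denylists.

```ruby
# Shortened, hypothetical stand-ins for the two denylist constants.
SKETCH_INTERNAL_FIELDS = %w[_hashed_password _session_token sessionToken _rperm _wperm].freeze
SKETCH_INTERNAL_PREFIXES = %w[_auth_data_].freeze

# Same shape as strip_internal_fields: builds a new Hash, top level only.
def sketch_strip_internal_fields(doc)
  return doc unless doc.is_a?(Hash)
  doc.each_with_object({}) do |(key, value), out|
    k = key.to_s
    next if SKETCH_INTERNAL_FIELDS.include?(k)
    next if SKETCH_INTERNAL_PREFIXES.any? { |prefix| k.start_with?(prefix) }
    out[key] = value
  end
end

ROW = {
  "username" => "ada",
  "_hashed_password" => "placeholder-hash",
  :_auth_data_github => { "id" => "1" },          # Symbol keys are matched via to_s
  "profile" => { "_session_token" => "r:abc" },   # nested key is NOT stripped
}.freeze

CLEANED = sketch_strip_internal_fields(ROW)
```

In this sketch `CLEANED` keeps `"username"` and `"profile"`, and the nested `_session_token` under `"profile"` survives because only top-level keys are screened. Callers with nested documents would need their own recursive pass.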
.valid_filter?(node) ⇒ Boolean
Returns true if the node passes permissive validation.
    # File 'lib/parse/pipeline_security.rb', line 290
    def valid_filter?(node)
      validate_filter!(node)
      true
    rescue Error
      false
    end
.valid_pipeline?(pipeline) ⇒ Boolean
Returns true if the pipeline passes strict validation.
    # File 'lib/parse/pipeline_security.rb', line 282
    def valid_pipeline?(pipeline)
      validate_pipeline!(pipeline)
      true
    rescue Error
      false
    end
.validate_filter!(node, allow_internal_fields: false) ⇒ true
Permissive validation: walks the given Hash or Array (or anything else, which is a no-op) and refuses any nested key that appears in DENIED_OPERATORS. Does NOT check the top-level stage allowlist or the stage count cap. Used by direct-MongoDB sinks where callers have explicit intent and want flexibility in stage selection, but server-side JS and data-mutating operators must still be refused.
    # File 'lib/parse/pipeline_security.rb', line 276
    def validate_filter!(node, allow_internal_fields: false)
      walk_for_denied!(node, depth: 0, allow_internal_fields: allow_internal_fields)
      true
    end
.validate_pipeline!(pipeline) ⇒ true
Strict validation: pipeline must be a non-empty Array of Hashes, each Hash's top-level key must be in ALLOWED_STAGES, and no entry in DENIED_OPERATORS may appear at any nesting depth.
    # File 'lib/parse/pipeline_security.rb', line 238
    def validate_pipeline!(pipeline)
      unless pipeline.is_a?(Array)
        raise Error.new("Pipeline must be an Array, got #{pipeline.class}", reason: :invalid_type)
      end
      if pipeline.empty?
        raise Error.new("Pipeline cannot be empty", reason: :empty_pipeline)
      end
      if pipeline.size > MAX_PIPELINE_STAGES
        raise Error.new(
          "Pipeline exceeds maximum of #{MAX_PIPELINE_STAGES} stages (got #{pipeline.size})",
          reason: :too_many_stages,
        )
      end

      pipeline.each_with_index do |stage, idx|
        validate_stage!(stage, idx)
      end
      true
    end