Module: Parse::Embeddings

Defined in:
lib/parse/embeddings.rb,
lib/parse/embeddings/jina.rb,
lib/parse/embeddings/qwen.rb,
lib/parse/embeddings/cohere.rb,
lib/parse/embeddings/openai.rb,
lib/parse/embeddings/voyage.rb,
lib/parse/embeddings/fixture.rb,
lib/parse/embeddings/provider.rb,
lib/parse/embeddings/local_http.rb

Overview

Pluggable embedding-provider registry for :vector properties and the upcoming find_similar(text:) / Parse::Retrieval.retrieve surfaces.

Text-only providers shipped:

  • Fixture — deterministic, zero-network. Auto-registered as :fixture so tests can call Parse::Embeddings.provider(:fixture) with no setup.
  • OpenAI — text-embedding-3-{small,large} and ada-002.
  • Cohere — embed-{english,multilingual}-v3.0 and *-light-v3.0. Distinguishes :search_query / :search_document at the wire.
  • Voyage — voyage-4 family (incl. open-weight voyage-4-nano), voyage-3 family, voyage-code-3, voyage-finance-2, voyage-law-2. Distinguishes input types.
  • Jina — jina-embeddings-v3/v4/v5 (text + omni-text mode), jina-code-embeddings-{0.5b,1.5b}. Matryoshka via dimensions:.
  • Qwen — qwen3-embedding-{0.6b,4b,8b} via Alibaba Cloud DashScope compatible-mode. All Matryoshka. The same checkpoints are open-weight on Hugging Face (Apache 2.0) for self-hosting behind LocalHTTP.
  • LocalHTTP — generic OpenAI-compatible client for Ollama, LM Studio, vLLM, etc. Configure-time SSRF gate; requires allow_private_endpoint: true to talk to localhost.

Image / multimodal embedding (embed_image) is a forthcoming feature — the Provider#embed_image hook is defined but only the multimodal-capable providers will override it.

Registration

Two equivalent forms. Embeddings.register is the canonical one-liner and what every example in the gem uses; Embeddings.configure is the block form for registering several providers at once or for Rails-style initializers. Both end up at the same ProviderRegistry, so pick whichever reads better in context.

Examples:

canonical: register one provider

Parse::Embeddings.register(:openai,
  Parse::Embeddings::OpenAI.new(api_key: ENV.fetch("OPENAI_API_KEY")))

block form for several providers

Parse::Embeddings.configure do |c|
  c.providers[:openai] = Parse::Embeddings::OpenAI.new(api_key: ENV.fetch("OPENAI_API_KEY"))
  c.providers[:openai_large] = Parse::Embeddings::OpenAI.new(
    api_key: ENV.fetch("OPENAI_API_KEY"), model: "text-embedding-3-large")
end

lookup

Parse::Embeddings.provider(:openai)   # => the registered instance
Parse::Embeddings.provider(:fixture)  # => default Fixture, zero-config

Defined Under Namespace

Classes: Cohere, Configuration, Error, Fixture, InvalidResponseError, Jina, LocalHTTP, OpenAI, Provider, ProviderNotRegistered, ProviderRegistry, Qwen, Voyage

Constant Summary

CONFIG_MUTEX =

Monitor guarding configuration memoization and register writes. MRI's GVL would normally absorb the race on @configuration ||= ..., but JRuby and TruffleRuby can produce two Configuration instances when two threads race at boot (and lose any provider written to the loser). A Monitor (rather than a Mutex) is used so that register — which holds the lock and then calls configuration — can re-enter without deadlocking on the first-touch allocation path.

Monitor.new
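
The re-entrancy point above can be checked in isolation. The following is a minimal sketch using throwaway locks (the names monitor and mutex are illustrative and are not the gem's CONFIG_MUTEX): a Monitor lets the owning thread lock again, while a Mutex raises ThreadError on a recursive lock.

```ruby
require "monitor"

# Stand-in locks for illustration; these are not the gem's CONFIG_MUTEX.
monitor = Monitor.new
mutex   = Mutex.new

# A Monitor may be re-acquired by the thread that already holds it.
monitor_result = monitor.synchronize do
  monitor.synchronize { :reentered }
end

# A Mutex refuses recursive locking and raises ThreadError instead.
mutex_result = begin
  mutex.synchronize { mutex.synchronize { :reentered } }
rescue ThreadError
  :thread_error
end
```

This is the situation register is in: it holds the lock and then calls configuration, which may take the same lock on first touch.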

Class Method Details

.configuration ⇒ Configuration

Returns the singleton configuration object.

Returns:

  • (Configuration)

# File 'lib/parse/embeddings.rb', line 137

def configuration
  # Double-checked memoization. The fast path is a single ivar
  # read; the slow path enters the mutex only when the
  # configuration is unallocated.
  @configuration || CONFIG_MUTEX.synchronize { @configuration ||= Configuration.new }
end
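
A self-contained sketch of the same double-checked memoization, assuming a hypothetical MemoDemo module and Config class in place of the gem's own. Several threads race on first access and should all observe one instance.

```ruby
require "monitor"

# Hypothetical stand-in for the module; not part of the gem.
module MemoDemo
  LOCK = Monitor.new

  class Config
    attr_reader :providers

    def initialize
      @providers = {}
    end
  end

  class << self
    def configuration
      # Fast path: plain ivar read. Slow path: allocate under the lock,
      # re-checking inside so only one thread ever builds the object.
      @configuration || LOCK.synchronize { @configuration ||= Config.new }
    end
  end
end

# Race eight threads on first access and collect the distinct object ids.
distinct_ids = Array.new(8) { Thread.new { MemoDemo.configuration.object_id } }
                    .map(&:value)
                    .uniq
```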

.configure {|config| ... } ⇒ Configuration

Block form for registering multiple providers at once. Prefer the one-liner register when adding a single provider; this form pays off when an initializer needs to set several or to mutate the registry conditionally.

Yield Parameters:

  • config (Configuration)

Returns:

  • (Configuration)

# File 'lib/parse/embeddings.rb', line 131

def configure
  yield configuration if block_given?
  configuration
end

.provider(name) ⇒ Provider

Look up a registered provider.

Zero-config fallback: :fixture returns a default Fixture instance (64-dim, deterministic) when nothing is registered. Every other name raises ProviderNotRegistered. Tests can rely on provider(:fixture) working out of the box; production code must register what it uses.

Parameters:

  • name (Symbol, String)

Returns:

  • (Provider)

Raises:

  • (ProviderNotRegistered)

# File 'lib/parse/embeddings.rb', line 173

def provider(name)
  # Avoid blindly `to_sym`-ing the caller's input. An LLM tool or
  # webhook handler that pipes its `name:` argument through here
  # would otherwise let a remote caller grow the symbol table at
  # will. Ruby 3.2+ GCs symbols so the practical impact is small,
  # but a string-matched lookup costs nothing and closes the gap.
  if name.is_a?(Symbol)
    return configuration.providers[name] if configuration.providers.key?(name)
    key_string = name.to_s
  else
    key_string = name.to_s
    found = configuration.providers.keys.find { |k| k.to_s == key_string }
    return configuration.providers[found] if found
  end
  if key_string == "fixture"
    CONFIG_MUTEX.synchronize do
      return configuration.providers[:fixture] ||= Fixture.new
    end
  end
  raise ProviderNotRegistered,
        "Parse::Embeddings.provider(#{name.inspect}): no provider registered. " \
        "Register one via Parse::Embeddings.register(#{name.inspect}, …)."
end
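
The string-matched lookup can be sketched against a plain Hash. The helper name lookup_provider and the sample values are hypothetical, and the sketch returns nil on a miss rather than applying the :fixture fallback or raising.

```ruby
# Hypothetical helper; `providers` is a plain Hash keyed by Symbol.
def lookup_provider(providers, name)
  if name.is_a?(Symbol)
    return providers[name] if providers.key?(name)
  else
    key_string = name.to_s
    # Compare as strings so untrusted input is never interned as a Symbol.
    found = providers.keys.find { |k| k.to_s == key_string }
    return providers[found] if found
  end
  nil
end

providers = { openai: "openai client", cohere: "cohere client" }

by_symbol = lookup_provider(providers, :openai)
by_string = lookup_provider(providers, "cohere")
missing   = lookup_provider(providers, "not-registered")
```

The scan is linear in the number of registered providers, which is expected to stay small.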

.register(name, provider) ⇒ Provider

Canonical one-liner: register a single provider under name. Overwrites any previous registration. Use configure for multi-provider blocks.

Parameters:

  • name (Symbol, String)

  • provider (Provider)

Returns:

  • (Provider)

    the registered provider.



# File 'lib/parse/embeddings.rb', line 151

def register(name, provider)
  unless provider.is_a?(Provider)
    raise ArgumentError,
          "Parse::Embeddings.register: #{name.inspect} expects a Parse::Embeddings::Provider " \
          "instance (got #{provider.class})."
  end
  CONFIG_MUTEX.synchronize do
    configuration.providers[name.to_sym] = provider
  end
end
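
A minimal sketch of the three behaviors visible in register: the type guard, the overwrite, and the return value. RegisterDemo, DemoProvider, and DemoOpenAI are hypothetical stand-ins for the gem's module and provider classes.

```ruby
require "monitor"

class DemoProvider; end               # stand-in for the Provider base class
class DemoOpenAI < DemoProvider; end  # stand-in for a concrete provider

# Hypothetical registry; not part of the gem.
module RegisterDemo
  LOCK = Monitor.new
  @providers = {}

  class << self
    attr_reader :providers

    def register(name, provider)
      unless provider.is_a?(DemoProvider)
        raise ArgumentError, "expects a DemoProvider instance (got #{provider.class})"
      end
      # The assignment is the block's value, so register returns the provider.
      LOCK.synchronize { @providers[name.to_sym] = provider }
    end
  end
end

first  = DemoOpenAI.new
second = DemoOpenAI.new

returned = RegisterDemo.register("openai", first) # String names are symbolized
RegisterDemo.register(:openai, second)            # overwrites the first entry

rejected = begin
  RegisterDemo.register(:bad, Object.new)
  false
rescue ArgumentError
  true
end
```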

.registered_provider_names ⇒ Array<Symbol>

Names of currently-registered providers (does NOT include the implicit :fixture fallback unless it's been instantiated).

Returns:

  • (Array<Symbol>)

# File 'lib/parse/embeddings.rb', line 201

def registered_provider_names
  configuration.providers.keys
end
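
The note about the implicit :fixture fallback can be illustrated with a small stand-in registry. FixtureDemo and its Fixture class are hypothetical; the sketch only shows that the name list stays empty until the fallback is first looked up.

```ruby
# Hypothetical stand-in registry; not part of the gem.
module FixtureDemo
  class Fixture; end

  class << self
    def providers
      @providers ||= {}
    end

    def provider(name)
      key = name.to_s
      found = providers.keys.find { |k| k.to_s == key }
      return providers[found] if found

      if key == "fixture"
        # Lazily instantiate the zero-config fallback on first lookup.
        return providers[:fixture] ||= Fixture.new
      end
      raise KeyError, "no provider registered for #{name.inspect}"
    end

    def registered_provider_names
      providers.keys
    end
  end
end

names_before = FixtureDemo.registered_provider_names.dup
FixtureDemo.provider(:fixture)
names_after = FixtureDemo.registered_provider_names.dup
```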

.reset! ⇒ void

This method returns an undefined value.

Reset the entire registry — intended for test teardown only. Production code should never call this; use register to override a single provider.



# File 'lib/parse/embeddings.rb', line 210

def reset!
  CONFIG_MUTEX.synchronize { @configuration = nil }
end
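
A sketch of what a reset in test teardown achieves, using a hypothetical ResetDemo module: dropping the memoized state means the next access starts from an empty registry.

```ruby
require "monitor"

# Hypothetical stand-in registry; not part of the gem.
module ResetDemo
  LOCK = Monitor.new

  class << self
    def providers
      @providers || LOCK.synchronize { @providers ||= {} }
    end

    def register(name, provider)
      LOCK.synchronize { providers[name.to_sym] = provider }
    end

    def reset!
      # Drop the memoized Hash; the next call to providers builds a fresh one.
      LOCK.synchronize { @providers = nil }
    end
  end
end

ResetDemo.register(:openai, Object.new)
before_reset = ResetDemo.providers.keys

ResetDemo.reset! # what a test teardown hook would call
after_reset = ResetDemo.providers.keys
```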