Class: Parse::Embeddings::LocalHTTP

Inherits:
Provider
  • Object
Defined in:
lib/parse/embeddings/local_http.rb

Overview

Generic OpenAI-compatible local embedding provider. Talks to any server that exposes POST <base_url>/embeddings with the OpenAI request/response shape — covers Ollama (/v1), LM Studio (/v1), vLLM, llama.cpp's server, and any reverse-proxy that translates to a local model runner.
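
To make the "OpenAI request/response shape" concrete, here is a minimal sketch of the request body and of reading vectors back from a response. The response is a canned literal with made-up, two-dimensional vectors, and no network call is made. It illustrates the wire format only and is not this provider's internal parsing code.

```ruby
require "json"

# Request body in the OpenAI embeddings shape: a model name plus the inputs.
request_body = JSON.generate(model: "nomic-embed-text", input: ["hello", "world"])

# Canned response in the same shape. A real server would return something like
# this from POST <base_url>/embeddings; the numbers here are invented.
response_json = <<~JSON
  {
    "object": "list",
    "data": [
      { "object": "embedding", "index": 1, "embedding": [0.3, 0.4] },
      { "object": "embedding", "index": 0, "embedding": [0.1, 0.2] }
    ],
    "model": "nomic-embed-text",
    "usage": { "prompt_tokens": 2, "total_tokens": 2 }
  }
JSON

payload = JSON.parse(response_json)

# Sort by "index" so the vectors line up 1:1 with the request inputs.
vectors = payload.fetch("data")
                 .sort_by { |row| row.fetch("index") }
                 .map { |row| row.fetch("embedding") }
```

Sorting by index is a defensive choice in this sketch: it keeps the output aligned with the inputs even if a server returns rows out of order.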

SSRF gate

The base_url is resolved at construction time and the resolved addresses are checked against File::BLOCKED_CIDRS (loopback, RFC1918, link-local, cloud-metadata, CGNAT, IPv6 ULA, …). When ANY resolved address falls in a private/internal range, the constructor refuses unless the caller opts in via allow_private_endpoint: true.
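
The kind of check described above can be sketched with Ruby's standard IPAddr and Resolv libraries. The block list below is a hypothetical, abbreviated stand-in for BLOCKED_CIDRS, and the helper names are invented for illustration; the library's actual list and method names may differ.

```ruby
require "ipaddr"
require "resolv"

# Hypothetical, abbreviated block list (loopback, RFC1918, link-local,
# CGNAT, IPv6 loopback, ULA, and link-local). Not the library's constant.
EXAMPLE_BLOCKED_CIDRS = %w[
  127.0.0.0/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16
  169.254.0.0/16 100.64.0.0/10 ::1/128 fc00::/7 fe80::/10
].map { |cidr| IPAddr.new(cidr) }.freeze

# True when a single address literal falls inside any blocked range.
def private_address?(address)
  # Drop an IPv6 zone identifier such as "%lo0" before parsing.
  ip = IPAddr.new(address.to_s.split("%", 2).first)
  EXAMPLE_BLOCKED_CIDRS.any? { |cidr| cidr.family == ip.family && cidr.include?(ip) }
end

# True when ANY address the host resolves to is private, matching the
# "any resolved address" rule described in the text.
def any_private?(host)
  Resolv.getaddresses(host).any? { |addr| private_address?(addr) }
end
```

Checking every resolved address, rather than only the first, matters because a hostname can resolve to a mix of public and internal addresses.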

The opt-in is a deliberate, audit-able gate — Parse::Embeddings registration is configuration code, not user input, so opting in to "yes, this base_url really is my Ollama on localhost" is a one-line decision by the operator at boot time. A Kernel#warn fires when the opt-in is taken so the choice shows up in operator logs / bundle exec rake about output.

http:// base URLs are accepted when allow_private_endpoint: true is set (the typical local-runner deployment). Otherwise they are refused unless the caller passes allow_insecure_base_url: true, an escape hatch for self-signed internal HTTPS proxies fronted by http://.

Why no fixed model whitelist

Ollama, LM Studio, and vLLM all serve operator-chosen models — we cannot enumerate "supported" models the way OpenAI can. The constructor instead takes the dimensions: explicitly, and the provider's Provider#validate_response! (inherited) enforces that every returned vector matches that width. Mis-specified dimensions surface as InvalidResponseError on the first embed call.
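
A width check of this kind might look like the sketch below. The error class and the check_vectors! helper are hypothetical stand-ins; the inherited Provider#validate_response! may be implemented differently.

```ruby
# Hypothetical stand-in for the library's InvalidResponseError.
class ExampleInvalidResponseError < StandardError; end

# Raises unless there is exactly one vector per input and every vector
# has the configured width. Returns the vectors unchanged on success.
def check_vectors!(expected_count, vectors, dimensions)
  unless vectors.is_a?(Array) && vectors.length == expected_count
    raise ExampleInvalidResponseError, "expected #{expected_count} vectors"
  end
  vectors.each_with_index do |vec, i|
    next if vec.is_a?(Array) && vec.length == dimensions

    width = vec.respond_to?(:length) ? vec.length : "unknown"
    raise ExampleInvalidResponseError,
          "vector #{i} has width #{width}, expected #{dimensions}"
  end
  vectors
end
```

With dimensions: 768 configured against a model that actually emits 1024-wide vectors, a check like this fails on the first call instead of letting mismatched vectors reach storage.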

Security

  • Configure-time SSRF gate (above).
  • The Faraday connection refuses proxy: unless the caller opts in via allow_faraday_proxy: true. Env-proxy autodiscovery is suppressed by default — same model as OpenAI.
  • #inspect (inherited from Provider) never surfaces @api_key.
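
As an illustration of the last point, one common way to keep a secret out of #inspect is to build the string from an explicit allow-list of attributes. ExampleProvider below is a hypothetical class, not the library's Provider, whose #inspect may be built differently (for example on top of #inspect_attrs).

```ruby
# Hypothetical class showing an allow-list style #inspect.
class ExampleProvider
  def initialize(model:, api_key:)
    @model = model
    @api_key = api_key
  end

  # Only explicitly listed attributes are rendered; @api_key is never read here.
  def inspect
    "#<#{self.class.name} model=#{@model.inspect}>"
  end
end
```

Ruby's default #inspect prints every instance variable, so an override along these lines is what keeps keys out of logs and exception messages.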

Examples:

Ollama on the same host

Parse::Embeddings.register(:ollama,
  Parse::Embeddings::LocalHTTP.new(
    base_url: "http://localhost:11434/v1",
    model: "nomic-embed-text",
    dimensions: 768,
    allow_private_endpoint: true,
  ))

Public OpenAI-compatible proxy (e.g. internal gateway on a public DNS name)

Parse::Embeddings.register(:gateway,
  Parse::Embeddings::LocalHTTP.new(
    base_url: "https://embeddings.example.com/v1",
    api_key:  ENV.fetch("GATEWAY_API_KEY"),
    model:    "bge-small-en-v1.5",
    dimensions: 384,
  ))

Defined Under Namespace

Classes: AuthenticationError, BadRequestError, RateLimitError, TransientError

Constant Summary

DEFAULT_TIMEOUT =
30
DEFAULT_OPEN_TIMEOUT =
5
DEFAULT_MAX_RETRIES =
3
DEFAULT_BATCH_SIZE =
32
MAX_RESPONSE_BYTES =
16 * 1024 * 1024

Constants inherited from Provider

Provider::AS_NOTIFICATION_NAME

Instance Method Summary

Methods inherited from Provider

#embed_image, #embed_text_batched, #inspect, #instrument_embed, #max_input_tokens, #modalities, #validate_response!

Constructor Details

#initialize(base_url:, model:, dimensions:, api_key: nil, normalize: false, timeout: DEFAULT_TIMEOUT, open_timeout: DEFAULT_OPEN_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, embed_batch_size: DEFAULT_BATCH_SIZE, allow_private_endpoint: false, allow_insecure_base_url: false, allow_faraday_proxy: false, connection: nil) ⇒ LocalHTTP

Returns a new instance of LocalHTTP.

Parameters:

  • base_url (String)

    required. Must be http(s):// with a host.

  • model (String)

    required. Identifier the local server expects in the model request field. Persisted to embedding_meta.

  • dimensions (Integer)

    required. Width of vectors the local model produces. Enforced by Provider#validate_response!.

  • api_key (String, nil) (defaults to: nil)

    optional. When present, sent as Authorization: Bearer …. Local runners typically accept any value or no header.

  • normalize (Boolean) (defaults to: false)

    whether the local model returns unit-normalized vectors. Defaults to false (Ollama and most local models do NOT normalize; bge-* and OpenAI do). Affects similarity metric selection downstream.

  • timeout (Integer) (defaults to: DEFAULT_TIMEOUT)

    read timeout, seconds.

  • open_timeout (Integer) (defaults to: DEFAULT_OPEN_TIMEOUT)

    connect timeout, seconds.

  • max_retries (Integer) (defaults to: DEFAULT_MAX_RETRIES)

    retry attempts on 429/5xx/timeouts.

  • embed_batch_size (Integer) (defaults to: DEFAULT_BATCH_SIZE)

    inputs per request.

  • allow_private_endpoint (Boolean) (defaults to: false)

    required when base_url resolves to a private/internal/loopback address. Defaults false; opting in emits a one-time warning per provider instance.

  • allow_insecure_base_url (Boolean) (defaults to: false)

    permit http:// for PUBLIC base URLs. Defaults false. Independent of allow_private_endpoint (which already implies http:// is fine for the local case).

  • allow_faraday_proxy (Boolean) (defaults to: false)

    opt in to proxy / env-proxy autodiscovery. Defaults false.

  • connection (Faraday::Connection, nil) (defaults to: nil)

    injection seam.
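
To show how max_retries could interact with transient failures (429, 5xx, timeouts), here is a minimal retry loop with exponential backoff. The backoff schedule, the ExampleTransientError class, and the with_retries helper are all assumptions made for illustration; the source does not state which delay strategy the provider uses.

```ruby
# Hypothetical error class standing in for the provider's retryable errors.
class ExampleTransientError < StandardError; end

# Runs the block, retrying up to max_retries times on ExampleTransientError.
# base_delay and sleeper are illustration-only parameters; sleeper is
# injectable so the delays can be observed without actually sleeping.
def with_retries(max_retries:, base_delay: 0.5, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    yield(attempt)
  rescue ExampleTransientError
    raise if attempt >= max_retries # retries exhausted: surface the error

    sleeper.call(base_delay * (2**attempt)) # 0.5s, 1s, 2s, ...
    attempt += 1
    retry
  end
end
```

Under this sketch, max_retries: 3 means at most four requests in total (one initial attempt plus three retries), and max_retries: 0 disables retrying.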



# File 'lib/parse/embeddings/local_http.rb', line 114

def initialize(
  base_url:,
  model:,
  dimensions:,
  api_key: nil,
  normalize: false,
  timeout: DEFAULT_TIMEOUT,
  open_timeout: DEFAULT_OPEN_TIMEOUT,
  max_retries: DEFAULT_MAX_RETRIES,
  embed_batch_size: DEFAULT_BATCH_SIZE,
  allow_private_endpoint: false,
  allow_insecure_base_url: false,
  allow_faraday_proxy: false,
  connection: nil
)
  validate_model!(model)
  validate_dimensions!(dimensions)
  validate_optional_api_key!(api_key)
  unless [true, false].include?(normalize)
    raise ArgumentError,
          "Parse::Embeddings::LocalHTTP: normalize must be true or false (got #{normalize.inspect})."
  end
  validate_positive_integer!(:timeout, timeout)
  validate_positive_integer!(:open_timeout, open_timeout)
  validate_non_negative_integer!(:max_retries, max_retries)
  validate_positive_integer!(:embed_batch_size, embed_batch_size)

  sanitized_base_url, resolved_addrs, is_private =
    validate_base_url_and_gate_ssrf!(base_url,
                                     allow_private_endpoint: allow_private_endpoint,
                                     allow_insecure_base_url: allow_insecure_base_url)
  if is_private
    # Audit log. Emits once per instance — Kernel#warn so it lands
    # on stderr and any logger that captures it. Operators running
    # a hardened environment can grep this to confirm every
    # private-endpoint opt-in was intentional.
    warn "Parse::Embeddings::LocalHTTP: allow_private_endpoint=true for #{sanitized_base_url} " \
         "resolved to private address(es) #{resolved_addrs.map(&:to_s).inspect}."
  end

  @base_url = sanitized_base_url
  @model = model
  @dimensions = dimensions
  @api_key = api_key
  @normalize = normalize
  @timeout = timeout
  @open_timeout = open_timeout
  @max_retries = max_retries
  @embed_batch_size = embed_batch_size
  @allow_faraday_proxy = allow_faraday_proxy
  @connection = connection || build_connection
end

Instance Method Details

#dimensions ⇒ Object



# File 'lib/parse/embeddings/local_http.rb', line 167

def dimensions
  @dimensions
end

#embed_batch_size ⇒ Object



# File 'lib/parse/embeddings/local_http.rb', line 175

def embed_batch_size
  @embed_batch_size
end
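
The batch size is the number of inputs sent per request. A caller-side sketch of that slicing is shown below; embed_in_batches is a hypothetical helper and the block stands in for a real embed call, so this is not the inherited #embed_text_batched implementation.

```ruby
# Splits inputs into batches of batch_size, calls the embedder block once per
# batch, and concatenates the results so output order matches input order.
def embed_in_batches(strings, batch_size:, &embedder)
  strings.each_slice(batch_size).flat_map { |batch| embedder.call(batch) }
end

# Usage with a fake embedder that returns one 1-dimensional vector per input.
seen_batches = []
example_vectors = embed_in_batches(%w[a bb ccc dddd eeeee], batch_size: 2) do |batch|
  seen_batches << batch.length
  batch.map { |s| [s.length.to_f] }
end
```

Here five inputs with a batch size of 2 produce three requests of sizes 2, 2, and 1.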

#embed_text(strings, input_type: :search_document) ⇒ Array<Array<Float>>

Returns vectors aligned 1:1 with strings.

Parameters:

  • strings (Array<String>)

    inputs.

  • input_type (Symbol) (defaults to: :search_document)

    accepted for forward compatibility, ignored at the wire level.

Returns:

  • (Array<Array<Float>>)

    vectors aligned 1:1 with strings.



# File 'lib/parse/embeddings/local_http.rb', line 196

def embed_text(strings, input_type: :search_document)
  unless strings.is_a?(Array)
    raise ArgumentError,
          "Parse::Embeddings::LocalHTTP#embed_text expects Array<String> (got #{strings.class})."
  end
  return [] if strings.empty?
  strings.each_with_index do |s, i|
    unless s.is_a?(String)
      raise ArgumentError,
            "Parse::Embeddings::LocalHTTP#embed_text strings[#{i}] is not a String (#{s.class})."
    end
    if s.empty?
      raise ArgumentError,
            "Parse::Embeddings::LocalHTTP#embed_text strings[#{i}] is empty; local runners typically reject empty inputs."
    end
  end

  body = { input: strings, model: @model }

  instrument_embed(strings.length, input_type) do |emit_payload|
    payload = post_embeddings(body)
    # Local runners may or may not include `usage`. When present,
    # forward total_tokens to the AS::N payload.
    if payload.is_a?(Hash) && payload["usage"].is_a?(Hash)
      tt = payload["usage"]["total_tokens"]
      emit_payload[:total_tokens] = tt if tt.is_a?(Integer) && tt >= 0
    end
    vectors = extract_vectors!(payload, strings.length)
    validate_response!(strings.length, vectors)
  end
end

#inspect_attrs ⇒ Object



# File 'lib/parse/embeddings/local_http.rb', line 228

def inspect_attrs
  super.merge(base: safe_base_host, retries: @max_retries)
end

#model_name ⇒ Object



# File 'lib/parse/embeddings/local_http.rb', line 171

def model_name
  @model
end

#normalize? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/parse/embeddings/local_http.rb', line 179

def normalize?
  @normalize
end

#supports_input_type? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/parse/embeddings/local_http.rb', line 183

def supports_input_type?
  # The OpenAI-compatible local runners do not asymmetrize. Some
  # models (bge-*) have a documented query prefix, but the local
  # server itself doesn't expose `input_type:` — callers wrap the
  # query text instead. We accept the kwarg for cache-key stability
  # but drop it at the wire level.
  false
end
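
Because input_type is dropped at the wire level, a caller who wants asymmetric retrieval with a bge-* model has to prefix the query text before calling #embed_text. The sketch below assumes the query instruction published for the English bge v1.5 models and assumes a :search_query symbol as the counterpart of :search_document. Both are assumptions, so check the model card for the model actually being served.

```ruby
# Assumed prefix: the query instruction documented for English bge v1.5 models.
BGE_QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

# Hypothetical caller-side helper: prefixes queries and leaves documents as-is.
def wrap_for_input_type(text, input_type)
  input_type == :search_query ? "#{BGE_QUERY_PREFIX}#{text}" : text
end
```

The wrapped string is then passed to #embed_text like any other input; documents are embedded without a prefix.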