Class: Parse::Embeddings::Qwen
- Defined in: lib/parse/embeddings/qwen.rb
Overview
Qwen 3 embeddings provider. Targets Alibaba Cloud DashScope's
OpenAI-compatible endpoint (/compatible-mode/v1/embeddings),
which mirrors the OpenAI request envelope but speaks the
qwen3-embedding-* model family.
Supported models (all three are Matryoshka-capable, so the
dimensions: constructor kwarg truncates the returned vector
to any width ≤ native):
- qwen3-embedding-0.6b: 1024 dim native, ~32k input tokens.
- qwen3-embedding-4b: 2560 dim native.
- qwen3-embedding-8b: 4096 dim native.
The same three checkpoints are published open-weight on Hugging
Face under Apache 2.0 (Qwen/Qwen3-Embedding-0.6B, etc.) — for
self-hosted inference behind vLLM / Text Embeddings Inference /
llama.cpp, use LocalHTTP instead and point it at your gateway.
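The Matryoshka truncation described above can be pictured client-side as "keep the first N components, then rescale to unit length". The sketch below is only an illustration: truncate_embedding is a hypothetical helper, not part of Parse::Embeddings::Qwen, and whether DashScope re-normalizes truncated vectors server-side is not stated here.

```ruby
# Hypothetical helper for illustration; not part of the SDK.
# Keeps the first `width` components and rescales them to unit L2 norm.
def truncate_embedding(vector, width)
  unless width.is_a?(Integer) && width.positive? && width <= vector.length
    raise ArgumentError, "width must be in 1..#{vector.length} (got #{width.inspect})"
  end

  head = vector.first(width)
  norm = Math.sqrt(head.sum { |x| x * x })
  return head if norm.zero? # nothing to rescale

  head.map { |x| x / norm }
end

full  = [0.5, 0.5, 0.5, 0.5]         # toy unit vector standing in for a native-width embedding
short = truncate_embedding(full, 2)  # first two components, rescaled
```

With the provider itself, the same effect is requested through the dimensions: kwarg rather than by post-processing.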
Asymmetric input types
Qwen3-Embedding is trained with an instruction-tuned head, but
the DashScope compatible-mode endpoint does not currently accept
an input_type / task request field. We therefore set
supports_input_type? to false and drop the SDK-canonical
input_type: kwarg at the wire — same posture as OpenAI and
LocalHTTP. Callers who want query/passage asymmetry must wrap
their text with an explicit instruction prefix client-side; the
ActiveSupport::Notifications (AS::N) event still carries the requested input_type so cache
keys remain stable.
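A minimal sketch of the client-side wrapping mentioned above. Everything here is an assumption for illustration: with_instruction and DEFAULT_TASK are hypothetical names, :search_query is assumed to be the query-side counterpart of :search_document, and the "Instruct: ... / Query:" template follows the format shown in the Qwen3-Embedding model card, which should be checked against the current model documentation before use.

```ruby
# Hypothetical helper; not part of Parse::Embeddings::Qwen.
# Assumed task description, adjust to the retrieval task at hand.
DEFAULT_TASK = "Given a web search query, retrieve relevant passages that answer the query"

# Queries get an instruction prefix; documents are embedded as-is.
def with_instruction(text, input_type:, task: DEFAULT_TASK)
  case input_type
  when :search_query
    "Instruct: #{task}\nQuery:#{text}"
  when :search_document
    text
  else
    raise ArgumentError, "unknown input_type: #{input_type.inspect}"
  end
end

query_text = with_instruction("how do matryoshka embeddings work", input_type: :search_query)
doc_text   = with_instruction("Matryoshka embeddings nest smaller vectors...", input_type: :search_document)
```

The wrapped strings are then passed to embed_text unchanged; the provider does not need to know which side of the pair it is embedding.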
Defined Under Namespace
Classes: AuthenticationError, BadRequestError, RateLimitError, TransientError
Constant Summary
- DEFAULT_BASE_URL =
  Default to the international compatible-mode host. Operators in mainland China should override to https://dashscope.aliyuncs.com/compatible-mode/v1.
  "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
- DEFAULT_MODEL =
  "qwen3-embedding-8b"
- DEFAULT_TIMEOUT =
  30
- DEFAULT_OPEN_TIMEOUT =
  5
- DEFAULT_MAX_RETRIES =
  3
- DEFAULT_BATCH_SIZE =
  DashScope's compatible endpoint caps embedding requests at 25 inputs per call (smaller than OpenAI's 2048). Default below the cap so callers don't have to tune.
  10
- MAX_RESPONSE_BYTES =
  16 * 1024 * 1024
- MODEL_DEFAULT_DIMENSIONS =
  {
    "qwen3-embedding-0.6b" => 1024,
    "qwen3-embedding-4b" => 2560,
    "qwen3-embedding-8b" => 4096,
  }.freeze
- MODEL_MAX_INPUT_TOKENS =
  {
    "qwen3-embedding-0.6b" => 32_000,
    "qwen3-embedding-4b" => 32_000,
    "qwen3-embedding-8b" => 32_000,
  }.freeze
- MATRYOSHKA_MODELS =
  Every Qwen3-Embedding row is Matryoshka-capable. Kept as an explicit allowlist so future non-Matryoshka additions (e.g. qwen-text-embedding-v3) don't silently inherit the behaviour.
  %w[
    qwen3-embedding-0.6b
    qwen3-embedding-4b
    qwen3-embedding-8b
  ].freeze
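To show what the batch-size default implies for callers, here is a sketch of order-preserving batching. It is not the body of the inherited #embed_text_batched, which is not shown on this page; embed_in_batches, BATCH_SIZE, and the fake embedder are hypothetical stand-ins.

```ruby
BATCH_SIZE = 10 # mirrors DEFAULT_BATCH_SIZE above

# `embed` is any callable that takes Array<String> and returns one vector per input.
# Slices are processed in order, so the result stays aligned 1:1 with `strings`.
def embed_in_batches(strings, batch_size: BATCH_SIZE, &embed)
  strings.each_slice(batch_size).flat_map { |batch| embed.call(batch) }
end

# Fake embedder that records batch sizes instead of calling a network endpoint.
calls = []
fake_embed = lambda do |batch|
  calls << batch.length
  batch.map { |s| [s.length.to_f] }
end

inputs  = Array.new(23) { |i| "doc-#{i}" }
vectors = embed_in_batches(inputs, &fake_embed)
```

Twenty-three inputs are sent as three requests, each within the 25-input cap quoted above.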
Constants inherited from Provider
Provider::AS_NOTIFICATION_NAME
Instance Method Summary
- #dimensions ⇒ Object
- #embed_batch_size ⇒ Object
- #embed_text(strings, input_type: :search_document) ⇒ Array<Array<Float>>
  Vectors aligned 1:1 with strings.
- #initialize(api_key:, model: DEFAULT_MODEL, dimensions: nil, base_url: DEFAULT_BASE_URL, timeout: DEFAULT_TIMEOUT, open_timeout: DEFAULT_OPEN_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, embed_batch_size: DEFAULT_BATCH_SIZE, allow_faraday_proxy: false, allow_insecure_base_url: false, connection: nil) ⇒ Qwen (constructor)
  A new instance of Qwen.
- #inspect_attrs ⇒ Object
- #max_input_tokens ⇒ Object
- #model_name ⇒ Object
- #normalize? ⇒ Boolean
- #supports_input_type? ⇒ Boolean
Methods inherited from Provider
#embed_image, #embed_text_batched, #inspect, #instrument_embed, #modalities, #validate_response!
Constructor Details
#initialize(api_key:, model: DEFAULT_MODEL, dimensions: nil, base_url: DEFAULT_BASE_URL, timeout: DEFAULT_TIMEOUT, open_timeout: DEFAULT_OPEN_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, embed_batch_size: DEFAULT_BATCH_SIZE, allow_faraday_proxy: false, allow_insecure_base_url: false, connection: nil) ⇒ Qwen
Returns a new instance of Qwen.
    # File 'lib/parse/embeddings/qwen.rb', line 111
    def initialize(
      api_key:,
      model: DEFAULT_MODEL,
      dimensions: nil,
      base_url: DEFAULT_BASE_URL,
      timeout: DEFAULT_TIMEOUT,
      open_timeout: DEFAULT_OPEN_TIMEOUT,
      max_retries: DEFAULT_MAX_RETRIES,
      embed_batch_size: DEFAULT_BATCH_SIZE,
      allow_faraday_proxy: false,
      allow_insecure_base_url: false,
      connection: nil
    )
      validate_api_key!(api_key)
      validate_model!(model)
      validate_dimensions!(model, dimensions)
      sanitized_base_url = validate_base_url!(base_url, allow_insecure_base_url)
      validate_positive_integer!(:timeout, timeout)
      validate_positive_integer!(:open_timeout, open_timeout)
      validate_non_negative_integer!(:max_retries, max_retries)
      validate_positive_integer!(:embed_batch_size, embed_batch_size)

      @api_key = api_key
      @model = model
      @dimensions = dimensions || MODEL_DEFAULT_DIMENSIONS.fetch(model)
      @base_url = sanitized_base_url
      @timeout = timeout
      @open_timeout = open_timeout
      @max_retries = max_retries
      @embed_batch_size = embed_batch_size
      @allow_faraday_proxy = allow_faraday_proxy
      @connection = connection || build_connection
    end
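The constructor falls back to the model's native width when dimensions: is nil. The sketch below illustrates that resolution together with one plausible range check. The body of validate_dimensions! is not shown on this page, so resolve_dimensions and its "1..native" rule are assumptions based on the "any width ≤ native" wording in the overview, not the library's actual validation.

```ruby
NATIVE_DIMENSIONS = {
  "qwen3-embedding-0.6b" => 1024,
  "qwen3-embedding-4b"   => 2560,
  "qwen3-embedding-8b"   => 4096,
}.freeze

# Hypothetical stand-in for the constructor's dimension handling.
# nil means "use the model's native width"; otherwise require 1..native.
def resolve_dimensions(model, requested)
  native = NATIVE_DIMENSIONS.fetch(model) do
    raise ArgumentError, "unknown model #{model.inspect}"
  end
  return native if requested.nil?

  unless requested.is_a?(Integer) && requested.between?(1, native)
    raise ArgumentError, "dimensions must be an Integer in 1..#{native} (got #{requested.inspect})"
  end

  requested
end

default_width = resolve_dimensions("qwen3-embedding-8b", nil)
reduced_width = resolve_dimensions("qwen3-embedding-8b", 512)
```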
Instance Method Details
#dimensions ⇒ Object
    # File 'lib/parse/embeddings/qwen.rb', line 145
    def dimensions
      @dimensions
    end
#embed_batch_size ⇒ Object
    # File 'lib/parse/embeddings/qwen.rb', line 153
    def embed_batch_size
      @embed_batch_size
    end
#embed_text(strings, input_type: :search_document) ⇒ Array<Array<Float>>
Returns vectors aligned 1:1 with strings.
    # File 'lib/parse/embeddings/qwen.rb', line 177
    def embed_text(strings, input_type: :search_document)
      unless strings.is_a?(Array)
        raise ArgumentError,
              "Parse::Embeddings::Qwen#embed_text expects Array<String> (got #{strings.class})."
      end
      return [] if strings.empty?
      strings.each_with_index do |s, i|
        unless s.is_a?(String)
          raise ArgumentError,
                "Parse::Embeddings::Qwen#embed_text strings[#{i}] is not a String (#{s.class})."
        end
        if s.empty?
          raise ArgumentError,
                "Parse::Embeddings::Qwen#embed_text strings[#{i}] is empty; Qwen rejects empty inputs."
        end
      end

      body = {
        model: @model,
        input: strings,
        encoding_format: "float",
      }
      # Forward `dimensions` only when active width differs from
      # native. Sending native width is a no-op on DashScope but
      # we keep the wire minimal to avoid drift across future
      # endpoint revisions.
      if MATRYOSHKA_MODELS.include?(@model) &&
         @dimensions != MODEL_DEFAULT_DIMENSIONS.fetch(@model)
        body[:dimensions] = @dimensions
      end

      instrument_embed(strings.length, input_type) do |emit_payload|
        payload = (body)
        if payload.is_a?(Hash) && payload["usage"].is_a?(Hash)
          tt = payload["usage"]["total_tokens"]
          emit_payload[:total_tokens] = tt if tt.is_a?(Integer) && tt >= 0
        end
        vectors = extract_vectors!(payload, strings.length)
        validate_response!(strings.length, vectors)
      end
    end
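The wire shape implied by the listing above can be sketched without a network call. build_request_body restates the "send dimensions only when it differs from native" rule. vectors_from is a guess at what a helper like extract_vectors! might do, assuming the OpenAI-style response envelope (a "data" array of rows with "index" and "embedding"); its real body is not shown here, and both helper names are hypothetical.

```ruby
require "json"

NATIVE_WIDTHS = {
  "qwen3-embedding-0.6b" => 1024,
  "qwen3-embedding-4b"   => 2560,
  "qwen3-embedding-8b"   => 4096,
}.freeze

# Request body: `dimensions` is included only when it differs from native width.
def build_request_body(model, strings, dimensions)
  body = { model: model, input: strings, encoding_format: "float" }
  body[:dimensions] = dimensions if dimensions != NATIVE_WIDTHS.fetch(model)
  body
end

# Assumed OpenAI-style envelope: {"data" => [{"index" => 0, "embedding" => [...]}, ...]}.
# Rows are sorted by "index" so the result lines up 1:1 with the inputs.
def vectors_from(payload, expected_count)
  data = payload["data"]
  unless data.is_a?(Array) && data.length == expected_count
    raise ArgumentError, "expected #{expected_count} embeddings in response"
  end

  data.sort_by { |row| row.fetch("index") }.map { |row| row.fetch("embedding") }
end

native_body  = build_request_body("qwen3-embedding-8b", ["hello"], 4096)
reduced_body = build_request_body("qwen3-embedding-8b", ["hello"], 1024)

# Canned response with rows deliberately out of order.
response = JSON.parse('{"data":[{"index":1,"embedding":[0.2]},{"index":0,"embedding":[0.1]}]}')
vectors  = vectors_from(response, 2)
```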
#inspect_attrs ⇒ Object
    # File 'lib/parse/embeddings/qwen.rb', line 219
    def inspect_attrs
      super.merge(base: safe_base_host, retries: @max_retries)
    end
#max_input_tokens ⇒ Object
    # File 'lib/parse/embeddings/qwen.rb', line 157
    def max_input_tokens
      MODEL_MAX_INPUT_TOKENS[@model]
    end
#model_name ⇒ Object
    # File 'lib/parse/embeddings/qwen.rb', line 149
    def model_name
      @model
    end
#normalize? ⇒ Boolean
    # File 'lib/parse/embeddings/qwen.rb', line 161
    def normalize?
      # Qwen3-Embedding is documented unit-normalized at the head.
      true
    end
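If returned vectors are unit-length, as the comment above states, cosine similarity reduces to a plain dot product. The toy check below uses hand-picked two-dimensional unit vectors rather than real embeddings; dot, l2_norm, and cosine are illustrative helpers, not SDK methods.

```ruby
def dot(a, b)
  a.zip(b).sum { |x, y| x * y }
end

def l2_norm(v)
  Math.sqrt(dot(v, v))
end

def cosine(a, b)
  dot(a, b) / (l2_norm(a) * l2_norm(b))
end

a = [0.6, 0.8] # unit length: 0.36 + 0.64 = 1
b = [1.0, 0.0] # unit length

# For unit vectors the two scores agree up to floating-point error.
dot_score    = dot(a, b)
cosine_score = cosine(a, b)
```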
#supports_input_type? ⇒ Boolean
    # File 'lib/parse/embeddings/qwen.rb', line 166
    def supports_input_type?
      # DashScope compatible-mode does not accept a wire-level
      # input_type / task field. The kwarg threads through for
      # cache-key stability but is dropped at the request.
      false
    end