FastEmbed by Qdrant
FastEmbed from Qdrant is a lightweight, fast Python library built for embedding generation. Its main features are:
- Quantized model weights
- ONNX Runtime, no PyTorch dependency
- CPU-first design
- Data parallelism for encoding large datasets
Dependencies
To use FastEmbed with LangChain, install the fastembed Python package.
%pip install --upgrade --quiet fastembed
Imports
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
Instantiating FastEmbed
Parameters
- model_name: str (default: "BAAI/bge-small-en-v1.5") - Name of the FastEmbed model to use. The list of supported models is available in the FastEmbed documentation.
- max_length: int (default: 512) - The maximum number of tokens. Behavior for values > 512 is unknown.
- cache_dir: Optional[str] - The path to the cache directory. Defaults to local_cache in the parent directory.
- threads: Optional[int] - The number of threads a single onnxruntime session can use. Defaults to None.
- doc_embed_type: Literal["default", "passage"] (default: "default")
  - "default": Uses FastEmbed's default embedding method.
  - "passage": Prefixes the text with "passage" before embedding.
embeddings = FastEmbedEmbeddings()
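As a minimal sketch of passing the parameters listed above explicitly rather than relying on the defaults, the settings can be collected in a dictionary and handed to the constructor. The cache path and thread count below are illustrative assumptions, and build_embeddings is a hypothetical helper, not part of LangChain or FastEmbed.

```python
from typing import Any, Dict

# Explicit values for the documented parameters.
# The cache path and thread count are assumptions chosen for illustration.
fastembed_kwargs: Dict[str, Any] = {
    "model_name": "BAAI/bge-small-en-v1.5",  # documented default model
    "max_length": 512,                        # documented default
    "cache_dir": "./fastembed_cache",         # hypothetical cache location
    "threads": 4,                             # assumed value; default is None
    "doc_embed_type": "passage",              # prefix documents with "passage"
}


def build_embeddings(**kwargs: Any):
    """Hypothetical helper: create a FastEmbedEmbeddings instance from keyword arguments."""
    # Imported inside the function so the settings above can be inspected
    # even when langchain_community or fastembed is not installed.
    from langchain_community.embeddings.fastembed import FastEmbedEmbeddings

    return FastEmbedEmbeddings(**kwargs)
```

Calling build_embeddings(**fastembed_kwargs) should return an object that can be used in place of the default instance. The first call typically needs network access to fetch the model weights into the cache directory.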
Usage
Generating document embeddings
document_embeddings = embeddings.embed_documents(
    ["This is a document", "This is some other document"]
)
Generating query embeddings
query_embeddings = embeddings.embed_query("This is a query")
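One common use of a query embedding together with document embeddings is ranking the documents by cosine similarity to the query. The sketch below shows that step in plain Python. The helper names cosine_similarity and rank_documents are hypothetical, and the three-dimensional vectors are toy stand-ins for the lists of floats returned by embed_query and embed_documents.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[int]:
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, doc) for doc in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for real embeddings (assumed values, not model output).
toy_query = [1.0, 0.0, 0.0]
toy_docs = [[0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
ranking = rank_documents(toy_query, toy_docs)
```

With real embeddings, the same call would be rank_documents(query_embeddings, document_embeddings), using the variables from the usage examples above.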