Cloud Function for Image Embedding Processing

This Cloud Function processes images to generate embeddings using Google's Vertex AI multimodal embedding model and stores them in a Qdrant vector database.

Overview

The function is triggered by Pub/Sub messages containing image processing tasks. It:

  1. Downloads images from Google Cloud Storage
  2. Generates embeddings using Vertex AI's multimodalembedding@001 model
  3. Stores embeddings in Qdrant vector database
  4. Updates image metadata in Firestore
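The first two steps can be sketched as follows. This is a minimal illustration, not the function's actual code: it assumes a 1st-gen background function signature (where the Pub/Sub body arrives base64-encoded under `event["data"]`), and the task fields `image_id` and `gcs_path` are hypothetical names. The Vertex AI call uses the `vertexai.vision_models` interface from google-cloud-aiplatform.

```python
import base64
import json


def decode_task(event: dict) -> dict:
    """Decode the JSON task carried in a Pub/Sub-triggered function event."""
    # Pub/Sub delivers the message body base64-encoded under the "data" key.
    payload = base64.b64decode(event["data"]).decode("utf-8")
    return json.loads(payload)


def embed_image(image_bytes: bytes, project: str, location: str) -> list:
    """Return the image embedding produced by multimodalembedding@001."""
    # Imported lazily so decode_task works without the Vertex AI SDK installed.
    import vertexai
    from vertexai.vision_models import Image, MultiModalEmbeddingModel

    vertexai.init(project=project, location=location)
    model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")
    response = model.get_embeddings(
        image=Image(image_bytes=image_bytes),
        dimension=1408,
    )
    return response.image_embedding


# Hypothetical task message; the real field names are not documented here.
example_event = {
    "data": base64.b64encode(
        json.dumps(
            {"image_id": "img-123", "gcs_path": "uploads/img-123.jpg"}
        ).encode("utf-8")
    )
}
task = decode_task(example_event)
```

Keeping the decoding step separate from the model call makes it possible to unit-test message handling without Google Cloud credentials.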

Key Features

  • Vertex AI Multimodal Embeddings: Uses Google's multimodalembedding@001 model through Vertex AI
  • 1408-dimensional vectors: High-quality embeddings for semantic image search
  • Automatic retry: Built-in retry logic for failed processing
  • Status tracking: Real-time status updates in Firestore
  • Scalable: Auto-scaling Cloud Function with configurable limits
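The retry behaviour is not described in detail here, and it may rely on Pub/Sub redelivery rather than an in-process loop. As one possible shape, the sketch below retries a callable with exponential backoff. The helper name `with_retry` and its parameters are illustrative assumptions.

```python
import time


def with_retry(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without actually waiting.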

Dependencies

  • google-cloud-aiplatform: Vertex AI SDK for multimodal embeddings
  • google-cloud-firestore: Firestore database client
  • google-cloud-storage: Cloud Storage client
  • qdrant-client: Vector database client
  • numpy: Numerical operations
  • Pillow: Image processing

Environment Variables

The function requires these environment variables:

# Google Cloud Configuration
GOOGLE_CLOUD_PROJECT=your-project-id
VERTEX_AI_LOCATION=us-central1

# Firestore Configuration
FIRESTORE_PROJECT_ID=your-project-id
FIRESTORE_DATABASE_NAME=(default)

# Cloud Storage Configuration
GCS_BUCKET_NAME=your-bucket-name

# Qdrant Configuration
QDRANT_HOST=your-qdrant-host
QDRANT_PORT=6333
QDRANT_API_KEY=your-api-key
QDRANT_COLLECTION=image_vectors
QDRANT_HTTPS=false

# Logging
LOG_LEVEL=INFO
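A small loader can read these variables once at startup and fail early when a required one is missing. The sketch below is an assumption about how that might look, not the function's actual code: the `Settings` class, the choice of which variables are required, and the defaults (copied from the example values above) are all illustrative.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    project_id: str
    vertex_location: str
    firestore_project_id: str
    firestore_database: str
    bucket_name: str
    qdrant_host: str
    qdrant_port: int
    qdrant_api_key: Optional[str]
    qdrant_collection: str
    qdrant_https: bool
    log_level: str


def _require(env, name: str) -> str:
    """Return a non-empty variable or raise a descriptive error."""
    value = env.get(name)
    if not value:
        raise RuntimeError(f"missing required environment variable: {name}")
    return value


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    project_id = _require(env, "GOOGLE_CLOUD_PROJECT")
    return Settings(
        project_id=project_id,
        vertex_location=env.get("VERTEX_AI_LOCATION", "us-central1"),
        # Assumed fallback: reuse the main project when none is given.
        firestore_project_id=env.get("FIRESTORE_PROJECT_ID", project_id),
        firestore_database=env.get("FIRESTORE_DATABASE_NAME", "(default)"),
        bucket_name=_require(env, "GCS_BUCKET_NAME"),
        qdrant_host=_require(env, "QDRANT_HOST"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        qdrant_api_key=env.get("QDRANT_API_KEY"),
        qdrant_collection=env.get("QDRANT_COLLECTION", "image_vectors"),
        # Environment values are strings, so parse the flag explicitly.
        qdrant_https=env.get("QDRANT_HTTPS", "false").strip().lower() == "true",
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test.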

Testing

Local Testing

  1. Set up your environment:
export GOOGLE_CLOUD_PROJECT=your-project-id
export VERTEX_AI_LOCATION=us-central1
  2. Install dependencies:
pip install -r requirements.txt
  3. Run the test script:
python test_vertex_ai_embeddings.py

This will create a test image and verify that embeddings are generated correctly.

Expected Output

The test should output something like:

INFO:__main__:Testing Vertex AI multimodal embeddings...
INFO:__main__:Using project: your-project-id
INFO:__main__:Creating test image...
INFO:__main__:Created test image with 1234 bytes
INFO:__main__:Generating embeddings using Vertex AI...
INFO:__main__:Generated embeddings with shape: (1408,)
INFO:__main__:Embeddings dtype: float32
INFO:__main__:Embeddings range: [-0.1234, 0.5678]
INFO:__main__:Embeddings norm: 1.0000
INFO:__main__:✅ All tests passed! Vertex AI embeddings are working correctly.
INFO:__main__:🎉 Test completed successfully!
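The shape and norm lines in this output suggest the kind of sanity checks the script performs. The following sketch shows comparable checks in plain Python. It is an illustration only: `check_embedding` is a hypothetical helper, and the real script may use numpy and different checks.

```python
import math

EXPECTED_DIM = 1408  # dimensionality stated for the stored vectors


def check_embedding(vector, expected_dim=EXPECTED_DIM):
    """Validate length and finiteness, then return the L2 norm."""
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} values, got {len(vector)}")
    if not all(math.isfinite(x) for x in vector):
        raise ValueError("embedding contains NaN or infinite values")
    return math.sqrt(sum(x * x for x in vector))
```

A check like this could also run before each write to Qdrant, so that a malformed vector is rejected instead of being stored.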

Deployment

The function is deployed using Terraform. See the main deployment documentation for details.

Monitoring

  • Check Cloud Function logs in Google Cloud Console
  • Monitor Firestore for image status updates
  • Check Qdrant for stored embeddings

Troubleshooting

Common Issues

  1. Authentication errors: Ensure the function's service account has been granted the roles/aiplatform.user role
  2. API not enabled: Ensure aiplatform.googleapis.com is enabled
  3. Quota limits: Check Vertex AI quotas in your project
  4. Network issues: Ensure the function can reach Qdrant and other services
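For the network item, a quick TCP check can tell whether the Qdrant host and port are reachable at all from the function's network. This is a generic standard-library sketch; `can_connect` is an illustrative helper, and it only confirms that the port accepts connections, not that the API key or collection is valid.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False
```

For example, `can_connect(os.environ["QDRANT_HOST"], int(os.environ.get("QDRANT_PORT", "6333")))` could be logged at startup to separate connectivity problems from authentication problems.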

Error Messages

  • "Failed to generate embeddings - no image embedding returned": Check image format and size
  • "PROJECT_ID not found in environment variables": Set GOOGLE_CLOUD_PROJECT
  • "Error generating embeddings": Check Vertex AI API access and quotas