cp

2025-05-25 00:34:20 +02:00 · 2025-05-25 00:34:20 +02:00 · f80505bd74
commit f80505bd74
parent 38d5cb8d98
1 changed file with 104 additions and 50 deletions
--- a/deployment/cloud-function/README.md
+++ b/deployment/cloud-function/README.md
@@ -1,66 +1,120 @@
-# Cloud Function for Image Processing
+# Cloud Function for Image Embedding Processing
-This directory contains the source code for the Cloud Function that processes image embeddings.
+This Cloud Function processes images to generate embeddings using Google's Vertex AI multimodal embedding model and stores them in a Qdrant vector database.
-## Deployment
+## Overview
-**Note: This Cloud Function is now deployed via Terraform, not the previous bash script.**
+The function is triggered by Pub/Sub messages containing image processing tasks. It:
-The Cloud Function is automatically deployed when you run:
+1. Downloads images from Google Cloud Storage
 2. Generates embeddings using Vertex AI's `multimodalembedding@001` model
 3. Stores embeddings in Qdrant vector database
 4. Updates image metadata in Firestore
-```bash
+## Key Features
 cd deployment/terraform
 terraform apply
 ```
-## What the Cloud Function Does
+- **Vertex AI Multimodal Embeddings**: Uses Google's state-of-the-art multimodal embedding model
-
+- **1408-dimensional vectors**: High-quality embeddings for semantic image search
-1. **Triggered by Pub/Sub**: Listens to the `image-processing-topic` for new image processing tasks
+- **Automatic retry**: Built-in retry logic for failed processing
-2. **Downloads Images**: Retrieves images from Google Cloud Storage
+- **Status tracking**: Real-time status updates in Firestore
-3. **Generates Embeddings**: Uses Google Cloud Vision API to create image embeddings
+- **Scalable**: Auto-scaling Cloud Function with configurable limits
 4. **Stores Vectors**: Saves embeddings to the Qdrant vector database
 5. **Updates Status**: Updates Firestore with processing status and embedding metadata
 ## Environment Variables
 The following environment variables are automatically configured by Terraform:
 - `QDRANT_HOST`: IP address of the vector database VM
 - `QDRANT_PORT`: Port for Qdrant (default: 6333)
 - `QDRANT_API_KEY`: API key for Qdrant authentication
 - `QDRANT_COLLECTION`: Collection name for storing vectors (default: image_vectors)
 ## Function Configuration
 - **Runtime**: Python 3.11
 - **Memory**: 512MB
 - **Timeout**: 540 seconds (9 minutes)
 - **Max Instances**: 10
 - **Min Instances**: 0 (scales to zero when not in use)
 - **Retry Policy**: Automatic retries on failure
 ## Dependencies
-See `requirements.txt` for the complete list of Python dependencies.
+- `google-cloud-aiplatform`: Vertex AI SDK for multimodal embeddings
 - `google-cloud-firestore`: Firestore database client
 - `google-cloud-storage`: Cloud Storage client
 - `qdrant-client`: Vector database client
 - `numpy`: Numerical operations
 - `Pillow`: Image processing
 ## Environment Variables
 The function requires these environment variables:
 ```bash
 # Google Cloud Configuration
 GOOGLE_CLOUD_PROJECT=your-project-id
 VERTEX_AI_LOCATION=us-central1
 # Firestore Configuration
 FIRESTORE_PROJECT_ID=your-project-id
 FIRESTORE_DATABASE_NAME=(default)
 # Cloud Storage Configuration
 GCS_BUCKET_NAME=your-bucket-name
 # Qdrant Configuration
 QDRANT_HOST=your-qdrant-host
 QDRANT_PORT=6333
 QDRANT_API_KEY=your-api-key
 QDRANT_COLLECTION=image_vectors
 QDRANT_HTTPS=false
 # Logging
 LOG_LEVEL=INFO
 ```
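As an illustration of how the variables above might be read at startup, here is a minimal sketch. `FunctionConfig` and `load_config` are hypothetical names, not taken from the function's source. The choice of which variables are required, and the fallbacks for `VERTEX_AI_LOCATION`, `QDRANT_HTTPS`, and `LOG_LEVEL`, are assumptions based on the sample values; only the `QDRANT_PORT` and `QDRANT_COLLECTION` defaults are stated in this README. The Firestore variables are left out for brevity.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FunctionConfig:
    """Hypothetical container for the settings listed in this README."""
    project_id: str
    vertex_ai_location: str
    gcs_bucket_name: str
    qdrant_host: str
    qdrant_port: int
    qdrant_api_key: str
    qdrant_collection: str
    qdrant_https: bool
    log_level: str


def load_config(env=None) -> FunctionConfig:
    """Build a FunctionConfig from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    # Assumption: these three have no sensible fallback, so fail fast.
    required = ("GOOGLE_CLOUD_PROJECT", "GCS_BUCKET_NAME", "QDRANT_HOST")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))

    return FunctionConfig(
        project_id=env["GOOGLE_CLOUD_PROJECT"],
        vertex_ai_location=env.get("VERTEX_AI_LOCATION", "us-central1"),  # assumed fallback
        gcs_bucket_name=env["GCS_BUCKET_NAME"],
        qdrant_host=env["QDRANT_HOST"],
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),  # default per README
        qdrant_api_key=env.get("QDRANT_API_KEY", ""),
        qdrant_collection=env.get("QDRANT_COLLECTION", "image_vectors"),  # default per README
        qdrant_https=env.get("QDRANT_HTTPS", "false").strip().lower() == "true",
        log_level=env.get("LOG_LEVEL", "INFO"),  # assumed fallback
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the loader easy to exercise in local tests.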
 ## Testing
 ### Local Testing
 1. Set up your environment:
 ```bash
 export GOOGLE_CLOUD_PROJECT=your-project-id
 export VERTEX_AI_LOCATION=us-central1
 ```
 2. Install dependencies:
 ```bash
 pip install -r requirements.txt
 ```
 3. Run the test script:
 ```bash
 python test_vertex_ai_embeddings.py
 ```
 This will create a test image and verify that embeddings are generated correctly.
 ### Expected Output
 The test should output something like:
 ```
 INFO:__main__:Testing Vertex AI multimodal embeddings...
 INFO:__main__:Using project: your-project-id
 INFO:__main__:Creating test image...
 INFO:__main__:Created test image with 1234 bytes
 INFO:__main__:Generating embeddings using Vertex AI...
 INFO:__main__:Generated embeddings with shape: (1408,)
 INFO:__main__:Embeddings dtype: float32
 INFO:__main__:Embeddings range: [-0.1234, 0.5678]
 INFO:__main__:Embeddings norm: 1.0000
 INFO:__main__:✅ All tests passed! Vertex AI embeddings are working correctly.
 INFO:__main__:🎉 Test completed successfully!
 ```
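The properties reported in the sample output (a 1408-element vector with an L2 norm close to 1) can be checked with a few lines of standard-library Python. This is a sketch only: `check_embedding` is a hypothetical helper, not part of `test_vertex_ai_embeddings.py`, and the unit-norm expectation is an assumption drawn from the sample log line rather than a documented guarantee.

```python
import math

EXPECTED_DIM = 1408  # dimensionality stated in this README


def check_embedding(vector, expected_dim=EXPECTED_DIM, norm_tol=1e-3):
    """Return a list of problems found; an empty list means the vector looks sane."""
    problems = []
    if len(vector) != expected_dim:
        problems.append(f"expected {expected_dim} dimensions, got {len(vector)}")
    if not all(math.isfinite(x) for x in vector):
        problems.append("vector contains NaN or infinite values")
        return problems  # a norm is meaningless for non-finite values
    # Assumption: embeddings are approximately unit length, as in the sample log.
    norm = math.sqrt(sum(x * x for x in vector))
    if abs(norm - 1.0) > norm_tol:
        problems.append(f"L2 norm is {norm:.4f}, expected about 1.0")
    return problems
```

A check like this can run before the vector is written to Qdrant, so that malformed embeddings are caught early.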
 ## Deployment
 The function is deployed using Terraform. See the main deployment documentation for details.
 ## Monitoring
-The function automatically logs to Google Cloud Logging. You can monitor:
+- Check Cloud Function logs in Google Cloud Console
 - Monitor Firestore for image status updates
 - Check Qdrant for stored embeddings
- Function executions
+## Troubleshooting
 - Error rates
 - Processing times
 - Pub/Sub message handling
-## Manual Testing
+### Common Issues
-To manually trigger the function, publish a message to the Pub/Sub topic:
+1. **Authentication errors**: Ensure the service account has `roles/aiplatform.user` permission
 2. **API not enabled**: Ensure `aiplatform.googleapis.com` is enabled
 3. **Quota limits**: Check Vertex AI quotas in your project
 4. **Network issues**: Ensure the function can reach Qdrant and other services
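For the network item above, a quick TCP reachability probe can help separate connectivity problems from application errors. This is a minimal sketch using only the standard library; `can_reach` is a hypothetical helper, and a successful TCP connection does not prove that Qdrant itself is healthy or that the API key is valid.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `can_reach(os.environ["QDRANT_HOST"], int(os.environ.get("QDRANT_PORT", "6333")))` run from the same network as the function would show whether the Qdrant port is open.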
-```json
+### Error Messages
-{
+
-  "image_id": "test-image-123",
+- `"Failed to generate embeddings - no image embedding returned"`: Check image format and size
-  "storage_path": "your-bucket/path/to/image.jpg",
+- `"PROJECT_ID not found in environment variables"`: Set `GOOGLE_CLOUD_PROJECT`
-  "team_id": "team-456",
+- `"Error generating embeddings"`: Check Vertex AI API access and quotas 
  "retry_count": 0
 }
 ```
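The JSON payload from the manual testing example can be decoded with a short helper. Pub/Sub delivers message data base64-encoded to triggered functions, so the sketch below decodes before parsing. `ImageTask` and `parse_task` are hypothetical names, and splitting `storage_path` into a bucket and an object name at the first `/` is an assumption based on the sample value `your-bucket/path/to/image.jpg`.

```python
import base64
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageTask:
    """Hypothetical view of one image processing task."""
    image_id: str
    bucket: str
    blob_name: str
    team_id: str
    retry_count: int


def parse_task(encoded_data) -> ImageTask:
    """Decode base64 Pub/Sub message data and extract the task fields."""
    payload = json.loads(base64.b64decode(encoded_data).decode("utf-8"))

    # Assumption: storage_path has the form "<bucket>/<object path>".
    bucket, _, blob_name = payload["storage_path"].partition("/")
    if not bucket or not blob_name:
        raise ValueError("storage_path must look like '<bucket>/<object path>'")

    return ImageTask(
        image_id=payload["image_id"],
        bucket=bucket,
        blob_name=blob_name,
        team_id=payload["team_id"],
        retry_count=int(payload.get("retry_count", 0)),
    )
```

Keeping the parsing separate from the download and embedding steps makes it possible to test message handling without any Google Cloud credentials.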