johnpccd 2025-05-25 00:34:20 +02:00
parent 38d5cb8d98
commit f80505bd74

@@ -1,66 +1,120 @@
-# Cloud Function for Image Processing
-This directory contains the source code for the Cloud Function that processes image embeddings.
-## Deployment
-**Note: This Cloud Function is now deployed via Terraform, not the previous bash script.**
-The Cloud Function is automatically deployed when you run:
-```bash
-cd deployment/terraform
-terraform apply
-```
-## What the Cloud Function Does
-1. **Triggered by Pub/Sub**: Listens to the `image-processing-topic` for new image processing tasks
-2. **Downloads Images**: Retrieves images from Google Cloud Storage
-3. **Generates Embeddings**: Uses Google Cloud Vision API to create image embeddings
-4. **Stores Vectors**: Saves embeddings to the Qdrant vector database
-5. **Updates Status**: Updates Firestore with processing status and embedding metadata
+# Cloud Function for Image Embedding Processing
+This Cloud Function processes images to generate embeddings using Google's Vertex AI multimodal embedding model and stores them in a Qdrant vector database.
+## Overview
+The function is triggered by Pub/Sub messages containing image processing tasks. It:
+1. Downloads images from Google Cloud Storage
+2. Generates embeddings using Vertex AI's `multimodalembedding@001` model
+3. Stores embeddings in Qdrant vector database
+4. Updates image metadata in Firestore
+## Key Features
+- **Vertex AI Multimodal Embeddings**: Uses Google's state-of-the-art multimodal embedding model
+- **1408-dimensional vectors**: High-quality embeddings for semantic image search
+- **Automatic retry**: Built-in retry logic for failed processing
+- **Status tracking**: Real-time status updates in Firestore
+- **Scalable**: Auto-scaling Cloud Function with configurable limits
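The README says the function is triggered by Pub/Sub messages that carry image processing tasks, but it does not show how a task is parsed. The following is a minimal sketch, assuming the message body is base64-encoded JSON with the fields used in the README's manual-test payload (`image_id`, `storage_path`, `team_id`, `retry_count`). `ImageTask` and `decode_task` are hypothetical names for illustration and are not taken from the function's source.

```python
import base64
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageTask:
    """Hypothetical task shape, based on the fields in the manual-test payload."""
    image_id: str
    storage_path: str
    team_id: str
    retry_count: int = 0


def decode_task(data_b64: str) -> ImageTask:
    """Decode a base64-encoded Pub/Sub message body into an ImageTask."""
    raw = base64.b64decode(data_b64)
    payload = json.loads(raw.decode("utf-8"))

    # Assumption: these three fields are mandatory; retry_count defaults to 0.
    missing = [k for k in ("image_id", "storage_path", "team_id") if k not in payload]
    if missing:
        raise ValueError(f"task message is missing fields: {missing}")

    return ImageTask(
        image_id=str(payload["image_id"]),
        storage_path=str(payload["storage_path"]),
        team_id=str(payload["team_id"]),
        retry_count=int(payload.get("retry_count", 0)),
    )
```

Where the base64 string lives depends on the function generation, which the README does not state: in a first-generation background function it is typically `event["data"]`, while a CloudEvent-based function nests it under the event's `message` field.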
-## Environment Variables
-The following environment variables are automatically configured by Terraform:
-- `QDRANT_HOST`: IP address of the vector database VM
-- `QDRANT_PORT`: Port for Qdrant (default: 6333)
-- `QDRANT_API_KEY`: API key for Qdrant authentication
-- `QDRANT_COLLECTION`: Collection name for storing vectors (default: image_vectors)
-## Function Configuration
-- **Runtime**: Python 3.11
-- **Memory**: 512MB
-- **Timeout**: 540 seconds (9 minutes)
-- **Max Instances**: 10
-- **Min Instances**: 0 (scales to zero when not in use)
-- **Retry Policy**: Automatic retries on failure
 ## Dependencies
-See `requirements.txt` for the complete list of Python dependencies.
+- `google-cloud-aiplatform`: Vertex AI SDK for multimodal embeddings
+- `google-cloud-firestore`: Firestore database client
+- `google-cloud-storage`: Cloud Storage client
+- `qdrant-client`: Vector database client
+- `numpy`: Numerical operations
+- `Pillow`: Image processing
+## Environment Variables
+The function requires these environment variables:
+```bash
+# Google Cloud Configuration
+GOOGLE_CLOUD_PROJECT=your-project-id
+VERTEX_AI_LOCATION=us-central1
+# Firestore Configuration
+FIRESTORE_PROJECT_ID=your-project-id
+FIRESTORE_DATABASE_NAME=(default)
+# Cloud Storage Configuration
+GCS_BUCKET_NAME=your-bucket-name
+# Qdrant Configuration
+QDRANT_HOST=your-qdrant-host
+QDRANT_PORT=6333
+QDRANT_API_KEY=your-api-key
+QDRANT_COLLECTION=image_vectors
+QDRANT_HTTPS=false
+# Logging
+LOG_LEVEL=INFO
+```
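The variables above arrive as strings, so the port and the HTTPS flag need converting before use. The sketch below shows one way to read a subset of them into a typed object. `Settings`, `_require`, and `load_settings` are hypothetical names; which variables are mandatory, and the fallback values other than the documented defaults for `QDRANT_PORT` (6333) and `QDRANT_COLLECTION` (`image_vectors`), are assumptions taken from the sample values rather than from the function's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical typed view of a subset of the documented variables."""
    project_id: str
    vertex_location: str
    bucket_name: str
    qdrant_host: str
    qdrant_port: int
    qdrant_collection: str
    qdrant_https: bool
    log_level: str


def _require(env: Mapping[str, str], name: str) -> str:
    """Return a non-empty variable or fail with a message naming it."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set")
    return value


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping (os.environ by default)."""
    return Settings(
        project_id=_require(env, "GOOGLE_CLOUD_PROJECT"),
        vertex_location=env.get("VERTEX_AI_LOCATION", "us-central1"),  # assumed fallback
        bucket_name=_require(env, "GCS_BUCKET_NAME"),
        qdrant_host=_require(env, "QDRANT_HOST"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        qdrant_collection=env.get("QDRANT_COLLECTION", "image_vectors"),
        # Assumption: "1", "true", and "yes" (any case) enable HTTPS.
        qdrant_https=env.get("QDRANT_HTTPS", "false").strip().lower() in {"1", "true", "yes"},
        log_level=env.get("LOG_LEVEL", "INFO").upper(),  # assumed fallback
    )
```

Passing the mapping as a parameter keeps the loader testable without touching the real process environment.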
+## Testing
+### Local Testing
+1. Set up your environment:
+```bash
+export GOOGLE_CLOUD_PROJECT=your-project-id
+export VERTEX_AI_LOCATION=us-central1
+```
+2. Install dependencies:
+```bash
+pip install -r requirements.txt
+```
+3. Run the test script:
+```bash
+python test_vertex_ai_embeddings.py
+```
+This will create a test image and verify that embeddings are generated correctly.
+### Expected Output
+The test should output something like:
+```
+INFO:__main__:Testing Vertex AI multimodal embeddings...
+INFO:__main__:Using project: your-project-id
+INFO:__main__:Creating test image...
+INFO:__main__:Created test image with 1234 bytes
+INFO:__main__:Generating embeddings using Vertex AI...
+INFO:__main__:Generated embeddings with shape: (1408,)
+INFO:__main__:Embeddings dtype: float32
+INFO:__main__:Embeddings range: [-0.1234, 0.5678]
+INFO:__main__:Embeddings norm: 1.0000
+INFO:__main__:✅ All tests passed! Vertex AI embeddings are working correctly.
+INFO:__main__:🎉 Test completed successfully!
+```
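The expected output reports a 1408-element vector with a norm of 1.0000. The contents of `test_vertex_ai_embeddings.py` are not shown here, so the following is only a sketch of the kind of sanity check such a test might perform, written with the standard library. `check_embedding` is a hypothetical helper, and treating unit norm as a requirement is an assumption based on the sample log line, which is why it can be switched off.

```python
import math
from typing import Sequence

EXPECTED_DIM = 1408  # vector size stated in this README


def check_embedding(
    vec: Sequence[float],
    expect_unit_norm: bool = True,
    tol: float = 1e-3,
) -> float:
    """Validate an embedding's size and L2 norm, and return the norm."""
    if len(vec) != EXPECTED_DIM:
        raise ValueError(f"expected {EXPECTED_DIM} values, got {len(vec)}")

    # L2 norm: square root of the sum of squared components.
    norm = math.sqrt(sum(x * x for x in vec))
    if not math.isfinite(norm):
        raise ValueError("embedding contains NaN or infinite values")

    # Assumption: embeddings are unit-length, as the sample log suggests.
    if expect_unit_norm and abs(norm - 1.0) > tol:
        raise ValueError(f"expected unit norm, got {norm:.4f}")
    return norm
```

A vector that fails the norm check is not necessarily wrong; passing `expect_unit_norm=False` limits the check to size and finiteness.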
+## Deployment
+The function is deployed using Terraform. See the main deployment documentation for details.
 ## Monitoring
-The function automatically logs to Google Cloud Logging. You can monitor:
-- Function executions
-- Error rates
-- Processing times
-- Pub/Sub message handling
-## Manual Testing
-To manually trigger the function, publish a message to the Pub/Sub topic:
-```json
-{
-  "image_id": "test-image-123",
-  "storage_path": "your-bucket/path/to/image.jpg",
-  "team_id": "team-456",
-  "retry_count": 0
-}
-```
+- Check Cloud Function logs in Google Cloud Console
+- Monitor Firestore for image status updates
+- Check Qdrant for stored embeddings
+## Troubleshooting
+### Common Issues
+1. **Authentication errors**: Ensure the service account has `roles/aiplatform.user` permission
+2. **API not enabled**: Ensure `aiplatform.googleapis.com` is enabled
+3. **Quota limits**: Check Vertex AI quotas in your project
+4. **Network issues**: Ensure the function can reach Qdrant and other services
+### Error Messages
+- `"Failed to generate embeddings - no image embedding returned"`: Check image format and size
+- `"PROJECT_ID not found in environment variables"`: Set `GOOGLE_CLOUD_PROJECT`
+- `"Error generating embeddings"`: Check Vertex AI API access and quotas
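This change drops the manual-test payload from the README, so it may help to keep a way of producing it when reproducing the issues listed under Troubleshooting. The sketch below builds the same JSON body as bytes, which is the form Pub/Sub expects for message data. `build_task_message` is a hypothetical helper, and the field names are assumed to still match what the function reads.

```python
import json


def build_task_message(
    image_id: str,
    storage_path: str,
    team_id: str,
    retry_count: int = 0,
) -> bytes:
    """Serialize a task to UTF-8 JSON bytes suitable as Pub/Sub message data."""
    payload = {
        "image_id": image_id,
        "storage_path": storage_path,
        "team_id": team_id,
        "retry_count": retry_count,
    }
    return json.dumps(payload).encode("utf-8")


# Example body matching the values from the earlier manual-test payload.
message = build_task_message(
    "test-image-123",
    "your-bucket/path/to/image.jpg",
    "team-456",
)
```

With the `google-cloud-pubsub` package installed (it is not in the dependency list above), the resulting bytes could be passed as the `data` argument of `PublisherClient.publish` for the topic the function listens on.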