cp
parent
38d5cb8d98
commit
f80505bd74
@@ -1,66 +1,120 @@
|
|||||||
# Cloud Function for Image Processing
|
# Cloud Function for Image Embedding Processing
|
||||||
|
|
||||||
This directory contains the source code for the Cloud Function that processes image embeddings.
|
This Cloud Function processes images to generate embeddings using Google's Vertex AI multimodal embedding model and stores them in a Qdrant vector database.
|
||||||
|
|
||||||
## Deployment
|
## Overview
|
||||||
|
|
||||||
**Note: This Cloud Function is now deployed via Terraform, not the previous bash script.**
|
The function is triggered by Pub/Sub messages containing image processing tasks. It:
|
||||||
|
|
||||||
The Cloud Function is automatically deployed when you run:
|
1. Downloads images from Google Cloud Storage
|
||||||
|
2. Generates embeddings using Vertex AI's `multimodalembedding@001` model
|
||||||
|
3. Stores embeddings in Qdrant vector database
|
||||||
|
4. Updates image metadata in Firestore
|
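The four steps in the new Overview can be sketched as a single orchestration function. This is a minimal illustration, not the function's actual code: the callables `download`, `embed`, `store`, and `set_status` are hypothetical stand-ins for the Cloud Storage, Vertex AI, Qdrant, and Firestore clients, and the status strings are assumed rather than taken from the source.

```python
def process_image(task, download, embed, store, set_status):
    """Run the four pipeline steps in order, reporting status along the way.

    All four callables are hypothetical placeholders for the real clients.
    """
    image_id = task["image_id"]
    set_status(image_id, "processing")  # assumed status name
    try:
        image_bytes = download(task["storage_path"])  # 1. Cloud Storage
        vector = embed(image_bytes)                   # 2. Vertex AI embedding
        store(image_id, vector)                       # 3. Qdrant
    except Exception as exc:
        set_status(image_id, "failed", error=str(exc))
        raise  # re-raise so the platform's retry policy can take over
    # 4. Record the result in Firestore (assumed status name and field).
    set_status(image_id, "completed", dimensions=len(vector))
    return vector
```

Passing the clients in as arguments keeps the ordering logic testable without network access; the real function may well construct its clients differently.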
||||||
|
|
||||||
```bash
|
## Key Features
|
||||||
cd deployment/terraform
|
|
||||||
terraform apply
|
|
||||||
```
|
|
||||||
|
|
||||||
## What the Cloud Function Does
|
- **Vertex AI Multimodal Embeddings**: Uses Google's state-of-the-art multimodal embedding model
|
||||||
|
- **1408-dimensional vectors**: High-quality embeddings for semantic image search
|
||||||
1. **Triggered by Pub/Sub**: Listens to the `image-processing-topic` for new image processing tasks
|
- **Automatic retry**: Built-in retry logic for failed processing
|
||||||
2. **Downloads Images**: Retrieves images from Google Cloud Storage
|
- **Status tracking**: Real-time status updates in Firestore
|
||||||
3. **Generates Embeddings**: Uses Google Cloud Vision API to create image embeddings
|
- **Scalable**: Auto-scaling Cloud Function with configurable limits
|
||||||
4. **Stores Vectors**: Saves embeddings to the Qdrant vector database
|
|
||||||
5. **Updates Status**: Updates Firestore with processing status and embedding metadata
|
|
||||||
|
|
||||||
## Environment Variables
|
|
||||||
|
|
||||||
The following environment variables are automatically configured by Terraform:
|
|
||||||
|
|
||||||
- `QDRANT_HOST`: IP address of the vector database VM
|
|
||||||
- `QDRANT_PORT`: Port for Qdrant (default: 6333)
|
|
||||||
- `QDRANT_API_KEY`: API key for Qdrant authentication
|
|
||||||
- `QDRANT_COLLECTION`: Collection name for storing vectors (default: image_vectors)
|
|
||||||
|
|
||||||
## Function Configuration
|
|
||||||
|
|
||||||
- **Runtime**: Python 3.11
|
|
||||||
- **Memory**: 512MB
|
|
||||||
- **Timeout**: 540 seconds (9 minutes)
|
|
||||||
- **Max Instances**: 10
|
|
||||||
- **Min Instances**: 0 (scales to zero when not in use)
|
|
||||||
- **Retry Policy**: Automatic retries on failure
|
|
||||||
|
|
||||||
## Dependencies
|
## Dependencies
|
||||||
|
|
||||||
See `requirements.txt` for the complete list of Python dependencies.
|
- `google-cloud-aiplatform`: Vertex AI SDK for multimodal embeddings
|
||||||
|
- `google-cloud-firestore`: Firestore database client
|
||||||
|
- `google-cloud-storage`: Cloud Storage client
|
||||||
|
- `qdrant-client`: Vector database client
|
||||||
|
- `numpy`: Numerical operations
|
||||||
|
- `Pillow`: Image processing
|
||||||
|
|
||||||
|
## Environment Variables
|
||||||
|
|
||||||
|
The function requires these environment variables:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Google Cloud Configuration
|
||||||
|
GOOGLE_CLOUD_PROJECT=your-project-id
|
||||||
|
VERTEX_AI_LOCATION=us-central1
|
||||||
|
|
||||||
|
# Firestore Configuration
|
||||||
|
FIRESTORE_PROJECT_ID=your-project-id
|
||||||
|
FIRESTORE_DATABASE_NAME=(default)
|
||||||
|
|
||||||
|
# Cloud Storage Configuration
|
||||||
|
GCS_BUCKET_NAME=your-bucket-name
|
||||||
|
|
||||||
|
# Qdrant Configuration
|
||||||
|
QDRANT_HOST=your-qdrant-host
|
||||||
|
QDRANT_PORT=6333
|
||||||
|
QDRANT_API_KEY=your-api-key
|
||||||
|
QDRANT_COLLECTION=image_vectors
|
||||||
|
QDRANT_HTTPS=false
|
||||||
|
|
||||||
|
# Logging
|
||||||
|
LOG_LEVEL=INFO
|
||||||
|
```
|
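One way the function might read a subset of these variables at startup is sketched below. The `Settings` class and `load_settings` helper are hypothetical names. The defaults for `QDRANT_PORT` and `QDRANT_COLLECTION` come from the previous README; the fallbacks for `VERTEX_AI_LOCATION`, `QDRANT_HOST`, and `LOG_LEVEL` simply reuse the example values and are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    project_id: str
    vertex_location: str
    qdrant_host: str
    qdrant_port: int
    qdrant_collection: str
    qdrant_https: bool
    log_level: str


def load_settings(env=None):
    """Build Settings from a mapping of variables (os.environ by default)."""
    env = os.environ if env is None else env
    project_id = env.get("GOOGLE_CLOUD_PROJECT")
    if not project_id:
        # Same wording as the error message listed under Troubleshooting.
        raise RuntimeError("PROJECT_ID not found in environment variables")
    return Settings(
        project_id=project_id,
        vertex_location=env.get("VERTEX_AI_LOCATION", "us-central1"),  # assumed fallback
        qdrant_host=env.get("QDRANT_HOST", "localhost"),               # assumed fallback
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        qdrant_collection=env.get("QDRANT_COLLECTION", "image_vectors"),
        # Assumption: only the literal string "true" (any case) enables HTTPS.
        qdrant_https=env.get("QDRANT_HTTPS", "false").strip().lower() == "true",
        log_level=env.get("LOG_LEVEL", "INFO"),                        # assumed fallback
    )
```

Calling `load_settings()` with no argument reads the process environment; passing a plain dict makes the parsing easy to check in isolation.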
||||||
|
|
||||||
|
## Testing
|
||||||
|
|
||||||
|
### Local Testing
|
||||||
|
|
||||||
|
1. Set up your environment:
|
||||||
|
```bash
|
||||||
|
export GOOGLE_CLOUD_PROJECT=your-project-id
|
||||||
|
export VERTEX_AI_LOCATION=us-central1
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Install dependencies:
|
||||||
|
```bash
|
||||||
|
pip install -r requirements.txt
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Run the test script:
|
||||||
|
```bash
|
||||||
|
python test_vertex_ai_embeddings.py
|
||||||
|
```
|
||||||
|
|
||||||
|
This will create a test image and verify that embeddings are generated correctly.
|
||||||
|
|
||||||
|
### Expected Output
|
||||||
|
|
||||||
|
The test should output something like:
|
||||||
|
```
|
||||||
|
INFO:__main__:Testing Vertex AI multimodal embeddings...
|
||||||
|
INFO:__main__:Using project: your-project-id
|
||||||
|
INFO:__main__:Creating test image...
|
||||||
|
INFO:__main__:Created test image with 1234 bytes
|
||||||
|
INFO:__main__:Generating embeddings using Vertex AI...
|
||||||
|
INFO:__main__:Generated embeddings with shape: (1408,)
|
||||||
|
INFO:__main__:Embeddings dtype: float32
|
||||||
|
INFO:__main__:Embeddings range: [-0.1234, 0.5678]
|
||||||
|
INFO:__main__:Embeddings norm: 1.0000
|
||||||
|
INFO:__main__:✅ All tests passed! Vertex AI embeddings are working correctly.
|
||||||
|
INFO:__main__:🎉 Test completed successfully!
|
||||||
|
```
|
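The shape and norm lines in the sample log suggest two sanity checks that can be applied to any returned vector. The sketch below uses only the standard library; `check_embedding` is a hypothetical helper, and the unit-norm check is an assumption drawn from the sample output rather than a documented guarantee of the model.

```python
import math

EXPECTED_DIM = 1408  # dimensionality stated in this README


def embedding_norm(vector):
    """Return the Euclidean (L2) norm of a sequence of floats."""
    return math.sqrt(sum(x * x for x in vector))


def check_embedding(vector, expected_dim=EXPECTED_DIM, tol=1e-3):
    """Return a list of problems; an empty list means the vector looks sane."""
    problems = []
    if len(vector) != expected_dim:
        problems.append(f"expected {expected_dim} dimensions, got {len(vector)}")
    if not all(math.isfinite(x) for x in vector):
        problems.append("vector contains NaN or infinite values")
    elif abs(embedding_norm(vector) - 1.0) > tol:
        # Assumption: embeddings are unit-normalised, as the sample log implies.
        problems.append("vector is not unit-normalised")
    return problems
```

If the model turns out not to normalise its output, the norm check can be dropped while keeping the dimension and finiteness checks.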
||||||
|
|
||||||
|
## Deployment
|
||||||
|
|
||||||
|
The function is deployed using Terraform. See the main deployment documentation for details.
|
||||||
|
|
||||||
## Monitoring
|
## Monitoring
|
||||||
|
|
||||||
The function automatically logs to Google Cloud Logging. You can monitor:
|
- Check Cloud Function logs in Google Cloud Console
|
||||||
|
- Monitor Firestore for image status updates
|
||||||
|
- Check Qdrant for stored embeddings
|
||||||
|
|
||||||
- Function executions
|
## Troubleshooting
|
||||||
- Error rates
|
|
||||||
- Processing times
|
|
||||||
- Pub/Sub message handling
|
|
||||||
|
|
||||||
## Manual Testing
|
### Common Issues
|
||||||
|
|
||||||
To manually trigger the function, publish a message to the Pub/Sub topic:
|
1. **Authentication errors**: Ensure the service account has `roles/aiplatform.user` permission
|
||||||
|
2. **API not enabled**: Ensure `aiplatform.googleapis.com` is enabled
|
||||||
|
3. **Quota limits**: Check Vertex AI quotas in your project
|
||||||
|
4. **Network issues**: Ensure the function can reach Qdrant and other services
|
||||||
|
|
||||||
```json
|
### Error Messages
|
||||||
{
|
|
||||||
"image_id": "test-image-123",
|
- `"Failed to generate embeddings - no image embedding returned"`: Check image format and size
|
||||||
"storage_path": "your-bucket/path/to/image.jpg",
|
- `"PROJECT_ID not found in environment variables"`: Set `GOOGLE_CLOUD_PROJECT`
|
||||||
"team_id": "team-456",
|
- `"Error generating embeddings"`: Check Vertex AI API access and quotas
|
||||||
"retry_count": 0
|
|
||||||
}
|
|
||||||
```
|
|
||||||
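The JSON payload from the removed Manual Testing section still describes the task message. Pub/Sub delivers message data base64-encoded, so a handler would typically decode and validate it first. The sketch below is illustrative: `decode_task` and `split_storage_path` are hypothetical helpers, and treating the first path segment of `storage_path` as the bucket name is an assumption based on the example value.

```python
import base64
import json

REQUIRED_FIELDS = ("image_id", "storage_path", "team_id")


def decode_task(data_b64):
    """Decode a base64-encoded Pub/Sub data field into a task dict."""
    payload = json.loads(base64.b64decode(data_b64).decode("utf-8"))
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    payload.setdefault("retry_count", 0)  # first attempt if not provided
    return payload


def split_storage_path(storage_path):
    """Split 'bucket/path/to/object' into (bucket, object_name)."""
    bucket, _, object_name = storage_path.partition("/")
    if not bucket or not object_name:
        raise ValueError(f"unexpected storage path: {storage_path!r}")
    return bucket, object_name
```

Rejecting malformed messages early keeps bad input from consuming retries on the embedding and storage steps.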