GeniSpace Self-Hosted Elastic Scaling
This document provides a detailed overview of the elastic scaling capabilities of the GeniSpace self-hosted deployment version, helping enterprise customers understand the platform's architecture design and scaling strategies.
Overview
GeniSpace is built on a modern microservices architecture and deployed on Kubernetes clusters, providing strong high availability and scalability. Each core service supports independent vertical and horizontal scaling, so resource allocation can be adjusted flexibly to match business needs.
Version Information
GeniSpace offers three versions:
| Version | Description |
|---|---|
| Self-Hosted | Enterprise private deployment; this document primarily covers this version |
| International | Cloud-based SaaS service (international region) |
| China | Cloud-based SaaS service (China region) |
Platform Architecture
Microservice Components
The GeniSpace platform consists of multiple independent microservices, each of which can be scaled independently based on load requirements:
```
┌─────────────────────────────────────────────────────────────────┐
│                     Frontend Service Layer                      │
│   ┌─────────┐  ┌─────────┐  ┌───────────┐  ┌─────────┐          │
│   │   Web   │  │ Console │  │ Workbench │  │  Admin  │  ...     │
│   └─────────┘  └─────────┘  └───────────┘  └─────────┘          │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                       Core Service Layer                        │
│   ┌─────────┐  ┌─────────┐  ┌───────────┐  ┌────────────┐       │
│   │   API   │  │  Agent  │  │  Dataset  │  │ MCP Server │       │
│   └─────────┘  └─────────┘  └───────────┘  └────────────┘       │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Task Execution Layer                       │
│   ┌───────────┐  ┌────────┐  ┌─────────────────────────────┐    │
│   │ Scheduler │  │ Worker │  │ Operators (NJS/Enterprise)  │    │
│   └───────────┘  └────────┘  └─────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Infrastructure Layer                       │
│   ┌──────────┐  ┌──────────┐                                    │
│   │  Redis   │  │ RabbitMQ │                                    │
│   └──────────┘  └──────────┘                                    │
└─────────────────────────────────────────────────────────────────┘
```
Core Service Descriptions
| Service | Description | Default Replicas | Scaling Recommendations |
|---|---|---|---|
| API | Core backend API service | 1 | Scale based on API request volume |
| Agent | AI agent service, running as a persistent process | 1-3 | Scale based on concurrent agent count |
| Dataset | Dataset management service | 1 | Scale based on data processing needs |
| MCP Server | MCP tool invocation service | 1 | Scale based on MCP call frequency |
| Worker | Task execution worker nodes | 10 | Dynamically scale based on task queue depth |
| Scheduler | Task scheduling service | 2 | HA deployment, recommended 2+ replicas |
Task Execution Layer Elastic Scaling
Worker - Powerful Task Execution Engine
The Worker is the core task execution component of the GeniSpace platform, providing robust task processing capabilities.
Operating Mode
The Worker runs in HTTP mode as a persistent process, providing the following capabilities:
- Debug Support: Provides a real-time debugging environment for developers
- MCP Tool Calls: Supports agent MCP (Model Context Protocol) tool invocations
- Task Execution: Processes various task nodes in workflows
Resource Configuration Example
```yaml
# Worker resource configuration
resources:
  requests:
    cpu: 100m        # Base CPU request
    memory: 128Mi    # Base memory request
  limits:
    cpu: 1000m       # CPU limit, allows burst performance
    memory: 1Gi      # Memory limit
```
Horizontal Scaling Strategy
The Worker supports flexible horizontal scaling, allowing automatic or manual replica scaling based on task queue depth:
```yaml
# Replica count configuration
spec:
  replicas: 10  # Adjust based on requirements
```
Scaling Recommendations:
- Small deployment (< 100 concurrent tasks): 5-10 replicas
- Medium deployment (100-500 concurrent tasks): 10-20 replicas
- Large deployment (> 500 concurrent tasks): 20+ replicas, recommended with HPA (see the queue-depth sketch below)
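For automatic scaling tied directly to load, an HPA can target queue depth rather than CPU. The sketch below assumes an external-metrics pipeline is already in place (for example, Prometheus Adapter exposing a RabbitMQ queue-length metric; the metric name `rabbitmq_queue_messages_ready` and the target value are illustrative, not part of the shipped configuration):
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-queue-hpa
  namespace: genispace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 10
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: rabbitmq_queue_messages_ready  # assumed metric name exposed by the adapter
        target:
          type: AverageValue
          averageValue: "30"                   # aim for ~30 queued tasks per Worker replica
```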
Scheduler - Unlimited Task Scheduling
The Scheduler is the core component for task scheduling, responsible for managing and distributing tasks to Worker nodes.
Core Features
- Distributed Locks: Uses Redis distributed locking to prevent duplicate task execution (illustrated after the probe configuration below)
- High Availability Design: Supports multi-replica deployment with automatic failover
- Health Checks: Built-in health check endpoints to ensure service availability
```yaml
# Scheduler health check configuration
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
```
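The duplicate-execution guard rests on Redis's atomic `SET ... NX PX` primitive: a lock key is created only if it does not already exist, and expires automatically if the holder dies. A minimal illustration with redis-cli (the key name, holder ID, and TTL are illustrative, not the Scheduler's actual key schema):
```bash
# Acquire: succeeds only if the key does not already exist (NX); auto-expires after 30s (PX 30000)
redis-cli SET lock:task:12345 scheduler-0 NX PX 30000
# Replies OK if this instance won the lock; (nil) means another Scheduler already holds it
```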
Unlimited Scaling Capability
Through the Scheduler + Worker architecture design, GeniSpace achieves unlimited task execution scaling:
- Scheduler handles task scheduling and distribution
- Worker handles task execution
- Increasing the Worker replica count improves task processing capacity approximately linearly; for example, if each replica handles roughly 10 concurrent tasks, growing from 10 to 50 replicas raises capacity from about 100 to about 500 concurrent tasks
Core Service Elastic Scaling
API Service
The API service is the platform's core backend service, handling all API requests.
Vertical Scaling Configuration
```yaml
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 1000m   # Production environment allows higher burst performance
    memory: 1Gi
```
Horizontal Scaling
Scale API service replica count based on API request volume:
```yaml
spec:
  replicas: 3  # Recommended 3+ replicas for high-traffic scenarios
```
Agent Service
The Agent service provides the runtime environment for AI agents and is the core carrier of AI capabilities.
Resource Configuration
Due to the compute-intensive nature of AI services, the Agent service requires higher resource allocation:
```yaml
resources:
  requests:
    cpu: 200m      # AI services need more CPU
    memory: 256Mi
  limits:
    cpu: 2000m     # AI tasks may require significant CPU
    memory: 2Gi
```
Scaling Strategy
- Persistent Process: The Agent runs as a persistent process, providing real-time AI service capabilities
- MCP Tool Calls: Works with the MCP Server to support agents calling various tools
- Horizontal Scaling: Increase replicas based on concurrent agent count
Dataset Service
The Dataset service handles dataset management and processing.
```yaml
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
```
MCP Server
The MCP Server provides the server-side implementation of the Model Context Protocol, supporting agents in calling external tools.
```yaml
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
```
Kubernetes Deployment Strategies
Rolling Update Strategy
All services are configured with a rolling update strategy to ensure zero downtime during updates:
```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1         # Create at most 1 extra Pod during an update
    maxUnavailable: 0   # No Pods may become unavailable during an update
```
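You can watch a rollout complete (and confirm there was no downtime window) with, for example:
```bash
kubectl rollout status deployment/api -n genispace
```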
Resource Configuration Best Practices
GKE Autopilot Bursting Feature
If you use GKE Autopilot, you can leverage the Bursting feature to optimize costs:
- Strategy: Set lower `requests` (to save costs) with reasonable `limits` (to ensure performance)
- Minimum values: `requests` of 50m CPU and 52Mi memory
Resource Configuration Tiers
| Service Type | CPU Requests | CPU Limits | Memory Requests | Memory Limits |
|---|---|---|---|---|
| Frontend Services | 50m | 500m | 64Mi | 512Mi |
| Core Services | 100m | 1000m | 128Mi | 1Gi |
| AI Services | 200m | 2000m | 256Mi | 2Gi |
Configuration Management
Use Kustomize for environment-specific configuration management:
```
cicd/apps/genispace/
├── base/                    # Base configuration
│   ├── deployment/          # Deployment configuration
│   ├── service/             # Service configuration
│   └── config/              # ConfigMap configuration
└── overlays/                # Environment overlays
    └── global/
        ├── prod/            # Production environment
        │   ├── patches/     # Environment-specific patches
        │   └── config/      # Environment configuration
        └── test/            # Test environment
```
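Each overlay then points at the base and layers on its own patches. A minimal sketch of what `overlays/global/prod/kustomization.yaml` might look like (contents are illustrative, not the shipped configuration):
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: genispace
resources:
  - ../../../base                              # pull in the shared base manifests
patches:
  - path: patches/deployment-replicas.yaml     # environment-specific replica counts
```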
High Availability Configuration
Multi-Replica Deployment
Critical services should be configured with multiple replicas for high availability:
| Service | Minimum Replicas | Recommended Replicas |
|---|---|---|
| API | 1 | 2-3 |
| Agent | 1 | 2-3 |
| Scheduler | 2 | 2-3 |
| Worker | 5 | 10+ |
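To keep these minimums intact during node drains and cluster upgrades, a PodDisruptionBudget per critical service is a natural companion. A minimal sketch for the Scheduler (the `app: scheduler` selector is an assumed Pod label; match it to your Deployment's labels):
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: scheduler-pdb
  namespace: genispace
spec:
  minAvailable: 1          # never voluntarily evict below one running Scheduler
  selector:
    matchLabels:
      app: scheduler       # assumed Pod label
```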
Health Checks
All services support Kubernetes-native health check mechanisms:
- Liveness Probe: Detects whether the service is alive
- Readiness Probe: Detects whether the service is ready
Scaling Operations Guide
Manual Scaling
Use kubectl commands to manually scale service replica counts:
```bash
# Scale Worker to 20 replicas
kubectl scale deployment worker --replicas=20 -n genispace

# Scale API to 3 replicas
kubectl scale deployment api --replicas=3 -n genispace
```
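Afterwards, confirm the new replica counts:
```bash
kubectl get deployments worker api -n genispace
```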
Scaling via Kustomize Configuration
Modify `overlays/<environment>/patches/deployment-replicas.yaml`:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  replicas: 20  # Change to the desired replica count
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent
spec:
  replicas: 5   # Change to the desired replica count
```
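Then apply the overlay with Kustomize; using the production path from the directory layout above:
```bash
kubectl apply -k cicd/apps/genispace/overlays/global/prod
```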
Configuring HPA (Horizontal Pod Autoscaler)
Configure HPA for automatic scaling based on resource utilization:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 5
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
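Once applied, you can watch the autoscaler's observed utilization and scaling decisions:
```bash
kubectl get hpa worker-hpa -n genispace --watch
```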
Monitoring & Operations
Key Monitoring Metrics
| Metric Type | Monitoring Target | Recommended Alert Threshold |
|---|---|---|
| CPU Usage | All services | > 80% |
| Memory Usage | All services | > 85% |
| Task Queue Depth | Scheduler/Worker | > 1000 |
| Request Latency | API/Agent | P99 > 2s |
| Error Rate | All services | > 1% |
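If you run Prometheus via the Prometheus Operator, these thresholds map naturally onto alert rules. A sketch for the CPU threshold, assuming cAdvisor and kube-state-metrics are being scraped (the rule name, labels, and expression details are illustrative):
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: genispace-cpu-alerts
  namespace: genispace
spec:
  groups:
    - name: genispace.cpu
      rules:
        - alert: HighCpuUsage
          # CPU usage as a fraction of each Pod's CPU limit
          expr: |
            sum(rate(container_cpu_usage_seconds_total{namespace="genispace"}[5m])) by (pod)
              / sum(kube_pod_container_resource_limits{namespace="genispace", resource="cpu"}) by (pod)
              > 0.8
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: 'Pod {{ $labels.pod }} CPU usage is above 80% of its limit'
```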
Logging & Tracing
GeniSpace supports integration with mainstream observability tools:
- Log Collection: Supports Fluentd, Fluent Bit
- Metrics Monitoring: Supports Prometheus, Grafana
- Distributed Tracing: Supports Phoenix (built-in)
Summary
The GeniSpace self-hosted version provides powerful elastic scaling capabilities through microservices architecture and Kubernetes-native features:
- Horizontal Scaling: All services support horizontal scaling by increasing replica counts
- Vertical Scaling: Scale vertically by adjusting resource configurations
- Auto Scaling: Supports configuring HPA for automatic elastic scaling
- High Availability: Ensures service high availability through multi-replica deployment and health checks
- Unlimited Scaling: Achieves unlimited task execution capacity through the Scheduler + Worker architecture
For more information about the self-hosted version or for professional deployment support, please contact us:
- Sales Inquiry: https://www.genispace.cn/contact
Related Documentation
- Enterprise Deployment — Enterprise deployment solutions
- High Availability Deployment — High availability architecture and disaster recovery