GeniSpace Self-Hosted Elastic Scaling

This document provides a detailed overview of the elastic scaling capabilities of the GeniSpace self-hosted version, helping enterprise customers understand the platform's architecture and scaling strategies.

Overview

GeniSpace adopts a modern microservices architecture design, deployed on Kubernetes clusters, providing exceptional high availability and scalability. Each core service supports independent vertical and horizontal scaling, allowing flexible resource configuration adjustments based on business needs.

Version Information

GeniSpace offers three versions:

| Version | Description |
|---|---|
| Self-Hosted | Enterprise private deployment; this document primarily covers this version |
| International | Cloud-based SaaS service (international region) |
| China | Cloud-based SaaS service (China region) |

Elastic Scaling Capabilities Summary

The GeniSpace self-hosted version provides the following capabilities through microservices architecture and Kubernetes-native features:

  1. Horizontal Scaling: All services support horizontal scaling by increasing replica counts
  2. Vertical Scaling: Scale vertically by adjusting resource configurations (CPU, memory, etc.)
  3. Auto Scaling: Supports configuring HPA (Horizontal Pod Autoscaler) for automatic elastic scaling
  4. High Availability: Ensures service high availability through multi-replica deployment and health checks
  5. Unlimited Scaling: Achieves unlimited task execution capacity through the Scheduler + Worker architecture

These capabilities span two layers of the platform:

  • Task Execution Layer: Worker, Scheduler
  • Core Service Layer: API, Agent, Dataset, MCP Server

Obtaining the Self-Hosted Version

For detailed information about the self-hosted version or purchasing inquiries, please contact our sales team: https://www.genispace.cn/contact

Platform Architecture

Microservice Components

The GeniSpace platform consists of multiple independent microservices, each of which can be scaled independently based on load requirements:

┌─────────────────────────────────────────────────────────────────┐
│                     Frontend Service Layer                      │
│  ┌─────────┐  ┌─────────┐  ┌───────────┐  ┌─────────┐           │
│  │   Web   │  │ Console │  │ Workbench │  │  Admin  │  ...      │
│  └─────────┘  └─────────┘  └───────────┘  └─────────┘           │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                       Core Service Layer                        │
│  ┌─────────┐  ┌─────────┐  ┌───────────┐  ┌────────────┐        │
│  │   API   │  │  Agent  │  │  Dataset  │  │ MCP Server │        │
│  └─────────┘  └─────────┘  └───────────┘  └────────────┘        │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                      Task Execution Layer                       │
│  ┌───────────┐  ┌────────┐  ┌─────────────────────────────┐     │
│  │ Scheduler │  │ Worker │  │ Operators (NJS/Enterprise)  │     │
│  └───────────┘  └────────┘  └─────────────────────────────┘     │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                      Infrastructure Layer                       │
│  ┌──────────┐  ┌──────────┐                                     │
│  │  Redis   │  │ RabbitMQ │                                     │
│  └──────────┘  └──────────┘                                     │
└─────────────────────────────────────────────────────────────────┘

Core Service Descriptions

| Service | Description | Default Replicas | Scaling Recommendation |
|---|---|---|---|
| API | Core backend API service | 1 | Scale based on API request volume |
| Agent | AI agent service, running as a persistent process | 1-3 | Scale based on concurrent agent count |
| Dataset | Dataset management service | 1 | Scale based on data processing needs |
| MCP Server | MCP tool invocation service | 1 | Scale based on MCP call frequency |
| Worker | Task execution worker nodes | 10 | Scale dynamically based on task queue depth |
| Scheduler | Task scheduling service | 2 | HA deployment; 2+ replicas recommended |

Task Execution Layer Elastic Scaling

Worker - Powerful Task Execution Engine

The Worker is the core task execution component of the GeniSpace platform, providing robust task processing capabilities.

Operating Mode

The Worker runs in HTTP mode as a persistent process, providing the following capabilities:

  • Debug Support: Provides a real-time debugging environment for developers
  • MCP Tool Calls: Supports agent MCP (Model Context Protocol) tool invocations
  • Task Execution: Processes various task nodes in workflows

Resource Configuration Example

# Worker resource configuration
resources:
  requests:
    cpu: 100m       # Base CPU request
    memory: 128Mi   # Base memory request
  limits:
    cpu: 1000m      # CPU limit, supports burst performance
    memory: 1Gi     # Memory limit

Horizontal Scaling Strategy

The Worker supports flexible horizontal scaling, allowing automatic or manual replica scaling based on task queue depth:

# Replica count configuration
spec:
  replicas: 10  # Adjust based on requirements

Scaling Recommendations:

  • Small deployment (< 100 concurrent tasks): 5-10 replicas
  • Medium deployment (100-500 concurrent tasks): 10-20 replicas
  • Large deployment (> 500 concurrent tasks): 20+ replicas, recommended with HPA
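
For large deployments, replica counts can also be driven directly by queue depth rather than CPU utilization. The following is a minimal sketch assuming KEDA is installed in the cluster and that Workers consume tasks from a RabbitMQ queue (the infrastructure layer includes RabbitMQ); the queue name and authentication reference are hypothetical and must be adapted to your deployment:

# Queue-depth-based Worker autoscaling (illustrative; requires KEDA)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-queue-scaler
  namespace: genispace
spec:
  scaleTargetRef:
    name: worker              # the Worker Deployment
  minReplicaCount: 10
  maxReplicaCount: 50
  triggers:
    - type: rabbitmq
      metadata:
        queueName: tasks      # hypothetical queue name
        mode: QueueLength
        value: "50"           # target backlog per replica
      authenticationRef:
        name: rabbitmq-auth   # hypothetical TriggerAuthentication holding the connection string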

Scheduler - Unlimited Task Scheduling

The Scheduler is the core component for task scheduling, responsible for managing and distributing tasks to Worker nodes.

Core Features

  • Distributed Locks: Uses Redis for distributed locking to prevent duplicate task execution
  • High Availability Design: Supports multi-replica deployment with automatic failover
  • Health Checks: Built-in health check endpoints to ensure service availability

# Scheduler health check configuration
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5

Unlimited Scaling Capability

Through the Scheduler + Worker architecture design, GeniSpace achieves unlimited task execution scaling:

  1. Scheduler handles task scheduling and distribution
  2. Worker handles task execution
  3. Increasing Worker replica count linearly improves task processing capacity
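
For illustration: if a single Worker replica comfortably handles about 10 concurrent tasks (actual throughput depends on your workload), scaling from 10 to 20 replicas roughly doubles capacity from ~100 to ~200 concurrent tasks, with the Scheduler distributing work across all replicas.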

Core Service Elastic Scaling

API Service

The API service is the platform's core backend service, handling all API requests.

Vertical Scaling Configuration

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 1000m    # Production environments allow higher burst performance
    memory: 1Gi

Horizontal Scaling

Scale API service replica count based on API request volume:

spec:
  replicas: 3  # 3+ replicas recommended for high-traffic scenarios

Agent Service

The Agent service provides the runtime environment for AI agents and is the core runtime for the platform's AI capabilities.

Resource Configuration

Due to the compute-intensive nature of AI services, the Agent service requires higher resource allocation:

resources:
  requests:
    cpu: 200m       # AI services need more CPU
    memory: 256Mi
  limits:
    cpu: 2000m      # AI tasks may require significant CPU
    memory: 2Gi

Scaling Strategy

  • Persistent Process: The Agent runs as a persistent process, providing real-time AI service capabilities
  • MCP Tool Calls: Works with the MCP Server to support agents calling various tools
  • Horizontal Scaling: Increase replicas based on concurrent agent count

Dataset Service

The Dataset service handles dataset management and processing.

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

MCP Server

The MCP Server provides the server-side implementation of the Model Context Protocol, supporting agents in calling external tools.

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

Kubernetes Deployment Strategies

Rolling Update Strategy

All services are configured with a rolling update strategy to ensure zero downtime during updates:

strategy:
  rollingUpdate:
    maxSurge: 1        # Create at most 1 extra Pod during updates
    maxUnavailable: 0  # No Pods may become unavailable during updates
  type: RollingUpdate
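
With maxSurge: 1 and maxUnavailable: 0, Kubernetes starts one new Pod, waits for it to pass its readiness probe, and only then terminates an old Pod, so serving capacity never drops below the configured replica count during a rollout.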

Resource Configuration Best Practices

GKE Autopilot Bursting Feature

If you use GKE Autopilot, you can leverage the Bursting feature to optimize costs:

  • Strategy: Set lower requests (save costs) with reasonable limits (ensure performance)
  • Minimum Values: requests CPU 50m, Memory 52Mi
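
A minimal sketch of a burst-friendly configuration under these Autopilot minimums (the limit values are illustrative and should be tuned to your workload):

# Autopilot bursting: low requests for cost, higher limits for headroom
resources:
  requests:
    cpu: 50m       # Autopilot minimum; Autopilot bills based on requests
    memory: 52Mi   # Autopilot minimum
  limits:
    cpu: 500m      # burst headroom for traffic spikes
    memory: 512Mi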

Resource Configuration Tiers

| Service Type | CPU Requests | CPU Limits | Memory Requests | Memory Limits |
|---|---|---|---|---|
| Frontend Services | 50m | 500m | 64Mi | 512Mi |
| Core Services | 100m | 1000m | 128Mi | 1Gi |
| AI Services | 200m | 2000m | 256Mi | 2Gi |

Configuration Management

Use Kustomize for environment-specific configuration management:

cicd/apps/genispace/
├── base/                      # Base configuration
│   ├── deployment/            # Deployment configuration
│   ├── service/               # Service configuration
│   └── config/                # ConfigMap configuration
└── overlays/                  # Environment overlays
    └── global/
        ├── prod/              # Production environment
        │   ├── patches/       # Environment-specific patches
        │   └── config/        # Environment configuration
        └── test/              # Test environment
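
For illustration, a production overlay's kustomization.yaml might look like the following; the file itself is not shown in this document, so treat the field values as assumptions to adapt:

# cicd/apps/genispace/overlays/global/prod/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../../base                              # pull in the base manifests
patches:
  - path: patches/deployment-replicas.yaml     # environment-specific replica counts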

High Availability Configuration

Multi-Replica Deployment

Critical services should be configured with multiple replicas for high availability:

| Service | Minimum Replicas | Recommended Replicas |
|---|---|---|
| API | 1 | 2-3 |
| Agent | 1 | 2-3 |
| Scheduler | 2 | 2-3 |
| Worker | 5 | 10+ |

Health Checks

All services support Kubernetes-native health check mechanisms:

  • Liveness Probe: Detects whether the service is alive
  • Readiness Probe: Detects whether the service is ready

Scaling Operations Guide

Manual Scaling

Use kubectl commands to manually scale service replica counts:

# Scale Worker to 20 replicas
kubectl scale deployment worker --replicas=20 -n genispace

# Scale API to 3 replicas
kubectl scale deployment api --replicas=3 -n genispace
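
After scaling, you can confirm that the new replicas are ready; note that the label selector below is an assumption, so use whatever labels your Deployments actually carry:

# Check the Deployment's ready/desired replica counts
kubectl get deployment worker -n genispace

# Watch Pods come up (label selector is an assumption)
kubectl get pods -l app=worker -n genispace -w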

Scaling via Kustomize Configuration

Modify overlays/<environment>/patches/deployment-replicas.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  replicas: 20  # Change to the desired replica count
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent
spec:
  replicas: 5   # Change to the desired replica count
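
Then apply the overlay for the target environment, following the directory layout shown above:

# Apply the production overlay
kubectl apply -k cicd/apps/genispace/overlays/global/prod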

Configuring HPA (Horizontal Pod Autoscaler)

Configure HPA for automatic scaling based on resource utilization:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 5
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
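
Resource-based HPA requires the Kubernetes metrics-server (or an equivalent metrics pipeline) to be installed in the cluster. Once the HPA is active, you can watch its scaling decisions:

# Observe current/target utilization and replica counts
kubectl get hpa worker-hpa -n genispace -w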

Monitoring & Operations

Key Monitoring Metrics

| Metric Type | Monitoring Target | Recommended Alert Threshold |
|---|---|---|
| CPU Usage | All services | > 80% |
| Memory Usage | All services | > 85% |
| Task Queue Depth | Scheduler/Worker | > 1000 |
| Request Latency | API/Agent | P99 > 2s |
| Error Rate | All services | > 1% |
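
As an illustration of the CPU threshold above, assuming Prometheus scrapes cAdvisor and kube-state-metrics (see Logging & Tracing below), an alert rule might look like this; the metric and label names are assumptions about your monitoring setup:

# Illustrative Prometheus alert: pod CPU above 80% of its limit for 10 minutes
groups:
  - name: genispace-scaling
    rules:
      - alert: HighCPUUsage
        expr: |
          sum(rate(container_cpu_usage_seconds_total{namespace="genispace"}[5m])) by (pod)
            /
          sum(kube_pod_container_resource_limits{namespace="genispace", resource="cpu"}) by (pod)
            > 0.80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} CPU usage above 80% of its limit"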

Logging & Tracing

GeniSpace supports integration with mainstream observability tools:

  • Log Collection: Supports Fluentd, Fluent Bit
  • Metrics Monitoring: Supports Prometheus, Grafana
  • Distributed Tracing: Supports Phoenix (built-in)

Summary

The GeniSpace self-hosted version provides powerful elastic scaling capabilities through microservices architecture and Kubernetes-native features:

  1. Horizontal Scaling: All services support horizontal scaling by increasing replica counts
  2. Vertical Scaling: Scale vertically by adjusting resource configurations
  3. Auto Scaling: Supports configuring HPA for automatic elastic scaling
  4. High Availability: Ensures service high availability through multi-replica deployment and health checks
  5. Unlimited Scaling: Achieves unlimited task execution capacity through the Scheduler + Worker architecture

Get Support

For more information about the self-hosted version or for professional deployment support, please contact our sales team: https://www.genispace.cn/contact