GeniSpace Self-Hosted Elastic Scaling
This document provides a detailed overview of the elastic scaling capabilities of the GeniSpace self-hosted deployment version, helping enterprise customers understand the platform's architecture design and scaling strategies.
Overview
GeniSpace is built on a modern microservices architecture and deployed on Kubernetes clusters, providing strong high availability and scalability. Each core service supports independent vertical and horizontal scaling, so resource allocation can be adjusted flexibly to match business needs.
Version Information
GeniSpace offers three versions:
| Version | Description |
|---|---|
| Self-Hosted | Enterprise private deployment; this document primarily covers this version |
| International | Cloud-based SaaS service (international region) |
| China | Cloud-based SaaS service (China region) |
Platform Architecture
Microservice Components
The GeniSpace platform consists of multiple independent microservices, each of which can be scaled independently based on load requirements:
```
┌─────────────────────────────────────────────────────────────────┐
│                     Frontend Service Layer                      │
│   ┌─────────┐  ┌─────────┐  ┌───────────┐  ┌─────────┐          │
│   │   Web   │  │ Console │  │ Workbench │  │  Admin  │  ...     │
│   └─────────┘  └─────────┘  └───────────┘  └─────────┘          │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                       Core Service Layer                        │
│   ┌─────────┐  ┌─────────┐  ┌───────────┐  ┌────────────┐       │
│   │   API   │  │  Agent  │  │  Dataset  │  │ MCP Server │       │
│   └─────────┘  └─────────┘  └───────────┘  └────────────┘       │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Task Execution Layer                       │
│   ┌───────────┐  ┌────────┐  ┌─────────────────────────────┐    │
│   │ Scheduler │  │ Worker │  │ Operators (NJS/Enterprise)  │    │
│   └───────────┘  └────────┘  └─────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                      Infrastructure Layer                       │
│   ┌──────────┐  ┌──────────┐                                    │
│   │  Redis   │  │ RabbitMQ │                                    │
│   └──────────┘  └──────────┘                                    │
└─────────────────────────────────────────────────────────────────┘
```
Core Service Descriptions
| Service | Description | Default Replicas | Scaling Recommendations |
|---|---|---|---|
| API | Core backend API service | 1 | Scale based on API request volume |
| Agent | AI agent service, running as a persistent process | 1-3 | Scale based on concurrent agent count |
| Dataset | Dataset management service | 1 | Scale based on data processing needs |
| MCP Server | MCP tool invocation service | 1 | Scale based on MCP call frequency |
| Worker | Task execution worker nodes | 10 | Dynamically scale based on task queue depth |
| Scheduler | Task scheduling service | 2 | HA deployment, recommended 2+ replicas |
Task Execution Layer Elastic Scaling
Worker - Powerful Task Execution Engine
The Worker is the core task execution component of the GeniSpace platform, providing robust task processing capabilities.
Operating Mode
The Worker runs in HTTP mode as a persistent process, providing the following capabilities:
- Debug Support: Provides a real-time debugging environment for developers
- MCP Tool Calls: Supports agent MCP (Model Context Protocol) tool invocations
- Task Execution: Processes various task nodes in workflows
Resource Configuration Example
```yaml
# Worker resource configuration
resources:
  requests:
    cpu: 100m        # Base CPU request
    memory: 128Mi    # Base memory request
  limits:
    cpu: 1000m       # CPU limit, allows burst performance
    memory: 1Gi      # Memory limit
```
Horizontal Scaling Strategy
The Worker supports flexible horizontal scaling, allowing automatic or manual replica scaling based on task queue depth:
```yaml
# Replica count configuration
spec:
  replicas: 10  # Adjust based on requirements
```
Scaling Recommendations:
- Small deployment (< 100 concurrent tasks): 5-10 replicas
- Medium deployment (100-500 concurrent tasks): 10-20 replicas
- Large deployment (> 500 concurrent tasks): 20+ replicas, recommended with HPA (see the queue-depth sketch below)
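For automatic scaling tied directly to load, an HPA can target queue depth rather than CPU. The sketch below assumes an external-metrics pipeline is already in place (for example, Prometheus Adapter exposing a RabbitMQ queue-length metric; the metric name `rabbitmq_queue_messages_ready` and the target value are illustrative, not part of the shipped configuration):
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-queue-hpa
  namespace: genispace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 10
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: rabbitmq_queue_messages_ready  # assumed metric name exposed by the adapter
        target:
          type: AverageValue
          averageValue: "30"                   # aim for ~30 queued tasks per Worker replica
```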
Scheduler - Unlimited Task Scheduling
The Scheduler is the core component for task scheduling, responsible for managing and distributing tasks to Worker nodes.
Core Features
- Distributed Locks: Uses Redis distributed locking to prevent duplicate task execution (illustrated after the probe configuration below)
- High Availability Design: Supports multi-replica deployment with automatic failover
- Health Checks: Built-in health check endpoints to ensure service availability
```yaml
# Scheduler health check configuration
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
```
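The duplicate-execution guard rests on Redis's atomic `SET ... NX PX` primitive: a lock key is created only if it does not already exist, and expires automatically if the holder dies. A minimal illustration with redis-cli (the key name, holder ID, and TTL are illustrative, not the Scheduler's actual key schema):
```bash
# Acquire: succeeds only if the key does not already exist (NX); auto-expires after 30s (PX 30000)
redis-cli SET lock:task:12345 scheduler-0 NX PX 30000
# Replies OK if this instance won the lock; (nil) means another Scheduler already holds it
```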
Unlimited Scaling Capability
Through the Scheduler + Worker architecture design, GeniSpace achieves unlimited task execution scaling:
- Scheduler handles task scheduling and distribution
- Worker handles task execution
- Increasing the Worker replica count improves task processing capacity approximately linearly; for example, if each replica handles roughly 10 concurrent tasks, growing from 10 to 50 replicas raises capacity from about 100 to about 500 concurrent tasks
Core Service Elastic Scaling
API Service
The API service is the platform's core backend service, handling all API requests.
Vertical Scaling Configuration
```yaml
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 1000m   # Production environment allows higher burst performance
    memory: 1Gi
```
Horizontal Scaling
Scale API service replica count based on API request volume:
```yaml
spec:
  replicas: 3  # Recommended 3+ replicas for high-traffic scenarios
```
Agent Service
The Agent service provides the runtime environment for AI agents and is the core carrier of AI capabilities.
Resource Configuration
Due to the compute-intensive nature of AI services, the Agent service requires higher resource allocation:
```yaml
resources:
  requests:
    cpu: 200m      # AI services need more CPU
    memory: 256Mi
  limits:
    cpu: 2000m     # AI tasks may require significant CPU
    memory: 2Gi
```
Scaling Strategy
- Persistent Process: The Agent runs as a persistent process, providing real-time AI service capabilities
- MCP Tool Calls: Works with the MCP Server to support agents calling various tools
- Horizontal Scaling: Increase replicas based on concurrent agent count
Dataset Service
The Dataset service handles dataset management and processing.
```yaml
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
```
MCP Server
The MCP Server provides the server-side implementation of the Model Context Protocol, supporting agents in calling external tools.
```yaml
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
```
Kubernetes Deployment Strategies
Rolling Update Strategy
All services are configured with a rolling update strategy to ensure zero downtime during updates:
```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 1         # Create at most 1 extra Pod during an update
    maxUnavailable: 0   # No Pods may become unavailable during an update
```
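You can watch a rollout complete (and confirm there was no downtime window) with, for example:
```bash
kubectl rollout status deployment/api -n genispace
```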
Resource Configuration Best Practices
GKE Autopilot Bursting Feature
If you use GKE Autopilot, you can leverage the Bursting feature to optimize costs:
- Strategy: Set lower `requests` (to save costs) with reasonable `limits` (to ensure performance)
- Minimum values: `requests` of 50m CPU and 52Mi memory
Resource Configuration Tiers
| Service Type | CPU Requests | CPU Limits | Memory Requests | Memory Limits |
|---|---|---|---|---|
| Frontend Services | 50m | 500m | 64Mi | 512Mi |
| Core Services | 100m | 1000m | 128Mi | 1Gi |
| AI Services | 200m | 2000m | 256Mi | 2Gi |
Configuration Management
Use Kustomize for environment-specific configuration management:
```
cicd/apps/genispace/
├── base/                    # Base configuration
│   ├── deployment/          # Deployment configuration
│   ├── service/             # Service configuration
│   └── config/              # ConfigMap configuration
└── overlays/                # Environment overlays
    └── global/
        ├── prod/            # Production environment
        │   ├── patches/     # Environment-specific patches
        │   └── config/      # Environment configuration
        └── test/            # Test environment
```
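Each overlay then points at the base and layers on its own patches. A minimal sketch of what `overlays/global/prod/kustomization.yaml` might look like (contents are illustrative, not the shipped configuration):
```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: genispace
resources:
  - ../../../base                              # pull in the shared base manifests
patches:
  - path: patches/deployment-replicas.yaml     # environment-specific replica counts
```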
High Availability Configuration
Multi-Replica Deployment
Critical services should be configured with multiple replicas for high availability:
| Service | Minimum Replicas | Recommended Replicas |
|---|---|---|
| API | 1 | 2-3 |
| Agent | 1 | 2-3 |
| Scheduler | 2 | 2-3 |
| Worker | 5 | 10+ |
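To keep these minimums intact during node drains and cluster upgrades, a PodDisruptionBudget per critical service is a natural companion. A minimal sketch for the Scheduler (the `app: scheduler` selector is an assumed Pod label; match it to your Deployment's labels):
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: scheduler-pdb
  namespace: genispace
spec:
  minAvailable: 1          # never voluntarily evict below one running Scheduler
  selector:
    matchLabels:
      app: scheduler       # assumed Pod label
```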
Health Checks
All services support Kubernetes-native health check mechanisms:
- Liveness Probe: Detects whether the service is alive
- Readiness Probe: Detects whether the service is ready
Scaling Operations Guide
Manual Scaling
Use kubectl commands to manually scale service replica counts:
```bash
# Scale Worker to 20 replicas
kubectl scale deployment worker --replicas=20 -n genispace

# Scale API to 3 replicas
kubectl scale deployment api --replicas=3 -n genispace
```
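Afterwards, confirm the new replica counts:
```bash
kubectl get deployments worker api -n genispace
```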
Scaling via Kustomize Configuration
Modify `overlays/<environment>/patches/deployment-replicas.yaml`:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  replicas: 20  # Change to the desired replica count
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: agent
spec:
  replicas: 5   # Change to the desired replica count
```
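Then apply the overlay with Kustomize; using the production path from the directory layout above:
```bash
kubectl apply -k cicd/apps/genispace/overlays/global/prod
```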
Configuring HPA (Horizontal Pod Autoscaler)
Configure HPA for automatic scaling based on resource utilization:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 5
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
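Once applied, you can watch the autoscaler's observed utilization and scaling decisions:
```bash
kubectl get hpa worker-hpa -n genispace --watch
```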
Monitoring & Operations
Key Monitoring Metrics
| Metric Type | Monitoring Target | Recommended Alert Threshold |
|---|---|---|
| CPU Usage | All services | > 80% |
| Memory Usage | All services | > 85% |
| Task Queue Depth | Scheduler/Worker | > 1000 |
| Request Latency | API/Agent | P99 > 2s |
| Error Rate | All services | > 1% |
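If you run Prometheus via the Prometheus Operator, these thresholds map naturally onto alert rules. A sketch for the CPU threshold, assuming cAdvisor and kube-state-metrics are being scraped (the rule name, labels, and expression details are illustrative):
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: genispace-cpu-alerts
  namespace: genispace
spec:
  groups:
    - name: genispace.cpu
      rules:
        - alert: HighCpuUsage
          # CPU usage as a fraction of each Pod's CPU limit
          expr: |
            sum(rate(container_cpu_usage_seconds_total{namespace="genispace"}[5m])) by (pod)
              / sum(kube_pod_container_resource_limits{namespace="genispace", resource="cpu"}) by (pod)
              > 0.8
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: 'Pod {{ $labels.pod }} CPU usage is above 80% of its limit'
```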
Logging & Tracing
GeniSpace supports integration with mainstream observability tools:
- Log Collection: Supports Fluentd, Fluent Bit
- Metrics Monitoring: Supports Prometheus, Grafana
- Distributed Tracing: Supports Phoenix (built-in)
Summary
The GeniSpace self-hosted version provides powerful elastic scaling capabilities through microservices architecture and Kubernetes-native features:
- Horizontal Scaling: All services support horizontal scaling by increasing replica counts
- Vertical Scaling: Scale vertically by adjusting resource configurations
- Auto Scaling: Supports configuring HPA for automatic elastic scaling
- High Availability: Ensures service high availability through multi-replica deployment and health checks
- Unlimited Scaling: Achieves unlimited task execution capacity through the Scheduler + Worker architecture
For more information about the self-hosted version or for professional deployment support, please contact us:
- Sales Inquiry: https://www.genispace.cn/contact
Related Documentation
- Enterprise Deployment — Enterprise deployment solutions
- High Availability Deployment — High availability architecture and disaster recovery