Data Overview

The Data module in the GeniSpace platform is divided into Data Source Management and Vector Datasets, providing data support for workflows and agents.

Access Path

  • Console → Sidebar Data module
  • Or navigate directly to /data (defaults to Data Source Management)
  • Switch between tabs at the top: Data Source Management | Vector Datasets | Platform Data (Platform Data is not shown in self-hosted deployments)
  • URL parameters: /data?tab=datasource, /data?tab=dataset, /data?tab=platform

Data Source Management

Data Source Management is used to connect to external relational databases and perform CRUD operations within workflows.

Feature Structure

Data Source Management includes three sub-views:

  • Data Sources: SQL-based query/write configurations associated with connected databases
  • Data Tables: Manage table structures in databases (create, edit, delete)
  • Databases: Database connection management, supporting MySQL, PostgreSQL, and MariaDB

Basic Workflow

  1. Add a database connection in the Databases view (fill in host, port, credentials, database name, etc.)
  2. Create or manage table structures in the Data Tables view
  3. Create a data source in the Data Sources view, select a database, and configure SQL statements
  4. Run functional tests to verify

Data sources can be invoked in workflow nodes and can also be converted into Data Source Tools for use by agents.

Vector Datasets

Vector datasets are used to store and retrieve vectorized data, providing semantic search and knowledge support for agents.

Features

1. Data Management

  • Custom Datasets: Create and manage custom datasets, supporting multiple data types
  • Data Import: Import data from multiple sources, including file uploads and API integrations
  • Data Preview: Preview dataset contents in real time, with pagination and search support
  • Data Export: Export datasets in multiple formats

2. Vectorization Support

  • Automatic Vectorization: Text is vectorized automatically, with no manual processing required
  • Vector Search: Similarity search based on vectors
  • Vector Fields: Custom vector fields with flexible dimension configuration

3. Data Operations

  • Data Query: Complex query conditions, including filtering and sorting
  • Data Update: Batch updates and individual record updates
  • Data Deletion: Conditional deletion and batch deletion
  • Data Insertion: Batch insertion and individual record insertion

Integration with Agents

  • Select associated datasets in the agent configuration
  • Agents can retrieve relevant knowledge from datasets via vector search
  • Multiple knowledge bases can be connected simultaneously

Usage Guide

Accessing Vector Datasets

  1. Click the Vector Datasets tab at the top of the Data module
  2. For first-time use, confirm that the team key has been initialized (key status is shown in the statistics card)

Creating a Dataset

  1. Click the "Create Dataset" button on the Vector Datasets page
  2. Fill in the basic dataset information:
    • Name: Unique identifier for the dataset
    • Description: Detailed description of the dataset
    • Database Type: Select Milvus, etc.
    • Database Configuration: Configure options such as auto ID
  3. Define the dataset structure (an example schema sketch follows this list):
    • Add fields: Multiple data types are supported (see Supported Data Types below)
    • Set primary key: Select the primary key field
    • Configure indexes: Create indexes for fields that need indexing
  4. Click "Create" to complete dataset creation

Vector Dataset Interface Features

  • Statistics Cards: Total datasets, total records, total data size, key status, with expandable detailed statistics and team key management
  • Create Dataset: Create a new vector dataset
  • Import Data: Import data into an existing dataset (Enterprise edition)
  • Search & Filter: Search by name, filter by database type
  • Import/Export History: View import and export records

Data Operations

Insert Data

POST /v1/datasets/{dataset_id}/data/insert
{
  "data": [
    {
      "field1": "value1",
      "field2": "value2"
    }
  ]
}
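
As a usage sketch, any HTTP client can call this endpoint. The Python example below uses the requests library; the base URL, dataset ID, and bearer-token header are placeholders for your deployment and its actual authentication scheme.

import requests

# Placeholders: adjust to your deployment's host, dataset ID, and authentication.
BASE_URL = "https://<your-genispace-host>"
DATASET_ID = "<dataset_id>"
HEADERS = {"Authorization": "Bearer <your-api-key>", "Content-Type": "application/json"}

payload = {
    "data": [
        {"field1": "value1", "field2": "value2"}
    ]
}

resp = requests.post(f"{BASE_URL}/v1/datasets/{DATASET_ID}/data/insert",
                     json=payload, headers=HEADERS)
resp.raise_for_status()
print(resp.json())

The query, update, delete, and search endpoints below follow the same pattern: POST the documented JSON body to the corresponding path.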

Query Data

POST /v1/datasets/{dataset_id}/data/query
{
  "filter": "field1 == 'value1'",
  "limit": 100,
  "offset": 0,
  "outputFields": ["field1", "field2"]
}

Update Data

POST /v1/datasets/{dataset_id}/data/update
{
  "filter": "field1 == 'value1'",
  "update_data": {
    "field2": "new_value"
  }
}

Delete Data

POST /v1/datasets/{dataset_id}/data/delete
{
  "filter": "field1 == 'value1'"
}

Vector Search

POST /v1/datasets/{dataset_id}/data/search
{
  "vector_field": "vector",
  "data": [[0.1, 0.2, ..., 0.5]],
  "limit": 5,
  "filter": "category == 'technology'",
  "outputFields": ["id", "title", "content"]
}
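
A similar sketch for vector search is shown below. The query vector should come from the same embedding model that was used to vectorize the dataset; here it is only a placeholder, as are the host, credentials, and the assumed shape of the response.

import requests

# Placeholders: adjust to your deployment's host, dataset ID, and authentication.
BASE_URL = "https://<your-genispace-host>"
DATASET_ID = "<dataset_id>"
HEADERS = {"Authorization": "Bearer <your-api-key>", "Content-Type": "application/json"}

query_vector = [0.0] * 768  # replace with a real embedding of the query text

payload = {
    "vector_field": "vector",
    "data": [query_vector],
    "limit": 5,
    "filter": "category == 'technology'",
    "outputFields": ["id", "title", "content"],
}

resp = requests.post(f"{BASE_URL}/v1/datasets/{DATASET_ID}/data/search",
                     json=payload, headers=HEADERS)
resp.raise_for_status()
for hit in resp.json().get("data", []):  # response shape is an assumption
    print(hit)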

Supported Data Types

Datasets support the following data types:

  • INT64: 64-bit integer
  • FLOAT: Floating-point number
  • VARCHAR: String
  • BOOL: Boolean
  • FLOAT_VECTOR: Float vector

Best Practices

  1. Data Preprocessing

    • Clean and format data before importing
    • Ensure data conforms to field type requirements
    • Handle missing values and outliers
  2. Vectorization Configuration

    • Choose appropriate text fields for vectorization
    • Set vector dimensions based on actual needs
    • Regularly update the vectorization model
  3. Query Optimization

    • Use filter conditions wisely
    • Set appropriate page sizes
    • Query only necessary fields
  4. Performance Considerations

    • Control data volume during batch operations (see the batching sketch after this list)
    • Use indexes appropriately
    • Avoid frequent small data operations
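
As one way to control data volume, records can be split into fixed-size batches and inserted one batch at a time. The helper below is an illustrative sketch; the batch size of 500 and the insert_fn callable are assumptions, not platform requirements.

from typing import Callable, Iterable, List

def chunked(records: List[dict], batch_size: int = 500) -> Iterable[List[dict]]:
    # Yield fixed-size slices so a single request never carries an unbounded payload.
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

def insert_in_batches(insert_fn: Callable[[dict], None], records: List[dict],
                      batch_size: int = 500) -> None:
    # insert_fn is any callable that POSTs {"data": [...]} to the insert endpoint,
    # for example a thin wrapper around the requests sketch shown earlier.
    for batch in chunked(records, batch_size):
        insert_fn({"data": batch})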

Important Notes

  1. Dataset names must be unique
  2. At least one vector field is required
  3. Primary key fields must be unique
  4. Vector field dimensions must be fixed
  5. Ensure the dataset exists and is accessible before performing data operations
  6. Pay attention to data type compatibility
  7. Regularly back up important data

FAQ

  1. Q: How do I choose the right vector dimensions? A: Vector dimensions typically depend on the vectorization model used. We recommend 768 or 1536 dimensions.

  2. Q: What should I do if data import fails? A: Check whether the data format meets the requirements, ensure field types match, and review error logs for detailed information.

  3. Q: How can I optimize query performance? A: Use indexes wisely, optimize filter conditions, control the number of returned fields, and use pagination appropriately.

  4. Q: How should I set the similarity threshold for vector search? A: Adjust based on your specific use case and requirements. Typically, 0.7–0.8 is a good starting point.