Data Overview

The Data module in the GeniSpace platform is divided into Data Source Management and Vector Datasets, providing data support for workflows and agents.

Access Path

  • Console → Sidebar Data module
  • Or navigate directly to /data (defaults to Data Source Management)
  • Switch between tabs at the top: Data Source Management | Vector Datasets | Platform Data (Platform Data is not shown in self-hosted deployments)
  • URL parameters: /data?tab=datasource, /data?tab=dataset, /data?tab=platform

Data Source Management

Data Source Management is used to connect to external relational databases and perform CRUD operations within workflows.

Feature Structure

Data Source Management includes three sub-views:

  • Data Sources: SQL-based query/write configurations associated with connected databases
  • Data Tables: Manage table structures in databases (create, edit, delete)
  • Databases: Database connection management, supporting MySQL, PostgreSQL, and MariaDB

Basic Workflow

  1. Add a database connection in the Databases view (fill in host, port, credentials, database name, etc.)
  2. Create or manage table structures in the Data Tables view
  3. Create a data source in the Data Sources view, select a database, and configure SQL statements
  4. Run functional tests to verify

Data sources can be invoked in workflow nodes and can also be converted into Data Source Tools for use by agents.

Vector Datasets

Vector datasets are used to store and retrieve vectorized data, providing semantic search and knowledge support for agents.

Features

1. Data Management

  • Custom Datasets: Create and manage custom datasets, supporting multiple data types
  • Data Import: Import data from multiple sources, including file uploads and API integrations
  • Data Preview: Preview dataset contents in real time, with pagination and search support
  • Data Export: Export datasets in multiple formats

2. Vectorization Support

  • Automatic Vectorization: Text is vectorized automatically, with no manual processing required
  • Vector Search: Similarity search based on vectors
  • Vector Fields: Custom vector fields with flexible dimension configuration

3. Data Operations

  • Data Query: Complex query conditions, including filtering and sorting
  • Data Update: Batch updates and individual record updates
  • Data Deletion: Conditional deletion and batch deletion
  • Data Insertion: Batch insertion and individual record insertion

Integration with Agents

  • Select associated datasets in the agent configuration
  • Agents can retrieve relevant knowledge from datasets via vector search
  • Multiple knowledge bases can be connected simultaneously

Usage Guide

Accessing Vector Datasets

  1. Click the Vector Datasets tab at the top of the Data module
  2. For first-time use, confirm that the team key has been initialized (key status is shown in the statistics card)

Creating a Dataset

  1. Click the "Create Dataset" button on the Vector Datasets page
  2. Fill in the basic dataset information:
    • Name: Unique identifier for the dataset
    • Description: Detailed description of the dataset
    • Database Type: Select Milvus, etc.
    • Database Configuration: Configure options such as auto ID
  3. Define the dataset structure (an example schema sketch follows this list):
    • Add fields: Multiple data types are supported (see Supported Data Types below)
    • Set primary key: Select the primary key field
    • Configure indexes: Create indexes for fields that need indexing
  4. Click "Create" to complete dataset creation

Vector Dataset Interface Features

  • Statistics Cards: Total datasets, total records, total data size, key status, with expandable detailed statistics and team key management
  • Create Dataset: Create a new vector dataset
  • Import Data: Import data into an existing dataset (Enterprise edition)
  • Search & Filter: Search by name, filter by database type
  • Import/Export History: View import and export records

Data Operations

Insert Data

POST /v1/datasets/{dataset_id}/data/insert
{
  "data": [
    {
      "field1": "value1",
      "field2": "value2"
    }
  ]
}
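
As a usage sketch, any HTTP client can call this endpoint. The Python example below uses the requests library; the base URL, dataset ID, and bearer-token header are placeholders for your deployment and its actual authentication scheme.

import requests

# Placeholders: adjust to your deployment's host, dataset ID, and authentication.
BASE_URL = "https://<your-genispace-host>"
DATASET_ID = "<dataset_id>"
HEADERS = {"Authorization": "Bearer <your-api-key>", "Content-Type": "application/json"}

payload = {
    "data": [
        {"field1": "value1", "field2": "value2"}
    ]
}

resp = requests.post(f"{BASE_URL}/v1/datasets/{DATASET_ID}/data/insert",
                     json=payload, headers=HEADERS)
resp.raise_for_status()
print(resp.json())

The query, update, delete, and search endpoints below follow the same pattern: POST the documented JSON body to the corresponding path.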

Query Data

POST /v1/datasets/{dataset_id}/data/query
{
  "filter": "field1 == 'value1'",
  "limit": 100,
  "offset": 0,
  "outputFields": ["field1", "field2"]
}

Update Data

POST /v1/datasets/{dataset_id}/data/update
{
  "filter": "field1 == 'value1'",
  "update_data": {
    "field2": "new_value"
  }
}

Delete Data

POST /v1/datasets/{dataset_id}/data/delete
{
  "filter": "field1 == 'value1'"
}

Vector Search

POST /v1/datasets/{dataset_id}/data/search
{
  "vector_field": "vector",
  "data": [[0.1, 0.2, ..., 0.5]],
  "limit": 5,
  "filter": "category == 'technology'",
  "outputFields": ["id", "title", "content"]
}
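
A similar sketch for vector search is shown below. The query vector should come from the same embedding model that was used to vectorize the dataset; here it is only a placeholder, as are the host, credentials, and the assumed shape of the response.

import requests

# Placeholders: adjust to your deployment's host, dataset ID, and authentication.
BASE_URL = "https://<your-genispace-host>"
DATASET_ID = "<dataset_id>"
HEADERS = {"Authorization": "Bearer <your-api-key>", "Content-Type": "application/json"}

query_vector = [0.0] * 768  # replace with a real embedding of the query text

payload = {
    "vector_field": "vector",
    "data": [query_vector],
    "limit": 5,
    "filter": "category == 'technology'",
    "outputFields": ["id", "title", "content"],
}

resp = requests.post(f"{BASE_URL}/v1/datasets/{DATASET_ID}/data/search",
                     json=payload, headers=HEADERS)
resp.raise_for_status()
for hit in resp.json().get("data", []):  # response shape is an assumption
    print(hit)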

Supported Data Types

Datasets support the following data types:

  • INT64: 64-bit integer
  • FLOAT: Floating-point number
  • VARCHAR: String
  • BOOL: Boolean
  • FLOAT_VECTOR: Float vector

Best Practices

  1. Data Preprocessing

    • Clean and format data before importing
    • Ensure data conforms to field type requirements
    • Handle missing values and outliers
  2. Vectorization Configuration

    • Choose appropriate text fields for vectorization
    • Set vector dimensions based on actual needs
    • Regularly update the vectorization model
  3. Query Optimization

    • Use filter conditions wisely
    • Set appropriate page sizes
    • Query only necessary fields
  4. Performance Considerations

    • Control data volume during batch operations (see the batching sketch after this list)
    • Use indexes appropriately
    • Avoid frequent small data operations
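
As one way to control data volume, records can be split into fixed-size batches and inserted one batch at a time. The helper below is an illustrative sketch; the batch size of 500 and the insert_fn callable are assumptions, not platform requirements.

from typing import Callable, Iterable, List

def chunked(records: List[dict], batch_size: int = 500) -> Iterable[List[dict]]:
    # Yield fixed-size slices so a single request never carries an unbounded payload.
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

def insert_in_batches(insert_fn: Callable[[dict], None], records: List[dict],
                      batch_size: int = 500) -> None:
    # insert_fn is any callable that POSTs {"data": [...]} to the insert endpoint,
    # for example a thin wrapper around the requests sketch shown earlier.
    for batch in chunked(records, batch_size):
        insert_fn({"data": batch})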

Important Notes

  1. Dataset names must be unique
  2. At least one vector field is required
  3. Primary key fields must be unique
  4. Vector field dimensions must be fixed
  5. Ensure the dataset exists and is accessible before performing data operations
  6. Pay attention to data type compatibility
  7. Regularly back up important data

FAQ

  1. Q: How do I choose the right vector dimensions? A: Vector dimensions typically depend on the vectorization model used. We recommend 768 or 1536 dimensions.

  2. Q: What should I do if data import fails? A: Check whether the data format meets the requirements, ensure field types match, and review error logs for detailed information.

  3. Q: How can I optimize query performance? A: Use indexes wisely, optimize filter conditions, control the number of returned fields, and use pagination appropriately.

  4. Q: How should I set the similarity threshold for vector search? A: Adjust based on your specific use case and requirements. Typically, 0.7–0.8 is a good starting point.