Data Overview
The Data module in the GeniSpace platform is divided into Data Source Management and Vector Datasets, providing data support for workflows and agents.
Access Path
- Console → Sidebar Data module
- Or navigate directly to `/data` (defaults to Data Source Management)
- Switch between tabs at the top: Data Source Management | Vector Datasets | Platform Data (Platform Data is not shown in self-hosted deployments)
- URL parameters: `/data?tab=datasource`, `/data?tab=dataset`, `/data?tab=platform`
Data Source Management
Data Source Management is used to connect to external relational databases and perform CRUD operations within workflows.
Feature Structure
Data Source Management includes three sub-views:
| View | Description |
|---|---|
| Data Sources | SQL-based query/write configurations associated with connected databases |
| Data Tables | Manage table structures in databases (create, edit, delete) |
| Databases | Database connection management, supporting MySQL, PostgreSQL, MariaDB |
Basic Workflow
- Add a database connection in the Databases view (fill in host, port, credentials, database name, etc.)
- Create or manage table structures in the Data Tables view
- Create a data source in the Data Sources view, select a database, and configure SQL statements
- Run a functional test to verify that the configuration works
Data sources can be invoked in workflow nodes and can also be converted into Data Source Tools for use by agents.
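As an illustration of the workflow above, a data source pairs a connected database with a parameterized SQL statement that workflow nodes fill in at run time. The structure below is a hypothetical sketch; the key names are assumptions for illustration, not the platform's actual configuration schema:

```python
# Hypothetical data source configuration; key names are illustrative
# assumptions, not GeniSpace's actual schema.
datasource = {
    "name": "orders_lookup",
    "database": "mysql_main",       # a connection created in the Databases view
    "sql": "SELECT id, status FROM orders WHERE customer_id = :customer_id",
    "parameters": ["customer_id"],  # placeholders supplied by the workflow node
}

def bind_parameters(config, values):
    """Check that every declared SQL parameter is supplied before execution."""
    missing = [p for p in config["parameters"] if p not in values]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    return {p: values[p] for p in config["parameters"]}
```

Validating parameters up front like this is also what a functional test in the Data Sources view would catch before the source is used in a workflow.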
Vector Datasets
Vector datasets are used to store and retrieve vectorized data, providing semantic search and knowledge support for agents.
Features
1. Data Management
- Custom Datasets: Create and manage custom datasets with multiple data types
- Data Import: Import from multiple sources, including file uploads and API integrations
- Data Preview: Preview dataset contents in real time, with pagination and search
- Data Export: Export datasets in multiple formats
2. Vectorization Support
- Automatic Vectorization: Text is vectorized automatically, with no manual processing required
- Vector Search: Similarity search over stored vectors
- Vector Fields: Custom vector fields with configurable dimensions
3. Data Operations
- Data Query: Complex query conditions, including filtering and sorting
- Data Update: Batch and individual record updates
- Data Deletion: Conditional and batch deletion
- Data Insertion: Batch and individual record insertion
Integration with Agents
- Select associated datasets in the agent configuration
- Agents can retrieve relevant knowledge from datasets via vector search
- Multiple knowledge bases can be connected at the same time
Usage Guide
Accessing Vector Datasets
- Click the Vector Datasets tab at the top of the Data module
- For first-time use, confirm that the team key has been initialized (key status is shown in the statistics card)
Creating a Dataset
- Click the "Create Dataset" button on the Vector Datasets page
- Fill in the basic dataset information:
- Name: Unique identifier for the dataset
- Description: Detailed description of the dataset
- Database Type: Select the vector database type (e.g., Milvus)
- Database Configuration: Configure options such as auto ID
- Define the dataset structure:
- Add fields: Choose from the supported data types
- Set primary key: Select the primary key field
- Configure indexes: Create indexes for fields that need indexing
- Click "Create" to complete dataset creation
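The creation steps above can be sketched as a schema definition. The dictionary layout here is an assumption for illustration (the actual options come from the create-dataset form), but the validation rules mirror the constraints documented in Important Notes below:

```python
# Hypothetical schema for a new vector dataset; key names are illustrative.
dataset_schema = {
    "name": "product_docs",
    "description": "Vectorized product documentation",
    "database_type": "milvus",
    "auto_id": True,
    "fields": [
        {"name": "id",      "type": "INT64",        "primary_key": True},
        {"name": "title",   "type": "VARCHAR"},
        {"name": "content", "type": "VARCHAR",      "indexed": True},
        {"name": "vector",  "type": "FLOAT_VECTOR", "dim": 768},
    ],
}

def validate_schema(schema):
    """Enforce the documented constraints: exactly one primary key,
    at least one vector field, and a fixed dimension on every vector field."""
    fields = schema["fields"]
    assert sum(f.get("primary_key", False) for f in fields) == 1
    vectors = [f for f in fields if f["type"] == "FLOAT_VECTOR"]
    assert vectors, "at least one vector field is required"
    assert all("dim" in f for f in vectors), "vector dimensions must be fixed"
    return True
```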
Vector Dataset Interface Features
- Statistics Cards: Total datasets, total records, total data size, key status, with expandable detailed statistics and team key management
- Create Dataset: Create a new vector dataset
- Import Data: Import data into an existing dataset (Enterprise edition)
- Search & Filter: Search by name, filter by database type
- Import/Export History: View import and export records
Data Operations
Insert Data
```
POST /v1/datasets/{dataset_id}/data/insert

{
  "data": [
    {
      "field1": "value1",
      "field2": "value2"
    }
  ]
}
```
Query Data
```
POST /v1/datasets/{dataset_id}/data/query

{
  "filter": "field1 == 'value1'",
  "limit": 100,
  "offset": 0,
  "outputFields": ["field1", "field2"]
}
```
Update Data
```
POST /v1/datasets/{dataset_id}/data/update

{
  "filter": "field1 == 'value1'",
  "update_data": {
    "field2": "new_value"
  }
}
```
Delete Data
```
POST /v1/datasets/{dataset_id}/data/delete

{
  "filter": "field1 == 'value1'"
}
```
Vector Search
```
POST /v1/datasets/{dataset_id}/data/search

{
  "vector_field": "vector",
  "data": [[0.1, 0.2, ..., 0.5]],
  "limit": 5,
  "filter": "category == 'technology'",
  "outputFields": ["id", "title", "content"]
}
```
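The five endpoints above share a common shape: a POST to `/v1/datasets/{dataset_id}/data/{operation}` with a JSON body, so a thin client can wrap them all. The sketch below uses Python's standard-library `urllib`; the base URL and the bearer-token Authorization header are assumptions, and authentication details depend on your deployment:

```python
import json
import urllib.request

class DatasetClient:
    """Minimal sketch of a client for the dataset data endpoints.
    The base URL and bearer-token header are assumptions."""

    def __init__(self, base_url, token, dataset_id):
        self.base_url = base_url.rstrip("/")
        self.token = token
        self.dataset_id = dataset_id

    def _url(self, operation):
        return f"{self.base_url}/v1/datasets/{self.dataset_id}/data/{operation}"

    def _request(self, operation, body):
        req = urllib.request.Request(
            self._url(operation),
            data=json.dumps(body).encode("utf-8"),
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {self.token}",
            },
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    def insert(self, records):
        return self._request("insert", {"data": records})

    def query(self, filter_expr, limit=100, offset=0, output_fields=None):
        return self._request("query", {
            "filter": filter_expr,
            "limit": limit,
            "offset": offset,
            "outputFields": output_fields or [],
        })

    def search(self, vectors, limit=5, filter_expr=None, output_fields=None):
        body = {"vector_field": "vector", "data": vectors,
                "limit": limit, "outputFields": output_fields or []}
        if filter_expr:
            body["filter"] = filter_expr
        return self._request("search", body)
```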
Supported Data Types
Datasets support the following data types:
- INT64: 64-bit integer
- FLOAT: Floating-point number
- VARCHAR: String
- BOOL: Boolean
- FLOAT_VECTOR: Float vector
Best Practices
- Data Preprocessing
  - Clean and format data before importing
  - Ensure data conforms to field type requirements
  - Handle missing values and outliers
- Vectorization Configuration
  - Choose appropriate text fields for vectorization
  - Set vector dimensions based on actual needs
  - Regularly update the vectorization model
- Query Optimization
  - Use filter conditions wisely
  - Set appropriate page sizes
  - Query only necessary fields
- Performance Considerations
  - Control data volume during batch operations
  - Use indexes appropriately
  - Avoid frequent small data operations
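One common way to control data volume during batch operations is to split a large record list into fixed-size chunks and insert each chunk as its own request. A minimal sketch (the 500-record chunk size is an arbitrary illustration, not a platform limit):

```python
def chunked(records, size=500):
    """Yield successive chunks of at most `size` records.
    The default of 500 is illustrative, not a documented platform limit."""
    for start in range(0, len(records), size):
        yield records[start:start + size]

# Each chunk would then be sent as one insert request, e.g.:
# for batch in chunked(rows):
#     client.insert(batch)
```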
Important Notes
- Dataset names must be unique
- At least one vector field is required
- Primary key fields must be unique
- Vector field dimensions must be fixed
- Ensure the dataset exists and is accessible before performing data operations
- Pay attention to data type compatibility
- Regularly back up important data
FAQ
- Q: How do I choose the right vector dimensions?
  A: Vector dimensions typically depend on the vectorization model used. We recommend 768 or 1536 dimensions.
- Q: What should I do if data import fails?
  A: Check whether the data format meets the requirements, ensure field types match, and review error logs for detailed information.
- Q: How can I optimize query performance?
  A: Use indexes wisely, optimize filter conditions, control the number of returned fields, and use pagination appropriately.
- Q: How should I set the similarity threshold for vector search?
  A: Adjust based on your specific use case and requirements. Typically, 0.7–0.8 is a good starting point.
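Following the last answer, applying a similarity threshold typically means post-filtering search hits by their score. The hit shape below (a `score` key, higher meaning more similar) is an assumption for illustration; adapt it to the actual search response:

```python
def filter_by_score(hits, threshold=0.7):
    """Keep only search hits whose similarity score meets the threshold.
    Assumes each hit carries a 'score' key where higher means more similar;
    this shape is illustrative, not the documented response format."""
    return [h for h in hits if h["score"] >= threshold]
```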
Related Documentation
- Knowledge Base — Knowledge base management and document processing
- Storage — File storage and management
- Agent Overview — Using data and knowledge bases with agents
- Workflow Overview — Using data sources in workflows