Metadata-Version: 2.1
Name: dashvector
Version: 1.0.22
Summary: DashVector Client Python Sdk Library
Home-page: https://github.com/alibaba/proxima
License: Apache-2.0
Keywords: DashVector,vector,database,cloud
Author: Alibaba
Requires-Python: >=3.9,<4.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Database
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: aiohttp (>=3.1.0,<4.0.0)
Requires-Dist: certifi (>=2023.7.22,<2024.0.0)
Requires-Dist: grpcio (>=1.44.0) ; python_version >= "3.8" and python_version < "3.11"
Requires-Dist: grpcio (>=1.59.0) ; python_version >= "3.11" and python_version < "4.0"
Requires-Dist: importlib_metadata
Requires-Dist: numpy
Requires-Dist: protobuf (>=5.29,<6.0)
Project-URL: Documentation, https://help.aliyun.com/document_detail/2510225.html
Description-Content-Type: text/markdown
# DashVector Client Python Library
DashVector is a scalable and fully-managed vector-database service for building various machine learning applications. The DashVector client SDK is your gateway to access the DashVector service.
For more information about DashVector, please visit: https://help.aliyun.com/document_detail/2510225.html
## Installation
To install the DashVector client Python SDK, simply run:
```shell
pip install dashvector
```
## QuickStart
```python
import numpy as np
import dashvector
# Use DashVector `Client` api to communicate with the backend vectorDB service.
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
# Create a collection named "quickstart" with dimension of 4, using the default Cosine distance metric
rsp = client.create(name='quickstart', dimension=4)
assert rsp
# Get a collection by name
collection = client.get(name='quickstart')
# Operations on 'Collection' includes Inert/Query/Upsert/Update/Delete/Fetch of docs
# Here we insert sample data (4-dimensional vectors) in batches of 16
collection.insert(
[
dashvector.Doc(id=str(i), vector=np.random.rand(4), fields={'anykey': 'anyvalue'})
for i in range(16)
]
)
# Query a vector from the collection
docs = collection.query([0.1, 0.2, 0.3, 0.4], topk=5)
print(docs)
# Get statistics about collection
stats = collection.stats()
print(stats)
# Delete a collection by name
client.delete(name='quickstart')
```
## Reference
### Create a Client
`Client` host various APIs for interacting with DashVector `Collection`.
```python
dashvector.Client(
api_key: str,
endpoint: str = 'dashvector.cn-hangzhou.aliyuncs.com',
protocal: dashvector.DashVectorProtocol = dashvector.DashVectorProtocol.GRPC,
timeout: float = 10.0
) -> Client
```
| Parameters | Type | Required | Description |
|------------|--------------------|----------|----------------------------------------------------------------------------------------------|
| api_key | str | Yes | Your DashVector API-KEY |
| endpoint | str | No | Service Endpoint.
Default value: `dashvector.cn-hangzhou.aliyuncs.com` |
| protocol | DashVectorProtocol | No | Communication protocol, support HTTP and GRPC.
Default value: `DashVectorProtocol.GRPC` |
| timeout | float | No | Timeout period (in seconds), -1 means no timeout.
Default value: `10.0` |
Example:
```python
import dashvector
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
assert client
```
### Create Collection
```python
Client.create(
name: str,
dimension: int,
dtype: Union[Type[int], Type[float]] = float,
fields_schema: Optional[Dict[str, Union[Type[str], Type[int], Type[float], Type[bool]]]] = None,
metric: str = 'cosine',
timeout: Optional[int] = None
) -> DashVectorResponse
```
| Parameters | Type | Required | Description |
|----------------|----------------------------------------------------------------------------|----------|------------------------------------------------------------------------------------------------------------------|
| name | str | Yes | The name of the Collection to create. |
| dimension | int | Yes | The dimensions of the Collection's vectors. Valid values: 1-20,000 |
| dtype | Union[Type[int], Type[float]] | No | The date type of the Collection's vectors.
Default value: `Type[float]` |
| fields_schema | Optional[Dict[str, Union[Type[str], Type[int], Type[float], Type[bool]]]] | No | Fields schema of the Collection.
Default value: `None`
e.g. `{"name": str, "age": int}` |
| metric | str | No | Vector similarity metric. For `cosine`, dtype must be `float`.
Valid values:
1. (Default)`cosine`
2. `dotproduct`
3. `euclidean` |
| timeout | Optional[int] | No | Timeout period (in seconds), -1 means asynchronous creation collection.
Default value: `None` |
Example:
```python
import dashvector
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
rsp = client.create('YOUR-COLLECTION-NAME', dimension=4)
assert rsp
```
### List Collections
`Client.list() -> DashVectorResponse`
Example:
```python
import dashvector
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collections = client.list()
for collection in collections:
print(collection)
# outputs:
# 'quickstart'
```
### Describe Collection
`Client.describe(name: str) -> DashVectorResponse`
| Parameters | Type | Required | Description |
|------------|-------|----------|-----------------------------------------|
| name | str | Yes | The name of the Collection to describe. |
Example:
```python
import dashvector
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
rsp = client.describe('YOUR-COLLECTION-NAME')
print(rsp)
# example output:
# {
# "request_id": "8d3ac14e-5382-4736-b77c-4318761ddfab",
# "code": 0,
# "message": "",
# "output": {
# "name": "quickstart",
# "dimension": 4,
# "dtype": "FLOAT",
# "metric": "dotproduct",
# "fields_schema": {
# "name": "STRING",
# "age": "INT",
# "height": "FLOAT"
# },
# "status": "SERVING",
# "partitions": {
# "default": "SERVING"
# }
# }
# }
```
### Delete Collection
`Client.delete(name: str) -> DashVectorResponse`
| Parameters | Type | Required | Description |
|------------|-------|----------|---------------------------------------|
| name | str | Yes | The name of the Collection to delete. |
Example:
```python
import dashvector
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
client.delete('YOUR-COLLECTION-NAME')
```
### Get a Collection Instance
`Collection` provides APIs for accessing `Doc` and `Partition`
`Client.get(name: str) -> Collection`
| Parameters | Type | Required | Description |
|------------|-------|----------|------------------------------------|
| name | str | Yes | The name of the Collection to get. |
Example:
```python
import dashvector
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
assert collection
```
### Describe Collection Statistics
`Collection.stats() -> DashVectorResponse`
Example:
```python
import dashvector
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
rsp = collection.stats()
print(rsp)
# example output:
# {
# "request_id": "14448bcb-c9a3-49a8-9152-0de3990bce59",
# "code": 0,
# "message": "Success",
# "output": {
# "total_doc_count": "26",
# "index_completeness": 1.0,
# "partitions": {
# "default": {
# "total_doc_count": "26"
# }
# }
# }
# }
```
### Insert/Update/Upsert Docs
```python
Collection.insert(
docs: Union[Doc, List[Doc], Tuple, List[Tuple]],
partition: Optional[str] = None,
async_req: False
) -> DashVectorResponse
```
| Parameters | Type | Required | Description |
|------------|-------------------------------------------|----------|------------------------------------------------------------------------|
| docs | Union[Doc, List[Doc], Tuple, List[Tuple]] | Yes | The docs to Insert/Update/Upsert. |
| partition | Optional[str] | No | Name of the partition to Insert/Update/Upsert.
Default value: `None` |
| async_req | bool | No | Enable async request or not.
Default value: `False` |
Example:
```python
import dashvector
import numpy as np
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
# insert a doc with Tuple
collection.insert(('YOUR-DOC-ID1', [0.1, 0.2, 0.3, 0.4]))
collection.insert(('YOUR-DOC-ID2', [0.2, 0.3, 0.4, 0.5], {'age': 30, 'name': 'alice', 'anykey': 'anyvalue'}))
# insert a doc with dashvector.Doc
collection.insert(
dashvector.Doc(
id='YOUR-DOC-ID3',
vector=[0.3, 0.4, 0.5, 0.6],
fields={'foo': 'bar'}
)
)
# insert in batches
ret = collection.insert(
[
('YOUR-DOC-ID4', [0.2, 0.7, 0.8, 1.3], {'age': 1}),
('YOUR-DOC-ID4', [0.3, 0.6, 0.9, 1.2], {'age': 2}),
('YOUR-DOC-ID6', [0.4, 0.5, 1.0, 1.1], {'age': 3})
]
)
# insert in batches
ret = collection.insert(
[
dashvector.Doc(id=str(i), vector=np.random.rand(4)) for i in range(10)
]
)
# async insert
ret_funture = collection.insert(
[
dashvector.Doc(id=str(i+10), vector=np.random.rand(4)) for i in range(10)
],
async_req=True
)
ret = ret_funture.get()
```
### Query a Collection
```python
Collection.query(
vector: Optional[Union[List[Union[int, float]], np.ndarray]] = None,
id: Optional[str] = None,
topk: int = 10,
filter: Optional[str] = None,
include_vector: bool = False,
partition: Optional[str] = None,
output_fields: Optional[List[str]] = None,
async_req: False
) -> DashVectorResponse
```
| Parameters | Type | Required | Description |
|-----------------|------------------------------------------------------|----------|--------------------------------------------------------------------------------------------------------------|
| vector | Optional[Union[List[Union[int, float]], np.ndarray]] | No | The vector to query |
| id | Optional[str] | No | The doc id to query.
Setting `id` means searching by vector corresponding to the id |
| topk | Optional[str] | No | Number of similarity results to return.
Default value: `10` |
| filter | Optional[str] | No | Expression used to filter results
Default value: None
e.g. `age>20` |
| include_vector | bool | No | Return vector details or not.
Default value: `False` |
| partition | Optional[str] | No | Name of the partition to Query.
Default value: `None` |
| output_fields | Optional[List[str]] | No | List of field names to return.
Default value: `None`, means return all fields
e.g. `['name', 'age']` |
| async_req | bool | No | Enable async request or not.
Default value: `False` |
Example:
```python
import dashvector
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
match_docs = collection.query([0.1, 0.2, 0.3, 0.4], topk=100, filter='age>20', include_vector=True, output_fields=['age','name','foo'])
if match_docs:
for doc in match_docs:
print(doc.id)
print(doc.vector)
print(doc.fields)
print(doc.score)
```
### Delete Docs
```python
collection.delete(
ids: Union[str, List[str]],
delete_all: bool = False,
partition: Optional[str] = None,
async_req: bool = False
) -> DashVectorResponse
```
| Parameters | Type | Required | Description |
|------------|-----------------------|----------|-----------------------------------------------------------------|
| ids | Union[str, List[str]] | Yes | The id (or list of ids) for the Doc(s) to Delete |
| delete_all | bool | No | Delete all vectors from partition.
Default value: `False` |
| partition | Optional[str] | No | Name of the partition to Delete from.
Default value: `None` |
| async_req | bool | No | Enable async request or not.
Default value: `False` |
Example:
```python
import dashvector
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
collection.delete(['YOUR-DOC-ID1','YOUR-DOC-ID2'])
```
### Fetch Docs
```python
Collection.fetch(
ids: Union[str, List[str]],
partition: Optional[str] = None,
async_req: bool = False
) -> DashVectorResponse
```
| Parameters | Type | Required | Description |
|------------|-----------------------|----------|----------------------------------------------------------------|
| ids | Union[str, List[str]] | Yes | The id (or list of ids) for the Doc(s) to Fetch |
| partition | Optional[str] | No | Name of the partition to Fetch from.
Default value: `None` |
| async_req | bool | No | Enable async request or not.
Default value: `False` |
Example:
```python
import dashvector
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
fetch_docs = collection.fetch(['YOUR-DOC-ID1', 'YOUR-DOC-ID2'])
if fetch_docs:
for doc_id in fetch_docs:
doc = fetch_docs[doc_id]
print(doc.id)
print(doc.vector)
print(doc.fields)
```
### Create Collection Partition
`Collection.create_partition(name: str) -> DashVectorResponse`
| Parameters | Type | Required | Description |
|------------|----------------|----------|-------------------------------------------------------------------------------------------------------|
| name | str | Yes | The name of the Partition to Create. |
| timeout | Optional[int] | No | Timeout period (in seconds), -1 means asynchronous creation partition.
Default value: `None` |
Example:
```python
import dashvector
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
rsp = collection.create_partition('YOUR-PARTITION-NAME')
assert rsp
```
### Delete Collection Partition
`Collection.delete_partition(name: str) -> DashVectorResponse`
| Parameters | Type | Required | Description |
|------------|-------|----------|--------------------------------------|
| name | str | Yes | The name of the Partition to Delete. |
Example:
```python
import dashvector
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
rsp = collection.delete_partition('YOUR-PARTITION-NAME')
assert rsp
```
### List Collection Partitions
`Collection.list_partitions() -> DashVectorResponse`
Example:
```python
import dashvector
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
partitions = collection.list_partitions()
assert partitions
for pt in partitions:
print(pt)
```
### Describe Collection Partition
`Collection.describe_partition(name: str) -> DashVectorResponse`
| Parameters | Type | Required | Description |
|------------|-------|----------|----------------------------------------|
| name | str | Yes | The name of the Partition to Describe. |
Example:
```python
import dashvector
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
rsp = collection.describe_partition('shoes')
print(rsp)
# example output:
# {"request_id":"296267a7-68e2-483a-87e6-5992d85a5806","code":0,"message":"","output":"SERVING"}
```
### Statistics for Collection Partition
`Collection.stats_partition(name: str) -> DashVectorResponse`
| Parameters | Type | Required | Description |
|------------|-------|----------|----------------------------------------------|
| name | str | Yes | The name of the Partition to get Statistics. |
Example:
```python
import dashvector
client = dashvector.Client(api_key='YOUR-DASHVECTOR-API-KEY')
collection = client.get('YOUR-COLLECTION-NAME')
rsp = collection.stats_partition('shoes')
print(rsp)
# example outptut:
# {
# "code":0,
# "message":"",
# "requests_id":"330a2bcb-e4a7-4fc6-a711-2fe5f8a24e8c",
# "output":{
# "total_doc_count":0
# }
# }
```
## Class
### dashvector.Doc
```python
@dataclass(frozen=True)
class Doc(object):
id: str
vector: Union[List[int], List[float], numpy.ndarray]
fields: Optional[Dict[str, Union[Type[str], Type[int], Type[float], Type[bool]]]] = None
score: float = 0.0
```
### dashvector.DashVectorResponse
```python
class DashVectorResponse(object):
code: DashVectorCode
message: str
request_id: str
output: Any
```
## License
This project is licensed under the Apache License (Version 2.0).