Feat/assistant app (#2086)

Co-authored-by: chenhe <guchenhe@gmail.com>
Co-authored-by: Pascal M <11357019+perzeuss@users.noreply.github.com>
This commit is contained in:
Yeuoly
2024-01-23 19:58:23 +08:00
committed by GitHub
parent 7bbe12b2bd
commit 86286e1ac8
175 changed files with 11619 additions and 1235 deletions

View File

@@ -0,0 +1,266 @@
# Advanced Tool Integration
Before starting with this advanced guide, please make sure you have a basic understanding of the tool integration process in Dify. Check out [Quick Integration](./tool_scale_out.md) for a quick runthrough.
## Tool Interface
We have defined a series of helper methods in the `Tool` class to help developers quickly build more complex tools.
### Message Return
Dify supports various message types such as `text`, `link`, `image`, and `file BLOB`. You can return different types of messages to the LLM and users through the following interfaces.
Please note, some parameters in the following interfaces will be introduced in later sections.
#### Image URL
You only need to pass the URL of the image, and Dify will automatically download the image and return it to the user.
```python
def create_image_message(self, image: str, save_as: str = '') -> ToolInvokeMessage:
"""
create an image message
:param image: the url of the image
:return: the image message
"""
```
#### Link
If you need to return a link, you can use the following interface.
```python
def create_link_message(self, link: str, save_as: str = '') -> ToolInvokeMessage:
"""
create a link message
:param link: the url of the link
:return: the link message
"""
```
#### Text
If you need to return a text message, you can use the following interface.
```python
def create_text_message(self, text: str, save_as: str = '') -> ToolInvokeMessage:
"""
create a text message
:param text: the text of the message
:return: the text message
"""
```
#### File BLOB
If you need to return the raw data of a file, such as images, audio, video, PPT, Word, Excel, etc., you can use the following interface.
- `blob` The raw data of the file, of bytes type
- `meta` The metadata of the file, if you know the type of the file, it is best to pass a `mime_type`, otherwise Dify will use `octet/stream` as the default type
```python
def create_blob_message(self, blob: bytes, meta: dict = None, save_as: str = '') -> ToolInvokeMessage:
"""
create a blob message
:param blob: the blob
:return: the blob message
"""
```
### Shortcut Tools
In large model applications, we have two common needs:
- First, summarize a long text in advance, and then pass the summary content to the LLM to prevent the original text from being too long for the LLM to handle
- The content obtained by the tool is a link, and the web page information needs to be crawled before it can be returned to the LLM
To help developers quickly implement these two needs, we provide the following two shortcut tools.
#### Text Summary Tool
This tool takes in an user_id and the text to be summarized, and returns the summarized text. Dify will use the default model of the current workspace to summarize the long text.
```python
def summary(self, user_id: str, content: str) -> str:
"""
summary the content
:param user_id: the user id
:param content: the content
:return: the summary
"""
```
#### Web Page Crawling Tool
This tool takes in web page link to be crawled and a user_agent (which can be empty), and returns a string containing the information of the web page. The `user_agent` is an optional parameter that can be used to identify the tool. If not passed, Dify will use the default `user_agent`.
```python
def get_url(self, url: str, user_agent: str = None) -> str:
"""
get url
""" the crawled result
```
### Variable Pool
We have introduced a variable pool in `Tool` to store variables, files, etc. generated during the tool's operation. These variables can be used by other tools during the tool's operation.
Next, we will use `DallE3` and `Vectorizer.AI` as examples to introduce how to use the variable pool.
- `DallE3` is an image generation tool that can generate images based on text. Here, we will let `DallE3` generate a logo for a coffee shop
- `Vectorizer.AI` is a vector image conversion tool that can convert images into vector images, so that the images can be infinitely enlarged without distortion. Here, we will convert the PNG icon generated by `DallE3` into a vector image, so that it can be truly used by designers.
#### DallE3
First, we use DallE3. After creating the image, we save the image to the variable pool. The code is as follows:
```python
from typing import Any, Dict, List, Union
from core.tools.entities.tool_entities import ToolInvokeMessage
from core.tools.tool.builtin_tool import BuiltinTool
from base64 import b64decode
from openai import OpenAI
class DallE3Tool(BuiltinTool):
def _invoke(self,
user_id: str,
tool_paramters: Dict[str, Any],
) -> Union[ToolInvokeMessage, List[ToolInvokeMessage]]:
"""
invoke tools
"""
client = OpenAI(
api_key=self.runtime.credentials['openai_api_key'],
)
# prompt
prompt = tool_paramters.get('prompt', '')
if not prompt:
return self.create_text_message('Please input prompt')
# call openapi dalle3
response = client.images.generate(
prompt=prompt, model='dall-e-3',
size='1024x1024', n=1, style='vivid', quality='standard',
response_format='b64_json'
)
result = []
for image in response.data:
# Save all images to the variable pool through the save_as parameter. The variable name is self.VARIABLE_KEY.IMAGE.value. If new images are generated later, they will overwrite the previous images.
result.append(self.create_blob_message(blob=b64decode(image.b64_json),
meta={ 'mime_type': 'image/png' },
save_as=self.VARIABLE_KEY.IMAGE.value))
return result
```
Note that we used `self.VARIABLE_KEY.IMAGE.value` as the variable name of the image. In order for developers' tools to cooperate with each other, we defined this `KEY`. You can use it freely, or you can choose not to use this `KEY`. Passing a custom KEY is also acceptable.
#### Vectorizer.AI
Next, we use Vectorizer.AI to convert the PNG icon generated by DallE3 into a vector image. Let's go through the functions we defined here. The code is as follows:
```python
from core.tools.tool.builtin_tool import BuiltinTool
from core.tools.entities.tool_entities import ToolInvokeMessage, ToolParamter
from core.tools.errors import ToolProviderCredentialValidationError
from typing import Any, Dict, List, Union
from httpx import post
from base64 import b64decode
class VectorizerTool(BuiltinTool):
def _invoke(self, user_id: str, tool_paramters: Dict[str, Any]) \
-> Union[ToolInvokeMessage, List[ToolInvokeMessage]]:
"""
Tool invocation, the image variable name needs to be passed in from here, so that we can get the image from the variable pool
"""
def get_runtime_parameters(self) -> List[ToolParamter]:
"""
Override the tool parameter list, we can dynamically generate the parameter list based on the actual situation in the current variable pool, so that the LLM can generate the form based on the parameter list
"""
def is_tool_avaliable(self) -> bool:
"""
Whether the current tool is available, if there is no image in the current variable pool, then we don't need to display this tool, just return False here
"""
```
Next, let's implement these three functions
```python
from core.tools.tool.builtin_tool import BuiltinTool
from core.tools.entities.tool_entities import ToolInvokeMessage, ToolParamter
from core.tools.errors import ToolProviderCredentialValidationError
from typing import Any, Dict, List, Union
from httpx import post
from base64 import b64decode
class VectorizerTool(BuiltinTool):
def _invoke(self, user_id: str, tool_paramters: Dict[str, Any]) \
-> Union[ToolInvokeMessage, List[ToolInvokeMessage]]:
"""
invoke tools
"""
api_key_name = self.runtime.credentials.get('api_key_name', None)
api_key_value = self.runtime.credentials.get('api_key_value', None)
if not api_key_name or not api_key_value:
raise ToolProviderCredentialValidationError('Please input api key name and value')
# Get image_id, the definition of image_id can be found in get_runtime_parameters
image_id = tool_paramters.get('image_id', '')
if not image_id:
return self.create_text_message('Please input image id')
# Get the image generated by DallE from the variable pool
image_binary = self.get_variable_file(self.VARIABLE_KEY.IMAGE)
if not image_binary:
return self.create_text_message('Image not found, please request user to generate image firstly.')
# Generate vector image
response = post(
'https://vectorizer.ai/api/v1/vectorize',
files={ 'image': image_binary },
data={ 'mode': 'test' },
auth=(api_key_name, api_key_value),
timeout=30
)
if response.status_code != 200:
raise Exception(response.text)
return [
self.create_text_message('the vectorized svg is saved as an image.'),
self.create_blob_message(blob=response.content,
meta={'mime_type': 'image/svg+xml'})
]
def get_runtime_parameters(self) -> List[ToolParamter]:
"""
override the runtime parameters
"""
# Here, we override the tool parameter list, define the image_id, and set its option list to all images in the current variable pool. The configuration here is consistent with the configuration in yaml.
return [
ToolParamter.get_simple_instance(
name='image_id',
llm_description=f'the image id that you want to vectorize, \
and the image id should be specified in \
{[i.name for i in self.list_default_image_variables()]}',
type=ToolParamter.ToolParameterType.SELECT,
required=True,
options=[i.name for i in self.list_default_image_variables()]
)
]
def is_tool_avaliable(self) -> bool:
# Only when there are images in the variable pool, the LLM needs to use this tool
return len(self.list_default_image_variables()) > 0
```
It's worth noting that we didn't actually use `image_id` here. We assumed that there must be an image in the default variable pool when calling this tool, so we directly used `image_binary = self.get_variable_file(self.VARIABLE_KEY.IMAGE)` to get the image. In cases where the model's capabilities are weak, we recommend developers to do the same, which can effectively improve fault tolerance and avoid the model passing incorrect parameters.

View File

@@ -0,0 +1,212 @@
# Quick Tool Integration
Here, we will use GoogleSearch as an example to demonstrate how to quickly integrate a tool.
## 1. Prepare the Tool Provider yaml
### Introduction
This yaml declares a new tool provider, and includes information like the provider's name, icon, author, and other details that are fetched by the frontend for display.
### Example
We need to create a `google` module (folder) under `core/tools/provider/builtin`, and create `google.yaml`. The name must be consistent with the module name.
Subsequently, all operations related to this tool will be carried out under this module.
```yaml
identity: # Basic information of the tool provider
author: Dify # Author
name: google # Name, unique, no duplication with other providers
label: # Label for frontend display
en_US: Google # English label
zh_Hans: Google # Chinese label
description: # Description for frontend display
en_US: Google # English description
zh_Hans: Google # Chinese description
icon: icon.svg # Icon, needs to be placed in the _assets folder of the current module
```
- The `identity` field is mandatory, it contains the basic information of the tool provider, including author, name, label, description, icon, etc.
- The icon needs to be placed in the `_assets` folder of the current module, you can refer to [here](../../provider/builtin/google/_assets/icon.svg).
## 2. Prepare Provider Credentials
Google, as a third-party tool, uses the API provided by SerpApi, which requires an API Key to use. This means that this tool needs a credential to use. For tools like `wikipedia`, there is no need to fill in the credential field, you can refer to [here](../../provider/builtin/wikipedia/wikipedia.yaml).
After configuring the credential field, the effect is as follows:
```yaml
identity:
author: Dify
name: google
label:
en_US: Google
zh_Hans: Google
description:
en_US: Google
zh_Hans: Google
icon: icon.svg
credentails_for_provider: # Credential field
serpapi_api_key: # Credential field name
type: secret-input # Credential field type
required: true # Required or not
label: # Credential field label
en_US: SerpApi API key # English label
zh_Hans: SerpApi API key # Chinese label
placeholder: # Credential field placeholder
en_US: Please input your SerpApi API key # English placeholder
zh_Hans: 请输入你的 SerpApi API key # Chinese placeholder
help: # Credential field help text
en_US: Get your SerpApi API key from SerpApi # English help text
zh_Hans: 从 SerpApi 获取您的 SerpApi API key # Chinese help text
url: https://serpapi.com/manage-api-key # Credential field help link
```
- `type`: Credential field type, currently can be either `secret-input`, `text-input`, or `select` , corresponding to password input box, text input box, and drop-down box, respectively. If set to `secret-input`, it will mask the input content on the frontend, and the backend will encrypt the input content.
## 3. Prepare Tool yaml
A provider can have multiple tools, each tool needs a yaml file to describe, this file contains the basic information, parameters, output, etc. of the tool.
Still taking GoogleSearch as an example, we need to create a `tools` module under the `google` module, and create `tools/google_search.yaml`, the content is as follows.
```yaml
identity: # Basic information of the tool
name: google_search # Tool name, unique, no duplication with other tools
author: Dify # Author
label: # Label for frontend display
en_US: GoogleSearch # English label
zh_Hans: 谷歌搜索 # Chinese label
description: # Description for frontend display
human: # Introduction for frontend display, supports multiple languages
en_US: A tool for performing a Google SERP search and extracting snippets and webpages.Input should be a search query.
zh_Hans: 一个用于执行 Google SERP 搜索并提取片段和网页的工具。输入应该是一个搜索查询。
llm: A tool for performing a Google SERP search and extracting snippets and webpages.Input should be a search query. # Introduction passed to LLM, in order to make LLM better understand this tool, we suggest to write as detailed information about this tool as possible here, so that LLM can understand and use this tool
parameters: # Parameter list
- name: query # Parameter name
type: string # Parameter type
required: true # Required or not
label: # Parameter label
en_US: Query string # English label
zh_Hans: 查询语句 # Chinese label
human_description: # Introduction for frontend display, supports multiple languages
en_US: used for searching
zh_Hans: 用于搜索网页内容
llm_description: key words for searching # Introduction passed to LLM, similarly, in order to make LLM better understand this parameter, we suggest to write as detailed information about this parameter as possible here, so that LLM can understand this parameter
form: llm # Form type, llm means this parameter needs to be inferred by Agent, the frontend will not display this parameter
- name: result_type
type: select # Parameter type
required: true
options: # Drop-down box options
- value: text
label:
en_US: text
zh_Hans: 文本
- value: link
label:
en_US: link
zh_Hans: 链接
default: link
label:
en_US: Result type
zh_Hans: 结果类型
human_description:
en_US: used for selecting the result type, text or link
zh_Hans: 用于选择结果类型,使用文本还是链接进行展示
form: form # Form type, form means this parameter needs to be filled in by the user on the frontend before the conversation starts
```
- The `identity` field is mandatory, it contains the basic information of the tool, including name, author, label, description, etc.
- `parameters` Parameter list
- `name` Parameter name, unique, no duplication with other parameters
- `type` Parameter type, currently supports `string`, `number`, `boolean`, `select` four types, corresponding to string, number, boolean, drop-down box
- `required` Required or not
- In `llm` mode, if the parameter is required, the Agent is required to infer this parameter
- In `form` mode, if the parameter is required, the user is required to fill in this parameter on the frontend before the conversation starts
- `options` Parameter options
- In `llm` mode, Dify will pass all options to LLM, LLM can infer based on these options
- In `form` mode, when `type` is `select`, the frontend will display these options
- `default` Default value
- `label` Parameter label, for frontend display
- `human_description` Introduction for frontend display, supports multiple languages
- `llm_description` Introduction passed to LLM, in order to make LLM better understand this parameter, we suggest to write as detailed information about this parameter as possible here, so that LLM can understand this parameter
- `form` Form type, currently supports `llm`, `form` two types, corresponding to Agent self-inference and frontend filling
## 4. Add Tool Logic
After completing the tool configuration, we can start writing the tool code that defines how it is invoked.
Create `google_search.py` under the `google/tools` module, the content is as follows.
```python
from core.tools.tool.builtin_tool import BuiltinTool
from core.tools.entities.tool_entities import ToolInvokeMessage
from typing import Any, Dict, List, Union
class GoogleSearchTool(BuiltinTool):
def _invoke(self,
user_id: str,
tool_paramters: Dict[str, Any],
) -> Union[ToolInvokeMessage, List[ToolInvokeMessage]]:
"""
invoke tools
"""
query = tool_paramters['query']
result_type = tool_paramters['result_type']
api_key = self.runtime.credentials['serpapi_api_key']
# TODO: search with serpapi
result = SerpAPI(api_key).run(query, result_type=result_type)
if result_type == 'text':
return self.create_text_message(text=result)
return self.create_link_message(link=result)
```
### Parameters
The overall logic of the tool is in the `_invoke` method, this method accepts two parameters: `user_id` and `tool_paramters`, which represent the user ID and tool parameters respectively
### Return Data
When the tool returns, you can choose to return one message or multiple messages, here we return one message, using `create_text_message` and `create_link_message` can create a text message or a link message.
## 5. Add Provider Code
Finally, we need to create a provider class under the provider module to implement the provider's credential verification logic. If the credential verification fails, it will throw a `ToolProviderCredentialValidationError` exception.
Create `google.py` under the `google` module, the content is as follows.
```python
from core.tools.entities.tool_entities import ToolInvokeMessage, ToolProviderType
from core.tools.tool.tool import Tool
from core.tools.provider.builtin_tool_provider import BuiltinToolProviderController
from core.tools.errors import ToolProviderCredentialValidationError
from core.tools.provider.builtin.google.tools.google_search import GoogleSearchTool
from typing import Any, Dict
class GoogleProvider(BuiltinToolProviderController):
def _validate_credentials(self, credentials: Dict[str, Any]) -> None:
try:
# 1. Here you need to instantiate a GoogleSearchTool with GoogleSearchTool(), it will automatically load the yaml configuration of GoogleSearchTool, but at this time it does not have credential information inside
# 2. Then you need to use the fork_tool_runtime method to pass the current credential information to GoogleSearchTool
# 3. Finally, invoke it, the parameters need to be passed according to the parameter rules configured in the yaml of GoogleSearchTool
GoogleSearchTool().fork_tool_runtime(
meta={
"credentials": credentials,
}
).invoke(
user_id='',
tool_paramters={
"query": "test",
"result_type": "link"
},
)
except Exception as e:
raise ToolProviderCredentialValidationError(str(e))
```
## Completion
After the above steps are completed, we can see this tool on the frontend, and it can be used in the Agent.
Of course, because google_search needs a credential, before using it, you also need to input your credentials on the frontend.
![Alt text](../zh_Hans/images/index/image-2.png)