Library Document

Supported Document Formats

You can add content to a library in two ways: by uploading files in one of the supported document formats, or by adding webpages.

  • Add File: The library supports uploading the following document formats:

    • PDF: .pdf
    • Excel: .xlsx
    • Word: .docx
    • Markdown: .md
    • CSV: .csv
    • HTML: .html
    • JSON: .json
  • Add Webpage: Enter a webpage URL. The library automatically scrapes the content of the page and adds it to the library.

⚠️ Note: Static webpages are supported best. If a page generates its content dynamically, select Include Dynamic Content to attempt to capture it. Not all dynamic content can be captured successfully.
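As an illustration of the file formats listed above, a file name can be checked against the supported extensions before upload. This is a minimal sketch, not part of the product: the function name `supported_format` is hypothetical, and case-insensitive matching is an assumption that the documentation does not confirm.

```python
from pathlib import Path
from typing import Optional

# Extensions taken from the supported-format list above.
SUPPORTED_EXTENSIONS = {
    ".pdf": "PDF",
    ".xlsx": "Excel",
    ".docx": "Word",
    ".md": "Markdown",
    ".csv": "CSV",
    ".html": "HTML",
    ".json": "JSON",
}


def supported_format(filename: str) -> Optional[str]:
    """Return the format name for a supported file, or None otherwise."""
    # Assumption: extensions are compared case-insensitively.
    return SUPPORTED_EXTENSIONS.get(Path(filename).suffix.lower())
```

A check like this only looks at the final extension, so a name such as `archive.tar.gz` is treated as unsupported.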

You can configure each document individually, including setting different vectorization strategies to suit the content of different documents.

File Alias

The file alias makes a file easier to identify and is shown wherever the file is cited as a reference in the client. If no alias is set, the original file name is shown instead.

File Description

The file description is an important setting that is used in two places:

  1. When the library type is a file library, it is used to find the appropriate file. In this case the file description is used in place of high-dimensional vectors to locate the file, so an accurate description is essential.
  2. It is sent as metadata along with file fragments to the large language model, which helps the model answer the user's questions more accurately.

💡 Tip: For better RAG (Retrieval-Augmented Generation) results, always set a file description.
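To make the roles of the alias and the description concrete, the sketch below shows one way a fragment could be combined with file metadata before it is sent to a model. Everything here is an assumption for illustration: the `LibraryFile` record, its field names, and the `Source:` / `Description:` layout are hypothetical and do not describe how the product actually formats its prompts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LibraryFile:
    # Hypothetical record; field names are illustrative only.
    original_name: str
    alias: Optional[str] = None
    description: Optional[str] = None


def display_name(f: LibraryFile) -> str:
    """Return the alias if one is set, otherwise the original file name."""
    return f.alias or f.original_name


def fragment_with_metadata(f: LibraryFile, fragment: str) -> str:
    """Prefix a fragment with file metadata (assumed layout)."""
    lines = [f"Source: {display_name(f)}"]
    if f.description:
        # The description is included only when one has been set.
        lines.append(f"Description: {f.description}")
    lines.append("")
    lines.append(fragment)
    return "\n".join(lines)
```

With no description set, the model receives only the source name and the fragment text, which is why setting a description gives it more context to work with.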

Vectorization Strategy

You can set a separate vectorization strategy for an individual file, which is particularly useful when the file's content does not suit the library's vectorization strategy. Changing a single file's vectorization strategy re-vectorizes the content of that file.
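The relationship between the library's strategy and a per-file strategy can be sketched as a simple fallback. This is a hedged illustration only: the `VectorizationStrategy` fields (`chunk_size`, `chunk_overlap`) and both function names are hypothetical, since the actual strategy options are not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VectorizationStrategy:
    # Hypothetical settings; the real strategy options may differ.
    chunk_size: int
    chunk_overlap: int


def effective_strategy(
    library_default: VectorizationStrategy,
    file_override: Optional[VectorizationStrategy] = None,
) -> VectorizationStrategy:
    """Use the file's own strategy if set, otherwise the library's."""
    return file_override if file_override is not None else library_default


def needs_revectorization(
    old: VectorizationStrategy, new: VectorizationStrategy
) -> bool:
    """A file is re-vectorized only when its effective strategy changes."""
    return old != new
```

In this sketch, setting an override that equals the library's strategy changes nothing, while any differing override marks that one file for re-vectorization.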
