# Deliverable 1

Build a prototype, deploy it to a Hugging Face Space, and create a short (< 2 min) Loom video demonstrating some initial testing inputs and outputs.

# Deliverable 2

How did you choose your stack, and why did you select each tool the way you did?

In choosing my stack, I aimed to balance performance, scalability, and the ability to handle both structured and unstructured data. The tools I selected provide the flexibility to handle diverse document types while ensuring effective chunking and retrieval. Here's the breakdown of each tool and why it was chosen:

## 1. **PyMuPDFLoader**:
- **Reason for Selection**:
  - PyMuPDFLoader is fast, lightweight, and efficient at parsing PDFs. It offers good performance in both speed and memory usage, which is crucial when dealing with large documents.
  - In my use case, the AI Bill of Rights and NIST RMF documents are both structured and relatively dense. PyMuPDFLoader allows for quick loading and extraction of content without compromising accuracy.
- **Why Not Another Option**:
  - I considered `PyPDFium2Loader`, but it was noticeably slower in my tests (2 minutes 30 seconds to load the same document), and since the output quality of the two loaders was almost identical, the performance difference didn't justify the switch.

## 2. **RecursiveCharacterTextSplitter**:
- **Reason for Selection**:
  - This splitter chunks documents into manageable pieces while preserving context and meaning. It is particularly effective because it avoids breaking the text in the middle of a thought, keeping the chunks semantically coherent.
  - It also lets me adjust chunk sizes dynamically based on the document's structure. For instance, with the Blueprint for an AI Bill of Rights, I can chunk by sections, principles, and subsections while still applying the recursive-character strategy within each chunk.
- **Why Not a Static Chunking Strategy**:
  - Simple page- or sentence-based chunking would often lose the surrounding context. The recursive strategy produces more comprehensive chunks, making retrieval more effective (see the sketch after this section).
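To make these first two choices concrete, here is a minimal sketch of the loading and splitting steps. It assumes the current `langchain-community` and `langchain-text-splitters` packages; the file paths and the chunk-size values are hypothetical, not tuned settings.

```python
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load each PDF; PyMuPDFLoader returns one Document per page.
docs = []
for path in ["ai_bill_of_rights.pdf", "nist_ai_rmf.pdf"]:  # hypothetical paths
    docs.extend(PyMuPDFLoader(path).load())

# Recursive splitting falls back from "\n\n" to "\n" to spaces, so chunks
# tend to break at paragraph or sentence boundaries rather than mid-thought.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # illustrative values; tune per document structure
    chunk_overlap=100,
)
chunks = splitter.split_documents(docs)
print(f"{len(docs)} pages -> {len(chunks)} chunks")
```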
## 3. **SemanticChunker**:
- **Reason for Selection**:
  - This chunker produces semantically rich divisions of the text, so chunks are more likely to contain entire ideas or thoughts. This enhances coherence and leads to better retrieval outcomes.
  - It is also adaptable and can be used to refine chunking strategies for documents that are not as well structured as the AI Bill of Rights or NIST RMF documents.
- **Why This Over Simple Chunking**:
  - Semantic chunking provides better retrieval precision, especially for complex questions, because the context in each chunk is more meaningful. This matters most when documents do not follow a clear structure (see the first closing sketch below).

## 4. **Snowflake-Arctic-Embed-L Embedding Model**:
- **Reason for Selection**:
  - This model offers a good balance between performance and accuracy. With 334 million parameters and 1024-dimension embeddings, it is a comparatively small model, yet it ranks competitively on the MTEB leaderboard (27th), suggesting its efficiency.
  - In a retrieval-augmented generation (RAG) setup, the embedding model plays a critical role in vectorizing chunks accurately, and this model is performant for both speed and relevance in retrieval tasks.
- **Why Not a Larger Model**:
  - Larger models with more parameters may improve accuracy slightly but come at a much higher computational cost. For enterprise applications, the smaller yet efficient `snowflake-arctic-embed-l` model provides a good trade-off between speed and accuracy, allowing for scalability without major infrastructure demands.

## 5. **Context Enrichment and Contextual Compression**:
- **Reason for Selection**:
  - These advanced retrieval techniques improve the quality of responses by improving the retrieval step itself. Context enrichment allows for richer, more informed responses, while contextual compression keeps responses concise and relevant.
- **Why Not Plain Retrieval**:
  - Plain retrieval can surface irrelevant or verbose results. Applying these techniques makes the retrieval process generate more targeted and meaningful answers, which is essential for complex or nuanced questions (e.g., AI ethics, privacy, and risk management). The second closing sketch shows how compression wraps the base retriever.

## 6. **Grouping by Similar Context**:
- **Reason for Selection**:
  - Grouping documents by similar context improves retrieval accuracy. When a user asks about a specific topic like data privacy, the system can retrieve relevant chunks from different documents (e.g., both the AI Bill of Rights and the NIST RMF), ensuring that responses are comprehensive.
- **Why This Strategy**:
  - Grouping chunks by similar context ensures that even when documents are diverse or cover multiple topics, the right content is prioritized during retrieval. This improves answer quality, especially for detailed or nuanced questions.

## 7. **Vector Store (Qdrant)**:
- **Reason for Selection**:
  - A vector store (Qdrant, in this prototype) enables efficient storage and retrieval of embeddings, ensuring fast lookups and scalable operations. It also supports advanced similarity search, so the most relevant chunks are retrieved based on the query embeddings.
- **Why Not Traditional Indexing**:
  - Traditional indexing methods handle semantic content poorly and would not allow the nuanced retrieval that RAG applications require. Vector stores are built around embeddings and can scale with large datasets.

## Conclusion:
Each tool in this stack was chosen to ensure **speed**, **scalability**, and **accuracy** when dealing with structured and unstructured documents. By balancing performance with precision (e.g., fast document loading via PyMuPDFLoader, efficient chunking strategies, and a small but powerful embedding model), this stack provides a robust framework for building ethical and useful AI applications.
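To show how the remaining pieces fit together, here is a minimal indexing sketch covering the semantic chunker, the embedding model, and Qdrant. It continues from the `docs` list in the earlier sketch and assumes the `langchain-experimental`, `langchain-huggingface`, and `langchain-qdrant` integration packages; the collection name is hypothetical.

```python
from langchain_experimental.text_splitter import SemanticChunker
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_qdrant import QdrantVectorStore

# Embed with snowflake-arctic-embed-l (1024-dimension vectors).
embeddings = HuggingFaceEmbeddings(model_name="Snowflake/snowflake-arctic-embed-l")

# SemanticChunker splits where the embedding distance between adjacent
# sentences spikes, so each chunk tends to hold one coherent idea.
semantic_chunks = SemanticChunker(embeddings).split_documents(docs)

# Index into an in-memory Qdrant instance (fine for a prototype; point
# `location`/`url` at a running Qdrant server in production).
vectorstore = QdrantVectorStore.from_documents(
    semantic_chunks,
    embedding=embeddings,
    location=":memory:",
    collection_name="ai_policy_docs",  # hypothetical collection name
)
```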
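And a sketch of the contextual-compression step: an LLM-backed extractor trims each retrieved chunk down to the passages relevant to the query before they reach the generator. `ChatOpenAI` is an assumption here; any LangChain chat model can back the extractor.

```python
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor
from langchain_openai import ChatOpenAI  # assumes OPENAI_API_KEY is set

# The extractor asks a chat model to keep only the query-relevant text
# from each retrieved chunk.
compressor = LLMChainExtractor.from_llm(
    ChatOpenAI(model="gpt-4o-mini", temperature=0)
)

# Wrap the Qdrant retriever from the previous sketch with compression.
retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectorstore.as_retriever(search_kwargs={"k": 8}),
)

# Returns compressed, query-focused documents ready for the generator.
relevant_docs = retriever.invoke(
    "What does the AI Bill of Rights say about data privacy?"
)
```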