Commit 1780da1 by d-delaurier (1 parent: db673fc)

Create README.md

  1. README.md +43 -0
README.md ADDED
@@ -0,0 +1,43 @@
+ # Redactable-LLM
+ A high-level overview of integrating multiple open-source large language models (LLMs) within the AutoGen framework:
+
+ ### Development of Custom Agents
+ - **Agent Design**: Agent tasks include NLP, NER, and PII identification, interpreting natural language commands, executing document redaction, and final verification.
+ - **Customization**: Custom agents are trained on specific tasks for each stage of the redaction process.
+ - **Human Interaction** (Optional): Implement features for seamless human-agent interaction, so users can enter commands and queries in natural language.
+
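The four agent tasks listed above can be pictured as a small pipeline. The sketch below is only an illustration under loud assumptions: regular expressions stand in for the NER/PII model, keyword matching stands in for LLM-based command interpretation, and every name (`PII_PATTERNS`, `identify_pii`, `interpret_command`, `redact`, `verify`) is hypothetical rather than part of this project or of AutoGen.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns standing in for a trained NER/PII model.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class Span:
    label: str
    start: int
    end: int


def identify_pii(text: str) -> list[Span]:
    """Task 1: find candidate PII spans, ordered by position."""
    spans = [
        Span(label, match.start(), match.end())
        for label, pattern in PII_PATTERNS.items()
        for match in pattern.finditer(text)
    ]
    return sorted(spans, key=lambda span: span.start)


def interpret_command(command: str) -> set[str]:
    """Task 2: map a user command to PII labels (keyword-matching stand-in)."""
    wanted = {label for label in PII_PATTERNS if label.lower() in command.lower()}
    # If no label is named, assume the user wants everything redacted.
    return wanted or set(PII_PATTERNS)


def redact(text: str, spans: list[Span], labels: set[str]) -> str:
    """Task 3: replace each selected span with a bracketed label."""
    pieces, cursor = [], 0
    for span in spans:
        if span.label not in labels or span.start < cursor:
            continue  # skip unselected or overlapping spans
        pieces.append(text[cursor:span.start])
        pieces.append(f"[{span.label}]")
        cursor = span.end
    pieces.append(text[cursor:])
    return "".join(pieces)


def verify(redacted: str, labels: set[str]) -> bool:
    """Task 4: confirm that no selected PII is still detectable."""
    return not any(span.label in labels for span in identify_pii(redacted))
```

In a real deployment each function would presumably be backed by its own agent; the sketch only shows how the outputs of one task feed the next.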
+ ### LLM & VLLM AutoGen Integration
+
+ - **Model Selection**: Automatic, task-dependent agent selection.
+ - **Enhanced Inference**: LLM inference features for better performance, including tuning, caching, error handling, and templating.
+ - **Quality Control**: Vision agents analyze redacted documents using Set-of-Mark (SoM) prompting. Rejected documents are reprocessed and reviewed.
+
+ ![AutoGen Agents](https://i.imgur.com/aFgV7yd.png)
+
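Task-dependent selection, caching, and the reject-and-reprocess loop described above could be sketched roughly as follows. This is not AutoGen's API: the registry, the stubbed inference call, and the names `MODEL_FOR_TASK`, `select_model`, `cached_inference`, and `redact_with_review` are all assumptions made for illustration, and the model names are simply labels taken from the links in this README.

```python
from functools import lru_cache
from typing import Callable

# Hypothetical task-to-model registry (labels only, no models are loaded).
MODEL_FOR_TASK = {
    "pii_detection": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "visual_review": "QwenLM/Qwen-VL",
}
DEFAULT_MODEL = "mistralai/Mixtral-8x7B-Instruct-v0.1"


def select_model(task: str) -> str:
    """Pick a model label for a task, falling back to the default."""
    return MODEL_FOR_TASK.get(task, DEFAULT_MODEL)


@lru_cache(maxsize=256)
def cached_inference(model: str, prompt: str) -> str:
    """Stub for a model call; repeated (model, prompt) pairs hit the cache."""
    return f"[{model}] response to: {prompt}"


def redact_with_review(
    document: str,
    redact_fn: Callable[[str], str],
    review_fn: Callable[[str], bool],
    max_attempts: int = 3,
) -> tuple[str, bool]:
    """Redact, then review; a rejected result is reprocessed up to max_attempts.

    Returns the last candidate and whether the reviewer accepted it.
    """
    candidate = document
    for _ in range(max_attempts):
        candidate = redact_fn(candidate)
        if review_fn(candidate):
            return candidate, True
    return candidate, False  # still rejected: hand off, e.g. to a human
```

Here `review_fn` plays the role of the vision agent; in this sketch it is any callable that returns `True` when the redacted document is acceptable.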
+ ### System Optimization
+ - **Workflow Automation**: Automate the redaction workflow using a blend of LLMs, custom agents, and human input to detect and redact sensitive information efficiently.
+ - **Performance Maximization**: Optimize the system for both efficiency and accuracy, using AutoGen's features for managing complex workflows.
+
+ ### User Interface Development
+ - **Interface Design**: Develop a user-friendly interface that lets non-technical users interact with the system through natural language prompts.
+ - **Feedback Integration**: Implement a feedback loop that continuously refines the system's accuracy and user-friendliness based on user input.
+ - **User Knowledgebase** (Optional): User account, profile, and domain knowledge are accessible to the `Research` agent for personalized interaction and results.
+
+ ### Training, Testing and Validation
+ - **Model Training**: Develop new datasets focused on redaction-related document understanding.
+ - **Unit Testing**: Conduct extensive unit tests to ensure that individual system components function correctly.
+ - **System Testing**: Perform comprehensive end-to-end testing to validate the entire redaction process, from user input to output.
+ - **User Trials**: Run user trials to gather feedback and make the necessary system adjustments.
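As an illustration of the unit-testing step, here is a minimal sketch that uses Python's standard `unittest` module. The component under test, `redact_ssn`, is a hypothetical stand-in for one redaction component; the project's real components are not shown in this README.

```python
import re
import unittest

# Hypothetical component under test: masks US-style SSNs.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact_ssn(text: str) -> str:
    """Replace every SSN-shaped token with a bracketed label."""
    return SSN_PATTERN.sub("[SSN]", text)


class RedactSsnTest(unittest.TestCase):
    def test_replaces_every_ssn(self):
        self.assertEqual(
            redact_ssn("A 123-45-6789, B 987-65-4321"), "A [SSN], B [SSN]"
        )

    def test_leaves_other_text_untouched(self):
        self.assertEqual(redact_ssn("Order 12345 shipped"), "Order 12345 shipped")

    def test_is_idempotent(self):
        # Redacting an already redacted document should change nothing.
        once = redact_ssn("SSN 123-45-6789")
        self.assertEqual(redact_ssn(once), once)
```

Saved in a test module, these cases can be run with `python -m unittest`.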
+ ---
+
+ - #### Mistral AI (LLM)
+ [Paper](https://mistral.ai/news/mixtral-of-experts/) | [Model](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
+
+ - #### QwenLM (VLLM)
+ [Paper](https://arxiv.org/abs/2308.12966) | [Code](https://github.com/QwenLM/Qwen-VL?tab=readme-ov-file) | [Paper: Set-of-Mark Prompting](https://arxiv.org/abs/2310.11441)
+
+ - #### AutoGen
+ [Paper](https://arxiv.org/abs/2308.08155) | [Code](https://github.com/microsoft/autogen/tree/main)
+
+ - #### Gretel AI (Synthetic Dataset Generation)
+ [Model Page](https://gretel.ai/solutions/public-sector) | [Code](https://github.com/gretelai) | [Paper: Textbooks Are All You Need II](https://arxiv.org/abs/2309.05463)