from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate

chain_of_density_prompt_template = """
Research Paper: {paper}

You will generate increasingly concise, entity-dense summaries of the above research paper.

Repeat the following 2 steps 10 times.

Step 1. Identify 1-3 informative Entities ('; ' delimited) from the research paper that are missing from the previously generated summary. These entities should be key components such as research questions, methodologies, findings, theoretical contributions, or implications.
Step 2. Write a new, denser summary of identical length which covers every entity and detail from the previous summary plus the Missing Entities.

A Missing Entity is:
- Relevant: critical to understanding the paper’s contribution.
- Specific: descriptive yet concise (5 words or fewer).
- Novel: not included in the previous summary.
- Faithful: accurately represented in the research paper.
- Anywhere: can be found anywhere in the research paper.

Guidelines:
- The first summary should be long (4-5 sentences, ~100 words) yet focus on general information about the research paper, including its broad topic and objectives, without going into detail.
- Avoid using verbose language and fillers (e.g., 'This research paper discusses') to reach the word count.
- Strive for efficiency in word use: rewrite the previous summary to improve readability and make space for additional entities.
- Employ strategies such as fusion (combining entities), compression (shortening descriptions), and removal of uninformative phrases to make space for new entities.
- The summaries should evolve to be highly dense and concise yet remain self-contained, meaning they can be understood without reading the full paper.
- Missing entities should be integrated seamlessly into the new summary.
- Never omit entities from previous summaries. If space is a challenge, incorporate fewer new entities but maintain the same word count.

Remember, use the exact same number of words for each summary.

The JSON output should be a list (length 10) of dictionaries. Each dictionary must have two keys: 'missing_entities', listing the 1-3 entities added in each round; and 'denser_summary', presenting the new summary that integrates these entities without increasing the length.
"""

# Parses the model's raw text response into Python objects (a list of dicts).
chain_of_density_output_parser = JsonOutputParser()

chain_of_density_prompt = PromptTemplate(
    template=chain_of_density_prompt_template,
    input_variables=["paper"],
)


def chain_of_density_chain(model):
    """Build the Chain of Density pipeline: prompt -> model -> JSON parser."""
    return chain_of_density_prompt | model | chain_of_density_output_parser
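

# A minimal, hypothetical sketch (not part of the original module) of how the
# parsed output could be checked against the format the prompt asks for: a list
# of 10 dictionaries, each with the keys 'missing_entities' and
# 'denser_summary'. The helper names below are illustrative assumptions. Models
# do not always follow the requested format, so a check along these lines may
# be useful before relying on the result.
def validate_chain_of_density_output(rounds, expected_rounds=10):
    """Return a list of problems found in the parsed output (empty if none)."""
    if not isinstance(rounds, list):
        return ["output is not a list"]
    problems = []
    if len(rounds) != expected_rounds:
        problems.append(f"expected {expected_rounds} rounds, got {len(rounds)}")
    for index, item in enumerate(rounds, start=1):
        if not isinstance(item, dict):
            problems.append(f"round {index} is not a dictionary")
            continue
        for key in ("missing_entities", "denser_summary"):
            if key not in item:
                problems.append(f"round {index} is missing key '{key}'")
    return problems


def final_summary(rounds):
    """Return the last (densest) summary from an already validated output."""
    return rounds[-1]["denser_summary"]


# Possible usage, assuming `model` is any LangChain chat model and `paper_text`
# holds the paper as a string (both are placeholders, not defined here):
#
#     rounds = chain_of_density_chain(model).invoke({"paper": paper_text})
#     if not validate_chain_of_density_output(rounds):
#         print(final_summary(rounds))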