from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import PromptTemplate

# Chain-of-density prompt for summarizing a research paper.
# {paper} is the only template variable.
chain_of_density_prompt_template = """
Research Paper: {paper}

You will generate increasingly concise, entity-dense summaries of the above research paper.

Repeat the following 2 steps 10 times.

Step 1. Identify 1-3 informative Entities ('; ' delimited) from the research paper that are missing from the previously generated summary. These entities should be key components such as research questions, methodologies, findings, theoretical contributions, or implications.
Step 2. Write a new, denser summary of identical length which covers every entity and detail from the previous summary plus the Missing Entities.

A Missing Entity is:
- Relevant: critical to understanding the paper’s contribution.
- Specific: descriptive yet concise (5 words or fewer).
- Novel: not included in the previous summary.
- Faithful: accurately represented in the research paper.
- Anywhere: can be found anywhere in the research paper.

Guidelines:
- The first summary should be long (4-5 sentences, ~100 words) yet focus on general information about the research paper, including its broad topic and objectives, without going into detail.
- Avoid using verbose language and fillers (e.g., 'This research paper discusses') to reach the word count.
- Strive for efficiency in word use: rewrite the previous summary to improve readability and make space for additional entities.
- Employ strategies such as fusion (combining entities), compression (shortening descriptions), and removal of uninformative phrases to make space for new entities.
- The summaries should evolve to be highly dense and concise yet remain self-contained, meaning they can be understood without reading the full paper.
- Missing entities should be integrated seamlessly into the new summary.
- Never omit entities from previous summaries. If space is a challenge, incorporate fewer new entities but maintain the same word count.

Remember, use the exact same number of words for each summary.

The JSON output should be a list (length 10) of dictionaries. Each dictionary must have two keys: 'missing_entities', listing the 1-3 entities added in each round; and 'denser_summary', presenting the new summary that integrates these entities without increasing the length.
"""

# Parses the model's JSON reply into Python objects (here, a list of dicts).
chain_of_density_output_parser = JsonOutputParser()

chain_of_density_prompt = PromptTemplate(
    template=chain_of_density_prompt_template,
    input_variables=["paper"],
)


def chain_of_density_chain(model):
    """Build the prompt | model | JSON parser pipeline for the given model."""
    return chain_of_density_prompt | model | chain_of_density_output_parser
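

# ---------------------------------------------------------------------------
# Illustrative addition (not part of the original module).
#
# The prompt above asks for a list of 10 dictionaries, each with the keys
# 'missing_entities' and 'denser_summary', and for every summary to use the
# same number of words. Models do not always comply, so a caller may want to
# check the parsed result before using it. The sketch below is one possible
# check under those assumptions. The helper name `validate_density_output`
# and its `expected_rounds` / `word_tolerance` parameters are hypothetical;
# they are not part of LangChain. Word counts use whitespace splitting, which
# is only an approximation of how a model counts words.
# ---------------------------------------------------------------------------
from typing import Any, List


def validate_density_output(
    rounds: Any, expected_rounds: int = 10, word_tolerance: int = 0
) -> List[str]:
    """Return a list of problems found in the parsed output (empty if none)."""
    if not isinstance(rounds, list):
        return ["expected a list, got " + type(rounds).__name__]

    problems: List[str] = []
    if len(rounds) != expected_rounds:
        problems.append(
            "expected %d rounds, got %d" % (expected_rounds, len(rounds))
        )

    required = {"missing_entities", "denser_summary"}
    baseline = None  # word count of the first valid summary
    for i, item in enumerate(rounds, start=1):
        if not isinstance(item, dict):
            problems.append("round %d: expected a dict" % i)
            continue
        missing = required - set(item)
        if missing:
            problems.append("round %d: missing keys %s" % (i, sorted(missing)))
            continue
        words = len(str(item["denser_summary"]).split())
        if baseline is None:
            baseline = words
        elif abs(words - baseline) > word_tolerance:
            problems.append(
                "round %d: %d words, expected %d" % (i, words, baseline)
            )
    return problems


# Possible usage, assuming `model` is any LangChain-compatible chat model and
# `paper_text` holds the paper as a string (both names are placeholders):
#
#     result = chain_of_density_chain(model).invoke({"paper": paper_text})
#     issues = validate_density_output(result, word_tolerance=5)
#     if issues:
#         print("\n".join(issues))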