mahnerak committed
Commit • ce00289
Parent(s): none
Initial Commit
Files changed:
- .dockerignore +3 -0
- .flake8 +2 -0
- .gitignore +7 -0
- CODE_OF_CONDUCT.md +80 -0
- CONTRIBUTING.md +31 -0
- Dockerfile +42 -0
- LICENSE +399 -0
- README.md +88 -0
- config/docker_hosting.json +13 -0
- config/docker_local.json +25 -0
- config/local.json +47 -0
- env.yaml +27 -0
- llm_transparency_tool/__init__.py +5 -0
- llm_transparency_tool/components/__init__.py +111 -0
- llm_transparency_tool/components/frontend/.env +6 -0
- llm_transparency_tool/components/frontend/.prettierrc +5 -0
- llm_transparency_tool/components/frontend/package.json +39 -0
- llm_transparency_tool/components/frontend/public/index.html +15 -0
- llm_transparency_tool/components/frontend/src/ContributionGraph.tsx +517 -0
- llm_transparency_tool/components/frontend/src/LlmViewer.css +77 -0
- llm_transparency_tool/components/frontend/src/Selector.tsx +154 -0
- llm_transparency_tool/components/frontend/src/common.tsx +17 -0
- llm_transparency_tool/components/frontend/src/index.tsx +39 -0
- llm_transparency_tool/components/frontend/src/react-app-env.d.ts +1 -0
- llm_transparency_tool/components/frontend/tsconfig.json +19 -0
- llm_transparency_tool/models/__init__.py +5 -0
- llm_transparency_tool/models/test_tlens_model.py +162 -0
- llm_transparency_tool/models/tlens_model.py +303 -0
- llm_transparency_tool/models/transparent_llm.py +199 -0
- llm_transparency_tool/routes/__init__.py +5 -0
- llm_transparency_tool/routes/contributions.py +201 -0
- llm_transparency_tool/routes/graph.py +163 -0
- llm_transparency_tool/routes/graph_node.py +90 -0
- llm_transparency_tool/routes/test_contributions.py +148 -0
- llm_transparency_tool/server/app.py +659 -0
- llm_transparency_tool/server/graph_selection.py +56 -0
- llm_transparency_tool/server/monitor.py +99 -0
- llm_transparency_tool/server/styles.py +107 -0
- llm_transparency_tool/server/utils.py +133 -0
- pyproject.toml +2 -0
- sample_input.txt +3 -0
- setup.py +13 -0
.dockerignore
ADDED
@@ -0,0 +1,3 @@
+**/.git
+**/node_modules
+**/.mypy_cache
.flake8
ADDED
@@ -0,0 +1,2 @@
+[flake8]
+max-line-length = 120
.gitignore
ADDED
@@ -0,0 +1,7 @@
+**/frontend/node_modules*
+**/frontend/build/
+**/frontend/.yarn*
+.vscode/
+.mypy_cache/
+__pycache__/
+.DS_Store
CODE_OF_CONDUCT.md
ADDED
@@ -0,0 +1,80 @@
+# Code of Conduct
+
+## Our Pledge
+
+In the interest of fostering an open and welcoming environment, we as
+contributors and maintainers pledge to make participation in our project and
+our community a harassment-free experience for everyone, regardless of age, body
+size, disability, ethnicity, sex characteristics, gender identity and expression,
+level of experience, education, socio-economic status, nationality, personal
+appearance, race, religion, or sexual identity and orientation.
+
+## Our Standards
+
+Examples of behavior that contributes to creating a positive environment
+include:
+
+* Using welcoming and inclusive language
+* Being respectful of differing viewpoints and experiences
+* Gracefully accepting constructive criticism
+* Focusing on what is best for the community
+* Showing empathy towards other community members
+
+Examples of unacceptable behavior by participants include:
+
+* The use of sexualized language or imagery and unwelcome sexual attention or
+  advances
+* Trolling, insulting/derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or electronic
+  address, without explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+
+## Our Responsibilities
+
+Project maintainers are responsible for clarifying the standards of acceptable
+behavior and are expected to take appropriate and fair corrective action in
+response to any instances of unacceptable behavior.
+
+Project maintainers have the right and responsibility to remove, edit, or
+reject comments, commits, code, wiki edits, issues, and other contributions
+that are not aligned to this Code of Conduct, or to ban temporarily or
+permanently any contributor for other behaviors that they deem inappropriate,
+threatening, offensive, or harmful.
+
+## Scope
+
+This Code of Conduct applies within all project spaces, and it also applies when
+an individual is representing the project or its community in public spaces.
+Examples of representing a project or community include using an official
+project e-mail address, posting via an official social media account, or acting
+as an appointed representative at an online or offline event. Representation of
+a project may be further defined and clarified by project maintainers.
+
+This Code of Conduct also applies outside the project spaces when there is a
+reasonable belief that an individual's behavior may have a negative impact on
+the project or its community.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported by contacting the project team at <[email protected]>. All
+complaints will be reviewed and investigated and will result in a response that
+is deemed necessary and appropriate to the circumstances. The project team is
+obligated to maintain confidentiality with regard to the reporter of an incident.
+Further details of specific enforcement policies may be posted separately.
+
+Project maintainers who do not follow or enforce the Code of Conduct in good
+faith may face temporary or permanent repercussions as determined by other
+members of the project's leadership.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
+available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
+
+[homepage]: https://www.contributor-covenant.org
+
+For answers to common questions about this code of conduct, see
+https://www.contributor-covenant.org/faq
CONTRIBUTING.md
ADDED
@@ -0,0 +1,31 @@
+# Contributing to llm-transparency-tool
+We want to make contributing to this project as easy and transparent as
+possible.
+
+## Pull Requests
+We actively welcome your pull requests.
+
+1. Fork the repo and create your branch from `main`.
+2. If you've added code that should be tested, add tests.
+3. If you've changed APIs, update the documentation.
+4. Ensure the test suite passes.
+5. Make sure your code lints.
+6. If you haven't already, complete the Contributor License Agreement ("CLA").
+
+## Contributor License Agreement ("CLA")
+In order to accept your pull request, we need you to submit a CLA. You only need
+to do this once to work on any of Facebook's open source projects.
+
+Complete your CLA here: <https://code.facebook.com/cla>
+
+## Issues
+We use GitHub issues to track public bugs. Please ensure your description is
+clear and has sufficient instructions to be able to reproduce the issue.
+
+Facebook has a [bounty program](https://www.facebook.com/whitehat/) for the safe
+disclosure of security bugs. In those cases, please go through the process
+outlined on that page and do not file a public issue.
+
+## License
+By contributing to llm-transparency-tool, you agree that your contributions will be licensed
+under the LICENSE file in the root directory of this source tree.
Dockerfile
ADDED
@@ -0,0 +1,42 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.
+
+FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu22.04
+
+RUN apt-get update && apt-get install -y \
+    wget \
+    git \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
+
+RUN useradd -m -u 1000 user
+USER user
+
+ENV HOME=/home/user
+
+RUN wget -P /tmp \
+    "https://github.com/conda-forge/miniforge/releases/download/23.11.0-0/Mambaforge-23.11.0-0-Linux-x86_64.sh" \
+    && bash /tmp/Mambaforge-23.11.0-0-Linux-x86_64.sh -b -p $HOME/mambaforge3 \
+    && rm /tmp/Mambaforge-23.11.0-0-Linux-x86_64.sh
+ENV PATH $HOME/mambaforge3/bin:$PATH
+
+WORKDIR $HOME
+
+ENV REPO=$HOME/llm-transparency-tool
+COPY --chown=user . $REPO
+
+WORKDIR $REPO
+
+RUN mamba env create --name llmtt -f env.yaml -y
+ENV PATH $HOME/mambaforge3/envs/llmtt/bin:$PATH
+RUN pip install -e .
+
+RUN cd llm_transparency_tool/components/frontend \
+    && yarn install \
+    && yarn build
+
+EXPOSE 7860
+CMD ["streamlit", "run", "llm_transparency_tool/server/app.py", "--server.port=7860", "--server.address=0.0.0.0", "--theme.font=Inconsolata", "--", "config/docker_hosting.json"]
LICENSE
ADDED
@@ -0,0 +1,399 @@
+Attribution-NonCommercial 4.0 International
+
+=======================================================================
+
+Creative Commons Corporation ("Creative Commons") is not a law firm and
+does not provide legal services or legal advice. Distribution of
+Creative Commons public licenses does not create a lawyer-client or
+other relationship. Creative Commons makes its licenses and related
+information available on an "as-is" basis. Creative Commons gives no
+warranties regarding its licenses, any material licensed under their
+terms and conditions, or any related information. Creative Commons
+disclaims all liability for damages resulting from their use to the
+fullest extent possible.
+
+Using Creative Commons Public Licenses
+
+Creative Commons public licenses provide a standard set of terms and
+conditions that creators and other rights holders may use to share
+original works of authorship and other material subject to copyright
+and certain other rights specified in the public license below. The
+following considerations are for informational purposes only, are not
+exhaustive, and do not form part of our licenses.
+
+     Considerations for licensors: Our public licenses are
+     intended for use by those authorized to give the public
+     permission to use material in ways otherwise restricted by
+     copyright and certain other rights. Our licenses are
+     irrevocable. Licensors should read and understand the terms
+     and conditions of the license they choose before applying it.
+     Licensors should also secure all rights necessary before
+     applying our licenses so that the public can reuse the
+     material as expected. Licensors should clearly mark any
+     material not subject to the license. This includes other CC-
+     licensed material, or material used under an exception or
+     limitation to copyright. More considerations for licensors:
+     wiki.creativecommons.org/Considerations_for_licensors
+
+     Considerations for the public: By using one of our public
+     licenses, a licensor grants the public permission to use the
+     licensed material under specified terms and conditions. If
+     the licensor's permission is not necessary for any reason--for
+     example, because of any applicable exception or limitation to
+     copyright--then that use is not regulated by the license. Our
+     licenses grant only permissions under copyright and certain
+     other rights that a licensor has authority to grant. Use of
+     the licensed material may still be restricted for other
+     reasons, including because others have copyright or other
+     rights in the material. A licensor may make special requests,
+     such as asking that all changes be marked or described.
+     Although not required by our licenses, you are encouraged to
+     respect those requests where reasonable. More_considerations
+     for the public:
+     wiki.creativecommons.org/Considerations_for_licensees
+
+=======================================================================
+
+Creative Commons Attribution-NonCommercial 4.0 International Public
+License
+
+By exercising the Licensed Rights (defined below), You accept and agree
+to be bound by the terms and conditions of this Creative Commons
+Attribution-NonCommercial 4.0 International Public License ("Public
+License"). To the extent this Public License may be interpreted as a
+contract, You are granted the Licensed Rights in consideration of Your
+acceptance of these terms and conditions, and the Licensor grants You
+such rights in consideration of benefits the Licensor receives from
+making the Licensed Material available under these terms and
+conditions.
+
+Section 1 -- Definitions.
+
+  a. Adapted Material means material subject to Copyright and Similar
+     Rights that is derived from or based upon the Licensed Material
+     and in which the Licensed Material is translated, altered,
+     arranged, transformed, or otherwise modified in a manner requiring
+     permission under the Copyright and Similar Rights held by the
+     Licensor. For purposes of this Public License, where the Licensed
+     Material is a musical work, performance, or sound recording,
+     Adapted Material is always produced where the Licensed Material is
+     synched in timed relation with a moving image.
+
+  b. Adapter's License means the license You apply to Your Copyright
+     and Similar Rights in Your contributions to Adapted Material in
+     accordance with the terms and conditions of this Public License.
+
+  c. Copyright and Similar Rights means copyright and/or similar rights
+     closely related to copyright including, without limitation,
+     performance, broadcast, sound recording, and Sui Generis Database
+     Rights, without regard to how the rights are labeled or
+     categorized. For purposes of this Public License, the rights
+     specified in Section 2(b)(1)-(2) are not Copyright and Similar
+     Rights.
+  d. Effective Technological Measures means those measures that, in the
+     absence of proper authority, may not be circumvented under laws
+     fulfilling obligations under Article 11 of the WIPO Copyright
+     Treaty adopted on December 20, 1996, and/or similar international
+     agreements.
+
+  e. Exceptions and Limitations means fair use, fair dealing, and/or
+     any other exception or limitation to Copyright and Similar Rights
+     that applies to Your use of the Licensed Material.
+
+  f. Licensed Material means the artistic or literary work, database,
+     or other material to which the Licensor applied this Public
+     License.
+
+  g. Licensed Rights means the rights granted to You subject to the
+     terms and conditions of this Public License, which are limited to
+     all Copyright and Similar Rights that apply to Your use of the
+     Licensed Material and that the Licensor has authority to license.
+
+  h. Licensor means the individual(s) or entity(ies) granting rights
+     under this Public License.
+
+  i. NonCommercial means not primarily intended for or directed towards
+     commercial advantage or monetary compensation. For purposes of
+     this Public License, the exchange of the Licensed Material for
+     other material subject to Copyright and Similar Rights by digital
+     file-sharing or similar means is NonCommercial provided there is
+     no payment of monetary compensation in connection with the
+     exchange.
+
+  j. Share means to provide material to the public by any means or
+     process that requires permission under the Licensed Rights, such
+     as reproduction, public display, public performance, distribution,
+     dissemination, communication, or importation, and to make material
+     available to the public including in ways that members of the
+     public may access the material from a place and at a time
+     individually chosen by them.
+
+  k. Sui Generis Database Rights means rights other than copyright
+     resulting from Directive 96/9/EC of the European Parliament and of
+     the Council of 11 March 1996 on the legal protection of databases,
+     as amended and/or succeeded, as well as other essentially
+     equivalent rights anywhere in the world.
+
+  l. You means the individual or entity exercising the Licensed Rights
+     under this Public License. Your has a corresponding meaning.
+
+Section 2 -- Scope.
+
+  a. License grant.
+
+       1. Subject to the terms and conditions of this Public License,
+          the Licensor hereby grants You a worldwide, royalty-free,
+          non-sublicensable, non-exclusive, irrevocable license to
+          exercise the Licensed Rights in the Licensed Material to:
+
+            a. reproduce and Share the Licensed Material, in whole or
+               in part, for NonCommercial purposes only; and
+
+            b. produce, reproduce, and Share Adapted Material for
+               NonCommercial purposes only.
+
+       2. Exceptions and Limitations. For the avoidance of doubt, where
+          Exceptions and Limitations apply to Your use, this Public
+          License does not apply, and You do not need to comply with
+          its terms and conditions.
+
+       3. Term. The term of this Public License is specified in Section
+          6(a).
+
+       4. Media and formats; technical modifications allowed. The
+          Licensor authorizes You to exercise the Licensed Rights in
+          all media and formats whether now known or hereafter created,
+          and to make technical modifications necessary to do so. The
+          Licensor waives and/or agrees not to assert any right or
+          authority to forbid You from making technical modifications
+          necessary to exercise the Licensed Rights, including
+          technical modifications necessary to circumvent Effective
+          Technological Measures. For purposes of this Public License,
+          simply making modifications authorized by this Section 2(a)
+          (4) never produces Adapted Material.
+
+       5. Downstream recipients.
+
+            a. Offer from the Licensor -- Licensed Material. Every
+               recipient of the Licensed Material automatically
+               receives an offer from the Licensor to exercise the
+               Licensed Rights under the terms and conditions of this
+               Public License.
+
+            b. No downstream restrictions. You may not offer or impose
+               any additional or different terms or conditions on, or
+               apply any Effective Technological Measures to, the
+               Licensed Material if doing so restricts exercise of the
+               Licensed Rights by any recipient of the Licensed
+               Material.
+
+       6. No endorsement. Nothing in this Public License constitutes or
+          may be construed as permission to assert or imply that You
+          are, or that Your use of the Licensed Material is, connected
+          with, or sponsored, endorsed, or granted official status by,
+          the Licensor or others designated to receive attribution as
+          provided in Section 3(a)(1)(A)(i).
+
+  b. Other rights.
+
+       1. Moral rights, such as the right of integrity, are not
+          licensed under this Public License, nor are publicity,
+          privacy, and/or other similar personality rights; however, to
+          the extent possible, the Licensor waives and/or agrees not to
+          assert any such rights held by the Licensor to the limited
+          extent necessary to allow You to exercise the Licensed
+          Rights, but not otherwise.
+
+       2. Patent and trademark rights are not licensed under this
+          Public License.
+
+       3. To the extent possible, the Licensor waives any right to
+          collect royalties from You for the exercise of the Licensed
+          Rights, whether directly or through a collecting society
+          under any voluntary or waivable statutory or compulsory
+          licensing scheme. In all other cases the Licensor expressly
+          reserves any right to collect such royalties, including when
+          the Licensed Material is used other than for NonCommercial
+          purposes.
+
+Section 3 -- License Conditions.
+
+Your exercise of the Licensed Rights is expressly made subject to the
+following conditions.
+
+  a. Attribution.
+
+       1. If You Share the Licensed Material (including in modified
+          form), You must:
+
+            a. retain the following if it is supplied by the Licensor
+               with the Licensed Material:
+
+                 i. identification of the creator(s) of the Licensed
+                    Material and any others designated to receive
+                    attribution, in any reasonable manner requested by
+                    the Licensor (including by pseudonym if
+                    designated);
+
+                ii. a copyright notice;
+
+               iii. a notice that refers to this Public License;
+
+                iv. a notice that refers to the disclaimer of
+                    warranties;
+
+                 v. a URI or hyperlink to the Licensed Material to the
+                    extent reasonably practicable;
+
+            b. indicate if You modified the Licensed Material and
+               retain an indication of any previous modifications; and
+
+            c. indicate the Licensed Material is licensed under this
+               Public License, and include the text of, or the URI or
+               hyperlink to, this Public License.
+
+       2. You may satisfy the conditions in Section 3(a)(1) in any
+          reasonable manner based on the medium, means, and context in
+          which You Share the Licensed Material. For example, it may be
+          reasonable to satisfy the conditions by providing a URI or
+          hyperlink to a resource that includes the required
+          information.
+
+       3. If requested by the Licensor, You must remove any of the
+          information required by Section 3(a)(1)(A) to the extent
+          reasonably practicable.
+
+       4. If You Share Adapted Material You produce, the Adapter's
+          License You apply must not prevent recipients of the Adapted
+          Material from complying with this Public License.
+
+Section 4 -- Sui Generis Database Rights.
+
+Where the Licensed Rights include Sui Generis Database Rights that
+apply to Your use of the Licensed Material:
+
+  a. for the avoidance of doubt, Section 2(a)(1) grants You the right
+     to extract, reuse, reproduce, and Share all or a substantial
+     portion of the contents of the database for NonCommercial purposes
+     only;
+
+  b. if You include all or a substantial portion of the database
+     contents in a database in which You have Sui Generis Database
+     Rights, then the database in which You have Sui Generis Database
+     Rights (but not its individual contents) is Adapted Material; and
+
+  c. You must comply with the conditions in Section 3(a) if You Share
+     all or a substantial portion of the contents of the database.
+
+For the avoidance of doubt, this Section 4 supplements and does not
+replace Your obligations under this Public License where the Licensed
+Rights include other Copyright and Similar Rights.
+
+Section 5 -- Disclaimer of Warranties and Limitation of Liability.
+
+  a. UNLESS OTHERWISE SEPARATELY UNDERTAKEN BY THE LICENSOR, TO THE
+     EXTENT POSSIBLE, THE LICENSOR OFFERS THE LICENSED MATERIAL AS-IS
+     AND AS-AVAILABLE, AND MAKES NO REPRESENTATIONS OR WARRANTIES OF
+     ANY KIND CONCERNING THE LICENSED MATERIAL, WHETHER EXPRESS,
+     IMPLIED, STATUTORY, OR OTHER. THIS INCLUDES, WITHOUT LIMITATION,
+     WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR
+     PURPOSE, NON-INFRINGEMENT, ABSENCE OF LATENT OR OTHER DEFECTS,
+     ACCURACY, OR THE PRESENCE OR ABSENCE OF ERRORS, WHETHER OR NOT
+     KNOWN OR DISCOVERABLE. WHERE DISCLAIMERS OF WARRANTIES ARE NOT
+     ALLOWED IN FULL OR IN PART, THIS DISCLAIMER MAY NOT APPLY TO YOU.
+
+  b. TO THE EXTENT POSSIBLE, IN NO EVENT WILL THE LICENSOR BE LIABLE
+     TO YOU ON ANY LEGAL THEORY (INCLUDING, WITHOUT LIMITATION,
+     NEGLIGENCE) OR OTHERWISE FOR ANY DIRECT, SPECIAL, INDIRECT,
+     INCIDENTAL, CONSEQUENTIAL, PUNITIVE, EXEMPLARY, OR OTHER LOSSES,
+     COSTS, EXPENSES, OR DAMAGES ARISING OUT OF THIS PUBLIC LICENSE OR
+     USE OF THE LICENSED MATERIAL, EVEN IF THE LICENSOR HAS BEEN
+     ADVISED OF THE POSSIBILITY OF SUCH LOSSES, COSTS, EXPENSES, OR
+     DAMAGES. WHERE A LIMITATION OF LIABILITY IS NOT ALLOWED IN FULL OR
+     IN PART, THIS LIMITATION MAY NOT APPLY TO YOU.
+
+  c. The disclaimer of warranties and limitation of liability provided
+     above shall be interpreted in a manner that, to the extent
+     possible, most closely approximates an absolute disclaimer and
+     waiver of all liability.
+
+Section 6 -- Term and Termination.
+
+  a. This Public License applies for the term of the Copyright and
+     Similar Rights licensed here. However, if You fail to comply with
+     this Public License, then Your rights under this Public License
+     terminate automatically.
+
+  b. Where Your right to use the Licensed Material has terminated under
+     Section 6(a), it reinstates:
+
+       1. automatically as of the date the violation is cured, provided
+          it is cured within 30 days of Your discovery of the
+          violation; or
+
+       2. upon express reinstatement by the Licensor.
+
+     For the avoidance of doubt, this Section 6(b) does not affect any
+     right the Licensor may have to seek remedies for Your violations
+     of this Public License.
+
+  c. For the avoidance of doubt, the Licensor may also offer the
+     Licensed Material under separate terms or conditions or stop
+     distributing the Licensed Material at any time; however, doing so
+     will not terminate this Public License.
+
+  d. Sections 1, 5, 6, 7, and 8 survive termination of this Public
+     License.
+
+Section 7 -- Other Terms and Conditions.
+
+  a. The Licensor shall not be bound by any additional or different
+     terms or conditions communicated by You unless expressly agreed.
+
+  b. Any arrangements, understandings, or agreements regarding the
+     Licensed Material not stated herein are separate from and
+     independent of the terms and conditions of this Public License.
+
+Section 8 -- Interpretation.
+
+  a. For the avoidance of doubt, this Public License does not, and
+     shall not be interpreted to, reduce, limit, restrict, or impose
+     conditions on any use of the Licensed Material that could lawfully
+     be made without permission under this Public License.
+
+  b. To the extent possible, if any provision of this Public License is
+     deemed unenforceable, it shall be automatically reformed to the
+     minimum extent necessary to make it enforceable. If the provision
+     cannot be reformed, it shall be severed from this Public License
+     without affecting the enforceability of the remaining terms and
+     conditions.
+
+  c. No term or condition of this Public License will be waived and no
+     failure to comply consented to unless expressly agreed to by the
+     Licensor.
+
+  d. Nothing in this Public License constitutes or may be interpreted
+     as a limitation upon, or waiver of, any privileges and immunities
+     that apply to the Licensor or You, including from the legal
+     processes of any jurisdiction or authority.
+
+=======================================================================
+
+Creative Commons is not a party to its public
+licenses. Notwithstanding, Creative Commons may elect to apply one of
+its public licenses to material it publishes and in those instances
+will be considered the "Licensor." The text of the Creative Commons
+public licenses is dedicated to the public domain under the CC0 Public
+Domain Dedication. Except for the limited purpose of indicating that
+material is shared under a Creative Commons public license or as
+otherwise permitted by the Creative Commons policies published at
+creativecommons.org/policies, Creative Commons does not authorize the
+use of the trademark "Creative Commons" or any other trademark or logo
+of Creative Commons without its prior written consent including,
+without limitation, in connection with any unauthorized modifications
+to any of its public licenses or any other arrangements,
+understandings, or agreements concerning use of licensed material. For
+the avoidance of doubt, this paragraph does not form part of the
+public licenses.
+
+Creative Commons may be contacted at creativecommons.org.
README.md
ADDED
@@ -0,0 +1,88 @@
+<h1>
+  <img width="500" alt="LLM Transparency Tool" src="https://github.com/facebookresearch/llm-transparency-tool/assets/1367529/4bbf2544-88de-4576-9622-6047a056c5c8">
+</h1>
+
+<img width="832" alt="screenshot" src="https://github.com/facebookresearch/llm-transparency-tool/assets/1367529/78f6f9e2-fe76-4ded-bb78-a57f64f4ac3a">
+
+
+## Key functionality
+
+* Choose your model, choose or add your prompt, run the inference.
+* Browse the contribution graph.
+* Select the token to build the graph from.
+* Tune the contribution threshold.
+* Select the representation of any token after any block.
+* For the representation, see its projection to the output vocabulary, and see which tokens
+  were promoted/suppressed by the previous block.
+* The following things are clickable:
+  * Edges. That shows more info about the contributing attention head.
+  * Heads when an edge is selected. You can see what this head is promoting/suppressing.
+  * FFN blocks (little squares on the graph).
+  * Neurons when an FFN block is selected.
+
+
+## Installation
+
+### Dockerized running
+```bash
+# From the repository root directory
+docker build -t llm_transparency_tool .
+docker run --rm -p 7860:7860 llm_transparency_tool
+```
+
+### Local Installation
+
+
+```bash
+# download
+git clone git@github.com:facebookresearch/llm-transparency-tool.git
+cd llm-transparency-tool
+
+# install the necessary packages
+conda env create --name llmtt -f env.yaml
+# install the `llm_transparency_tool` package
+pip install -e .
+
+# now, we need to build the frontend
+# don't worry, even `yarn` comes preinstalled by `env.yaml`
+cd llm_transparency_tool/components/frontend
+yarn install
+yarn build
+```
+
+### Launch
+
+```bash
+streamlit run llm_transparency_tool/server/app.py -- config/local.json
+```
+
+
+## Adding support for your LLM
+
+Initially, the tool allows you to select from just a handful of models. Here are the
+options you can try for using your model in the tool, from least to most
+effort.
+
+
+### The model is already supported by TransformerLens
+
+The full list of models is [here](https://github.com/neelnanda-io/TransformerLens/blob/0825c5eb4196e7ad72d28bcf4e615306b3897490/transformer_lens/loading_from_pretrained.py#L18).
+In this case, the model can be added to the configuration json file.
+
+
+### Tuned version of a model supported by TransformerLens
+
+Add the official name of the model to the config along with the location to read the
+weights from.
+
+
+### The model is not supported by TransformerLens
+
+In this case, the UI wouldn't know how to create proper hooks for the model. You'd need
+to implement your own version of the [TransparentLlm](./llm_transparency_tool/models/transparent_llm.py#L28) class and alter the
+Streamlit app to use your implementation.
+
+
+## License
+This code is made available under a [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/) license, as found in the LICENSE file.
+However, you may have other legal obligations that govern your use of other content, such as the terms of service for third-party models.
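To make that last option more concrete, here is a minimal sketch of such an adapter. It assumes only what the README states: a `TransparentLlm` base class exists in `llm_transparency_tool/models/transparent_llm.py` (and `ModelInfo` is importable from the same module, as `components/__init__.py` above shows). The method names below are illustrative placeholders, not the real abstract interface, which is defined in that file.

```python
# Sketch only: the actual abstract methods of TransparentLlm are defined in
# llm_transparency_tool/models/transparent_llm.py and will differ from these
# illustrative placeholders.
from llm_transparency_tool.models.transparent_llm import ModelInfo, TransparentLlm


class MyWrappedLlm(TransparentLlm):  # hypothetical adapter class
    """Expose a custom model to the tool by implementing the base interface."""

    def model_info(self) -> ModelInfo:  # placeholder method name
        raise NotImplementedError

    def run(self, sentences):  # placeholder: run inference and cache activations
        raise NotImplementedError
```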
config/docker_hosting.json
ADDED
@@ -0,0 +1,13 @@
+{
+    "allow_loading_dataset_files": false,
+    "max_user_string_length": 100,
+    "preloaded_dataset_filename": "sample_input.txt",
+    "debug": false,
+    "demo_mode": true,
+    "models": {
+        "facebook/opt-125m": null,
+        "gpt2": null,
+        "distilgpt2": null
+    },
+    "default_model": "gpt2"
+}
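Compared with the local configs below, this hosting config turns on `demo_mode` and adds `max_user_string_length`, which suggests the hosted app caps user-entered prompts. A hedged sketch of such a check; the real gating lives in `llm_transparency_tool/server/app.py`, which this commit view does not show:

```python
def accept_user_string(s: str, config: dict) -> str:
    """Illustrative guess at how the server might enforce max_user_string_length."""
    limit = config.get("max_user_string_length")  # only set in docker_hosting.json
    if limit is not None and len(s) > limit:
        raise ValueError(f"Input exceeds the {limit}-character demo limit")
    return s
```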
config/docker_local.json
ADDED
@@ -0,0 +1,25 @@
+{
+    "allow_loading_dataset_files": true,
+    "preloaded_dataset_filename": "sample_input.txt",
+    "debug": true,
+    "models": {
+        "": null,
+        "facebook/opt-125m": null,
+        "facebook/opt-1.3b": null,
+        "facebook/opt-2.7b": null,
+        "facebook/opt-6.7b": null,
+        "facebook/opt-13b": null,
+        "facebook/opt-30b": null,
+        "meta-llama/Llama-2-7b-hf": null,
+        "meta-llama/Llama-2-7b-chat-hf": null,
+        "meta-llama/Llama-2-13b-hf": null,
+        "meta-llama/Llama-2-13b-chat-hf": null,
+        "gpt2": null,
+        "gpt2-medium": null,
+        "gpt2-large": null,
+        "gpt2-xl": null,
+        "distilgpt2": null
+    },
+    "default_model": "distilgpt2",
+    "demo_mode": false
+}
config/local.json
ADDED
@@ -0,0 +1,47 @@
+{
+    "allow_loading_dataset_files": true,
+    "preloaded_dataset_filename": "sample_input.txt",
+    "debug": true,
+    "models": {
+        "": null,
+
+        "gpt2": null,
+        "distilgpt2": null,
+        "facebook/opt-125m": null,
+        "facebook/opt-1.3b": null,
+        "EleutherAI/gpt-neo-125M": null,
+        "Qwen/Qwen-1_8B": null,
+        "Qwen/Qwen1.5-0.5B": null,
+        "Qwen/Qwen1.5-0.5B-Chat": null,
+        "Qwen/Qwen1.5-1.8B": null,
+        "Qwen/Qwen1.5-1.8B-Chat": null,
+        "microsoft/phi-1": null,
+        "microsoft/phi-1_5": null,
+        "microsoft/phi-2": null,
+
+        "meta-llama/Llama-2-7b-hf": null,
+        "meta-llama/Llama-2-7b-chat-hf": null,
+
+        "meta-llama/Llama-2-13b-hf": null,
+        "meta-llama/Llama-2-13b-chat-hf": null,
+
+
+        "gpt2-medium": null,
+        "gpt2-large": null,
+        "gpt2-xl": null,
+
+        "mistralai/Mistral-7B-v0.1": null,
+        "mistralai/Mistral-7B-Instruct-v0.1": null,
+        "mistralai/Mistral-7B-Instruct-v0.2": null,
+
+        "google/gemma-7b": null,
+        "google/gemma-2b": null,
+
+        "facebook/opt-2.7b": null,
+        "facebook/opt-6.7b": null,
+        "facebook/opt-13b": null,
+        "facebook/opt-30b": null
+    },
+    "default_model": "",
+    "demo_mode": false
+}
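All three configs share the same shape, so loading them is straightforward with the standard `json` module. A minimal sketch, assuming plain JSON parsing; the dataclass name and loader function are made up for illustration (the app's real config handling lives in `llm_transparency_tool/server/app.py`, not shown here):

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToolConfig:  # hypothetical name, for illustration
    allow_loading_dataset_files: bool = False
    preloaded_dataset_filename: Optional[str] = None
    debug: bool = False
    models: Dict[str, Optional[str]] = field(default_factory=dict)
    default_model: str = ""
    demo_mode: bool = False
    max_user_string_length: Optional[int] = None  # present only in docker_hosting.json


def load_config(path: str) -> ToolConfig:
    # Keys in "models" are TransformerLens model names; a non-null value would
    # point at alternative weights (see the README's "tuned version" section).
    with open(path) as f:
        return ToolConfig(**json.load(f))


config = load_config("config/local.json")
assert config.default_model in config.models
```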
env.yaml
ADDED
@@ -0,0 +1,27 @@
+name: llmtt
+channels:
+  - pytorch
+  - nvidia
+  - conda-forge
+dependencies:
+  - python
+  - pytorch
+  - pytorch-cuda=11.8
+  - nodejs
+  - yarn
+  - pip
+  - pip:
+    - datasets
+    - einops
+    - fancy_einsum
+    - jaxtyping
+    - networkx
+    - plotly
+    - pyinstrument
+    - setuptools
+    - streamlit
+    - streamlit_extras
+    - tokenizers
+    - transformer_lens
+    - transformers
+    - pytest  # fixes wrong dependencies of transformer_lens
llm_transparency_tool/__init__.py
ADDED
@@ -0,0 +1,5 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.
llm_transparency_tool/components/__init__.py
ADDED
@@ -0,0 +1,111 @@
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the license found in the
+# LICENSE file in the root directory of this source tree.
+
+import os
+from typing import List, Optional
+
+import networkx as nx
+import streamlit.components.v1 as components
+
+from llm_transparency_tool.models.transparent_llm import ModelInfo
+from llm_transparency_tool.server.graph_selection import GraphSelection, UiGraphNode
+
+_RELEASE = True
+
+if _RELEASE:
+    parent_dir = os.path.dirname(os.path.abspath(__file__))
+    config = {
+        "path": os.path.join(parent_dir, "frontend/build"),
+    }
+else:
+    config = {
+        "url": "http://localhost:3001",
+    }
+
+_component_func = components.declare_component("contribution_graph", **config)
+
+
+def is_node_valid(node: UiGraphNode, n_layers: int, n_tokens: int):
+    return node.layer < n_layers and node.token < n_tokens
+
+
+def is_selection_valid(s: GraphSelection, n_layers: int, n_tokens: int):
+    if not s:
+        return True
+    if s.node:
+        if not is_node_valid(s.node, n_layers, n_tokens):
+            return False
+    if s.edge:
+        for node in [s.edge.source, s.edge.target]:
+            if not is_node_valid(node, n_layers, n_tokens):
+                return False
+    return True
+
+
+def contribution_graph(
+    model_info: ModelInfo,
+    tokens: List[str],
+    graphs: List[nx.Graph],
+    key: str,
+) -> Optional[GraphSelection]:
+    """Create a new instance of contribution graph.
+
+    Returns selected graph node or None if nothing was selected.
+    """
+    assert len(tokens) == len(graphs)
+
+    result = _component_func(
+        component="graph",
+        model_info=model_info.__dict__,
+        tokens=tokens,
+        edges_per_token=[nx.node_link_data(g)["links"] for g in graphs],
+        default=None,
+        key=key,
+    )
+
+    selection = GraphSelection.from_json(result)
+
+    n_tokens = len(tokens)
+    n_layers = model_info.n_layers
+    # We need this extra protection because even though the component has to check for
+    # the validity of the selection, sometimes it allows invalid output. It's some
+    # unexpected effect that has something to do with React and how the output value is
+    # set for the component.
+    if not is_selection_valid(selection, n_layers, n_tokens):
+        selection = None
+
+    return selection
+
+
+def selector(
+    items: List[str],
+    indices: List[int],
+    temperatures: Optional[List[float]],
+    preselected_index: Optional[int],
+    key: str,
+) -> Optional[int]:
+    """Create a new instance of selector.
+
+    Returns selected item index.
+    """
+    n = len(items)
+    assert n == len(indices)
+    items = [{"index": i, "text": s} for s, i in zip(items, indices)]
+
+    if temperatures is not None:
+        assert n == len(temperatures)
+        for i, t in enumerate(temperatures):
+            items[i]["temperature"] = t
+
+    result = _component_func(
+        component="selector",
+        items=items,
+        preselected_index=preselected_index,
+        default=None,
+        key=key,
+    )
+
+    return None if result is None else int(result)
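For context, this is roughly how the Streamlit app might invoke the `contribution_graph` wrapper above. A sketch under stated assumptions: the real call sites are in `llm_transparency_tool/server/app.py` (not part of this excerpt), the `SimpleNamespace` merely stands in for a real `ModelInfo`, and the toy graph is hand-built using the `[AIMX]{layer}_{token}` node-naming scheme that the frontend's `nodeFromString` parser expects (see `ContributionGraph.tsx` below).

```python
from types import SimpleNamespace

import networkx as nx
import streamlit as st

from llm_transparency_tool.components import contribution_graph

# Stand-in for a real ModelInfo; the wrapper only reads .n_layers and .__dict__.
model_info = SimpleNamespace(n_layers=1)

# Toy per-token graph: X = original embedding, A = state after attention.
g = nx.DiGraph()
g.add_edge("X0_0", "A0_1", weight=0.7)

selection = contribution_graph(
    model_info=model_info,
    tokens=["The", " cat"],
    graphs=[g, g],  # one contribution graph per token
    key="contribution_graph",
)
if selection is not None:
    st.write(selection)
```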
llm_transparency_tool/components/frontend/.env
ADDED
@@ -0,0 +1,6 @@
+# Run the component's dev server on :3001
+# (The Streamlit dev server already runs on :3000)
+PORT=3001
+
+# Don't automatically open the web browser on `npm run start`.
+BROWSER=none
llm_transparency_tool/components/frontend/.prettierrc
ADDED
@@ -0,0 +1,5 @@
+{
+  "endOfLine": "lf",
+  "semi": false,
+  "trailingComma": "es5"
+}
llm_transparency_tool/components/frontend/package.json
ADDED
@@ -0,0 +1,39 @@
+{
+  "name": "contribution_graph",
+  "version": "0.1.0",
+  "private": true,
+  "dependencies": {
+    "@types/d3": "^7.4.0",
+    "d3": "^7.8.5",
+    "react": "^18.2.0",
+    "react-dom": "^18.2.0",
+    "streamlit-component-lib": "^2.0.0"
+  },
+  "scripts": {
+    "start": "react-scripts start",
+    "build": "react-scripts build",
+    "test": "react-scripts test",
+    "eject": "react-scripts eject"
+  },
+  "browserslist": {
+    "production": [
+      ">0.2%",
+      "not dead",
+      "not op_mini all"
+    ],
+    "development": [
+      "last 1 chrome version",
+      "last 1 firefox version",
+      "last 1 safari version"
+    ]
+  },
+  "homepage": ".",
+  "devDependencies": {
+    "@types/node": "^20.11.17",
+    "@types/react": "^18.2.55",
+    "@types/react-dom": "^18.2.19",
+    "eslint-config-react-app": "^7.0.1",
+    "react-scripts": "^5.0.1",
+    "typescript": "^5.3.3"
+  }
+}
llm_transparency_tool/components/frontend/public/index.html
ADDED
@@ -0,0 +1,15 @@
+<!DOCTYPE html>
+<html lang="en">
+  <head>
+    <title>Contribution Graph for Streamlit</title>
+    <meta charset="UTF-8" />
+    <meta name="viewport" content="width=device-width, initial-scale=1" />
+    <meta name="theme-color" content="#000000" />
+    <meta name="description" content="Contribution Graph for Streamlit" />
+    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css" />
+  </head>
+  <body>
+    <noscript>You need to enable JavaScript to run this app.</noscript>
+    <div id="root"></div>
+  </body>
+</html>
llm_transparency_tool/components/frontend/src/ContributionGraph.tsx
ADDED
@@ -0,0 +1,517 @@
+/**
+ * Copyright (c) Meta Platforms, Inc. and affiliates.
+ * All rights reserved.
+ *
+ * This source code is licensed under the license found in the
+ * LICENSE file in the root directory of this source tree.
+ */
+
+import {
+  ComponentProps,
+  Streamlit,
+  withStreamlitConnection,
+} from 'streamlit-component-lib'
+import React, { useEffect, useMemo, useRef, useState } from 'react';
+import * as d3 from 'd3';
+
+import {
+  Label,
+  Point,
+} from './common';
+import './LlmViewer.css';
+
+export const renderParams = {
+  cellH: 32,
+  cellW: 32,
+  attnSize: 8,
+  afterFfnSize: 8,
+  ffnSize: 6,
+  tokenSelectorSize: 16,
+  layerCornerRadius: 6,
+}
+
+interface Cell {
+  layer: number
+  token: number
+}
+
+enum CellItem {
+  AfterAttn = 'after_attn',
+  AfterFfn = 'after_ffn',
+  Ffn = 'ffn',
+  Original = 'original', // They will only be at level = 0
+}
+
+interface Node {
+  cell: Cell | null
+  item: CellItem | null
+}
+
+interface NodeProps {
+  node: Node
+  pos: Point
+  isActive: boolean
+}
+
+interface EdgeRaw {
+  weight: number
+  source: string
+  target: string
+}
+
+interface Edge {
+  weight: number
+  from: Node
+  to: Node
+  fromPos: Point
+  toPos: Point
+  isSelectable: boolean
+  isFfn: boolean
+}
+
+interface Selection {
+  node: Node | null
+  edge: Edge | null
+}
+
+function tokenPointerPolygon(origin: Point) {
+  const r = renderParams.tokenSelectorSize / 2
+  const dy = r / 2
+  const dx = r * Math.sqrt(3.0) / 2
+  // Draw an arrow looking down
+  return [
+    [origin.x, origin.y + r],
+    [origin.x + dx, origin.y - dy],
+    [origin.x - dx, origin.y - dy],
+  ].toString()
+}
+
+function isSameCell(cell1: Cell | null, cell2: Cell | null) {
+  if (cell1 == null || cell2 == null) {
+    return false
+  }
+  return cell1.layer === cell2.layer && cell1.token === cell2.token
+}
+
+function isSameNode(node1: Node | null, node2: Node | null) {
+  if (node1 === null || node2 === null) {
+    return false
+  }
+  return isSameCell(node1.cell, node2.cell)
+    && node1.item === node2.item;
+}
+
+function isSameEdge(edge1: Edge | null, edge2: Edge | null) {
+  if (edge1 === null || edge2 === null) {
+    return false
+  }
+  return isSameNode(edge1.from, edge2.from) && isSameNode(edge1.to, edge2.to);
+}
+
+function nodeFromString(name: string) {
+  const match = name.match(/([AIMX])(\d+)_(\d+)/)
+  if (match == null) {
+    return {
+      cell: null,
+      item: null,
+    }
+  }
+  const [, type, layerStr, tokenStr] = match
+  const layer = +layerStr
+  const token = +tokenStr
+
+  const typeToCellItem = new Map<string, CellItem>([
+    ['A', CellItem.AfterAttn],
+    ['I', CellItem.AfterFfn],
+    ['M', CellItem.Ffn],
+    ['X', CellItem.Original],
+  ])
+  return {
+    cell: {
+      layer: layer,
+      token: token,
+    },
+    item: typeToCellItem.get(type) ?? null,
+  }
+}
+
+function isValidNode(node: Node, nLayers: number, nTokens: number) {
+  if (node.cell === null) {
+    return true
+  }
+  return node.cell.layer < nLayers && node.cell.token < nTokens
+}
+
+function isValidSelection(selection: Selection, nLayers: number, nTokens: number) {
+  if (selection.node !== null) {
+    return isValidNode(selection.node, nLayers, nTokens)
+  }
+  if (selection.edge !== null) {
+    return isValidNode(selection.edge.from, nLayers, nTokens) &&
+      isValidNode(selection.edge.to, nLayers, nTokens)
+  }
+  return true
+}
+
+const ContributionGraph = ({ args }: ComponentProps) => {
+  const modelInfo = args['model_info']
+  const tokens = args['tokens']
+  const edgesRaw: EdgeRaw[][] = args['edges_per_token']
+
+  const nLayers = modelInfo === null ? 0 : modelInfo.n_layers
+  const nTokens = tokens === null ? 0 : tokens.length
+
+  const [selection, setSelection] = useState<Selection>({
+    node: null,
+    edge: null,
+  })
+  var curSelection = selection
+  if (!isValidSelection(selection, nLayers, nTokens)) {
+    curSelection = {
+      node: null,
+      edge: null,
+    }
+    setSelection(curSelection)
+    Streamlit.setComponentValue(curSelection)
+  }
+
+  const [startToken, setStartToken] = useState<number>(nTokens - 1)
+  // We have startToken state var, but it won't be updated till next render, so use
+  // this var in the current render.
+  var curStartToken = startToken
+  if (startToken >= nTokens) {
+    curStartToken = nTokens - 1
+    setStartToken(curStartToken)
+  }
+
+  const handleRepresentationClick = (node: Node) => {
+    const newSelection: Selection = {
+      node: node,
+      edge: null,
+    }
+    setSelection(newSelection)
+    Streamlit.setComponentValue(newSelection)
+  }
+
+  const handleEdgeClick = (edge: Edge) => {
+    if (!edge.isSelectable) {
+      return
+    }
+    const newSelection: Selection = {
+      node: edge.to,
+      edge: edge,
+    }
+    setSelection(newSelection)
+    Streamlit.setComponentValue(newSelection)
+  }
+
+  const handleTokenClick = (t: number) => {
+    setStartToken(t)
+  }
+
+  const [xScale, yScale] = useMemo(() => {
+    const x = d3.scaleLinear()
+      .domain([-2, nTokens - 1])
+      .range([0, renderParams.cellW * (nTokens + 2)])
+    const y = d3.scaleLinear()
+      .domain([-1, nLayers])
+      .range([renderParams.cellH * (nLayers + 2), 0])
+    return [x, y]
+  }, [nLayers, nTokens])
+
+  const cells = useMemo(() => {
+    let result: Cell[] = []
+    for (let l = 0; l < nLayers; l++) {
+      for (let t = 0; t < nTokens; t++) {
+        result.push({
+          layer: l,
+          token: t,
+        })
+      }
+    }
+    return result
+  }, [nLayers, nTokens])
+
+  const nodeCoords = useMemo(() => {
+    let result = new Map<string, Point>()
+    const w = renderParams.cellW
+    const h = renderParams.cellH
+    for (var cell of cells) {
+      const cx = xScale(cell.token + 0.5)
+      const cy = yScale(cell.layer - 0.5)
+      result.set(
+        JSON.stringify({ cell: cell, item: CellItem.AfterAttn }),
+        { x: cx, y: cy + h / 4 },
+      )
+      result.set(
+        JSON.stringify({ cell: cell, item: CellItem.AfterFfn }),
+        { x: cx, y: cy - h / 4 },
+      )
+      result.set(
+        JSON.stringify({ cell: cell, item: CellItem.Ffn }),
+        { x: cx + 5 * w / 16, y: cy },
+      )
+    }
+    for (let t = 0; t < nTokens; t++) {
+      cell = {
+        layer: 0,
+        token: t,
+      }
+      const cx = xScale(cell.token + 0.5)
+      const cy = yScale(cell.layer - 1.0)
+      result.set(
+        JSON.stringify({ cell: cell, item: CellItem.Original }),
+        { x: cx, y: cy + h / 4 },
+      )
+    }
+    return result
+  }, [cells, nTokens, xScale, yScale])
+
+  const edges: Edge[][] = useMemo(() => {
+    let result = []
+    for (var edgeList of edgesRaw) {
+      let edgesPerStartToken = []
+      for (var edge of edgeList) {
+        const u = nodeFromString(edge.source)
+        const v = nodeFromString(edge.target)
+        var isSelectable = (
+          u.cell !== null && v.cell !== null && v.item === CellItem.AfterAttn
+        )
+        var isFfn = (
+          u.cell !== null && v.cell !== null && (
+            u.item === CellItem.Ffn || v.item === CellItem.Ffn
+          )
+        )
+        edgesPerStartToken.push({
+          weight: edge.weight,
+          from: u,
+          to: v,
+          fromPos: nodeCoords.get(JSON.stringify(u)) ?? { 'x': 0, 'y': 0 },
+          toPos: nodeCoords.get(JSON.stringify(v)) ?? { 'x': 0, 'y': 0 },
+          isSelectable: isSelectable,
+          isFfn: isFfn,
+        })
+      }
+      result.push(edgesPerStartToken)
+    }
+    return result
+  }, [edgesRaw, nodeCoords])
+
+  const activeNodes = useMemo(() => {
+    let result = new Set<string>()
+    for (var edge of edges[curStartToken]) {
+      const u = JSON.stringify(edge.from)
+      const v = JSON.stringify(edge.to)
+      result.add(u)
+      result.add(v)
+    }
+    return result
+  }, [edges, curStartToken])
+
+  const nodeProps = useMemo(() => {
+    let result: Array<NodeProps> = []
|
313 |
+
nodeCoords.forEach((p: Point, node: string) => {
|
314 |
+
result.push({
|
315 |
+
node: JSON.parse(node),
|
316 |
+
pos: p,
|
317 |
+
isActive: activeNodes.has(node),
|
318 |
+
})
|
319 |
+
})
|
320 |
+
return result
|
321 |
+
}, [nodeCoords, activeNodes])
|
322 |
+
|
323 |
+
const tokenLabels: Label[] = useMemo(() => {
|
324 |
+
if (!tokens) {
|
325 |
+
return []
|
326 |
+
}
|
327 |
+
return tokens.map((s: string, i: number) => ({
|
328 |
+
text: s.replace(/ /g, 'Β·'),
|
329 |
+
pos: {
|
330 |
+
x: xScale(i + 0.5),
|
331 |
+
y: yScale(-1.5),
|
332 |
+
},
|
333 |
+
}))
|
334 |
+
}, [tokens, xScale, yScale])
|
335 |
+
|
336 |
+
const layerLabels: Label[] = useMemo(() => {
|
337 |
+
return Array.from(Array(nLayers).keys()).map(i => ({
|
338 |
+
text: 'L' + i,
|
339 |
+
pos: {
|
340 |
+
x: xScale(-0.25),
|
341 |
+
y: yScale(i - 0.5),
|
342 |
+
},
|
343 |
+
}))
|
344 |
+
}, [nLayers, xScale, yScale])
|
345 |
+
|
346 |
+
const tokenSelectors: Array<[number, Point]> = useMemo(() => {
|
347 |
+
return Array.from(Array(nTokens).keys()).map(i => ([
|
348 |
+
i,
|
349 |
+
{
|
350 |
+
x: xScale(i + 0.5),
|
351 |
+
y: yScale(nLayers - 0.5),
|
352 |
+
}
|
353 |
+
]))
|
354 |
+
}, [nTokens, nLayers, xScale, yScale])
|
355 |
+
|
356 |
+
const totalW = xScale(nTokens + 2)
|
357 |
+
const totalH = yScale(-4)
|
358 |
+
useEffect(() => {
|
359 |
+
Streamlit.setFrameHeight(totalH)
|
360 |
+
}, [totalH])
|
361 |
+
|
362 |
+
const colorScale = d3.scaleLinear(
|
363 |
+
[0.0, 0.5, 1.0],
|
364 |
+
['#9eba66', 'darkolivegreen', 'darkolivegreen']
|
365 |
+
)
|
366 |
+
const ffnEdgeColorScale = d3.scaleLinear(
|
367 |
+
[0.0, 0.5, 1.0],
|
368 |
+
['orchid', 'purple', 'purple']
|
369 |
+
)
|
370 |
+
const edgeWidthScale = d3.scaleLinear([0.0, 0.5, 1.0], [2.0, 3.0, 3.0])
|
371 |
+
|
372 |
+
const svgRef = useRef(null);
|
373 |
+
|
374 |
+
useEffect(() => {
|
375 |
+
const getNodeStyle = (p: NodeProps, type: string) => {
|
376 |
+
if (isSameNode(p.node, curSelection.node)) {
|
377 |
+
return 'selectable-item selection'
|
378 |
+
}
|
379 |
+
if (p.isActive) {
|
380 |
+
return 'selectable-item active-' + type + '-node'
|
381 |
+
}
|
382 |
+
return 'selectable-item inactive-node'
|
383 |
+
}
|
384 |
+
|
385 |
+
const svg = d3.select(svgRef.current)
|
386 |
+
svg.selectAll('*').remove()
|
387 |
+
|
388 |
+
svg
|
389 |
+
.selectAll('layers')
|
390 |
+
.data(Array.from(Array(nLayers).keys()).filter((x) => x % 2 === 1))
|
391 |
+
.enter()
|
392 |
+
.append('rect')
|
393 |
+
.attr('class', 'layer-highlight')
|
394 |
+
.attr('x', xScale(-1.0))
|
395 |
+
.attr('y', (layer) => yScale(layer))
|
396 |
+
.attr('width', xScale(nTokens + 0.25) - xScale(-1.0))
|
397 |
+
.attr('height', (layer) => yScale(layer) - yScale(layer + 1))
|
398 |
+
.attr('rx', renderParams.layerCornerRadius)
|
399 |
+
|
400 |
+
svg
|
401 |
+
.selectAll('edges')
|
402 |
+
.data(edges[curStartToken])
|
403 |
+
.enter()
|
404 |
+
.append('line')
|
405 |
+
.style('stroke', (edge: Edge) => {
|
406 |
+
if (isSameEdge(edge, curSelection.edge)) {
|
407 |
+
return 'orange'
|
408 |
+
}
|
409 |
+
if (edge.isFfn) {
|
410 |
+
return ffnEdgeColorScale(edge.weight)
|
411 |
+
}
|
412 |
+
return colorScale(edge.weight)
|
413 |
+
})
|
414 |
+
.attr('class', (edge: Edge) => edge.isSelectable ? 'selectable-edge' : '')
|
415 |
+
.style('stroke-width', (edge: Edge) => edgeWidthScale(edge.weight))
|
416 |
+
.attr('x1', (edge: Edge) => edge.fromPos.x)
|
417 |
+
.attr('y1', (edge: Edge) => edge.fromPos.y)
|
418 |
+
.attr('x2', (edge: Edge) => edge.toPos.x)
|
419 |
+
.attr('y2', (edge: Edge) => edge.toPos.y)
|
420 |
+
.on('click', (event: PointerEvent, edge) => {
|
421 |
+
handleEdgeClick(edge)
|
422 |
+
})
|
423 |
+
|
424 |
+
svg
|
425 |
+
.selectAll('residual')
|
426 |
+
.data(nodeProps)
|
427 |
+
.enter()
|
428 |
+
.filter((p) => {
|
429 |
+
return p.node.item === CellItem.AfterAttn
|
430 |
+
|| p.node.item === CellItem.AfterFfn
|
431 |
+
})
|
432 |
+
.append('circle')
|
433 |
+
.attr('class', (p) => getNodeStyle(p, 'residual'))
|
434 |
+
.attr('cx', (p) => p.pos.x)
|
435 |
+
.attr('cy', (p) => p.pos.y)
|
436 |
+
.attr('r', renderParams.attnSize / 2)
|
437 |
+
.on('click', (event: PointerEvent, p) => {
|
438 |
+
handleRepresentationClick(p.node)
|
439 |
+
})
|
440 |
+
|
441 |
+
svg
|
442 |
+
.selectAll('ffn')
|
443 |
+
.data(nodeProps)
|
444 |
+
.enter()
|
445 |
+
.filter((p) => p.node.item === CellItem.Ffn && p.isActive)
|
446 |
+
.append('rect')
|
447 |
+
.attr('class', (p) => getNodeStyle(p, 'ffn'))
|
448 |
+
.attr('x', (p) => p.pos.x - renderParams.ffnSize / 2)
|
449 |
+
.attr('y', (p) => p.pos.y - renderParams.ffnSize / 2)
|
450 |
+
.attr('width', renderParams.ffnSize)
|
451 |
+
.attr('height', renderParams.ffnSize)
|
452 |
+
.on('click', (event: PointerEvent, p) => {
|
453 |
+
handleRepresentationClick(p.node)
|
454 |
+
})
|
455 |
+
|
456 |
+
svg
|
457 |
+
.selectAll('token_labels')
|
458 |
+
.data(tokenLabels)
|
459 |
+
.enter()
|
460 |
+
.append('text')
|
461 |
+
.attr('x', (label: Label) => label.pos.x)
|
462 |
+
.attr('y', (label: Label) => label.pos.y)
|
463 |
+
.attr('text-anchor', 'end')
|
464 |
+
.attr('dominant-baseline', 'middle')
|
465 |
+
.attr('alignment-baseline', 'top')
|
466 |
+
.attr('transform', (label: Label) =>
|
467 |
+
'rotate(-40, ' + label.pos.x + ', ' + label.pos.y + ')')
|
468 |
+
.text((label: Label) => label.text)
|
469 |
+
|
470 |
+
svg
|
471 |
+
.selectAll('layer_labels')
|
472 |
+
.data(layerLabels)
|
473 |
+
.enter()
|
474 |
+
.append('text')
|
475 |
+
.attr('x', (label: Label) => label.pos.x)
|
476 |
+
.attr('y', (label: Label) => label.pos.y)
|
477 |
+
.attr('text-anchor', 'middle')
|
478 |
+
.attr('alignment-baseline', 'middle')
|
479 |
+
.text((label: Label) => label.text)
|
480 |
+
|
481 |
+
svg
|
482 |
+
.selectAll('token_selectors')
|
483 |
+
.data(tokenSelectors)
|
484 |
+
.enter()
|
485 |
+
.append('polygon')
|
486 |
+
.attr('class', ([i,]) => (
|
487 |
+
curStartToken === i
|
488 |
+
? 'selectable-item selection'
|
489 |
+
: 'selectable-item token-selector'
|
490 |
+
))
|
491 |
+
.attr('points', ([, p]) => tokenPointerPolygon(p))
|
492 |
+
.attr('r', renderParams.tokenSelectorSize / 2)
|
493 |
+
.on('click', (event: PointerEvent, [i,]) => {
|
494 |
+
handleTokenClick(i)
|
495 |
+
})
|
496 |
+
}, [
|
497 |
+
cells,
|
498 |
+
edges,
|
499 |
+
nodeProps,
|
500 |
+
tokenLabels,
|
501 |
+
layerLabels,
|
502 |
+
tokenSelectors,
|
503 |
+
curStartToken,
|
504 |
+
curSelection,
|
505 |
+
colorScale,
|
506 |
+
ffnEdgeColorScale,
|
507 |
+
edgeWidthScale,
|
508 |
+
nLayers,
|
509 |
+
nTokens,
|
510 |
+
xScale,
|
511 |
+
yScale
|
512 |
+
])
|
513 |
+
|
514 |
+
return <svg ref={svgRef} width={totalW} height={totalH}></svg>
|
515 |
+
}
|
516 |
+
|
517 |
+
export default withStreamlitConnection(ContributionGraph)
|
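The component reports its state back to Python through Streamlit.setComponentValue as a {node, edge} object, where a node carries a {layer, token} cell plus an item tag. A minimal Python-side sketch of unpacking that payload (not part of the commit; the helper name is illustrative):

def selected_cell(selection):
    # Extract (layer, token) from the value the graph component sends back, if any.
    if not selection:
        return None
    node = selection.get("node")
    if not node or not node.get("cell"):
        return None
    cell = node["cell"]
    return cell["layer"], cell["token"]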
llm_transparency_tool/components/frontend/src/LlmViewer.css
ADDED
@@ -0,0 +1,77 @@
/**
 * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under the license found in the
 * LICENSE file in the root directory of this source tree.
 */

.graph-container {
  display: flex;
  justify-content: center;
  align-items: center;
  height: 100vh;
}

.svg {
  border: 1px solid #ccc;
}

.layer-highlight {
  fill: #f0f5f0;
}

.selectable-item {
  stroke: black;
  cursor: pointer;
}

.selection,
.selection:hover {
  fill: orange;
}

.active-residual-node {
  fill: yellowgreen;
}

.active-residual-node:hover {
  fill: olivedrab;
}

.active-ffn-node {
  fill: orchid;
}

.active-ffn-node:hover {
  fill: purple;
}

.inactive-node {
  fill: lightgray;
  stroke-width: 0.5px;
}

.inactive-node:hover {
  fill: gray;
}

.selectable-edge {
  cursor: pointer;
}

.token-selector {
  fill: lightblue;
}

.token-selector:hover {
  fill: cornflowerblue;
}

.selector-item {
  fill: lightblue;
}

.selector-item:hover {
  fill: cornflowerblue;
}
llm_transparency_tool/components/frontend/src/Selector.tsx
ADDED
@@ -0,0 +1,154 @@
/**
 * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under the license found in the
 * LICENSE file in the root directory of this source tree.
 */

import {
  ComponentProps,
  Streamlit,
  withStreamlitConnection,
} from "streamlit-component-lib"
import React, { useEffect, useMemo, useRef, useState } from 'react';
import * as d3 from 'd3';

import {
  Point,
} from './common';
import './LlmViewer.css';

export const renderParams = {
  verticalGap: 24,
  horizontalGap: 24,
  itemSize: 8,
}

interface Item {
  index: number
  text: string
  temperature: number
}

const Selector = ({ args }: ComponentProps) => {
  const items: Item[] = args["items"]
  const preselected_index: number | null = args["preselected_index"]
  const n = items.length

  const [selection, setSelection] = useState<number | null>(null)

  // Ensure the preselected element takes effect only when it comes with new data.
  var args_json = JSON.stringify(args)
  useEffect(() => {
    setSelection(preselected_index)
    Streamlit.setComponentValue(preselected_index)
  }, [args_json, preselected_index]);

  const handleItemClick = (index: number) => {
    setSelection(index)
    Streamlit.setComponentValue(index)
  }

  const [xScale, yScale] = useMemo(() => {
    const x = d3.scaleLinear()
      .domain([0, 1])
      .range([0, renderParams.horizontalGap])
    const y = d3.scaleLinear()
      .domain([0, n - 1])
      .range([0, renderParams.verticalGap * (n - 1)])
    return [x, y]
  }, [n])

  const itemCoords: Point[] = useMemo(() => {
    return Array.from(Array(n).keys()).map(i => ({
      x: xScale(0.5),
      y: yScale(i + 0.5),
    }))
  }, [n, xScale, yScale])

  var hasTemperature = false
  if (n > 0) {
    var t = items[0].temperature
    hasTemperature = (t !== null && t !== undefined)
  }
  const colorScale = useMemo(() => {
    var min_t = 0.0
    var max_t = 1.0
    if (hasTemperature) {
      min_t = items[0].temperature
      max_t = items[0].temperature
      for (var i = 0; i < n; i++) {
        const t = items[i].temperature
        min_t = Math.min(min_t, t)
        max_t = Math.max(max_t, t)
      }
    }
    const norm = d3.scaleLinear([min_t, max_t], [0.0, 1.0])
    const colorScale = d3.scaleSequential(d3.interpolateYlGn);
    return d3.scaleSequential(value => colorScale(norm(value)))
  }, [items, hasTemperature, n])

  const totalW = 100
  const totalH = yScale(n)
  useEffect(() => {
    Streamlit.setFrameHeight(totalH)
  }, [totalH])

  const svgRef = useRef(null);

  useEffect(() => {
    const svg = d3.select(svgRef.current)
    svg.selectAll('*').remove()

    const getItemClass = (index: number) => {
      var style = 'selectable-item '
      style += index === selection ? 'selection' : 'selector-item'
      return style
    }

    const getItemColor = (item: Item) => {
      var t = item.temperature ?? 0.0
      return item.index === selection ? 'orange' : colorScale(t)
    }

    var icons = svg
      .selectAll('items')
      .data(Array.from(Array(n).keys()))
      .enter()
      .append('circle')
      .attr('cx', (i) => itemCoords[i].x)
      .attr('cy', (i) => itemCoords[i].y)
      .attr('r', renderParams.itemSize / 2)
      .on('click', (event: PointerEvent, i) => {
        handleItemClick(items[i].index)
      })
      .attr('class', (i) => getItemClass(items[i].index))
    if (hasTemperature) {
      icons.style('fill', (i) => getItemColor(items[i]))
    }

    svg
      .selectAll('labels')
      .data(Array.from(Array(n).keys()))
      .enter()
      .append('text')
      .attr('x', (i) => itemCoords[i].x + renderParams.horizontalGap / 2)
      .attr('y', (i) => itemCoords[i].y)
      .attr('text-anchor', 'left')
      .attr('alignment-baseline', 'middle')
      .text((i) => items[i].text)

  }, [
    items,
    n,
    itemCoords,
    selection,
    colorScale,
    hasTemperature,
  ])

  return <svg ref={svgRef} width={totalW} height={totalH}></svg>
}

export default withStreamlitConnection(Selector)
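When items carry a temperature, the Selector rescales the observed temperatures onto [0, 1] before feeding them to d3's YlGn color ramp. The same normalization in a standalone Python sketch (illustrative only, not the component's actual code path):

def normalize_temperatures(temps):
    # Map observed temperatures linearly onto [0, 1], mirroring the d3 norm scale.
    lo, hi = min(temps), max(temps)
    if hi == lo:
        return [0.0 for _ in temps]  # degenerate case: all temperatures equal
    return [(t - lo) / (hi - lo) for t in temps]

print(normalize_temperatures([0.2, 0.5, 0.8]))  # [0.0, 0.5, 1.0]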
llm_transparency_tool/components/frontend/src/common.tsx
ADDED
@@ -0,0 +1,17 @@
/**
 * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under the license found in the
 * LICENSE file in the root directory of this source tree.
 */

export interface Point {
  x: number
  y: number
}

export interface Label {
  text: string
  pos: Point
}
llm_transparency_tool/components/frontend/src/index.tsx
ADDED
@@ -0,0 +1,39 @@
/**
 * Copyright (c) Meta Platforms, Inc. and affiliates.
 * All rights reserved.
 *
 * This source code is licensed under the license found in the
 * LICENSE file in the root directory of this source tree.
 */

import React from "react"
import ReactDOM from "react-dom"

import {
  ComponentProps,
  withStreamlitConnection,
} from "streamlit-component-lib"


import ContributionGraph from "./ContributionGraph"
import Selector from "./Selector"

const LlmViewerComponent = (props: ComponentProps) => {
  switch (props.args['component']) {
    case 'graph':
      return <ContributionGraph />
    case 'selector':
      return <Selector />
    default:
      return <></>
  }
};

const StreamlitLlmViewerComponent = withStreamlitConnection(LlmViewerComponent)

ReactDOM.render(
  <React.StrictMode>
    <StreamlitLlmViewerComponent />
  </React.StrictMode>,
  document.getElementById("root")
)
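The 'component' arg selects which widget the frontend renders, so a single declared Streamlit component can back both widgets. A hedged sketch of how the Python wrapper might invoke this dispatcher (the component name and build path below are assumptions, not taken from the commit):

import streamlit.components.v1 as components

# Assumed declaration; the real name/path live in llm_transparency_tool/components.
_component = components.declare_component(
    "llm_viewer",
    path="llm_transparency_tool/components/frontend/build",
)

def contribution_graph(model_info, tokens, edges_per_token, key=None):
    # 'component' picks the branch in LlmViewerComponent's switch above.
    return _component(
        component="graph",
        model_info=model_info,
        tokens=tokens,
        edges_per_token=edges_per_token,
        key=key,
    )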
llm_transparency_tool/components/frontend/src/react-app-env.d.ts
ADDED
@@ -0,0 +1 @@
/// <reference types="react-scripts" />
llm_transparency_tool/components/frontend/tsconfig.json
ADDED
@@ -0,0 +1,19 @@
{
  "compilerOptions": {
    "target": "es5",
    "lib": ["dom", "dom.iterable", "esnext"],
    "allowJs": true,
    "skipLibCheck": true,
    "esModuleInterop": true,
    "allowSyntheticDefaultImports": true,
    "strict": true,
    "forceConsistentCasingInFileNames": true,
    "module": "esnext",
    "moduleResolution": "node",
    "resolveJsonModule": true,
    "isolatedModules": true,
    "noEmit": true,
    "jsx": "react"
  },
  "include": ["src"]
}
llm_transparency_tool/models/__init__.py
ADDED
@@ -0,0 +1,5 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
llm_transparency_tool/models/test_tlens_model.py
ADDED
@@ -0,0 +1,162 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

import unittest

import torch

from llm_transparency_tool.models.tlens_model import TransformerLensTransparentLlm
from llm_transparency_tool.models.transparent_llm import ModelInfo


class TransparentLlmTestCase(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Picking the smallest model possible so that the test runs faster. It's ok to
        # change this model, but you'll need to update tokenization specifics in some
        # tests.
        cls._llm = TransformerLensTransparentLlm(
            model_name="facebook/opt-125m",
            device="cpu",
        )

    def setUp(self):
        self._llm.run(["test", "test 1"])
        self._eps = 1e-5

    def test_model_info(self):
        info = self._llm.model_info()
        self.assertEqual(
            info,
            ModelInfo(
                name="facebook/opt-125m",
                n_params_estimate=84934656,
                n_layers=12,
                n_heads=12,
                d_model=768,
                d_vocab=50272,
            ),
        )

    def test_tokens(self):
        tokens = self._llm.tokens()

        pad = 1
        bos = 2
        test = 21959
        one = 112

        self.assertEqual(tokens.tolist(), [[bos, test, pad], [bos, test, one]])

    def test_tokens_to_strings(self):
        s = self._llm.tokens_to_strings(torch.Tensor([2, 21959, 112]).to(torch.int))
        self.assertEqual(s, ["</s>", "test", " 1"])

    def test_manage_state(self):
        # One llm.run was called at the setup. Call one more and make sure the object
        # returns values for the new state.
        self._llm.run(["one", "two", "three", "four"])
        self.assertEqual(self._llm.tokens().shape[0], 4)

    def test_residual_in_and_out(self):
        """
        Test that residual_in is a residual_out for the previous layer.
        """
        for layer in range(1, 12):
            prev_residual_out = self._llm.residual_out(layer - 1)
            residual_in = self._llm.residual_in(layer)
            diff = torch.max(torch.abs(residual_in - prev_residual_out)).item()
            self.assertLess(diff, self._eps, f"layer {layer}")

    def test_residual_plus_block(self):
        """
        Make sure that new residual = old residual + block output. Here, block is an
        ffn or attention. It's not that obvious because it could be that layer norm is
        applied after the block output, but before saving the result to residual.
        Luckily, this is not the case in TransformerLens, and we're relying on that.
        """
        layer = 3
        batch = 0
        pos = 0

        residual_in = self._llm.residual_in(layer)[batch][pos]
        residual_mid = self._llm.residual_after_attn(layer)[batch][pos]
        residual_out = self._llm.residual_out(layer)[batch][pos]
        ffn_out = self._llm.ffn_out(layer)[batch][pos]
        attn_out = self._llm.attention_output(batch, layer, pos)

        a = residual_mid
        b = residual_in + attn_out
        diff = torch.max(torch.abs(a - b)).item()
        self.assertLess(diff, self._eps, "attn")

        a = residual_out
        b = residual_mid + ffn_out
        diff = torch.max(torch.abs(a - b)).item()
        self.assertLess(diff, self._eps, "ffn")

    def test_tensor_shapes(self):
        # Not much we can do about the tensors, but at least check their shapes and
        # that they don't contain NaNs.
        vocab_size = 50272
        n_batch = 2
        n_tokens = 3
        d_model = 768
        d_hidden = d_model * 4
        n_heads = 12
        layer = 5

        device = self._llm.residual_in(0).device

        for name, tensor, expected_shape in [
            ("r_in", self._llm.residual_in(layer), [n_batch, n_tokens, d_model]),
            (
                "r_mid",
                self._llm.residual_after_attn(layer),
                [n_batch, n_tokens, d_model],
            ),
            ("r_out", self._llm.residual_out(layer), [n_batch, n_tokens, d_model]),
            ("logits", self._llm.logits(), [n_batch, n_tokens, vocab_size]),
            ("ffn_out", self._llm.ffn_out(layer), [n_batch, n_tokens, d_model]),
            (
                "decomposed_ffn_out",
                self._llm.decomposed_ffn_out(0, 0, 0),
                [d_hidden, d_model],
            ),
            ("neuron_activations", self._llm.neuron_activations(0, 0, 0), [d_hidden]),
            ("neuron_output", self._llm.neuron_output(0, 0), [d_model]),
            (
                "attention_matrix",
                self._llm.attention_matrix(0, 0, 0),
                [n_tokens, n_tokens],
            ),
            (
                "attention_output_per_head",
                self._llm.attention_output_per_head(0, 0, 0, 0),
                [d_model],
            ),
            (
                "attention_output",
                self._llm.attention_output(0, 0, 0),
                [d_model],
            ),
            (
                "decomposed_attn",
                self._llm.decomposed_attn(0, layer),
                [n_tokens, n_tokens, n_heads, d_model],
            ),
            (
                "unembed",
                self._llm.unembed(torch.zeros([d_model]).to(device), normalize=True),
                [vocab_size],
            ),
        ]:
            self.assertEqual(list(tensor.shape), expected_shape, name)
            self.assertFalse(torch.any(tensor.isnan()), name)


if __name__ == "__main__":
    unittest.main()
llm_transparency_tool/models/tlens_model.py
ADDED
@@ -0,0 +1,303 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

from dataclasses import dataclass
from typing import List, Optional

import torch
import transformer_lens
import transformers
from fancy_einsum import einsum
from jaxtyping import Float, Int
from typeguard import typechecked
import streamlit as st

from llm_transparency_tool.models.transparent_llm import ModelInfo, TransparentLlm


@dataclass
class _RunInfo:
    tokens: Int[torch.Tensor, "batch pos"]
    logits: Float[torch.Tensor, "batch pos d_vocab"]
    cache: transformer_lens.ActivationCache


@st.cache_resource(
    max_entries=1,
    show_spinner=True,
    hash_funcs={
        transformers.PreTrainedModel: id,
        transformers.PreTrainedTokenizer: id
    }
)
def load_hooked_transformer(
    model_name: str,
    hf_model: Optional[transformers.PreTrainedModel] = None,
    tlens_device: str = "cuda",
    dtype: torch.dtype = torch.float32,
):
    # if tlens_device == "cuda":
    #     n_devices = torch.cuda.device_count()
    # else:
    #     n_devices = 1
    tlens_model = transformer_lens.HookedTransformer.from_pretrained(
        model_name,
        hf_model=hf_model,
        fold_ln=False,  # Keep layer norm where it is.
        center_writing_weights=False,
        center_unembed=False,
        device=tlens_device,
        # n_devices=n_devices,
        dtype=dtype,
    )
    tlens_model.eval()
    return tlens_model


# TODO(igortufanov): If we want to scale the app to multiple users, we need a more
# careful thread-safe implementation. The simplest option could be to wrap the
# existing methods in mutexes.
class TransformerLensTransparentLlm(TransparentLlm):
    """
    Implementation of Transparent LLM based on transformer lens.

    Args:
    - model_name: The official name of the model from HuggingFace. Even if the model
        was patched or loaded locally, the name should still be official because
        that's how transformer_lens treats the model.
    - hf_model: The language model as a HuggingFace class.
    - tokenizer: The tokenizer to use instead of the model's default one.
    - device: "gpu" or "cpu"
    """

    def __init__(
        self,
        model_name: str,
        hf_model: Optional[transformers.PreTrainedModel] = None,
        tokenizer: Optional[transformers.PreTrainedTokenizer] = None,
        device: str = "gpu",
        dtype: torch.dtype = torch.float32,
    ):
        if device == "gpu":
            self.device = "cuda"
            if not torch.cuda.is_available():
                raise RuntimeError("Asked to run on gpu, but torch couldn't find cuda")
        elif device == "cpu":
            self.device = "cpu"
        else:
            raise RuntimeError(f"Specified device {device} is not a valid option")

        self.dtype = dtype
        self.hf_tokenizer = tokenizer
        self.hf_model = hf_model

        # self._model = tlens_model
        self._model_name = model_name
        self._prepend_bos = True
        self._last_run = None
        self._run_exception = RuntimeError(
            "Tried to use the model output before calling the `run` method"
        )

    def copy(self):
        import copy
        return copy.copy(self)

    @property
    def _model(self):
        tlens_model = load_hooked_transformer(
            self._model_name,
            hf_model=self.hf_model,
            tlens_device=self.device,
            dtype=self.dtype,
        )

        if self.hf_tokenizer is not None:
            tlens_model.set_tokenizer(self.hf_tokenizer, default_padding_side="left")

        tlens_model.set_use_attn_result(True)
        tlens_model.set_use_attn_in(False)
        tlens_model.set_use_split_qkv_input(False)

        return tlens_model

    def model_info(self) -> ModelInfo:
        cfg = self._model.cfg
        return ModelInfo(
            name=self._model_name,
            n_params_estimate=cfg.n_params,
            n_layers=cfg.n_layers,
            n_heads=cfg.n_heads,
            d_model=cfg.d_model,
            d_vocab=cfg.d_vocab,
        )

    @torch.no_grad()
    def run(self, sentences: List[str]) -> None:
        tokens = self._model.to_tokens(sentences, prepend_bos=self._prepend_bos)
        logits, cache = self._model.run_with_cache(tokens)

        self._last_run = _RunInfo(
            tokens=tokens,
            logits=logits,
            cache=cache,
        )

    def batch_size(self) -> int:
        if not self._last_run:
            raise self._run_exception
        return self._last_run.logits.shape[0]

    @typechecked
    def tokens(self) -> Int[torch.Tensor, "batch pos"]:
        if not self._last_run:
            raise self._run_exception
        return self._last_run.tokens

    @typechecked
    def tokens_to_strings(self, tokens: Int[torch.Tensor, "pos"]) -> List[str]:
        return self._model.to_str_tokens(tokens)

    @typechecked
    def logits(self) -> Float[torch.Tensor, "batch pos d_vocab"]:
        if not self._last_run:
            raise self._run_exception
        return self._last_run.logits

    @torch.no_grad()
    @typechecked
    def unembed(
        self,
        t: Float[torch.Tensor, "d_model"],
        normalize: bool,
    ) -> Float[torch.Tensor, "vocab"]:
        # t: [d_model] -> [batch, pos, d_model]
        tdim = t.unsqueeze(0).unsqueeze(0)
        if normalize:
            normalized = self._model.ln_final(tdim)
            result = self._model.unembed(normalized)
        else:
            result = self._model.unembed(tdim)
        return result[0][0]

    def _get_block(self, layer: int, block_name: str) -> torch.Tensor:
        if not self._last_run:
            raise self._run_exception
        return self._last_run.cache[f"blocks.{layer}.{block_name}"]

    # ================= Methods related to the residual stream =================

    @typechecked
    def residual_in(self, layer: int) -> Float[torch.Tensor, "batch pos d_model"]:
        if not self._last_run:
            raise self._run_exception
        return self._get_block(layer, "hook_resid_pre")

    @typechecked
    def residual_after_attn(
        self, layer: int
    ) -> Float[torch.Tensor, "batch pos d_model"]:
        if not self._last_run:
            raise self._run_exception
        return self._get_block(layer, "hook_resid_mid")

    @typechecked
    def residual_out(self, layer: int) -> Float[torch.Tensor, "batch pos d_model"]:
        if not self._last_run:
            raise self._run_exception
        return self._get_block(layer, "hook_resid_post")

    # ================ Methods related to the feed-forward layer ===============

    @typechecked
    def ffn_out(self, layer: int) -> Float[torch.Tensor, "batch pos d_model"]:
        if not self._last_run:
            raise self._run_exception
        return self._get_block(layer, "hook_mlp_out")

    @torch.no_grad()
    @typechecked
    def decomposed_ffn_out(
        self,
        batch_i: int,
        layer: int,
        pos: int,
    ) -> Float[torch.Tensor, "hidden d_model"]:
        # Take activations right before they're multiplied by W_out, i.e. non-linearity
        # and layer norm are already applied.
        processed_activations = self._get_block(layer, "mlp.hook_post")[batch_i][pos]
        return torch.mul(processed_activations.unsqueeze(-1), self._model.W_out[layer])

    @typechecked
    def neuron_activations(
        self,
        batch_i: int,
        layer: int,
        pos: int,
    ) -> Float[torch.Tensor, "hidden"]:
        return self._get_block(layer, "mlp.hook_pre")[batch_i][pos]

    @typechecked
    def neuron_output(
        self,
        layer: int,
        neuron: int,
    ) -> Float[torch.Tensor, "d_model"]:
        return self._model.W_out[layer][neuron]

    # ==================== Methods related to the attention ====================

    @typechecked
    def attention_matrix(
        self, batch_i: int, layer: int, head: int
    ) -> Float[torch.Tensor, "query_pos key_pos"]:
        return self._get_block(layer, "attn.hook_pattern")[batch_i][head]

    @typechecked
    def attention_output_per_head(
        self,
        batch_i: int,
        layer: int,
        pos: int,
        head: int,
    ) -> Float[torch.Tensor, "d_model"]:
        return self._get_block(layer, "attn.hook_result")[batch_i][pos][head]

    @typechecked
    def attention_output(
        self,
        batch_i: int,
        layer: int,
        pos: int,
    ) -> Float[torch.Tensor, "d_model"]:
        return self._get_block(layer, "hook_attn_out")[batch_i][pos]

    @torch.no_grad()
    @typechecked
    def decomposed_attn(
        self, batch_i: int, layer: int
    ) -> Float[torch.Tensor, "pos key_pos head d_model"]:
        if not self._last_run:
            raise self._run_exception
        hook_v = self._get_block(layer, "attn.hook_v")[batch_i]
        b_v = self._model.b_V[layer]
        v = hook_v + b_v
        pattern = self._get_block(layer, "attn.hook_pattern")[batch_i].to(v.dtype)
        z = einsum(
            "key_pos head d_head, "
            "head query_pos key_pos -> "
            "query_pos key_pos head d_head",
            v,
            pattern,
        )
        decomposed_attn = einsum(
            "pos key_pos head d_head, "
            "head d_head d_model -> "
            "pos key_pos head d_model",
            z,
            self._model.W_O[layer],
        )
        return decomposed_attn
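decomposed_attn folds the value vectors (with b_V added) and the attention pattern into one write per (key position, head), but leaves out the output bias b_O; summing over key positions and heads and adding b_O should therefore recover hook_attn_out. A hedged sanity-check sketch (assumes `llm` is a TransformerLensTransparentLlm after `run`; it reaches into private members for brevity, and the tolerance is illustrative):

import torch

def check_attn_recomposition(llm, batch_i=0, layer=0, atol=1e-4):
    decomposed = llm.decomposed_attn(batch_i, layer)  # [pos, key_pos, head, d_model]
    recomposed = decomposed.sum(dim=(1, 2)) + llm._model.b_O[layer]
    full = llm._get_block(layer, "hook_attn_out")[batch_i]  # [pos, d_model]
    assert torch.allclose(recomposed, full, atol=atol)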
llm_transparency_tool/models/transparent_llm.py
ADDED
@@ -0,0 +1,199 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List

import torch
from jaxtyping import Float, Int


@dataclass
class ModelInfo:
    name: str

    # Not the actual number of parameters, but rather the order of magnitude
    n_params_estimate: int

    n_layers: int
    n_heads: int
    d_model: int
    d_vocab: int


class TransparentLlm(ABC):
    """
    An abstract stateful interface for a language model. The model is supposed to be
    loaded at the class initialization.

    The internal state is the resulting tensors from the last call of the `run` method.
    Most of the methods could return values based on the state, but some may do cheap
    computations based on them.
    """

    @abstractmethod
    def model_info(self) -> ModelInfo:
        """
        Gives general info about the model. This method must be available before any
        calls of the `run`.
        """
        pass

    @abstractmethod
    def run(self, sentences: List[str]) -> None:
        """
        Run the inference on the given sentences in a single batch and store all
        necessary info in the internal state.
        """
        pass

    @abstractmethod
    def batch_size(self) -> int:
        """
        The size of the batch that was used for the last call of `run`.
        """
        pass

    @abstractmethod
    def tokens(self) -> Int[torch.Tensor, "batch pos"]:
        pass

    @abstractmethod
    def tokens_to_strings(self, tokens: Int[torch.Tensor, "pos"]) -> List[str]:
        pass

    @abstractmethod
    def logits(self) -> Float[torch.Tensor, "batch pos d_vocab"]:
        pass

    @abstractmethod
    def unembed(
        self,
        t: Float[torch.Tensor, "d_model"],
        normalize: bool,
    ) -> Float[torch.Tensor, "vocab"]:
        """
        Project the given vector (for example, the state of the residual stream for a
        layer and token) into the output vocabulary.

        normalize: whether to apply the final normalization before the unembedding.
        Setting it to True and applying to the output of the last layer gives the
        output of the model.
        """
        pass

    # ================= Methods related to the residual stream =================

    @abstractmethod
    def residual_in(self, layer: int) -> Float[torch.Tensor, "batch pos d_model"]:
        """
        The state of the residual stream before entering the layer. For example, when
        layer == 0 these must be the embedded tokens (including positional embedding).
        """
        pass

    @abstractmethod
    def residual_after_attn(
        self, layer: int
    ) -> Float[torch.Tensor, "batch pos d_model"]:
        """
        The state of the residual stream after attention, but before the FFN in the
        given layer.
        """
        pass

    @abstractmethod
    def residual_out(self, layer: int) -> Float[torch.Tensor, "batch pos d_model"]:
        """
        The state of the residual stream after the given layer. This is equivalent to
        the next layer's input.
        """
        pass

    # ================ Methods related to the feed-forward layer ===============

    @abstractmethod
    def ffn_out(self, layer: int) -> Float[torch.Tensor, "batch pos d_model"]:
        """
        The output of the FFN layer, before it gets merged into the residual stream.
        """
        pass

    @abstractmethod
    def decomposed_ffn_out(
        self,
        batch_i: int,
        layer: int,
        pos: int,
    ) -> Float[torch.Tensor, "hidden d_model"]:
        """
        A collection of vectors added to the residual stream by each neuron. It should
        be the same as neuron activations multiplied by neuron outputs.
        """
        pass

    @abstractmethod
    def neuron_activations(
        self,
        batch_i: int,
        layer: int,
        pos: int,
    ) -> Float[torch.Tensor, "d_ffn"]:
        """
        The content of the hidden layer right after the activation function was
        applied.
        """
        pass

    @abstractmethod
    def neuron_output(
        self,
        layer: int,
        neuron: int,
    ) -> Float[torch.Tensor, "d_model"]:
        """
        Return the value that the given neuron adds to the residual stream. It's a raw
        vector from the model parameters, no activation involved.
        """
        pass

    # ==================== Methods related to the attention ====================

    @abstractmethod
    def attention_matrix(
        self, batch_i, layer: int, head: int
    ) -> Float[torch.Tensor, "query_pos key_pos"]:
        """
        Return a lower-diagonal attention matrix.
        """
        pass

    @abstractmethod
    def attention_output(
        self,
        batch_i: int,
        layer: int,
        pos: int,
    ) -> Float[torch.Tensor, "d_model"]:
        """
        Return what the attention block at the given layer added to the residual
        stream at the given pos. (The `head` parameter was dropped here to match the
        concrete implementation and the tests; per-head outputs are available from
        the implementation's attention_output_per_head.)
        """
        pass

    @abstractmethod
    def decomposed_attn(
        self, batch_i: int, layer: int
    ) -> Float[torch.Tensor, "source target head d_model"]:
        """
        Here
        - source: index of token from the previous layer
        - target: index of token on the current layer
        The decomposed attention tells what vector from the source representation was
        used in order to contribute to the target representation.
        """
        pass
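Since normalizing and unembedding the final residual state reproduces the model's output, any intermediate residual vector can be promoted to vocabulary space the same way. A sketch of ranking the top-k tokens a residual state points to (assumes `llm` is a concrete TransparentLlm on which `run` was already called; names are illustrative):

import torch

def top_k_promoted_tokens(llm, layer, batch_i, pos, k=5):
    vec = llm.residual_out(layer)[batch_i][pos]  # [d_model]
    logits = llm.unembed(vec, normalize=True)    # [vocab]
    _, indices = torch.topk(logits, k)
    return llm.tokens_to_strings(indices.to(torch.int))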
llm_transparency_tool/routes/__init__.py
ADDED
@@ -0,0 +1,5 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.
llm_transparency_tool/routes/contributions.py
ADDED
@@ -0,0 +1,201 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

from typing import Tuple

import einops
import torch
from jaxtyping import Float
from typeguard import typechecked


@torch.no_grad()
@typechecked
def get_contributions(
    parts: torch.Tensor,
    whole: torch.Tensor,
    distance_norm: int = 1,
) -> torch.Tensor:
    """
    Compute contributions of the `parts` vectors into the `whole` vector.

    Shapes of the tensors are as follows:
    parts:  p_1 ... p_k, v_1 ... v_n, d
    whole:  v_1 ... v_n, d
    result: p_1 ... p_k, v_1 ... v_n

    Here
    * `p_1 ... p_k`: dimensions for enumerating the parts
    * `v_1 ... v_n`: dimensions listing the independent cases (batching),
    * `d` is the dimension to compute the distances on.

    The resulting contributions will be normalized so that
    for each v_: sum(over p_ of result(p_, v_)) = 1.
    """
    EPS = 1e-5

    k = len(parts.shape) - len(whole.shape)
    assert k >= 0
    assert parts.shape[k:] == whole.shape
    bc_whole = whole.expand(parts.shape)  # new dims p_1 ... p_k are added to the front

    distance = torch.nn.functional.pairwise_distance(parts, bc_whole, p=distance_norm)

    whole_norm = torch.norm(whole, p=distance_norm, dim=-1)
    distance = (whole_norm - distance).clip(min=EPS)

    # `total` (rather than the built-in name `sum`) holds the normalizer.
    total = distance.sum(dim=tuple(range(k)), keepdim=True)

    return distance / total
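In other words, each part is scored by how much of the whole's norm it accounts for, score_i = clip(||whole|| - ||whole - part_i||, min=EPS), and the scores are normalized to sum to 1. A quick worked example (not part of the committed file; values are illustrative):

import torch

parts = torch.tensor([[3.0, 0.0], [0.0, 1.0]])  # two parts, d = 2
whole = parts.sum(dim=0)                        # [3.0, 1.0], L1 norm is 4
c = get_contributions(parts, whole, distance_norm=1)
# ||whole - part_0||_1 = 1 -> score 3;  ||whole - part_1||_1 = 3 -> score 1
print(c)  # tensor([0.7500, 0.2500])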
@torch.no_grad()
@typechecked
def get_contributions_with_one_off_part(
    parts: torch.Tensor,
    one_off: torch.Tensor,
    whole: torch.Tensor,
    distance_norm: int = 1,
) -> Tuple[torch.Tensor, torch.Tensor]:
    """
    Same as computing the contributions, but there is one additional part. That's
    useful because we always have the residual stream as one of the parts.

    See `get_contributions` documentation about `parts` and `whole` dimensions. The
    `one_off` should have the same dimensions as `whole`.

    Returns a pair consisting of
    1. contributions tensor for the `parts`
    2. contributions tensor for the `one_off` vector
    """
    assert one_off.shape == whole.shape

    k = len(parts.shape) - len(whole.shape)
    assert k >= 0

    # Flatten the p_ dimensions, get contributions for the list, unflatten.
    flat = parts.flatten(start_dim=0, end_dim=k - 1)
    flat = torch.cat([flat, one_off.unsqueeze(0)])
    contributions = get_contributions(flat, whole, distance_norm)
    parts_contributions, one_off_contributions = torch.split(
        contributions, flat.shape[0] - 1
    )
    return (
        parts_contributions.unflatten(0, parts.shape[0:k]),
        one_off_contributions[0],
    )


@torch.no_grad()
@typechecked
def get_attention_contributions(
    resid_pre: Float[torch.Tensor, "batch pos d_model"],
    resid_mid: Float[torch.Tensor, "batch pos d_model"],
    decomposed_attn: Float[torch.Tensor, "batch pos key_pos head d_model"],
    distance_norm: int = 1,
) -> Tuple[
    Float[torch.Tensor, "batch pos key_pos head"],
    Float[torch.Tensor, "batch pos"],
]:
    """
    Returns a pair of
    - a tensor of contributions of each token via each head
    - the contribution of the residual stream.
    """

    # part dimensions | batch dimensions | vector dimension
    # ----------------+------------------+-----------------
    # key_pos, head   | batch, pos       | d_model
    parts = einops.rearrange(
        decomposed_attn,
        "batch pos key_pos head d_model -> key_pos head batch pos d_model",
    )
    attn_contribution, residual_contribution = get_contributions_with_one_off_part(
        parts, resid_pre, resid_mid, distance_norm
    )
    return (
        einops.rearrange(
            attn_contribution, "key_pos head batch pos -> batch pos key_pos head"
        ),
        residual_contribution,
    )


@torch.no_grad()
@typechecked
def get_mlp_contributions(
    resid_mid: Float[torch.Tensor, "batch pos d_model"],
    resid_post: Float[torch.Tensor, "batch pos d_model"],
    mlp_out: Float[torch.Tensor, "batch pos d_model"],
    distance_norm: int = 1,
) -> Tuple[Float[torch.Tensor, "batch pos"], Float[torch.Tensor, "batch pos"]]:
    """
    Returns a pair of (mlp, residual) contributions for each sentence and token.
    """

    contributions = get_contributions(
        torch.stack((mlp_out, resid_mid)), resid_post, distance_norm
    )
    return contributions[0], contributions[1]


@torch.no_grad()
@typechecked
def get_decomposed_mlp_contributions(
    resid_mid: Float[torch.Tensor, "d_model"],
    resid_post: Float[torch.Tensor, "d_model"],
    decomposed_mlp_out: Float[torch.Tensor, "hidden d_model"],
    distance_norm: int = 1,
) -> Tuple[Float[torch.Tensor, "hidden"], float]:
    """
    Similar to `get_mlp_contributions`, but it takes the MLP output for each neuron of
    the hidden layer and thus computes a contribution per neuron.

    Doesn't contain batch and token dimensions for the sake of saving memory. But we
    may consider adding them.
    """

    neuron_contributions, residual_contribution = get_contributions_with_one_off_part(
        decomposed_mlp_out, resid_mid, resid_post, distance_norm
    )
    return neuron_contributions, residual_contribution.item()


@torch.no_grad()
def apply_threshold_and_renormalize(
    threshold: float,
    c_blocks: torch.Tensor,
    c_residual: torch.Tensor,
) -> Tuple[torch.Tensor, torch.Tensor]:
    """
    Thresholding mechanism used in the original graphs paper. After the threshold is
    applied, the remaining contributions are renormalized in order to sum up to 1 for
    each representation.

    threshold: The threshold.
    c_residual: Contribution of the residual stream for each representation. This
        tensor should contain 1 element per representation, i.e., its dimensions are
        all batch dimensions.
    c_blocks: Contributions of the blocks. Could be 1 block per representation, like
        ffn, or heads*tokens blocks in case of attention. The shape of `c_residual`
        must be a prefix of the shape of this tensor. The remaining dimensions are for
        listing the blocks.
    """

    block_dims = len(c_blocks.shape)
    resid_dims = len(c_residual.shape)
    bound_dims = block_dims - resid_dims
    assert bound_dims >= 0
    assert c_blocks.shape[0:resid_dims] == c_residual.shape

    c_blocks = c_blocks * (c_blocks > threshold)
    c_residual = c_residual * (c_residual > threshold)

    denom = c_residual + c_blocks.sum(dim=tuple(range(resid_dims, block_dims)))
    return (
        c_blocks / denom.reshape(denom.shape + (1,) * bound_dims),
        c_residual / denom,
    )
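The thresholding above zeroes any contribution at or below the threshold and renormalizes the survivors per representation. A small worked example (illustrative values, not part of the commit):

import torch

c_blocks = torch.tensor([[0.05, 0.55, 0.20]])  # one representation, three blocks
c_residual = torch.tensor([0.20])
blocks, resid = apply_threshold_and_renormalize(0.1, c_blocks, c_residual)
# 0.05 falls below the threshold; the rest renormalize to sum to 1:
print(blocks)  # tensor([[0.0000, 0.5789, 0.2105]])
print(resid)   # tensor([0.2105])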
llm_transparency_tool/routes/graph.py
ADDED
@@ -0,0 +1,163 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

from typing import List, Optional

import networkx as nx
import torch

import llm_transparency_tool.routes.contributions as contributions
from llm_transparency_tool.models.transparent_llm import TransparentLlm


class GraphBuilder:
    """
    Constructs the contributions graph with edges given one by one. The resulting graph
    is a networkx graph that can be accessed via the `graph` field. It contains the
    following types of nodes:

    - X0_<token>: the original token.
    - A<layer>_<token>: the residual stream after attention at the given layer for the
      given token.
    - M<layer>_<token>: the ffn block.
    - I<layer>_<token>: the residual stream after the ffn block.
    """

    def __init__(self, n_layers: int, n_tokens: int):
        self._n_layers = n_layers
        self._n_tokens = n_tokens

        self.graph = nx.DiGraph()
        for layer in range(n_layers):
            for token in range(n_tokens):
                self.graph.add_node(f"A{layer}_{token}")
                self.graph.add_node(f"I{layer}_{token}")
                self.graph.add_node(f"M{layer}_{token}")
        for token in range(n_tokens):
            self.graph.add_node(f"X0_{token}")

    def get_output_node(self, token: int):
        return f"I{self._n_layers - 1}_{token}"

    def _add_edge(self, u: str, v: str, weight: float):
        # TODO(igortufanov): Here we sum up weights for multi-edges. It happens with
        # attention from the current token and the residual edge. Ideally these need to
        # be 2 separate edges, but then we need to do a MultiGraph. Multigraph is fine,
        # but when we try to traverse it, we face some NetworkX issue with EDGE_OK
        # receiving 3 arguments instead of 2.
        if self.graph.has_edge(u, v):
            self.graph[u][v]["weight"] += weight
        else:
            self.graph.add_edge(u, v, weight=weight)

    def add_attention_edge(self, layer: int, token_from: int, token_to: int, w: float):
        self._add_edge(
            f"I{layer-1}_{token_from}" if layer > 0 else f"X0_{token_from}",
            f"A{layer}_{token_to}",
            w,
        )

    def add_residual_to_attn(self, layer: int, token: int, w: float):
        self._add_edge(
            f"I{layer-1}_{token}" if layer > 0 else f"X0_{token}",
            f"A{layer}_{token}",
            w,
        )

    def add_ffn_edge(self, layer: int, token: int, w: float):
        self._add_edge(f"A{layer}_{token}", f"M{layer}_{token}", w)
        self._add_edge(f"M{layer}_{token}", f"I{layer}_{token}", w)

    def add_residual_to_ffn(self, layer: int, token: int, w: float):
        self._add_edge(f"A{layer}_{token}", f"I{layer}_{token}", w)


@torch.no_grad()
def build_full_graph(
    model: TransparentLlm,
    batch_i: int = 0,
    renormalizing_threshold: Optional[float] = None,
) -> nx.Graph:
    """
    Build the contribution graph for all blocks of the model and all tokens.

    model: The transparent llm which already did the inference.
    batch_i: Which sentence to use from the batch that was given to the model.
    renormalizing_threshold: If specified, will apply renormalizing thresholding to the
        contributions. All contributions below the threshold will be erased and the
        rest will be renormalized.
    """
    n_layers = model.model_info().n_layers
    n_tokens = model.tokens()[batch_i].shape[0]

    builder = GraphBuilder(n_layers, n_tokens)

    for layer in range(n_layers):
        c_attn, c_resid_attn = contributions.get_attention_contributions(
            resid_pre=model.residual_in(layer)[batch_i].unsqueeze(0),
            resid_mid=model.residual_after_attn(layer)[batch_i].unsqueeze(0),
            decomposed_attn=model.decomposed_attn(batch_i, layer).unsqueeze(0),
        )
        if renormalizing_threshold is not None:
            c_attn, c_resid_attn = contributions.apply_threshold_and_renormalize(
                renormalizing_threshold, c_attn, c_resid_attn
            )
        for token_from in range(n_tokens):
            for token_to in range(n_tokens):
                # Sum attention contributions over heads.
                c = c_attn[batch_i, token_to, token_from].sum().item()
                builder.add_attention_edge(layer, token_from, token_to, c)
        for token in range(n_tokens):
            builder.add_residual_to_attn(
                layer, token, c_resid_attn[batch_i, token].item()
            )

        c_ffn, c_resid_ffn = contributions.get_mlp_contributions(
            resid_mid=model.residual_after_attn(layer)[batch_i].unsqueeze(0),
            resid_post=model.residual_out(layer)[batch_i].unsqueeze(0),
            mlp_out=model.ffn_out(layer)[batch_i].unsqueeze(0),
        )
        if renormalizing_threshold is not None:
            c_ffn, c_resid_ffn = contributions.apply_threshold_and_renormalize(
                renormalizing_threshold, c_ffn, c_resid_ffn
            )
        for token in range(n_tokens):
            builder.add_ffn_edge(layer, token, c_ffn[batch_i, token].item())
            builder.add_residual_to_ffn(
                layer, token, c_resid_ffn[batch_i, token].item()
            )

    return builder.graph


def build_paths_to_predictions(
    graph: nx.Graph,
    n_layers: int,
    n_tokens: int,
    starting_tokens: List[int],
    threshold: float,
) -> List[nx.Graph]:
    """
    Given the full graph, this function returns only the trees leading to the specified
    tokens. Edges with weight below `threshold` will be ignored.
    """
    builder = GraphBuilder(n_layers, n_tokens)

    rgraph = graph.reverse()
    search_graph = nx.subgraph_view(
        rgraph, filter_edge=lambda u, v: rgraph[u][v]["weight"] > threshold
    )

    result = []
    for start in starting_tokens:
        assert start < n_tokens
        assert start >= 0
        edges = nx.edge_dfs(search_graph, source=builder.get_output_node(start))
        tree = search_graph.edge_subgraph(edges)
        # Reverse the edges because the dfs was going from upper layer downwards.
        result.append(tree.reverse())

    return result
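A minimal sketch (not a file in this commit) of the node naming scheme that `GraphBuilder` maintains, using a hypothetical 2-layer, 2-token graph:

from llm_transparency_tool.routes.graph import GraphBuilder

builder = GraphBuilder(n_layers=2, n_tokens=2)
# Token 1 at layer 0 attends to token 0: edge X0_0 -> A0_1.
builder.add_attention_edge(layer=0, token_from=0, token_to=1, w=0.3)
# The residual stream carries the remaining mass: edge X0_1 -> A0_1.
builder.add_residual_to_attn(layer=0, token=1, w=0.7)
# The FFN block for token 1: edges A0_1 -> M0_1 -> I0_1, plus the residual A0_1 -> I0_1.
builder.add_ffn_edge(layer=0, token=1, w=0.5)
builder.add_residual_to_ffn(layer=0, token=1, w=0.5)

print(builder.get_output_node(1))               # I1_1
print(builder.graph["X0_0"]["A0_1"]["weight"])  # 0.3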
llm_transparency_tool/routes/graph_node.py
ADDED
@@ -0,0 +1,90 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class NodeType(Enum):
    AFTER_ATTN = "after_attn"
    AFTER_FFN = "after_ffn"
    FFN = "ffn"
    ORIGINAL = "original"  # The original tokens


def _format_block_hierachy_string(blocks: List[str]) -> str:
    return " ▸ ".join(blocks)


@dataclass
class GraphNode:
    layer: int
    token: int
    type: NodeType

    def is_in_residual_stream(self) -> bool:
        return self.type in [NodeType.AFTER_ATTN, NodeType.AFTER_FFN]

    def get_residual_predecessor(self) -> Optional["GraphNode"]:
        """
        Get another graph node which points to the state of the residual stream before
        this node.

        Return None if the current representation is the first one in the residual
        stream.
        """
        scheme = {
            NodeType.AFTER_ATTN: GraphNode(
                layer=max(self.layer - 1, 0),
                token=self.token,
                type=NodeType.AFTER_FFN if self.layer > 0 else NodeType.ORIGINAL,
            ),
            NodeType.AFTER_FFN: GraphNode(
                layer=self.layer,
                token=self.token,
                type=NodeType.AFTER_ATTN,
            ),
            NodeType.FFN: GraphNode(
                layer=self.layer,
                token=self.token,
                type=NodeType.AFTER_ATTN,
            ),
            NodeType.ORIGINAL: None,
        }
        node = scheme[self.type]
        # ORIGINAL nodes map to None: they have no predecessor.
        if node is None or node.layer < 0:
            return None
        return node

    def get_name(self) -> str:
        return _format_block_hierachy_string(
            [f"L{self.layer}", f"T{self.token}", str(self.type.value)]
        )

    def get_predecessor_block_name(self) -> str:
        """
        Return the name of the block standing between the current node and its
        predecessor in the residual stream.
        """
        scheme = {
            NodeType.AFTER_ATTN: [f"L{self.layer}", "attn"],
            NodeType.AFTER_FFN: [f"L{self.layer}", "ffn"],
            NodeType.FFN: [f"L{self.layer}", "ffn"],
            NodeType.ORIGINAL: ["Nothing"],
        }
        return _format_block_hierachy_string(scheme[self.type])

    def get_head_name(self, head: Optional[int]) -> str:
        path = [f"L{self.layer}", "attn"]
        if head is not None:
            path.append(f"H{head}")
        return _format_block_hierachy_string(path)

    def get_neuron_name(self, neuron: Optional[int]) -> str:
        path = [f"L{self.layer}", "ffn"]
        if neuron is not None:
            path.append(f"N{neuron}")
        return _format_block_hierachy_string(path)
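A quick sketch (not a file in this commit) of the naming helpers; the expected outputs follow directly from the scheme above:

from llm_transparency_tool.routes.graph_node import GraphNode, NodeType

node = GraphNode(layer=3, token=5, type=NodeType.AFTER_ATTN)
print(node.get_name())                    # L3 ▸ T5 ▸ after_attn
print(node.get_predecessor_block_name())  # L3 ▸ attn
print(node.get_head_name(2))              # L3 ▸ attn ▸ H2

pre = node.get_residual_predecessor()
print(pre.get_name())                     # L2 ▸ T5 ▸ after_ffn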
llm_transparency_tool/routes/test_contributions.py
ADDED
@@ -0,0 +1,148 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

import unittest
from typing import Any, List

import torch

import llm_transparency_tool.routes.contributions as contributions


class TestContributions(unittest.TestCase):
    def setUp(self):
        torch.manual_seed(123)

        self.eps = 1e-4

        # It may be useful to run the test on GPU in case there are any issues with
        # creating temporary tensors on another device. But turn this off by default.
        self.test_on_gpu = False

        self.device = "cuda" if self.test_on_gpu else "cpu"

        self.batch = 4
        self.tokens = 5
        self.heads = 6
        self.d_model = 10

        self.decomposed_attn = torch.rand(
            self.batch,
            self.tokens,
            self.tokens,
            self.heads,
            self.d_model,
            device=self.device,
        )
        self.mlp_out = torch.rand(
            self.batch, self.tokens, self.d_model, device=self.device
        )
        self.resid_pre = torch.rand(
            self.batch, self.tokens, self.d_model, device=self.device
        )
        self.resid_mid = torch.rand(
            self.batch, self.tokens, self.d_model, device=self.device
        )
        self.resid_post = torch.rand(
            self.batch, self.tokens, self.d_model, device=self.device
        )

    def _assert_tensor_eq(self, t: torch.Tensor, expected: List[Any]):
        self.assertTrue(
            torch.isclose(t, torch.Tensor(expected), atol=self.eps).all(),
            t,
        )

    def test_mlp_contributions(self):
        mlp_out = torch.tensor([[[1.0, 1.0]]])
        resid_mid = torch.tensor([[[0.0, 0.0]]])
        resid_post = torch.tensor([[[1.0, 1.0]]])

        c_mlp, c_residual = contributions.get_mlp_contributions(
            resid_mid, resid_post, mlp_out
        )
        self.assertAlmostEqual(c_mlp.item(), 1.0, delta=self.eps)
        self.assertAlmostEqual(c_residual.item(), 0.0, delta=self.eps)

    def test_decomposed_attn_contributions(self):
        resid_pre = torch.tensor([[[2.0, 1.0]]])
        resid_mid = torch.tensor([[[2.0, 2.0]]])
        decomposed_attn = torch.tensor(
            [
                [
                    [
                        [
                            [1.0, 1.0],
                            [-1.0, 0.0],
                        ]
                    ]
                ]
            ]
        )

        c_attn, c_residual = contributions.get_attention_contributions(
            resid_pre, resid_mid, decomposed_attn, distance_norm=2
        )
        self._assert_tensor_eq(c_attn, [[[[0.43613, 0]]]])
        self.assertAlmostEqual(c_residual.item(), 0.56387, delta=self.eps)

    def test_decomposed_mlp_contributions(self):
        pre = torch.tensor([10.0, 10.0])
        post = torch.tensor([-10.0, 10.0])
        neuron_impacts = torch.tensor(
            [
                [0.0, 1.0],
                [1.0, 0.0],
                [-21.0, -1.0],
            ]
        )
        c_mlp, c_residual = contributions.get_decomposed_mlp_contributions(
            pre, post, neuron_impacts, distance_norm=2
        )
        # A bit counter-intuitive, but the only vector pointing from 0 towards the
        # output is the first one.
        self._assert_tensor_eq(c_mlp, [1, 0, 0])
        self.assertAlmostEqual(c_residual, 0, delta=self.eps)

    def test_decomposed_mlp_contributions_single_direction(self):
        pre = torch.tensor([1.0, 1.0])
        post = torch.tensor([4.0, 4.0])
        neuron_impacts = torch.tensor(
            [
                [1.0, 1.0],
                [2.0, 2.0],
            ]
        )
        c_mlp, c_residual = contributions.get_decomposed_mlp_contributions(
            pre, post, neuron_impacts, distance_norm=2
        )
        self._assert_tensor_eq(c_mlp, [0.25, 0.5])
        self.assertAlmostEqual(c_residual, 0.25, delta=self.eps)

    def test_attention_contributions_shape(self):
        c_attn, c_residual = contributions.get_attention_contributions(
            self.resid_pre, self.resid_mid, self.decomposed_attn
        )
        self.assertEqual(
            list(c_attn.shape), [self.batch, self.tokens, self.tokens, self.heads]
        )
        self.assertEqual(list(c_residual.shape), [self.batch, self.tokens])

    def test_mlp_contributions_shape(self):
        c_mlp, c_residual = contributions.get_mlp_contributions(
            self.resid_mid, self.resid_post, self.mlp_out
        )
        self.assertEqual(list(c_mlp.shape), [self.batch, self.tokens])
        self.assertEqual(list(c_residual.shape), [self.batch, self.tokens])

    def test_renormalizing_threshold(self):
        c_blocks = torch.Tensor([[0.05, 0.15], [0.05, 0.05]])
        c_residual = torch.Tensor([0.8, 0.9])
        norm_blocks, norm_residual = contributions.apply_threshold_and_renormalize(
            0.1, c_blocks, c_residual
        )
        self._assert_tensor_eq(norm_blocks, [[0.0, 0.157894], [0.0, 0.0]])
        self._assert_tensor_eq(norm_residual, [0.842105, 1.0])
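The suite is plain `unittest`, so (assuming the package root is on `PYTHONPATH`) it should run with something like `python -m unittest llm_transparency_tool.routes.test_contributions`.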
llm_transparency_tool/server/app.py
ADDED
@@ -0,0 +1,659 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

import argparse
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

import networkx as nx
import pandas as pd
import plotly.express
import plotly.graph_objects as go
import streamlit as st
import streamlit_extras.row as st_row
import torch
from jaxtyping import Float
from torch.amp import autocast
from transformers import HfArgumentParser

import llm_transparency_tool.components
from llm_transparency_tool.models.tlens_model import TransformerLensTransparentLlm
import llm_transparency_tool.routes.contributions as contributions
import llm_transparency_tool.routes.graph
from llm_transparency_tool.models.transparent_llm import TransparentLlm
from llm_transparency_tool.routes.graph_node import NodeType
from llm_transparency_tool.server.graph_selection import (
    GraphSelection,
    UiGraphEdge,
    UiGraphNode,
)
from llm_transparency_tool.server.styles import (
    RenderSettings,
    logits_color_map,
    margins_css,
    string_to_display,
)
from llm_transparency_tool.server.utils import (
    B0,
    get_contribution_graph,
    load_dataset,
    load_model,
    possible_devices,
    run_model_with_session_caching,
    st_placeholder,
)
from llm_transparency_tool.server.monitor import SystemMonitor

from networkx.classes.digraph import DiGraph


@st.cache_resource(
    hash_funcs={
        nx.Graph: id,
        DiGraph: id,
    }
)
def cached_build_paths_to_predictions(
    graph: nx.Graph,
    n_layers: int,
    n_tokens: int,
    starting_tokens: List[int],
    threshold: float,
):
    return llm_transparency_tool.routes.graph.build_paths_to_predictions(
        graph, n_layers, n_tokens, starting_tokens, threshold
    )


@st.cache_resource(
    hash_funcs={
        TransformerLensTransparentLlm: id,
    }
)
def cached_run_inference_and_populate_state(
    stateless_model,
    sentences,
):
    stateful_model = stateless_model.copy()
    stateful_model.run(sentences)
    return stateful_model


@dataclass
class LlmViewerConfig:
    debug: bool = field(
        default=False,
        metadata={"help": "Show debugging information, like the time profile."},
    )

    preloaded_dataset_filename: Optional[str] = field(
        default=None,
        metadata={"help": "The name of the text file to load the lines from."},
    )

    demo_mode: bool = field(
        default=False,
        metadata={"help": "Whether the app should be in the demo mode."},
    )

    allow_loading_dataset_files: bool = field(
        default=True,
        metadata={"help": "Whether the app should be able to load the dataset files on the server side."},
    )

    max_user_string_length: Optional[int] = field(
        default=None,
        metadata={
            "help": "Limit for the length of user-provided sentences (in characters), or None if there is no limit."
        },
    )

    models: Dict[str, str] = field(
        default_factory=dict,
        metadata={
            "help": "Locations of models which are stored locally. Dictionary: official "
            "HuggingFace name -> path to dir. If None is specified, the model will be "
            "downloaded from HuggingFace."
        },
    )

    default_model: str = field(
        default="",
        metadata={"help": "The model to load once the UI is started."},
    )


class App:
    _stateful_model: Optional[TransparentLlm] = None
    render_settings = RenderSettings()
    _graph: Optional[nx.Graph] = None
    _contribution_threshold: float = 0.0
    _renormalize_after_threshold: bool = False
    _normalize_before_unembedding: bool = True

    @property
    def stateful_model(self) -> TransparentLlm:
        return self._stateful_model

    def __init__(self, config: LlmViewerConfig):
        self._config = config
        st.set_page_config(layout="wide")
        st.markdown(margins_css, unsafe_allow_html=True)

    def _get_representation(self, node: Optional[UiGraphNode]) -> Optional[Float[torch.Tensor, "d_model"]]:
        if node is None:
            return None
        fn = {
            NodeType.AFTER_ATTN: self.stateful_model.residual_after_attn,
            NodeType.AFTER_FFN: self.stateful_model.residual_out,
            NodeType.FFN: None,
            NodeType.ORIGINAL: self.stateful_model.residual_in,
        }
        return fn[node.type](node.layer)[B0][node.token]

    def draw_model_info(self):
        info = self.stateful_model.model_info().__dict__
        df = pd.DataFrame(
            data=[str(x) for x in info.values()],
            index=info.keys(),
            columns=["Model parameter"],
        )
        st.dataframe(df, use_container_width=False)

    def draw_dataset_selection(self) -> Optional[str]:
        def update_dataset(filename: Optional[str]):
            dataset = load_dataset(filename) if filename is not None else []
            st.session_state["dataset"] = dataset
            st.session_state["dataset_file"] = filename

        if "dataset" not in st.session_state:
            update_dataset(self._config.preloaded_dataset_filename)

        if not self._config.demo_mode:
            if self._config.allow_loading_dataset_files:
                row_f = st_row.row([2, 1], vertical_align="bottom")
                filename = row_f.text_input("Dataset", value=st.session_state.dataset_file or "")
                if row_f.button("Load"):
                    update_dataset(filename)
            row_s = st_row.row([2, 1], vertical_align="bottom")
            new_sentence = row_s.text_input("New sentence")
            new_sentence_added = False

            if row_s.button("Add"):
                max_len = self._config.max_user_string_length
                n = len(new_sentence)
                if max_len is None or n <= max_len:
                    st.session_state.dataset.append(new_sentence)
                    new_sentence_added = True
                    st.session_state.sentence_selector = new_sentence
                else:
                    st.warning(f"Sentence length {n} is larger than the configured limit of {max_len}")

        sentences = st.session_state.dataset
        selection = st.selectbox(
            "Sentence",
            sentences,
            index=len(sentences) - 1,
            key="sentence_selector",
        )
        return selection

    def _unembed(
        self,
        representation: torch.Tensor,
    ) -> torch.Tensor:
        return self.stateful_model.unembed(representation, normalize=self._normalize_before_unembedding)

    def draw_graph(self, contribution_threshold: float) -> Optional[GraphSelection]:
        tokens = self.stateful_model.tokens()[B0]
        n_tokens = tokens.shape[0]
        model_info = self.stateful_model.model_info()

        graphs = cached_build_paths_to_predictions(
            self._graph,
            model_info.n_layers,
            n_tokens,
            range(n_tokens),
            contribution_threshold,
        )

        return llm_transparency_tool.components.contribution_graph(
            model_info,
            self.stateful_model.tokens_to_strings(tokens),
            graphs,
            key=f"graph_{hash(self.sentence)}",
        )

    def draw_token_matrix(
        self,
        values: Float[torch.Tensor, "t t"],
        tokens: List[str],
        value_name: str,
        title: str,
    ):
        assert values.shape[0] == len(tokens)
        labels = {
            "x": "<b>src</b>",
            "y": "<b>tgt</b>",
            "color": value_name,
        }

        captions = [f"({i}){t}" for i, t in enumerate(tokens)]

        fig = plotly.express.imshow(
            values.cpu(),
            title=f"<b>{title}</b>",
            labels=labels,
            x=captions,
            y=captions,
            color_continuous_scale=self.render_settings.attention_color_map,
            aspect="equal",
        )
        fig.update_layout(
            autosize=True,
            margin=go.layout.Margin(
                l=50,  # left margin
                r=0,  # right margin
                b=100,  # bottom margin
                t=100,  # top margin
            ),
        )
        fig.update_xaxes(tickmode="linear")
        fig.update_yaxes(tickmode="linear")
        fig.update_coloraxes(showscale=False)

        st.plotly_chart(fig, use_container_width=True, theme=None)

    def draw_attn_info(self, edge: UiGraphEdge, container_attention_map) -> Optional[int]:
        """
        Returns: the index of the selected head.
        """
        n_heads = self.stateful_model.model_info().n_heads

        layer = edge.target.layer

        head_contrib, _ = contributions.get_attention_contributions(
            resid_pre=self.stateful_model.residual_in(layer)[B0].unsqueeze(0),
            resid_mid=self.stateful_model.residual_after_attn(layer)[B0].unsqueeze(0),
            decomposed_attn=self.stateful_model.decomposed_attn(B0, layer).unsqueeze(0),
        )

        # [batch pos key_pos head] -> [head]
        flat_contrib = head_contrib[0, edge.target.token, edge.source.token, :]
        assert flat_contrib.shape[0] == n_heads, f"{flat_contrib.shape} vs {n_heads}"

        selected_head = llm_transparency_tool.components.selector(
            items=[f"H{h}" if h >= 0 else "All" for h in range(-1, n_heads)],
            indices=range(-1, n_heads),
            temperatures=[sum(flat_contrib).item()] + flat_contrib.tolist(),
            preselected_index=flat_contrib.argmax().item(),
            key=f"head_selector_layer_{layer}",  # _from_tok_{edge.source.token}_to_tok_{edge.target.token}
        )
        print(f"head_selector_layer_{layer}_from_tok_{edge.source.token}_to_tok_{edge.target.token}")
        if selected_head == -1 or selected_head is None:
            selected_head = flat_contrib.argmax().item()
        print("****\n" * 3 + f"selected_head: {selected_head}" + "\n****\n" * 3)

        # Draw attention matrix and contributions for the selected head.
        if selected_head is not None:
            tokens = [
                string_to_display(s) for s in self.stateful_model.tokens_to_strings(self.stateful_model.tokens()[B0])
            ]

            with container_attention_map:
                attn_container, contrib_container = st.columns([1, 1])
                with attn_container:
                    attn = self.stateful_model.attention_matrix(B0, layer, selected_head)
                    self.draw_token_matrix(
                        attn,
                        tokens,
                        "attention",
                        f"Attention map L{layer} H{selected_head}",
                    )
                with contrib_container:
                    contrib = head_contrib[B0, :, :, selected_head]
                    self.draw_token_matrix(
                        contrib,
                        tokens,
                        "contribution",
                        f"Contribution map L{layer} H{selected_head}",
                    )

        return selected_head

    def draw_ffn_info(self, node: UiGraphNode) -> Optional[int]:
        """
        Returns: the index of the selected neuron.
        """
        resid_mid = self.stateful_model.residual_after_attn(node.layer)[B0][node.token]
        resid_post = self.stateful_model.residual_out(node.layer)[B0][node.token]
        decomposed_ffn = self.stateful_model.decomposed_ffn_out(B0, node.layer, node.token)
        c_ffn, _ = contributions.get_decomposed_mlp_contributions(resid_mid, resid_post, decomposed_ffn)

        top_values, top_i = c_ffn.sort(descending=True)
        n = min(self.render_settings.n_top_neurons, c_ffn.shape[0])
        top_neurons = top_i[0:n].tolist()

        selected_neuron = llm_transparency_tool.components.selector(
            items=[f"{top_neurons[i]}" if i >= 0 else "All" for i in range(-1, n)],
            indices=range(-1, n),
            temperatures=[0.0] + top_values[0:n].tolist(),
            preselected_index=-1,
            key="neuron_selector",
        )
        if selected_neuron is None:
            selected_neuron = -1
        selected_neuron = None if selected_neuron == -1 else top_neurons[selected_neuron]

        return selected_neuron

    def _draw_token_table(
        self,
        n_top: int,
        n_bottom: int,
        representation: torch.Tensor,
        predecessor: Optional[torch.Tensor] = None,
    ):
        n_total = n_top + n_bottom

        logits = self._unembed(representation)
        n_vocab = logits.shape[0]
        scores, indices = torch.topk(logits, n_top, largest=True)
        positions = list(range(n_top))

        if n_bottom > 0:
            low_scores, low_indices = torch.topk(logits, n_bottom, largest=False)
            indices = torch.cat((indices, low_indices.flip(0)))
            scores = torch.cat((scores, low_scores.flip(0)))
            positions += range(n_vocab - n_bottom, n_vocab)

        tokens = [string_to_display(w) for w in self.stateful_model.tokens_to_strings(indices)]

        if predecessor is not None:
            pre_logits = self._unembed(predecessor)
            _, sorted_pre_indices = pre_logits.sort(descending=True)
            pre_indices_dict = {index: pos for pos, index in enumerate(sorted_pre_indices.tolist())}
            old_positions = [pre_indices_dict[i] for i in indices.tolist()]

            def pos_gain_string(pos, old_pos):
                if pos == old_pos:
                    return ""
                sign = "↓" if pos > old_pos else "↑"
                return f"({sign}{abs(pos - old_pos)})"

            position_strings = [f"{i} {pos_gain_string(i, old_i)}" for (i, old_i) in zip(positions, old_positions)]
        else:
            position_strings = [str(pos) for pos in positions]

        def pos_gain_color(s):
            color = "black"
            if isinstance(s, str):
                if "↓" in s:
                    color = "red"
                if "↑" in s:
                    color = "green"
            return f"color: {color}"

        top_df = pd.DataFrame(
            data=zip(position_strings, tokens, scores.tolist()),
            columns=["Pos", "Token", "Score"],
        )

        st.dataframe(
            top_df.style.map(pos_gain_color)
            .background_gradient(
                axis=0,
                cmap=logits_color_map(positive_and_negative=n_bottom > 0),
            )
            .format(precision=3),
            hide_index=True,
            height=self.render_settings.table_cell_height * (n_total + 1),
            use_container_width=True,
        )

    def draw_token_dynamics(self, representation: torch.Tensor, block_name: str) -> None:
        st.caption(block_name)
        self._draw_token_table(
            self.render_settings.n_promoted_tokens,
            self.render_settings.n_suppressed_tokens,
            representation,
            None,
        )

    def draw_top_tokens(
        self,
        node: UiGraphNode,
        container_top_tokens,
        container_token_dynamics,
    ) -> None:
        pre_node = node.get_residual_predecessor()
        if pre_node is None:
            return

        representation = self._get_representation(node)
        predecessor = self._get_representation(pre_node)

        with container_top_tokens:
            st.caption(node.get_name())
            self._draw_token_table(
                self.render_settings.n_top_tokens,
                0,
                representation,
                predecessor,
            )
        if container_token_dynamics is not None:
            with container_token_dynamics:
                self.draw_token_dynamics(representation - predecessor, node.get_predecessor_block_name())

    def draw_attention_dynamics(self, node: UiGraphNode, head: Optional[int]):
        block_name = node.get_head_name(head)
        block_output = (
            self.stateful_model.attention_output_per_head(B0, node.layer, node.token, head)
            if head is not None
            else self.stateful_model.attention_output(B0, node.layer, node.token)
        )
        self.draw_token_dynamics(block_output, block_name)

    def draw_ffn_dynamics(self, node: UiGraphNode, neuron: Optional[int]):
        block_name = node.get_neuron_name(neuron)
        block_output = (
            self.stateful_model.neuron_output(node.layer, neuron)
            if neuron is not None
            else self.stateful_model.ffn_out(node.layer)[B0][node.token]
        )
        self.draw_token_dynamics(block_output, block_name)

    def draw_precision_controls(self, device: str) -> Tuple[torch.dtype, bool]:
        """
        Draw fp16/fp32 switch and AMP control.

        return: The selected precision and whether AMP should be enabled.
        """
        if device == "cpu":
            dtype = torch.float32
        else:
            dtype = st.selectbox(
                "Precision",
                [torch.float16, torch.bfloat16, torch.float32],
                index=0,
            )

        amp_enabled = dtype != torch.float32

        return dtype, amp_enabled

    def draw_controls(self):
        with st.sidebar.expander("Model", expanded=True):
            list_of_devices = possible_devices()
            if len(list_of_devices) > 1:
                self.device = st.selectbox(
                    "Device",
                    possible_devices(),
                    index=0,
                )
            else:
                self.device = list_of_devices[0]

            self.dtype, self.amp_enabled = self.draw_precision_controls(self.device)

            model_list = list(self._config.models)
            default_choice = model_list.index(self._config.default_model)

            self.model_name = st.selectbox(
                "Model",
                model_list,
                index=default_choice,
            )

            if self.model_name:
                self._stateful_model = load_model(
                    model_name=self.model_name,
                    _model_path=self._config.models[self.model_name],
                    _device=self.device,
                    _dtype=self.dtype,
                )
                self.model_key = self.model_name  # TODO maybe something else?
                self.draw_model_info()

            self.sentence = self.draw_dataset_selection()

        with st.sidebar.expander("Graph", expanded=True):
            self._contribution_threshold = st.slider(
                min_value=0.01,
                max_value=0.1,
                step=0.01,
                value=0.04,
                format=r"%.3f",
                label="Contribution threshold",
            )
            self._renormalize_after_threshold = st.checkbox("Renormalize after threshold", value=True)
            self._normalize_before_unembedding = st.checkbox("Normalize before unembedding", value=True)

    def run_inference(self):
        with autocast(enabled=self.amp_enabled, device_type="cuda", dtype=self.dtype):
            self._stateful_model = cached_run_inference_and_populate_state(self.stateful_model, [self.sentence])

        with autocast(enabled=self.amp_enabled, device_type="cuda", dtype=self.dtype):
            self._graph = get_contribution_graph(
                self.stateful_model,
                self.model_key,
                self.stateful_model.tokens()[B0].tolist(),
                (self._contribution_threshold if self._renormalize_after_threshold else 0.0),
            )

    def draw_graph_and_selection(
        self,
    ) -> None:
        (
            container_graph,
            container_tokens,
        ) = st.columns(self.render_settings.column_proportions)

        container_graph_left, container_graph_right = container_graph.columns([5, 1])

        container_graph_left.write("##### Graph")
        heads_placeholder = container_graph_right.empty()
        heads_placeholder.write("##### Blocks")
        container_graph_right_used = False

        container_top_tokens, container_token_dynamics = container_tokens.columns([1, 1])
        container_top_tokens.write("##### Top Tokens")
        container_top_tokens_used = False
        container_token_dynamics.write("##### Promoted Tokens")
        container_token_dynamics_used = False

        try:
            if self.sentence is None:
                return

            with container_graph_left:
                selection = self.draw_graph(
                    self._contribution_threshold if not self._renormalize_after_threshold else 0.0
                )

            if selection is None:
                return

            node = selection.node
            edge = selection.edge

            if edge is not None and edge.target.type == NodeType.AFTER_ATTN:
                with container_graph_right:
                    container_graph_right_used = True
                    heads_placeholder.write("##### Heads")
                    head = self.draw_attn_info(edge, container_graph)
                with container_token_dynamics:
                    self.draw_attention_dynamics(edge.target, head)
                    container_token_dynamics_used = True
            elif node is not None and node.type == NodeType.FFN:
                with container_graph_right:
                    container_graph_right_used = True
                    heads_placeholder.write("##### Neurons")
                    neuron = self.draw_ffn_info(node)
                with container_token_dynamics:
                    self.draw_ffn_dynamics(node, neuron)
                    container_token_dynamics_used = True

            if node is not None and node.is_in_residual_stream():
                self.draw_top_tokens(
                    node,
                    container_top_tokens,
                    container_token_dynamics if not container_token_dynamics_used else None,
                )
                container_top_tokens_used = True
                container_token_dynamics_used = True
        finally:
            if not container_graph_right_used:
                st_placeholder(
                    "Click on an edge to see head contributions. \n\n"
                    "Or click on FFN to see individual neuron contributions.",
                    container_graph_right,
                    height=1100,
                )
            if not container_top_tokens_used:
                st_placeholder("Select a node from residual stream to see its top tokens.", container_top_tokens, height=1100)
            if not container_token_dynamics_used:
                st_placeholder("Select a node to see its promoted tokens.", container_token_dynamics, height=1100)

    def run(self):
        with st.sidebar.expander("About", expanded=True):
            if self._config.demo_mode:
                st.caption("""
                    The app is deployed in Demo Mode, thus only predefined models and inputs are available.\n
                    You can still install the app locally and use your own models and inputs.\n
                    See https://github.com/facebookresearch/llm-transparency-tool for more information.
                """)

        self.draw_controls()

        if not self.model_name:
            st.warning("No model selected")
            st.stop()

        if self.sentence is None:
            st.warning("No sentence selected")
        else:
            with torch.inference_mode():
                self.run_inference()

            self.draw_graph_and_selection()


if __name__ == "__main__":
    top_parser = argparse.ArgumentParser()
    top_parser.add_argument("config_file")
    args = top_parser.parse_args()

    parser = HfArgumentParser([LlmViewerConfig])
    config = parser.parse_json_file(args.config_file)[0]

    with SystemMonitor(config.debug) as prof:
        app = App(config)
        app.run()
llm_transparency_tool/server/graph_selection.py
ADDED
@@ -0,0 +1,56 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

from dataclasses import dataclass
from typing import Any, Dict, Optional

from llm_transparency_tool.routes.graph_node import GraphNode, NodeType


class UiGraphNode(GraphNode):
    @staticmethod
    def from_json(json: Dict[str, Any]) -> Optional["UiGraphNode"]:
        try:
            layer = json["cell"]["layer"]
            token = json["cell"]["token"]
            type = NodeType(json["item"])
            return UiGraphNode(layer, token, type)
        except (TypeError, KeyError):
            return None


@dataclass
class UiGraphEdge:
    source: UiGraphNode
    target: UiGraphNode
    weight: float

    @staticmethod
    def from_json(json: Dict[str, Any]) -> Optional["UiGraphEdge"]:
        try:
            source = UiGraphNode.from_json(json["from"])
            target = UiGraphNode.from_json(json["to"])
            if source is None or target is None:
                return None
            weight = float(json["weight"])
            return UiGraphEdge(source, target, weight)
        except (TypeError, KeyError):
            return None


@dataclass
class GraphSelection:
    node: Optional[UiGraphNode]
    edge: Optional[UiGraphEdge]

    @staticmethod
    def from_json(json: Dict[str, Any]) -> Optional["GraphSelection"]:
        try:
            node = UiGraphNode.from_json(json["node"])
            edge = UiGraphEdge.from_json(json["edge"])
            return GraphSelection(node, edge)
        except (TypeError, KeyError):
            return None
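A quick sketch (not a file in this commit) of the payload shape these parsers expect from the frontend, inferred from the key accesses above:

from llm_transparency_tool.server.graph_selection import GraphSelection

payload = {
    "node": {"cell": {"layer": 0, "token": 1}, "item": "after_attn"},
    "edge": {
        "from": {"cell": {"layer": 0, "token": 0}, "item": "original"},
        "to": {"cell": {"layer": 0, "token": 1}, "item": "after_attn"},
        "weight": 0.3,
    },
}

selection = GraphSelection.from_json(payload)
print(selection.node.get_name())  # L0 ▸ T1 ▸ after_attn
print(selection.edge.weight)      # 0.3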
llm_transparency_tool/server/monitor.py
ADDED
@@ -0,0 +1,99 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

import torch
import streamlit as st
from pyinstrument import Profiler
from typing import Dict
import pandas as pd


@st.cache_resource(max_entries=1, show_spinner=False)
def init_gpu_memory():
    """
    When CUDA is initialized, it occupies some memory on the GPU, and this overhead
    can sometimes make it difficult to understand how much memory is actually used by
    the model.

    This function is used to initialize CUDA and measure the overhead.
    """
    if not torch.cuda.is_available():
        return {}

    # Touch each GPU once so that the CUDA context creation is accounted for.
    gpu_memory_overhead = {}
    for i in range(torch.cuda.device_count()):
        torch.ones(1).cuda(i)
        free, total = torch.cuda.mem_get_info(i)
        occupied = total - free
        gpu_memory_overhead[i] = occupied

    return gpu_memory_overhead


class SystemMonitor:
    """
    This class is used to monitor system resources such as GPU memory and CPU
    usage. It uses the pyinstrument library to profile the code and measure the
    execution time of different parts of the code.
    """

    def __init__(
        self,
        enabled: bool = False,
    ):
        self.enabled = enabled
        self.profiler = Profiler()
        self.overhead: Dict[int, int]

    def __enter__(self):
        if not self.enabled:
            return

        self.overhead = init_gpu_memory()

        self.profiler.__enter__()

    def __exit__(self, exc_type, exc_value, traceback):
        if not self.enabled:
            return

        self.profiler.__exit__(exc_type, exc_value, traceback)

        self.report_gpu_usage()
        self.report_profiler()

        with st.expander("Session state"):
            st.write(st.session_state)

        return None

    def report_gpu_usage(self):
        if not torch.cuda.is_available():
            return

        data = []

        for i in range(torch.cuda.device_count()):
            free, total = torch.cuda.mem_get_info(i)
            occupied = total - free
            data.append({
                "overhead": self.overhead[i],
                "occupied": occupied - self.overhead[i],
                "free": free,
            })
        df = pd.DataFrame(data, columns=["overhead", "occupied", "free"])

        with st.sidebar.expander("System"):
            st.write("GPU memory on server")
            df /= 1024 ** 3  # Convert to GB
            st.bar_chart(df, width=200, height=200, color=["#fefefe", "#84c9ff", "#fe2b2b"])

    def report_profiler(self):
        html_code = self.profiler.output_html()
        with st.expander("Profiler", expanded=False):
            st.components.v1.html(html_code, height=1000, scrolling=True)
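A usage sketch (not a file in this commit) mirroring how app.py wraps its main loop; when `enabled` is False both context hooks are no-ops, so the wrapper is safe to leave in place:

from llm_transparency_tool.server.monitor import SystemMonitor

def main():
    ...  # build and run the app

with SystemMonitor(enabled=True):
    main()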
llm_transparency_tool/server/styles.py
ADDED
@@ -0,0 +1,107 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

from dataclasses import dataclass

import matplotlib

# Unofficial way to make the padding a bit smaller.
margins_css = """
<style>
    .main > div {
        padding: 1rem;
        padding-top: 2rem;  /* Still need this gap for the top bar */
        gap: 0rem;
    }

    section[data-testid="stSidebar"] {
        width: 300px !important;  /* Set the width to your desired value */
    }
</style>
"""


@dataclass
class RenderSettings:
    column_proportions = [50, 30]

    # We don't know the actual height. This will be used in order to compute the table
    # viewport height when needed.
    table_cell_height = 36

    n_top_tokens = 30
    n_promoted_tokens = 15
    n_suppressed_tokens = 15

    n_top_neurons = 20

    attention_color_map = "Blues"

    no_model_alt_text = "<no model selected>"


def string_to_display(s: str) -> str:
    return s.replace(" ", "·")


def logits_color_map(positive_and_negative: bool) -> matplotlib.colors.Colormap:
    background_colors = {
        "red": [
            [0.0, 0.40, 0.40],
            [0.1, 0.69, 0.69],
            [0.2, 0.83, 0.83],
            [0.3, 0.95, 0.95],
            [0.4, 0.99, 0.99],
            [0.5, 1.0, 1.0],
            [0.6, 0.90, 0.90],
            [0.7, 0.72, 0.72],
            [0.8, 0.49, 0.49],
            [0.9, 0.30, 0.30],
            [1.0, 0.15, 0.15],
        ],
        "green": [
            [0.0, 0.0, 0.0],
            [0.1, 0.09, 0.09],
            [0.2, 0.37, 0.37],
            [0.3, 0.64, 0.64],
            [0.4, 0.85, 0.85],
            [0.5, 1.0, 1.0],
            [0.6, 0.96, 0.96],
            [0.7, 0.88, 0.88],
            [0.8, 0.73, 0.73],
            [0.9, 0.57, 0.57],
            [1.0, 0.39, 0.39],
        ],
        "blue": [
            [0.0, 0.12, 0.12],
            [0.1, 0.16, 0.16],
            [0.2, 0.30, 0.30],
            [0.3, 0.50, 0.50],
            [0.4, 0.78, 0.78],
            [0.5, 1.0, 1.0],
            [0.6, 0.81, 0.81],
            [0.7, 0.52, 0.52],
            [0.8, 0.25, 0.25],
            [0.9, 0.12, 0.12],
            [1.0, 0.09, 0.09],
        ],
    }

    if not positive_and_negative:
        # Stretch the top part to the whole range.
        new_colors = {}
        for channel, colors in background_colors.items():
            new_colors[channel] = [
                [(value - 0.5) * 2, color, color]
                for value, color, _ in colors
                if value >= 0.5
            ]
        background_colors = new_colors

    return matplotlib.colors.LinearSegmentedColormap(
        f"RdYG-{positive_and_negative}",
        background_colors,
    )
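A small sketch (not a file in this commit) showing the two colormap variants; a `LinearSegmentedColormap` instance is callable and maps values in [0, 1] to RGBA:

from llm_transparency_tool.server.styles import logits_color_map, string_to_display

print(string_to_display("a b"))  # a·b

cmap = logits_color_map(positive_and_negative=True)  # diverging: red -> white -> green
print(cmap(0.0)[:3])  # reddish end
print(cmap(1.0)[:3])  # greenish end

one_sided = logits_color_map(positive_and_negative=False)  # only the white -> green half
print(one_sided(0.0)[:3])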
llm_transparency_tool/server/utils.py
ADDED
@@ -0,0 +1,133 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

from typing import List, Optional, Tuple

import networkx as nx
import streamlit as st
import torch

import llm_transparency_tool.routes.graph
from llm_transparency_tool.models.tlens_model import TransformerLensTransparentLlm
from llm_transparency_tool.models.transparent_llm import TransparentLlm

GPU = "gpu"
CPU = "cpu"

# This variable is for expressing the idea that batch_id = 0, but make it more
# readable than just 0.
B0 = 0


def possible_devices() -> List[str]:
    devices = []
    if torch.cuda.is_available():
        devices.append(GPU)
    devices.append(CPU)
    return devices


def load_dataset(filename) -> List[str]:
    with open(filename) as f:
        dataset = [s.strip("\n") for s in f.readlines()]
    print(f"Loaded {len(dataset)} sentences from {filename}")
    return dataset


@st.cache_resource(
    hash_funcs={
        TransformerLensTransparentLlm: id,
    }
)
def load_model(
    model_name: str,
    _device: str,
    _model_path: Optional[str] = None,
    _dtype: torch.dtype = torch.float32,
) -> TransparentLlm:
    """
    Returns the loaded model. Arguments prefixed with `_` are excluded from
    Streamlit's cache hashing.
    """
    assert _device in possible_devices()

    causal_lm = None
    tokenizer = None

    tl_lm = TransformerLensTransparentLlm(
        model_name=model_name,
        hf_model=causal_lm,
        tokenizer=tokenizer,
        device=_device,
        dtype=_dtype,
    )

    return tl_lm


def run_model(model: TransparentLlm, sentence: str) -> None:
    print(f"Running inference for '{sentence}'")
    model.run([sentence])


def load_model_with_session_caching(
    **kwargs,
) -> Tuple[TransparentLlm, str]:
    return load_model(**kwargs)


def run_model_with_session_caching(
    _model: TransparentLlm,
    model_key: str,
    sentence: str,
):
    LAST_RUN_MODEL_KEY = "last_run_model_key"
    LAST_RUN_SENTENCE = "last_run_sentence"
    state = st.session_state

    if (
        state.get(LAST_RUN_MODEL_KEY, None) == model_key
        and state.get(LAST_RUN_SENTENCE, None) == sentence
    ):
        return

    run_model(_model, sentence)
    state[LAST_RUN_MODEL_KEY] = model_key
    state[LAST_RUN_SENTENCE] = sentence


@st.cache_resource(
    hash_funcs={
        TransformerLensTransparentLlm: id,
    }
)
def get_contribution_graph(
    model: TransparentLlm,  # TODO bug here
    model_key: str,
    tokens: List[str],
    threshold: float,
) -> nx.Graph:
    """
    The `model_key` and `tokens` are used only for caching. The model itself is not
    hashed (the `hash_funcs` above map it to `id`); by Streamlit's convention the
    argument should be named `_model`, hence the TODO above.
    """
    return llm_transparency_tool.routes.graph.build_full_graph(
        model,
        B0,
        threshold,
    )


def st_placeholder(
    text: str,
    container=st,
    border: bool = True,
    height: Optional[int] = 500,
):
    empty = container.empty()
    empty.container(border=border, height=height).write(f"<small>{text}</small>", unsafe_allow_html=True)
    return empty
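A short sketch (not a file in this commit) of the non-Streamlit helpers, assuming it is run from the repository root so that sample_input.txt resolves:

from llm_transparency_tool.server.utils import B0, load_dataset, possible_devices

print(possible_devices())  # ["gpu", "cpu"] on a CUDA machine, otherwise ["cpu"]

sentences = load_dataset("sample_input.txt")
print(sentences[B0])  # "The war lasted from the year 1732 to the year 17"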
pyproject.toml
ADDED
@@ -0,0 +1,2 @@
[tool.black]
line-length = 120
sample_input.txt
ADDED
@@ -0,0 +1,3 @@
The war lasted from the year 1732 to the year 17
5 + 4 = 9, 2 + 3 =
When Mary and John went to the store, John gave a drink to
setup.py
ADDED
@@ -0,0 +1,13 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

from setuptools import setup

setup(
    name="llm_transparency_tool",
    version="0.1",
    packages=["llm_transparency_tool"],
)