janwinkler1 committed on
Commit
aa4279f
1 Parent(s): 7235148

Updating Readme, adding data overview (#3)

README.md CHANGED
@@ -1,6 +1,6 @@
1
- # Source of truth
2
 
3
- ## welcome and initial setup
4
 
5
  hi all,
6
  i think when we start with the EDA, it suffices if anyone just uses what they are used to (f.e. conda or whatever). However, afterwards, i think it could be helpful that everyone, always has exactly the same environment, same package/python versions, which is why i propose working with docker to minimize headaches and "but it works on my machine" issues. I think with this minimal setup below, we can fully focus on hacking while not having pain with painful stuff.
@@ -15,7 +15,9 @@ please feel free to add / change / challenge things!
15
 
16
  essentially, you just have to build the container with the services you want. if you're interested in it i can go into more detail just let me know.
17
 
18
- 1. navigate to dc/dev and run:
 
 
19
 
20
  ```
21
  docker compose up -d --build
@@ -23,61 +25,45 @@ docker compose up -d --build
23
 
24
  only use the `--build` flag the first time around, or if you want to rebuild the container (e.g. when having added a package you need in the container). **NOTE:** the `-d` flag stands for `detach` which means that your docker container runs in the background and does not log everything into your console.
25
 
26
- 2. then, to check whether everything worked hit:
27
 
28
  ```
29
  docker ps
30
  ```
31
 
32
- 3. for this specific setup, you can head to `localhost:8888` where jupyterlab is running.
33
 
34
- 4. to create a new file (using jupytext, see below), just create a new .ipynb file, the .py file will be created automatically. all the changes you make in the notebook, will be reflected in the .py files which you then can use for your commits.
35
 
36
  now you should see the running docker containers.
37
 
38
  ### what about huggingface spaces:
39
 
40
- - [here](./docs/huggingface-spaces.md), you can see what huggingface spaces is and how we can complement our github repo with it (credits to chat-gpt)
 
 
41
 
42
  ### jupytext - nice versioning of jupyter notebooks
43
 
44
- since we are likely to be working with jupyter notebooks a lot, let's use jupytext. It automatically maps .ipynb to .py files with some magic. The .ipynb are in the gitignore, so we only have .py files nicely versioned in the repo. read more about it [here](https://jupytext.readthedocs.io/en/latest/)
 
 
 
45
 
46
  ### trunk based development
47
 
48
- let's stick to trunk based. if you don't know what it is, read all about it [here](https://trunkbaseddevelopment.com/)
49
-
50
- key take aways:
51
-
52
- #### Trunk-Based Development: Key Points
53
-
54
- 1. **Single Main Branch**: All developers commit to the trunk or main branch.
55
- 2. **Short-Lived Branches**: Branches, if used, are short-lived and quickly merged back.
56
- 3. **Frequent Integrations**: Code changes are integrated frequently, often multiple times a day.
57
- 4. **Feature Flags**: Incomplete features are managed with feature flags to maintain trunk stability.
58
-
59
- #### Benefits
60
-
61
- - **Reduced Integration Problems**: Early conflict detection and resolution.
62
- - **Higher Code Quality**: Continuous testing ensures stable and high-quality code.
63
- - **Simpler Workflow**: Less overhead managing branches and merges.
64
- - **Enhanced Collaboration**: Encourages teamwork and code reviews.
65
 
66
- #### Challenges
67
 
68
- - **Discipline Required**: Developers must write clean, well-tested code.
69
- - **Handling Incomplete Features**: Requires careful use of feature flags.
70
 
71
- #### Best Practices
 
72
 
73
- - **Frequent Commits**: Small, incremental changes reduce integration risks.
74
- - **Comprehensive Testing**: Automated tests for codebase coverage.
75
- - **Feature Flags**: Manage incomplete or experimental features.
76
- - **Code Reviews**: Maintain quality and knowledge sharing.
77
 
78
- ### code format
79
-
80
- - let's stick to black for python and prettier for .md and other formats
81
- - using docker for the purpose of formatting is really easy
82
- - just `chmod +x format` so that the `format` is executable
83
- - then simply use `./format` before adding your changes and all the files will be autoformatted
 
1
+ # Brainforest EcoHackathon 2024, Group "" (pronounced empty string)
2
 
3
+ ## welcome and minimal setup
4
 
5
  hi all,
6
  i think when we start with the EDA, it suffices if anyone just uses what they are used to (f.e. conda or whatever). However, afterwards, i think it could be helpful that everyone, always has exactly the same environment, same package/python versions, which is why i propose working with docker to minimize headaches and "but it works on my machine" issues. I think with this minimal setup below, we can fully focus on hacking while not having pain with painful stuff.
 
15
 
16
  essentially, you just have to build the container with the services you want. if you're interested in it i can go into more detail just let me know.
17
 
18
+ 1. clone the repo
19
+
20
+ 2. navigate to dc/dev and run:
21
 
22
  ```
23
  docker compose up -d --build
 
25
 
26
  only use the `--build` flag the first time around, or if you want to rebuild the container (e.g. when having added a package you need in the container). **NOTE:** the `-d` flag stands for `detach` which means that your docker container runs in the background and does not log everything into your console.
27
 
28
+ 3. then, to check whether everything worked hit:
29
 
30
  ```
31
  docker ps
32
  ```
33
 
34
+ 4. for this specific setup, you can head to `localhost:8888` where jupyterlab is running.
35
 
36
+ 5. to create a new file (using jupytext, see below), just create a new .ipynb file, the .py file will be created automatically. all the changes you make in the notebook, will be reflected in the .py files which you then can use for your commits.
37
 
38
  now you should see the running docker containers.
39
 
40
  ### what about huggingface spaces:
41
 
42
+ - IMO we have two options:
43
+ 1. [connect huggingface space](https://huggingface.co/docs/hub/spaces-github-actions#managing-spaces-with-github-actions) of our group to this repo using github actions
44
+ 2. have a separate huggingface space (might be a pain)
45
 
46
  ### jupytext - nice versioning of jupyter notebooks
47
 
48
+ - we are likely to be working with jupyter notebooks a lot
49
+ - let's use [jupytext](https://jupytext.readthedocs.io/en/latest/)
50
+ - It automatically maps .ipynb to .py files with some magic
51
+ - The .ipynb are in the gitignore, so we only have .py files nicely versioned in the repo
52
 
53
  ### trunk based development
54
 
55
+ - let's stick to trunk based
56
+ - if you don't know what it is, read all about it [here](https://trunkbaseddevelopment.com/)
57
+ - or read the [key take aways](./docs/key-takeaways-tb.md)
 
58
 
59
+ ### code format
60
 
61
+ - let's stick to [Black](https://black.readthedocs.io/en/stable/) for python and [Prettier](https://prettier.io/) for .md and other formats
62
+ - using docker for the purpose of formatting is really easy
63
 
64
+ 1. `chmod +x format` so that the `format` file is executable
65
+ 2. then simply use `./format` before adding your changes and all the files will be autoformatted
66
 
67
+ ## awesome data overview
 
 
 
68
 
69
+ ![](./docs/data_overview_yuri.jpg)
 
 
 
 
 
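The jupytext section of the README above says that .ipynb files are mapped to .py files "with some magic". As a rough illustration of that mapping, here is a minimal sketch in plain Python. It assumes the `py:percent` text format (the README does not say which format the repo is configured for), and the function name `notebook_to_percent_script` is hypothetical. This is not jupytext's implementation: the real tool also writes a YAML header and keeps cell metadata.

```python
import json


def notebook_to_percent_script(notebook_json: str) -> str:
    """Convert notebook JSON text into a percent-format Python script.

    Simplified illustration only: header and cell metadata are ignored.
    """
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        # nbformat stores a cell's source as a string or a list of lines.
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        lines = source.splitlines()
        if cell.get("cell_type") == "markdown":
            # Markdown cells become commented lines under a [markdown] marker.
            body = ["# " + line if line else "#" for line in lines]
            chunks.append("\n".join(["# %% [markdown]"] + body))
        elif cell.get("cell_type") == "code":
            # Code cells are written verbatim under a bare "# %%" marker.
            chunks.append("\n".join(["# %%"] + lines))
    return "\n\n".join(chunks) + "\n"
```

Because the output is ordinary Python with comment markers, line-based diffs of the .py file stay readable, which is the reason for committing it instead of the .ipynb JSON.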
docs/data_overview_yuri.jpg ADDED
docs/huggingface-spaces.md DELETED
@@ -1,64 +0,0 @@
1
- Hugging Face Spaces and GitHub repositories serve different but complementary purposes. Here’s a comparison and how they can be used together:
2
-
3
- ### Comparison with GitHub Repositories
4
-
5
- - **GitHub Repository**:
6
-
7
- - **Purpose**: Primarily used for version control, collaboration, and sharing of code and projects.
8
- - **Capabilities**: Stores code, tracks changes, manages issues, and supports CI/CD pipelines.
9
- - **Usage**: Developers collaborate on software development projects, manage codebases, and deploy applications.
10
-
11
- - **Hugging Face Spaces**:
12
- - **Purpose**: Designed specifically for deploying interactive machine learning applications and demos.
13
- - **Capabilities**: Hosts and deploys machine learning models and applications using frameworks like Streamlit, Gradio, or custom HTML/CSS/JS.
14
- - **Usage**: Users create and share interactive demos and applications, especially in the field of machine learning.
15
-
16
- ### Integration with GitHub
17
-
18
- You can import a GitHub repository into Hugging Face Spaces to deploy an application hosted on GitHub. Here’s how to do it:
19
-
20
- 1. **Create a Space on Hugging Face**:
21
-
22
- - Go to the Hugging Face Spaces website and create a new Space.
23
-
24
- 2. **Link to GitHub Repository**:
25
-
26
- - During the setup of the new Space, you can link it to a GitHub repository. This allows Hugging Face Spaces to pull the code from your GitHub repo.
27
-
28
- 3. **Configure Your Space**:
29
-
30
- - Ensure your repository contains the necessary files for the framework you are using (Streamlit, Gradio, or HTML/CSS/JS).
31
- - For example, if you are using Streamlit, ensure you have a `requirements.txt` file for dependencies and a main Python script that runs the Streamlit app.
32
-
33
- 4. **Deploy the Application**:
34
- - Once linked, Hugging Face Spaces will automatically deploy the application from the GitHub repository.
35
- - Any updates pushed to the GitHub repository can automatically trigger redeployment of the application on Hugging Face Spaces.
36
-
37
- ### Example Steps to Import a GitHub Repo into Hugging Face Spaces
38
-
39
- 1. **Create a New Space**:
40
- - Navigate to Hugging Face Spaces and click on “New Space”.
41
- 2. **Set Up Space**:
42
-
43
- - Choose a name for your Space, select the appropriate SDK (e.g., Streamlit, Gradio, or HTML), and choose the visibility (public or private).
44
-
45
- 3. **Connect GitHub Repository**:
46
-
47
- - In the Space settings, you will find an option to link a GitHub repository. Provide the URL of your GitHub repository.
48
- - Hugging Face Spaces will clone your GitHub repository to use it as the source code for your Space.
49
-
50
- 4. **Configure and Deploy**:
51
-
52
- - Make sure your GitHub repository is set up correctly for the chosen framework. For example, a Streamlit app should have a `requirements.txt` and an entry-point script like `app.py`.
53
- - Once everything is set up, your Space will be deployed and can be accessed via a URL provided by Hugging Face.
54
-
55
- 5. **Update and Maintain**:
56
- - Any changes you push to the linked GitHub repository will be reflected in the deployed application after the repository is synced with Hugging Face Spaces.
57
-
58
- ### Benefits
59
-
60
- - **Version Control**: Leveraging GitHub’s version control capabilities ensures that your code is managed effectively.
61
- - **Collaboration**: Teams can collaborate on the development of the application using GitHub’s collaborative tools.
62
- - **Easy Deployment**: Hugging Face Spaces simplifies the deployment of interactive machine learning applications without the need for complex infrastructure management.
63
-
64
- By combining the strengths of GitHub and Hugging Face Spaces, you can efficiently develop, manage, and deploy machine learning applications.
 
docs/key-takeaways-tb.md ADDED
@@ -0,0 +1,25 @@
1
+ ## Trunk-Based Development: Key Points
2
+
3
+ 1. **Single Main Branch**: All developers commit to the trunk or main branch.
4
+ 2. **Short-Lived Branches**: Branches, if used, are short-lived and quickly merged back.
5
+ 3. **Frequent Integrations**: Code changes are integrated frequently, often multiple times a day.
6
+ 4. **Feature Flags**: Incomplete features are managed with feature flags to maintain trunk stability.
7
+
8
+ ## Benefits
9
+
10
+ - **Reduced Integration Problems**: Early conflict detection and resolution.
11
+ - **Higher Code Quality**: Continuous testing ensures stable and high-quality code.
12
+ - **Simpler Workflow**: Less overhead managing branches and merges.
13
+ - **Enhanced Collaboration**: Encourages teamwork and code reviews.
14
+
15
+ ## Challenges
16
+
17
+ - **Discipline Required**: Developers must write clean, well-tested code.
18
+ - **Handling Incomplete Features**: Requires careful use of feature flags.
19
+
20
+ ## Best Practices
21
+
22
+ - **Frequent Commits**: Small, incremental changes reduce integration risks.
23
+ - **Comprehensive Testing**: Automated tests for codebase coverage.
24
+ - **Feature Flags**: Manage incomplete or experimental features.
25
+ - **Code Reviews**: Maintain quality and knowledge sharing.
format CHANGED
@@ -1,7 +1,7 @@
1
  #! /bin/sh
2
 
3
  # Run as root, otherwise the container cannot modify the mounted files.
4
- docker run --rm --user root --volume $(pwd):/work tmknom/prettier prettier --loglevel warn --write .
5
 
6
  # Format Python files using Black
7
  docker run --rm --user root --volume $(pwd):/data cytopia/black:latest .
 
1
  #! /bin/sh
2
 
3
  # Run as root, otherwise the container cannot modify the mounted files.
4
+ docker run --rm --user root --volume $(pwd):/work tmknom/prettier prettier --log-level warn --write .
5
 
6
  # Format Python files using Black
7
  docker run --rm --user root --volume $(pwd):/data cytopia/black:latest .
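For reference, the two `docker run` invocations in `format` can also be expressed as argument lists, which makes the changed flag (`--log-level`) easy to spot. The sketch below is a hypothetical Python wrapper, not part of this commit; the names `build_format_commands` and `run_format` are assumptions. It mirrors the commands from the script as written and needs Docker on the host to actually run.

```python
import subprocess
from pathlib import Path


def build_format_commands(workdir: Path) -> list[list[str]]:
    """Return the Prettier and Black docker commands for the given directory."""
    mount = str(workdir.resolve())
    prettier = [
        "docker", "run", "--rm", "--user", "root",
        "--volume", f"{mount}:/work",
        "tmknom/prettier", "prettier",
        "--log-level", "warn", "--write", ".",
    ]
    black = [
        "docker", "run", "--rm", "--user", "root",
        "--volume", f"{mount}:/data",
        "cytopia/black:latest", ".",
    ]
    return [prettier, black]


def run_format(workdir: Path) -> None:
    """Run both formatters in order, stopping at the first failure."""
    for command in build_format_commands(workdir):
        subprocess.run(command, check=True)
```

Passing the mount path as a single list element avoids the word splitting that an unquoted `$(pwd)` undergoes in the shell when the path contains spaces. Calling `run_format(Path("."))` from the repo root would correspond to `./format`.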
python/{eda_jan.py → notebooks/eda_jan.py} RENAMED
File without changes