Commit 318fad3 by cfahlgren1 (HF staff)
1 Parent(s): 6032e5b

improve histogram

Files changed (1)
  1. src/snippets/histogram.md +15 -2
src/snippets/histogram.md CHANGED
@@ -7,7 +7,6 @@ code: |
  from histogram(
  table_name,
  column_name,
- bin_count := 10
  )
  ---

@@ -27,7 +26,21 @@ from histogram(

  - `table_name`: The name of the table or a subquery result.
  - `column_name`: The name of the column for which to create the histogram, you can use different expressions to summarize the data such as length of a string.
- - `bin_count`: The number of bins to use in the histogram.
+ - `bin_count`: The number of bins to use in the histogram. (_**Optional**_)
+ - `technique`: The binning technique to use. (_**Optional**_)
+
+
+ ## Binning Techniques
+
+ | Technique | Description |
+ |-------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+ | `auto` | Automatically selects the best binning technique based on the data type. If the data type is not numeric or timestamp, it defaults to `sample`. For numeric or timestamp data, it defaults to `equi-width-nice`. |
+ | `sample` | Uses distinct values in the column as bins. This technique is useful when the column has a small number of distinct values. |
+ | `equi-height` | Creates bins such that each bin has approximately the same number of data points. This technique is useful for ensuring that each bin has a similar number of entries. This can be helpful for skewed distributions. |
+ | `equi-width` | Creates bins of equal width. This technique is useful for numeric data. You want each bin to cover the same range of values. |
+ | `equi-width-nice` | Creates bins of equal width with "nice" boundaries. This technique is similar to `equi-width`. It adjusts the bin boundaries to be more human-readable (e.g., rounding to the nearest whole number). |
+
+ You can find more information in the [PR](https://github.com/duckdb/duckdb/pull/12590) that added this feature.


  ## Histogram of the length of the input persona from the `PersonaHub` dataset
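
The difference between the `equi-width` and `equi-height` techniques described in this commit may be easier to see on a small skewed sample. The sketch below is a plain-Python illustration of the two ideas, not DuckDB's implementation; the helper names (`equi_width_edges`, `equi_height_bounds`, `count_per_bin`) and the sample data are hypothetical and chosen only for this example.

```python
def equi_width_edges(values, bin_count):
    """Return bin_count + 1 boundaries splitting [min, max] into equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bin_count
    return [lo + i * width for i in range(bin_count + 1)]


def equi_height_bounds(values, bin_count):
    """Return one upper bound per bin so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    # The upper bound of bin i is the value at the (i + 1) / bin_count position.
    return [ordered[max(0, ((i + 1) * n) // bin_count - 1)] for i in range(bin_count)]


def count_per_bin(values, upper_bounds):
    """Count values per bin; each value goes to the first bin whose upper bound is >= value."""
    counts = [0] * len(upper_bounds)
    for v in values:
        for i, bound in enumerate(upper_bounds):
            if v <= bound:
                counts[i] += 1
                break
    return counts


# A skewed sample: most values are small, two are large.
data = [1, 2, 3, 4, 5, 6, 7, 8, 50, 100]

width_edges = equi_width_edges(data, 2)
height_bounds = equi_height_bounds(data, 2)

# Equal-width bins put almost everything in the first bin.
width_counts = count_per_bin(data, width_edges[1:])
# Equal-height bins split the sample evenly.
height_counts = count_per_bin(data, height_bounds)

print(width_edges, width_counts)      # [1.0, 50.5, 100.0] [9, 1]
print(height_bounds, height_counts)   # [5, 100] [5, 5]
```

With two bins, the equal-width split is dominated by the outliers, while the equal-height split follows the data. This matches the table's note that `equi-height` can help with skewed distributions.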