metadata
language: en
license: apache-2.0
library_name: span-marker
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
- generated_from_span_marker_trainer
datasets:
- midas/inspec
metrics:
- precision
- recall
- f1
widget:
- text: >-
Genetic algorithm guided selection: variable selection and subset
selection A novel genetic algorithm guided selection method, GAS, has been
described. The method utilizes a simple encoding scheme which can
represent both compounds and variables used to construct a QSAR/QSPR
model. A genetic algorithm is then utilized to simultaneously optimize the
encoded variables that include both descriptors and compound subsets. The
GAS method generates multiple models each applying to a subset of the
compounds. Typically the subsets represent clusters with different
chemotypes. Also a procedure based on molecular similarity is presented to
determine which model should be applied to a given test set compound. The
variable selection method implemented in GAS has been tested and compared
using the Selwood data set -LRB- n = 31 compounds; nu = 53 descriptors
-RRB-. The results showed that the method is comparable to other published
methods. The subset selection method implemented in GAS has been first
tested using an artificial data set -LRB- n = 100 points; nu = 1
descriptor -RRB- to examine its ability to subset data points and second
applied to analyze the XLOGP data set -LRB- n = 1831 compounds; nu = 126
descriptors -RRB-. The method is able to correctly identify artificial
data points belonging to various subsets. The analysis of the XLOGP data
set shows that the subset selection method can be useful in improving a
QSAR/QSPR model when the variable selection method fails
- text: >-
Presentation media, information complexity, and learning outcomes
Multimedia computing provides a variety of information presentation
modality combinations. Educators have observed that visuals enhance
learning which suggests that multimedia presentations should be superior
to text-only and text with static pictures in facilitating optimal human
information processing and, therefore, comprehension. The article reports
the findings from a 3 -LRB- text-only, overhead slides, and multimedia
presentation -RRB- * 2 -LRB- high and low information complexity -RRB-
factorial experiment. Subjects read a text script, viewed an acetate
overhead slide presentation, or viewed a multimedia presentation depicting
the greenhouse effect -LRB- low complexity -RRB- or photocopier operation
-LRB- high complexity -RRB-. Multimedia was superior to text-only and
overhead slides for comprehension. Information complexity diminished
comprehension and perceived presentation quality. Multimedia was able to
reduce the negative impact of information complexity on comprehension and
increase the extent of sustained attention to the presentation. These
findings suggest that multimedia presentations invoke the use of both the
verbal and visual working memory channels resulting in a reduction of the
cognitive load imposed by increased information complexity. Moreover,
multimedia superiority in facilitating comprehension goes beyond its
ability to increase sustained attention; the quality and effectiveness of
information processing attained -LRB- i.e., use of verbal and visual
working memory -RRB- is also significant
- text: >-
Adaptive filtering for noise reduction in hue saturation intensity color
space Even though the hue saturation intensity -LRB- HSI -RRB- color model
has been widely used in color image processing and analysis, the
conversion formulas from the RGB color model to HSI are nonlinear and
complicated in comparison with the conversion formulas of other color
models. When an RGB image is degraded by random Gaussian noise, this
nonlinearity leads to a nonuniform noise distribution in HSI, making
accurate image analysis more difficult. We have analyzed the noise
characteristics of the HSI color model and developed an adaptive spatial
filtering method to reduce the magnitude of noise and the nonuniformity of
noise variance in the HSI color space. With this adaptive filtering
method, the filter kernel for each pixel is dynamically adjusted,
depending on the values of intensity and saturation. In our experiments we
have filtered the saturation and hue components and generated edge maps
from color gradients. We have found that by using the adaptive filtering
method, the minimum error rate in edge detection improves by approximately
15%
- text: >-
Restoration of broadband imagery steered with a liquid-crystal optical
phased array In many imaging applications, it is highly desirable to
replace mechanical beam-steering components -LRB- i.e., mirrors and
gimbals -RRB- with a nonmechanical device. One such device is a nematic
liquid crystal optical phased array -LRB- LCOPA -RRB-. An LCOPA can
implement a blazed phase grating to steer the incident light. However,
when a phase grating is used in a broadband imaging system, two adverse
effects can occur. First, dispersion will cause different incident
wavelengths arriving at the same angle to be steered to different output
angles, causing chromatic aberrations in the image plane. Second, the
device will steer energy not only to the first diffraction order, but to
others as well. This multiple-order effect results in multiple copies of
the scene appearing in the image plane. We describe a digital image
restoration technique designed to overcome these degradations. The
proposed postprocessing technique is based on a Wiener deconvolution
filter. The technique, however, is applicable only to scenes containing
objects with approximately constant reflectivities over the spectral
region of interest. Experimental results are presented to demonstrate the
effectiveness of this technique
- text: >-
A comparison of computational color constancy Algorithms. II. Experiments
with image data For pt.I see ibid., vol. 11, no. 9, p.972-84 -LRB- 2002
-RRB-. We test a number of the leading computational color constancy
algorithms using a comprehensive set of images. These were of 33 different
scenes under 11 different sources representative of common illumination
conditions. The algorithms studied include two gray world methods, a
version of the Retinex method, several variants of Forsyth's -LRB- 1990
-RRB- gamut-mapping method, Cardei et al.'s -LRB- 2000 -RRB- neural net
method, and Finlayson et al.'s color by correlation method -LRB- Finlayson
et al. 1997, 2001; Hubel and Finlayson 2000 -RRB-. We discuss a number of
issues in applying color constancy ideas to image data, and study in depth
the effect of different preprocessing strategies. We compare the
performance of the algorithms on image data with their performance on
synthesized data. All data used for this study are available online at
http://www.cs.sfu.ca/~color/data, and implementations for most of the
algorithms are also available -LRB- http://www.cs.sfu.ca/~color/code
-RRB-. Experiments with synthesized data -LRB- part one of this paper
-RRB- suggested that the methods which emphasize the use of the input data
statistics, specifically color by correlation and the neural net
algorithm, are potentially the most effective at estimating the
chromaticity of the scene illuminant. Unfortunately, we were unable to
realize comparable performance on real images. Here exploiting pixel
intensity proved to be more beneficial than exploiting the details of
image chromaticity statistics, and the three-dimensional -LRB- 3-D -RRB-
gamut-mapping algorithms gave the best performance
pipeline_tag: token-classification
co2_eq_emissions:
emissions: 20.795
source: codecarbon
training_type: fine-tuning
on_cloud: false
gpu_model: 1 x NVIDIA GeForce RTX 3090
cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
ram_total_size: 31.777088165283203
hours_used: 0.137
model-index:
- name: SpanMarker with bert-base-uncased on Inspec
results:
- task:
type: token-classification
name: Named Entity Recognition
dataset:
name: Inspec
type: midas/inspec
split: test
metrics:
- type: f1
value: 0.5934525191548642
name: F1
- type: precision
value: 0.5666149412547107
name: Precision
- type: recall
value: 0.6229588106263709
name: Recall
SpanMarker with bert-base-uncased on Inspec
This is a SpanMarker model trained on the Inspec dataset for keyphrase extraction, framed as Named Entity Recognition with a single KEY label. It uses bert-base-uncased as the underlying encoder. See train.py for the training script.
Model Details
Model Description
- Model Type: SpanMarker
- Encoder: bert-base-uncased
- Maximum Sequence Length: 256 tokens
- Maximum Entity Length: 8 words
- Training Dataset: Inspec
- Language: en
- License: apache-2.0
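The maximum entity length caps how many words a predicted keyphrase can cover. As a rough illustration, the sketch below counts how many candidate spans a sentence produces, assuming every contiguous run of 1 to 8 words is a candidate. That enumeration rule is an assumption made for illustration and is not stated in this card. `count_candidate_spans` is a hypothetical helper, not part of the span_marker API.

```python
def count_candidate_spans(num_words: int, max_entity_length: int = 8) -> int:
    """Count contiguous word spans of length 1..max_entity_length.

    Assumption: every contiguous word sequence up to the maximum
    entity length is treated as one candidate span.
    """
    longest = min(max_entity_length, num_words)
    # A sentence of n words contains (n - length + 1) spans of a given length.
    return sum(num_words - length + 1 for length in range(1, longest + 1))


# 15 words is the shortest training sentence reported in this card.
print(count_candidate_spans(15))
```

Under this assumption the number of candidates grows roughly linearly with sentence length once the sentence is longer than 8 words.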
Model Sources
- Repository: SpanMarker on GitHub
- Thesis: SpanMarker For Named Entity Recognition
Model Labels
| Label | Examples |
|---|---|
| KEY | "Content Atomism", "philosophy of mind", "IBS" |
Evaluation
Metrics
| Label | Precision | Recall | F1 |
|---|---|---|---|
| all | 0.5666 | 0.6230 | 0.5935 |
| KEY | 0.5666 | 0.6230 | 0.5935 |
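The "all" and "KEY" rows are identical because KEY is the only label. As a quick sanity check, the reported F1 can be recomputed as the harmonic mean of the precision and recall values from the card metadata. This is a minimal sketch of that arithmetic, not the evaluation script itself.

```python
# Unrounded precision and recall as reported in the card metadata.
precision = 0.5666149412547107
recall = 0.6229588106263709

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 4))
```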
Uses
Direct Use
from span_marker import SpanMarkerModel
# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-uncased-keyphrase-inspec")
# Run inference
entities = model.predict("Adaptive filtering for noise reduction in hue saturation intensity color space Even though the hue saturation intensity -LRB- HSI -RRB- color model has been widely used in color image processing and analysis, the conversion formulas from the RGB color model to HSI are nonlinear and complicated in comparison with the conversion formulas of other color models. When an RGB image is degraded by random Gaussian noise, this nonlinearity leads to a nonuniform noise distribution in HSI, making accurate image analysis more difficult. We have analyzed the noise characteristics of the HSI color model and developed an adaptive spatial filtering method to reduce the magnitude of noise and the nonuniformity of noise variance in the HSI color space. With this adaptive filtering method, the filter kernel for each pixel is dynamically adjusted, depending on the values of intensity and saturation. In our experiments we have filtered the saturation and hue components and generated edge maps from color gradients. We have found that by using the adaptive filtering method, the minimum error rate in edge detection improves by approximately 15%")
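The sketch below shows one way to post-process the predictions. It assumes each prediction is a dictionary with at least "span", "label" and "score" keys. `top_keyphrases` is a hypothetical helper, and the sample entities are hand-written for illustration, not real model output.

```python
from typing import Dict, List


def top_keyphrases(entities: List[Dict], min_score: float = 0.5) -> List[str]:
    """Return unique keyphrases scoring at least min_score, best first."""
    ranked = sorted(entities, key=lambda e: e["score"], reverse=True)
    seen = set()
    result = []
    for entity in ranked:
        key = entity["span"].lower()  # de-duplicate case-insensitively
        if entity["score"] >= min_score and key not in seen:
            seen.add(key)
            result.append(entity["span"])
    return result


# Hypothetical predictions (assumed structure, made-up scores).
example_entities = [
    {"span": "adaptive filtering", "label": "KEY", "score": 0.91},
    {"span": "HSI color model", "label": "KEY", "score": 0.84},
    {"span": "Adaptive filtering", "label": "KEY", "score": 0.62},
    {"span": "edge maps", "label": "KEY", "score": 0.31},
]

print(top_keyphrases(example_entities))
```

The 0.5 threshold is arbitrary. Given the precision and recall reported above, it may be worth tuning on held-out data.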
Downstream Use
You can finetune this model on your own dataset.
from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer
# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-uncased-keyphrase-inspec")
# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("conll2003") # For example CoNLL2003
# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
model=model,
train_dataset=dataset["train"],
eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("tomaarsen/span-marker-bert-base-uncased-keyphrase-inspec-finetuned")
Training Details
Training Set Metrics
| Training set | Min | Mean | Max |
|---|---|---|---|
| Sentence length | 15 | 138.5327 | 557 |
| Entities per sentence | 0 | 8.2507 | 41 |
Training Hyperparameters
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
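To make the scheduler settings concrete, here is a minimal sketch of a linear schedule with a 10% warmup: the learning rate rises linearly from 0 to 5e-05 and then decays linearly back to 0. The total step count of 1000 is a made-up value, since the card does not report the number of training steps. `linear_schedule_lr` is a hypothetical helper, not the trainer's own implementation.

```python
def linear_schedule_lr(
    step: int,
    total_steps: int,
    base_lr: float = 5e-5,
    warmup_ratio: float = 0.1,
) -> float:
    """Learning rate at a given step for linear warmup then linear decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale up linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale down linearly from base_lr to 0.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 1000  # hypothetical; not reported in this card
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```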
Environmental Impact
Carbon emissions were measured using CodeCarbon.
- Carbon Emitted: 0.021 kg of CO2
- Hours Used: 0.137 hours
Training Hardware
- On Cloud: No
- GPU Model: 1 x NVIDIA GeForce RTX 3090
- CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
- RAM Size: 31.78 GB
Framework Versions
- Python: 3.9.16
- SpanMarker: 1.3.1.dev
- Transformers: 4.29.2
- PyTorch: 2.0.1+cu118
- Datasets: 2.14.3
- Tokenizers: 0.13.2