How to invert the lemmatization process given a lemma and a token?

Generally, in natural language processing, we want to get the lemma of a token; for example, we can map 'eaten' to 'eat' using WordNet lemmatization. Here the inverse is wanted: given a lemma and a token in the target form, produce the lemma's corresponding surface form, e.g. map 'go' to 'gone' given the target form 'eaten'. Is there any tool in Python that can inflect a lemma into a certain form?

Suggestion : 1

Turning a base form such as a lemma into a situation-appropriate form is called realization (or "surface realization"). Example from Wikipedia, using the SimpleNLG Java library:

// SimpleNLG setup: build a factory and realiser from the default English lexicon
Lexicon lexicon = Lexicon.getDefaultLexicon();
NLGFactory nlgFactory = new NLGFactory(lexicon);
Realiser realiser = new Realiser(lexicon);

NPPhraseSpec subject = nlgFactory.createNounPhrase("the", "woman");
subject.setPlural(true);                    // realise the noun as "women"
SPhraseSpec sentence = nlgFactory.createClause(subject, "smoke");
sentence.setFeature(Feature.NEGATED, true); // negate the verb: "do not smoke"
System.out.println(realiser.realiseSentence(sentence));
// output: "The women do not smoke."


Suggestion : 2

This suggestion quotes spaCy's Lemmatizer documentation, which covers the usual direction (token to lemma) rather than the inverse. The Lemmatizer is a component for assigning base forms to tokens using rules based on part-of-speech tags, or lookup tables. Different Language subclasses can implement their own lemmatizer components via language-specific factories. The default data used is provided by the spacy-lookups-data extension package.

If the lemmatization mode is set to "rule", which requires coarse-grained POS (Token.pos) to be assigned, make sure a Tagger, Morphologizer or another component assigning POS is available in the pipeline and runs before the lemmatizer.

During serialization, spaCy will export several data fields used to restore different aspects of the object. If needed, you can exclude them from serialization by passing in the string names via the exclude argument.

The default config is defined by the pipeline component factory and describes how the component should be configured. You can override its settings via the config argument on nlp.add_pipe or in your config.cfg for training. For examples of the lookups data format used by the lookup and rule-based lemmatizers, see spacy-lookups-data.
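
To see the rule-mode requirement in practice (a sketch, assuming a trained English pipeline such as en_core_web_sm is installed; such pipelines already run a POS-assigning component before the lemmatizer):

import spacy

nlp = spacy.load("en_core_web_sm")
print(nlp.pipe_names)  # a tagger/morphologizer should precede 'lemmatizer'
doc = nlp("The women do not smoke.")
print([(t.text, t.pos_, t.lemma_) for t in doc])
# e.g. [('The', 'DET', 'the'), ('women', 'NOUN', 'woman'), ('do', 'AUX', 'do'), ...]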

Example

config = {
   "mode": "rule"
}
nlp.add_pipe("lemmatizer", config = config)


Example

# Construction via add_pipe with default model
lemmatizer = nlp.add_pipe("lemmatizer")

# Construction via add_pipe with custom settings
config = {
   "mode": "rule",
   "overwrite": True
}
lemmatizer = nlp.add_pipe("lemmatizer", config=config)

Example

doc = nlp("This is a sentence.")
lemmatizer = nlp.add_pipe("lemmatizer")
# This usually happens under the hood
processed = lemmatizer(doc)

Example

lemmatizer = nlp.add_pipe("lemmatizer")
for doc in lemmatizer.pipe(docs, batch_size=50):
   pass

Example

lemmatizer = nlp.add_pipe("lemmatizer")
lemmatizer.initialize(lookups=lookups)

config.cfg

[initialize.components.lemmatizer]

[initialize.components.lemmatizer.lookups]
@misc = "load_my_lookups.v1"