Cheshire3 Object Model - Extractor
- class cheshire3.baseObjects.Extractor(session, config, parent=None)
An Extractor takes selected data and returns extracted values.
An Extractor is a processing object called by an Index with the value returned by a Selector, and extracts the values into an appropriate data structure (a dictionary/hash/associative array).
Example Extractors might extract all text from within a DOM node / etree Element, or select all text that occurs between a pair of selected DOM nodes / etree Elements.
Extractors must also be applied to query terms so that the query undergoes the same processing (for example, the same keyword processing rules) as the data.
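As an illustration of the idea above, the following is a minimal sketch of keyword extraction into a dictionary, with the same function applied to both the data and the query. It is plain Python and does not use Cheshire3; the function name `extract_keywords` and the keys `text` and `occurrences` are assumptions made for this example, not the library's actual structure.

```python
def extract_keywords(text):
    """Hypothetical extractor: split on whitespace and count each term."""
    terms = {}
    for word in text.split():
        # One entry per distinct term; the key names are illustrative only.
        entry = terms.setdefault(word, {"text": word, "occurrences": 0})
        entry["occurrences"] += 1
    return terms


# The same rules are applied to the data and to the query,
# so the resulting keys are directly comparable.
indexed = extract_keywords("the cat sat on the mat")
query = extract_keywords("the mat")
matches = sorted(set(indexed) & set(query))
```

Because both sides pass through one function, a change to the splitting rule affects the data and the query identically, which is the reason for reusing the extractor on query terms.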
- process_eventList(session, data)
Process a list of SAX events serialized in C3 internal format.
- process_node(session, data)
Process a DOM node.
- process_string(session, data)
Process and return the value of a raw string, e.g. from an attribute value or the query.
- process_xpathResult(session, data)
Process the result of an XPath expression.
Convenience function to wrap the other process_* functions and do type checking.
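The wrapping and type checking described for process_xpathResult can be sketched roughly as follows. This is a standalone illustration, not the Cheshire3 implementation: the class `SketchExtractor`, the flat list input, the merge behaviour and the `occurrences` key are all assumptions, and `xml.etree.ElementTree` stands in for whatever DOM or etree type the real library handles.

```python
import xml.etree.ElementTree as ET


class SketchExtractor:
    """Hypothetical stand-in that mirrors the documented method names."""

    def process_string(self, session, data):
        # Return the raw string as a single extracted term.
        return {data: {"text": data, "occurrences": 1}}

    def process_node(self, session, data):
        # Gather all text inside the element, then treat it as a string.
        text = "".join(data.itertext())
        return self.process_string(session, text)

    def process_xpathResult(self, session, data):
        # Dispatch each result item by type and merge the extracted terms.
        merged = {}
        for item in data:
            if isinstance(item, str):
                extracted = self.process_string(session, item)
            elif isinstance(item, ET.Element):
                extracted = self.process_node(session, item)
            else:
                raise TypeError("unsupported result type: %r" % type(item))
            for key, value in extracted.items():
                if key in merged:
                    merged[key]["occurrences"] += value["occurrences"]
                else:
                    merged[key] = value
        return merged


root = ET.fromstring("<doc><title>Hello <b>world</b></title></doc>")
extractor = SketchExtractor()
# The session argument is unused in this sketch, so None is passed.
result = extractor.process_xpathResult(
    None, [root.find("title"), "Hello world", "other"]
)
```

In this sketch the element and the matching string both extract to the term "Hello world", so their counts are merged into one entry.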
The following implementations are included in the distribution by default:
- class cheshire3.extractor.SimpleExtractor(session, config, parent)
Base extractor, extracts exact text.
- class cheshire3.extractor.TeiExtractor(session, config, parent)
- class cheshire3.extractor.SpanXPathExtractor(session, config, parent)
Select all text that occurs between a pair of selections.
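To make the idea of span extraction concrete, here is a minimal sketch that collects the text lying between two selected elements in document order. It uses only `xml.etree.ElementTree` and is not how SpanXPathExtractor is implemented; the helper names `_walk` and `text_between`, and the choice to collect from the opening of the first element up to the opening of the second, are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def _walk(elem):
    """Yield (kind, value) pairs for an element tree in document order."""
    yield ("start", elem)
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from _walk(child)
        # Tail text belongs to the parent, right after the child closes.
        if child.tail:
            yield ("text", child.tail)
    yield ("end", elem)


def text_between(root, start, end):
    """Hypothetical helper: text from the start element up to the end element."""
    collecting = False
    parts = []
    for kind, value in _walk(root):
        if kind == "start" and value is start:
            collecting = True
        elif kind == "start" and value is end:
            break
        elif kind == "text" and collecting:
            parts.append(value)
    return "".join(parts)


root = ET.fromstring(
    '<p>before<pb n="1"/>page one <i>text</i> more<pb n="2"/>after</p>'
)
breaks = root.findall("pb")
span = text_between(root, breaks[0], breaks[1])
```

Here the two `pb` elements act as the pair of selections, and `span` holds the text of the page between them, including text nested inside the `i` element.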