Custom API Selectors and Filters

When editing your rules, you may use any of the following selectors and operators to define your output. XPath selectors are also fully supported.

Basic Selectors

| Pattern | Matches | Example |
| --- | --- | --- |
| `*` | any element | `*` |
| `tag` | elements with the given tag name | `div`, `p` |
| `namespace\|type` | elements of type "type" in the namespace "namespace" | `fb\|name` finds `<fb:name>` elements |
| `#id` | elements with an ID attribute of "id" | `div#container`, `#header` |
| `.class` | elements with a class name of "class" | `div.left`, `.post-body` |
| `element[attr]` or `[attr]` | elements with an attribute named "attr" (with any value) | `a[href]`, `[title]` |
| `element[attr=val]` or `[attr=val]` | elements with an attribute named "attr" and value equal to "val" | `img[width=500]`, `a[rel=nofollow]` |
| `[^attrPrefix]` | elements with an attribute name starting with "attrPrefix"; use it to find elements with HTML5 datasets | `[^data-]`, `div[^data-]` |
| `[attr^=valPrefix]` | elements with an attribute named "attr" and value starting with "valPrefix" | `a[href^=http:]` |
| `[attr$=valSuffix]` | elements with an attribute named "attr" and value ending with "valSuffix" | `img[src$=.png]` |
| `[attr*=valContaining]` | elements with an attribute named "attr" and value containing "valContaining" | `a[href*=/search/]` |
| `[attr~=regex]` | elements with an attribute named "attr" and value matching the regular expression | `img[src~=(?i)\.(png\|jpe?g)]` |

The above may be combined in any order, such as div.header[title]
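
As a rough illustration of how the attribute value operators listed above differ, the following Python sketch applies each operator to a plain string. The helper name `attr_value_matches` is hypothetical and is not part of the Custom API; the case-sensitive comparison and the "match anywhere" behaviour for `~=` are assumptions made for the example, and the real matcher may differ.

```python
import re

def attr_value_matches(value, op, operand):
    """Return True if an attribute value satisfies one operator from the table.

    Hypothetical helper for illustration only; not the service's own matcher.
    Comparisons here are case sensitive (an assumption).
    """
    if op == "=":
        return value == operand            # [attr=val]
    if op == "^=":
        return value.startswith(operand)   # [attr^=valPrefix]
    if op == "$=":
        return value.endswith(operand)     # [attr$=valSuffix]
    if op == "*=":
        return operand in value            # [attr*=valContaining]
    if op == "~=":
        # Assumed: the regular expression may match anywhere in the value.
        return re.search(operand, value) is not None
    raise ValueError(f"unknown operator: {op}")

print(attr_value_matches("http://example.com/", "^=", "http:"))          # True
print(attr_value_matches("/img/photo.gif", "$=", ".png"))                # False
print(attr_value_matches("/img/photo.JPG", "~=", r"(?i)\.(png|jpe?g)"))  # True
```

The last call shows why the regular expression form is useful: the `(?i)` flag lets one pattern cover several file extensions regardless of case.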

Combinators

The following can be used to specify certain elements based on their relation to other elements on the page (parents, children, siblings, etc.).

| Pattern | Matches | Example |
| --- | --- | --- |
| `E F` | an F element descended from an E element | `div a`, `.logo h1` |
| `E > F` | an F direct child of E | `ol > li` |
| `E + F` | an F element immediately preceded by sibling E | `li + li`, `div.head + div` |
| `E ~ F` | an F element preceded by sibling E | `h1 ~ p` |
| `E, F, G` | all matching elements E, F, or G | `a[href], div, h3` |
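
To make the difference between the four relational combinators concrete, here is a minimal Python sketch that emulates them over a tiny document using only the standard library's `xml.etree.ElementTree`. The sample markup and the function names (`descendants`, `children`, `adjacent_sibling`, `general_sibling`) are invented for this example, and the functions match on tag names only; this is not how the Custom API evaluates selectors.

```python
import xml.etree.ElementTree as ET

# Invented sample document for the illustration.
DOC = ET.fromstring(
    "<div>"
    "<h1>Title</h1>"
    "<p>one</p>"
    "<ol><li>a</li><li>b<p>nested</p></li></ol>"
    "<p>two</p>"
    "</div>"
)

def descendants(root, e, f):
    """E F: every F found at any depth below an E."""
    found = []
    for parent in root.iter(e):
        for node in parent.iter(f):
            # iter() also yields the element itself, so skip it; avoid duplicates.
            if node is not parent and not any(node is seen for seen in found):
                found.append(node)
    return found

def children(root, e, f):
    """E > F: every F that is a direct child of an E."""
    return [kid for parent in root.iter(e) for kid in parent if kid.tag == f]

def adjacent_sibling(root, e, f):
    """E + F: every F whose immediately preceding sibling is an E."""
    found = []
    for parent in root.iter():
        kids = list(parent)
        for prev, cur in zip(kids, kids[1:]):
            if prev.tag == e and cur.tag == f:
                found.append(cur)
    return found

def general_sibling(root, e, f):
    """E ~ F: every F that has an E somewhere before it under the same parent."""
    found = []
    for parent in root.iter():
        seen_e = False
        for kid in parent:
            if seen_e and kid.tag == f:
                found.append(kid)
            if kid.tag == e:
                seen_e = True
    return found

print([n.text for n in descendants(DOC, "div", "p")])      # ['one', 'nested', 'two']
print([n.text for n in children(DOC, "div", "p")])         # ['one', 'two']
print([n.text for n in adjacent_sibling(DOC, "h1", "p")])  # ['one']
print([n.text for n in general_sibling(DOC, "h1", "p")])   # ['one', 'two']
```

Note how `div p` picks up the nested paragraph inside the list item while `div > p` does not, and how `h1 ~ p` reaches the second paragraph that `h1 + p` skips.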

Pseudo Selectors

The following advanced selectors are also available.

| Pattern | Matches | Example |
| --- | --- | --- |
| `:first-child` | elements that are the first child of some other element | `div > p:first-child` finds the first child element of a div, provided it is a p |
| `:last-child` | elements that are the last child of some other element | `ul > li:last-child` finds the last list item in each unordered list |
| `:only-child` | elements that are the only child of a parent element | `p:only-child` finds paragraphs without sibling elements |
| `:first-of-type` | elements that are the first sibling of their type in the list of children of their parent element | `div > p:first-of-type` finds the first p element of each div |
| `:last-of-type` | elements that are the last sibling of their type in the list of children of their parent element | `div > span:last-of-type` finds the last span element within div elements |
| `:only-of-type` | elements that have a parent element whose other element children all have a different expanded element name | `p:only-of-type` finds paragraphs without sibling p elements |
| `:empty` | elements that have no children at all | `p:empty` finds paragraphs without children |
| `:nth-child(an+b)` | elements that have an+b-1 siblings before them in the document tree, for any positive integer or zero value of n, and have a parent element. Can also take 'odd' and 'even' as arguments. | `tr:nth-child(2n+1)` finds every odd row of a table |
| `:nth-last-child(an+b)` | elements that have an+b-1 siblings after them in the document tree | `tr:nth-last-child(-n+2)` finds the last two rows of a table |
| `:nth-of-type(an+b)` | elements that have an+b-1 siblings with the same expanded element name before them in the document tree, for any zero or positive integer value of n, and have a parent element | `img:nth-of-type(2n+1)` |
| `:nth-last-of-type(an+b)` | elements that have an+b-1 siblings with the same expanded element name after them in the document tree, for any zero or positive integer value of n, and have a parent element | `img:nth-last-of-type(2n+1)` |
| `:lt(n)` | elements whose sibling index is less than n | `td:lt(3)` finds the first 3 cells of each row |
| `:gt(n)` | elements whose sibling index is greater than n | `td:gt(1)` finds cells after skipping the first two |
| `:eq(n)` | elements whose sibling index is equal to n | `td:eq(0)` finds the first cell of each row |
| `:has(selector)` | elements that contain at least one element matching the selector | `div:has(p)` finds divs that contain p elements |
| `:not(selector)` | elements that do not match the selector | `div:not(.logo)` finds all divs that do not have the "logo" class; `div:not(:has(div))` finds divs that do not contain divs |
| `:contains(text)` | elements that contain the specified text. The search is case insensitive. The text may appear in the found element or in any of its descendants. | `p:contains(jsoup)` finds p elements containing the text "jsoup" |
| `:matches(regex)` | elements whose text matches the specified regular expression. The text may appear in the found element or in any of its descendants. | `td:matches(\d+)` finds table cells containing digits; `div:matches((?i)login)` finds divs containing the text "login", case insensitively |
| `:containsOwn(text)` | elements that directly contain the specified text. The search is case insensitive. The text must appear in the found element itself, not in any of its descendants. | `p:containsOwn(jsoup)` finds p elements with own text "jsoup" |
| `:matchesOwn(regex)` | elements whose own text matches the specified regular expression. The text must appear in the found element itself, not in any of its descendants. | `td:matchesOwn(\d+)` finds table cells directly containing digits; `div:matchesOwn((?i)login)` finds divs whose own text contains "login", case insensitively |

The above may be combined in any order and with other selectors, such as .light:contains(name):eq(0)
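
The an+b arguments and the index-based selectors are easy to misread, so here is a small Python sketch of the arithmetic behind them. It assumes, as the examples above suggest, that an+b positions are counted from 1 while the sibling index used by :lt, :gt and :eq starts at 0. The function name `nth_matches` and the sample lists are made up for the illustration and are not part of the Custom API.

```python
def nth_matches(position, a, b):
    """True if the 1-based `position` equals a*n + b for some integer n >= 0."""
    if a == 0:
        return position == b
    n, remainder = divmod(position - b, a)
    return remainder == 0 and n >= 0

rows = ["r1", "r2", "r3", "r4", "r5"]

# tr:nth-child(2n+1): positions are counted from the start of the parent.
odd_rows = [r for i, r in enumerate(rows, start=1) if nth_matches(i, 2, 1)]

# tr:nth-last-child(-n+2): positions are counted from the end of the parent.
last_two = [
    r for i, r in enumerate(rows, start=1)
    if nth_matches(len(rows) - i + 1, -1, 2)
]

print(odd_rows)   # ['r1', 'r3', 'r5']
print(last_two)   # ['r4', 'r5']

cells = ["c0", "c1", "c2", "c3"]

# Sibling indexes are zero-based (assumed from the td:eq(0) example).
lt_3 = [c for i, c in enumerate(cells) if i < 3]    # td:lt(3)
gt_1 = [c for i, c in enumerate(cells) if i > 1]    # td:gt(1)
eq_0 = [c for i, c in enumerate(cells) if i == 0]   # td:eq(0)

print(lt_3)   # ['c0', 'c1', 'c2']
print(gt_1)   # ['c2', 'c3']
print(eq_0)   # ['c0']
```

With a negative coefficient such as -n+2, only n = 0 and n = 1 give positive positions, which is why the expression selects exactly two elements.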