Custom API Selectors and Filters

When editing your rules, you may use any of the following selectors and operators to define your output. XPath selectors are also fully supported.

Basic Selectors

Pattern: *
Matches: any element
Example: *

Pattern: tagname
Matches: elements with the given tag name
Example: div, p

Pattern: ns|type
Matches: elements of type "type" in the namespace "ns"
Example: fb|name finds <fb:name> elements

Pattern: #id
Matches: elements with an id attribute of "id"
Example: div#container, #header

Pattern: .class
Matches: elements with a class name of "class"
Example: div.left, .post-body

Pattern: element[attr] or [attr]
Matches: elements with an attribute named "attr" (with any value)
Example: a[href], [title]

Pattern: element[attr=val] or [attr=val]
Matches: elements with an attribute named "attr" and value equal to "val"
Example: img[width=500], a[rel=nofollow]

Pattern: [^attrPrefix]
Matches: elements with an attribute name starting with "attrPrefix". Use this to find elements with HTML5 datasets
Example: [^data-], div[^data-]

Pattern: [attr^=valPrefix]
Matches: elements with an attribute named "attr" and value starting with "valPrefix"
Example: a[href^=http:]

Pattern: [attr$=valSuffix]
Matches: elements with an attribute named "attr" and value ending with "valSuffix"
Example: img[src$=.png]

Pattern: [attr*=valContaining]
Matches: elements with an attribute named "attr" and value containing "valContaining"
Example: a[href*=/search/]

Pattern: [attr~=regex]
Matches: elements with an attribute named "attr" and value matching the regular expression
Example: img[src~=(?i)\.(png|jpe?g)]

The above may be combined in any order, such as div.header[title].
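The attribute operators listed above can be illustrated with a short Python sketch. It is not the rule engine's implementation: the function names are hypothetical, comparisons are assumed to be case sensitive, and the regular-expression operator is assumed to match anywhere in the value.

```python
import re


def attr_value_matches(value, op, operand):
    """Return True if an attribute value satisfies one operator from the table.

    op is one of "=", "^=", "$=", "*=", "~=" (mirroring the selector syntax).
    """
    if op == "=":
        return value == operand
    if op == "^=":
        return value.startswith(operand)
    if op == "$=":
        return value.endswith(operand)
    if op == "*=":
        return operand in value
    if op == "~=":
        # Assumption: the regex may match anywhere in the value.
        return re.search(operand, value) is not None
    raise ValueError(f"unknown operator: {op}")


def has_attr_prefix(attrs, prefix):
    """[^attrPrefix]: True if any attribute name starts with the prefix."""
    return any(name.startswith(prefix) for name in attrs)


# a[href^=http:]
print(attr_value_matches("http://example.com/", "^=", "http:"))
# img[src~=(?i)\.(png|jpe?g)]
print(attr_value_matches("photo.JPG", "~=", r"(?i)\.(png|jpe?g)"))
# [^data-]
print(has_attr_prefix({"data-id": "7", "class": "card"}, "data-"))
```

All three calls print True under the assumptions stated above.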

Combinators

The following can be used to specify certain elements based on their relation to other elements on the page (parents, children, siblings, etc.).

Pattern: E F
Matches: an F element descended from an E element
Example: div a, .logo h1

Pattern: E > F
Matches: an F element that is a direct child of E
Example: ol > li

Pattern: E + F
Matches: an F element immediately preceded by sibling E
Example: li + li, div.head + div

Pattern: E ~ F
Matches: an F element preceded by sibling E
Example: h1 ~ p

Pattern: E, F, G
Matches: all matching elements E, F, or G
Example: a[href], div, h3
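To make the four relationships concrete, here is a minimal Python sketch that walks a small tree with the standard library's xml.etree.ElementTree. The sample markup and helper names are made up for illustration, and the helpers only handle plain tag names, not full selectors.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<div>"
    "<h1>Title</h1>"
    "<p id='a'>first</p>"
    "<ol><li id='x'>one</li><li id='y'>two</li><li id='z'>three</li></ol>"
    "<section><p id='b'>nested</p></section>"
    "<p id='c'>last</p>"
    "</div>"
)


def descendants(root, e_tag, f_tag):
    """E F: every F element anywhere below an E element."""
    found = []
    for e in root.iter(e_tag):
        for f in e.iter(f_tag):
            if f is not e and f not in found:
                found.append(f)
    return found


def children(root, e_tag, f_tag):
    """E > F: F elements whose parent is an E element."""
    return [f for e in root.iter(e_tag) for f in e if f.tag == f_tag]


def adjacent_siblings(root, e_tag, f_tag):
    """E + F: F elements whose immediately preceding sibling is an E."""
    found = []
    for parent in root.iter():
        kids = list(parent)
        for prev, cur in zip(kids, kids[1:]):
            if prev.tag == e_tag and cur.tag == f_tag:
                found.append(cur)
    return found


def general_siblings(root, e_tag, f_tag):
    """E ~ F: F elements that have an E sibling somewhere before them."""
    found = []
    for parent in root.iter():
        seen_e = False
        for kid in parent:
            if seen_e and kid.tag == f_tag:
                found.append(kid)
            if kid.tag == e_tag:
                seen_e = True
    return found


def ids(elements):
    return [el.get("id") for el in elements]


print(ids(descendants(DOC, "div", "p")))       # div p    -> ['a', 'b', 'c']
print(ids(children(DOC, "div", "p")))          # div > p  -> ['a', 'c']
print(ids(adjacent_siblings(DOC, "li", "li"))) # li + li  -> ['y', 'z']
print(ids(general_siblings(DOC, "h1", "p")))   # h1 ~ p   -> ['a', 'c']
```

Note that the nested paragraph "b" is matched by the descendant form but not by the child or sibling forms, because its parent is the section, not the div.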

Pseudo Selectors

The following advanced selectors are also available. Each entry gives the pattern, what it matches, and an example.

Pattern: :first-child
Matches: elements that are the first child of some other element
Example: div > p:first-child finds p elements that are the first child of a div

Pattern: :last-child
Matches: elements that are the last child of some other element
Example: ul > li:last-child finds the last list-item in each unordered list

Pattern: :only-child
Matches: elements that are the only child of a parent element
Example: p:only-child finds paragraphs without sibling elements

Pattern: :first-of-type
Matches: elements that are the first sibling of their type in the list of children of their parent element
Example: div > p:first-of-type finds the first p element of each div

Pattern: :last-of-type
Matches: elements that are the last sibling of their type in the list of children of their parent element
Example: div > span:last-of-type finds the last span element within each div

Pattern: :only-of-type
Matches: elements whose parent element has no other element children with the same expanded element name
Example: p:only-of-type finds paragraphs without sibling p elements

Pattern: :empty
Matches: elements that have no children at all
Example: p:empty finds paragraphs without children

Pattern: :nth-child(an+b)
Matches: elements that have an+b-1 siblings before them in the document tree, for any positive integer or zero value of n, and have a parent element. Can also take 'odd' and 'even' as arguments.
Example: tr:nth-child(2n+1) finds every odd row of a table

Pattern: :nth-last-child(an+b)
Matches: elements that have an+b-1 siblings after them in the document tree.
Example: tr:nth-last-child(-n+2) finds the last two rows of a table

Pattern: :nth-of-type(an+b)
Matches: elements that have an+b-1 siblings with the same expanded element name before them in the document tree, for any zero or positive integer value of n, and have a parent element
Example: img:nth-of-type(2n+1)

Pattern: :nth-last-of-type(an+b)
Matches: elements that have an+b-1 siblings with the same expanded element name after them in the document tree, for any zero or positive integer value of n, and have a parent element
Example: img:nth-last-of-type(2n+1)
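The an+b arithmetic in the entries above can be checked with a small Python sketch. Sibling lists are modelled as plain Python lists, positions are 1-based as in the definitions, and the function names are hypothetical; this illustrates the rule and is not the engine's code.

```python
def matches_an_plus_b(position, a, b):
    """True if the 1-based position equals a*n + b for some integer n >= 0."""
    if a == 0:
        return position == b
    n, remainder = divmod(position - b, a)
    return remainder == 0 and n >= 0


def nth_child(siblings, a, b):
    """:nth-child(an+b): positions are counted from the start."""
    return [s for i, s in enumerate(siblings, start=1)
            if matches_an_plus_b(i, a, b)]


def nth_last_child(siblings, a, b):
    """:nth-last-child(an+b): positions are counted from the end."""
    total = len(siblings)
    return [s for i, s in enumerate(siblings, start=1)
            if matches_an_plus_b(total - i + 1, a, b)]


def nth_of_type(siblings, tag, a, b):
    """:nth-of-type(an+b): count only siblings with the given tag.

    Siblings are (tag, label) tuples in this sketch.
    """
    same_type = [s for s in siblings if s[0] == tag]
    return nth_child(same_type, a, b)


rows = ["row1", "row2", "row3", "row4", "row5"]
print(nth_child(rows, 2, 1))        # tr:nth-child(2n+1)      -> row1, row3, row5
print(nth_last_child(rows, -1, 2))  # tr:nth-last-child(-n+2) -> row4, row5

kids = [("p", "p1"), ("img", "i1"), ("img", "i2"), ("p", "p2"), ("img", "i3")]
print(nth_of_type(kids, "img", 2, 1))  # img:nth-of-type(2n+1) -> i1, i3
```

With a negative a, as in -n+2, only a few values of n give a valid position, which is why that form selects a fixed number of items at one end of the list.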

Pattern: :lt(n)
Matches: elements whose sibling index (counting from 0) is less than n
Example: td:lt(3) finds the first 3 cells of each row

Pattern: :gt(n)
Matches: elements whose sibling index (counting from 0) is greater than n
Example: td:gt(1) finds cells after skipping the first two

Pattern: :eq(n)
Matches: elements whose sibling index (counting from 0) is equal to n
Example: td:eq(0) finds the first cell of each row
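Because td:eq(0) selects the first cell, the sibling index appears to be zero-based. The following Python sketch works under that assumption, with a row modelled as a plain list; the helper name is hypothetical.

```python
def select_by_index(siblings, op, n):
    """Filter siblings by zero-based index using 'lt', 'gt', or 'eq'."""
    tests = {
        "lt": lambda i: i < n,
        "gt": lambda i: i > n,
        "eq": lambda i: i == n,
    }
    keep = tests[op]
    return [s for i, s in enumerate(siblings) if keep(i)]


cells = ["A", "B", "C", "D"]
print(select_by_index(cells, "lt", 3))  # td:lt(3) -> ['A', 'B', 'C']
print(select_by_index(cells, "gt", 1))  # td:gt(1) -> ['C', 'D']
print(select_by_index(cells, "eq", 0))  # td:eq(0) -> ['A']
```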

Pattern: :has(selector)
Matches: elements that contain at least one element matching the selector
Example: div:has(p) finds divs that contain p elements

Pattern: :not(selector)
Matches: elements that do not match the selector
Example: div:not(.logo) finds all divs that do not have the "logo" class
Example: div:not(:has(div)) finds divs that do not contain divs
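As a rough illustration of how :has and :not filter a set of candidates, the Python sketch below applies both to a small tree parsed with xml.etree.ElementTree. The markup and helper names are invented for the example, and has() only accepts a tag name rather than a full selector.

```python
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<body>"
    "<div id='outer' class='logo'><div id='inner'><p>text</p></div></div>"
    "<div id='plain'>no paragraph</div>"
    "</body>"
)


def has(element, tag):
    """:has(tag): True if some descendant (not the element itself) has the tag."""
    return any(d is not element for d in element.iter(tag))


def classes(element):
    """Class names of an element as a list."""
    return element.get("class", "").split()


divs = list(DOC.iter("div"))

with_p = [d.get("id") for d in divs if has(d, "p")]                  # div:has(p)
not_logo = [d.get("id") for d in divs if "logo" not in classes(d)]   # div:not(.logo)
leaf_divs = [d.get("id") for d in divs if not has(d, "div")]         # div:not(:has(div))

print(with_p)     # ['outer', 'inner']
print(not_logo)   # ['inner', 'plain']
print(leaf_divs)  # ['inner', 'plain']
```

The outer div matches div:has(p) even though the p is a grandchild, since :has looks at any contained element rather than only direct children.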

Pattern: :contains(text)
Matches: elements that contain the specified text. The search is case insensitive. The text may appear in the found element, or any of its descendants.
Example: p:contains(jsoup) finds p elements containing the text "jsoup"

Pattern: :matches(regex)
Matches: elements whose text matches the specified regular expression. The text may appear in the found element, or any of its descendants.
Example: td:matches(\\d+) finds table cells containing digits
Example: div:matches((?i)login) finds divs containing the text "login", case insensitively

Pattern: :containsOwn(text)
Matches: elements that directly contain the specified text. The search is case insensitive. The text must appear in the found element, not any of its descendants.
Example: p:containsOwn(jsoup) finds p elements with own text "jsoup"

Pattern: :matchesOwn(regex)
Matches: elements whose own text matches the specified regular expression. The text must appear in the found element, not any of its descendants.
Example: td:matchesOwn(\\d+) finds table cells directly containing digits
Example: div:matchesOwn((?i)login) finds divs whose own text contains "login", case insensitively

The above may be combined in any order and with other selectors, such as .light:contains(name):eq(0).
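The difference between the text selectors and their "Own" variants comes down to which text is searched. The Python sketch below models that with xml.etree.ElementTree. The sample markup and helper names are hypothetical, and the regular-expression variants are assumed to match anywhere in the text.

```python
import re
import xml.etree.ElementTree as ET

DOC = ET.fromstring(
    "<div id='wrap'>"
    "<p id='direct'>Parsed with jsoup</p>"
    "<p id='nested'>Parsed with <b>JSoup</b> again</p>"
    "<p id='other'>Order 66</p>"
    "</div>"
)


def all_text(element):
    """Text of the element and all of its descendants."""
    return "".join(element.itertext())


def own_text(element):
    """Text directly inside the element, excluding text inside child elements."""
    parts = [element.text or ""]
    parts.extend(child.tail or "" for child in element)
    return "".join(parts)


def contains(element, text):
    """:contains(text), case insensitive, searches descendants too."""
    return text.lower() in all_text(element).lower()


def contains_own(element, text):
    """:containsOwn(text), case insensitive, searches own text only."""
    return text.lower() in own_text(element).lower()


def matches_own(element, pattern):
    """:matchesOwn(regex), assumed to match anywhere in the own text."""
    return re.search(pattern, own_text(element)) is not None


def ids(elements):
    return [el.get("id") for el in elements]


paragraphs = list(DOC.iter("p"))
print(ids([p for p in paragraphs if contains(p, "jsoup")]))      # ['direct', 'nested']
print(ids([p for p in paragraphs if contains_own(p, "jsoup")]))  # ['direct']
print(ids([p for p in paragraphs if matches_own(p, r"\d+")]))    # ['other']
```

The wrapping div would match :contains(jsoup) as well, because the text appears in its descendants, but it would not match :containsOwn(jsoup).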