When editing your rules, you may use any of the following selectors and operators to define your output. XPath selectors are also fully supported.
Basic Selectors
| Pattern | Matches | Example |
|---|---|---|
| * | any element | * |
| tagname | elements with the given tag name | div, p |
| namespace\|type | elements of type "type" in the namespace "namespace" | fb\|name finds <fb:name> elements |
| #id | elements with an ID attribute of "id" | div#container, #header |
| .class | elements with a class name of "class" | div.left, .post-body |
| element[attr] or [attr] | elements with an attribute named "attr" (with any value) | a[href], [title] |
| element[attr=val] or [attr=val] | elements with an attribute named "attr" and a value equal to "val" | img[width=500], a[rel=nofollow] |
| [^attrPrefix] | elements with an attribute name starting with "attrPrefix"; useful for finding elements with HTML5 dataset (data-*) attributes | [^data-], div[^data-] |
| [attr^=valPrefix] | elements with an attribute named "attr" and a value starting with "valPrefix" | a[href^=http:] |
| [attr$=valSuffix] | elements with an attribute named "attr" and a value ending with "valSuffix" | img[src$=.png] |
| [attr*=valContaining] | elements with an attribute named "attr" and a value containing "valContaining" | a[href*=/search/] |
| [attr~=regex] | elements with an attribute named "attr" and a value matching the regular expression | img[src~=(?i)\.(png\|jpe?g)] |
The above may be combined in any order, such as div.header[title]
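To make the attribute operators above concrete, here is a minimal Python sketch of how each one compares an attribute value. It is an illustration of the matching rules in the table, not the engine behind the rules editor. The function names, the sample attribute sets, the case-sensitive comparisons, and the "match anywhere in the value" behaviour of the regex operator are all assumptions made for this example.

```python
import re

def attr_matches(attrs, name, op=None, value=None):
    """Check one attribute test, e.g. [href^=http:], against a dict of attributes.

    Assumption: comparisons are case-sensitive and the regex operator matches
    anywhere in the value; a real selector engine may normalise case.
    """
    if name not in attrs:
        return False
    if op is None:                      # [attr]: present with any value
        return True
    actual = attrs[name]
    if op == "=":                       # [attr=val]
        return actual == value
    if op == "^=":                      # [attr^=valPrefix]
        return actual.startswith(value)
    if op == "$=":                      # [attr$=valSuffix]
        return actual.endswith(value)
    if op == "*=":                      # [attr*=valContaining]
        return value in actual
    if op == "~=":                      # [attr~=regex]
        return re.search(value, actual) is not None
    raise ValueError(f"unsupported operator: {op}")

def has_attr_prefix(attrs, prefix):
    """[^attrPrefix]: any attribute whose name starts with the prefix."""
    return any(key.startswith(prefix) for key in attrs)

# Hypothetical attribute sets, used only for illustration.
image = {"src": "/images/Photo.JPEG", "width": "500"}
link = {"href": "http://example.com/search/?q=rules", "rel": "nofollow"}
widget = {"data-id": "42", "class": "left"}

print(attr_matches(link, "href", "^=", "http:"))                  # True
print(attr_matches(link, "href", "*=", "/search/"))               # True
print(attr_matches(image, "src", "$=", ".png"))                   # False
print(attr_matches(image, "src", "~=", r"(?i)\.(png|jpe?g)"))     # True
print(has_attr_prefix(widget, "data-"))                           # True
```

The regex case shows why `(?i)` appears in the table's example: without it, an upper-case extension such as `.JPEG` would not match under the case-sensitive comparison assumed here.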
Combinators
The following can be used to specify certain elements based on their relation to other elements on the page (parents, children, siblings, etc.).
| Pattern | Matches | Example |
|---|---|---|
E F | an F element descended from an E element | div a, .logo h1 |
E > F | an F direct child of E | ol > li |
E + F | an F element immediately preceded by sibling E | li + li, div.head + div |
E ~ F | an F element preceded by sibling E | h1 ~ p |
E, F, G | all matching elements E, F, or G | a[href], div, h3 |
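The difference between the four relational combinators is easiest to see on a small tree. The following sketch, using only Python's standard `xml.etree.ElementTree`, re-implements each relation by hand; the sample markup and the helper names are invented for this example and say nothing about how the rules editor evaluates selectors internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup.
DOC = ET.fromstring(
    "<div>"
    "<h1>Title</h1>"
    "<p>intro</p>"
    "<ol><li>one</li><li>two</li><li>three</li></ol>"
    "<p>outro <a>link</a></p>"
    "</div>"
)

def descendant(root, e, f):
    """E F: every F nested at any depth inside an E."""
    found, seen = [], set()
    for ancestor in root.iter(e):
        for node in ancestor.iter(f):
            if node is not ancestor and id(node) not in seen:
                seen.add(id(node))
                found.append(node)
    return found

def child(root, e, f):
    """E > F: every F that is a direct child of an E."""
    return [kid for parent in root.iter(e) for kid in parent if kid.tag == f]

def adjacent_sibling(root, e, f):
    """E + F: every F whose immediately preceding sibling is an E."""
    found = []
    for parent in root.iter():
        kids = list(parent)
        for prev, cur in zip(kids, kids[1:]):
            if prev.tag == e and cur.tag == f:
                found.append(cur)
    return found

def general_sibling(root, e, f):
    """E ~ F: every F that has an E somewhere among its earlier siblings."""
    found = []
    for parent in root.iter():
        seen_e = False
        for kid in parent:
            if seen_e and kid.tag == f:
                found.append(kid)
            if kid.tag == e:
                seen_e = True
    return found

print(len(descendant(DOC, "div", "a")))                       # 1
print(len(child(DOC, "div", "a")))                            # 0
print([li.text for li in adjacent_sibling(DOC, "li", "li")])  # ['two', 'three']
print(len(adjacent_sibling(DOC, "h1", "p")))                  # 1
print(len(general_sibling(DOC, "h1", "p")))                   # 2
```

In this sample, `div a` finds the link but `div > a` does not, because the link is a grandchild of the div. Likewise `h1 + p` finds only the paragraph directly after the heading, while `h1 ~ p` finds both paragraphs.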
Pseudo Selectors
The following advanced selectors are also available.
| Pattern | Matches | Example |
|---|---|---|
| :first-child | elements that are the first child of some other element | E > F:first-child finds the first child element of an E that happens to be an F |
| :last-child | elements that are the last child of some other element | ul > li:last-child finds the last list-item in each unordered list |
| :only-child | elements that are the only child of a parent element | p:only-child finds paragraphs without sibling elements |
| :first-of-type | elements that are the first sibling of their type in the list of children of their parent element | E > F:first-of-type finds the first F element of each E |
| :last-of-type | elements that are the last sibling of their type in the list of children of their parent element | E > F:last-of-type finds the last F element within E elements |
| :only-of-type | elements that have a parent element whose other element children all have a different expanded element name | p:only-of-type finds paragraphs without sibling p elements |
| :empty | elements that have no children at all | p:empty finds paragraphs without children |
| :nth-child(an+b) | elements that have an+b-1 siblings before them in the document tree, for any positive integer or zero value of n, and have a parent element. Can also take 'odd' and 'even' as arguments. | tr:nth-child(2n+1) finds every odd row of a table |
| :nth-last-child(an+b) | elements that have an+b-1 siblings after them in the document tree | tr:nth-last-child(-n+2) finds the last two rows of a table |
| :nth-of-type(an+b) | elements that have an+b-1 siblings with the same expanded element name before them in the document tree, for any zero or positive integer value of n, and have a parent element | |
| :nth-last-of-type(an+b) | elements that have an+b-1 siblings with the same expanded element name after them in the document tree, for any zero or positive integer value of n, and have a parent element | |
| :lt(n) | elements whose sibling index (counting from 0) is less than n | td:lt(2) finds the first 2 cells of each row |
| :gt(n) | elements whose sibling index (counting from 0) is greater than n | td:gt(1) finds cells after skipping the first two |
| :eq(n) | elements whose sibling index (counting from 0) is equal to n | td:eq(0) finds the first cell of each row |
| :has(selector) | elements that contain at least one element matching the selector | div:has(p) finds divs that contain p elements |
| :not(selector) | elements that do not match the selector | |
| :contains(text) | elements that contain the specified text. The search is case insensitive. The text may appear in the found element or in any of its descendants. | p:contains(jsoup) finds p elements containing the text "jsoup" |
| :matches(regex) | elements whose text matches the specified regular expression. The text may appear in the found element or in any of its descendants. | td:matches(\d+) finds table cells containing digits; div:matches((?i)text) finds divs containing the text, case insensitively |
| :containsOwn(text) | elements that directly contain the specified text. The search is case insensitive. The text must appear in the found element, not in any of its descendants. | p:containsOwn(jsoup) finds p elements with own text "jsoup" |
| :matchesOwn(regex) | elements whose own text matches the specified regular expression. The text must appear in the found element, not in any of its descendants. | td:matchesOwn(\d+) finds table cells directly containing digits; div:matchesOwn((?i)text) finds divs containing the text, case insensitively |
The above may be combined in any order and with other selectors, such as .light:contains(name):eq(0)
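The an+b notation used by the :nth-* selectors can be checked with a little arithmetic: an element at 1-based position p matches when p = a·n + b for some integer n ≥ 0. The sketch below works that rule through in Python. The function names are hypothetical, and the code models only the position test described in the table, not a full selector engine.

```python
def matches_an_plus_b(position, a, b):
    """True if the 1-based position equals a*n + b for some integer n >= 0."""
    if a == 0:
        return position == b
    n, remainder = divmod(position - b, a)
    return remainder == 0 and n >= 0

def nth_child_positions(total, a, b):
    """Positions (counted from the start) selected by :nth-child(an+b)."""
    return [p for p in range(1, total + 1) if matches_an_plus_b(p, a, b)]

def nth_last_child_positions(total, a, b):
    """Positions (counted from the start) selected by :nth-last-child(an+b).

    The test is applied to the distance from the end: the last element
    has position 1, the one before it position 2, and so on.
    """
    return [p for p in range(1, total + 1)
            if matches_an_plus_b(total - p + 1, a, b)]

# A table with six rows, numbered 1..6 from the top.
print(nth_child_positions(6, 2, 1))        # 2n+1 (odd rows):  [1, 3, 5]
print(nth_child_positions(6, 2, 0))        # 2n (even rows):   [2, 4, 6]
print(nth_child_positions(6, 0, 3))        # 3 (third row):    [3]
print(nth_last_child_positions(6, -1, 2))  # -n+2 (last two):  [5, 6]
```

A negative a makes the sequence count downward from b, which is why -n+2 selects only positions 2 and 1 from the end, that is, the last two rows.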