When editing your rules, you may use any of the following selectors and operators to define your output. XPath selectors are also fully supported.
Basic Selectors
Pattern | Matches | Example |
---|---|---|
* | any element | * |
tagname | elements with the given tag name | div , p |
ns\|type | elements of type "type" in the namespace "ns" | fb\|name finds <fb:name> elements |
#id | elements with an ID attribute of "id" | div#container , #header |
.class | elements with a class name of "class" | div.left , .post-body |
element[attr] or [attr] | elements with an attribute named "attr" (with any value) | a[href] , [title] |
element[attr=val] or [attr=val] | elements with an attribute named "attr" whose value equals "val" | img[width=500] , a[rel=nofollow] |
[^attrPrefix] | elements with an attribute whose name starts with "attrPrefix"; useful for finding elements with HTML5 data attributes | [^data-] , div[^data-] |
[attr^=valPrefix] | elements with an attribute named "attr" whose value starts with "valPrefix" | a[href^=http:] |
[attr$=valSuffix] | elements with an attribute named "attr" whose value ends with "valSuffix" | img[src$=.png] |
[attr*=valContaining] | elements with an attribute named "attr" whose value contains "valContaining" | a[href*=/search/] |
[attr~=regex] | elements with an attribute named "attr" whose value matches the regular expression "regex" | img[src~=(?i)\.(png\|jpe?g)] |
The above may be combined in any order, such as div.header[title].
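The following minimal Python sketch shows how each bracketed attribute test, and a compound selector such as div.header[title], could be evaluated against a single element. The Element class and the attr_test, has_attr_prefix and matches_compound functions are hypothetical illustrations, not the rules engine's API. The sketch compares values case-sensitively and uses Python's re module, so the real engine may handle case and regex details differently.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical element model: a tag name plus its attributes."""
    tag: str
    attrs: dict = field(default_factory=dict)


def attr_test(el: Element, name: str, op: str = "", value: str = "") -> bool:
    """Evaluate one bracketed test on attribute `name`.

    Comparisons are case-sensitive here, which is an assumption of this sketch.
    """
    if name not in el.attrs:
        return False
    actual = el.attrs[name]
    checks = {
        "": lambda: True,                                    # [attr]: present, any value
        "=": lambda: actual == value,                        # [attr=val]
        "^=": lambda: actual.startswith(value),              # [attr^=valPrefix]
        "$=": lambda: actual.endswith(value),                # [attr$=valSuffix]
        "*=": lambda: value in actual,                       # [attr*=valContaining]
        "~=": lambda: re.search(value, actual) is not None,  # [attr~=regex], match anywhere
    }
    return checks[op]()


def has_attr_prefix(el: Element, prefix: str) -> bool:
    """[^attrPrefix]: some attribute *name* starts with the prefix, e.g. [^data-]."""
    return any(key.startswith(prefix) for key in el.attrs)


def matches_compound(el: Element, tag: str = "*", classes=(), attr_tests=()) -> bool:
    """A compound selector such as div.header[title] matches only if every part matches."""
    if tag != "*" and el.tag != tag:
        return False
    own_classes = el.attrs.get("class", "").split()
    if not all(c in own_classes for c in classes):
        return False
    return all(attr_test(el, *test) for test in attr_tests)


# div.header[title] against <div class="header main" title="Top">
div = Element("div", {"class": "header main", "title": "Top"})
print(matches_compound(div, "div", ["header"], [("title",)]))  # True
```

Each part of a compound selector narrows the match, so the element must satisfy the tag, every class, and every attribute test at once.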
Combinators
The following can be used to specify certain elements based on their relation to other elements on the page (parents, children, siblings, etc.).
Pattern | Matches | Example |
---|---|---|
E F | an F element descended from an E element | div a , .logo h1 |
E > F | an F direct child of E | ol > li |
E + F | an F element immediately preceded by sibling E | li + li , div.head + div |
E ~ F | an F element preceded by sibling E | h1 ~ p |
E, F, G | all matching elements E, F, or G | a[href], div, h3 |
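Each combinator can be read as a predicate on an element's ancestors or preceding siblings. The minimal Python sketch below implements those predicates over a toy tree. Node, select and the predicate names are hypothetical, only tag names are matched on each side, and the E, F, G grouping form (the union of several selectors' results) is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)  # eq=False: nodes compare by identity, so list.index() finds the exact node
class Node:
    """Hypothetical element node with a parent link and ordered element children."""
    tag: str
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def add(self, tag: str) -> "Node":
        child = Node(tag, parent=self)
        self.children.append(child)
        return child


def preceding_siblings(node: Node) -> List[Node]:
    """Element siblings that come before node, nearest first."""
    if node.parent is None:
        return []
    sibs = node.parent.children
    return sibs[: sibs.index(node)][::-1]


def descendant_of(e: str, node: Node) -> bool:    # E F
    p = node.parent
    while p is not None:
        if p.tag == e:
            return True
        p = p.parent
    return False


def child_of(e: str, node: Node) -> bool:         # E > F
    return node.parent is not None and node.parent.tag == e


def right_after(e: str, node: Node) -> bool:      # E + F
    before = preceding_siblings(node)
    return bool(before) and before[0].tag == e


def somewhere_after(e: str, node: Node) -> bool:  # E ~ F
    return any(s.tag == e for s in preceding_siblings(node))


def walk(node: Node):
    """Yield nodes in document order."""
    yield node
    for c in node.children:
        yield from walk(c)


def select(root: Node, e: str, combinator: Callable[[str, Node], bool], f: str) -> List[Node]:
    """All F elements (by tag), in document order, whose relation to some E element holds."""
    return [n for n in walk(root) if n.tag == f and combinator(e, n)]


# <body><div><h1/><p><span/></p><p/></div><p/></body>
body = Node("body")
div = body.add("div")
h1 = div.add("h1")
p1 = div.add("p")
span = p1.add("span")
p2 = div.add("p")
tail_p = body.add("p")

print(len(select(body, "div", child_of, "p")))  # 2: div > p skips the p directly under body
```

E + F checks only the nearest preceding sibling. E ~ F accepts any earlier sibling, which is why h1 ~ p matches both paragraphs in the div while h1 + p matches only the first.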
Pseudo Selectors
The following advanced selectors are also available.
Pattern | Matches | Example |
---|---|---|
:first-child | elements that are the first child of some other element | div > p:first-child finds the first child element of a div that happens to be a p |
:last-child | elements that are the last child of some other element | ul > li:last-child finds the last list-item in each unordered list |
:only-child | elements that are the only child of a parent element | p:only-child finds paragraphs without sibling elements |
:first-of-type | elements that are the first sibling of their type in the list of children of their parent element | div > p:first-of-type finds the first p element of each div |
:last-of-type | elements that are the last sibling of their type in the list of children of their parent element | div > span:last-of-type finds the last span element within each div |
:only-of-type | elements that have a parent element with no other element children of the same expanded element name | p:only-of-type finds paragraphs without sibling p elements |
:empty | elements that have no children at all | p:empty finds paragraphs without children |
:nth-child(an+b) | elements that have an+b-1 siblings before them in the document tree, for any zero or positive integer value of n, and have a parent element. Also accepts odd and even as arguments. | tr:nth-child(2n+1) finds every odd row of a table |
:nth-last-child(an+b) | elements that have an+b-1 siblings after them in the document tree, for any zero or positive integer value of n, and have a parent element | tr:nth-last-child(-n+2) finds the last two rows of a table |
:nth-of-type(an+b) | elements that have an+b-1 siblings with the same expanded element name before them in the document tree, for any zero or positive integer value of n, and have a parent element | img:nth-of-type(2n+1) finds the 1st, 3rd, 5th, and so on, img element among its siblings |
:nth-last-of-type(an+b) | elements that have an+b-1 siblings with the same expanded element name after them in the document tree, for any zero or positive integer value of n, and have a parent element | img:nth-last-of-type(2n+1) finds every other img element, counting back from the last one |
:lt(n) | elements whose zero-based sibling index is less than n | td:lt(3) finds the first 3 cells of each row |
:gt(n) | elements whose zero-based sibling index is greater than n | td:gt(1) finds cells after skipping the first two |
:eq(n) | elements whose zero-based sibling index is equal to n | td:eq(0) finds the first cell of each row |
:has(selector) | elements that contain at least one element matching the selector | div:has(p) finds divs that contain p elements |
:not(selector) | elements that do not match the selector | div:not(.logo) finds all divs that do not have the "logo" class. div:not(:has(div)) finds divs that do not contain divs. |
:contains(text) | elements that contain the specified text. The search is case-insensitive. The text may appear in the found element or in any of its descendants. | p:contains(jsoup) finds p elements containing the text "jsoup". |
:matches(regex) | elements whose text matches the specified regular expression. The text may appear in the found element or in any of its descendants. | td:matches(\\d+) finds table cells containing digits. div:matches((?i)login) finds divs containing the text "login", case-insensitively. |
:containsOwn(text) | elements that directly contain the specified text. The search is case-insensitive. The text must appear in the found element itself, not in any of its descendants. | p:containsOwn(jsoup) finds p elements with own text "jsoup". |
:matchesOwn(regex) | elements whose own text matches the specified regular expression. The text must appear in the found element itself, not in any of its descendants. | td:matchesOwn(\\d+) finds table cells directly containing digits. div:matchesOwn((?i)login) finds divs whose own text contains "login", case-insensitively. |
The above may be combined in any order and with other selectors, such as .light:contains(name):eq(0).
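Three ideas in the pseudo-selector table are easy to mix up:

- **The an+b rule.** Positions are 1-based. "odd" is 2n+1 and "even" is 2n.
- **Zero-based sibling index.** :lt, :gt and :eq count from 0.
- **Full text vs own text.** :contains searches an element and all its descendants, while :containsOwn searches only the element's own text.

The following Python sketch illustrates all three. The function names are hypothetical, :nth-of-type would apply the same rule to positions counted among same-tag siblings only, and the text helpers do not normalize whitespace, which a real engine may do.

```python
import xml.etree.ElementTree as ET


def nth_matches(a: int, b: int, position: int) -> bool:
    """True if the 1-based position equals a*n + b for some integer n >= 0 (the an+b rule)."""
    if a == 0:
        return position == b
    n, remainder = divmod(position - b, a)
    return remainder == 0 and n >= 0


def nth_child(count: int, a: int, b: int) -> list:
    """1-based positions picked by :nth-child(an+b) among `count` element siblings."""
    return [p for p in range(1, count + 1) if nth_matches(a, b, p)]


def nth_last_child(count: int, a: int, b: int) -> list:
    """Same rule, but positions are counted from the end (the last sibling is 1)."""
    return [p for p in range(1, count + 1) if nth_matches(a, b, count - p + 1)]


def index_filter(count: int, op: str, n: int) -> list:
    """:lt(n), :gt(n) and :eq(n) compare the 0-based sibling index with n."""
    tests = {"lt": lambda i: i < n, "gt": lambda i: i > n, "eq": lambda i: i == n}
    return [i for i in range(count) if tests[op](i)]


def own_text(el: ET.Element) -> str:
    """Text directly inside el: its leading text plus the tail text after each child."""
    return (el.text or "") + "".join(child.tail or "" for child in el)


def contains(el: ET.Element, needle: str) -> bool:
    """:contains(text): case-insensitive search over el and all its descendants."""
    return needle.lower() in "".join(el.itertext()).lower()


def contains_own(el: ET.Element, needle: str) -> bool:
    """:containsOwn(text): case-insensitive search over el's own text only."""
    return needle.lower() in own_text(el).lower()


print(nth_child(5, 2, 1))        # [1, 3, 5]  like tr:nth-child(2n+1)
print(nth_last_child(5, -1, 2))  # [4, 5]     like tr:nth-last-child(-n+2)
print(index_filter(5, "lt", 3))  # [0, 1, 2]  like td:lt(3)

doc = ET.fromstring("<div>Intro <p>Powered by jsoup</p></div>")
print(contains(doc, "JSOUP"), contains_own(doc, "jsoup"))  # True False
```

The last line shows why the "Own" variants matter. The div matches :contains(jsoup) because a descendant p holds the text, but it does not match :containsOwn(jsoup).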