 XPath select()


Finding a single node is useful, but often you will want to select all nodes matching a pattern. For example, a web spider will want to select all links in an HTML page.

The select() call returns an Iterator over all the matching nodes. The Iterator returns the nodes in document order.
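To make "document order" concrete, here is a minimal sketch. It does not use the Caucho classes; it assumes the standard JDK javax.xml.xpath API instead, and the class name DocumentOrderDemo, the SAMPLE document, and its id attributes are made up for illustration. A nested element comes before a later sibling of its parent, because its start tag appears first in the document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK XPath API, not com.caucho.xpath.
class DocumentOrderDemo {
  // Item 2 is nested inside item 1; item 3 is a later sibling of item 1.
  static final String SAMPLE =
      "<doc><item id='1'><item id='2'/></item><item id='3'/></doc>";

  // Returns the id attribute of every element selected by the expression.
  static List<String> selectIds(String xml, String xpath) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
          .evaluate(xpath, doc, XPathConstants.NODESET);
      List<String> ids = new ArrayList<>();
      for (int i = 0; i < nodes.getLength(); i++) {
        ids.add(((Element) nodes.item(i)).getAttribute("id"));
      }
      return ids;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    Iterator<String> iter = selectIds(SAMPLE, "//item").iterator();
    while (iter.hasNext()) {
      System.out.println("item: " + iter.next());
    }
  }
}
```

With the sample document, the ids should come back as 1, 2, 3.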

You can precompile an XPath pattern and then reuse the compiled pattern as many times as you like. The XPath.find() call on the previous page was convenient but inefficient. In this example, we'll precompile the pattern before calling select().
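The compile-once, reuse-many idea is not specific to Resin. As a rough sketch, assuming the standard JDK javax.xml.xpath API rather than the Caucho classes, the expression can be compiled a single time and then evaluated against several documents. The class name PrecompiledLinks, the hrefs() helper, and the sample pages are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK XPath API, not com.caucho.xpath.
class PrecompiledLinks {
  // Parsed and compiled once, when the class is loaded.
  // Note: a JDK XPathExpression is not thread-safe, so do not share it across threads.
  private static final XPathExpression LINKS = compile("//a[@href]");

  static XPathExpression compile(String xpath) {
    try {
      return XPathFactory.newInstance().newXPath().compile(xpath);
    } catch (XPathExpressionException e) {
      throw new IllegalArgumentException("bad XPath: " + xpath, e);
    }
  }

  // Reuses the compiled expression; only the document changes per call.
  static List<String> hrefs(String xhtml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xhtml)));
      NodeList nodes = (NodeList) LINKS.evaluate(doc, XPathConstants.NODESET);
      List<String> result = new ArrayList<>();
      for (int i = 0; i < nodes.getLength(); i++) {
        result.add(((Element) nodes.item(i)).getAttribute("href"));
      }
      return result;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    String[] pages = {
      "<p><a href='/index.xtp'>home</a></p>",
      "<p><a href='/ref/faq.xtp'>faq</a><a href='/ref/index.xtp'>ref</a></p>"
    };
    for (String page : pages) {
      System.out.println(hrefs(page));
    }
  }
}
```

The sketch parses well-formed XHTML strings only. It does not fetch pages or tolerate loose HTML the way the Html parser in the example below does.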

The following example reads the home page of http://localhost:8080 and prints the href of every <a href> link in that page.

Spidering a Web Page
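import java.util.Iterator;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import com.caucho.xml.Html;
import com.caucho.xpath.Pattern;
import com.caucho.xpath.XPath;

// Precompile the pattern once so it can be reused.
Pattern pattern = XPath.parseMatch("a[@href]");

// Fetch the page and parse it into a DOM document.
Document doc = new Html().parseDocument("http://localhost:8080");

// select() iterates over the matching nodes in document order.
Iterator iter = pattern.select(doc);

while (iter.hasNext()) {
  Element elt = (Element) iter.next();
  System.out.println("link: " + elt.getAttribute("href"));
}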

link: /index.xtp
link: /ref/faq.xtp
link: /ref/index.xtp
link: /javadoc/index.html

...

Summary

  • Precompiling XPath patterns is more efficient
  • XPath.select() can "spider" web pages
  • select() returns nodes in document order
  • The pattern a[@href] returns <a> elements with an href attribute
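
The effect of the [@href] predicate can be checked with a small sketch. It again assumes the standard JDK javax.xml.xpath API rather than the Caucho classes, and the class name HrefPredicateDemo and the SAMPLE markup are invented for illustration. A named anchor such as <a name='top'/> is an <a> element, but it has no href attribute, so the predicate filters it out.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK XPath API, not com.caucho.xpath.
class HrefPredicateDemo {
  // Three <a> elements, but only two of them carry an href attribute.
  static final String SAMPLE =
      "<body>"
      + "<a href='/index.xtp'>home</a>"
      + "<a name='top'/>"
      + "<a href='/ref/faq.xtp'>faq</a>"
      + "</body>";

  // Counts the nodes selected by the expression in the given document.
  static int count(String xml, String xpath) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
          .evaluate(xpath, doc, XPathConstants.NODESET);
      return nodes.getLength();
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("all <a> elements: " + count(SAMPLE, "//a"));
    System.out.println("<a> with href:    " + count(SAMPLE, "//a[@href]"));
  }
}
```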

Copyright © 1998-2002 Caucho Technology, Inc. All rights reserved.
Resin® is a registered trademark, and HardCore™ and Quercus™ are trademarks of Caucho Technology, Inc.