I am using the following XPath expression:
//div[@id='col_izq']//div[starts-with(@id,'col_izq_tit')]//h1/a |
//div[@id='col_izq']//div[@id='caja_nacional']//h2/a
On this HTML file: http://ift.tt/1MNNxsn
Via PHP, as follows:
$xpath=new DOMXPath($domdoc);
$ALL=$xpath->query($xpathqry);
And it returns a node list with only ONE match.
But when I test the same XPath on the same site via FirePath (a Firefox add-on for testing XPath), it returns 9 matches. (See attached screenshot)
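One way to narrow this down is to run each half of the union on its own and compare the counts with what FirePath reports. The following is only a minimal sketch: the inline markup is a hypothetical stand-in for the real page, and the names $branches and $counts are made up for illustration.

```php
<?php
// Hypothetical stand-in markup; the real page is not reproduced here.
$html = '<html><body><div id="col_izq">'
      . '<div id="col_izq_tit1"><h1><a href="/a">A</a></h1></div>'
      . '<div id="caja_nacional"><h2><a href="/b">B</a></h2>'
      . '<h2><a href="/c">C</a></h2></div>'
      . '</div></body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// The two halves of the original union expression, queried separately.
$branches = array(
    'titles'   => "//div[@id='col_izq']//div[starts-with(@id,'col_izq_tit')]//h1/a",
    'nacional' => "//div[@id='col_izq']//div[@id='caja_nacional']//h2/a",
);

$counts = array();
foreach ($branches as $name => $expr) {
    $nodes = $xpath->query($expr);
    // query() returns false for an invalid expression, otherwise a DOMNodeList.
    $counts[$name] = ($nodes === false) ? -1 : $nodes->length;
    echo $name, ': ', $counts[$name], PHP_EOL;
}

// The full union, for comparison with the sum of the two halves.
$union = $xpath->query(implode(' | ', $branches));
echo 'union: ', $union->length, PHP_EOL;
```

If one half returns 0 against the real page, the element it depends on is probably missing or nested differently in the tree PHP built, which points at the loading step rather than at the expression.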
The interesting thing is that this approach has always worked in my previous cases. I wonder whether I am doing something wrong in my XPath query, or whether I am loading the HTML file the wrong way.
This is the part of my PHP class that fetches the HTML and makes sure it is UTF-8 encoded:
$opts = array('http' => array('header' => 'Accept-Charset: UTF-8, *;q=0'));
$context = stream_context_create($opts);
$html=file_get_contents('http://'.$this->motorConfig['domain'].'/'.$seccion,false,$context);
$html=mb_convert_encoding($html, 'UTF-8', mb_detect_encoding($html, 'UTF-8, ISO-8859-1', true));
//$html=str_replace("\0", '', $html); //Avoid PHP BUG http://ift.tt/1HmjQe4
$this->dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath=new DOMXPath($this->dom); //created here because it must be set after loading the HTML into the DOM
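FirePath evaluates the expression against the browser's DOM, while DOMDocument uses libxml's HTML parser, so the two trees can differ when the markup is malformed. A hedged way to check is to collect the parser's errors and dump the tree that PHP actually built. This is a sketch only: the malformed snippet is hypothetical, and loadHTML() is called here without the LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD flags so that the result can be compared with the flagged version.

```php
<?php
// Hypothetical malformed markup (unclosed <p>, stray </div>), not the real page.
$html = '<div id="col_izq"><div id="col_izq_tit1"><h1><a href="/a">A</a></h1></div>'
      . '<p>unclosed <section><h2>stray</h2></div>';

// Collect libxml errors instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

$errors = libxml_get_errors();   // array of LibXMLError objects
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Report what the parser complained about, if anything.
foreach ($errors as $e) {
    printf("line %d: %s\n", $e->line, trim($e->message));
}

// Dump the tree as parsed; this is what DOMXPath will query.
echo $dom->saveHTML();
```

Comparing this dump with the element tree shown in the browser should reveal whether the divs named in the expression ended up where the query expects them.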
via Chebli Mohamed