PHP: get data from an external URL and convert it to UTF-8



If you load HTML from an external server with file_get_contents(), pages that declare <meta charset="utf-8"> come through fine. But if the page uses another encoding, such as Shift_JIS, you will get something similar to this:

(Image: Get content error format)

I tried both saving the HTML to a file and outputting it as UTF-8. Neither worked, so at first it looks as if file_get_contents() is returning broken HTML.

That does not mean file_get_contents() fails or that the site blocks the request. file_get_contents() returns the page's raw bytes unchanged; the real problem is the encoding declared in the charset tag [charset=Shift_JIS]. Shift_JIS bytes are not valid UTF-8, so they show up as garbage when displayed as UTF-8. To deal with this, I use the mb_detect_encoding() (detect character encoding) and iconv() (convert a string to the requested character encoding) functions. For example, with a Japanese page, after converting you will get the following result:

(Image: Meta charset Shift_JIS)
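To see that the downloaded bytes themselves are intact, here is a minimal sketch that uses the two characters 日本 ("Japan") written as raw Shift_JIS bytes (93 FA 96 7B) as a stand-in for a fetched page. It assumes the mbstring and iconv extensions are enabled and that your iconv library accepts the encoding name SJIS (glibc's does). It passes an explicit candidate list to mb_detect_encoding() rather than 'auto', because 'auto' depends on the mbstring.language setting.

```php
<?php
// Stand-in for downloaded content: "日本" encoded as Shift_JIS bytes.
$sjisBytes = "\x93\xFA\x96\x7B";

// Not valid UTF-8, which is why these bytes look garbled on a UTF-8 page.
var_dump(mb_check_encoding($sjisBytes, 'UTF-8')); // bool(false)

// Explicit candidate list; strict mode (true) rejects candidates the bytes are invalid in.
$detected = mb_detect_encoding($sjisBytes, ['UTF-8', 'SJIS'], true);
if ($detected === false) {
    exit("Encoding not recognised\n");
}

// Convert the untouched bytes to UTF-8.
$utf8 = iconv($detected, 'UTF-8', $sjisBytes);
echo $utf8, PHP_EOL; // 日本
```

Nothing was lost in transit: the same four bytes become readable once they are interpreted as Shift_JIS.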

Note:

– If you need to rewrite image, CSS or JS links, you can use the str_replace() function.

– If you only need part of the content, use the preg_match() function.
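Both notes can be tried on a small made-up HTML string before touching a real page. In this minimal sketch, the `class=x` table, the `../../path/` prefix and `http://your_url/` are hypothetical stand-ins. It shows that str_replace() accepts arrays, so several kinds of links can be rewritten in one call, and that preg_match() stores the whole match in $match[0] and the first capturing group in $match[1]:

```php
<?php
// Small stand-in for a fetched page; the real HTML is much longer.
$html = '<link href="../../path/style.css"><img src="../../path/logo.png">'
      . "\n<table class=x>\n<tr><td>A</td></tr>\n</table>";

// Each $search entry is replaced by the $replace entry at the same index.
// "http://your_url/" is a placeholder for the host that serves the assets.
$search  = ['href="../../path/', 'src="../../path/'];
$replace = ['href="http://your_url/', 'src="http://your_url/'];
$fixed   = str_replace($search, $replace, $html);
echo $fixed, PHP_EOL;

// /s lets "." match newlines; the non-greedy (.*?) stops at the first </table>.
if (preg_match('/<table class=x>(.*?)<\/table>/s', $fixed, $match)) {
    echo $match[0], PHP_EOL; // whole match, including <table class=x> and </table>
    echo $match[1], PHP_EOL; // only the captured part between them
} else {
    echo 'No match', PHP_EOL; // $match[0] is not set in this case
}
```

Checking the return value of preg_match() matters on real pages: if the site changes its markup, the pattern stops matching and $match is empty.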

Example:

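$url = 'http://onlinemeetingsoft.com/jp/cisco-webex-meetings-download.html';
// Raw bytes, still in the page's own encoding (Shift_JIS here)
$htmlContent = file_get_contents($url);
if ($htmlContent === false) {
    die('Could not fetch ' . $url);
}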

$currentString = ['<img src="../../path/'];
$replaceString = ['<img src="http://your_url/'];
$newContent = str_replace($currentString, $replaceString, $htmlContent);

// Work with $newContent here…

// If you want to get the content in this table (table in td): '<table><tr><td><table border=0 cellpadding=2 cellspacing=0 align=center'

preg_match('/<\/table><tr><td><table border=0 cellpadding=2 cellspacing=0 align=center(.*?)<\/table>/s', $htmlContent, $match);

// var_dump($match);
// echo $match[0];

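/**
 * Convert a string to UTF-8 after detecting its current encoding
 */
function encoding($string) {
    // Explicit Japanese candidates: with 'auto' the list depends on mbstring.language,
    // and with the default "neutral" setting it is only ASCII and UTF-8 (no Shift_JIS)
    $currentEncoding = mb_detect_encoding($string, 'UTF-8, SJIS, EUC-JP', true);
    if ($currentEncoding === false) {
        return $string; // encoding not recognised: leave the string unchanged
    }
    $result = iconv($currentEncoding, 'UTF-8', $string);
    return $result === false ? $string : $result;
}

// $match[0] only exists if preg_match() found the table
if (isset($match[0])) {
    echo encoding($match[0]);
}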
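The code above converts only the extracted table. If you save or echo the whole converted page instead, its <meta> tag still declares Shift_JIS, and a browser that gets no charset from the HTTP header (for example, when opening a saved .html file) will decode the UTF-8 bytes as Shift_JIS and garble them again. A minimal sketch under stated assumptions: toUtf8Page() is a hypothetical helper, not part of the original code, the SJIS/EUC-JP candidates assume a Japanese page, and the regular expression only handles simple charset= declarations.

```php
<?php
/**
 * Hypothetical helper: convert a whole HTML page to UTF-8 and rewrite its
 * <meta> charset declaration so it matches the new encoding.
 */
function toUtf8Page(string $html): string
{
    // Explicit candidate list (assumes a Japanese page), strict mode.
    $from = mb_detect_encoding($html, ['UTF-8', 'SJIS', 'EUC-JP'], true);
    if ($from === false || $from === 'UTF-8') {
        return $html; // unknown or already UTF-8: leave unchanged
    }
    $converted = iconv($from, 'UTF-8', $html);
    if ($converted === false) {
        return $html; // conversion failed: leave unchanged
    }
    // Replace the value after charset= inside <meta ...> tags with UTF-8.
    return preg_replace('/(<meta[^>]*charset=["\']?)[\w-]+/i', '${1}UTF-8', $converted);
}

// Stand-in page: Shift_JIS meta tag, body "日本" as Shift_JIS bytes.
$page = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS"></head>'
      . "<body>\x93\xFA\x96\x7B</body></html>";
echo toUtf8Page($page), PHP_EOL;
```

The output declares charset=UTF-8 and the body reads 日本. For pages with unusual markup, a DOM parser is more robust than a regular expression.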

 

Result of getting data from the external URL and converting it to UTF-8: