PHP: get data from an external URL with UTF-8 encoding
Suppose you want to load HTML from an external server. If the page declares meta charset="utf-8", file_get_contents() works as usual. But if the page uses another encoding, such as Shift_JIS, the text comes back garbled when you output it as UTF-8.
I tried both saving the HTML to a file and outputting it with UTF-8 encoding. Neither worked, which means the string returned by file_get_contents() is not valid UTF-8 to begin with.
This doesn't mean file_get_contents() is broken or that the site blocks the request. The real cause is the page's charset declaration [charset=Shift_JIS]: the bytes are Shift_JIS, not UTF-8. To deal with this, I use the mb_detect_encoding() (detect character encoding) and iconv() (convert string to requested character encoding) functions. For example, with a Japanese page, the text displays correctly after conversion.
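Since the page states its own charset, one option is to read that declaration instead of guessing. The following is only a minimal sketch: the helper names declaredCharset() and convertDeclaredToUtf8() are hypothetical, and it assumes the charset appears in a meta tag and that your iconv build knows the declared name.

```php
<?php
// Hypothetical helper: return the charset a page declares in a <meta> tag, or null.
function declaredCharset(string $html): ?string
{
    // Covers <meta charset="..."> and content="text/html; charset=..."
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w-]+)/i', $html, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: convert the HTML to UTF-8 using the declared charset.
function convertDeclaredToUtf8(string $html): string
{
    $charset = declaredCharset($html);
    if ($charset === null || strcasecmp($charset, 'UTF-8') === 0) {
        return $html; // nothing declared, or already UTF-8
    }
    // iconv() returns false (with a warning) for unknown names or invalid input
    $converted = @iconv($charset, 'UTF-8', $html);
    return $converted === false ? $html : $converted;
}
```

Note that the converted HTML still carries the old charset in its meta tag, so if you serve it again you will probably want to rewrite that tag, for example with str_replace().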
Note:
– If you need to rewrite links to images, CSS, JS, and so on, you can use the str_replace() function
– If you only need part of the content, use the preg_match() function
Example:
$url = 'http://onlinemeetingsoft.com/jp/cisco-webex-meetings-download.html';
$htmlContent = file_get_contents($url);

$currentString = ['<img src="../../path/'];
$replaceString = ['<img src="http://your_url/'];
$newContent = str_replace($currentString, $replaceString, $htmlContent);
// work with $newContent…

// If you want to get the content in this table (table in td): '<table><tr><td><table border=0 cellpadding=2 cellspacing=0 align=center'
preg_match('/<\/table><tr><td><table border=0 cellpadding=2 cellspacing=0 align=center(.*?)<\/table>/s', $htmlContent, $match);
// var_dump($match);
// echo $match[0];

/**
 * Encoding
 */
function encoding($string) {
    // 'auto' depends on the mbstring configuration; pass an explicit list
    // such as 'SJIS, EUC-JP, UTF-8' if detection fails
    $currentEncoding = mb_detect_encoding($string, 'auto');
    if ($currentEncoding === false) {
        return $string; // encoding not detected, return unchanged
    }
    return iconv($currentEncoding, 'UTF-8', $string);
}

if (!empty($match)) {
    echo encoding($match[0]);
}
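If detection with 'auto' gives the wrong answer on your server, a variant of encoding() with an explicit candidate list may behave more predictably. This is a sketch, not a definitive implementation: the name toUtf8() and the candidate list are assumptions, and it swaps iconv() for mb_convert_encoding() so that the detected name and the converter come from the same extension.

```php
<?php
// Hypothetical variant of encoding(); adjust the candidate list to the
// encodings you actually expect from the pages you fetch.
function toUtf8(string $string, array $candidates = ['UTF-8', 'SJIS', 'EUC-JP']): string
{
    // Strict mode rejects any candidate for which the bytes are not valid
    $detected = mb_detect_encoding($string, $candidates, true);
    if ($detected === false) {
        return $string; // none of the candidates fit: return the input unchanged
    }
    return mb_convert_encoding($string, 'UTF-8', $detected);
}

// Check that preg_match() actually matched before using $match[0]
$html = '<table><tr><td>cell</td></tr></table>';
if (preg_match('/<td>(.*?)<\/td>/s', $html, $match) === 1) {
    $cell = toUtf8($match[0]);
}
```

Putting UTF-8 first in the list means that text which is already valid UTF-8 passes through unchanged.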
[Screenshot: result of getting data from an external URL in PHP, converted to UTF-8]