View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update
130 | Source Integration | WebSVN | public | 2010-04-08 03:06 | 2012-01-13 02:09
Reporter | obones
Assigned To | John Reese
Priority | normal | Severity | major | Reproducibility | N/A
Status | assigned | Resolution | open
Product Version |
Target Version | | Fixed in Version |
Summary | 130: SVN plugin should decode svn output
Description | When using the SVN plugin, calls are made to the svn binary to retrieve the log messages and create changesets from them. The output of the svn binary is used "as is" and then inserted into the database. Unfortunately, this can lead to crashes with repositories whose log messages contain non-ASCII characters. What's even worse is that the encoding of the output depends on the operating system configuration. I have tested two:
Debian: UTF-8
Windows XP: Codepage 850
While the first one goes through to the database all right (keeping in mind issue 129), the second crashes at the very first log message that contains an accented character, because such characters are not valid values for the collation used in the database. And because codepage 850 cannot easily be changed on a Windows server, the plugin is currently unusable in such an environment.
Tags | No tags attached.
obones (reporter) 2010-04-08 06:35 |
Whenever shell_exec is called for $svn in SourceSVN.php, the output should be decoded. For instance, in import_full we have this call:

    $t_svnlog = explode( "\n", shell_exec( "$svn log -v -r $t_rev:HEAD --limit 200 $t_url" ) );

I have successfully replaced it with this code:

    $exec_result = shell_exec( "$svn log -v -r $t_rev:HEAD --limit 200 $t_url" );
    $encoding = ENCODING;
    $internal_encoding = iconv_get_encoding('internal_encoding');
    if ($encoding != $internal_encoding)
        $exec_result = iconv($encoding, $internal_encoding.'//TRANSLIT', $exec_result);
    $t_svnlog = explode( "\n", $exec_result );

In my simple test, "ENCODING" is a defined value inside SourceSVN.php, but it would be cleaner to have a configuration parameter, either through the configuration page or through a config.php or equivalent file. Obviously this fix requires "iconv" to be configured for use inside PHP. Fortunately for me, this is the case on both my servers.
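A minimal sketch of the decoding step described in this note, wrapped in a helper so that the source encoding is passed as a parameter instead of read from a define. The function name decode_svn_output and its parameters are hypothetical and not part of SourceSVN.php. The target encoding is assumed to be UTF-8, and the iconv extension must be available.

```php
<?php
// Hypothetical helper, not part of SourceSVN.php: converts raw svn output
// from the encoding used by the operating system to the target encoding.
function decode_svn_output( $p_raw, $p_source_encoding, $p_target_encoding = 'UTF-8' ) {
	// Nothing to do when both encodings are the same.
	if ( strcasecmp( $p_source_encoding, $p_target_encoding ) === 0 ) {
		return $p_raw;
	}

	// //TRANSLIT asks iconv to approximate characters that have no
	// equivalent in the target encoding.
	$t_converted = iconv( $p_source_encoding, $p_target_encoding . '//TRANSLIT', $p_raw );

	// iconv() returns false on failure; keep the raw output in that case.
	return ( $t_converted === false ) ? $p_raw : $t_converted;
}

// Stand-in for the shell_exec() result: "Vidéo" as emitted under codepage 850.
$t_exec_result = "Vid\x82o\nsecond line";
$t_svnlog = explode( "\n", decode_svn_output( $t_exec_result, 'CP850' ) );
```

Byte 0x82 is "é" in codepage 850, so the first element of $t_svnlog comes out as "Vidéo" in UTF-8. Whether the value 'CP850' comes from a configuration page or a config file is left open here, as in the note.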
obones (reporter) 2010-04-08 07:53 |
Well, while that works on my Windows machine, it does not work on the Linux one. On both, I expect this string: Vidéo
On the former, I get it encoded in CP850 just fine.
On the latter, however, the result of shell_exec is this: Vid?\195?\169o
Here the e acute is replaced by the two codes that describe it in UTF-8. So converting this to the internal encoding does not change a thing, because the source string is not UTF-8 anyway. I'm still trying to understand why I get this output under Linux.
obones (reporter) 2010-04-09 00:52 |
Ok, I finally found out why this is happening. Under Linux, the environment in which the call happens is blank, so the svn binary uses the "C" locale and as such has no idea what to do with non-ASCII characters. To fix this, I have added a new define called "SVN_EXPORT" whose value is "LANG=fr_FR.UTF-8" and modified the svn_binary static function like this:

    # Linux / UNIX paths
    $t_binary = $t_path . DIRECTORY_SEPARATOR . 'svn';
    if ( is_file( $t_binary ) && is_executable( $t_binary ) ) {
        if (SVN_EXPORT != '')
            return $s_binary = SVN_EXPORT."; ".$t_binary;
        else
            return $s_binary = $t_binary;
    }

I think it would be better to use a configuration parameter, because modifying a define is less natural to end users. And bluntly using "en_US.UTF-8" will not work all the time, as this locale is not always installed by default on non-English systems. For instance, Debian etch did not have it, while Mandriva 2010.0 does. This new define, along with the previous one, allowed me to get valid encoding from the svn output, and together with the change mentioned in the first note of issue 129 it allowed me to get proper content in the database. If the utf8_encode calls are not there, the database refuses the insert, but I'm not sure they should always be there: if the internal encoding of PHP is already UTF-8, we might end up with double encoding of UTF-8...
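The locale prefix described above could also be built from a parameter rather than a define. The sketch below is only an illustration: the function name svn_command_with_locale and its parameters are hypothetical and not part of SourceSVN.php. It uses the "VAR=value command" form, which on POSIX shells sets the variable in the environment of that single command, instead of the "VAR=value; command" form shown in the note. It applies to the Linux / UNIX branch only.

```php
<?php
// Hypothetical helper, not part of SourceSVN.php: returns the svn command
// prefix, optionally preceded by a LANG assignment for that one command.
function svn_command_with_locale( $p_binary, $p_locale ) {
	// Quote the path so that spaces or shell metacharacters are harmless.
	$t_binary = escapeshellarg( $p_binary );

	// An empty locale means "leave the environment untouched".
	if ( $p_locale === '' ) {
		return $t_binary;
	}

	// POSIX shells apply "LANG=value command" to that command only.
	return 'LANG=' . escapeshellarg( $p_locale ) . ' ' . $t_binary;
}

// The locale value would come from configuration; fr_FR.UTF-8 is the one
// used in the note and must be installed on the server.
$t_svn = svn_command_with_locale( '/usr/bin/svn', 'fr_FR.UTF-8' );
$t_command = "$t_svn log -v --limit 200";
```

On a POSIX system, $t_command is then LANG='fr_FR.UTF-8' '/usr/bin/svn' log -v --limit 200, ready to be passed to shell_exec together with the repository URL.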
Philipp Beckmann (reporter) 2010-04-09 12:13 edited on: 2010-04-09 12:13 |
see also 093 |
Karl Reichert (reporter) 2010-06-21 02:09 edited on: 2010-06-21 02:17 |
obones, which value have you set for ENCODING? I guess you are a French user; which value should I set as a German user? Edit: I had to set it to 'CP850', this works fine on my German WinXP Server.
genius_p (reporter) 2010-06-28 00:46 |
I changed the code in function process_svn_log:

    foreach( $p_svnlog as $t_line ) {
        $t_line = iconv('CP866', 'UTF-8', $t_line);
        # starting state, do nothing

I think the encoding name should be added to the language file.
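A per-line conversion like the one above can be combined with a guard against the double-encoding concern raised earlier in this thread. The following is a sketch under assumptions: the function name to_utf8_once is hypothetical, the mbstring and iconv extensions are assumed to be available, and the check is a heuristic, since a line in a legacy codepage can in rare cases also happen to be valid UTF-8.

```php
<?php
// Hypothetical helper, not part of SourceSVN.php: converts a log line to
// UTF-8 unless it already is valid UTF-8 (pure ASCII counts as valid).
function to_utf8_once( $p_line, $p_source_encoding ) {
	// Skip lines that are already valid UTF-8 to avoid double encoding.
	if ( mb_check_encoding( $p_line, 'UTF-8' ) ) {
		return $p_line;
	}

	$t_converted = iconv( $p_source_encoding, 'UTF-8', $p_line );

	// iconv() returns false on failure; keep the original line in that case.
	return ( $t_converted === false ) ? $p_line : $t_converted;
}

// Sample lines standing in for $p_svnlog: an ASCII header line and a
// message in codepage 866.
$t_sample_log = array( 'r12 | user | 2010-04-08', "\x8F\xE0\xA8\xA2\xA5\xE2" );
$t_decoded = array();
foreach ( $t_sample_log as $t_line ) {
	$t_decoded[] = to_utf8_once( $t_line, 'CP866' );
}
```

Calling to_utf8_once a second time on an already converted line returns it unchanged, which is the property that matters when the svn output is UTF-8 on one server and a legacy codepage on another.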
Markus Hastreiter (reporter) 2011-05-16 10:35 |
I had the same issue and I can confirm that the suggested solution from genius_p (130:227) fixed the issue in my case (Windows Server 2003 with IIS 6.0) |
Erdoğan Kürtür (reporter) 2012-01-13 02:09 edited on: 2012-01-13 02:10 |
(http://leetcode.net/mantis/view.php?id=130#c227) also solves my situation. Used CP1254 for Turkish. (W2K8 Sp2 x64 w/ WAMP)
Date Modified | Username | Field | Change |
2010-04-08 03:06 | obones | New Issue | |
2010-04-08 06:35 | obones | Note Added: 157 | |
2010-04-08 07:53 | obones | Note Added: 158 | |
2010-04-08 12:49 | John Reese | Relationship added | related to 129 |
2010-04-08 12:51 | John Reese | Relationship added | duplicate of 058 |
2010-04-08 12:51 | John Reese | Status | new => assigned |
2010-04-08 12:51 | John Reese | Assigned To | => John Reese |
2010-04-09 00:52 | obones | Note Added: 171 | |
2010-04-09 12:13 | Philipp Beckmann | Note Added: 174 | |
2010-04-09 12:13 | Philipp Beckmann | Note Edited: 174 | View Revisions |
2010-06-21 02:09 | Karl Reichert | Note Added: 226 | |
2010-06-21 02:17 | Karl Reichert | Note Edited: 226 | View Revisions |
2010-06-28 00:46 | genius_p | Note Added: 227 | |
2010-07-18 05:41 | John Reese | Relationship added | has duplicate 167 |
2011-05-16 10:35 | Markus Hastreiter | Note Added: 291 | |
2012-01-13 02:09 | Erdoğan Kürtür | Note Added: 331 | |
2012-01-13 02:10 | Erdoğan Kürtür | Note Edited: 331 | View Revisions |