Notice: this is a static mirror for historical purposes.

View Issue Details
ID: 130
Project: Source Integration
Category: WebSVN
View Status: public
Date Submitted: 2010-04-08 03:06
Last Update: 2012-01-13 02:09
Reporter: obones
Assigned To: John Reese
Priority: normal
Severity: major
Reproducibility: N/A
Status: assigned
Resolution: open
Product Version:
Target Version:
Fixed in Version:
Summary: 130: SVN plugin should decode svn output
Description: When using the SVN plugin, calls are made to the svn binary to retrieve the log messages and create changesets from them.
The output of the svn binary is used "as is" and inserted into the database. Unfortunately, this can lead to crashes with repositories that use non-ASCII characters in their log messages.
What's even worse is that the encoding of the output depends on the operating system configuration. I have tested two:

Debian: UTF-8
Windows XP: Codepage 850

While the first one goes through to the database without problems (keeping in mind issue 129), the second crashes at the very first log message that contains an accented character, because its bytes are not valid values for the collation used in the database.
And because codepage 850 cannot easily be changed on a Windows server, the plugin is currently unusable in such an environment.
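
To illustrate why the raw output cannot be stored as is, here is a minimal sketch showing that the same word is a different byte sequence in each of the two encodings listed above. The sample word and the variable names are only illustrative, and the snippet assumes the iconv extension is available.

```php
<?php
// "Vidéo" as a console using code page 850 would emit it:
// the e acute is the single byte 0x82.
$cp850 = "Vid\x82o";

// The same word in UTF-8: the e acute is the two bytes 0xC3 0xA9.
$utf8 = "Vid\xC3\xA9o";

// The "u" modifier makes preg_match() fail on input that is not valid UTF-8,
// so this is false for the raw CP850 bytes.
$cp850_is_valid_utf8 = (bool) preg_match( '//u', $cp850 );

// Converting from the real source encoding yields the UTF-8 bytes.
$converted = iconv( 'CP850', 'UTF-8', $cp850 );
```

The raw CP850 bytes fail the UTF-8 check, which is consistent with a database expecting UTF-8 refusing them, while the converted string matches the UTF-8 form.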
Tags: No tags attached.
Attached Files

- Relationships
duplicate of 058 (assigned, John Reese): umlauts in changeset-comments don't get displayed/saved correctly
related to 129 (resolved, John Reese): database tables should be created with UTF-8 columns instead of the default collation
has duplicate 167 (resolved, John Reese): Two byte character does not shown correctly in changeset note.

-  Notes
(157)
obones (reporter)
2010-04-08 06:35

Whenever shell_exec is called for $svn in SourceSVN.php, the output should be decoded. For instance, in import_full we have this call:

$t_svnlog = explode( "\n", shell_exec( "$svn log -v -r $t_rev:HEAD --limit 200 $t_url" ) );

I have successfully replaced it with this code:

$exec_result = shell_exec( "$svn log -v -r $t_rev:HEAD --limit 200 $t_url" );

# ENCODING is the encoding of the svn output, e.g. 'CP850'
$encoding = ENCODING;
$internal_encoding = iconv_get_encoding('internal_encoding');
# Convert only when the svn output is not already in PHP's internal encoding
if ($encoding != $internal_encoding) {
  $exec_result = iconv($encoding, $internal_encoding.'//TRANSLIT', $exec_result);
}

$t_svnlog = explode( "\n", $exec_result );

In my simple test, "ENCODING" is a constant defined inside SourceSVN.php, but it would be cleaner to have a configuration parameter, either through the configuration page or through a config.php or equivalent file.
Obviously, this fix requires "iconv" to be enabled in PHP. Fortunately for me, this is the case on both my servers.
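
As a rough sketch of the configuration-parameter idea, the conversion could be wrapped in one helper that every shell_exec call goes through. The function name decode_svn_output and its parameters are hypothetical and not part of the plugin; the target is assumed to be UTF-8, and the sample string stands in for real svn output.

```php
<?php
/**
 * Convert raw svn output to the target encoding.
 *
 * $p_source_encoding is a hypothetical configuration value (for example
 * 'CP850'); it is not an existing plugin option.
 */
function decode_svn_output( $p_output, $p_source_encoding, $p_target_encoding = 'UTF-8' ) {
	# Nothing to do when no encoding is configured or both sides already match.
	if ( $p_source_encoding === '' || strcasecmp( $p_source_encoding, $p_target_encoding ) === 0 ) {
		return $p_output;
	}

	# iconv is an optional extension, so fall back to the raw output without it.
	if ( !function_exists( 'iconv' ) ) {
		return $p_output;
	}

	# UTF-8 can represent every character, so //TRANSLIT is not needed here.
	$t_converted = @iconv( $p_source_encoding, $p_target_encoding, $p_output );

	# iconv() returns false on failure; keep the raw output in that case.
	return ( $t_converted === false ) ? $p_output : $t_converted;
}

# Stand-in for shell_exec( "$svn log ..." ): two lines, one with a CP850 e acute.
$t_svnlog = explode( "\n", decode_svn_output( "r1 | Vid\x82o\nr2 | ok", 'CP850' ) );
```

Falling back to the raw output keeps the import running when iconv is missing or the configured name is wrong, at the cost of possibly hitting the original database error again.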
(158)
obones (reporter)
2010-04-08 07:53

Well, while that works on my Windows machine, it does not work on the Linux one. On both, I expect this string:
Vidéo

On the former, I get it encoded in CP850 just fine.
On the latter, however, the result of shell_exec is this:

Vid?\195?\169o

Here the e acute is replaced by two escape codes giving the decimal values of its two UTF-8 bytes (195 and 169). So converting this to the internal encoding does not change a thing, because the source string contains only ASCII escapes rather than actual UTF-8 bytes.
I'm still trying to understand why I get this output under Linux.
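
To make the escape codes concrete, here is a small diagnostic sketch that turns each "?\NNN" sequence back into the byte it stands for. The helper name unescape_svn_bytes is hypothetical, and the sketch assumes every escape has exactly three decimal digits, as in the output quoted above.

```php
<?php
// Turn each "?\NNN" escape (NNN = decimal byte value) back into the raw byte.
function unescape_svn_bytes( $p_text ) {
	return preg_replace_callback(
		'/\?\\\\(\d{3})/',
		function ( $p_match ) {
			return chr( (int) $p_match[1] );
		},
		$p_text
	);
}

$t_escaped = 'Vid?\195?\169o';
$t_raw = unescape_svn_bytes( $t_escaped );
```

Bytes 195 and 169 are 0xC3 and 0xA9, the UTF-8 encoding of the e acute, so the unescaped string is the expected word in UTF-8. This is only meant to show what the escapes contain, not as a substitute for getting correct output from svn in the first place.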
(171)
obones (reporter)
2010-04-09 00:52

Ok, I finally found out why this is happening.
Under Linux, the environment in which the call happens is blank, so the svn binary uses the "C" locale and as such has no idea what to do with non-ASCII characters.
To fix this, I have added a new define called "SVN_EXPORT" whose value is "LANG=fr_FR.UTF-8" and modified the svn_binary static function like this:

# Linux / UNIX paths
$t_binary = $t_path . DIRECTORY_SEPARATOR . 'svn';
if ( is_file( $t_binary ) && is_executable( $t_binary ) ) {
  # Prefix the command with the locale assignment so svn does not run under the "C" locale
  if ( SVN_EXPORT != '' ) {
    return $s_binary = SVN_EXPORT . "; " . $t_binary;
  } else {
    return $s_binary = $t_binary;
  }
}

I think it would be better to use a configuration parameter, because modifying a define is less natural for end users.
And blindly using "en_US.UTF-8" will not always work, because this locale is not always installed by default on non-English systems. For instance, Debian etch did not have it, while Mandriva 2010.0 does.
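
As an alternative sketch to prefixing the command string, PHP's proc_open accepts an environment array for the child process, so the locale could be passed explicitly. The function name run_with_locale and its parameters are hypothetical, the snippet assumes a Unix shell, and the locale (here the fr_FR.UTF-8 value from this note) still has to be installed on the system.

```php
<?php
/**
 * Run a command with an explicit LANG and return its standard output.
 * Both the function name and the $p_locale parameter are hypothetical.
 */
function run_with_locale( $p_command, $p_locale ) {
	# Only standard output is captured.
	$t_descriptors = array( 1 => array( 'pipe', 'w' ) );

	# The fifth argument of proc_open() replaces the child's environment entirely,
	# so PATH is passed along as well.
	$t_path = getenv( 'PATH' );
	$t_env = array(
		'LANG' => $p_locale,
		'PATH' => ( $t_path === false ) ? '/usr/bin:/bin' : $t_path,
	);

	$t_process = proc_open( $p_command, $t_descriptors, $t_pipes, null, $t_env );
	if ( !is_resource( $t_process ) ) {
		return false;
	}

	$t_output = stream_get_contents( $t_pipes[1] );
	fclose( $t_pipes[1] );
	proc_close( $t_process );

	return $t_output;
}

# Stand-in for the svn call: print the LANG value the child process sees.
$t_seen = run_with_locale( 'echo $LANG', 'fr_FR.UTF-8' );
```

In the plugin, the stand-in command would be replaced by the actual svn invocation, and the locale would come from a configuration parameter rather than a literal.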
This new define, along with the previous one, allowed me to get valid encoding from the svn output and, with the change mentioned in the first note of issue 129, to get proper content in the database.
If the utf8_encode calls are not there, the database refuses the insert, but I'm not sure they should always be there: if the internal encoding of PHP is already UTF-8, we might end up with double encoding of UTF-8...
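
One possible guard against the double-encoding concern is to convert only when the string is not already valid UTF-8. This is a heuristic sketch, not the plugin's behavior: the function name ensure_utf8 is hypothetical, it assumes the iconv extension, and a Latin-1 string that happens to be valid UTF-8 would be left unconverted.

```php
<?php
/**
 * Return $p_text as UTF-8, converting from ISO-8859-1 only when the input
 * is not already valid UTF-8. The function name is hypothetical.
 */
function ensure_utf8( $p_text ) {
	# The "u" modifier makes preg_match() fail on invalid UTF-8 input.
	if ( preg_match( '//u', $p_text ) === 1 ) {
		return $p_text;
	}
	# Same mapping as utf8_encode(): every byte is read as ISO-8859-1.
	return iconv( 'ISO-8859-1', 'UTF-8', $p_text );
}

$t_once  = ensure_utf8( "Vid\xE9o" );  # ISO-8859-1 input, gets converted
$t_twice = ensure_utf8( $t_once );     # already UTF-8, left untouched
```

Calling the helper twice gives the same result as calling it once, which is the property an unconditional utf8_encode call lacks.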
(174)
Philipp Beckmann (reporter)
2010-04-09 12:13
edited on: 2010-04-09 12:13

see also 093

(226)
Karl Reichert (reporter)
2010-06-21 02:09
edited on: 2010-06-21 02:17

obones, which value have you set for ENCODING?

I guess you are a French user. Which value should I set as a German user?

Edit: I had to set it to 'CP850'; this works fine on my German WinXP Server.

(227)
genius_p (reporter)
2010-06-28 00:46

I changed the code in function process_svn_log:
foreach( $p_svnlog as $t_line ) {
  $t_line = iconv('CP866', 'UTF-8', $t_line);
  # starting state, do nothing

I think the encoding name should be added to the language file.
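
A hedged sketch of the same per-line conversion with the encoding name taken from a setting instead of a literal. The variable $g_svn_output_encoding is hypothetical (it is not an existing option), the sample lines stand in for real svn output, and the iconv extension is assumed.

```php
<?php
# Hypothetical setting; the note above hard-codes 'CP866' instead.
$g_svn_output_encoding = 'CP866';

# Stand-in for the svn log lines: the second line is the single CP866 byte 0x80
# (Cyrillic capital A).
$p_svnlog = array( "r12 | user", "\x80" );

$t_lines = array();
foreach ( $p_svnlog as $t_line ) {
	$t_converted = @iconv( $g_svn_output_encoding, 'UTF-8', $t_line );
	# iconv() returns false on failure; keep the original line in that case.
	$t_lines[] = ( $t_converted === false ) ? $t_line : $t_converted;
}
```

Pure ASCII lines pass through unchanged, and the CP866 byte becomes the two-byte UTF-8 sequence 0xD0 0x90.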
(291)
Markus Hastreiter (reporter)
2011-05-16 10:35

I had the same issue, and I can confirm that the solution suggested by genius_p (130:227) fixed it in my case (Windows Server 2003 with IIS 6.0).
(331)
Erdoğan Kürtür (reporter)
2012-01-13 02:09
edited on: 2012-01-13 02:10

The change in http://leetcode.net/mantis/view.php?id=130#c227 also solves my situation.
I used CP1254 for Turkish.

(W2K8 Sp2 x64 w/ WAMP)


- Issue History
Date Modified Username Field Change
2010-04-08 03:06 obones New Issue
2010-04-08 06:35 obones Note Added: 157
2010-04-08 07:53 obones Note Added: 158
2010-04-08 12:49 John Reese Relationship added related to 129
2010-04-08 12:51 John Reese Relationship added duplicate of 058
2010-04-08 12:51 John Reese Status new => assigned
2010-04-08 12:51 John Reese Assigned To => John Reese
2010-04-09 00:52 obones Note Added: 171
2010-04-09 12:13 Philipp Beckmann Note Added: 174
2010-04-09 12:13 Philipp Beckmann Note Edited: 174
2010-06-21 02:09 Karl Reichert Note Added: 226
2010-06-21 02:17 Karl Reichert Note Edited: 226
2010-06-28 00:46 genius_p Note Added: 227
2010-07-18 05:41 John Reese Relationship added has duplicate 167
2011-05-16 10:35 Markus Hastreiter Note Added: 291
2012-01-13 02:09 Erdoğan Kürtür Note Added: 331
2012-01-13 02:10 Erdoğan Kürtür Note Edited: 331


Copyright © 2000 - 2012 MantisBT Group