UTF-8 encoding
Starting with version 3.1, Simplicité uses the UTF-8 encoding by default (previous versions used the ISO-8859-1 encoding by default).
Depending on your installation, using UTF-8 may require some additional configuration, described below.
If you experience any character encoding issues, it means you have misconfigured or forgotten something.
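Such issues usually show up as "mojibake": text written as UTF-8 bytes but read back as ISO-8859-1, or the reverse. The following minimal standalone Java sketch (plain JDK only, not a Simplicité API; the class and method names are illustrative) reproduces both mistakes:

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // "Simplicite" with U+00E9 (e acute), written as an escape to keep the source ASCII-only
    static final String WORD = "Simplicit\u00e9";

    // Encode as UTF-8, then wrongly decode as ISO-8859-1: every byte becomes
    // one character, so U+00E9 (bytes C3 A9) turns into U+00C3 U+00A9.
    static String utf8ReadAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The opposite mistake: the single ISO-8859-1 byte E9 is not a complete UTF-8
    // sequence, so the decoder substitutes the replacement character U+FFFD.
    static String latin1ReadAsUtf8(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(utf8ReadAsLatin1(WORD));
        System.out.println(latin1ReadAsUtf8(WORD));
    }
}
```

The first call returns "SimplicitÃ©", the typical symptom of UTF-8 data displayed as ISO-8859-1; the second returns "Simplicit" followed by the replacement character.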
JVM
It is required to set the JVM default encoding to UTF-8.
The most reliable way to do it is to add an explicit -Dfile.encoding=UTF-8 option to the JVM options.
Note: on Linux it is also possible to set the LANG environment variable (e.g. en_US.UTF-8), either for the account running Tomcat or, preferably, globally.
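To verify the setting, a small standalone Java check can print what the JVM actually uses. This is a sketch with an arbitrary class name; run it with the same options as Tomcat (for instance java -Dfile.encoding=UTF-8 JvmEncodingCheck.java with the Java 11+ single-file launcher):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class JvmEncodingCheck {
    // True when the JVM default charset is UTF-8, e.g. when started with
    // -Dfile.encoding=UTF-8 or under a UTF-8 LANG locale on Linux.
    static boolean isDefaultUtf8() {
        return Charset.defaultCharset().equals(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());
        if (!isDefaultUtf8()) {
            System.err.println("Warning: the JVM default encoding is not UTF-8");
        }
    }
}
```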
Application servers
Tomcat
For Tomcat 6 and 7, the connector definitions in conf/server.xml need to be updated to force UTF-8 URI encoding:
<Connector URIEncoding="UTF-8" ... />
Starting with Tomcat 8 this is the default, so you only need to change something if you don't use UTF-8.
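To illustrate what URIEncoding changes: modern browsers percent-encode non-ASCII characters of URL paths as UTF-8 bytes, so a connector decoding them as ISO-8859-1 produces garbled parameters. The following standalone Java sketch (JDK 10+ for the URLDecoder.decode(String, Charset) overload; names are illustrative) decodes the same string both ways. Note that URLDecoder implements HTML form decoding (it also turns + into a space), which is close enough for this illustration but is not what Tomcat uses internally:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class UriDecodingDemo {
    // "%C3%A9" is the percent-encoded UTF-8 form of U+00E9 (e acute)
    static final String ENCODED = "caf%C3%A9";

    // Correct decoding, as with URIEncoding="UTF-8"
    static String decodeAsUtf8(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    // Wrong decoding, as with a connector assuming ISO-8859-1
    static String decodeAsLatin1(String s) {
        return URLDecoder.decode(s, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(decodeAsUtf8(ENCODED));
        System.out.println(decodeAsLatin1(ENCODED));
    }
}
```

The first call returns "café", the second "cafÃ©".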
Databases
HSQLDB
Nothing to do :-)
MySQL
The default encoding of the database must be set to utf8 and the default collation to utf8_unicode_ci.
Warning: you can use other language-specific Unicode collations instead of utf8_unicode_ci if needed, but if you use the utf8_bin collation, searches on columns will be case sensitive.
This can be set as the server default in your MySQL configuration file:
[mysqld]
(...)
collation_server=utf8_unicode_ci
character_set_server=utf8
(...)
Note: changing these values requires a restart of the database service.
This can also be set at the database level when creating it:
CREATE DATABASE <database name> DEFAULT CHARACTER SET utf8 [DEFAULT COLLATE utf8_unicode_ci];
This can also be done after creation with:
ALTER DATABASE <database name> DEFAULT CHARACTER SET utf8 [DEFAULT COLLATE utf8_unicode_ci];
In both cases the COLLATE clause is optional; note however that when it is omitted, MySQL applies the character set's own default collation (utf8_general_ci for utf8), not utf8_unicode_ci.
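Note 1: to store characters outside the Basic Multilingual Plane, such as emoji, you must use the utf8mb4 character set instead of utf8 (MySQL's utf8 stores at most 3 bytes per character, while these characters need 4 bytes in UTF-8).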
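A quick way to tell whether some text actually needs utf8mb4 is to look for code points above U+FFFF, which are exactly the ones taking 4 bytes in UTF-8. This is a standalone Java sketch with illustrative names, not a Simplicité or MySQL API:

```java
import java.nio.charset.StandardCharsets;

class Utf8mb4Check {
    // Code points above U+FFFF take 4 bytes in UTF-8, more than
    // MySQL's 3-byte utf8 can store, so they require utf8mb4.
    static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(cp -> cp > 0xFFFF);
    }

    // Number of bytes a single code point occupies once encoded as UTF-8
    static int utf8Length(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "cafe" with U+00E9, a space, then U+1F600 (an emoji) as a surrogate pair
        String text = "caf\u00e9 \uD83D\uDE00";
        System.out.println(needsUtf8mb4(text));  // true
        System.out.println(utf8Length(0xE9));    // 2
        System.out.println(utf8Length(0x1F600)); // 4
    }
}
```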
Note 2: if the database was loaded before its character set was set to UTF-8, you must either reload it or explicitly convert all tables (see below).
When using the setup package, the db-mysql.properties file must be adjusted to enable UTF-8 support in the JDBC URL of the datasource, i.e. the JDBC URL must contain the &characterEncoding=utf8&characterSetResults=utf8 options.
For Tomcat, this results in a datasource descriptor similar to this one (note that & must be escaped as &amp; inside an XML attribute):
<Resource
name="jdbc/mysqlexample"
type="javax.sql.DataSource"
auth="Container"
username="<username>"
password="<password>"
driverClassName="com.mysql.jdbc.Driver"
url="jdbc:mysql://<host>:<port>/<database name>?autoReconnect=true&amp;characterEncoding=utf8&amp;characterSetResults=utf8"/>
To check the current character set and collation of an existing table, you can use:
SHOW TABLE STATUS LIKE '<table name>';
To convert existing tables to UTF-8 you can use:
ALTER TABLE <table name> CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
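When many tables need converting, the statement above can be generated for each of them. A minimal Java sketch, assuming the table names have already been listed (for instance with SHOW TABLES); convertAll and the table names are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ConvertStatements {
    // Builds one ALTER TABLE ... CONVERT TO statement per table; names are quoted
    // with backticks, doubling any backtick they contain (MySQL identifier quoting).
    static List<String> convertAll(List<String> tables, String charset, String collation) {
        return tables.stream()
                .map(t -> "ALTER TABLE `" + t.replace("`", "``") + "` CONVERT TO CHARACTER SET "
                        + charset + " COLLATE " + collation + ";")
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical table names
        List<String> tables = Arrays.asList("table_a", "table_b");
        convertAll(tables, "utf8", "utf8_unicode_ci").forEach(System.out::println);
    }
}
```

The generated statements can then be reviewed and run in your usual MySQL client.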
PostgreSQL
The database must be created with UTF-8 encoding:
create database simplicite encoding 'UTF8' lc_ctype 'en_US.UTF-8' lc_collate 'en_US.UTF-8' template <a UTF-8 database template name>;
Note: if you have created the database with another encoding, you must drop it and create it again.
No additional configuration is then needed at the datasource descriptor level.
Oracle
Unicode support must be present and installed on the server. Nothing else is required.
Microsoft SQLServer
There is no native UTF-8 support: Unicode text is only supported through the nchar and nvarchar types (which use UTF-16 rather than UTF-8), and these are not the default.
Note: this constraint does not seem applicable to SQLServer 2017+ on Linux
Custom code
Java and server side scripts
When you use APIs that take encoding argument(s), make sure you use Globals.getPlatformEncoding() to designate the platform encoding instead of a hard-coded encoding name.
Note: you shouldn't be using such APIs unless you really need to do explicit encoding conversions (e.g. from an ISO-8859-1 encoded file in an adapter).
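As a sketch of such an explicit conversion, the following Java code re-encodes ISO-8859-1 bytes (e.g. read by an adapter) into the platform encoding. It assumes Globals.getPlatformEncoding() returns the encoding name as a String; to keep the sketch runnable outside the platform, a hypothetical platformEncoding() stand-in is used instead, which real platform code should replace with the actual call:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class AdapterConversion {
    // Hypothetical stand-in for Globals.getPlatformEncoding() (assumed to return
    // an encoding name); call the real method in actual Simplicite code.
    static String platformEncoding() {
        return "UTF-8";
    }

    // Decode the source bytes with their known encoding (ISO-8859-1 here),
    // then re-encode the text with the platform encoding.
    static byte[] latin1ToPlatform(byte[] latin1Bytes) {
        String text = new String(latin1Bytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(Charset.forName(platformEncoding()));
    }

    public static void main(String[] args) {
        // "cafe" with U+00E9 encoded in ISO-8859-1 (4 bytes)
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};
        byte[] converted = latin1ToPlatform(latin1);
        System.out.println(converted.length); // 5: U+00E9 takes 2 bytes in UTF-8
    }
}
```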
Custom JSP pages and servlets
If you have custom JSP pages (you shouldn't with recent versions of the platform, where external objects must be preferred to custom JSP pages and servlets), you need to adjust the following directive if present:
<%@ page pageEncoding="UTF-8" %>
You should also adjust the following instruction if present in your custom JSP pages and/or servlets:
request.setCharacterEncoding("UTF-8");
Note: if you use the standard ServletTool.setHTTPHeaders API method instead of the above directive and/or instruction (which is definitely the right approach), you don't need to do anything.
Others
If you need to convert a text file from ISO-8859-1 to UTF-8, you can, for instance, use the Linux iconv command line tool:
iconv -f ISO-8859-1 -t UTF-8 iso.txt > utf.txt
Most modern text editors also provide features to convert files from one encoding to another.
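Where iconv is not available, the same conversion can be done with a few lines of plain JDK code. This is a minimal sketch (the class name is arbitrary) that reads the whole file into memory, so it suits ordinary text files rather than very large ones:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class Latin1ToUtf8 {
    // Same effect as "iconv -f ISO-8859-1 -t UTF-8 in > out":
    // decode the bytes as ISO-8859-1, then write the text back as UTF-8.
    static void convert(Path in, Path out) throws IOException {
        String text = new String(Files.readAllBytes(in), StandardCharsets.ISO_8859_1);
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Usage (Java 11+ single-file launcher): java Latin1ToUtf8.java iso.txt utf.txt
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```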