
User Guide (English)

Prathmesh Ranaut edited this page Jun 15, 2017 · 17 revisions

User Guide

KefirBB User Guide, version 1.2, English.

About

KefirBB is a Java library for text processing. It was initially developed for translating BBCode (Bulletin Board Code) to HTML, but its flexible configuration allows it to be used in other situations as well, for example XML-to-HTML translation or HTML filtering. It now also supports the Textile and Markdown markup languages. It aims to be the most powerful and flexible Java library for BBCode parsing.

New in version 1.2

  • Full support for Textile, as described at TxStyle.
  • Partial support for Markdown, as described in the Markdown syntax.
  • New pattern elements for URLs and email addresses: <url/>, <email/>.
  • A conditional tag if in templates.

New in version 1.1

  • New pattern elements for the beginning of a line, the end of a line and a blank line: <bol/>, <eol/>, <blankline/>.
  • Ghost tags: the parser matches them but does not move the cursor.
  • Actions for variables.
  • A code can now contain several patterns.

New in version 1.0

  • The package name was changed to org.kefirsf.bb.
  • Added the ability to ignore case in codes.
  • Better performance.
  • Added a limit on code nesting to prevent java.lang.StackOverflowError.
  • Added a configuration for HTML filtering.

Getting Started

To get started, add a dependency on the KefirBB library to your project. For Maven-based projects this is easy:

<dependency>
    <groupId>org.kefirsf</groupId>
    <artifactId>kefirbb</artifactId>
    <version>1.5</version>
</dependency>

Or, using Gradle:

implementation 'org.kefirsf:kefirbb:1.5'

For other projects, download the KefirBB JAR for the version you need (for example, kefirbb-1.2.jar) and add it to the classpath of your application.

Text processing is done by objects that implement the interface org.kefirsf.bb.TextProcessor.

public interface TextProcessor {
    public CharSequence process(CharSequence source);
    public String process(String source);
    public StringBuilder process(StringBuilder source);
    public StringBuffer process(StringBuffer source);
}

The interface contains a few simple methods. Each takes the source text as one of several types, transforms it, and returns the result as an object of the same type.
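A minimal sketch of what implementing this contract can look like. So that it compiles without the library, it re-declares the interface locally as `TextProcessorSketch`; both that name and the `UpperCaseProcessor` class are hypothetical and not part of KefirBB. One straightforward design is to do the real work once in the `CharSequence` overload and let the other overloads convert the result to the requested type.

```java
import java.util.Locale;

// Local stand-in for org.kefirsf.bb.TextProcessor so the sketch compiles
// without the KefirBB jar. The method set mirrors the interface above.
interface TextProcessorSketch {
    CharSequence process(CharSequence source);
    String process(String source);
    StringBuilder process(StringBuilder source);
    StringBuffer process(StringBuffer source);
}

// Hypothetical processor that upper-cases its input. Only the CharSequence
// overload does real work; the others wrap its result in the requested type.
class UpperCaseProcessor implements TextProcessorSketch {
    @Override
    public CharSequence process(CharSequence source) {
        return source.toString().toUpperCase(Locale.ROOT);
    }

    @Override
    public String process(String source) {
        // The cast selects the CharSequence overload instead of recursing.
        return process((CharSequence) source).toString();
    }

    @Override
    public StringBuilder process(StringBuilder source) {
        return new StringBuilder(process((CharSequence) source));
    }

    @Override
    public StringBuffer process(StringBuffer source) {
        return new StringBuffer(process((CharSequence) source));
    }
}
```

Because Java picks the most specific overload, calling `process("abc")` on such an object returns a `String`, and calling it with a `StringBuilder` returns a `StringBuilder`.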

To get the standard TextProcessor for BBCode-to-HTML translation, use the factory org.kefirsf.bb.BBProcessorFactory.

TextProcessor processor = BBProcessorFactory.getInstance().create();

Now you can use it to translate your text.

assert "<b>text</b>".equals(processor.process("[b]text[/b]"));

The processor object is thread-safe, so a single instance can be used from several threads at the same time.
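A minimal sketch of the usual pattern: create one processor and share it across worker threads instead of creating one per thread. To keep the sketch compilable without the KefirBB jar, a `UnaryOperator<String>` stands in for the TextProcessor (an assumption; with the library you would call `process(input)` on the instance returned by `BBProcessorFactory.getInstance().create()`). The class `SharedProcessorDemo` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.UnaryOperator;

// Hypothetical demo class. PROCESSOR stands in for a KefirBB TextProcessor,
// which in real code would be created once with BBProcessorFactory.
final class SharedProcessorDemo {
    static final UnaryOperator<String> PROCESSOR =
            s -> s.replace("[b]", "<b>").replace("[/b]", "</b>");

    private SharedProcessorDemo() {}

    // Processes all inputs on a small pool; every task uses the same instance.
    static List<String> processAll(List<String> inputs) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String input : inputs) {
                futures.add(pool.submit(() -> PROCESSOR.apply(input)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // results keep the input order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The sketch only shows how sharing is wired up; it does not by itself prove thread safety, which for the real processor is the library's guarantee stated above.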

Features

KefirBB has a very flexible configuration. It can be used not only for BBCode-to-HTML translation but also for other text translations, for example filtering HTML written by users or escaping special characters in text. You can also write a custom configuration for any text translation.

HTML Filtration

KefirBB contains a configuration for HTML filtering. It is needed if you allow users to enter HTML on your site and show it to other users: filtering prevents XSS attacks as well as broken page layout.

TextProcessor processor = BBProcessorFactory.getInstance()
    .createFromResource(ConfigurationFactory.SAFE_HTML_CONFIGURATION_FILE);
assert "<b>text</b>".equals(processor.process("<b onclick=\"alert('Attack!');\">text</b>"));

Textile

KefirBB fully supports the Textile markup language. The syntax is described at TxStyle.

TextProcessor processor = BBProcessorFactory.getInstance()
    .createFromResource(ConfigurationFactory.TEXTILE_CONFIGURATION_FILE);
assert "<p><b>text</b></p>".equals(processor.process("**text**"));

Markdown

Since version 1.2, the library partially supports the Markdown markup language. The syntax is described in Markdown Syntax. The current implementation does not fully support Markdown lists and blockquotes.

TextProcessor processor = BBProcessorFactory.getInstance()
    .createFromResource(ConfigurationFactory.MARKDOWN_CONFIGURATION_FILE);
assert "<p><strong>text</strong></p>".equals(processor.process("**text**"));

Escape special sequences

KefirBB contains a special class implementing the TextProcessor interface, org.kefirsf.bb.EscapeProcessor, that replaces special character sequences. Its constructor takes an array of pairs: each pair holds a sequence to find and the text to replace it with.

TextProcessor processor = new EscapeProcessor(
    new String[][]{
        {"a", "4"},
        {"e", "3"},
        {"l", "1"},
        {"o", "0"}
    }
);
assert "4bcd3fghijk1mn0pqrstuvwxyz".equals(processor.process("abcdefghijklmnopqrstuvwxyz"));
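The idea behind such a processor can be sketched as a single left-to-right scan. This is a hypothetical stand-in (`SimpleEscaper` is not the library class): at each position it tries the sequences in array order and the first match wins, which may differ from how KefirBB breaks ties. Because replacement text is appended to the output and never re-scanned, the result does not depend on the order in which mappings such as `&` to `&amp;` and `<` to `&lt;` are listed, unlike chained `String.replace` calls.

```java
// Hypothetical sketch, not KefirBB's EscapeProcessor: scans the input once,
// left to right, replacing the first matching sequence at each position.
// Replacement text is never re-scanned.
class SimpleEscaper {
    private final String[][] escapes;

    SimpleEscaper(String[][] escapes) {
        this.escapes = escapes;
    }

    String process(String source) {
        StringBuilder out = new StringBuilder(source.length());
        int i = 0;
        outer:
        while (i < source.length()) {
            for (String[] pair : escapes) {
                // Skip empty keys: they would match everywhere without advancing.
                if (!pair[0].isEmpty() && source.startsWith(pair[0], i)) {
                    out.append(pair[1]);
                    i += pair[0].length();
                    continue outer;
                }
            }
            out.append(source.charAt(i)); // no sequence matched: copy one char
            i++;
        }
        return out.toString();
    }
}
```

With the four pairs from the example above, `process("abcdefghijklmnopqrstuvwxyz")` yields the same `4bcd3fghijk1mn0pqrstuvwxyz`.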

Escape XML

Escaping special XML characters is a very common task: it is needed whenever text is put into XML or HTML. KefirBB therefore contains a special factory that creates an org.kefirsf.bb.EscapeProcessor configured specifically for escaping XML characters.

TextProcessor processor = EscapeXmlProcessorFactory.getInstance().create();
assert "&lt;escape tag&gt;".equals(processor.process("<escape tag>"));
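For comparison, the same escaping needs only a few lines of plain JDK code, which can help when checking what the processor is expected to produce. `XmlEscapeSketch.escapeXml` is hypothetical and not part of the library; its five mappings are the ones used in the escaping configuration example under XML Configuration.

```java
// Hypothetical JDK-only equivalent of XML escaping; the five mappings match
// the escaping configuration example in the Configuration Guide.
final class XmlEscapeSketch {
    private XmlEscapeSketch() {}

    static String escapeXml(CharSequence source) {
        StringBuilder out = new StringBuilder(source.length() + 16);
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '\'': out.append("&apos;"); break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                default:   out.append(c);        // ordinary character
            }
        }
        return out.toString();
    }
}
```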

Custom Configuration

You can create a custom text processor configuration for your specific tasks. The configuration can be defined declaratively in an XML file or programmatically. There are several ways to use a custom configuration.

The first way is to name the configuration file kefirbb.xml, put it in the root of the classpath, and then use the standard factory.

TextProcessor processor = BBProcessorFactory.getInstance().create();

The factory first looks for a configuration at classpath*:kefirbb.xml. If none is found, it uses the default configuration at classpath*:org/kefirsf/bb/default.xml. Some configuration parameters can be defined in the classpath resource files kefirbb.properties and kefirbb.properties.xml, which use the standard syntax of java.util.Properties.
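The lookup order can be sketched with plain JDK calls. This is an illustration under assumptions, not the factory's code: the class `ConfigLookupSketch` is hypothetical, and it checks a single class loader with `ClassLoader.getResource`, whereas the `classpath*:` prefix suggests the real factory may consider every matching resource. The `onClasspath` helper is also handy on its own for checking whether your kefirbb.xml really sits at the classpath root.

```java
import java.util.function.Predicate;

// Hypothetical sketch of the lookup order described above; not the factory's code.
final class ConfigLookupSketch {
    static final String USER_CONFIG = "kefirbb.xml";
    static final String DEFAULT_CONFIG = "org/kefirsf/bb/default.xml";

    private ConfigLookupSketch() {}

    // Picks the user's configuration if it exists, otherwise the bundled default.
    static String resolve(Predicate<String> exists) {
        return exists.test(USER_CONFIG) ? USER_CONFIG : DEFAULT_CONFIG;
    }

    // True if the resource is visible to the class loader. ClassLoader paths
    // are relative to the classpath root, so they take no leading '/'.
    static boolean onClasspath(String resource) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ConfigLookupSketch.class.getClassLoader();
        }
        return loader.getResource(resource) != null;
    }
}
```

Under the lookup order described above, `ConfigLookupSketch.onClasspath("kefirbb.xml")` returning false is a quick hint that the standard factory will fall back to the default configuration.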

The second way is to load the configuration from a classpath resource file.

TextProcessor processor = BBProcessorFactory.getInstance().createFromResource("my/package/config.xml");

This is useful when you need several different configurations or cannot place your configuration at the classpath root as kefirbb.xml.

The third way is to load the configuration from a file in the file system.

TextProcessor processor = BBProcessorFactory.getInstance().create("config.xml");

or

TextProcessor processor = BBProcessorFactory.getInstance().create(new File("config.xml"));

The fourth way is to build the configuration programmatically.

Configuration configuration = new Configuration();
...
TextProcessor processor = BBProcessorFactory.getInstance().create(configuration);

You can also obtain a programmatic configuration from an XML configuration with the factory org.kefirsf.bb.ConfigurationFactory.

Configuration Guide

XML Configuration

A configuration file is an XML file that describes a text translation. The permanent address of the XML schema is http://kefirsf.org/kefirbb/schema/kefirbb-1.2.xsd. Use the tag configuration as the root element of your configuration file; apart from the namespace declarations shown in the example below, it takes no attributes.

This is an example of a simple configuration for escaping special XML characters.

<?xml version="1.0" encoding="utf-8"?>
<configuration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns="http://kefirsf.org/kefirbb/schema"
               xsi:schemaLocation="http://kefirsf.org/kefirbb/schema http://kefirsf.org/kefirbb/schema/kefirbb-1.2.xsd">
    <code>
        <pattern>&amp;</pattern>
        <template>&amp;amp;</template>
    </code>
    <code>
        <pattern>&apos;</pattern>
        <template>&amp;apos;</template>
    </code>
    <code>
        <pattern>&lt;</pattern>
        <template>&amp;lt;</template>
    </code>
    <code>
        <pattern>&gt;</pattern>
        <template>&amp;gt;</template>
    </code>
    <code>
        <pattern>&quot;</pattern>
        <template>&amp;quot;</template>
    </code>
</configuration>

Code

Codes are the main entities of a text translation. A code defines which text fragment must be converted and how it must be converted.

A code is defined by the tag code. It contains two mandatory tags:

  • pattern — the pattern used to find a text fragment to convert. Since version 1.1, a code can contain several patterns;
  • template — the template to generate new text.

The tag code also has two attributes:

  • name — a code name;
  • priority — the code priority (a larger value means a higher priority; default: 0).

This is an example of a tag code:

<code name="bold">
    <pattern ignoreCase="true">[b]<var inherit="true"/>[/b]</pattern>
    <template>&lt;b&gt;<var/>&lt;/b&gt;</template>
</code>

A tag code can be defined directly inside the tag configuration or inside a tag scope.
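To make the pairing of pattern and template concrete, here is an analogy using `java.util.regex`, a swapped-in technique and not how KefirBB matches codes. The literal `[b]` and `[/b]` play the role of the pattern text, the capture group plays `<var/>`, `CASE_INSENSITIVE` plays `ignoreCase="true"`, and the replacement string plays the template. The class `BoldCodeSketch` is hypothetical. Unlike the real code, whose variable is parsed with the inherited scope, the captured text is not processed again, so nested tags such as `[b][b]x[/b][/b]` are not handled. Also note that the XML template writes `&lt;b&gt;` only because `<` must be escaped inside XML; the generated text is `<b>`.

```java
import java.util.regex.Pattern;

// Regex analogy for the "bold" code above; not how KefirBB matches codes.
final class BoldCodeSketch {
    // "[b]" and "[/b]" are the literal pattern text, the group stands for <var/>,
    // CASE_INSENSITIVE mirrors ignoreCase="true", DOTALL lets the text span lines.
    private static final Pattern BOLD =
            Pattern.compile("\\[b\\](.*?)\\[/b\\]", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private BoldCodeSketch() {}

    // The replacement mirrors the template <b><var/></b>. The captured text is
    // copied as-is; it is NOT processed again with other codes.
    static String process(String source) {
        return BOLD.matcher(source).replaceAll("<b>$1</b>");
    }
}
```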

Pattern

A pattern can contain text and the tags var, constant, junk, bol, eol, blankline, url and email. When text is processed, the processor finds text fragments that match the elements inside the code's pattern tag. You can also set the boolean attribute ignoreCase; if it is true, matching against the pattern ignores character case.

The tag var defines a variable and has the attributes:

  • name — the variable name (default: variable);
  • parse — whether the text of the variable must be processed (default: true);
  • regex — a regular expression for parsing the variable; used only if parse=false;
  • scope — the scope of codes used to process the variable's text; used only if parse=true (default: ROOT);
  • inherit — whether to inherit the scope from the enclosing code (default: false);
  • transparent — whether the variable is visible outside the code (default: false);
  • action — the variable action:
    • rewrite — rewrite the variable value;
    • append — append the text to the existing variable;
    • check — check whether the text is the same as the value of the existing variable.
  • ghost — if true, the processor does not move the current cursor position (default: false).
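The three actions can be modelled as operations on a map of variables. This is a hypothetical sketch (`VarActionsSketch` is not a KefirBB class): rewrite replaces the stored value, append concatenates the new text to it, and check succeeds only if the new text equals the stored value. Treating a check against an unset variable as a failure is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the var actions; not KefirBB's implementation.
final class VarActionsSketch {
    enum Action { REWRITE, APPEND, CHECK }

    private final Map<String, String> vars = new HashMap<>();

    // Applies an action to the text matched for a variable. Returns false only
    // when CHECK fails, i.e. when the pattern should not match.
    boolean apply(Action action, String name, String text) {
        switch (action) {
            case REWRITE:
                vars.put(name, text); // replace any previous value
                return true;
            case APPEND:
                vars.merge(name, text, String::concat); // old value + new text
                return true;
            case CHECK:
                // Assumption: checking an unset variable fails.
                return text.equals(vars.get(name));
            default:
                throw new IllegalArgumentException("Unknown action: " + action);
        }
    }

    String get(String name) {
        return vars.get(name);
    }
}
```

In a configuration, check could, for example, require that a closing tag repeats a name captured earlier by the opening tag.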

The tag constant is used to describe constants in a pattern. It has the attributes:

  • value — the constant value;
  • ignoreCase — if true, case is ignored (default: true);
  • ghost — if true, the processor does not move the current cursor position (default: false).

The tag junk skips all characters up to a terminator.

The tag bol matches the beginning of a line. It was added because many markup languages depend on the beginning of a line. Be careful: bol does not move the cursor, which stays at the beginning of the line, so do not put a var with the same scope right after bol. That var would start processing at the same position, match the same code again, and cause a stack overflow.

The tag eol matches the end of a line. It handles the line-ending characters of all operating systems. It has the attribute:

  • ghost — if true, the processor does not move the current cursor position (default: false).

The tag blankline matches a blank line. It has the attribute:

  • ghost — if true, the processor does not move the current cursor position (default: false).

The tag url is used to parse URLs. It has the attributes:

  • name — the name of the variable that receives the URL (default: url);
  • local — allows parsing local addresses;
  • schemaless — allows parsing addresses without a scheme; in that case the HTTP scheme is used;
  • ghost — if true, the processor does not move the current cursor position (default: false).

The tag email is used to parse email addresses. It has the attributes:

  • name — the name of the variable that receives the email address (default: email);
  • ghost — if true, the processor does not move the current cursor position (default: false).

Template

The tag template can contain text and the tags var and if.

The tag var inserts a variable's value into the resulting text. It has the attributes:

  • name — the name of the variable to insert;
  • function — transforms the variable's value before it is inserted. The supported functions are:
    • value — the variable value itself (the default);
    • length — the length of the variable's text.

The conditional tag if has the attribute name and checks whether that variable has been initialized, either earlier or in the current code. If it has, the content of the tag if is added to the resulting text; otherwise nothing is added. A tag if can contain text and the tags var and if, in the same way as the tag template.
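A minimal model of how such a template could be rendered, under stated assumptions. The classes `TplPart`, `TplText`, `TplVar`, `TplIf` and `TplRenderer` are hypothetical (they are not KefirBB's configuration classes), a `boolean` stands in for the `function` attribute (`value` or `length`), and an unset variable is assumed to render as empty text. `TplIf` emits its body only when the variable is present, which is the behaviour described for the tag if.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical model of template elements: literal text,
// <var name=".." function=".."/> and <if name="..">...</if>.
interface TplPart {
    void render(Map<String, ? extends CharSequence> vars, StringBuilder out);
}

final class TplText implements TplPart {
    private final String text;
    TplText(String text) { this.text = text; }
    public void render(Map<String, ? extends CharSequence> vars, StringBuilder out) {
        out.append(text);
    }
}

final class TplVar implements TplPart {
    private final String name;
    private final boolean length; // true models function="length", false "value"
    TplVar(String name, boolean length) { this.name = name; this.length = length; }
    public void render(Map<String, ? extends CharSequence> vars, StringBuilder out) {
        CharSequence value = vars.get(name);
        if (value == null) {
            return; // assumption: an unset variable renders as nothing
        }
        if (length) {
            out.append(value.length());
        } else {
            out.append(value);
        }
    }
}

final class TplIf implements TplPart {
    private final String name;
    private final List<TplPart> body;
    TplIf(String name, TplPart... body) { this.name = name; this.body = Arrays.asList(body); }
    public void render(Map<String, ? extends CharSequence> vars, StringBuilder out) {
        // Emit the body only when the variable was initialized.
        if (vars.containsKey(name)) {
            for (TplPart part : body) {
                part.render(vars, out);
            }
        }
    }
}

final class TplRenderer {
    private TplRenderer() {}
    static String render(Map<String, ? extends CharSequence> vars, TplPart... template) {
        StringBuilder out = new StringBuilder();
        for (TplPart part : template) {
            part.render(vars, out);
        }
        return out.toString();
    }
}
```

The test template corresponds roughly to `Length: <var name="text" function="length"/><if name="author"> by <var name="author"/></if>`: with only `text` set it renders `Length: 5` for the text `hello`, and with `author` also set it appends ` by ` and the author.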

Scope

A scope defines which codes can be used for text processing. By default, the scope named ROOT is used. It exists even if it is not defined in the configuration, and then contains all the codes defined in the tag configuration outside any scope. This is convenient for simple configurations with a few codes: in that case a developer does not need to worry about scopes.

A scope is defined by the tag scope, placed inside the tag configuration. The tag scope has the attributes:

  • name — the scope name;
  • parent — the parent scope; all the codes of the parent scope are included in this scope;
  • ignoreText — if set, all text that does not match one of the scope's codes is ignored;
  • strong — if set, the text may contain only the codes of this scope;
  • min — the minimum number of codes that must occur in the text (default: -1, not defined);
  • max — the maximum number of codes that may occur in the text (default: -1, not defined).

The following tags are allowed inside a tag scope:

  • code — a code;
  • coderef — a reference to a code defined outside any tag scope. The tag coderef has the attribute name, which is the name of the referenced code.
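The effect of the parent attribute can be sketched as a walk up the parent chain that collects codes on the way. The class `ScopeSketch` and its methods are hypothetical; the sketch models only code inheritance (not ignoreText, strong, min or max), and its guard against a cyclic parent chain is an addition of the sketch rather than documented behaviour.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of <scope name=".." parent="..">: a scope's effective
// codes are its own codes plus everything its parent chain provides.
final class ScopeSketch {
    private final Map<String, String> parentOf = new HashMap<>();
    private final Map<String, Set<String>> ownCodes = new HashMap<>();

    // Registers a scope; parent may be null for a scope without a parent.
    void define(String scope, String parent, String... codes) {
        parentOf.put(scope, parent);
        ownCodes.put(scope, new LinkedHashSet<>(Arrays.asList(codes)));
    }

    Set<String> effectiveCodes(String scope) {
        Set<String> result = new LinkedHashSet<>();
        Set<String> visited = new HashSet<>(); // stops on a cyclic parent chain
        for (String s = scope; s != null && visited.add(s); s = parentOf.get(s)) {
            result.addAll(ownCodes.getOrDefault(s, Collections.<String>emptySet()));
        }
        return result;
    }
}
```

For example, a scope rich with parent basic ends up with its own url code plus the bold and italic codes of basic.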

Parameters

Parameters are predefined variables that can be used in templates, the prefix and the suffix when text is generated. They are defined inside the tag params by param tags, which have two attributes:

  • name — a variable name;
  • value — a variable value.

For example,

<params>
    <param name="music" value="Punk"/>
</params>

Parameters can also be defined in a separate file in the classpath, kefirbb.properties or kefirbb.properties.xml. The file formats are the ones defined by the class java.util.Properties.
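For reference, these are the two java.util.Properties formats holding the same parameter as the `<param name="music" value="Punk"/>` example above, together with the standard JDK calls that read them (`Properties.load` and `Properties.loadFromXML`). The class `ParamFilesSketch` is hypothetical, and whether KefirBB reads the files with exactly these calls is an assumption; the sketch only shows the file formats.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper showing the two java.util.Properties file formats.
final class ParamFilesSketch {
    // What kefirbb.properties could contain: one key=value pair per line.
    static final String PLAIN = "music=Punk\n";

    // The same parameter in the XML format read by Properties.loadFromXML,
    // as it could appear in kefirbb.properties.xml.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
            + "<properties>\n"
            + "  <entry key=\"music\">Punk</entry>\n"
            + "</properties>\n";

    private ParamFilesSketch() {}

    static Properties loadPlain(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    static Properties loadXml(String text) {
        Properties props = new Properties();
        // loadFromXML recognizes the DTD URI internally; it is not fetched.
        try (InputStream in = new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8))) {
            props.loadFromXML(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```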

Prefix and Suffix

The prefix and suffix are put at the beginning and the end of the text. They are defined by the tags prefix and suffix in the same way as the tag template.

<prefix>&lt;!-- bbcodes begin --&gt;</prefix>
<suffix>&lt;!-- bbcodes end --&gt;</suffix>

Programmatic Configuration

The programmatic configuration is represented by the class org.kefirsf.bb.conf.Configuration. For example:

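import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

import org.kefirsf.bb.BBProcessorFactory;
import org.kefirsf.bb.TextProcessor;
// Assumes Template, Constant, Scope, Code, Pattern and NamedValue live in this package
import org.kefirsf.bb.conf.*;

// Create configuration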
Configuration cfg = new Configuration();

// Set the prefix and suffix
cfg.setPrefix(new Template(Arrays.asList(new Constant("["))));
cfg.setSuffix(new Template(Arrays.asList(new Constant("]"))));

// Configure default scope
Scope scope = new Scope(Scope.ROOT);

// Create code and add it to scope
Code code = new Code();
code.setPattern(new Pattern(Arrays.asList(new Constant("val"))));
code.setTemplate(new Template(Arrays.asList(new NamedValue("value"))));

Set<Code> codes = new HashSet<Code>();
codes.add(code);
scope.setCodes(codes);

// Set scope to configuration
cfg.setRootScope(scope);

// Set the parameter
Map<String, Object> params = new HashMap<String, Object>();
params.putAll(cfg.getParams());
params.put("value", "KefirBB");
cfg.setParams(params);

// Test the configuration
TextProcessor processor = BBProcessorFactory.getInstance().create(cfg);
assert "[KefirBB]".equals(processor.process("val"));