Text.pm
- NAME
- SYNOPSIS
- DESCRIPTION
- INTERNAL FUNCTIONS
- NAME
- DESCRIPTION
- SUMMARY OF OPTIONS
- OPTIONS
- SEE ALSO
- AUTHOR
NAME
Communiware::Format::Text - preprocessing routines that are called on item texts before they are placed into the database
SYNOPSIS
$text = Communiware::Format::Text::ToInternal($text,{SERVER=>someserver,..});
$text = Communiware::Format::Text::FromInternal($ctx,$text);
DESCRIPTION
This module provides preprocessing functions used by the various posting interfaces to convert data into HTML, which is then stored in the Communiware database.
ToInternal
Preprocesses text entered in the REPLY form or in the posting form. Arguments: the text and a reference to the hash of item attributes.
The hash should contain at least a SERVER element, which is used to resolve cmw: links that have no explicit site specification.
Adds to the hash an HREF element, whose value is the list of links to other Communiware items found in the text, and an SRC_REPR element with the value Text.
Recognizes the following formatting:
- _word_
- converted to <EM>
- line break
- converted to <BR>
- empty line
- converted to <P>
- valid URL
- URLs starting with http, ftp or https are converted to hyperlinks
- cmw:item(text)
- references to Communiware items of the form cmw:item_id(text) are converted to <A HREF="/server/item_id">text</A>. Note: nested parentheses in text are not handled correctly.
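The cmw: conversion described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the module's actual code: the helper name cmw_links_to_html is hypothetical, item ids are assumed to consist of word characters, dots and hyphens, and HREF is assumed to hold the generated link paths.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Communiware::Format::Text.
# Converts cmw:item_id(text) references into links and records the targets.
sub cmw_links_to_html {
    my ($text, $attrs) = @_;
    my @links;
    # [^()]* refuses parentheses inside the link text, so a reference
    # with nested parentheses is left unconverted in this sketch.
    $text =~ s{cmw:([\w.-]+)\(([^()]*)\)}{
        my $href = "/$attrs->{SERVER}/$1";
        push @links, $href;
        qq{<A HREF="$href">$2</A>}
    }ge;
    # Assumption: HREF stores the link paths as an array reference.
    $attrs->{HREF} = \@links;
    return $text;
}

my %attrs = (SERVER => 'someserver');
print cmw_links_to_html('See cmw:42(this item).', \%attrs), "\n";
# prints: See <A HREF="/someserver/42">this item</A>.
```

Replacing HREF on every call, rather than appending to it, mirrors the documented side effect of ToInternal.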
Syntax:
ToInternal $text,\%attributes,format
Side effect: replaces the value of $attributes{HREF} with the list of links to items found in the text.
Format can be one of:
- spar
- Strict paragraphs: the resulting HTML always starts with <P> and ends with </P>, and can consist of multiple paragraphs.
- rpar
- Relaxed paragraphs: if the text consists of multiple paragraphs, they are separated with <P></P>, but the surrounding <P>/</P> is never output. This is the default.
- npar
- No paragraphs. Multiple consecutive newlines in the input are treated as a single newline.
- plain
- No paragraphs, no hyperlinks, no other HTML tags.
- bare
- Plainer than 'plain': no tags at all and no HTML escaping.
Bare format is useful for text analysers and other tools which could be confused by any control character sequences.
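The difference between the spar, rpar and npar formats can be illustrated with a small sketch. The helper name format_paragraphs is hypothetical, and details such as the exact whitespace around tags, or whether npar still emits <BR> for line breaks, are assumptions rather than documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the paragraph formats; the real
# ToInternal may differ in details such as whitespace handling.
sub format_paragraphs {
    my ($text, $format) = @_;
    $format = 'rpar' unless defined $format;   # rpar is the documented default
    $text =~ s/\A\n+|\n+\z//g;                  # trim leading/trailing newlines

    if ($format eq 'npar') {
        $text =~ s/\n{2,}/\n/g;                 # runs of newlines act as one
        $text =~ s/\n/<BR>\n/g;                 # assumption: breaks still become <BR>
        return $text;
    }

    # One or more empty lines separate paragraphs.
    my @paras = split /\n{2,}/, $text;
    s/\n/<BR>\n/g for @paras;                   # single line breaks become <BR>

    # spar: every paragraph is wrapped in <P>...</P>.
    return '<P>' . join("</P>\n<P>", @paras) . '</P>' if $format eq 'spar';
    # rpar: separators only, no surrounding tags.
    return join("\n<P></P>\n", @paras);
}

print format_paragraphs("one\ntwo\n\nthree", 'spar'), "\n";
```

With the input above, spar wraps both paragraphs in <P>...</P>, rpar only places <P></P> between them, and npar yields three lines separated by <BR>.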
- *word*
- converted to <STRONG>
return_defined
INTERNAL FUNCTIONS
string2html
string2html - function that applies the bold/italic transformations correctly. It also converts URLs into hrefs.
text2html
This function applies the transformations to the whole text.
NAME
Communiware::Format::Text - mark up text as HTML
DESCRIPTION
The text2html function marks up plain text as HTML. By default it expands tabs and converts HTML metacharacters into the corresponding entities. More complicated transformations, such as splitting the text into paragraphs or marking up bulleted lists, can be carried out by setting the appropriate options.
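The default behaviour (tab expansion followed by entity conversion) might look like the sketch below. The helper name escape_plain_text, the 8-column tab stops and the exact set of escaped characters are assumptions made for illustration; text2html itself may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: expand tabs to 8-column stops (an assumption),
# then escape the HTML metacharacters &, <, > and ".
sub escape_plain_text {
    my ($text) = @_;
    my @lines = split /\n/, $text, -1;          # -1 keeps trailing empty lines
    for my $line (@lines) {
        # Replace the first tab with enough spaces to reach the next
        # tab stop, and repeat until the line has no tabs left.
        1 while $line =~ s/^([^\t]*)\t/$1 . ' ' x (8 - length($1) % 8)/e;
    }
    $text = join "\n", @lines;

    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

print escape_plain_text(qq{x < y & "z"}), "\n";
# prints: x &lt; y &amp; &quot;z&quot;
```

Escaping happens in a single pass over the text, so an ampersand produced by one replacement is never escaped a second time.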
SUMMARY OF OPTIONS
These options always apply:
urls      Convert URLs to links
email     Convert email addresses to links
bold      Mark up words with *asterisks* in bold
emphased  Mark up words with _underscores_ as underlined
You can then choose to treat the text according to one of these options:
paras Treat text as paragraph-oriented
The following options apply when the paras option is specified:
bullets   Mark up bulleted paragraphs as unordered list
numbers   Mark up numbered paragraphs as ordered list
text2html will issue a warning if it is passed nonsensical options, for example headings but not paras. These warnings can be suppressed by setting $Communiware::Format::Text::QUIET to true.
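A check of this kind could be sketched as follows. Everything here is hypothetical: the function check_options, the list of paras-dependent options and the warning text are illustrative, and the local $QUIET variable merely stands in for $Communiware::Format::Text::QUIET.

```perl
use strict;
use warnings;

our $QUIET = 0;    # stands in for $Communiware::Format::Text::QUIET

# Assumption: these options only make sense together with paras.
my @NEEDS_PARAS = qw(bullets numbers headings);

# Hypothetical check; returns the list of problems it found and
# warns about each one unless $QUIET is true.
sub check_options {
    my (%opt) = @_;
    my @problems;
    for my $name (@NEEDS_PARAS) {
        push @problems, "option '$name' has no effect without 'paras'"
            if $opt{$name} && !$opt{paras};
    }
    unless ($QUIET) { warn "$_\n" for @problems }
    return @problems;
}

check_options(headings => 1);               # warns on STDERR
check_options(headings => 1, paras => 1);   # silent
```

Returning the problems as well as warning about them keeps the check easy to test with warnings switched off.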
OPTIONS
- bold
- Words surrounded with asterisks are marked up in bold, so *abc* becomes <B>abc</B>.
- emphased
- Words surrounded with underscores are marked up as underlined, so _abc_ becomes <U>abc</U>.
- urls
- Spots Uniform Resource Locators (URLs) in the text and converts them to links. For example
    See https://perl.com/.
  becomes
    See <TT><A HREF="https://perl.com/">https://perl.com/</A></TT>.
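The three inline options can be approximated with a few substitutions. This is a rough sketch, not the module's implementation: the helper name markup_inline and the regular expressions are assumptions, and the input is assumed to have been HTML-escaped already.

```perl
use strict;
use warnings;

# Hypothetical inline markup pass; input is assumed to be HTML-escaped.
sub markup_inline {
    my ($text) = @_;
    # URLs first. The final character class keeps trailing punctuation
    # such as "." or "," out of the link.
    $text =~ s{\b((?:https?|ftp)://[^\s<>"]*[^\s<>".,;:!?)])}
              {<TT><A HREF="$1">$1</A></TT>}g;
    # *word* becomes bold; the lookarounds skip asterisks that touch
    # word characters or slashes (for example inside a URL path).
    $text =~ s{(?<![\w/])\*(\w+)\*(?![\w/])}{<B>$1</B>}g;
    # _word_ becomes underlined, following the same rule.
    $text =~ s{(?<![\w/])_(\w+)_(?![\w/])}{<U>$1</U>}g;
    return $text;
}

print markup_inline('See https://perl.com/.'), "\n";
# prints: See <TT><A HREF="https://perl.com/">https://perl.com/</A></TT>.
```

Running the URL pass before the bold and underline passes is a design choice of this sketch; a fuller implementation would also need to protect underscores and asterisks that appear elsewhere inside URLs.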
SEE ALSO
The HTML::Entities module (part of the LWP package) provides functions for encoding and decoding HTML entities.
Tom Christiansen has a complete implementation of RFC 822 structured field bodies. See http://www.perl.com/CPAN/authors/Tom_Christiansen/scripts/ckaddr.gz.
Seth Golub's txt2html utility does everything that HTML::FromText does, and a few things that it would like to do. See http://www.thehouse.org/txt2html/.
RFC 822: "Standard for the Format of ARPA Internet Text Messages" describes the syntax of email addresses (the more esoteric features of structured field bodies, in particular quoted-strings, domain literals and comments, are not recognized by HTML::FromText). See ftp://src.doc.ic.ac.uk/rfc/rfc822.txt.
RFC 1630: "Universal Resource Identifiers in WWW" lists the protocols that may appear in URLs. HTML::FromText also recognizes "https:", but ignores "file:" because experience suggests that it results in too many false positives. See ftp://src.doc.ic.ac.uk/rfc/rfc1630.txt.
AUTHOR
Gareth Rees <garethr@cre.canon.co.uk>.
The code was heavily adapted for Communiware, and the extra code still needs to be removed.