Wazup 1.0
About:
Wazup is a module that parses web pages, extracts the useful information from the surrounding web garbage and displays it using any Label module (Label/xLabel/xLabelLight). You can also write the extracted information to a file and display it with the Oborzevatel module to get "true HTML".
System Requirements:
There are no specific software or hardware requirements. I hope...
Usage:
Each web-page query must be declared as follows:
*Wazup [name]
All query settings have the following form:
[name][option] [value]
Example:
*Wazup MySiteNews
MySiteNewsURL http://www.mysite.ru/index.html
Sample configuration using xLabelLight 2.8.3:
;===============================================
; Web-page monitoring (Wazup v1.0)
;-----------------------------------------------
; This configuration isn't of much practical use to people
; who don't know Russian, but it was the only page available
; locally on my machine :)
*Wazup LSAtRu
; NewsItem describes a pattern for extracting information about
; one news item. This little trick helps us avoid
; silly copying
NewsItem "news_top{%quote}>{%}</td>{*}right{%quote}>{%}</div></td>{*}news_post{%quote}>{%}<div align={%quote}right"
LSAtRuURL http://www.litestep.bip.ru/
LSAtRuUpdateInterval 10
LSAtRuInputString "$NewsItem${*}$NewsItem${*}$NewsItem${*}$NewsItem${*}$NewsItem$"
LSAtRuOutputString LiteStep@Russia News\n\n{%1}\n{%2}\n{%3}\n\n{%4}\n{%5}\n{%6}\n\n{%7}\n{%8}\n{%9}\n\n{%10}\n{%11}\n{%12}\n\n{%13}\n{%14}\n{%15}\n\n
LSAtRuEnabled true
LSAtRuOnChecked !Execute [!WazupSetUpdateInterval LSAtRu 900][!LabelShow NewsLabel]
LSAtRuOnFailure !Execute [!WazupSetUpdateInterval LSAtRu 10]
LSAtRuOnUpdated !alert "Something new!"
LSAtRuLocalFile "$MiscDir$lsnews.html"
LSAtRuDisplayOn NewsLabel
;======================
*Label NewsLabel
;----------------------
NewsLabelX 100
NewsLabelY 100
NewsLabelWidth 300
NewsLabelHeight 600
NewsLabelText "Here would be the dragons"
NewsLabelImage lsnews.png
NewsLabelImageMode stretch
NewsLabelImageTopEdge 3
NewsLabelImageBottomEdge 3
NewsLabelLeftBorder 5
NewsLabelRightBorder 5
NewsLabelTopBorder 10
NewsLabelBottomBorder 5
NewsLabelAutoLineBreak true
NewsLabelAlign left
NewsLabelVertAlign top
NewsLabelStartHidden
NewsLabelScroll vertical-up
Configuration:
Every web-query setting must begin with the query name specified in the
*Wazup line.
Parameters:
(query-name)URL [web-page]
URL of the web page. You cannot use a local file as the URL!
Default: http://www.shellfront.org/
(query-name)LocalFile [file]
If set, this file is used to track changes to the web page. If you omit this option, a temporary file is used and the OnUpdated action is disabled.
Default: empty string, tracking disabled
(query-name)Enabled [false/true]
If enabled, the page is checked automatically. Otherwise you need to run the !WazupCheck bang manually.
Default: true
(query-name)UpdateInterval [number]
Time interval between checks (in seconds!).
Default: 600 (10 minutes)
(query-name)InputString [pattern]
This string defines the page pattern used to locate the target information on the page.
The module compares the pattern with the downloaded page and extracts the useful substrings from it.
Extracted strings are numbered starting from 1.
For example, if the target page looks like this:
<html>
<body>
21.04.2004<br>
Seg@<br>
Wazzzzzup!
</body>
</html>
and the pattern is the following:
<body>{%}<br>{%}<br>{%}</body>
wazup.dll will extract 3 substrings from the page:
1 - 21.04.2004
2 - Seg@
3 - Wazzzzzup!
More detailed information about how to write a pattern is available in the "Writing a pattern" section.
Default: {%}
(query-name)OutputString [pattern]
This string defines the format of the output, i.e. the transformed text. It is plain text in which each {%N} is replaced with the extracted string number N.
For example, if parsing the web page gives us the following set of strings:
1 - 21.04.2004
2 - Seg@
3 - Wazzzzzup!
then the following pattern:
Post date: {%1}\nNews Maker: {%2}\n{%3}
lets us show properly formatted text in the Label:
Post date: 21.04.2004
News Maker: Seg@
Wazzzzzup!
Default: {%1}
(query-name)OnChecked [action]
Action performed after the web page has been downloaded and parsed successfully.
Default: !none
(query-name)OnFailure [action]
Action performed if downloading or parsing the page fails.
Default: !none
(query-name)OnUpdated [action]
Action performed if the page has changed since the last check. Requires LocalFile to be set.
Default: !none
(query-name)OutputLabel [label name]
If set and not empty, this label is used to display the output string.
Default: empty
(query-name)OutputFile [file path]
If set and not empty, the output string is written to this file after each successful check of the web page.
You may use this option, for instance, if you want to display "true HTML" with the Oborzevatel module.
Wazup.dll does not remove HTML tags before writing the file, although it does for Label output.
Default: empty
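As a sketch of how the change-tracking and output options fit together, the fragment below reuses the MySiteNews query from the Usage section. The file names under $MiscDir$ and the label name NewsLabel are placeholders modelled on the sample configuration, not fixed names; adjust them to your setup.

```
*Wazup MySiteNews
MySiteNewsURL http://www.mysite.ru/index.html
; LocalFile keeps a copy of the page so that changes can be detected;
; without it the OnUpdated action is disabled
MySiteNewsLocalFile "$MiscDir$mysitenews.html"
MySiteNewsOnUpdated !alert "Something new!"
; show the output in a label and also write it to a file
; (the file keeps HTML tags, the Label output does not)
MySiteNewsOutputLabel NewsLabel
MySiteNewsOutputFile "$MiscDir$mysitenews_out.html"
```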
!Bangs:
The first parameter of every !bang is the query name.
For example, to read the page described by the MySiteNews query, use the following:
!WazupCheck MySiteNews
Full list of available !bangs:
!WazupCheck (query-name)
Check the web page right now.
!WazupEnable (query-name)
Enable autoupdating of the page.
!WazupDisable (query-name)
Disable autoupdating.
!WazupToggle (query-name)
Toggle autoupdating state.
!WazupSetURL (query-name) [URL]
Change the URL of the source web page.
!WazupSetInputString (query-name) [pattern]
Change web-page pattern.
!WazupSetOutputString (query-name) [pattern]
Change output format.
!WazupSetUpdateInterval (query-name) [time in seconds]
Change the time interval between checks.
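For instance, a query that is only checked on demand might be driven as sketched below. This assumes the MySiteNews query from the Usage section; the archive URL is a made-up placeholder, and the !Execute [...] form for chaining bangs is the one used in the sample configuration.

```
; in the RC file: no automatic checking for this query
MySiteNewsEnabled false

; bang command to run later: point the query at another page
; and check it at once
!Execute [!WazupSetURL MySiteNews http://www.mysite.ru/archive.html][!WazupCheck MySiteNews]
```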
Writing a pattern:
A pattern is an ordinary string in which some pieces of text are replaced with escape sequences. Here is the list of escape sequences you may use in a Wazup input-string pattern:
{*}
Any text. Use this to skip something long that does not matter to you.
{%}
Extracted substring. This sequence marks useful information that must be remembered for later use in the output.
{%,N}
Extracted substring consisting of N characters.
{%quote}
A double quote character.
Extracted strings are numbered from 1 in the order of extraction.
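A sketch of the fixed-length form, assuming a hypothetical page fragment in which a ten-character date is followed directly by the headline with no tag in between. How Wazup treats whitespace around extracted strings is not described here, so take the expected result as an assumption.

```
; page fragment (hypothetical): <td>21.04.2004 Wazzzzzup!</td>
MySiteNewsInputString "<td>{%,10}{%}</td>"
; {%,10} should take the 10 characters 21.04.2004 as string 1,
; {%} then takes the rest up to </td> as string 2
MySiteNewsOutputString {%1}: {%2}
```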
Simple example of pattern usage.
Let's imagine that we have a page with the following content:
<html>
<body>
We're the champions, my friend!
</body>
</html>
and the following pattern:
MySiteNewsInputString "<body>{%}</body>"
When the module parses the page, it skips everything up to the first occurrence of <body>, then reads a substring (because the pattern has {%} at this point) until it meets </body>. The resulting string becomes substring #1.
If the output pattern is something like this:
MySiteNewsOutputString Msg: {%1}
the output string will be:
Msg: We're the champions, my friend!
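Put together, the whole query for this example might look like the fragment below. The URL is the placeholder from the Usage section, and NewsLabel is assumed to be a Label defined elsewhere in the RC file, as in the sample configuration.

```
*Wazup MySiteNews
MySiteNewsURL http://www.mysite.ru/index.html
MySiteNewsInputString "<body>{%}</body>"
MySiteNewsOutputString Msg: {%1}
; display the result in an existing label
MySiteNewsOutputLabel NewsLabel
```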
Notes:
Just some important notes:
If InputString contains spaces, it should be enclosed in double quotes.
If InputString contains double quotes, you need to replace them with the escape sequence {%quote}.
If you want to extract the news body, do it as in the sample configuration: use an $eVar$ to define a single news item. It will save you time and make the RC file more readable :)
Don't forget that the maximum length of a line in the RC file is 4096 characters (with environment variables expanded)!
That is why you may sometimes want to use {*} instead of {%quote}: it is shorter.
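The quoting and $eVar$ notes can be sketched as follows. The page markup (a div with class "news") and the eVar name Item are made up for illustration; the eVar definition follows the NewsItem line in the sample configuration.

```
; one item on the page looks like: <div class="news">...</div>
; {%quote} stands in for each double quote inside the pattern
Item "<div class={%quote}news{%quote}>{%}</div>"
; $Item$ is expanded twice, giving extracted strings 1 and 2;
; the values are enclosed in double quotes, as in the sample configuration
MySiteNewsInputString "$Item${*}$Item$"
MySiteNewsOutputString {%1}\n\n{%2}
```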
Changelog:
Version 1.0, 21.04.2004
Initial release... and final, I really hope :)
Author:
Handle :
Sergey Gagarin a.k.a. Seg@
E-Mail :
inform-sega@freemail.ru
Web :
http://www.litestep.bip.ru/
ICQ : 162261148
IRC : #litestep @ freenode.net