file://InternalText/Help/Usage Untitled 2008-01-21 / English / us-ascii

WordsEx.exe Version 1.0
Copyright (c) 2008, Glenn Scheper.

Words,Extended is an Internet text information retrieval, extraction, and display program. Words,Extended is freeware and may be freely distributed. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Instructions:

Download the Words,Extended executable program file WordsEx.exe to your Windows (Vista, XP, or 2000) desktop. You may run the Words,Extended program immediately and begin using it right away. No installation is required, and neither is any uninstallation: nothing is left behind after you delete the Words,Extended program file.

Let's create a shortcut key to improve your experience with Words,Extended. Right-click the Words,Extended program icon that you saved on the desktop, and choose 'Create Shortcut'. Next, right-click the 'Shortcut to Words,Extended' icon that has just appeared on the desktop, and choose 'Properties'. In the Properties dialog, click inside the 'Shortcut Key' box, type the period key ('.'), and click OK. You have just assigned the combination keystroke 'control+alt+period', easy to type with the right hand, as a shortcut key that starts Words,Extended and that also restores the already running Words,Extended window whenever it has been minimized.

First, try reading some text in Words,Extended using the smooth scrolling feature.
Copy some text to the Windows clipboard from any program, and use the Words,Extended accelerator keystroke control+v to paste the text into Words,Extended. That paste creates the first and only text view in a top-level ring of all text views, and this first current view replaces the rambling Omega welcome screen.

If some parts of the text that you copied seem to be missing, don't worry; it is a feature, not a bug. Words,Extended hides blocks consisting only of poor-quality navigational text, such as often appears in web pages, and shows only the blocks that are rich in sentences (as judged by punctuation and capitalization heuristics) or that lie in the neighborhood of such good blocks. To see all the text, use the menu command Help, Extend View or its equivalent accelerator keystroke control+x. It toggles between the default viewing mode, which shows only sentence-rich text, and an extended viewing mode, which shows all the text plus one short block of page summary annotations that Words,Extended inserts at the top of every text: the page location, title, date, language, charset, page statistics, and a list of the most-used and best-ranked uncommon words.

The four arrow keys operate smooth scrolling. Press either the left or the right arrow key to start smooth scrolling and to change its speed: the left arrow key makes it 10% slower, and the right arrow key 10% faster. Press the up arrow key to stop smooth scrolling; when not smooth scrolling, each press of the up arrow key moves the text back by one line. The down arrow key hides and then minimizes the Words,Extended window. This is an important stealth function for when ... well, you know when. The spacebar performs the same stealth function. It is after minimizing Words,Extended that you need a convenient Windows shortcut key assigned to restore the Words,Extended window quickly and easily.
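The block filter described above can be illustrated with a small Python sketch. The actual punctuation and capitalization heuristics of Words,Extended are not documented here, so everything below is an assumption chosen for illustration: the sentence pattern, the thresholds (min_sentences, min_words), splitting blocks at blank lines, and a neighborhood of one block on each side. The names is_sentence_rich and visible_blocks are hypothetical.

```python
import re

# Assumed heuristic (not the actual Words,Extended rules): a "sentence" starts
# with a capital letter and runs to the next terminal punctuation mark.
SENTENCE = re.compile(r"[A-Z][^.!?]*[.!?]")


def is_sentence_rich(block: str, min_sentences: int = 2, min_words: int = 5) -> bool:
    """True if the block holds enough capitalized, punctuated sentences."""
    sentences = [s for s in SENTENCE.findall(block) if len(s.split()) >= min_words]
    return len(sentences) >= min_sentences


def visible_blocks(text: str, neighborhood: int = 1) -> list:
    """Keep sentence-rich blocks plus blocks within `neighborhood` of one."""
    blocks = [b for b in text.split("\n\n") if b.strip()]
    keep = [False] * len(blocks)
    for i, block in enumerate(blocks):
        if is_sentence_rich(block):
            # Mark the good block and its neighbors on both sides as visible.
            for j in range(max(0, i - neighborhood), min(len(blocks), i + neighborhood + 1)):
                keep[j] = True
    return [b for b, k in zip(blocks, keep) if k]
```

With these settings, a page whose only sentence-rich block sits between a login bar and a photo caption keeps those three blocks, while navigation blocks farther away stay hidden, roughly as the default viewing mode is described.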
In addition to the unique smooth scrolling feature, the Home, End, Page Up, and Page Down keys and the vertical scrollbar all operate in the conventional Windows fashion. So does the menu command File, Find in Current View... and its equivalent accelerator keystrokes control+f and F3.

Black on monochromatic green should be the easiest colors on your eyes, especially if you wear glasses, but you can use the menu command Help, Green/White to change to black on white, white on black, or green on black. The font size can be changed as much as you like, in 10% steps up or down, by typing the plus (+) key or the minus (-) key, or the same two keys in their other shift state: the equals (=) key for larger, and the underscore (_) key for smaller.

Sometimes, when you paste or fetch files of plain text, or HTML files using PRE tags (which Words,Extended honors), pre-wrapped text lines may exceed the Words,Extended window width and be displayed as alternating long and short lines. Use the menu command Help, Wrap View or its equivalent accelerator keystroke control+z to scan through the lines of the current view (but not the actual text backing the current view), changing each single newline to a space. This temporarily re-wraps all words within each block of text to fit the current window, but does not join blocks delimited by double newlines. The wrapping change persists while you scroll in the current view, but disappears after you navigate away from this view and back again.

How can you navigate away from the current view? If the top-level ring of views contains more than a single text, the menu commands Next and Back circle around that ring. Next and Back also have equivalent accelerator keystrokes: either control+right arrow or alt+right arrow for Next, and either control+left arrow or alt+left arrow for Back.
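The Wrap View transformation described above amounts to replacing every single newline with a space while leaving double newlines alone. A minimal Python sketch follows. It assumes the view text uses bare \n line endings (Windows \r\n text would need normalizing first). The names wrap_view and rewrap are hypothetical, and textwrap.fill merely stands in for the window's own word wrapping.

```python
import re
import textwrap

# A "single" newline is one with no newline directly before or after it.
SINGLE_NEWLINE = re.compile(r"(?<!\n)\n(?!\n)")


def wrap_view(view_text: str) -> str:
    """Turn single newlines into spaces, keeping double-newline block breaks.

    Returns a new string; the backing text passed in is left untouched.
    """
    return SINGLE_NEWLINE.sub(" ", view_text)


def rewrap(view_text: str, width: int) -> str:
    """Re-wrap each block to the window width (textwrap stands in for the display)."""
    blocks = wrap_view(view_text).split("\n\n")
    return "\n\n".join(textwrap.fill(block, width) for block in blocks)
```

Because the function returns a new string, the pre-wrapped backing text survives unchanged, which matches the note that the effect disappears once you navigate away and back.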
You may also use the menu command File, Delete Current View, or either of its two equivalent accelerator keystrokes, del or esc, to remove the text currently being viewed from the top-level ring of views and navigate back to a prior view, if there is one. It is okay for the top-level ring of views to be empty. A text deleted from the top-level ring of views is not lost; it retreats to deeper Words,Extended program memory, from which it may be called forth again by various means. The single last text deleted from the ring may be restored into it by using the menu command File, Undelete Last View or its equivalent accelerator keystroke control+del.

You may set ten bookmarks by typing a number key (0-9) at any time. Each bookmark remembers the current text and the text offset of the top line currently in view, so you can return to the same location later by using the menu command File, Jump to Bookmark (0-9) or its equivalent accelerator keystroke control+j, and then typing the same number key.

Words,Extended can also help you with note-taking. Using either mouse button, drag the mouse over some text being viewed to immediately copy the dragged-over portion of text to the Windows system clipboard, snapped to whole token boundaries, word-wrapped, and in Unicode, ready to paste into an e-mail, Notepad, or another Windows document. The copied text is annotated with two final lines giving the URL and the title of the quoted source text. Unfortunately, Words,Extended neither highlights the dragged-over mouse selection nor automatically scrolls the text when the mouse drag goes beyond the window, as many Windows programs do. (You want it when?) To copy a larger extent of text, first use the menu command Help, Smaller Font several times, until all the desired text fits in the window.
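The ring behavior described above (Next and Back wrapping around, Delete falling back to a prior view, a single-level Undelete, and ten number-key bookmarks) can be modeled with a list and a current index. This Python sketch is a hypothetical model, not Words,Extended's internal code. The class name ViewRing and the representation of texts as plain strings are assumptions. So are two details the help text does not specify: that a repeated Undelete does nothing, and that jumping to a bookmark whose text was deleted brings that text back into the ring.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ViewRing:
    """Hypothetical model of the top-level ring of views (not WordsEx code)."""
    views: list = field(default_factory=list)      # texts currently in the ring
    current: int = -1                              # index of the current view; -1 if empty
    memory: list = field(default_factory=list)     # "deeper memory" for deleted texts
    last_deleted: Optional[str] = None             # only the single last deletion
    bookmarks: dict = field(default_factory=dict)  # digit -> (text, top-line offset)

    def add(self, text: str) -> None:
        """A new view lands just after the current one and becomes current."""
        self.current += 1
        self.views.insert(self.current, text)

    def next(self) -> None:
        if self.views:
            self.current = (self.current + 1) % len(self.views)

    def back(self) -> None:
        if self.views:
            self.current = (self.current - 1) % len(self.views)

    def delete(self) -> None:
        """Remove the current view and fall back to the prior one, if any."""
        if not self.views:
            return
        text = self.views.pop(self.current)
        self.memory.append(text)  # not lost, just out of the ring
        self.last_deleted = text
        self.current = (self.current - 1) % len(self.views) if self.views else -1

    def undelete(self) -> None:
        """Restore only the single last deleted text into the ring."""
        if self.last_deleted is not None:
            self.add(self.last_deleted)
            self.last_deleted = None  # assumption: a second undelete does nothing

    def set_bookmark(self, digit: int, top_offset: int) -> None:
        if self.views:
            self.bookmarks[digit] = (self.views[self.current], top_offset)

    def jump(self, digit: int) -> int:
        """Make the bookmarked text current and return its top-line offset."""
        text, offset = self.bookmarks[digit]
        if text in self.views:
            self.current = self.views.index(text)
        else:
            self.add(text)  # assumption: a deleted text is called back into the ring
        return offset
```

Inserting new views just after the current one also reproduces the described placement of a thread's results view exactly one view Back of a text opened from it.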
You may hold the control key down while making multiple mouse drags in order to gather several text snippets before going off to paste them. You may use the menu command File, Copy Current View or its equivalent accelerator keystroke control+c to copy the entire text backing the current view to the system clipboard.

Words,Extended uses wide characters internally throughout. The set of foreign glyphs Words,Extended can display depends on the completeness of the Arial font installed on your computer. Words,Extended does not process the double-byte character sets sometimes used on Chinese, Japanese, and Korean web pages. Only US-ASCII, UTF-8, and the Windows-1250 through Windows-1258 charsets, or their synonyms, are faithfully rendered. Some other charsets, such as shift-jis, big5, koi8, etc., may produce garbage. For these, or any other pages that do not display correctly in Words,Extended, use the menu command File, Invoke Internet Explorer or its equivalent accelerator keystroke control+i to pass the URL of the current web page to Internet Explorer, which then fetches and displays the page. Then use control+a, control+c in Internet Explorer to copy all the text, and use control+v in Words,Extended to paste it, creating a new view in Words,Extended for convenient reading. Words,Extended also does not perform the right-to-left display needed to view Hebrew and Arabic correctly.

So that Words,Extended can start quickly and be ready to work immediately, many foreign-language word lists are built in a background thread, which may take several seconds before Words,Extended is fully ready to do accurate foreign-language guessing and common-word ranking. You can even watch the lists grow using the menu command Help, Memory Stores. You need not wait for all that vocabulary work to complete before you start using Words,Extended.

All of the menu commands in the Add, ...
submenu start a thread to do their work, allowing you to do other operations while Words,Extended is working. For every thread that is started, a new view is added to the top-level ring of views to show the progress of the thread and any results it finds, and to let you click on any of the web page results or matching word results found by the thread in order to navigate to that source text. Clicking on one of the listed results makes that source text the new current view in the top-level ring of views; the thread progress and results view you were viewing is then located just one view Back of the new text view.

Whenever the current view shows a thread that is still running, the first use of the menu command File, Delete Current View, or of its equivalent accelerator keystrokes esc or del, does not actually delete the current view (that is, remove it from the top-level ring of views); it only stops the execution of the thread. A second use of the Delete command actually deletes the current view.

MOUSE CLICK RULES are easier done than said:

Two different kinds of texts might be in the current view:

1. Thread execution progress logs that contain clickable summaries;
2. Source texts for reading, which may be pasted text views, internal text views, downloaded Internet resource (web page) views, or such files added from a file, a folder, or the cache.

1. When viewing a thread progress/results view, clicking on one of the summary items listed therein navigates to that source text for reading. Also, in such a thread progress/results view, the ENTER key acts just like a mouse click at the current cursor location, or wherever the cursor last left the Words,Extended window, to allow minimal-motion, right-hand-only use of Words,Extended.

2. Otherwise, you are viewing a source text, and the following rules apply:

If a token resembles a URL, either right-clicking or left-clicking on that URL token invokes the Add, One Internet Page dialog, pasting that URL into the dialog, ready to download that one Internet resource. If it is a binary resource, you will be asked to save it to a file. If it is an HTML or plain text resource, it will be prepared for reading in Words,Extended. If it is a single web page (not multiple URLs downloaded for a frameset), that source text becomes the new current view automatically, just as if you had clicked the single result summary item in the Add, One Internet Page thread result view.

Left-clicking on any word invokes the Add, Word Search dialog, pasting that word into the dialog, ready to search for that word locally, within all the source texts held in memory. The dialog checkbox option "[x] Stem" finds all words in memory that share the word stem of your search word(s). Multiple search words may be entered in the text box; each word is searched individually, and the results are commingled. There are three Add, Word Search display options: the (o) KWIC (KeyWord-in-Context) format finds and shows all occurrences of the search words, while the (o) Sentences and (o) Paragraphs formats report only those matches that occur in portions of source texts determined to be rich in sentences. After left-clicking a word to start a word search, click on any match-text item shown in the word search results view to begin reading a source text with that matching text segment positioned at the top of the window. While reading a source text, use either delete or escape to remove the source text and return to the previous view of Add, Word Search results.

Right-clicking on any word invokes the Add, Internet Search dialog, pasting that word into the dialog, ready to begin a web search for that word.
Many search engines will be queried, and both the query result web pages and very many of the hit-result web pages they report will be downloaded into memory. You may either click on interesting downloaded web page items as they appear and are summarized, or use other Words,Extended features such as Add, Word Search or Add, Best Sentences to drill into the growing corpus of source texts held in memory. After right-clicking a word to start an Internet search, click on any web page summary item shown in the Internet search results view to begin reading a source text. While reading a source text, use either delete or escape to remove the source text and return to the previous view of Add, Internet Search results.

Whenever Words,Extended downloads or inputs a source text, whether HTML or plain text, it catalogs all canonicalizable URLs found in anchors and various other tags, and also any tokens in clear text that are recognizable as URLs. Although you cannot choose to browse any particular anchor in Words,Extended (to do that, save the page as a .HTM file and re-open it with Internet Explorer), you can download all of the URLs found in a source text by using Add, All Links on Page. This command also makes it possible to use Words,Extended to download a list of URLs that you have first written into a plain text file: open that file using the menu command File, Open Page from File, and then use the menu command Add, All Links on Page. The command Add, All Links on Page starts downloading immediately, without first opening a dialog, but it obeys the last setting of the "[x] Non-text too" checkbox made in the dialog for the menu command Add, One Internet Page.

Occasionally, on some computers, when downloading certain URLs, calls from Words,Extended to the synchronous Windows Internet library functions never return to the Words,Extended program.
The symptom of this error is that a thread performing an HTTP download stops showing new results every few seconds, yet never shows "Thread Ended". If this happens, you will not be able to exit the Words,Extended program, because a Words,Extended close operation waits for all threads to stop before destroying shared memory objects, and the hung thread cannot be stopped. In that case, you might first save all the downloaded files into a directory, then right-click on the Windows taskbar to invoke the Windows Task Manager, and use Task Manager to terminate the Words,Extended program. You should then run Windows Update and install all available optional updates. That fixed it for me on Windows XP.

Words,Extended does not process JavaScript. If a web page does not appear to have all the expected text, use the control+i key to invoke Internet Explorer on the URL of the current view, as it may parse the web page better. After Internet Explorer opens the page, you can copy all of it and paste the text back into Words,Extended to read.

Words,Extended parses all HTML source down to unadorned Unicode text with included HTML link tags. Such pages, when saved to files, may be reopened using Internet Explorer to see and follow the links. Otherwise, when using Words,Extended, you do not browse by following links, but by making queries to search engines and then working with the large corpus of harvested texts.

Words,Extended contains information for many search engines, some in foreign languages. To reduce the list of search engines to be queried, use File, Save Search Engines to output the default list of search engines, and manually edit out the lines you do not want. Then use File, Load Search Engines to open your file; its list will replace the current list of search engines used by Words,Extended.
If you copy the revised file to the fixed location C:\WordsEx\Engines.txt, Words,Extended will always import your revised list whenever it starts up.

In some world locations, or as search engines change (should I never update Words,Extended beyond version 1.0), or to use new search engines that I did not happen to list, you will need to know how to tailor the Engines.txt list. If you wish to add new search engines, you might use Words,Extended itself to begin studying the search engine. Use the menu command Add, One Internet Page to fetch the URL of a search engine portal page, that is, a web page that contains a search submission form and a submit button. Then, with that fetched portal page still in memory, do File, Save Search Engines. The saved Engines.txt output file will also include a section of information about all the HTML forms that Words,Extended has encountered and parsed. Since Words,Extended does not parse JavaScript, if the web page form is written using JavaScript, you will have to analyze the form source code manually.

Some lines starting with the keyword FYI 000 show specimens of all the query submission URLs constructed from the parsed forms. If you change FYI to GET, and 000 to a three-digit ordinal that controls the order in which the engines are queried and labels their results, such lines become syntactically valid specifications of search engine query URLs that control the Add, Internet Search operation, once the Engines.txt file has been reloaded using the menu command File, Load Search Engines. You may need to examine the alternatives for some parameters that are listed after the URL specimens, and modify the URL to get the effects you want. Before running that new URL, append another text line saying STUDY after the GET line. Next, do File, Load Search Engines, and load your new file.
Then use Words,Extended to perform an Internet search controlled by that loaded two-line Engines.txt file, and save all the downloaded files to a folder. Query result pages will have filenames starting with an underscore, and because of the STUDY keyword, the query result page will be heavily annotated with potential page-scraping observations. Manually edit the query result page, and use the final section of annotations produced by STUDY to select page-scraping rules, following the examples of rules in the default search engine list.

All of the valid Engines.txt rule lines that can follow a GET line must start with one of the following expressions:

    good url has ...
    bad url has ...
    more url has ...
    none good until ...
    none good after ...
    keep ... (keep query result page as a good web page for reading.)
    study ... (studying engine - annotate facts in query result page.)

For example, one of the interesting engines that I chose not to keep in the Words,Extended default list used four rules:

    GET 150 http://www.reference.com/search?db=web&q=
    none good until TWO TOKENS "Search" "took"
    good url has next TAG /td
    more url has anchortext NUMBER
    none good after anchortext NUMBER

Any bad url rules placed ahead of good url rules can override them.

Another interesting query URL that I chose not to keep demonstrates the keep keyword, as the query result page is itself a good text:

    GET 100 http://www.google.com/search?hl=en&ie=ISO-8859-1&btnG=Google+Search&q=define:
    good url has prior TAG li
    keep query result page as a good web page for reading

Another Internet searching control mechanism is still present, although I have not used it since adding strong page-scraping rules: lines in Engines.txt beginning with a NOT keyword and followed by two double-quoted strings tell which URLs should never be fetched as hit URLs, not even after a redirection.
The first string must match the end of the domain part of the query result page's URL, starting after some dot, and the second string must be found (as by the C routine strstr) in the potential web page hit URL to fetch. If both string tests match, that potential hit URL is rejected. If the first string is #, it matches a numeric IP address as the query result page domain. If the first string is *, it matches any query result domain. If the first string is /, the second string is tested only against path-less URLs (i.e., top domain names).

Words,Extended holds all data in memory and can quickly use up all of Windows virtual memory. Recovering from malloc failures would require much more programming, so instead, Words,Extended warns you and stops fetching any more web pages once it has consumed 500 megabytes. If your virtual memory is set larger than 1 gigabyte, you should still be able to save your pages. If memory is very full, do File, Save All Pages before doing File, Save Urls List, as the latter operation requires much more memory.

Thank you for choosing and sharing Words,Extended.
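The NOT-rule tests described earlier can be sketched in Python as below. This is a hypothetical reading of the rules, not the program's own parser. The helper names (parse_not_rules, domain_matches, rejected) are invented. The sketch also assumes three details the help text leaves open: the first string may equal the whole query domain, a / rule ignores the query domain, and a path-less hit URL is one whose path is empty or a bare /.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# A NOT line: the NOT keyword followed by two double-quoted strings.
NOT_LINE = re.compile(r'^NOT\s+"([^"]*)"\s+"([^"]*)"\s*$')


def parse_not_rules(lines):
    """Collect (first, second) string pairs from the NOT lines of an Engines.txt."""
    rules = []
    for line in lines:
        match = NOT_LINE.match(line.strip())
        if match:
            rules.append((match.group(1), match.group(2)))
    return rules


def is_numeric_ip(host: str) -> bool:
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def domain_matches(first: str, query_host: str) -> bool:
    """Apply the first-string test to the query result page's domain."""
    if first == "*":
        return True
    if first == "#":
        return is_numeric_ip(query_host)
    # Match a tail of the domain that begins right after some dot.
    # Assumption: an exact match of the whole domain also counts.
    return query_host == first or query_host.endswith("." + first)


def rejected(rules, query_url: str, hit_url: str) -> bool:
    """True if any NOT rule says this potential hit URL must not be fetched."""
    query_host = urlsplit(query_url).hostname or ""
    hit_path = urlsplit(hit_url).path
    for first, second in rules:
        if first == "/":
            # Assumption: "/" ignores the query domain and applies only to
            # path-less hit URLs such as http://www.example.com/
            if hit_path not in ("", "/"):
                continue
        elif not domain_matches(first, query_host):
            continue
        if second in hit_url:  # plain substring test, like C strstr()
            return True
    return False
```

Requiring the dot before the matched tail keeps a rule for google.com from accidentally firing on a domain such as notgoogle.com.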