Web Data Extractor. Extract url, meta tag, email, phone, fax from web
v7.1  
Frequently Asked Questions (FAQ)
  • Q: I set up a project with "WebSite" extraction, but no pages were processed. Is WDE unable to connect?
  • Q: I set up a project with "WebSite" extraction, but no email addresses were found. Why?
  • Q: I set up a project with "URLs from File" extraction and entered the filename, but WDE cannot find any links in the file. Why?
  • Q: When I run WDE, it uses all my computer's processing power and the screen barely refreshes. What can I do?
  • Q: Can I resume an interrupted session in WDE?
  • Q: How can I add a search engine other than those listed in the Engine Listing dialog?

Q: I set up a project with "WebSite" extraction, but no pages were processed. Is WDE unable to connect?

A: There are several things that may cause this:

(1) Check your Internet connection - you must be online.

(2) Check your proxy settings. If you are behind a firewall or proxy server, enter the necessary information in the "New Session Dialog - Proxy" tab. If you do not know your proxy details, contact your ISP or system administrator.

(3) Is the site password protected? WDE cannot extract data from password-protected sites.

(4) Make sure the site is not down, temporarily or permanently. Check whether your default browser can load it.

(5) Is the site using some type of redirect? That is, you enter a URL like "http://www.car.com" and the server redirects to "http://www.truck.com". In that case, use "http://www.truck.com" as your starting address in the "New Session" dialog.

(6) Check that you did not use an exclude URL filter like "/" or "com" in "New Session Dialog - URL Filter"; such a filter prevents WDE from processing any site.

(7) Check that the site does not use only a Java applet on its home / index page. Like other spiders, WDE cannot parse Java applets.

(8) WDE does not support the secure https:// protocol.

(9) Finally, did you set a very low request time-out period in the "New Session - Other" tab? The default time-out period is 100 seconds. With a much lower value, WDE may abort the request before the host server replies.
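
Items (6) and (8) above can be checked before a session is started. The following Python sketch is only an illustration and is not part of WDE: the function name preflight is hypothetical, and it assumes that exclude filters are plain substring matches, which this FAQ does not confirm.

```python
from urllib.parse import urlsplit


def preflight(start_url, exclude_filters=()):
    """Return a list of likely problems with a starting URL (empty list = none found)."""
    problems = []
    parts = urlsplit(start_url)

    # Item (8): only plain http:// is supported.
    if parts.scheme.lower() != "http":
        problems.append(f"unsupported scheme: {parts.scheme or '(none)'}")

    # Item (6): an exclude filter that matches the starting URL blocks the whole site.
    # Assumption: filters are plain substring matches.
    for flt in exclude_filters:
        if flt and flt in start_url:
            problems.append(f"exclude filter {flt!r} matches the starting URL")

    return problems
```

For example, preflight("http://www.truck.com", ["com"]) reports one problem, because the filter "com" occurs inside the starting address itself.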

Q: I set up a project with "WebSite" extraction, but no email addresses were found. Why?

A: There are a few things that may cause this:

(1) Not all website owners put their email address on their website or contact page. Some websites use forms on their contact / support pages instead.

(2) Check the "Depth" setting. Depth tells WDE how many levels to dig down within the specified website. If the depth is too low, WDE may never reach the page that lists the email address.
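
To illustrate why the depth setting matters, here is a minimal Python sketch of a depth-limited crawl over a small made-up site. It is not WDE's code: the SITE map and the function name are hypothetical, and it assumes that depth 0 means "starting page only", which may differ from how WDE counts levels.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
SITE = {
    "/": ["/about", "/products"],
    "/about": ["/contact"],
    "/products": [],
    "/contact": [],
}


def pages_within_depth(links, start, depth):
    """Return the pages reachable from `start` by following at most `depth` links."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, level = queue.popleft()
        if level == depth:
            continue  # do not follow links any deeper
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, level + 1))
    return seen
```

In this made-up site the "/contact" page sits two links away from the home page, so a crawl with depth 1 never visits it, while depth 2 does.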

Q: I set up a project with "URLs from File" extraction and entered the filename, but WDE cannot find any links in the file. Why?

A: Make sure the file exists on disk. The file must list one URL per line; no other format is supported. WDE accepts only lines that start with http://. WDE also rejects URLs that point to image or binary files, because those files have no text data to extract.
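
If you want to check a URL file before loading it, the rules above can be approximated with a short Python sketch. This is an illustration only: the extension list is an assumption (WDE's actual list is not given here), and so is the trimming of surrounding whitespace.

```python
# Assumed list of image/binary extensions; WDE's actual list is not documented here.
SKIPPED_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png", ".zip", ".exe")


def usable_urls(lines):
    """Keep lines that start with http:// and do not point to an image/binary file."""
    urls = []
    for line in lines:
        url = line.strip()  # assumption: surrounding whitespace is ignored
        if not url.startswith("http://"):
            continue  # e.g. "www.truck.com" without the http:// prefix
        if url.lower().endswith(SKIPPED_EXTENSIONS):
            continue  # image/binary file, no text to extract
        urls.append(url)
    return urls
```

A line such as "www.truck.com" is dropped by this check because it lacks the http:// prefix, which is a common reason for a file that appears to contain no links.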

Q: When I run WDE, it uses all my computer's processing power and the screen barely refreshes. What can I do?

A: You are probably using too many threads. Decrease the thread value to "5" in the "New Session - Other" tab. WDE can launch multiple threads simultaneously, but too high a thread setting may be more than your computer and/or Internet connection can handle. It also puts an unfair load on the host server, which may slow the process down.
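
The general idea of capping the number of simultaneous threads can be sketched in Python as follows. This shows the technique in general, not how WDE is implemented; process_all and its fetch parameter are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def process_all(urls, fetch, threads=5):
    """Run `fetch` on every URL using at most `threads` worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in the same order as `urls`.
        return list(pool.map(fetch, urls))
```

With threads=5, no more than five requests are in progress at any moment, however long the URL list is. The remaining URLs wait in a queue, which keeps the load on both your computer and the host server bounded.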

Q: Can I resume an interrupted session in WDE?

A: Yes. Use the 'File - Open' menu command to open the log file of the previously stopped session.

Q: How can I add a search engine other than those listed in the Engine Listing dialog?

A: It is easy. In the "URL" field, type the search query URL, replacing the search keyword part with the WDE syntax {SEARCH_KEYWORD}.

For example, an AOL query URL for a "Flower Shop" search is:
http://search.aol.com/dirsearch.adp?query=Flower+Shop

Replace the Flower+Shop part with {SEARCH_KEYWORD}, as follows:

http://search.aol.com/dirsearch.adp?query={SEARCH_KEYWORD}

After adding the new engine listing, click the "Save" button.
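
To see what such a template produces for a given keyword, the substitution can be sketched in Python. This is an illustration, not WDE's code: build_query_url is a hypothetical name, and encoding spaces as "+" is an assumption based on the AOL example above.

```python
from urllib.parse import quote_plus

PLACEHOLDER = "{SEARCH_KEYWORD}"


def build_query_url(template, keyword):
    """Replace the {SEARCH_KEYWORD} placeholder with the URL-encoded keyword."""
    if PLACEHOLDER not in template:
        raise ValueError("template has no {SEARCH_KEYWORD} placeholder")
    # quote_plus encodes spaces as "+", matching the Flower+Shop example.
    return template.replace(PLACEHOLDER, quote_plus(keyword))
```

Calling build_query_url with the template above and the keyword "Flower Shop" gives back the original AOL query URL, which is a quick way to confirm that the placeholder was put in the right position.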

 

© Copyright 2006, WebExtractor System.