11-08-2008, 08:55 PM
Junior Member
 
Join Date: Jun 2008
Location: the US
Posts: 24
Email Extractor

Job Budget: Between $200 and $500
Bids / Views: 2 / 272
Time Remaining: 9d 23h 22m (ends Nov 18, 2008 21:15 U.S. Eastern Time)

Job Description
I need a script (Perl or PHP) that runs on a server and extracts email addresses, either from a web site that serves as a public interface to a database (http:// and https://) or from web sites containing the information I want. The script must also be compiled as a stand-alone .exe program for Windows Server 2003 and Windows XP. This shouldn't be a big issue for people who already have data scrapers/extractors, since the script does just that: it downloads pages, grabs emails from them, and saves them in an Excel or Access file.

- Extract/capture the email and, if available, information about the email's owner, plus the URL/ID, from a specified database or domain folder through URL addresses. For a US address: given name, family name, organization name, city, state, postcode and phone #. For a non-US address: given name, family name, organization name, city and state/province, country name, postcode and phone #. For example, each of the addresses listed below links to a person's profile in a database or domain folder, including name, email address, organization name, location, phone #, etc. Each address has its own profile format. I will let you know the wanted databases in detail after you win this bid. We need options to manually set which items we want to collect. For example, sometimes we may want to collect nothing but emails; at other times we may want to collect names, emails, and organization names (but nothing else), etc., depending on the URL we are visiting.

- Once we input a person's database address like those below, the program should stay in the same database and keep looping to search for the wanted information on all the people in the database, by increasing and decreasing the ID number (multi-threaded), until finished or manually stopped. The program should periodically save results so that an unexpected outage/error does not lead to data loss. Notice that the addresses below all contain a string of “ID=” or “id=”. The program should automatically change the number right after that string, retrieve the wanted information, save it in an Access or Excel file, and then loop to the next one until the database has been examined fully. Some numbers may contain no information; in that case the program should just move on to the next one.

http://www.domain.org/index.cfm?page...l.cfm&ID=49312

https://secure.domain.org/xxxxx/dire...px?DirID=79053

http://subdomain.domain.edu/WhitePag...&a=hs&r=83&kw=

http://subdomain.domain.edu/WhitePag...&a=hs&r=83&kw=


- Extract emails from a folder or subfolder in the domain: either from all of domain.com, or only from domain.com/folder and below, not from the root. Sometimes an email address is embedded under a name as a link; in that case collect the name as well as the email from the embedded link. For example, http://www.domain.org/aids/faculty.asp

- Crawl only pages within the specified URL, or within a folder of it (domain.com/folder), to a maximum crawl depth of 7-10 levels. Capture emails that cannot be manually copied.

- Multithreaded extraction of emails: connect to URLs in multiple threads for faster speed.

- Delete duplicate emails automatically at the end of the job.

- Delete all emails extracted from a given URL (if we tick the option).

- Authentication details. If it's a forum, a member needs to enter a user name/password. The script should allow entering a user name/password and get authenticated...

For further details, go to Job: Email Extractor | myTino - The World's Leading Online Outsourcing Network
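
To make two of the requirements above more concrete (staying inside a given folder up to a maximum depth, and removing duplicate emails at the end of the job), here is a rough Python sketch rather than the requested Perl/PHP deliverable. The function names `in_scope` and `dedupe_emails`, the default depth of 7, and the choice to compare emails case-insensitively are all assumptions for illustration and are not part of the job description. It also assumes the start URL names a folder (like domain.com/folder), not a single page.

```python
from urllib.parse import urlsplit

MAX_DEPTH = 7  # assumed default; the post asks for a maximum depth of 7-10


def in_scope(url, start_url, depth, max_depth=MAX_DEPTH):
    """Return True if url is on the same host as start_url, inside its folder,
    and the crawl depth does not exceed max_depth.

    Assumption: start_url names a folder (e.g. http://domain.com/folder).
    """
    if depth > max_depth:
        return False
    target, start = urlsplit(url), urlsplit(start_url)
    if target.scheme not in ("http", "https"):
        return False
    if (target.hostname or "").lower() != (start.hostname or "").lower():
        return False
    # Normalize the start path to a folder prefix ending in "/".
    folder = start.path.rstrip("/") + "/"
    path = target.path or "/"
    # Accept the folder itself ("/folder") and anything below it ("/folder/...").
    return path.startswith(folder) or path + "/" == folder


def dedupe_emails(emails):
    """Drop duplicates, keeping the first spelling seen and the original order.

    Assumption: addresses that differ only in letter case are duplicates.
    """
    seen = set()
    unique = []
    for email in emails:
        cleaned = email.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique
```

With these helpers, a page at http://domain.com/folder/page.html counts as in scope for a start URL of http://domain.com/folder, while http://domain.com/other/page.html does not, and a list such as `["A@x.org", "a@x.org ", "b@x.org"]` is reduced to two entries.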