Thursday, August 21, 2014

Deep Web Sharing: Avoiding Google

There are many reasons you might want to keep your work out of search engines. A big reason for many writers is to protect their first publication rights*. Maybe you don't want the work attached to your name when applying for jobs. Maybe you don't want it tied to your name or username and easily searchable forever and ever, because you're easily embarrassed by your old work. Or you might just be an intensely private person who doesn't want anything associated with you showing up on Google if you can avoid it.

To avoid showing up on legitimate search engines, you need to keep your page from being indexed by a crawler, a program run by a search engine to find webpages. Crawlers click through links and index any page they find, keeping track of the words, images, and links within. When someone searches for that page or similar content, the page is already indexed and can be shown in the results. Websites that can be found through search are called the "surface web." Despite the vast, vast number of hits any given search will bring up in Google or Bing, very little of the internet is actually indexed. Any part of the internet that is not a part of the surface web is called the "deep web".
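To make the link-following idea concrete, here is a minimal sketch of a crawler working over a made-up site. The page names and the SITE dictionary are invented for illustration, and a real crawler is far more elaborate, but the core behavior is the same: a page that nothing links to is never found.

```python
from collections import deque

# Hypothetical toy site: each page maps to the links that appear on it.
SITE = {
    "/home": ["/about", "/stories"],
    "/about": ["/home"],
    "/stories": ["/stories/published-tale"],
    "/stories/published-tale": [],
    "/drafts/secret-novel": [],  # exists, but no page links to it
}

def crawl(seed):
    """Follow links breadth-first from seed and return the set of pages found."""
    indexed = set()
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        if page in indexed or page not in SITE:
            continue  # already indexed, or a link to a page that doesn't exist
        indexed.add(page)
        queue.extend(SITE[page])
    return indexed

indexed = crawl("/home")
print(sorted(indexed))
print("/drafts/secret-novel" in indexed)
```

The draft page is on the site the whole time. The crawler misses it only because no indexed page points to it.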

Despite the scary name, you probably access parts of the deep web all the time. Your emails are stored on the deep web. Every time you access bank accounts or post on your blog, you're accessing the deep web, because you're using a page that is inaccessible without certain passwords. Many websites also generate pages dynamically for individual requests, such as the results page of a Google search, and these can't be indexed either. Many organizations have internal pages that are protected from crawlers, and many websites keep bots out with reCAPTCHA and other preventative measures.
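Password protection can be pictured the same way. The sketch below is a simplified stand-in for a web server, with invented page names and a made-up fetch function. A crawler has no account, so it is treated like any logged-out visitor and gets a login prompt instead of the content.

```python
# Hypothetical pages: some need a login, as a forum or inbox would.
PAGES = {
    "/welcome": {"login_required": False, "text": "Public welcome page"},
    "/critiques/chapter-one": {"login_required": True, "text": "Draft chapter"},
}

def fetch(path, logged_in=False):
    """Return (status, text) the way a very simplified web server might."""
    page = PAGES.get(path)
    if page is None:
        return 404, ""
    if page["login_required"] and not logged_in:
        return 401, ""  # the visitor gets a login prompt, not the content
    return 200, page["text"]

def crawler_index(paths):
    """Fetch every path while logged out and keep only pages that return 200."""
    index = {}
    for path in paths:
        status, text = fetch(path)
        if status == 200:
            index[path] = text
    return index

index = crawler_index(PAGES)
print(index)
```

A logged-in reader calling fetch("/critiques/chapter-one", logged_in=True) would get the draft, while the crawler's index holds only the welcome page.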

What this means for you, as a writer, is that as long as you share your work through the deep web, crawlers cannot find it. So, how can you share your work and keep it on the deep web?

Password Protected Options:

1. Emails
 
When you send work directly to others by email, only people who can log in to the receiving email address can view it. Crawlers cannot get past this protection.

2. Password Protected Forums

As with emails, password-protected forums cannot be crawled, even though they can seem public and can be viewed by a wide audience. Forums like Critique Circle are completely password protected so that users can post work, and certain sections of other sites, like Absolute Write's forums, are password protected for the purpose of sharing work.

3. "Friends Only" Blog Posts

A blog post that requires special permissions to view, like "friends only" posts on many services, cannot be crawled and indexed due to password protection. However, if a post was public first, changing its permissions to private afterward will likely leave an indexed, cached version of the page on search engines. Make sure the permissions are set correctly before posting.

4. Cloud Sharing Through Email

Many cloud services like Evernote and Copy allow you to grant permission to view files to specific email addresses. This can be very convenient because it allows readers to view changes as they happen without requiring you to send a new file.

Non-Password Protected Options:

5. Direct Links from Cloud Sharing

Many cloud services like Evernote and Copy also allow you to generate a link to your content that you can share with others. As long as nothing on the surface web links to it, this link will remain in the deep web, meaning that you can share it through emails, chat clients, etc., while the content stays out of search results. Keep in mind that anyone who has the link can view the content, so only share it with people you trust.

6. Direct Links from Text Posting Services


Text posting sites like TinyPaste have private options that do not link to your text anywhere on their websites, which prevents it from being indexed. As with blog posts, make sure that the privacy settings are set correctly before posting. These services also often allow you to set a password. If you set a password and post both the link and the password publicly, humans can easily view your page but crawlers cannot.
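For the curious, here is a rough sketch of how a direct link like the ones in options 5 and 6 might be built. This is an assumption about the general approach, not a description of how Evernote, Copy, or TinyPaste actually work. The make_share_link function and the files.example.com address are invented for illustration.

```python
import secrets

def make_share_link(base_url="https://files.example.com/s/"):
    """Build a link whose last part is a long random token (hypothetical scheme)."""
    token = secrets.token_urlsafe(16)  # 16 random bytes encoded as URL-safe text
    return base_url + token

link_a = make_share_link()
link_b = make_share_link()
print(link_a)
print(link_b)
```

Each call produces a different link that is impractical to guess, so the only way to reach the page is to be handed the link. That is also why posting such a link on a public page undoes the protection: a crawler that sees the link can follow it like any other.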


*Much of the advice I've seen about first publication rights and the internet says that to post online and still protect those rights, you must use password protection. While the last two options do not use password protection, they cannot be crawled and, if the links are only shared with a few people, can hardly be considered public. If you are concerned about your first publication rights, do your research. Here's a good place to start. I am not a lawyer and not qualified to give legal advice.

These are all methods I've used to keep my work from web crawlers -- I'm sure there are many more. If there's a big, simple one I've missed, feel free to post in the comments. Either way, have fun being secretive!
