Domain Finding Strategy: Public Data Sets

This was originally part of a private course for interns, but I’ve decided to open it up to the world. It was written by Scott, one of the original interns, who went on to head a group of V2 interns and is now my partner and the manager of my entire inventory.

Update: Although PBNs still work, they now have a history of being targeted by Google and therefore may not be the safest option. This is why we now focus on creating online businesses that are independent of SEO traffic.


Note that this is part of the expired domain series which has been compiled in order of importance here.


<Disclaimer> A lot of people have difficulty finding domains. There is a real learning curve, so please be prepared for it. For those who are just starting out, I recommend you prove the model by first purchasing a few domains, or try out our latest service, RankHero.

RankHero allows you to test the viability of this method without having to invest the hundreds of hours and thousands of dollars it takes to find, build, host and maintain your own Private Content Network. </Disclaimer>


Another related method of finding high-value domains involves tracking down existing lists of old URLs and using a variation of the Scrapebox/Xenu method. Ideally, your list should be “old” (2008 or earlier), as this gives a better probability that the domains are available and of high authority. The list should also be BIG, since the vast majority of these domains will not be available or will not be high authority. This method is more of a “shotgun” approach than a “sniper rifle” approach to finding good domains… volume over finesse. The diagram below provides an overview of the basic idea. As always, understanding the concepts is more important than repeating the process verbatim… and being creative, organized and persistent goes a long way.



Descriptions for each step:

1 – Find a list of URLs. Along the same lines as the 2006 AOL Dataset, search for big, public data sets that are available for download. Government, university, and research initiatives often have compiled data available for all sorts of reasons. Seek out data sets that would have lots of URLs… for example, census data from the 1900s would be less interesting than a directory of blogs from 2005.  Here are just a few examples of lists of lists… do a little digging and you’ll find many more:

Finding a good list is the most challenging and time-consuming part of the whole process, but it can be well worth the investment.

2 – Cleanse the list into something more workable. If the list is ~400,000 rows or fewer, use Excel to manipulate the data. If it’s greater than 400,000, use a database to remove duplicates and segment the list into chunks that can be easily manipulated in Excel.
Strip each URL down to its root domain using ‘URL Tools For Excel Add-in’ (tip of the hat to Danny for this one). From there, use the text-to-columns function in Excel (or whatever is easiest for you) to get all URLs down to bare root-domain format.

3 – A cleansed list has only unique values in bare root-domain format (i.e. no http://, www, subdomains, etc.)
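Steps 2 and 3 can be done outside of Excel as well. Here is a minimal Python sketch of the same cleansing pass, assuming an input of raw URL strings; note that it only strips the `www` prefix — properly collapsing arbitrary subdomains to a registrable root domain would need a public-suffix list, which the Excel add-in handles for you.

```python
# Sketch of steps 2-3: reduce a raw URL list to unique root-domain values.
from urllib.parse import urlparse

def to_root_domain(url: str) -> str:
    """Strip scheme, port, path, and a leading 'www.', keeping the host only."""
    if "://" not in url:
        url = "http://" + url          # urlparse needs a scheme to find the host
    host = urlparse(url).netloc.lower().split(":")[0]
    if host.startswith("www."):
        host = host[4:]
    return host

def cleanse(urls):
    """Return sorted unique root domains, dropping blank lines."""
    return sorted({to_root_domain(u) for u in urls if u.strip()})

# Example:
# cleanse(["http://www.Example.com/page", "example.com/other"])
# -> ["example.com"]
```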

4 – A bulk PR checking utility (a second hat tip to Danny) is a good way of checking PR for lots of URLs. It’s $10 to check 1,000,000 domains… faster than Scrapebox, and it won’t burn out your proxies or drain your system resources.

5 – If the PR is N/A, there is a decent chance the domain is available. Continue down the path of determining whether it is a high-value domain.

6 – Lots of PR6s and PR7s from the big list provide a good data set on which to run Xenu or SB in the hopes of finding PR4s or PR5s.

7 – Run Xenu (1 deep) or Scrapebox Link Extractor (External links only) to find pages linking from the PR6s and PR7s.
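To make step 7 concrete, here is a rough stand-in for what the Scrapebox Link Extractor (external links only) does on a single page: parse the HTML and keep only links whose host differs from the source page. The function names and the idea of passing the HTML in directly are illustrative, not part of either tool.

```python
# Minimal sketch of "external links only" extraction for one page.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collect every href found in <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def external_links(page_url: str, html: str):
    """Return absolute links whose host differs from the source page's host."""
    src_host = urlparse(page_url).netloc
    parser = LinkCollector()
    parser.feed(html)
    out = []
    for href in parser.links:
        absolute = urljoin(page_url, href)   # resolve relative links
        host = urlparse(absolute).netloc
        if host and host != src_host:        # keep off-site links only
            out.append(absolute)
    return out
```

Run this over each PR6/PR7 page and you end up with the same kind of outbound-URL list that Xenu (1 deep) would give you.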

8 – After running Xenu or SBLE, you’ll have a big list of URLs that receive a link from a PR6 or PR7. Cleanse this list for further processing.

9 – A bulk domain check is (to my knowledge) the easiest and fastest way to check domain availability through a web page. The maximum number of domains it can check at once is 4,000. Paste in a perfectly cleansed list (bare root domains) or else it will freeze up. Sort the results (by the ‘Availability’ column) for easy copy/paste.
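Because of that 4,000-domain cap, a big cleansed list has to be split into batches before it goes into the checker. A trivial sketch:

```python
# Split a cleansed domain list into checker-sized batches (step 9's 4,000 cap).
def batches(domains, size=4000):
    """Yield successive slices of at most `size` domains."""
    for start in range(0, len(domains), size):
        yield domains[start:start + size]

# Example: a 10,000-domain list becomes batches of 4000, 4000, and 2000.
sizes = [len(b) for b in batches(list(range(10000)))]
```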

10 – Domains that are available may or may not be of high value. Run them through the FREE DA Checker to find out. The export from the Free DA Checker will have DA, wwwDA (usually the same as DA, but sometimes different — take the higher value), PA and wwwPA.

11 – Evaluate the results of the export from the Free DA Checker. Any domain with EITHER (DA or wwwDA) > 25 OR (PA or wwwPA) > 30 should be selected for a second run through the PAID DA Checker.
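Step 11’s selection rule is easy to apply in code instead of eyeballing the spreadsheet. The column names below are assumed from the post’s description of the Free DA Checker export, not taken from an actual export file:

```python
# Apply step 11's filter: keep a domain when either DA column exceeds 25
# or either PA column exceeds 30.
def select_for_paid_check(rows):
    """rows: dicts with 'Domain', 'DA', 'wwwDA', 'PA', 'wwwPA' keys
    (metric values as numeric strings, as a CSV export would give you)."""
    keep = []
    for row in rows:
        da = max(float(row["DA"]), float(row["wwwDA"]))   # take the higher value
        pa = max(float(row["PA"]), float(row["wwwPA"]))
        if da > 25 or pa > 30:
            keep.append(row["Domain"])
    return keep
```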

12 – The export from the Paid DA Checker will have LRD and rdMT metrics. Combine this with the export from the Free DA Checker to arrive at the ‘Domain Value’ metric.
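The “combine” in step 12 is just a join on the domain name. Here is a hedged sketch using plain dicts; the post doesn’t give the actual ‘Domain Value’ formula, so this only merges the two exports into one row per domain for whatever scoring you apply afterward:

```python
# Join the Free and Paid DA Checker exports on the 'Domain' field (step 12).
def merge_exports(free_rows, paid_rows):
    """Return free rows enriched with the matching paid row's columns."""
    paid_by_domain = {r["Domain"]: r for r in paid_rows}
    merged = []
    for row in free_rows:
        paid = paid_by_domain.get(row["Domain"], {})  # empty if no paid row
        merged.append({**row, **paid})
    return merged
```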

13 – For domains that have good metrics, check for spam. See ‘Spam Checking‘ post for more information.

14 – For anything with PR = N/A, there is a decent chance the domain is available and may have a backlink that is PR6 or PR7.

15 – Check for spam (see the ‘Spam Checking‘ post) and also confirm that there is a real PR6 or PR7 backlink to the home page. A PR6 or PR7 linking page that has 60+ outbound links will likely NOT pass its TBPR.

16 – What’s left should be domains that are available, not spam and of high value for a PBN or Adsense site. Buy and launch!

17 – What’s left should be domains that are available, not spam and, once launched, should carry PR4 or PR5… making them good for a TLA site or PBN site. Buy and launch!

  1. This is a great guide, thanks! Could you point me in the direction of the FREE DA checker tool? Think I’ve already found a decent set of domains to try.

  2. Hi,

    I am looking for a few high-PR domains. Would it be OK to buy expired domains at a GoDaddy auction? I can verify whether the PR is real, but since the domain is expired, I want to know whether it will lose PR in the next update.

    Can anyone explain to me how expired domains
    lose their PR?

  3. Hi,

    What about a G index check? Isn’t it important?
    When I’m looking for deleted domains, 90% of them are no longer indexed. Is it better to ignore them, or to buy if all the other metrics are good?

    Thanks for all the info.

  4. Hi,

    I thought the backlinks & age on a dropped domain get reset. Is that true? If that is the case, what is the purpose of buying domains that are free to register? Your method is very interesting and I would like to test it out. But I am just a little confused about how a dropped/deleted domain can still retain its link juice and age. Thanks.



    • Most of the time all of the same metrics (original age, PA, DA, LRDs, PR) still exist when the domain is re-launched. Only if the domain was penalized by G in its former life will the penalty still exist and the site will not take on the link juice as expected. This is always possible, but can be prevented by doing a thorough check of the backlink profile to make sure it hasn’t been spammed to death.

    • I got a friend to write a script for me. Love computer science nerds! Now the domain checker is checking for all types of extensions. I only want the ones I put in!

  5. Hey, thanks for the info. How do we actually find outbound links, at step “15 – Check for spam (‘Spam Checking‘ post) and also confirm that there is a real PR6 or PR7 backlink to the home page. A PR6 or PR7 linking page that has 60+ outbound links will likely NOT pass its TBPR”?

  6. Hello,

    Thank you for this amazing post. I wonder how deep you would let Xenu go for Step 7. I know from the podcast video you did with Spencer that you said 2. If we are checking high-PR websites with PR of 7 or 8, is that still enough, or should we dig deeper? I guess I’m thinking those two initial levels would still bring us high-PR domains that are still taken. Could you please guide me?

    Thank You,
