Smallaxe of DNA-forums.com has created a platform for developing tools to help explore raw data files.
Basically, it is useful for identifying areas of your genome that may align with different regions and populations. The user interface is fairly rudimentary at this point, but it makes doing the SPSMart sort of thing quite a bit easier. It can accept 23andMe and FTDNA Family Finder data files.
Here are the guidelines for using the SNPMap tool:
The general sequence when using the tool is:
1. Load your genetic data using the File menu. The program can accept either 23andMe or FTDNA Family Finder unzipped data files. Note that the program is running locally on your computer, so your data stays with you. No need to send your data to me.
2. Select the populations you're interested in. There are two lists on the left. The top list shows the world regions. The bottom list shows the population data sets within the selected region. You can check/uncheck the regions and populations you want used in the analysis. If you've checked more than one region, then the analysis will compare those regions against each other. If you've checked only a single region, then the analysis will compare the individual populations within that region against each other. You may have to click twice on an item to toggle the check box.
3. Choose the chromosome. For speed, and to make the display of large amounts of info easier, the analysis works with a single chromosome at a time. There is a dropdown box for selecting the chromosome. If you are using FTDNA data, remember that you will have two files - one with just the X chromosome data, and one with chromosomes 1-22.
4. Click on the Recalculate button. This will perform the analysis of your data against the reference population data.
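For anyone who wants to poke at their own file outside the tool, steps 1 to 3 can be sketched roughly in Python. This is only an illustration, not how SNPMap itself is written. It assumes the usual 23andMe raw data layout (tab-separated rsid, chromosome, position, genotype, with comment lines starting with #). The function names, the sample rows, and the region/population names are made up for the example, and returning an empty list when nothing is checked is my own assumption.

```python
import io

def parse_23andme(lines):
    """Parse 23andMe-style raw data lines into a list of SNP records."""
    snps = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#' comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # ignore malformed rows
        rsid, chrom, pos, genotype = fields
        snps.append({"rsid": rsid, "chromosome": chrom,
                     "position": int(pos), "genotype": genotype})
    return snps

def comparison_groups(checked):
    """checked maps each checked region name to its checked population names.

    More than one region checked: compare the regions against each other.
    Exactly one region checked: compare the populations inside that region.
    """
    if len(checked) > 1:
        return sorted(checked)
    if len(checked) == 1:
        (populations,) = checked.values()
        return sorted(populations)
    return []  # assumption: nothing checked means nothing to compare

def snps_on_chromosome(snps, chromosome):
    """Keep only the SNPs on one chromosome, as the tool does per run."""
    return [s for s in snps if s["chromosome"] == str(chromosome)]

# Illustrative sample data, not taken from a real person's file.
sample = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs4477212\t1\t82154\tAA\n"
    "rs3094315\t1\t752566\tAG\n"
    "rs5939319\tX\t2700157\tGG\n"
)
all_snps = parse_23andme(sample)
chr1 = snps_on_chromosome(all_snps, 1)
groups_many = comparison_groups({"Region A": ["Pop 1"], "Region B": ["Pop 2"]})
groups_one = comparison_groups({"Region A": ["Pop 2", "Pop 1"]})
```

Working one chromosome at a time keeps the list you are looking at small, which is the same reason the tool gives for its dropdown.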
The symbol that looks like an equals sign with a slash through it means the person's genetic data is very unlike that region/population for that SNP, and other regions/populations are more likely. If the person is homozygous, then two of those symbols show up. One symbol means the person is at most 50% ancestry of that region/population at that location of their genome. Two symbols means the person may be 0% ancestry of that region/population at that location. The circle symbol (it's actually a happy face) means the person is very like that region/population, and the SNP is a good marker for that region/population vs. all the others being compared. One happy face means probably at least 50% that region/population, and two means probably 100% that region/population.
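The symbol rules above boil down to a small lookup table. Here is a sketch of it in Python; the names UNLIKE, LIKE and interpret are hypothetical labels for this example, not anything exposed by the program.

```python
# Hypothetical labels; the tool itself draws these as icons.
UNLIKE = "unlike"  # equals sign with a slash through it
LIKE = "like"      # circle / happy face

# (symbol, number of symbols shown) -> reading described in the text above
INTERPRETATION = {
    (UNLIKE, 1): "at most 50% ancestry from this region/population at this location",
    (UNLIKE, 2): "possibly 0% ancestry from this region/population at this location",
    (LIKE, 1): "probably at least 50% this region/population at this location",
    (LIKE, 2): "probably 100% this region/population at this location",
}

def interpret(symbol, count):
    """Return the plain-English reading for one SNP row."""
    return INTERPRETATION[(symbol, count)]

reading = interpret(LIKE, 2)
```

Keep in mind these are readings for a single SNP, so one row on its own says little; runs of consistent symbols along a stretch of chromosome are more interesting.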
As you get more familiar with the program, you can adjust the Reliable/Noisy slider to change the threshold the program uses to distinguish SNPs of interest.
By default, only SNPs of interest are displayed. If you check the All SNPs checkbox and then Recalculate, all the SNPs in your data for which there is any population data will be displayed.
If you select a row in the Populations list and then right click in the list, a menu will pop up with some options, including opening the Yale ALFRED website with more detailed information about the selected population.
You can select one or more rows in the SNP results list and then right click in the list to see a menu of options. If you select a single row, the menu will have items for opening the Yale ALFRED website with detailed population information about the selected SNP, and an item for opening the NIH database website containing a variety of detailed SNP information. If you've selected one or more rows, the menu will let you copy the list of selected SNP IDs to the clipboard. This is useful for pasting into the SNP list entry on the SPSMart website for further SNP exploration. You can also copy all the selected data to the clipboard in comma-delimited format suitable for pasting into a spreadsheet program such as Excel or OpenOffice.
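If you ever want to build the same two clipboard outputs yourself from a list of rows, a minimal Python sketch might look like this. The column names and sample rows are assumptions for illustration; the actual columns the tool copies may differ, and putting one ID per line is my guess at a convenient paste format.

```python
import csv
import io

def rsids_for_clipboard(rows):
    """One SNP ID per line, handy for pasting into an SNP list box."""
    return "\n".join(row["rsid"] for row in rows)

def rows_to_csv(rows, columns):
    """Comma-delimited text with a header row, ready for a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Illustrative rows with hypothetical column names.
selected = [
    {"rsid": "rs4477212", "chromosome": "1", "genotype": "AA"},
    {"rsid": "rs3094315", "chromosome": "1", "genotype": "AG"},
]
id_text = rsids_for_clipboard(selected)
csv_text = rows_to_csv(selected, ["rsid", "chromosome", "genotype"])
```

Using the csv module rather than joining strings by hand means any value that happens to contain a comma gets quoted properly.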