- World ,6734100000
- Afghanistan ,27145000
- Albania ,3170000
- Algeria ,33858000
- American Samoa ,67000
- Andorra ,83137
- Angola ,17024000
- Anguilla ,13000
- Antigua and Barbuda ,85000
- Argentina ,40301927
- Armenia ,3230100
- Aruba ,104000
- Australia[5] ,21475000
- Austria ,8340924
- Azerbaijan ,8629900
- Bahamas ,331000
- Bahrain ,760168
- Bangladesh ,158665000
- Barbados ,294000
- Belarus ,9690000
- Belgium ,10666866
- Belize ,288000
- Benin ,9033000
- Bermuda ,65000
- Bhutan ,658000
- Bolivia ,9525000
- Bosnia and Herzegovina ,3935000
- Botswana ,1882000
- Brazil ,188008000
- British Virgin Islands ,23000
- Brunei ,390000
- Bulgaria ,7640238
- Burkina Faso ,14784000
- Burma ,48798000
- Burundi ,8508000
- Cambodia ,14444000
- Cameroon ,18549000
- Canada ,33417600
- Cape Verde ,530000
- Cayman Islands ,47000
- Central African Republic ,4343000
- Chad ,10781000
- Chile ,16820000
- Colombia ,44603000
- Comoros ,682000
- Cook Islands ,20200
- Costa Rica ,4468000
- Croatia ,4435400
- Cuba ,11268000
- Cyprus ,794600
- Czech Republic ,10424926
- Côte d'Ivoire ,19262000
- Dem. Rep. of Congo ,62636000
- Denmark ,5489022
- Djibouti ,833000
- Dominica ,67000
- Dominican Republic ,9760000
- East Timor ,1155000
- Ecuador ,13867761
- Egypt ,75490000
- El Salvador ,6857000
- Equatorial Guinea[16] ,507000
- Eritrea ,4851000
- Estonia ,1340600
- Ethiopia ,79221000
- Falkland Islands ,3000
- Faroe Islands ,48839
- Federated States of Micronesia ,111000
- Fiji ,827900
- Finland ,5323500
- France (incl. overseas France) ,64473140
- French Guiana[14] ,209000
- French Polynesia[14] ,259596
- Gabon ,1331000
- Gambia ,1709000
- Georgia ,4382100
- Germany ,82169000
- Ghana ,23478000
- Gibraltar ,28875
- Greece ,11215000
- Greenland ,58000
- Grenada ,106000
- Guadeloupe[14] ,408000
- Guam ,173000
- Guatemala ,13354000
- Guernsey ,65726
- Guinea ,9370000
- Guinea-Bissau ,1695000
- Guyana ,738000
- Haiti ,9598000
- Honduras ,7106000
- Hong Kong ,6985260
- Hungary ,10035000
- Iceland ,320169
- India ,1140040000
- Indonesia ,228582000
- Iran ,70495782
- Iraq ,28993000
- Ireland ,4422100
- Isle of Man ,80058
- Israel ,7337000
- Italy ,59619290
- Jamaica ,2714000
- Japan ,127690000
- Jersey ,89300
- Jordan ,5924000
- Kazakhstan ,15422000
- Kenya ,37538000
- Kiribati ,95000
- Kuwait ,2851000
- Kyrgyzstan ,5317000
- Laos ,5859000
- Latvia ,2268000
- Lebanon ,4099000
- Lesotho ,2008000
- Liberia ,3750000
- Libya ,6160000
- Liechtenstein ,35365
- Lithuania ,3361100
- Luxembourg ,483800
- Macau ,538100
- Madagascar ,19683000
- Malawi ,13925000
- Malaysia ,27730000
- Maldives ,306000
- Mali ,12337000
- Malta ,410600
- Marshall Islands ,59000
- Martinique[14] ,401000
- Mauritania ,3124000
- Mauritius ,1262000
- Mayotte[14] ,186452
- Mexico ,106682500
- Moldova ,3572700
- Monaco ,33000
- Mongolia ,2629000
- Montenegro ,598000
- Montserrat ,5900
- Morocco ,31224000
- Mozambique ,21397000
- Namibia ,2074000
- Nauru ,10000
- Nepal ,28196000
- Netherlands ,16464600
- Netherlands Antilles ,192000
- New Caledonia[14] ,244600
- New Zealand ,4283700
- Nicaragua ,5603000
- Niger ,14226000
- Nigeria ,148093000
- Niue ,1600
- North Korea ,23790000
- Northern Mariana Islands ,84000
- Norway ,4790300
- Oman ,2595000
- Pakistan ,164744000
- Palau ,20000
- Palestinian territories ,3761646
- Panama ,3343000
- Papua New Guinea ,6331000
- Paraguay ,6127000
- People's Republic of China[2] ,1327000000
- Peru ,28750770
- Philippines ,90457200
- Pitcairn Islands ,50
- Poland ,38115967
- Portugal ,10617575
- Puerto Rico ,3991000
- Qatar ,841000
- Republic of China (Taiwan)[4] ,23007007
- Republic of Macedonia ,2045200
- Republic of the Congo ,3768000
- Romania ,21528600
- Russia ,141900000
- Rwanda ,9725000
- Réunion[14] ,793000
- Saint Helena ,6600
- Saint Kitts and Nevis ,50000
- Saint Lucia ,165000
- Saint Vincent and the Grenadines ,120000
- Saint-Barthélemy[14] ,8450
- Saint-Martin[14] ,33102
- Saint-Pierre and Miquelon[14] ,6125
- Samoa ,188540
- San Marino ,30800
- Saudi Arabia ,24735000
- Senegal ,12379000
- Serbia[6] ,9527100
- Seychelles ,87000
- Sierra Leone ,5866000
- Singapore ,4839400
- Slovakia ,5404784
- Slovenia ,2029000
- Solomon Islands ,506992
- Somalia ,8699000
- Somaliland ,3500000
- South Africa ,47850700
- South Korea ,48224000
- Spain ,46063500
- Sri Lanka ,19299000
- Sudan ,38560000
- Suriname ,458000
- Swaziland ,1141000
- Sweden ,9234209
- Switzerland ,7647600
- Syria ,19929000
- São Tomé and Príncipe ,158000
- Tajikistan ,6736000
- Tanzania ,40454000
- Thailand ,63038247
- Togo ,6585000
- Tokelau ,1400
- Tonga ,100000
- Trinidad and Tobago ,1333000
- Tunisia ,10327000
- Turkey ,70586256
- Turkmenistan ,4965000
- Turks and Caicos Islands ,26000
- Tuvalu ,11000
- U.S. Virgin Islands ,111000
- Uganda ,30884000
- Ukraine ,46030720
- United Arab Emirates ,4380000
- United Kingdom ,61186000
- United States ,305556000
- Uruguay ,3340000
- Uzbekistan ,27372000
- Vanuatu ,226000
- Vatican City ,800
- Venezuela ,28018018
- Vietnam ,87375000
- Wallis and Futuna[14] ,15000
- Western Sahara ,480000
- Yemen ,22389000
- Zambia ,11922000
- Zimbabwe ,13349000
Less Than Dot is a community of passionate IT professionals and enthusiasts dedicated to sharing technical knowledge, experience, and assistance. Inside you will find reference materials, interesting technical discussions, and expert tips and commentary. Once you register for an account you will have immediate access to the forums and all past articles and commentaries.
Puzzle 17: Fraud Detection
Forum rules
Always post answers in a "Hidecode" tag, so that others have a chance to answer the question too.
5 posts • Page 1 of 1
OK, we had a difficult challenge last time, so we are going to keep this one relatively simple.
The challenge is to detect falsified data sets. Given a set of numbers from a natural source (e.g. naturally occurring values like credit card payments, not machine- or human-generated values like telephone numbers), the program needs to estimate the probability that the data is naturally occurring rather than falsified.
That sounds quite hard as it stands, but don't fear: there is a simple, basic way to check whether the data is likely to be naturally occurring. "Benford's Law" describes the expected frequency of each leading digit in a list of naturally occurring values. This way, you can determine whether the source data matches the expected profile within a given threshold.
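For reference, Benford's Law puts the expected share of values with leading digit d at log10(1 + 1/d), so a leading 1 is expected about 30.1% of the time and a leading 9 only about 4.6%. A minimal Python sketch of that expected profile (the function name `benford_expected` is only illustrative, not part of the puzzle):

```python
import math


def benford_expected():
    """Return {digit: expected share} for leading digits 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    # Print the expected profile, one leading digit per line.
    for digit, share in benford_expected().items():
        print(f"{digit}: {share:.1%}")
```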
The acceptable deviation threshold is up to you, and so is the calculation of the resulting probability of falsification. You could be really strict and require exactly the standard distribution profile, or you could allow a +/- 10% variance at each data point. The choice is yours, though you will need to identify at least 2 of the 3 data sets below correctly.
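One possible reading of the "+/- 10% at each data point" option is sketched below in Python. It assumes the tolerance is absolute (10 percentage points per leading digit) rather than relative, which the puzzle does not specify; the helper names `leading_digit`, `observed_shares` and `within_tolerance` are made up for this sketch.

```python
import math
from collections import Counter


def leading_digit(value):
    """Return the first non-zero digit of a number, e.g. 27145000 -> 2."""
    for ch in str(value):
        if ch in "123456789":
            return int(ch)
    raise ValueError(f"no leading digit in {value!r}")


def observed_shares(values):
    """Return {digit: observed share} of leading digits 1-9 in `values`."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


def within_tolerance(values, tolerance=0.10):
    """True if every digit's observed share is within `tolerance`
    (absolute, an assumption) of the Benford share log10(1 + 1/d)."""
    observed = observed_shares(values)
    return all(
        abs(observed[d] - math.log10(1 + 1 / d)) <= tolerance
        for d in range(1, 10)
    )


if __name__ == "__main__":
    print(within_tolerance([2 ** n for n in range(1, 200)]))  # powers of two
    print(within_tolerance(range(100, 1000)))  # evenly spread leading digits
```

Powers of two are a classic Benford-like sequence, while the evenly spread range gives every leading digit the same share, so the two calls make a quick sanity check for whichever threshold you pick.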
The data sets...
1. This is a valid / true dataset. Your program should return a positive judgement when validating the data - you MUST get this right.
(World Populations by Country): http://en.wikipedia.org/wiki/List_of_co ... population
2. This is a valid dataset that has been moderately changed so that it is in between 'completely false' and 'completely true'. Your program should return a negative judgement when validating the data; however, the probability returned should reflect its potential ambiguity.
3. This is an invalid dataset that has been completely made up. Your program should return a negative judgement when validating the data - you MUST get this right.
The output expected from the program is simply 2 values:
- A Proposed Validity: e.g. "Valid" or "Invalid"
- The Probability of the validity: e.g. "80%"
If you have not heard of Benford's Law, an overview video can be found here: http://videos.kirix.com/data-and-the-we ... ds-law.htm
and of course, Wikipedia can also help: http://en.wikipedia.org/wiki/Benfords_law
As always, the programming language is your choice, though your code must be posted with your answer (please use the hidecode tags).
The program should output the following validity statements, and the probability percentage estimate it calculates for each:
1. Valid - xx%
2. Invalid - xx%
3. Invalid - xx%
Your program should at least identify 1 and 3 correctly, and show that the percentage probability of the data being valid decreases for each dataset (from 1-3).
Have fun...
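To make the expected output concrete, here is a rough Python sketch that parses lines in the "- Name ,number" form used for the population list and prints a verdict with a percentage. The scoring is an assumption, not the puzzle's method: the "probability" is simply 1 minus the total variation distance between the observed and Benford leading-digit shares, and the 0.85 cut-off is arbitrary. The names `parse_values`, `validity_score` and `judge` are hypothetical.

```python
import math
from collections import Counter

# Expected share of each leading digit under Benford's Law.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def parse_values(lines):
    """Pull the integer after the last comma from lines like '- Albania ,3170000'."""
    values = []
    for line in lines:
        _, sep, number = line.rpartition(",")
        if sep and number.strip().isdigit():
            values.append(int(number.strip()))
    return values


def validity_score(values):
    """Crude score in [0, 1]: 1 minus the total variation distance
    between observed and Benford leading-digit shares (an assumption)."""
    counts = Counter(int(str(v)[0]) for v in values if v > 0)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one positive integer")
    distance = 0.5 * sum(abs(counts[d] / total - BENFORD[d]) for d in range(1, 10))
    return 1.0 - distance


def judge(values, threshold=0.85):
    """Format the verdict as 'Valid - xx%' or 'Invalid - xx%'."""
    score = validity_score(values)
    verdict = "Valid" if score >= threshold else "Invalid"
    return f"{verdict} - {score:.0%}"


if __name__ == "__main__":
    sample = ["- Albania ,3170000", "- Algeria ,33858000", "- Andorra ,83137"]
    print(parse_values(sample))
    print(judge([2 ** n for n in range(1, 200)]))
    print(judge(range(100, 1000)))
```

A chi-squared test would be a more conventional way to turn the deviation into a probability; the distance-based score is used here only because it is easy to read as a percentage.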

a smile is worth a thousand kind words, so smile, it's easy! 
CODE: $5
WORKING CODE: $500
PROPERLY DESIGNED & WORKING CODE: Priceless


damber - LTD Admin

- Posts: 3117
- Joined: Tue Oct 09, 2007 1:48 pm
- Location: North Wales, UK
Re: Puzzle 17: Fraud Detection
My solution in PHP:
(code hidden in the original post)
Output:
(output hidden in the original post)
Extra:
Chart of the datasets' distributions via the Google Charts API (maybe it should be hidden too).
I can post the solution with this graph generation here, if anybody wants it...
I try to improve my English language skills. Most things i do better than this.
- tisodotsk
- Apprentice

-

- Posts: 22
- Joined: Fri Aug 08, 2008 12:45 pm
- Location: Bratislava, Slovakia
Re: Puzzle 17: Fraud Detection
Once again, tisodotsk, you've come up with the goods.
Congrats!
We'll give it one more week to see if anyone else has what it takes to step up to the mark...

damber - LTD Admin

- Posts: 3117
- Joined: Tue Oct 09, 2007 1:48 pm
- Location: North Wales, UK
Re: Puzzle 17: Fraud Detection
tisodotsk: looks like you should be resetting the deviation at the top of each loop iteration
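tisodotsk's PHP code is not shown on this page, so the following Python sketch is only a guess at the kind of issue being described: a deviation accumulator that has to be set back to zero at the start of each pass over a dataset. The function name and structure are hypothetical.

```python
import math


def total_deviation_per_dataset(datasets):
    """Return one summed absolute deviation from Benford's Law per dataset."""
    results = []
    for values in datasets:
        # Reset here. If this were initialised once above the loop, each
        # dataset's total would also include the totals of earlier ones.
        deviation = 0.0
        digits = [int(str(v)[0]) for v in values if v > 0]
        for d in range(1, 10):
            observed = digits.count(d) / len(digits)
            deviation += abs(observed - math.log10(1 + 1 / d))
        results.append(deviation)
    return results


if __name__ == "__main__":
    same_data_twice = [range(100, 1000), range(100, 1000)]
    print(total_deviation_per_dataset(same_data_twice))
```

Feeding the same data in twice should give two identical totals; if the second one comes out larger, the accumulator is leaking between iterations.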
- funkture
- Newbie

- Posts: 1
- Joined: Wed Jan 04, 2012 8:26 pm
Re: Puzzle 17: Fraud Detection
- joefkelley
- Newbie

-
- Posts: 1
- Joined: Wed Jan 16, 2013 12:20 am