Department of Computer Science and Software Engineering
CITS1401 Computational Thinking with Python

Project 1: Computing World Happiness Index

Submission deadline: 11:59pm, Monday 29 April 2019.
Value: 15% of CITS1401.
To be done individually.

You should construct a Python 3 program containing your solution to the following problem and submit your program electronically using cssubmit. No other method of submission is allowed.

You are expected to have read and understood the University's guidelines on academic conduct. In accordance with this policy, you may discuss with other students the general principles required to understand this project, but the work you submit must be the result of your own effort. Plagiarism detection, and other systems for detecting potential malpractice, will therefore be used. Besides, if what you submit is not your own work then you will have learnt little and will therefore, likely, fail the final exam.

You must submit your project before the submission deadline listed above. Following UWA policy, a late penalty of 10% will be deducted for each day (or part day), after the deadline, that the assignment is submitted. However, in order to facilitate marking of the assignments in a timely manner, no submissions will be allowed after 7 days following the deadline.

Overview

For the last few years, the United Nations Sustainable Development Solutions Network has been publishing the World Happiness Report. Details of the 2018 report can be found here. The underlying data, which you can also download from the latter URL, is a combination of data from specially commissioned surveys undertaken by the Gallup organisation, and statistical and economic data from other sources.
The web site linked above also provides the methodology for how the different data have been combined to compute the final score, most dramatically called the Life Ladder.

Here is a sample:

country      Life Ladder  Log GDP      Social       Healthy life   Freedom to   Generosity    Confidence in
                          per capita   support      expectancy     make life                  national
                                                    at birth       choices                    government
Afghanistan  2.66171813   7.460143566  0.490880072  52.33952713    0.427010864  -0.106340349  0.261178523
Albania      4.639548302  9.373718262  0.637698293  69.05165863    0.74961102   -0.035140377  0.457737535
Algeria      5.248912334  9.540244102  0.806753874  65.69918823    0.436670482  -0.194670126
Argentina    6.039330006  9.843519211  0.906699121  67.53870392    0.831966162  -0.186299905  0.305430293
Armenia      4.287736416  9.034710884  0.697924912  65.12568665    0.613697052  -0.132166177  0.246900991
Australia    7.25703764   10.71182728  0.949957848  72.78334045    0.910550177  0.301693261   0.45340696

4/19/2019 CITS1401 https://lms.uwa.edu.au/bbcswebdav/pid-1254315-dt-content-rid-18654189_1/courses/CITS1401_SEM-1_2019/project1_2019.html

The data shown above (and discussed below) can be found in the CSV formatted text file WHR2018Chapter2_reduced_sample.csv.

The actual method used to compute the Life Ladder score is quite complicated, so the aim of this Project, in brief, is to test whether simpler methods can yield similar results. In particular, the Project aims to see whether any of a range of proposed methods yields a similar ranking, when countries are ranked by Life Ladder score in descending order, i.e. from happiest on these measures to least happy. (The Wikipedia article also discusses criticisms of the World Happiness Report process.)

Looking at the data sample above, you can see that the column headers occupy the first row, the countries are listed in the first column, while the Life Ladder scores that we are seeking to emulate are in the second column. The third and subsequent columns contain the data from which you will compute your own Life Ladder scores.
However, for this exercise, please remember that the aim is not to replicate the precise Life Ladder scores, but rather to replicate the ranking of countries as a result of the Life Ladder scores.

Eye-balling the Data

In Data Science projects, it is always a good idea to "eyeball" the data before you attempt to analyse it. The aim is to spot any trends ("this looks interesting") or any issues. So, looking at the sample above (ignoring the first two columns), what do you notice?

- There is a difference in scale across the columns. Healthy Life Expectancy at Birth ranges from 52.3 to 72.8, but in general is valued in 10s, while Social Support is a value in the range 0.0 to 1.0, and Generosity has both negative and positive floating point numbers. (The problem of GDP per Capita being actually valued in the thousands, or tens of thousands, has already been solved by the data collectors taking logs.) The issue is that you don't want a particular attribute to appear significant just because it has much larger values than other attributes.
- The other thing you may have noticed is that sometimes the data is simply missing, e.g. the score for Confidence in National Government for Algeria. Any metric we propose will have to deal with such missing data (which is actually a very common problem).

Specification: What your program will need to do

Input

Your program needs to call the Python function input three times to:

- get the name of the input data file
- get the name of the metric to be computed across the normalised data for each country. The allowed names are min, mean, median and harmonic_mean.
- get the name of the action to be performed.
  The two options here are: list, which lists the countries in descending order of the computed metric, or correlation, which uses Spearman's rank correlation coefficient to compute the correlation between the ranks according to the computed metric and the ranks according to the Life Ladder score.

The order of the 3 calls is clearly important.

Output

The output, printed to standard output, will be either a listing of the countries in descending order based on the computed metric, or a statement containing the correlation value (a number between -1.0 and 1.0).

Tasks: A more detailed specification

- Use input to read in 3 strings, representing the input file name, the metric to be applied to the data from the file (excluding the first two columns) and the action to be taken to report to the user.
- Read in the CSV formatted text file. That is, fields in each row are separated by commas, e.g.

    Albania,4.639548302,9.373718262,0.637698293,69.05165863,0.74961102,-0.035140377,0.457737535
    Algeria,5.248912334,9.540244102,0.806753874,65.69918823,0.436670482,-0.194670126,

  Apart from the first field, all the other fields are either numbers (so converted using float()), or empty, which can be translated to the Python object None. Each line will be transformed into a row, represented as a list, so you end up with a list of lists.
- For each column apart from the first two, compute the largest and smallest values in the column (ignoring any None values).
- Given the maximum and minimum values for each column, normalise all the values in the respective columns. That is, each value should be normalised by transforming it to a value between 0.0 and 1.0, where 0.0 corresponds to the smallest value, and 1.0 to the largest, with other values falling somewhere between 0.0 and 1.0.
  For example, the minimum Life Expectancy years in the small dataset is 52.33952713. This is transformed to 0.0. The maximum value is 72.78334045, which is transformed to 1.0. So, working proportionally, 69.05165863 is transformed to 0.81746645. In general, the transformation is (score - min)/(max - min), where max and min are the respective maximum and minimum scores for a given column, and will, of course, differ from column to column.
- For each row, across all the columns except the first two, compute the nominated metric using the normalised values (excluding None). min is the minimum value (on the basis that a nation's happiness is bounded by the thing the citizens are grumpiest about), while mean and median are the arithmetic mean and median value (discussed in lectures). The harmonic mean of a list of numbers is defined here. For harmonic mean, apart from avoiding None values, you will also have to avoid any zeroes; the other metrics have no problem with 0. The output from this stage is a list of country,score pairs.
- The list of country,score pairs is either to be listed in order of descending score, or the Spearman's rank correlation coefficient should be computed between the country,score list that you have computed and the Life Ladder list, when sorted by descending score. You can assume there are no tied ranks, which means that the simpler form of the Spearman calculation can be used. An example of how to compute Spearman's rank correlation can be found here.

Example

    >>> happiness.main()
    Enter name of file containing World Happiness computation data: WHR2018Chapter2_reduced_sample.csv
    Choose metric to be tested from: min, mean, median, harmonic_mean mean
    Chose action to be performed on the data using the specified metric.
    Options are list, correlation correlation
    The correlation coefficient between the study ranking and the ranking using the mean metric is 0.8286

    >>> happiness.main()
    Enter name of file containing World Happiness computation data: WHR2018Chapter2_reduced_sample.csv
    Choose metric to be tested from: min, mean, median, harmonic_mean harmonic_mean
    Chose action to be performed on the data using the specified metric.
    Options are list, correlation list
    Ranked list of countries happiness scores based the harmonic_mean metric
    Australia 0.9965
    Albania 0.5146
    Armenia 0.3046
    Afghanistan 0.0981
    Argentina 0.0884
    Algeria 0.0733

The complete table is in file WHR2018Chapter2_reduced.csv.

Important

You will have noticed that you have not been asked to write specific functions. That has been left to you. However, it is important that your program defines the top-level function main(). The idea is that within main() the program calls the other functions, as described above. (Of course, these may call further functions.) The reason this is important is that when I test your program, my testing program will call your main() function. So, if you fail to define main(), my program will not be able to test your program.

Assumptions

Your program can assume a number of things:

- Anything that is meant to be a string (i.e. a name) will be a string, and anything that is meant to be a number (i.e. a score for a country) will be a number.
- The order of columns in each row will follow the order of the headings, though data in particular columns may be missing in some rows.

That being said, there are a number of error conditions that your program should explicitly test for and respond to. One example is detecting whether the named input file exists; for example, the user may have mistyped the name.
The way this test can be done is to first:

    import os

Then, assuming the file name is in variable input_filename, use the test:

    if not os.path.isfile(input_filename):
        return None

and test for None in the calling function (likely main()).

Things to avoid

There are a couple of things for your program to avoid.

- Please do not use Python's csv module. While use of the csv module is a perfectly sensible thing to do in a production setting, it takes away from much of the point of the first part of the project, which is about getting practice opening text files and processing text file data.
- Please do not assume that the input file names will end in .csv. File name suffixes such as .csv and .txt are not mandatory in systems other than Microsoft Windows.
- Please make sure your program has only 3 calls to the input() function. More than 3 will cause your program to hang, waiting for input that my automated testing system will not provide. In fact, what will happen is that the marking program detects the multiple calls, and will not test your code at all.

Submission

Submit a single Python (.py) file containing all of your functions via cssubmit.

Marking Rubric

For convenience, your program will be marked out of 20 (later scaled to be out of 15% of the final mark).

60% of the marks (12/20) will be awarded based on how well your program completes a number of tests, reflecting normal use of the program, and also how the program handles various error states, such as the input file not being present. Other than things that you were asked to assume, you need to think creatively about the inputs your program may face.

40% (8/20) will be for style (5/8), meaning the code is clear to read, and efficiency (3/8), meaning your program is well constructed and runs efficiently.
For style, think about use of comments, sensible variable names, your name at the top of the program. (Please look at your lecture notes, where this is discussed.)

Style Rubric

0    Gibberish, impossible to understand
1-2  Style is really poor
3-4  Style is good or very good, with small lapses
5    Excellent style, really easy to read and follow

For Project 1, there are not too many ways your code can be inefficient, but try to minimise the number of times your program looks at the same data items. There are particular places where you should use readline(), but not in a loop.

Efficiency Rubric

0  Code too incomplete to judge efficiency, or wrong problem tackled
1  Very poor efficiency, additional loops, inappropriate use of readline()
2  Acceptable efficiency, one or more lapses
3  Good efficiency, within the scope of the assignment and where the class is up to

Automated testing is being used so that all submitted programs are being tested the same way. Sometimes it happens that there is one mistake in the program that means that no tests are passed. If the marker is able to spot the cause and fix it readily, then they are allowed to do that and your (now fixed) program will score whatever it scores from the tests, minus 2 marks, because other students will not have had the benefit of marker intervention. Still, that's way better than getting zero, right? (On the other hand, if the bug is too hard to fix, the marker needs to move on to other submissions.)
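
The row-parsing and normalisation tasks in the detailed specification can be illustrated with a minimal sketch. It is not a model solution: the function names parse_row and normalise_columns are hypothetical, every row is assumed to have the same number of fields, and mapping a constant column to 0.0 is an assumption the specification does not cover. The sample lines are built from four rows of the data table in the overview.

```python
def parse_row(line):
    """Split one comma-separated data line into [country, float-or-None, ...]."""
    fields = line.rstrip("\n").split(",")
    # An empty field stands for missing data and becomes None.
    return [fields[0]] + [float(f) if f.strip() else None for f in fields[1:]]


def normalise_columns(rows, first_col=2):
    """Return copies of the rows with columns from first_col on scaled to 0.0..1.0."""
    normalised = [row[:] for row in rows]
    for col in range(first_col, len(rows[0])):
        present = [row[col] for row in rows if row[col] is not None]
        if not present:
            continue  # nothing to scale in a column that is entirely missing
        low, high = min(present), max(present)
        span = high - low
        for row in normalised:
            if row[col] is not None:
                # Assumption: a constant column (span == 0) is mapped to 0.0.
                row[col] = (row[col] - low) / span if span else 0.0
    return normalised


sample_lines = [
    "Afghanistan,2.66171813,7.460143566,0.490880072,52.33952713,0.427010864,-0.106340349,0.261178523",
    "Albania,4.639548302,9.373718262,0.637698293,69.05165863,0.74961102,-0.035140377,0.457737535",
    "Algeria,5.248912334,9.540244102,0.806753874,65.69918823,0.436670482,-0.194670126,",
    "Australia,7.25703764,10.71182728,0.949957848,72.78334045,0.910550177,0.301693261,0.45340696",
]
rows = [parse_row(line) for line in sample_lines]
scaled = normalise_columns(rows)

print(rows[2][-1])    # None: Algeria has no Confidence in National Government value
print(scaled[1][4])   # Albania's life expectancy, close to the 0.81746645 quoted in the task list
```

The sketch makes two passes over each column (one to find the minimum and maximum, one to rescale), which keeps the Life Ladder column untouched for the later ranking comparison.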
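
The harmonic mean rule (skip None and zero values) and the simpler, no-ties form of Spearman's calculation can be sketched as follows. This is an illustration under stated assumptions, not the marker's implementation: the names harmonic_mean, ranks and spearman are hypothetical, returning None when no usable value remains is an assumption, and the formula is the standard no-ties form 1 - 6*sum(d^2)/(n*(n^2 - 1)), where d is the difference between a country's two ranks and n is the number of countries. The scores are copied from the sample table and from the harmonic_mean listing in the example.

```python
def harmonic_mean(values):
    """Harmonic mean of the values that are neither None nor zero."""
    usable = [v for v in values if v is not None and v != 0]
    if not usable:
        return None  # assumption: no usable values means no score
    return len(usable) / sum(1 / v for v in usable)


def ranks(scores):
    """Map each country to its rank (1 = highest score); assumes no ties."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {country: position for position, country in enumerate(ordered, start=1)}


def spearman(scores_a, scores_b):
    """No-ties Spearman coefficient between two country -> score mappings."""
    rank_a, rank_b = ranks(scores_a), ranks(scores_b)
    n = len(rank_a)  # the formula needs at least two countries
    d_squared = sum((rank_a[c] - rank_b[c]) ** 2 for c in rank_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Life Ladder scores from the sample table.
ladder = {
    "Afghanistan": 2.66171813, "Albania": 4.639548302, "Algeria": 5.248912334,
    "Argentina": 6.039330006, "Armenia": 4.287736416, "Australia": 7.25703764,
}
# Rounded harmonic_mean scores from the example listing.
by_harmonic = {
    "Australia": 0.9965, "Albania": 0.5146, "Armenia": 0.3046,
    "Afghanistan": 0.0981, "Argentina": 0.0884, "Algeria": 0.0733,
}

print(harmonic_mean([0.5, 0.25, 0.0, None]))      # 2 / (2 + 4), about 0.3333
print(round(spearman(ladder, by_harmonic), 4))    # 0.1429 (sum of d^2 is 30 for these six)
```

Using dictionaries keyed by country avoids relying on the two lists being in the same order; a list of country,score pairs, as the task list describes, would work equally well with a small change to ranks.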