Commit
git-svn-id: https://pmtkdata.googlecode.com/svn/trunk@39 49fcfb0e-06c…
…3-11df-adb7-1be7f7e3f636
mattbdunham committed May 29, 2010
1 parent aaf72ce, commit 5f5f602
Showing 441 changed files with 252,891 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
% PMTKdescription | ||
% PMTKsource | ||
% PMTKtype | ||
% PMTKfileSize 35.2 KB | ||
% PMTKncases | ||
% PMTKndims |
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
NAME: 2004 New Car and Truck Data | ||
TYPE: Sample | ||
SIZE: 428 observations, 19 variables | ||
|
||
DESCRIPTIVE ABSTRACT: | ||
Specifications are given for 428 new vehicles for the 2004 year. The variables recorded include price, measurements | ||
relating to the size of the vehicle, and fuel efficiency. | ||
|
||
SOURCE: | ||
_Kiplinger's Personal Finance_, December 2003, vol. 57, no. 12, pp. 104-123, http://www.kiplinger.com (permission to post on | ||
the JSE Web site kindly granted by PARS International Corporation, 102 West 38th Street, New York, NY 10018) | ||
|
||
VARIABLE DESCRIPTIONS: | ||
|
||
Columns Variables | ||
1- 45 Vehicle Name | ||
47 Sports Car? (1=yes, 0=no) | ||
49 Sport Utility Vehicle? (1=yes, 0=no) | ||
51 Wagon? (1=yes, 0=no) | ||
53 Minivan? (1=yes, 0=no) | ||
55 Pickup? (1=yes, 0=no) | ||
57 All-Wheel Drive? (1=yes, 0=no) | ||
59 Rear-Wheel Drive? (1=yes, 0=no) | ||
61- 66 Suggested Retail Price, what the manufacturer thinks the | ||
vehicle is worth, including adequate profit for the | ||
automaker and the dealer (U.S. Dollars) | ||
68- 73 Dealer Cost (or "invoice price"), what the dealership pays | ||
the manufacturer (U.S. Dollars) | ||
75- 77 Engine Size (liters) | ||
79- 80 Number of Cylinders (=-1 if rotary engine) | ||
82- 84 Horsepower | ||
86- 87 City Miles Per Gallon | ||
89- 90 Highway Miles Per Gallon | ||
92- 95 Weight (Pounds) | ||
97- 99 Wheel Base (inches) | ||
101-103 Length (inches) | ||
105-106 Width (inches) | ||
|
||
Values are aligned and delimited with blanks. | ||
Missing values are denoted with *. | ||
|
||
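The column table above describes a fixed-width layout, so a record can be sliced by character position rather than split on whitespace (vehicle names contain blanks). The following is a minimal sketch in Python, not the MATLAB used elsewhere in this commit. The field keys (`retail`, `hwy_mpg`, and so on), the helper `make_line`, and the sample vehicle are all hypothetical and exist only for illustration; only the column positions and the `*` missing-value marker come from the description.

```python
# Column layout copied from the VARIABLE DESCRIPTIONS table (1-based, inclusive).
# The short field keys are made up for this sketch.
FIELDS = [
    ("name", 1, 45), ("sports", 47, 47), ("suv", 49, 49), ("wagon", 51, 51),
    ("minivan", 53, 53), ("pickup", 55, 55), ("awd", 57, 57), ("rwd", 59, 59),
    ("retail", 61, 66), ("dealer", 68, 73), ("engine", 75, 77),
    ("cylinders", 79, 80), ("horsepower", 82, 84), ("city_mpg", 86, 87),
    ("hwy_mpg", 89, 90), ("weight", 92, 95), ("wheelbase", 97, 99),
    ("length", 101, 103), ("width", 105, 106),
]


def parse_line(line):
    """Slice one fixed-width record into a dict; '*' (missing) becomes None."""
    line = line.rstrip("\n").ljust(106)  # pad short lines so slices stay in range
    record = {}
    for key, start, stop in FIELDS:
        raw = line[start - 1:stop].strip()
        if key == "name":
            record[key] = raw
        elif raw in ("*", ""):
            record[key] = None
        else:
            record[key] = float(raw)
    return record


def make_line(values):
    """Build a synthetic 106-character record (hypothetical helper, demo only)."""
    chars = [" "] * 106
    for (key, start, stop), value in zip(FIELDS, values):
        width = stop - start + 1
        text = str(value)
        # Names are left-aligned, numeric fields right-aligned within their columns.
        chars[start - 1:stop] = text.ljust(width) if key == "name" else text.rjust(width)
    return "".join(chars)


# Synthetic vehicle, NOT a row from the dataset: a pickup with missing length/width.
SAMPLE = ("Example Pickup 4dr", 0, 0, 0, 0, 1, 1, 0,
          20000, 18500, 3.5, 6, 200, 17, 22, 4000, 120, "*", "*")
record = parse_line(make_line(SAMPLE))
```

Slicing by position keeps the 45-character name field intact. Mapping `*` to `None` makes it easy to drop or impute incomplete rows later.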
RELATED DATASETS: | ||
|
||
A similar dataset appeared as one of the sample datasets shipped with the _Student Edition of Execustat_ (PWS-KENT 1990). | ||
|
||
Another similar dataset, submitted by Robin Lock, may be found in the _Journal of Statistics Education_ dataset archive at | ||
http://www.amstat.org/publications/jse/datasets/93cars.dat. The accompanying article "1993 New Car Data", Journal of | ||
Statistics Education, vol. 1, no. 1, by Lock (1993) may be found at the site | ||
http://www.amstat.org/publications/jse/v1n1/datasets.lock.html. | ||
|
||
COMMENTS ON MISSING VALUES: | ||
Length and Width of pickup trucks were not given in _Kiplinger's_. Possible updates to the dataset (viewable on a vehicle | ||
by vehicle basis) may be found at the site http://www.kiplinger.com/tools/carfinder/; these updates may allow some of the | ||
missing values to be filled in. | ||
|
||
PEDAGOGICAL NOTES: | ||
This is a multi-purpose dataset that can be used at many points in an introductory statistics course. It includes several | ||
numeric variables which can be examined across a choice of several categorical variables. Students tend to be familiar with | ||
most of the variables. Furthermore, they also tend to be able to anticipate and pose explanations for many of the | ||
relationships between variables. Suggested Retail Price/Dealer Cost and the two MPG variables are popular choices as | ||
response variables. One suggested exploration: try fitting the MPG variables as linear in reciprocal weight (the clear | ||
outliers from this strongly linear pattern are the hybrid gas/electric cars). Other explorations are suggested in the | ||
article of Lock (1993) referenced above. | ||
|
||
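The reciprocal-weight exploration suggested in the notes amounts to an ordinary least-squares fit of MPG against 1/weight, that is, mpg ≈ a + b / weight. Below is a minimal sketch in Python (not the MATLAB used elsewhere in this commit). The function names are hypothetical, and the numbers are synthetic values generated from a known line, not observations from the dataset.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fit_mpg_vs_reciprocal_weight(weights, mpgs):
    """Fit mpg = a + b * (1 / weight) by regressing on the transformed predictor."""
    return fit_line([1.0 / w for w in weights], mpgs)


def residuals(weights, mpgs, intercept, slope):
    """Observed minus fitted MPG; large residuals flag candidate outliers."""
    return [m - (intercept + slope / w) for w, m in zip(weights, mpgs)]


# Synthetic data built from mpg = 2 + 60000 / weight (illustration only).
weights = [2000.0, 2500.0, 3000.0, 4000.0, 5000.0]
mpgs = [2.0 + 60000.0 / w for w in weights]

intercept, slope = fit_mpg_vs_reciprocal_weight(weights, mpgs)
resid = residuals(weights, mpgs, intercept, slope)
```

On the real data, vehicles with large positive residuals would be the ones getting far better mileage than their weight predicts. According to the notes, those are the hybrids.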
SUBMITTED BY: | ||
Roger W. Johnson | ||
Department of Mathematics & Computer Science | ||
South Dakota School of Mines and Technology | ||
501 East St. Joseph Street | ||
Rapid City, South Dakota 57701 | ||
(605) 355-3450 | ||
[email protected] | ||
|
||
|
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
% Parse the file 04cars.dat from | ||
% http://www.amstat.org/publications/jse/datasets/04cars.txt | ||
|
||
if 0 | ||
fid = fopen('04cars.dat'); | ||
%dat = textscan(fid, '%45c%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d%d'); | ||
while 1 | ||
tline = fgetl(fid); | ||
if ~ischar(tline), break, end | ||
disp(tline) | ||
end | ||
fclose(fid); | ||
end | ||
|
||
%Cosma Shalizi has already parsed the data and removed rows with missing values | ||
% http://www.stat.cmu.edu/~cshalizi/350/lectures/10/cars-fixed04.dat | ||
|
||
data = importdata('04cars-fixed.csv'); | ||
X = data.data; % 387 rows, 18 features; 1-7 are binary, the rest are numeric (engine size is in liters, so not all are integers) | ||
names = data.textdata(2:end); | ||
types = [repmat('b', 1, 7) repmat('c', 1, 11)]; | ||
header = data.textdata{1}; | ||
ndx = strfind(header, ','); | ||
ndx(end+1) = length(header)+1; | ||
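varlabels = cell(1, length(ndx)); % one label per comma-separated header field | ||
start = 1; | ||
for i=1:length(ndx) | ||
stop = ndx(i)-1; % last character before the i-th delimiter | ||
varlabels{i} = header(start:stop); | ||
start = ndx(i) + 1; % first character after the delimiter | ||
end | ||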
|
||
save('04cars.mat', 'X', 'names', 'varlabels', 'types') |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
% PMTKdescription | ||
% PMTKsource | ||
% PMTKtype | ||
% PMTKfileSize 12.8 MB | ||
% PMTKncases | ||
% PMTKndims |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
14-cancer gene expression data. 16,063 genes, 144 training samples, | ||
54 test samples. | ||
|
||
One gene per row, one sample per column | ||
|
||
Cancer classes are labelled as follows: | ||
|
||
1. breast | ||
2. prostate | ||
3. lung | ||
4. colorectal | ||
5. lymphoma | ||
6. bladder | ||
7. melanoma | ||
8. uterus | ||
9. leukemia | ||
10. renal | ||
11. pancreas | ||
12. ovary | ||
13. meso | ||
14. cns | ||
|
||
Reference: | ||
|
||
S. Ramaswamy and P. Tamayo and R. Rifkin and S. Mukherjee and C.H. Yeang and | ||
M. Angelo and C. Ladd and M. Reich and E. Latulippe and J.P. Mesirov and | ||
T. Poggio and W. Gerald and M. Loda and E.S. Lander and T.R. Golub (2001) | ||
|
||
Multiclass Cancer Diagnosis Using Tumor Gene Expression Signatures | ||
|
||
Proc. Natl. Acad. Sci., 98, p15149-15154. | ||
|
||
|
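Because the description above stores one gene per row and one sample per column, most classifiers will need the matrix transposed to samples by genes, and the 1-based class labels mapped to names. This is a minimal sketch in Python (not the MATLAB used elsewhere in this commit). The function names and the toy matrix are hypothetical; only the orientation and the label list come from the description.

```python
# Class names in label order, as listed in the description (labels are 1-based).
CLASS_NAMES = [
    "breast", "prostate", "lung", "colorectal", "lymphoma", "bladder",
    "melanoma", "uterus", "leukemia", "renal", "pancreas", "ovary",
    "meso", "cns",
]


def to_samples_by_genes(genes_by_samples):
    """Transpose a genes-by-samples matrix (list of rows) to samples-by-genes."""
    return [list(column) for column in zip(*genes_by_samples)]


def label_name(label):
    """Map a 1-based class label to its name; reject out-of-range labels."""
    if not 1 <= label <= len(CLASS_NAMES):
        raise ValueError("label must be between 1 and %d" % len(CLASS_NAMES))
    return CLASS_NAMES[label - 1]


# Toy matrix, NOT real expression values: 3 genes (rows) by 2 samples (columns).
toy = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]
samples = to_samples_by_genes(toy)  # 2 samples, each with 3 gene values
first_class = label_name(1)
last_class = label_name(14)
```

After the transpose, each row is one sample's feature vector, so the full training set would have 144 rows and 16,063 columns.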
Binary file not shown.