CONCEPT Beta Release v2
README File
November 2004


1.0 Changes from beta 1:

1.1 Directory Structure

All CONCEPT code files are now in the concept subdirectory, and all data
files for the beta are in a separate subdirectory called
concept_projects/beta2. This simplifies the management of the source code
under CVS (everything under the concept directory is in source control,
everything outside the concept directory is not).

The concept script has been moved to the concept directory (there is no
longer a bin directory). All executable code files other than the main
concept script are now in subdirectories under the src subdirectory.

The main concept script and all called scripts have been rewritten so that
you can execute the concept script from any directory. You no longer have
to change directory to the concept location to run the model. If you add
the $CONCEPT_HOME directory to your PATH environment setting, you can run
CONCEPT from anywhere!

1.2 perl Must Be In Your Path

You will need the perl executable in your path. Although the perl
installation recommends a common location for the perl interpreter
(/usr/bin/perl), some distributions do not follow the guideline. Thus the
shell scripts that run the perl programs call the perl interpreter
directly (rather than relying on the "shebang" notation -
#!/usr/bin/perl).

If you are not sure whether perl is in your path, type the command
"which perl" - if the reply indicates "no perl in ..." then perl is not in
your path. You can also try "perl --version" and see if you get a
"command not found" response. Please refer to your shell documentation on
how to add locations to your path environment variable.

1.3 Data Import System

The data import subsystem has been completely rewritten. The main import
program is now in perl (concept/src/import/import.pl).
The fieldwidths.dat file has been renamed field_defs.dat (it is in the
concept/src/import directory), and it now defines the field names, types
(just C for character/date and N for numeric), and widths. The import.pl
program also processes the fields in the order given, so the file column
order no longer needs to match the database tables.

In addition, all of the tables have had their primary key constraints
defined and defaults set for all NOT NULL fields (except date fields,
which have no good default value). The import.pl program only imports the
columns that are not empty in the input files - the remaining fields are
not included in the INSERT statement. This allows PostgreSQL to assign the
proper default values to missing or empty fields (e.g., country_code which
is missing from a number of files and which now defaults to 'US').

The importer also takes a new parameter "-t" which instructs the program
to use transactions to batch the insert statements. The positive side of
this is that it runs much faster when you use the -t switch. The negative
side is that when using transactions, the entire current transaction (set
to 10000 lines of input data) is aborted if any type of error occurs. This
renders the entire import job invalid. The recommended approach is to
first try to import your data with transactions, then if you encounter
errors either fix the input files or rerun the import without
transactions. When you run without transactions, only the line with the
error is skipped - all other lines are imported.

One common type of error is a duplicate primary key error. This error can
safely be ignored, so you can run the entire import without transactions
and just ignore the duplicate lines. If you wish to speed up the import
process, delete the duplicate lines from the input files and run the
import with transactions.

One last note on the importer - if there are errors in the input, examine
the console output from the import routine.
It will contain the type of error and the line number in the input file
where the error occurred. At present, only the first 5 errors are
reported - you may change this by editing the value of the $maxErrors
variable on line 67 of the src/import/import.pl script.

1.4 Run Control File

The run control file content has been expanded somewhat. The file now
includes values for area and point source data QA levels, and a debug
parameter. The QA levels control how detailed the QA is for the NEI data
(QA routines have been defined for levels 1 and 2). The debug parameter at
this point only controls whether the detailed tables for temporal
allocation are created. If the debug parameter is set to 2 or higher, the
temporal processors create additional tables (area_temporal_debug and
point_temporal_debug) that contain the actual profiles, profile sources,
and other detailed information for every candidate emission record
(nei_area_em or nei_point_em) for the model run. Creating these tables
adds to the runtime of each model, and the output is only required for
very detailed debugging.

1.5 24 Hours Per Record

In the previous release of the CONCEPT area and point source models, the
output tables were organized with one record per hour of the model run.
This has been changed in this release such that all output tables
(starting with the final temporal allocation tables and all tables created
after that step) are created on a per-day basis with 24 hourly emission
values per record.

1.6 Numeric Scale and Precision

The NEI and RPO data tables were previously set up for unlimited numerical
precision using NUMERIC fields. This has been changed so that all number
fields have their scale and precision set to match the largest scale and
precision allowed in the NEI and RPO formats. All internally calculated
values are stored as double precision FLOAT fields.

1.7 Reference Data and NEI Inventory

The RPO and global reference files included with the second beta release
are mostly complete with the correct lookup data for the included NEI
data. The NEI is for Kentucky, and the spatial data is for the national
36k grid.


2.0 Running the CONCEPT Beta

Here are the steps for running the CONCEPT area source and point source
model betas:

2.1 Make sure that your user id has the following environment variables
set correctly:

    PGHOME - should point to the install directory for PostgreSQL

    PGDATA - should point to the directory where the PostgreSQL data
        files reside

    CONCEPT_HOME - should point to the concept subdirectory in the
        directory where you unpacked the CONCEPT beta

    CONCEPT_PROJECTS - should point to the concept_projects subdirectory
        in the directory where you unpacked the CONCEPT beta

    PATH - should contain the $CONCEPT_HOME directory and the PostgreSQL
        bin directory. Should also contain the location of the perl
        executable.

2.2 If you have not initialized the PostgreSQL database system
post-install (e.g., if PostgreSQL came pre-installed on your system), you
need to do so as follows:

    su postgres -c "initdb -D $PGDATA"

NOTE - the syntax "su postgres -c ..." is used repeatedly in this README.
If you are unfamiliar with the command, please refer to its man page for
more information. You can accomplish all of these commands directly as the
postgres user if you prefer.

2.3 If necessary, start the PostgreSQL database system:

    su - postgres
    /usr/local/pgsql/bin/postmaster -D /usr/local/pgsql/data >logfile 2>&1 &
    exit

This command will place the PostgreSQL log file ("logfile") in the
directory from which you ran the command. If you prefer a different
location, or want to use a different log file name, replace logfile with
the appropriate file specification. Please refer to the PostgreSQL
documentation for additional options.
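
The environment requirements from section 2.1 can be checked before you
start. The following is only a sketch: check_concept_env is a hypothetical
helper name and is not part of the CONCEPT distribution. It reports each
required variable that is empty or unset, and each required program
(perl, psql, and the concept script) that cannot be found through PATH.

```shell
#!/bin/sh
# Sketch only: check_concept_env is a hypothetical helper, not part of
# CONCEPT. It reports missing settings from section 2.1 of this README.
check_concept_env() {
    missing=0

    # Variables that section 2.1 says must be set.
    for name in PGHOME PGDATA CONCEPT_HOME CONCEPT_PROJECTS; do
        # eval reads the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not set: $name"
            missing=1
        fi
    done

    # Programs that must be reachable through PATH: perl (section 1.2),
    # psql (PostgreSQL bin directory) and the main concept script.
    for tool in perl psql concept; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not in PATH: $tool"
            missing=1
        fi
    done

    # Return 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

After loading the function into your shell, run "check_concept_env" and
fix any line it prints before continuing with the steps below.
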

2.4 If necessary, create the concept group in PostgreSQL:

    su postgres -c "psql -c 'create group concept' test"

This only needs to be executed once regardless of the number of projects
you create.

NOTE - the parameter "test" refers to the initial test database that you
set up during the PostgreSQL install. If you used a different database
name, you can substitute the name of any valid PostgreSQL database
instead.

2.5 If you will be running CONCEPT as a user other than the PostgreSQL
owner, add your user id to the concept group in PostgreSQL (this is not a
unix group):

    su postgres -c "createuser john"
    su postgres -c "concept add_user -n test -u john"

This only needs to be run once per user, regardless of which project you
are working on. Replace "john" with your unix user id. When you run the
first command, answer yes to the permissions questions that arise.

2.6 Create a CONCEPT project for the beta test. This will create a
database called "beta2":

    su postgres -c "concept create_project -n beta2"

This assumes that you are not logged in as the postgres user, and that the
database is running. If you have previously created the beta2 database,
either use a different name for the project, or drop the original project
using the following command:

    su postgres -c "concept drop_project -n beta2"

2.7 Initialize the beta2 project:

    concept init_project -n beta2

This command creates the project database, and creates the global lookup
tables and stored procedures.

2.8 Each CONCEPT run is executed for a specific scenario. Create a
scenario for the beta test:

    concept add_scenario -n beta2 -s scenario1

This command adds the scenario schema and creates the scenario-specific
tables.

2.9 Import the global reference data:

    concept import_globals -n beta2 -d $CONCEPT_PROJECTS/beta2/globals -t

The global reference data provided for the beta2 is "clean" so you can use
transactions (with the "-t" switch) to speed up the import.
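
For your own data, which may not be clean, section 1.3 recommends trying
the import with transactions first and rerunning without them if errors
occur. A minimal sketch of that approach follows. import_with_fallback is
a hypothetical wrapper and is not part of CONCEPT. It also ASSUMES that
the concept import commands exit with a non-zero status when an import
error occurs, which this README does not state; check this on your system
before relying on it.

```shell
#!/bin/sh
# Sketch only: import_with_fallback is a hypothetical wrapper, not part of
# CONCEPT.
# ASSUMPTION: the import command exits non-zero when an import error
# occurs. This README does not confirm that behavior.
import_with_fallback() {
    # "$@" is the complete import command WITHOUT the -t switch.
    if "$@" -t; then
        echo "imported with transactions"
    else
        echo "import with -t failed, rerunning without transactions" >&2
        # Without transactions only the lines with errors are skipped.
        "$@"
    fi
}
```

A call would look like this:

    import_with_fallback concept import_rpo -n beta2 -d $CONCEPT_PROJECTS/beta2/rpo

Always examine the console output of the rerun, as described in section
1.3, to see which input lines were skipped.
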

2.10 Import the RPO cross-reference data (again, with transactions):

    concept import_rpo -n beta2 -d $CONCEPT_PROJECTS/beta2/rpo -t

2.11 Import the control file for the scenario:

    concept import_control -n beta2 -s scenario1 \
        -c $CONCEPT_PROJECTS/beta2/scenario1/run_control.txt

2.12 Import the scenario-specific inventory data:

    concept import_nei -n beta2 -s scenario1 -d $CONCEPT_PROJECTS/beta2/scenario1/nei

If you prefer, you can import the area and point source data
independently:

    concept import_nei_area -n beta2 -s scenario1 -d $CONCEPT_PROJECTS/beta2/scenario1/nei
    concept import_nei_point -n beta2 -s scenario1 -d $CONCEPT_PROJECTS/beta2/scenario1/nei

2.13 Run the QA routines:

    concept qa_nei_area -n beta2 -s scenario1
    concept qa_nei_point -n beta2 -s scenario1

These commands run the QA routines on the NEI input data, set some default
values in the NEI and RPO tables, and prepare the integer keys for the NEI
data.

2.14 Run the area model:

    concept run_area_model -n beta2 -s scenario1 -d output_directory

This command executes the three area source modules and generates the CAMx
output files and the other outputs. The output files are written to the
directory specified by the -d parameter.

2.15 Run the point model:

    concept run_point_model -n beta2 -s scenario1

This command executes the three point source modules and generates the
CAMx output files and the other outputs. The output files are written to
the directory specified by the -d parameter.

2.16 Optional - a utility procedure is provided that counts the records in
all of the intermediate and final output tables. You can run it from the
psql program:

    psql beta2
    beta2=> set search_path=scenario1,xref,globals;
    SET
    beta2=> select table_counts();

Once this procedure is complete, the table "table_counts" contains raw
counts and some summarized counts by table.
You can create a file with this information at the shell prompt:

    psql beta2 -c "select * from scenario1.table_counts" > table_counts.txt

NOTE - you must run both the area and point source models before running
the table_counts procedure.


3.0 Optional Utility Scripts

In the $CONCEPT_PROJECTS/beta2 directory you will find four utility
scripts for running the CONCEPT betas. Make sure your environment is set
up according to the requirements listed in section 2.1. The scripts are as
follows:

    create_beta2.sh
        drops the beta2 project and recreates it. You will be prompted
        twice for a password - enter the password for the postgres user
        id.

    init_beta2.sh
        deletes and adds the "scenario1" scenario, initializes the
        scenario, runs the data imports, and runs the QA routines - the
        execution is logged to the file log.init_beta2.

    area_beta2.sh
        runs the area source model for beta2 scenario1 - the execution is
        logged to the file log.area_beta2.

    point_beta2.sh
        runs the point source model for beta2 scenario1 - the execution is
        logged to the file log.point_beta2.


4.0 Administration Notes

The CONCEPT model creates and deletes many tables during execution. If the
model performance begins to deteriorate, you can run the vacuumdb command
to clean up unused space. The command must be run as the postgres user:

    vacuumdb beta2

At some point, this may no longer improve performance to the degree
desired. In that case you should drop the CONCEPT project and recreate it:

    su postgres -c "concept drop_project -n beta2"
    su postgres -c "concept create_project -n beta2"
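
After recreating the project, the four utility scripts from section 3.0
can be run in sequence to rebuild and rerun the beta. The following is a
sketch only. run_beta2_scripts is a hypothetical helper and is not part of
CONCEPT. It ASSUMES that the scripts can be run with sh from their own
directory and that each one exits with a non-zero status on failure;
neither is stated in this README.

```shell
#!/bin/sh
# Sketch only: run_beta2_scripts is a hypothetical helper, not part of
# CONCEPT.
# ASSUMPTIONS: the utility scripts run under sh from their own directory,
# and each exits non-zero when it fails.
run_beta2_scripts() {
    dir=$1    # directory holding the scripts, e.g. $CONCEPT_PROJECTS/beta2

    for script in create_beta2.sh init_beta2.sh area_beta2.sh point_beta2.sh; do
        echo "running $script"
        # Run in a subshell so the caller's working directory is unchanged.
        if ! (cd "$dir" && sh "./$script"); then
            echo "stopped: $script failed" >&2
            return 1
        fi
    done

    echo "all scripts finished"
}
```

A call would look like this:

    run_beta2_scripts $CONCEPT_PROJECTS/beta2

Stopping at the first failure avoids running the models against a project
that was not fully initialized.
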