Find Cities and their Closest Neighbors¶
Preface¶
In this tutorial, we will use the SQL module and the Server module, along with some client side scripting to create a search of city names, which also lists the nearest cities to the selected city.
You can download the complete project from our Tutorial Repository in the citysearch directory.
License¶
The code related to this tutorial is released under the MIT license. The data used to populate the database is licensed under a Creative Commons License.
Concepts¶
This module will demonstrate:
- Using the SQL module to import csv data.
- Using the SQL module to create a Fulltext search for the client side autocomplete search.
- Using the SQL geocode functions to encode latitude/longitude pairs for each entry in the database, and to find the closest cities to any given latitude/longitude pair.
In order to complete this tutorial, you should have basic knowledge of JavaScript, SQL and client side scripting using JavaScript (a passing knowledge of jQuery is also helpful). This tutorial will not provide an in-depth explanation of how client-side scripting works.
Getting Started¶
In order to search by name and by location, we need to create a database that contains both place names and latitude/longitude information. Fortunately, the good folks at GeoNames.org and OpenDataSoft have available a CSV formatted file that accomplishes that and which is licensed under a Creative Commons License.
You can navigate to https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/export/ and choose the CSV file.
Your first task will be to create a directory structure for this project.
Web Server Layout¶
In the Rampart binary distribution is a sample web server tree. For our purposes here, we
assume you have downloaded and unzipped the Rampart binary distribution into a directory named
~/downloads/rampart
.
We will use that for this project. The
rampart-server HTTP module is configured and loaded from the included web_server/web_server_conf.js
script. It defines web_server/html
as the default directory for static html, web_server/apps
as the default
directory for scripts and web_server/data
as a standard location for databases.
To get started, copy the web_server
directory to a convenient place for this project. Also, for our
purposes, we do not need anything in the web_server/apps/test_modules
or
web_server/apps/wsapps
directories, so you can delete the copy of those
files. We will also add an empty file for our import script and web interface at web_server/apps/citysearch.js
.
user@localhost:~$ mkdir citysearch
user@localhost:~$ cd citysearch
user@localhost:~/citysearch$ cp -a ~/downloads/rampart/web_server ./
user@localhost:~/citysearch$ cd web_server/
user@localhost:~/citysearch/web_server$ rm -rf apps/test_modules wsapps/* html/index.html
user@localhost:~/citysearch/web_server$ touch ./apps/citysearch.js
user@localhost:~/citysearch/web_server$ find .
user@localhost:~/citysearch/web_server$ curl -o data/geonames-all-cities-with-a-population-1000.csv \
'https://public.opendatasoft.com/explore/dataset/geonames-all-cities-with-a-population-1000/download/?format=csv&timezone=America/Los_Angeles&lang=en&use_labels_for_header=true&csv_separator=%3B'
./web_server
./web_server/apps
./web_server/apps/citysearch.js
./web_server/html
./web_server/html/images
./web_server/html/images/inigo-not-found.jpg
./web_server/data
./web_server/data/geonames-all-cities-with-a-population-1000.csv
./web_server/logs
./web_server/start_web_server.sh
./web_server/wsapps
./web_server/web_server_conf.js
Importing the Data¶
Examine your CSV¶
The first step for importing a CSV file is to have a look at the format of the data in the file. CSV can vary quite a bit, and to import it, we will want to know a few thing such as (but not limited to):
- Does every column contain the same type?
- Are text columns quoted?
- Are single quotes present inside unquoted text fields?
- Are records separated by
,
or by something else? - Is there a header row as the first line in the file?
Let’s examine some rows of geonames-all-cities-with-a-population-1000.csv
:
$ head -n 5 geonames-all-cities-with-a-population-1000.csv
Geoname ID;Name;ASCII Name;Alternate Names;Feature Class;Feature Code;Country Code;Country name EN;Country Code 2;Admin1 Code;Admin2 Code;Admin3 Code;Admin4 Code;Population;Elevation;DIgital Elevation Model;Timezone;Modification date;LABEL EN;Coordinates
8396129;Sanjiang;Sanjiang;Sanjiang,Sanjiang Jiedao,Sanjiang Qu,san jiang,san jiang jie dao,san jiang qu,三江,三江区,三江街道;P;PPLA3;CN;China;;01;3402;;;0;;14;Asia/Shanghai;2021-09-19;China;31.34813,118.36132
8405692;Xinmin;Xinmin;Xinmin,Xinmin Zhen,xin min,xin min zhen,新民,新民镇;P;PPLA4;CN;China;;33;8739734;;;28033;;402;Asia/Shanghai;2022-04-12;China;30.39759,107.3895
8416824;Jindaoxia;Jindaoxia;Jindaoxia,Jindaoxia Zhen,jin dao xia,jin dao xia zhen,金刀峡,金刀峡镇;P;PPLA4;CN;China;;33;8739734;;;13752;;323;Asia/Shanghai;2022-04-01;China;30.00528,106.65187
8420197;Jianlong;Jianlong;Jianlong,Jianlong Xiang,jian long,jian long xiang,健龙,健龙乡;P;PPLA4;CN;China;;33;8739734;;;18151;;276;Asia/Shanghai;2022-04-01;China;29.3586,106.18522
$ grep "United States" geonames-all-cities-with-a-population-1000.csv | head -n 5
5095312;Atlantic Highlands;Atlantic Highlands;Atlantic Highlands,Atlantik Khajlands,Portland Point,Portland Poynt,atlantyk haylndz nywjrsy,Атлантик Хайландс,Атлантик Хајландс,آتلانتیک هایلندز، نیوجرسی;P;PPL;US;United States;;NJ;025;02110;;4311;11;14;America/New_York;2017-05-23;United States;40.40789,-74.03431
5095335;Avon-by-the-Sea;Avon-by-the-Sea;Avon,Avon-by-the-Sea,Ehvan-baj-zeh-Si,Key East,aywn-bay-d-sy nywjrsy,Эван-бай-зэ-Си,ایون-بای-د-سی، نیوجرسی;P;PPL;US;United States;;NJ;025;02440;;1794;4;6;America/New_York;2017-05-23;United States;40.19234,-74.01597
5095561;Belmar;Belmar;BLM,Behlmar,Belmar,Ocean Beach,blmar nywjrsy,Белмар,Бэлмар,بلمار، نیوجرسی;P;PPL;US;United States;;NJ;025;04930;;5712;4;6;America/New_York;2017-05-23;United States;40.17845,-74.0218
5096289;Carlstadt;Carlstadt;Carlstadt,Karlshtad,Karlstadt,Tailor Town,karlastadt nywjrsy,Карлстадт,Карлштад,کارلاستادت، نیوجرسی;P;PPL;US;United States;;NJ;003;10480;;6279;55;55;America/New_York;2017-05-23;United States;40.84038,-74.0907
5097627;Elmwood Park;Elmwood Park;Dundee Lake,East Paterson,Ehlmvud Park,Elmvud Park,Elmwood Park,almwwd park nywjrsy,Елмвуд Парк,Элмвуд Парк,الموود پارک، نیوجرسی;P;PPL;US;United States;;NJ;003;21300;;20279;14;14;America/New_York;2017-05-23;United States;40.90399,-74.11848
Immediately we can see that records are separated by semi-colons (;
), that the “Admin1” column has
both number and text types and that text fields are not quoted. Also the first row appears to
be column names, and not data.
Next let’s check if there are embedded single quotes in text fields:
$ grep "'" geonames-all-cities-with-a-population-1000.csv | head -n 2
8992324;Nu’erbage;Nu'erbage;Hetian Shi,Nu'erbage,Nu'erbage Jiedao,Nu’erbage,Nu’erbage Jiedao,he tian shi,nu er ba ge,nu er ba ge jie dao,努尔巴格,努尔巴格街道,和田市;P;PPLA3;CN;China;;13;6532;;;0;;1379;Asia/Urumqi;2021-09-20;China;37.1134,79.91034
12450872;Yingye'er;Yingye'er;;P;PPLA4;CN;China;;13;6532;;;11485;;1315;Asia/Urumqi;2022-03-31;China;37.37312,79.77745
And indeed there are single quotes in city names.
So to answer our earlier questions:
- Does every column contain the same type? – NO
- Are text columns quoted? – NO
- Are single quotes present inside unquoted text fields? – YES
- Are records separated by
,
? – NO, Uses semicolons - Is there a header row as the first line in the file? YES
Note that although the file is separated by semi-colons, we will continue to use the term
CSV
.
Armed with this knowledge, we are ready to create a script that imports our data.
Creating the table¶
The data will need to be imported in two stages. Stage one will be as-is from the CSV into a temporary table. Stage two will combine City, Admin1 and Country names into one field, separate and convert “Latitude,Longitute” to Numbers and compute a geocode we will use later for a bounded area search.
So let’s create a script that will make our table by opening ~/citysearch/web_server/apps/citysearch.js
in your text editor.
To begin, we need to load the SQL module and open a database;
var Sql = require("rampart-sql");
var sql = new Sql.init("~/citysearch/web_server/data/cities", true);
In the above code, the var
Sql = require("rampart-sql");
line loads the SQL module that is distributed with
Rampart as rampart-sql.so
. The second line, var sql
= new Sql.init("~/citysearch/web_server/data/cities", true);
opens the database.
Note the true
in
Sql.init()
. It
signifies that if the database does not exist, create the directory and the metadata files
necessary for a new, blank database.
When creating a new database, be sure that:
- The directory does not (yet) exist (it will be created). If it exists and does not contain the metadata files, the opening of the database will fail.
- The parent directory (in this case
~/citysearch/web_server/data/
) does exist, and that you have read/write permissions.
So now that we have the code to open, and optionally create our database, let’s make our table.
function create_tmp_table() {
sql.exec("create table cities_tmp (" +
"Geoname_ID varchar(8), " +
"Name varchar(8), " +
"ASCII_Name varchar(8), " +
"Alternate_Names varchar(8), " +
"Feature_Class varchar(8), " +
"Feature_Code varchar(8), " +
"Country_Code varchar(8), " +
"Country_name_EN varchar(8), " +
"Country_Code_2 varchar(8), " +
"Admin1_Code varchar(8), " +
"Admin2_Code varchar(8), " +
"Admin3_Code varchar(8), " +
"Admin4_Code varchar(8), " +
"Population int, " +
"Elevation int, " +
"Digital_Elevation_Model int, " +
"Timezone varchar(8), " +
"Modification_date varchar(8), " +
"LABEL_EN varchar(8), " +
"Coordinates varchar(8)" +
");" );
}
create_tmp_table();
This should all be self expanatory. If not, please brush up on your SQL.
Note though that varchar(x)
in Texis SQL, the size x
is merely a suggestion. If the
text put into this field is larger that what is specified, it will not truncate the text or
affect any indexing.
Now we have our table. Let’s get that data in!
Importing the CSV¶
The Settings¶
The SQL module has a convenient function to import CSV data. It has many options and often takes careful examination of the data to be imported to get it correct. Fortunately the large number of options allows us to get well formatted CSV data imported correctly.
See the section for importCsvFile() for a full listing.
Knowing what we discovered when we examined the allCountries.txt
file above,
we can make our importCsvFile()
call look like this:
var csvFile = "../data/geonames-all-cities-with-a-population-1000.csv";
function import_csv() {
total = sql.importCsvFile(
csvFile, //file to import
{
tableName: 'cities_tmp',
singleQuoteNest: false,
hasHeaderRow: true,
delimiter: ';',
normalize: false,
}
);
printf('\n%d rows in total.\n',total);
}
So the Object of settings passed to importCsvFile()
addresses the
answers to all our questions above (singleQuoteNest: false
is because there are
single quotes present AND they are not quoted in double quotes – one setting for those two
questions).
We are ready to import the data. But before we do, we know this will take a bit of time. Let’s set up a function to monitor the progress so we aren’t staring at a blank screen wondering when the import will be finished.
Monitoring the Import¶
There are two major stages at which we can monitor the import. The first is while
importCsvFile()
is
analysing the data, and the second is as the data is being inserted into the SQL table.
In the first, analysis stage, a monitoring function is passed two parameters: monitor_import(count, stage)
. The analysis takes at least two passes. If normalize
, is set true
, two more passes are
added for each column.
Note that the first stage is significantly faster than the second. Therefore if your dataset is not very large, you might want to skip reporting on the progress of the first page.
In the second, insert stage, a monitoring function is passed only one parameter: monitor_import(count)
. There
is only one pass for the second stage.
Knowing this, we can write a single function to print out our progress, which may be used at either of or both of the major stages.
// cuz no one likes writing out 'rampart.utils.printf()'
rampart.globalize(rampart.utils);
var total=-1; //we won't know the total until we finish the first pass of importCsvFile
var step = 100; //set in importCsvFile(), only report every 100th row
/* a single function to monitor the import for both pre-processing (progressFunc)
and import (callback function supplied to sql.importCsvFile as a paramater) */
function monitor_import(count, stg) {
var stage = "Import";
if(count==0)
printf("\n");
if(stg!==undefined) // progressfunc
stage=stg;
if(stg === 0) //differentiate between 0 and undefined
{
total=count; //update our total in the first stage.
printf("Stage: %s, Count: %d \r", stage, count);
} else {
printf("Stage: %s, Count: %d of %d \r", stage, count, total);
}
fflush(stdout);
}
Transforming the Data¶
At this point we can import the data into the temporary table. Now it needs to be transformed into its final form and put in the final table.
So we will use the following to create the table and transform the data.
function create_final_table() {
sql.exec("create table cities (" +
"id counter, " +
"place varchar(8), " +
"alt_names varchar(8), " +
"population int, " +
"latitude double, " +
"longitude double, " +
"geocode long, " +
"timezone varchar(8), " +
"country varchar(8) " +
");" );
}
function makerow(o) {
var ret={}, tmp;
ret.place = sprintf('%s, %s %s(%s)', o.Name, o.Admin1_Code, o.Country_name_EN, o.Country_Code);
ret.altNames = o.Alternate_Names;
ret.population = o.Population;
tmp = o.Coordinates.split(',');
ret.lat = parseFloat(tmp[0]);
ret.lon = parseFloat(tmp[1]);
ret.tz = o.Timezone;
ret.country = o.Country_name_EN;
return ret;
}
function build_final_table() {
printf("sorting rows\n");
sql.exec("select * from cities_tmp order by Population DESC",
function(res,i) {
if(!i) printf("done\nCreating Final Table\n");
var vals = makerow(res);
sql.exec("insert into cities values( " +
"counter, ?place, ?altNames, ?population, ?lat, ?lon, latlon2geocode(?lat, ?lon), ?tz, ?country );",
vals );
if (! (i % 100) ) {
printf("%d of %d\r", i, total);
fflush(stdout);
}
},
{maxRows:-1}
);
printf('\n');
}
create_final_table();
build_final_table();
The create_final_table()
function is easily understood.
In the makerow()
function, we take a single row from the temporary table and create the placename, separate
the coordinates and convert them to numbers and add the other columns we want to keep.
In the build_final_table()
function, we select one row at a time, transform the
row by passing it to makerow()
and insert it into our final table. Rows are returned from the
cities_tmp
table
ordered by population so that we do not need to sort them in the web application.
In addition, we will calculate the geocode necessary to do bounded geographical searches and
insert it into the geocode
column.
According to the documentation:
The latlon2geocode function encodes a given latitude/longitude coordinate
into one long return value. This encoded value – a “geocode” value – can
be indexed and used with a special variant of Texis’ BETWEEN operator for
bounded-area searches of a geographical region.
That is exactly what we need to efficiently search for other places close to a given one.
Putting it Together¶
Putting this all together, and using the callbackStep
and progressStep
settings, we end
up with this:
function create_tmp_table() {
sql.exec("create table cities_tmp (" +
"Geoname_ID varchar(8), " +
"Name varchar(8), " +
"ASCII_Name varchar(8), " +
"Alternate_Names varchar(8), " +
"Feature_Class varchar(8), " +
"Feature_Code varchar(8), " +
"Country_Code varchar(8), " +
"Country_name_EN varchar(8), " +
"Country_Code_2 varchar(8), " +
"Admin1_Code varchar(8), " +
"Admin2_Code varchar(8), " +
"Admin3_Code varchar(8), " +
"Admin4_Code varchar(8), " +
"Population int, " +
"Elevation int, " +
"Digital_Elevation_Model int, " +
"Timezone varchar(8), " +
"Modification_date varchar(8), " +
"LABEL_EN varchar(8), " +
"Coordinates varchar(8)" +
");" );
}
var total=-1; //we won't know the total until we finish the first pass of importCsvFile
var step = 100; //set in importCsvFile(), only report every 100th row
/* a single function to monitor the import for both pre-processing (progressFunc)
and import (callback function supplied to sql.importCsvFile as a paramater) */
function monitor_import(count, stg) {
var stage = "Import";
if(count==0)
printf("\n");
if(stg!==undefined) // progressfunc
stage=stg;
if(stg === 0) //differentiate between 0 and undefined
{
total=count; //update our total in the first stage.
printf("Stage: %s, Count: %d \r", stage, count);
} else {
printf("Stage: %s, Count: %d of %d \r", stage, count, total);
}
fflush(stdout);
}
function import_csv() {
total = sql.importCsvFile(
csvFile, //file to import
{
tableName: 'cities_tmp',
singleQuoteNest: false,
hasHeaderRow: true,
delimiter: ';',
normalize: false,
callbackStep: step, //callback run every 100th row
progressStep: step, //progressfunc run every 100th row for each stage
progressFunc: monitor_import //progress function while processing csv
},
monitor_import //callback function upon actual import
);
printf('\n%d rows in total.\n',total);
}
function create_final_table() {
sql.exec("create table cities (" +
"id counter, " +
"place varchar(8), " +
"alt_names varchar(8), " +
"population int, " +
"latitude double, " +
"longitude double, " +
"geocode long, " +
"timezone varchar(8), " +
"country varchar(8) " +
");" );
}
function makerow(o) {
var ret={}, tmp;
ret.place = sprintf('%s, %s %s(%s)', o.Name, o.Admin1_Code, o.Country_name_EN, o.Country_Code);
ret.altNames = o.Alternate_Names;
ret.population = o.Population;
tmp = o.Coordinates.split(',');
ret.lat = parseFloat(tmp[0]);
ret.lon = parseFloat(tmp[1]);
ret.tz = o.Timezone;
ret.country = o.Country_name_EN;
return ret;
}
function build_final_table() {
printf("sorting rows\n");
sql.exec("select * from cities_tmp order by Population DESC",
function(res,i) {
if(!i) printf("done\nCreating Final Table\n");
var vals = makerow(res);
sql.exec("insert into cities values( " +
"counter, ?place, ?altNames, ?population, ?lat, ?lon, latlon2geocode(?lat, ?lon), ?tz, ?country );",
vals );
if (! (i % 100) ) {
printf("%d of %d\r", i, total);
fflush(stdout);
}
},
{maxRows:-1}
);
printf('\n');
}
create_tmp_table();
import_csv();
create_final_table();
build_final_table();
Building Indexes¶
The geocode index¶
We need to create an index to speed up the bounded “geocode” search we will use in our web application. And we should do that in our script. And once again, we are going to wrap it in a function.
function make_geocode_index() {
printf("creating index on geocode\n");
sql.exec("create index cities_geocode_x on cities(geocode) WITH INDEXMETER 'on';");
}
A couple of things to note:
First – when this index is made, it will backed by a file named cities_geocode_x.btr
. It is so
named because it will be easy to find using ls -l
(it will come right before the table, named cities.tbl
, and it lets you
know the field indexed (_geocode
) and the type of index (_x
for plain index, i.e. - not
Fulltext or unique). Your methodology for naming indexes is not as important as making sure
you are consistent, can read it and know what the index is for without having to resort to
looking it up.
However if you do not know what an index is for, you can always look it up in the System Catalog:
$ tsql -d ../data/cities/ "select * from SYSINDEX where NAME='cities_geocode_x'"
NAME TBNAME FNAME COLLSEQ TYPE NON_UNIQUE FIELDS PARAMS
------------+------------+------------+------------+------------+------------+------------+------------+
cities_geocode_x cities cities_geocode_x A B 01 geocode stringcomparemode=unicodemulti,respectcase;
Second – note the WITH
INDEXMETER 'on'
in the SQL
statement. This will tell Texis to print a pretty meter to let you know the progress of your
index creation.
The id index¶
Eventually, when we write our web application script, we will want to look up records based
on the id counter
field. So we will make a function to make an index on that as
well.
function make_id_index(){
printf("creating index on id\n");
sql.exec("create index cities_id_x on cities(id) WITH INDEXMETER 'on';");
}
The Fulltext index¶
When making a Fulltext index, often it is best to leave most settings as is. The one exception is the rex expressions used to define what is a word. Often we want to make sure we include utf-8 encoded text.
The default is \alnum{2,99}
, which is similar to doing mydoc.match(/[a-zA-Z0-9]+/g)
.
Since we are processing utf-8 text, and since we have names of places from all over the
world, we had better accommodate bytes larger than 0x79
.
To change the expression used during Fulltext index creation, we can use delExp
and addExp. However as a shortcut, we can specify the expressions that
will be used in the SQL statement itself by utilizing the WITH
keyword:
WITH WORDEXPRESSIONS ('[\\alnum\\x80-\\xFF]{2,99}')
That will find words consisting of 7-bit ascii letters and numbers, plus utf-8 multibyte characters as well.
Now we also have a database that does not contain normal text. It is worth thinking about
where this might bite us when we perform a search. Let’s look at the list of noise
words. The excluded us
, to
and in
look suspect. They are the same as some of our country codes. And
or
is an
Admin1
abbreviation for Oregon.
Normally when searching normal English text, removing all the noise words would hurt performance. However, in this very specific circumstance, noise words serve no purpose. Further, they will exclude some codes we want in our index. So we will delete the noiseList.
You should always have a very good reason for altering or deleting the noiseList. For English text, there is rarely a need to do so and it can have adverse consequences on performance and quality of results. However, this time we do have a good reason.
Now - what exactly do we want to index? Naturally we want to be able to look up a place based
upon the place
column. So we will create one index on it. However we also have alternate names, which we
will want to search if there are no matches from place
. Thus we will create a
separate index on alt_names
as well.
So, let’s see our SQL statements to create the Fulltext indexes in its own function:
function make_text_indexes() {
printf("creating indexes on place names\n");
// noiselist as detailed at https://rampart.dev/docs/sql-set.html#noiselist
// This is not English text and some geographic abbreviations like OR IN DO TO SO and US
// are also on the noise words list. Setting to empty will allow such words in the index.
sql.set({ noiseList:[]});
// make compact index. Sorting by population, not by likep rank. See like3 search below.
sql.exec("create fulltext index cities_place_ftx on cities(place)"+
" WITH WORDEXPRESSIONS ('[\\alnum\\x80-\\xFF]{2,99}') INDEXMETER 'on' WORDPOSITIONS 'off';");
sql.exec("create fulltext index cities_altNames_ftx on cities(alt_names)"+
" WITH WORDEXPRESSIONS ('[\\alnum\\x80-\\xFF]{2,99}') INDEXMETER 'on' WORDPOSITIONS 'off';");
}
Note that we also have WORDPOSITIONS 'off'
. This omits the position of
the indexed words and creates a smaller index. Normally we would want this information in
order to rank by relevance, taking into account word proximity, order and placement in the
document. However in our web application we will be ordering by population, so this
information is not needed.
See the documentation for more information.
The Final Import Script¶
We put it all together, wrap it in a function, and it looks something like this:
// cuz no one likes writing out 'rampart.utils.printf()'
rampart.globalize(rampart.utils);
var Sql = require("rampart-sql");
var sql = new Sql.init("~/citysearch/web_server/data/cities", true);
var csvFile = "../data/geonames-all-cities-with-a-population-1000.csv";
function import_data(){
function create_tmp_table() {
sql.exec("create table cities_tmp (" +
"Geoname_ID varchar(8), " +
"Name varchar(8), " +
"ASCII_Name varchar(8), " +
"Alternate_Names varchar(8), " +
"Feature_Class varchar(8), " +
"Feature_Code varchar(8), " +
"Country_Code varchar(8), " +
"Country_name_EN varchar(8), " +
"Country_Code_2 varchar(8), " +
"Admin1_Code varchar(8), " +
"Admin2_Code varchar(8), " +
"Admin3_Code varchar(8), " +
"Admin4_Code varchar(8), " +
"Population int, " +
"Elevation int, " +
"Digital_Elevation_Model int, " +
"Timezone varchar(8), " +
"Modification_date varchar(8), " +
"LABEL_EN varchar(8), " +
"Coordinates varchar(8)" +
");" );
}
var total=-1; //we won't know the total until we finish the first pass of importCsvFile
var step = 100; //set in importCsvFile(), only report every 100th row
/* a single function to monitor the import for both pre-processing (progressFunc)
and import (callback function supplied to sql.importCsvFile as a paramater) */
function monitor_import(count, stg) {
var stage = "Import";
if(count==0)
printf("\n");
if(stg!==undefined) // progressfunc
stage=stg;
if(stg === 0) //differentiate between 0 and undefined
{
total=count; //update our total in the first stage.
printf("Stage: %s, Count: %d \r", stage, count);
} else {
printf("Stage: %s, Count: %d of %d \r", stage, count, total);
}
fflush(stdout);
}
function import_csv() {
total = sql.importCsvFile(
csvFile, //file to import
{
tableName: 'cities_tmp',
singleQuoteNest: false,
hasHeaderRow: true,
delimiter: ';',
normalize: false,
callbackStep: step, //callback run every 100th row
progressStep: step, //progressfunc run every 100th row for each stage
progressFunc: monitor_import //progress function while processing csv
},
monitor_import //callback function upon actual import
);
printf('\n%d rows in total.\n',total);
}
function create_final_table() {
sql.exec("create table cities (" +
"id counter, " +
"place varchar(8), " +
"alt_names varchar(8), " +
"population int, " +
"latitude double, " +
"longitude double, " +
"geocode long, " +
"timezone varchar(8), " +
"country varchar(8) " +
");" );
}
function makerow(o) {
var ret={}, tmp;
ret.place = sprintf('%s, %s %s(%s)', o.Name, o.Admin1_Code, o.Country_name_EN, o.Country_Code);
ret.altNames = o.Alternate_Names;
ret.population = o.Population;
tmp = o.Coordinates.split(',');
ret.lat = parseFloat(tmp[0]);
ret.lon = parseFloat(tmp[1]);
ret.tz = o.Timezone;
ret.country = o.Country_name_EN;
return ret;
}
function build_final_table() {
printf("sorting rows\n");
sql.exec("select * from cities_tmp order by Population DESC",
function(res,i) {
if(!i) printf("done\nCreating Final Table\n");
var vals = makerow(res);
sql.exec("insert into cities values( " +
"counter, ?place, ?altNames, ?population, ?lat, ?lon, latlon2geocode(?lat, ?lon), ?tz, ?country );",
vals );
if (! (i % 100) ) {
printf("%d of %d\r", i, total);
fflush(stdout);
}
},
{maxRows:-1}
);
printf('\n');
}
function make_geocode_index() {
printf("creating index on geocode\n");
sql.exec("create index cities_geocode_x on cities(geocode) WITH INDEXMETER 'on';");
}
function make_id_index(){
printf("creating index on id\n");
sql.exec("create index cities_id_x on cities(id) WITH INDEXMETER 'on';");
}
function make_text_indexes() {
printf("creating indexes on place names\n");
// noiselist as detailed at https://rampart.dev/docs/sql-set.html#noiselist
// This is not English text and some geographic abbreviations like OR IN DO TO SO and US
// are also on the noise words list. Setting to empty will allow such words in the index.
sql.set({ noiseList:[]});
// make compact index. Sorting by population, not by likep rank. See like3 search below.
sql.exec("create fulltext index cities_place_ftx on cities(place)"+
" WITH WORDEXPRESSIONS ('[\\alnum\\x80-\\xFF]{2,99}') INDEXMETER 'on' WORDPOSITIONS 'off';");
sql.exec("create fulltext index cities_altNames_ftx on cities(alt_names)"+
" WITH WORDEXPRESSIONS ('[\\alnum\\x80-\\xFF]{2,99}') INDEXMETER 'on' WORDPOSITIONS 'off';");
}
function drop_tmp_table() {
sql.exec("drop table cities_tmp");
}
create_tmp_table();
import_csv();
create_final_table();
build_final_table();
make_geocode_index();
make_id_index();
make_text_indexes();
drop_tmp_table();
}
import_data();
Web Server Script¶
Web Server Script Mapping¶
We will start by adding to the apps/citysearch.js
file in order to map functions to urls for use with
Server
module. So we add the following stub script to what we have above to get started:
...
// var to hold the client-side javascript
var client_script;
// var to hold a format string containing the top of the page
// html and a place for the query string.
var pageTopFmt;
// the bottom of our html page
var pageBottom;
// function stubs
function htmlpage(req) {
}
function autocomp(req) {
}
// module and module.exports are set when called from the webserver
if(module && module.exports) {
// url to function mapping
module.exports= {
"/": htmlpage, //http://localhost:8088/apps/citysearch/
"/index.html": htmlpage, //http://localhost:8088/apps/citysearch/index.html
"/autocomp.json": autocomp, //http://localhost:8088/apps/citysearch/autocomp.json
}
} else {
// called from the command line. Build the database.
import_data();
}
Notice that module.exports
is set to an Object. This allows
a single module script to serve pages at multiple URLs. Here /
and /index.html
will be used to
return html and /autocomp.json
will be used for AJAX requests and will return JSON.
Delivering Static Content¶
Looking at the stub script above, there are two distinct times that the code therein will be
run: 1) once upon module load and 2) upon each request from a client as mapped in the
module.exports
Object. For security purposes, it is important to keep this in mind
as you write scripts. Variables set in the module outside of the htmlpage()
, and autocomp()
functions are long
lived and span multiple requests.
That being said, we can set a few variables in the script to deliver the static content. They only need to be loaded once as they contains the HTML and client side JavaScript that will be delivered to every client, and they do not change.
// the autocomplete plugin from https://github.com/devbridge/jQuery-Autocomplete
// jquery and plugin included from cloudflare in <script src="xyz"> tags below in pageTopFmt.
var client_script = `
$(document).ready(function(){
$('#cstextbox').autocomplete(
{
serviceUrl: '/apps/citysearch/autocomp.json',
minChars: 2,
autoSelectFirst: true,
showNoSuggestionNotice: true,
triggerSelectOnValidInput: false,
onSelect: function(sel) { window.location.assign("./?id="+sel.id); }
}
);
$('#cstextbox').on('keypress', function(e){
var key = e.charCode || e.keyCode || 0;
if (key == 13) { // on <return> don't submit form
e.preventDefault();
return false;
}
});
});
`;
// pageTopFmt is defined once upon script load here rather than upon each request in
// htmlpage() below. format code %w removes leading white space.
var pageTopFmt=sprintf('%w',`<!DOCTYPE HTML>
<html>
<head><meta charset="utf-8">
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery.devbridge-autocomplete/1.4.11/jquery.autocomplete.min.js"></script>
<style>
body,h1,h2,h3,h4,h5,h6 {font-family: "Varela Round", Sans-Serif;}
.autocomplete-suggestions {border: 1px solid #999; background: #FFF; overflow: auto; width: auto !important; padding-right:5px;}
.autocomplete-suggestion { padding: 2px 5px; white-space: nowrap; overflow: hidden; }
.autocomplete-suggestions strong {font-weight: normal; color: #3399FF; }
.autocomplete-group strong { display: block; border-bottom: 1px solid #000; }
.autocomplete-selected { background: #F0F0F0; }
.autocomplete-group { padding: 2px 5px; }
#main {background-color: white;margin: auto;min-height: 300px;width: 600px;}
#idiv { width:500px;height:39px;border-bottom: lightGray 1px solid;padding:15px 0px 15px 0px;}
#cstextbox {min-width:150px;width:100%%;height:30px;font:normal 18px arial,sans-serif;padding: 1px 3px;border: 2px solid #ccc;box-sizing: bord
</style>
<title>City Search Tutorial</title>
</head>
<body>
<div id="main">
<form id="mf">
<div id="idiv">
<input type="text" id="cstextbox" name="q" value="%s" placeholder="Search for a city">
</div>
</form>
<div id="res">`);
var pageBottom = sprintf(`</div></body><script>
%w
</script></html>`, client_script);
The variable pageTopFmt
is set to a basic web page with a form and text box for
searching. The format code "%s'
in the input text box will be filled with the current query. Note
because this is a format string, all other literal percent signs (%
) must be escaped as
%%
.
In pageBottom, we will include our client-side JavaScript
from the variable
``client_script
.
Note the use of the format code %w
. This removes leading white space so that our source can be indented,
but we don’t send unneeded white space to the client.
We will use the text input in the form to search for cities and load the nearest cities,
displaying the results in the <div id="res"></div>
div at the bottom
of the page.
Next we need to fill in our actual functions that deliver the HTML as well as JSON via AJAX for a type-ahead search.
The Nearest City Results¶
The main page is delivered via normal HTTP request and returns results formatted as HTML. It
will display the nearest 30 cities to the one set in the query string parameter id
.
To do so, we need to construct an appropriate SQL statement that looks up our city by
id
.
SELECT place, latitude, longitude
FROM cities WHERE
id=?;
After retrieving the latitude and longitude of the current city, we will need another SQL
statement to find the closest cities to it. Here, ?lat
/?lon
correspond to the latitude
and longitude retrieved from the above SQL statement, while latitude
/longitude
correspond to the
latitude and longitude in the currently selected row.
SELECT
place, id, latitude, longitude, population,
DISTLATLON(?lat, ?lon, latitude, longitude) dist,
AZIMUTH2COMPASS( AZIMUTHLATLON(?lat, ?lon, latitude, longitude), 3 ) heading
FROM cities WHERE
geocode BETWEEN (SELECT LATLON2GEOCODEAREA(?lat, ?lon, 1.0))
ORDER BY 6 ASC;
Taking it line by line:
SELECT
- We are
looking up and returning rows in the table.
place
- Our
preformatted place name.
id, latitude, longitude, population
- Other columns we need.
DISTLATLON(?lat,
?lon, latitude, longitude) dist
- The distance from our current
city to the one in this row.
AZIMUTH2COMPASS(
AZIMUTHLATLON(?lat, ?lon, latitude, longitude), 3
) heading
– The heading (direction)
from our current city to the one in this row.
FROM cities WHERE
- The name of the table, and
WHERE
for the search
on the next line.
geocode BETWEEN (SELECT LATLON2GEOCODEAREA(?lat, ?lon, 1.0))
- select only rows within a certain dist from lat
/lon
(one degree).
ORDER BY 6 ASC
– order by the
6th selected field in the SQL statement. In this case that is dist
.
With that, we can write the function to find the closest cities to ?id=<cityid>
and format
the results in HTML.
var useKilometers = true;
var distconv = 1;
var distvar = "miles";
if(useKilometers) {
distconv=1.60934;
distvar = "kilometers";
}
function htmlpage(req) {
var id = req.params.id, lat, lon;
// check if we already have a place id.
if(id){
id_res= sql.one("SELECT place, latitude, longitude " +
"FROM cities WHERE id=?;",
[req.params.id]
);
// yes, then set lat,lon vars
if(id_res) {
lon=id_res.longitude;
lat=id_res.latitude;
}
} else {
// no, just print the blank search form
req.printf(pageTopFmt,''); // add top of page to return buffer without a query.
return({html:pageBottom}); // add bottom of page, return with 'content-type:text/html'
}
// what to do if the query_string id is not found in the db
if(!lon || !lat) {
req.printf(pageTopFmt,'');
req.printf('No entry for id "%s".', id);
return({html:pageBottom});
}
/* here we select rows based on their distance from the place specified by 'id',
calculate the distance and direction between id and the selected city,
then sort by the distance from 'id' (field 6 in our sql statement) */
res = sql.exec(`SELECT
place, id, latitude, longitude, population,
DISTLATLON(?lat, ?lon, latitude, longitude) dist,
AZIMUTH2COMPASS( AZIMUTHLATLON(?lat, ?lon, latitude, longitude), 3 ) heading
FROM cities WHERE geocode BETWEEN (SELECT LATLON2GEOCODEAREA(?lat, ?lon, 1.0))
ORDER BY 6 ASC;`,
{lat:lat, lon:lon},
{maxRows: 31}, // first row is same city
function(res, i) { // foreach city retrieved:
if(!i) {
// this is our 'id' city, as it is closest to itself.
req.printf(pageTopFmt,res.place);
req.printf('<h3 style="margin-bottom:0px">%s</h3><ul style="margin-top:0px">',res.place);
} else {
// all other nearby cities we will print the direction and distance:
req.printf('<a href="?id=%s">%s</a><br><ul>' +
'<li>Direction: %.2f %s to the %s</li>',
res.id, res.place, res.dist * distconv, distvar, res.heading);
}
// some useful information to go along with the city name
req.printf("<li>Population: %s</li>" +
'<li>Location: <a target="_blank" href="https://maps.google.com/maps?z=11&q=%U&ll=%f,%f">' +
'google maps (%.4f,%.4f)</a></li></ul>',
Sql.stringFormat('%ki', res.population), res.place, res.latitude, res.longitude , res.latitude, res.longitude);
if(!i) req.put('<hr><h3>Closest Cities:</h3>');
}
);
return {html:pageBottom}; //pageBottom is added to same buffer as is used with req.printf()
}
Also added is the variable useKilometers
, which will flag the conversion of miles to kilometers.
The Auto-Complete Results¶
The ajax portion of the script allows for suggestions/type-ahead to be displayed in the HTML
text box, making it easier to find a city name. This is done over an AJAX request, returning
JSON to the jQuery-Autocomplete
script included in the HTML from CloudFlare.
First step will be to construct our SQL statement and query in order to get a list of cities that are closest to a city or lat/lon requested.
When we made our Fulltext indexes on our table, we used the field place
for the primary search and
alt_names
for a
follow-up search, if necessary.
Our first search uses the following SQL
SELECT place value, id, latitude, longitude, population
FROM cities WHERE
place LIKE3 ? ;
Taking it line by line:
SELECT
- We are
looking up and returning rows in the table.
place
- Our
preformatted place name.
id, latitude, longitude, population
- Other columns we need.
FROM cities WHERE
- The name of the table, and
WHERE
for the search
on the next line.
place
- Specifying
the field upon which the Fulltext index is build.
LIKE3
- This
signifies a Fulltext search where word positions are not significant. Usually
LIKEP
will be your
most used type of like
search for normal Fulltext queries. However, here we made an index
without word positions and are pre-sort by population. Thus LIKE3
is appropriate. See
The documentation for more information.
The ?
corresponds to
a variable we will give sql.exec()
, as explained below.
The second SQL statement will be similar except it searches against the alt_names
column should no match
be found in the first query.
Finally the results are formatted as required by jQuery-Autocomplete
and returned
to the client as JSON.
We will also format the incoming query to allow for a partial string match. Using the example
of be
as a search,
our script will add a *
to it and pass it to sql.exec()
.
Texis’ Fulltext search will first search its index for all words beginning with be
and add them to the query.
Since this could potentially be thousands of words added to our query, we need to make a few
adjustments to the default limits.
sql.set({
'qMaxWords' : 5000, // allow query and sets to be larger than normal for '*' wildcard searches
'qMaxSetWords' : 5000 // see https://rampart.dev/docs/sql-set.html#qmaxsetwords
// and https://rampart.dev/docs/sql-set.html#qmaxwords .
});
We also want to continue to ignore noise words.
sql.set({
noiseList : [], // allow search for 'the', 'us', 'or', etc.
'qMaxWords' : 5000, // allow query and sets to be larger than normal for '*' wildcard searches
'qMaxSetWords' : 5000 // see https://rampart.dev/docs/sql-set.html#qmaxsetwords
// and https://rampart.dev/docs/sql-set.html#qmaxwords .
});
Our completed AJAX function now looks like this:
// For autocomp. This needs to be set only once
sql.set({
noiseList : [], // allow search for 'the', 'us', 'or', etc.
'qMaxWords' : 5000, // allow query and sets to be larger than normal for '*' wildcard searches
'qMaxSetWords' : 5000 // see https://rampart.dev/docs/sql-set.html#qmaxsetwords
// and https://rampart.dev/docs/sql-set.html#qmaxwords .
});
/* autocomp() results must be formatted as such:
{
"suggestions": [
{"value":"Vaulion, Canton de Vaud, CH","id":"6233eaf65bd","latitude":46.6848,"longitude":6.3832, ...},
{"value":"Vallorbe, Canton de Vaud, CH","id":"6233eaf65c6","latitude":46.7078,"longitude":6.3714, ...},
...
]
}
*/
function autocomp(req) {
var res;
var q = req.query.query;
// remove any spaces at the beginning of q
q = q.replace(/^\s+/, '');
// if query is only one char, return an empty set
// (even though client-side autocomplete is set to 2 char min)
if(q.length<2)
return {json: { "suggestions": []}}
// we will need at least two chars in our last word since it will get a '*' wildcard added to it
q = q.replace(/ \S$/, ' ');
// if last character is not a space, add wildcard
if(q.charAt(q.length-1) != ' ')
q += '*';
// perform a like3 (no rank) pre-sorted by pop text search, and return a list of best matching locations
res = sql.exec("SELECT place value, id, latitude, longitude, population FROM cities WHERE "+
"place LIKE3 ? ;", [q] );
//if no results, try again using alt_names
if(res.rowCount == 0) {
res = sql.exec("SELECT place value, alt_names,id, latitude, longitude, population FROM cities WHERE " +
"alt_names LIKE3 ? ;", [q] );
// add alt name to "value" for type ahead display
for (var i=0; i<res.rows.length;i++) {
var row = res.rows[i];
var ql = req.query.query.toLowerCase();
var anames = row.alt_names.split(',');
for (var j=0; j<anames.length;j++) {
var aname = anames[j].toLowerCase();
if(aname.indexOf(ql) > -1) {
row.value += ' (aka: ' + aname + ')';
break;
}
}
}
}
return {json: { "suggestions": res.rows}};
}
The Complete Server-Side Script¶
...
var useKilometers = true;
var distconv = 1;
var distvar = "miles";
if(useKilometers) {
distconv=1.60934;
distvar = "kilometers";
}
function htmlpage(req) {
var id = req.params.id, lat, lon;
// check if we already have a place id.
if(id){
id_res= sql.one("SELECT place, latitude, longitude " +
"FROM cities WHERE id=?;",
[req.params.id]
);
// yes, then set lat,lon vars
if(id_res) {
lon=id_res.longitude;
lat=id_res.latitude;
}
} else {
// no, just print the blank search form
req.printf(pageTopFmt,''); // add top of page to return buffer without a query.
return({html:pageBottom}); // add bottom of page, return with 'content-type:text/html'
}
// what to do if the query_string id is not found in the db
if(!lon || !lat) {
req.printf(pageTopFmt,'');
req.printf('No entry for id "%s".', id);
return({html:pageBottom});
}
/* here we select rows based on their distance from the place specified by 'id',
calculate the distance and direction between id and the selected city,
then sort by the distance from 'id' (field 6 in our sql statement) */
res = sql.exec(`SELECT
place, id, latitude, longitude, population,
DISTLATLON(?lat, ?lon, latitude, longitude) dist,
AZIMUTH2COMPASS( AZIMUTHLATLON(?lat, ?lon, latitude, longitude), 3 ) heading
FROM cities WHERE geocode BETWEEN (SELECT LATLON2GEOCODEAREA(?lat, ?lon, 1.0))
ORDER BY 6 ASC;`,
{lat:lat, lon:lon},
{maxRows: 31}, // first row is same city
function(res, i) { // foreach city retrieved:
if(!i) {
// this is our 'id' city, as it is closest to itself.
req.printf(pageTopFmt,res.place);
req.printf('<h3 style="margin-bottom:0px">%s</h3><ul style="margin-top:0px">',res.place);
} else {
// all other nearby cities we will print the direction and distance:
req.printf('<a href="?id=%s">%s</a><br><ul>' +
'<li>Direction: %.2f %s to the %s</li>',
res.id, res.place, res.dist * distconv, distvar, res.heading);
}
// some useful information to go along with the city name
req.printf("<li>Population: %s</li>" +
'<li>Location: <a target="_blank" href="https://maps.google.com/maps?z=11&q=%U&ll=%f,%f">' +
'google maps (%.4f,%.4f)</a></li></ul>',
Sql.stringFormat('%ki', res.population), res.place, res.latitude, res.longitude , res.latitude, res.longitude);
if(!i) req.put('<hr><h3>Closest Cities:</h3>');
}
);
return {html:pageBottom}; //pageBottom is added to same buffer as is used with req.printf()
}
// For autocomp. This needs to be set only once
sql.set({
noiseList : [], // allow search for 'the', 'us', 'or', etc.
'qMaxWords' : 5000, // allow query and sets to be larger than normal for '*' wildcard searches
'qMaxSetWords' : 5000 // see https://rampart.dev/docs/sql-set.html#qmaxsetwords
// and https://rampart.dev/docs/sql-set.html#qmaxwords .
});
/* autocomp() results must be formatted as such:
{
"suggestions": [
{"value":"Vaulion, Canton de Vaud, CH","id":"6233eaf65bd","latitude":46.6848,"longitude":6.3832, ...},
{"value":"Vallorbe, Canton de Vaud, CH","id":"6233eaf65c6","latitude":46.7078,"longitude":6.3714, ...},
...
]
}
*/
function autocomp(req) {
var res;
var q = req.query.query;
// remove any spaces at the beginning of q
q = q.replace(/^\s+/, '');
// if query is only one char, return an empty set
// (even though client-side autocomplete is set to 2 char min)
if(q.length<2)
return {json: { "suggestions": []}}
// we will need at least two chars in our last word since it will get a '*' wildcard added to it
q = q.replace(/ \S$/, ' ');
// if last character is not a space, add wildcard
if(q.charAt(q.length-1) != ' ')
q += '*';
// perform a like3 (no rank) pre-sorted by pop text search, and return a list of best matching locations
res = sql.exec("SELECT place value, id, latitude, longitude, population FROM cities WHERE "+
"place LIKE3 ? ;", [q] );
//if no results, try again using alt_names
if(res.rowCount == 0) {
res = sql.exec("SELECT place value, alt_names,id, latitude, longitude, population FROM cities WHERE " +
"alt_names LIKE3 ? ;", [q] );
// add alt name to "value" for type ahead display
for (var i=0; i<res.rows.length;i++) {
var row = res.rows[i];
var ql = req.query.query.toLowerCase();
var anames = row.alt_names.split(',');
for (var j=0; j<anames.length;j++) {
var aname = anames[j].toLowerCase();
if(aname.indexOf(ql) > -1) {
row.value += ' (aka: ' + aname + ')';
break;
}
}
}
}
return {json: { "suggestions": res.rows}};
}
The Complete Script¶
We now have all we need to perform the autocomplete search and nearest city search. This is the
final script which, as layed out may be accessed at http://localhost:8088/apps/citysearch/
. It will also build the database when
run from the command line as rampart citysearch.js
.
// cuz no one likes writing out 'rampart.utils.printf()'
rampart.globalize(rampart.utils);
var Sql = require("rampart-sql");
var sql = new Sql.init("~/citysearch/web_server/data/cities", true);
var csvFile = "../data/geonames-all-cities-with-a-population-1000.csv";
function import_data(){
function create_tmp_table() {
sql.exec("create table cities_tmp (" +
"Geoname_ID varchar(8), " +
"Name varchar(8), " +
"ASCII_Name varchar(8), " +
"Alternate_Names varchar(8), " +
"Feature_Class varchar(8), " +
"Feature_Code varchar(8), " +
"Country_Code varchar(8), " +
"Country_name_EN varchar(8), " +
"Country_Code_2 varchar(8), " +
"Admin1_Code varchar(8), " +
"Admin2_Code varchar(8), " +
"Admin3_Code varchar(8), " +
"Admin4_Code varchar(8), " +
"Population int, " +
"Elevation int, " +
"Digital_Elevation_Model int, " +
"Timezone varchar(8), " +
"Modification_date varchar(8), " +
"LABEL_EN varchar(8), " +
"Coordinates varchar(8)" +
");" );
}
var total=-1; //we won't know the total until we finish the first pass of importCsvFile
var step = 100; //set in importCsvFile(), only report every 100th row
/* a single function to monitor the import for both pre-processing (progressFunc)
and import (callback function supplied to sql.importCsvFile as a paramater) */
function monitor_import(count, stg) {
var stage = "Import";
if(count==0)
printf("\n");
if(stg!==undefined) // progressfunc
stage=stg;
if(stg === 0) //differentiate between 0 and undefined
{
total=count; //update our total in the first stage.
printf("Stage: %s, Count: %d \r", stage, count);
} else {
printf("Stage: %s, Count: %d of %d \r", stage, count, total);
}
fflush(stdout);
}
function import_csv() {
total = sql.importCsvFile(
csvFile, //file to import
{
tableName: 'cities_tmp',
singleQuoteNest: false,
hasHeaderRow: true,
delimiter: ';',
normalize: false,
callbackStep: step, //callback run every 100th row
progressStep: step, //progressfunc run every 100th row for each stage
progressFunc: monitor_import //progress function while processing csv
},
monitor_import //callback function upon actual import
);
printf('\n%d rows in total.\n',total);
}
function create_final_table() {
sql.exec("create table cities (" +
"id counter, " +
"place varchar(8), " +
"alt_names varchar(8), " +
"population int, " +
"latitude double, " +
"longitude double, " +
"geocode long, " +
"timezone varchar(8), " +
"country varchar(8) " +
");" );
}
function makerow(o) {
var ret={}, tmp;
ret.place = sprintf('%s, %s %s(%s)', o.Name, o.Admin1_Code, o.Country_name_EN, o.Country_Code);
ret.altNames = o.Alternate_Names;
ret.population = o.Population;
tmp = o.Coordinates.split(',');
ret.lat = parseFloat(tmp[0]);
ret.lon = parseFloat(tmp[1]);
ret.tz = o.Timezone;
ret.country = o.Country_name_EN;
return ret;
}
function build_final_table() {
printf("sorting rows\n");
sql.exec("select * from cities_tmp order by Population DESC",
function(res,i) {
if(!i) printf("done\nCreating Final Table\n");
var vals = makerow(res);
sql.exec("insert into cities values( " +
"counter, ?place, ?altNames, ?population, ?lat, ?lon, latlon2geocode(?lat, ?lon), ?tz, ?country );",
vals );
if (! (i % 100) ) {
printf("%d of %d\r", i, total);
fflush(stdout);
}
},
{maxRows:-1}
);
printf('\n');
}
function make_geocode_index() {
printf("creating index on geocode\n");
sql.exec("create index cities_geocode_x on cities(geocode) WITH INDEXMETER 'on';");
}
function make_id_index(){
printf("creating index on id\n");
sql.exec("create index cities_id_x on cities(id) WITH INDEXMETER 'on';");
}
function make_text_indexes() {
printf("creating indexes on place names\n");
// noiselist as detailed at https://rampart.dev/docs/sql-set.html#noiselist
// This is not English text and some geographic abbreviations like OR IN DO TO SO and US
// are also on the noise words list. Setting to empty will allow such words in the index.
sql.set({ noiseList:[]});
// make compact index. Sorting by population, not by likep rank. See like3 search below.
sql.exec("create fulltext index cities_place_ftx on cities(place)"+
" WITH WORDEXPRESSIONS ('[\\alnum\\x80-\\xFF]{2,99}') INDEXMETER 'on' WORDPOSITIONS 'off';");
sql.exec("create fulltext index cities_altNames_ftx on cities(alt_names)"+
" WITH WORDEXPRESSIONS ('[\\alnum\\x80-\\xFF]{2,99}') INDEXMETER 'on' WORDPOSITIONS 'off';");
}
function drop_tmp_table() {
sql.exec("drop table cities_tmp");
}
create_tmp_table();
import_csv();
create_final_table();
build_final_table();
make_geocode_index();
make_id_index();
make_text_indexes();
drop_tmp_table();
}
var useKilometers = true;
var distconv = 1;
var distvar = "miles";
if(useKilometers) {
distconv=1.60934;
distvar = "kilometers";
}
// the autocomplete plugin from https://github.com/devbridge/jQuery-Autocomplete
// jquery and plugin included from cloudflare in <script src="xyz"> tags below in pageTopFmt.
var client_script = `
$(document).ready(function(){
$('#cstextbox').autocomplete(
{
serviceUrl: '/apps/citysearch/autocomp.json',
minChars: 2,
autoSelectFirst: true,
showNoSuggestionNotice: true,
triggerSelectOnValidInput: false,
onSelect: function(sel) { window.location.assign("./?id="+sel.id); }
}
);
$('#cstextbox').on('keypress', function(e){
var key = e.charCode || e.keyCode || 0;
if (key == 13) { // on <return> don't submit form
e.preventDefault();
return false;
}
});
});
`;
// pageTopFmt is defined once upon script load here rather than upon each request in
// htmlpage() below. format code %w removes leading white space.
var pageTopFmt=sprintf('%w',`<!DOCTYPE HTML>
<html>
<head><meta charset="utf-8">
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery.devbridge-autocomplete/1.4.11/jquery.autocomplete.min.js"></script>
<style>
body,h1,h2,h3,h4,h5,h6 {font-family: "Varela Round", Sans-Serif;}
.autocomplete-suggestions {border: 1px solid #999; background: #FFF; overflow: auto; width: auto !important; padding-right:5px;}
.autocomplete-suggestion { padding: 2px 5px; white-space: nowrap; overflow: hidden; }
.autocomplete-suggestions strong {font-weight: normal; color: #3399FF; }
.autocomplete-group strong { display: block; border-bottom: 1px solid #000; }
.autocomplete-selected { background: #F0F0F0; }
.autocomplete-group { padding: 2px 5px; }
#main {background-color: white;margin: auto;min-height: 300px;width: 600px;}
#idiv { width:500px;height:39px;border-bottom: lightGray 1px solid;padding:15px 0px 15px 0px;}
#cstextbox {min-width:150px;width:100%%;height:30px;font:normal 18px arial,sans-serif;padding: 1px 3px;border: 2px solid #ccc;box-sizing: bord
</style>
<title>City Search Tutorial</title>
</head>
<body>
<div id="main">
<form id="mf">
<div id="idiv">
<input type="text" id="cstextbox" name="q" value="%s" placeholder="Search for a city">
</div>
</form>
<div id="res">`);
var pageBottom = sprintf(`</div></body><script>
%w
</script></html>`, client_script);
function htmlpage(req) {
var id = req.params.id, lat, lon;
// check if we already have a place id.
if(id){
id_res= sql.one("SELECT place, latitude, longitude " +
"FROM cities WHERE id=?;",
[req.params.id]
);
// yes, then set lat,lon vars
if(id_res) {
lon=id_res.longitude;
lat=id_res.latitude;
}
} else {
// no, just print the blank search form
req.printf(pageTopFmt,''); // add top of page to return buffer without a query.
return({html:pageBottom}); // add bottom of page, return with 'content-type:text/html'
}
// what to do if the query_string id is not found in the db
if(!lon || !lat) {
req.printf(pageTopFmt,'');
req.printf('No entry for id "%s".', id);
return({html:pageBottom});
}
/* here we select rows based on their distance from the place specified by 'id',
calculate the distance and direction between id and the selected city,
then sort by the distance from 'id' (field 6 in our sql statement) */
res = sql.exec(`SELECT
place, id, latitude, longitude, population,
DISTLATLON(?lat, ?lon, latitude, longitude) dist,
AZIMUTH2COMPASS( AZIMUTHLATLON(?lat, ?lon, latitude, longitude), 3 ) heading
FROM cities WHERE geocode BETWEEN (SELECT LATLON2GEOCODEAREA(?lat, ?lon, 1.0))
ORDER BY 6 ASC;`,
{lat:lat, lon:lon},
{maxRows: 31}, // first row is same city
function(res, i) { // foreach city retrieved:
if(!i) {
// this is our 'id' city, as it is closest to itself.
req.printf(pageTopFmt,res.place);
req.printf('<h3 style="margin-bottom:0px">%s</h3><ul style="margin-top:0px">',res.place);
} else {
// all other nearby cities we will print the direction and distance:
req.printf('<a href="?id=%s">%s</a><br><ul>' +
'<li>Direction: %.2f %s to the %s</li>',
res.id, res.place, res.dist * distconv, distvar, res.heading);
}
// some useful information to go along with the city name
req.printf("<li>Population: %s</li>" +
'<li>Location: <a target="_blank" href="https://maps.google.com/maps?z=11&q=%U&ll=%f,%f">' +
'google maps (%.4f,%.4f)</a></li></ul>',
Sql.stringFormat('%ki', res.population), res.place, res.latitude, res.longitude , res.latitude, res.longitude);
if(!i) req.put('<hr><h3>Closest Cities:</h3>');
}
);
return {html:pageBottom}; //pageBottom is added to same buffer as is used with req.printf()
}
// For autocomp. This needs to be set only once
sql.set({
noiseList : [], // allow search for 'the', 'us', 'or', etc.
'qMaxWords' : 5000, // allow query and sets to be larger than normal for '*' wildcard searches
'qMaxSetWords' : 5000 // see https://rampart.dev/docs/sql-set.html#qmaxsetwords
// and https://rampart.dev/docs/sql-set.html#qmaxwords .
});
/* autocomp() results must be formatted as such:
{
"suggestions": [
{"value":"Vaulion, Canton de Vaud, CH","id":"6233eaf65bd","latitude":46.6848,"longitude":6.3832, ...},
{"value":"Vallorbe, Canton de Vaud, CH","id":"6233eaf65c6","latitude":46.7078,"longitude":6.3714, ...},
...
]
}
*/
function autocomp(req) {
var res;
var q = req.query.query;
// remove any spaces at the beginning of q
q = q.replace(/^\s+/, '');
// if query is only one char, return an empty set
// (even though client-side autocomplete is set to 2 char min)
if(q.length<2)
return {json: { "suggestions": []}}
// we will need at least two chars in our last word since it will get a '*' wildcard added to it
q = q.replace(/ \S$/, ' ');
// if last character is not a space, add wildcard
if(q.charAt(q.length-1) != ' ')
q += '*';
// perform a like3 (no rank) pre-sorted by pop text search, and return a list of best matching locations
res = sql.exec("SELECT place value, id, latitude, longitude, population FROM cities WHERE "+
"place LIKE3 ? ;", [q] );
//if no results, try again using alt_names
if(res.rowCount == 0) {
res = sql.exec("SELECT place value, alt_names,id, latitude, longitude, population FROM cities WHERE " +
"alt_names LIKE3 ? ;", [q] );
// add alt name to "value" for type ahead display
for (var i=0; i<res.rows.length;i++) {
var row = res.rows[i];
var ql = req.query.query.toLowerCase();
var anames = row.alt_names.split(',');
for (var j=0; j<anames.length;j++) {
var aname = anames[j].toLowerCase();
if(aname.indexOf(ql) > -1) {
row.value += ' (aka: ' + aname + ')';
break;
}
}
}
}
return {json: { "suggestions": res.rows}};
}
// module and module.exports are set when called from the webserver
if(module && module.exports) {
// url to function mapping
module.exports= {
"/": htmlpage, //http://localhost:8088/apps/citysearch/
"/index.html": htmlpage, //http://localhost:8088/apps/citysearch/index.html
"/autocomp.json": autocomp, //http://localhost:8088/apps/citysearch/autocomp.json
}
} else {
// called from the command line. Build the database.
import_data();
}
Enjoy.