Transcript
00:01My name is Bill Greenwood, and I'm from the Center for Spatial Analysis, the University of Oklahoma...
00:05...and this afternoon I'm going to talk to you a little bit about some of the challenges that I faced as an undergraduate student...
00:10...developing an application using a graduate student's data.
00:15And it's an application for the Central Oklahoma Workforce Investment Board.
00:21And what this is for, this is for economic planning for workforce management.
00:25It's for economic developers to have the ability to go into the application, and there's various things they can do on this.
00:33They can look at...there are several widgets that go with this.
00:37They can analyze population.
00:38They can analyze demographic information, anything from the education level, income, various things like that.
00:53So, basically, we had a few challenges with this project due to the size of the datasets.
01:00So for the population analysis, we were...the sponsors of the project wanted the user to be able to input a point or a line...
01:12...and then buffer that line and get the population within that area.
01:16And this was for finding potential workers, clients for businesses, so that if somebody was wanting to bring a business...
01:23...to Central Oklahoma, they could find out what the population was and be able to plan accordingly.
01:30The second was an origin-destination tracker.
01:33And what this tool does is, you select a census block or a census county or a city, and there's a couple options you can do on this.
01:44One is, you select the workplace and then it finds all the census blocks where the people live...
01:50...and gives you a drive time from centroid to centroid.
01:54The other option is, you select the...where the people live, and it gives you a...it tells you where all these people work...
02:03...and again gives you a distance from centroid of the location to the centroid of where they work.
02:11And this is good for city planners to devise carpooling, to find out how far people are willing to drive...
02:20...so if you've got cheaper property someplace, but your workforce isn't willing to drive...
02:26...then obviously you'd want to not put your business in that location.
02:33So the main thing was these large datasets.
02:36It's for a 10-county region in Central Oklahoma.
02:39Now there's 34,372 census blocks in this 10-county area.
02:45And we need to deal with all the data with those census blocks.
02:49For the origin-destination tracker, there's 580,000 origin-destination records within this table.
02:56And so, dealing with that data back and forth between the server and the client machine presented a challenge for us.
03:05We built this using the Esri API for Flex 2.4, Flash Builder 4.5, and Flex Viewer 2.4.
03:15We have some of the services running on a Java server and some of them running on a .NET server.
03:19The actual geoprocessing service is run on a .NET server.
03:26So first we'll talk about the population analysis widget.
03:30And what this is for is to find population within a user-specified area.
03:34We also wanted to be able to compare population between two different areas.
03:39They also wanted to be able to have somebody input an address, use that address as input for their buffer...
03:44...so that they could find the population around an address.
03:49So, basically, here's the user interface for the analysis widget.
03:57This is for doing your addresses, putting a point, line, typical stuff you'd see on a Flex widget.
04:06So the biggest challenge with this was, how to account for blocks that get partially selected...
04:14...for looking for the most accurate population count.
04:17So one of the possible solutions was, just use the total population by just using a straight intersect.
04:23If the buffer overlapped or intersected that census block, count it.
04:28Count a hundred percent of the population.
04:30In cases like the example in the lower corner down there, you'd see that you only have a small portion of that census block...
04:40...within the boundaries of the buffer, so that would generate a large overage, which would be common doing it this way.
04:52So, and in some instances like there, you'd have an extremely high estimation.
04:57So the other option is to generate a tool or a model that would locate the population according to the census block.
05:06So if there's a city in the census block, and most of the population's in that city...
05:11...then you would weight it towards that side of the census block.
05:16That gets extremely labor intensive to do.
05:20You would get better results. There is actually some of that data already prepared, but it is expensive to use.
05:28And with this, we were not wanting to go into the expense of that.
05:33So then the second or the third option was to assume a uniform population density.
05:39With this you would get a better estimate than using the first option.
05:44It's faster to implement, but you could have either an over- or an underestimation due to the density...
05:51...assuming that all of the people are spread out evenly.
05:56So we chose option three because it was easier to implement, gave more accurate results than option one, and was less expensive than option two.
06:07So we have a feature class that has a field in there that has the original area.
06:15And then we have a...in the workflow we clip that, based off the user inputted buffer, and then...
06:22...we do a calculation on the new area and get a percentage.
06:28So that would...so if it's 50 percent, we would take 50 percent of that population.
06:33And then we return just a table that does a summary statistics, and we return one record back from the server.
06:43This reduced the amount of processing required on the client side, and it meant that all the data could stay at the server...
06:51...as opposed to having to send it back and forth.
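The difference between option one (straight intersect) and option three (uniform density) can be sketched in a few lines. This is only an illustration under stated assumptions, not the geoprocessing model used in the project: the `ClippedBlock` record, its field names, and the sample numbers are all hypothetical, and the clipped areas are taken as given rather than computed by a Clip tool.

```python
from dataclasses import dataclass


@dataclass
class ClippedBlock:
    geoid: str            # hypothetical block identifier
    population: int       # population of the whole census block
    original_area: float  # area stored on the feature class before clipping
    clipped_area: float   # area remaining inside the buffer after the clip


def full_intersect_count(blocks):
    """Option one: count 100 percent of every block the buffer touches."""
    return sum(b.population for b in blocks if b.clipped_area > 0)


def area_weighted_count(blocks):
    """Option three: assume uniform density and scale by the clipped fraction."""
    total = 0.0
    for b in blocks:
        if b.original_area <= 0:
            continue  # skip degenerate blocks rather than divide by zero
        total += b.population * (b.clipped_area / b.original_area)
    return total


# Made-up sample: one block fully inside, one half inside, one sliver.
sample = [
    ClippedBlock("A", 200, 4.0, 4.0),
    ClippedBlock("B", 100, 2.0, 1.0),
    ClippedBlock("C", 400, 8.0, 0.8),
]

print("straight intersect:", full_intersect_count(sample))
print("area weighted:", area_weighted_count(sample))
```

In this made-up sample the sliver block "C" contributes its full 400 people under the straight intersect but only a tenth of that under area weighting, which is the overage described earlier.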
06:55Unfortunately, this created a new challenge.
06:59That new challenge was, on the server we were having issues with the Clip tool being able to clip the densely...
07:08...the areas that had dense census blocks, so like right around Oklahoma City, where the population blocks are extremely dense...
07:17...we were having trouble with the server actually...it was failing in the clip procedure on the server itself.
07:24So we tried moving it to the .NET platform, which we got the same results as on the Java platform.
07:35We were doing some research, and were able to find that we could increase the file size for geoprocessing operations.
07:42We weren't able to find this option on the Java server, so this is actually housed on the...
07:48...this whole geoprocessing operation runs on the .NET server.
07:53But we still had an issue with, when we got to a certain size, the .NET server failed.
07:59It was when we got...when the server became...or when buffer became too large, the operation just...
08:06...we got a C file error within the server, and checking on that, we found that that had to do with the actual file size.
08:15Well, we'd already increased the file size to larger than anything we had clipped using the Desktop.
08:22So we took the same dataset, used ArcGIS Desktop, clipped it to a normal shapefile, and then got the file size.
08:33And we increased our file size on the .NET server to larger than that.
08:38We still had issues.
08:40So we ended up having to develop a hybrid solution for this one.
08:44So we had data...different levels of granularity...
08:49...and then we had to have a programming solution to handle the two different datasets.
08:54So what we did is, we ended up preparing two different datasets, one at the census block level and one at the census block group level.
09:04And with the...when you get to a larger area, the granularity of your data becomes less important.
09:11You don't need the census blocks when you start getting such large areas.
09:17So we ended up publishing a geoprocessing service with two different tools in it, one for each dataset...
09:24...and, as we can see here.
09:28So now we've got two geoprocessing services.
09:32We had to implement in the code a way to handle that so that it could find out, okay, which one should I use.
09:40Well, through a series of trials and errors using the geometry tool to calculate the buffer area, several iterations going through...
09:53...found that 226, approximately 227, square miles was the largest area that could be handled by census blocks on the server...
10:03...at which point we had to switch to census block groups, and then we were able to clip the entire 10-county...
10:09...actually, clip the entire 10-county area using the census block groups at that point.
10:14So within the code, what we do is, we check for our maximum block, the maximum buffer size, or maximum area size...
10:24...for use for census blocks, and if it's less than that, then we use the census blocks.
10:29If it's greater, then we change and use the census block groups.
10:35Now this also created a problem, because what data are we using?
10:39When you start talking about census data, people are interested, okay, what are you using?
10:43Are you using census blocks? Census block groups?
10:45So we actually had to, on the tool, there's a return to the user that lets them know what data was used in performing this operation.
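The switch between the two datasets, together with the note returned to the user, might look roughly like the sketch below. It assumes the buffer area has already been calculated in square miles; the function names, the message wording, and the treatment of an area exactly equal to the limit are assumptions, and only the approximate 227 square mile threshold comes from the talk.

```python
MAX_BLOCK_AREA_SQ_MI = 227.0  # approximate limit found by trial and error


def choose_dataset(buffer_area_sq_mi, max_block_area=MAX_BLOCK_AREA_SQ_MI):
    """Pick the finer dataset for small buffers, the coarser one otherwise."""
    if buffer_area_sq_mi < max_block_area:
        return "census blocks"
    # Assumption: an area exactly at the limit also falls back to block groups.
    return "census block groups"


def describe_result(population, buffer_area_sq_mi):
    """Build the message that tells the user which data was used."""
    dataset = choose_dataset(buffer_area_sq_mi)
    return f"Estimated population: {round(population):,} (based on {dataset})"


print(describe_result(1234.4, 50.0))
print(describe_result(250000.0, 500.0))
```

Keeping the threshold in one named constant means it can be retuned if the server limits change, without touching the rest of the logic.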
10:55Okay. So here's what a result looks like.
11:01So you can see that it tells the user it's based off of census blocks, and the other one is based off of census block groups...
11:09...when we got to a larger file, so we got to a larger buffer size.
11:14And they also wanted to be able to do comparisons, so there's two different options, you can run these two independently. Alright.
11:21So that was...we kind of had to do a hybrid solution for this one.
11:27So the other tool was with the origin-destination tracker.
11:33This one had the large set of records that went with it, and this, again...
11:38...like I explained, was to find the relationship between residents and workplace locations.
11:44So there's six options for the user to make.
11:46One is to use...it's basically three different categories.
11:50You can use census blocks.
11:51You can use the city, or you can search by an entire county.
11:57And then they select a location on the map, and the appropriate census blocks are returned visually for them...
12:05...with a count that are within the area of interest.
12:10So, like I said earlier, there are over 34,000 polygons, and what we did is we created...
12:19...just had a feature class that all it had is a geoID from the TIGER/Line files.
12:26They all have a geoID, and then we had a table.
12:30We set up a separate table that had all of our 500,000 plus records in it.
12:36So we actually on this one, for the county...for Oklahoma County, there's 377,000 records that could potentially be returned.
12:49So that created an issue with getting the data back.
12:57So we switched to the table, and doing it from a table, it's almost...don't really notice that it does it.
13:05It's that fast.
13:07But retrieving the features is where we were having an issue.
13:11I've been able to get the server count up to 100,000 and be able to successfully return that.
13:17But it doesn't always operate. Sometimes it fails, sometimes it's fine.
13:22I found issues with if your Internet's not as fast, then the client either times out or the server times out...
13:31...even increasing the times on the server.
13:35And then you run into the issue of the user.
13:38So the user doesn't want to sit there and wait all this time for something to come back.
13:42They're expecting more of a responsive application.
13:48They want to be able to click and almost instantly get results.
13:53We weren't able to, depending on the Internet connection, we still weren't able to come up with that.
13:58It's...we've got it to where it's much faster than what it was.
14:03So some of the things we considered.
14:06If we could limit the extent.
14:09So we could say, only get the results that are showing in the current map extent.
14:13Well, that doesn't display an accurate picture if you're trying to do economic planning...
14:18...because you want to know everything.
14:19You don't want to know just the small area that you're in.
14:22So the other option was to split the data into a feature class and a table of results...
14:28...and then that actually requires two different queries at that point, or we could place all the information into one feature class...
14:39...which would mean we'd have numerous duplicates of our features...of our geometry.
14:46And returning geometry back from the server is extremely slow.
14:51So we didn't want to do that.
14:55So what we ended up doing was solution two.
14:58So we have a table that has all of our records that have a geoID, one geoID for their workplace, one geoID for their home...
15:08...and then a...just a feature class that just has the geoID in it.
15:15So the workflow in this one is, we have our census blocks that do an identify to find out which census block the user selected...
15:27...or they could have selected a county or a city.
15:30That returns a feature set with a geoID.
15:33We used that geoID to query the table, and then we queried back again on the census blocks...
15:41...to get our feature set of census blocks back from the server.
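As a rough illustration of that workflow, the sketch below replaces the server-side table and feature class with small in-memory stand-ins. Every name and record in it is hypothetical; the point is only the order of the lookups: the geoID from the identify step is used to query the table, and the distinct geoIDs from the table are then used to fetch geometry once each.

```python
# Hypothetical stand-in for the origin-destination table (no geometry).
OD_TABLE = [
    {"home_geoid": "B1", "work_geoid": "B9"},
    {"home_geoid": "B2", "work_geoid": "B9"},
    {"home_geoid": "B2", "work_geoid": "B9"},
    {"home_geoid": "B3", "work_geoid": "B7"},
]

# Hypothetical stand-in for the feature class: one geometry per geoID.
BLOCKS = {
    "B1": "polygon-1",
    "B2": "polygon-2",
    "B3": "polygon-3",
    "B7": "polygon-7",
    "B9": "polygon-9",
}


def home_blocks_for_workplace(selected_geoid):
    """Return the geometry of every block where workers of selected_geoid live."""
    # Query the table with the geoID returned by the identify step.
    home_ids = {
        row["home_geoid"] for row in OD_TABLE if row["work_geoid"] == selected_geoid
    }
    # Query the feature class once per distinct geoID, so no geometry is duplicated.
    return {geoid: BLOCKS[geoid] for geoid in sorted(home_ids) if geoid in BLOCKS}


print(home_blocks_for_workplace("B9"))
```

Because the table holds only geoIDs, the duplicate record for "B2" costs almost nothing, and its polygon is still fetched only once.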
15:47So, and this we had to set up for loops to limit the number of records and do that based off the object ID.
15:54So what we do here is we have the for loop there and we increment...
15:59...and we can change that increment based off of basically performance.
16:06So if we want to increase performance, we can actually...At one point in time...
16:11...we actually had it set up to where it did queries based off the...a query for each object that was returned.
16:19So, of course, as you can imagine, that's several thousand hits to the server for a query.
16:27There's 34,000 records there, so hitting that many times on the server could increase the server load.
16:35So what we've done is, again, set up where we actually query by an object ID, and we return the object IDs incrementally...
16:43...so that we can keep track of where we're at and just go through the entire feature set.
16:48So this actually required a third query, and we actually do a query for count, so that we know how many records.
16:54So if we change the dataset, let's say they want to add on another county...
16:58...we can easily add another county to the list, and I don't have to change anything in the code besides this number here, possibly...
17:06...if we decide on performance now.
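The looping pattern described here, a count query followed by queries over object ID ranges, can be sketched as follows. The fake server functions, the `request_log` list, and the chunk size are assumptions made for illustration, and object IDs are taken to be contiguous and zero-based as the talk describes; this is not the Flex code shown on the slides.

```python
# Hypothetical stand-in for the census block feature class on the server.
FEATURES = [{"objectid": i} for i in range(34_372)]
request_log = []  # records each (start, end) range sent to the fake server


def query_count():
    """Stand-in for the count query run at initialization."""
    return len(FEATURES)


def query_range(start, end):
    """Stand-in for a query such as: OBJECTID >= start AND OBJECTID < end."""
    request_log.append((start, end))
    return FEATURES[start:end]


def fetch_all(chunk_size):
    """Loop over object ID ranges so each request returns at most chunk_size rows."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = query_count()  # the loop bound comes from the data, not the code
    results = []
    for start in range(0, total, chunk_size):
        end = min(start + chunk_size, total)
        results.extend(query_range(start, end))
    return results


blocks = fetch_all(chunk_size=10_000)
print(len(blocks), "features in", len(request_log), "requests")
```

The chunk size is the one number to tune: a smaller value means more requests but smaller responses, which is the trade-off that matters on a slow connection.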
17:10Running through testing here with it, I was doing a bit of playing around with it...
17:13...I found that actually, with the slower Internet here, I was having issues.
17:19So actually changing that number, decreasing that number, made things a little bit better.
17:24It's a matter of finding the best number for you...the best number for performance.
17:31All the testing I did was on a, you know, a high-speed Internet.
17:37The majority of it was...I was sitting right there at the same location with the server or on the same network as the server...
17:43...and I didn’t have an issue returning back all 34,000 records.
17:49It was fast, but when I got here, it was even more issue than...we have...some of the people don't have the higher Internet speed...
18:02...and they were complaining about the performance, so we had to choose a way to implement this.
18:11So this is the results from the origin-destination tracker.
18:16The red census block is the census block that was selected by the user...
18:21...and the blue census block, or the blue census blocks, are the returns.
18:26Those are the where...so that's...the red one is where the people work, and the blue one is where all the people live.
18:35We get a few things back on this.
18:37We get a number of...a total number of results, total number of blocks, so that will tell us that, okay...
18:43...we have for, like carpooling purposes on this one.
18:47Well, all 1,100 people live in a different census block, so none of them are neighbors...
18:54...and then how many are in the area of interest, okay, the Central Oklahoma Workforce Investment Board area...
18:59...their 10-county area, and we see 907.
19:02So that gives them an index as well.
19:04That means people are driving from outside of this 10-county area to work inside this area.
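The figures reported with each result can be derived from the returned home geoIDs alone. The sketch below is one possible reading of those numbers, with hypothetical function and key names and made-up geoIDs; in particular, counting results (rather than distinct blocks) inside the region is an assumption.

```python
def summarize_homes(home_geoids, region_geoids):
    """Summarize where workers live relative to the region of interest."""
    region = set(region_geoids)
    return {
        "total_results": len(home_geoids),
        "distinct_blocks": len(set(home_geoids)),
        # Assumption: count every result whose home block lies inside the region.
        "results_in_region": sum(1 for g in home_geoids if g in region),
    }


homes = ["B1", "B2", "B2", "X1"]  # "X1" stands for a block outside the region
region = {"B1", "B2", "B3"}
print(summarize_homes(homes, region))
```

When the in-region figure is lower than the total, the difference is the number of results that come from outside the region.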
19:17Okay. So some stuff that I've been working on to try to increase the performance on this...
19:23...is working with relationship classes.
19:26And what this would do, we set up a relationship class inside the geodatabase.
19:29It would link the feature class with the table.
19:35As of yet, I haven't had much luck getting the query to actually work from within Flex, from within the Flex API.
19:42The other option's to try to do a geoprocessing solution.
19:45But, again, with that, you might have issues on the server.
19:52The other problem with that would be trying to, because you're going to get back several duplicate records.
20:00So that means the server, again, is going to find a whole bunch of duplicates and send those duplicates back...
20:06...and that would slow down the application again.
20:09So this is just some possibilities that I'm checking into to see, you know, how this will work if we get it to work...
20:18So...
20:20...but dealing with the large datasets has been quite a challenge for us with this project.
20:27So that's pretty much all I had.
20:32I can do a live demo if somebody kind of wants to see how that works.
20:38I had allotted time for that.
20:41I can show you the difference between being connected remotely.
20:49So this is actually...I am on the...I am actually connected to my machine at OU, and if I can get it to agree...
21:13...you see kind of the performance differences I was dealing with here.
21:25So right now, this is set to return a hundred thousand results at a time, and you'll see that actually connected right to the server...
21:45...or right in on that same LAN, it's...
21:58...it's actually quite a bit faster than dealing with it on the network here, and we're having...do the remote desktop connections...
22:12...Internet again. Internet here is kind of...so sending the graphics back and forth is a problem.
22:21[Inaudible audience comment]
22:24I'm sending the queries in loops.
22:25So what it does is, it sends whatever I set that number to in the for loop, so whatever I set...
22:43So whatever I set this number right here to, is how many results it's going to get back each time.
22:53[Inaudible audience question]
23:03Yes. 'Cause object ID is automatically generated on the feature class.
23:09So what I do is, I have a query that, up here further...
23:27So right here what I do is, I do a query task, and I execute it for the count.
23:32And what that does is, it gives me a count so I know, and then actually at the very beginning in my initialization...
23:42...I do a query for count to get the total number of features in that feature class.
23:46So regardless of which feature class it's looking at, if I add data, subtract data, it will know that.
23:52So and then in the for loop, it puts that number into the results and adds it, puts it into a number of census blocks variable...
24:04...and then when I run my for loop down here, the for loop is set to run to less than the number of census blocks...
24:17...because, of course, your object ID starts at zero, so you want to be less than that.
24:24So by doing it that way, I can...
24:30[Inaudible audience question]
24:46These are pretty much fixed records.
24:48They're census blocks.
24:49So they don't change until the new census data comes out, at which point it would be another set published to the SDE...
24:57...so we'd have the same starting feature ID, ending feature ID.
25:00We wouldn't be removing and adding at that point.
25:02This is a very static dataset.
25:06The table might change, but the table isn't going to matter for this.
25:11So, any other questions?
Challenges of Developing Data Intensive Web GIS Applications
William Greenwood of the University of Oklahoma discusses the challenges of building web applications.
- Recorded: Mar 28th, 2012
- Runtime: 25:14
- Views: 686
- Published: Apr 30th, 2012