Transcript

00:01My name is Bill Greenwood, and I'm from the Center for Spatial Analysis, the University of Oklahoma...

00:05...and this afternoon I'm going to talk to you a little bit about some of the challenges that I faced as an undergraduate student...

00:10...developing an application using a graduate student's data.

00:15And it's an application for the Central Oklahoma Workforce Investment Board.

00:21And what this is for, this is for economic planning for workforce management.

00:25It's for economic developers to have the ability to go into the application, and there's various things they can do on this.

00:33They can look at...there are several widgets that go with this.

00:37They can analyze population.

00:38They can analyze demographic information, anything from the education level, income, various things like that.

00:53So, basically, we had a few challenges with this project due to the size of the datasets.

01:00So for the population analysis, we were...the sponsors of the project wanted the user to be able to input a point or a line...

01:12...and then buffer that line and get the population within that area.

01:16And this was for finding potential workers, clients for businesses, so that if somebody was wanting to bring a business...

01:23...to Central Oklahoma, they could find out what the population was and be able to plan accordingly.

01:30The second was an origin-destination tracker.

01:33And what this tool does is, you select a census block or a census county or a city, and there's a couple options you can do on this.

01:44One is, you select the workplace and then it finds all the census blocks where the people live...

01:50...and gives you a drive time from centroid to centroid.

01:54The other option is, you select the...where the people live, and it gives you a...it tells you where all these people work...

02:03...and again gives you a distance from centroid of the location to the centroid of where they work.
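[Editor's sketch] The centroid-to-centroid measurement described above could look roughly like the following. This is a minimal illustration and not the speaker's code: the block names and coordinates are made up, coordinates are assumed to be planar and in miles, and it computes straight-line distance only. A drive time, as mentioned in the talk, would need a road network, which is not modeled here.

```python
import math

# Hypothetical centroid coordinates in a projected (planar) system, in miles.
CENTROIDS = {
    "work_block": (0.0, 0.0),
    "home_block_1": (3.0, 4.0),
    "home_block_2": (6.0, 8.0),
}


def centroid_distance(geoid_a, geoid_b, centroids=CENTROIDS):
    """Straight-line distance between the centroids of two blocks."""
    (ax, ay), (bx, by) = centroids[geoid_a], centroids[geoid_b]
    return math.hypot(bx - ax, by - ay)


def distances_from(selected, others, centroids=CENTROIDS):
    """Distance from the selected block to each related block, keyed by geoID."""
    return {g: centroid_distance(selected, g, centroids) for g in others}


result = distances_from("work_block", ["home_block_1", "home_block_2"])
```

The same function serves both directions of the tracker, since the distance from a workplace to a home block equals the distance back.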

02:11And this is good for city planners, to devise carpooling, to find out how far people are willing to drive...

02:20...so if you've got cheaper property someplace, but your workforce isn't willing to drive...

02:26...then obviously you'd want to not put your business in that location.

02:33So the main thing was these large datasets.

02:36It's for a 10-county region in Central Oklahoma.

02:39Now there's 34,372 census blocks in this 10-county area.

02:45And we need to deal with all the data with those census blocks.

02:49For the origin-destination tracker, there's 580,000 origin-destination records within this table.

02:56And so, dealing with that data back and forth between the server and the client machine presented a challenge for us.

03:05We built this using the Esri API for Flex 2.4, Flash Builder 4.5, and Flex Viewer 2.4.

03:15We have some of the services running on a Java server and some of them running on a .NET server.

03:19The actual geoprocessing service is run on a .NET server.

03:26So first we'll talk about the population analysis widget.

03:30And what this is for is to find population within a user-specified area.

03:34We also wanted to be able to compare population between two different areas.

03:39They also wanted to be able to have somebody input an address, use that address as input for their buffer...

03:44...so that they could find the population around an address.

03:49So, basically, here's the user interface for the analysis widget.

03:57This is for doing your addresses, putting a point, line, typical stuff you'd see on a Flex widget.

04:06So the biggest challenge with this was, how to account for blocks that get partially selected...

04:14...for looking for the most accurate population count.

04:17So one of the possible solutions was, just use the total population by just using a straight intersect.

04:23If the buffer overlapped or intersected that census block, count it.

04:28Count a hundred percent of the population.

04:30In cases like the example in the lower corner down there, you'd see that you only have a small portion of that census block...

04:40...within the boundaries of the buffer, so that would generate a large overage, which would be common doing it this way.

04:52So, and in some instances like there, you'd have an extremely high estimation.

04:57So the other option is to generate a tool or a model that would locate the population according to the census block.

05:06So if there's a city in the census block, and most of the population's in that city...

05:11...then you would weight it towards that side of the census block.

05:16That gets extremely labor intensive to do.

05:20You would get better results. There is actually some of that data already prepared, but it is expensive to use.

05:28And with this, we were not wanting to go into the expense of that.

05:33So then the second or the third option was to assume a uniform population density.

05:39With this you would get a better estimate than using the first option.

05:44It's faster to implement, but you could have either an over- or an underestimation due to the density...

05:51...assuming that all of its people are spread out evenly.

05:56So we chose option three because it is easier to implement, gives more accurate results than option one, and is less expensive than option two.

06:07So we have a feature class that has a field in there that has the original area.

06:15And then we have a...in the workflow we clip that, based off the user inputted buffer, and then...

06:22...we do a calculation on the new area and get a percentage.

06:28So that would...so if it's 50 percent, we would take 50 percent of that population.

06:33And then we run summary statistics and return just a table with one record back from the server.
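[Editor's sketch] The area-weighted estimate described above (option three) can be illustrated with a few lines of code. This is a sketch under stated assumptions, not the project's geoprocessing model: the field names and sample numbers are hypothetical, and the clip is assumed to have already produced a clipped area for each block. The straight-intersect count (option one) is included only for comparison.

```python
from dataclasses import dataclass


@dataclass
class ClippedBlock:
    """One census block after clipping to the buffer (field names are hypothetical)."""
    geoid: str
    population: int
    original_area: float  # area stored on the feature class before clipping
    clipped_area: float   # area remaining inside the buffer after the clip


def estimate_population(blocks):
    """Option three: scale each block's population by the fraction of its area kept."""
    total = 0.0
    for b in blocks:
        fraction = b.clipped_area / b.original_area
        total += b.population * fraction
    return total


def straight_intersect_population(blocks):
    """Option one: count 100 percent of every block the buffer touches."""
    return sum(b.population for b in blocks if b.clipped_area > 0)


blocks = [
    ClippedBlock("A", 200, 4.0, 4.0),  # fully inside the buffer
    ClippedBlock("B", 100, 2.0, 1.0),  # half inside
    ClippedBlock("C", 400, 8.0, 1.0),  # only a sliver inside
]
weighted = estimate_population(blocks)          # single summary value
naive = straight_intersect_population(blocks)   # over-counts blocks B and C
```

With these made-up numbers the weighted estimate is well below the straight-intersect count, which is the overage the talk describes for blocks that are only partly inside the buffer.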

06:43This reduced the amount of processing required on the client side, and it meant that all the data could stay at the server...

06:51...as opposed to having to send it back and forth.

06:55Unfortunately, this created a new challenge.

06:59That new challenge was, on the server we were having issues with the Clip tool being able to clip the densely...

07:08...the areas that had dense census blocks, so like right around Oklahoma City, where the population blocks are extremely dense...

07:17...we were having trouble with the server actually...it was failing in the clip procedure on the server itself.

07:24So we tried moving it to the .NET platform, where we got the same results as on the Java platform.

07:35We were doing some research, and were able to find that we could increase the file size for geoprocessing operations.

07:42We weren't able to find this option on the Java server, so this is actually housed on the...

07:48...this whole geoprocessing operation runs on the .NET server.

07:53But we still had an issue with, when we got to a certain size, the .NET server failed.

07:59It was when we got...when the server became...or when buffer became too large, the operation just...

08:06...we got a C file error within the server, and checking on that, we found that that had to do with the actual file size.

08:15Well, we'd already increased the file size to larger than anything we had clipped using the Desktop.

08:22So we took the same dataset, used ArcGIS Desktop, clipped it to a normal shapefile, and then got the file size.

08:33And we increased our file size on the .NET server to larger than that.

08:38We still had issues.

08:40So we ended up having to develop a hybrid solution for this one.

08:44So we had data...different levels of granularity...

08:49...and then we had to have a programming solution to handle the two different datasets.

08:54So what we did is, we ended up preparing two different datasets, one at the census block level and one at the census block group level.

09:04And with the...when you get to a larger area, the granularity of your data becomes less important.

09:11You don't need the census blocks when you start getting such large areas.

09:17So we ended up publishing a geoprocessing service with two different tools in it, one for each dataset...

09:24...and, as we can see here.

09:28So now we've got two geoprocessing services.

09:32We had to implement in the code a way to handle that so that it could find out, okay, which one should I use.

09:40Well, through a series of trials and errors using geometry tool to calculate the buffer area, several iterations going through...

09:53...found that 226, approximately 227, square miles was the largest area that the...could be handled by census blocks on the server...

10:03...at which point we had to switch to census block groups, and then we were able to clip the entire 10-county...

10:09...actually, clip the entire 10-county area using the census block groups at that point.

10:14So within the code, what we do is, we check for our maximum block, the maximum buffer size, or maximum area size...

10:24...for use for census blocks, and if it's less than that, then we use the census blocks.

10:29If it's greater, then we change and use the census block groups.

10:35Now this also created a problem, because what data are we using?

10:39When you start talking about census data, people are interested, okay, what are you using?

10:43Are you using census blocks? Census block groups?

10:45So we actually had to, on the tool, there's a return to the user that lets them know what data was used in performing this operation.
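[Editor's sketch] The size check and the message back to the user described above might be structured like this. It is a minimal sketch, not the widget's Flex code: the function names and the stand-in tools are hypothetical, and the 227 square mile limit is the approximate figure quoted in the talk. The talk does not say which dataset is used when the area is exactly at the limit, so this sketch assumes block groups in that case.

```python
MAX_BLOCK_AREA_SQ_MI = 227.0  # approximate limit reported in the talk


def choose_dataset(buffer_area_sq_mi, max_block_area=MAX_BLOCK_AREA_SQ_MI):
    """Pick the finer dataset for small buffers and the coarser one otherwise."""
    if buffer_area_sq_mi < max_block_area:
        return "census blocks"
    return "census block groups"  # assumption: the boundary case goes here too


def run_population_analysis(buffer_area_sq_mi, tools):
    """Run the matching tool and report which dataset produced the result.

    `tools` maps a dataset name to a callable standing in for one of the
    two published geoprocessing tools (hypothetical stand-ins).
    """
    dataset = choose_dataset(buffer_area_sq_mi)
    population = tools[dataset](buffer_area_sq_mi)
    return {"population": population, "source": dataset}


# Stand-in tools that return a made-up population for demonstration only.
fake_tools = {
    "census blocks": lambda area: round(area * 100),
    "census block groups": lambda area: round(area * 95),
}

small = run_population_analysis(10.0, fake_tools)
large = run_population_analysis(500.0, fake_tools)
```

Returning the dataset name alongside the number is what lets the widget tell the user whether the count came from census blocks or census block groups.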

10:55Okay. So here's what a result looks like.

11:01So you can see that it tells the user it's based off of census blocks, and the other one is based off of census block groups...

11:09...when we got to a larger file, so we got to a larger buffer size.

11:14And they also wanted to be able to do comparisons, so there's two different options, you can run these two independently. Alright.

11:21So that was...we kind of had to do a hybrid solution for this one.

11:27So the other tool was with the origin-destination tracker.

11:33This one had the large set of records that went with it, and this, again...

11:38...like I explained, was to find the relationship between residents and workplace locations.

11:44So there's six options for the user to make.

11:46One is to use...it's basically three different categories.

11:50You can use census blocks.

11:51You can use the city, or you can search by an entire county.

11:57And then they select a location on the map, and the appropriate census blocks are returned visually for them...

12:05...with a count that are within the area of interest.

12:10So, like I said earlier, over 34,000 polygons, and what we did is we created...

12:19...just had a feature class that all it had is a geoID from the TIGER/Line files.

12:26They all have a geoID, and then we had a table.

12:30We set up a separate table that had all of our 500,000 plus records in it.

12:36So we actually on this one, for the county...for Oklahoma County, there's 377,000 records that could potentially be returned.

12:49So that created an issue with getting the data back.

12:57So we switched to the table, and doing it from a table, you almost don't really notice that it does it.

13:05It's that fast.

13:07But retrieving the features is where we were having an issue.

13:11I've been able to get the server count up to 100,000 and be able to successfully return that.

13:17But it doesn't always operate. Sometimes it fails, sometimes it's fine.

13:22I found issues with if your Internet's not as fast, then the client either times out or the server times out...

13:31...even increasing the times on the server.

13:35And then you run into the issue of the user.

13:38So the user doesn't want to sit there and wait all this time for something to come back.

13:42They're expecting more of a responsive application.

13:48They want to be able to click and almost instantly get results.

13:53We weren't able to, depending on the Internet connection, we still weren't able to come up with that.

13:58It's...we've got it to where it's much faster than what it was.

14:03So some of the things we considered.

14:06If we could limit the extent.

14:09So we could say, only get the results that are showing in the current map extent.

14:13Well, that doesn't display an accurate picture if you're trying to do economic planning...

14:18...because you want to know everything.

14:19You don't want to know just the small area that you're in.

14:22So the other option was to split the data into a feature class and a table of results...

14:28...and then that actually requires two different queries at that point, or we could place all the information into one feature class...

14:39...which would mean we'd have numerous duplicates of our features...of our geometry.

14:46And returning geometry back from the server is extremely slow.

14:51So we didn't want to do that.
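[Editor's sketch] The trade-off between the two layouts discussed above can be shown by counting how many geometries each one would send back. This is an illustration with made-up records, not a measurement from the project, and the function names are hypothetical.

```python
# Hypothetical origin-destination records: one row per work/home pairing.
OD_RECORDS = [
    {"work": "B1", "home": "B2"},
    {"work": "B1", "home": "B2"},
    {"work": "B1", "home": "B3"},
    {"work": "B4", "home": "B2"},
]


def geometries_sent_single_feature_class(records, selected_work):
    """Everything in one feature class: one geometry travels per matching record."""
    return sum(1 for r in records if r["work"] == selected_work)


def geometries_sent_split(records, selected_work):
    """Table plus geoID-only feature class: one geometry per distinct home block."""
    return len({r["home"] for r in records if r["work"] == selected_work})


single = geometries_sent_single_feature_class(OD_RECORDS, "B1")
split = geometries_sent_split(OD_RECORDS, "B1")
```

The split layout costs an extra query, but each block's geometry is returned once no matter how many records refer to it.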

14:55So what we ended up doing was solution two.

14:58So we have a table that has all of our records that have a geoID, one geoID for their workplace, one geoID for their home...

15:08...and then a...just a feature class that just has the geoID in it.

15:15So the workflow in this one is, we have our census blocks that do an identify to find out which census block the user selected...

15:27...or they could have selected a county or a city.

15:30That returns a feature set with a geoID.

15:33We used that geoID to query the table, and then we queried back again on the census blocks...

15:41...to get our feature set of census blocks back from the server.
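[Editor's sketch] The three steps just described (identify the selected block, query the table by geoID, then query the census blocks again) are sketched below with in-memory stand-ins. Nothing here is the Flex API: the data, the bounding-box "identify", and the function names are all hypothetical and only show the order of operations.

```python
# Feature class stand-in: geoID -> bounding box (xmin, ymin, xmax, ymax).
BLOCKS = {
    "B1": (0, 0, 1, 1),
    "B2": (1, 0, 2, 1),
    "B3": (0, 1, 1, 2),
}

# Table stand-in: one geoID for the workplace, one for the home.
OD_TABLE = [
    {"work": "B1", "home": "B2"},
    {"work": "B1", "home": "B3"},
    {"work": "B1", "home": "B2"},
    {"work": "B2", "home": "B3"},
]


def identify(x, y):
    """Step 1: find which block contains the clicked point."""
    for geoid, (xmin, ymin, xmax, ymax) in BLOCKS.items():
        if xmin <= x < xmax and ymin <= y < ymax:
            return geoid
    return None


def query_table(selected, select_by="work"):
    """Step 2: look up the geoIDs on the other side of each matching record."""
    other = "home" if select_by == "work" else "work"
    return [row[other] for row in OD_TABLE if row[select_by] == selected]


def query_features(geoids):
    """Step 3: fetch each related block's geometry once, by geoID."""
    return {g: BLOCKS[g] for g in sorted(set(geoids)) if g in BLOCKS}


selected = identify(0.5, 0.5)
related = query_table(selected)
features = query_features(related)
```

Passing `select_by="home"` to `query_table` gives the other direction of the tool, from where people live to where they work.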

15:47So, and this we had to set up for loops to limit the number of records and do that based off the object ID.

15:54So what we do here is we have the for loop there and we increment...

15:59...and we can change that increment based off of basically performance.

16:06So if we want to increase performance, we can actually...At one point in time...

16:11...we actually had it set up to where it did queries based off the...a query for each object that was returned.

16:19So, of course, as you can imagine, that's several thousand hits to the server for a query.

16:27There's 34,000 records there, so hitting that many times on the server could increase the server load.

16:35So what we've done is, again, set up where we actually query by an object ID, and we return the object IDs incrementally...

16:43...so that we can keep track of where we're at and just go through the entire feature set.

16:48So this actually required a third query, and we actually do a query for count, so that we know how many records.

16:54So if we change the dataset, let's say they want to add on another county...

16:58...we can easily add another county to the list, and I don't have to change anything in the code besides this number here, possibly...

17:06...if we decide on performance now.
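[Editor's sketch] The incremental querying by object ID described above, with a count query first and an adjustable increment, could be structured as follows. This is a sketch, not the speaker's Flex code: the fake server, the function names, and the `OBJECTID` field name are assumptions, and it assumes object IDs are contiguous and start at zero, as stated later in the talk.

```python
def object_id_batches(total_count, batch_size):
    """Yield (start, end) object ID ranges, end exclusive, covering 0..total_count-1."""
    for start in range(0, total_count, batch_size):
        yield start, min(start + batch_size, total_count)


def where_clause(start, end, field="OBJECTID"):
    """Build the filter for one batch (the field name is an assumption)."""
    return f"{field} >= {start} AND {field} < {end}"


def fetch_all(query_count, query_features, batch_size):
    """Ask for the count once, then pull the features one batch at a time."""
    total = query_count()
    results = []
    for start, end in object_id_batches(total, batch_size):
        results.extend(query_features(start, end))
    return results


# Fake server holding ten records, standing in for the feature service.
RECORDS = [{"OBJECTID": i} for i in range(10)]
calls = []


def fake_count():
    return len(RECORDS)


def fake_query(start, end):
    calls.append(where_clause(start, end))
    return [r for r in RECORDS if start <= r["OBJECTID"] < end]


rows = fetch_all(fake_count, fake_query, batch_size=4)
```

Because the total comes from the count query, adding data changes nothing in the loop, and `batch_size` is the one number to tune: smaller batches mean more requests but smaller responses, which is the adjustment the talk describes for slower connections.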

17:10Running through testing here with it, I was doing a bit of playing around with it...

17:13...I found that actually, with the slower Internet here, I was having issues.

17:19So actually changing that number, decreasing that number, made things a little bit better.

17:24It's a matter of finding the best number for you...the best number for performance.

17:31All the testing I did was on a, you know, a high-speed Internet.

17:37The majority of it was...I was sitting right there at the same location with the server or on the same network as the server...

17:43...and I didn’t have an issue returning back all 34,000 records.

17:49It was fast, but when I got here, it was even more issue than...we have...some of the people don't have the higher Internet speed...

18:02...and they were complaining about the performance, so we had to choose a way to implement this.

18:11So this is the results from the origin-destination tracker.

18:16The red census block is the census block that was selected by the user...

18:21...and the blue census block, or the blue census blocks, are the returns.

18:26Those are the where...so that's...the red one is where the people work, and the blue one is where all the people live.

18:35We get a few things back on this.

18:37We get a number of...a total number of results, total number of blocks, so that will tell us that, okay...

18:43...we have for, like carpooling purposes on this one.

18:47Well, all 1,100 people live in a different census block, so none of them are neighbors...

18:54...and then how many are in the area of interest, okay, the Central Oklahoma Workforce Investment Board area...

18:59...their 10-county area, and we see 907.

19:02So that gives them an index as well.

19:04That means people are driving from outside of this 10-county area to work inside this area.
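[Editor's sketch] The three figures returned with the result (total results, number of blocks, and how many fall in the area of interest) could be computed along these lines. This is an assumption-laden sketch: the talk does not say how membership in the 10-county area is tested, so this version uses the first five characters of a 15-digit block geoID (state plus county code), and the county prefixes and geoIDs below are examples, not the board's actual county list. It also assumes the in-area figure counts results rather than distinct blocks.

```python
def summarize(home_geoids, area_county_prefixes):
    """Summarize the home blocks returned for one selected workplace."""
    inside = [g for g in home_geoids if g[:5] in area_county_prefixes]
    return {
        "total_results": len(home_geoids),
        "distinct_blocks": len(set(home_geoids)),
        "in_area_of_interest": len(inside),
    }


# Example county prefixes (state code + county code); illustrative only.
AREA_PREFIXES = {"40109", "40027"}

# Made-up 15-digit block geoIDs, one entry per result.
homes = [
    "401091001001000",
    "401091001001000",  # same block twice: two workers who are neighbors
    "400272001002000",
    "400011001001000",  # outside the example area of interest
]

summary = summarize(homes, AREA_PREFIXES)
```

When the distinct block count equals the total, every worker lives in a different block, which is the carpooling reading given in the talk. A gap between the total and the in-area figure shows how many results come from outside the region.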

19:17Okay. So some stuff that I've been working on to try to increase the performance on this...

19:23...is working with relationship classes.

19:26And what this would do, we set up a relationship class inside the geodatabase.

19:29It would link the feature class with the table.

19:35As of yet, I haven't had much luck getting the query to actually work from within Flex, from within the Flex API.

19:42The other option's to try to do a geoprocessing solution.

19:45But, again, with that, you might have issues on the server.

19:52The other problem with that would be trying to, because you're going to get back several duplicate records.

20:00So that means the server, again, is going to find a whole bunch of duplicates and send those duplicates back...

20:06...and that would slow down the application again.

20:09So this is just some possibilities that I'm checking into to see, you know, how this will work if we get it to work...

20:18So...

20:20...but dealing with the large datasets has been quite a challenge for us with this project.

20:27So that's pretty much all I had.

20:32I can do a live demo if somebody kind of wants to see how that works.

20:38I had allotted time for that.

20:41I can show you the difference between being connected remotely.

20:49So this is actually...I am on the...I am actually connected to my machine at OU, and if I can get it to agree...

21:13...you see kind of the performance differences I was dealing with here.

21:25So right now, this is set to return a hundred thousand results at a time, and you'll see that actually connected right to the server...

21:45...or right in on that same LAN, it's...

21:58...it's actually quite a bit faster than dealing with it on the network here, and we're having...do the remote desktop connections...

22:12...Internet again. Internet here is kind of...so sending the graphics back and forth is a problem.

22:21[Inaudible audience comment]

22:24I'm sending the queries in loops.

22:25So what it does is, it sends whatever I set that number to in the for loop, so whatever I set...

22:43So whatever I set this number right here to, is how many results it's going to get back each time.

22:53[Inaudible audience question]

23:03Yes. 'Cause object ID is automatically generated on the feature class.

23:09So what I do is, I have a query that, up here further...

23:27So right here what I do is, I do a query task, and I execute it for the count.

23:32And what that does is, it gives me a count so I know, and then actually at the very beginning in my initialization...

23:42...I do a query for count to get the total number of features in that feature class.

23:46So regardless of which feature class it's looking at, if I add data, subtract data, it will know that.

23:52So and then in the for loop, it takes that number from the results and puts it into a number of census blocks variable...

24:04...and then when I run my for loop down here, the for loop is set to run to less than the number of census blocks...

24:17...because, of course, your object ID starts at zero, so you want to be less than that.

24:24So by doing it that way, I can...

24:30[Inaudible audience question]

24:46These are pretty much fixed records.

24:48They're census blocks.

24:49So they don't change until the new census data comes out, at which point it would be another set published to the SDE...

24:57...so we'd have the same starting feature ID, ending feature ID.

25:00We wouldn't be removing and adding at that point.

25:02This is a very static dataset.

25:06The table might change, but the table isn't going to matter for this.

25:11So, any other questions?

Copyright 2013 Esri

Challenges of Developing Data Intensive Web GIS Applications

William Greenwood of the University of Oklahoma discusses the challenges of building web applications.

  • Recorded: Mar 28th, 2012
  • Runtime: 25:14
  • Views: 686
  • Published: Apr 30th, 2012