PhotoCity: 3D Models Game

How did Microsoft's PhotoCity work?

As of my last training data in September 2021, Microsoft did not develop a project named "PhotoCity". However, you may be referring to PhotoCity, a project developed through a collaboration between the University of Washington and Cornell University.

The goal of PhotoCity was to create 3D models of real-world locations from photos taken by ordinary people. In essence, it was a game for crowd-sourcing photo collections of cities, from which detailed 3D models could be built. Users were encouraged to photograph various places, earning points for unique photos that expanded or filled in details of the existing 3D models.

Here's how it generally worked:

  1. Initialization: The system started with an initial 3D model generated from some seed photos.

  2. User Participation: Users were asked to take photographs of the areas covered by these models. They could use the online system to see what areas needed more photos.

  3. Uploading Photos: The users then uploaded their photos to the system.

  4. Feature Extraction: The system used computer vision techniques to extract 'features' from these photos. Features are distinct points or areas in an image that can be recognized in other images, such as the corner of a building or a distinctive door or window. (Steps 4 and 5 are sketched in code after this list.)

  5. Photograph Integration: These features were used to align the new photos with the existing 3D model and photos already uploaded. If the new photos matched with the existing ones and could contribute to the model, they were integrated into it. In this way, the 3D model could be gradually expanded and filled in.

  6. Scoring and Game Element: Users were awarded points based on how much their photos helped to improve the 3D models. The more a photo contributed to the model, the more points the user got. This gamification element was intended to motivate users to take more and better photos that would be useful for the 3D model.
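
To make steps 4 and 5 concrete: PhotoCity's exact pipeline grew out of academic structure-from-motion research and isn't reproduced here, but the core "does this new photo connect to the existing model?" check can be approximated with off-the-shelf OpenCV. A minimal sketch, assuming two overlapping photos on disk (the file names and the acceptance threshold are made up for illustration):

```python
import cv2

# Hypothetical file names; any two overlapping photos will do.
existing = cv2.imread("model_seed_photo.jpg", cv2.IMREAD_GRAYSCALE)
uploaded = cv2.imread("user_upload.jpg", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute descriptors (SIFT, a common choice in
# structure-from-motion pipelines).
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(existing, None)
kp2, des2 = sift.detectAndCompute(uploaded, None)

# Match descriptors and keep only distinctive matches (Lowe's ratio test).
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# A crude stand-in for the integration check: enough good matches means the
# photo can be registered against the model (and the user scored accordingly).
verdict = "accept" if len(good) > 50 else "reject"
print(f"{len(good)} good matches -> {verdict}")
```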

The end result was a detailed, crowd-sourced 3D model of a city or a location, created from the collective effort of many individuals. This was an innovative approach to leveraging crowd-sourcing and computer vision for 3D modelling.

As of my last update, Microsoft has a similar project known as "Microsoft Photosynth," which also uses computer vision techniques to create 3D models from 2D photographs, but it does not include the gamified crowd-sourcing element of PhotoCity.

How would I go about building a Microsoft Photosynth-like project? I have video of an object. I imagine I would estimate depth for a bunch of pixels with raycasting and iterate to converge on pixel positions in space?

Creating a project similar to Microsoft's Photosynth, especially from video, is quite complex and involves several steps related to computer vision, image processing, and 3D reconstruction. Below is a broad outline of the process (several steps are sketched in code after the list):

  1. Video Frame Extraction: The first step would be to extract frames from the video. These frames will serve as the individual photos that you will use for 3D reconstruction. Make sure you have enough frames to cover the entire object you are interested in from multiple angles (a short sketch of this follows the list).

  2. Feature Extraction: For each frame, you will want to extract feature points (also known as keypoints) and compute their descriptors. This process involves identifying unique features in an image that can be found in other images, even under different viewing conditions (e.g., change in angle, scale, or lighting). There are several algorithms available to do this, including SIFT (Scale-Invariant Feature Transform), SURF (Speeded Up Robust Features), and ORB (Oriented FAST and Rotated BRIEF).

  3. Feature Matching: Once you have the feature points and descriptors for each image, you will match these features across different images. Essentially, you are trying to find the same point in the 3D world as it appears in different 2D images. Again, there are many algorithms to do this, including FLANN (Fast Library for Approximate Nearest Neighbors) and BFMatcher (Brute Force Matcher). The matching sketch shown earlier for PhotoCity applies here as well, run over pairs of nearby frames.

  4. Camera Calibration: If you have information about the intrinsic parameters of the camera (like focal length, optical centers, and lens distortion), you can use them directly. Otherwise, you will have to estimate them. This is often done by photographing a known calibration pattern from different angles and then using the observed distortion to compute the camera parameters (see the chessboard sketch after the list).

  5. Estimate Pose and Structure: With the matched features and the camera calibration, you can use "Structure from Motion" (SfM) to simultaneously estimate the 3D coordinates of the feature points (the structure) and the position and orientation of the camera for each image (the motion). This step typically involves algorithms like the five-point algorithm for relative pose estimation and triangulation for 3D point estimation, and the whole solution is then refined jointly via bundle adjustment (see the two-view sketch after the list).

  6. 3D Reconstruction: Once you have the 3D coordinates of a lot of points, you can use them to create a point cloud, a mesh, or a more complete 3D model, depending on what you need (see the PLY-writer sketch after the list).

  7. Texture Mapping: If you want a realistic model, you would then project the images onto the 3D model as textures. This is done by finding the correct transformation that maps each point in the 3D model to the corresponding point in each image.
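
For step 1, here is a minimal frame-extraction sketch using OpenCV. The video path and the sampling stride are assumptions; tune the stride to trade coverage against processing time:

```python
import os
import cv2

# Hypothetical input path; every `stride`-th frame is written to ./frames.
os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("object.mp4")
stride, saved, idx = 10, 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % stride == 0:
        cv2.imwrite(f"frames/frame_{saved:04d}.png", frame)
        saved += 1
    idx += 1
cap.release()
print(f"saved {saved} frames")
```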
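For step 4, OpenCV's standard chessboard calibration routine works well. The board size (9x6 inner corners) and the image location are assumptions for this example:

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)  # inner corners of the printed chessboard
# 3D coordinates of the corners in the board's own plane (z = 0).
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):  # hypothetical calibration shots
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# K is the 3x3 intrinsic matrix; dist holds lens-distortion coefficients.
# (Assumes at least one calibration image was found and usable.)
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("intrinsic matrix:\n", K)
```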
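For step 5, a two-view sketch: recover the relative camera pose with the five-point algorithm, then triangulate the matched points. This is only the first link in a full SfM chain, and OpenCV has no built-in bundle adjustment; tools like COLMAP or OpenSfM implement the complete pipeline:

```python
import cv2
import numpy as np

def two_view_reconstruction(pts1, pts2, K):
    """Recover relative pose from matched points and triangulate.

    pts1, pts2: Nx2 float arrays of matched pixel coordinates in two frames.
    K:          3x3 intrinsic matrix from calibration.
    Returns an Nx3 array of 3D points, up to an unknown global scale.
    """
    # Essential matrix via the five-point algorithm, with RANSAC rejection.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                   threshold=1.0)
    # Decompose E into the rotation and translation between the two views.
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # Projection matrices: first camera at the origin, second at (R, t).
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])

    # Triangulate (OpenCV wants 2xN inputs), then homogeneous -> Euclidean.
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    return (pts4d[:3] / pts4d[3]).T
```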
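For step 6, the triangulated points can be written out as an ASCII PLY point cloud, which viewers such as MeshLab open directly. A minimal writer:

```python
import numpy as np

def save_ply(points, path):
    """Write an Nx3 point array as an ASCII PLY point cloud."""
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write(f"element vertex {len(points)}\n")
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write("end_header\n")
        for x, y, z in points:
            f.write(f"{x} {y} {z}\n")

# e.g. save_ply(two_view_reconstruction(pts1, pts2, K), "cloud.ply")
```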

The above steps require a deep understanding of computer vision and 3D geometry. Most of them can be done using libraries like OpenCV and PCL (Point Cloud Library), but they still require careful implementation and a lot of debugging.

Remember that this is a high-level overview, and each step can be a significant task in its own right. There are many possible complications and details I've glossed over, and the quality of the results will depend heavily on the quality of the input data and the appropriateness of the algorithms and parameters chosen. If you're new to this field, you might want to start by studying computer vision and working through some tutorials on these topics.

Can you estimate how long it would take to implement?

Estimating the time to implement such a complex system is very challenging without more specifics about the context. It depends greatly on the prior knowledge and experience of the developer(s), the quality and quantity of the input data, the required precision and resolution of the output, the computational resources available, and many other factors.

For a single developer who is already very familiar with the necessary technologies and concepts (like 3D geometry, computer vision, camera calibration, feature extraction, feature matching, and so on), and who has a clear idea of what they want to achieve, implementing a basic version of this system might take several weeks to a few months.

For a team of developers with the right mix of skills and experience, it might take a similar amount of time but they would likely be able to achieve a higher level of quality and sophistication, including more robust handling of edge cases, better performance, and a more user-friendly interface.

For a developer or team not already familiar with the necessary technologies and concepts, it would take significantly longer, because they would need to spend time learning them. It might take several months to a year or more, depending on how much time they can devote to it and how quickly they learn.

Again, these are very rough estimates and the actual time could be different based on the factors mentioned earlier. In addition to coding, time should also be allocated for testing, debugging, refining, and potentially iterating on the design based on the results. This kind of project also often requires a significant amount of time for preprocessing and cleaning the input data, as well as post-processing and visualizing the output data.

Keep in mind that creating a system like this from scratch is a significant undertaking, even for experienced developers. If possible, you might want to consider using existing software or libraries that can handle some of these tasks, or potentially collaborating with others who have complementary skills or experience.
